Add new variable
April 28, 2026 · View on GitHub
How to add a new data variable to an existing dataset.
1. Add the variable to the template
1a. Download a real, recent source file for the new variable. Open it (e.g. with rasterio, gdalinfo, or xarray) to inspect its attributes (units, grid dimensions, CRS, nodata/sentinel values) and data values (range, NaN/missing coverage, geographic distribution). This grounds the variable configuration in observed data rather than assumptions.
1b. Add a new DataVar to your dataset’s TemplateConfig.data_vars (usually in src/reformatters/<provider>/<model>/<variant>/template_config.py).
- Name + externally visible attrs: match existing naming/attrs used in this repo where possible; otherwise follow CF Conventions. Variable names generally follow the format
<long name>_<level>. - Internal attrs: derive from
gdalinfooutput on a representative source file (and GRIB index if relevant) - Encoding: match existing variables, setting
keep_mantissa_bitsto 7 by default, 6 for wind variables, 8 for precipitation and 10 for pressure variables with unitspa.
1c. Regenerate the checked-in Zarr template metadata:
uv run main <DATASET_ID> update-template
1d. Edit the test_backfill_local_and_operational_update test in the dataset's dynamical_dataset_test.py to add the new variable. Run the test and add a quick print or plot to confirm data is successfully being processed for the new variable. Abandon these test changes after confirming to keep our integration tests from getting slow.
1e. Open and merge a PR containing the template changes (template_config.py and zarr metadata in templates/latest.zarr/)
2. Write data for the new variable
2a. Run an operational update
After the PR is merged, run an operational update once (or wait for the next scheduled run) so the dataset’s Zarr metadata includes the new variable before you backfill history.
To run it immediately, use the GitHub Action Manual: Create Job from CronJob (requires reformatters repo write access) setting "CronJob to create job from" to {dataset-id}-update.
2b. Backfill history for the variable
Prerequisite: You can build and push Docker images (see
README.md> Deploying to the cloud > Setup).
DOCKER_REPOSITORYis set in your shell.- Your local Docker is authenticated to that repo.
Run a backfill filtered to just the new variable:
DYNAMICAL_ENV=prod uv run main <DATASET_ID> backfill-kubernetes \
<APPEND_DIM_END> <JOBS_PER_POD> <MAX_PARALLELISM> \
--overwrite-existing \
--filter-variable-names <VARIABLE_NAME>
APPEND_DIM_END= the current approximate UTC timestamp. The operational update in 2a will have already filled the latest data, so the exact value just needs to be somewhere around the present time.JOBS_PER_POD= 1 or 2 in most cases. Set to 3-4 if the jobs run very fast to amortize container startup time.MAX_PARALLELISM= 100 if the data source can support highly parallel reads, else lower.--overwrite-existingwrites into the existing store rather than attempting (and failing) to create a new one.--filter-variable-names= your new variable's name.
3. Validate
Follow docs/validation.md — it walks through running run-all, reading validation_summary.md, inspecting every plot, and the full data quality checklist. When validating a new variable it is often useful to restrict with --variable <name> to iterate faster.
4. Update dataset catalog documentation
Update the dataset catalog docs on dynamical.org by rebuilding (npm run build) and merging updates to main in https://github.com/dynamical-org/dynamical.org.