CanadaFireSat Data
August 18, 2025
This repository contains the code for building the CanadaFireSat benchmark. In this benchmark, we investigate the potential of deep learning with multiple sensors for high-resolution wildfire forecasting.
- Dataset on Hugging Face
- Paper on ArXiv
- Model repository on GitHub & Weights on Hugging Face
Summary Representation:
Sources
In this section, we describe the different sources necessary to build the CanadaFireSat benchmark.
Fire Polygons Source
- National Burned Area Composite (NBAC): polygon shapefile downloaded from the CWFIS Datamart
- Filter fires since 2015, aligning with Sentinel-2 imagery availability
- No restrictions are applied on ignition source or other metadata
- Spatial aggregation: Fires are mapped to a 2.8 km × 2.8 km grid | Temporal aggregation into 8-day windows
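The two aggregation steps above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function names are hypothetical, coordinates are assumed to be in metres in a projected CRS with the grid origin at (0, 0), and the 8-day windows are assumed to restart on 1 January of each year.

```python
import math
from datetime import date

CELL_SIZE_M = 2_800  # 2.8 km grid cell, from the text
WINDOW_DAYS = 8      # 8-day temporal window, from the text


def grid_cell(x_m: float, y_m: float) -> tuple[int, int]:
    """Return the (column, row) index of the grid cell containing a point.

    Assumption: projected coordinates in metres, grid origin at (0, 0).
    """
    return math.floor(x_m / CELL_SIZE_M), math.floor(y_m / CELL_SIZE_M)


def window_index(day: date) -> int:
    """Return the 0-based 8-day window of the year that contains `day`.

    Assumption: windows restart on 1 January each year.
    """
    day_of_year = day.timetuple().tm_yday  # 1..366
    return (day_of_year - 1) // WINDOW_DAYS


# Hypothetical fire observations: (x, y, date).
fires = [
    (5_000.0, 9_000.0, date(2023, 6, 2)),
    (5_500.0, 10_000.0, date(2023, 6, 3)),
    (12_000.0, 9_000.0, date(2023, 6, 20)),
]

# Observations in the same cell and window collapse to one positive sample key.
positives = {(grid_cell(x, y), d.year, window_index(d)) for x, y, d in fires}
```

Here the first two observations share a cell and a window, so `positives` holds two keys rather than three.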
Satellite Image Time Series Source
- Sentinel-2 (S2) Level-1C Satellite Imagery (2015–2023) from Google Earth Engine
- For each grid cell (2.8 km × 2.8 km): Collect cloud-free S2 images (≤ 40% cloud cover) over a 64-day period before prediction
- We discard samples with: Fewer than 3 valid images | Less than 40 days of coverage
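The filtering rules above could be expressed roughly as below. This is a sketch under stated assumptions: `keep_series` is a hypothetical helper, cloud cover is given as a fraction per image, and "coverage" is taken to mean the number of days between the first and last valid image, which the text does not define.

```python
from datetime import date, timedelta

MAX_CLOUD_FRACTION = 0.40  # keep images with at most 40% cloud cover
LOOKBACK_DAYS = 64         # period before the prediction date
MIN_IMAGES = 3             # discard series with fewer valid images
MIN_COVERAGE_DAYS = 40     # discard series covering fewer days


def keep_series(images, prediction_date):
    """Decide whether a Sentinel-2 time series passes the filters.

    images: list of (acquisition_date, cloud_fraction) tuples.
    """
    start = prediction_date - timedelta(days=LOOKBACK_DAYS)
    valid = sorted(
        acquired
        for acquired, cloud in images
        if start <= acquired < prediction_date and cloud <= MAX_CLOUD_FRACTION
    )
    if len(valid) < MIN_IMAGES:
        return False
    # Assumption: coverage = days between the first and last valid image.
    coverage = (valid[-1] - valid[0]).days
    return coverage >= MIN_COVERAGE_DAYS


series = [
    (date(2023, 5, 5), 0.10),
    (date(2023, 5, 25), 0.30),
    (date(2023, 6, 20), 0.20),
    (date(2023, 6, 25), 0.90),  # too cloudy, ignored
]
kept = keep_series(series, prediction_date=date(2023, 7, 1))
```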
Environmental Predictors
- Hydrometeorological Drivers: Key variables like temperature, precipitation, soil moisture, and humidity from ERA5-Land (11 km, available on Google Earth Engine) and MODIS11 (1 km, available on Google Earth Engine), aggregated over 8-day windows using mean, max, and min values.
- Vegetation Indices (MODIS13 and MODIS15): NDVI, EVI, LAI, and FPAR (500 m) captured in 8- or 16-day composites, describing the vegetation state.
- Fire Danger Metrics (CEMS, previously on CDS): Fire Weather Index and Drought Code from the Canadian FWI system (0.25° resolution).
- For each sample, we gather predictor data from the preceding 64 days to reflect pre-fire conditions.
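As an illustration of the 8-day aggregation, the sketch below splits a 64-day series into eight windows and computes the mean, max, and min of each. It assumes a hypothetical daily series for a single variable; the actual sources have different native time steps, and `aggregate_windows` is not a function from the repository.

```python
from statistics import mean

WINDOW_DAYS = 8
LOOKBACK_DAYS = 64


def aggregate_windows(daily_values):
    """Return one (mean, max, min) tuple per 8-day window of a 64-day series."""
    if len(daily_values) != LOOKBACK_DAYS:
        raise ValueError(f"expected {LOOKBACK_DAYS} daily values")
    windows = [
        daily_values[i:i + WINDOW_DAYS]
        for i in range(0, LOOKBACK_DAYS, WINDOW_DAYS)
    ]
    return [(mean(w), max(w), min(w)) for w in windows]


# Hypothetical daily temperature series: 0.0, 1.0, ..., 63.0
daily_temperature = [float(day) for day in range(LOOKBACK_DAYS)]
features = aggregate_windows(daily_temperature)
```

With 64 days and 8-day windows, each variable yields 8 time steps with three statistics each.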
Land Cover
- Exclusively used for adversarial sampling and post-training analysis.
- The data comes from the 2020 North American Land Cover 30-meter dataset, produced as part of the North American Land Change Monitoring System (NALCMS) (available on Google Earth Engine)
Set-Up
To run the pipeline steps below, you need a Google Account, and you must run the cells in notebooks/ee_test.ipynb to obtain the Earth Engine token.
Then create the Python virtual environment and install the dependencies:
python -m venv data-env
source data-env/bin/activate
pip install -r requirements/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117
Pipeline Steps
Create Grid of Positive Samples:
- src.preprocess.create_grid: Initialize the spatial grid over Canada | Config: None
- src.preprocess.burned_area: Preprocess NBAC input data and aggregate the fire polygons spatially and temporally | Config: ba_preprocess.yaml
- src.preprocess.temporal_freq: Temporal aggregation over 8-day window of the positive samples | Config: ba_preprocess.yaml
Download MODIS and ERA5 Data:
- src.env_download: Download complete tiles over Canada for each date from EE to your Drive | Configs: era5.yaml & modis.yaml
- Manually copy the GeoTiffs from your Drive to your local machine.
Download FWI Data:
- src.cds_download: Download the consolidated data for 2015–2022 and the intermediate data for 2023 from CEMS (previously CDS) | Config: cds.yaml
Download Land Cover Data from EE:
- Manually download the 2020 GeoTiff only and merge the tiff via:
- src.postprocess.env_vals | Config: None
Postprocess MODIS, ERA5, FWI, and Land Cover Data:
- src.postprocess.env_vals: Postprocess the environmental predictors for extreme or unknown values | Config: None
Create Negative Samples:
- src.sampling.negative: Sample the negatives for Train, Validation, and Test splits | Config: sampling.yaml
- src.sampling.negative_hard: Sample the negatives for the Test Hard split | Config: samling_hard.yaml
Download S2 Data:
- src.download: Run this script for the positive, negative, and hard negative samples (usually one run per region) | Config: ba_s2.yaml
Postprocess S2 Data:
- src.postprocess.s2_post: Post-process the S2 tiles based on cloud cover and filter out time series with too few images or too few days of coverage | Config: postprocess.yaml
Compute S2 Bands Statistics:
- src.postprocess.band_stats: Compute mean and std of each band of the positive and negative samples | Config: stats.yaml
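Per-band statistics over many tiles can be accumulated without loading everything at once. The sketch below is one way to do it, using running sums and sums of squares; the data layout (a dict of band name to flat pixel list per tile) and the function name are assumptions for illustration, and the population standard deviation is used.

```python
import math


def band_statistics(tiles):
    """Return {band: (mean, std)} accumulated over an iterable of tiles.

    tiles: iterable of dicts mapping band name -> flat list of pixel values.
    """
    count, total, total_sq = {}, {}, {}
    for tile in tiles:
        for band, pixels in tile.items():
            count[band] = count.get(band, 0) + len(pixels)
            total[band] = total.get(band, 0.0) + sum(pixels)
            total_sq[band] = total_sq.get(band, 0.0) + sum(p * p for p in pixels)

    stats = {}
    for band, n in count.items():
        mu = total[band] / n
        # Clamp at zero to guard against tiny negative values from rounding.
        variance = max(total_sq[band] / n - mu * mu, 0.0)
        stats[band] = (mu, math.sqrt(variance))
    return stats


# Two hypothetical tiles with two bands each.
tiles = [
    {"B04": [1.0, 2.0], "B08": [10.0, 10.0]},
    {"B04": [3.0, 4.0], "B08": [10.0, 10.0]},
]
stats = band_statistics(tiles)
```

The sum-of-squares formula is simple but can lose precision for large reflectance values; a Welford-style update is a common alternative.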
Rasterize Label Polygons:
- src.postprocess.rasterize: Rasterize the fire polygons into binary arrays using the S2 GeoTiffs as reference | Config: rasterize.yaml
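To make the rasterization step concrete, here is a dependency-free sketch that burns a single polygon into a binary mask by testing each pixel centre with the even-odd rule. It is not the repository's code, which may well rely on a geospatial library; the function names, the pixel-centre convention, and the toy coordinates are assumptions.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, x_min, y_max, pixel_size, width, height):
    """Return a height x width list of 0/1 values; row 0 is the top edge."""
    mask = []
    for row in range(height):
        y = y_max - (row + 0.5) * pixel_size  # pixel-centre y coordinate
        mask.append([
            int(point_in_polygon(x_min + (col + 0.5) * pixel_size, y, polygon))
            for col in range(width)
        ])
    return mask


# A 200 m square fire polygon on a 4 x 4 grid of 100 m pixels.
square = [(100.0, 100.0), (300.0, 100.0), (300.0, 300.0), (100.0, 300.0)]
mask = rasterize(square, x_min=0.0, y_max=400.0, pixel_size=100.0, width=4, height=4)
```

In practice the grid origin, pixel size, and shape would be read from the reference S2 GeoTiff so that the label aligns with the imagery.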
Align Environmental Variables with S2 Tiles:
- src.postprocess.spatial_alignment: Spatially align the environmental predictors by extracting windows centered on the S2 tiles | Config: spatial_alignment.yaml & spatial_alignment_lc.yaml
- src.postprocess.alignment: Weighted average of the environmental predictors over the S2 tiles | Config: alignment.yaml
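The weighting scheme is not spelled out here. One plausible reading, shown below purely as an assumption, weights each coarse predictor pixel by the area it shares with the S2 tile footprint. Boxes are `(x_min, y_min, x_max, y_max)` in metres, and both helper names are hypothetical.

```python
def overlap_area(a, b):
    """Area of the intersection of two axis-aligned boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def weighted_mean(tile_box, coarse_pixels):
    """Overlap-area-weighted mean of predictor values over a tile.

    coarse_pixels: list of (box, value) pairs for the predictor raster.
    """
    weights = [overlap_area(tile_box, box) for box, _ in coarse_pixels]
    total = sum(weights)
    if total == 0:
        raise ValueError("tile does not overlap any predictor pixel")
    return sum(w * value for w, (_, value) in zip(weights, coarse_pixels)) / total


# A 2.64 km tile straddling two coarse pixels (25% / 75% of its area).
tile = (0.0, 0.0, 2_640.0, 2_640.0)
pixels = [
    ((-1_000.0, -1_000.0, 660.0, 5_000.0), 10.0),
    ((660.0, -1_000.0, 5_000.0, 5_000.0), 20.0),
]
value = weighted_mean(tile, pixels)
```

In this toy case the result is 0.25 × 10 + 0.75 × 20 = 17.5.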
Compute Environment Variables Statistics:
- src.postprocess.env_stats: Compute the mean and std of each environmental variable over the positive and negative samples | Config: env_stats.yaml
Create the split file:
- src.postprocess.split: Create the main split file for model training and evaluation | Config: split.yaml
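The split years listed in the dataset statistics table (training 2016 to 2021, validation 2022, test 2023) amount to a simple temporal split. A minimal sketch, with a hypothetical `assign_split` helper and without the Test Hard split, could look like this:

```python
from collections import Counter


def assign_split(year: int) -> str:
    """Map a sample year to its split, following the statistics table."""
    if 2016 <= year <= 2021:
        return "train"
    if year == 2022:
        return "val"
    if year == 2023:
        return "test"
    raise ValueError(f"year {year} is outside the benchmark period")


# Hypothetical sample years.
sample_years = [2016, 2019, 2021, 2022, 2023, 2023]
split_counts = Counter(assign_split(year) for year in sample_years)
```

Splitting by year keeps evaluation samples strictly later than the training samples.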
Transform SITS GeoTiff to npy files:
- src.postprocess.transform: Extract all Sentinel-2 band GeoTiffs and concatenate them into groups of npy files, one per resolution | Config: transform.yaml
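Grouping per resolution follows from the native Sentinel-2 band resolutions (10 m, 20 m, and 60 m). The sketch below only shows the grouping logic; the function name is hypothetical, and how the repository names or stacks the resulting arrays is not specified here.

```python
from collections import defaultdict

# Native spatial resolution of each Sentinel-2 MSI band, in metres.
S2_BAND_RESOLUTION_M = {
    "B01": 60, "B02": 10, "B03": 10, "B04": 10,
    "B05": 20, "B06": 20, "B07": 20, "B08": 10,
    "B8A": 20, "B09": 60, "B10": 60, "B11": 20, "B12": 20,
}


def group_bands_by_resolution(band_names):
    """Return {resolution_m: [band, ...]}, preserving the input band order."""
    groups = defaultdict(list)
    for band in band_names:
        groups[S2_BAND_RESOLUTION_M[band]].append(band)
    return dict(groups)


groups = group_bands_by_resolution(["B02", "B03", "B04", "B05", "B08", "B11"])
```

Bands within a group share the same pixel grid, so each group can be stacked into a single array and saved as one npy file.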
Upload to Hugging Face:
- src.huggingface.upload: Upload CanadaFireSat and the metadata files to Hugging Face | Config: upload.yaml & manual-upload.yaml
Outputs
CanadaFireSat Dataset Statistics (without Test Hard):
| Statistic | Value |
|---|---|
| Total Samples | 177,801 |
| Target Spatial Resolution | 100 m |
| Region Coverage | Canada |
| Temporal Coverage | 2016 - 2023 |
| Sample Area Size | 2.64 km × 2.64 km |
| Fire Occurrence Rate | 39% of samples |
| Total Fire Patches | 16% of patches |
| Training Set (2016–2021) | 78,030 samples |
| Validation Set (2022) | 14,329 samples |
| Test Set (2023) | 85,442 samples |
| Sentinel-2 Temporal Median Coverage | 55 days (8 images) |
| Number of Environmental Predictors | 58 |
| Data Sources | ERA5, MODIS, CEMS |
Samples Localisation:
Figure 1: Spatial distribution of positive (left) and negative (right) wildfire samples.
Example of S2 time series:
Figure 2: Rows 1-3: samples of Sentinel-2 input time series for 4 locations in Canada, showing only the RGB bands with rescaled intensity. Row 4: Sentinel-2 images after the fire occurred. Row 5: fire polygons used as labels, shown with the post-fire Sentinel-2 images.
Citation
@article{porta2025canadafiresat,
title={CanadaFireSat: Toward high-resolution wildfire forecasting with multiple modalities},
author={Porta, Hugo and Dalsasso, Emanuele and McCarty, Jessica L and Tuia, Devis},
journal={arXiv preprint arXiv:2506.08690},
year={2025}
}