🔥🛰️ CanadaFireSat Data

August 18, 2025

This repository contains the code used to build the CanadaFireSat benchmark. With this benchmark, we investigate the potential of deep learning with multiple sensors for high-resolution wildfire forecasting.

💿 Dataset on Hugging Face
📝 Paper on ArXiv
🤖 Model repository on GitHub & Weights on Hugging Face

Summary Representation:

Sources

In this section, we describe the different sources necessary to build the CanadaFireSat benchmark.

🔥📍 Fire Polygons Source

💻 National Burned Area Composite (NBAC 🇨🇦): Polygons Shapefile downloaded from CWFIS Datamart
📅 Filter fires since 2015 aligning with Sentinel-2 imagery availability
🛑 No restrictions are applied on ignition source or other metadata
➕ Spatial aggregation: Fires are mapped to a 2.8 km × 2.8 km grid | Temporal aggregation into 8-day windows
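
The spatial and temporal aggregation described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: it assumes fire locations are already in a projected coordinate system in meters, that the grid origin is at (0, 0), and that the 8-day windows are anchored on 1 January of each year. The function names `grid_cell` and `window_index` are hypothetical.

```python
from datetime import date

CELL_SIZE_M = 2800.0  # 2.8 km grid cell, as stated above
WINDOW_DAYS = 8       # 8-day temporal windows, as stated above


def grid_cell(x_m, y_m, origin_x=0.0, origin_y=0.0):
    """Return the (column, row) index of the grid cell containing a projected point.

    Assumption: coordinates are in meters and the grid origin is configurable.
    """
    col = int((x_m - origin_x) // CELL_SIZE_M)
    row = int((y_m - origin_y) // CELL_SIZE_M)
    return col, row


def window_index(day):
    """Return the 0-based 8-day window of the year.

    Assumption: windows are anchored on 1 January of each year.
    """
    day_of_year = day.timetuple().tm_yday  # 1..366
    return (day_of_year - 1) // WINDOW_DAYS
```

Two fires fall in the same positive sample when they share both the grid cell and the window index.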

🛰️🗺️ Satellite Image Time Series Source

🛰️ Sentinel-2 (S2) Level-1C Satellite Imagery (2015–2023) from Google Earth Engine
🗺️ For each grid cell (2.8 km × 2.8 km): Collect S2 images with low cloud cover (≤ 40%) over the 64-day period before the prediction date
⚠️ We discard samples with: Fewer than 3 valid images | Less than 40 days of coverage
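
The filtering rules above can be expressed as a small check. This is a sketch under stated assumptions: each image is represented as a hypothetical `(acquisition_date, cloud_pct)` pair, and "days of coverage" is taken to mean the number of days between the first and last kept image, which the text above does not define precisely.

```python
from datetime import date, timedelta

MAX_CLOUD_PCT = 40.0    # images above this cloud cover are dropped
LOOKBACK_DAYS = 64      # period before the prediction date
MIN_IMAGES = 3          # fewer valid images: sample discarded
MIN_COVERAGE_DAYS = 40  # shorter coverage: sample discarded


def valid_series(images, prediction_date):
    """Return the sorted dates of the kept images, or None if the sample is discarded.

    images: list of (acquisition_date, cloud_pct) pairs (hypothetical layout).
    """
    start = prediction_date - timedelta(days=LOOKBACK_DAYS)
    kept = sorted(
        d for d, cloud in images
        if start <= d < prediction_date and cloud <= MAX_CLOUD_PCT
    )
    if len(kept) < MIN_IMAGES:
        return None
    # Assumption: coverage is the span between the first and last kept image.
    coverage_days = (kept[-1] - kept[0]).days
    if coverage_days < MIN_COVERAGE_DAYS:
        return None
    return kept
```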

🌦️🌲 Environmental Predictors

🌡️ Hydrometeorological Drivers: Key variables like temperature, precipitation, soil moisture, and humidity from ERA5-Land (11 km, available on Google Earth Engine) and MODIS11 (1 km, available on Google Earth Engine), aggregated over 8-day windows using mean, max, and min values.
🌿 Vegetation Indices (MODIS13 and MODIS15): NDVI, EVI, LAI, and FPAR (500 m) captured in 8- or 16-day composites, which describe the vegetation state.
🔥 Fire Danger Metrics (CEMS previously on CDS): Fire Weather Index and Drought Code from the Canadian FWI system (0.25° resolution).
🕒 For each sample, we gather predictor data from the 64 days before the prediction date to reflect pre-fire conditions.
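
As an illustration of the 8-day aggregation with mean, max, and min, here is a minimal sketch on a plain list of daily values. The input series is made up, and the helper name `aggregate_windows` is hypothetical; the actual pipeline operates on raster data.

```python
from statistics import mean

WINDOW_DAYS = 8  # 8-day windows, as stated above


def aggregate_windows(daily_values):
    """Split a daily series into consecutive 8-day windows and summarise each one."""
    windows = []
    for start in range(0, len(daily_values), WINDOW_DAYS):
        chunk = daily_values[start:start + WINDOW_DAYS]
        windows.append({"mean": mean(chunk), "max": max(chunk), "min": min(chunk)})
    return windows


# 64 days of hypothetical daily temperatures give 64 / 8 = 8 windows.
daily_temperature = [10.0 + (d % 8) for d in range(64)]
summary = aggregate_windows(daily_temperature)
```

With a 64-day look-back, each variable therefore contributes 8 time steps per statistic.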

🏞️ Land Cover

⛔️ Exclusively used for adversarial sampling and post-training analysis.
💾 The data come from the 2020 North American Land Cover 30-meter dataset, produced as part of the North American Land Change Monitoring System (NALCMS) (available on Google Earth Engine).

🛠️ Set-Up

To run the pipeline steps below, you need a Google Account, and you must run the cells in notebooks/ee_test.ipynb to get the Earth Engine token.

Then, create the Python virtual environment and install the requirements:

python -m venv data-env
source data-env/bin/activate
pip install -r requirements/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117

🪜 Pipeline Steps

Create Grid of Positive Samples:

src.preprocess.create_grid: Initialize the spatial grid over Canada | Config: None
src.preprocess.burned_area: Preprocess the NBAC input data and aggregate the fire polygons spatially and temporally | Config: ba_preprocess.yaml
src.preprocess.temporal_freq: Aggregate the positive samples temporally over 8-day windows | Config: ba_preprocess.yaml

Download MODIS and ERA5 Data:

src.env_download: Download complete tiles over Canada from EE to your Google Drive for each date | Configs: era5.yaml & modis.yaml
Manually copy the GeoTiffs from your Google Drive to your local machine.

Download FWI Data:

src.cds_download: Download the consolidated data for 2015-2022 and the intermediate data for 2023 from CEMS (previously CDS) | Config: cds.yaml

Download Land Cover Data from EE:

Manually download the 2020 GeoTiff only and merge the tiffs via: src.postprocess.env_vals | Config: None

Postprocess MODIS, ERA5, FWI, and Land Cover Data:

src.postprocess.env_vals: Postprocess the environmental predictors for extreme or unknown values | Config: None

Create Negative Samples:

src.sampling.negative: Sample the negatives for Train, Validation, and Test splits | Config: sampling.yaml
src.sampling.negative_hard: Sample the negatives for the Test Hard split | Config: samling_hard.yaml

Download S2 Data:

src.download: Run this script for the positive, negative, and hard negative samples (usually per region) | Config: ba_s2.yaml

Postprocess S2 Data:

src.postprocess.s2_post: Post-process the S2 tiles based on cloud cover and filter out time series with too few images or too few days of coverage | Config: postprocess.yaml

Compute S2 Bands Statistics:

src.postprocess.band_stats: Compute mean and std of each band of the positive and negative samples | Config: stats.yaml
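
A per-band mean and std can be accumulated in a single pass over the samples without loading everything in memory. The sketch below is not the repository's code: it assumes each sample is a hypothetical dict mapping a band name to a flat list of pixel values, and it computes the population standard deviation.

```python
from math import sqrt


def band_stats(samples):
    """Return {band: (mean, std)} accumulated over all samples.

    samples: iterable of dicts band_name -> flat list of pixel values
    (hypothetical layout, chosen for illustration).
    """
    count, total, total_sq = {}, {}, {}
    for sample in samples:
        for band, pixels in sample.items():
            count[band] = count.get(band, 0) + len(pixels)
            total[band] = total.get(band, 0.0) + sum(pixels)
            total_sq[band] = total_sq.get(band, 0.0) + sum(p * p for p in pixels)

    stats = {}
    for band, n in count.items():
        band_mean = total[band] / n
        # Population variance E[x^2] - E[x]^2, clipped at 0 against rounding error.
        variance = max(total_sq[band] / n - band_mean * band_mean, 0.0)
        stats[band] = (band_mean, sqrt(variance))
    return stats
```

These statistics are typically used to normalise each band before training. For very large pixel counts, a numerically more stable scheme such as Welford's algorithm may be preferable.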

Rasterize Label Polygons:

src.postprocess.rasterize: Rasterize the fire polygons into binary arrays using the S2 GeoTiffs as reference | Config: rasterize.yaml
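
To make the rasterization step concrete, here is a dependency-free sketch that burns one polygon into a binary mask by testing each pixel center with the even-odd rule. It is only an illustration: the repository's script may well rely on a geospatial library instead, and the parameters (`x_min`, `y_max`, `pixel_size`, `width`, `height`) stand in for the georeferencing that would be read from the reference S2 GeoTiff.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, x_min, y_max, pixel_size, width, height):
    """Return a height x width binary mask; row 0 is the northern edge."""
    mask = []
    for row in range(height):
        y = y_max - (row + 0.5) * pixel_size  # pixel-center y coordinate
        mask.append([
            1 if point_in_polygon(x_min + (col + 0.5) * pixel_size, y, polygon) else 0
            for col in range(width)
        ])
    return mask
```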

Align Environmental Variables with S2 Tiles:

src.postprocess.spatial_alignment: Spatially align the environmental predictors by extracting windows centered on the S2 tiles | Configs: spatial_alignment.yaml & spatial_alignment_lc.yaml
src.postprocess.alignment: Compute the weighted average of the environmental predictors over the S2 tiles | Config: alignment.yaml
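
One way to read the weighted average is as an area-weighted mean: each coarse environmental cell contributes in proportion to how much of it overlaps the S2 tile. The sketch below assumes exactly that weighting, which the description above does not confirm, and uses hypothetical axis-aligned bounding boxes `(xmin, ymin, xmax, ymax)` in a common projected coordinate system.

```python
def overlap(a_min, a_max, b_min, b_max):
    """Length of the overlap between intervals [a_min, a_max] and [b_min, b_max]."""
    return max(0.0, min(a_max, b_max) - max(a_min, b_min))


def weighted_mean(tile, cells):
    """Area-weighted mean of cell values over a tile.

    tile: (xmin, ymin, xmax, ymax) of the S2 tile.
    cells: list of ((xmin, ymin, xmax, ymax), value) for the coarse predictor cells.
    Assumption: weights are the overlap areas between each cell and the tile.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (xmin, ymin, xmax, ymax), value in cells:
        area = overlap(tile[0], tile[2], xmin, xmax) * overlap(tile[1], tile[3], ymin, ymax)
        weighted_sum += area * value
        weight_total += area
    if weight_total == 0.0:
        return None  # no cell overlaps the tile
    return weighted_sum / weight_total
```

For example, an 11 km ERA5-Land cell usually covers a whole 2.64 km tile, so the result is simply that cell's value, while a tile straddling two cells gets a blend of both.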

Compute Environment Variables Statistics:

src.postprocess.env_stats: Compute the mean and std of each environment variable on the positive and negative sample population | Config: env_stats.yaml

Create the split file:

src.postprocess.split: Create the main split file for model training and evaluation | Config: split.yaml
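
The main splits are defined by year (see the dataset statistics in the Outputs section: 2016-2021 for training, 2022 for validation, 2023 for testing). A minimal sketch of that assignment is shown below; the function name and the split labels are hypothetical, and the Test Hard split is not covered.

```python
def split_for_year(year):
    """Map a sample year to its split name, or None if the year is outside the benchmark."""
    if 2016 <= year <= 2021:
        return "train"
    if year == 2022:
        return "val"
    if year == 2023:
        return "test"
    return None
```

Splitting by year rather than at random keeps the evaluation years fully unseen during training.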

Transform SITS GeoTiff to npy files:

src.postprocess.transform: Extract all the Sentinel-2 band GeoTiffs and concatenate them into groups of npy files per resolution | Config: transform.yaml

Upload to Hugging Face 🤗:

src.huggingface.upload: Upload CanadaFireSat and the metadata files to Hugging Face | Configs: upload.yaml & manual-upload.yaml

📷 Outputs

📊 CanadaFireSat Dataset Statistics (without Test Hard):

Statistic	Value
Total Samples	177,801
Target Spatial Resolution	100 m
Region Coverage	Canada
Temporal Coverage	2016 - 2023
Sample Area Size	2.64 km × 2.64 km
Fire Occurrence Rate	39% of samples
Total Fire Patches	16% of patches
Training Set (2016–2021)	78,030 samples
Validation Set (2022)	14,329 samples
Test Set (2023)	85,442 samples
Sentinel-2 Temporal Median Coverage	55 days (8 images)
Number of Environmental Predictors	58
Data Sources	ERA5, MODIS, CEMS

📍 Samples Localisation:

Figure 1: Spatial distribution of positive (left) and negative (right) wildfire samples.

🛰️ Example of S2 time series:

Figure 2: Rows 1-3: Samples of Sentinel-2 input time series for 4 locations in Canada, showing only the RGB bands with rescaled intensity. Row 4: Sentinel-2 images after the fire occurred. Row 5: Fire polygons used as labels, shown with the post-fire Sentinel-2 images.

🖋️ Citation

@article{porta2025canadafiresat,
  title={CanadaFireSat: Toward high-resolution wildfire forecasting with multiple modalities},
  author={Porta, Hugo and Dalsasso, Emanuele and McCarty, Jessica L and Tuia, Devis},
  journal={arXiv preprint arXiv:2506.08690},
  year={2025}
}