Copernicus Foundation Model

October 2, 2025


This repository contains the official implementation of the paper "Towards a Unified Copernicus Foundation Model for Earth Vision" (ICCV 2025 oral).

Description

Key features

  • 🌍 Copernicus-Pretrain: A massive-scale pretraining dataset with 18.7M aligned images from all major Copernicus Sentinel missions, spanning from the Earth's surface to its atmosphere.
  • 🤖 Copernicus-FM: A unified foundation model capable of processing any spectral or non-spectral sensor modality using extended dynamic hypernetworks and flexible metadata encoding.
  • 📊 Copernicus-Bench: A systematic evaluation benchmark with 15 hierarchical downstream tasks ranging from preprocessing to specialized applications for each Sentinel mission.
  • 🌐 Copernicus-Embed-025deg: An embedding dataset that provides a global embedding map (721x1440x768) at 0.25°, integrating various sources of satellite observations at an extremely high compression ratio.

Copernicus-Pretrain

Copernicus-Pretrain is an extension of the SSL4EO-S12 dataset to all major Sentinel missions (S1-S5P). The images are organized into ~310K regional grids (0.25° x 0.25°, consistent with ERA5), densely covering the whole land surface and near-land ocean with time series from eight distinct Sentinel modalities.
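
As a rough sanity check on the grid count, the sketch below derives the dimensions of a global 0.25° latitude/longitude grid and estimates what share of it ~310K grids represents. It assumes an ERA5-style layout with both poles included; the function name is hypothetical and not part of this repository.

```python
def global_grid_shape(step_deg: float = 0.25) -> tuple:
    """Rows and columns of a global lat/lon grid (ERA5-style layout is an assumption)."""
    n_lat = round(180 / step_deg) + 1  # 90N down to 90S, both poles included
    n_lon = round(360 / step_deg)      # 0E up to 359.75E, no duplicated meridian
    return n_lat, n_lon


n_lat, n_lon = global_grid_shape()
total_cells = n_lat * n_lon
covered = 310_000  # approximate grid count quoted above

print(n_lat, n_lon, total_cells)       # 721 1440 1038240
print(f"{covered / total_cells:.1%}")  # 29.9%
```

Under these assumptions, the dataset covers roughly 30% of all global grid cells, which fits the description of land surface plus near-land ocean.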

[Figure: framework diagram]

🔽 Dataset access:

  • Raw format (GeoTiff): This version is available on HuggingFace.
  • Streaming format (WebDataset): This version is available on HuggingFace.
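
For readers unfamiliar with the streaming format: a WebDataset shard is a plain tar archive in which files sharing the same key (the name up to the first dot) form one sample. The minimal sketch below illustrates that convention using only the Python standard library. The file names and extensions are hypothetical and do not reflect the actual layout of the Copernicus-Pretrain shards; in practice one would likely read the shards with the `webdataset` package.

```python
import io
import tarfile
from collections import defaultdict


def build_demo_shard(files: dict) -> bytes:
    """Write a name -> payload mapping into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def group_samples(shard: bytes) -> dict:
    """Group tar members by key (text before the first dot), WebDataset-style."""
    samples = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(shard), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Simplification: assumes flat names without dots in directories.
            key, _, ext = member.name.partition(".")
            samples[key][ext] = tar.extractfile(member).read()
    return dict(samples)


# Hypothetical file names, for illustration only.
demo = build_demo_shard({
    "grid_000001.s1_grd.tif": b"fake-s1",
    "grid_000001.s2_toa.tif": b"fake-s2",
    "grid_000002.s2_toa.tif": b"fake-s2",
})
samples = group_samples(demo)
print(sorted(samples))                 # ['grid_000001', 'grid_000002']
print(sorted(samples["grid_000001"]))  # ['s1_grd.tif', 's2_toa.tif']
```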

📂 Further details: Copernicus-Pretrain/

Copernicus-FM

Copernicus-FM is an extension of the DOFA foundation model that can process any spectral or non-spectral sensor modality using extended dynamic hypernetworks and flexible metadata encoding. The model is pretrained on the Copernicus-Pretrain dataset with masked image modeling and continual distillation.
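
To illustrate the masked image modeling objective mentioned above: a subset of image patches is hidden, and the model is trained to reconstruct them from the visible ones. The sketch below shows only the random patch-selection step in plain Python. The 75% mask ratio and the 14x14 patch grid are illustrative assumptions, not values taken from the Copernicus-FM configuration, and the function name is hypothetical.

```python
import random
from typing import List, Optional


def random_patch_mask(num_patches: int, mask_ratio: float,
                      seed: Optional[int] = None) -> List[bool]:
    """Return one flag per patch; True marks a masked (hidden) patch."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked_ids = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_ids for i in range(num_patches)]


# Assumed values: a 14x14 patch grid and a 75% mask ratio.
mask = random_patch_mask(num_patches=14 * 14, mask_ratio=0.75, seed=0)
visible = [i for i, hidden in enumerate(mask) if not hidden]
print(len(mask), sum(mask), len(visible))  # 196 147 49
```

Only the visible patches would be passed to the encoder; the reconstruction loss is then computed on the masked ones.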

[Figure: framework diagram]

🔽 Weights access: The model weights are available on HuggingFace.

📂 Further details: Copernicus-FM/

Copernicus-Bench

Copernicus-Bench is a systematic evaluation benchmark with 15 downstream datasets organized hierarchically into three levels of applications, covering all major Sentinel missions (S1, S2, S3, S5P). Of these, 9 are derived from existing datasets and 6 are newly curated.

| Level | Name | Modality | Task | Source |
|-------|------|----------|------|--------|
| L1 | Cloud-S2 | S2 TOA | segmentation (cloud) | CloudSEN12 |
| L1 | Cloud-S3 | S3 OLCI | segmentation (cloud) | new |
| L2 | EuroSAT-S1 | S1 GRD | classification (LULC) | EuroSAT-SAR |
| L2 | EuroSAT-S2 | S2 TOA | classification (LULC) | EuroSAT |
| L2 | BigEarthNet-S1 | S1 GRD | classification (LULC) | BigEarthNet v2.0 |
| L2 | BigEarthNet-S2 | S2 SR | classification (LULC) | BigEarthNet v2.0 |
| L2 | LC100Cls-S3 | S3 OLCI | classification (LULC) | new |
| L2 | DFC2020-S1 | S1 GRD | segmentation (LULC) | DFC2020 |
| L2 | DFC2020-S2 | S2 TOA | segmentation (LULC) | DFC2020 |
| L2 | LC100Seg-S3 | S3 OLCI | segmentation (LULC) | new |
| L3 | Flood-S1 | S1 GRD | change detection (flood) | Kuro Siwo |
| L3 | LCZ-S2 | S2 TOA | classification (local climate zone) | So2Sat LCZ42 |
| L3 | Biomass-S3 | S3 OLCI | regression (biomass) | new |
| L3 | AQ-NO2-S5P | S5P NO2 | regression (air quality) | new |
| L3 | AQ-O3-S5P | S5P O3 | regression (air quality) | new |
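
The table can also be written as data, which makes it easy to check the counts quoted above (15 datasets, 9 derived from existing datasets, 6 new). This is only an illustrative transcription of the table; the tuple layout and variable names are assumptions and are not part of the Copernicus-Bench code.

```python
from collections import Counter

# (level, name, modality, task, source), transcribed from the table above.
BENCH = [
    ("L1", "Cloud-S2", "S2 TOA", "segmentation (cloud)", "CloudSEN12"),
    ("L1", "Cloud-S3", "S3 OLCI", "segmentation (cloud)", "new"),
    ("L2", "EuroSAT-S1", "S1 GRD", "classification (LULC)", "EuroSAT-SAR"),
    ("L2", "EuroSAT-S2", "S2 TOA", "classification (LULC)", "EuroSAT"),
    ("L2", "BigEarthNet-S1", "S1 GRD", "classification (LULC)", "BigEarthNet v2.0"),
    ("L2", "BigEarthNet-S2", "S2 SR", "classification (LULC)", "BigEarthNet v2.0"),
    ("L2", "LC100Cls-S3", "S3 OLCI", "classification (LULC)", "new"),
    ("L2", "DFC2020-S1", "S1 GRD", "segmentation (LULC)", "DFC2020"),
    ("L2", "DFC2020-S2", "S2 TOA", "segmentation (LULC)", "DFC2020"),
    ("L2", "LC100Seg-S3", "S3 OLCI", "segmentation (LULC)", "new"),
    ("L3", "Flood-S1", "S1 GRD", "change detection (flood)", "Kuro Siwo"),
    ("L3", "LCZ-S2", "S2 TOA", "classification (local climate zone)", "So2Sat LCZ42"),
    ("L3", "Biomass-S3", "S3 OLCI", "regression (biomass)", "new"),
    ("L3", "AQ-NO2-S5P", "S5P NO2", "regression (air quality)", "new"),
    ("L3", "AQ-O3-S5P", "S5P O3", "regression (air quality)", "new"),
]

per_level = Counter(level for level, *_ in BENCH)
n_new = sum(1 for *_, source in BENCH if source == "new")

print(len(BENCH), dict(per_level))  # 15 {'L1': 2, 'L2': 8, 'L3': 5}
print(n_new, len(BENCH) - n_new)    # 6 9
```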

🔽 Dataset access: The benchmark datasets are available on HuggingFace.

📂 Further details: Copernicus-Bench/

Copernicus-Embed-025deg

Copernicus-Embed-025deg is an embedding dataset that provides a global embedding map (721x1440x768) at 0.25°, integrating various sources of satellite observations at an extremely high compression ratio. It has been shown to be beneficial for linking Earth's surface to the atmosphere, unlocking new possibilities in the development of weather/climate foundation models.
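
A minimal sketch of how a geographic coordinate could be mapped to a (row, column) position in the 721x1440 map. It assumes the ERA5 layout: row 0 at 90°N with rows running southward, and column 0 at 0°E with columns running eastward. The actual axis order and origin of the released files should be checked against the dataset documentation; the function name is hypothetical.

```python
def latlon_to_index(lat: float, lon: float, step: float = 0.25) -> tuple:
    """Nearest grid index for a coordinate, assuming an ERA5-style layout."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("lat must be in [-90, 90]")
    n_lon = round(360.0 / step)
    row = round((90.0 - lat) / step)           # row 0 is the North Pole (assumed)
    col = round((lon % 360.0) / step) % n_lon  # wrap negative or >= 360 longitudes
    return row, col


print(latlon_to_index(90.0, 0.0))      # (0, 0)
print(latlon_to_index(-90.0, 359.75))  # (720, 1439)
print(latlon_to_index(48.15, 11.57))   # (167, 46)
```

The resulting pair would select one 768-dimensional embedding vector, for example `embed_map[row, col]` if the map were loaded as an array with that axis order.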

[Figure: global embedding map]

🔽 Dataset access: The embedding datasets are available on HuggingFace.

📂 Further details: Copernicus-Embed-025deg/

License

This repository is licensed under the Apache License 2.0, with portions of third-party code licensed under the MIT or CC-BY-NC-4.0 licenses. The Copernicus-Pretrain dataset, the newly curated datasets in Copernicus-Bench, and the pretrained weights of Copernicus-FM are licensed under the CC-BY-4.0 license.

Citation

@misc{wang2025unifiedcopernicusfoundationmodel,
      title={Towards a Unified Copernicus Foundation Model for Earth Vision}, 
      author={Yi Wang and Zhitong Xiong and Chenying Liu and Adam J. Stewart and Thomas Dujardin and Nikolaos Ioannis Bountos and Angelos Zavras and Franziska Gerken and Ioannis Papoutsis and Laura Leal-Taixé and Xiao Xiang Zhu},
      year={2025},
      eprint={2503.11849},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.11849}, 
}