SSL4EO-S12

April 20, 2024

The SSL4EO-S12 dataset is a large-scale multimodal, multitemporal dataset for unsupervised/self-supervised pre-training in Earth observation. It consists of unlabeled patch triplets (Sentinel-1 dual-pol SAR, Sentinel-2 top-of-atmosphere multispectral, Sentinel-2 surface reflectance multispectral) from 251079 locations across the globe. Each patch covers 2640 m × 2640 m and includes four seasonal time stamps.
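
As a rough sense of scale, the figures above can be multiplied out. The sketch below assumes a 10 m ground sampling distance (the finest Sentinel-2 resolution) to convert the patch extent into pixels; that value is an assumption for illustration and is not stated in this README.

```python
# Figures taken from the dataset description above.
locations = 251_079      # sampled locations
seasons = 4              # seasonal time stamps per location
modalities = 3           # S1 SAR, S2 top-of-atmosphere, S2 surface reflectance
patch_side_m = 2640      # patch extent in metres

# Assumption: 10 m ground sampling distance (not stated in this README).
gsd_m = 10

triplets = locations * seasons      # time-stamped patch triplets
images = triplets * modalities      # individual image patches
side_px = patch_side_m // gsd_m     # patch side length in pixels at the assumed GSD

print(triplets, images, side_px)    # 1004316 3012948 264
```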

Access the dataset

  • Raw dataset: The full SSL4EO-S12 dataset (1.5TB, 500GB for each modality) is accessible at mediaTUM. Some IDs are void (gaps in the folder names); see data/void_ids.csv. Center coordinates of all locations are available here.
  • Example subset: An example 100-patch subset (600MB) is available at Google Drive.
  • Compressed dataset: A compressed 8-bit version (20-50GB for each modality, including an RGB version) is available at mediaTUM. The raw 16/32-bit values are normalized by mean and std, converted to uint8, and then stored with the default GeoTIFF JPEG compression (quality 75). Note: in our experiments, 8-bit input (without JPEG compression) performs comparably to 16-bit input.
  • 50k RGB subset: A 50k (random) RGB subset (18GB) is available here (link broken). For the sample IDs, see data/50k_ids_random.csv.
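
The 8-bit conversion described for the compressed dataset can be sketched as follows. This is a minimal illustration, not the exact script used to build the release: the two-standard-deviation window (`n_std=2.0`) and the example mean/std values are assumptions, and the helper name `to_uint8` is hypothetical.

```python
import numpy as np

def to_uint8(band, mean, std, n_std=2.0):
    """Map a raw band to uint8 using a mean +/- n_std * std window.

    The window width (n_std) is an assumption for illustration; the README
    only states that values are normalized by mean and std.
    """
    lo = mean - n_std * std
    hi = mean + n_std * std
    scaled = (band.astype(np.float32) - lo) / (hi - lo) * 255.0
    # Values outside the window saturate at 0 or 255.
    return np.clip(scaled, 0, 255).astype(np.uint8)

# Hypothetical raw 16-bit values and band statistics.
raw = np.array([[0, 1000, 2000, 4000]], dtype=np.uint16)
out = to_uint8(raw, mean=1500.0, std=500.0)
print(out)  # [[  0  63 191 255]]
```

Note that this step alone is lossy only through quantization; the JPEG compression applied afterwards is a separate source of information loss.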

Updates

  • For faster access in some regions, we host a copy of the data on HuggingFace. Note that only the original data at mediaTUM has a proper DOI.
  • We have received feedback that the compressed dataset (with JPEG compression) shows a performance drop compared to the raw data, possibly because the compression is lossy. We plan to replace it with a lossless version, although the file size will increase. Also, does your server have an inode limit (a cap on the number of individual files)? If so, we could consider providing a single resampled GeoTIFF containing all bands (as in SSL4EO-L). If you run into any issues or would like to see updates, let us know!

Collect your own data

See src/download_data for instructions on downloading Sentinel or other products from Google Earth Engine.

Pre-trained models

Pre-trained models for different SSL methods are provided below (13 bands of S2-L1C, 100 epochs, input divided by 10000 and clipped to [0, 1]).

| SSL method | Arch | BigEarthNet* | EuroSAT | So2Sat-LCZ42 | Download | Usage |
|---|---|---|---|---|---|---|
| MoCo | ResNet50 | 91.8% | 99.1% | 60.9% | full ckpt, backbone, logs | define model, load weights |
| MoCo | ViT-S/16 | 89.9% | 98.6% | 61.6% | full ckpt, backbone, logs | define model, load weights |
| DINO | ResNet50 | 90.7% | 99.1% | 63.6% | full ckpt, backbone, logs | define model, load weights |
| DINO | ViT-S/16 | 90.5% | 99.0% | 62.2% | full ckpt, backbone, logs | define model, load weights |
| MAE | ViT-S/16 | 88.9% | 98.7% | 63.9% | full ckpt, backbone, logs | define model, load weights |
| Data2vec | ViT-S/16 | 90.3% | 99.1% | 64.8% | full ckpt, backbone, logs | define model, load weights |

* Note: the BigEarthNet results are based on the train/val split used in SeCo and In-domain representation learning for RS.
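
The models above expect 13-band S2-L1C input divided by 10000 and clipped to [0, 1]. A minimal sketch of that preprocessing step is shown below; the function name `scale_s2_l1c` and the random test patch (including its 264 × 264 size) are hypothetical, and any cropping or augmentation used during pre-training is not covered here.

```python
import numpy as np

def scale_s2_l1c(patch):
    """Divide raw S2-L1C values by 10000 and clip the result to [0, 1]."""
    return np.clip(patch.astype(np.float32) / 10000.0, 0.0, 1.0)

# Hypothetical 13-band patch of raw 16-bit values in (bands, height, width) layout.
rng = np.random.default_rng(0)
patch = rng.integers(0, 12000, size=(13, 264, 264), dtype=np.uint16)

x = scale_s2_l1c(patch)
print(x.shape, x.dtype)  # (13, 264, 264) float32
```

Raw values above 10000 saturate at 1.0 after clipping, so very bright pixels lose their relative differences.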

Other pre-trained models:

| SSL method | Arch | Input | Download |
|---|---|---|---|
| MoCo | ResNet18 | S2-L1C 13 bands | full ckpt, backbone, logs |
| MoCo | ResNet18 | S2-L1C RGB | full ckpt, full ckpt ep200, backbone, logs |
| MoCo | ResNet50 | S2-L1C RGB | full ckpt, backbone, logs |
| MoCo | ResNet50 | S1 SAR 2 bands | full ckpt, backbone, logs |
| MAE | ViT-S/16 | S1 SAR 2 bands | full ckpt, backbone |
| MAE | ViT-B/16 | S1 SAR 2 bands | full ckpt, backbone |
| MAE | ViT-L/16 | S1 SAR 2 bands | full ckpt, backbone |
| MAE | ViT-H/14 | S1 SAR 2 bands | full ckpt, backbone |
| MAE | ViT-B/16 | S2-L1C 13 bands | full ckpt, backbone |
| MAE | ViT-L/16 | S2-L1C 13 bands | full ckpt, backbone |
| MAE | ViT-H/14 | S2-L1C 13 bands | full ckpt, backbone |

* The pretrained models are also available in TorchGeo.

License

This repository is released under the Apache 2.0 license. The dataset and pretrained model weights are released under the CC-BY-4.0 license.

Citation

@article{wang2022ssl4eo,
  title={SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation},
  author={Wang, Yi and Braham, Nassim Ait Ali and Xiong, Zhitong and Liu, Chenying and Albrecht, Conrad M and Zhu, Xiao Xiang},
  journal={arXiv preprint arXiv:2211.07044},
  year={2022}
}