Multi Pretext Masked Autoencoder (MP-MAE)

May 12, 2025

This repository contains the code used to create the models and results presented in the paper MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning. It modifies the ConvNeXt V2 architecture for use with MMEarth, a multi-modal geospatial remote sensing dataset.

📢 Latest Updates

🔥🔥🔥 Last updated on 2025.02.03 🔥🔥🔥

Added torch.hub integration for easier finetuning.
Added NEW_DATASET.md with instructions for adding new datasets.
Updated repository to allow for various datasets during finetuning.
Updated installation scripts and repository.
Paper accepted to ECCV 2024!
Model now pretrained on MMEarth v001 & evaluated on GEO-Bench v1.0.
Updated model scripts to work with MMEarth-v001.
Data augmentation fix: Random crops are now aligned across modalities.
Test metrics fix: Metrics are now computed over the whole test set instead of being averaged over mini-batches, matching GEO-Bench metrics.
Added ffcv dataloader for both pretraining and finetuning, which significantly increases training speed.
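
To illustrate why the test metrics fix matters, here is a minimal sketch comparing the two ways of aggregating a metric. Accuracy is used purely as an example, and the function names are hypothetical; this is not the repository's evaluation code, and the exact form of the earlier averaging is an assumption. When mini-batches differ in size (typically the last one is smaller), the mean of per-batch scores can differ from the score computed over all samples.

```python
def batch_averaged_accuracy(batches):
    """Mean of per-batch accuracies: every batch counts equally, whatever its size."""
    per_batch = [
        sum(p == t for p, t in zip(preds, targets)) / len(targets)
        for preds, targets in batches
    ]
    return sum(per_batch) / len(per_batch)


def overall_accuracy(batches):
    """Accuracy over all samples: every sample counts equally."""
    correct = sum(
        sum(p == t for p, t in zip(preds, targets)) for preds, targets in batches
    )
    total = sum(len(targets) for _, targets in batches)
    return correct / total


# Two mini-batches of unequal size, given as (predictions, targets).
batches = [
    ([1, 0, 1, 1], [1, 0, 1, 1]),  # 4 of 4 correct
    ([0, 1], [1, 0]),              # 0 of 2 correct
]

print(batch_averaged_accuracy(batches))  # (1.0 + 0.0) / 2
print(overall_accuracy(batches))         # 4 / 6
```

The two numbers only agree when all batches have the same size, which is why overall metrics are the safer choice for comparison with a benchmark.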


Installation

See INSTALL.md for instructions on installing the dependencies.

Training

See TRAINING.md for more details on training and finetuning. We recommend using those scripts to reproduce results in the paper, but we also have a simplified version using torch.hub.

import torch
mmearth_model = torch.hub.load('vishalned/mmearth-train', 'MPMAE', model_name='convnextv2_atto', pretrained=True, linear_probe=True)

A full example of finetuning from torch.hub is provided here.

Evaluating on new custom datasets

See NEW_DATASET.md for more details on finetuning on custom datasets.

Model Checkpoints

All the pretraining weights can be downloaded from here. The folders are named in the format shown below. Inside each folder you will find a checkpoint .pth weight file. An example of loading the weights is in the examples folder.
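
As a rough illustration of the layout described above, the sketch below locates the weight file inside a downloaded checkpoint folder. It assumes each folder holds exactly one .pth file; `find_checkpoint` is a hypothetical helper, not part of this repository. For the actual loading procedure, refer to the examples folder.

```python
from pathlib import Path


def find_checkpoint(folder):
    """Return the path of the single .pth file inside a checkpoint folder.

    Assumption: the folder contains exactly one .pth weight file.
    """
    matches = sorted(Path(folder).glob("*.pth"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one .pth file in {folder}, found {len(matches)}"
        )
    return matches[0]


# The returned path can then be handed to PyTorch, for example:
#   state = torch.load(find_checkpoint(folder), map_location="cpu")
# How the loaded object maps onto the model is shown in the examples folder.
```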

CHECKPOINT FOLDER FORMAT
pt-($INPUT)_($MODEL)_($DATA)_($LOSS)_($MODEL_IMG_SIZE)_($PATCH_SIZE)/

$INPUT:
      - S2 # for s2-12 bands as input and output
      - all_mod # for s2-12 bands as input and all modalities as output
      - img_mod # for s2-12 bands as input and image level modalities as output
      - pix_mod # for s2-12 bands as input and pixel level modalities as output
      - rgb # for s2-bgr as input and output (we trained the model using bgr ordering)

$MODEL:
      - atto
      - tiny

$DATA:
      - 100k_128 # MMEarth100k, 100k locations and image size 128
      - 1M_64 # MMEarth64, 1.2M locations and image size 64
      - 1M_128 # MMEarth, 1.2M locations and image size 128

$LOSS: # loss weighting strategy
      - uncertainty
      - unweighted

$MODEL_IMG_SIZE: # input size passed to the model
      - 56 # when using the data with image size 64
      - 112 # when using the data with image size 128

$PATCH_SIZE:
      - 8
      - 16

Note: The only exception is the model trained on ImageNet, whose folder path is pt-imagenet_atto_200epochs_224_32/
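
The naming scheme above can be sketched in code as follows. This is a minimal illustration, not a utility shipped with the repository: `build_folder_name` and `parse_folder_name` are hypothetical helpers. It assumes that the parentheses in the format string only mark placeholders and that the fields are joined by underscores, as in the ImageNet folder name. Check the result against the actual downloaded folder names before relying on it.

```python
import re

# Allowed values, copied from the format description above.
INPUTS = ("S2", "all_mod", "img_mod", "pix_mod", "rgb")
MODELS = ("atto", "tiny")
DATASETS = ("100k_128", "1M_64", "1M_128")
LOSSES = ("uncertainty", "unweighted")

# Assumption: fields are joined with "_" and the parentheses are not literal.
_PATTERN = re.compile(
    r"^pt-(?P<input>{})_(?P<model>{})_(?P<data>{})_(?P<loss>{})"
    r"_(?P<img_size>\d+)_(?P<patch_size>\d+)/?$".format(
        "|".join(INPUTS), "|".join(MODELS), "|".join(DATASETS), "|".join(LOSSES)
    )
)


def build_folder_name(inp, model, data, loss, img_size, patch_size):
    """Assemble a checkpoint folder name from its six fields."""
    for value, allowed in ((inp, INPUTS), (model, MODELS),
                           (data, DATASETS), (loss, LOSSES)):
        if value not in allowed:
            raise ValueError(f"{value!r} is not one of {allowed}")
    return f"pt-{inp}_{model}_{data}_{loss}_{img_size}_{patch_size}"


def parse_folder_name(name):
    """Split a folder name back into its fields.

    Raises ValueError for names outside the scheme, such as the
    ImageNet exception pt-imagenet_atto_200epochs_224_32/.
    """
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the checkpoint naming scheme")
    fields = match.groupdict()
    fields["img_size"] = int(fields["img_size"])
    fields["patch_size"] = int(fields["patch_size"])
    return fields


name = build_folder_name("all_mod", "atto", "1M_64", "uncertainty", 56, 8)
print(name)
print(parse_folder_name(name))
```

A regular expression with explicit alternatives is used for parsing because several field values (for example all_mod and 1M_64) themselves contain underscores, so splitting on "_" alone would be ambiguous.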

A detailed overview of each checkpoint is shown in the table below.

INPUT	OUTPUT	MODEL	DATASET	LOSS	MODEL_IMG_SIZE	PATCH_SIZE	CKPT
S2 12 band	all modalities	Atto	MMEarth64	Uncertainty	56x56	8x8	download
S2 12 band	all modalities	Atto	MMEarth64	Unweighted	56x56	8x8	download
S2 12 band	all modalities	Atto	MMEarth	Uncertainty	112x112	16x16	download
S2 12 band	all modalities	Tiny	MMEarth64	Uncertainty	56x56	8x8	download
S2 12 band	all modalities	Atto	MMEarth100k	Uncertainty	112x112	16x16	download
S2 12 band	image level modalities	Atto	MMEarth64	Uncertainty	56x56	8x8	download
S2 12 band	pixel level modalities	Atto	MMEarth64	Uncertainty	56x56	8x8	download
S2 12 band	S2 12 band	Atto	MMEarth64	Uncertainty	56x56	8x8	download
S2 bgr	S2 bgr	Atto	MMEarth64	Uncertainty	56x56	8x8	download
S2 bgr	S2 bgr	Atto	MMEarth	Uncertainty	128x128	16x16	download

Acknowledgment

This repository borrows from the ConvNeXt V2 repository.

Citation

Please cite our paper if you use this code or any of the provided data.

Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke, Serge Belongie, Christian Igel, & Nico Lang (2024). MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning.

@inproceedings{nedungadi2024mmearth,
  title={MMEarth: Exploring multi-modal pretext tasks for geospatial representation learning},
  author={Nedungadi, Vishal and Kariryaa, Ankit and Oehmcke, Stefan and Belongie, Serge and Igel, Christian and Lang, Nico},
  booktitle={European Conference on Computer Vision},
  pages={164--182},
  year={2024},
  organization={Springer}
}