December 9, 2025

Adaptive Gradient Calibration for Single-Positive Multi-Label Learning in Remote Sensing Image Scene Classification

This repository contains the official implementation of the paper "Adaptive Gradient Calibration for Single-Positive Multi-Label Learning in Remote Sensing Image Scene Classification".
Chenying Liu*¹,³, Gianmarco Perantoni*², Lorenzo Bruzzone², Xiao Xiang Zhu¹,³
¹Technical University of Munich (TUM), ²University of Trento, ³Munich Center for Machine Learning (MCML)
*Equal contribution

[arXiv][Project]

Single- and multi-label annotation examples from (a) AID-multilabel and (b) refined BigEarthNet datasets, with the corresponding CLC mask for reference. Compared to single-class labels, multi-label annotations describe the scene more comprehensively, yet require much more annotation effort.
    🔹 Single-Positive Multi-Label (SPML): SPML offers a scalable alternative to conventional multi-label annotation by assigning only one relevant label per image, while the model is expected to infer the full set of underlying labels. Although highly efficient and practical, this paradigm inherently introduces substantial ambiguity.
    🔹 We propose Adaptive Gradient Calibration (AdaGC), a novel and generalizable SPML framework tailored to RS imagery, to address this ambiguity.

Key features

  • 🔆 AdaGC Framework: a two-stage SPML framework with early-learning detection and dual-EMA pseudo-labeling, enabling adaptive Gradient Calibration to enhance label completeness and robustness against noise.
  • 💻 Benchmark SPML Experiments: extensive evaluations on high- and low-resolution benchmark multi-label RS datasets using Random, Dominant, and Manual SPML settings, offering comprehensive benchmarks for existing SPML and noise-robust methods.
  • 🖍 Manual Annotation Tool: a lightweight manual tool that enables quick and efficient generation of single-positive multi-label annotations.

AdaGC Framework

We propose AdaGC, a two-stage SPML framework for remote sensing that integrates an early-learning detection mechanism to adaptively trigger Gradient Calibration (GC), together with a dual-EMA pseudo-labeling strategy that leverages temporal prediction fusion to enhance label completeness and reliability. This design mitigates both the underfitting and the overfitting caused by label noise, and is supported by theoretical analyses. We also incorporate the Mixup data augmentation technique within GC to further improve the model's generalizability.

Fig. 1. Flowchart of the proposed Adaptive Gradient Calibration (AdaGC) method for single-positive multi-label learning in remote sensing image classification.
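To make two of the ingredients named above more concrete, here is a minimal, library-free sketch of (i) fusing predictions over time with an exponential moving average (EMA) and (ii) Mixup. It is not the AdaGC implementation: the dual-EMA design, the early-learning detection, and Gradient Calibration itself are not reproduced, and the function names, the `momentum` value, and the `alpha` value are assumptions chosen for illustration.

```python
import random


def ema_update(ema_probs, new_probs, momentum=0.9):
    """Blend the running per-class probabilities with the newest prediction."""
    return [momentum * e + (1.0 - momentum) * p
            for e, p in zip(ema_probs, new_probs)]


def mixup(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Convexly combine two samples and their (soft) label vectors."""
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Toy example with 3 classes: fuse the predictions of three epochs.
pseudo = [0.5, 0.5, 0.5]
for epoch_probs in ([0.9, 0.2, 0.1], [0.8, 0.3, 0.1], [0.9, 0.6, 0.0]):
    pseudo = ema_update(pseudo, epoch_probs, momentum=0.5)

# Mix a sample carrying the fused soft labels with a second, one-hot sample.
rng = random.Random(0)
mixed_x, mixed_y, lam = mixup([0.0, 1.0], pseudo, [1.0, 0.0], [1.0, 0.0, 0.0], rng=rng)
```

The EMA smooths out epoch-to-epoch fluctuations in the predictions, which is the general idea behind temporal prediction fusion; in a real training loop the same updates would be applied to tensors rather than Python lists.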

📥 Installation

First, clone the repo:

    git clone ... && cd AdaGC

Then set up the environment with Docker:

    sh launch_docker.sh /path/to/data
    docker exec -it AdaGC_$USER bash
    # replace $USER with your username

🎰 Prepare the data

Please structure your data as follows:

    data
    ├── AID
    │   ├── images_test
    │   │   ├── Airport
    │   │   │   ├── airport_81.jpg
    │   │   │   └── ...
    │   │   └── ...
    │   ├── images_tr
    │   │   ├── Airport
    │   │   │   ├── airport_1.jpg
    │   │   │   └── ...
    │   │   └── ...
    │   ├── multilabel_correct_final.csv       # corrected multi-label annotations, accessible at annotations/AID
    │   └── single_positive_labels_manual.csv  # human-annotated SPML labels, accessible at annotations/AID
    └── BigEarthNetV2_LMDB
        ├── test_BENv2.lmdb
        ├── train_BENv2.lmdb
        ├── val_BENv2.lmdb
        ├── metadata.parquet
        ├── train_random_single_labels.npy     # randomly generated SPML labels, accessible at annotations/BigEarthNetv2
        └── train_dominant_single_labels.npy   # SPML labels generated from the dominant classes in the CLC masks, accessible at annotations/BigEarthNetv2

  • Data can be downloaded from BigEarthNetV2 and AID-multilabel.
  • The LMDB file for BigEarthNetv2 can be generated using notebooks/bigearthnetv2.ipynb.
  • We found errors in the original AID-multilabel annotations, particularly in the Meadow folder, where nearly all images were labeled as grass and pavement regardless of their actual content (see Fig. 2 for examples). We corrected them and release the new annotations at corrected-AID-multilabel.

Fig. 2. Three examples whose original multi-label annotations were grass and pavement. The reassigned multi-label annotations, corrected by us, are shown below.
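Before launching training, it can help to confirm that the data folder matches the directory tree above. The following is a small sketch of such a check; the helper name `missing_entries` is hypothetical and is not part of this repository, and the path list is simply copied from the tree.

```python
from pathlib import Path

# Relative paths copied from the directory tree in this section.
EXPECTED = [
    "AID/images_test",
    "AID/images_tr",
    "AID/multilabel_correct_final.csv",
    "AID/single_positive_labels_manual.csv",
    "BigEarthNetV2_LMDB/test_BENv2.lmdb",
    "BigEarthNetV2_LMDB/train_BENv2.lmdb",
    "BigEarthNetV2_LMDB/val_BENv2.lmdb",
    "BigEarthNetV2_LMDB/metadata.parquet",
    "BigEarthNetV2_LMDB/train_random_single_labels.npy",
    "BigEarthNetV2_LMDB/train_dominant_single_labels.npy",
]


def missing_entries(data_root):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


# Report anything missing under ./data (adjust the path to your setup).
for rel in missing_entries("data"):
    print(f"missing: {rel}")
```

An empty report means every file and folder listed in the tree was found; it does not validate the contents of the files.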

​🤖​​ Train the model

  • The configuration files (hyperparameter settings) used in this work are under scripts/config_files.
  • The AdaGC settings are in scripts/config_files/train_multilabel_model_AdaGC-*.yaml.
  • Train the model with a YAML configuration file:

    python -u scripts/train_multilabel_model.py \
           -c path/to/config/file
    # e.g., replace path/to/config/file with scripts/config_files/train_multilabel_model_AdaGC-BENv2-random.yaml

Testing is triggered automatically at the end of training.
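To run every AdaGC configuration in sequence, the training command can be wrapped in a short driver. This is a sketch rather than a script shipped with the repository: the helper `training_commands` is hypothetical, and it assumes the driver is started from the repository root so that the relative paths resolve.

```python
import glob
import subprocess
import sys


def training_commands(config_paths, script="scripts/train_multilabel_model.py"):
    """Build one `python -u <script> -c <config>` command per config file."""
    return [[sys.executable, "-u", script, "-c", cfg]
            for cfg in sorted(config_paths)]


if __name__ == "__main__":
    pattern = "scripts/config_files/train_multilabel_model_AdaGC-*.yaml"
    for cmd in training_commands(glob.glob(pattern)):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing run
```

Using `check=True` aborts the loop as soon as one run fails, so a broken configuration does not go unnoticed among the remaining runs.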

👻​ Test the model

You can test a trained model using:

    python -u scripts/test_multilabel_model.py \
           -c path/to/config/file
    # use the same config file as in the training step

☄️ Hyperparameter tuning

We also provide scripts for hyperparameter tuning with noisy labels:

    python -u scripts/meta_parameters_tuning.py \
           -c path/to/config/file
    # e.g., path/to/config/file can be replaced with scripts/config_files/meta_parameters_tuning_AN.yaml

Benchmark SPML Experiments

We provide benchmark results for state-of-the-art methods on the BigEarthNet and AID-multilabel datasets under various single-positive noisy-label simulation settings. All methods can be reproduced using python -u scripts/train_multilabel_model.py together with the corresponding configuration files.

  • On the BigEarthNet dataset, we apply two noisy-label simulation strategies: randomly picking one positive label for each image (Random), or using the dominant class indicated by the CLC mask as the single positive label for each image (Dominant).
Table I. Test Performance Comparison of Different Methods on the reBEN-Random Dataset. The Best Average Metric Values Are Reported in Bold, Second Bests Are in Italic. The Related Standard Deviations Are Reported in Brackets.

Table II. Test Performance Comparison of Different Methods on the reBEN-Dominant Dataset.

  • On the AID dataset, we apply random single-positive label noise simulation as well as human-annotated single-positive labels. The manually annotated labels were generated with our lightweight annotation tool, introduced below.
Table III. Test Performance Comparison of Different Methods on the AID-Random Dataset.

Table IV. Test Performance Comparison of Different Methods on the AID-manual Dataset.
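
The Random and Dominant simulation strategies described in this section can be sketched as follows. This is an illustration under stated assumptions, not the code used to produce the released label files: the function names are hypothetical, labels are assumed to be multi-hot lists with at least one positive, and the mask is assumed to already hold class indices (real CLC codes would first need to be mapped to those indices).

```python
import random
from collections import Counter


def random_single_positive(labels, rng=random):
    """Keep one randomly chosen positive class from a multi-hot label vector."""
    positives = [i for i, v in enumerate(labels) if v == 1]
    keep = rng.choice(positives)
    return [1 if i == keep else 0 for i in range(len(labels))]


def dominant_single_positive(mask, num_classes):
    """Keep the class that covers the most pixels of a 2-D class-index mask."""
    counts = Counter(pixel for row in mask for pixel in row)
    keep = counts.most_common(1)[0][0]
    return [1 if i == keep else 0 for i in range(num_classes)]


full_label = [1, 0, 1, 1]            # classes 0, 2 and 3 are present
clc_mask = [[0, 2, 2], [2, 2, 3]]    # class 2 covers most pixels
print(random_single_positive(full_label, random.Random(0)))
print(dominant_single_positive(clc_mask, num_classes=4))
```

In both cases the result keeps exactly one of the true positives and marks every other class as unobserved, which is the source of the label ambiguity that SPML methods have to handle.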

Manual Annotation Tool

We provide a lightweight annotation tool (annotator/annotation_tool.py) for quickly and conveniently generating human-annotated single-positive labels, as illustrated in Fig. 3. For each image, four randomly shuffled candidate classes are displayed to accelerate annotation. If none of them appears plausible, the annotator can press "Next Batch of Categories" to refresh the options.

Fig. 3. Annotator interface for generating manual single-positive labels.
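
The candidate-batching behaviour described above can be sketched in a few lines. This is not the code of annotator/annotation_tool.py: the generator `candidate_batches` and the placeholder class names are hypothetical and only illustrate showing shuffled classes four at a time.

```python
import random


def candidate_batches(class_names, batch_size=4, rng=random):
    """Yield the shuffled class names in groups of `batch_size`."""
    shuffled = list(class_names)
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]


# Placeholder class names; a real tool would use the dataset's class list.
classes = [f"class_{i}" for i in range(10)]
for batch in candidate_batches(classes, rng=random.Random(0)):
    print(batch)  # each batch corresponds to one press of "Next Batch of Categories"
```

Shuffling once per image and then slicing guarantees that every class is offered exactly once before the candidates run out.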

Citation

@misc{liu2025adagc,
      title={Adaptive Gradient Calibration for Single-Positive Multi-Label Learning in Remote Sensing Image Scene Classification}, 
      author={Chenying Liu and Gianmarco Perantoni and Lorenzo Bruzzone and Xiao Xiang Zhu},
      year={2025},
      eprint={2510.08269},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.08269}, 
}