Rectifying Soft-Label Entangled Bias in Long-Tailed Dataset Distillation

October 27, 2025

Repository Structure

  • SRe2L/, GVBSM/, EDC/: Implement three-stage training scripts for the corresponding distillation methods on CIFAR-10/100 and ImageNet-LT.
  • perturbation_analysis/: Perturbation experiments and visualization code from the paper (including analysis.ipynb).
  • Subdirectories such as SRe2L/cifar10 contain a scripts/ folder with the shell scripts that launch the squeeze, recover, and relabel stages.

Quick Start

  • Place the original datasets under the dataset/ directory, or create a symbolic link to them in this repository (e.g., ln -s /path/to/cifar10 ./dataset/cifar10), then set the dataset path in the corresponding scripts or configs.
  • Example three-stage process with SRe2L on CIFAR-10:
    cd SRe2L/cifar10
    bash scripts/squeeze_cifar.sh
    bash scripts/recover_cifar.sh
    # SRe2L without ADSA
    bash scripts/relabel_cifar.sh
    # SRe2L with ADSA
    bash scripts/relabel_cifar_adsa.sh
    
  • Usage with GVBSM, EDC, and other methods is similar, with hyperparameters, paths, and scripts adjusted in their respective subdirectories.
  • Additionally, we implement a new, stronger baseline based on resampling:
    cd SRe2L/cifar10
    bash scripts/squeeze_cifar_resample.sh
    bash scripts/recover_cifar_resample.sh
    bash scripts/relabel_cifar_resample.sh
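
Because each stage consumes the output of the previous one, it can help to run the stage scripts through a small wrapper that stops at the first failure. The following is a minimal sketch and not part of the repository: the function name run_stages is hypothetical, and it assumes only the layout shown above, where each method directory keeps its stage scripts in a scripts/ folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repository).
# Usage: run_stages <method_dir> <script> [<script> ...]
# Runs scripts/<script> inside <method_dir> in the given order and
# stops at the first script that exits with a non-zero status.
run_stages() {
    local workdir="$1"
    shift
    local script
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$workdir" || exit 1
        for script in "$@"; do
            echo "[run_stages] running scripts/$script"
            bash "scripts/$script" || exit 1
        done
    )
}
```

For example, after sourcing the function from the repository root, run_stages SRe2L/cifar10 squeeze_cifar.sh recover_cifar.sh relabel_cifar_adsa.sh would run the SRe2L pipeline with ADSA, and the relabel stage would be skipped if squeeze or recover fails.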