README.md
November 7, 2025
Robust Dataset Condensation using Supervised Contrastive Learning
Authors: Nicole Hee-Yeon Kim and Hwanjun Song
Official repository for the paper: Robust Dataset Condensation using Supervised Contrastive Learning (ICCV 2025)
Summary
Dataset condensation compresses large datasets into compact synthetic sets, but existing methods break down under noisy labels. We propose Robust Dataset Condensation (RDC), the first end-to-end method that directly generates noise-resilient synthetic data without extra cleaning steps. RDC integrates supervised contrastive learning with Golden MixUp Contrast, which sharpens class boundaries and enriches diversity using clean samples from noisy data. Experiments on CIFAR-10/100 show that RDC consistently outperforms prior methods across asymmetric, symmetric, and real-world noise.
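To make the noise settings concrete, here is a minimal sketch of symmetric label noise, where a fixed fraction of labels is flipped uniformly to a different class. The function name `add_symmetric_noise` and its parameters are illustrative assumptions and are not taken from this repository; some papers define symmetric noise slightly differently (for example, allowing a flip back to the original class).

```python
import random

def add_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Return a copy of `labels` with a fraction `noise_rate` flipped to another class.

    Hypothetical helper for illustration; not part of the RDC code base.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        # An offset in [1, num_classes - 1] always lands on a different class.
        noisy[i] = (noisy[i] + rng.randrange(1, num_classes)) % num_classes
    return noisy

# Toy label set: 10 classes, 100 samples each.
clean = [c for c in range(10) for _ in range(100)]
noisy = add_symmetric_noise(clean, noise_rate=0.4, num_classes=10)
flipped = sum(a != b for a, b in zip(clean, noisy))
print(flipped / len(clean))
```

By construction, the fraction of changed labels equals the requested noise rate. Asymmetric noise would instead map each class to a specific, usually visually similar, class.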
Paper
Our paper is now available on the ICCV 2025 Open Access page.
Overview
RDC enhances dataset condensation by integrating supervised contrastive learning and a novel Golden MixUp Contrast strategy, enabling robust synthetic data generation even under noisy labels.
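For readers unfamiliar with supervised contrastive learning, the sketch below shows the generic supervised contrastive (SupCon) loss in plain Python: each anchor is pulled toward samples sharing its label and pushed away from all others. This is a textbook formulation under assumed names (`supcon_loss`, `temperature`), not the loss used in RDC, and it does not reproduce Golden MixUp Contrast.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def supcon_loss(features, labels, temperature=0.1):
    """Generic SupCon loss averaged over anchors that have at least one positive."""
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_consistent = supcon_loss(feats, [0, 0, 1, 1])  # labels agree with geometry
loss_noisy = supcon_loss(feats, [0, 1, 0, 1])       # labels contradict geometry
print(loss_consistent < loss_noisy)
```

The toy comparison illustrates why label noise matters here: when labels contradict the feature geometry, the positives are far apart and the loss grows sharply.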
Results
We report classification accuracy (%) under noisy CIFAR-10:
Similar improvements are consistently observed on CIFAR-100 and Tiny-ImageNet.
Visualization
RDC improves the quality of condensed images under noisy labels.

- (a) Acc-DD: severe interference across classes.
- (b) Two-stage (cleaning + condensation): reduces noise but still leaves contamination.
- (c) RDC (ours): clean, well-separated class features without interference.
Setup
Clone the repository and set up the environment:
# Clone the repository
git clone https://github.com/DISL-Lab/RDC-ICCV2025.git
cd RDC-ICCV2025
# Create a new conda environment with Python 3.10
conda create -n rdc python=3.10
# Activate the environment
conda activate rdc
# Install the required packages
pip install -r requirements.txt
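After installation, a quick sanity check can confirm that the environment is usable. The sketch below reports the interpreter version and any modules that cannot be found. The module names `torch` and `torchvision` are assumptions (the contents of requirements.txt are not listed here), so adjust the list to match the actual requirements.

```python
import importlib.util
import sys

def missing_modules(names):
    """Return the top-level module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("Python", sys.version.split()[0])
# Assumed dependencies; replace with the packages listed in requirements.txt.
print("Missing:", missing_modules(["torch", "torchvision"]))
```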
Usage
1. IDM
IDM Baseline (Reference)
Run dataset condensation with IDM on CIFAR-10 (IPC=50):
python3 IDM_cifar.py --dataset CIFAR10 --ipc 50 --ce_weight 0.1
Main Options:
--dataset: CIFAR10 / CIFAR10N_asym_40 / CIFAR10N_asym_20 / CIFAR10N_sym_40 / CIFAR10N_sym_20 / CIFAR10N_ran1 / CIFAR10N_worse / CIFAR100 / CIFAR100N_asym_20 / CIFAR100N_asym_40 / CIFAR100N_sym_20 / CIFAR100N_sym_40 / CIFAR100N_noisy
--ipc: 1 / 10 / 50 (images per class)
--model: ResNet18 (default)
--eval_interval: 1000 (for ipc=1,10) / 2000 (for ipc=50)
--ce_weight: 0.5 (ipc=1,10) / 0.1 (ipc=50)
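The IPC-dependent values above can be captured in a small launcher helper. This is a sketch that reads the option list as recommended values per IPC setting; the helper name `idm_command` and the `IPC_DEFAULTS` table are illustrative and not part of the repository.

```python
import shlex

# Values copied from the option list above, keyed by images per class.
IPC_DEFAULTS = {
    1: {"eval_interval": 1000, "ce_weight": 0.5},
    10: {"eval_interval": 1000, "ce_weight": 0.5},
    50: {"eval_interval": 2000, "ce_weight": 0.1},
}

def idm_command(dataset, ipc, script="IDM_cifar.py"):
    """Build the argument list for a condensation run (hypothetical helper)."""
    if ipc not in IPC_DEFAULTS:
        raise ValueError(f"ipc must be one of {sorted(IPC_DEFAULTS)}, got {ipc}")
    opts = IPC_DEFAULTS[ipc]
    return [
        "python3", script,
        "--dataset", dataset,
        "--ipc", str(ipc),
        "--eval_interval", str(opts["eval_interval"]),
        "--ce_weight", str(opts["ce_weight"]),
    ]

cmd = idm_command("CIFAR10N_sym_40", ipc=10)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the run. Passing `script="IDM_cifar+RDC.py"` would produce the corresponding RDC command with the same flags.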
RDC + IDM
Run dataset condensation with IDM+RDC on CIFAR-10 (IPC=50):
python3 IDM_cifar+RDC.py --dataset CIFAR10 --ipc 50 --ce_weight 0.1
Main Options:
--dataset: CIFAR10 / CIFAR10N_asym_40 / CIFAR10N_asym_20 / CIFAR10N_sym_40 / CIFAR10N_sym_20 / CIFAR10N_ran1 / CIFAR10N_worse / CIFAR100 / CIFAR100N_asym_20 / CIFAR100N_asym_40 / CIFAR100N_sym_20 / CIFAR100N_sym_40 / CIFAR100N_noisy
--ipc: 1 / 10 / 50 (images per class)
--model: ResNet18 (default)
--eval_interval: 1000 (for ipc=1,10) / 2000 (for ipc=50)
--ce_weight: 0.5 (ipc=1,10) / 0.1 (ipc=50)
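The dataset names appear to encode the base dataset, the noise type, and the noise rate. The parser below is a sketch of that reading: it assumes `sym` and `asym` denote symmetric and asymmetric synthetic noise with the trailing number as a percentage, and it treats the remaining suffixes (`ran1`, `worse`, `noisy`) as opaque labels, presumably real-world annotation noise. None of this is confirmed by the repository code, and `parse_dataset` is a hypothetical name.

```python
import re

# CIFAR100 must be tried before CIFAR10 so the longer name wins.
_PATTERN = re.compile(r"^(CIFAR100|CIFAR10)(?:N_(.+))?$")

def parse_dataset(name):
    """Split a --dataset value into base dataset, noise type, and noise rate."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized dataset name: {name}")
    base, suffix = match.group(1), match.group(2)
    if suffix is None:
        return {"base": base, "noise": "clean", "rate": None}
    kind, _, rate = suffix.partition("_")
    if kind in ("sym", "asym") and rate.isdigit():
        return {"base": base, "noise": kind, "rate": int(rate) / 100}
    # Suffixes without a numeric rate are kept as-is (assumed real-world noise).
    return {"base": base, "noise": suffix, "rate": None}

for name in ["CIFAR10", "CIFAR10N_asym_40", "CIFAR100N_sym_20", "CIFAR10N_worse"]:
    print(name, parse_dataset(name))
```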
2. Acc-DD
The Acc-DD workflow consists of two main steps:
(1) Pretrain early-stage models on real data, and
(2) Optimize condensed data using the pretrained models.
Below is a simplified guide (adapted from the Acc-DD repository).
1. Pretrain Early-Stage Models
python pretrain.py \
-d <dataset> \
--nclass 10 \
-n resnet \
--pt_from 2 \
--aug_type color_crop_cutout_flip_scale_rotate
2. Acc-DD Baseline: Condensation Step (Reference)
python condense.py \
--reproduce \
-d <dataset> \
-f 2 \
--ipc 10 \
-n resnet \
--model_path <PRETRAINED_DIR>
3. Acc-DD + RDC: Condensation Step
python condense+RDC.py \
--reproduce \
-d <dataset> \
-f 2 \
--ipc 10 \
-n resnet \
--model_path <PRETRAINED_DIR>
Main Options:
-d: dataset (cifar10 / cifar10n_asym_40 / cifar10n_asym_20 / cifar10n_sym_40 / cifar10n_sym_20 / cifar10n_ran1 / cifar10n_worse)
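The two-step workflow above (pretraining, then condensation) can be chained from one script. The sketch below only assembles the two commands using the flags shown above; the helper name `accdd_commands` and the `./pretrained` path are placeholders, and the directory where pretrain.py actually writes its checkpoints is not stated here.

```python
import shlex

def accdd_commands(dataset, pretrained_dir, ipc=10, use_rdc=True):
    """Return [pretrain_cmd, condense_cmd] as argument lists (hypothetical helper)."""
    pretrain = [
        "python", "pretrain.py",
        "-d", dataset,
        "--nclass", "10",
        "-n", "resnet",
        "--pt_from", "2",
        "--aug_type", "color_crop_cutout_flip_scale_rotate",
    ]
    # Switch between the Acc-DD baseline and the RDC variant.
    script = "condense+RDC.py" if use_rdc else "condense.py"
    condense = [
        "python", script,
        "--reproduce",
        "-d", dataset,
        "-f", "2",
        "--ipc", str(ipc),
        "-n", "resnet",
        "--model_path", pretrained_dir,
    ]
    return [pretrain, condense]

# "./pretrained" is a placeholder for <PRETRAINED_DIR>.
commands = accdd_commands("cifar10n_sym_40", "./pretrained")
for cmd in commands:
    print(shlex.join(cmd))
```

To execute the steps in order, each list can be passed to `subprocess.run(cmd, check=True)`, which stops the pipeline if pretraining fails.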
Citation
A BibTeX entry for the ICCV 2025 paper will be provided here.
Acknowledgments
Our code implementations are based on IDM (Improved Distribution Matching for Dataset Condensation) and Acc-DD (Accelerating Dataset Distillation via Model Augmentation).
We thank the authors for releasing their code.