README.md

July 1, 2025

Zheng Zhou¹ Wenquan Feng¹ Qiaosheng Zhang²,³ Shuchang Lyu¹ Qi Zhao¹ Guangliang Cheng⁴

¹ Beihang University
² Shanghai Artificial Intelligence Laboratory
³ Shanghai Innovation Institute
⁴ University of Liverpool

International Conference on Machine Learning (ICML), 2025

This repository contains the code and implementation details for the research paper titled "ROME is Forged in Adversity: Robust Distilled Datasets via Information Bottleneck".

🎯 Overview of ROME


Figure 1: Comparison of previous DD methods and the proposed ROME under adversarial attacks: (a) Previous DD methods align representations between original and synthetic datasets but remain vulnerable to adversarial attacks due to neglecting the mutual information among input, latent representations, and output, leading to reduced accuracy under perturbations. (b) Our method, ROME, employs the information bottleneck principle to minimize mutual information between input and latent representations, while maximizing it between output and latent representations, thereby enhancing adversarial robustness and maintaining high accuracy under perturbations.

Abstract: Dataset Distillation (DD) compresses large datasets into smaller, synthetic subsets, enabling models trained on them to achieve performance comparable to those trained on the full data. However, these models remain vulnerable to adversarial attacks, limiting their use in safety-critical applications. While adversarial robustness has been widely studied in related fields, research on improving DD robustness is still limited. To address this, we propose ROME, a novel method that enhances the adversarial RObustness of DD by leveraging the InforMation BottlenEck (IB) principle. ROME includes two components: a performance-aligned term to preserve accuracy and a robustness-aligned term to improve robustness by aligning feature distributions between synthetic and perturbed images. Furthermore, we introduce the Improved Robustness Ratio (I-RR), a refined metric to better evaluate DD robustness. Extensive experiments on CIFAR-10 and CIFAR-100 datasets demonstrate that ROME outperforms existing DD methods in adversarial robustness, achieving maximum I-RR improvements of nearly 40% under white-box attacks and nearly 35% under black-box attacks.

🔥 Key Features and Contributions


Figure 2: The framework of ROME: ROME utilizes the information bottleneck to frame the robust dataset distillation problem as a min-max optimization of mutual information. It consists of two key components: (a) The performance-aligned term maximizes the mutual information between the latent representations and the output by aligning the logits with the true labels. (b) The robustness-aligned term minimizes the mutual information between latent representations and the input, conditioned on a robust prior (the adversarially perturbed dataset), by aligning the embeddings to reduce the discrepancy.
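For reference, the two mutual-information terms named in the caption can be written in standard information-bottleneck notation. This is a sketch in textbook form, not the paper's exact objective: $X$, $Y$, and $Z$ stand for the input, the output (label), and the latent representation, and the trade-off weights $\beta$ and $\gamma$ are symbols assumed here for illustration.

```latex
% Classical information bottleneck: keep what predicts Y, compress away the rest of X
\max_{Z} \; I(Z; Y) \;-\; \beta \, I(X; Z)

% Conditional entropy bottleneck (CEB): penalize only the information about X
% that is not already explained by the conditioning variable
\min_{Z} \; I(X; Z \mid Y) \;-\; \gamma \, I(Y; Z)
```

According to the caption, ROME conditions the compression term on a robust prior built from the adversarially perturbed dataset; the paper gives the exact form of that term.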

- Theoretical Framework: Introduces the Information Bottleneck (IB) principle into dataset distillation, leveraging the Conditional Entropy Bottleneck (CEB) to incorporate adversarial robustness as a prior.
- Algorithm Design: Proposes performance-aligned and robustness-aligned terms to balance model accuracy and adversarial robustness, enhanced by robust priors from pretrained models.
- Evaluation and Validation: Introduces the Improved Robustness Ratio (I-RR) and reports maximum I-RR improvements of nearly 40% under white-box attacks and nearly 35% under black-box attacks on CIFAR-10 and CIFAR-100.
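To make the two terms concrete, here is a minimal, dependency-free sketch of how a performance-aligned term (cross-entropy between logits and labels) and a robustness-aligned term (a gap between synthetic and adversarial embeddings) could be combined. It is an illustration only, not the code in this repository: the function names, the mean-embedding distance, and the reuse of `0.1` as the cross-entropy weight are all assumptions.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (logits: list of floats)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

def mean_vector(rows):
    """Column-wise mean of a list of equal-length embedding vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def embedding_gap(syn_embeds, adv_embeds):
    """Squared L2 distance between mean synthetic and mean adversarial embeddings."""
    mu_syn = mean_vector(syn_embeds)
    mu_adv = mean_vector(adv_embeds)
    return sum((a - b) ** 2 for a, b in zip(mu_syn, mu_adv))

def rome_style_loss(syn_logits, syn_labels, syn_embeds, adv_embeds, ce_weight=0.1):
    """Hypothetical combination: weighted CE (performance) + embedding gap (robustness)."""
    ce = sum(cross_entropy(l, y) for l, y in zip(syn_logits, syn_labels)) / len(syn_labels)
    return ce_weight * ce + embedding_gap(syn_embeds, adv_embeds)
```

When the synthetic and adversarial embeddings have the same mean, the second term vanishes and only the weighted cross-entropy remains, which is a quick way to sanity-check the sketch.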

📈 Experimental Results

We evaluate and compare the adversarial robustness of ROME and other DD methods against both white-box and black-box attacks, under both targeted and untargeted settings:


Table 1: Comparison of model robustness when trained using various DD methods with IPC settings of {1, 10, 50}, against both white-box targeted and untargeted attacks on the CIFAR-10 and CIFAR-100 datasets. Robustness evaluation metrics include RR and CREI, as well as their improved versions I-RR and I-CREI. The best results among the baseline and proposed methods are highlighted in bold, and the second-best results are underlined. Improvements over the second-best results are highlighted in red.


Figure 3: Robustness heatmap of models trained on distilled CIFAR-10 datasets with IPC-50 settings under targeted and untargeted attacks. The vertical axis represents attacked models, and the horizontal axis shows models used for transfer attacks. Heatmap values represent I-RR, with darker colors indicating higher I-RR values and thus better robustness against adversarial attacks.

🛠 Getting Started

Follow these steps to set up the environment and run the code.

Step 1: Clone the Repository

Run the following command to download the repository:
```
git clone https://github.com/zhouzhengqd/ROME.git
```

Step 2: Download Datasets

Download the CIFAR-10/100 datasets from the official source, or use the shared download link provided by BEARD for quicker access. Place them in the dataset directory (Code/data/datasets in the directory structure below).
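If you prefer to script the download, the sketch below fetches the official python-version CIFAR archives with the standard library only. The target folder `Code/data/datasets` is taken from the directory structure in this README; whether the training scripts expect the extracted `cifar-10-batches-py` and `cifar-100-python` folders in exactly that place is an assumption, and the helper names are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path

# Official CIFAR python-version archives (University of Toronto).
CIFAR_URLS = {
    "CIFAR10": "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz",
    "CIFAR100": "https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz",
}

def archive_path(dataset, root):
    """Local path where the archive for `dataset` is stored under `root`."""
    return Path(root) / CIFAR_URLS[dataset].rsplit("/", 1)[-1]

def download_and_extract(dataset, root="Code/data/datasets"):
    """Fetch the archive if it is missing, then unpack it next to it."""
    target = archive_path(dataset, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urllib.request.urlretrieve(CIFAR_URLS[dataset], target)
    with tarfile.open(target, "r:gz") as tar:
        tar.extractall(target.parent)  # assumed layout: folders land in `root`
    return target.parent
```

Calling `download_and_extract("CIFAR10")` from the repository root would place the data under `Code/data/datasets`; adjust `root` if your checkout uses a different location.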

Step 3: Set Up the Conda Environment

Run the following commands to create and activate the conda environment:

```
cd ROME
cd Code
conda env create -f environment.yml
conda activate rome
```

📁 Directory Structure

ROME
- Code
  - data
    - datasets
  - checkpoints
  - result
  - Files for ROME
  - command.txt
  - environment.yml
  - ...
  - ...
  - ...
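A small helper can confirm that the sub-directories named in the tree exist before training starts. This is a convenience sketch, not part of the repository: the function names are hypothetical, and it assumes the three folders sit directly under `Code` as drawn above.

```python
from pathlib import Path

# Sub-directories named in the tree above, relative to the Code folder.
EXPECTED_DIRS = ("data/datasets", "checkpoints", "result")

def missing_dirs(code_root):
    """Return the expected sub-directories that do not exist under `code_root`."""
    root = Path(code_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

def ensure_dirs(code_root):
    """Create any missing sub-directories and return the ones that were created."""
    created = missing_dirs(code_root)
    for d in created:
        (Path(code_root) / d).mkdir(parents=True, exist_ok=True)
    return created
```

Running `ensure_dirs("Code")` from the repository root creates whatever is missing and reports it, so a second call returns an empty list.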

🌟 Commands for Reproducing Experimental Results and Evaluation

Training the Distilled Datasets

Use the training commands listed in command.txt. For example, to train ROME on CIFAR-10 with IPC-50, run the following command:

```
python3 -u ROME_cifar10.py --dataset CIFAR10 --model ConvNet --ipc 50 --dsa_strategy color_crop_cutout_flip_scale_rotate --init real --lr_img 0.2 --num_exp 5 --num_eval 5 --net_train_real --eval_interval 500 --outer_loop 1 --mismatch_lambda 0 --net_decay --embed_last 1000 --syn_ce --ce_weight 0.1 --train_net_num 1 --aug
```
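When launching several runs, it can help to assemble the same command programmatically. The sketch below rebuilds the example above as an argument list and only varies `--ipc`; the helper name is hypothetical, and reusing the IPC-50 flags for other IPC values is an assumption, so treat command.txt as the authoritative source for each setting.

```python
import shlex

def build_command(ipc=50):
    """Argument list for the CIFAR-10 example above, with a configurable IPC."""
    return [
        "python3", "-u", "ROME_cifar10.py",
        "--dataset", "CIFAR10", "--model", "ConvNet", "--ipc", str(ipc),
        "--dsa_strategy", "color_crop_cutout_flip_scale_rotate",
        "--init", "real", "--lr_img", "0.2", "--num_exp", "5", "--num_eval", "5",
        "--net_train_real", "--eval_interval", "500", "--outer_loop", "1",
        "--mismatch_lambda", "0", "--net_decay", "--embed_last", "1000",
        "--syn_ce", "--ce_weight", "0.1", "--train_net_num", "1", "--aug",
    ]

# Print the shell form of the command for inspection before running it.
print(shlex.join(build_command(ipc=50)))
```

The returned list can be passed to `subprocess.run(..., check=True)` from inside the Code directory, which avoids shell-quoting issues.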

Evaluating the Distilled Datasets

Follow the BEARD benchmark configuration:

Step 1: Download the BEARD repository.
Step 2: Download the Distilled Dataset and Model, and follow the BEARD instructions for quick evaluation.
Step 3: Replace the downloaded distilled datasets with the ones produced by your own training run.

🙏 Acknowledgments

We would like to thank the contributors of the following projects that inspired and supported this work: DC, DSA, DM, MTT, IDM, BACON, and BEARD.

🎓 Citation

```
@inproceedings{zhou2025rome,
  title={ROME is Forged in Adversity: Robust Distilled Datasets via Information Bottleneck},
  author={Zhou, Zheng and Feng, Wenquan and Zhang, Qiaosheng and Lyu, Shuchang and Zhao, Qi and Cheng, Guangliang},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2025}
}
```