Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory (CVPR 2025)

April 7, 2026

Official implementation of MCT, a stable and storage-efficient dataset distillation method presented at CVPR 2025.

Authors

Wenliang Zhong¹, Haoyu Tang¹*, Qinghai Zheng², Mingzhu Xu¹, Yupeng Hu¹, Weili Guan³

¹ Shandong University
² Fuzhou University
³ Harbin Institute of Technology (Shenzhen)
* Corresponding author: tanghao258@sdu.edu.cn



Updates

  • [03/2025] Initial release of the CVPR 2025 official code
  • [06/2024] Initial release on arXiv

Introduction

This project is the official implementation of the paper Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory (CVPR 2025).

Summary

Dataset Distillation (DD) aims to synthesize a small dataset to replace a large one for efficient training. However, traditional trajectory matching methods often suffer from training instability and high storage costs for expert trajectories.

Matching Convexified Trajectory (MCT) addresses these issues by:

  • Convexifying Trajectories: Transforming jagged expert trajectories into smooth, convex ones for more stable matching.
  • Storage-efficient Interpolation: Storing only a few key waypoints and reconstructing trajectories via interpolation, dramatically reducing storage requirements.
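The waypoint idea above can be illustrated with a small sketch. It assumes the convexified trajectory can be treated as piecewise linear between stored waypoints; the function names `compress` and `reconstruct` are hypothetical and are not taken from this repository, whose actual logic lives in `convexify.py` and `exps/trajectory_compression.py`. Parameters are plain lists of floats to keep the example dependency-free.

```python
from bisect import bisect_right


def compress(trajectory, num_waypoints):
    """Keep `num_waypoints` evenly spaced checkpoints, endpoints included.

    Hypothetical helper for illustration; not the repository's API.
    """
    if len(trajectory) < 2 or num_waypoints < 2:
        raise ValueError("need at least a start and an end checkpoint")
    last = len(trajectory) - 1
    # Evenly spaced indices over [0, last]; the set removes duplicates.
    indices = sorted({round(i * last / (num_waypoints - 1))
                      for i in range(num_waypoints)})
    return indices, [trajectory[i] for i in indices]


def reconstruct(indices, waypoints, step):
    """Linearly interpolate the parameter vector at training step `step`."""
    if not indices[0] <= step <= indices[-1]:
        raise ValueError("step lies outside the stored trajectory")
    # Find the segment [indices[k], indices[k + 1]] that contains `step`.
    k = min(bisect_right(indices, step) - 1, len(indices) - 2)
    alpha = (step - indices[k]) / (indices[k + 1] - indices[k])
    return [(1 - alpha) * a + alpha * b
            for a, b in zip(waypoints[k], waypoints[k + 1])]


# Toy trajectory: 9 checkpoints of a 2-parameter "network".
trajectory = [[float(t), 2.0 * t] for t in range(9)]
indices, waypoints = compress(trajectory, num_waypoints=3)
midpoint = reconstruct(indices, waypoints, step=3)
stored_fraction = len(waypoints) / len(trajectory)
```

In this toy case only 3 of the 9 checkpoints are kept, and any intermediate step is rebuilt on demand from its two neighbouring waypoints. How many waypoints a real trajectory needs depends on how well the piecewise-linear assumption holds.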

This repository provides:

  • Training code for generating expert buffers.
  • Trajectory convexification and compression scripts.
  • Core distillation scripts for synthesizing high-quality datasets.

Highlights

  • Stable Training: Convexified trajectories lead to more consistent distillation performance.
  • Storage Efficient: Storing only a few waypoints per expert trajectory takes less space than storing every checkpoint of the full trajectory.
  • Support for Multiple Datasets: CIFAR-10, CIFAR-100, and Tiny-ImageNet.
  • CVPR 2025: Official code for reproducing the results in the paper.

Project Structure

.
├── exps/                  # Ablation and visualization scripts
│   ├── trajectory_compression.py  # Logic for waypoint interpolation and compression
│   ├── visualize_synthetic.py     # Results visualization
│   └── ...
├── buffer.py              # Script to generate expert trajectory buffers
├── convexify.py           # Core logic for trajectory convexification
├── distill_compressed.py   # Main distillation script using MCT
├── networks.py            # Neural network architectures (ConvNet, ResNet, etc.)
├── utils.py               # Evaluation, data augmentation, and helper functions
├── reparam_module.py      # Reparameterization module for differentiating through network parameters
├── requirements.txt       # Dependencies
└── CVPR_MCT_POSTER.png     # Poster overview

Installation

1. Clone the repository

git clone https://github.com/iLearn-Lab/CVPR25-MCT.git
cd CVPR25-MCT

2. Install dependencies

pip install -r requirements.txt

Usage

1. Generate Expert Buffer

First, generate the expert trajectories for the target dataset:

python buffer.py --dataset=CIFAR10 --zca

2. Compress Trajectory

Generate the convexified and compressed trajectories:

python exps/trajectory_compression.py --num_interpolation=4 --zca --dataset=CIFAR10

3. Dataset Distillation

Run the main distillation script using the compressed trajectories:

# CIFAR-10, IPC=1, with ZCA
python distill_compressed.py --dataset=CIFAR10 --ipc=1 --syn_steps=60 --expert_epochs=6 --max_start_epoch=4 --lr_img=1e3 --lr_lr=1e-07 --lr_init=1e-2 --zca
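To give a feel for what the distillation step optimizes, here is a minimal sketch of the normalized matching objective commonly used in trajectory matching methods. Whether `distill_compressed.py` uses exactly this form is an assumption, as is the reading of the flags: `--syn_steps` as the number of student updates on the synthetic data, `--expert_epochs` as how far ahead the expert target lies, and `--max_start_epoch` as the latest allowed starting point. The function name `matching_loss` is hypothetical.

```python
def matching_loss(student_end, expert_start, expert_target):
    """Squared distance from the student's final parameters to the expert
    target, divided by the squared distance the expert itself travelled.

    All arguments are flat parameter vectors (lists of floats).
    Hypothetical helper for illustration; not the repository's API.
    """
    num = sum((s - t) ** 2 for s, t in zip(student_end, expert_target))
    den = sum((a - t) ** 2 for a, t in zip(expert_start, expert_target))
    if den == 0.0:
        raise ValueError("expert start and target coincide")
    return num / den


expert_start = [0.0, 0.0]    # expert parameters at the sampled start epoch
expert_target = [3.0, 4.0]   # expert parameters some epochs later
student_end = [3.0, 3.0]     # student parameters after its synthetic-data steps

loss = matching_loss(student_end, expert_start, expert_target)
# A student that never moved from the start point scores exactly 1.0.
baseline = matching_loss(expert_start, expert_start, expert_target)
```

The normalization makes the loss comparable across start epochs: values below 1.0 mean the student ended closer to the expert target than where it started.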

Visualization

The presentation poster is included in the repository as CVPR_MCT_POSTER.png.


Citation

If you find this work useful, please cite our paper:

@InProceedings{Zhong_2025_CVPR,
    author    = {Zhong, Wenliang and Tang, Haoyu and Zheng, Qinghai and Xu, Mingzhu and Hu, Yupeng and Guan, Weili},
    title     = {Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {25581-25589}
}

Acknowledgement

  • This work was supported by the iLearn-Lab at Shandong University and collaborators.
  • The code style and foundations build on previous work in trajectory matching.

License

This project is released under the Apache License 2.0.