Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory (CVPR 2025)
April 7, 2026
Official implementation of MCT, a stable and storage-efficient dataset distillation method presented at CVPR 2025.
Authors
Wenliang Zhong1, Haoyu Tang1*, Qinghai Zheng2, Mingzhu Xu1, Yupeng Hu1, Weili Guan3
1 Shandong University
2 Fuzhou University
3 Harbin Institute of Technology (Shenzhen)
* Corresponding author: tanghao258@sdu.edu.cn
Links
- Paper: https://openaccess.thecvf.com/content/CVPR2025/papers/Zhong_Towards_Stable_and_Storage-efficient_Dataset_Distillation_Matching_Convexified_Trajectory_CVPR_2025_paper.pdf
- Project Page: GitHub (iLearn-Lab/CVPR25-MCT)
- CVPR 2025 Virtual Page: Poster 34469
Table of Contents
- Updates
- Introduction
- Highlights
- Method / Framework
- Project Structure
- Installation
- Usage
- Visualization
- TODO
- Citation
- Acknowledgement
- License
Updates
- [03/2025] Initial release of the CVPR 2025 official code
- [06/2024] Initial release on arXiv
Introduction
This project is the official implementation of the paper Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory (CVPR 2025).
Summary
Dataset Distillation (DD) aims to synthesize a small dataset to replace a large one for efficient training. However, traditional trajectory matching methods often suffer from training instability and high storage costs for expert trajectories.
Matching Convexified Trajectory (MCT) addresses these issues by:
- Convexifying Trajectories: Transforming jagged expert trajectories into smooth, convex ones for more stable matching.
- Storage-efficient Interpolation: Storing only a few key waypoints and reconstructing trajectories via interpolation, dramatically reducing storage requirements.
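The waypoint idea above can be illustrated with a small sketch. It assumes plain linear interpolation between consecutive stored waypoints, which may differ from the scheme in `exps/trajectory_compression.py`; the names `lerp`, `reconstruct_trajectory`, and `steps_per_segment` are hypothetical and do not come from the repository. Parameters are represented as flat lists of floats to keep the example dependency-free.

```python
from typing import List, Sequence

Vector = List[float]


def lerp(start: Sequence[float], end: Sequence[float], alpha: float) -> Vector:
    """Linear interpolation between two flattened parameter vectors."""
    return [(1.0 - alpha) * s + alpha * e for s, e in zip(start, end)]


def reconstruct_trajectory(
    waypoints: Sequence[Sequence[float]], steps_per_segment: int
) -> List[Vector]:
    """Rebuild a dense trajectory from a few stored waypoints.

    Assumption: points between two consecutive waypoints lie on the straight
    line joining them. Returns (K - 1) * steps_per_segment + 1 points for
    K waypoints.
    """
    if len(waypoints) < 2:
        raise ValueError("need at least two waypoints")
    if steps_per_segment < 1:
        raise ValueError("steps_per_segment must be >= 1")
    dense = [list(waypoints[0])]
    for start, end in zip(waypoints[:-1], waypoints[1:]):
        for step in range(1, steps_per_segment + 1):
            dense.append(lerp(start, end, step / steps_per_segment))
    return dense


# Hypothetical toy example: 3 waypoints in a 2-D "parameter space".
toy_waypoints = [[0.0, 0.0], [2.0, 4.0], [4.0, 4.0]]
dense = reconstruct_trajectory(toy_waypoints, steps_per_segment=2)
print(len(dense))  # 5
print(dense[1])    # [1.0, 2.0]
```

Only the three waypoints need to be kept on disk here; the two intermediate points are recomputed on demand.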
This repository provides:
- Training code for generating expert buffers.
- Trajectory convexification and compression scripts.
- Core distillation scripts for synthesizing high-quality datasets.
Highlights
- Stable Training: Convexified trajectories lead to more consistent distillation performance.
- Storage Efficient: Stores only a few waypoints per expert trajectory instead of the full trajectory, reducing the storage footprint.
- Support for Multiple Datasets: CIFAR-10, CIFAR-100, and Tiny-ImageNet.
- CVPR 2025: Official code for reproducing the results in the paper.
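As a rough illustration of the storage highlight, the sketch below compares the bytes needed for a full trajectory with those needed for a handful of waypoints. All sizes are hypothetical placeholders chosen for easy arithmetic, not measurements from the paper or the repository, and `trajectory_storage_bytes` is an illustrative helper.

```python
def trajectory_storage_bytes(
    num_params: int, num_snapshots: int, bytes_per_param: int = 4
) -> int:
    """Bytes needed to store `num_snapshots` copies of the parameters.

    The default of 4 bytes per parameter assumes float32 storage.
    """
    return num_params * num_snapshots * bytes_per_param


# Hypothetical sizes, for illustration only.
num_params = 1_000_000
full = trajectory_storage_bytes(num_params, num_snapshots=50)
compressed = trajectory_storage_bytes(num_params, num_snapshots=5)
print(full // compressed)  # 10
```

Because storage grows linearly with the number of stored snapshots, the saving is simply the ratio of snapshots kept.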
Project Structure
.
├── exps/ # Ablation and visualization scripts
│ ├── trajectory_compression.py # Logic for waypoint interpolation and compression
│ ├── visualize_synthetic.py # Results visualization
│ └── ...
├── buffer.py # Script to generate expert trajectory buffers
├── convexify.py # Core logic for trajectory convexification
├── distill_compressed.py # Main distillation script using MCT
├── networks.py # Neural network architectures (ConvNet, ResNet, etc.)
├── utils.py # Evaluation, data augmentation, and helper functions
├── reparam_module.py # Module for gradient-through-parameters
├── requirements.txt # Dependencies
└── CVPR_MCT_POSTER.png # Poster overview
Installation
1. Clone the repository
git clone https://github.com/iLearn-Lab/CVPR25-MCT.git
cd CVPR25-MCT
2. Install dependencies
pip install -r requirements.txt
Usage
1. Generate Expert Buffer
First, generate the expert trajectories for the target dataset:
python buffer.py --dataset=CIFAR10 --zca
2. Compress Trajectory
Generate the convexified and compressed trajectories:
python exps/trajectory_compression.py --num_interpolation=4 --zca --dataset=CIFAR10
3. Dataset Distillation
Run the main distillation script using the compressed trajectories:
# CIFAR-10, IPC=1, with ZCA
python distill_compressed.py --dataset=CIFAR10 --ipc=1 --syn_steps=60 --expert_epochs=6 --max_start_epoch=4 --lr_img=1e3 --lr_lr=1e-07 --lr_init=1e-2 --zca
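The three steps above can also be chained from a small driver script. The following is a minimal sketch that only reuses the flags shown in this README; `build_pipeline` and `run_pipeline` are hypothetical helpers, not part of the repository. The distillation hyperparameters are the CIFAR-10, IPC=1 values from the example above and would likely need adjusting for other settings. It assumes it is run from the repository root.

```python
import shlex
import subprocess
import sys


def build_pipeline(dataset="CIFAR10", ipc=1, num_interpolation=4):
    """Return the three README commands as argument lists, in execution order."""
    py = sys.executable  # use the current interpreter
    return [
        [py, "buffer.py", f"--dataset={dataset}", "--zca"],
        [
            py, "exps/trajectory_compression.py",
            f"--num_interpolation={num_interpolation}", "--zca",
            f"--dataset={dataset}",
        ],
        [
            py, "distill_compressed.py", f"--dataset={dataset}", f"--ipc={ipc}",
            "--syn_steps=60", "--expert_epochs=6", "--max_start_epoch=4",
            "--lr_img=1e3", "--lr_lr=1e-07", "--lr_init=1e-2", "--zca",
        ],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


commands = build_pipeline()
run_pipeline(commands, dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a cheap way to check them before starting a long run.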
Visualization
The presentation poster is included in the repository as CVPR_MCT_POSTER.png.
Citation
If you find this work useful, please cite our paper:
@InProceedings{Zhong_2025_CVPR,
author = {Zhong, Wenliang and Tang, Haoyu and Zheng, Qinghai and Xu, Mingzhu and Hu, Yupeng and Guan, Weili},
title = {Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {25581-25589}
}
Acknowledgement
- This work was supported by the iLearn-Lab at Shandong University and collaborators.
- The code builds on previous work in trajectory matching.
License
This project is released under the Apache License 2.0.