OD3: Optimization-free Dataset Distillation for Object Detection
January 26, 2026
Salwa K. Al Khatib¹*, Ahmed ElHagry¹*, Shitong Shao²,¹*, Zhiqiang Shen¹
¹MBZUAI ²HKUST (Guangzhou) *Equal contributors
News
- [2026-01-26] Paper accepted to ICLR 2026.
- [2025-06-13] Distilled datasets released on Hugging Face.
- [2025-06-02] Codebase released.
- [2025-06-02] OD³ is released on arXiv.
Abstract
Training large neural networks on large-scale datasets requires substantial computational resources, particularly for dense prediction tasks such as object detection. Although dataset distillation (DD) has been proposed to alleviate these demands by synthesizing compact datasets from larger ones, most existing work focuses solely on image classification, leaving the more complex detection setting largely unexplored. In this paper, we introduce OD3, a novel optimization-free dataset distillation framework specifically designed for object detection. Our approach involves two stages: first, a candidate selection process in which object instances are iteratively placed in synthesized images at suitable locations, and second, a candidate screening process in which a pre-trained observer model removes low-confidence objects. We apply our data synthesis framework to MS COCO and PASCAL VOC, two popular detection datasets, with compression ratios ranging from 0.25% to 5%. Compared to the only prior dataset distillation method for detection and to conventional core set selection methods, OD3 delivers superior accuracy and establishes new state-of-the-art results, surpassing the prior best method by more than 14% on COCO mAP50 at a compression ratio of 1.0%.
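The two stages described above can be illustrated with a minimal Python sketch. This is not the released implementation: it assumes axis-aligned boxes, treats a location as "suitable" when the new box overlaps every already placed box by at most a hypothetical `max_iou`, and uses a hypothetical `score_fn` callable in place of the pre-trained observer. All function names and thresholds here are illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_candidates(sizes: Sequence[Tuple[float, float]],
                      canvas: Tuple[float, float],
                      max_iou: float = 0.3,
                      tries: int = 50,
                      seed: int = 0) -> List[Box]:
    """Stage 1 (assumed form): try random locations for each object and keep
    the first one that overlaps no placed box by more than max_iou."""
    rng = random.Random(seed)
    width, height = canvas
    placed: List[Box] = []
    for w, h in sizes:
        if w > width or h > height:
            continue  # object does not fit on the canvas at all
        for _ in range(tries):
            x1 = rng.uniform(0, width - w)
            y1 = rng.uniform(0, height - h)
            box = (x1, y1, x1 + w, y1 + h)
            if all(iou(box, other) <= max_iou for other in placed):
                placed.append(box)
                break
    return placed


def screen_candidates(boxes: Sequence[Box],
                      score_fn: Callable[[Box], float],
                      min_conf: float = 0.2) -> List[Box]:
    """Stage 2 (assumed form): drop boxes the observer scores below min_conf."""
    return [b for b in boxes if score_fn(b) >= min_conf]
```

In this sketch the observer is only a scoring callable; in practice it would be a detector such as the Faster R-CNN or RetinaNet checkpoints listed under Pre-trained Observer.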
Installation
The code has been tested with: Python 3.9, CUDA 11.3, PyTorch 1.12.1
Follow the official guide to set up the OpenMMLab environment.
Pre-trained Observer
Download checkpoints for FasterRCNN-R101 and RetinaNet-R101 and place them as follows:
./mmdetection/checkpoints/
├── faster_rcnn_r101_fpn_2x_coco_bbox_mAP-0.398_20200504_210455-1d2dac9c.pth
└── retinanet_r101_fpn_2x_coco_20200131-5560aee8.pth
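Before running distillation, it can help to confirm that the checkpoints are where the scripts will look for them. The following is a small sketch, assuming it is run from the repository root; the helper name `missing_checkpoints` is hypothetical and not part of the codebase.

```python
from pathlib import Path
from typing import List

# File names copied from the directory listing above.
EXPECTED_CHECKPOINTS = (
    "faster_rcnn_r101_fpn_2x_coco_bbox_mAP-0.398_20200504_210455-1d2dac9c.pth",
    "retinanet_r101_fpn_2x_coco_20200131-5560aee8.pth",
)


def missing_checkpoints(checkpoint_dir: str = "mmdetection/checkpoints") -> List[str]:
    """Return the expected checkpoint file names not found in checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("all checkpoints present" if not missing else f"missing: {missing}")
```

The check reports both files; if you only plan to use one observer, a missing entry for the other can be ignored.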
Dataset
COCO: train2017 | val2017
Set the data_root argument in the config file you use, e.g. mmdetection/configs/dd/data_synthesis/data_synthesis_faster-rcnn_r101_fpn_coco.py, to the path of your downloaded COCO copy.
Distillation
To distill the COCO dataset into a condensed version using OD3, run the following script with these arguments: output_dir (where to save the condensed COCO), original_dir (the path of the downloaded MS COCO), IPD (images per dataset, which sets the compression ratio), and, optionally, model. For example: coco-1percent/ data/ms-coco/ 1184 retinanet.
sh scripts/data_synthesis.sh {output_dir} {original_dir} {IPD} {model (optional)}
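If you prefer to launch distillation from Python (for example, to loop over several IPD values), the command above can be assembled programmatically. This is a minimal sketch: `build_distill_command` is a hypothetical helper, not part of the codebase, and it only mirrors the positional arguments shown above.

```python
import shlex
from typing import List, Optional


def build_distill_command(output_dir: str, original_dir: str, ipd: int,
                          model: Optional[str] = None) -> List[str]:
    """Assemble the argument list for scripts/data_synthesis.sh."""
    if ipd <= 0:
        raise ValueError("IPD must be a positive integer")
    cmd = ["sh", "scripts/data_synthesis.sh", output_dir, original_dir, str(ipd)]
    if model is not None:
        cmd.append(model)  # optional fourth positional argument
    return cmd


# Example values taken from the text above.
cmd = build_distill_command("coco-1percent/", "data/ms-coco/", 1184, "retinanet")
print(shlex.join(cmd))
# To actually run it from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.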
You can also download distilled data from Hugging Face:
| dataset | IPD | files |
|---|---|---|
| MS COCO | 0.5% | images |
| MS COCO | 0.5% | images |
| MS COCO | 1.0% | images |
Evaluation
To evaluate performance on a condensed dataset, run the following script with training_config (a PKD config file, e.g. mmrazor/configs/distill/mmdet/pkd/pkd_fpn_faster-rcnn_r101_faster-rcnn_r50_2x_coco.py), num_gpus, data_annotation (path to final_soft_labels.json), data_dir (root directory of output_dir), and work_dir (directory to save logs).
sh scripts/post_training.sh {training_config} {num_gpus} {data_annotation} {data_dir} {work_dir}
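Since a training run is expensive, a quick sanity check of the annotation file before launching may be worthwhile. The sketch below makes no assumption about the schema of final_soft_labels.json; it only loads the file and reports the size of each top-level entry. The helper name `summarize_annotation_file` is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def summarize_annotation_file(path: str) -> Dict[str, int]:
    """Load a JSON annotation file and report the size of each top-level entry."""
    with Path(path).open("r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        # A non-dict root (e.g. a list of records) is reported under one key.
        return {"<root>": len(data) if isinstance(data, list) else 1}
    return {
        key: len(value) if isinstance(value, (list, dict)) else 1
        for key, value in data.items()
    }
```

For example, `summarize_annotation_file("coco-1percent/final_soft_labels.json")` (path shown for illustration only) returns a dictionary of entry counts; an empty or unexpectedly small count suggests the distillation step did not finish.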
Acknowledgement
This codebase is built on mmdetection and mmrazor.
Citation
If you find our work useful, please cite it:
@article{alkhatib2024od3,
title={OD3: Optimization-free Dataset Distillation for Object Detection},
author={Al Khatib, Salwa K. and ElHagry, Ahmed and Shao, Shitong and Shen, Zhiqiang},
journal={arXiv preprint arXiv:2506.01942},
year={2025}
}