Task-Specific Zero-shot Quantization-Aware Training for Object Detection
September 11, 2025 Β· View on GitHub
Note
π’ Unified and Extensible! This repo offers a plug-and-play Zero-Shot QAT framework for four major object detection backbones β YOLOv5, YOLO11, Mask R-CNN, and ViT. Ideal for researchers and practitioners looking to build on a robust and versatile foundation.
π₯ Accepted at ICCV 2025!
Overview
We introduce a novel framework that enables task-specific zero-shot quantization of object detection networks, ensuring high performance even in the absence of real data or retraining resources.
Our contributions are threefold:
-
Diagnosing the Limitations of Task-Agnostic Calibration:
We show that task-agnostic synthetic data often fails to reflect the structure of object detection tasks. Our method emphasizes the use of task-specific synthetic data, better aligning with real-world detection patterns. -
Task-aware Synthetic Image Generation for Detection:
We propose a new bounding-box sampling method that synthesizes object-centric images by reconstructing object category distributions, spatial layouts, and size priorsβwithout relying on any real data. -
Task-specific Quantized Network Distillation:
We integrate task-aware fine-tuning into the quantized network distillation process, restoring detection performance while maintaining the compactness and efficiency of quantized models.
π Repository Structure
This repository is organized as a monorepo containing implementations for four popular detection backbones. Each subdirectory includes code and instructions for reproducing our experiments on that architecture.
| Directory | Description |
|---|---|
resnet | Experiments using ResNet-based Mask R-CNN |
vit | Experiments using ViT-based (Transformer) Mask R-CNN |
yolo11 | Experiments on the YOLOv11 series |
yolov5 | Experiments on the YOLOv5 series |
tools | Utility scripts for dataset preparation |
Note: These submodules may require different environments or dependencies. Please refer to the README.md in each subdirectory for setup and usage instructions.
π Pretrained Model
Please refer to the huggingface page to see the pretrained model of YOLOv5, YOLO11 and ViT series.
π TODO
- Release pretrained weights of Resnet based masked rcnn models.
- Release instructions of the evaluation of models.
- Convert model to a universal and ergomatic format.
π License
This project includes code from Ultralytics and is therefore distributed under the AGPL license in compliance with their licensing terms.
π€ Contribution
We welcome community contributions!
If you find a bug or have an enhancement idea, feel free to open an issue or submit a pull request.
π Citation
If you find our work helpful, please consider citing:
@misc{li2025taskspecificzeroshotquantizationawaretraining,
title={Task-Specific Zero-shot Quantization-Aware Training for Object Detection},
author={Changhao Li and Xinrui Chen and Ji Wang and Kang Zhao and Jianfei Chen},
year={2025},
eprint={2507.16782},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2507.16782},
}