Task-Specific Zero-shot Quantization-Aware Training for Object Detection

September 11, 2025 · View on GitHub

Task-Specific Zero-shot Quantization-Aware Training for Object Detection

Note

📢 Unified and Extensible! This repo offers a plug-and-play Zero-Shot QAT framework for four major object detection backbones — YOLOv5, YOLO11, Mask R-CNN, and ViT. Ideal for researchers and practitioners looking to build on a robust and versatile foundation.

🔥 Accepted at ICCV 2025!

Overview

We introduce a novel framework that enables task-specific zero-shot quantization of object detection networks, ensuring high performance even in the absence of real data or retraining resources.

Our contributions are threefold:

Diagnosing the Limitations of Task-Agnostic Calibration:
We show that task-agnostic synthetic data often fails to reflect the structure of object detection tasks. Our method emphasizes the use of task-specific synthetic data, better aligning with real-world detection patterns.
Task-aware Synthetic Image Generation for Detection:
We propose a new bounding-box sampling method that synthesizes object-centric images by reconstructing object category distributions, spatial layouts, and size priors—without relying on any real data.
Task-specific Quantized Network Distillation:
We integrate task-aware fine-tuning into the quantized network distillation process, restoring detection performance while maintaining the compactness and efficiency of quantized models.

Pipeline Structure

📁 Repository Structure

This repository is organized as a monorepo containing implementations for four popular detection backbones. Each subdirectory includes code and instructions for reproducing our experiments on that architecture.

Directory	Description
`resnet`	Experiments using ResNet-based Mask R-CNN
`vit`	Experiments using ViT-based (Transformer) Mask R-CNN
`yolo11`	Experiments on the YOLOv11 series
`yolov5`	Experiments on the YOLOv5 series
`tools`	Utility scripts for dataset preparation

Note: These submodules may require different environments or dependencies. Please refer to the README.md in each subdirectory for setup and usage instructions.

🎀 Pretrained Model

Please refer to the huggingface page to see the pretrained model of YOLOv5, YOLO11 and ViT series.

🎈 TODO

Release pretrained weights of Resnet based masked rcnn models.
Release instructions of the evaluation of models.
Convert model to a universal and ergomatic format.

@misc{li2025taskspecificzeroshotquantizationawaretraining,
      title={Task-Specific Zero-shot Quantization-Aware Training for Object Detection}, 
      author={Changhao Li and Xinrui Chen and Ji Wang and Kang Zhao and Jianfei Chen},
      year={2025},
      eprint={2507.16782},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.16782}, 
}

Task-Specific Zero-shot Quantization-Aware Training for Object Detection

Task-Specific Zero-shot Quantization-Aware Training for Object Detection

Overview

📁 Repository Structure

🎀 Pretrained Model

🎈 TODO

📜 License

🤝 Contribution

📚 Citation