DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization
April 24, 2026 ยท View on GitHub
Welcome to the official code repository for "DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization".
๐ฐ News
- [2026/04/21] ๐ Our DuQuant++ code is released!
- [2026/04/21] ๐ Our DuQuant++ paper is available on arXiv!
- [2024/09/26] ๐ Our DuQuant paper has been accepted for an Oral presentation at NeurIPS 2024!
๐ Introduction
DuQuant++ extends DuQuant to the MXFP4 (Microscaling FP4) quantization format, achieving state-of-the-art W4A4 quantization performance for LLMs with fine-grained rotation transformations.
Key features:
- MXFP4 W4A4 quantization with block_size=32 aligned to MXFP4 group size
- Fine-grained rotation transformation for outlier distribution
- Optional GPTQ compensation for further accuracy improvement
- Support for LLaMA-3 model families
๐ง Installation
conda create -n duquant python=3.10 -y
conda activate duquant
pip install --upgrade pip
pip install -r requirements.txt
โ๏ธ Usage
1. Preprocessing
# Generate rotation matrices (run once for all models)
python get_rot.py
# Generate activation scales and shifts (run once per model)
python generate_act_scale_shift.py --model meta-llama/Llama-3-8B
2. Quantization & Evaluation
The bash script for DuQuant++ can be found in run.sh.
# DuQuant++ (without GPTQ)
python main.py \
--block_size 32 \
--max_rotation_step 256 \
--wbits 4 \
--abits 4 \
--model meta-llama/Llama-3-8B \
--alpha 0.6 \
--smooth \
--eval_ppl \
--bath_size 64 \
--tasks arc_easy,arc_challenge,winogrande,hellaswag,openbookqa,lambada_openai,piqa
# DuQuant++* (with GPTQ)
python main.py \
--block_size 32 \
--max_rotation_step 256 \
--wbits 4 \
--abits 4 \
--model meta-llama/Llama-3-8B \
--alpha 0.6 \
--gptq \
--smooth \
--eval_ppl \
--bath_size 64 \
--tasks arc_easy,arc_challenge,winogrande,hellaswag,openbookqa,lambada_openai,piqa
Explanation of arguments:
--model: the local model path or HuggingFace model name.--wbits: weight quantization bits.--abits: activation quantization bits.--block_size: the block size of rotation matrices (32 for MXFP4).--max_rotation_step: the max greedy search steps of rotation transformation.--gptq: enable GPTQ for weight error compensation.--resume: loading pre-trained DuQuant parameters.--multigpu: to inference larger network on multiple GPUs.--save_dir: saving the quantization model for further exploration.--eval_ppl: evaluating the perplexity of quantized models.--tasks: evaluating on zero-shot QA tasks (comma-separated).
3. Model Zoo
Currently, we support the following model families:
| Models | Supported |
|---|---|
| LLaMA-3 | โ |
| LLaMA-3.1 | โ |
| LLaMA-3.2 | โ |
๐ Contact
For immediate queries or further information, please open an issue or contact haokunlin2-c@my.cityu.edu.hk.
๐ Acknowledgement
This repo is built upon the following projects:
We thank the authors for their code.
๐ Citation
We kindly request that you cite our work if you utilize the code or reference our findings in your research:
@article{lin2026duquant++,
title={DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization},
author={Lin, Haokun and Jia, Xinle and Xu, Haobo and Yao, Bingchen and Guo, Xianglong and Wu, Yichen and Lu, Zhichao and Wei, Ying and Zhang, Qingfu and Sun, Zhenan},
journal={arXiv preprint arXiv:2604.17789},
year={2026}
}
@article{lin2024duquant,
title={DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs},
author={Lin, Haokun and Xu, Haobo and Wu, Yichen and Cui, Jingzhi and Zhang, Yingtao and Mou, Linzhan and Song, Linqi and Sun, Zhenan and Wei, Ying},
journal={arXiv preprint arXiv:2406.01721},
year={2024}
}