PTQ4ARVG: Post-Training Quantization for AutoRegressive Visual Generation Models
February 3, 2026
Introduction
This repository contains the official PyTorch implementation for the paper "PTQ4ARVG: Post-Training Quantization for AutoRegressive Visual Generation Models". Our study reveals the quantization challenges of AutoRegressive Visual Generation Models (ARVG) across the channel, token, and sample dimensions. Correspondingly, we propose PTQ4ARVG, a training-free and hardware-friendly PTQ framework tailored for ARVG models.
To the best of our knowledge, this work is the first to propose a theoretically supported scaling-based solution for handling outliers. It also represents the first complete PTQ framework specifically designed for ARVG models. We hope our work will further advance the research and applicability of ARVG models.
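PTQ4ARVG's own scaling rule and its theoretical justification are given in the paper and are not reproduced here. To illustrate the general idea behind scaling-based outlier handling, the sketch below implements the generic per-channel smoothing popularized by SmoothQuant (one of the works PTQ4ARVG builds on). Each activation channel is divided by a scale and the same scale is folded into the matching weight row, so the layer output is unchanged in exact arithmetic while the activation outlier shrinks. The function names, the `alpha=0.5` migration strength, and the toy data are illustrative assumptions, not this repository's API.

```python
import numpy as np

def smooth_scales(x, w, alpha=0.5, eps=1e-5):
    # SmoothQuant-style per-input-channel scales:
    #   s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha)
    # x: activations (tokens, in_features); w: weights (in_features, out_features)
    act_max = np.abs(x).max(axis=0)
    w_max = np.abs(w).max(axis=1)
    s = act_max ** alpha / np.maximum(w_max, eps) ** (1.0 - alpha)
    return np.maximum(s, eps)

def apply_smoothing(x, w, s):
    # Divide each activation channel by s_j and multiply the matching weight row by s_j,
    # so (x / s) @ (s[:, None] * w) equals x @ w up to floating-point rounding.
    return x / s, w * s[:, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 64))
x[:, 3] *= 50.0                      # inject one outlier channel
w = rng.normal(size=(64, 32))

s = smooth_scales(x, w)
xs, ws = apply_smoothing(x, w, s)
print("max |x| before:", np.abs(x).max(), "after:", np.abs(xs).max())
```

With `alpha=0.5` the outlier's magnitude is split evenly between activations and weights: the activation range shrinks while the affected weight rows grow, which is the trade-off any scaling-based method has to balance.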
Challenges
Methods
Getting Started
Installation
Clone this repository, then create and activate a conda environment named arvg and install the dependencies with the following commands:
git clone https://github.com/BienLuky/PTQ4ARVG.git
cd PTQ4ARVG
conda create --name arvg python=3.10
conda activate arvg
pip install -r requirements.txt
Usage
Evaluation
We provide an evaluation script for reproducing the generation results on the ImageNet-1K benchmark (using the same evaluation protocol as guided-diffusion), where /PATH/TO/YOU/NPZ/ refers to the .npz file containing the generated images.
cd evaluation
wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/imagenet/256/VIRTUAL_imagenet256_labeled.npz
cd ..
python ./evaluation/test.py ./evaluation/VIRTUAL_imagenet256_labeled.npz /PATH/TO/YOU/NPZ/
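The script takes the generated images as a single .npz file. Below is a minimal sketch for producing one, assuming test.py follows the guided-diffusion evaluator convention of reading a uint8 array of shape (N, H, W, 3) stored under the default key `arr_0`. The helper names `to_uint8` and `pack_samples` are hypothetical and not part of this repository, and the random arrays only stand in for real model samples.

```python
import os
import tempfile

import numpy as np

def to_uint8(img):
    # Convert one image to uint8 in [0, 255]; float inputs are assumed to lie in [0, 1].
    img = np.asarray(img)
    if img.dtype == np.uint8:
        return img
    return np.clip(np.round(img * 255.0), 0, 255).astype(np.uint8)

def pack_samples(images, out_path):
    # Stack images into one (N, H, W, 3) uint8 array and save it with np.savez;
    # a positional array is stored under the default key "arr_0".
    arr = np.stack([to_uint8(im) for im in images])
    if arr.ndim != 4 or arr.shape[-1] != 3:
        raise ValueError(f"expected (N, H, W, 3) RGB images, got {arr.shape}")
    np.savez(out_path, arr)
    return arr.shape

# Example with four random 256x256 "images" (stand-ins for real model samples).
rng = np.random.default_rng(0)
images = [rng.random((256, 256, 3)) for _ in range(4)]
out_path = os.path.join(tempfile.mkdtemp(), "samples.npz")
shape = pack_samples(images, out_path)
loaded = np.load(out_path)["arr_0"]
print(shape, loaded.dtype)
```

The resulting file path is what would be passed in place of /PATH/TO/YOU/NPZ/.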
Deployment
The quantized models are deployed using CUTLASS and the same deployment toolkit as SmoothQuant. The specific implementation is based on the open-source project torch_quantizer. All accelerated evaluations, in both this repository and the paper, use a sequence length of 256.
Random Samples
RAR-XL (2.99× Acceleration)
NOTE: Random samples of RAR-XL with 6-bit quantization.
VAR-d16 (2.92× Acceleration)
NOTE: Random samples of VAR-d16 with 6-bit quantization.
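The exact quantizer settings behind these 6-bit samples are not specified in this README. As a generic reminder of what b-bit uniform quantization does, here is a symmetric quantize-dequantize sketch. It is not PTQ4ARVG's quantizer; the symmetric scheme, the per-output-channel granularity, and the `fake_quantize` name are illustrative assumptions.

```python
import numpy as np

def fake_quantize(x, n_bits=6, axis=None):
    # Symmetric uniform quantize-dequantize to signed n_bits integers.
    # axis=None: one scale for the whole tensor.
    # axis=k: max is taken over axis k, e.g. axis=1 on an (out, in) weight
    #         gives one scale per output channel.
    qmax = 2 ** (n_bits - 1) - 1                 # 31 for 6-bit
    if axis is None:
        max_abs = np.abs(x).max()
    else:
        max_abs = np.abs(x).max(axis=axis, keepdims=True)
    scale = np.maximum(max_abs, 1e-8) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)   # integer grid [-32, 31] for 6-bit
    return q * scale, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(32, 64))
w_q, scale = fake_quantize(w, n_bits=6, axis=1)
print("scale shape:", scale.shape, "max abs error:", np.abs(w - w_q).max())
```

Every value is snapped to one of at most 2^6 = 64 levels per channel, so the rounding error per element is bounded by half of that channel's scale.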
Speedup and Memory Saving
Deployment (RTX 3090 GPU)
NOTE: Inference latency and peak memory usage are evaluated with a batch size of 100 across varying token sequence lengths.
Main Results
RAR Results
NOTE: QuaRot experiments are excluded from the RAR-XXL results because the model does not meet QuaRot's requirements.
VAR Results
PAR Results
NOTE: QuaRot experiments are excluded from the PAR results because the models do not meet QuaRot's requirements.
MAR Results
Citation
If you find PTQ4ARVG useful in your research or applications, please consider giving us a star and citing it with the following BibTeX entry.
@article{liu2026ptq4arvg,
title={PTQ4ARVG: Post-Training Quantization for AutoRegressive Visual Generation Models},
author={Liu, Xuewen and Li, Zhikai and Zhang, Jing and Chen, Mengjuan and Gu, Qingyi},
journal={arXiv preprint arXiv:2601.21238},
year={2026}
}
Acknowledgments
The development of PTQ4ARVG is based on RepQ-ViT and SmoothQuant. We deeply appreciate their contributions to the community.