🚀 PTQ4ARVG: Post-Training Quantization for AutoRegressive Visual Generation Models

February 3, 2026


📖 Introduction

This repository contains the official PyTorch implementation of the paper "PTQ4ARVG: Post-Training Quantization for AutoRegressive Visual Generation Models". Our study identifies the quantization challenges of autoregressive visual generation (ARVG) models along three dimensions: channel, token, and sample. To address them, we propose PTQ4ARVG, a training-free and hardware-friendly post-training quantization (PTQ) framework tailored for ARVG models.

To the best of our knowledge, this work is the first to propose a theoretically supported scaling-based solution for handling outliers, and the first complete PTQ framework designed specifically for ARVG models. We hope it will further advance research on and applications of ARVG models.
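For readers new to scaling-based outlier handling, the sketch below illustrates the general idea using SmoothQuant's migration rule (SmoothQuant is one of the projects this repository builds on). It is not the PTQ4ARVG scaling derived in the paper. For each input channel j, activations are divided by s_j and the matching weight column is multiplied by s_j, so the linear layer's output is mathematically unchanged while activation outliers shrink. The function names and the default alpha = 0.5 are illustrative assumptions, written in NumPy for brevity.

```python
import numpy as np


def smooth_scales(x, w, alpha=0.5, eps=1e-5):
    """SmoothQuant-style per-input-channel scales (illustrative, not PTQ4ARVG's).

    s_j = max|X[:, j]|**alpha / max|W[:, j]|**(1 - alpha)
    x: activations, shape (tokens, in_features)
    w: linear weight, shape (out_features, in_features)
    """
    act_max = np.abs(x).max(axis=0)      # per input channel, over tokens
    w_max = np.abs(w).max(axis=0)        # per input channel, over output rows
    s = act_max ** alpha / np.maximum(w_max, eps) ** (1.0 - alpha)
    return np.maximum(s, eps)            # keep scales strictly positive


def apply_smoothing(x, w, s):
    """Return (X diag(s)^-1, W diag(s)); their product equals X @ W.T."""
    return x / s, w * s


rng = np.random.default_rng(0)
x = rng.standard_normal((256, 64))
x[:, 3] *= 50.0                          # inject one outlier channel
w = rng.standard_normal((128, 64))

s = smooth_scales(x, w)
x_s, w_s = apply_smoothing(x, w, s)
print(np.allclose(x @ w.T, x_s @ w_s.T))   # layer output is preserved
print(np.abs(x).max(), np.abs(x_s).max())  # activation range shrinks
```

The scaling moves part of the quantization difficulty from activations to weights; alpha controls how much is moved.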

🔹 Challenges

🔹 Methods

🔓 Getting Started

🏗️ Installation

Clone this repository, then create and activate a conda environment named arvg and install the dependencies with the following commands:

git clone https://github.com/BienLuky/PTQ4ARVG.git
cd PTQ4ARVG
conda create --name arvg python=3.10
conda activate arvg
pip install -r requirements.txt

🔧 Usage

Evaluation

We provide an evaluation script for reproducing the generation results on the ImageNet-1K benchmark (the same evaluation protocol as guided-diffusion). In the commands below, /PATH/TO/YOU/NPZ/ refers to the .npz file containing the generated images.

cd evaluation
wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/imagenet/256/VIRTUAL_imagenet256_labeled.npz
cd ..
python ./evaluation/test.py ./evaluation/VIRTUAL_imagenet256_labeled.npz /PATH/TO/YOU/NPZ/
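If your sampler does not already write an .npz file, the following minimal sketch packs generated images into one. It assumes the evaluator follows the guided-diffusion convention of a uint8 array of shape (N, H, W, 3) stored under the default key arr_0. The helper name save_samples_npz is hypothetical, so check evaluation/test.py before relying on this layout.

```python
import numpy as np


def save_samples_npz(images, path):
    """Save generated images as a guided-diffusion-style .npz (key "arr_0").

    Hypothetical helper. images: array of shape (N, H, W, 3), either uint8 in
    [0, 255] or float in [0, 1]. PyTorch outputs in (N, 3, H, W) layout should
    be permuted to (N, H, W, 3) first.
    """
    arr = np.asarray(images)
    if arr.dtype != np.uint8:
        # Map float images in [0, 1] to uint8 in [0, 255].
        arr = np.clip(np.round(arr * 255.0), 0, 255).astype(np.uint8)
    if arr.ndim != 4 or arr.shape[-1] != 3:
        raise ValueError(f"expected shape (N, H, W, 3), got {arr.shape}")
    np.savez(path, arr)  # a positional array is stored under "arr_0"
    return arr.shape


fake = np.random.default_rng(0).random((4, 256, 256, 3))  # stand-in samples
save_samples_npz(fake, "samples.npz")
with np.load("samples.npz") as data:
    print(data["arr_0"].shape, data["arr_0"].dtype)
```

The resulting samples.npz can then be passed in place of /PATH/TO/YOU/NPZ/.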

🛠️ Deployment

The quantized models are deployed with CUTLASS, using the same deployment toolkit as SmoothQuant. The implementation is based on the open-source project torch_quantizer. All speedups reported in this repository and in the paper are measured with a sequence length of 256.
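To make the deployment arithmetic concrete, here is a NumPy simulation of what a W8A8 integer GEMM computes: activations are quantized per token, weights per output channel, the product is accumulated in INT32, and the result is rescaled to floating point. This is an illustrative sketch only. It does not use CUTLASS or the torch_quantizer API, and the scale granularities are assumptions rather than the exact configuration used in this repository.

```python
import numpy as np


def quantize_sym(t, axis, bits=8):
    """Symmetric uniform quantization with one scale per slice along `axis`."""
    assert 2 <= bits <= 8, "values are stored as int8"
    qmax = 2 ** (bits - 1) - 1                         # 127 for INT8
    amax = np.abs(t).max(axis=axis, keepdims=True)
    scale = np.maximum(amax, 1e-8) / qmax
    q = np.clip(np.round(t / scale), -qmax, qmax).astype(np.int8)
    return q, scale


def w8a8_linear(x, w):
    """Simulate Y = X @ W.T with INT8 operands and INT32 accumulation.

    x: (tokens, in_features) activations, one scale per token (assumed)
    w: (out_features, in_features) weights, one scale per output channel (assumed)
    """
    qx, sx = quantize_sym(x, axis=1)                   # sx: (tokens, 1)
    qw, sw = quantize_sym(w, axis=1)                   # sw: (out_features, 1)
    acc = qx.astype(np.int32) @ qw.astype(np.int32).T  # integer GEMM
    return acc * sx * sw.T                             # dequantize: (tokens, out)


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 64))
w = rng.standard_normal((32, 64))
y_ref = x @ w.T
y_q = w8a8_linear(x, w)
print(np.abs(y_q - y_ref).max() / np.abs(y_ref).max())  # small relative error
```

Because each scale depends only on a row of X or a row of W, dequantization reduces to an outer product of two scale vectors, which an integer GEMM kernel can apply cheaply in its epilogue.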

🖼️ Random Samples

RAR-XL (2.99× Acceleration)

NOTE: Random samples of RAR-XL with 6-bit quantization.

VAR-d16 (2.92× Acceleration)

NOTE: Random samples of VAR-d16 with 6-bit quantization.

📈 Speedup and Memory Saving

Deployment (RTX 3090 GPU)

NOTE: Inference latency and peak memory usage are evaluated with a batch size of 100 across varying token sequence lengths.
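A minimal sketch of a latency sweep in the spirit of this measurement (batch size 100, several sequence lengths) is shown below. The harness, its parameter names, and the NumPy matmul workload are hypothetical stand-ins for a quantized model's forward pass. When timing CUDA code, pass sync=torch.cuda.synchronize so that asynchronous kernels finish before the timer is read; peak memory can be tracked separately with torch.cuda.reset_peak_memory_stats() and torch.cuda.max_memory_allocated().

```python
import statistics
import time

import numpy as np


def benchmark_latency(fn, make_input, seq_lens, batch_size=100,
                      warmup=2, iters=5, sync=None):
    """Return {seq_len: median latency in ms} for fn(make_input(batch, seq_len))."""
    results = {}
    for seq_len in seq_lens:
        inp = make_input(batch_size, seq_len)
        for _ in range(warmup):          # untimed runs (caches, lazy init)
            fn(inp)
        if sync is not None:
            sync()
        times_ms = []
        for _ in range(iters):
            t0 = time.perf_counter()
            fn(inp)
            if sync is not None:         # wait for async work, e.g. CUDA kernels
                sync()
            times_ms.append((time.perf_counter() - t0) * 1e3)
        results[seq_len] = statistics.median(times_ms)
    return results


# Hypothetical stand-in workload: one linear layer on (batch, seq_len, 64) inputs.
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 64))
latency = benchmark_latency(
    fn=lambda x: x @ weight,
    make_input=lambda b, n: rng.standard_normal((b, n, 64)),
    seq_lens=[64, 256],
)
print(latency)
```

Reporting the median over several timed runs, after untimed warmup runs, keeps one-off costs such as lazy initialization out of the latency numbers.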

📊 Main Results

RAR Results

NOTE: QuaRot experiments are excluded from the RAR-XXL results because the model does not meet QuaRot's requirements.

VAR Results

PAR Results

NOTE: QuaRot experiments are excluded from the PAR results because the models do not meet QuaRot's requirements.

MAR Results

📚 Citation

If you find PTQ4ARVG useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

@article{liu2026ptq4arvg,
  title={PTQ4ARVG: Post-Training Quantization for AutoRegressive Visual Generation Models},
  author={Liu, Xuewen and Li, Zhikai and Zhang, Jing and Chen, Mengjuan and Gu, Qingyi},
  journal={arXiv preprint arXiv:2601.21238},
  year={2026}
}

💙 Acknowledgments

PTQ4ARVG builds on RepQ-ViT and SmoothQuant. We deeply appreciate their contributions to the community.