IBQ.md

October 22, 2025

Scalable Image Tokenization with Index Backpropagation Quantization

Fengyuan Shi*, Zhuoyan Luo*, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang
Nanjing University, Tsinghua University, ARC Lab Tencent PCG

arXiv

This is the official repository for Index Backpropagation Quantization (IBQ), a new vector quantization (VQ) method that improves the scalability and performance of visual tokenizers.

Highlights

  • 🚀 Scalable Visual Tokenizers: IBQ enables scalable training of visual tokenizers, reaching a codebook size of 262144 with 256-dimensional embeddings while keeping codebook utilization high.
  • 💡 Innovative Approach: Conventional VQ methods update only the codebook embeddings that are selected, which makes them prone to codebook collapse. IBQ instead applies a straight-through estimator to the categorical distribution over codebook indices, so all codebook embeddings and the visual encoder are optimized jointly and share a consistent latent space.
  • 🏆 Superior Performance: IBQ achieves competitive results on ImageNet:
    • Reconstruction: 1.00 rFID, outperforming Open-MAGVIT2 (1.17 rFID).
    • Autoregressive visual generation: 2.05 gFID, outperforming previous vanilla autoregressive transformers.
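
To make the difference in gradient flow concrete, here is a minimal pure-Python sketch for a single feature vector. It assumes the logits are dot products between the encoder output and each codebook embedding, and that the index is formed with the straight-through trick `I = hard - stopgrad(soft) + soft`, so the forward pass selects exactly one embedding while the backward pass goes through the softmax. The function names are hypothetical, the backward pass is written out by hand instead of using autograd, and the repository's actual quantizer (including any additional losses) may differ.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ibq_quantize_with_grads(z, codebook, grad_zq):
    """Forward pass plus hand-written backward pass for one feature vector.

    Assumes dot-product logits and the straight-through index
    I = hard - stopgrad(soft) + soft, whose forward value equals `hard`.
    """
    K, d = len(codebook), len(z)
    logits = [dot(z, c) for c in codebook]
    soft = softmax(logits)
    k_star = max(range(K), key=lambda k: logits[k])
    hard = [1.0 if k == k_star else 0.0 for k in range(K)]
    # Forward: z_q = sum_k I_k * c_k, i.e. exactly the selected embedding.
    z_q = [sum(hard[k] * codebook[k][i] for k in range(K)) for i in range(d)]

    # Backward: dL/dI_k = grad_zq . c_k; the STE passes it unchanged to `soft`.
    grad_soft = [dot(grad_zq, c) for c in codebook]
    # Softmax backward: dL/dlogit_j = soft_j * (grad_soft_j - sum_k soft_k * grad_soft_k).
    avg = dot(soft, grad_soft)
    grad_logits = [p * (g - avg) for p, g in zip(soft, grad_soft)]
    # Codebook row j: direct term (selected row only) + logit term (every row).
    grad_codebook = [[hard[j] * grad_zq[i] + grad_logits[j] * z[i] for i in range(d)]
                     for j in range(K)]
    # The encoder output z also receives a gradient through the logits.
    grad_z = [sum(grad_logits[j] * codebook[j][i] for j in range(K)) for i in range(d)]
    return k_star, z_q, grad_codebook, grad_z


def vq_codebook_grads(z, codebook):
    """Vanilla VQ codebook loss ||stopgrad(z) - c_k*||^2: only the nearest row moves."""
    dists = [sum((a - b) ** 2 for a, b in zip(z, c)) for c in codebook]
    k_star = min(range(len(codebook)), key=lambda k: dists[k])
    return [[2.0 * (codebook[j][i] - z[i]) if j == k_star else 0.0 for i in range(len(z))]
            for j in range(len(codebook))]


z = [1.0, 0.5]
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
k, z_q, g_ibq, g_z = ibq_quantize_with_grads(z, codebook, grad_zq=[0.1, -0.2])
g_vq = vq_codebook_grads(z, codebook)
print(k, z_q)                                           # 0 [1.0, 0.0]
print(sum(any(v != 0 for v in row) for row in g_ibq))  # 3 rows get a gradient
print(sum(any(v != 0 for v in row) for row in g_vq))   # 1 row gets a gradient
```

In the vanilla VQ codebook loss only the nearest embedding receives a gradient, whereas in the IBQ-style sketch every embedding receives one through the softmax term, which is the property the highlight above attributes to IBQ.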

This repository provides the scripts and checkpoints to replicate our results.

🔥 Quick Start

Class Conditional Image Generation

Stage I: Training of Visual Tokenizer

🚀 Training Scripts

```bash
bash scripts/train_tokenizer/IBQ/run_16384.sh MASTER_ADDR MASTER_PORT NODE_RANK
bash scripts/train_tokenizer/IBQ/run_262144.sh MASTER_ADDR MASTER_PORT NODE_RANK
```
๐Ÿบ Performance and Models
Method#TokensCodebook SizerFIDLPIPSCodebook UtilizationCheckpoint
IBQ16 ร—\times 1610242.240.258099%Tokenizer-1024
IBQ16 ร—\times 1681921.870.243798%Tokenizer-8192
IBQ16 ร—\times 16163841.370.223596%Tokenizer-16384
IBQ16 ร—\times 162621441.000.203084%Tokenizer-262144
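
Codebook utilization is, under the usual definition (assumed here, since the README does not spell it out), the fraction of codebook entries selected at least once when encoding an evaluation set. A minimal sketch with a hypothetical helper name:

```python
def codebook_utilization(token_indices, codebook_size):
    """Fraction of codebook entries that appear at least once in `token_indices`.

    Assumed definition for illustration; the repository's evaluation script
    may aggregate differently (e.g. per split or per batch).
    """
    used = set()
    for idx in token_indices:
        if not 0 <= idx < codebook_size:
            raise ValueError(f"index {idx} outside codebook of size {codebook_size}")
        used.add(idx)
    return len(used) / codebook_size


# Indices 0, 1 and 3 are used out of 8 entries.
print(codebook_utilization([0, 1, 1, 3, 3, 3], codebook_size=8))  # 0.375
```

Under that definition, 84% utilization of the 262144-entry codebook corresponds to roughly 220,000 distinct entries in use.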
🚀 Evaluation Scripts

```bash
bash scripts/evaluation/evaluation_256.sh
```

Stage II: Training of Auto-Regressive Models

🚀 Training Scripts

Please see scripts/train_autogressive/run.sh for the different model configurations.

```bash
bash scripts/train_autogressive/run.sh MASTER_ADDR MASTER_PORT NODE_RANK
```
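
Stage II trains a transformer to predict the 16 × 16 tokenizer indices one at a time. A minimal sketch of how one training example could be laid out, assuming raster-order flattening and a class condition fed at the first position; the function name and layout are illustrative assumptions, not the repository's data pipeline:

```python
def make_ar_training_example(class_id, token_grid):
    """Lay out one class-conditional next-token-prediction example (illustrative).

    Hypothetical layout: position 0 carries the class condition, positions
    1..N-1 carry image tokens 0..N-2 (teacher forcing), and position t is
    trained to predict image token t, so all N tokens are supervised.
    """
    flat = [tok for row in token_grid for tok in row]  # raster (row-major) order
    if not flat:
        raise ValueError("token_grid must contain at least one token")
    image_inputs = flat[:-1]  # the last token is never fed as input
    targets = flat
    return class_id, image_inputs, targets


# A 16 x 16 grid of tokenizer indices (dummy values) gives 256 prediction targets.
grid = [[r * 16 + c for c in range(16)] for r in range(16)]
cls, inputs, targets = make_ar_training_example(class_id=207, token_grid=grid)
print(len(inputs), len(targets))  # 255 256
```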
🚀 Sample Scripts

Please see scripts/train_autogressive/run.sh for the sampling hyperparameters used at each model scale.

```bash
bash scripts/evaluation/sample_npu.sh Your_Total_Rank  # NPU
bash scripts/evaluation/sample_gpu.sh Your_Total_Rank  # GPU
```
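
The README does not list the sampling hyperparameters themselves. As an illustration only, here is a minimal sketch of how three knobs commonly used for class-conditional autoregressive image generation (classifier-free guidance scale, temperature and top-k) combine into one sampling step. The function and parameter names are hypothetical and were not read from the repository's scripts.

```python
import math
import random


def sample_next_token(cond_logits, uncond_logits, cfg_scale=1.0,
                      temperature=1.0, top_k=None, rng=None):
    """Draw one token index from guided, temperature-scaled, top-k-filtered logits.

    Illustrative sketch; parameter names are hypothetical, not the repository's flags.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random  # the module exposes choices() like a Random instance
    # Classifier-free guidance: extrapolate from unconditional toward conditional logits.
    logits = [u + cfg_scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]
    logits = [x / temperature for x in logits]
    if top_k is not None and 0 < top_k < len(logits):
        # Keep the k largest logits (ties at the threshold may keep a few more).
        threshold = sorted(logits, reverse=True)[top_k - 1]
        logits = [x if x >= threshold else float("-inf") for x in logits]
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]  # exp(-inf) == 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# With top_k=1 sampling is greedy, which makes the effect of guidance visible.
cond, uncond = [1.0, 0.9], [1.0, 0.0]
print(sample_next_token(cond, uncond, cfg_scale=1.0, top_k=1))  # 0
print(sample_next_token(cond, uncond, cfg_scale=3.0, top_k=1))  # 1
```

A larger guidance scale amplifies whatever the class condition adds to the logits, while temperature and top-k trade diversity against fidelity; the per-scale values in the repository's scripts are not reproduced here.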
๐Ÿบ Performance and Models
MethodParams#TokensFIDISCheckpoint
IBQ342M16 ร—\times 162.88254.73AR_256_B
IBQ649M16 ร—\times 162.45267.48AR_256_L
IBQ1.1B16 ร—\times 162.14278.99AR_256_XL
IBQ2.1B16 ร—\times 162.05286.73AR_256_XXL

Text-conditional Image Generation

Stage I: Training of Visual Tokenizer

Data Preparation

We use CapFusion, LAION-COCO, CC12M, CC3M, LAION-HD, LAION-Aesthetic-umap, LAION-Aesthetic-v2, and JourneyDB for pretraining.

🚀 Training Scripts

```bash
bash scripts/train_tokenizer/IBQ/pretrain_256.sh MASTER_ADDR MASTER_PORT NODE_RANK
```

🚀 Evaluation Scripts

```bash
bash scripts/evaluation/evaluation_256.sh
```
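
PSNR, one of the reconstruction metrics in the comparison that follows, has the standard definition 10 · log10(MAX² / MSE). A minimal sketch on flat lists of pixel values, assuming an 8-bit range (MAX = 255); the evaluation script's exact conventions (value range, color space, averaging over images) are not stated in this README and may differ:

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return float("inf")  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel off by 10 gives MSE = 100, i.e. about 28.13 dB.
print(round(psnr([100, 100, 100, 100], [110, 110, 110, 110]), 2))  # 28.13
```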
๐Ÿบ Performance comparison and Models
MethodQuantizer TypeTraining DataRatioResolutionCodebook SizeCheckpointrFID(COCO)PSNR(COCO)SSIM(COCO)rFID(In1k)PSNR(In1k)SSIM(In1k)
LlamaGenVQ70M16256 ร—\times 25616384-8.4020.280.552.4720.650.54
Show-oLFQ35M16256 ร—\times 2568192-9.2620.900.593.5021.340.59
CosmosFSQ-16256 ร—\times 25664000-11.9719.220.484.5719.930.49
Open-MAGVIT2LFQ100M16256 ร—\times 25616384Open-MAGVIT2_Pretrain_256_163847.9322.210.622.5522.210.62
Open-MAGVIT2LFQ100M16256 ร—\times 256262144Open-MAGVIT2_Pretrain_256_2621446.7622.310.651.6722.700.64
IBQIBQ200M16256 ร—\times 25616384IBQ_Pretrain_256_163847.6721.580.622.0622.010.61
IBQIBQ200M16256 ร—\times 256262144IBQ_Pretrain_256_2621446.7922.280.651.5322.690.64
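
Reading the ratio of 16 in this table as the spatial downsampling factor (consistent with 256 × 256 images mapping to the 16 × 16 token maps listed earlier), a quick back-of-the-envelope sketch shows how resolution, ratio and codebook size set each tokenizer's token count and its upper bound on bits per image. The helper is hypothetical; log2 of the codebook size is the maximum information one index can carry, not a measured quantity.

```python
import math


def token_budget(resolution, downsample_ratio, codebook_size):
    """Return (tokens per image, max bits per token, max bits per image)."""
    if resolution % downsample_ratio:
        raise ValueError("resolution must be divisible by the downsampling ratio")
    side = resolution // downsample_ratio
    num_tokens = side * side
    bits_per_token = math.log2(codebook_size)  # capacity of one codebook index
    return num_tokens, bits_per_token, num_tokens * bits_per_token


print(token_budget(256, 16, 16384))   # (256, 14.0, 3584.0)
print(token_budget(256, 16, 262144))  # (256, 18.0, 4608.0)
```

At the same 256 tokens per image, moving from a 16384-entry to a 262144-entry codebook raises the per-token capacity from 14 to 18 bits.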