October 22, 2025

Scalable Image Tokenization with Index Backpropagation Quantization

Fengyuan Shi*, Zhuoyan Luo*, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang
Nanjing University, Tsinghua University, ARC Lab Tencent PCG

This is the official repository for Index Backpropagation Quantization (IBQ), a vector quantization (VQ) method that improves the scalability and performance of visual tokenizers.

Highlights

🚀 Scalable Visual Tokenizers: IBQ enables scalable training of visual tokenizers, reaching a codebook size of 262144 with 256-dimensional embeddings while maintaining high codebook utilization.
💡 Innovative Approach: Conventional VQ methods update only the selected codes at each step and are therefore prone to codebook collapse. IBQ instead applies a straight-through estimator to the categorical distribution over codes, so all codebook embeddings are optimized jointly with the visual encoder, which keeps the latent space consistent.
🏆 Superior Performance: IBQ achieves competitive results on ImageNet:
- Reconstruction: 1.00 rFID, outperforming Open-MAGVIT2 (1.17 rFID)
- Autoregressive Visual Generation: 2.05 gFID, outperforming previous vanilla autoregressive transformers.
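
The straight-through idea in the highlights above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions, not the repository's implementation: it assumes logits are dot products between the feature and each code, and that the index is `hard + soft - stop_gradient(soft)`. The function names `ibq_forward` and `ibq_codebook_grads` are hypothetical, and the gradient is derived by hand rather than by an autodiff framework.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ibq_forward(z, codebook):
    """Return (selected index, soft one-hot, quantized vector) for one feature z."""
    logits = [dot(z, c) for c in codebook]  # assumed similarity: dot product
    soft = softmax(logits)
    index = max(range(len(soft)), key=soft.__getitem__)
    # Forward value of hard + soft - stop_gradient(soft) is just the hard one-hot,
    # so the quantized vector is the selected code.
    return index, soft, list(codebook[index])

def ibq_codebook_grads(z, codebook, grad_out):
    """Hand-derived dL/dC_k for every code k, given grad_out = dL/dz_q."""
    index, soft, _ = ibq_forward(z, codebook)
    # z_q = sum_k Ind_k * C_k, so dL/dInd_k = <grad_out, C_k>.
    d_ind = [dot(grad_out, c) for c in codebook]
    # The straight-through estimator routes dL/dInd through the softmax.
    mean = sum(s * d for s, d in zip(soft, d_ind))
    d_logits = [s * (d - mean) for s, d in zip(soft, d_ind)]
    grads = []
    for k in range(len(codebook)):
        hard_k = 1.0 if k == index else 0.0
        # Direct path through z_q plus the path through logit_k = <z, C_k>.
        grads.append([hard_k * g + d_logits[k] * zi for g, zi in zip(grad_out, z)])
    return grads

z = [1.0, 0.5]
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
grads = ibq_codebook_grads(z, codebook, grad_out=[0.2, -0.1])
updated = [k for k, row in enumerate(grads) if any(abs(g) > 1e-6 for g in row)]
print("codes receiving gradient:", updated)
```

In this toy example every code receives a gradient through the soft path, whereas a quantizer that only backpropagates through the selected code would update a single entry.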

This repository provides the scripts and checkpoints to replicate our results.

🔥 Quick Start

Class Conditional Image Generation

Stage I: Training of Visual Tokenizer

🚀 Training Scripts

$256\times 256$ Tokenizer Training

bash scripts/train_tokenizer/IBQ/run_16384.sh MASTER_ADDR MASTER_PORT NODE_RANK

bash scripts/train_tokenizer/IBQ/run_262144.sh MASTER_ADDR MASTER_PORT NODE_RANK

🍺 Performance and Models

Method	#Tokens	Codebook Size	rFID	LPIPS	Codebook Utilization	Checkpoint
IBQ	16 $\times$ 16	1024	2.24	0.2580	99%	Tokenizer-1024
IBQ	16 $\times$ 16	8192	1.87	0.2437	98%	Tokenizer-8192
IBQ	16 $\times$ 16	16384	1.37	0.2235	96%	Tokenizer-16384
IBQ	16 $\times$ 16	262144	1.00	0.2030	84%	Tokenizer-262144
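
To make the columns above concrete, here is a small sketch of how codebook utilization and the per-image token budget could be computed. It assumes utilization means the fraction of codebook entries selected at least once over an evaluation set; the evaluation script may define it differently, and both function names are hypothetical.

```python
import math

def codebook_utilization(token_indices, codebook_size):
    """Fraction of codebook entries that appear at least once in token_indices."""
    used = {i for i in token_indices if 0 <= i < codebook_size}
    return len(used) / codebook_size

def bits_per_image(tokens_per_side, codebook_size):
    """Bits needed to store one image as a square grid of code indices."""
    return tokens_per_side * tokens_per_side * math.log2(codebook_size)

# Toy index stream over a 4-entry codebook: entries 0, 1 and 3 are used.
print(codebook_utilization([0, 1, 1, 3], codebook_size=4))

# A 16 x 16 grid with a 262144-entry codebook (262144 = 2**18).
print(bits_per_image(16, 262144))
```

Growing the codebook from 1024 to 262144 entries raises the cost per token from 10 to 18 bits while the number of tokens stays at 16 × 16.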

🚀 Evaluation Scripts

bash scripts/evaluation/evaluation_256.sh

Stage II: Training of Auto-Regressive Models

🚀 Training Scripts

See scripts/train_autogressive/run.sh for the available model configurations.

bash scripts/train_autogressive/run.sh MASTER_ADDR MASTER_PORT NODE_RANK

🚀 Sample Scripts

See scripts/train_autogressive/run.sh for the sampling hyperparameters used at each model scale.

bash scripts/evaluation/sample_npu.sh Your_Total_Rank

bash scripts/evaluation/sample_gpu.sh Your_Total_Rank

🍺 Performance and Models

Method	Params	#Tokens	FID	IS	Checkpoint
IBQ	342M	16 $\times$ 16	2.88	254.73	AR_256_B
IBQ	649M	16 $\times$ 16	2.45	267.48	AR_256_L
IBQ	1.1B	16 $\times$ 16	2.14	278.99	AR_256_XL
IBQ	2.1B	16 $\times$ 16	2.05	286.73	AR_256_XXL

Text-conditional Image Generation

Stage I: Training of Visual Tokenizer

Data Preparation

We use CapFusion, LAION-COCO, CC12M, CC3M, LAION-HD, LAION-Aesthetic-umap, LAION-Aesthetic-v2, and JourneyDB for pretraining.

🚀 Training Scripts

bash scripts/train_tokenizer/IBQ/pretrain_256.sh MASTER_ADDR MASTER_PORT NODE_RANK

🚀 Evaluation Scripts

$256\times 256$ Tokenizer Evaluation

bash scripts/evaluation/evaluation_256.sh

🍺 Performance Comparison and Models

Method	Quantizer Type	Training Data	Ratio	Resolution	Codebook Size	Checkpoint	rFID(COCO)	PSNR(COCO)	SSIM(COCO)	rFID(In1k)	PSNR(In1k)	SSIM(In1k)
LlamaGen	VQ	70M	16	256 $\times$ 256	16384	-	8.40	20.28	0.55	2.47	20.65	0.54
Show-o	LFQ	35M	16	256 $\times$ 256	8192	-	9.26	20.90	0.59	3.50	21.34	0.59
Cosmos	FSQ	-	16	256 $\times$ 256	64000	-	11.97	19.22	0.48	4.57	19.93	0.49
Open-MAGVIT2	LFQ	100M	16	256 $\times$ 256	16384	Open-MAGVIT2_Pretrain_256_16384	7.93	22.21	0.62	2.55	22.21	0.62
Open-MAGVIT2	LFQ	100M	16	256 $\times$ 256	262144	Open-MAGVIT2_Pretrain_256_262144	6.76	22.31	0.65	1.67	22.70	0.64
IBQ	IBQ	200M	16	256 $\times$ 256	16384	IBQ_Pretrain_256_16384	7.67	21.58	0.62	2.06	22.01	0.61
IBQ	IBQ	200M	16	256 $\times$ 256	262144	IBQ_Pretrain_256_262144	6.79	22.28	0.65	1.53	22.69	0.64
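
For readers unfamiliar with the Ratio and PSNR columns above, the following sketch shows the standard definitions. It assumes 8-bit pixel values (peak value 255) and flattened pixel lists; the evaluation script may use a different value range, and both helper names are hypothetical.

```python
import math

def token_grid_side(resolution, ratio):
    """Tokens per side for a square image downsampled by the given ratio."""
    return resolution // ratio

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)

# A 256 x 256 image at downsampling ratio 16 yields a 16 x 16 token grid.
print(token_grid_side(256, 16))

# Toy reconstruction where every pixel is off by 10 intensity levels.
print(round(psnr([100, 100, 100, 100], [110, 90, 110, 90]), 2))
```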