IBQ.md
October 22, 2025
Scalable Image Tokenization with Index Backpropagation Quantization
Fengyuan Shi*, Zhuoyan Luo*, Yixiao Ge, Yujiu Yang, Ying Shan, Limin Wang
Nanjing University, Tsinghua University, ARC Lab Tencent PCG
This is the official repository for Index Backpropagation Quantization (IBQ), a novel vector quantization (VQ) method that improves the scalability and performance of visual tokenizers.
Highlights
- Scalable Visual Tokenizers: IBQ enables scalable training of visual tokenizers, reaching a codebook size of 262144 with 256-dimensional embeddings while maintaining high codebook utilization.
- Innovative Approach: Conventional VQ methods are prone to codebook collapse because only part of the codebook is updated at each step. IBQ instead applies a straight-through estimator to the categorical distribution over codes, so all codebook embeddings are optimized jointly with the visual encoder and the latent space stays consistent.
- Superior Performance: IBQ achieves competitive results on ImageNet:
  - Reconstruction: 1.00 rFID, outperforming Open-MAGVIT2 (1.17 rFID).
  - Autoregressive Visual Generation: 2.05 gFID, outperforming previous vanilla autoregressive transformers.
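The straight-through idea in the highlights can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: it assumes the logits are dot products between the encoder feature and each code, and the function name `ibq_step` and the hand-derived backward pass are hypothetical, written only to show that every code receives a gradient rather than just the selected one.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ibq_step(z, codebook, grad_zq):
    """Forward and hand-derived backward pass for one feature vector.

    z        : encoder feature, length D
    codebook : K code vectors, each of length D
    grad_zq  : upstream gradient dL/dz_q, length D
    """
    K, D = len(codebook), len(z)

    # Assumption: logits are dot products between the feature and each code.
    logits = [dot(z, c) for c in codebook]
    soft = softmax(logits)
    idx = max(range(K), key=lambda k: logits[k])
    hard = [1.0 if k == idx else 0.0 for k in range(K)]

    # Forward: ind = hard + soft - stop_gradient(soft) equals `hard` in value,
    # so z_q = sum_k ind[k] * codebook[k] is just the selected code.
    z_q = list(codebook[idx])

    # Backward: gradients reach `ind` as if it were `soft`.
    g_soft = [dot(grad_zq, c) for c in codebook]  # dL/d soft[k]
    avg = dot(soft, g_soft)
    g_logits = [p * (g - avg) for p, g in zip(soft, g_soft)]  # softmax Jacobian

    # Each code gets a direct term (selected code only) plus a term through
    # its logit, so codes that were not selected are still updated.
    g_codebook = [
        [hard[k] * grad_zq[d] + g_logits[k] * z[d] for d in range(D)]
        for k in range(K)
    ]
    g_z = [sum(g_logits[k] * codebook[k][d] for k in range(K)) for d in range(D)]
    return idx, z_q, g_codebook, g_z


z = [1.0, 0.5]
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
idx, z_q, g_codebook, g_z = ibq_step(z, codebook, grad_zq=[0.2, -0.1])
print("selected index:", idx)
print("codes receiving gradient:",
      sum(any(g != 0.0 for g in row) for row in g_codebook))
```

In a framework with automatic differentiation, the same effect is usually obtained by writing the index tensor as `hard + soft - soft.detach()` (or the equivalent stop-gradient call) and letting autograd produce these gradients.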
This repository provides the scripts and checkpoints to replicate our results.
Quick Start
Class Conditional Image Generation
Stage I: Training of Visual Tokenizer
Training Scripts
bash scripts/train_tokenizer/IBQ/run_16384.sh MASTER_ADDR MASTER_PORT NODE_RANK
bash scripts/train_tokenizer/IBQ/run_262144.sh MASTER_ADDR MASTER_PORT NODE_RANK
Performance and Models
| Method | #Tokens | Codebook Size | rFID | LPIPS | Codebook Utilization | Checkpoint |
|---|---|---|---|---|---|---|
| IBQ | 16 × 16 | 1024 | 2.24 | 0.2580 | 99% | Tokenizer-1024 |
| IBQ | 16 × 16 | 8192 | 1.87 | 0.2437 | 98% | Tokenizer-8192 |
| IBQ | 16 × 16 | 16384 | 1.37 | 0.2235 | 96% | Tokenizer-16384 |
| IBQ | 16 × 16 | 262144 | 1.00 | 0.2030 | 84% | Tokenizer-262144 |
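For readers unfamiliar with the "Codebook Utilization" column, a common definition is the fraction of codebook entries selected at least once over an evaluation set. The sketch below assumes that definition; the function name `codebook_utilization` is hypothetical, and the repository's evaluation script may compute the metric differently.

```python
def codebook_utilization(token_indices, codebook_size):
    """Fraction of codes that appear at least once in `token_indices`."""
    used = set()
    for idx in token_indices:
        if not 0 <= idx < codebook_size:
            raise ValueError(f"index {idx} is outside the codebook")
        used.add(idx)
    return len(used) / codebook_size


# Toy example: 3 of the 4 codes are used, so utilization is 0.75.
print(codebook_utilization([0, 1, 1, 3], codebook_size=4))
```

In practice the indices would be collected from the tokenizer over the whole validation set before computing the ratio.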
Evaluation Scripts
bash scripts/evaluation/evaluation_256.sh
Stage II: Training of Auto-Regressive Models
Training Scripts
See scripts/train_autogressive/run.sh for the available model configurations.
bash scripts/train_autogressive/run.sh MASTER_ADDR MASTER_PORT NODE_RANK
Sample Scripts
See scripts/train_autogressive/run.sh for the sampling hyper-parameters used at each model scale. Run the script that matches your hardware (NPU or GPU):
bash scripts/evaluation/sample_npu.sh Your_Total_Rank
bash scripts/evaluation/sample_gpu.sh Your_Total_Rank
Performance and Models
| Method | Params | #Tokens | FID | IS | Checkpoint |
|---|---|---|---|---|---|
| IBQ | 342M | 16 × 16 | 2.88 | 254.73 | AR_256_B |
| IBQ | 649M | 16 × 16 | 2.45 | 267.48 | AR_256_L |
| IBQ | 1.1B | 16 × 16 | 2.14 | 278.99 | AR_256_XL |
| IBQ | 2.1B | 16 × 16 | 2.05 | 286.73 | AR_256_XXL |
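As a reminder of what the FID column measures, the standard Fréchet Inception Distance compares the mean and covariance of Inception features of real images, $(\mu_r, \Sigma_r)$, with those of generated images, $(\mu_g, \Sigma_g)$. This is the textbook definition; the exact feature extractor and reference statistics used by the evaluation scripts are not specified here.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower is better. rFID applies the same distance to reconstructed images, and gFID applies it to generated samples.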
Text-conditional Image Generation
Stage I: Training of Visual Tokenizer
Data Preparation
We use CapFusion, LAION-COCO, CC12M, CC3M, LAION-HD, LAION-Aesthetic-umap, LAION-Aesthetic-v2, and JourneyDB for pretraining.
Training Scripts
bash scripts/train_tokenizer/IBQ/pretrain_256.sh MASTER_ADDR MASTER_PORT NODE_RANK
Evaluation Scripts
bash scripts/evaluation/evaluation_256.sh
Performance Comparison and Models
| Method | Quantizer Type | Training Data | Ratio | Resolution | Codebook Size | Checkpoint | rFID(COCO) | PSNR(COCO) | SSIM(COCO) | rFID(In1k) | PSNR(In1k) | SSIM(In1k) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LlamaGen | VQ | 70M | 16 | 256 × 256 | 16384 | - | 8.40 | 20.28 | 0.55 | 2.47 | 20.65 | 0.54 |
| Show-o | LFQ | 35M | 16 | 256 × 256 | 8192 | - | 9.26 | 20.90 | 0.59 | 3.50 | 21.34 | 0.59 |
| Cosmos | FSQ | - | 16 | 256 × 256 | 64000 | - | 11.97 | 19.22 | 0.48 | 4.57 | 19.93 | 0.49 |
| Open-MAGVIT2 | LFQ | 100M | 16 | 256 × 256 | 16384 | Open-MAGVIT2_Pretrain_256_16384 | 7.93 | 22.21 | 0.62 | 2.55 | 22.21 | 0.62 |
| Open-MAGVIT2 | LFQ | 100M | 16 | 256 × 256 | 262144 | Open-MAGVIT2_Pretrain_256_262144 | 6.76 | 22.31 | 0.65 | 1.67 | 22.70 | 0.64 |
| IBQ | IBQ | 200M | 16 | 256 × 256 | 16384 | IBQ_Pretrain_256_16384 | 7.67 | 21.58 | 0.62 | 2.06 | 22.01 | 0.61 |
| IBQ | IBQ | 200M | 16 | 256 × 256 | 262144 | IBQ_Pretrain_256_262144 | 6.79 | 22.28 | 0.65 | 1.53 | 22.69 | 0.64 |
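The "Ratio", "Resolution", and "Codebook Size" columns together determine how many tokens represent an image and how many bits each token carries. The sketch below works through that arithmetic, assuming "Ratio" is the spatial downsampling factor per side (which matches the 16 × 16 token grids reported for 256 × 256 images). The helper names `token_grid` and `bits_per_token` are hypothetical.

```python
import math


def token_grid(resolution, ratio):
    """Token grid (height, width) for a square image of side `resolution`."""
    if resolution % ratio != 0:
        raise ValueError("resolution must be divisible by the downsampling ratio")
    side = resolution // ratio
    return side, side


def bits_per_token(codebook_size):
    """Bits needed to index one entry of the codebook."""
    return math.log2(codebook_size)


height, width = token_grid(resolution=256, ratio=16)
num_tokens = height * width
for size in (16384, 262144):
    bits = bits_per_token(size)
    print(f"{height} x {width} tokens, codebook {size}: "
          f"{bits:.0f} bits per token, {num_tokens * bits:.0f} bits per image")
```

Under these assumptions, moving from a 16384-entry to a 262144-entry codebook raises the capacity of each token from 14 to 18 bits without changing the number of tokens.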