September 16, 2025

MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization

arXiv

Mingkai Jia1,2, Wei Yin2*§, Xiaotao Hu1,2, Jiaxin Guo3, Xiaoyang Guo2
Qian Zhang2, Xiao-Xiao Long4, Ping Tan1

HKUST1, Horizon Robotics2, CUHK3, NJU4
* Corresponding Author, § Project Leader

🚀News

  • [September 2025] Achieved SOTA on the TokBench image reconstruction leaderboards: MGVQ beats VAEs (VA-VAE, SD-3.5, SD-XL, and FLUX.1-dev) at multiple resolutions (256p, 512p, and 1024p) on the Text-Accuracy, Text-NED, and Face-Similarity metrics.
  • [August 2025] Achieved SOTA on the Papers with Code leaderboards: Image Reconstruction on ImageNet and UHDBench.
  • [August 2025] Released inference code.
  • [August 2025] Released model zoo.
  • [August 2025] Released UHDBench, our proposed dataset for evaluating ultra-high-definition image reconstruction.
  • [July 2025] Released paper.

🔨TO DO LIST

  • Training code.
  • More demos.
  • Models & Evaluation code.
  • Huggingface models.
  • Release zero-shot reconstruction benchmarks.

🙈 Model Zoo

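| Model | Downsample | Groups | Codebook Size | Training Data | Link |
|---|---|---|---|---|---|
| mgvq-f8c32-g4 | 8 | 4 | 32768 | imagenet | link |
| mgvq-f8c32-g8 | 8 | 8 | 16384 | imagenet | link |
| mgvq-f16c32-g4 | 16 | 4 | 32768 | imagenet | link |
| mgvq-f16c32-g8 | 16 | 8 | 16384 | imagenet | link |
| mgvq-f16c32-g4-mix | 16 | 4 | 32768 | mix | link |
| mgvq-f32c32-g8-mix | 32 | 8 | 16384 | mix | link |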
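
The naming scheme in the table can be unpacked with a little arithmetic. The sketch below is illustrative only: it reads `f` in a model name as the spatial downsampling factor and `g` as the number of groups (which matches the table columns), and it assumes, without confirmation from this README, that each group selects one index from its own codebook of the listed size. `TokenizerSpec`, `parse_name`, and `describe` are hypothetical helpers, not part of the MGVQ code.

```python
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenizerSpec:
    name: str
    downsample: int      # spatial downsampling factor (the "f" in the name)
    groups: int          # number of quantization groups (the "g" in the name)
    codebook_size: int   # codebook entries, as listed in the table


def parse_name(name: str, codebook_size: int) -> TokenizerSpec:
    """Read f and g out of a name such as 'mgvq-f16c32-g4-mix'."""
    match = re.fullmatch(r"mgvq-f(\d+)c\d+-g(\d+)(?:-\w+)?", name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name}")
    return TokenizerSpec(name, int(match.group(1)), int(match.group(2)), codebook_size)


def describe(spec: TokenizerSpec, width: int, height: int) -> dict:
    """Latent grid size and index bits for one image of the given size."""
    grid_w, grid_h = width // spec.downsample, height // spec.downsample
    # Assumption: every group picks one index from its own codebook.
    bits_per_position = spec.groups * math.log2(spec.codebook_size)
    return {
        "grid": (grid_w, grid_h),
        "positions": grid_w * grid_h,
        "indices": grid_w * grid_h * spec.groups,
        "bits_per_position": bits_per_position,
    }


spec = parse_name("mgvq-f16c32-g4", 32768)
info = describe(spec, width=2560, height=1440)
print(info)
```

Under these assumptions, a 2560×1440 image at 16× downsampling maps to a 160×90 latent grid, and four groups with 32768 entries each correspond to 60 bits per grid position.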

🔑 Quick Start

Installation

git clone https://github.com/MKJia/MGVQ.git
cd MGVQ
pip3 install -r requirements.txt

Download models

Download the pretrained models from our model zoo to /path/to/your/ckpt.

Data Preparation

Try our UHDBench dataset on Hugging Face and download it to /path/to/your/dataset.

Evaluation on Reconstruction

Remember to set the ckpt and dataset_root paths, and make sure the checkpoint and dataset are the ones you intend to evaluate.

cd evaluation
bash eval_recon.sh
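
Since the evaluation depends on both paths being set correctly, it can help to check them before launching the script. The following is a minimal sketch and not part of the repository: `check_paths` is a hypothetical helper, and the two placeholder paths stand in for whatever you set as ckpt and dataset_root.

```python
from pathlib import Path


def check_paths(ckpt: str, dataset_root: str) -> list[str]:
    """Return a list of readable problems; an empty list means both paths look usable."""
    problems = []
    if not Path(ckpt).exists():
        problems.append(f"checkpoint not found: {ckpt}")
    root = Path(dataset_root)
    if not root.is_dir():
        problems.append(f"dataset root is not a directory: {dataset_root}")
    elif not any(root.iterdir()):
        problems.append(f"dataset root is empty: {dataset_root}")
    return problems


if __name__ == "__main__":
    # Placeholder paths from this README; replace them with your own.
    for problem in check_paths("/path/to/your/ckpt", "/path/to/your/dataset"):
        print(problem)
```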

Generation Demo & Evaluation

You can download the pretrained GPT model for generation from Hugging Face and test it with our mgvq-f16c32-g4 tokenizer for demo image sampling. Remember to set the gpt_ckpt and vq_ckpt paths.

cd evaluation
bash demo_gen.sh

For evaluation, we also provide on Hugging Face the .npz file that we sampled with sample_c2i_ddp.py.

cd evaluation
python3 evaluator.py /path/to/your/VIRTUAL_imagenet256_labeled.npz /path/to/your/GPT_XXL_300ep_topk_12.npz

🗄️Demos

  • 🔥 Qualitative reconstruction images with 16× downsampling on the 2560×1440 UHDBench dataset.
  • 🔥 Qualitative class-to-image generation on ImageNet. The classes are dog (Golden Retriever and Husky), cliff, and bald eagle.
  • 🔥 Reconstruction evaluation on 256×256 ImageNet benchmark.
  • 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 512×512 datasets.
  • 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 2560×1440 datasets.
  • 🔥 Reconstruction evaluation on TokBench.
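
| Method | Type | Factor | T-ACC(small)↑ | T-ACC(mean)↑ | T-NED(small)↑ | T-NED(mean)↑ | F-Sim(small)↑ | F-Sim(mean)↑ |
|---|---|---|---|---|---|---|---|---|
| FlexTok | Discrete | 1D | 0.55 | 6.95 | 7.80 | 21.09 | 0.06 | 0.15 |
| VQGAN | Discrete | 16 | 0.05 | 1.10 | 4.34 | 8.22 | 0.05 | 0.10 |
| LlamaGen | Discrete | 16 | 0.16 | 4.28 | 5.41 | 14.77 | 0.07 | 0.15 |
| OpenMagvit2 | Discrete | 16 | 0.80 | 10.58 | 9.59 | 27.59 | 0.08 | 0.20 |
| VAR | Discrete | 16 | 1.24 | 15.74 | 10.89 | 34.19 | 0.10 | 0.23 |
| VA-VAE | Continuous | 16 | 6.92 | 37.04 | 25.14 | 56.32 | 0.22 | 0.49 |
| MGVQ | Discrete | 16 | 11.08 | 43.15 | 32.80 | 62.29 | 0.22 | 0.47 |

| Method | Type | Factor | T-ACC(small)↑ | T-ACC(mean)↑ | T-NED(small)↑ | T-NED(mean)↑ | F-Sim(small)↑ | F-Sim(mean)↑ |
|---|---|---|---|---|---|---|---|---|
| LlamaGen | Discrete | 8 | 4.39 | 29.41 | 19.69 | 49.00 | 0.17 | 0.40 |
| OpenMagvit2 | Discrete | 8 | 9.33 | 40.24 | 30.82 | 59.97 | 0.23 | 0.48 |
| SD-3.5 | Continuous | 8 | 36.26 | 67.04 | 59.04 | 80.58 | 0.43 | 0.70 |
| FLUX.1-dev | Continuous | 8 | 50.69 | 75.91 | 70.70 | 86.42 | 0.52 | 0.76 |
| MGVQ | Discrete | 8 | 63.83 | 82.65 | 80.18 | 90.96 | 0.58 | 0.80 |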
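
For readers unfamiliar with the text metrics, the sketch below illustrates the general idea behind a normalized edit distance (NED) score between the text recognized in an original image and in its reconstruction. It is an illustration under assumptions, not TokBench's implementation: the function names are hypothetical, and the benchmark's exact normalization and text recognition pipeline may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def ned_similarity(reference: str, recognized: str) -> float:
    """1 - distance / max length, so identical strings score 1.0 (higher is better)."""
    longest = max(len(reference), len(recognized))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(reference, recognized) / longest


print(ned_similarity("TokBench", "TokBench"))
print(ned_similarity("TokBench", "TokBemch"))
```

A single wrong character in an eight-character word costs one eighth of the score, which is why this kind of metric is more forgiving than exact-match accuracy on slightly degraded text.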

📌 Citation

If the paper and code from MGVQ help your research, we kindly ask you to give a citation to our paper ❤️. Additionally, if you appreciate our work and find this repository useful, giving it a star ⭐️ would be a wonderful way to support our work. Thank you very much.

@article{jia2025mgvq,
  title={MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization},
  author={Jia, Mingkai and Yin, Wei and Hu, Xiaotao and Guo, Jiaxin and Guo, Xiaoyang and Zhang, Qian and Long, Xiao-Xiao and Tan, Ping},
  journal={arXiv preprint arXiv:2507.07997},
  year={2025}
}

License

This repository is released under the MIT License. For further license questions, please contact Mingkai Jia (mjiaab@connect.ust.hk) and Wei Yin (yvanwy@outlook.com).