README.md

September 16, 2025

MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization

Mingkai Jia¹,², Wei Yin²*§, Xiaotao Hu¹,², Jiaxin Guo³, Xiaoyang Guo²,
Qian Zhang², Xiao-Xiao Long⁴, Ping Tan¹

HKUST¹, Horizon Robotics², CUHK³, NJU⁴
*: Corresponding Author, §: Project Leader

🚀 News

[September 2025] Achieved SOTA on the TokBench image reconstruction leaderboard: MGVQ beats VAEs (VA-VAE, SD-3.5, SD-XL, and FLUX.1-dev) at multiple resolutions (256p, 512p, and 1024p) on the Text-Accuracy, Text-NED, and Face-Similarity metrics.
[August 2025] Achieved SOTA on the Papers with Code leaderboards for image reconstruction on ImageNet and UHDBench.
[August 2025] Released the inference code.
[August 2025] Released the model zoo.
[August 2025] Released UHDBench, our proposed dataset for evaluating ultra-high-definition image reconstruction.
[July 2025] Released the paper.

🙈 Model Zoo

Model	Downsample	Groups	Codebook Size	Training Data	Link
mgvq-f8c32-g4	8	4	32768	imagenet	link
mgvq-f8c32-g8	8	8	16384	imagenet	link
mgvq-f16c32-g4	16	4	32768	imagenet	link
mgvq-f16c32-g8	16	8	16384	imagenet	link
mgvq-f16c32-g4-mix	16	4	32768	mix	link
mgvq-f32c32-g8-mix	32	8	16384	mix	link
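
The model names appear to mirror the table columns: the number after `f` matches the Downsample column and the number after `g` matches the Groups column. The sketch below parses a name and estimates the latent grid for a given input resolution. It is only an illustration: the helper names (`parse_model_name`, `latent_grid`, `indices_per_image`) are hypothetical and not part of the MGVQ code base, the meaning of the `c32` field is not stated in this README and is left uninterpreted, and the index count assumes one codebook index per group at each latent position.

```python
import re
from typing import NamedTuple, Tuple


class TokenizerSpec(NamedTuple):
    downsample: int  # spatial downsampling factor (the "f" field)
    c_field: int     # the "c" field; its meaning is not stated in the README
    groups: int      # number of quantization groups (the "g" field)


def parse_model_name(name: str) -> TokenizerSpec:
    """Parse names such as 'mgvq-f16c32-g4' or 'mgvq-f16c32-g4-mix'."""
    match = re.fullmatch(r"mgvq-f(\d+)c(\d+)-g(\d+)(?:-mix)?", name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    f, c, g = (int(x) for x in match.groups())
    return TokenizerSpec(downsample=f, c_field=c, groups=g)


def latent_grid(spec: TokenizerSpec, height: int, width: int) -> Tuple[int, int]:
    """Latent (rows, cols) for an image whose sides are divisible by the factor."""
    if height % spec.downsample or width % spec.downsample:
        raise ValueError("image sides must be divisible by the downsample factor")
    return height // spec.downsample, width // spec.downsample


def indices_per_image(spec: TokenizerSpec, height: int, width: int) -> int:
    """Assumed count: one codebook index per group per latent position."""
    rows, cols = latent_grid(spec, height, width)
    return rows * cols * spec.groups
```

Under these assumptions, `mgvq-f16c32-g4` maps a 256×256 image to a 16×16 latent grid, and a 2560×1440 UHDBench image to a grid of 90 rows by 160 columns.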

🔑 Quick Start

Installation

git clone https://github.com/MKJia/MGVQ.git
cd MGVQ
pip3 install -r requirements.txt

Download models

Download the pretrained models from our model zoo to /path/to/your/ckpt.

Data Preparation

Our UHDBench dataset is available on Hugging Face; download it to /path/to/your/dataset.

Evaluation on Reconstruction

Remember to update the ckpt and dataset_root paths, and make sure the checkpoint you evaluate matches the intended model and dataset.

cd evaluation
bash eval_recon.sh

Generation Demo & Evaluation

You can download the pretrained GPT model for generation from Hugging Face and test it with our mgvq-f16c32-g4 tokenizer for demo image sampling. Remember to update the gpt_ckpt and vq_ckpt paths.

cd evaluation
bash demo_gen.sh

We also provide an .npz file on Hugging Face, sampled with sample_c2i_ddp.py, for evaluation.

cd evaluation
python3 evaluator.py /path/to/your/VIRTUAL_imagenet256_labeled.npz /path/to/your/GPT_XXL_300ep_topk_12.npz
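
Before running the evaluator, it can help to confirm that both .npz files load and contain arrays of the expected shape. The following is a minimal sketch that assumes NumPy is installed; `summarize_npz` is a hypothetical helper and not part of this repository, and it makes no assumption about which array names evaluator.py expects.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    summary = {}
    # np.load on an .npz returns a lazy archive; the context manager closes it.
    with np.load(path) as data:
        for key in data.files:
            arr = data[key]
            summary[key] = (arr.shape, str(arr.dtype))
    return summary
```

For example, `print(summarize_npz("/path/to/your/GPT_XXL_300ep_topk_12.npz"))` lists each stored array with its shape and dtype.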

🗄️ Demos

🔥 Qualitative reconstruction images with 16× downsampling on the 2560×1440 UHDBench dataset.

🔥 Qualitative class-to-image generation on ImageNet. The classes are dog (Golden Retriever and Husky), cliff, and bald eagle.

🔥 Reconstruction evaluation on 256×256 ImageNet benchmark.

🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 512×512 datasets.

🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 2560×1440 datasets.

🔥 Reconstruction evaluation on TokBench.

Method	Type	Factor	T-ACC(small)↑	T-ACC(mean)↑	T-NED(small)↑	T-NED(mean)↑	F-Sim(small)↑	F-Sim(mean)↑
FlexTok	Discrete	1D	0.55	6.95	7.80	21.09	0.06	0.15
VQGAN	Discrete	16	0.05	1.10	4.34	8.22	0.05	0.10
LlamaGen	Discrete	16	0.16	4.28	5.41	14.77	0.07	0.15
OpenMagvit2	Discrete	16	0.80	10.58	9.59	27.59	0.08	0.20
VAR	Discrete	16	1.24	15.74	10.89	34.19	0.10	0.23
VA-VAE	Continuous	16	6.92	37.04	25.14	56.32	0.22	0.49
MGVQ	Discrete	16	11.08	43.15	32.80	62.29	0.22	0.47

Method	Type	Factor	T-ACC(small)↑	T-ACC(mean)↑	T-NED(small)↑	T-NED(mean)↑	F-Sim(small)↑	F-Sim(mean)↑
LlamaGen	Discrete	8	4.39	29.41	19.69	49.00	0.17	0.40
OpenMagvit2	Discrete	8	9.33	40.24	30.82	59.97	0.23	0.48
SD-3.5	Continuous	8	36.26	67.04	59.04	80.58	0.43	0.70
FLUX.1-dev	Continuous	8	50.69	75.91	70.70	86.42	0.52	0.76
MGVQ	Discrete	8	63.83	82.65	80.18	90.96	0.58	0.80
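
For readers unfamiliar with the column names, T-NED is read here as a text similarity derived from normalized edit distance, and F-Sim as a similarity between face embeddings. The sketch below shows one common formulation of each (one minus the Levenshtein distance divided by the longer string length, and cosine similarity). This is an assumption for illustration only, not TokBench's evaluation code, and all function names are hypothetical.

```python
import math


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ned_similarity(pred, target):
    """1 - normalized edit distance; 1.0 means the strings are identical."""
    longest = max(len(pred), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, target) / longest


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

In this reading, the text metrics compare text recognized in the reconstruction against text recognized in the original image, and the face metric compares embeddings of the same face before and after reconstruction; higher is better for both, matching the ↑ in the headers.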

📌 Citation

If the MGVQ paper and code help your research, please cite our paper ❤️. If you find this repository useful, giving it a star ⭐️ is a wonderful way to support our work. Thank you very much.

@article{jia2025mgvq,
  title={MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization},
  author={Jia, Mingkai and Yin, Wei and Hu, Xiaotao and Guo, Jiaxin and Guo, Xiaoyang and Zhang, Qian and Long, Xiao-Xiao and Tan, Ping},
  journal={arXiv preprint arXiv:2507.07997},
  year={2025}
}

License

This repository is released under the MIT License. For further license questions, please contact Mingkai Jia (mjiaab@connect.ust.hk) and Wei Yin (yvanwy@outlook.com).
