June 21, 2026
A Comprehensive Study on Visual Token Redundancy for
Discrete Diffusion-based Multimodal Large Language Models
Duo Li*, Zuhao Yang*, Xiaoqin Zhang, Ling Shao, Shijian Lu†
*Equal contribution, †Corresponding author
Highlights
- Our paper has been accepted to CVPR 2026 Findings.
Navigation
Overview · Environment · Methods · Evaluation · Compression Code · Citation · License
Overview
We conduct a systematic study of whether visual token redundancy exists in prevalent discrete diffusion-based multimodal large language models (dMLLMs), how visual token pruning affects inference accuracy and efficiency, and how these observations can guide effective pruning strategies. Our analysis covers both from-scratch and AR-to-diffusion dMLLMs across different architectures, tasks, retention ratios, and pruning schedules.
Environment Setup
The environments for both models (LaViDa and LLaDA-V) are identical to those used by their official repositories. Follow the official README file of each repository to install the dependencies, download the model checkpoints, and configure Hugging Face access. No additional project-specific dependencies are required.
Compression Methods
The eval directories of both LaViDa and LLaDA-V contain implementations of
the following six visual token compression methods:
| Method | Implementation directory | Category |
|---|---|---|
| DivPrune | llava_divprune | Diversity-aware pruning |
| FastV | llava_fastv | Attention-based pruning |
| SparseVLM | llava_sparsevlm | Text-guided pruning |
| ToMe | llava_tome | Token merging |
| TRIM | llava_trim | Text-relevant token reduction |
| VTW | llava_vtw | Visual token withdrawal |
The evaluation framework imports the implementation from a directory named
llava. Before evaluating a compression method, temporarily rename its
llava_<method> directory to llava.
Important: Only one implementation can be active at a time. Back up or
rename the existing llava directory before switching methods. Do not
overwrite it.
For example, to evaluate DivPrune with LaViDa:
cd LaViDa/eval
mv llava llava_base
mv llava_divprune llava
bash run_dream.sh
# Restore the directory names after evaluation.
mv llava llava_divprune
mv llava_base llava
The procedure is the same for LLaDA-V:
cd LLaDA-V/eval
mv llava llava_base
mv llava_divprune llava
bash scripts/evaluate.sh
# Restore the directory names after evaluation.
mv llava llava_divprune
mv llava_base llava
Replace llava_divprune with the directory name of any other method to
evaluate that implementation.
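The rename, evaluate, and restore steps above can also be wrapped in a small helper so that the directory names are restored even if the evaluation fails. The following is a minimal sketch, not part of the repository: the function name `active_method` and the backup name `llava_base` are illustrative choices that mirror the commands above.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def active_method(eval_dir, method, backup_name="llava_base"):
    """Temporarily make llava_<method> the active `llava` directory.

    Hypothetical helper: it mirrors the manual `mv` commands and is not
    shipped with the repository.
    """
    eval_dir = Path(eval_dir)
    active = eval_dir / "llava"
    backup = eval_dir / backup_name
    source = eval_dir / f"llava_{method}"

    # Refuse to start from an ambiguous state instead of overwriting anything.
    if not source.is_dir():
        raise FileNotFoundError(f"missing implementation: {source}")
    if not active.is_dir():
        raise FileNotFoundError(f"missing active directory: {active}")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")

    active.rename(backup)              # mv llava llava_base
    try:
        source.rename(active)          # mv llava_<method> llava
        try:
            yield active
        finally:
            active.rename(source)      # mv llava llava_<method>
    finally:
        backup.rename(active)          # mv llava_base llava


# Example usage (assumes the repository layout described above):
#
#   import subprocess
#   with active_method("LaViDa/eval", "divprune"):
#       subprocess.run(["bash", "run_dream.sh"], cwd="LaViDa/eval", check=True)
```

Because the restore steps run in `finally` blocks, an interrupted or failed evaluation still leaves `llava` and `llava_<method>` under their original names.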
Evaluation
LaViDa
The LaViDa evaluation entry point is:
LaViDa/eval/run_dream.sh
After selecting a compression implementation, run:
cd LaViDa/eval
bash run_dream.sh
The tasks, GPUs, process count, and output directory can be configured through environment variables:
CUDA_VISIBLE_DEVICES=0,1 \
NUM_PROCESSES=2 \
TASK_NAMES=mme,chartqa \
OUTPUT_PATH=exp/lavida_eval \
bash run_dream.sh
The default model is jacklishufan/lavida-dream-v1.0-instruct, and the
default task is mme.
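When launching several runs from Python, the same overrides can be assembled programmatically. This is a minimal sketch: the environment variable names are the ones listed above, while `build_eval_env` and `run_lavida_eval` are hypothetical helpers that do not exist in the repository.

```python
import os
import subprocess


def build_eval_env(gpus, tasks, output_path, num_processes=None, base_env=None):
    """Return a copy of the environment with the evaluation overrides applied."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpus)
    env["TASK_NAMES"] = ",".join(tasks)
    env["OUTPUT_PATH"] = str(output_path)
    if num_processes is not None:
        env["NUM_PROCESSES"] = str(num_processes)
    return env


def run_lavida_eval(eval_dir="LaViDa/eval", **overrides):
    """Run run_dream.sh from eval_dir (assumed path) with the given overrides."""
    env = build_eval_env(**overrides)
    return subprocess.run(["bash", "run_dream.sh"], cwd=eval_dir, env=env, check=True)
```

For example, `run_lavida_eval(gpus=[0, 1], tasks=["mme", "chartqa"], output_path="exp/lavida_eval", num_processes=2)` corresponds to the shell invocation shown above.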
LLaDA-V
The LLaDA-V evaluation entry point is:
LLaDA-V/eval/scripts/evaluate.sh
After selecting a compression implementation, run:
cd LLaDA-V/eval
bash scripts/evaluate.sh
The tasks, GPUs, and output directory can be configured through environment variables:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
TASK_NAMES=mme,chartqa \
OUTPUT_PATH=exp/llada_v_eval \
bash scripts/evaluate.sh
The default model is GSAI-ML/LLaDA-V, and the default task is mme.
The current script sets accelerate --num_processes to 8. To use a
different number of GPUs, update this argument in the script accordingly.
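Since the process count is hard-coded in the script while the GPU list comes from the environment, a quick consistency check before launching can catch a mismatch. The sketch below assumes the script passes a literal integer as `--num_processes 8` or `--num_processes=8`; if the script uses a shell variable instead, `find_num_processes` returns `None`. Both function names are hypothetical.

```python
import re


def count_visible_gpus(cuda_visible_devices):
    """Count the device IDs in a CUDA_VISIBLE_DEVICES-style string."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])


def find_num_processes(script_text):
    """Return the first literal --num_processes value in the script, or None."""
    match = re.search(r"--num_processes[=\s]+(\d+)", script_text)
    return int(match.group(1)) if match else None


# Example usage (the script path is taken from the section above):
#
#   from pathlib import Path
#   text = Path("LLaDA-V/eval/scripts/evaluate.sh").read_text()
#   if find_num_processes(text) != count_visible_gpus("0,1,2,3"):
#       print("Update --num_processes in evaluate.sh to match the GPU list.")
```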
Compression Code
The following paths use llava_divprune as an example. Implementations of
the other methods are located in their corresponding llava_<method>
directories.
LaViDa
The main modifications are located in:
LaViDa/eval/llava_divprune/model/llava_arch.py
LaViDa/eval/llava_divprune/model/language_model/dream/generation_utils.py
LaViDa/eval/llava_divprune/model/language_model/dream/modeling_dream.py
To enable compression, set the following flag to True:
START_COMPRESSION_MODE = True
The flag is currently defined in:
LaViDa/eval/llava_divprune/model/llava_arch.py
LaViDa/eval/llava_divprune/model/language_model/dream/generation_utils.py
LLaDA-V
The main modifications are located in:
LLaDA-V/eval/llava_divprune/model/llava_arch.py
LLaDA-V/eval/llava_divprune/model/language_model/modeling_llada.py
To enable compression, set the flag in both files to:
START_COMPRESSION_MODE = True
To evaluate the uncompressed baseline, set the relevant flags to False or
use the original llava implementation.
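Because the flag is defined in more than one file, it is easy to leave the copies out of sync. The following sketch reads and rewrites the flag across a list of files. It assumes each file assigns the flag on its own line as `START_COMPRESSION_MODE = True` or `START_COMPRESSION_MODE = False`; the helpers `read_flag` and `set_flag` are hypothetical and not part of the repository.

```python
import re
from pathlib import Path

# Matches a line such as "START_COMPRESSION_MODE = True" (assumed format).
FLAG_PATTERN = re.compile(
    r"^([ \t]*START_COMPRESSION_MODE[ \t]*=[ \t]*)(True|False)\b",
    re.MULTILINE,
)


def read_flag(path):
    """Return the first flag value in the file as a bool, or None if absent."""
    match = FLAG_PATTERN.search(Path(path).read_text())
    return None if match is None else match.group(2) == "True"


def set_flag(path, enabled):
    """Rewrite every matching assignment in the file; return how many changed."""
    path = Path(path)
    new_text, count = FLAG_PATTERN.subn(
        lambda m: m.group(1) + str(bool(enabled)), path.read_text()
    )
    if count:
        path.write_text(new_text)
    return count


# Example usage with the LLaDA-V DivPrune paths listed above:
#
#   files = [
#       "LLaDA-V/eval/llava_divprune/model/llava_arch.py",
#       "LLaDA-V/eval/llava_divprune/model/language_model/modeling_llada.py",
#   ]
#   for f in files:
#       set_flag(f, True)
#   assert all(read_flag(f) for f in files)
```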
Acknowledgements
This project is built upon the excellent open-source implementations of LaViDa, LLaDA-V, and lmms-eval. We also thank the authors of the six visual token compression methods evaluated in this study.
Citation
If you find this project helpful, please consider citing our paper:
@InProceedings{Li_2026_CVPR,
author = {Li, Duo and Yang, Zuhao and Zhang, Xiaoqin and Shao, Ling and Lu, Shijian},
title = {A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
month = {June},
year = {2026},
pages = {2823--2833}
}
License
This project uses the repository-level LICENSE. Code derived from LaViDa, LLaDA-V, and the compression methods remains subject to the respective original licenses.