June 21, 2026

🔍 A Comprehensive Study on Visual Token Redundancy for
Discrete Diffusion-based Multimodal Large Language Models

Duo Li^* Zuhao Yang^* Xiaoqin Zhang Ling Shao Shijian Lu^†

^*Equal contribution ^†Corresponding author

✨ Highlights

🎉 Our paper has been accepted to CVPR 2026 Findings.

🧭 Navigation

Overview · Environment · Methods · Evaluation · Compression Code · Citation · License

🔭 Overview

We conduct a systematic study of whether visual token redundancy exists in prevalent discrete diffusion-based multimodal large language models (dMLLMs), how visual token pruning affects their inference accuracy and efficiency, and how these observations can guide effective pruning strategies. Our analysis covers both from-scratch and autoregressive-to-diffusion (AR-to-diffusion) dMLLMs across different architectures, tasks, retention ratios, and pruning schedules.

Figure: Overview of visual token redundancy and pruning behavior in dMLLMs.

🛠️ Environment Setup

Both models use the same environments as their official repositories, LaViDa and LLaDA-V. Follow each repository's official README to install the dependencies, download the model checkpoints, and configure Hugging Face access. No additional project-specific dependencies are required.

🧩 Compression Methods

The eval directories of both LaViDa and LLaDA-V contain implementations of the following six visual token compression methods:

| Method | Implementation directory | Category |
| --- | --- | --- |
| DivPrune | `llava_divprune` | Diversity-aware pruning |
| FastV | `llava_fastv` | Attention-based pruning |
| SparseVLM | `llava_sparsevlm` | Text-guided pruning |
| ToMe | `llava_tome` | Token merging |
| TRIM | `llava_trim` | Text-relevant token reduction |
| VTW | `llava_vtw` | Visual token withdrawal |

The evaluation framework imports the implementation from a directory named llava. Before evaluating a compression method, temporarily rename its llava_<method> directory to llava.

Important

Only one implementation can be active at a time. Back up or rename the existing llava directory before switching methods. Do not overwrite it.

For example, to evaluate DivPrune with LaViDa:

cd LaViDa/eval
mv llava llava_base
mv llava_divprune llava

bash run_dream.sh

# Restore the directory names after evaluation.
mv llava llava_divprune
mv llava_base llava

The procedure is the same for LLaDA-V:

cd LLaDA-V/eval
mv llava llava_base
mv llava_divprune llava

bash scripts/evaluate.sh

# Restore the directory names after evaluation.
mv llava llava_divprune
mv llava_base llava

Replace llava_divprune with the directory name of any other method to evaluate that implementation.
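If a run fails partway through, the directories stay renamed until they are restored by hand. The swap can be wrapped so that the names are put back whether the command succeeds or fails. The following is a minimal sketch and not part of the repository: the function name `run_with_method` is hypothetical, and it assumes it is called from the `eval` directory with both `llava` and the chosen `llava_<method>` directory present.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): activate one compression
# implementation, run a command, and restore the directory names afterwards.
run_with_method() {
    local method_dir="$1"; shift
    # Refuse to proceed if a directory is missing or a backup already exists,
    # so the original llava directory is never overwritten.
    [ -d llava ] || { echo "llava not found" >&2; return 1; }
    [ -d "$method_dir" ] || { echo "$method_dir not found" >&2; return 1; }
    [ ! -e llava_base ] || { echo "llava_base already exists" >&2; return 1; }

    (
        # The subshell scopes the EXIT traps to this one run.
        mv llava llava_base || exit 1
        trap 'mv llava_base llava' EXIT
        mv "$method_dir" llava || exit 1
        trap 'mv llava "$method_dir" && mv llava_base llava' EXIT
        "$@"
    )
}

# Usage, from LaViDa/eval:
#   run_with_method llava_divprune bash run_dream.sh
# Usage, from LLaDA-V/eval:
#   run_with_method llava_divprune bash scripts/evaluate.sh
```

The function returns the exit status of the wrapped command, so it can be used in a loop over several `llava_<method>` directories that stops at the first failure.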

🚀 Evaluation

🌙 LaViDa

The LaViDa evaluation entry point is:

LaViDa/eval/run_dream.sh

After selecting a compression implementation, run:

cd LaViDa/eval
bash run_dream.sh

The tasks, GPUs, process count, and output directory can be configured through environment variables:

CUDA_VISIBLE_DEVICES=0,1 \
NUM_PROCESSES=2 \
TASK_NAMES=mme,chartqa \
OUTPUT_PATH=exp/lavida_eval \
bash run_dream.sh

The default model is jacklishufan/lavida-dream-v1.0-instruct, and the default task is mme.
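Configuration through environment variables of this kind is commonly implemented with shell default expansion, which is why prefixing the command with `VAR=value` overrides a setting. The sketch below is illustrative only and is not the actual content of `run_dream.sh`: the function name `resolve_eval_config` is hypothetical, the variable names and the `mme` default come from the text above, and the fallback values for `NUM_PROCESSES` and `OUTPUT_PATH` are assumptions.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; not the actual contents of run_dream.sh.
# ${VAR:-default} uses the caller's value when VAR is set and non-empty,
# and falls back to the default otherwise.
resolve_eval_config() {
    TASK_NAMES="${TASK_NAMES:-mme}"                 # documented default task
    NUM_PROCESSES="${NUM_PROCESSES:-1}"             # assumed fallback
    OUTPUT_PATH="${OUTPUT_PATH:-exp/lavida_eval}"   # assumed fallback
    echo "tasks=${TASK_NAMES} processes=${NUM_PROCESSES} output=${OUTPUT_PATH}"
}

# Usage:
#   TASK_NAMES=mme,chartqa NUM_PROCESSES=2 resolve_eval_config
```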

🌌 LLaDA-V

The LLaDA-V evaluation entry point is:

LLaDA-V/eval/scripts/evaluate.sh

After selecting a compression implementation, run:

cd LLaDA-V/eval
bash scripts/evaluate.sh

The tasks, GPUs, and output directory can be configured through environment variables:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
TASK_NAMES=mme,chartqa \
OUTPUT_PATH=exp/llada_v_eval \
bash scripts/evaluate.sh

The default model is GSAI-ML/LLaDA-V, and the default task is mme.

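The current script sets accelerate --num_processes to 8, which matches the eight devices listed in the example above. To use a different number of GPUs, update this argument in the script so that it equals the number of devices in CUDA_VISIBLE_DEVICES.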
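When changing the GPU count, it can help to derive the process count from the device list rather than count by hand. A small sketch follows; the function name `count_visible_gpus` is hypothetical, and how the value is wired into `scripts/evaluate.sh` is left open because the script's contents are not reproduced here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count the comma-separated device IDs in a
# CUDA_VISIBLE_DEVICES-style string (for example, "0,1,2,3" has four entries).
count_visible_gpus() {
    # Use the first argument if given, otherwise the environment variable.
    local devices="${1:-${CUDA_VISIBLE_DEVICES:-}}"
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    local IFS=','
    # The unquoted expansion splits on commas because IFS is set locally.
    set -- $devices
    echo "$#"
}

# Usage:
#   CUDA_VISIBLE_DEVICES=0,1,2,3 count_visible_gpus
#   count_visible_gpus 0,1
```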

🧠 Compression Code

The following paths use llava_divprune as an example. Implementations of the other methods are located in their corresponding llava_<method> directories.

🌙 LaViDa

The main modifications are located in:

LaViDa/eval/llava_divprune/model/llava_arch.py
LaViDa/eval/llava_divprune/model/language_model/dream/generation_utils.py
LaViDa/eval/llava_divprune/model/language_model/dream/modeling_dream.py

To enable compression, set the following flag to True:

START_COMPRESSION_MODE = True

The flag is currently defined in:

LaViDa/eval/llava_divprune/model/llava_arch.py
LaViDa/eval/llava_divprune/model/language_model/dream/generation_utils.py

🌌 LLaDA-V

The main modifications are located in:

LLaDA-V/eval/llava_divprune/model/llava_arch.py
LLaDA-V/eval/llava_divprune/model/language_model/modeling_llada.py

To enable compression, set the flag in both files to:

START_COMPRESSION_MODE = True

To evaluate the uncompressed baseline, set the relevant flags to False or use the original llava implementation.
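Because the flag is defined in more than one file, the copies can drift out of sync. The sketch below sets the flag in several files at once and prints the resulting lines for inspection. It is not part of the repository: the function name `set_compression_mode` is hypothetical, and it assumes each file contains the flag as an unindented `START_COMPRESSION_MODE = True` or `START_COMPRESSION_MODE = False` assignment. Anything after the `=` on that line, including a trailing comment, is replaced.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): set START_COMPRESSION_MODE
# to True or False in every file given. Assumes the flag appears as an
# unindented "START_COMPRESSION_MODE = <True|False>" assignment.
set_compression_mode() {
    local value="$1"; shift
    case "$value" in
        True|False) ;;
        *) echo "value must be True or False" >&2; return 1 ;;
    esac
    local file tmp
    for file in "$@"; do
        tmp="$(mktemp)" || return 1
        # Write to a temporary file instead of using sed -i, whose syntax
        # differs between GNU and BSD sed.
        sed -E "s/^START_COMPRESSION_MODE[[:space:]]*=.*/START_COMPRESSION_MODE = ${value}/" \
            "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
        # Show the resulting line so mismatches are easy to spot.
        grep -H '^START_COMPRESSION_MODE' "$file"
    done
}

# Usage, from LLaDA-V/eval:
#   set_compression_mode True \
#       llava_divprune/model/llava_arch.py \
#       llava_divprune/model/language_model/modeling_llada.py
```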

🙏 Acknowledgements

This project is built upon the excellent open-source implementations of LaViDa, LLaDA-V, and lmms-eval. We also thank the authors of the six visual token compression methods evaluated in this study.

📝 Citation

If you find this project helpful, please consider citing our paper:

@InProceedings{Li_2026_CVPR,
    author    = {Li, Duo and Yang, Zuhao and Zhang, Xiaoqin and Shao, Ling and Lu, Shijian},
    title     = {A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
    month     = {June},
    year      = {2026},
    pages     = {2823--2833}
}

📜 License

This project is released under the terms of the repository-level LICENSE file. Code derived from LaViDa, LLaDA-V, and the compression methods remains subject to the respective original licenses.
