
June 21, 2026

🔍 A Comprehensive Study on Visual Token Redundancy for
Discrete Diffusion-based Multimodal Large Language Models

Duo Li*   Zuhao Yang*   Xiaoqin Zhang   Ling Shao   Shijian Lu†

*Equal contribution    †Corresponding author

Paper Code


✨ Highlights

  • 🎉 Our paper has been accepted to CVPR 2026 Findings.

Overview · Environment · Methods · Evaluation · Compression Code · Citation · License

🔭 Overview

We conduct a systematic study of whether visual token redundancy exists in prevalent discrete diffusion-based multimodal large language models (dMLLMs), how visual token pruning affects inference accuracy and efficiency, and how these observations can guide effective pruning strategies. Our analysis covers both from-scratch and autoregressive-to-diffusion (AR-to-diffusion) dMLLMs across different architectures, tasks, retention ratios, and pruning schedules.

Figure: Overview of visual token redundancy and pruning behavior in dMLLMs.

🛠️ Environment Setup

Both models use the same environments as their official repositories, LaViDa and LLaDA-V. Follow each repository's official README to install the dependencies, download the model checkpoints, and configure Hugging Face access. No additional project-specific dependencies are required.

🧩 Compression Methods

The eval directories of both LaViDa and LLaDA-V contain implementations of the following six visual token compression methods:

| Method | Implementation directory | Category |
| --- | --- | --- |
| DivPrune | llava_divprune | Diversity-aware pruning |
| FastV | llava_fastv | Attention-based pruning |
| SparseVLM | llava_sparsevlm | Text-guided pruning |
| ToMe | llava_tome | Token merging |
| TRIM | llava_trim | Text-relevant token reduction |
| VTW | llava_vtw | Visual token withdrawal |

The evaluation framework imports the implementation from a directory named llava. Before evaluating a compression method, temporarily rename its llava_<method> directory to llava.

Important

Only one implementation can be active at a time. Back up or rename the existing llava directory before switching methods. Do not overwrite it.

For example, to evaluate DivPrune with LaViDa:

cd LaViDa/eval
mv llava llava_base
mv llava_divprune llava

bash run_dream.sh

# Restore the directory names after evaluation.
mv llava llava_divprune
mv llava_base llava

The procedure is the same for LLaDA-V:

cd LLaDA-V/eval
mv llava llava_base
mv llava_divprune llava

bash scripts/evaluate.sh

# Restore the directory names after evaluation.
mv llava llava_divprune
mv llava_base llava

Replace llava_divprune with the directory name of any other method to evaluate that implementation.
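
The rename, run, and restore steps above can be wrapped in a small shell function so that the original directory names come back even when the evaluation command fails. This is a minimal sketch rather than part of the repository: run_with_method is a hypothetical helper name, and it assumes the llava and llava_<method> layout described above.

```shell
#!/usr/bin/env bash
# run_with_method is a hypothetical helper, not part of the repository.
# Usage: run_with_method <eval_dir> <method> <command...>
run_with_method() {
    local eval_dir="$1" method="$2"
    shift 2
    (
        cd "$eval_dir" || exit 1
        # Refuse to run if the layout is unexpected (for example, a previous
        # run left llava_base behind), so that nothing gets overwritten.
        if [ -e llava_base ] || [ ! -d llava ] || [ ! -d "llava_${method}" ]; then
            echo "unexpected directory layout in ${eval_dir}" >&2
            exit 1
        fi
        mv llava llava_base || exit 1
        mv "llava_${method}" llava || { mv llava_base llava; exit 1; }
        # Restore the original names whenever this subshell exits.
        trap 'mv llava "llava_${method}" && mv llava_base llava' EXIT
        "$@"
        status=$?
        exit "$status"
    )
}

# Example calls (assuming the repository layout described above):
#   run_with_method LaViDa/eval divprune bash run_dream.sh
#   run_with_method LLaDA-V/eval divprune bash scripts/evaluate.sh
```

The function returns the exit status of the wrapped command, so it can be called in a loop over several method names and stopped on the first failure.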

🚀 Evaluation

🌙 LaViDa

The LaViDa evaluation entry point is:

LaViDa/eval/run_dream.sh

After selecting a compression implementation, run:

cd LaViDa/eval
bash run_dream.sh

The tasks, GPUs, process count, and output directory can be configured through environment variables:

CUDA_VISIBLE_DEVICES=0,1 \
NUM_PROCESSES=2 \
TASK_NAMES=mme,chartqa \
OUTPUT_PATH=exp/lavida_eval \
bash run_dream.sh

The default model is jacklishufan/lavida-dream-v1.0-instruct, and the default task is mme.
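
Because the script reads TASK_NAMES and OUTPUT_PATH from the environment, several tasks can be run one after another with each task's results kept in its own directory. The sketch below assumes it is called from LaViDa/eval and that run_dream.sh accepts a single task name in TASK_NAMES; run_lavida_tasks is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# run_lavida_tasks is a hypothetical helper, not part of the repository.
# Usage: run_lavida_tasks <output_root> <task> [<task>...]
run_lavida_tasks() {
    local out_root="$1"
    shift
    local task
    for task in "$@"; do
        # Each task writes to <output_root>/<task>; stop at the first failure.
        TASK_NAMES="$task" OUTPUT_PATH="${out_root}/${task}" bash run_dream.sh || return 1
    done
}

# Example call (run from LaViDa/eval):
#   run_lavida_tasks exp/lavida_eval mme chartqa
```

Variables such as CUDA_VISIBLE_DEVICES and NUM_PROCESSES are passed through unchanged if they are exported in the calling shell.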

🌌 LLaDA-V

The LLaDA-V evaluation entry point is:

LLaDA-V/eval/scripts/evaluate.sh

After selecting a compression implementation, run:

cd LLaDA-V/eval
bash scripts/evaluate.sh

The tasks, GPUs, and output directory can be configured through environment variables:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
TASK_NAMES=mme,chartqa \
OUTPUT_PATH=exp/llada_v_eval \
bash scripts/evaluate.sh

The default model is GSAI-ML/LLaDA-V, and the default task is mme.

The current script sets accelerate --num_processes to 8. To use a different number of GPUs, update this argument in the script accordingly.
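
The process count can be edited by hand, or rewritten with sed as in the sketch below. It assumes the option appears in the script as either "--num_processes 8" or "--num_processes=8"; check the script first, since the exact form is not documented here. set_num_processes is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# set_num_processes is a hypothetical helper, not part of the repository.
# Usage: set_num_processes <script> <count>
set_num_processes() {
    local script="$1" count="$2"
    # Accept positive integers only.
    case "$count" in
        ''|*[!0-9]*) echo "count must be a positive integer" >&2; return 1 ;;
    esac
    # Rewrite only the number that follows --num_processes; write the result
    # back through a temporary file so the script keeps its permissions.
    sed -E "s/(--num_processes[= ])[0-9]+/\1${count}/" "$script" > "${script}.tmp" &&
        cat "${script}.tmp" > "$script" &&
        rm -f "${script}.tmp"
}

# Example call (run from LLaDA-V/eval), followed by a 4-GPU evaluation:
#   set_num_processes scripts/evaluate.sh 4
#   CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/evaluate.sh
```

Keeping the number of devices in CUDA_VISIBLE_DEVICES equal to the process count avoids launching more processes than there are visible GPUs.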

🧠 Compression Code

The following paths use llava_divprune as an example. Implementations of the other methods are located in their corresponding llava_<method> directories.

🌙 LaViDa

The main modifications are located in:

LaViDa/eval/llava_divprune/model/llava_arch.py
LaViDa/eval/llava_divprune/model/language_model/dream/generation_utils.py
LaViDa/eval/llava_divprune/model/language_model/dream/modeling_dream.py

To enable compression, set the following flag to True:

START_COMPRESSION_MODE = True

The flag is currently defined in:

LaViDa/eval/llava_divprune/model/llava_arch.py
LaViDa/eval/llava_divprune/model/language_model/dream/generation_utils.py

🌌 LLaDA-V

The main modifications are located in:

LLaDA-V/eval/llava_divprune/model/llava_arch.py
LLaDA-V/eval/llava_divprune/model/language_model/modeling_llada.py

To enable compression, set the flag in both files to:

START_COMPRESSION_MODE = True

To evaluate the uncompressed baseline, set the relevant flags to False or use the original llava implementation.
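
Since the flag lives in more than one file, it is easy to change one copy and miss the other. The sketch below sets the flag in every file passed to it. It assumes the flag is written as a plain assignment at the start of a line, as in START_COMPRESSION_MODE = True; set_compression_mode is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# set_compression_mode is a hypothetical helper, not part of the repository.
# Usage: set_compression_mode <True|False> <file> [<file>...]
set_compression_mode() {
    local value="$1"
    shift
    case "$value" in
        True|False) ;;
        *) echo "value must be True or False" >&2; return 1 ;;
    esac
    local file
    for file in "$@"; do
        # Fail loudly if a file does not define the flag at all.
        if ! grep -q '^START_COMPRESSION_MODE *=' "$file"; then
            echo "flag not found in ${file}" >&2
            return 1
        fi
        sed -E "s/^(START_COMPRESSION_MODE *= *)(True|False)/\1${value}/" "$file" > "${file}.tmp" &&
            cat "${file}.tmp" > "$file" &&
            rm -f "${file}.tmp"
    done
}

# Example call for LLaDA-V with DivPrune (paths as listed above):
#   set_compression_mode True \
#       LLaDA-V/eval/llava_divprune/model/llava_arch.py \
#       LLaDA-V/eval/llava_divprune/model/language_model/modeling_llada.py
```

Passing False instead of True switches the same files back to the uncompressed baseline.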

🙏 Acknowledgements

This project is built upon the excellent open-source implementations of LaViDa, LLaDA-V, and lmms-eval. We also thank the authors of the six visual token compression methods evaluated in this study.

📝 Citation

If you find this project helpful, please consider citing our paper:

@InProceedings{Li_2026_CVPR,
    author    = {Li, Duo and Yang, Zuhao and Zhang, Xiaoqin and Shao, Ling and Lu, Shijian},
    title     = {A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
    month     = {June},
    year      = {2026},
    pages     = {2823--2833}
}

📜 License

This project uses the repository-level LICENSE. Code derived from LaViDa, LLaDA-V, and the compression methods remains subject to the respective original licenses.