RedundancyLens: Revealing and Exploiting Visual Token Processing Redundancy for Efficient Decoder-Only MLLMs

October 6, 2025

[Figure: Overall architecture]

TODO List

Release the code for the model-side changes
Release more user-friendly tools and update to the latest models

Usage

The models folder contains modified implementations of the Attention and FFN modules for InternVL2-8B, Qwen2VL, and MiniCPM-V-2_6. To use them, replace the corresponding files in the original model implementations with the versions provided here. Because we also changed parts of the LLM, and some of these models originally called the Hugging Face LLM implementation directly, some model folders include an additional LLM-related file that should be copied as well.

Copyright

This repository can only be used for non-commercial research purposes.
Copyright 2025, Deep Learning and Vision Computing Lab (DLVC-Lab), South China University of Technology.

Citation

@inproceedings{li-etal-2025-redundancylens,
    title = "{R}edundancy{L}ens: Revealing and Exploiting Visual Token Processing Redundancy for Efficient Decoder-Only {MLLM}s",
    author = "Li, Hongliang  and Zhang, Jiaxin  and Liao, Wenhui  and Peng, Dezhi  and Ding, Kai  and Jin, Lianwen",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    year = "2025",
}