RedundancyLens: Revealing and Exploiting Visual Token Processing Redundancy for Efficient Decoder-Only MLLMs

October 6, 2025

Overall Architecture

TODO List

  • Release the code for the model-side changes
  • Release more user-friendly tools and add support for the latest models

Usage

The models folder contains modified implementations of the Attention and FFN modules for InternVL2-8B, Qwen2VL, and MiniCPM-V-2_6. To use them, replace the corresponding files in the original model implementation with the ones provided here. Because our changes touch parts of the LLM, and some of these models originally relied directly on the Hugging Face wrappers for it, you may find an additional LLM-related file alongside the modified modules.
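The file replacement described above can be scripted. The following is a minimal sketch, not part of the released code: the function name replace_model_files, the ".orig" backup suffix, and the assumption that the modified files are .py files sitting directly inside one folder per model are all hypothetical, so adjust them to the actual layout of the models folder and of your local model checkout.

```python
import shutil
from pathlib import Path


def replace_model_files(patched_dir, target_dir, backup_suffix=".orig"):
    """Copy every .py file from patched_dir into target_dir.

    Hypothetical helper: an existing file with the same name is first
    saved next to it as <name><backup_suffix>, unless that backup
    already exists. Returns the names of the files that were copied.
    """
    patched_dir, target_dir = Path(patched_dir), Path(target_dir)
    if not target_dir.is_dir():
        raise FileNotFoundError(f"target directory not found: {target_dir}")

    copied = []
    for src in sorted(patched_dir.glob("*.py")):
        dst = target_dir / src.name
        backup = dst.with_name(dst.name + backup_suffix)
        # Keep the original file so the change can be rolled back.
        if dst.exists() and not backup.exists():
            shutil.copy2(dst, backup)
        # Overwrites the original, or adds the file if it is new
        # (e.g. an additional LLM-related file).
        shutil.copy2(src, dst)
        copied.append(src.name)
    return copied


# Example call (both paths are placeholders, not real locations):
# replace_model_files("models/<model_name>", "/path/to/original/model/code")
```

Files that do not exist in the target directory are simply added, which covers the case of an additional LLM-related file. To undo the change, rename each ".orig" backup to its original name.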

Citation

@inproceedings{li-etal-2025-redundancylens,
    title = "{R}edundancy{L}ens: Revealing and Exploiting Visual Token Processing Redundancy for Efficient Decoder-Only {MLLM}s",
    author = "Li, Hongliang  and Zhang, Jiaxin  and Liao, Wenhui  and Peng, Dezhi  and Ding, Kai  and Jin, Lianwen",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    year = "2025",
}