
November 13, 2025

Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration

Yuhang Han¹*, Xuyang Liu²*, Zihan Zhang³, Pengxiang Ding¹, Junjie Chen²,
Donglin Wang¹, Honggang Chen², Qingsen Yan⁴,⁵, Siteng Huang⁶✉

¹ Westlake University
² Sichuan University
³ Johns Hopkins University
⁴ Northwestern Polytechnical University
⁵ Shenzhen Research Institute of Northwestern Polytechnical University
⁶ Zhejiang University

🔥 News

2025.11.08 🎉🎉 Our FiCoCo and GlobalCom² have been accepted by AAAI 2026!
2025.01.10 🤗🤗 We release our latest work GlobalCom², a "global-to-local" approach for training-free acceleration of high-resolution MLLMs. Code is available!
2024.11.17 🤗🤗 We release our work FiCoCo, which proposes a unified paradigm to demystify popular works and guide future designs of training-free token reduction for MLLMs.

👀 Overview

TLDR: This study introduces a unified "filter-correlate-compress" paradigm to streamline training-free token reduction in Multimodal Large Language Models (MLLMs), achieving up to 82.4% FLOPs reduction with minimal performance impact and outperforming existing methods across 10 benchmarks.
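To make the three stages concrete, here is a toy sketch in plain Python. It is not the FiCoCo implementation: the importance scores are taken as given, and the cosine matching and group averaging are assumptions chosen only for illustration. The names `reduce_tokens` and `cosine` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reduce_tokens(tokens, scores, r):
    """Remove r tokens, merging each removed token into a kept one.

    tokens: list of feature vectors (lists of floats)
    scores: one importance score per token (higher = more important);
            how scores are computed is left open in this sketch
    r:      number of tokens to remove
    """
    r = max(0, min(r, len(tokens) - 1))  # always keep at least one token
    order = sorted(range(len(tokens)), key=lambda i: scores[i])

    # Filter: select the r least important tokens for removal.
    discard = set(order[:r])
    keep = [i for i in range(len(tokens)) if i not in discard]

    # Correlate: match every discarded token to its most similar kept token.
    groups = {k: [tokens[k]] for k in keep}
    for i in sorted(discard):
        target = max(keep, key=lambda k: cosine(tokens[i], tokens[k]))
        groups[target].append(tokens[i])

    # Compress: average each group into a single output token.
    return [
        [sum(col) / len(groups[k]) for col in zip(*groups[k])]
        for k in keep
    ]
```

Calling `reduce_tokens(tokens, scores, r=2)` on four tokens returns two tokens, each the mean of a kept token and whichever discarded tokens were matched to it.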

🛠 Preparation

Clone this repository.

git clone https://github.com/kawhiiiileo/FiCoCo.git
cd FiCoCo

Environment Setup and Preparation

conda create -n FiCoCo python=3.10 -y
conda activate FiCoCo
pip install -e .

Download Multimodal Benchmark

Please follow the detailed instructions in LLaVA-Evaluation.

Download the LLaVA weights and put them under ./liuhaotian/llava-v1.5-7b.

🚀 Run and Evaluation

To configure FiCoCo, update the corresponding settings in your code or configuration file. For example:
merge_visual: true # Enable FiCoCo-V (visual token compression in the visual encoder)
AT: true # Enable FiCoCo-L (visual token compression in the LLM)
r: 42 # Compress 42 tokens per layer
control_encoding_layer: 11 # Start compression from the 12th transformer layer
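
As a rough guide to what `r` and `control_encoding_layer` imply, the sketch below counts how many tokens remain after each layer. It assumes a fixed `r` tokens are removed at every layer from `control_encoding_layer` (0-indexed) onward, which the comments above suggest but do not spell out. The function name `token_schedule` and the example values of 576 tokens and 24 layers are illustrative assumptions, not values read from the repository.

```python
def token_schedule(num_tokens, num_layers, r, start_layer):
    """Token count after each layer under a fixed per-layer reduction.

    num_tokens:  tokens entering the first layer
    num_layers:  total number of transformer layers
    r:           tokens removed per compressing layer
    start_layer: 0-indexed layer where compression begins
    """
    counts = []
    n = num_tokens
    for layer in range(num_layers):
        if layer >= start_layer:
            n = max(1, n - r)  # never drop below one token
        counts.append(n)
    return counts


# Hypothetical setting: 576 tokens, 24 layers, r=42, start at layer index 11.
schedule = token_schedule(576, 24, 42, 11)
print(schedule[10], schedule[11], schedule[-1])
```

Under these assumed values, the count is unchanged through layer index 10 and then shrinks by 42 per layer, which helps check that a chosen `r` does not exhaust the tokens before the last layer.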

Example for evaluating SQA results (r=42, control_encoding_layer=11, merge_visual=True):

CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/sqa.sh

To calculate theoretical computational efficiency (e.g., FLOPs), we recommend the methodology presented in the work of LLM-Viewer. We deeply appreciate their outstanding contribution to this field.
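
For a quick back-of-the-envelope check before running a full analysis, the sketch below counts multiply-accumulates for one standard transformer layer as a function of sequence length. It is a generic estimate, not LLM-Viewer's code or API: it assumes a two-matrix feed-forward network and ignores softmax, normalization, and activation costs (a gated FFN with three matrices would add another `n * d * m` term). The names `layer_cost` and `reduction_ratio` are hypothetical.

```python
def layer_cost(n, d, m):
    """Approximate multiply-accumulates for one transformer layer.

    n: sequence length, d: hidden size, m: FFN intermediate size.
    4*n*d*d -> Q, K, V and output projections
    2*n*n*d -> attention scores and attention-weighted values
    2*n*d*m -> two-matrix feed-forward network
    """
    return 4 * n * d * d + 2 * n * n * d + 2 * n * d * m


def reduction_ratio(n_before, n_after, d, m):
    """Fraction of per-layer cost saved when tokens shrink from n_before to n_after."""
    before = layer_cost(n_before, d, m)
    after = layer_cost(n_after, d, m)
    return 1.0 - after / before
```

Because the projection and FFN terms scale linearly with `n` and the attention term quadratically, removing tokens saves at least proportionally to the fraction of tokens removed in each affected layer.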

🚀 Exploring Without CLS Token

Considering that some MLLM visual encoders do not involve a [CLS] token, we propose a feasible alternative. The results and further details can be found in the paper.

📌 Citation

If you use FiCoCo in your research, please cite our work by using the following BibTeX entry:

@misc{han2025filtercorrelatecompresstrainingfree,
      title={Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration}, 
      author={Yuhang Han and Xuyang Liu and Zihan Zhang and Pengxiang Ding and Donglin Wang and Honggang Chen and Qingsen Yan and Siteng Huang},
      year={2025},
      eprint={2411.17686},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17686}, 
}

👍 Acknowledgment

We extend our gratitude to the open-source efforts of LLaVA, ToMe and Open-LLaVA-NeXT.

📧 Contact

For any questions about our paper or code, please email yuhangh984@gmail.com or liuxuyang@stu.scu.edu.cn.