December 9, 2025

FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models

Jintao Tong¹, Wenwei Jin², Pengda Qin², Anqi Li³, Yixiong Zou¹✉,
Yuhong Li²✉, Yuhua Li¹, Ruixuan Li¹

¹School of Computer Science and Technology, Huazhong University of Science and Technology
²Xiaohongshu Inc., ³Shanghai Jiao Tong University

🔥 News

2025.09.18 🔥 FlowCut has been accepted at NeurIPS 2025!
2025.07.01 🚀 We release the implementation of FlowCut for Qwen2-VL! See details here.
2025.05.29 🤗 The checkpoints of llava-v1.5-7b-flowcut128 and llava-v1.5-7b-flowcut192, retaining 128 and 192 visual tokens respectively, have been released!
2025.05.28 🚀 Code is available, and FlowCut can be easily installed with pip install flowcut!
2025.05.26 📝 We release our latest work FlowCut, a plug-and-play, training-free token reduction method that seamlessly integrates into various VLMs for efficient training and inference.

💡 Highlights

TLDR: To address the inefficiency caused by excessive visual tokens in LVLMs, we propose a unified, bottom-up perspective based on information flow, which reveals that redundancy emerges dynamically. Building on this, we introduce FlowCut, which aligns pruning decisions with the model's inherent behavior and outperforms all existing approaches.
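As a rough illustration of what token pruning means in general, the sketch below keeps the k highest-scoring tokens while preserving their original order. It is a generic top-k example, not FlowCut's pruning criterion: the function name `keep_top_k` and the toy scores are assumptions made for illustration, and how FlowCut actually scores tokens is described in the paper.

```python
from typing import List, Sequence


def keep_top_k(tokens: Sequence, scores: Sequence[float], k: int) -> List:
    """Keep the k highest-scoring tokens, preserving their original order.

    Hypothetical helper for illustration only; this is NOT FlowCut's criterion.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(tokens))
    # Rank positions by score, highest first; sorted() is stable, so ties
    # keep the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: -scores[i])
    # Restore the original ordering of the positions that survive.
    kept = sorted(ranked[:k])
    return [tokens[i] for i in kept]


tokens = ["t0", "t1", "t2", "t3", "t4"]
scores = [0.1, 0.7, 0.05, 0.9, 0.3]  # toy importance scores
print(keep_top_k(tokens, scores, k=2))  # ['t1', 't3']
```

Preserving the original order matters in practice because the retained tokens still carry positional meaning for the language model.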

🛠 Preparation

Our code is easy to use.

Clone the LLaVA repository.

git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA

Set up the LLaVA environment.

conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  
pip install -e .
pip install flash-attn --no-build-isolation

For regular use, install the package from PyPI:

pip install flowcut

For development, clone the repository and install it in editable mode:

git clone https://github.com/TungChintao/FlowCut
cd FlowCut
pip install -e .

The files are organized as follows:

LLaVA-main
├── flowcut
├── llava
├── playground
└── scripts

🚀 Quick Start

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model
from flowcut import flowcut
model_path = "liuhaotian/llava-v1.5-7b"

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path)
)
## FlowCut retains 64 visual tokens
model = flowcut(model, target_num=64)
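To get a feel for what a given `target_num` means, the sketch below computes the fraction of visual tokens that is retained. It assumes the standard LLaVA-1.5 vision setup (336×336 input, 14×14 patches, hence 576 visual tokens); that baseline and the helper name `visual_token_budget` are assumptions for illustration and not part of the FlowCut API.

```python
from typing import Tuple


def visual_token_budget(
    target_num: int, image_size: int = 336, patch_size: int = 14
) -> Tuple[int, float]:
    """Return (total visual tokens, fraction retained) for a given target_num.

    Assumes one visual token per image patch, as in LLaVA-1.5's vision encoder.
    """
    per_side = image_size // patch_size
    total = per_side * per_side
    if not 0 < target_num <= total:
        raise ValueError(f"target_num must be in 1..{total}")
    return total, target_num / total


# 64 matches the Quick Start; 128 and 192 match the released checkpoints.
for target in (64, 128, 192):
    total, ratio = visual_token_budget(target)
    print(f"target_num={target}: keeps {ratio:.1%} of {total} visual tokens")
```

Under these assumptions, `target_num=64` keeps roughly one ninth of the visual tokens.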

📖 Evaluation

The evaluation code follows the structure of LLaVA or Lmms-Eval. After loading the model, simply add two lines as shown below:

## Load LLaVA Model (code from llava.eval.model_vqa_loader)
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, args.model_base, model_name)
## add FlowCut
from flowcut import flowcut
model = flowcut(model, target_num=64)
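When comparing against the unmodified baseline, it can be convenient to make the two added lines optional. The following is a minimal sketch under stated assumptions: the `--target-num` flag and the `maybe_apply_flowcut` helper are hypothetical and not part of LLaVA or FlowCut. Only the `flowcut(model, target_num=...)` call comes from the snippet above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Evaluation with optional FlowCut")
    # Hypothetical flag (not defined by LLaVA's scripts): visual tokens to retain.
    parser.add_argument(
        "--target-num",
        type=int,
        default=None,
        help="visual tokens to retain; omit to run the unmodified model",
    )
    return parser


def maybe_apply_flowcut(model, target_num=None):
    """Return the model unchanged when target_num is None, else apply FlowCut."""
    if target_num is None:
        return model
    if target_num <= 0:
        raise ValueError("target_num must be a positive integer")
    # Imported lazily so the baseline path also works without flowcut installed.
    from flowcut import flowcut

    return flowcut(model, target_num=target_num)


# An explicit argument list keeps the example independent of sys.argv.
args = build_parser().parse_args(["--target-num", "64"])
print(args.target_num)
```

In an evaluation script, the call would sit right after `load_pretrained_model(...)`, for example `model = maybe_apply_flowcut(model, args.target_num)`.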

Script templates (please follow the detailed instructions in LLaVA-Evaluation):

bash scripts/v1_5/eval/[Benchmark].sh

Examples:

CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mme.sh

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash scripts/v1_5/eval/vqav2.sh
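To run several benchmarks in sequence, the commands above can be driven from a small Python script. This is only a sketch: the helper names `eval_command` and `run_benchmarks` are hypothetical, and it assumes that it is launched from the LLaVA root and that each benchmark script follows the `scripts/v1_5/eval/[Benchmark].sh` template.

```python
import os
import subprocess
from typing import Dict, Iterable, List, Sequence, Tuple


def eval_command(benchmark: str, gpus: Sequence[int]) -> Tuple[List[str], Dict[str, str]]:
    """Build the argv and environment for one benchmark script."""
    script = f"scripts/v1_5/eval/{benchmark}.sh"
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(str(g) for g in gpus))
    return ["bash", script], env


def run_benchmarks(jobs: Iterable[Tuple[str, Sequence[int]]], dry_run: bool = True) -> None:
    """Print each command; execute it as well when dry_run is False."""
    for benchmark, gpus in jobs:
        argv, env = eval_command(benchmark, gpus)
        print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {' '.join(argv)}")
        if not dry_run:
            # check=True stops the loop at the first failing benchmark.
            subprocess.run(argv, env=env, check=True)


# Dry run: only prints the two example commands from this section.
run_benchmarks([("mme", [0]), ("vqav2", list(range(8)))])
```

Passing `dry_run=False` executes the scripts one after another.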

🎯 Training

The training code follows the structure of LLaVA. After loading the model, simply add two lines as shown below:

## Load LLaVA Model (code from llava.train)
# ... model-loading code ...
## add FlowCut
from flowcut import flowcut
model = flowcut(model, target_num=64)
## training
trainer = LLaVATrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    **data_module,
)

🔑 License

This project is released under the Apache 2.0 license.

📌 Citation

If you find this project useful in your research, please consider citing:

@article{tong2025flowcut,
  title={FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models},
  author={Tong, Jintao and Jin, Wenwei and Qin, Pengda and Li, Anqi and Zou, Yixiong and Li, Yuhong and Li, Yuhua and Li, Ruixuan},
  journal={arXiv preprint arXiv:2505.19536},
  year={2025}
}

👍 Acknowledgment

This work is built upon LLaVA, Qwen VL, and Video-LLaVA. We thank them for their excellent open-source contributions.
We also thank FastV, SparseVLM, VisionZip and others for their contributions, which have provided valuable insights.
