Beyond Token Pruning: Operation Pruning in Vision-Language Models

June 15, 2025

A tuning-free VLM/MLLM inference acceleration framework that searches for operations to prune instead of pruning tokens.


🔧 Installation

conda create -n gsop python=3.10 -y
conda activate gsop

cd lmms-eval
pip install -e .

cd ../LLaVA
pip install -e .

pip install easydict
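
As an optional sanity check after the steps above, the following sketch reports whether each installed package can be imported in the active environment. The helper `check_module` is hypothetical and not part of this repository, and the module names `lmms_eval` and `llava` are assumptions about what the two editable installs expose; adjust them if your checkout differs.

```shell
# Hypothetical helper (not part of this repo): report whether a Python
# module can be imported in the currently active environment.
check_module() {
    if python -c "import $1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Module names below are assumed, not confirmed by the repository.
for mod in lmms_eval llava easydict; do
    check_module "$mod" || true
done
```

A "missing" line usually means the corresponding `pip install` step ran outside the `gsop` environment or did not complete.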

For additional setup instructions, please refer to:


🚀 Usage

Inference

bash scripts/gsop_inference.sh
bash scripts/gsop_search.sh

Some benchmarks (e.g., TextVQA) may produce results on lmms-eval that differ from commonly reported metrics. For those benchmarks, follow the evaluation setup described in Evaluation.md.