Beyond Token Pruning: Operation Pruning in Vision-Language Models

June 15, 2025

A tuning-free inference acceleration framework for VLMs/MLLMs that searches for operations to prune, rather than tokens.
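To make the distinction concrete, here is a toy sketch that contrasts the two ideas on a small table of operations. It is not the repository's implementation: the (layer, module, token group) granularity, the plan dictionaries, and every function name below are assumptions chosen purely for illustration.

```python
# Toy illustration only. The (layer, module, token group) granularity is an
# assumption made for this example, not something taken from the repository.
from itertools import product

LAYERS = range(4)
MODULES = ("attention", "mlp")
TOKEN_GROUPS = ("text", "visual")


def full_plan():
    """Return a plan with every (layer, module, token group) operation enabled."""
    return {op: True for op in product(LAYERS, MODULES, TOKEN_GROUPS)}


def token_pruning_plan(drop_from_layer):
    """Token pruning: visual tokens leave the sequence from a given layer on,
    so every operation on them in later layers disappears together."""
    plan = full_plan()
    for layer, module, group in plan:
        if group == "visual" and layer >= drop_from_layer:
            plan[(layer, module, group)] = False
    return plan


def operation_pruning_plan(skipped_ops):
    """Operation pruning: switch off individually chosen operations while the
    tokens themselves stay in the sequence."""
    plan = full_plan()
    for op in skipped_ops:
        plan[op] = False
    return plan


def kept(plan):
    """Count the operations that remain enabled."""
    return sum(plan.values())


if __name__ == "__main__":
    print(kept(full_plan()))
    print(kept(token_pruning_plan(drop_from_layer=2)))
    print(kept(operation_pruning_plan([(1, "mlp", "visual"), (3, "attention", "visual")])))
```

In this toy, token pruning can only remove whole columns of the table (all operations on a token group from some layer onward), whereas operation pruning can remove any individual cell, which gives a search procedure a larger space of candidate plans to choose from.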

🔧 Installation

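# Create and activate a dedicated conda environment
conda create -n gsop python=3.10 -y
conda activate gsop

# Install lmms-eval in editable mode from its directory
cd lmms-eval
pip install -e .

# Install LLaVA in editable mode from its directory
cd ../LLaVA
pip install -e .

# Install the remaining dependency
pip install easydict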

For additional setup instructions, please refer to the LLaVA repository and the lmms-eval repository.
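After installation, a quick sanity check can confirm that the packages are visible in the active environment. The sketch below is not part of the repository. It assumes the import names are llava, lmms_eval, and easydict, and the helper name missing_modules is hypothetical.

```python
from importlib.util import find_spec

# Assumed top-level import names for the packages installed above.
REQUIRED_MODULES = ("llava", "lmms_eval", "easydict")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it with the gsop environment activated. If a module is reported as missing, repeat the corresponding pip install step from that package's directory.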

🚀 Usage

Inference

bash scripts/gsop_inference.sh

Search

bash scripts/gsop_search.sh

Some benchmarks (e.g., TextVQA) may produce results that differ from commonly reported metrics when run on lmms-eval. Please follow the evaluation setup detailed in Evaluation.md for those benchmarks.