October 13, 2025

VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient Large Language Models Inference

Setup:

You can build the environment as follows:

pip install -r requirements.txt

Download Eval Data:

Download the evaluation benchmarks by following Evaluation.md, and place them in playground/data/eval.
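
As a minimal sketch, assuming the commands are run from the repository root, the target directory can be prepared and checked as shown here. The benchmark files themselves still have to be fetched by following Evaluation.md.

```shell
# Create the directory named in this README for the evaluation benchmarks.
mkdir -p playground/data/eval

# List its contents to confirm the benchmarks are in place after downloading.
ls playground/data/eval
```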

Evaluation:

You can evaluate the downloaded benchmarks using the .sh files in the scripts folder. For example, use scripts/mmbench.sh to evaluate the MMBench benchmark.

Single GPU:

CUDA_VISIBLE_DEVICES=0 bash scripts/mmbench.sh

Multiple GPUs:

CUDA_VISIBLE_DEVICES=0,1 bash scripts/mmbench.sh
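
To run several benchmarks in one go, a loop like the following may help. It is only a sketch: it assumes every .sh file in the scripts folder is a standalone evaluation script like scripts/mmbench.sh, which this README does not confirm. The GPUS variable is a hypothetical convenience and not part of the repository.

```shell
# GPUS is a hypothetical helper variable; override it with e.g. GPUS=0,1.
GPUS="${GPUS:-0}"
ran=0

for script in scripts/*.sh; do
  # Skip the unexpanded glob pattern when no scripts are found.
  [ -f "$script" ] || continue
  echo "Running $script on GPU(s) $GPUS"
  CUDA_VISIBLE_DEVICES="$GPUS" bash "$script"
  ran=$((ran + 1))
done

echo "Ran $ran script(s)"
```

Check the scripts folder before using this, and drop any file that is not meant to be run on its own.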

Adjust compression ratios:

To trade off model performance against inference speed, modify the hyperparameter pRate in the LlamaModel class in llava/model/language_model/modeling_llama_visa.py.
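
One way to find where pRate is set before editing it is to search the file. This is a minimal sketch, assuming it is run from the repository root. It only prints matching lines and does not change anything.

```shell
# Path taken from this README; run from the repository root.
MODEL_FILE="llava/model/language_model/modeling_llama_visa.py"

if [ -f "$MODEL_FILE" ]; then
  # Print each line that mentions pRate, prefixed with its line number.
  grep -n "pRate" "$MODEL_FILE" || echo "No occurrence of pRate in $MODEL_FILE"
else
  echo "File not found: $MODEL_FILE (are you in the repository root?)"
fi
```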