README.md
October 13, 2025
VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient Large Language Models Inference
Setup:
You can build the environment as follows:
pip install -r requirements.txt
Download Eval Data:
Download the evaluation benchmarks by following Evaluation.md, and place them in playground/data/eval.
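As a minimal sketch, the target directory can be created ahead of time so the benchmark files have a place to go. This assumes the commands are run from the repository root. The per-benchmark layout inside the directory is not shown here because it is defined in Evaluation.md.

```shell
# Directory named in this README; assumed to be relative to the repository root.
EVAL_DIR="playground/data/eval"

# Create the directory (and any missing parents) if it does not exist yet.
mkdir -p "$EVAL_DIR"

# List what is already there, including hidden files, to confirm the downloads landed.
ls -A "$EVAL_DIR"
```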
Evaluation:
You can evaluate the downloaded benchmarks using the .sh files in the scripts folder.
For example, use scripts/mmbench.sh to evaluate the MMBench benchmark.
Single GPU:
CUDA_VISIBLE_DEVICES=0 bash scripts/mmbench.sh
Multiple GPUs:
CUDA_VISIBLE_DEVICES=0,1 bash scripts/mmbench.sh
Adjust compression ratios:
To trade off model performance against inference speed, modify the hyperparameter pRate in the LlamaModel class in the llava/model/language_model/modeling_llama_visa.py file.
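One quick way to find where pRate is set before editing it is to search the file for the name. This is only a sketch and assumes it is run from the repository root. It makes no assumption about the default value or how pRate is used, and only prints the lines that mention it.

```shell
# Path taken from this README; assumed to be relative to the repository root.
MODEL_FILE="llava/model/language_model/modeling_llama_visa.py"

if [ -f "$MODEL_FILE" ]; then
    # Print every line that mentions pRate, prefixed with its line number.
    grep -n "pRate" "$MODEL_FILE" || echo "No occurrence of pRate in $MODEL_FILE"
else
    echo "File not found: $MODEL_FILE (run this from the repository root)"
fi
```

After changing the value, rerun one of the evaluation scripts to compare accuracy and speed against the previous setting.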