October 28, 2025
The official PyTorch implementation of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs" [Paper]
The implementation changes of LLaVA-SP are in llava_arch.py, clip_encoder.py, llava_trainer.py and train.py.
Install
Please follow the installation instructions at https://github.com/haotian-liu/LLaVA/
LLaVA-SP Weights
Please check out https://huggingface.co/Levideus/models for all public LLaVA-SP checkpoints.
Quick Start
python llava/eval/run_llava.py \
    --model_path /path/llava-sp-cropping-lora \
    --model_base /path/vicuna-1.5-7b
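If you prefer to launch the same command from Python (for example, to loop over several checkpoints), the sketch below may help. It only assembles and optionally runs the command shown above; the helper names `build_quick_start_command` and `run_quick_start` are hypothetical and not part of this repository, the flags are copied as written in this README, and the `/path/...` values are placeholders you need to replace. Because the script path is relative, run it from the repository root.

```python
import shlex
import subprocess
import sys


def build_quick_start_command(model_path, model_base,
                              script="llava/eval/run_llava.py"):
    """Return the Quick Start command as an argument list.

    Hypothetical helper: it mirrors the README command and does not
    validate that the paths or the script exist.
    """
    return [
        sys.executable,  # current interpreter, stands in for `python`
        script,
        "--model_path", str(model_path),
        "--model_base", str(model_base),
    ]


def run_quick_start(model_path, model_base, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    cmd = build_quick_start_command(model_path, model_base)
    print(shlex.join(cmd))
    if dry_run:
        return None
    # check=True raises CalledProcessError if the script exits non-zero.
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths taken from the README; replace with real locations.
    run_quick_start("/path/llava-sp-cropping-lora", "/path/vicuna-1.5-7b")
```

With `dry_run=True` (the default) nothing is executed, so you can check the assembled command before committing GPU time to it.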
Citation
If you find LLaVA-SP useful for your research and applications, please cite using this BibTeX:
@InProceedings{Lou_2025_ICCV,
author = {Lou, Haoran and Fan, Chunxiao and Liu, Ziyan and Wu, Yuexin and Wang, Xinliang},
title = {LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {22014-22024}
}