VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM

March 10, 2026


🚀 Quick Start

Installation

conda create -n vptrack python=3.10
conda activate vptrack

cd ms-swift
conda install -c conda-forge pyarrow sentencepiece
pip install -e .
pip install "sglang[all]" -U
pip install "vllm>=0.5.1" "transformers<4.55" "trl<0.21" -U
pip install "lmdeploy>=0.5" -U
pip install autoawq -U --no-deps
pip install auto_gptq optimum bitsandbytes "gradio<5.33" -U
pip install git+https://github.com/modelscope/ms-swift.git
pip install timm -U
pip install "deepspeed" -U
pip install flash-attn==2.7.4.post1 --no-build-isolation

conda install av -c conda-forge
pip install qwen_vl_utils qwen_omni_utils decord librosa icecream soundfile -U
pip install liger_kernel nvitop pre-commit math_verify py-spy -U
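
After running the installation commands above, it can help to confirm that the key packages are importable before launching training. The sketch below is one way to do this using only the standard library. The module names in `MODULES` are assumptions about each package's import name (for example, ms-swift is assumed to import as `swift` and flash-attn as `flash_attn`), so adjust the list to match your environment.

```python
from importlib.util import find_spec

# Assumed top-level import names for the packages installed above.
# These names are illustrative and are not taken from this README.
MODULES = [
    "torch",
    "transformers",
    "swift",
    "vllm",
    "deepspeed",
    "flash_attn",
    "timm",
    "qwen_vl_utils",
    "decord",
]


def missing_modules(names):
    """Return the entries of `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

`find_spec` only locates a module without importing it, so the check is fast but does not catch packages that are installed yet fail at import time (for example, because of a CUDA mismatch).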

Data Preparation

Datasets: TNL2K, TNLLT

|-- data
│   ├── tnl2k
│   │   ├── test
│   │   │   ├── advSamp_Baseball_game_002-Done
│   │   │   └── ...
│   │   └── train
│   │       ├── Arrow_Video_ZZ04_done
│   │       └── ...
│   └── tnllt
│       ├── JE_Assian_ship_v01
│       └── ...
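
Before running the preparation script, you may want to check that the dataset folders match the layout above. The following is a minimal sketch, assuming the datasets live under a `data` directory relative to where you run it. The helper name `missing_dirs` and the `EXPECTED` tuple are hypothetical and are not part of this repository.

```python
from pathlib import Path

# Sub-directories expected under the data root, following the tree above.
EXPECTED = ("tnl2k/train", "tnl2k/test", "tnllt")


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_dir()]


if __name__ == "__main__":
    # "data" is an assumed location; change it if your datasets live elsewhere.
    missing = missing_dirs("data")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

The check only verifies that the top-level folders exist; it does not inspect the individual video sequences inside them.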

Data Preprocessing

bash data_preparation.sh

Model Training

bash train.sh

Model Testing

bash infer.sh

📦 Checkpoints

You can download the checkpoints from Hugging Face: VPTracker

👀 Visualization

🙏 Acknowledgments

This code is built on top of ms-swift.

✉️ Contact

Email: jcwang@stu.ecnu.edu.cn. Discussions of any kind are welcome!


📖 Citation

If our work is useful for your research, please consider citing:

@misc{wang2025vptrackerglobalvisionlanguagetracking,
      title={VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM}, 
      author={Jingchao Wang and Kaiwen Zhou and Zhijian Wu and Kunhua Ji and Dingjiang Huang and Yefeng Zheng},
      year={2025},
      eprint={2512.22799},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.22799}, 
}