June 18, 2026
RAGTrack: Language-aware RGBT Tracking with Retrieval-Augmented Generation
Hao Li, Yuhao Wang, Wenning Hao📧, Pingping Zhang📧, Dong Wang, Huchuan Lu
Paper | Code | Models | Results | Benchmark
Abstract
This repository contains the official implementation of RAGTrack, the first language-aware RGBT tracking framework powered by Retrieval-Augmented Generation (RAG). We construct four RGB-T-L benchmarks by adding textual descriptions to GTOT, RGBT210, RGBT234, and LasHeR through MLLM-based annotation pipelines. The framework consists of a Multi-modal Transformer Encoder (MTE), Adaptive Token Fusion (ATF), and a Context-aware Reasoning Module (CRM). The repository includes training and evaluation code, models, and results.
Motivation
Figure 1. (a) Existing RGBT trackers suffer from inadequate appearance modeling, search redundancy, and modality gap.
(b) Our RAGTrack introduces linguistic reasoning, dynamic token selection, and adaptive channel exchange for robust tracking.
Framework
Figure 2. Overall framework of RAGTrack. MTE performs unified visual-language modeling, ATF dynamically selects target-relevant tokens and enables adaptive channel exchange, and CRM retrieves relevant contexts for context-aware reasoning.
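As a rough illustration of the retrieval step that CRM is described as performing, the sketch below ranks stored context vectors by similarity to a query and returns the top-k matches. It is not the RAGTrack implementation: the names `cosine_similarity` and `retrieve_top_k`, the list-based memory, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query, memory, k=2):
    """Return the k (index, score) pairs from `memory` most similar to `query`.

    `memory` is a hypothetical list of stored context feature vectors.
    """
    scored = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(memory)]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy memory with three 2-D context vectors (illustrative values only).
memory = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
top = retrieve_top_k([1.0, 0.1], memory, k=2)
print([index for index, _ in top])
```

In a real tracker the query and memory entries would be learned features; the ranking logic is the only part this sketch is meant to convey.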
Adaptive Token Fusion (ATF)
Figure 3. Details of ATF. Dynamic token selection leverages text-guided attention scores to retain target-relevant tokens, while adaptive channel exchange bridges heterogeneous modality gaps.
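The idea of keeping only target-relevant tokens can be sketched as a top-k selection over per-token scores. This is a minimal sketch, not the ATF module itself: the function `select_tokens`, the `keep_ratio` parameter, and the assumption that one scalar score per token is already available are all hypothetical.

```python
def select_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving their original order.

    tokens: list of token features (any objects).
    scores: one relevance score per token (assumed here to come from
            text-guided attention; how they are computed is not shown).
    keep_ratio: fraction of tokens to retain; at least one token is kept.
    Returns (kept_tokens, kept_indices).
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not tokens:
        return [], []
    num_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank indices by score, take the top num_keep, then restore token order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:num_keep])
    return [tokens[i] for i in kept], kept


tokens = ["t0", "t1", "t2", "t3", "t4", "t5"]
scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.6]
kept_tokens, kept_indices = select_tokens(tokens, scores, keep_ratio=0.5)
print(kept_indices)  # [1, 3, 5]
```

Sorting the kept indices keeps the surviving tokens in their original order, which matters when positional information is attached to each token.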
Installation
1. Clone the repository and create the conda environment:
git clone https://github.com/IdolLab/RAGTrack.git
cd RAGTrack
conda create -n RAGTrack python=3.10
conda activate RAGTrack
2. Download auxiliary models:
- CLIP: Baidu Drive (pwd: tea6) / Google Drive
- Qwen-VL: Baidu Drive (pwd: 61h4) / Google Drive
3. Install dependencies:
pip install -r requirements.txt
Data Preparation
Download the following datasets and place them under ./data/:
- RGB-T images: GTOT, RGBT210, RGBT234, LasHeR
- Textual annotations: Baidu Drive (pwd: nayy) / Google Drive
The expected directory structure is as follows:
RAGTrack/
└── data/
    ├── GTOT/
    │   ├── BlackCar/
    │   │   ├── i/
    │   │   ├── v/
    │   │   ├── groundTruth_i.txt
    │   │   ├── groundTruth_v.txt
    │   │   ├── visible_description.txt
    │   │   └── class.txt
    │   └── ...
    ├── RGBT210/
    │   ├── afterrain/
    │   │   ├── infrared/
    │   │   ├── visible/
    │   │   ├── init.txt
    │   │   ├── visible_description.txt
    │   │   └── class.txt
    │   └── ...
    ├── RGBT234/
    │   └── ... (same structure as RGBT210)
    └── LasHeR/
        ├── train/
        │   ├── 1boygo/
        │   │   ├── infrared/
        │   │   ├── visible/
        │   │   ├── init.txt
        │   │   └── visible_description.txt
        │   └── ...
        └── test/
            ├── 1blackteacher/
            │   ├── infrared/
            │   ├── visible/
            │   ├── init.txt
            │   ├── visible_description.txt
            │   └── class.txt
            └── ...
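Before training, it can help to confirm that each sequence folder matches the layout above. The script below is a small sketch for that purpose and is not part of the repository: the `EXPECTED` table is transcribed from the tree, and the helper `missing_entries` is a hypothetical name.

```python
from pathlib import Path

# Entries each sequence folder should contain, transcribed from the tree above.
_RGBT = ["infrared", "visible", "init.txt", "visible_description.txt", "class.txt"]
EXPECTED = {
    "GTOT": ["i", "v", "groundTruth_i.txt", "groundTruth_v.txt",
             "visible_description.txt", "class.txt"],
    "RGBT210": _RGBT,
    "RGBT234": _RGBT,
    # The tree lists no class.txt for LasHeR training sequences.
    "LasHeR/train": ["infrared", "visible", "init.txt", "visible_description.txt"],
    "LasHeR/test": _RGBT,
}


def missing_entries(data_dir, subset):
    """Map each sequence name under data_dir/subset to the entries it lacks."""
    root = Path(data_dir) / subset
    problems = {}
    for seq in sorted(p for p in root.iterdir() if p.is_dir()):
        missing = [name for name in EXPECTED[subset] if not (seq / name).exists()]
        if missing:
            problems[seq.name] = missing
    return problems


if __name__ == "__main__":
    for subset in EXPECTED:
        if (Path("./data") / subset).is_dir():
            print(subset, missing_entries("./data", subset))
```

An empty dictionary for a subset means every sequence folder contains the listed entries; the script does not inspect file contents.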
Setup & Configuration
Run the following command to initialize local paths:
python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output
Alternatively, manually edit the path configs:
./lib/train/admin/local.py # paths for training
./lib/test/evaluation/local.py # paths for testing
Training
1. Download the pretrained backbone:
Download the pretrained model from Baidu Drive (pwd: 3ure) or Google Drive and place it under ./pretrained/.
2. Launch training:
bash train.sh
You can switch between different model variants by modifying the arguments in train.sh.
Testing
Benchmark Evaluation
1. Download the model:
Download the model from Baidu Drive (pwd: 3ure) or Google Drive and place it under ./output/.
2. Launch testing:
Modify <DATASET_PATH> and <SAVE_PATH> in ./RGBT_workspace/test_rgbt_mgpus.py, then run:
bash test.sh
Evaluation Toolkit
For GTOT / RGBT210 / RGBT234 / LasHeR, please use the official Evaluation Toolkit.
Benchmark Construction
We construct four RGB-T-L benchmarks by introducing high-quality textual descriptions into GTOT, RGBT210, RGBT234, and LasHeR. The annotation pipeline consists of three stages:
1. Initial Description Generation
Download Qwen2.5-VL (pwd: 8swe) and generate initial descriptions:
cd Qwen2.5-VL
python generate_description.py
2. Automatic Refinement
Download Qwen3 (pwd: nqqc) and correct the generated descriptions:
cd Qwen3
python correct_description.py
3. Human Review & Verification
We open-source an annotation review tool to facilitate efficient human verification:
- AnnotationCheck
The final annotations are reviewed by our annotation team to correct remaining hallucinations, grammatical errors, and mixed-language content, ensuring high-quality semantic labels.
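A simple pre-filter can surface some mixed-language descriptions before human review. The sketch below is not the AnnotationCheck tool: `needs_review` is a hypothetical helper, and treating any CJK ideograph in an English description as a sign of mixed-language content is an assumption that will miss other scripts.

```python
import re

# CJK Unified Ideographs; a rough proxy for "mixed-language content" (assumption).
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def needs_review(description):
    """Return True if a description is empty or contains CJK characters."""
    text = description.strip()
    return not text or bool(CJK_PATTERN.search(text))


samples = [
    "A black car driving on the road.",
    "A man 行走 across the square.",
    "",
]
print([needs_review(s) for s in samples])  # [False, True, True]
```

Hallucinations and grammatical errors cannot be caught this way, so a filter like this only narrows the set that reviewers look at first.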
Annotation Statistics
| Dataset | Split | Sequences | Descriptions |
|---|---|---|---|
| LasHeR | Train | 979 | 514,081 |
| LasHeR / GTOT / RGBT210 / RGBT234 | Test | 739 | 739 |
Annotation Team
We sincerely thank our annotation team for their dedicated efforts:
| Lixin Wang | Lihong Huang | Zhongtian Li | Zijing Gong | Yilin Zhang |
| Yangyang Liu | Yuhang Zhang | Chaoyuan Liu | Jiani Qiu | Qing'en Zhu |
| Junji Wang | Xi Zhang | Ningxin Hu | Ruoshui Qu | Huiyu Luo |
| Jian Shi | Yue Xiong | Shuyan Tian | Xuanyu Zhang | Enhui Wang |
| Qiwei Yang | Kuanxin Shen | Yakun Huo | Haojing Zhou | Deyu Hong |
| Zi Wang | Xiaowen Wu | Longquan Shang | Tao He | Jinxu Zhao |
| Yongfeng Lv | Weicheng Yi | Bowen Liu | Xingyu Huang | Minghe Chen |
| Zixin Wu | Jiaqi Long | Sijia Cui | Liyong Liu | |
Poster
Acknowledgements
This repo is based on DUTrack and MUST. We sincerely thank the authors for their excellent work.
Citation
If you find RAGTrack helpful for your research, please consider citing:
@inproceedings{li2026ragtrack,
title={RAGTrack: Language-aware RGBT Tracking with Retrieval-Augmented Generation},
author={Li, Hao and Wang, Yuhao and Hao, Wenning and Zhang, Pingping and Wang, Dong and Lu, Huchuan},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={28179--28189},
year={2026}
}
Star ⭐ this repo if you like our work!