June 18, 2026

RAGTrack: Language-aware RGBT Tracking
with Retrieval-Augmented Generation

CVPR 2026

Hao Li, Yuhao Wang, Wenning Hao📧, Pingping Zhang📧, Dong Wang, Huchuan Lu

📄 Paper | 💻 Code | 🤖 Models | 📊 Results | 📈 Benchmark


๐Ÿ“ Abstract

This repository contains the official implementation of RAGTrack, the first language-aware RGBT tracking framework powered by Retrieval-Augmented Generation (RAG). We construct four RGB-T-L benchmarks by adding textual descriptions to GTOT, RGBT210, RGBT234, and LasHeR through MLLM-based annotation pipelines. We propose a framework consisting of a Multi-modal Transformer Encoder (MTE), Adaptive Token Fusion (ATF), and a Context-aware Reasoning Module (CRM). The repository includes training and evaluation code, models, and results.


🔥 Motivation

RAGTrack Motivation
Figure 1. (a) Existing RGBT trackers suffer from inadequate appearance modeling, search redundancy, and a modality gap. (b) RAGTrack introduces linguistic reasoning, dynamic token selection, and adaptive channel exchange for robust tracking.


๐Ÿ—๏ธ Framework

RAGTrack Pipeline
Figure 2. Overall framework of RAGTrack. MTE performs unified visual-language modeling, ATF dynamically selects target-relevant tokens and enables adaptive channel exchange, and CRM retrieves relevant contexts for context-aware reasoning.
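
To make the retrieval idea concrete, here is a minimal sketch of similarity-based context retrieval. It is not the CRM implementation from this repository: the `memory` dictionary, the `retrieve_top_k` helper, and the toy embeddings are all hypothetical, and plain cosine similarity is assumed as the relevance measure.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query, memory, k=2):
    """Return the k (key, similarity) pairs from `memory` most similar to `query`.

    `memory` is a hypothetical dict mapping a context id to its feature vector.
    """
    scored = [(key, cosine(query, vec)) for key, vec in memory.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:k]

# Toy context bank with made-up embeddings, for illustration only.
memory = {
    "frame_010": [1.0, 0.0, 0.0],
    "frame_020": [0.8, 0.6, 0.0],
    "frame_030": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]
top = retrieve_top_k(query, memory, k=2)
print([key for key, _ in top])
```

In a tracker, the query would presumably come from the current frame and the bank from earlier frames or text features; how RAGTrack builds and updates its context store is described in the paper, not here.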


🔬 Adaptive Token Fusion (ATF)

Adaptive Token Fusion
Figure 3. Details of ATF. Dynamic token selection leverages text-guided attention scores to retain target-relevant tokens, while adaptive channel exchange bridges heterogeneous modality gaps.
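
The two ATF ideas named in the caption can be illustrated with a small pure-Python sketch. It assumes a top-k rule for token selection and a fixed-threshold gate for channel exchange; the function names, `keep_ratio`, `threshold`, and the gate values are hypothetical and are not taken from the released code.

```python
def select_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving their original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    k = max(1, round(len(tokens) * keep_ratio))
    # Indices of the k largest scores, then restored to their original order.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    keep = sorted(top)
    return keep, [tokens[i] for i in keep]

def exchange_channels(rgb, tir, rgb_gate, tir_gate, threshold=0.5):
    """Take the other modality's value wherever a channel's own gate is low."""
    new_rgb = [t if g < threshold else r for r, t, g in zip(rgb, tir, rgb_gate)]
    new_tir = [r if g < threshold else t for r, t, g in zip(rgb, tir, tir_gate)]
    return new_rgb, new_tir

# Four search tokens with made-up text-guided attention scores.
tokens = ["t0", "t1", "t2", "t3"]
scores = [0.10, 0.70, 0.05, 0.90]
keep, kept = select_tokens(tokens, scores, keep_ratio=0.5)
print(keep, kept)

# Three-channel features for one token in each modality, with made-up gates.
rgb, tir = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
new_rgb, new_tir = exchange_channels(rgb, tir, [0.9, 0.2, 0.8], [0.1, 0.9, 0.9])
print(new_rgb, new_tir)
```

Sorting the kept indices keeps the surviving tokens in spatial order, which matters if later layers rely on positional information.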


โš™๏ธ Installation

1. Clone the repository and create the conda environment:

git clone https://github.com/IdolLab/RAGTrack.git
cd RAGTrack
conda create -n RAGTrack python=3.10
conda activate RAGTrack

2. Download auxiliary models:

3. Install dependencies:

pip install -r requirements.txt

๐Ÿ“ Data Preparation

Download the following datasets and place them under ./data/: GTOT, RGBT210, RGBT234, and LasHeR.

The expected directory structure is as follows:

RAGTrack/
└── data/
    ├── GTOT/
    │   ├── BlackCar/
    │   │   ├── i/
    │   │   ├── v/
    │   │   ├── groundTruth_i.txt
    │   │   ├── groundTruth_v.txt
    │   │   ├── visible_description.txt
    │   │   └── class.txt
    │   └── ...
    ├── RGBT210/
    │   ├── afterrain/
    │   │   ├── infrared/
    │   │   ├── visible/
    │   │   ├── init.txt
    │   │   ├── visible_description.txt
    │   │   └── class.txt
    │   └── ...
    ├── RGBT234/
    │   └── ...  (same structure as RGBT210)
    └── LasHeR/
        ├── train/
        │   ├── 1boygo/
        │   │   ├── infrared/
        │   │   ├── visible/
        │   │   ├── init.txt
        │   │   └── visible_description.txt
        │   └── ...
        └── test/
            ├── 1blackteacher/
            │   ├── infrared/
            │   ├── visible/
            │   ├── init.txt
            │   ├── visible_description.txt
            │   └── class.txt
            └── ...
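
Before training, it can help to confirm that every sequence folder matches the layout above. The script below is an optional sketch and is not part of the repository: the helper names are hypothetical, and the expected entries are copied from the tree, so adjust them if your copy of a dataset differs.

```python
from pathlib import Path

# Files and folders each sequence is expected to contain, per the tree above.
GTOT_ENTRIES = ["i", "v", "groundTruth_i.txt", "groundTruth_v.txt",
                "visible_description.txt", "class.txt"]
RGBT_ENTRIES = ["infrared", "visible", "init.txt",
                "visible_description.txt", "class.txt"]
LASHER_TRAIN_ENTRIES = ["infrared", "visible", "init.txt", "visible_description.txt"]

def missing_entries(seq_dir, expected):
    """Return the expected names that are absent from one sequence directory."""
    return [name for name in expected if not (Path(seq_dir) / name).exists()]

def check_split(split_dir, expected):
    """Map each incomplete sequence under `split_dir` to its missing entries."""
    split_dir = Path(split_dir)
    if not split_dir.is_dir():
        return {str(split_dir): ["<directory not found>"]}
    report = {}
    for seq in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        missing = missing_entries(seq, expected)
        if missing:
            report[seq.name] = missing
    return report

def check_data_root(data_root="./data"):
    """Check all four benchmarks under `data_root` and return a per-split report."""
    root = Path(data_root)
    return {
        "GTOT": check_split(root / "GTOT", GTOT_ENTRIES),
        "RGBT210": check_split(root / "RGBT210", RGBT_ENTRIES),
        "RGBT234": check_split(root / "RGBT234", RGBT_ENTRIES),
        "LasHeR/train": check_split(root / "LasHeR" / "train", LASHER_TRAIN_ENTRIES),
        "LasHeR/test": check_split(root / "LasHeR" / "test", RGBT_ENTRIES),
    }

if __name__ == "__main__":
    for split, problems in check_data_root().items():
        status = "ok" if not problems else f"{len(problems)} incomplete"
        print(f"{split}: {status}")
        for seq, missing in problems.items():
            print(f"  {seq}: missing {', '.join(missing)}")
```

An empty report for a split means every sequence folder found there contains all expected entries; it does not validate the contents of the files.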

🔧 Setup & Configuration

Run the following command to initialize local paths:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output

Alternatively, manually edit the path configs:

./lib/train/admin/local.py      # paths for training
./lib/test/evaluation/local.py  # paths for testing

๐Ÿ‹๏ธ Training

1. Download the pretrained backbone:

Download the pretrained model from Baidu Drive (pwd: 3ure) or Google Drive and place it under ./pretrained/.

2. Launch training:

bash train.sh

💡 You can switch between different model variants by modifying the arguments in train.sh.


🚀 Testing

Benchmark Evaluation

1. Download the model:

Download the model from Baidu Drive (pwd: 3ure) or Google Drive and place it under ./output/.

2. Launch testing:

Modify <DATASET_PATH> and <SAVE_PATH> in ./RGBT_workspace/test_rgbt_mgpus.py, then run:

bash test.sh

Evaluation Toolkit

For GTOT / RGBT210 / RGBT234 / LasHeR, please use the official Evaluation Toolkit.


๐Ÿ—‚๏ธ Benchmark Construction

We construct four RGB-T-L benchmarks by introducing high-quality textual descriptions into GTOT, RGBT210, RGBT234, and LasHeR. The annotation pipeline consists of three stages:

1. Initial Description Generation

Download Qwen2.5-VL (pwd: 8swe) and generate initial descriptions:

cd Qwen2.5-VL
python generate_description.py

2. Automatic Refinement

Download Qwen3 (pwd: nqqc) and correct the generated descriptions:

cd Qwen3
python correct_description.py

3. Human Review & Verification

We open-source an annotation review tool to facilitate efficient human verification:

💡 The final annotations are reviewed by our annotation team to correct remaining hallucinations, grammatical errors, and mixed-language content, ensuring high-quality semantic labels.
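
As an illustration of the kind of pre-check that can run before human review, the sketch below flags descriptions that are empty or contain CJK characters (a simple proxy for mixed-language content). It is not the annotation review tool mentioned above; `needs_review`, the regular expression, and the sample sentences are hypothetical, and hallucinations or grammar errors still need a human reader.

```python
import re

# CJK punctuation, CJK Unified Ideographs, and fullwidth forms.
CJK_PATTERN = re.compile(r"[\u3000-\u303f\u4e00-\u9fff\uff00-\uffef]")

def needs_review(description):
    """Return a list of reasons a description should go back to a human reviewer."""
    reasons = []
    text = description.strip()
    if not text:
        reasons.append("empty")
    if CJK_PATTERN.search(text):
        reasons.append("mixed-language")
    # Any other non-ASCII character is flagged separately for a closer look.
    if text and not text.isascii() and "mixed-language" not in reasons:
        reasons.append("non-ascii")
    return reasons

samples = [
    "A black car driving along the road at night.",
    "A pedestrian 行人 walking on the sidewalk.",
    "   ",
]
for s in samples:
    print(repr(s), needs_review(s))
```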

Annotation Statistics

| Dataset | Split | Sequences | Descriptions |
|---|---|---|---|
| LasHeR | Train | 979 | 514,081 |
| LasHeR / GTOT / RGBT210 / RGBT234 | Test | 739 | 739 |
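
As a quick sanity check on these numbers, the snippet below adds up the test sequences and computes the average number of training descriptions per sequence. The per-dataset test sizes are the commonly reported sizes of each benchmark and are an assumption here, since this README only gives the combined total.

```python
# Test-set sizes as commonly reported for each benchmark (assumed here,
# not stated in this README).
test_sequences = {"GTOT": 50, "RGBT210": 210, "RGBT234": 234, "LasHeR": 245}
total_test = sum(test_sequences.values())
print(total_test)  # 739

# Training figures taken from the statistics above.
train_sequences = 979
train_descriptions = 514_081
avg_per_sequence = train_descriptions / train_sequences
print(round(avg_per_sequence, 1))  # 525.1
```

The test split therefore carries exactly one description per sequence, while the training split averages roughly 525 descriptions per sequence.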

Annotation Team

We sincerely thank our annotation team for their dedicated efforts:

Lixin Wang （王利鑫）
Lihong Huang （黄立宏）
Zhongtian Li （李中天）
Zijing Gong （巩子靖）
Yilin Zhang （张翼麟）
Yangyang Liu （刘杨杨）
Yuhang Zhang （张宇航）
Chaoyuan Liu （刘超远）
Jiani Qiu （邱嘉妮）
Qing'en Zhu （祝庆恩）
Junji Wang （汪俊佶）
Xi Zhang （张熙）
Ningxin Hu （胡宁馨）
Ruoshui Qu （曲若水）
Huiyu Luo （罗珲宇）
Jian Shi （史健）
Yue Xiong （熊悦）
Shuyan Tian （田书颜）
Xuanyu Zhang （张暄雨）
Enhui Wang （王恩惠）
Qiwei Yang （杨骐玮）
Kuanxin Shen （沈宽心）
Yakun Huo （霍亚坤）
Haojing Zhou （周皓靖）
Deyu Hong （洪德宇）
Zi Wang （王子）
Xiaowen Wu （吴晓文）
Longquan Shang （尚龙泉）
Tao He （何涛）
Jinxu Zhao （赵金旭）
Yongfeng Lv （吕泳锋）
Weicheng Yi （易韦丞）
Bowen Liu （刘博文）
Xingyu Huang （黄星宇）
Minghe Chen （陈明赫）
Zixin Wu （吴子鑫）
Jiaqi Long （龙佳琪）
Sijia Cui （崔思佳）
Liyong Liu （刘礼勇）

๐Ÿ–ผ๏ธ Poster

CVPR 2026 Poster


๐Ÿ™ Acknowledgements

This repo is based on DUTrack and MUST. We sincerely thank the authors for their excellent works.


📚 Citation

If you find RAGTrack helpful for your research, please consider citing:

@inproceedings{li2026ragtrack,
  title={RAGTrack: Language-aware RGBT Tracking with Retrieval-Augmented Generation},
  author={Li, Hao and Wang, Yuhao and Hao, Wenning and Zhang, Pingping and Wang, Dong and Lu, Huchuan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={28179--28189},
  year={2026}
}

Star โญ this repo if you like our work!