CLDTracker
August 17, 2025
The official implementation of CLDTracker: A Comprehensive Language Description for Visual Tracking.
Accepted at Information Fusion [Paper] [arXiv]
[Models][Raw Results][Comprehensive Bag of Textual Descriptions]
Highlights
:star2: New Comprehensive Language Description for Visual Tracking Framework
CLDTracker is a simple, high-performance Vision-Language (VL) tracker that leverages a comprehensive bag of textual descriptions for robust tracking. It achieves state-of-the-art (SOTA) performance on multiple benchmarks. CLDTracker and the comprehensive bag of textual descriptions can serve as a strong resource for further research.
| Tracker | LaSOT (AUC) | LaSOT EXT (AUC) | TrackingNet (AUC) | TNL2K (AUC) | OTB99-Lang (AUC) | GOT-10K (AO) |
|---|---|---|---|---|---|---|
| CLDTracker | 74.0 | 53.1 | 85.1 | 61.5 | 77.8 | 77.5 |
Install the environment
Option 1: Use Anaconda and the install script (CUDA 11.3)
conda create -n cldtrack python=3.8.5 -y
conda activate cldtrack
bash install_env.sh
Option 2: Use Anaconda and the environment file (CUDA 11.3)
conda env create -f cldtrack_env.yaml
Set project paths
Run the following command to set the paths for this project:
python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output
After running this command, you can also modify the paths by editing these two files:
lib/train/admin/local.py # paths about training
lib/test/evaluation/local.py # paths about testing
Data Preparation
Put the tracking datasets in ./data. It should look like this:
${PROJECT_ROOT}
  -- data
      -- lasot
          |-- airplane
          |-- basketball
          |-- bear
          ...
      -- got10k
          |-- test
          |-- train
          |-- val
      -- coco
          |-- annotations
          |-- images
      -- trackingnet
          |-- TRAIN_0
          |-- TRAIN_1
          ...
          |-- TRAIN_11
          |-- TEST
      -- TNL2K
          |-- TRAIN
          |-- TEST
          ...
      -- LaSOText
          |-- atv
          |-- badminton
          ...
      -- OTB99-Lang
          |-- Basketball
          |-- Biker
          |-- Bird1
          ...
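Before training or testing, it can help to confirm that the folders listed above are actually in place. The following is a minimal sketch, not part of the repository: the helper name `find_missing_dirs` and the `EXPECTED_LAYOUT` table are hypothetical, and only the folder names that appear in the layout above are checked.

```python
from pathlib import Path

# Folder names copied from the layout above. The table and the helper below
# are illustrative assumptions, not code shipped with CLDTracker.
EXPECTED_LAYOUT = {
    "lasot": [],
    "got10k": ["test", "train", "val"],
    "coco": ["annotations", "images"],
    "trackingnet": ["TEST"],
    "TNL2K": ["TRAIN", "TEST"],
    "LaSOText": [],
    "OTB99-Lang": [],
}


def find_missing_dirs(data_dir):
    """Return the expected dataset folders that do not exist under data_dir."""
    root = Path(data_dir)
    missing = []
    for dataset, subdirs in EXPECTED_LAYOUT.items():
        if not (root / dataset).is_dir():
            # The whole dataset folder is absent; skip its sub-folders.
            missing.append(dataset)
            continue
        for sub in subdirs:
            if not (root / dataset / sub).is_dir():
                missing.append(f"{dataset}/{sub}")
    return missing


if __name__ == "__main__":
    for name in find_missing_dirs("./data"):
        print("missing:", name)
```

An empty result only means the top-level folders exist; it does not verify that the datasets are complete.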
Dataset Download Links:
Training
Download the pre-trained MAE ViT-Base weights and put them under $PROJECT_ROOT$/pretrained_models (different pre-trained models can also be used, see MAE for more details).
python tracking/train.py --script cldtrack --config vitb_384_mae_ce_32x4_ep300 --save_dir ./output --mode single --nproc_per_node 1 --use_wandb 0
# or
python tracking/train.py --script cldtrack --config vitb_384_mae_ce_32x4_ep300 --save_dir ./output --mode multiple --nproc_per_node 4 --use_wandb 0
Replace --config with the desired model config under experiments/cldtrack. We use wandb to record detailed training logs. If you don't want to use wandb, set --use_wandb 0.
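If you launch training from another Python script, the two commands above can be assembled programmatically. This is a hedged sketch: `build_train_command` is a hypothetical helper, the flags are the ones shown above, and passing `--use_wandb 1` to enable logging is an assumption inferred from `--use_wandb 0` disabling it.

```python
import shlex
import sys


def build_train_command(config, num_gpus=1, save_dir="./output", use_wandb=False):
    """Assemble the argument list for tracking/train.py from the flags shown above."""
    # The README uses --mode single with one process and --mode multiple otherwise.
    mode = "single" if num_gpus == 1 else "multiple"
    return [
        sys.executable, "tracking/train.py",
        "--script", "cldtrack",
        "--config", config,
        "--save_dir", save_dir,
        "--mode", mode,
        "--nproc_per_node", str(num_gpus),
        # Assumption: 1 enables wandb logging, 0 disables it.
        "--use_wandb", "1" if use_wandb else "0",
    ]


if __name__ == "__main__":
    cmd = build_train_command("vitb_384_mae_ce_32x4_ep300", num_gpus=4)
    print(shlex.join(cmd))
    # To actually start training from the project root:
    # import subprocess; subprocess.run(cmd, check=True)
```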
Evaluation
Download the model weights from Google Drive
Put the downloaded weights in $PROJECT_ROOT$/output/checkpoints/train/cldtrack
Change the corresponding values in lib/test/evaluation/local.py to the actual paths where the benchmarks are saved
Some testing examples:
- LaSOT or other offline-evaluated benchmarks (modify --dataset correspondingly)
# LaSOT
python tracking/test.py cldtrack vitb_384_mae_ce_32x4_ep300 --dataset lasot --threads 1 --num_gpus 1
python tracking/analysis_results.py # need to modify tracker configs and names
# LaSOText
python tracking/test.py cldtrack vitb_384_mae_ce_32x4_ep300 --dataset lasotextensionsubset --threads 1 --num_gpus 1
python tracking/analysis_results.py # need to modify tracker configs and names
# OTB99-Lang
python tracking/test.py cldtrack vitb_384_mae_ce_32x4_ep300 --dataset otb --threads 1 --num_gpus 1
python tracking/analysis_results.py # need to modify tracker configs and names
# TNL2K
python tracking/test.py cldtrack vitb_384_mae_ce_32x4_ep300 --dataset tnl2k --threads 1 --num_gpus 1
python tracking/analysis_results.py # need to modify tracker configs and names
- GOT10K-test
python tracking/test.py cldtrack vitb_384_mae_ce_32x4_got10k_ep100 --dataset got10k_test --threads 1 --num_gpus 1
python lib/test/utils/transform_got10k.py --tracker_name cldtrack --cfg_name vitb_384_mae_ce_32x4_got10k_ep100
- TrackingNet
python tracking/test.py cldtrack vitb_384_mae_ce_32x4_ep300 --dataset trackingnet --threads 1 --num_gpus 1
python lib/test/utils/transform_trackingnet.py --tracker_name cldtrack --cfg_name vitb_384_mae_ce_32x4_ep300
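To run several benchmarks in sequence, the test commands above can be generated from a small table. The sketch below is illustrative only: `DATASET_CONFIGS` and `build_test_command` are hypothetical names, and the dataset-to-config pairing simply mirrors the examples above (GOT-10k uses its own config, everything else uses the 300-epoch one).

```python
DEFAULT_CONFIG = "vitb_384_mae_ce_32x4_ep300"
GOT10K_CONFIG = "vitb_384_mae_ce_32x4_got10k_ep100"

# Dataset names and configs exactly as they appear in the examples above.
DATASET_CONFIGS = {
    "lasot": DEFAULT_CONFIG,
    "lasotextensionsubset": DEFAULT_CONFIG,
    "otb": DEFAULT_CONFIG,
    "tnl2k": DEFAULT_CONFIG,
    "got10k_test": GOT10K_CONFIG,
    "trackingnet": DEFAULT_CONFIG,
}


def build_test_command(dataset, threads=1, num_gpus=1):
    """Return the tracking/test.py argument list for one benchmark."""
    config = DATASET_CONFIGS[dataset]  # raises KeyError for unknown datasets
    return [
        "python", "tracking/test.py", "cldtrack", config,
        "--dataset", dataset,
        "--threads", str(threads),
        "--num_gpus", str(num_gpus),
    ]


if __name__ == "__main__":
    for dataset in DATASET_CONFIGS:
        print(" ".join(build_test_command(dataset)))
```

The post-processing steps (analysis_results.py, transform_got10k.py, transform_trackingnet.py) still need to be run separately as shown above.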
Visualization or Debug
Visdom is used for visualization.
- Start the visdom server by running visdom.
- With visdom running in one terminal, set --debug 1 during inference in another terminal, e.g.:
# terminal 1
visdom
# terminal 2
python tracking/test.py cldtrack vitb_384_mae_ce_32x4_ep300 --dataset lasot --threads 1 --num_gpus 1 --debug 1
- Open http://localhost:8097 in your browser (replace the IP address and port with those of your server if needed).
- Then you can visualize the results.
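If the browser page does not load, a quick way to tell whether anything is listening on the visdom port is a plain TCP connection test. This is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper, and 8097 is the port from the URL above.

```python
import socket


def is_port_open(host="localhost", port=8097, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on localhost:8097")
    else:
        print("Nothing is listening on localhost:8097; start visdom first")
```

A successful connection only shows that the port is in use; it does not confirm that the process behind it is visdom.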

Test FLOPs and Speed
Note: The speeds reported in our paper were tested on a single GeForce RTX 3080 GPU.
# Profiling vitb_384_mae_ce_32x4_ep300
python tracking/profile_model.py --script cldtrack --config vitb_384_mae_ce_32x4_ep300
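For readers who want to time their own inference loop, speed is commonly reported as average frames per second after a few warm-up iterations. The sketch below illustrates that general idea and is not the logic of tracking/profile_model.py; `measure_fps` and the stand-in workload are hypothetical.

```python
import time


def measure_fps(step, num_frames=100, warmup=10):
    """Call step() once per frame and return the average frames per second."""
    # Warm-up calls are excluded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(num_frames):
        step()
    elapsed = time.perf_counter() - start
    return num_frames / elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with one forward pass of the tracker.
    # When timing a PyTorch model on a GPU, synchronize the device
    # (torch.cuda.synchronize()) inside step so the timing is meaningful.
    fps = measure_fps(lambda: sum(range(10_000)))
    print(f"{fps:.1f} FPS")
```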
Acknowledgments
- Thanks to the OSTrack, CiteTracker, STARK, and PyTracking libraries, which helped us quickly implement our ideas.
- We would also like to thank CoCoOp, WaffleCLIP, and CLIP Adapter.
- We use the implementation of the ViT from the Timm repo.
- We thank the above-mentioned works for their valuable contributions to the field and for sharing their work with the community. Their ideas and code have been instrumental in the development of this project, and we are grateful for the opportunity to build upon their work.
Citation
If our work is useful for your research, please consider citing:
@article{alansari2025cldtracker,
title = {CLDTracker: A Comprehensive Language Description for Visual Tracking},
journal = {Information Fusion},
volume = {124},
pages = {103374},
year = {2025},
issn = {1566-2535},
doi = {https://doi.org/10.1016/j.inffus.2025.103374},
url = {https://www.sciencedirect.com/science/article/pii/S1566253525004476},
author = {Mohamad Alansari and Sajid Javed and Iyyakutti Iyappan Ganapathi and Sara Alansari and Muzammal Naseer},
}
@article{alansari2025cldtracker,
title={CLDTracker: A Comprehensive Language Description for Visual Tracking},
author={Alansari, Mohamad and Javed, Sajid and Ganapathi, Iyyakutti Iyappan and Alansari, Sara and Naseer, Muzammal},
journal={arXiv preprint arXiv:2505.23704},
year={2025}
}