(AAAI 2026) ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval
¹School of Software, Shandong University   ²School of Computer Science and Technology, Shandong Jianzhu University
Corresponding author
Accepted by AAAI 2026: An evidence-driven framework tackling both the Composed Video Retrieval (CVR) and Composed Image Retrieval (CIR) tasks.
Introduction
ReTrack is an advanced open-source PyTorch framework designed to improve multi-modal query understanding by calibrating directional bias in composed features. It achieves state-of-the-art (SOTA) performance across both Composed Video Retrieval (CVR) and Composed Image Retrieval (CIR) benchmarks.
News
- [2026-03-20] The official paper is released at AAAI 2026.
- [2026-03-19] Released all training and evaluation code for ReTrack.
- [2025-11-08] Our paper "ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval" has been accepted by AAAI 2026!
Key Features
- Dual-Stream Directional Anchor Calibration: Explicitly identifies and calibrates visual and textual semantic contributions to resolve directional bias in multi-modal composition.
- Reliable Evidence-Driven Alignment: Leverages Dempster-Shafer Theory to evaluate similarity reliability, reducing the uncertainty caused by highly similar retrieval candidates.
- Unified Framework: Built on top of BLIP-2 (via the Salesforce LAVIS library), supporting both video (CVR) and image (CIR) retrieval tasks.
- Modular & Scalable: Managed by Hydra and Lightning Fabric for flexible configuration, easy hyperparameter overrides, and scalable multi-GPU training.
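To make the evidence-driven idea more concrete, here is a minimal sketch of the subjective-logic form of Dempster-Shafer Theory that is common in evidential deep learning. It is an illustration under that assumption, not ReTrack's actual implementation: the function name `evidence_to_opinion` and the mapping from evidence to belief masses are hypothetical.

```python
from typing import List, Sequence, Tuple


def evidence_to_opinion(evidence: Sequence[float]) -> Tuple[List[float], float]:
    """Map non-negative evidence for K candidates to belief masses and an uncertainty mass.

    Assumed (illustrative) formulation:
    alpha_k = e_k + 1, S = sum(alpha_k), b_k = e_k / S, u = K / S,
    so that sum(b_k) + u == 1.
    """
    if not evidence:
        raise ValueError("evidence must contain at least one candidate")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_candidates = len(evidence)
    strength = sum(e + 1.0 for e in evidence)  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = num_candidates / strength
    return beliefs, uncertainty


# Several similar candidates with weak evidence: most mass stays on uncertainty.
weak_beliefs, weak_u = evidence_to_opinion([0.5, 0.4, 0.1])
# One candidate with strong evidence: uncertainty shrinks.
strong_beliefs, strong_u = evidence_to_opinion([20.0, 0.5, 0.1])
print(f"weak evidence   -> uncertainty {weak_u:.3f}")
print(f"strong evidence -> uncertainty {strong_u:.3f}")
```

In this formulation, near-identical candidates with little supporting evidence yield a high uncertainty mass, which is the kind of signal a reliability-aware alignment can use to down-weight ambiguous similarities.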
Architecture
Experiment Results
CVR Task Performance
Table 1. Performance comparison on the test set of the CVR dataset, WebVid-CoVR, in terms of R@k (%). The overall best results are in bold, while the best results among the baselines are underlined.
CIR Task Performance
Table 2. Performance comparison on the CIR datasets, FashionIQ and CIRR, in terms of R@k (%). The overall best results are in bold, while the best results among the baselines are underlined.
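For readers unfamiliar with the metric, R@k is the percentage of queries whose ground-truth target appears among the top k retrieved candidates. The sketch below shows one way to compute it; the function name `recall_at_k`, the input format, and the default values of k are assumptions for illustration and are not taken from the ReTrack code.

```python
from typing import Dict, Sequence


def recall_at_k(
    ranked_ids: Sequence[Sequence[str]],
    target_ids: Sequence[str],
    ks: Sequence[int] = (1, 5, 10),  # illustrative choice of k values
) -> Dict[int, float]:
    """Return R@k in percent: the share of queries whose target is in the top k."""
    if len(ranked_ids) != len(target_ids):
        raise ValueError("one ranking is required per target")
    if not target_ids:
        raise ValueError("at least one query is required")
    results: Dict[int, float] = {}
    for k in ks:
        hits = sum(
            1 for ranking, target in zip(ranked_ids, target_ids) if target in ranking[:k]
        )
        results[k] = 100.0 * hits / len(target_ids)
    return results


# Three toy queries that all share the target "a", ranked differently.
rankings = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
targets = ["a", "a", "a"]
scores = recall_at_k(rankings, targets, ks=(1, 2, 3))
print(scores)
```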
Table of Contents
- Introduction
- News
- Key Features
- Architecture
- Experiment Results
- Quick Start & Installation
- Repository Structure
- Configuration Overview
- Data Preparation
- Training
- Evaluation/Testing
- Output & Checkpoints
- Acknowledgement
- Contact
- Citation
- Support & Contributing
Quick Start & Installation
We recommend using Anaconda to manage your environment, following the CoVR project. Note: This project was developed and tested with Python 3.8 and PyTorch 2.1.0.
# 1. Clone the repository
git clone https://github.com/Lee-zixu/ReTrack.git
cd ReTrack
# 2. Create a virtual environment
conda create -n retrack python=3.8 -y
conda activate retrack
# 3. Install PyTorch (Adjust CUDA version based on your hardware)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# 4. Install other dependencies
pip install -r requirements.txt
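After installation, a quick sanity check can confirm that the interpreter and core packages are visible. The following is a small sketch using only the standard library; the module names `torch`, `hydra`, and `lightning` are assumptions about what `requirements.txt` installs, so adjust them to match your environment.

```python
import importlib.util
import sys
from typing import Dict, Sequence

TESTED_PYTHON = (3, 8)  # Python version stated in this README


def check_environment(
    modules: Sequence[str] = ("torch", "hydra", "lightning"),  # assumed module names
) -> Dict[str, bool]:
    """Report which of the given top-level modules can be found, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


current = sys.version_info[:2]
if current != TESTED_PYTHON:
    print(f"Note: running Python {current[0]}.{current[1]}, tested with 3.8")
for module, available in check_environment().items():
    print(f"{module}: {'found' if available else 'MISSING'}")
```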
Repository Structure
Our codebase is highly modular. Here is a brief overview of the core files and directories:
ReTrack/
├── configs/            # Hydra configuration files (data, model, trainer, etc.)
├── src/                # Source code (dataloaders, model implementations, testing)
├── train_CVR.py        # Training entry point for WebVid-CoVR
├── train_CIR.py        # Training entry point for FashionIQ & CIRR
├── test.py             # Evaluation entry point
└── requirements.txt    # Project dependencies
Configuration Overview
All hyperparameters and paths are managed by Hydra under the configs/ directory. The key configuration groups are:
- configs/data/: Dataset loaders and dataset-specific path definitions.
- configs/model/: Model architecture, checkpoints, optimizers, schedulers, and loss functions.
- configs/trainer/: Lightning Fabric training settings (devices, precision, checkpointing).
- configs/machine/: Hardware/machine settings (batch size, num workers, default root paths).
- configs/test/: Evaluation presets across different test splits.
Data Preparation
By default, the datasets are expected to be placed under a common root directory (e.g., /root/autodl-tmp/data/).
Path Configuration: You must adjust these paths for your local setup. There are two recommended ways to do this:
- Edit YAML directly (Preferred): Modify configs/machine/default.yaml or the specific files in configs/data/*.yaml.
- Override via CLI: Append machine.default.datasets_dir=/path/to/data to your run commands.
1. Composed Video Retrieval (CVR)
Dataset: WebVid-CoVR
Expected directory structure (configs/data/webvid-covr.yaml):
datasets_dir/
└── WebVid-CoVR/
    ├── videos/
    │   ├── 2M/
    │   └── 8M/
    └── annotation/
        ├── webvid2m-covr_train.csv
        ├── webvid8m-covr_val.csv
        └── webvid8m-covr_test.csv
2. Composed Image Retrieval (CIR)
Expected directory structure:
datasets_dir/
├── FashionIQ/
│   ├── captions/
│   │   ├── cap.dress.[train|val|test].json
│   │   └── ...
│   ├── image_splits/
│   │   ├── split.dress.[train|val|test].json
│   │   └── ...
│   ├── dress/
│   ├── shirt/
│   └── toptee/
└── CIRR/
    ├── train/
    ├── dev/
    ├── test1/
    └── cirr/
        ├── captions/
        │   └── cap.rc2.[train|val|test1].json
        └── image_splits/
            └── split.rc2.[train|val|test1].json
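Before launching a run, it can help to verify that the dataset folders match the layouts listed for WebVid-CoVR, FashionIQ, and CIRR. The sketch below checks only the top-level entries named in this section; the helper `find_missing` and the `EXPECTED_LAYOUT` table are hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Dict, List

# Relative paths copied from the directory trees in this section.
EXPECTED_LAYOUT: Dict[str, List[str]] = {
    "WebVid-CoVR": [
        "videos/2M",
        "videos/8M",
        "annotation/webvid2m-covr_train.csv",
        "annotation/webvid8m-covr_val.csv",
        "annotation/webvid8m-covr_test.csv",
    ],
    "FashionIQ": ["captions", "image_splits", "dress", "shirt", "toptee"],
    "CIRR": ["train", "dev", "test1", "cirr/captions", "cirr/image_splits"],
}


def find_missing(datasets_dir: str, dataset: str) -> List[str]:
    """Return the expected relative paths that do not exist under datasets_dir/dataset."""
    root = Path(datasets_dir) / dataset
    return [rel for rel in EXPECTED_LAYOUT[dataset] if not (root / rel).exists()]


# Example: report missing entries for every dataset under a placeholder root.
for name in EXPECTED_LAYOUT:
    missing = find_missing("/path/to/data", name)
    print(f"{name}: {'OK' if not missing else 'missing ' + ', '.join(missing)}")
```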
Training
You can easily override hyperparameters, datasets, and paths directly from the command line using Hydra syntax.
Train CVR Model (WebVid-CoVR)
python train_CVR.py
Train CIR Model (FashionIQ or CIRR)
python train_CIR.py
Before running CIR training, make sure to update the dataset selection in configs/train_CIR.yaml (data and test in defaults) to your target dataset (e.g. fashioniq or cirr). For example:
defaults:
  - data: fashioniq
  - test: fashioniq
or:
defaults:
  - data: cirr
  - test: cirr-all
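As an alternative to editing the YAML, Hydra generally allows config groups and values to be overridden as `key=value` arguments on the command line. The sketch below only assembles such a command from Python; whether `train_CIR.py` accepts these particular group overrides is an assumption based on standard Hydra behavior, and `build_command` is a hypothetical helper.

```python
import shlex
import sys
from typing import List, Mapping


def build_command(script: str, overrides: Mapping[str, str]) -> List[str]:
    """Build `python <script> key=value ...` using Hydra's key=value override syntax."""
    return [sys.executable, script] + [f"{key}={value}" for key, value in overrides.items()]


cmd = build_command(
    "train_CIR.py",
    {
        "data": "cirr",          # config group from configs/data/
        "test": "cirr-all",      # config group from configs/test/
        "machine.default.datasets_dir": "/path/to/data",  # path override named in this README
    },
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the overrides have been confirmed against the actual configs.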
Evaluation / Testing
To evaluate a trained model, use test.py and specify the target benchmark.
python test.py
Make sure to specify the dataset and the path to your trained checkpoint, either via config overrides or by updating the relevant configs/test/*.yaml file.
Output & Checkpoints
Hydra automatically manages your experiment logs and weights.
- Outputs are written to: outputs/<dataset>/<model>/<ckpt>/<experiment>/<run_name>/.
- Checkpoints are saved inside the run directory as ckpt_last.ckpt (or ckpt_<epoch>.ckpt if configured via save_ckpt=all).
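Because run directories are nested several levels deep, locating the newest checkpoint by hand can be tedious. The following sketch searches the output tree for `ckpt_last.ckpt` and returns the most recently modified match; the helper name `latest_checkpoint` is hypothetical, and it assumes the default `outputs/` root described above.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(
    outputs_root: str = "outputs", name: str = "ckpt_last.ckpt"
) -> Optional[Path]:
    """Return the most recently modified file called `name` under `outputs_root`, if any."""
    root = Path(outputs_root)
    if not root.is_dir():
        return None
    candidates = list(root.rglob(name))
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)


ckpt = latest_checkpoint("outputs")
print(ckpt if ckpt is not None else "no checkpoint found")
```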
Acknowledgements
This codebase is built upon several great open-source projects. We thank the authors of:
- CoVR and CoVR-2 for the foundational Composed Video Retrieval baselines and datasets.
- LAVIS for providing robust Vision-Language models like BLIP-2.
Contact
For any questions, issues, or feedback, please open an issue on GitHub or reach out to me at lizixu.cs@gmail.com.
Related Projects
Ecosystem & Other Works from our Team
- TEMA (ACL'26): Paper | Project | Code
- ConeSep (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
- Air-Know (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
- HABIT (AAAI'26): Paper | Project | Code
- INTENT (AAAI'26): Paper | Project | Code
- HUD (ACM MM'25): Paper | Project | Code
- OFFSET (ACM MM'25): Paper | Project | Code
- ENCODER (AAAI'25): Paper | Project | Code
Citation
If you find our work or this code useful in your research, please consider starring the repository or citing our paper. Your support is our greatest motivation!
@inproceedings{ReTrack,
title={ReTrack: Evidence Driven Dual Stream Directional Anchor Calibration Network for Composed Video Retrieval},
author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Huang, Qinlei and Qiu, Guozhi and Fu, Zhiheng and Liu, Meng},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}
Support & Contributing
We welcome all forms of contributions! If you have any questions, ideas, or find a bug, please feel free to:
- Open an Issue for discussions or bug reports.
- Submit a Pull Request to improve the codebase.
License
This project is released under the terms of the LICENSE file included in this repository.







