🎬 (AAAI 2026) ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval

Zixu Li¹, Yupeng Hu¹✉, Zhiwei Chen¹, Qinlei Huang¹, Guozhi Qiu¹, Zhiheng Fu¹, Meng Liu²

¹School of Software, Shandong University
²School of Computer Science and Technology, Shandong Jianzhu University
✉ Corresponding author

Accepted by AAAI 2026: An evidence-driven framework tackling both the 🎬 Composed Video Retrieval (CVR) and 🌁 Composed Image Retrieval (CIR) tasks.

📖 Introduction

ReTrack is an advanced open-source PyTorch framework designed to improve multi-modal query understanding by calibrating directional bias in composed features. It achieves state-of-the-art (SOTA) performance across both Composed Video Retrieval (CVR) and Composed Image Retrieval (CIR) benchmarks.

⬆ Back to top

📢 News

[2026-03-20] 🚀 Official paper is released at AAAI 2026.
[2026-03-19] 🚀 Released all training and evaluation code for ReTrack.
[2025-11-08] 🔥 Our paper "ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval" has been accepted by AAAI 2026!


✨ Key Features

🎯 Dual-Stream Directional Anchor Calibration: Explicitly identifies and calibrates visual and textual semantic contributions to resolve directional bias in multi-modal composition.
⚖️ Reliable Evidence-Driven Alignment: Leverages Dempster-Shafer Theory to evaluate similarity reliability, greatly reducing uncertainty caused by highly similar retrieval candidates.
🧩 Unified Framework: Built on top of BLIP-2 (via the Salesforce LAVIS library), seamlessly supporting both video (CVR) and image (CIR) retrieval tasks.
⚙️ Modular & Scalable: Entirely managed by Hydra and Lightning Fabric for flexible configuration, easy hyperparameter overrides, and scalable multi-GPU training.
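
As a rough illustration of the evidence-driven idea, the sketch below converts a row of query-to-candidate similarity scores into belief masses plus one overall uncertainty mass, using the subjective-logic formulation that is common in evidential deep learning. This is an assumption made for illustration and not ReTrack's actual implementation: the function name `evidence_to_opinion`, the softplus evidence mapping, and the example scores are all hypothetical.

```python
import math
from typing import List, Tuple


def evidence_to_opinion(similarities: List[float]) -> Tuple[List[float], float]:
    """Map similarity scores to belief masses plus an uncertainty mass.

    Assumed formulation (subjective logic over K candidates):
      evidence    e_k = softplus(s_k) >= 0
      strength    S   = sum_k (e_k + 1)
      belief      b_k = e_k / S
      uncertainty u   = K / S
    so that sum_k b_k + u = 1.
    """
    # Numerically stable softplus keeps the evidence non-negative.
    evidence = [max(s, 0.0) + math.log1p(math.exp(-abs(s))) for s in similarities]
    k = len(evidence)
    strength = sum(evidence) + k
    beliefs = [e / strength for e in evidence]
    uncertainty = k / strength
    return beliefs, uncertainty


# Hypothetical scores: strong evidence for one candidate vs. weak evidence overall.
confident_beliefs, confident_u = evidence_to_opinion([9.0, 0.5, 0.2])
ambiguous_beliefs, ambiguous_u = evidence_to_opinion([0.6, 0.5, 0.5])
```

In this formulation the uncertainty mass shrinks as total evidence grows, so the second call (weak, near-identical scores) yields a larger uncertainty than the first.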


🏗️ Architecture

ReTrack architecture

Figure 1. The proposed ReTrack consists of three key modules: (a) Semantic Contribution Disentanglement, (b) Composition Geometry Calibration, and (c) Reliable Evidence-driven Alignment.


🏃‍♂️ Experiment Results

CVR Task Performance

Table 1. Performance comparison on the test set of the CVR dataset WebVid-CoVR in terms of R@k (%). The overall best results are in bold, and the best results among the baselines are underlined.

CIR Task Performance

Table 2. Performance comparison on the CIR datasets FashionIQ and CIRR in terms of R@k (%). The overall best results are in bold, and the best results among the baselines are underlined.
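
For readers unfamiliar with the metric, R@k is the percentage of queries whose ground-truth target appears among the top-k retrieved candidates. The snippet below is a minimal sketch of that standard definition. The function name `recall_at_k`, the toy IDs, and the chosen values of k are hypothetical, and the repository's own evaluation code may differ in detail.

```python
from typing import Dict, Sequence


def recall_at_k(ranked_ids: Sequence[Sequence[str]],
                target_ids: Sequence[str],
                ks: Sequence[int] = (1, 5, 10, 50)) -> Dict[int, float]:
    """Percentage of queries whose target appears in the top-k ranked list."""
    if not target_ids or len(ranked_ids) != len(target_ids):
        raise ValueError("need one non-empty ranked list per query")
    results = {}
    for k in ks:
        hits = sum(1 for ranking, target in zip(ranked_ids, target_ids)
                   if target in ranking[:k])
        results[k] = 100.0 * hits / len(target_ids)
    return results


# Toy example with hypothetical IDs: 3 queries, each with a ranked gallery.
rankings = [["v3", "v1", "v2"], ["v2", "v3", "v1"], ["v1", "v2", "v3"]]
targets = ["v3", "v1", "v1"]
scores = recall_at_k(rankings, targets, ks=(1, 2, 3))
```

Here two of the three queries rank their target first, and the remaining query only finds it at rank 3.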


Table of Contents

Introduction
News
Key Features
Architecture
Experiment Results
Quick Start & Installation
Repository Structure
Configuration Overview
Data Preparation
Training
Evaluation/Testing
Output & Checkpoints
Acknowledgements
Contact
Citation
Support & Contributing

🚀 Quick Start & Installation

We recommend using Anaconda to manage your environment, following the CoVR project. Note: this project was developed and tested with Python 3.8 and PyTorch 2.1.0.

# 1. Clone the repository
git clone https://github.com/Lee-zixu/ReTrack.git
cd ReTrack

# 2. Create a virtual environment
conda create -n retrack python=3.8 -y
conda activate retrack

# 3. Install PyTorch (Adjust CUDA version based on your hardware)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# 4. Install other dependencies
pip install -r requirements.txt


📂 Repository Structure

Our codebase is highly modular. Here is a brief overview of the core files and directories:

ReTrack/
├── configs/               # ⚙️ Hydra configuration files (data, model, trainer, etc.)
├── src/                   # 🧠 Source code (dataloaders, model implementations, testing)
├── train_CVR.py           # ▶️ Training entry point for WebVid-CoVR
├── train_CIR.py           # ▶️ Training entry point for FashionIQ & CIRR
├── test.py                # 🧪 Evaluation entry point
└── requirements.txt       # 📦 Project dependencies


⚙️ Configuration Overview

All hyperparameters and paths are managed by Hydra under the configs/ directory. The key configuration groups are:

configs/data/ — Dataset loaders and dataset-specific path definitions.
configs/model/ — Model architecture, checkpoints, optimizers, schedulers, and loss functions.
configs/trainer/ — Lightning Fabric training settings (devices, precision, checkpointing).
configs/machine/ — Hardware/Machine settings (batch size, num workers, default root paths).
configs/test/ — Evaluation presets across different test splits.


🗃️ Data Preparation

By default, the datasets are expected to be placed under a common root directory (e.g., /root/autodl-tmp/data/).

💡 Path Configuration: You must adjust these paths for your local setup. There are two recommended ways to do this:

Edit YAML directly (Preferred): Modify configs/machine/default.yaml or the specific files in configs/data/*.yaml.

Override via CLI: Append machine.default.datasets_dir=/path/to/data to your run commands.
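
To make the CLI override concrete, the sketch below shows how a dotted `key=value` string maps onto a nested configuration. It is a simplified stand-in written in plain Python, not Hydra's or OmegaConf's implementation: the helper `apply_overrides` and the example config contents are hypothetical, and values are kept as plain strings.

```python
from typing import Any, Dict, List


def apply_overrides(cfg: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'a.b.c=value' overrides in place to a nested dict."""
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for name in parents:
            # Create intermediate levels when they are missing.
            node = node.setdefault(name, {})
        node[leaf] = value
    return cfg


# Hypothetical config contents; only the key path comes from the text above.
config = {"machine": {"default": {"datasets_dir": "/root/autodl-tmp/data/"}}}
apply_overrides(config, ["machine.default.datasets_dir=/path/to/data"])
```

After the call, the nested `datasets_dir` entry holds the overridden path while every other key is left untouched.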

1. Composed Video Retrieval (CVR)

Dataset: WebVid-CoVR

Expected directory structure (configs/data/webvid-covr.yaml):

datasets_dir/
└── WebVid-CoVR/
    ├── videos/
    │   ├── 2M/
    │   └── 8M/
    └── annotation/
        ├── webvid2m-covr_train.csv
        ├── webvid8m-covr_val.csv
        └── webvid8m-covr_test.csv
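
Before launching a long training run, it can help to confirm that the expected files are in place. The following is a small sketch of such a check. The helper `missing_entries` is hypothetical and not part of the repository, and the relative paths are copied from the tree above.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the WebVid-CoVR tree shown above.
WEBVID_COVR_LAYOUT = [
    "WebVid-CoVR/videos/2M",
    "WebVid-CoVR/videos/8M",
    "WebVid-CoVR/annotation/webvid2m-covr_train.csv",
    "WebVid-CoVR/annotation/webvid8m-covr_val.csv",
    "WebVid-CoVR/annotation/webvid8m-covr_test.csv",
]


def missing_entries(datasets_dir: str, layout: List[str]) -> List[str]:
    """Return the relative paths from `layout` that do not exist on disk."""
    root = Path(datasets_dir)
    return [rel for rel in layout if not (root / rel).exists()]


if __name__ == "__main__":
    # "/path/to/data" is a placeholder for your own datasets_dir.
    for rel in missing_entries("/path/to/data", WEBVID_COVR_LAYOUT):
        print(f"missing: {rel}")
```

An empty result means every listed directory and annotation file was found under the given root.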

2. Composed Image Retrieval (CIR)

Datasets: FashionIQ and CIRR

Expected directory structure:

datasets_dir/
├── FashionIQ/
│   ├── captions/
│   │   ├── cap.dress.[train|val|test].json
│   │   └── ...
│   ├── image_splits/
│   │   ├── split.dress.[train|val|test].json
│   │   └── ...
│   ├── dress/
│   ├── shirt/
│   └── toptee/
└── CIRR/
    ├── train/
    ├── dev/
    ├── test1/
    └── cirr/
        ├── captions/
        │   └── cap.rc2.[train|val|test1].json
        └── image_splits/
            └── split.rc2.[train|val|test1].json
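
The bracket notation in the tree above (for example `[train|val|test]`) is read here as shorthand for one file per alternative, not as a literal filename. Under that assumption, the sketch below expands a pattern into the concrete names. The helper `expand_pattern` is hypothetical.

```python
import re
from typing import List

_ALTERNATIVES = re.compile(r"\[([^\[\]]+)\]")


def expand_pattern(pattern: str) -> List[str]:
    """Expand each '[a|b|c]' group in `pattern` into concrete names."""
    match = _ALTERNATIVES.search(pattern)
    if match is None:
        return [pattern]
    names = []
    for option in match.group(1).split("|"):
        expanded = pattern[:match.start()] + option + pattern[match.end():]
        # Recurse in case the pattern contains further groups.
        names.extend(expand_pattern(expanded))
    return names


fashioniq_captions = expand_pattern("cap.dress.[train|val|test].json")
cirr_splits = expand_pattern("split.rc2.[train|val|test1].json")
```

The resulting lists can be fed to any existence check to verify that all split files are present.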


▶️ Training

You can easily override hyperparameters, datasets, and paths directly from the command line using Hydra syntax.

Train CVR Model (WebVid-CoVR)

python train_CVR.py

Train CIR Model (FashionIQ or CIRR)

python train_CIR.py

⚠️ Before running CIR training, make sure to update the dataset selection in configs/train_CIR.yaml (data and test in defaults) to your target dataset (e.g. fashioniq or cirr).

For example:
defaults:
  - data: fashioniq
  - test: fashioniq
or:
defaults:
  - data: cirr
  - test: cirr-all


🧪 Evaluation / Testing

To evaluate a trained model, use test.py and specify the target benchmark.

python test.py

Make sure to specify the dataset and the path to your trained checkpoint, either via config overrides or by updating the relevant configs/test/*.yaml file.


📌 Output & Checkpoints

Hydra automatically manages your experiment logs and weights.

Outputs are systematically written to: outputs/<dataset>/<model>/<ckpt>/<experiment>/<run_name>/.
Checkpoints are saved inside the run directory as ckpt_last.ckpt (or ckpt_<epoch>.ckpt if configured via save_ckpt=all).
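
To locate a checkpoint programmatically, the path can be assembled from the pattern above. This is a minimal sketch: the helper `checkpoint_path` and all argument values are hypothetical, and the exact formatting of the epoch number in `ckpt_<epoch>.ckpt` (for example, zero-padding) is an assumption.

```python
from pathlib import Path
from typing import Optional


def checkpoint_path(dataset: str, model: str, ckpt: str, experiment: str,
                    run_name: str, epoch: Optional[int] = None,
                    root: str = "outputs") -> Path:
    """Build the checkpoint path implied by the output layout described above."""
    run_dir = Path(root) / dataset / model / ckpt / experiment / run_name
    filename = "ckpt_last.ckpt" if epoch is None else f"ckpt_{epoch}.ckpt"
    return run_dir / filename


# All argument values below are hypothetical placeholders.
last = checkpoint_path("cirr", "my_model", "my_init", "my_exp", "run0")
epoch3 = checkpoint_path("cirr", "my_model", "my_init", "my_exp", "run0", epoch=3)
```

The returned path can then be supplied to test.py through whichever checkpoint setting your configs/test/*.yaml file exposes.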


🤝 Acknowledgements

This codebase is built upon several great open-source projects. We thank the authors of:

CoVR and CoVR-2 for the foundational Composed Video Retrieval baselines and datasets.
LAVIS for providing robust Vision-Language models like BLIP-2.


✉️ Contact

For any questions, issues, or feedback, please open an issue on GitHub or reach out to me at lizixu.cs@gmail.com.


Ecosystem & Other Works from our Team

TEMA (ACL'26): Paper | Project | Code
ConeSep (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
Air-Know (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
HABIT (AAAI'26): Paper | Project | Code
INTENT (AAAI'26): Paper | Project | Code
HUD (ACM MM'25): Paper | Project | Code
OFFSET (ACM MM'25): Paper | Project | Code
ENCODER (AAAI'25): Paper | Project | Code

📝⭐️ Citation

If you find our work or this code useful in your research, please consider leaving a Star⭐️ or Citing📝 our paper 🥰. Your support is our greatest motivation!

@inproceedings{ReTrack,
  title={ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval},
  author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Huang, Qinlei and Qiu, Guozhi and Fu, Zhiheng and Liu, Meng},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}


🫡 Support & Contributing

We welcome all forms of contributions! If you have any questions, ideas, or find a bug, please feel free to:

Open an Issue for discussions or bug reports.
Submit a Pull Request to improve the codebase.


📄 License

This project is released under the terms of the LICENSE file included in this repository.
