TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection (AAAI 2024 Paper)

February 22, 2025

by Hao Sun*¹, Mingyao Zhou*¹, Wenjing Chen†², Wei Xie†¹

¹ Central China Normal University, ² Hubei University of Technology, * Equal contribution, † Corresponding authors.

Prerequisites

0. Clone this repository

git clone https://github.com/mingyao1120/TR-DETR.git
cd TR-DETR

1. Prepare datasets

If any dataset link becomes invalid, you can refer to Hugging Face for alternative resources.

QVHighlights

Download the official feature files for the QVHighlights dataset from Moment-DETR.

Download moment_detr_features.tar.gz (8GB) and extract it under the ../features directory.
You can modify the data directory by changing the feat_root parameter in the shell scripts located in the tr_detr/scripts/ directory.

tar -xf path/to/moment_detr_features.tar.gz
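
Run as written, the tar command above extracts into the current directory. The sketch below extracts into ../features instead. extract_features is a hypothetical helper, not part of this repository, and it assumes the archive does not already wrap its contents in a top-level folder of its own (list the archive with tar -tf first to check).

```shell
# Hypothetical helper (not part of TR-DETR): extract a feature archive
# into a target directory, creating the directory if needed.
extract_features() {
    archive="$1"
    dest="${2:-../features}"   # default mirrors the location named above
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$dest" && tar -xf "$archive" -C "$dest"
}
```

Example call: extract_features path/to/moment_detr_features.tar.gz ../features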

TVSum

Download the feature files for the TVSum dataset from UMT.

Download TVSum (69.1MB) and either extract it under the ../features/tvsum/ directory or modify the feat_root parameter in the TVSum shell scripts located in the tr_detr/scripts/tvsum/ directory.
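
If the features live elsewhere, the feat_root change can be scripted rather than edited by hand. The sketch below assumes each script defines the parameter on a line that starts with feat_root= (this layout is an assumption, so check the script first). set_feat_root is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: rewrite the feat_root assignment in a script.
# Assumes a line of the form `feat_root=...`; a new path containing
# `|`, `&`, or a backslash would need escaping for sed.
set_feat_root() {
    script="$1"
    new_root="$2"
    grep -q '^feat_root=' "$script" || {
        echo "no feat_root= line found in $script" >&2
        return 1
    }
    tmp="$(mktemp)" || return 1
    # Write to a temp file, then copy back so file permissions are preserved.
    sed "s|^feat_root=.*|feat_root=${new_root}|" "$script" > "$tmp" &&
        cat "$tmp" > "$script"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Example call (the target path is a placeholder): set_feat_root tr_detr/scripts/tvsum/train_tvsum.sh /data/features/tvsum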

2. Install dependencies

Python version 3.7 is required. Install dependencies using:

pip install -r requirements.txt

Note: requirements.txt includes some libraries that may not be needed; these will be cleaned up in future updates. For an Anaconda setup, refer to the official Moment-DETR GitHub repository.
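
Because Python 3.7 is required, it can help to check the interpreter before installing. A minimal sketch follows. check_python37 is a hypothetical helper, and the interpreter name passed to it is whatever your environment provides.

```shell
# Hypothetical helper: succeed only if the given interpreter reports 3.7.
check_python37() {
    py="${1:-python}"
    version="$("$py" -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null)" || {
        echo "could not run $py" >&2
        return 2
    }
    if [ "$version" != "3.7" ]; then
        echo "found Python $version, expected 3.7" >&2
        return 1
    fi
}
```

Example call: check_python37 python && pip install -r requirements.txt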

QVHighlights

Training

You can train the model using only video features or both video and audio features:

bash tr_detr/scripts/train.sh   # Only video
bash tr_detr/scripts/train_audio.sh   # Video + audio

The best validation accuracy is achieved at the last epoch.

Inference, Evaluation, and CodaLab Submission

After training, you can generate hl_val_submission.jsonl and hl_test_submission.jsonl for validation and test sets by running:

bash tr_detr/scripts/inference.sh results/{direc}/model_best.ckpt 'val'
bash tr_detr/scripts/inference.sh results/{direc}/model_best.ckpt 'test'

Replace {direc} with the name of the directory under results/ that holds your saved checkpoint. For more details on submission, see standalone_eval/README.md.
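
If several runs exist under results/, the most recent checkpoint can be located automatically. The sketch below assumes, following the commands above, that each run directory holds a model_best.ckpt. latest_checkpoint is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: print the most recently modified model_best.ckpt
# found one level below the given results directory (empty if none).
latest_checkpoint() {
    results_dir="${1:-results}"
    # ls -t sorts by modification time, newest first.
    ls -t "$results_dir"/*/model_best.ckpt 2>/dev/null | head -n 1
}
```

Example call: bash tr_detr/scripts/inference.sh "$(latest_checkpoint results)" 'val'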

TVSum

Training

Similar to QVHighlights, you can train the model on the TVSum dataset:

bash tr_detr/scripts/tvsum/train_tvsum.sh   # Only video
bash tr_detr/scripts/tvsum/train_tvsum_audio.sh   # Video + audio

The best results are saved in results_[domain_name]/best_metric.jsonl.
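
To view the per-domain results together, a loop over the results_[domain_name] directories is enough. The sketch below only prints the last line of each best_metric.jsonl and makes no assumption about the JSON fields inside. summarize_tvsum and its directory argument are hypothetical.

```shell
# Hypothetical helper: print "<file><TAB><last line>" for every
# results_*/best_metric.jsonl under the given directory (default: cwd).
summarize_tvsum() {
    base="${1:-.}"
    for f in "$base"/results_*/best_metric.jsonl; do
        [ -f "$f" ] || continue   # skip the unexpanded glob when nothing matches
        printf '%s\t%s\n' "$f" "$(tail -n 1 "$f")"
    done
}
```

Example call, from the directory that contains the results_[domain_name] folders: summarize_tvsum .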

Citation

If you find this repository useful, please cite our work:

@inproceedings{sun_zhou2024tr,
  title={{TR-DETR}: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection},
  author={Sun, Hao and Zhou, Mingyao and Chen, Wenjing and Xie, Wei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={5},
  pages={4998--5007},
  year={2024}
}

License

The annotation files and parts of the implementation are borrowed from Moment-DETR and QD-DETR. Consequently, our code is also released under the MIT License.