TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection (AAAI 2024 Paper)
February 22, 2025
by Hao Sun* 1, Mingyao Zhou* 1, Wenjing Chen†2, Wei Xie†1
1 Central China Normal University, 2 Hubei University of Technology, * Equal Contribution, † Corresponding authors.
[Paper]
Prerequisites
0. Clone this repository
git clone https://github.com/mingyao1120/TR-DETR.git
cd TR-DETR
1. Prepare datasets
If any dataset link becomes invalid, you can refer to Hugging Face for alternative resources.
QVHighlights
Download the official feature files for the QVHighlights dataset from Moment-DETR.
- Download moment_detr_features.tar.gz (8GB) and extract it under the ../features directory.
- You can modify the data directory by changing the feat_root parameter in the shell scripts located in the tr_detr/scripts/ directory.
tar -xf path/to/moment_detr_features.tar.gz
TVSum
Download the feature files for the TVSum dataset from UMT.
- Download TVSum (69.1MB) and either extract it under the ../features/tvsum/ directory or modify the feat_root parameter in the TVSum shell scripts located in the tr_detr/scripts/tvsum/ directory.
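Before launching training, it can help to confirm that the extracted features ended up where feat_root points. The sketch below is not part of this repository: summarize_feature_root is a hypothetical helper, and it makes no assumption about the names of the subdirectories inside the archives. It only counts files under each immediate subdirectory.

```python
from pathlib import Path


def summarize_feature_root(feat_root):
    """Map each immediate subdirectory of feat_root to its file count."""
    root = Path(feat_root)
    if not root.is_dir():
        raise FileNotFoundError(f"feature root not found: {root}")
    summary = {}
    for child in sorted(root.iterdir()):
        if child.is_dir():
            # Count regular files at any depth below this subdirectory.
            summary[child.name] = sum(1 for p in child.rglob("*") if p.is_file())
    return summary
```

For example, calling summarize_feature_root("../features") from the repository root returns a dictionary of subdirectory names and file counts. An empty dictionary or a FileNotFoundError suggests the archive was extracted somewhere else.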
2. Install dependencies
Python version 3.7 is required. Install dependencies using:
pip install -r requirements.txt
Note: The requirements.txt includes additional libraries that may not be required. These will be cleaned up in future updates. For Anaconda setup, refer to the official Moment-DETR GitHub.
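Since Python 3.7 is stated as a requirement, a quick interpreter check before running pip can save a failed install. This is a minimal sketch and not part of the repository; python_version_ok and REQUIRED are hypothetical names introduced here for illustration.

```python
import sys

REQUIRED = (3, 7)  # major.minor version stated in this README


def python_version_ok(version_info=None, required=REQUIRED):
    """Return True when the interpreter's major.minor equals `required`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    found = "{}.{}".format(*sys.version_info[:2])
    if python_version_ok():
        print(f"Python {found} matches the required 3.7")
    else:
        print(f"Python {found} found, but this README asks for 3.7")
```

The check compares only the major and minor components, so any 3.7.x patch release passes.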
QVHighlights
Training
You can train the model using only video features or both video and audio features:
bash tr_detr/scripts/train.sh # Only video
bash tr_detr/scripts/train_audio.sh # Video + audio
The best validation accuracy is achieved at the last epoch.
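The inference commands in the next section expect a checkpoint at results/{direc}/model_best.ckpt. If several runs exist, a small helper can list the candidates. The sketch below assumes only that layout. find_checkpoints is a hypothetical name and is not provided by the repository.

```python
from pathlib import Path


def find_checkpoints(results_root="results", name="model_best.ckpt"):
    """Return paths matching results_root/<run>/name, most recently modified first."""
    root = Path(results_root)
    hits = [p for p in root.glob(f"*/{name}") if p.is_file()]
    return sorted(hits, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for ckpt in find_checkpoints():
        # The parent directory name is the value to substitute for {direc}.
        print(ckpt.parent.name, ckpt)
```

Sorting by modification time puts the most recent run first, which is usually the one to evaluate.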
Inference Evaluation and Codalab Submission
After training, you can generate hl_val_submission.jsonl and hl_test_submission.jsonl for validation and test sets by running:
bash tr_detr/scripts/inference.sh results/{direc}/model_best.ckpt 'val'
bash tr_detr/scripts/inference.sh results/{direc}/model_best.ckpt 'test'
Replace {direc} with the name of the run directory under results/ that contains your saved checkpoint. For more details on submission, see standalone_eval/README.md.
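Before uploading, a quick look at the generated .jsonl files can catch an empty or truncated output. The sketch below assumes only that each non-blank line is one JSON object, as the .jsonl extension implies. It does not assume any particular field names, and load_jsonl and summarize_submission are hypothetical helpers rather than repository functions.

```python
import json


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize_submission(path):
    """Return (number of records, sorted union of top-level keys)."""
    records = load_jsonl(path)
    keys = sorted({key for record in records for key in record})
    return len(records), keys
```

Calling summarize_submission on hl_val_submission.jsonl gives the number of predictions and the fields present, which can be compared against the format described in standalone_eval/README.md.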
TVSum
Training
Similar to QVHighlights, you can train the model on the TVSum dataset:
bash tr_detr/scripts/tvsum/train_tvsum.sh # Only video
bash tr_detr/scripts/tvsum/train_tvsum_audio.sh # Video + audio
The best results are saved in results_[domain_name]/best_metric.jsonl.
Citation
If you find this repository useful, please cite our work:
@inproceedings{sun_zhou2024tr,
title={Tr-detr: Task-reciprocal transformer for joint moment retrieval and highlight detection},
author={Sun, Hao and Zhou, Mingyao and Chen, Wenjing and Xie, Wei},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={5},
pages={4998--5007},
year={2024}
}
License
The annotation files and parts of the implementation are borrowed from Moment-DETR and QD-DETR. Consequently, our code is also released under the MIT License.