FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding
April 17, 2025
This repository is the official implementation of the paper FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding (WACV 2025).
Zhuo Cao, Bingqing Zhang, Heming Du, Xin Yu, Xue Li, Sen Wang
The University of Queensland, Australia
Preparation | Training | Inference and Evaluation | Model Zoo

Preparation
- Set up the environment for running the experiments.
  - Clone this repository.
    git clone https://github.com/Zhuo-Cao/FlashVTG.git
  - Install the packages we used for training. Python 3.12.2 is required to reproduce our results.
    pip install -r requirements.txt
- Download the datasets.
  - To download QVHighlights and the other datasets, please follow the instructions of CGDETR.
  - Features extracted by InternVideo2 can be downloaded from Hugging Face.
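Since the setup step names a specific Python version, it can help to check the interpreter before installing the requirements. The following is a minimal sketch and not part of the repository: the function name `python_matches` is hypothetical, and treating the requirement as an exact match on 3.12.2 is an assumption (other 3.12 patch releases may work as well).

```python
import sys

# Version named in this README as required for reproduction.
REQUIRED = (3, 12, 2)


def python_matches(required=REQUIRED, current=None):
    """Return True if `current` equals `required`.

    `current` defaults to the running interpreter's (major, minor, micro).
    Exact matching is an assumption made for this sketch.
    """
    if current is None:
        current = tuple(sys.version_info[:3])
    return tuple(current) == tuple(required)


if __name__ == "__main__":
    found = ".".join(map(str, sys.version_info[:3]))
    wanted = ".".join(map(str, REQUIRED))
    if python_matches():
        print(f"Python {found} matches the version used for training.")
    else:
        print(f"Warning: running Python {found}, but this README asks for {wanted}.")
```

Running the script inside the environment you plan to train in prints either a confirmation or a warning; it never aborts, so it is safe to call from a setup script.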
Training
We provide training scripts for all datasets in the FlashVTG/scripts/ directory.
QVHighlights
For InternVideo2 features:
bash FlashVTG/scripts/qv_internvideo2/train.sh
For SlowFast+CLIP features:
bash FlashVTG/scripts/train_qv_slowclip.sh
Charades-STA
For InternVideo2 features:
bash FlashVTG/scripts/charades_sta_internvideo2/train.sh
For VGG features:
bash FlashVTG/scripts/charades_sta/train_vgg.sh
TACoS
bash FlashVTG/scripts/tacos/train.sh
TVSum
bash FlashVTG/scripts/tvsum/train.sh
YouTube-HL
bash FlashVTG/scripts/youtube_uni/train.sh
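When launching several runs from Python (for example from a job scheduler), the commands above can be collected into a lookup table. This is a hedged sketch: the script paths are copied from this README, while the dictionary keys and the helper name `train_command` are labels invented for the example and do not exist in the repository.

```python
import shlex

# Script paths copied from the commands listed in this README.
# The (dataset, feature) keys are hypothetical labels chosen for this sketch.
TRAIN_SCRIPTS = {
    ("qvhighlights", "internvideo2"): "FlashVTG/scripts/qv_internvideo2/train.sh",
    ("qvhighlights", "slowfast_clip"): "FlashVTG/scripts/train_qv_slowclip.sh",
    ("charades_sta", "internvideo2"): "FlashVTG/scripts/charades_sta_internvideo2/train.sh",
    ("charades_sta", "vgg"): "FlashVTG/scripts/charades_sta/train_vgg.sh",
    ("tacos", None): "FlashVTG/scripts/tacos/train.sh",
    ("tvsum", None): "FlashVTG/scripts/tvsum/train.sh",
    ("youtube_hl", None): "FlashVTG/scripts/youtube_uni/train.sh",
}


def train_command(dataset, feature=None):
    """Return the argument list that launches training for a dataset/feature pair."""
    key = (dataset.lower(), feature.lower() if feature else None)
    if key not in TRAIN_SCRIPTS:
        raise KeyError(f"no training script listed for {key}")
    return ["bash", TRAIN_SCRIPTS[key]]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(train_command("qvhighlights", "internvideo2")))
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root if you want to execute the script rather than print it.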
Inference and Evaluation
Use inference.sh to run inference. Hint: use data/MR.py for the Moment Retrieval task and data/HD.py for the Highlight Detection task. The following example shows how to use inference.sh.
bash FlashVTG/scripts/inference.sh data/MR.py results/QVHihlights_IV2/model_best.ckpt 'val'
For the QVHighlights test set, you can run the evaluation on CodaLab. For more details, see standalone_eval/README.md.
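The sample command passes three positional arguments to inference.sh: the task config, the checkpoint path, and the split. A small helper can assemble that command for either task. This is a sketch under assumptions: the argument order is inferred from the sample above, the helper name `inference_command` and the `"MR"`/`"HD"` keys are hypothetical, and the split is passed through unchecked because only `'val'` appears in this README.

```python
import shlex

INFERENCE_SCRIPT = "FlashVTG/scripts/inference.sh"

# Config files named in this README; the short keys are labels for this sketch.
TASK_CONFIGS = {
    "MR": "data/MR.py",  # Moment Retrieval
    "HD": "data/HD.py",  # Highlight Detection
}


def inference_command(task, checkpoint, split="val"):
    """Build the argument list for inference.sh.

    Argument order (config, checkpoint, split) follows the sample command.
    """
    if task not in TASK_CONFIGS:
        raise ValueError(f"task must be one of {sorted(TASK_CONFIGS)}, got {task!r}")
    return ["bash", INFERENCE_SCRIPT, TASK_CONFIGS[task], checkpoint, split]


if __name__ == "__main__":
    # Checkpoint path taken verbatim from the sample command.
    cmd = inference_command("MR", "results/QVHihlights_IV2/model_best.ckpt")
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if a checkpoint path contains spaces.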
Model Zoo
We provide multiple checkpoints and training logs here. The configuration for each model can be found in its opt.json file.
| Dataset | Model file |
|---|---|
| QVHighlights (SlowFast + CLIP) | checkpoint and training log |
| QVHighlights (InternVideo2) | checkpoint and training log |
| Charades (InternVideo2) | checkpoint and training log |
| Charades (VGG) | checkpoint and training log |
| TACoS | checkpoint and training log |
| TVSum | checkpoint |
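To see how a released configuration differs from your own run, the two opt.json files can be compared key by key. The sketch below assumes each opt.json holds a single flat JSON object, which this README does not confirm; the helper names `load_opt` and `diff_opts` are hypothetical, and no specific configuration keys are assumed.

```python
import json
from pathlib import Path

MISSING = "<missing>"  # placeholder for a key that is absent on one side


def load_opt(path):
    """Read an opt.json file and return its contents (assumed to be a JSON object)."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)


def diff_opts(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {
        k: (a.get(k, MISSING), b.get(k, MISSING))
        for k in keys
        if a.get(k, MISSING) != b.get(k, MISSING)
    }
```

A typical call would be `diff_opts(load_opt(released_path), load_opt(own_path))`, where both paths are placeholders for wherever you stored the downloaded and locally generated files.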
Citation
If you find our work helpful, please cite our paper.
@InProceedings{Cao_2025_WACV,
author = {Cao, Zhuo and Zhang, Bingqing and Du, Heming and Yu, Xin and Li, Xue and Wang, Sen},
title = {FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding},
booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
month = {February},
year = {2025},
pages = {9208-9218}
}
Acknowledgements
This work is supported by the Australian Research Council (ARC) Discovery Project DP230101753, and the code is based on CGDETR.