FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding

April 17, 2025

This repository is the official implementation of the paper FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding (WACV 2025).

Zhuo Cao, Bingqing Zhang, Heming Du, Xin Yu, Xue Li, Sen Wang

The University of Queensland, Australia

Preparation | Training | Inference and Evaluation | Model Zoo

🔨 Preparation

Set up the environment for running the experiments.
- Clone this repository.
```
git clone https://github.com/Zhuo-Cao/FlashVTG.git
```
- Install the packages we used for training. Python 3.12.2 is required to reproduce our results.
  
  pip install -r requirements.txt
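Because the exact interpreter version matters for reproduction, a quick check before installing can catch a mismatch early. The snippet below is a minimal sketch and not part of the repository: the function name `check_python_version` is hypothetical, and the usage lines assume the interpreter is available on the PATH as `python`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): succeeds only if the
# version string in $1 exactly matches the required version in $2.
check_python_version() {
    local found="$1" required="$2"
    if [ "$found" = "$required" ]; then
        return 0
    fi
    echo "Expected Python $required but found '$found'" >&2
    return 1
}

# Usage sketch: assumes the interpreter is on PATH as `python`.
# `python --version` prints e.g. "Python 3.12.2"; keep only the number.
# found="$(python --version 2>&1 | awk '{print $2}')"
# check_python_version "$found" "3.12.2" && pip install -r requirements.txt
```

An exact string match is deliberately strict here, since the requirement names a full patch version.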
Download datasets

To download QVHighlights and the other datasets, please follow the instructions of CGDETR.

Features extracted with InternVideo2 can be downloaded from Hugging Face.

🏋️ Training

We provide training scripts for all datasets in the FlashVTG/scripts/ directory.

QVHighlights

For InternVideo2 features:

bash FlashVTG/scripts/qv_internvideo2/train.sh

For SlowFast+CLIP features:

bash FlashVTG/scripts/train_qv_slowclip.sh

Charades-STA

For InternVideo2 features:

bash FlashVTG/scripts/charades_sta_internvideo2/train.sh

For VGG features:

bash FlashVTG/scripts/charades_sta/train_vgg.sh

TACoS

bash FlashVTG/scripts/tacos/train.sh

TVSum

bash FlashVTG/scripts/tvsum/train.sh

YouTube-HL

bash FlashVTG/scripts/youtube_uni/train.sh
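When switching between datasets often, the script paths above can be gathered into one lookup. This is a minimal sketch and not part of the repository: the function name `train_script_for` and the short dataset keys are hypothetical, while the paths are the ones listed in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): maps a short dataset key
# to the training script path listed above. The keys are illustrative names.
train_script_for() {
    case "$1" in
        qv_internvideo2)       echo "FlashVTG/scripts/qv_internvideo2/train.sh" ;;
        qv_slowclip)           echo "FlashVTG/scripts/train_qv_slowclip.sh" ;;
        charades_internvideo2) echo "FlashVTG/scripts/charades_sta_internvideo2/train.sh" ;;
        charades_vgg)          echo "FlashVTG/scripts/charades_sta/train_vgg.sh" ;;
        tacos)                 echo "FlashVTG/scripts/tacos/train.sh" ;;
        tvsum)                 echo "FlashVTG/scripts/tvsum/train.sh" ;;
        youtube)               echo "FlashVTG/scripts/youtube_uni/train.sh" ;;
        *) echo "Unknown dataset key: $1" >&2; return 1 ;;
    esac
}

# Usage sketch, run from the repository root:
# bash "$(train_script_for tacos)"
```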

🏆 Inference and Evaluation

Use inference.sh to run inference. Hint: use data/MR.py for the Moment Retrieval task and data/HD.py for the Highlight Detection task. The following example shows how to use inference.sh:

bash FlashVTG/scripts/inference.sh data/MR.py results/QVHihlights_IV2/model_best.ckpt 'val'

For the QVHighlights test set, you can run the evaluation on CodaLab. For more details, see standalone_eval/README.md.
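The choice of config file per task can be wrapped in a small helper that also checks the checkpoint path before launching. This is a minimal sketch and not part of the repository: `config_for_task` and `run_inference` are hypothetical names, and the wrapper assumes inference.sh takes the config, checkpoint, and split in the order shown in the example command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): picks the config file
# for a task, following the hint above (MR -> data/MR.py, HD -> data/HD.py).
config_for_task() {
    case "$1" in
        MR) echo "data/MR.py" ;;
        HD) echo "data/HD.py" ;;
        *)  echo "Unknown task: $1 (expected MR or HD)" >&2; return 1 ;;
    esac
}

# Hypothetical wrapper: checks that the checkpoint exists, then calls
# inference.sh with the same three arguments as the example command.
run_inference() {
    local task="$1" ckpt="$2" split="$3" config
    config="$(config_for_task "$task")" || return 1
    if [ ! -f "$ckpt" ]; then
        echo "Checkpoint not found: $ckpt" >&2
        return 1
    fi
    bash FlashVTG/scripts/inference.sh "$config" "$ckpt" "$split"
}

# Usage sketch, run from the repository root:
# run_inference MR results/QVHihlights_IV2/model_best.ckpt val
```

Failing early on a missing checkpoint avoids waiting for the model and data to load before the error surfaces.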

📦 Model Zoo

We provide multiple checkpoints and training logs here. The configuration can be found in each opt.json file.

| Dataset | Model file |
| --- | --- |
| QVHighlights (SlowFast + CLIP) | checkpoint and training log |
| QVHighlights (InternVideo2) | checkpoint and training log |
| Charades (InternVideo2) | checkpoint and training log |
| Charades (VGG) | checkpoint and training log |
| TACoS | checkpoint and training log |
| TVSum | checkpoint |

🎓 Citation

If you find our work helpful, please cite our paper.

@InProceedings{Cao_2025_WACV,
    author    = {Cao, Zhuo and Zhang, Bingqing and Du, Heming and Yu, Xin and Li, Xue and Wang, Sen},
    title     = {FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {9208-9218}
}

Acknowledgements

This work is supported by the Australian Research Council (ARC) Discovery Project DP230101753. The code is based on CGDETR.