Temporal Action Detection

October 8, 2024 ยท View on GitHub

We use the ActionFormer detection pipeline as our baseline method and replace its I3D feature with the feature extracted by VideoMAE V2-g.

DatasetBackboneHeadmAPFeatures
THUMOS14VideoMAE V2-gActionFormer69.6th14_mae_g_16_4.tar.gz
FineActionVideoMAE V2-gActionFormer18.2fineaction_mae_g.tar.gz

Extract feature

Use extract_tad_feature.py to extract the feature of datasets. For example, to extract the feature of THUMOS14, running the following command:

python extract_tad_feature.py \
    --data_set THUMOS14 \
    --data_path YOUR_PATH/thumos14_videos \
    --save_path YOUR_PATH/th14_vit_g_16_4 \
    --model vit_giant_patch14_224 \
    --ckpt_path YOUR_PATH/vit_g_hyrbid_pt_1200e_k710_ft.pth