Progressive Spatio-temporal Perception for Audio-Visual Question Answering (ACMMM'23) [arXiv]
August 11, 2023
PyTorch code for our PSTP-Net.
Guangyao Li, Wenxuan Hou, Di Hu
Requirements
python 3.6+
pytorch 1.6.0
tensorboardX
ffmpeg
numpy
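Before running anything, it can help to confirm that the requirements above are present. The following is a minimal sketch, not part of the repository: the helper name `check_requirements` is hypothetical, and the import names `torch`, `tensorboardX`, and `numpy` are assumed to correspond to the packages listed. It only checks presence and the minimum Python version; it does not verify the PyTorch version.

```python
import importlib.util
import shutil
import sys

# Import names assumed for the Python packages listed under Requirements.
REQUIRED_MODULES = ["torch", "tensorboardX", "numpy"]


def check_requirements(min_python=(3, 6)):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    status = {"python": sys.version_info[:2] >= min_python}
    for name in REQUIRED_MODULES:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    # ffmpeg is a command-line tool, so look for it on PATH instead.
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    for requirement, ok in check_requirements().items():
        print(f"{requirement}: {'ok' if ok else 'MISSING'}")
```

Saved as a standalone script, this prints one line per requirement, so a missing package or a missing `ffmpeg` binary shows up before feature extraction or training starts.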
Usage
- Clone this repo
    git clone https://github.com/GeWu-Lab/PSTP-Net.git
- Download data
    MUSIC-AVQA: https://gewu-lab.github.io/MUSIC-AVQA/
- Feature extraction
    From feat_script/extract_clip_feat, run:
    python extract_patch-level_feat.py
- Training
    python main_train.py \
        --temp_select True --segs 12 --top_k 2 \
        --spat_select True --top_m 25 \
        --a_guided_attn True \
        --global_local True \
        --batch-size 64 --epochs 30 --lr 1e-4 --gpu 0 \
        --checkpoint PSTP_Net \
        --model_save_dir models_pstp
- Testing
    python main_test.py
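The training flags `--segs 12 --top_k 2` and `--top_m 25` suggest that the model keeps only the few temporal segments and spatial patches most relevant to the question. The sketch below illustrates generic top-k selection by cosine similarity under that assumption. It is not the repository's implementation: the function names `cosine` and `select_top_k` and the toy vectors are hypothetical, and the actual scoring in PSTP-Net may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_top_k(query, candidates, k):
    """Return the indices of the k candidates most similar to query, in original order."""
    scores = [cosine(query, c) for c in candidates]
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    # Re-sort the kept indices so the selected items stay in their original order.
    return sorted(ranked[:k])


if __name__ == "__main__":
    question = [1.0, 0.0]  # toy question feature
    segments = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]  # toy segment features
    print(select_top_k(question, segments, k=2))
```

Under this reading, the same routine would apply to temporal segments (with `k` playing the role of `--top_k`) and to spatial patches within the kept segments (with `k` playing the role of `--top_m`).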
Citation
If you find this work useful, please consider citing it.
coming soon!
Acknowledgement
This research was supported by Public Computing Cloud, Renmin University of China.