Video-QAP (NAACL21)
June 8, 2021
Video Question Answering with Phrases via Semantic Roles
Arka Sadhu, Kan Chen, Ram Nevatia
NAACL 2021
Video Question Answering has been studied through the lens of N-way phrase classification. While this eases evaluation, it severely limits the models' application in the wild. Here, we require the model to generate the answer, and we propose a novel evaluation metric using relative scoring and contrastive scoring. We further create two datasets, ActivityNet-SRL-QA and Charades-SRL-QA.
Quick Start
- Clone repo:
git clone https://github.com/TheShadow29/Video-QAP
cd Video-QAP
export ROOT=$(pwd)
- Set up a new conda environment using the provided vidqap_env.yml file. Please refer to Miniconda for details on installing conda.
MINICONDA_ROOT=[to your Miniconda/Anaconda root directory]
conda env create -f vidqap_env.yml --prefix $MINICONDA_ROOT/envs/vidqap_pyt
conda activate vidqap_pyt
- To install fairseq, see the instructions in INSTALL.md.
- To download the datasets ActivityNet-SRL-QA and Charades-SRL-QA, see DATA.md.
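Before moving on, it can help to confirm that the checkout contains the files this README refers to. The following is a minimal sketch, not part of the repository: `missing_paths` and the `EXPECTED` list are hypothetical names, and the list assumes that `configs` sits at the top level of the repository.

```python
import os
from pathlib import Path

# Paths named in this README, relative to $ROOT.
# Assumption: `configs` is a top-level directory; adjust if the layout differs.
EXPECTED = ["vidqap_env.yml", "configs", "vidqa_code/eval_fn_vidqap.py"]


def missing_paths(root, expected=EXPECTED):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Fall back to the current directory when ROOT has not been exported.
    root = os.environ.get("ROOT", os.getcwd())
    missing = missing_paths(root)
    if missing:
        print(f"missing under {root}: {missing}")
    else:
        print("checkout looks complete")
```

Run it from the repository root after `export ROOT=$(pwd)`; an empty result means every listed path was found.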
Training
- Configuration files are inside configs.
- Use one of the models lqa, mtx_qa, butd_qa, vog_qa:
cd $ROOT
python code/main_dist.py "vogqap_asrlqa" --ds_to_use='asrl_qa' --mdl.name='vog_qa' --train.bs=4 --train.epochs=10 --train.lr=1e-4
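To train each of the four listed models with the same settings, the command line can be assembled programmatically. This is a minimal sketch rather than a script shipped with the repository: `build_train_cmd` is a hypothetical helper, it reuses only the flags shown in the training command, and it assumes the first positional argument is a free-form run identifier (the example uses "vogqap_asrlqa").

```python
import shlex

# Model names listed in this README.
MODELS = ["lqa", "mtx_qa", "butd_qa", "vog_qa"]


def build_train_cmd(model, dataset="asrl_qa", bs=4, epochs=10, lr="1e-4", run_id=None):
    """Return the argv list for one training run (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    # Assumption: the run identifier is free-form, so derive one if not given.
    run_id = run_id or f"{model}_{dataset}"
    return [
        "python", "code/main_dist.py", run_id,
        f"--ds_to_use={dataset}",
        f"--mdl.name={model}",
        f"--train.bs={bs}",
        f"--train.epochs={epochs}",
        f"--train.lr={lr}",
    ]


if __name__ == "__main__":
    # Print one shell-ready command per model instead of launching training.
    for name in MODELS:
        print(shlex.join(build_train_cmd(name)))
```

Printing the commands keeps the sketch a dry run; to launch a job, the printed line can be pasted into a shell after `cd $ROOT`.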
Evaluation
- Main evaluation file is vidqa_code/eval_fn_vidqap.py. You can use this as a stand-alone file for a separate dataset as well.
cd $ROOT
python vidqa_code/eval_fn_vidqap.py --pred_file=... --ds_to_use='asrl_qa' --split_type='valid' --met_keys='meteor,rouge,bert_score'
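The abstract mentions relative scoring without spelling it out in this README. As a rough illustration of the general idea only, and not the code in vidqa_code/eval_fn_vidqap.py, a metric can be rescaled so that leaving the answer blank scores 0 and the gold answer scores 1. Everything below is an assumption made for the sketch: the toy `unigram_f1` metric (standing in for meteor, rouge, or bert_score), the `<ANS>` placeholder, and the function names.

```python
from collections import Counter


def unigram_f1(hyp: str, ref: str) -> float:
    """Toy stand-in metric: F1 over lowercase whitespace tokens."""
    h, r = Counter(hyp.lower().split()), Counter(ref.lower().split())
    overlap = sum((h & r).values())  # clipped token matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def relative_score(template: str, pred: str, gold: str, metric=unigram_f1) -> float:
    """Rescale `metric` so a blank answer maps to 0 and the gold answer to 1."""

    def fill(answer: str) -> str:
        # Insert the answer into the query and normalise whitespace.
        return " ".join(template.replace("<ANS>", answer).split())

    ref = fill(gold)
    lower = metric(fill(""), ref)  # score with the answer left blank
    upper = metric(ref, ref)       # score of the gold answer against itself
    if upper == lower:
        return 0.0                 # the answer slot carries no signal
    return (metric(fill(pred), ref) - lower) / (upper - lower)


if __name__ == "__main__":
    template = "a man <ANS> in the park"
    print(relative_score(template, "throws a ball", "throws a frisbee"))
```

The sketch does not clip the result, so a prediction that scores worse than the blank answer yields a negative value. Contrastive scoring is not covered here.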
ToDo:
- Add more documentation on how to run the models
- Add pre-trained model weights.
- Support dataset creation for new caption dataset.
Acknowledgements:
We thank:
- @LuoweiZhou for their codebase on GVD (https://github.com/facebookresearch/grounded-video-description) along with the extracted features for ActivityNet.
- @antoine77340 for their codebase on S3D pretrained on Howto100M (https://github.com/antoine77340/S3D_HowTo100M) used for feature extraction on Charades.
- allennlp for providing the demo and pre-trained model for SRL.
- fairseq for the sequence generation implementation and transformer encoder-decoder models.
Citation
@inproceedings{Sadhu2021VideoQA,
title={Video Question Answering with Phrases via Semantic Roles},
author={Arka Sadhu and Kan Chen and R. Nevatia},
booktitle={NAACL},
year={2021}
}