trainval.md
April 10, 2024
1. Prepare the Pretrained Weights
Although some weights can be downloaded dynamically at runtime, we recommend pre-downloading them to speed up each run.
Pre-trained Image Encoder (EVA ViT-g)
wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth
The path to the image encoder weights can be modified here.
Pre-trained Q-Former and Linear Projection
# InstructBLIP (recommended)
wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/InstructBLIP/instruct_blip_vicuna7b_trimmed.pth
# MiniGPT4
wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_flant5xxl.pth
wget https://huggingface.co/Vision-CAIR/MiniGPT-4/resolve/main/pretrained_minigpt4.pth
The paths to the Q-Former and linear projection weights can be modified via q_former_model and ckpt in each config here.
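To avoid re-downloading weights that are already on disk, the wget commands above can be wrapped in a small helper. The following is a minimal sketch, not part of the repository: `WEIGHT_DIR` and `fetch_if_missing` are hypothetical names, and the download goes to a temporary `.part` file so that an interrupted transfer is never mistaken for a complete checkpoint.

```shell
# Sketch only: WEIGHT_DIR and fetch_if_missing are hypothetical names.
WEIGHT_DIR="${WEIGHT_DIR:-./pretrained}"

# Download URL into WEIGHT_DIR unless a non-empty file of that name exists.
fetch_if_missing() {
    local url="$1"
    local target="${WEIGHT_DIR}/$(basename "$url")"
    mkdir -p "$WEIGHT_DIR"
    if [ -s "$target" ]; then
        echo "skip: $target already exists"
        return 0
    fi
    # Write to a .part file first and rename only after wget succeeds.
    wget -O "${target}.part" "$url" && mv "${target}.part" "$target"
}

# Example call (uses a URL listed above):
# fetch_if_missing https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth
```

The paths in the configs still need to point at whatever directory is chosen for `WEIGHT_DIR`.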
Prepare Vicuna Weights
Please first follow the instructions to prepare Vicuna v1.1 (for InstructBLIP) or Vicuna v1.0 (for MiniGPT4).
Then set llama_model in each config here to the folder that contains the Vicuna weights.
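Because llama_model, q_former_model, and ckpt are set per config, it can help to list every place they currently appear before editing. The sketch below assumes the configs are text files that use a `key: value` or `key = value` layout; that layout and the helper name `show_config_keys` are assumptions, not something the repository defines.

```shell
# Sketch only: show_config_keys is a hypothetical helper.
# Usage: show_config_keys CONFIG_DIR KEY [KEY ...]
# Prints file:line:content for every line that assigns one of the keys.
show_config_keys() {
    local config_dir="$1"
    shift
    local key
    for key in "$@"; do
        grep -rnE "^[[:space:]]*${key}[[:space:]]*[:=]" "$config_dir" \
            || echo "no entry for ${key} under ${config_dir}"
    done
}

# Example call (replace the directory with the actual config folder):
# show_config_keys path/to/config llama_model q_former_model ckpt
```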
2. Training
Data
We follow VideoChat2 to maintain consistency in the format of each instruction dataset. Please follow the source instructions to prepare the videos and annotations for each dataset. Then modify the path for each dataset here.
Please note:
(1) You do not need to prepare all datasets, only those used by the configs you plan to run.
(2) The annotations for videochat11k and videochatgpt100k differ slightly from the source; the modified versions can be found here.
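Since only a subset of the datasets may be prepared, a quick existence check on the paths you configured can catch mistakes before a long training run starts. This is a minimal sketch; `check_paths` is a hypothetical helper and the example paths are placeholders, not the repository's actual layout.

```shell
# Sketch only: check_paths is a hypothetical helper.
# Prints each missing path to stderr and returns non-zero if any is missing.
check_paths() {
    local missing=0
    local p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example call with placeholder paths:
# check_paths /data/videos/my_dataset /data/annotations/my_dataset.json
```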
Running
Please first set the path in the train script to the desired config from the config folder, then run
bash script/train/train.sh
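For long runs it is often convenient to keep the full output of each launch in its own log file. The wrapper below is a sketch under that assumption; `run_logged` and the `logs` directory are hypothetical and not provided by the repository.

```shell
# Sketch only: run_logged is a hypothetical helper.
# Usage: run_logged LOG_DIR COMMAND [ARGS ...]
# Runs COMMAND, stores stdout and stderr in a timestamped log file,
# and returns COMMAND's exit status.
run_logged() {
    local log_dir="$1"
    shift
    mkdir -p "$log_dir"
    local log_file="${log_dir}/run_$(date +%Y%m%d_%H%M%S).log"
    local status=0
    "$@" > "$log_file" 2>&1 || status=$?
    echo "log written to $log_file"
    return "$status"
}

# Example call:
# run_logged logs bash script/train/train.sh
```

While the command runs, the log can be followed from another terminal with `tail -f`.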
3. Inference
MVBench
Please first modify the checkpoint path and annotation path in the test script, then run
bash script/inference/mvbench/test_mvbench.sh
VcgBench
All evaluation scripts can be found here.
For instance, to evaluate the temporal score on the VideoChatGPT benchmark, we first run inference to obtain the prediction results:
bash script/inference/vcgbench/test_temporal.sh
and then execute the corresponding evaluation script to perform benchmarking:
bash script/inference/vcgbench/score_temporal.sh
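The two steps above can be chained so that scoring runs only when inference finishes successfully. This is a small sketch; `infer_then_score` is a hypothetical helper, and it assumes the inference script exits with a non-zero status on failure.

```shell
# Sketch only: infer_then_score is a hypothetical helper.
# Usage: infer_then_score INFERENCE_SCRIPT SCORING_SCRIPT
infer_then_score() {
    local infer_script="$1"
    local score_script="$2"
    if ! bash "$infer_script"; then
        echo "inference failed; skipping scoring" >&2
        return 1
    fi
    bash "$score_script"
}

# Example call:
# infer_then_score script/inference/vcgbench/test_temporal.sh script/inference/vcgbench/score_temporal.sh
```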
VideoQABench
The testing procedure is identical to that of VcgBench; all evaluation scripts can be found here.
For instance, to evaluate the result on MSVD, we first run
bash script/inference/qabench/msvd_qa.sh
and then run
bash script/inference/qabench/score_msvd.sh