IV. Downstream Task-shift (out-of-domain)
Evaluating self-supervised video representation models for the task of action detection and multi-label action classification.
This sub-repo is based on Facebook's official SlowFast repo for short-term action detection on AVA and multi-label action classification on the Charades dataset. We extend it to experiment with a Slow-only R(2+1)D-18 backbone pre-trained on Kinetics-400 via various video self-supervised learning methods.
Setup
We use conda to manage dependencies. If you have not installed anaconda3 or miniconda3, please install it before following the steps below.
- Clone the repository (if not already done):

  ```bash
  git clone https://github.com/fmthoker/SEVERE_BENCHMARK.git
  ```

- Go to the experiment folder for AVA and Charades:

  ```bash
  cd SlowFast-ssl-vssl/
  ```

- Create the conda environment and install dependencies:

  ```bash
  bash setup/create_env.sh slowfast
  ```

  This will create and activate a conda environment called `slowfast`.

  :warning: We use `torch==1.9.0` with CUDA 11.1. If you have a different CUDA version, please use an appropriate PyTorch version; you will need to follow the steps in `setup/create_env.sh` manually to install matching versions of the dependencies.

- Activate the environment for the further steps:

  ```bash
  conda activate slowfast
  ```
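Optionally, as a quick sanity check (not part of the original setup script), you can confirm that PyTorch and CUDA are visible inside the environment:

```bash
# should print the torch version (we use 1.9.0) and True if CUDA is available
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```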
Evaluated VSSL models
- To evaluate video self-supervised pre-training methods used in the paper, you need the pre-trained checkpoints for each method. We assume that these models are downloaded as instructed in the main README.
- Symlink the pre-trained models for initialization. For example, if all your VSSL pre-trained checkpoints are at `../checkpoints_pretraining/`:

  ```bash
  ln -s ../checkpoints_pretraining/ checkpoints_pretraining
  ```
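As an optional check (not part of the original instructions), you can verify that the link resolves and the checkpoints are visible:

```bash
# should list the downloaded VSSL checkpoints rather than report a broken link
ls -l checkpoints_pretraining/
```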
Task: Action Detection
For this task, we use the AVA dataset.
Dataset Preparation: AVA
The data preparation steps for the AVA dataset are quite tedious; each step is listed below.
First, create a symlink to the root dataset folder inside the repo. For example, if you store all your datasets at `/path/to/datasets/`, then:

```bash
# make sure you are inside the `SlowFast-ssl-vssl/` folder in the repo
ln -s /path/to/datasets/ data
```
:hourglass: Overall, the data preparation for AVA takes about 20 hours.
These steps are based on the ones in the original repo.
- Download: This step takes about 3.5 hours.

  ```bash
  cd scripts/prepare-ava/
  bash download_data.sh
  ```

- Cut each video from its 15th to 30th minute: This step takes about 14 hours.

  ```bash
  bash cut_videos.sh
  ```

- Extract frames: This step takes about 1 hour.

  ```bash
  bash extract_frames.sh
  ```

- Download annotations: This step takes about 30 minutes.

  ```bash
  bash download_annotations.sh
  ```
- Set up exception videos that may have failed the first time. In our case, the video `I8j6Xq2B5ys.mp4` failed on the first run; see `scripts/prepare-ava/exception.sh` to re-run the steps for such videos.
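Optionally, you can do a rough sanity check once preparation finishes. The exact sub-folder names produced by the scripts above may differ, so treat this only as a sketch:

```bash
# `data/` is the dataset root symlinked earlier; the prepared AVA folders should now appear here
ls data/
```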
Experiments
First, configure an output folder where all logs and checkpoints will be stored. For example, if you want to store all outputs at `/path/to/outputs/`, symlink it:

```bash
ln -s /path/to/outputs/ outputs
```
We run all our experiments on AVA 2.2. To fine-tune on AVA with an `r2plus1d_18` backbone initialized from Kinetics-400 supervised pre-training, we use the following command(s):

```bash
cfg=configs/AVA/VSSL/32x2_112x112_R18_v2.2_supervised.yaml
bash scripts/jobs/train_on_ava.sh -c $cfg
```
(Optional) W&B logging: If you want to log training curves to Weights & Biases, use the following command:

```bash
bash scripts/jobs/train_on_ava.sh -c $cfg -w True -e <wandb_entity>
```
where you replace <wandb_entity> with your W&B username. Note that you first need to create an account on Weights & Biases and then log in from your terminal via `wandb login`.
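If you have not logged in before, the one-time login from the terminal is:

```bash
# prompts for your W&B API key and stores it locally
wandb login
```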
You can check out other configs for fine-tuning with other video self-supervised methods. The configs for all pre-training methods are provided below:
| Model | Config |
|---|---|
| No pre-training | 32x2_112x112_R18_v2.2_scratch.yaml |
| SeLaVi | 32x2_112x112_R18_v2.2_selavi.yaml |
| MoCo | 32x2_112x112_R18_v2.2_moco.yaml |
| VideoMoCo | 32x2_112x112_R18_v2.2_video_moco.yaml |
| Pretext-Contrast | 32x2_112x112_R18_v2.2_pretext_contrast.yaml |
| RSPNet | 32x2_112x112_R18_v2.2_rspnet.yaml |
| AVID-CMA | 32x2_112x112_R18_v2.2_avid_cma.yaml |
| CtP | 32x2_112x112_R18_v2.2_ctp.yaml |
| TCLR | 32x2_112x112_R18_v2.2_tclr.yaml |
| GDT | 32x2_112x112_R18_v2.2_gdt.yaml |
| Supervised | 32x2_112x112_R18_v2.2_supervised.yaml |
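For example, to fine-tune from CtP pre-training instead, point the same training script at the corresponding config from the table above:

```bash
cfg=configs/AVA/VSSL/32x2_112x112_R18_v2.2_ctp.yaml
bash scripts/jobs/train_on_ava.sh -c $cfg
```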
Training is followed by an evaluation on the test set, so the final numbers are displayed in the logs at the end of the run.
:warning: Note that, on AVA, we train using 4 GPUs (GeForce GTX 1080 Ti, 11 GB each) and a batch size of 32.
:hourglass: Each experiment takes about 8 hours to run on the suggested configuration.
Task: Multi-Label Classification
For this task, we use the Charades dataset.
Dataset Preparation: Charades
:hourglass: Overall, the data preparation for Charades takes about 2 hours.
- Download and unzip the RGB frames:

  ```bash
  cd scripts/prepare-charades/
  bash download_data.sh
  ```

- Download the split files:

  ```bash
  bash download_annotations.sh
  ```
Experiments
First, configure an output folder where all logs and checkpoints will be stored. For example, if you want to store all outputs at `/path/to/outputs/`, symlink it:

```bash
ln -s /path/to/outputs/ outputs
```
To fine-tune on Charades with an `r2plus1d_18` backbone initialized from Kinetics-400 supervised pre-training, we use the following command(s):

```bash
# activate the environment
conda activate slowfast

# make sure you are inside the `SlowFast-ssl-vssl/` folder in the repo
export PYTHONPATH=$PWD

cfg=configs/Charades/VSSL/32x8_112x112_R18_supervised.yaml
bash scripts/jobs/train_on_charades.sh -c $cfg -n 1 -b 16
```
This assumes that you have symlinked the data folders into the repo as described above. Outputs will be saved in the `./outputs/` folder; you can check `./outputs/<expt-folder-name>/logs/train_logs.txt` to see the training progress.
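For example, to follow a running job (the experiment folder name is created per run, so substitute yours):

```bash
# streams the training log mentioned above as it is written
tail -f outputs/<expt-folder-name>/logs/train_logs.txt
```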
For other VSSL models, please check other configs in configs/Charades/VSSL/.
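To see which configs are available:

```bash
ls configs/Charades/VSSL/
```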
:warning: Note that, on Charades, we obtain all our results using 1 GPU (NVIDIA RTX A6000, 48 GB) and a batch size of 16.
:hourglass: Each experiment takes about 8 hours to run on the suggested configuration.
FAQs
- Q: I want to evaluate on the validation set after each training epoch. How do I do that?

  A: In the config file, under the `TRAIN` section, set `EVAL_EPOCH: 1`.