LAVIB

October 9, 2024

Code and scripts for "LAVIB: Large-scale Video Interpolation Benchmark"
To appear in the 38th Annual Conference on Neural Information Processing Systems (NeurIPS) 2024
[project website 🌐] [arXiv preprint 📃] [dataset 🤗]

Download
VFI benchmark
Additional info
- Citation
- License

Download

The dataset and splits are hosted on Hugging Face.

Introduction

The dataset is stored in multiple chunks of 20GB each (lavib00, lavib01, etc.). Splitting the data this way reduces network overhead and allows faster downloads over multiple threads. After downloading, the chunks need to be combined before they can be extracted.
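The combining step can be sketched as follows, assuming the chunks are plain byte-wise splits of a single archive, so that concatenating them in name order restores the original file. The function name combine_chunks and the output file name are hypothetical; check the dataset page for the actual archive format before extracting.

```python
import shutil
from pathlib import Path


def combine_chunks(chunk_dir, prefix, output_path):
    """Concatenate <prefix>00, <prefix>01, ... into one file, in name order."""
    output_path = Path(output_path)
    # Zero-padded chunk names sort correctly with a plain lexicographic sort.
    # The output file is skipped in case it lives in the same folder.
    chunks = sorted(
        p for p in Path(chunk_dir).iterdir()
        if p.is_file()
        and p.name.startswith(prefix)
        and p.resolve() != output_path.resolve()
    )
    if not chunks:
        raise FileNotFoundError(f"no chunks starting with {prefix!r} in {chunk_dir}")
    with open(output_path, "wb") as out:
        for chunk in chunks:
            with open(chunk, "rb") as part:
                # Stream each chunk so a 20GB file is never held in memory.
                shutil.copyfileobj(part, out)
    return chunks
```

For example, combine_chunks("downloads", "lavib", "downloads/lavib_combined") would write one combined file that can then be passed to the appropriate extraction tool.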

Annotations

name is the unique video index from which the clip is obtained.
shot is the index of the extracted 10-second segment from the video.
tmp_crop is the index (1-10) of the 1-second temporal location of the clip.
vrt_crop is the spatial location (1-2) that the tubelet is extracted from. It corresponds to the Y axis.
hrz_crop is the spatial location (1-2) that the tubelet is extracted from. It corresponds to the X axis.

The folders containing videos can be referenced by: <name>_hot<shot>_<tmp_crop>_<vrt_crop>_<hrz_crop>/vid.mp4
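As a minimal sketch of how one annotation row maps to a video path, the helper below fills in the folder pattern given above. The function name clip_path and the root argument are hypothetical, and the pattern string is copied as written here, so it is worth comparing against a downloaded folder name. Whether the indices are zero-padded on disk is not stated, so values are inserted unchanged.

```python
from pathlib import Path

# Folder pattern copied verbatim from the sentence above; verify it against a
# downloaded folder name before relying on it.
FOLDER_TEMPLATE = "{name}_hot{shot}_{tmp_crop}_{vrt_crop}_{hrz_crop}"


def clip_path(root, name, shot, tmp_crop, vrt_crop, hrz_crop,
              template=FOLDER_TEMPLATE):
    """Return the path to vid.mp4 for one annotation row."""
    # Ranges follow the annotation descriptions: tmp_crop 1-10, crops 1-2.
    if not 1 <= int(tmp_crop) <= 10:
        raise ValueError("tmp_crop is documented as 1-10")
    for label, value in (("vrt_crop", vrt_crop), ("hrz_crop", hrz_crop)):
        if int(value) not in (1, 2):
            raise ValueError(f"{label} is documented as 1-2")
    folder = template.format(
        name=name, shot=shot, tmp_crop=tmp_crop,
        vrt_crop=vrt_crop, hrz_crop=hrz_crop,
    )
    return Path(root) / folder / "vid.mp4"
```

Passing a different template string is enough to adapt the helper if the folder names on disk differ from the pattern above.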

The main benchmark splits are train.csv, val.csv, and test.csv.

OOD splits can be loaded from their respective .csv files:

OOD-AFM

train_high_fm.csv, val_high_fm.csv, and test_high_fm.csv
train_low_fm.csv, val_low_fm.csv, and test_low_fm.csv

OOD-ALV

train_high_lv.csv, val_high_lv.csv, and test_high_lv.csv
train_low_lv.csv, val_low_lv.csv, and test_low_lv.csv

OOD-ARMS

train_high_rc.csv, val_high_rc.csv, and test_high_rc.csv
train_low_rc.csv, val_low_rc.csv, and test_low_rc.csv

OOD-APL

train_high_pl.csv, val_high_pl.csv, and test_high_pl.csv
train_low_pl.csv, val_low_pl.csv, and test_low_pl.csv
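The naming scheme of the split files listed above can be captured in a small helper. This is only a sketch: the function name split_csv and the dictionary are hypothetical, and the suffixes are taken directly from the file names in this section.

```python
# Challenge name -> file-name suffix, as listed in this section.
CHALLENGE_SUFFIX = {
    "OOD-AFM": "fm",
    "OOD-ALV": "lv",
    "OOD-ARMS": "rc",
    "OOD-APL": "pl",
}


def split_csv(split, challenge=None, level=None):
    """Return the .csv file name for a main or OOD split."""
    if split not in ("train", "val", "test"):
        raise ValueError(f"unknown split: {split!r}")
    if challenge is None:
        # Main benchmark splits: train.csv, val.csv, test.csv.
        return f"{split}.csv"
    if level not in ("high", "low"):
        raise ValueError("level must be 'high' or 'low' for OOD splits")
    return f"{split}_{level}_{CHALLENGE_SUFFIX[challenge]}.csv"
```

For instance, split_csv("val", "OOD-ARMS", "low") gives val_low_rc.csv, which can then be read with pandas.read_csv or the csv module.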

Script

You can also automatically download data and splits with lavib_downloader.sh.

You can resize video frames during data loading. However, this adds significant overhead to loading and processing times. As an alternative, you can store the videos at a reduced resolution and load them directly. To do this, you can use resize.py, which resizes videos to 540x540.

VFI benchmark

Three codebases have been adapted for the VFI benchmark. General instructions are given below.

Dependencies

The required packages are listed below

torch >= 1.13.0
torchvision >= 0.14.0
numpy >= 1.22.4
pandas >= 1.3.4
sk-video >= 1.1.10
tqdm >= 4.65.0
wget >= 3.3
timm >= 1.0.3
pytorchvideo -> pip install git+https://github.com/facebookresearch/pytorchvideo.git@1fadaef40dd393ca09680f55582399f4679fc9b7
pytorch_msssim >= 1.0.0
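A quick way to see which of these packages are present in the current environment is sketched below using the standard library. The function name installed_versions is hypothetical, and the snippet only reports versions; comparing them against the minimums above is left to the reader.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names as they appear in the list above.
    for name, ver in installed_versions(
        ["torch", "torchvision", "numpy", "pandas", "tqdm", "timm"]
    ).items():
        print(f"{name}: {ver or 'not installed'}")
```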

Running RIFE

Please see the original RIFE repo for more details. To run either training or inference, use VFI/RIFE/train.py.

The following call arguments are added:

root_dir: The folder in which segments_downsampled is stored. If you are using the original video sizes, you can adjust VFI/RIFE/dataset.py to load the segments directly.
eval_only: Integer (0 or 1). If set to 1, only inference will run.
set: The challenge to run. See the argument's choices for the available options.

Example run for training:

python train.py --batch_size 4 --root_dir /media/SCRATCH/LAVIB

Example run for inference only on the high AFM split (high_fm):

python train.py --batch_size 1 --root_dir /media/SCRATCH/LAVIB --eval_only 1 --set high_fm --pretrained ckpt.pth

Running EMA-VFI

Please see the original EMA-VFI repo for more details. For training or inference, use VFI/EMA-VFI/train.py.

The following call arguments are added:

data_path: The folder in which segments_downsampled is stored. If you are using the original video sizes, you can adjust VFI/RIFE/dataset.py to load the segments directly.
eval_only: Integer (0 or 1). If set to 1, only inference will run.
set: The challenge to run. See the argument's choices for the available options.

Example run for training:

python train.py --batch_size 4 --data_path /media/SCRATCH/LAVIB

Example run for inference only on the high AFM split (high_fm):

python train.py --batch_size 1 --data_path /media/SCRATCH/LAVIB --eval_only 1 --set high_fm --pretrained ckpt.pth

Running FLAVR

Please see the original FLAVR repo for more details. For training or inference, use VFI/FLAVR/main.py.

The following call arguments are added:

data_root: The folder in which segments_downsampled is stored. If you are using the original video sizes, you can adjust VFI/RIFE/dataset.py to load the segments directly.
eval_only: Integer (0 or 1). If set to 1, only inference will run.
set: The challenge to run. See the argument's choices for the available options.

Example run for training:

python main.py --batch_size 4 --data_root /media/SCRATCH/LAVIB

Example run for inference only on the high AFM split (high_fm):

python main.py --batch_size 1 --data_root /media/SCRATCH/LAVIB --eval_only 1 --set high_fm --pretrained ckpt.pth

Weights

Main benchmark weights can be found here

OOD challenges weights can be found here

Additional Info

Citation

@inproceedings{stergiou2024lavib,
      title={LAVIB: Large-scale Video Interpolation Benchmark},
      author={Stergiou, Alexandros},
      booktitle={NeurIPS},
      year={2024}
    }

License

CC BY-NC-SA 4.0