LAVIB
October 9, 2024
Code and scripts for "LAVIB: Large-scale Video Interpolation Benchmark"
To appear in the 38th Annual Conference on Neural Information Processing Systems (NeurIPS) 2024
[project website 🌐]
[arXiv preprint 📃]
[dataset 🤗]
Download
The dataset and splits are hosted on Hugging Face.
Introduction
The dataset is stored in multiple chunks of 20GB (lavib00, lavib01, etc.). This avoids network overheads and improves download speeds over multiple threads. After downloading, the files need to be combined before being extracted.
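The combining step can be sketched in Python as below. This is a minimal sketch, assuming the chunks are plain byte-wise pieces of one archive that only need to be concatenated in name order. The function name `combine_chunks` and the output file name are hypothetical, and the archive format is not stated here, so extraction is left out.

```python
import shutil
from pathlib import Path


def combine_chunks(chunk_dir, output_path, prefix="lavib"):
    """Concatenate chunk files (lavib00, lavib01, ...) into one file.

    Assumption: the chunks are byte-wise pieces of a single archive.
    """
    chunk_dir = Path(chunk_dir)
    output_path = Path(output_path)
    # Sort by name so lavib00 precedes lavib01; never treat the output as a chunk.
    chunks = sorted(
        p for p in chunk_dir.glob(f"{prefix}[0-9]*")
        if p.is_file() and p.resolve() != output_path.resolve()
    )
    if not chunks:
        raise FileNotFoundError(f"no '{prefix}NN' chunks found in {chunk_dir}")
    with open(output_path, "wb") as merged:
        for chunk in chunks:
            with open(chunk, "rb") as part:
                # Stream in blocks so a 20GB chunk is never held in memory.
                shutil.copyfileobj(part, merged, length=16 * 1024 * 1024)
    return chunks
```

For example, `combine_chunks("downloads", "downloads/merged_archive")` would write the merged file next to the chunks; both paths are placeholders.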
Annotations
- name is the unique video index from which the clip is obtained.
- shot is the index of the extracted 10-second segment from the video.
- tmp_crop is the index (1-10) of the 1-second temporal location of the clip.
- vrt_crop is the spatial location (1-2) that the tubelet is extracted from. It corresponds to the Y axis.
- hrz_crop is the spatial location (1-2) that the tubelet is extracted from. It corresponds to the X axis.
The folders containing videos can be referenced by: <name>_hot<shot>_<tmp_crop>_<vrt_crop>_<hrz_crop>/vid.mp4
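As a rough illustration, the folder pattern can be turned into paths from the annotation fields. This sketch assumes the split CSVs carry a header row with the five field names listed above. The helper names `clip_path` and `clip_paths_from_csv` and the `root` argument are hypothetical, and the folder template (including the `_hot` infix) is copied verbatim from the pattern above.

```python
import csv
from pathlib import Path


def clip_path(root, name, shot, tmp_crop, vrt_crop, hrz_crop):
    """Build the video path for one annotation row."""
    # Template copied verbatim from the README pattern.
    folder = f"{name}_hot{shot}_{tmp_crop}_{vrt_crop}_{hrz_crop}"
    return Path(root) / folder / "vid.mp4"


def clip_paths_from_csv(csv_file, root):
    """Yield one video path per row of a split CSV.

    Assumption: the CSV header uses the annotation names as column names.
    """
    with open(csv_file, newline="") as handle:
        for row in csv.DictReader(handle):
            yield clip_path(
                root,
                row["name"],
                row["shot"],
                row["tmp_crop"],
                row["vrt_crop"],
                row["hrz_crop"],
            )
```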
The main benchmark splits are train.csv, val.csv, and test.csv.
OOD splits can be loaded from their respective .csv files:
OOD-AFM
- train_high_fm.csv, val_high_fm.csv, and test_high_fm.csv
- train_low_fm.csv, val_low_fm.csv, and test_low_fm.csv
OOD-ALV
- train_high_lv.csv, val_high_lv.csv, and test_high_lv.csv
- train_low_lv.csv, val_low_lv.csv, and test_low_lv.csv
OOD-ARMS
- train_high_rc.csv, val_high_rc.csv, and test_high_rc.csv
- train_low_rc.csv, val_low_rc.csv, and test_low_rc.csv
OOD-APL
- train_high_pl.csv, val_high_pl.csv, and test_high_pl.csv
- train_low_pl.csv, val_low_pl.csv, and test_low_pl.csv
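The naming scheme of the split files is regular enough to resolve programmatically. The sketch below only reproduces the file names listed above. The helper `split_csv` and the `OOD_SUFFIXES` table are hypothetical names introduced for illustration, not part of the repository.

```python
# File-name suffixes as they appear in the lists above.
OOD_SUFFIXES = {
    "OOD-AFM": "fm",
    "OOD-ALV": "lv",
    "OOD-ARMS": "rc",
    "OOD-APL": "pl",
}


def split_csv(split, challenge=None, level=None):
    """Return the CSV file name for a main or OOD split."""
    if split not in ("train", "val", "test"):
        raise ValueError(f"unknown split: {split!r}")
    if challenge is None:
        # Main benchmark: train.csv, val.csv, test.csv.
        return f"{split}.csv"
    if challenge not in OOD_SUFFIXES:
        raise ValueError(f"unknown challenge: {challenge!r}")
    if level not in ("high", "low"):
        raise ValueError(f"level must be 'high' or 'low', got {level!r}")
    return f"{split}_{level}_{OOD_SUFFIXES[challenge]}.csv"
```

For instance, `split_csv("val", "OOD-ALV", "high")` resolves to `val_high_lv.csv`.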
Script
You can also automatically download data and splits with lavib_downloader.sh.
You can resize video frames during data loading. However, this adds significant overhead to loading/processing times. As an alternative, you can store the videos at reduced resolutions and load them directly. To do this, you can use resize.py, which resizes videos to 540x540.
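For readers who want to see what such an offline resizing pass might look like, here is a sketch that is not resize.py itself: it swaps in the ffmpeg command-line tool and its `scale` filter. The directory layout (`segments/<clip>/vid.mp4` mirrored into `segments_downsampled/<clip>/vid.mp4`) is an assumption inferred from the folder names used elsewhere in this README, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def resize_command(src, dst, size=540):
    """Build an ffmpeg call that rescales one video to size x size."""
    return [
        "ffmpeg", "-y",                   # overwrite an existing output
        "-i", str(src),                   # input video
        "-vf", f"scale={size}:{size}",    # resize filter
        str(dst),
    ]


def resize_all(root, size=540, dry_run=True):
    """Collect (and optionally run) resize commands for every clip.

    Assumption: originals live under <root>/segments and the reduced
    copies go under <root>/segments_downsampled.
    """
    root = Path(root)
    commands = []
    for src in sorted((root / "segments").glob("*/vid.mp4")):
        dst = root / "segments_downsampled" / src.parent.name / "vid.mp4"
        cmd = resize_command(src, dst, size)
        commands.append(cmd)
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` (the default) nothing is executed, so the commands can be inspected first.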
VFI benchmark
Three codebases are adjusted for VFI. General instructions are given below.
Dependencies
The required packages are listed below
- torch >= 1.13.0
- torchvision >= 0.14.0
- numpy >= 1.22.4
- pandas >= 1.3.4
- sk-video >= 1.1.10
- tqdm >= 4.65.0
- wget >= 3.3
- timm >= 1.0.3
- pytorchvideo -> pip install git+https://github.com/facebookresearch/pytorchvideo.git@1fadaef40dd393ca09680f55582399f4679fc9b7
- pytorch_msssim >= 1.0.0
Running RIFE
Please see the original RIFE repo for more details.
To run either training or inference use VFI/RIFE/train.py.
The following call arguments are added:
- root_dir: The folder location that segments_downsampled are stored in. If you are using the original sizes of videos you can adjust VFI/RIFE/dataset.py to load the segments directly.
- eval_only: Integer (0-1) for running only inference. If set to 1 then only inference will run.
- set: Definition for the challenge to run; see choices for the available options.
Example run for training:
python train.py --batch_size 4 --root_dir /media/SCRATCH/LAVIB
Example run for inference (only) in high_afm:
python train.py --batch_size 1 --root_dir /media/SCRATCH/LAVIB --eval_only 1 --set high_fm --pretrained ckpt.pth
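To evaluate several OOD challenges in one go, the inference call above can be generated per set. This is only a sketch: the flags are the ones shown in the example, but the list of `--set` values other than high_fm is an assumption that the names mirror the CSV suffixes. Check `choices` in train.py for the actual options. The helper name `inference_commands` is hypothetical.

```python
import shlex

# Assumption: --set values mirror the split CSV suffixes (only high_fm is
# confirmed by the example above).
OOD_SETS = [
    f"{level}_{suffix}"
    for suffix in ("fm", "lv", "rc", "pl")
    for level in ("high", "low")
]


def inference_commands(root_dir, checkpoint, sets=OOD_SETS):
    """Return one inference-only command line per OOD set."""
    commands = []
    for name in sets:
        args = [
            "python", "train.py",
            "--batch_size", "1",
            "--root_dir", root_dir,
            "--eval_only", "1",
            "--set", name,
            "--pretrained", checkpoint,
        ]
        # shlex.join quotes arguments that contain spaces or shell characters.
        commands.append(shlex.join(args))
    return commands
```

Printing the result of `inference_commands("/media/SCRATCH/LAVIB", "ckpt.pth")` gives lines that can be pasted into a shell script.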
Running EMA-VFI
Please see the original EMA-VFI repo for more details.
For training or inference use VFI/EMA-VFI/train.py.
The following call arguments are added:
- data_path: The folder location that segments_downsampled are stored in. If you are using the original sizes of videos you can adjust VFI/RIFE/dataset.py to load the segments directly.
- eval_only: Integer (0-1) for running only inference. If set to 1 then only inference will run.
- set: Definition for the challenge to run; see choices for the available options.
Example run for training:
python train.py --batch_size 4 --data_path /media/SCRATCH/LAVIB
Example run for inference (only) in high_afm:
python train.py --batch_size 1 --data_path /media/SCRATCH/LAVIB --eval_only 1 --set high_fm --pretrained ckpt.pth
Running FLAVR
Please see the original FLAVR repo for more details.
For training or inference use VFI/FLAVR/main.py.
The following call arguments are added:
- data_root: The folder location that segments_downsampled are stored in. If you are using the original sizes of videos you can adjust VFI/RIFE/dataset.py to load the segments directly.
- eval_only: Integer (0-1) for running only inference. If set to 1 then only inference will run.
- set: Definition for the challenge to run; see choices for the available options.
Example run for training:
python main.py --batch_size 4 --data_root /media/SCRATCH/LAVIB
Example run for inference (only) in high_afm:
python main.py --batch_size 1 --data_root /media/SCRATCH/LAVIB --eval_only 1 --set high_fm --pretrained ckpt.pth
Weights
Main benchmark weights can be found here
OOD challenges weights can be found here
Additional Info
Citation
@inproceedings{stergiou2024lavib,
title={LAVIB: Large-scale Video Interpolation Benchmark},
author={Stergiou, Alexandros},
booktitle={NeurIPS},
year={2024}
}
License
CC BY-NC-SA 4.0