SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications
March 2, 2026
Welcome to the official Google DeepMind repository for SciVid, a comprehensive benchmark for evaluating Video Foundation Models (ViFMs) across multiple scientific disciplines.
This repository contains:
- Instructions to download the datasets in a format compatible with our evaluations
- Instructions to evaluate pretrained video models on SciVid (e.g., the Hugging Face VideoMAE-B backbone), easily extensible to evaluate your own models.
- A demo to inspect SciVid benchmark data, models and predictions.
Overview
SciVid comprises five scientific video tasks, combining established and under-explored tasks across medical computer vision, animal behavior understanding and weather forecasting, and covering diverse data distributions and training regimes.
| Dataset | FlyVsFly | CalMS21 | WeatherBench 2 | Digital Typhoon | STIR |
|---|---|---|---|---|---|
| Domain | Fly behaviour | Mice behaviour | Weather | Typhoon satellite images | Surgical tissue |
| Task | Classification | Classification | Forecasting | Central pressure forecasting | Point tracking |
| Training samples | 1M | 27K | 57K | 696 | N/A |
During evaluation, task-specific readouts are attached to each model, which is then trained on each downstream dataset with the backbone either frozen or fine-tuned.
Evaluation overview. For each task, we train a lightweight readout on top of the backbone, which is frozen (❄️) or fine-tuned (🔥).
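As a schematic summary, using symbols introduced here only for illustration (they are not names from the SciVid code): let f_θ be the backbone with pretrained weights θ₀, r_φ the task-specific readout, ℓ the task loss and 𝒟 the downstream dataset. The two regimes then differ only in which parameters are optimized:

$$
\begin{aligned}
\text{frozen:}\quad & \min_{\phi}\ \mathbb{E}_{(x,y)\sim\mathcal{D}}\,\ell\big(r_\phi(f_{\theta_0}(x)),\,y\big)\\
\text{fine-tuned:}\quad & \min_{\theta,\,\phi}\ \mathbb{E}_{(x,y)\sim\mathcal{D}}\,\ell\big(r_\phi(f_{\theta}(x)),\,y\big),\qquad \theta\ \text{initialized at}\ \theta_0
\end{aligned}
$$

In the frozen setting the backbone features f_θ₀(x) never change, so only the lightweight readout is trained; fine-tuning also back-propagates through the backbone, which costs more compute and memory.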
In our paper, we evaluate the capabilities of a large set of ViFMs, establishing strong baselines and demonstrating the potential for effective transfer learning.
Installation
Installation has been tested with CUDA 12.4 and Python 3.10.
Get the code from the GitHub repository
git clone git@github.com:google-deepmind/scivid.git
Create and activate scivid conda environment
After installing Miniconda (if needed), create a conda environment with all the required SciVid dependencies:
conda env create -f scivid/environment.yml
This will create a conda environment named scivid, which you can activate with
conda activate scivid
Setup data
Download data
For the fastest training, download the data from the scivid Google Cloud Storage bucket. We also recommend storing the data on a local SSD if one is available; this is particularly important for training speed on the weatherbench_future_pred task.
export SCIVID_COPY=/path/to/scivid_data_copy # set to the desired path (on ssd if available)
mkdir -p $SCIVID_COPY
gcloud storage rsync --recursive gs://scivid $SCIVID_COPY
Alternatively (slower), you can mount the data using gcsfuse in a separate location by running:
export SCIVID_MOUNT=/path/to/scivid_data_mount # set to *a separate location* from SCIVID_COPY
mkdir -p $SCIVID_MOUNT
gcsfuse --implicit-dirs scivid $SCIVID_MOUNT
In this case, we still recommend downloading the data for the weatherbench_future_pred task with:
mkdir -p $SCIVID_COPY/full/weatherbench
gcloud storage rsync --recursive gs://scivid/full/weatherbench $SCIVID_COPY/full/weatherbench
Usage
Manage accelerator visibility and resources
To define which GPU to use and properly manage the accelerator memory, you will need to set the following environment variables:
export CUDA_VISIBLE_DEVICES=0
export XLA_PYTHON_CLIENT_MEM_FRACTION=.5
export TF_GPU_ALLOCATOR=cuda_malloc_async
What these commands do:
- CUDA_VISIBLE_DEVICES=0 makes only GPU 0 visible to the process and ensures data workers have access to the accelerator when needed.
- XLA_PYTHON_CLIENT_MEM_FRACTION=.5 reduces JAX's GPU memory preallocation from its default of 75% to 50% of total GPU memory, leaving enough memory available for other processes.
- TF_GPU_ALLOCATOR=cuda_malloc_async switches TensorFlow to CUDA's asynchronous allocator, which helps prevent out-of-memory errors caused by memory fragmentation.
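If you switch between GPUs or memory fractions often, the three exports can be wrapped in a small shell function. This is a minimal sketch: scivid_set_accel_env is a hypothetical name, not part of the SciVid repository, and its defaults simply repeat the values above.

```shell
# Hypothetical helper (not part of SciVid): export the accelerator variables
# described above, optionally for another GPU index and JAX memory fraction.
scivid_set_accel_env() {
  local gpu="${1:-0}"        # GPU index to expose (default 0, as above)
  local mem_frac="${2:-.5}"  # fraction of GPU memory JAX preallocates
  export CUDA_VISIBLE_DEVICES="$gpu"
  export XLA_PYTHON_CLIENT_MEM_FRACTION="$mem_frac"
  export TF_GPU_ALLOCATOR=cuda_malloc_async
  echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES" \
       "XLA_PYTHON_CLIENT_MEM_FRACTION=$XLA_PYTHON_CLIENT_MEM_FRACTION" \
       "TF_GPU_ALLOCATOR=$TF_GPU_ALLOCATOR"
}

# Example: same settings as above, but on GPU 1.
# scivid_set_accel_env 1
```

Because the function exports variables, call it directly in the shell you launch training from (for example after sourcing a file that defines it), not in a subshell.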
Increase maximum number of open files
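You might need to increase the maximum number of files that can be open simultaneously to enable parallelized data preprocessing. The limit applies to the current shell session, so run the command in the shell from which you launch training: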
ulimit -n 4096
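In bash, a plain ulimit -n 4096 sets both the soft and the hard limit, so it fails for non-root users whose hard limit is below 4096 and lowers the limit if it is already higher. A minimal sketch of a safer variant, assuming a POSIX-like shell (raise_nofile_limit is a hypothetical name), only ever raises the soft limit, capped at the hard limit:

```shell
# Hypothetical helper: raise the soft open-file limit to at least 4096,
# capped by the hard limit, without ever lowering it.
raise_nofile_limit() {
  local want=4096
  local soft hard
  soft=$(ulimit -Sn)
  hard=$(ulimit -Hn)
  # Nothing to do if the soft limit is already unlimited or high enough.
  if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$want" ]; then
    return 0
  fi
  # Only root can raise the hard limit, so cap the request at it.
  if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$want" ]; then
    echo "warning: hard limit is $hard; raising soft limit to $hard only" >&2
    want="$hard"
  fi
  ulimit -Sn "$want"
}

raise_nofile_limit
echo "open file limit: $(ulimit -Sn)"
```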
Run training
Set the SCIVID_DATA_DIR environment variable to the root of either the copied or the mounted data, depending on which data source you intend to use.
export SCIVID_DATA_DIR=$SCIVID_COPY # or $SCIVID_MOUNT
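A quick pre-flight check can catch an unset or mistyped SCIVID_DATA_DIR before a long run starts. The sketch below is hypothetical (scivid_check_data_dir is not part of SciVid) and assumes the WeatherBench data lives under full/weatherbench relative to the data root, mirroring the gs://scivid/full/weatherbench download path above.

```shell
# Hypothetical pre-flight check (not part of SciVid): verify that
# SCIVID_DATA_DIR points to an existing data root before launching TASK.
scivid_check_data_dir() {
  local task="${1:-}"
  local root="${SCIVID_DATA_DIR:-}"
  if [ -z "$root" ] || [ ! -d "$root" ]; then
    echo "error: SCIVID_DATA_DIR ('$root') is not an existing directory" >&2
    return 1
  fi
  if [ "$task" = "weatherbench_future_pred" ]; then
    # Assumed layout, mirroring gs://scivid/full/weatherbench.
    if [ ! -d "$root/full/weatherbench" ]; then
      echo "error: $root/full/weatherbench not found" >&2
      return 1
    fi
    # Reading through the gcsfuse mount works but is slow for this task.
    if [ -n "${SCIVID_MOUNT:-}" ] && [ "$root" = "$SCIVID_MOUNT" ]; then
      echo "warning: reading WeatherBench data through the gcsfuse mount" >&2
    fi
  fi
  echo "using SCIVID_DATA_DIR=$root"
}

# Example: scivid_check_data_dir weatherbench_future_pred
```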
The following example command trains the task-specific readout on frozen features from the VideoMAE-B backbone for the Fly vs. Fly task (on GPU):
python -m kauldron.main --cfg=scivid/configs/launch_config.py:hf_videomae:flyvsfly_classification --cfg.workdir=/home/${USER}/tmp/exps/flyvsfly_videomae --cfg.aux.platform='cuda' --pdb
For WeatherBench 2 forecasting, we additionally set XLA_FLAGS="--xla_gpu_autotune_level=0", which disables XLA GPU autotuning, to avoid memory errors:
XLA_FLAGS="--xla_gpu_autotune_level=0" python -m kauldron.main --cfg=scivid/configs/launch_config.py:hf_videomae:weatherbench_future_pred --cfg.workdir=/home/${USER}/tmp/exps/weatherbench_videomae --cfg.aux.platform='cuda' --pdb
Note that this may slow down training.
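The launch commands in this README share one pattern: a config string of the form launch_config.py:MODEL:TASK, a work directory, and XLA_FLAGS added for weatherbench_future_pred. A hypothetical wrapper (scivid_train, not part of the repository) can capture that pattern. It is a sketch that only uses the model and task names shown in this README, and it deliberately drops --pdb so that scripted runs do not stop in the debugger. Set SCIVID_DRY_RUN=1 to print the command instead of running it.

```shell
# Hypothetical wrapper (not part of SciVid) around the kauldron launch
# commands shown in this README. Usage: scivid_train MODEL TASK
scivid_train() {
  local model="$1"
  local task="$2"
  # Mirrors the README's naming, e.g. hf_videomae + flyvsfly_classification
  # -> /home/$USER/tmp/exps/flyvsfly_videomae
  local workdir="/home/${USER:-}/tmp/exps/${task%%_*}_${model#hf_}"
  set -- python -m kauldron.main \
    "--cfg=scivid/configs/launch_config.py:${model}:${task}" \
    "--cfg.workdir=${workdir}" \
    "--cfg.aux.platform=cuda"
  if [ "$task" = "weatherbench_future_pred" ]; then
    # Disable XLA GPU autotuning to avoid memory errors (may slow training).
    set -- env "XLA_FLAGS=--xla_gpu_autotune_level=0" "$@"
  fi
  if [ "${SCIVID_DRY_RUN:-0}" = "1" ]; then
    echo "$@"   # dry run: print the command only
  else
    "$@"
  fi
}
```

For example, scivid_train scaling4d flyvsfly_classification builds the same Scaling 4D command given later in this README, minus --pdb. The single quotes around 'cuda' in the original commands are removed by the shell anyway, so the program receives the same argument.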
Run training with Scaling 4D Representations model
Download the pretrained model checkpoint:
wget -P ~/ https://storage.googleapis.com/representations4d/checkpoints/scaling4d_dist_b.npz
Set the SCALING4D_CHECKPOINT_PATH environment variable to the downloaded checkpoint path:
export SCALING4D_CHECKPOINT_PATH=~/scaling4d_dist_b.npz
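An interrupted or failed wget can leave an empty or non-checkpoint file behind. Since .npz files are zip archives, a valid download should start with the zip signature "PK". A minimal sketch of such a check (scivid_check_npz is a hypothetical helper, not part of SciVid):

```shell
# Hypothetical sanity check (not part of SciVid): confirm that the checkpoint
# exists, is non-empty, and starts with the zip signature used by .npz files.
scivid_check_npz() {
  local path="${1:-${SCALING4D_CHECKPOINT_PATH:-}}"
  if [ -z "$path" ] || [ ! -s "$path" ]; then
    echo "error: checkpoint '$path' is missing or empty" >&2
    return 1
  fi
  if [ "$(head -c 2 "$path")" != "PK" ]; then
    echo "error: '$path' is not a zip/npz archive (interrupted download?)" >&2
    return 1
  fi
  echo "checkpoint OK: $path"
}

# Example: scivid_check_npz   # checks $SCALING4D_CHECKPOINT_PATH
```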
Launch training with the scaling4d 4DS-B-dist-e model:
python -m kauldron.main --cfg=scivid/configs/launch_config.py:scaling4d:flyvsfly_classification --cfg.workdir=/home/${USER}/tmp/exps/flyvsfly_scaling4d --cfg.aux.platform='cuda' --pdb
Note that the released scaling4d 4DS-B-dist-e checkpoint is distilled from the released 4DS-e model. Its results therefore differ from those reported for the pretrained 4DS-B model in Table 5 of the SciVid paper.
We report the following results for the released 4DS-B-dist-e checkpoint:
| Dataset | FlyVsFly | STIR | WB2 Z500/T850/Q700 |
|---|---|---|---|
| Metric | mAP ↑ | Acc ↑ | wRMSE ↓ |
| val results | 84.3 | 44.1 | 608/2.88/15.9e-3 |
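For reference, a sketch of the latitude-weighted RMSE, assuming the standard WeatherBench 2 definition (Rasp et al., 2024); how errors are aggregated over forecast times and lead times for the table above is set by the SciVid evaluation code and is not reproduced here. For a forecast f and ground truth o at time t on a grid with I latitudes φ_i and J longitudes:

$$
\mathrm{wRMSE}_t=\sqrt{\frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J} w(i)\,\big(f_{t,i,j}-o_{t,i,j}\big)^2},
\qquad
w(i)=\frac{\cos\varphi_i}{\frac{1}{I}\sum_{i'=1}^{I}\cos\varphi_{i'}}
$$

The weights average to 1 over latitudes and down-weight grid cells near the poles, which cover less area on a regular latitude-longitude grid.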
Citing this work
We hope that our work will facilitate further research in cross-domain development of ViFMs.
If you use our SciVid benchmark, please cite:
@inproceedings{hasson2025scivid,
title={SCIVID: Cross-Domain Evaluation of Video Models in Scientific Applications},
author={Hasson, Yana and Luc, Pauline and Momeni, Liliane and Ovsjanikov, Maks and Le Moing, Guillaume and Kuznetsova, Alina and Ktena, Ira and Sun, Jennifer J. and Koppula, Skanda and Gokay, Dilara and Heyward, Joseph and Pot, Etienne and Zisserman, Andrew},
year={2025},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
}
as well as the benchmarks included in SciVid:
@inproceedings{eyjolfsdottir2014flyvsfly,
title={Detecting social actions of fruit flies},
author={Eyjolfsdottir, Eyrun and Branson, Steve and Burgos-Artizzu, Xavier P and Hoopfer, Eric D and Schor, Jonathan and Anderson, David J and Perona, Pietro},
booktitle={ECCV},
year={2014},
}
@inproceedings{sun2021calms21,
title={The multi-agent behavior dataset: Mouse dyadic social interactions},
author={Sun, Jennifer J and Karigo, Tomomi and Chakraborty, Dipam and Mohanty, Sharada P and Wild, Benjamin and Sun, Quan and Chen, Chen and Anderson, David J and Perona, Pietro and Yue, Yisong and others},
booktitle={NeurIPS},
year={2021},
}
@article{schmidt2024stir,
title={Surgical Tattoos in Infrared: A Dataset for Quantifying Tissue Tracking and Mapping},
journal={IEEE Transactions on Medical Imaging},
publisher={Institute of Electrical and Electronics Engineers (IEEE)},
author={Schmidt, Adam and Mohareri, Omid and DiMaio, Simon P. and Salcudean, Septimiu E.},
year={2024},
}
@article{rasp2024wb2,
title={Weatherbench 2: A benchmark for the next generation of data-driven global weather models},
author={Rasp, Stephan and Hoyer, Stephan and Merose, Alexander and Langmore, Ian and Battaglia, Peter and Russell, Tyler and Sanchez-Gonzalez, Alvaro and Yang, Vivian and Carver, Rob and Agrawal, Shreya and others},
journal={Journal of Advances in Modeling Earth Systems},
year={2024},
}
@inproceedings{kitamoto2023typhoon,
author = {Kitamoto, Asanobu and Hwang, Jared and Vuillod, Bastien and Gautier, Lucas and Tian, Yingtao and Clanuwat, Tarin},
booktitle = {NeurIPS},
editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
title = {Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones},
year = {2023}
}
Acknowledgements
SciVid's release was made possible by the invaluable contributions of the following people:
Yana Hasson, Pauline Luc, Lili Momeni, Guillaume Le Moing, Maks Ovsjanikov, Alina Kuznetsova, Ira Ktena, Jennifer Sun, Dilara Gokay, Etienne Pot, Phoebe Kirk and Yotam Doron.
We also extend our gratitude to our collaborators at Google.
SciVid uses the following separate libraries and packages:
We thank all their contributors and maintainers!
License and disclaimer
Copyright 2025 Google LLC
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0
All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode
Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.
The SciVid dataset contains preprocessed versions of the following datasets:
- Fly-vs-Fly [1] dataset has been filtered as described by Task Programming [2]. Fly-vs-Fly (California Institute of Technology, Howard Hughes Medical Institute) is made available pursuant to a CC-0 license at https://data.caltech.edu/records/zrznw-w7386.
- CalMS21 [3] dataset has been filtered and downsampled, where we have further held out a subset from the train split for validation. CalMS21 (California Institute of Technology, Northwestern University) is made available under a CC-BY license at https://data.caltech.edu/records/s0vdx-0k302.
- STIR dataset [4] has been filtered and downsampled, and its query and target points have been normalized by the image dimensions. STIR (The University of British Columbia, Intuitive Surgical) is made available under a CC-BY license at https://ieee-dataport.org/open-access/stir-surgical-tattoos-infrared.
- Digital Typhoon dataset [5] has been filtered, downsampled and partitioned into fixed train/val/test splits. Digital Typhoon (National Institute of Informatics, Japan, Google DeepMind and several other French, US and Japanese universities) is made available pursuant to a CC-BY license at https://agora.ex.nii.ac.jp/digital-typhoon/dataset/index.html.en.
- Movi [6] has been filtered and preprocessed to generate the information needed for tracking. Only the training split is released as part of this project. The Movi dataset is made available here: https://github.com/google-research/kubric/blob/main/challenges/movi/README.md
- ARCO_ERA5_3variable_1h_1deg.zarr is a filtered and downsized version of ERA5. ERA5 [7] is made available at https://cds.climate.copernicus.eu/datasets/reanalysis-era5-pressure-levels?tab=overview under a bespoke License (https://cds.climate.copernicus.eu/datasets/reanalysis-era5-pressure-levels?tab=overview).
This is not an official Google product.
[1] Eyjolfsdottir, E., Branson, S., Burgos-Artizzu, X. P., Hoopfer, E. D., Schor, J., Anderson, D. J., and Perona, P. Detecting social actions of fruit flies. In ECCV, 2014.
[2] Sun, J. J., Kennedy, A., Zhan, E., Anderson, D. J., Yue, Y., and Perona, P. Task programming: Learning data efficient behavior representations. In CVPR, 2021.
[3] Sun, J. J., Karigo, T., Chakraborty, D., Mohanty, S. P., Wild, B., Sun, Q., Chen, C., Anderson, D. J., Perona, P., Yue, Y., et al. The multi-agent behavior dataset: Mouse dyadic social interactions. In NeurIPS D&B, 2021.
[4] Schmidt, A., Mohareri, O., DiMaio, S., and Salcudean, S. E. STIR: Surgical Tattoos in Infrared. In IEEE Transactions on Medical Imaging, 2024.
[5] Kitamoto, A., Hwang, J., Vuillod, B., Gautier, L., Tian, Y., and Clanuwat, T. Digital Typhoon: Long-term satellite image dataset for the spatio-temporal modeling of tropical cyclones. In NeurIPS D&B, 2023.
[6] Greff, K., Belletti, F., Beyer, L., Doersch, C., Du, Y., Duckworth, D., Fleet, D. J., Gnanapragasam, D., Golemo, F., Herrmann, C., et al. Kubric: A scalable dataset generator. In CVPR, 2022.
[7] Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., Thépaut, J-N. (2023): ERA5 hourly data on pressure levels from 1940 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS), DOI: 10.24381/cds.bd0915c6 (Accessed on DD-MMM-YYYY)



