🐼 Panda-70M
October 22, 2024
This is the official GitHub repository of Panda-70M.
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen,
Aliaksandr Siarohin,
Willi Menapace,
Ekaterina Deyneka,
Hsiang-wei Chao,
Byung Eun Jeon,
Yuwei Fang,
Hsin-Ying Lee,
Jian Ren,
Ming-Hsuan Yang,
Sergey Tulyakov
Computer Vision and Pattern Recognition (CVPR) 2024
Introduction
Panda-70M is a large-scale dataset with 70M high-quality video-caption pairs. This repository has three sections:
- Dataset Dataloading includes the csv files listing the data of Panda-70M and the code to download the dataset.
- Splitting includes the code to split a long video into multiple semantics-consistent short clips.
- Captioning includes the proposed video captioning model trained on Panda-70M.
🔥 Updates (Oct 2024)
To enhance the training of video generation models, which favor single-shot videos with meaningful motion and aesthetically pleasing scenes, we introduce two additional annotations:
- Desirability Filtering: This annotation assesses whether a video is a suitable training sample. We categorize videos into six groups based on their characteristics: desirable, 0_low_desirable_score, 1_still_foreground_image, 2_tiny_camera_movement, 3_screen_in_screen, and 4_computer_screen_recording. The table below shows each category along with its percentage of videos in the dataset.
- Shot Boundary Detection: This annotation provides a list of intervals representing the continuous shots within a video (predicted by TransNetV2). A list of length one indicates that the video consists of a single continuous shot without any shot boundaries.
| Category | Percentage of videos |
|---|---|
| desirable | 80.5% |
| 0_low_desirable_score | 5.28% |
| 1_still_foreground_image | 6.82% |
| 2_tiny_camera_movement | 1.20% |
| 3_screen_in_screen | 5.03% |
| 4_computer_screen_recording | 1.13% |
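The two annotations described above can be combined to select clips for training a video generation model. The following is a minimal sketch, not the repository's own code: the record keys `desirability` and `shots`, the timestamp strings, and the helper names are all hypothetical and chosen only for illustration. Check the actual annotation files for the real field names and formats.

```python
import ast

# Category label taken from the list of six groups; the other five mark undesirable clips.
DESIRABLE = "desirable"


def is_single_shot(shot_intervals):
    """Return True when the shot list contains exactly one interval."""
    return len(shot_intervals) == 1


def keep_for_generation(record):
    """Keep a clip only if it is labeled desirable and is one continuous shot.

    `record` is a hypothetical dict with keys "desirability" (str) and
    "shots" (a list of [start, end] intervals, or that list stored as text).
    """
    shots = record["shots"]
    if isinstance(shots, str):
        # Assumption: a list stored as text uses Python literal syntax.
        shots = ast.literal_eval(shots)
    return record["desirability"] == DESIRABLE and is_single_shot(shots)


# Made-up records for illustration only.
records = [
    {"id": "a", "desirability": "desirable",
     "shots": [["0:00:00.000", "0:00:08.000"]]},
    {"id": "b", "desirability": "desirable",
     "shots": "[['0:00:00.000', '0:00:03.000'], ['0:00:03.000', '0:00:08.000']]"},
    {"id": "c", "desirability": "3_screen_in_screen",
     "shots": [["0:00:00.000", "0:00:05.000"]]},
]

kept = [r["id"] for r in records if keep_for_generation(r)]
print(kept)
```

Only record `a` passes both checks here: `b` contains two shots and `c` is labeled as a screen-in-screen video.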
Dataset
Collection Pipeline
Download
| Split | Download | # Source Videos | # Samples | Video Duration | Storage Space |
|---|---|---|---|---|---|
| Training (full) | link (2.73 GB) | 3,779,763 | 70,723,513 | 167 khrs | ~36 TB |
| Training (10M) | link (504 MB) | 3,755,240 | 10,473,922 | 37.0 khrs | ~8.0 TB |
| Training (2M) | link (118 MB) | 800,000 | 2,400,000 | 7.56 khrs | ~1.6 TB |
| Validation | link (1.2 MB) | 2,000 | 6,000 | 18.5 hrs | ~4.0 GB |
| Testing | link (1.2 MB) | 2,000 | 6,000 | 18.5 hrs | ~4.0 GB |
More details can be found in the Dataset Dataloading section.
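The figures in the download table also give a rough sense of how each split is composed. The sketch below derives the average number of clips per source video and the mean clip length from the table's numbers. The durations in the table are rounded, so the results are estimates, and the function and variable names are illustrative rather than part of the repository.

```python
# Figures copied from the download table: (source videos, samples, duration in hours).
SPLITS = {
    "Training (full)": (3_779_763, 70_723_513, 167_000.0),
    "Training (10M)": (3_755_240, 10_473_922, 37_000.0),
    "Training (2M)": (800_000, 2_400_000, 7_560.0),
    "Validation": (2_000, 6_000, 18.5),
    "Testing": (2_000, 6_000, 18.5),
}


def split_stats(videos, samples, hours):
    """Return (clips per source video, mean clip length in seconds)."""
    return samples / videos, hours * 3600.0 / samples


for name, row in SPLITS.items():
    clips_per_video, mean_seconds = split_stats(*row)
    print(f"{name}: {clips_per_video:.2f} clips/video, {mean_seconds:.2f} s/clip")
```

By this estimate, the full training split averages roughly 18.7 clips per source video at about 8.5 seconds each, while the 2M, validation, and testing splits use exactly 3 clips per source video at around 11 seconds each.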
Demonstration
Video-Caption Pairs in Panda-70M
- A rhino and a lion are fighting in the dirt.
- A person is holding a long haired dachshund in their arms.
- A rocket launches into space on the launch pad.
- A person is kneading dough and putting jam on it.
- A little boy is playing with a basketball in the city.
- A 3d rendering of a zoo with animals and a train.
- A person in blue gloves is connecting an electrical supply to an injector.
- There is a beach with waves and rocks in the foreground, and a city skyline in the background.
- It is a rally car driving on a dirt road in the countryside, with people watching from the side of the road.
**We will remove video samples from our dataset, GitHub repository, project webpage, or technical presentation upon request. Please contact tsaishienchen at gmail dot com to make a request.**
Please check here for more samples.
Long Video Splitting and Captioning
https://github.com/snap-research/Panda-70M/assets/3857997/8144cf3d-c20c-4c18-a4bd-011451da9f9b
https://github.com/snap-research/Panda-70M/assets/3857997/b262128e-2152-41e8-873e-db2dc275c40f
License of Panda-70M
See license. The video samples are collected from a publicly available dataset. Users must follow the related license to use these video samples.
Citation
If you find this project useful for your research, please cite our paper. :blush:
@inproceedings{chen2024panda70m,
title = {Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers},
author = {Chen, Tsai-Shien and Siarohin, Aliaksandr and Menapace, Willi and Deyneka, Ekaterina and Chao, Hsiang-wei and Jeon, Byung Eun and Fang, Yuwei and Lee, Hsin-Ying and Ren, Jian and Yang, Ming-Hsuan and Tulyakov, Sergey},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2024}
}
Contact Information
Tsai-Shien Chen: tsaishienchen@gmail.com















