Sekai: A Video Dataset towards World Exploration

December 5, 2025

This repo contains the dataset download and processing code used in

Sekai: A Video Dataset towards World Exploration

Zhen Li, Chuanhao Li, Xiaofeng Mao, Shaoheng Lin, Ming Li, Shitian Zhao, Zhaopan Xu, Xinyue Li, Yukang Feng, Jianwen Sun, Zizhen Li, Fanrui Zhang, Jiaxin Ai, Zhixiang Wang, Yuwei Wu, Tong He, Jiangmiao Pang, Yu Qiao, Yunde Jia, Kaipeng Zhang

Shanghai AI Laboratory, Beijing Institute of Technology

🔥 Update

  • [2025.07.10] We're thrilled by the community's enthusiasm — Dataset Access Assistance is now updated!
  • [2025.06.25] Video download and clip extraction tools for Sekai-Real are now available!
  • [2025.06.19] We have released our paper — discussions and feedback are warmly welcome!

🧠 Introduction

TL;DR We present Sekai (せかい, “world” in Japanese), a high-quality egocentric video dataset for immersive world exploration and generation. Sekai includes over 5000 hours of YouTube videos and game footage with rich annotations. It features:

  • 📹 Diverse, high-resolution videos (720p)
  • 🌍 Coverage of 100+ countries and 750+ cities
  • 🚶‍♂️ First-person and 🛸 drone perspectives
  • 🕒 Long sequences (≥ 60s) for real-world continuity
  • 🏷️ Detailed annotations: location, scene, weather, crowd, captions, and camera trajectories

Sekai supports tasks like video understanding, navigation, and video-audio co-generation.

🚀 Quick Start

The Sekai dataset has two parts: Sekai-Real, collected from YouTube videos, and Sekai-Game, collected from video game footage. Camera trajectories for both parts are represented as one intrinsic matrix plus per-frame extrinsic matrices, all of which are normalized.
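To make the intrinsic/extrinsic representation concrete, here is a minimal pinhole-projection sketch in plain Python. It is not the repository's loader, and it rests on two assumptions that this README does not confirm: that "normalized" means the intrinsic rows were divided by image width and height, and that each extrinsic is a 4x4 world-to-camera matrix. The function names and example values are hypothetical; check the dataset files before relying on either convention.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def denormalize_intrinsics(k_norm: Matrix, width: int, height: int) -> Matrix:
    """Scale a normalized 3x3 intrinsic matrix to pixel units.

    ASSUMPTION: row 0 (fx, skew, cx) was divided by the image width and
    row 1 (fy, cy) by the image height. The README does not state this.
    """
    return [
        [v * width for v in k_norm[0]],
        [v * height for v in k_norm[1]],
        list(k_norm[2]),
    ]


def project_point(
    k: Matrix, world_to_cam: Matrix, point_world: Sequence[float]
) -> Tuple[float, float]:
    """Project a 3D world point to pixel coordinates (pinhole model).

    ASSUMPTION: `world_to_cam` is a 4x4 [R | t] matrix mapping world
    coordinates to camera coordinates. If the files store camera-to-world
    poses instead, invert the matrix first.
    """
    x, y, z = point_world
    homog = [x, y, z, 1.0]
    # Camera-space coordinates: first three rows of the extrinsic times the point.
    cam = [sum(world_to_cam[r][c] * homog[c] for c in range(4)) for r in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Apply the intrinsics, then divide by depth.
    uvw = [sum(k[r][c] * cam[c] for c in range(3)) for r in range(3)]
    return uvw[0] / uvw[2], uvw[1] / uvw[2]


# Hypothetical example values for a 1280x720 frame.
K_NORM = [[0.5, 0.0, 0.5], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]]
IDENTITY_POSE = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

K_PIXELS = denormalize_intrinsics(K_NORM, width=1280, height=720)
print(project_point(K_PIXELS, IDENTITY_POSE, (0.0, 0.0, 2.0)))
```

With an identity pose, a point on the optical axis lands at the principal point, which is a quick way to test whichever convention the files actually use.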

Dataset Access Assistance

If you are unable to obtain the Sekai(-Real) dataset through the steps below, please fill out this form. We will review your request and send you the details.

Sekai-Real

We provide a comprehensive toolchain for downloading original videos and extracting video clips.
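For readers who want a feel for the clip-extraction step, the sketch below builds an `ffmpeg` command that cuts one time range out of a downloaded video. It is not the repository's tool. The function name, file names, and timestamps are hypothetical, and the real annotation format may differ. Only standard `ffmpeg` options (`-ss`, `-to`, `-i`, `-c copy`) are used.

```python
from typing import List


def build_clip_command(src: str, dst: str, start_s: float, end_s: float) -> List[str]:
    """Return an ffmpeg argument list that copies [start_s, end_s] of src into dst.

    Hypothetical helper for illustration. Stream copy (-c copy) avoids
    re-encoding, so cuts snap to keyframes and boundaries are approximate.
    """
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    return [
        "ffmpeg",
        "-ss", f"{start_s:.3f}",  # seek to the clip start in the input
        "-to", f"{end_s:.3f}",    # stop reading at the clip end
        "-i", src,
        "-c", "copy",             # copy streams without re-encoding
        dst,
    ]


# Hypothetical file names and timestamps.
cmd = build_clip_command("source_video.mp4", "clip_0000.mp4", 120.0, 180.0)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine with `ffmpeg` installed. Drop `-c copy` if frame-accurate boundaries matter more than speed.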

| Split | Annotation | Camera Trajectories | # Source Videos | # Samples | Video Duration | Storage Space |
| --- | --- | --- | --- | --- | --- | --- |
| Sekai-Real-Walking | Huggingface | Huggingface+ | 6552 | 299173 | 4986h | ~10TB |
| Sekai-Real-Walking-HQ* | Huggingface | Huggingface | 3879 | 18208 | 304h | ~600GB |
| Sekai-Real-Drone | Huggingface | Huggingface | 692 | 3912 | 65h | ~140GB |

* denotes the best-of-the-best videos sampled in consideration of the computational resources for training.

+ denotes that a subset of videos was annotated with camera trajectories. Refer to the paper for more details. We will soon release camera trajectory annotations for all of Sekai-Real.
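As a rough sanity check on the table, the average clip length of each split can be estimated as total duration divided by number of samples. The figures below are read from the table, with the Drone row taken as 692 source videos and 3912 samples. The durations are rounded to whole hours, so the results are approximate.

```python
# (split name, number of samples, total duration in hours), read from the table.
SPLITS = [
    ("Sekai-Real-Walking", 299173, 4986),
    ("Sekai-Real-Walking-HQ", 18208, 304),
    ("Sekai-Real-Drone", 3912, 65),
]


def mean_clip_seconds(num_samples: int, hours: float) -> float:
    """Average clip length in seconds, given a total duration in hours."""
    return hours * 3600 / num_samples


for name, num_samples, hours in SPLITS:
    print(f"{name}: {mean_clip_seconds(num_samples, hours):.1f} s per clip on average")
```

All three splits come out at roughly 60 seconds per clip, in line with the long-sequence figure (≥ 60s) quoted in the introduction.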

Sekai-Game

The videos and corresponding camera trajectory files of Sekai-Game are hosted on Hugging Face. Use the links in the table below to view and download them.

| Split | Annotation | Videos & Camera Trajectories |
| --- | --- | --- |
| Sekai-Game-Walking | Huggingface | part1 and part2 |
| Sekai-Game-Drone | Huggingface | here |

📦 Checklist

  • Tools for Sekai-Real video download and clip extraction.
  • Modified MegaSam used in Sekai.

📄 License

See license.

📖 Citation

If you find this project helpful, please consider citing:

@article{li2025sekai,
      title={Sekai: A Video Dataset towards World Exploration}, 
      author={Zhen Li and Chuanhao Li and Xiaofeng Mao and Shaoheng Lin and Ming Li and Shitian Zhao and Zhaopan Xu and Xinyue Li and Yukang Feng and Jianwen Sun and Zizhen Li and Fanrui Zhang and Jiaxin Ai and Zhixiang Wang and Yuwei Wu and Tong He and Jiangmiao Pang and Yu Qiao and Yunde Jia and Kaipeng Zhang},
      journal={arXiv preprint arXiv:2506.15675},
      year={2025}
}