𝒳-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability

March 12, 2026


Yu Yang1,2, Alan Liang2, Jianbiao Mei1, Yukai Ma1, Yong Liu1, Gim Hee Lee2
1 Zhejiang University 2 National University of Singapore

📢 News

  • [2025-09-19] 𝒳-Scene has been accepted to NeurIPS 2025!

  • [2025-06-18] We released our project website here.

  • [2025-06-16] The paper is available on arXiv.

🎯 Abstract


Overview of 𝒳-Scene, a unified world generator that supports multi-granular controllability through high-level text-to-layout generation and low-level BEV layout conditioning. It jointly generates occupancy, images, and video for high-fidelity 3D scene synthesis and reconstruction.

📝 Getting Started

Please refer to the following documents to set up the environment and run 𝒳-Scene:

🎯 Roadmap

  • Paper & Project Page
  • Release the Training Code
  • Release the Inference Code
  • Release the Processed Data

🎥 Demo of Layout-to-Scene Generation

[Two demo GIFs of layout-to-scene generation]

🤝 Acknowledgments

We are grateful for the following open-source projects that inspired or assisted the development of 𝒳-Scene:

| Occupancy Generation | Video & Driving Synthesis |
|----------------------|---------------------------|
| SemCity              | MagicDrive                |
| DynamicCity          | DriveArena                |
| OccSora              | LiDARCrafter              |
| UniScene             | X-Drive                   |

Special thanks to these communities for their incredible contributions to the field!

🔖 Citation

@article{yang2025xscene,
  title={X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability},
  author={Yang, Yu and Liang, Alan and Mei, Jianbiao and Ma, Yukai and Liu, Yong and Lee, Gim Hee},
  journal={arXiv preprint arXiv:2506.13558},
  year={2025}
}