𝒳-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability

March 12, 2026

Yu Yang¹,², Alan Liang², Jianbiao Mei¹, Yukai Ma¹, Yong Liu¹, Gim Hee Lee²
¹ Zhejiang University ² National University of Singapore

📢 News

[2025-09-19] 𝒳-Scene has been accepted to NeurIPS 2025!
[2025-06-18] We released our project website.
[2025-06-16] The paper is available on arXiv (arXiv:2506.13558).

Overview of 𝒳-Scene, a unified world generator that supports multi-granular controllability through high-level text-to-layout generation and low-level BEV layout conditioning. It performs joint occupancy, image, and video generation for 3D scene synthesis and reconstruction with high fidelity.

📝 Getting Started

Please refer to the following documents to set up the environment and run 𝒳-Scene:

🎯 Roadmap

Paper & Project Page
Release the Training Code
Release the Inference Code
Release the Processed Data

🎥 Demo of Layout-to-Scene Generation

🤝 Acknowledgments

We are grateful for the following open-source projects that inspired or assisted the development of 𝒳-Scene:

Occupancy Generation	Video & Driving Synthesis
SemCity	MagicDrive
DynamicCity	DriveArena
OccSora	LiDARCrafter
UniScene	X-Drive

Special thanks to these communities for their incredible contributions to the field!

🔖 Citation

@article{yang2025xscene,
  title={X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability},
  author={Yang, Yu and Liang, Alan and Mei, Jianbiao and Ma, Yukai and Liu, Yong and Lee, Gim Hee},
  journal={arXiv preprint arXiv:2506.13558},
  year={2025}
}