:sunglasses: Awesome 3D and 4D World Models

May 19, 2026


This survey reviews state-of-the-art 3D and 4D world models: systems that learn, predict, and simulate the geometry and dynamics of real environments from multi-modal signals.

We unify terminology, scope, and evaluations, and organize the space into three complementary paradigms by representation:

  • World Modeling from Video Generation (VideoGen): Learn generative or predictive models from sequential video streams with geometric and temporal constraints. VideoGen focuses on long-horizon consistency, controllability, and scene-level generation, enabling agents to imagine or forecast plausible video rollouts.
  • World Modeling from Occupancy Generation (OccGen): Model 3D/4D occupancy grids that encode geometry and semantics in voxel space. OccGen provides a physics-consistent scaffold for robust perception, forecasting, and simulation, bridging low-level sensor data and high-level reasoning.
  • World Modeling from LiDAR Generation (LiDARGen): Leverage point cloud sequences from LiDAR sensors to generate or predict geometry-grounded scenes. LiDARGen emphasizes high-fidelity 3D structure, robustness to environment changes, and applications in safety-critical domains such as autonomous driving.

For more details, kindly refer to our paper and project page. :rocket:

:books: Citation

If you find this work helpful for your research, please kindly consider citing our papers:

@article{survey_3d_4d_world_models,
    title   = {{3D} and {4D} World Modeling: A Survey},
    author  = {Lingdong Kong and Wesley Yang and Jianbiao Mei and Youquan Liu and Ao Liang and Dekai Zhu and Dongyue Lu and Wei Yin and Xiaotao Hu and Mingkai Jia and Junyuan Deng and Kaiwen Zhang and Yang Wu and Tianyi Yan and Shenyuan Gao and Song Wang and Linfeng Li and Liang Pan and Yong Liu and Jianke Zhu and Wei Tsang Ooi and Steven C. H. Hoi and Ziwei Liu},
    journal = {arXiv preprint arXiv:2509.07996},
    year    = {2025}
}
@article{worldlens,
    title   = {{WorldLens}: Full-Spectrum Evaluations of Driving World Models in Real World},
    author  = {Ao Liang and Lingdong Kong and Tianyi Yan and Hongsi Liu and Wesley Yang and Ziqi Huang and Wei Yin and Jialong Zuo and Yixuan Hu and Dekai Zhu and Dongyue Lu and Youquan Liu and Guangfeng Jiang and Linfeng Li and Xiangtai Li and Long Zhuo and Lai Xing Ng and Benoit R. Cottereau and Changxin Gao and Liang Pan and Wei Tsang Ooi and Ziwei Liu},
    journal = {arXiv preprint arXiv:2512.10958},
    year    = {2025}
}

Background

World modeling has become a cornerstone of modern AI, enabling agents to understand, represent, and predict dynamic environments. While prior research has focused primarily on 2D images and videos, the rapid emergence of native 3D and 4D representations (e.g., RGB-D, occupancy grids, LiDAR point clouds) calls for a dedicated study.

What Are Native 3D Representations?

Unlike 2D projections, native 3D/4D signals directly encode metric geometry, visibility, and motion in the physical coordinates where agents act. Examples include:

  • RGB-D imagery (2D images with depth channels)
  • Occupancy grids (voxelized maps of free vs. occupied space)
  • LiDAR point clouds (3D coordinates from active sensing)
  • Neural fields (e.g., NeRF, Gaussian Splatting)
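To make the relation between two of the representations above concrete, here is a minimal sketch of how a LiDAR-style point cloud could be voxelized into an occupancy grid. The function name `voxelize`, its parameters, and the toy scan are illustrative assumptions, not taken from the survey or from any listed method. The sketch only marks occupied cells; separating free space from unobserved space would additionally require ray casting from the sensor origin.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(
    points: Iterable[Point],
    origin: Point,
    voxel_size: float,
    grid_shape: Voxel,
) -> Set[Voxel]:
    """Return the indices of voxels that contain at least one point.

    Hypothetical helper: `origin` is the minimum corner of the grid in metric
    coordinates, `voxel_size` is the edge length of a cubic voxel, and
    `grid_shape` is the number of voxels along x, y, and z.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    occupied: Set[Voxel] = set()
    for point in points:
        # Map metric coordinates to integer voxel indices.
        index = tuple(
            math.floor((coord - start) / voxel_size)
            for coord, start in zip(point, origin)
        )
        # Discard points that fall outside the grid bounds.
        if all(0 <= i < n for i, n in zip(index, grid_shape)):
            occupied.add(index)
    return occupied


# Toy scan: two points share a voxel, one lies elsewhere, one is out of range.
scan = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.1), (1.6, 0.0, 0.4), (-5.0, 0.0, 0.0)]
occupied = voxelize(scan, origin=(0.0, 0.0, 0.0), voxel_size=0.5, grid_shape=(4, 4, 2))
print(sorted(occupied))  # [(0, 0, 0), (3, 0, 0)]
```

A sparse set of indices is used here instead of a dense array because driving-scale grids are mostly empty; a dense boolean array would be an equally valid choice.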

What Are World Models in 3D and 4D?

A 3D/4D world model is an internal representation that allows an agent to imagine, forecast, and interact with its environment in 3D space.

  • Generative world models synthesize plausible 3D/4D worlds under conditions (e.g., text prompts, trajectories).
  • Predictive world models anticipate the future evolution of 3D/4D scenes given past observations and actions.

Together, these models provide the foundation for simulation, planning, and embodied intelligence in complex environments.
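The difference between the two roles can be sketched as two interfaces. Everything below is an illustrative assumption: the `Scene` and `Action` type aliases, the protocol names, and the toy `StaticWorldPredictor` are hypothetical and do not correspond to any method listed in this repository. The toy predictor assumes a static world, so every point simply shifts opposite to the ego translation at each step.

```python
from typing import List, Protocol, Sequence, Tuple

# Toy stand-ins (assumptions): a scene is a list of 3D points in the ego
# frame, and an action is the ego translation applied during one step.
Scene = List[Tuple[float, float, float]]
Action = Tuple[float, float, float]


class GenerativeWorldModel(Protocol):
    """Synthesizes a scene sequence from a condition such as a text prompt."""

    def generate(self, condition: str, num_frames: int) -> List[Scene]: ...


class PredictiveWorldModel(Protocol):
    """Forecasts future scenes from past observations and planned actions."""

    def predict(self, past: Sequence[Scene], actions: Sequence[Action]) -> List[Scene]: ...


class StaticWorldPredictor:
    """Toy predictive model: the world is static and only the ego moves."""

    def predict(self, past: Sequence[Scene], actions: Sequence[Action]) -> List[Scene]:
        if not past:
            raise ValueError("at least one past observation is required")
        current = list(past[-1])
        rollout: List[Scene] = []
        for dx, dy, dz in actions:
            # Moving the ego forward moves static points backward in its frame.
            current = [(x - dx, y - dy, z - dz) for x, y, z in current]
            rollout.append(current)
        return rollout


model: PredictiveWorldModel = StaticWorldPredictor()
future = model.predict(past=[[(10.0, 0.0, 0.0)]], actions=[(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
print(future)  # [[(9.0, 0.0, 0.0)], [(8.0, 0.0, 0.0)]]
```

Learned models replace the hand-written rollout with a network, but the input and output contract stays the same, which is what lets planners and simulators swap one world model for another.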

1. Benchmarks & Datasets

Benchmarks

  • WorldLens
  • VBench
  • WorldScore

Workshops

| Theme | Venue | Date | Location | Recording |
| --- | --- | --- | --- | --- |
| Workshop on 4D World Models: Bridging Generation and Reconstruction | CVPR 2026 | TBD | Denver | - |
| The 2nd Workshop on World Models | ICLR 2026 | April 23, 2026 | Rio de Janeiro | - |
| Workshop on World Modeling | - | February 4-6, 2026 | Montréal | - |
| Workshop on Embodied World Models for Decision Making | NeurIPS 2025 | December 6, 2025 | San Diego | - |
| Workshop on Reliable and Interactable World Models: Geometry, Physics, Interactivity and Real-World Generalization | ICCV 2025 | October 19, 2025 | Hawai'i | - |
| Workshop on Building Physically Plausible World Models | ICML 2025 | July 19, 2025 | Vancouver | - |
| Workshop on Assessing World Models | ICML 2025 | July 18, 2025 | Vancouver | - |
| Workshop on Benchmarking World Models | CVPR 2025 | June 12, 2025 | Nashville | - |
| Workshop on World Models: Understanding, Modelling and Scaling | ICLR 2025 | April 28, 2025 | Singapore | - |
| Workshop on Foundation Models for Autonomous Systems | CVPR 2024 | June 17, 2024 | Seattle | YouTube |

Datasets

:timer_clock: In chronological order, from the earliest to the latest.

| Dataset | Paper | Venue | Website |
| --- | --- | --- | --- |
| KITTI | Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite | CVPR 2012 | Website |
| NYUv2 | Indoor Segmentation and Support Inference from RGBD Images | ECCV 2012 | Website |
| CARLA | CARLA: An Open Urban Driving Simulator | CoRL 2017 | Website |
| SemanticKITTI | SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences | ICCV 2019 | Website |
| nuScenes | nuScenes: A Multimodal Dataset for Autonomous Driving | CVPR 2020 | Website |
| Waymo Open | Scalability in Perception for Autonomous Driving: Waymo Open Dataset | CVPR 2020 | Website |
| STF | Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather | CVPR 2020 | Website |
| Virtual KITTI 2 | Virtual KITTI 2 | arXiv 2020 | Website |
| Argoverse 2 | Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting | NeurIPS 2021 | Website |
| Lyft-Level5 | One Thousand and One Hours: Self-Driving Motion Prediction Dataset | CoRL 2021 | Website |
| nuPlan | nuPlan: A Closed-Loop ML-Based Planning Benchmark for Autonomous Vehicles | CVPRW 2021 | Website |
| PandaSet | PandaSet: Advanced Sensor Suite Dataset for Autonomous Driving | ITSC 2022 | Website |
| OpenCOOD | OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication | ICRA 2022 | Website |
| KITTI-360 | KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D | TPAMI 2022 | Website |
| CarlaSC | MotionSC: Data Set and Network for Real-Time Semantic Mapping in Dynamic Environments | RA-L 2022 | Website |
| Robo3D | Robo3D: Towards Robust and Reliable 3D Perception against Corruptions | ICCV 2023 | Website |
| OpenOccupancy | OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception | ICCV 2023 | Website |
| Occ3D-nuScenes | Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving | NeurIPS 2023 | Website |
| OpenDV-YouTube | GenAD: Generalized Predictive Model for Autonomous Driving | CVPR 2024 | Website |
| SSCBench | SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving | IROS 2024 | Website |
| NAVSIM | NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking | NeurIPS 2024 | Website |
| DrivingDojo | DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model | NeurIPS 2024 | Website |
| EUVS | Extrapolated Urban View Synthesis Benchmark | ICCV 2025 | Website |
| Pi3DET | Perspective-Invariant 3D Object Detection | ICCV 2025 | Website |

2. World Modeling from Video Generation

:one: Data Engines

:timer_clock: In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| BEVControl | BEVControl: Accurately Controlling Street-View Elements with Multi-Perspective Consistency via BEV Sketch Layout | arXiv 2023 | - | - |
| BEVGen | Street-View Image Generation from a Bird's-Eye View Layout | RA-L 2024 | Website | GitHub |
| MagicDrive | MagicDrive: Street View Generation with Diverse 3D Geometry Control | ICLR 2024 | Website | GitHub |
| Panacea | Panacea: Panoramic and Controllable Video Generation for Autonomous Driving | CVPR 2024 | Website | GitHub |
| DrivingDiffusion | DrivingDiffusion: Layout-Guided Multi-View Driving Scene Video Generation with Latent Diffusion Model | ECCV 2024 | Website | GitHub |
| WoVoGen | WoVoGen: World Volume-Aware Diffusion for Controllable Multi-Camera Driving Scene Generation | ECCV 2024 | - | GitHub |
| Delphi | Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation | arXiv 2024 | Website | GitHub |
| SimGen | SimGen: Simulator-Conditioned Driving Scene Generation | NeurIPS 2024 | Website | GitHub |
| BEVWorld | BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents | arXiv 2024 | - | - |
| Panacea+ | Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | arXiv 2024 | Website | - |
| DiVE | DiVE: DiT-Based Video Generation with Enhanced Control | arXiv 2024 | Website | GitHub |
| SyntheOcc | SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs | arXiv 2024 | Website | GitHub |
| HoloDrive | HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving | arXiv 2024 | - | - |
| CogDriving | Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention | arXiv 2024 | Website | - |
| UniMLVG | UniMLVG: Unified Framework for Multi-View Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving | arXiv 2024 | - | GitHub |
| DrivePhysica | Physical Informed Driving World Model | arXiv 2024 | Website | - |
| DriveDreamer-2 | DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | AAAI 2025 | Website | GitHub |
| SubjectDrive | SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control | AAAI 2025 | Website | - |
| Glad | Glad: A Streaming Scene Generator for Autonomous Driving | ICLR 2025 | - | GitHub |
| DualDiff | DualDiff: Dual-Branch Diffusion Model for Autonomous Driving with Semantic Fusion | ICRA 2025 | - | GitHub |
| UniScene | UniScene: Unified Occupancy-Centric Driving Scene Generation | CVPR 2025 | Website | GitHub |
| DriveScape | DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation | CVPR 2025 | Website | - |
| PerLDiff | PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models | ICCV 2025 | Website | GitHub |
| MagicDrive-V2 | MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control | ICCV 2025 | Website | - |
| DINO-Foresight | DINO-Foresight: Looking into the Future with DINO | NeurIPS 2025 | Website | GitHub |
| Cosmos-Transfer1 | Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control | arXiv 2025 | Website | GitHub |
| DualDiff+ | DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | arXiv 2025 | - | GitHub |
| CoGen | CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving | arXiv 2025 | Website | - |
| NoiseController | NoiseController: Towards Consistent Multi-View Video Generation via Noise Decomposition and Collaboration | arXiv 2025 | - | - |
| STAGE | STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation | arXiv 2025 | - | - |
| WMReward | Inference-time Physics Alignment of Video Generative Models with Latent World Models | arXiv 2026 | - | - |

:two: Action Interpreters

:timer_clock: In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| GAIA-1 | GAIA-1: A Generative World Model for Autonomous Driving | arXiv 2023 | Website | - |
| ADriver-I | ADriver-I: A General World Model for Autonomous Driving | arXiv 2023 | - | - |
| Drive-WM | Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving | CVPR 2024 | Website | GitHub |
| DriveDreamer | DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving | ECCV 2024 | Website | GitHub |
| GenAD | GenAD: Generalized Predictive Model for Autonomous Driving | CVPR 2024 | - | GitHub |
| Vista | Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | NeurIPS 2024 | Website | GitHub |
| InfinityDrive | InfinityDrive: Breaking Time Limits in Driving World Models | arXiv 2024 | Website | - |
| DrivingGPT | DrivingGPT: Unifying Driving World Modeling and Planning with Multi-Modal Autoregressive Transformers | arXiv 2024 | Website | - |
| DrivingWorld | DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT | arXiv 2024 | Website | GitHub |
| GEM | GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control | CVPR 2025 | Website | GitHub |
| MaskGWM | MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction | CVPR 2025 | - | GitHub |
| Epona | Epona: Autoregressive Diffusion World Model for Autonomous Driving | ICCV 2025 | Website | GitHub |
| VaViM & VaVAM | VaViM and VaVAM: Autonomous Driving through Video Generative Modeling | arXiv 2025 | Website | GitHub |
| MiLA | MiLA: Multi-View Intensive-Fidelity Long-Term Video Generation World Model for Autonomous Driving | arXiv 2025 | - | GitHub |
| GAIA-2 | GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | arXiv 2025 | Website | - |
| DriVerse | DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment | arXiv 2025 | - | - |
| PosePilot | PosePilot: Steering Camera Pose for Generative World Models with Self-Supervised Depth | arXiv 2025 | - | - |
| ProphetDWM | ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos | arXiv 2025 | - | - |
| LongDWM | LongDWM: Cross-Granularity Distillation for Building A Long-Term Driving World Model | arXiv 2025 | Website | GitHub |
| UniDrive-WM | UniDrive-WM: Unified Understanding, Planning and Generation World Model For Autonomous Driving | arXiv 2026 | Website | - |

:three: Neural Simulators

:timer_clock: In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| MagicDrive3D | MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes | arXiv 2024 | Website | GitHub |
| DreamForge | DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes | arXiv 2024 | Website | GitHub |
| Doe-1 | Doe-1: Closed-Loop Autonomous Driving with Large World Model | arXiv 2024 | Website | GitHub |
| DrivingSphere | DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation | CVPR 2025 | Website | GitHub |
| UMGen | Generating Multimodal Driving Scenes via Next-Scene Prediction | CVPR 2025 | Website | GitHub |
| DriveArena | DriveArena: A Closed-Loop Generative Simulation Platform for Autonomous Driving | ICCV 2025 | Website | GitHub |
| InfiniCube | InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models | ICCV 2025 | Website | GitHub |
| DiST-4D | DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation | ICCV 2025 | Website | GitHub |
| UniFuture | Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception | arXiv 2025 | Website | GitHub |
| Nexus | Decoupled Diffusion Sparks Adaptive Scene Generation | arXiv 2025 | Website | GitHub |
| Challenger | Challenger: Affordable Adversarial Driving Video Generation | arXiv 2025 | Website | GitHub |
| Cosmos-Drive | Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | arXiv 2025 | Website | GitHub |

:four: Scene Reconstructors

:timer_clock: In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| 3DGS | 3D Gaussian Splatting for Real-Time Radiance Field Rendering | TOG 2023 | Website | GitHub |
| StreetGaussian | Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting | ECCV 2024 | Website | GitHub |
| 4DGF | Dynamic 3D Gaussian Fields for Urban Areas | NeurIPS 2024 | Website | GitHub |
| SCube | SCube: Instant Large-Scale Scene Reconstruction using VoxSplats | NeurIPS 2024 | Website | GitHub |
| HUGS | HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting | CVPR 2024 | Website | GitHub |
| MagicDrive3D | MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes | arXiv 2024 | Website | GitHub |
| S3Gaussian | S3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving | arXiv 2024 | Website | GitHub |
| VDG | VDG: Vision-Only Dynamic Gaussian for Driving Simulation | arXiv 2024 | Website | GitHub |
| UniGaussian | UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations | arXiv 2024 | - | - |
| Stag-1 | Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model | arXiv 2024 | Website | GitHub |
| DrivingRecon | DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving | arXiv 2024 | - | GitHub |
| OccScene | OccScene: Semantic Occupancy-Based Cross-Task Mutual Learning for 3D Scene Generation | arXiv 2024 | - | - |
| SGD | SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior | WACV 2025 | - | - |
| OmniRe | OmniRe: Omni Urban Scene Reconstruction | ICLR 2025 | Website | GitHub |
| DriveDreamer4D | DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation | CVPR 2025 | Website | GitHub |
| DeSiRe-GS | DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes | CVPR 2025 | - | GitHub |
| SplatAD | SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving | CVPR 2025 | Website | GitHub |
| ReconDreamer | ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration | CVPR 2025 | Website | GitHub |
| FreeSim | FreeSim: Toward Free-Viewpoint Camera Simulation in Driving Scenes | CVPR 2025 | Website | - |
| StreetCrafter | StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models | CVPR 2025 | Website | GitHub |
| FlexDrive | FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering | CVPR 2025 | - | - |
| S-NeRF++ | S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation | TPAMI 2025 | - | - |
| InfiniCube | InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models | ICCV 2025 | Website | GitHub |
| DiST-4D | DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation | ICCV 2025 | Website | GitHub |
| DreamDrive | DreamDrive: Generative 4D Scene Modeling from Street View Images | arXiv 2025 | Website | - |
| Uni-Gaussians | Uni-Gaussians: Unifying Camera and Lidar Simulation with Gaussians for Dynamic Driving Scenarios | arXiv 2025 | Website | - |
| MuDG | MuDG: Taming Multi-Modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction | arXiv 2025 | Website | GitHub |
| UniFuture | Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception | arXiv 2025 | Website | GitHub |
| SceneCrafter | Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving | arXiv 2025 | - | GitHub |
| ReconDreamer++ | ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation | arXiv 2025 | Website | GitHub |
| RealEngine | RealEngine: Simulating Autonomous Driving in Realistic Context | arXiv 2025 | - | GitHub |
| GeoDrive | GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control | arXiv 2025 | - | GitHub |
| PseudoSimulation | Pseudo-Simulation for Autonomous Driving | arXiv 2025 | - | GitHub |
| Dreamland | Dreamland: Controllable World Creation with Simulator and Generative Models | arXiv 2025 | Website | - |
| Diff4Splat | Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models | arXiv 2025 | Website | GitHub |
| SpaceTimePilot | SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time | arXiv 2025 | Website | GitHub |
| FLAG-4D | FLAG-4D: Flow-Guided Local-Global Dual-Deformation Model for 4D Reconstruction | arXiv 2026 | - | GitHub |
| MotionCrafter | MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE | arXiv 2026 | Website | GitHub |

3. World Modeling from Occupancy Generation

:one: Scene Representors

:timer_clock: In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| SSD | Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data | arXiv 2023 | - | GitHub |
| SemCity | SemCity: Semantic Scene Generation with Triplane Diffusion | CVPR 2024 | Website | GitHub |
| WoVoGen | WoVoGen: World Volume-Aware Diffusion for Controllable Multi-Camera Driving Scene Generation | ECCV 2024 | - | GitHub |
| UrbanDiff | Urban Scene Diffusion through Semantic Occupancy Map | arXiv 2024 | Website | - |
| DrivingSphere | DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation | CVPR 2025 | Website | GitHub |
| UniScene | UniScene: Unified Occupancy-Centric Driving Scene Generation | CVPR 2025 | Website | GitHub |
| OccScene | OccScene: Semantic Occupancy-Based Cross-Task Mutual Learning for 3D Scene Generation | arXiv 2024 | - | - |
| InfiniCube | InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models | ICCV 2025 | Website | GitHub |
| Control-3D-Scene | Controllable 3D Outdoor Scene Generation via Scene Graphs | ICCV 2025 | Website | GitHub |
| X-Scene | X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability | arXiv 2025 | Website | GitHub |
| GenieDrive | GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation | CVPR 2026 | Website | GitHub |

:two: Occupancy Forecasters

:timer_clock: In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| Emergent-Occ | Differentiable Raycasting for Self-supervised Occupancy Forecasting | ECCV 2022 | - | GitHub |
| FF4D | Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting | CVPR 2023 | Website | GitHub |
| UniWorld | UniWorld: Autonomous Driving Pre-Training via World Models | arXiv 2023 | - | - |
| UniScene | UniScene: Multi-Camera Unified Pre-Training via 3D Scene Reconstruction for Autonomous Driving | arXiv 2023 | - | GitHub |
| OccWorld | OccWorld: Learning A 3D Occupancy World Model for Autonomous Driving | ECCV 2024 | Website | GitHub |
| Cam4DOcc | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | CVPR 2024 | - | GitHub |
| DriveWorld | DriveWorld: 4D Pre-Trained Scene Understanding via World Models for Autonomous Driving | CVPR 2024 | - | - |
| OccSora | OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving | arXiv 2024 | Website | GitHub |
| UnO | UnO: Unsupervised Occupancy Fields for Perception and Forecasting | CVPR 2024 | Website | - |
| LOPR | Self-Supervised Multi-Future Occupancy Forecasting for Autonomous Driving | arXiv 2024 | - | - |
| FSF-Net | FSF-Net: Enhance 4D Occupancy Forecasting with Coarse BEV Scene Flow for Autonomous Driving | arXiv 2024 | - | - |
| OccLLaMA | OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving | arXiv 2024 | - | - |
| DOME | DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model | arXiv 2024 | Website | GitHub |
| GaussianAD | GaussianAD: Gaussian-Centric End-to-End Autonomous Driving | arXiv 2024 | Website | GitHub |
| DFIT-OccWorld | An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-Assisted Training | arXiv 2024 | - | - |
| Drive-OccWorld | Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving | AAAI 2025 | Website | GitHub |
| PreWorld | Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving | ICLR 2025 | - | GitHub |
| OccProphet | OccProphet: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework | ICLR 2025 | - | GitHub |
| RenderWorld | RenderWorld: World Model with Self-Supervised 3D Label | ICRA 2025 | - | - |
| Occ-LLM | Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models | ICRA 2025 | - | - |
| EfficientOCF | Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting | CVPR 2025 | - | - |
| DIO | DIO: Decomposable Implicit 4D Occupancy-Flow World Model | CVPR 2025 | - | - |
| T³Former | Temporal Triplane Transformers as Occupancy World Models | arXiv 2025 | - | - |
| UniOcc | UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving | ICCV 2025 | Website | GitHub |
| COME | COME: Adding Scene-Centric Forecasting Control to Occupancy World Model | arXiv 2025 | - | GitHub |
| I²World | I²-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting | ICCV 2025 | - | GitHub |
| OmniNWM | OmniNWM: Omniscient Driving Navigation World Models | arXiv 2025 | - | GitHub |
| SparseOccVLA | SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning | arXiv 2026 | - | GitHub |
| GenieDrive | GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation | CVPR 2026 | Website | GitHub |

:three: Autoregressive Simulators

:timer_clock: In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| SemCity | SemCity: Semantic Scene Generation with Triplane Diffusion | CVPR 2024 | Website | GitHub |
| XCube | XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies | CVPR 2024 | Website | GitHub |
| PDD | Pyramid Diffusion for Fine 3D Large Scene Generation | ECCV 2024 | Website | GitHub |
| OccSora | OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving | arXiv 2024 | Website | GitHub |
| DynamicCity | DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes | ICLR 2025 | Website | GitHub |
| DrivingSphere | DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation | CVPR 2025 | Website | GitHub |
| InfiniCube | InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models | ICCV 2025 | Website | GitHub |
| X-Scene | X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability | arXiv 2025 | Website | GitHub |
| PrITTI | PrITTI: Primitive-Based Generation of Controllable and Editable 3D Semantic Scenes | arXiv 2025 | Website | GitHub |

4. World Modeling from LiDAR Generation

:one: Data Engines

:timer_clock: In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| DUSty | Learning to Drop Points for LiDAR Scan Synthesis | IROS 2021 | Website | GitHub |
| LiDARGen | Learning to Generate Realistic LiDAR Point Clouds | ECCV 2022 | - | GitHub |
| DUSty v2 | Generative Range Imaging for Learning Scene Priors of 3D LiDAR Data | WACV 2023 | Website | GitHub |
| UltraLiDAR | UltraLiDAR: Learning Compact Representations for LiDAR Completion and Generation | CVPR 2023 | Website | - |
| Copilot4D | Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion | ICLR 2024 | Website | - |
| R2DM | LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models | ICRA 2024 | Website | GitHub |
| ViDAR | Visual Point Cloud Forecasting enables Scalable Autonomous Driving | CVPR 2024 | - | GitHub |
| LiDiff | Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion | CVPR 2024 | - | GitHub |
| LiDM | Towards Realistic Scene Generation with LiDAR Diffusion Models | CVPR 2024 | - | GitHub |
| RangeLDM | RangeLDM: Fast Realistic LiDAR Point Cloud Generation | ECCV 2024 | - | GitHub |
| Text2LiDAR | Text2LiDAR: Text-Guided LiDAR Point Cloud Generation via Equirectangular Transformer | ECCV 2024 | - | GitHub |
| LiDARGRIT | Taming Transformers for Realistic LiDAR Point Cloud Generation | arXiv 2024 | - | GitHub |
| BEVWorld | BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents | arXiv 2024 | - | GitHub |
| SDS | Simultaneous Diffusion Sampling for Conditional LiDAR Generation | arXiv 2024 | - | - |
| DiffSSC | DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models | IROS 2025 | - | - |
| HoloDrive | HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving | arXiv 2024 | - | - |
| LOGen | LOGen: Toward LiDAR Object Generation by Point Diffusion | arXiv 2024 | Website | GitHub |
| OLiDM | OLiDM: Object-Aware LiDAR Diffusion Models for Autonomous Driving | AAAI 2025 | Website | GitHub |
| X-Drive | X-Drive: Cross-Modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios | ICLR 2025 | - | GitHub |
| LidarDM | LidarDM: Generative LiDAR Simulation in a Generated World | ICRA 2025 | Website | GitHub |
| LiDAR-EDIT | LiDAR-EDIT: LiDAR Data Generation by Editing the Object Layouts in Real-World Scenes | ICRA 2025 | Website | GitHub |
| R2Flow | Fast LiDAR Data Generation with Rectified Flows | ICRA 2025 | Website | GitHub |
| WeatherGen | WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion | CVPR 2025 | - | GitHub |
| LiDPM | LiDPM: Rethinking Point Diffusion for Lidar Scene Completion | IV 2025 | Website | GitHub |
| HERMES | HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | ICCV 2025 | Website | GitHub |
| SuperPC | SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization | CVPR 2025 | Website | - |
| SPIRAL | SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding | NeurIPS 2025 | Website | GitHub |
| 3DiSS | Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving | arXiv 2025 | - | GitHub |
| Distill-DPO | Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion | arXiv 2025 | - | GitHub |
| DriveX | DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving | arXiv 2025 | - | - |
| OpenDWM | OpenDWM: Open Driving World Models | arXiv 2025 | - | GitHub |
| RadarGen | RadarGen: Automotive Radar Point Cloud Generation from Cameras | arXiv 2025 | Website | GitHub |
| La La LiDAR | La La LiDAR: Large-Scale Layout Generation from LiDAR Data | AAAI 2026 | - | - |
| LiDARCrafter | LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences | AAAI 2026 | Website | GitHub |
| Veila | Veila: Panoramic LiDAR Generation from a Monocular RGB Image | ICRA 2026 | - | - |

:two: Action Forecasters

:timer_clock: In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| Copilot4D | Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion | ICLR 2024 | Website | - |
| ViDAR | Visual Point Cloud Forecasting enables Scalable Autonomous Driving | CVPR 2024 | - | GitHub |
| BEVWorld | BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents | arXiv 2024 | - | GitHub |
| HERMES | HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | ICCV 2025 | Website | GitHub |
| DriveX | DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving | arXiv 2025 | - | - |

:three: Autoregressive Simulators

:timer_clock: In chronological order, from the earliest to the latest.

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| HoloDrive | HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving | arXiv 2024 | - | - |
| LidarDM | LidarDM: Generative LiDAR Simulation in a Generated World | ICRA 2025 | Website | GitHub |
| OpenDWM | OpenDWM: Open Driving World Models | arXiv 2025 | - | GitHub |
| LiDARCrafter | LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences | AAAI 2026 | Website | GitHub |

5. Applications

:one: Autonomous Driving

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| OccSora | OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving | arXiv 2024 | - | GitHub |
| DFIT-OccWorld | An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-Assisted Training | arXiv 2024 | - | - |
| LiDARCrafter | LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences | AAAI 2026 | Website | GitHub |
| UniSim | UniSim: A Neural Closed-Loop Sensor Simulator | CVPR 2023 | Website | - |
| Panacea | Panacea: Panoramic and Controllable Video Generation for Autonomous Driving | CVPR 2024 | Website | GitHub |
| Delphi | Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation | arXiv 2024 | Website | GitHub |
| DriveDreamer-2 | DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | AAAI 2025 | Website | GitHub |
| Panacea+ | Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | arXiv 2024 | Website | - |
| MiLA | MiLA: Multi-View Intensive-Fidelity Long-Term Video Generation World Model for Autonomous Driving | arXiv 2025 | - | GitHub |
| GAIA-2 | GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | arXiv 2025 | Website | - |
| GenieDrive | GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation | CVPR 2026 | Website | GitHub |

:two: Robotics

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| Habitat 2.0 | Habitat 2.0: Training Home Assistants to Rearrange Their Habitat | arXiv 2021 | - | - |
| VLMaps | Visual Language Maps for Robot Navigation | ICRA 2023 | Website | GitHub |
| - | Foundation Models in Robotics: Applications, Challenges, and the Future | IJRR 2024 | - | GitHub |
| RoboDreamer | RoboDreamer: Learning Compositional World Models for Robot Imagination | arXiv 2024 | Website | GitHub |
| BEHAVIOR | BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities | CoRL 2025 | Website | GitHub |
| BridgeV2W | BridgeV2W: Bridging Video Generation Models to Embodied World Models via Embodiment Masks | arXiv 2026 | Website | - |

:three: Video Games & XR

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| ILVE | Interactive Latent Variable Evolution for the Generation of Minecraft Structures | ICFDG 2021 | - | - |
| ProcTHOR | ProcTHOR: Large-Scale Embodied AI Using Procedural Generation | NeurIPS 2022 | Website | GitHub |
| WorldGPT | WorldGPT: Empowering LLM as Multimodal World Model | ACM MM 2024 | - | GitHub |
| WorldExplorer | WorldExplorer: Towards Generating Fully Navigable 3D Scenes | SIGGRAPH Asia 2025 | Website | GitHub |
| Text2World | Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | arXiv 2025 | Website | GitHub |
| FlexWorld | FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis | arXiv 2025 | Website | GitHub |
| Hunyuan-GameCraft | Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition | arXiv 2025 | Website | GitHub |
| HunyuanWorld 1.0 | HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels | arXiv 2025 | Website | GitHub |
| MGVQ | MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-Group Quantization | arXiv 2025 | - | GitHub |
| EvoWorld | EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory | arXiv 2025 | - | GitHub |
| ViewRope | Geometry-Aware Rotary Position Embedding for Consistent Video World Model | arXiv 2026 | - | - |
| MIND | MIND: Benchmarking Memory Consistency and Action Control in World Models | arXiv 2026 | Website | GitHub |

:four: Digital Twins

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| DynamicCity | DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes | ICLR 2025 | Website | GitHub |
| UrbanScene3D | Capturing, Reconstructing, and Simulating: the UrbanScene3D Dataset | ECCV 2022 | Website | GitHub |
| UrbanWorld | UrbanWorld: An Urban World Model for 3D City Generation | arXiv 2024 | Website | GitHub |
| GaussianCity | GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation | CVPR 2025 | Website | GitHub |
| SceneDiffuser++ | SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model | CVPR 2025 | - | - |
| CityGenAgent | Imagine a City: CityGenAgent for Procedural 3D City Generation | arXiv 2026 | Website | - |

:five: Other Topics

| Model | Paper | Venue | Website | GitHub |
| --- | --- | --- | --- | --- |
| - | Interpreting Physics in Video World Models | arXiv 2026 | - | - |

6. Other Resources

Tutorials

Talks & Seminars

7. Acknowledgements