A Survey: Learning Embodied Intelligence from Physical Simulators and World Models

October 3, 2025

🤝 Citation

Please visit A Survey: Learning Embodied Intelligence from Physical Simulators and World Models for more details and comprehensive information.

Author list: Xiaoxiao Long, Qingrui Zhao, Kaiwen Zhang, Zihao Zhang, Dingrui Wang, Yumeng Liu, Zhengjie Shu, Yi Lu, Shouzheng Wang, Xinzhe Wei, Wei Li, Wei Yin, Yao Yao, Jia Pan, Qiu Shen, Ruigang Yang, Xun Cao, Qionghai Dai

Table of Contents

1. Introduction
2. Levels of Intelligent Robot
3. Robotic Mobility, Dexterity and Interaction
4. Simulators
5. World Models
6. World Models for Intelligent Robots
- World Models for Autonomous Driving
- World Models for Articulated Robots

1. Introduction

Embodied intelligence provides a foundation for creating robots that can understand and reason about the world in a more human-like manner. Two key technologies are central to enabling intelligent behavior in robots: physical simulators and world models. Physical simulators provide controlled, high-fidelity environments for training and evaluating robotic agents, allowing safe and efficient development of complex behaviors. World models, in turn, give robots internal representations of their surroundings, enabling predictive planning and adaptive decision-making beyond direct sensory input. Together, the two enhance robots' autonomy, adaptability, and task performance across diverse scenarios.

This repository aims to collect and organize research and resources related to learning embodied AI through the integration of physical simulators and world models.
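As a toy illustration of the simulator/world-model interplay described in this introduction, the sketch below uses a hand-written point-mass "simulator" to generate transitions, fits a small linear "world model" to them, and then plans inside the learned model before acting in the simulator. It is not taken from the survey: the dynamics, the constants `DT` and `DAMPING`, and every function name are assumptions chosen for illustration, and the random-shooting planner is only one simple way to use a learned model.

```python
import random

DT = 0.1        # integration step (assumed for this toy example)
DAMPING = 0.5   # viscous damping coefficient (assumed)


def simulator_step(pos, vel, force):
    """Stand-in 'physical simulator': a damped unit point mass on a line."""
    vel = vel + DT * (force - DAMPING * vel)
    pos = pos + DT * vel
    return pos, vel


def fit_world_model(num_samples=200, seed=0):
    """Fit v' = a*v + b*f by least squares on transitions sampled from the simulator."""
    rng = random.Random(seed)
    s_vv = s_vf = s_ff = s_vy = s_fy = 0.0
    for _ in range(num_samples):
        v = rng.uniform(-1.0, 1.0)
        f = rng.uniform(-1.0, 1.0)
        _, y = simulator_step(0.0, v, f)  # y is the next velocity
        s_vv += v * v
        s_vf += v * f
        s_ff += f * f
        s_vy += v * y
        s_fy += f * y
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_vv * s_ff - s_vf * s_vf
    a = (s_vy * s_ff - s_vf * s_fy) / det
    b = (s_vv * s_fy - s_vf * s_vy) / det
    return a, b


def world_model_step(pos, vel, force, model):
    """Learned dynamics; the position update is assumed to be known kinematics."""
    a, b = model
    vel = a * vel + b * force
    return pos + DT * vel, vel


def plan_action(pos, vel, target, model, rng, horizon=10, candidates=64):
    """Random shooting: imagine rollouts in the world model, keep the best first action."""
    best_cost, best_action = float("inf"), 0.0
    for _ in range(candidates):
        forces = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        p, v, cost = pos, vel, 0.0
        for f in forces:
            p, v = world_model_step(p, v, f, model)
            cost += (p - target) ** 2
        if cost < best_cost:
            best_cost, best_action = cost, forces[0]
    return best_action


def run_episode(target=1.0, steps=50, seed=1):
    """Plan in the learned model, act in the simulator, return the final position."""
    model = fit_world_model()
    rng = random.Random(seed)
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        force = plan_action(pos, vel, target, model, rng)
        pos, vel = simulator_step(pos, vel, force)
    return pos
```

Calling `run_episode()` returns the point mass's final position after 50 control steps. The simulator is queried only to collect training data and to execute the chosen action; all look-ahead happens in the learned model, which is the division of labor the introduction describes.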

2. Levels of Intelligent Robot

To address the absence of a comprehensive grading system that integrates the dimensions of "intelligent cognition" and "autonomous behavior," we outline a capability grading model for intelligent robots, ranging from IR-L0 to IR-L4. This model covers the entire technological evolution, from basic mechanical operation levels to advanced social interaction capabilities.

3. Robotic Mobility, Dexterity and Interaction

Model Predictive Control, MPC

Paper	Date	Venue
Model Predictive Control: Theory, Computation, and Design	2017	Nob Hill Publishing, LLC
Model predictive control of legged and humanoid robots: models and algorithms	2023-02	Advanced Robotics
An integrated system for real-time model predictive control of humanoid robots	2013-10	Humanoids 2013
Whole-body model-predictive control applied to the HRP-2 humanoid	2015-09	IROS 2015

Whole-Body Control, WBC

Paper	Date	Venue
Humanoid Robotics: A Reference	2017	Springer
A whole-body control framework for humanoids operating in human environments	2006-05	ICRA 2006
Hierarchical quadratic programming: Fast online humanoid-robot motion generation	2014-05	The International Journal of Robotics Research
Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot	2015-07	Autonomous Robots
Compliant locomotion using whole-body control and divergent component of motion tracking	2015-05	ICRA 2015
ExBody2: Advanced Expressive Humanoid Whole-Body Control	2024-12	arXiv
A Unified and General Humanoid Whole-Body Controller for Fine-Grained Locomotion	2025-02	arXiv

Reinforcement Learning

Paper	Date	Venue
Reinforcement learning in robotics: A survey	2013-08	The International Journal of Robotics Research
Learning-based legged locomotion: State of the art and future perspectives	2025-01	The International Journal of Robotics Research
Reinforcement learning of dynamic motor sequence: Learning to stand up	1998-10	IROS 1998
DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning	2017-07	TOG
Learning symmetric and low-energy locomotion	2018-07	TOG
Emergence of locomotion behaviours in rich environments	2017-10	arXiv
Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for Cassie	2019-03	arXiv

Imitation Learning

Paper	Date	Venue
Diffusion policy: Visuomotor policy learning via action diffusion	2024-10	The International Journal of Robotics Research
3D Diffusion Policy	2024-03	arXiv
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware	2023-04	arXiv
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation	2024-10	arXiv
Diffuseloco: Real-time legged locomotion control with diffusion from offline datasets	2024-04	arXiv
AMP: adversarial motion priors for stylized physics-based character control	2021-07	TOG
Whole-body Humanoid Robot Locomotion with Human Reference	2024-10	IROS 2024
Dexcap: Scalable and portable mocap data collection system for dexterous manipulation	2024-03	arXiv
Open-television: Teleoperation with immersive active visual feedback	2024-07	arXiv
Visual Imitation Enables Contextual Humanoid Control	2025-05	arXiv

Vision-Language-Action Models, VLA

Paper	Date	Venue
RT-2: Vision-language-action models transfer web knowledge to robotic control	2023-07	CoRL 2023
OpenVLA: An open-source vision-language-action model	2024-06	arXiv
3D-VLA: A 3D Vision-Language-Action Generative World Model	2024-03	arXiv
Magma: A foundation model for multimodal ai agents	2025-06	CVPR 2025
$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control	2024-10	arXiv
Fast: Efficient action tokenization for vision-language-action models	2025-01	arXiv
Hi robot: Open-ended instruction following with hierarchical vision-language-action models	2025-02	arXiv
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation	2024-09	arXiv
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges	2025-05	arXiv

Robotic Locomotion

Related Survey

Paper	Date	Venue
Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning	2025-04	arXiv
A Comprehensive Review of Humanoid Robots	2025-03	SmartBot
Recent Progress in Legged Robots Locomotion Control	2021-06	Current Robotics Reports

Legged Locomotion

Paper	Date	Venue
Compliant terrain adaptation for biped humanoids without measuring ground surface and contact force	2009-02	T-RO
Online Learning of Uneven Terrain for Humanoid Bipedal Walking	2010-07	AAAI 2010
Practical bipedal walking control on uneven terrain using surface learning and push recovery	2011-09	IROS 2011
Biped walking stabilization based on linear inverted pendulum tracking	2010-09	IROS 2010
Dynamic walking with compliance on a Cassie bipedal robot	2019-06	European Control Conference
Dynamic walking on compliant and uneven terrain using DCM and passivity-based whole-body control	2019-10	Humanoids 2019
Fast Contact-Implicit Model Predictive Control	2024-01	T-RO
Efficient Anytime CLF Reactive Planning System for a Bipedal Robot on Undulating Terrain	2023-01	T-RO
Learning quadrupedal locomotion over challenging terrain	2020-10	Science Robotics
Blind bipedal stair traversal via sim-to-real reinforcement learning	2021-07	Robotics: Science and Systems (RSS)
Learning vision-based bipedal locomotion for challenging terrain	2024-05	ICRA 2024
Learning humanoid locomotion with perceptive internal model	2024-11	arXiv
Humanoid parkour learning	2024-06	arXiv
Unified modeling and control of walking and running on the spring-loaded inverted pendulum	2016-08	T-RO
Capturability-based analysis and control of legged locomotion, part 2: Application to M2V2, a lower-body humanoid	2012-09	IJRR
Convex model predictive control of single rigid body model on SO(3) for versatile dynamic legged motions	2023-05	ICRA 2023
Bipedal hopping: Reduced-order model embedding via optimization-based control	2018-10	IROS 2018
Vertical Jump of a Humanoid Robot With CoP-Guided Angular Momentum Control and Impact Absorption	2023-05	T-RO
CDM-MPC: An integrated dynamic planning and control framework for bipedal robots jumping	2024-06	RAL
Optimizing bipedal locomotion for the 100m dash with comparison to human running	2023-05	ICRA 2023
Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control	2024-10	IJRR
Expressive Whole-Body Control for Humanoid Robots	2024-09	RSS
Exbody2: Advanced expressive humanoid whole-body control	2024-12	arXiv
OmniH2O: Universal and dexterous human-to-humanoid whole-body teleoperation and learning	2024-06	CoRL 2024
ASAP: Aligning simulation and real-world physics for learning agile humanoid whole-body skills	2025-02	arXiv

Robotic Manipulation

Gripper-based manipulation

Paper	Date	Venue	Code
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion	2023-03	RSS 2023	Code
RT-1: Robotics Transformer for Real-World Control at Scale	2022-12	Arxiv	Code
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control	2023-7	PMLR 23	Code
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation	2022-9	CoRL 2022	Code
Act3D: 3D feature field transformers for multi-task robotic manipulation	2023-6	Arxiv	Code
Modeling of deformable objects for robotic manipulation: A tutorial and review	2020-9	Frontiers in Robotics and AI	--
6-DOF Grasping for Target-driven Object Manipulation in Clutter	2019-12	ICRA 2020	--
Cable manipulation with a tactile-reactive gripper	2021-12	IJRR 2021	--

Dexterous hand manipulation

Paper	Date	Venue	Code
Dexgraspnet: A large-scale robotic dexterous grasp dataset for general objects based on simulation	2023-5	Arxiv	Code
DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes	2024-10	CoRL 2024	Code
HGC-Net: Deep anthropomorphic hand grasping in clutter	2022-5	ICRA 2022	Code
Deep differentiable grasp planner for high-dof grippers	2022-2	Arxiv	--
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness	2025-3	Arxiv	Code
UGG: Unified Generative Grasping	2023-11	ECCV 2024	Code
SpringGrasp: Synthesizing Compliant, Dexterous Grasps under Shape Uncertainty	2024-4	Arxiv	Code
A System for General In-Hand Object Re-Orientation	2021-11	CoRL 2021	Code
Visual dexterity: In-hand reorientation of novel and complex object shapes	2023-11	Science Robotics 2023	--
Rotating without Seeing: Towards In-hand Dexterity through Touch	2023-3	RSS 2023	Code
DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors from Video	2021-6	CoRL 2021	--
DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping	2025-2	Arxiv	Code

Bimanual Manipulation Task

Paper	Date	Venue	Code
Stabilize to act: Learning to coordinate for bimanual manipulation	2023-9	CoRL 2023	--
Interactive imitation learning of bimanual movement primitives	2023-8	TMECH	--
Learning fine-grained bimanual manipulation with low-cost hardware	2023-4	RSS 2023	Code
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation	2024-1	CoRL 2024	Code
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation	2025-6	Arxiv	Code
Rdt-1b: a diffusion foundation model for bimanual manipulation	2024-10	Arxiv	Code

Whole-Body Manipulation Control

Paper	Date	Venue	Code
Tidybot: Personalized robot assistance with large language models	2023-12	Autonomous Robots	--
Open-world object manipulation using pre-trained vision-language models	2023-2	Arxiv	Website
Harmon: Whole-body motion generation of humanoid robots from language descriptions	2024-10	CoRL 2024	Website
Okami: Teaching humanoid robots manipulation skills through single video imitation	2024-10	CoRL 2024	Code
Generalizable Humanoid Manipulation with 3D Diffusion Policies	2024-10	Arxiv	Code
OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning	2024-6	CoRL 2024	Code
HumanPlus: Humanoid Shadowing and Imitation from Humans	2024-6	CoRL 2024	Code
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities	2025-3	Arxiv	Code

Foundation Models in Humanoid Robot Manipulation

Paper	Date	Venue	Code
Do As I Can, Not As I Say: Grounding language in robotic affordances	2022-4	Arxiv	Code
Palm-e: An embodied multimodal language model	2023-3	ICML 2023	Website
Inner monologue: Embodied reasoning through planning with language models	2022-7	Arxiv	Website
Code as policies: Language model programs for embodied control	2022-9	ICRA 2023	Code
STIV: Scalable Text and Image Conditioned Video Generation	2024-12	Arxiv	--
GR00T N1: An open foundation model for generalist humanoid robots	2025-3	Arxiv	Code
$\pi_0$ : A Vision-Language-Action Flow Model for General Robot Control	2024-10	Arxiv	Code
Openvla: An open-source vision-language-action model	2024-6	Arxiv	Code
Gr-2: A generative video-language-action model with web-scale knowledge for robot manipulation	2024-10	Arxiv	Website

Human-Robot Interaction

Related Survey

Paper	Date	Venue
Humanlike service robots: A systematic literature review and research agenda	2024-08	Psychology & Marketing
Human–robot collaboration and machine learning: A systematic review of recent research	2023-02	Robotics and Computer-Integrated Manufacturing
Emotion Recognition for Human-Robot Interaction: Recent Advances and Future Perspectives	2020-12	Frontiers in Robotics and AI
Application, Development and Future Opportunities of Collaborative Robots (Cobots) in Manufacturing: A Literature Review	2022-04	International Journal of Human–Computer Interaction
Towards Social AI: A Survey on Understanding Social Interactions	2024-09	arXiv
Human–robot interaction: A review and analysis on variable admittance control, safety, and perspectives	2022-07	Machines
Human-robot perception in industrial environments: A survey	2021-02	Sensors

Cognitive Collaboration

Paper	Date	Venue	Code	Task
Artificial cognition for social human–robot interaction: An implementation	2017-06	Artificial Intelligence	--	Robot Cognitive Skills
Cognitive Interaction Analysis in Human–Robot Collaboration Using an Assembly Task	2021-05	Electronics	--	Assembly Collaboration
Enhancing Robotic Collaborative Tasks Through Contextual Human Motion Prediction and Intention Inference	2024-07	International Journal of Social Robotics	--	Human-Robot Handover
L3MVN: Leveraging Large Language Models for Visual Target Navigation	2023-10	IROS 2023	Github	Object Goal Navigation
SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation	2024-10	NeurIPS 2024	Github	Object Goal Navigation
TriHelper: Zero-Shot Object Navigation with Dynamic Assistance	2024-03	IROS 2024	--	Object Goal Navigation
CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs	2024-10	NeurIPS 2024 OWA Workshop	--	Object Goal Navigation
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation	2025-03	CVPR 2025	Github	Goal-oriented Navigation

Physical Reliability

Paper	Date	Venue	Code	Remarks
A Comparative Study of Probabilistic Roadmap Planners	2004	Algorithmic foundations of robotics V	--	Probabilistic Roadmap Planning (PRM)
Rapidly-exploring random trees: A new tool for Path Planning	1998	Research Report	--	Rapidly-exploring Random Trees (RRT)
Sampling-based Algorithms for Optimal Motion Planning	2011-05	International Journal of Robotics Research	--	PRM* and RRT*
Path planning for manipulators based on an improved probabilistic roadmap method	2021-12	Robotics and Computer-Integrated Manufacturing	--	Path Planning for Manipulators
RRT-connect: An efficient approach to single-query path planning	2000-04	ICRA 2000	--	Incrementally build two RRTs from the start and goal.
Homotopy-Aware RRT*: Toward Human-Robot Topological Path-Planning	2016-03	11th ACM/IEEE International Conference on Human-Robot Interaction	--	Human-robot Interactive Path-planning
Human-in-the-loop Robotic Manipulation Planning for Collaborative Assembly	2019-09	IEEE Transactions on Automation Science and Engineering	--	Human-robot Interactive Path-planning
CHOMP: Gradient optimization techniques for efficient motion planning	2009-05	ICRA 2009	MoveIt!	Gradient-based Trajectory Optimization
STOMP: Stochastic trajectory optimization for motion planning	2011-05	ICRA 2011	MoveIt!	Probabilistic Trajectory Optimization
ITOMP: Incremental trajectory optimization for real-time replanning in dynamic environments	2012-05	Proceedings of the International Conference on Automated Planning and Scheduling	Github	Trajectory Optimization in Dynamic Environment
Motion planning with sequential convex optimization and convex collision checking	2014	IJRR 2014	--	Trajectory Optimization using SCO
Considering avoidance and consistency in motion planning for human-robot manipulation in a shared workspace	2016-05	ICRA 2016	--	Human-robot Interactive Path-planning
Considering Human Behavior in Motion Planning for Smooth Human-Robot Collaboration in Close Proximity	2018-08	27th IEEE International Symposium on Robot and Human Interactive Communication	--	Human-robot Interactive Path-planning
Continuous-time Gaussian process motion planning via probabilistic inference	2017-07	IJRR 2018	--	Gaussian Process Motion Planner (GPMP)
Simultaneous Scene Reconstruction and Whole-Body Motion Planning for Safe Operation in Dynamic Environments	2021-03	IROS 2021	--	GPMP for Whole-body Motion Planning in Dynamic Scene
Admittance control for collaborative dual-arm manipulation	2019-12	International Conference on Advanced Robotics	--	Admittance Control
Cooperative control of dual-arm robots in different human-robot collaborative tasks	2020-02	Assembly Automation	--	Admittance Control
Control system design and methods for collaborative robots	2023-01	Applied Sciences	--	Interactive Control System
Towards shared autonomy framework for human-aware motion planning in industrial human-robot collaboration	2020-08	International Conference on Automation Science and Engineering	--	Industrial HRI
An actor-critic approach for legible robot motion planner	2020-05	ICRA 2020	--	RL Method
A task-adaptive deep reinforcement learning framework for dual-arm robot manipulation	2024-01	IEEE Transactions on Automation Science and Engineering	--	RL Method
Learning robust skills for tightly coordinated arms in contact-rich tasks	2024-01	IEEE RAL	--	RL Method
HandoverSim: A Simulation Framework and Benchmark for Human-to-Robot Object Handovers	2022-05	ICRA 2022	Github	Benchmark
GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation	2024-01	CVPR 2024	Github	Imitation Learning
MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data	2025-01	CoRR 2025	--	Imitation Learning

Social Embeddedness

Paper	Date	Venue	Links	Remarks
The space between us: A neurophilosophical framework for the investigation of human interpersonal space	2009-03	Neuroscience & Biobehavioral Reviews	--	Peripersonal Space
The interrelation between peripersonal action space and interpersonal social space: psychophysiological evidence and clinical implications	2021-02	Frontiers in Human Neuroscience	--	Peripersonal Space
Robot-assisted shopping for the blind: issues in spatial cognition and product selection	2008-03	Intelligent Service Robotics	--	Application in Social Scenario
A review of assistive spatial orientation and navigation technologies for the visually impaired	2017-08	Universal Access in the Information Society	--	Application in Social Scenario
ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane	2024-05	arXiv	--	Application in Social Scenario
Conversational memory network for emotion recognition in dyadic dialogue videos	2018-06	Proceedings of the conference. Association for Computational Linguistics	--	Linguistic Research
Graph Based Network with Contextualized Representations of Turns in Dialogue	2021-09	EMNLP 2021	--	Linguistic Research
DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation	2019-08	EMNLP 2019	--	Linguistic Research
Dialogue act modeling for automatic tagging and recognition of conversational speech	2000-10	Computational Linguistics	--	Linguistic Research
Werewolf among us: Multimodal resources for modeling persuasion behaviors in social deduction games	2022-12	ACL 2023	--	Linguistic Research
The Call for Socially Aware Language Technologies	2025-02	pre-MIT Press publication version	--	Linguistic Research
LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition	2022-06	CVPR 2022	Github	Non-verbal Behaviors Study
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective	2023-12	CVPR 2024	Github	Non-verbal Behaviors Study
SocialGesture: Delving into Multi-person Gesture Understanding	2025-04	CVPR 2025	Dataset	Non-verbal Behaviors Study
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups	2024-04	CVPR 2024	Project Page	HRI in Social Group
MRAC Track 1: 2nd Workshop on Multimodal, Generative and Responsible Affective Computing	2024-09	ACM MM Workshop 2024	Workshop Page	Affective Computing
The Tong Test: Evaluating Artificial General Intelligence Through Dynamic Embodied Physical and Social Interactions	2024-03	Engineering	--	Evaluation of AGI in Social Interaction

4. Simulators

Related Survey

Paper	Date	Venue	Code	Application
A Review of Physics Simulators	2021	IEEE Access	–	Simulator Survey
Review of Embodied AI	2025	arXiv	–	Embodied AI Survey
A Survey of Embodied AI	2022	arXiv	–	Embodied AI Survey

Related Works

Paper	Date	Venue	Code	Application
ManiSkill3	2024	arXiv	–	Manipulation Benchmark
ManiSkill2	2023	ICLR	–	Manipulation Benchmark
Analysis using DEM	2020	IEEE Aerospace	–	Granular Simulation
Mobile Aloha	2024	arXiv	–	Teleoperation
Open-Television	2024	arXiv	–	Teleoperation
Universal Manipulation Interface	2024	arXiv	–	Imitation Learning

Mainstream Simulators

Overview and Documentation

Paper	Date	Venue	Code	Application
Webots: Professional Mobile Robot Simulation	2004	JARS	–	Simulator Platform
Design and use paradigms for Gazebo	2004	IROS	–	Simulator Platform
MuJoCo: A physics engine for model-based control	2012	IROS	–	Simulator Platform
PyBullet: Python module for physics simulation	2016	GitHub	GitHub	Simulator Platform
CoppeliaSim (formerly V-REP)	2013	IROS	–	Simulator Platform
Isaac Gym: GPU-based physics simulation for robot learning	2021	arXiv	–	Simulator Platform
Isaac Sim	2025	NVIDIA Developer	–	Simulator Platform
Isaac Lab Documentation	2025	NVIDIA Developer	–	Simulator Platform
SAPIEN: A simulated part-based interactive environment	2020	CVPR	–	Simulator Platform
Genesis: A Universal and Generative Physics Engine	2024	GitHub	GitHub	Simulator Platform
MuJoCo Programming Guide	2025	Docs	–	Developer Guide
Newton Isaac Sim Project	2024	GitHub	GitHub	Simulator Platform
Newton Physics Engine Announcement	2025	NVIDIA Blog	–	Physics Engine

Physical Properties of Simulators

Physical Simulation Engines and Platforms

Paper	Date	Venue	Code	Application
LS Group Interact Kinematics	2025	Docs	–	Kinematics Documentation
NVIDIA Omniverse	2025	NVIDIA Developer	–	3D Simulation & Collaboration Platform
NVIDIA PhysX System Software	2021	NVIDIA Developer	–	Real-Time Physics Engine

Rendering Capabilities

Rendering Engines and Frameworks

Paper	Date	Venue	Code	Application
LuisaRender	2022	TOG	–	Rendering Framework
Pyrender	2019	GitHub	GitHub	Rendering
HydraRendererInfo	2019	GitHub	GitHub	Rendering
The Alliance for OpenUSD	2023	AOUSD	–	Open Universal Scene Description (USD) Standard
OpenGL: The Industry Standard for High‑Performance Graphics	1992	Khronos Group	–	Cross-Platform Graphics API
Vulkan: Cross‑Platform 3D Graphics and Compute API	2016	Khronos Group	–	Low-Level Graphics and Compute API
NVIDIA OptiX™ Ray Tracing Engine	2024	NVIDIA Developer	–	GPU-Accelerated Ray Tracing Framework

Sensor and Joint Component Types

5. World Models

Representative Architectures of World Models

Paper	Date	Venue	Code	Architecture
World Models	2018-03	NeurIPS 2018	-	RSSM
Learning Latent Dynamics for Planning from Pixels	2018-11	ICML 2019	Github	RSSM
Dream to Control: Learning Behaviors by Latent Imagination (Dreamer)	2019-12	ICLR 2020	Github	RSSM
Mastering Atari with Discrete World Models (Dreamer v2)	2020-10	ICLR 2021	Github	RSSM
DayDreamer: World Models for Physical Robot Learning	2022-06	CoRL 2022	Github	RSSM
Mastering Diverse Domains through World Models (Dreamer v3)	2023-01	Nature	Github	RSSM
A Path Towards Autonomous Machine Intelligence	2022-06	OpenReview	-	JEPA
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (I-JEPA)	2023-01	CVPR 2023	Github	JEPA
Revisiting Feature Prediction for Learning Visual Representations from Video (V-JEPA)	2024-04	arXiv	Github	JEPA
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning	2025-06	arXiv	Github	JEPA
TransDreamer: Reinforcement Learning with Transformer World Models	2022-02	NeurIPS 2021 Workshop	Github	TSSM
Transformer-based World Models Are Happy With 100k Interactions	2023-03	ICLR 2023	Github	TSSM
Genie: Generative Interactive Environments	2024-02	arXiv	-	TSSM
GAIA-1: A Generative World Model for Autonomous Driving	2023-09	arXiv Wayve	-	Autoregressive Transformer
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving	2023-11	ECCV 2024	Github	Autoregressive Transformer
Video generation models as world simulators (Sora)	2024-02	OpenAI	-	Diffusion
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability	2024-05	NeurIPS 2024	Github	Diffusion
GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving	2025-03	arXiv Wayve	-	Diffusion
Vid2World: Crafting Video Diffusion Models to Interactive World Models	2025-05	arXiv	-	AR+Diffusion
Epona: Autoregressive Diffusion World Model for Autonomous Driving	2025-06	ICCV 2025	Github	AR+Diffusion

Core roles of World Models

Paper	Date	Venue	Code	Role
Cosmos World Foundation Model Platform for Physical AI	2025-03	arXiv	Github	Neural Simulator
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control	2025-04	arXiv	GitHub	Neural Simulator
GAIA-1: A Generative World Model for Autonomous Driving	2023-09	arXiv Wayve	-	Neural Simulator
GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving	2025-03	arXiv Wayve	-	Neural Simulator
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving	2024-05	CVPR 2024	-	Neural Simulator
DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model	2024-10	arXiv	GitHub	Neural Simulator
Dream to Control: Learning Behaviors by Latent Imagination (Dreamer)	2019-12	ICLR 2020	Github	Dynamic Model
Mastering Atari with Discrete World Models (Dreamer v2)	2020-10	ICLR 2021	Github	Dynamic Model
DayDreamer: World Models for Physical Robot Learning	2022-06	CoRL 2022	Github	Dynamic Model
Mastering Diverse Domains through World Models (Dreamer v3)	2023-01	Nature	Github	Dynamic Model
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning	2023-05	NeurIPS 2023	Github	Dynamic Model
iVideoGPT: Interactive VideoGPTs are Scalable World Models	2024-05	NeurIPS 2024	Github	Dynamic Model
Video Prediction Models as Rewards for Reinforcement Learning (VIPER)	2023-05	NeurIPS 2023	Github	Reward Model
Video models are zero-shot learners and reasoners	2025-09	arXiv Google Deepmind	-	Neural Simulator

6. World Models for Intelligent Robots

World Models for Autonomous Driving


WMs as Neural Simulators for Autonomous Driving

Paper	Date	Venue	Code	Application
GAIA-1: A Generative World Model for Autonomous Driving	2023-09	arXiv Wayve	-	Scenario Generation
DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving	2023-09	ECCV 2024	GitHub	Scenario Generation
ADriver-I: A General World Model for Autonomous Driving	2023-11	arXiv	-	Scenario Generation
GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving	2025-03	arXiv Wayve	-	Scenario Generation
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation	2024-05	AAAI 2025	GitHub	Scenario Generation
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation	2024-11	CVPR 2025	GitHub	Scenario Generation
DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT	2024-12	arXiv	GitHub	Scenario Generation
MagicDrive: Street View Generation with Diverse 3D Geometry Control	2024-05	ICLR 2024	GitHub	Scenario Generation
MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes	2024-11	arXiv	GitHub	Scenario Generation
MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control	2024-11	arXiv	GitHub	Scenario Generation
WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation	2024-08	ECCV 2024	GitHub	Scenario Generation
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration	2024-11	CVPR 2025	GitHub	Scenario Generation
DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance	2025-03	ICRA 2025	GitHub	Scenario Generation
Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving	2024-08	CVPR 2024	GitHub	Scenario Generation
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control	2025-04	arXiv	GitHub	Scenario Generation
GeoDrive: Trajectory-Conditioned 3D World Model for Autonomous Driving	2025-02	arXiv	-	Scenario Generation
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving	2024-05	CVPR 2024	-	Scenario Generation
OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving	2025-05	arXiv	GitHub	Scenario Generation
Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving	2025-01	AAAI 2025	GitHub	Scenario Generation
DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model	2024-10	arXiv	GitHub	Scenario Generation
RenderWorld: World Model with Self-Supervised 3D Label	2024-11	arXiv	-	Scenario Generation
OccLLaMA: A Language-Driven 3D Occupancy Generation Framework	2024-12	arXiv	-	Scenario Generation
BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space	2024-07	arXiv	-	Scenario Generation
HoloDrive: Holistic View-Aware World Model for Autonomous Driving	2024-10	arXiv	-	Scenario Generation
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control	2024-12	CVPR 2025	GitHub	Scenario Generation
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving	2024-08	arXiv	GitHub	Scenario Generation
ACT-Bench: Towards Action Controllable World Models for Autonomous Driving	2024-12	arXiv	-	Scenario Generation
InfinityDrive: Towards Infinite-Resolution World Models for Autonomous Driving	2024-12	arXiv	-	Scenario Generation
Epona: Autoregressive Diffusion World Model for Autonomous Driving	2025-06	ICCV 2025	GitHub	Scenario Generation
DrivePhysica: A Physics-Conditioned World Model for Autonomous Driving	2024-12	arXiv	-	Scenario Generation
Cosmos-Drive: Multi-Modal World Model for Autonomous Driving	2025-03	arXiv	GitHub	Scenario Generation
Genie 3: A new frontier for world models	2025-08	website	Talk	Interactive Online Simulation

WMs as Dynamic Models for Autonomous Driving

Paper	Date	Venue	Code	Application
MILE: Model-based Imitation Learning for Urban Driving	2022-10	NeurIPS 2022	GitHub	Motion Planning
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning	2025-03	arXiv	GitHub	Reasoning
TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction	2023-03	ICRA 2023	GitHub	Motion Prediction
Uniworld: Autonomous Driving Pre-training via World Models	2023-08	arXiv	-	Pre-training
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion	2023-11	ICLR 2024	-	Motion Planning
MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations	2023-11	IV 2025	-	Motion Planning
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving	2023-11	ECCV 2024	GitHub	Motion Planning
ViDAR: Visual Point Cloud Forecasting for Autonomous Driving	2023-12	CVPR 2024	-	Motion Prediction
Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving	2024-02	ECCV 2024	-	Motion Planning
LidarDM: Generative LiDAR Simulation in a Generated World	2024-04	ICRA 2025	-	Simulation
Enhancing End-to-End Autonomous Driving with Latent World Model	2025-02	ICLR 2025	GitHub	Motion Planning
UnO: Unsupervised Occupancy Fields for Perception and Forecasting	2024-06	CVPR 2024	-	Motion Prediction
CarFormer: Self-Driving with Learned Object-Centric Representations	2024-07	ECCV 2024	GitHub	Motion Planning
NeMo: Neural Occupancy Fields for Autonomous Driving	2024	ECCV 2024	-	Motion Prediction
Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage	2021-10	NeurIPS 2021	GitHub	Motion Planning
Imagine-2-Drive: High-Fidelity World Modeling for Autonomous Driving	2024-11	IROS 2025	GitHub	Motion Planning
Doe-1: Closed-Loop Autonomous Driving with Large World Model	2024-08	arXiv	GitHub	Motion Planning
GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction	2024-12	arXiv	GitHub	Motion Prediction
DFIT-OccWorld: Efficient Occupancy Forecasting via Differential Factorization and Interactive Transformer	2024-12	arXiv	-	Motion Prediction
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Large Language Model	2024-12	arXiv	-	Motion Planning
AdaWM: Adaptive World Model for Autonomous Driving	2025-01	ICLR 2025	-	Motion Planning
AD-L-JEPA: Autonomous Driving with L-JEPA	2025-01	arXiv	GitHub	Motion Prediction
HERMES: Harmonized Embodied Representation for Multi-modal Sensor Integration in Autonomous Driving	2025-01	ICCV 2025	GitHub	Motion Planning

WMs as Reward Models for Autonomous Driving

Paper	Date	Venue	Code	Application
SEM2: Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model	2024-05	T-ITS	-	Reinforcement Learning
Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models	2022-05	NeurIPS 2022	GitHub	Reinforcement Learning
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability	2024-05	NeurIPS 2024	GitHub	Reinforcement Learning
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving	2023-11	CVPR 2024	GitHub	Motion Planning
WoTE: World-model-based End-to-end Autonomous Driving	2025-04	ICCV 2025	GitHub	Motion Planning

World Models for Articulated Robots

The following table compares research on world models in robotics in terms of model input, architecture, experiment platform, and code availability.

Neural Simulators

Paper	Date	Venue	Code
Whale: Towards generalizable and scalable world models for embodied decision-making	2024-08	arXiv	-
RoboDreamer: Learning Compositional World Models for Robot Imagination	2024-08	ICML 2024	Github
Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination	2024-11	ICLR 2025	GitHub
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation	2025-01	arXiv	Github
Cosmos World Foundation Model Platform for Physical AI	2025-03	arXiv	Github
WorldEval: World Model as Real-World Robot Policies Evaluator	2025-05	arXiv	Github
DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories	2025-05	arXiv	Github