A Survey: Learning Embodied Intelligence from Physical Simulators and World Models

October 3, 2025

🤝   Citation

Please visit A Survey: Learning Embodied Intelligence from Physical Simulators and World Models for more details and comprehensive information.

Author list: Xiaoxiao Long, Qingrui Zhao, Kaiwen Zhang, Zihao Zhang, Dingrui Wang, Yumeng Liu, Zhengjie Shu, Yi Lu, Shouzheng Wang, Xinzhe Wei, Wei Li, Wei Yin, Yao Yao, Jia Pan, Qiu Shen, Ruigang Yang, Xun Cao, Qionghai Dai

1. Introduction

Embodied intelligence provides a foundation for creating robots that can understand and reason about the world in a more human-like manner. Two technologies are central to enabling intelligent behavior in robots: physical simulators and world models. Physical simulators provide controlled, high-fidelity environments for training and evaluating robotic agents, allowing safe and efficient development of complex behaviors. World models, in turn, give robots internal representations of their surroundings, enabling predictive planning and adaptive decision-making beyond direct sensory input. The synergy between the two enhances robots' autonomy, adaptability, and task performance across diverse scenarios.
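As a rough illustration of how the two pieces can fit together, the toy sketch below uses a hand-written 1-D "simulator" to generate transitions, fits a one-parameter "world model" to them, and then plans by imagining rollouts inside that model. Everything here is hypothetical and chosen for brevity (`simulator_step`, `LinearWorldModel`, `plan_first_action`, the gain and time step, the discrete action set); it is not taken from the survey or from any of the listed simulators or world models.

```python
import itertools
import random

ACTIONS = (-1.0, -0.5, 0.0, 0.5, 1.0)  # hypothetical discrete action set


def simulator_step(x, u, gain=0.8, dt=0.1):
    """Toy stand-in for a physical simulator: a 1-D point moved by action u."""
    return x + dt * gain * u


class LinearWorldModel:
    """Toy world model: predicts x' = x + k * u with one learned parameter k."""

    def __init__(self):
        self.k = 0.0

    def fit(self, transitions):
        # Closed-form least squares for (x' - x) = k * u.
        num = sum(u * (x_next - x) for x, u, x_next in transitions)
        den = sum(u * u for _, u, _ in transitions)
        self.k = num / den

    def predict(self, x, u):
        return x + self.k * u


def collect_transitions(n, rng):
    """Query the simulator at random states and actions to build a dataset."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        u = rng.uniform(-1.0, 1.0)
        data.append((x, u, simulator_step(x, u)))
    return data


def plan_first_action(model, x, goal, horizon=3):
    """Enumerate short action sequences, roll them out in the model only,
    and return the first action of the sequence with the lowest total cost."""
    best_cost, best_u0 = float("inf"), 0.0
    for seq in itertools.product(ACTIONS, repeat=horizon):
        x_pred, cost = x, 0.0
        for u in seq:
            x_pred = model.predict(x_pred, u)
            cost += abs(x_pred - goal)
        if cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0


def run_demo(seed=0, steps=30, goal=1.0):
    rng = random.Random(seed)
    model = LinearWorldModel()
    model.fit(collect_transitions(100, rng))  # learn from simulated data
    x = 0.0
    for _ in range(steps):
        u = plan_first_action(model, x, goal)  # plan in imagination
        x = simulator_step(x, u)               # act in the simulator
    return model.k, x


if __name__ == "__main__":
    k, x = run_demo()
    print(f"learned k = {k:.4f}, final position = {x:.4f}")
```

Because the toy data is noise-free, the fitted `k` should land very close to the simulator's true per-step gain (`dt * gain`), and the receding-horizon loop should drive the point to the goal. Real systems replace each piece with something far richer, but the division of labor is the same: the simulator supplies experience, and the world model supports planning without querying the environment.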

This repository aims to collect and organize research and resources related to learning embodied AI through the integration of physical simulators and world models.

2. Levels of Intelligent Robot

To address the absence of a comprehensive grading system that integrates the dimensions of "intelligent cognition" and "autonomous behavior," we outline a capability grading model for intelligent robots, ranging from IR-L0 to IR-L4. This model covers the entire technological evolution, from basic mechanical operation levels to advanced social interaction capabilities.

3. Robotic Mobility, Dexterity and Interaction

Model Predictive Control, MPC

| Paper | Date | Venue |
| --- | --- | --- |
| Model Predictive Control: Theory, Computation, and Design | 2017 | Nob Hill Publishing, LLC |
| Model predictive control of legged and humanoid robots: models and algorithms | 2023-02 | Advanced Robotics |
| An integrated system for real-time model predictive control of humanoid robots | 2013-10 | Humanoids 2013 |
| Whole-body model-predictive control applied to the HRP-2 humanoid | 2015-09 | IROS 2015 |

Whole-Body Control, WBC

| Paper | Date | Venue |
| --- | --- | --- |
| Humanoid Robotics: A Reference | 2017 | Springer |
| A whole-body control framework for humanoids operating in human environments | 2006-05 | ICRA 2006 |
| Hierarchical quadratic programming: Fast online humanoid-robot motion generation | 2014-05 | The International Journal of Robotics Research |
| Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot | 2015-07 | Autonomous Robots |
| Compliant locomotion using whole-body control and divergent component of motion tracking | 2015-05 | ICRA 2015 |
| ExBody2: Advanced Expressive Humanoid Whole-Body Control | 2024-12 | arXiv |
| A Unified and General Humanoid Whole-Body Controller for Fine-Grained Locomotion | 2025-02 | arXiv |

Reinforcement Learning

| Paper | Date | Venue |
| --- | --- | --- |
| Reinforcement learning in robotics: A survey | 2013-08 | The International Journal of Robotics Research |
| Learning-based legged locomotion: State of the art and future perspectives | 2025-01 | The International Journal of Robotics Research |
| Reinforcement learning of dynamic motor sequence: Learning to stand up | 1998-10 | IROS 1998 |
| DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning | 2017-07 | TOG |
| Learning symmetric and low-energy locomotion | 2018-07 | TOG |
| Emergence of locomotion behaviours in rich environments | 2017-10 | arXiv |
| Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for Cassie | 2019-03 | arXiv |

Imitation Learning

| Paper | Date | Venue |
| --- | --- | --- |
| Diffusion policy: Visuomotor policy learning via action diffusion | 2024-10 | The International Journal of Robotics Research |
| 3d diffusion policy | 2024-03 | arXiv |
| Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware | 2023-04 | arXiv |
| RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | 2024-10 | arXiv |
| Diffuseloco: Real-time legged locomotion control with diffusion from offline datasets | 2024-04 | arXiv |
| AMP: adversarial motion priors for stylized physics-based character control | 2021-07 | TOG |
| Whole-body Humanoid Robot Locomotion with Human Reference | 2024-10 | IROS 2024 |
| Dexcap: Scalable and portable mocap data collection system for dexterous manipulation | 2024-03 | arXiv |
| Open-television: Teleoperation with immersive active visual feedback | 2024-07 | arXiv |
| Visual Imitation Enables Contextual Humanoid Control | 2025-05 | arXiv |

Visual-Language-Action Models, VLA

| Paper | Date | Venue |
| --- | --- | --- |
| Rt-2: Vision-language-action models transfer web knowledge to robotic control | 2023-07 | CoRL 2023 |
| Openvla: An open-source vision-language-action model | 2024-06 | arXiv |
| 3D-VLA: A 3D Vision-Language-Action Generative World Model | 2024-03 | arXiv |
| Magma: A foundation model for multimodal ai agents | 2025-06 | CVPR 2025 |
| π0: A Vision-Language-Action Flow Model for General Robot Control | 2024-10 | arXiv |
| Fast: Efficient action tokenization for vision-language-action models | 2025-01 | arXiv |
| Hi robot: Open-ended instruction following with hierarchical vision-language-action models | 2025-02 | arXiv |
| TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation | 2024-09 | arXiv |
| Vision-Language-Action Models: Concepts, Progress, Applications and Challenges | 2025-05 | arXiv |

Robotic Locomotion

Related Survey

| Paper | Date | Venue |
| --- | --- | --- |
| Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning | 2025-04 | arXiv |
| A Comprehensive Review of Humanoid Robots | 2025-03 | SmartBot |
| Recent Progress in Legged Robots Locomotion Control | 2021-06 | Current Robotics Reports |

Legged Locomotion

| Paper | Date | Venue |
| --- | --- | --- |
| Compliant terrain adaptation for biped humanoids without measuring ground surface and contact force | 2009-02 | T-RO |
| Online Learning of Uneven Terrain for Humanoid Bipedal Walking | 2010-07 | AAAI 2010 |
| Practical bipedal walking control on uneven terrain using surface learning and push recovery | 2011-09 | IROS 2011 |
| Biped walking stabilization based on linear inverted pendulum tracking | 2010-09 | IROS 2010 |
| Dynamic walking with compliance on a Cassie bipedal robot | 2019-06 | European Control Conference |
| Dynamic walking on compliant and uneven terrain using DCM and passivity-based whole-body control | 2019-10 | Humanoids 2019 |
| Fast Contact-Implicit Model Predictive Control | 2024-01 | T-RO |
| Efficient Anytime CLF Reactive Planning System for a Bipedal Robot on Undulating Terrain | 2023-01 | T-RO |
| Learning quadrupedal locomotion over challenging terrain | 2020-10 | Science Robotics |
| Blind bipedal stair traversal via sim-to-real reinforcement learning | 2021-07 | Robotics: Science and Systems (RSS) |
| Learning vision-based bipedal locomotion for challenging terrain | 2024-05 | ICRA 2024 |
| Learning humanoid locomotion with perceptive internal model | 2024-11 | arXiv |
| Humanoid parkour learning | 2024-06 | arXiv |
| Unified modeling and control of walking and running on the spring-loaded inverted pendulum | 2016-08 | T-RO |
| Capturability-based analysis and control of legged locomotion, part 2: Application to m2v2, a lower-body humanoid | 2012-09 | IJRR |
| Convex model predictive control of single rigid body model on SO(3) for versatile dynamic legged motions | 2023-05 | ICRA 2023 |
| Bipedal hopping: Reduced-order model embedding via optimization-based control | 2018-10 | IROS 2018 |
| Vertical Jump of a Humanoid Robot With CoP-Guided Angular Momentum Control and Impact Absorption | 2023-05 | T-RO |
| CDM-MPC: An integrated dynamic planning and control framework for bipedal robots jumping | 2024-06 | RAL |
| Optimizing bipedal locomotion for the 100m dash with comparison to human running | 2023-05 | ICRA 2023 |
| Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control | 2024-10 | IJRR |
| Expressive Whole-Body Control for Humanoid Robots | 2024-09 | RSS |
| Exbody2: Advanced expressive humanoid whole-body control | 2024-12 | arXiv |
| OMNIH2O: Universal and dexterous human-to-humanoid whole-body teleoperation and learning | 2024-06 | CoRL 2024 |
| ASAP: Aligning simulation and real-world physics for learning agile humanoid whole-body skills | 2025-02 | arXiv |

Robotic Manipulation

Gripper-based manipulation

| Paper | Date | Venue | Code |
| --- | --- | --- | --- |
| Diffusion Policy: Visuomotor Policy Learning via Action Diffusion | 2023-03 | RSS 2023 | Code |
| RT-1: Robotics Transformer for Real-World Control at Scale | 2022-12 | arXiv | Code |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | 2023-07 | PMLR 23 | Code |
| Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation | 2022-09 | CoRL 2022 | Code |
| Act3d: 3d feature field transformers for multi-task robotic manipulation | 2023-06 | arXiv | Code |
| Modeling of deformable objects for robotic manipulation: A tutorial and review | 2020-09 | Frontiers in Robotics and AI | -- |
| 6-DOF Grasping for Target-driven Object Manipulation in Clutter | 2019-12 | ICRA 2020 | -- |
| Cable manipulation with a tactile-reactive gripper | 2021-12 | IJRR 2021 | -- |

Dexterous hand manipulation

| Paper | Date | Venue | Code |
| --- | --- | --- | --- |
| Dexgraspnet: A large-scale robotic dexterous grasp dataset for general objects based on simulation | 2023-05 | arXiv | Code |
| DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes | 2024-10 | CoRL 2024 | Code |
| HGC-Net: Deep anthropomorphic hand grasping in clutter | 2022-05 | ICRA 2022 | Code |
| Deep differentiable grasp planner for high-dof grippers | 2022-02 | arXiv | -- |
| DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness | 2025-03 | arXiv | Code |
| UGG: Unified Generative Grasping | 2023-11 | ECCV 2024 | Code |
| SpringGrasp: Synthesizing Compliant, Dexterous Grasps under Shape Uncertainty | 2024-04 | arXiv | Code |
| A System for General In-Hand Object Re-Orientation | 2021-11 | CoRL 2021 | Code |
| Visual dexterity: In-hand reorientation of novel and complex object shapes | 2023-11 | Science Robotics 2023 | -- |
| Rotating without Seeing: Towards In-hand Dexterity through Touch | 2023-03 | RSS 2023 | Code |
| DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors from Video | 2021-06 | CoRL 2021 | -- |
| DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping | 2025-02 | arXiv | Code |

Bimanual Manipulation Task

| Paper | Date | Venue | Code |
| --- | --- | --- | --- |
| Stabilize to act: Learning to coordinate for bimanual manipulation | 2023-09 | CoRL 2023 | -- |
| Interactive imitation learning of bimanual movement primitives | 2023-08 | TMECH | -- |
| Learning fine-grained bimanual manipulation with low-cost hardware | 2023-04 | RSS 2023 | Code |
| Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation | 2024-01 | CoRL 2024 | Code |
| RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation | 2025-06 | arXiv | Code |
| Rdt-1b: a diffusion foundation model for bimanual manipulation | 2024-10 | arXiv | Code |

Whole-Body Manipulation Control

| Paper | Date | Venue | Code |
| --- | --- | --- | --- |
| Tidybot: Personalized robot assistance with large language models | 2023-12 | Autonomous Robots | -- |
| Open-world object manipulation using pre-trained vision-language models | 2023-02 | arXiv | Website |
| Harmon: Whole-body motion generation of humanoid robots from language descriptions | 2024-10 | CoRL 2024 | Website |
| Okami: Teaching humanoid robots manipulation skills through single video imitation | 2024-10 | CoRL 2024 | Code |
| Generalizable Humanoid Manipulation with 3D Diffusion Policies | 2024-10 | arXiv | Code |
| OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning | 2024-06 | CoRL 2024 | Code |
| HumanPlus: Humanoid Shadowing and Imitation from Humans | 2024-06 | CoRL 2024 | Code |
| BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities | 2025-03 | arXiv | Code |

Foundation Models in Humanoid Robot Manipulation

| Paper | Date | Venue | Code |
| --- | --- | --- | --- |
| Do as i can, not as i say: Grounding language in robotic affordances | 2022-04 | arXiv | Code |
| Palm-e: An embodied multimodal language model | 2023-03 | ICML 2023 | Website |
| Inner monologue: Embodied reasoning through planning with language models | 2022-07 | arXiv | Website |
| Code as policies: Language model programs for embodied control | 2022-09 | ICRA 2023 | Code |
| STIV: Scalable Text and Image Conditioned Video Generation | 2024-12 | arXiv | -- |
| GR00T N1: An open foundation model for generalist humanoid robots | 2025-03 | arXiv | Code |
| π0: A Vision-Language-Action Flow Model for General Robot Control | 2024-10 | arXiv | Code |
| Openvla: An open-source vision-language-action model | 2024-06 | arXiv | Code |
| Gr-2: A generative video-language-action model with web-scale knowledge for robot manipulation | 2024-10 | arXiv | Website |

Human-Robot Interaction

Related Survey

| Paper | Date | Venue |
| --- | --- | --- |
| Humanlike service robots: A systematic literature review and research agenda | 2024-08 | Psychology & Marketing |
| Human–robot collaboration and machine learning: A systematic review of recent research | 2023-02 | Robotics and Computer-Integrated Manufacturing |
| Emotion Recognition for Human-Robot Interaction: Recent Advances and Future Perspectives | 2020-12 | Frontiers in Robotics and AI |
| Application, Development and Future Opportunities of Collaborative Robots (Cobots) in Manufacturing: A Literature Review | 2022-04 | International Journal of Human–Computer Interaction |
| Towards Social AI: A Survey on Understanding Social Interactions | 2024-09 | arXiv |
| Human–robot interaction: A review and analysis on variable admittance control, safety, and perspectives | 2022-07 | Machines |
| Human-robot perception in industrial environments: A survey | 2021-02 | Sensors |

Cognitive Collaboration

| Paper | Date | Venue | Code | Task |
| --- | --- | --- | --- | --- |
| Artificial cognition for social human–robot interaction: An implementation | 2017-06 | Artificial Intelligence | -- | Robot Cognitive Skills |
| Cognitive Interaction Analysis in Human–Robot Collaboration Using an Assembly Task | 2021-05 | Electronics | -- | Assembly Collaboration |
| Enhancing Robotic Collaborative Tasks Through Contextual Human Motion Prediction and Intention Inference | 2024-07 | International Journal of Social Robotics | -- | Human-Robot Handover |
| L3MVN: Leveraging Large Language Models for Visual Target Navigation | 2023-10 | IROS 2023 | GitHub | Object Goal Navigation |
| SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation | 2024-10 | NeurIPS 2024 | GitHub | Object Goal Navigation |
| TriHelper: Zero-Shot Object Navigation with Dynamic Assistance | 2024-03 | IROS 2024 | -- | Object Goal Navigation |
| CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs | 2024-10 | NeurIPS 2024 OWA Workshop | -- | Object Goal Navigation |
| UniGoal: Towards Universal Zero-shot Goal-oriented Navigation | 2025-03 | CVPR 2025 | GitHub | Goal-oriented Navigation |

Physical Reliability

| Paper | Date | Venue | Code | Remarks |
| --- | --- | --- | --- | --- |
| A Comparative Study of Probabilistic Roadmap Planners | 2004 | Algorithmic Foundations of Robotics V | -- | Probabilistic Roadmap Planning (PRM) |
| Rapidly-exploring random trees: A new tool for Path Planning | 1998 | Research Report | -- | Rapidly-exploring Random Trees (RRT) |
| Sampling-based Algorithms for Optimal Motion Planning | 2011-05 | International Journal of Robotics Research | -- | PRM* and RRT* |
| Path planning for manipulators based on an improved probabilistic roadmap method | 2021-12 | Robotics and Computer-Integrated Manufacturing | -- | Path Planning for Manipulators |
| RRT-connect: An efficient approach to single-query path planning | 2000-04 | ICRA 2000 | -- | Incrementally build two RRTs from the start and goal |
| Homotopy-Aware RRT*: Toward Human-Robot Topological Path-Planning | 2016-03 | 11th ACM/IEEE International Conference on Human-Robot Interaction | -- | Human-robot Interactive Path-planning |
| Human-in-the-loop Robotic Manipulation Planning for Collaborative Assembly | 2019-09 | IEEE Transactions on Automation Science and Engineering | -- | Human-robot Interactive Path-planning |
| CHOMP: Gradient optimization techniques for efficient motion planning | 2009-05 | ICRA 2009 | MoveIt! | Gradient-based Trajectory Optimization |
| STOMP: Stochastic trajectory optimization for motion planning | 2011-05 | ICRA 2011 | MoveIt! | Probabilistic Trajectory Optimization |
| ITOMP: Incremental trajectory optimization for real-time replanning in dynamic environments | 2012-05 | Proceedings of the International Conference on Automated Planning and Scheduling | GitHub | Trajectory Optimization in Dynamic Environment |
| Motion planning with sequential convex optimization and convex collision checking | 2014 | IJRR 2014 | -- | Trajectory Optimization using SCO |
| Considering avoidance and consistency in motion planning for human-robot manipulation in a shared workspace | 2016-05 | ICRA 2016 | -- | Human-robot Interactive Path-planning |
| Considering Human Behavior in Motion Planning for Smooth Human-Robot Collaboration in Close Proximity | 2018-08 | 27th IEEE International Symposium on Robot and Human Interactive Communication | -- | Human-robot Interactive Path-planning |
| Continuous-time Gaussian process motion planning via probabilistic inference | 2017-07 | IJRR 2018 | -- | Gaussian Process Motion Planner (GPMP) |
| Simultaneous Scene Reconstruction and Whole-Body Motion Planning for Safe Operation in Dynamic Environments | 2021-03 | IROS 2021 | -- | GPMP for Whole-body Motion Planning in Dynamic Scene |
| Admittance control for collaborative dual-arm manipulation | 2019-12 | International Conference on Advanced Robotics | -- | Admittance Control |
| Cooperative control of dual-arm robots in different human-robot collaborative tasks | 2020-02 | Assembly Automation | -- | Admittance Control |
| Control system design and methods for collaborative robots | 2023-01 | Applied Sciences | -- | Interactive Control System |
| Towards shared autonomy framework for human-aware motion planning in industrial human-robot collaboration | 2020-08 | International Conference on Automation Science and Engineering | -- | Industrial HRI |
| An actor-critic approach for legible robot motion planner | 2020-05 | ICRA 2020 | -- | RL Method |
| A task-adaptive deep reinforcement learning framework for dual-arm robot manipulation | 2024-01 | IEEE Transactions on Automation Science and Engineering | -- | RL Method |
| Learning robust skills for tightly coordinated arms in contact-rich tasks | 2024-01 | IEEE RAL | -- | RL Method |
| HandoverSim: A Simulation Framework and Benchmark for Human-to-Robot Object Handovers | 2022-05 | ICRA 2022 | GitHub | Benchmark |
| GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation | 2024-01 | CVPR 2024 | GitHub | Imitation Learning |
| MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data | 2025-01 | CoRR 2025 | -- | Imitation Learning |

Social Embeddedness

| Paper | Date | Venue | Links | Remarks |
| --- | --- | --- | --- | --- |
| The space between us: A neurophilosophical framework for the investigation of human interpersonal space | 2009-03 | Neuroscience & Biobehavioral Reviews | -- | Peripersonal Space |
| The interrelation between peripersonal action space and interpersonal social space: psychophysiological evidence and clinical implications | 2021-02 | Frontiers in Human Neuroscience | -- | Peripersonal Space |
| Robot-assisted shopping for the blind: issues in spatial cognition and product selection | 2008-03 | Intelligent Service Robotics | -- | Application in Social Scenario |
| A review of assistive spatial orientation and navigation technologies for the visually impaired | 2017-08 | Universal Access in the Information Society | -- | Application in Social Scenario |
| ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane | 2024-05 | arXiv | -- | Application in Social Scenario |
| Conversational memory network for emotion recognition in dyadic dialogue videos | 2018-06 | Proceedings of the conference. Association for Computational Linguistics | -- | Linguistic Research |
| Graph Based Network with Contextualized Representations of Turns in Dialogue | 2021-09 | EMNLP 2021 | -- | Linguistic Research |
| DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation | 2019-08 | EMNLP 2019 | -- | Linguistic Research |
| Dialogue act modeling for automatic tagging and recognition of conversational speech | 2000-10 | Computational Linguistics | -- | Linguistic Research |
| Werewolf among us: Multimodal resources for modeling persuasion behaviors in social deduction games | 2022-12 | ACL 2023 | -- | Linguistic Research |
| The Call for Socially Aware Language Technologies | 2025-02 | pre-MIT Press publication version | -- | Linguistic Research |
| LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition | 2022-06 | CVPR 2022 | GitHub | Non-verbal Behaviors Study |
| The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective | 2023-12 | CVPR 2024 | GitHub | Non-verbal Behaviors Study |
| SocialGesture: Delving into Multi-person Gesture Understanding | 2025-04 | CVPR 2025 | Dataset | Non-verbal Behaviors Study |
| JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups | 2024-04 | CVPR 2024 | Project Page | HRI in Social Group |
| MRAC Track 1: 2nd Workshop on Multimodal, Generative and Responsible Affective Computing | 2024-09 | ACM MM Workshop 2024 | Workshop Page | Affective Computing |
| The Tong Test: Evaluating Artificial General Intelligence Through Dynamic Embodied Physical and Social Interactions | 2024-03 | Engineering | -- | Evaluation of AGI in Social Interaction |

4. Simulators

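Related Survey

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| A Review of Physics Simulators | 2021 | IEEE Access | -- | Simulator Survey |
| Review of Embodied AI | 2025 | arXiv | -- | Embodied AI Survey |
| A Survey of Embodied AI | 2022 | arXiv | -- | Embodied AI Survey |

Related Works

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| ManiSkill3 | 2024 | arXiv | -- | Manipulation Benchmark |
| ManiSkill2 | 2023 | ICLR | -- | Manipulation Benchmark |
| Analysis using DEM | 2020 | IEEE Aerospace | -- | Granular Simulation |
| Mobile Aloha | 2024 | arXiv | -- | Teleoperation |
| Open-Television | 2024 | arXiv | -- | Teleoperation |
| Universal Manipulation Interface | 2024 | arXiv | -- | Imitation Learning |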
Universal Manipulation Interface2024arXiv–Imitation Learning

Mainstream Simulators

Overview and Documentation

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| Webots: Professional Mobile Robot Simulation | 2004 | JARS | -- | Simulator Platform |
| Design and use paradigms for Gazebo | 2004 | IROS | -- | Simulator Platform |
| MuJoCo: A physics engine for model-based control | 2012 | IROS | -- | Simulator Platform |
| PyBullet: Python module for physics simulation | 2016 | GitHub | GitHub | Simulator Platform |
| CoppeliaSim (formerly V-REP) | 2013 | IROS | -- | Simulator Platform |
| Isaac Gym: GPU-based physics simulation for robot learning | 2021 | arXiv | -- | Simulator Platform |
| Isaac Sim | 2025 | NVIDIA Developer | -- | Simulator Platform |
| Isaac Lab Documentation | 2025 | NVIDIA Developer | -- | Simulator Platform |
| SAPIEN: A simulated part-based interactive environment | 2020 | CVPR | -- | Simulator Platform |
| Genesis: A Universal and Generative Physics Engine | 2024 | GitHub | GitHub | Simulator Platform |
| MuJoCo Programming Guide | 2025 | Docs | -- | Developer Guide |
| Newton Isaac Sim Project | 2024 | GitHub | GitHub | Simulator Platform |
| Newton Physics Engine Announcement | 2025 | NVIDIA Blog | -- | Physics Engine |

Physical Properties of Simulators

Physical Simulation Engines and Platforms

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| LS Group Interact Kinematics | 2025 | Docs | -- | Kinematics Documentation |
| NVIDIA Omniverse | 2025 | NVIDIA Developer | -- | 3D Simulation & Collaboration Platform |
| NVIDIA PhysX System Software | 2021 | NVIDIA Developer | -- | Real-Time Physics Engine |

Rendering Capabilities

Rendering Engines and Framework

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| LuisaRender | 2022 | TOG | -- | Rendering Framework |
| Pyrender | 2019 | GitHub | GitHub | Rendering |
| HydraRendererInfo | 2019 | GitHub | GitHub | Rendering |
| The Alliance for OpenUSD | 2023 | AOUSD | -- | Open Universal Scene Description (USD) Standard |
| OpenGL: The Industry Standard for High‑Performance Graphics | 1992 | Khronos Group | -- | Cross-Platform Graphics API |
| Vulkan: Cross‑Platform 3D Graphics and Compute API | 2016 | Khronos Group | -- | Low-Level Graphics and Compute API |
| NVIDIA OptiX™ Ray Tracing Engine | 2024 | NVIDIA Developer | -- | GPU-Accelerated Ray Tracing Framework |

Sensor and Joint Component Types

5. World Models

Representative Architectures of World Models

| Paper | Date | Venue | Code | Architecture |
| --- | --- | --- | --- | --- |
| World Models | 2018-03 | NeurIPS 2018 | -- | RSSM |
| Learning Latent Dynamics for Planning from Pixels | 2018-11 | ICML 2019 | GitHub | RSSM |
| Dream to Control: Learning Behaviors by Latent Imagination (Dreamer) | 2019-12 | ICLR 2020 | GitHub | RSSM |
| Mastering Atari with Discrete World Models (Dreamer v2) | 2020-10 | ICLR 2021 | GitHub | RSSM |
| DayDreamer: World Models for Physical Robot Learning | 2022-06 | CoRL 2022 | GitHub | RSSM |
| Mastering Diverse Domains through World Models (Dreamer v3) | 2023-01 | Nature | GitHub | RSSM |
| A Path Towards Autonomous Machine Intelligence | 2022-06 | OpenReview | -- | JEPA |
| Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (I-JEPA) | 2023-01 | CVPR 2023 | GitHub | JEPA |
| Revisiting Feature Prediction for Learning Visual Representations from Video (V-JEPA) | 2024-04 | arXiv | GitHub | JEPA |
| V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | 2025-06 | arXiv | GitHub | JEPA |
| TransDreamer: Reinforcement Learning with Transformer World Models | 2022-02 | NeurIPS 2021 Workshop | GitHub | TSSM |
| Transformer-based World Models Are Happy With 100k Interactions | 2023-03 | ICLR 2023 | GitHub | TSSM |
| Genie: Generative Interactive Environments | 2024-02 | arXiv | -- | TSSM |
| GAIA-1: A Generative World Model for Autonomous Driving | 2023-09 | arXiv Wayve | -- | Autoregressive Transformer |
| OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving | 2023-11 | ECCV 2024 | GitHub | Autoregressive Transformer |
| Video generation models as world simulators (Sora) | 2024-02 | OpenAI | -- | Diffusion |
| Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | 2024-05 | NeurIPS 2024 | GitHub | Diffusion |
| GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | 2025-03 | arXiv Wayve | -- | Diffusion |
| Vid2World: Crafting Video Diffusion Models to Interactive World Models | 2025-05 | arXiv | -- | AR+Diffusion |
| Epona: Autoregressive Diffusion World Model for Autonomous Driving | 2025-06 | ICCV 2025 | GitHub | AR+Diffusion |

Core roles of World Models

| Paper | Date | Venue | Code | Role |
| --- | --- | --- | --- | --- |
| Cosmos World Foundation Model Platform for Physical AI | 2025-03 | arXiv | GitHub | Neural Simulator |
| Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control | 2025-04 | arXiv | GitHub | Neural Simulator |
| GAIA-1: A Generative World Model for Autonomous Driving | 2023-09 | arXiv Wayve | -- | Neural Simulator |
| GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | 2025-03 | arXiv Wayve | -- | Neural Simulator |
| DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | 2024-05 | CVPR 2024 | -- | Neural Simulator |
| DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model | 2024-10 | arXiv | GitHub | Neural Simulator |
| Dream to Control: Learning Behaviors by Latent Imagination (Dreamer) | 2019-12 | ICLR 2020 | GitHub | Dynamic Model |
| Mastering Atari with Discrete World Models (Dreamer v2) | 2020-10 | ICLR 2021 | GitHub | Dynamic Model |
| DayDreamer: World Models for Physical Robot Learning | 2022-06 | CoRL 2022 | GitHub | Dynamic Model |
| Mastering Diverse Domains through World Models (Dreamer v3) | 2023-01 | Nature | GitHub | Dynamic Model |
| Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning | 2023-05 | NeurIPS 2023 | GitHub | Dynamic Model |
| iVideoGPT: Interactive VideoGPTs are Scalable World Models | 2024-05 | NeurIPS 2024 | GitHub | Dynamic Model |
| Video Prediction Models as Rewards for Reinforcement Learning (VIPER) | 2023-05 | NeurIPS 2023 | GitHub | Reward Model |
| Video models are zero-shot learners and reasoners | 2025-09 | arXiv Google DeepMind | -- | Neural Simulator |

6. World Models for Intelligent Robots

World Models for Autonomous Driving

WMs as Neural Simulators for Autonomous Driving

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| GAIA-1: A Generative World Model for Autonomous Driving | 2023-09 | arXiv Wayve | -- | Scenario Generation |
| DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving | 2023-09 | ECCV 2024 | GitHub | Scenario Generation |
| ADriver-I: A General World Model for Autonomous Driving | 2023-11 | arXiv | -- | Scenario Generation |
| GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | 2025-03 | arXiv Wayve | -- | Scenario Generation |
| DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | 2024-05 | AAAI 2025 | GitHub | Scenario Generation |
| DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation | 2024-11 | CVPR 2025 | GitHub | Scenario Generation |
| DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT | 2024-12 | arXiv | GitHub | Scenario Generation |
| MagicDrive: Street View Generation with Diverse 3D Geometry Control | 2024-05 | ICLR 2024 | GitHub | Scenario Generation |
| MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes | 2024-11 | arXiv | GitHub | Scenario Generation |
| MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control | 2024-11 | arXiv | GitHub | Scenario Generation |
| WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation | 2024-08 | ECCV 2024 | GitHub | Scenario Generation |
| ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration | 2024-11 | CVPR 2025 | GitHub | Scenario Generation |
| DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | 2025-03 | ICRA 2025 | GitHub | Scenario Generation |
| Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | 2024-08 | CVPR 2024 | GitHub | Scenario Generation |
| Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control | 2025-04 | arXiv | GitHub | Scenario Generation |
| GeoDrive: Trajectory-Conditioned 3D World Model for Autonomous Driving | 2025-02 | arXiv | -- | Scenario Generation |
| DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | 2024-05 | CVPR 2024 | -- | Scenario Generation |
| OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving | 2025-05 | arXiv | GitHub | Scenario Generation |
| Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving | 2025-01 | AAAI 2025 | GitHub | Scenario Generation |
| DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model | 2024-10 | arXiv | GitHub | Scenario Generation |
| RenderWorld: World Model with Self-Supervised 3D Label | 2024-11 | arXiv | -- | Scenario Generation |
| OccLLaMA: A Language-Driven 3D Occupancy Generation Framework | 2024-12 | arXiv | -- | Scenario Generation |
| BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space | 2024-07 | arXiv | -- | Scenario Generation |
| HoloDrive: Holistic View-Aware World Model for Autonomous Driving | 2024-10 | arXiv | -- | Scenario Generation |
| GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control | 2024-12 | CVPR 2025 | GitHub | Scenario Generation |
| DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving | 2024-08 | arXiv | GitHub | Scenario Generation |
| ACT-Bench: Towards Action Controllable World Models for Autonomous Driving | 2024-12 | arXiv | -- | Scenario Generation |
| InfinityDrive: Towards Infinite-Resolution World Models for Autonomous Driving | 2024-12 | arXiv | -- | Scenario Generation |
| Epona: Autoregressive Diffusion World Model for Autonomous Driving | 2025-06 | ICCV 2025 | GitHub | Scenario Generation |
| DrivePhysica: A Physics-Conditioned World Model for Autonomous Driving | 2024-12 | arXiv | -- | Scenario Generation |
| Cosmos-Drive: Multi-Modal World Model for Autonomous Driving | 2025-03 | arXiv | GitHub | Scenario Generation |
| Genie 3: A new frontier for world models | 2025-08 | website | Talk | Interactive Online Simulation |

WMs as Dynamic Models for Autonomous Driving

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| MILE: Model-based Imitation Learning for Urban Driving | 2022-10 | NeurIPS 2022 | GitHub | Motion Planning |
| Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning | 2025-03 | arXiv | GitHub | Reasoning |
| TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction | 2023-03 | ICRA 2023 | GitHub | Motion Prediction |
| Uniworld: Autonomous Driving Pre-training via World Models | 2023-08 | arXiv | -- | Pre-training |
| Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion | 2023-11 | ICLR 2024 | -- | Motion Planning |
| MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations | 2023-11 | IV 2025 | -- | Motion Planning |
| OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving | 2023-11 | ECCV 2024 | GitHub | Motion Planning |
| ViDAR: Visual Point Cloud Forecasting for Autonomous Driving | 2023-12 | CVPR 2024 | -- | Motion Prediction |
| Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving | 2024-02 | ECCV 2024 | -- | Motion Planning |
| LidarDM: Generative LiDAR Simulation in a Generated World | 2024-04 | ICRA 2025 | -- | Simulation |
| Enhancing End-to-End Autonomous Driving with Latent World Model | 2025-02 | ICLR 2025 | GitHub | Motion Planning |
| UnO: Unsupervised Occupancy Fields for Perception and Forecasting | 2024-06 | CVPR 2024 | -- | Motion Prediction |
| CarFormer: Self-Driving with Learned Object-Centric Representations | 2024-07 | ECCV 2024 | GitHub | Motion Planning |
| NeMo: Neural Occupancy Fields for Autonomous Driving | 2024 | ECCV 2024 | -- | Motion Prediction |
| Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage | 2021-10 | NeurIPS 2021 | GitHub | Motion Planning |
| Imagine-2-Drive: High-Fidelity World Modeling for Autonomous Driving | 2024-11 | IROS 2025 | GitHub | Motion Planning |
| Doe-1: Closed-Loop Autonomous Driving with Large World Model | 2024-08 | arXiv | GitHub | Motion Planning |
| GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction | 2024-12 | arXiv | GitHub | Motion Prediction |
| DFIT-OccWorld: Efficient Occupancy Forecasting via Differential Factorization and Interactive Transformer | 2024-12 | arXiv | -- | Motion Prediction |
| DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Large Language Model | 2024-12 | arXiv | -- | Motion Planning |
| AdaWM: Adaptive World Model for Autonomous Driving | 2025-01 | ICLR 2025 | -- | Motion Planning |
| AD-L-JEPA: Autonomous Driving with L-JEPA | 2025-01 | arXiv | GitHub | Motion Prediction |
| HERMES: Harmonized Embodied Representation for Multi-modal Sensor Integration in Autonomous Driving | 2025-01 | ICCV 2025 | GitHub | Motion Planning |

WMs as Reward Models for Autonomous Driving

| Paper | Date | Venue | Code | Application |
| --- | --- | --- | --- | --- |
| SEM2: Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model | 2024-05 | T-ITS | -- | Reinforcement Learning |
| Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models | 2022-05 | NeurIPS 2022 | GitHub | Reinforcement Learning |
| Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | 2024-05 | NeurIPS 2024 | GitHub | Reinforcement Learning |
| Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving | 2023-11 | CVPR 2024 | GitHub | Motion Planning |
| WoTE: World-model-based End-to-end Autonomous Driving | 2025-04 | ICCV 2025 | GitHub | Motion Planning |

World Models for Articulated Robots

The following table compares research on world models for robotics in terms of model input, architecture, experiment platform, and code availability.

Neural Simulators

| Paper | Date | Venue | Code |
| --- | --- | --- | --- |
| Whale: Towards generalizable and scalable world models for embodied decision-making | 2024-08 | arXiv | -- |
| RoboDreamer: Learning Compositional World Models for Robot Imagination | 2024-08 | ICML 2024 | GitHub |
| Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination | 2024-11 | ICLR 2025 | GitHub |
| EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation | 2025-01 | arXiv | GitHub |
| Cosmos World Foundation Model Platform for Physical AI | 2025-03 | arXiv | GitHub |
| WorldEval: World Model as Real-World Robot Policies Evaluator | 2025-05 | arXiv | GitHub |
| DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories | 2025-05 | arXiv | GitHub |