Awesome Model-Based Reinforcement Learning
December 20, 2025
This is a collection of research papers on model-based reinforcement learning (MBRL). The repository will be continuously updated to track the frontier of model-based RL.
Welcome to follow and star!
- [2025.12.01] New: We update the NeurIPS 2025 paper list of model-based rl!
- [2025.08.28] We update the ICML 2025 paper list of model-based rl.
- [2025.02.06] We update the ICLR 2025 paper list of model-based rl.
- [2024.10.27] We update the NeurIPS 2024 paper list of model-based rl.
- [2024.05.20] We update the ICML 2024 paper list of model-based rl.
- [2023.11.29] We update the ICLR 2024 paper list of model-based rl.
- [2023.09.29] We update the NeurIPS 2023 paper list of model-based rl.
- [2023.06.15] We update the ICML 2023 paper list of model-based rl.
- [2023.02.05] We update the ICLR 2023 paper list of model-based rl.
- [2022.11.03] We update the NeurIPS 2022 paper list of model-based rl.
- [2022.07.06] We update the ICML 2022 paper list of model-based rl.
- [2022.02.13] We update the ICLR 2022 paper list of model-based rl.
- [2021.12.28] We release the awesome model-based rl.
A Taxonomy of Model-Based RL Algorithms
We’ll start this section with a disclaimer: it’s really quite hard to draw an accurate, all-encompassing taxonomy of algorithms in the Model-Based RL space, because the modularity of algorithms is not well represented by a tree structure. We will therefore publish a series of related blog posts that explain more Model-Based RL algorithms.
A non-exhaustive, but useful taxonomy of algorithms in modern Model-Based RL.
We simply divide Model-Based RL into two categories: Learn the Model and Given the Model.
- Learn the Model mainly focuses on how to build the environment model.
- Given the Model cares about how to utilize the learned model.
Some examples are shown in the figure above, and the algorithms in the taxonomy are listed below.
[1] World Models: Ha and Schmidhuber, 2018
[2] I2A (Imagination-Augmented Agents): Weber et al, 2017
[3] MBMF (Model-Based RL with Model-Free Fine-Tuning): Nagabandi et al, 2017
[4] MBVE (Model-Based Value Expansion): Feinberg et al, 2018
[5] ExIt (Expert Iteration): Anthony et al, 2017
[6] AlphaZero: Silver et al, 2017
[7] POPLIN (Model-Based Policy Planning): Wang et al, 2019
[8] M2AC (Masked Model-based Actor-Critic): Pan et al, 2020
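To make the two categories concrete, here is a minimal sketch on a toy problem. It is not taken from any of the papers above: the 5-state chain environment, the function names (`true_step`, `learn_model`, `plan_with_model`), and the choice of a tabular model with value iteration are all assumptions made for illustration. `learn_model` stands in for "Learn the Model" (building an environment model from sampled transitions), and `plan_with_model` stands in for "Given the Model" (using only the learned model to derive a policy).

```python
import random

N_STATES = 5        # hypothetical chain of states 0..4; reward for reaching state 4
ACTIONS = (-1, 1)   # move left or move right


def true_step(state, action):
    """Hypothetical ground-truth environment, unknown to the planner."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def learn_model(num_samples=500, seed=0):
    """'Learn the Model': record observed transitions in a tabular model."""
    rng = random.Random(seed)
    model = {}
    for _ in range(num_samples):
        state = rng.randrange(N_STATES)
        action = rng.choice(ACTIONS)
        # The toy environment is deterministic, so one sample per pair suffices.
        model[(state, action)] = true_step(state, action)
    return model


def q_value(model, values, state, action, gamma):
    """One-step lookahead that queries the learned model, not the environment."""
    next_state, reward = model[(state, action)]
    return reward + gamma * values[next_state]


def plan_with_model(model, gamma=0.9, sweeps=50):
    """'Given the Model': value iteration followed by greedy policy extraction."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for state in range(N_STATES):
            known = [a for a in ACTIONS if (state, a) in model]
            if known:
                values[state] = max(
                    q_value(model, values, state, a, gamma) for a in known
                )
    policy = []
    for state in range(N_STATES):
        known = [a for a in ACTIONS if (state, a) in model]
        best = (
            max(known, key=lambda a: q_value(model, values, state, a, gamma))
            if known
            else None
        )
        policy.append(best)
    return values, policy


if __name__ == "__main__":
    learned = learn_model()
    state_values, greedy_policy = plan_with_model(learned)
    print(greedy_policy)
```

The algorithms listed above replace both halves with far richer components (for example, neural dynamics models on the learning side and MCTS or policy optimization on the planning side), but the split between building the model and exploiting it is the same.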
Papers
format:
- [title](paper link) [links]
- author1, author2, and author3
- Key: key problems and insights
- OpenReview: optional
- ExpEnv: experiment environments
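As an illustration of this entry format, the sketch below models one entry as a small Python dataclass and renders it as markdown. The `PaperEntry` class, its field names, the two-space indentation of sub-items, and the `https://example.com/...` link are hypothetical and are not part of the repository. The optional `[links]` part of the title line is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PaperEntry:
    """Hypothetical container for one paper entry in the list."""
    title: str
    paper_link: str
    authors: List[str]
    venue: str
    key: str
    exp_env: str
    openreview: Optional[str] = None  # review scores, only if available

    def render(self) -> str:
        """Render the entry as markdown lines following the format above."""
        lines = [
            f"- [{self.title}]({self.paper_link})",
            f"  - {', '.join(self.authors)}. {self.venue}",
            f"  - Key: {self.key}",
        ]
        if self.openreview is not None:
            lines.append(f"  - OpenReview: {self.openreview}")
        lines.append(f"  - ExpEnv: {self.exp_env}")
        return "\n".join(lines)


if __name__ == "__main__":
    entry = PaperEntry(
        title="When to Trust Your Model: Model-Based Policy Optimization",
        paper_link="https://example.com/placeholder",  # placeholder, not the real link
        authors=["Michael Janner", "Justin Fu", "Marvin Zhang", "Sergey Levine"],
        venue="NeurIPS 2019",
        key="ensemble model, sac, k-branched rollout",
        exp_env="mujoco",
    )
    print(entry.render())
```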
Classic Model-Based RL Papers
-
Dyna, an integrated architecture for learning, planning, and reacting
- Richard S. Sutton. ACM 1991
- Key: dyna architecture
- ExpEnv: None
-
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
- Marc Peter Deisenroth, Carl Edward Rasmussen. ICML 2011
- Key: probabilistic dynamics model
- ExpEnv: cart-pole system, robotic unicycle
-
Learning Complex Neural Network Policies with Trajectory Optimization
- Sergey Levine, Vladlen Koltun. ICML 2014
- Key: guided policy search
- ExpEnv: mujoco
-
Learning Continuous Control Policies by Stochastic Value Gradients
- Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa, Tom Erez. NIPS 2015
- Key: backpropagation through paths, gradient on real trajectory
- ExpEnv: mujoco
-
Value Prediction Network
- Junhyuk Oh, Satinder Singh, Honglak Lee. NIPS 2017
- Key: value-prediction model
- ExpEnv: collect domain, atari
-
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
- Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee. NIPS 2018
- Key: ensemble model and Qnet, value expansion
- ExpEnv: mujoco, roboschool
-
Recurrent World Models Facilitate Policy Evolution
- David Ha, Jürgen Schmidhuber. NIPS 2018
- Key: vae(representation), rnn(predictive model)
- ExpEnv: car racing, vizdoom
-
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
-
When to Trust Your Model: Model-Based Policy Optimization
- Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine. NeurIPS 2019
- Key: ensemble model, sac, k-branched rollout
- ExpEnv: mujoco
-
Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees
- Yuping Luo, Huazhe Xu, Yuanzhi Li, Yuandong Tian, Trevor Darrell, Tengyu Ma. ICLR 2019
- Key: Discrepancy Bounds Design, ME-TRPO with multi-step, Entropy regularization
- ExpEnv: mujoco
-
Model-Ensemble Trust-Region Policy Optimization
- Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel. ICLR 2018
- Key: ensemble model, TRPO
- ExpEnv: mujoco
-
Dream to Control: Learning Behaviors by Latent Imagination
- Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi. ICLR 2020
- Key: DreamerV1, latent space imagination
- ExpEnv: deepmind control suite, atari, deepmind lab
-
Exploring Model-based Planning with Policy Networks
- Tingwu Wang, Jimmy Ba. ICLR 2020
- Key: model-based policy planning in action space and parameter space
- ExpEnv: mujoco
-
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver. Nature 2020
- Key: MCTS, value equivalence
- ExpEnv: chess, shogi, go, atari
NeurIPS 2025
-
Stable Planning through Aligned Representations in Model-Based Reinforcement Learning
- Misagh Soltani, Forest Agostinelli. NeurIPS 2025
- Key: visual planning, aligned representations, discrete latent state, heuristic search
- ExpEnv: Rubik's Cube, Sokoban
-
RLVR-World: Training World Models with Reinforcement Learning
- Mingsheng Long, et al. NeurIPS 2025
- Key: world model training, decision-aware, verifiable rewards
- ExpEnv: text games, robot manipulation
-
Dyn-O: Building Structured World Models with Object-Centric Representations
- Microsoft Research et al. NeurIPS 2025
- Key: structured world models, object-centric, physics modeling
- ExpEnv: physical interaction, object manipulation
-
Off-policy Reinforcement Learning with Model-based Exploration Augmentation
- Anonymous et al. NeurIPS 2025
- Key: exploration, diffusion model, synthetic experience, data augmentation
- ExpEnv: mujoco, sparse reward tasks
-
Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
- Xiu Li, et al. NeurIPS 2025
- Key: multi-agent MBRL, diffusion-inspired, sequence modeling, joint distribution
- ExpEnv: SMAC, MPE
-
SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer
- Yarden As, Chengrui Qu, Benjamin Unger, Dongho Kang, Max van der Hart, Laixi Shi, Stelian Coros, Adam Wierman, Andreas Krause. NeurIPS 2025
- Key: safe MBRL, sim-to-real, ensemble uncertainty, robust control
- ExpEnv: real-world robotics, safety gym
-
Improving Model-Based Reinforcement Learning by Converging to Flatter Minima
- Shrinivas Ramasubramanian, Benjamin Freed, Alexandre Capone, Jeff Schneider. NeurIPS 2025
- Key: model error, simulation lemma, model generalization
- ExpEnv: DMC, Atari100k, HumanoidBench
ICML 2025
-
Improving Transformer World Models for Data-Efficient RL
- Antoine Dedieu, Joseph Ortiz, Xinghua Lou, Carter Wendelken, Wolfgang Lehrach, J Swaroop Guntupalli, Miguel Lazaro-Gredilla, Kevin Murphy
- Key: dyna with warmup, patch nearest-neighbor tokenization, block teacher forcing
- OpenReview: 4, 4, 4, 3
- ExpEnv: craftax-classic
-
Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning
- Brett Barkley, David Fridovich-Keil
- Key: Dyna-style algorithms significantly degrade performance across most DMC environments.
- OpenReview: 4, 4, 3, 2
- ExpEnv: gym, DeepMind Control Suite
-
Knowledge Retention in Continual Model-Based Reinforcement Learning
- Haotian Fu, Yixiang Sun, Michael L. Littman, George Konidaris
- Key: synthetic experience rehearsal, regaining memories through exploration
- OpenReview: 4, 3, 3, 3
- ExpEnv: mini-grid, deepmind control suite
-
Time-Aware World Model for Adaptive Prediction and Control
- Anh N Nhu, Sanghyun Son, Ming Lin
- Key: condition on the time-step size ∆t and train over a diverse range of ∆t values
- OpenReview: 4, 3, 3
- ExpEnv: meta-world control tasks, PDE-control tasks
-
Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach
- Minting Pan, Yitao Zheng, Jiajian Li, Yunbo Wang, Xiaokang Yang
- Key: behavior abstraction network, hierarchical world model
- OpenReview: 3, 3, 3, 2
- ExpEnv: meta-world, carla, minedojo
-
Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning
- Dongsu Lee, Minhae Kwon
- Key: learn a latent abstraction that captures a temporal distance from both trajectory and transition levels of state space.
- OpenReview: 4, 3, 3, 2
- ExpEnv: D4RL, AntMaze, FrankaKitchen, CALVIN, pixel-based FrankaKitchen
-
- Dongchi Huang, Jiaqi WANG, Yang Li, Chunhe Xia, Tianle Zhang, Kaige Zhang
- Key: leverage privileged information through privileged representation alignment and an asymmetric actor-critic structure
- OpenReview: 3, 3, 3
- ExpEnv: safety gymnasium benchmark, guard benchmark
-
Reward-free World Models for Online Imitation Learning
- Shangzhe Li, Zhiao Huang, Hao Su
- Key: reward-free world model, inverse soft-Q learning objective
- OpenReview: 4, 3, 3, 3
- ExpEnv: DMControl, MyoSuite, ManiSkill2
-
FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making
- Yucen Wang, Rui Yu, Shenghua Wan, Le Gan, De-Chuan Zhan
- Key: ground FM representations into the WM state space, model-based goal-conditioned RL
- OpenReview: 4, 3, 3, 3
- ExpEnv: DMControl, Kitchen, minecraft
-
Continual Reinforcement Learning by Planning with Online World Models
- Zichen Liu, Guoji Fu, Chao Du, Wee Sun Lee, Min Lin
- Key: plan with online world model, regret analysis
- OpenReview: 4, 4, 4, 3
- ExpEnv: ContinualBench
-
Scaling Laws for Pre-training Agents and World Models
- Tim Pearce*, Tabish Rashid*, David Bignell, Raluca Georgescu, Sam Devlin, Katja Hofmann
- Key: scaling laws, embodied AI, behavior cloning, world modeling, tokenizer, architecture
- ExpEnv: Bleeding Edge, RT-1 (robotics), Atari, NetHack
-
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
- Gaoyue Zhou, Hengkai Pan, Yann LeCun, Lerrel Pinto
- Key: world models, offline learning, zero-shot planning, pretrained visual features, task-agnostic reasoning
- ExpEnv: Maze, Wall, Reach, Push-T, Rope Manipulation, Granular Manipulation
-
General agents need world models
- Jonathan Richens, Tom Everitt, David Abel
- Key: world models, goal-directed behavior, model-free learning, policy analysis, regret bounds
- ExpEnv: synthetic controlled Markov process (cMP) environments with varying sample trajectories and goal depths
-
RobustZero: Enhancing MuZero Reinforcement Learning Robustness to State Perturbations
- Yushuai Li, Hengyu Liu, Torben Bach Pedersen, Yuqiang He, Kim Guldstrand Larsen, Lu Chen, Christian S. Jensen, Jiachen Xu, Tianyi Li
- Key: MuZero, robustness, reinforcement learning, state perturbations, self-supervised learning, adaptive adjustment
- ExpEnv: CartPole, Pendulum, IEEE 34-bus, IEEE 123-bus, IEEE 8500-node, Highway, Intersection, Racetrack, Hopper, Walker2d, HalfCheetah, Ant
-
Accurate and Efficient World Modeling with Masked Latent Transformers
- Maxime Burchi, Radu Timofte
- Key: model-based reinforcement learning, world models, MaskGIT, spatial latent space, Dreamer, Transformer, efficiency
- ExpEnv: Crafter, Atari 100k
-
Trajectory World Models for Heterogeneous Environments
- Shaofeng Yin, Jialong Wu, Siqiao Huang, Xingjian Su, Xu He, Jianye Hao, Mingsheng Long
- Key: world models, heterogeneous environments, pre-training, in-context learning, model transfer, trajectory data
- ExpEnv: UniTraj (80 diverse environments), D4RL (HalfCheetah, Hopper, Walker2D), Cart-2-Pole, Cart-3-Pole
-
A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment
- Raanan Y. Rohekar, Yaniv Gurwicz, Sungduk Yu, Estelle Aflalo, Vasudev Lal
- Key: GPT, causal inference, attention mechanism, structural causal model, zero-shot causal discovery
- ExpEnv: Othello, Chess
ICLR 2025
-
Learning Transformer-based World Models with Contrastive Predictive Coding
- Maxime Burchi, Radu Timofte
- Key: model-based reinforcement learning, transformer network, contrastive predictive coding
- ExpEnv: Atari 100k benchmark
-
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
- Yang Tian, Sizhe Yang, Jia Zeng, Ping Wang, Dahua Lin, Hao Dong, Jiangmiao Pang
- Key: Robotic Manipulation, Pre-training, Visual Foresight, Inverse Dynamics, Large-scale robot dataset
- ExpEnv: LIBERO-LONG benchmark, CALVIN ABC-D, real-world tasks
-
OptionZero: Planning with Learned Options
- Po-Wei Huang, Pei-Chiun Peng, Hung Guei, Ti-Rong Wu
- Key: Option, Semi-MDP, MuZero, MCTS, Planning, Reinforcement Learning
- ExpEnv: Atari
-
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
- Claas A Voelcker, Marcel Hussing, Eric Eaton, Amir-massoud Farahmand, Igor Gilitschenski
- Key: reinforcement learning, model based reinforcement learning, data augmentation, high update ratios
- ExpEnv: DeepMind Control Suite
-
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
- Michael Matthews, Michael Beukman, Chris Lu, Jakob Nicolaus Foerster
- Key: Reinforcement Learning, Open-Endedness, Unsupervised Environment Design, Automatic Curriculum Learning, Benchmark
- ExpEnv: 2D Physics-Based Tasks, Robotic Locomotion, Grasping, Video Games, Classic RL Environments
-
Learning to Search from Demonstration Sequences
- Dixant Mittal, Liwei Kang, Wee Sun Lee
- Key: Planning, Reasoning, Learning to Search, Reinforcement Learning, Large Language Model
- ExpEnv: Game of 24, 2D Grid Navigation, Procgen Games
-
Open-World Reinforcement Learning over Long Short-Term Imagination
- Jiajian Li, Qi Wang, Yunbo Wang, Xin Jin, Yang Li, Wenjun Zeng, Xiaokang Yang
- Key: Reinforcement Learning, World Models, Visual Control
- ExpEnv: MineDojo
-
MaestroMotif: Skill Design from Artificial Intelligence Feedback
- Martin Klissarov, Mikael Henaff, Roberta Raileanu, Shagun Sodhani, Pascal Vincent, Amy Zhang, Pierre-Luc Bacon, Doina Precup, Marlos C. Machado, Pierluca D'Oro
- Key: Hierarchical RL, Reinforcement Learning, LLMs
- ExpEnv: NetHack Learning Environment (NLE)
-
Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects
- Tai Hoang, Huy Le, Philipp Becker, Vien Anh Ngo, Gerhard Neumann
- Key: Robotic Manipulation, Equivariance, Graph Neural Networks, Reinforcement Learning, Deformable Objects
- ExpEnv: Rigid insertion, rope manipulation, cloth manipulation with multiple end-effectors
-
M^3PC: Test-time Model Predictive Control using Pretrained Masked Trajectory Model
- Kehan Wen, Yutong Hu, Yao Mu, Lei Ke
- Key: Offline-to-Online Reinforcement Learning, Model-based Reinforcement Learning, Masked Autoencoding, Robot Learning
- ExpEnv: D4RL, RoboMimic
-
Offline Model-Based Optimization by Learning to Rank
- Rong-Xi Tan, Ke Xue, Shen-Huan Lyu, Haopu Shang, yaowang, Yaoyuan Wang, Fu Sheng, Chao Qian
- Key: Offline model-based optimization, black-box optimization, learning to rank, learning to optimize
- ExpEnv: Diverse tasks across optimization scenarios
-
Monte Carlo Planning with Large Language Model for Text-Based Games
- Zijing Shi, Meng Fang, Ling Chen
- Key: Large language model, Monte Carlo tree search, Text-based games
- ExpEnv: Jericho benchmark
-
Interpreting Emergent Planning in Model-Free Reinforcement Learning
- Thomas Bush, Stephen Chung, Usman Anwar, Adrià Garriga-Alonso, David Krueger
- Key: reinforcement learning, interpretability, planning, probes, model-free, mechanistic interpretability, sokoban
- ExpEnv: Sokoban
-
Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient
- Wenlong Wang, Ivana Dusparic, Yucheng Shi, Ke Zhang, Vinny Cahill
- Key: Mamba-2, Model based reinforcement learning, Mamba, State space models
- ExpEnv: Atari 100K
-
Zero-shot Model-based Reinforcement Learning using Large Language Models
- Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat, Oussama Zekri, Albert Thomas, Giuseppe Paolo, Maurizio Filippone, Ievgen Redko, Balázs Kégl
- Key: Model-based Reinforcement Learning, Large language models, Zero-shot Learning, In-context Learning
- ExpEnv: D4RL, Pendulum, HalfCheetah, Hopper
-
On Rollouts in Model-Based Reinforcement Learning
- Bernd Frauenknecht, Devdutt Subhasish, Friedrich Solowjow, Sebastian Trimpe
- Key: Model-Based Reinforcement Learning, Model Rollouts, Uncertainty Quantification
- ExpEnv: Gym MuJoCo
-
Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning
- Haoxin Lin, Yu-Yan Xu, Yihao Sun, Zhilong Zhang, Yi-Chen Li, Chengxing Jia, Junyin Ye, Jiaji Zhang, Yang Yu
- Key: model-based reinforcement learning, any-step dynamics model
- ExpEnv: D4RL, NeoRL, Gym MuJoCo-v3
-
Discrete Codebook World Models for Continuous Control
- Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää, Yi Zhao, Kevin Sebastian Luck, Arno Solin, Joni Pajarinen
- Key: reinforcement learning, world model, representation learning, self-supervised learning, model-based reinforcement learning, continuous control
- ExpEnv: deepmind control suite, Meta-World, myosuite
NeurIPS 2024
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
- Jialong Wu, Shaofeng Yin, Ningya Feng, Xu He, Dong Li, Jianye Hao, Mingsheng Long
- Key: world models, video generative models, autoregressive transformer, reinforcement learning, video prediction, visual planning
- ExpEnv: Meta-world
-
Parallelizing Model-based Reinforcement Learning Over the Sequence Length
- ZiRui Wang, Yue Deng, Junfeng Long, Yin Zhang
- Key: reinforcement learning, model-based reinforcement learning, parallelization, sequence length, world model, eligibility trace, sample efficiency
- ExpEnv: Atari 100K, DMControl
-
Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity
- Philip Amortila, Dylan J. Foster, Nan Jiang, Akshay Krishnamurthy, Zakaria Mhammedi
- Key: reinforcement learning, latent dynamics, statistical modularity, algorithmic modularity, observable-to-latent reductions, self-predictive models
- ExpEnv: None
-
SPO: Sequential Monte Carlo Policy Optimisation
- Matthew V Macfarlane, Edan Toledo, Donal Byrne, Paul Duckworth, Alexandre Laterre
- Key: reinforcement learning, rl, model-based reinforcement learning, sequential monte carlo, expectation maximisation, planning
- ExpEnv: Brax, Boxoban, Rubik's Cube
-
Seek Commonality but Preserve Differences: Dissected Dynamics Modeling for Multi-modal Visual RL
- Yangru Huang, Peixi Peng, Yifan Zhao, Guangyao Chen, Yonghong Tian
- Key: multi-modal reinforcement learning, visual RL, dynamics modeling, modality consistency, modality inconsistency, DDM
- ExpEnv: CARLA, DMControl
-
- Moritz Schneider, Robert Krug, Narunas Vaskevicius, Luigi Palmieri, Joschka Boedecker
- Key: reinforcement learning, rl, model-based reinforcement learning, representation learning, pvr, visual representations
- ExpEnv: DMC, ManiSkill2, Miniworld
-
Multi-Agent Domain Calibration with a Handful of Offline Data
- Tao Jiang, Lei Yuan, Lihe Li, Cong Guan, Zongzhang Zhang, Yang Yu
- Key: Multi-agent reinforcement learning, domain transfer
- ExpEnv: D4RL
-
The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning
-
Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning
- Abdullah Akgül, Manuel Haussmann, Melih Kandemir
- Key: The paper argues that uncertainty-based reward penalization introduces excessive conservatism, potentially resulting in suboptimal policies through underestimation.
- ExpEnv: d4rl
-
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning
-
Model-Based Transfer Learning for Contextual Reinforcement Learning
- Jung-Hoon Cho, Vindula Jayawardana, Sirui Li, Cathy Wu
- Key: bayesian optimization, contextual rl
- ExpEnv: gaussian process, traffic signal, eco-driving, advisory autonomy, control tasks
-
- Guhao Feng, Han Zhong
- Key: rl representation complexity
- ExpEnv: mujoco
ICML 2024
-
HarmonyDream: Task Harmonization Inside World Models
- Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long
- Key: observation modeling and reward modeling analysis in world models
- ExpEnv: meta-world, rlbench, deepmind control suite, atari 100k
-
CompeteAI: Understanding the Competition Behaviors in Large Language Model-based Agents
- Qinlin Zhao, Jindong Wang, Yixuan Zhang, Yiqiao Jin, Kaijie Zhu, Hao Chen, Xing Xie
- Key: propose a competitive framework for LLM-based agents; build a simulated competitive environment
- ExpEnv: a virtual town with only restaurants and customers
-
Model-based Reinforcement Learning for Parameterized Action Spaces
- Renhao Zhang, Haotian Fu, Yilin Miao, George Konidaris
- Key: discrete-continuous hybrid action space, dynamics model with parameterized actions, MPC with parameterized actions
- ExpEnv: platform, goal, hard goal, catch point, hard move
-
Learning Latent Dynamic Robust Representations for World Models
- Ruixiang Sun, Hongyu Zang, Xin Li, Riashat Islam
- Key: modified Dreamer architecture, hybrid-recurrent state space model
- ExpEnv: deepmind control suite, distracted deepmind control suite, mani-skill2
-
AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors
- Yucen Wang, Shenghua Wan, Le Gan, Shuai Feng, De-Chuan Zhan
- Key: implicit action generator, action-conditioned separated world models
- ExpEnv: deepmind control suite
-
Hieros: Hierarchical Imagination on Structured State Space Sequence World Models
- Paul Mattes, Rainer Schlosser, Ralf Herbrich
- Key: state-space models, multilayered hierarchical imagination, S5 based world model
- ExpEnv: atari 100k
-
Improving Token-Based World Models with Parallel Observation Prediction
- Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor
- Key: pixel-based mbrl, token-based world models, retentive environment model
- ExpEnv: atari 100k
-
Do Transformer World Models Give Better Policy Gradients?
- Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon
- Key: actions world model
- ExpEnv: double-pendulum, Myriad
-
Dr. Strategy: Model-Based Generalist Agents with Strategic Dreaming
- Hany Hamed, Subin Kim, Dongyeong Kim, Jaesik Yoon, Sungjin Ahn
- Key: during strategic dreaming, train three policies (highway policy, explorer policy, and achiever policy), and then achieve downstream tasks
- ExpEnv: 2D Navigation, 3D-Maze Navigation, RoboKitchen
-
Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption
- Chenlu Ye, Jiafan He, Quanquan Gu, Tong Zhang
- Key: theoretical analysis of adversarial corruption for model-based rl, encompassing both online and offline settings
- ExpEnv: None
-
Model-based Reinforcement Learning for Confounded POMDPs
- Mao Hong, Zhengling Qi, Yanxun Xu
- Key: model-based RL, POMDP
- ExpEnv: None
ICLR 2024
-
Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning
- Chengxing Jia, Chenxiao Gao, Hao Yin, Fuxiang Zhang, Xiong-Hui Chen, Tian Xu, Lei Yuan, Zongzhang Zhang, Zhi-Hua Zhou, Yang Yu
- Key: Reinforcement Learning, Model-based Reinforcement Learning, Offline Reinforcement Learning
- OpenReview: 8, 8, 8, 6
- ExpEnv: d4rl
-
Efficient Dynamics Modeling in Interactive Environments with Koopman Theory
- Arnab Kumar Mondal, Siba Smarak Panigrahi, Sai Rajeswar, Kaleem Siddiqi, Siamak Ravanbakhsh
- Key: Koopman Theory, Reinforcement Learning, Dynamical System, Planning, Long-range dynamics prediction models, Efficient forward dynamics
- OpenReview: 8, 6, 5, 3
- ExpEnv: mujoco
-
Combining Spatial and Temporal Abstraction in Planning for Better Generalization
- Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio
- Key: Reinforcement Learning, Planning, Neural Networks, Temporal Difference Learning, Generalization, Deep Reinforcement Learning
- OpenReview: 6, 6, 6, 5
- ExpEnv: MiniGrid-BabyAI framework
-
Mastering Memory Tasks with World Models
- Mohammad Reza Samsami, Artem Zholus, Janarthanan Rajendran, Sarath Chandar
- Key: recall to imagine module, based on DreamerV3
- OpenReview: 10, 8, 6
- ExpEnv: bsuite, popgym, atari, deepmind control suite, memory maze
-
Privileged Sensing Scaffolds Reinforcement Learning
- Edward S. Hu, James Springer, Oleh Rybkin, Dinesh Jayaraman
- Key: privileged information, based on DreamerV3
- OpenReview: 10, 8, 8, 8
- ExpEnv: gymnasium robotics
-
TD-MPC2: Scalable, Robust World Models for Continuous Control
- Nicklas Hansen, Hao Su, Xiaolong Wang
- Key: implicit world model, model predictive control, generalist td-mpc2
- OpenReview: 8, 8, 8, 8
- ExpEnv: deepmind control suite, Meta-World, maniskill2, myosuite
-
Robust Model Based Reinforcement Learning Using L1 Adaptive Control
- Minjun Sung, Sambhu Harimanas Karumanchi, Aditya Gahlawat, Naira Hovakimyan
- Key: L1 Adaptive Control
- OpenReview: 8, 6, 6, 6
- ExpEnv: mujoco
-
Learning Hierarchical World Models with Adaptive Temporal Abstractions from Discrete Latent Dynamics
- Christian Gumbsch, Noor Sajid, Georg Martius, Martin V. Butz
- Key: Context-specific Recurrent State Space Model, hierarchical world model
- OpenReview: 8, 6, 6
- ExpEnv: MiniHack, VisualPinPad, MultiWorld
-
Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
- Lunjun Zhang, Yuwen Xiong, Ze Yang, Sergio Casas, Rui Hu, Raquel Urtasun
- Key: discrete diffusion; world model; autonomous driving
- OpenReview: 10, 8, 6, 6, 6
- ExpEnv: NuScenes, KITTI Odometry, Argoverse2 Lidar
-
COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL
- Xiyao Wang, Ruijie Zheng, Yanchao Sun, Ruonan Jia, Wichayaporn Wongkamjan, Huazhe Xu, Furong Huang
- Key: conservative model rollouts, optimistic environment exploration
- OpenReview: 6, 6, 6
- ExpEnv: mujoco, deepmind control suite
-
Efficient Multi-agent Reinforcement Learning by Planning
- Qihan Liu, Jianing Ye, Xiaoteng Ma, Jun Yang, Bin Liang, Chongjie Zhang
- Key: mcts, optimistic search lambda, advantage-weighted policy optimization
- OpenReview: 8, 6, 6, 6
- ExpEnv: smac
-
Differentiable Trajectory Optimization as a Policy Class for Reinforcement and Imitation Learning
- Weikang Wan, Yufei Wang, Zackory Erickson, David Held
- Key: differentiable trajectory optimization
- OpenReview: 10, 8, 8, 5
- ExpEnv: deepmind control suite, robomimic, maniskill
-
- Zhihe YANG, Yunjian Xu
- Key: conditional diffusion, offline RL
- OpenReview: 8, 8, 6, 6
- ExpEnv: d4rl
-
MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning
- Zohar Rimon, Tom Jurgenson, Orr Krupnik, Gilad Adler, Aviv Tamar
- Key: context-based meta-RL, based on dreamer
- OpenReview: 6, 6, 6, 6
- ExpEnv: Point Robot Navigation, Escape Room, Reacher Sparse
-
Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning
-
DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing
- Vint Lee, Pieter Abbeel, Youngwoon Lee
- Key: learn to predict a temporally-smoothed reward rather than the exact reward at each timestep
- OpenReview: 6, 6, 6, 5
- ExpEnv: robodesk, hand, earthmoving
-
Informed POMDP: Leveraging Additional Information in Model-Based RL
- Gaspard Lambrechts, Adrien Bolland, Damien Ernst
- Key: informed world model, based on DreamerV3
- OpenReview: 6, 6, 6, 5
- ExpEnv: varying mountain hike, deepmind control suite, pop gym, flickering atari and flickering control
NeurIPS 2023
-
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning
- Zirui Zhao, Wee Sun Lee, David Hsu
- Key: LLM-MCTS
- ExpEnv: VirtualHome
-
- Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian (Shawn) Ma, Yitao Liang
- Key: interactive planning approach based on LLM
- ExpEnv: minecraft
-
Facing Off World Model Backbones: RNNs, Transformers, and S4
- Fei Deng, Junyeong Park, Sungjin Ahn
- Key: world model backbones
- ExpEnv: MiniGrid, memory maze
-
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning
- Jialong Wu, Haoyu Ma, Chaoyi Deng, Mingsheng Long
- Key: Contextualized World Models
- ExpEnv: CARLA, deepmind control suite
-
Conformal Prediction for Uncertainty-Aware Planning with Diffusion Dynamics Model
-
LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios
- Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu
- Key: MCTS-style benchmark
- ExpEnv: board games, atari, mujoco, gobigger
-
Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
- Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li
- Key: GPT-based diffusion model for planning and data synthesizing
- ExpEnv: Meta-World, Maze2D
-
MoVie: Visual Model-Based Policy Adaptation for View Generalization
- Sizhe Yang, Yanjie Ze, Huazhe Xu
- Key: view generalization, spatial adaptive encoder
- ExpEnv: deepmind control suite, adroit, xArm
-
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
- Shenao Zhang, Boyi Liu, Zhaoran Wang, Tuo Zhao
- Key: model-based reparameterization policy gradient method, smoothness regularization
- ExpEnv: mujoco
-
- Lin Guan, Karthik Valmeekam, Sarath Sreedharan, Subbarao Kambhampati
- Key: construct an explicit world (domain) model in planning domain definition language
- ExpEnv: household-robot domain, tyreworld and logistics
-
RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability
- Chuning Zhu, Max Simchowitz, Siri Gadipudi, Abhishek Gupta
- Key: representation resilience for visual RL
- ExpEnv: deepmind control suite, maniskill
-
Model-Based Control with Sparse Neural Dynamics
- Ziang Liu, Jeff He, Genggeng Zhou, Tobia Marcucci, Fei-Fei Li, Jiajun Wu, Yunzhu Li
- Key: network sparsification, mixed-integer formulation of ReLU neural dynamics
- ExpEnv: gym, cartpole, reacher
-
Optimal Exploration for Model-Based RL in Nonlinear Systems
- Andrew Wagenmaker, Guanya Shi, Kevin Jamieson
- Key: optimal sample complexity for nonlinear dynamical systems
- ExpEnv: affine dynamics system
-
State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding
- Devleena Das, Sonia Chernova, Been Kim
- Key: a joint embedding model between state-action pairs and concept-based explanations
- ExpEnv: connect4, lunar lander
-
Efficient Exploration in Continuous-time Model-based Reinforcement Learning
- Lenart Treven, Jonas Hübotter, Bhavya, Florian Dorfler, Andreas Krause
- Key: nonlinear ordinary differential equations, regret bound, measurement selection strategies
- ExpEnv: system’s tasks
-
Action Inference by Maximising Evidence: Zero-Shot Imitation from Observation with World Models
- Xingyuan Zhang, Philip Becker-Ehmck, Patrick van der Smagt, Maximilian Karl
- Key: pretrained world models, imitation learning from observation only
- ExpEnv: deepmind control suite
-
STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning
- Weipu Zhang, Gang Wang, Jian Sun, Yetian Yuan, Gao Huang
- Key: categorical-VAE, transformer structure, DreamerV3
- ExpEnv: atari
ICML 2023
-
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
- Sai Rajeswar Mudumba, Pietro Mazzaglia, Tim Verbelen, Alexandre Piche, Bart Dhoedt, Aaron Courville, Alexandre Lacoste
- Key: unsupervised pretrain, task-aware finetune, dyna-mpc
- ExpEnv: URLB benchmark, RWRL suite
-
Reparameterized Policy Learning for Multimodal Trajectory Optimization
- Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su
- Key: multimodal policy learning, reparameterized policy gradient
- ExpEnv: Meta-World, mujoco
-
Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy
- Xiyao Wang, Wichayaporn Wongkamjan, Ruonan Jia, Furong Huang
- Key: policy-adapted model learning, weight design
- ExpEnv: mujoco
-
Predictable MDP Abstraction for Unsupervised Model-Based RL
- Seohong Park, Sergey Levine
- Key: predictable MDP abstraction, tackle model exploitation
- ExpEnv: mujoco
-
Investigating the Role of Model-Based Learning in Exploration and Transfer
- Jacob C Walker, Eszter Vértes, Yazhe Li, Gabriel Dulac-Arnold, Ankesh Anand, Jessica Hamrick, Theophane Weber
- Key Insights: (1) Is there an advantage to an agent being model-based during unsupervised exploration and/or fine-tuning? (2) What are the contributions of each component of a model-based agent for downstream task learning? (3) How well does the model-based agent deal with environmental shift between the unsupervised and downstream phases?
- ExpEnv: Crafter, RoboDesk, Meta-World
-
The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms
- Anirudh Vemula, Yuda Song, Aarti Singh, J. Bagnell, Sanjiban Choudhury
- Key: objective mismatch, mbrl framework
- ExpEnv: Helicopter, WideTree, Linear Dynamical System, Maze, mujoco
-
The Benefits of Model-Based Generalization in Reinforcement Learning
- Kenny Young, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber
- Key: experience replay, when and how learned model generalization is useful
- ExpEnv: ProcMaze, ButtonGrid, PanFlute
-
STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning
- Souradip Chakraborty, Amrit Bedi, Alec Koppel, Mengdi Wang, Furong Huang, Dinesh Manocha
- Key: information directed sampling, kernelized Stein discrepancy
- ExpEnv: DeepSea
-
Model-based Reinforcement Learning with Scalable Composite Policy Gradient Estimators
- Paavo Parmas, Takuma Seno, Yuma Aoki
- Key: extension of Dreamer, total propagation computation graph
- ExpEnv: deepmind control suite
-
Reinforcement Learning with History Dependent Dynamic Contexts
- Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier
- Key: non-Markov context dynamics, logistic DCMDPs, theoretical analysis, extension of MuZero
- ExpEnv: MovieLens dataset
-
Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning
-
Simplified Temporal Consistency Reinforcement Learning
- Yi Zhao, Wenshuai Zhao, Rinu Boney, Juho Kannala, Joni Pajarinen
- Key: representation learning, temporal consistency
- ExpEnv: deepmind control suite
-
Curious Replay for Model-based Adaptation
- Isaac Kauvar, Chris Doyle, Linqi Zhou, Nick Haber
- Key: extension of DreamerV3, curious replay, count-based replay, adversarial replay
- ExpEnv: Crafter, deepmind control suite
-
On Many-Actions Policy Gradient
- Michal Nauman, Marek Cygan
- Key: bias and variance, theoretical analysis
- ExpEnv: deepmind control suite
-
Posterior Sampling for Deep Reinforcement Learning
- Remo Sasso, Michelangelo Conserva, Paulo Rauber
- Key: posterior sampling, continual value network
- ExpEnv: atari
-
Model-based Offline Reinforcement Learning with Count-based Conservatism
- Byeongchan Kim, Min-hwan Oh
- Key: count estimation, theoretical analysis
- ExpEnv: d4rl
ICLR 2023
-
Transformers are Sample-Efficient World Models
- Vincent Micheli, Eloi Alonso, François Fleuret
- Key: discrete autoencoder, transformer based world model
- OpenReview: 8, 8, 8, 8
- ExpEnv: atari
-
Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization
- Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner
- Key: model-based offline, bayesian posterior value estimate
- OpenReview: 8, 8, 6, 6
- ExpEnv: d4rl
-
User-Interactive Offline Reinforcement Learning
- Phillip Swazinna, Steffen Udluft, Thomas Runkler
- Key: let the user adapt the policy behavior after training is finished
- OpenReview: 10, 8, 6, 3
- ExpEnv: 2d-world, industrial benchmark
-
CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning
- Sheng Yue, Guanbo Wang, Wei Shao, Zhaofeng Zhang, Sen Lin, Ju Ren, Junshan Zhang
- Key: offline IRL, reward extrapolation error
- OpenReview: 8, 8, 6, 6
- ExpEnv: d4rl
-
Efficient Offline Policy Optimization with a Learned Model
- Zichen Liu, Siyi Li, Wee Sun Lee, Shuicheng YAN, Zhongwen Xu
- Key: offline rl, analysis of MuZero Unplugged, one-step look-ahead policy improvement
- OpenReview: 8, 6, 5
- ExpEnv: atari dataset
-
Efficient Planning in a Compact Latent Action Space
- Zhengyao Jiang, Tianjun Zhang, Michael Janner, Yueying Li, Tim Rocktäschel, Edward Grefenstette, Yuandong Tian
- Key: planning with VQ-VAE
- OpenReview: 6, 6, 6, 6
- ExpEnv: d4rl dataset
-
- Ruijie Zheng, Xiyao Wang, Huazhe Xu, Furong Huang
- Key: lipschitz regularization
- OpenReview: 8, 8, 6, 6
- ExpEnv: mujoco
-
MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations
- Nicklas Hansen, Yixin Lin, Hao Su, Xiaolong Wang, Vikash Kumar, Aravind Rajeswaran
- Key: three phases -- policy pretraining, targeted exploration, interactive learning
- OpenReview: 8, 6, 6, 6
- ExpEnv: adroit, meta-world, deepmind control suite
-
- Raj Ghugare, Homanga Bharadhwaj, Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov
- Key: Aligned Latent Models
- OpenReview: 8, 6, 6, 6, 6
- ExpEnv: mujoco
-
Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning
- Daniel Palenicek, Michael Lutter, Joao Carvalho, Jan Peters
- Key: longer horizons yield diminishing returns in terms of sample efficiency
- OpenReview: 8, 6, 6, 6
- ExpEnv: brax
-
Planning Goals for Exploration
- Edward S. Hu, Richard Chang, Oleh Rybkin, Dinesh Jayaraman
- Key: sampling-based planning, set goals for each training episode to directly optimize an intrinsic exploration reward
- OpenReview: 8, 8, 8, 8, 6
- ExpEnv: point maze, walker, ant maze, 3-block stack
-
Making Better Decision by Directly Planning in Continuous Control
- Jinhua Zhu, Yue Wang, Lijun Wu, Tao Qin, Wengang Zhou, Tie-Yan Liu, Houqiang Li
- Key: deep differentiable dynamic programming planner
- OpenReview: 8, 8, 8, 6
- ExpEnv: mujoco
-
Latent Variable Representation for Reinforcement Learning
- Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai
- Key: variational learning, representation learning
- OpenReview: 8, 6, 6, 3
- ExpEnv: mujoco, deepmind control suite
-
SpeedyZero: Mastering Atari with Limited Data and Time
- Yixuan Mei, Jiaxuan Gao, Weirui Ye, Shaohuai Liu, Yang Gao, Yi Wu
- Key: distributed model-based rl, speed up EfficientZero
- OpenReview: 6, 6, 5
- ExpEnv: atari 100k
-
Transformer-based World Models Are Happy With 100k Interactions
- Jan Robine, Marc Höftmann, Tobias Uelwer, Stefan Harmeling
- Key: autoregressive world model, Transformer-XL, balanced cross-entropy loss, balanced dataset sampling
- OpenReview: 8, 6, 6, 6
- ExpEnv: atari 100k
-
On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning
- Yifan Xu, Nicklas Hansen, Zirui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu
- Key: offline multi-task pretraining, online finetuning
- OpenReview: 6, 6, 6, 6
- ExpEnv: atari 100k
-
Become a Proficient Player with Limited Data through Watching Pure Videos
- Weirui Ye, Yunsheng Zhang, Pieter Abbeel, Yang Gao
- Key: unsupervised pre-training, finetune with down-stream tasks
- OpenReview: 8, 6, 6, 5
- ExpEnv: atari 100k
-
EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model
- Yifu Yuan, Jianye HAO, Fei Ni, Yao Mu, YAN ZHENG, Yujing Hu, Jinyi Liu, Yingfeng Chen, Changjie Fan
- Key: jointly pretrain the multi-headed dynamics model and unsupervised exploration policy, finetune to downstream tasks
- OpenReview: 6, 6, 6, 6
- ExpEnv: URLB benchmark
-
Choreographer: Learning and Adapting Skills in Imagination
- Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Alexandre Lacoste, Sai Rajeswar
- Key: world model, skill discovery, skill learning, skill adaptation
- OpenReview: 8, 8, 6, 6
- ExpEnv: deepmind control suite, Meta-World
NeurIPS 2022
-
Bidirectional Learning for Offline Infinite-width Model-based Optimization
- Can Chen, Yingxue Zhang, Jie Fu, Xue Liu, Mark Coates
- Key: model-based, offline
- OpenReview: 7, 6, 5
- ExpEnv: design-bench
-
A Unified Framework for Alternating Offline Model Training and Policy Learning
- Shentao Yang, Shujian Zhang, Yihao Feng, Mingyuan Zhou
- Key: model-based, offline, marginal importance weight
- OpenReview: 7, 6, 6, 5
- ExpEnv: d4rl dataset
-
Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
- Kaiyang Guo, Shao Yunfeng, Yanhui Geng
- Key: model-based, offline
- OpenReview: 8, 8, 7, 7
- ExpEnv: d4rl dataset
-
- Jiafei Lyu, Xiu Li, Zongqing Lu
- Key: double check mechanism, bidirectional modeling, offline RL
- OpenReview: 7, 6, 6
- ExpEnv: d4rl dataset
-
- XiaoPeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, Zongqing Lu
- Key: multi-agent, model-based
- OpenReview: 7, 6, 4, 3
- ExpEnv: mpe, google research football
-
Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning
- Zhiwei Xu, Dapeng Li, Bin Zhang, Yuan Zhan, Yunpeng Bai, Guoliang Fan
- Key: multi-agent, model-based
- OpenReview: 6, 5
- ExpEnv: StarCraft II, Google Research Football, Multi-Agent Discrete MuJoCo
-
MoCoDA: Model-based Counterfactual Data Augmentation
- Silviu Pitis, Elliot Creager, Ajay Mandlekar, Animesh Garg
- Key: data augmentation framework, offline RL
- OpenReview: 7, 7, 7, 6
- ExpEnv: 2D Navigation, Hook-Sweep
-
When to Update Your Model: Constrained Model-based Reinforcement Learning
- Tianying Ji, Yu Luo, Fuchun Sun, Mingxuan Jing, Fengxiang He, Wenbing Huang
- Key: event-triggered mechanism, constrained model-shift lower-bound optimization
- OpenReview: 6, 6, 5, 5
- ExpEnv: mujoco
-
- Ashish Jayant, Shalabh Bhatnagar
- Key: constrained RL, model-based
- OpenReview: 7, 6, 5, 5
- ExpEnv: safety gym
-
Learning to Attack Federated Learning: A Model-based Reinforcement Learning Attack Framework
- Henger Li, Xiaolin Sun, Zizhan Zheng
- Key: attack & defense, federated learning, model-based
- OpenReview: 6, 6, 6, 5
- ExpEnv: MNIST, FashionMNIST, EMNIST, CIFAR-10 and synthetic dataset
-
Model-Based Imitation Learning for Urban Driving
- Anthony Hu, Gianluca Corrado, Nicolas Griffiths, Zachary Murez, Corina Gurau, Hudson Yeo, Alex Kendall, Roberto Cipolla, Jamie Shotton
- Key: model-based, imitation learning, autonomous driving
- OpenReview: 7, 6, 6
- ExpEnv: CARLA
-
Data-Driven Model-Based Optimization via Invariant Representation Learning
- Han Qi, Yi Su, Aviral Kumar, Sergey Levine
- Key: domain adaptation, invariant objective models, representation learning (not about model-based RL)
- OpenReview: 7, 6, 6, 5, 5
- ExpEnv: design-bench
-
Model-based Lifelong Reinforcement Learning with Bayesian Exploration
- Haotian Fu, Shangqun Yu, Michael Littman, George Konidaris
- Key: lifelong RL, variational bayesian
- OpenReview: 7, 6, 6
- ExpEnv: mujoco, meta-world
-
Plan To Predict: Learning an Uncertainty-Foreseeing Model For Model-Based Reinforcement Learning
-
Joint Model-Policy Optimization of a Lower Bound for Model-Based RL
- Benjamin Eysenbach, Alexander Khazatsky, Sergey Levine, Russ Salakhutdinov
- Key: unified objective for model-based RL
- OpenReview: 8, 8, 7, 6
- ExpEnv: gridworld, mujoco, ROBEL manipulation
-
RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning
- Marc Rigter, Bruno Lacerda, Nick Hawes
- Key: offline rl, model-based rl, two-player game, adversarial model training
- OpenReview: 6, 6, 6, 4
- ExpEnv: d4rl
-
Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning
- Shenao Zhang
- Key: posterior sampling RL, referential update, constrained conservative update
- OpenReview: 7, 7, 5, 5
- ExpEnv: mujoco, N-Chain MDPs
-
Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning
- Chenyang Wu, Tianci Li, Zongzhang Zhang, Yang Yu
- Key: optimism in the face of uncertainty (OFU), BOO Regret
- OpenReview: 6, 6, 5
- ExpEnv: RiverSwim, Chain, Random MDPs
-
Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity
- Alekh Agarwal, Tong Zhang
- Key: posterior sampling RL, Bellman error decoupling framework
- OpenReview: 7, 7, 7, 6
- ExpEnv: None
-
Exponential Family Model-Based Reinforcement Learning via Score Matching
- Gene Li, Junbo Li, Nathan Srebro, Zhaoran Wang, Zhuoran Yang
- Key: optimistic model-based, score matching
- OpenReview: 7, 7, 6
- ExpEnv: None
-
Deep Hierarchical Planning from Pixels
- Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel
- Key: hierarchical RL, long-horizon and sparse reward tasks
- OpenReview: 6, 6, 5
- ExpEnv: atari, deepmind control suite, deepmind lab, crafter
-
Continuous MDP Homomorphisms and Homomorphic Policy Gradient
- Sahand Rezaei-Shoshtari, Rosie Zhao, Prakash Panangaden, David Meger, Doina Precup
- Key: Homomorphic Policy Gradient, Continuous MDP Homomorphisms, Lax Bisimulation Loss
- OpenReview: 7, 7, 7
- ExpEnv: deepmind control suite
ICML 2022
-
DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations
- Fei Deng, Ingook Jang, Sungjin Ahn
- Key: dreamer, prototypes
- ExpEnv: deepmind control suite
-
Denoised MDPs: Learning World Models Better Than the World Itself
- Tongzhou Wang, Simon Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian
- Key: representation learning, denoised model
- ExpEnv: deepmind control suite, RoboDesk
-
- Qi Wang, Herke van Hoof
- Key: graph structured surrogate model, meta training
- ExpEnv: atari, mujoco
-
Towards Adaptive Model-Based Reinforcement Learning
- Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm van Seijen
- Key: local change adaptation
- ExpEnv: GridWorldLoCA, ReacherLoCA, MountaincarLoCA
-
Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation
- Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause
- Key: model-based multi-agent, confidence bound
- ExpEnv: SMART
-
- Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou
- Key: offline rl, model-based rl, stationary distribution regularization
- ExpEnv: d4rl
-
Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization
- Brandon Trabucco, Xinyang Geng, Aviral Kumar, Sergey Levine
- Key: benchmark, offline MBO
- ExpEnv: Design-Bench Benchmark Tasks
-
Temporal Difference Learning for Model Predictive Control
- Nicklas Hansen, Hao Su, Xiaolong Wang
- Key: td-learning, MPC
- ExpEnv: deepmind control suite, Meta-World
ICLR 2022
-
Revisiting Design Choices in Offline Model Based Reinforcement Learning
- Cong Lu, Philip Ball, Jack Parker-Holder, Michael Osborne, Stephen J. Roberts
- Key: model-based offline, uncertainty quantification
- OpenReview: 8, 8, 6, 6, 6
- ExpEnv: d4rl dataset
-
Value Gradient weighted Model-Based Reinforcement Learning
- Claas A Voelcker, Victor Liao, Animesh Garg, Amir-massoud Farahmand
- Key: Value-Gradient weighted Model loss
- OpenReview: 8, 8, 6, 6
- ExpEnv: mujoco
-
Planning in Stochastic Environments with a Learned Model
- Ioannis Antonoglou, Julian Schrittwieser, Sherjil Ozair, Thomas K Hubert, David Silver
- Key: MCTS, stochastic MuZero
- OpenReview: 10, 8, 8, 5
- ExpEnv: 2048 game, Backgammon, Go
-
Policy improvement by planning with Gumbel
- Ivo Danihelka, Arthur Guez, Julian Schrittwieser, David Silver
- Key: Gumbel AlphaZero, Gumbel MuZero
- OpenReview: 8, 8, 8, 6
- ExpEnv: go, chess, atari
-
Model-Based Offline Meta-Reinforcement Learning with Regularization
- Sen Lin, Jialin Wan, Tengyu Xu, Yingbin Liang, Junshan Zhang
- Key: model-based offline Meta-RL
- OpenReview: 8, 6, 6, 6
- ExpEnv: d4rl dataset
-
Information Prioritization through Empowerment in Visual Model-based RL
- Homanga Bharadhwaj, Mohammad Babaeizadeh, Dumitru Erhan, Sergey Levine
- Key: mutual information, visual model-based RL
- OpenReview: 8, 8, 8, 6
- ExpEnv: deepmind control suite, Kinetics dataset
-
Transfer RL across Observation Feature Spaces via Model-Based Regularization
- Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew E Cohen, Furong Huang
- Key: latent dynamics model, transfer RL
- OpenReview: 8, 6, 5, 5
- ExpEnv: CartPole, Acrobot and Cheetah-Run, mujoco, 3DBall
-
Learning State Representations via Retracing in Reinforcement Learning
- Changmin Yu, Dong Li, Jianye HAO, Jun Wang, Neil Burgess
- Key: representation learning, learning via retracing
- OpenReview: 8, 6, 5, 3
- ExpEnv: deepmind control suite
-
Model-augmented Prioritized Experience Replay
- Youngmin Oh, Jinwoo Shin, Eunho Yang, Sung Ju Hwang
- Key: prioritized experience replay, mbrl
- OpenReview: 8, 8, 6, 5
- ExpEnv: pybullet
-
Evaluating Model-Based Planning and Planner Amortization for Continuous Control
- Arunkumar Byravan, Leonard Hasenclever, Piotr Trochim, Mehdi Mirza, Alessandro Davide Ialongo, Yuval Tassa, Jost Tobias Springenberg, Abbas Abdolmaleki, Nicolas Heess, Josh Merel, Martin Riedmiller
- Key: model predictive control
- OpenReview: 8, 6, 6, 6
- ExpEnv: mujoco
-
Gradient Information Matters in Policy Optimization by Back-propagating through Model
- Chongchong Li, Yue Wang, Wei Chen, Yuting Liu, Zhi-Ming Ma, Tie-Yan Liu
- Key: two-model-based method, analyze model error and policy gradient
- OpenReview: 8, 8, 6, 6
- ExpEnv: mujoco
-
Pareto Policy Pool for Model-based Offline Reinforcement Learning
- Yijun Yang, Jing Jiang, Tianyi Zhou, Jie Ma, Yuhui Shi
- Key: model-based offline, model return-uncertainty trade-off
- OpenReview: 8, 8, 6, 5
- ExpEnv: d4rl dataset
-
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
- Masatoshi Uehara, Wen Sun
- Key: model-based offline theory, PAC bounds
- OpenReview: 8, 6, 6, 5
- ExpEnv: None
-
Know Thyself: Transferable Visual Control Policies Through Robot-Awareness
- Edward S. Hu, Kun Huang, Oleh Rybkin, Dinesh Jayaraman
- Key: world models that transfer to new robots
- OpenReview: 8, 6, 6, 5
- ExpEnv: mujoco, WidowX and Franka Panda robot
NeurIPS 2021
-
On Effective Scheduling of Model-based Reinforcement Learning
-
COMBO: Conservative Offline Model-Based Policy Optimization
- Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn
- Key: offline reinforcement learning, model-based reinforcement learning, deep reinforcement learning
- OpenReview: 6, 7, 6, 8
- ExpEnv: d4rl dataset
-
Safe Reinforcement Learning by Imagining the Near Future
- Garrett Thomas, Yuping Luo, Tengyu Ma
- Key: safe rl, reward penalty, theory about model-based rollouts
- OpenReview: 8, 6, 6
- ExpEnv: mujoco
-
Model-Based Reinforcement Learning via Imagination with Derived Memory
- Yao Mu, Yuzheng Zhuang, Bin Wang, Guangxiang Zhu, Wulong Liu, Jianyu Chen, Ping Luo, Shengbo Eben Li, Chongjie Zhang, Jianye HAO
- Key: extension of dreamer, prediction-reliability weight
- OpenReview: 6, 6, 6, 6
- ExpEnv: deepmind control suite
-
MobILE: Model-Based Imitation Learning From Observation Alone
-
Model-Based Episodic Memory Induces Dynamic Hybrid Controls
- Hung Le, Thommen Karimpanal George, Majid Abdolshah, Truyen Tran, Svetha Venkatesh
- Key: model-based, episodic control
- OpenReview: 7, 7, 6, 6
- ExpEnv: 2D maze navigation, cartpole, mountainCar and lunarlander, atari, 3D navigation: gym-miniworld
-
A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning
- Mingde Zhao, Zhen Liu, Sitao Luan, Shuyuan Zhang, Doina Precup, Yoshua Bengio
- Key: mbrl, set representation
- OpenReview: 7, 7, 7, 6
- ExpEnv: MiniGrid-BabyAI framework
-
Mastering Atari Games with Limited Data
- Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao
- Key: muzero, self-supervised consistency loss
- OpenReview: 7, 7, 7, 5
- ExpEnv: atari 100k, deepmind control suite
-
Online and Offline Reinforcement Learning by Planning with a Learned Model
- Julian Schrittwieser, Thomas K Hubert, Amol Mandhane, Mohammadamin Barekatain, Ioannis Antonoglou, David Silver
- Key: muzero, reanalyse, offline
- OpenReview: 8, 8, 7, 6
- ExpEnv: atari dataset, deepmind control suite dataset
-
Self-Consistent Models and Values
- Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver
- Key: a new approach to model learning
- OpenReview: 7, 7, 7, 6
- ExpEnv: tabular MDP, Sokoban, atari
-
- Christopher Grimm, Andre Barreto, Gregory Farquhar, David Silver, Satinder Singh
- Key: value equivalence, value-based planning, muzero
- OpenReview: 8, 7, 7, 6
- ExpEnv: four rooms, atari
-
MOPO: Model-based Offline Policy Optimization
- Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma
- Key: model-based, offline
- OpenReview: None
- ExpEnv: d4rl dataset, halfcheetah-jump and ant-angle
-
RoMA: Robust Model Adaptation for Offline Model-based Optimization
- Sihyun Yu, Sungsoo Ahn, Le Song, Jinwoo Shin
- Key: model-based, offline
- OpenReview: 7, 6, 6
- ExpEnv: design-bench
-
Offline Reinforcement Learning with Reverse Model-based Imagination
- Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang
- Key: model-based, offline
- OpenReview: 7, 6, 6, 5
- ExpEnv: d4rl dataset
-
Offline Model-based Adaptable Policy Learning
- Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Tony Qin, Shang Wenjie, Jieping Ye
- Key: model-based, offline
- OpenReview: 6, 6, 6, 4
- ExpEnv: d4rl dataset
-
Weighted model estimation for offline model-based reinforcement learning
- Toru Hishinuma, Kei Senda
- Key: model-based, offline, off-policy evaluation
- OpenReview: 7, 6, 6, 6
- ExpEnv: pendulum, d4rl dataset
-
Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation
- Weitong Zhang, Dongruo Zhou, Quanquan Gu
- Key: learning theory, model-based reward-free RL, linear function approximation
- OpenReview: 6, 6, 5, 5
- ExpEnv: None
-
- Kefan Dong, Jiaqi Yang, Tengyu Ma
- Key: learning theory, model-based bandit RL, nonlinear function approximation
- OpenReview: 7, 7, 7, 6
- ExpEnv: None
-
Discovering and Achieving Goals via World Models
- Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak
- Key: unsupervised goal reaching, goal-conditioned RL
- OpenReview: 6, 6, 6, 6, 6
- ExpEnv: walker, quadruped, bins, kitchen
ICLR 2021
-
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
- Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu
- Key: model-based, behavior cloning (warmup), trpo
- OpenReview: 8, 7, 7, 5
- ExpEnv: d4rl dataset
-
Control-Aware Representations for Model-based Reinforcement Learning
- Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh
- Key: representation learning, model-based soft actor-critic
- OpenReview: 6, 6, 6
- ExpEnv: planar system, inverted pendulum (swingup), cartpole, 3-link manipulator (swingup & balance)
-
Mastering Atari with Discrete World Models
- Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba
- Key: DreamerV2, many tricks (multiple categorical variables, KL balancing, etc.)
- OpenReview: 9, 8, 5, 4
- ExpEnv: atari
-
Model-Based Visual Planning with Self-Supervised Functional Distances
- Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine
- Key: goal-reaching task, dynamics learning, distance learning (goal-conditioned Q-function)
- OpenReview: 7, 7, 7, 7
- ExpEnv: sawyer, door sliding
-
- Arthur Argenson, Gabriel Dulac-Arnold
- Key: model-based, offline
- OpenReview: 8, 7, 5, 5
- ExpEnv: RL Unplugged(RLU), d4rl dataset
-
Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation
- Justin Fu, Sergey Levine
- Key: model-based, offline
- OpenReview: 8, 6, 6
- ExpEnv: design-bench
-
On the role of planning in model-based deep reinforcement learning
- Jessica B. Hamrick, Abram L. Friesen, Feryal Behbahani, Arthur Guez, Fabio Viola, Sims Witherspoon, Thomas Anthony, Lars Buesing, Petar Veličković, Théophane Weber
- Key: discussion about planning in MuZero
- OpenReview: 7, 7, 6, 5
- ExpEnv: atari, go, deepmind control suite
-
Representation Balancing Offline Model-based Reinforcement Learning
- Byung-Jun Lee, Jongmin Lee, Kee-Eung Kim
- Key: Representation Balancing MDP, model-based, offline
- OpenReview: 7, 7, 7, 6
- ExpEnv: d4rl dataset
-
- Balázs Kégl, Gabriel Hurtado, Albert Thomas
- Key: mixture density nets, heteroscedasticity
- OpenReview: 7, 7, 7, 6, 5
- ExpEnv: acrobot system
ICML 2021
-
Conservative Objective Models for Effective Offline Model-Based Optimization
- Brandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine
- Key: conservative objective model, offline mbrl
- ExpEnv: design-bench
-
Continuous-Time Model-Based Reinforcement Learning
- Çağatay Yıldız, Markus Heinonen, Harri Lähdesmäki
- Key: continuous-time
- ExpEnv: pendulum, cartPole and acrobot
-
Model-Based Reinforcement Learning via Latent-Space Collocation
- Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor Mordatch, Sergey Levine
- Key: latent space collocation
- ExpEnv: sparse metaworld tasks
-
Model-Free and Model-Based Policy Evaluation when Causality is Uncertain
- David A Bruns-Smith
- Key: worst-case bounds
- ExpEnv: ope-tools
-
Muesli: Combining Improvements in Policy Optimization
- Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt
- Key: value equivalence
- ExpEnv: atari
-
Vector Quantized Models for Planning
- Sherjil Ozair, Yazhe Li, Ali Razavi, Ioannis Antonoglou, Aäron van den Oord, Oriol Vinyals
- Key: VQVAE, MCTS
- ExpEnv: chess datasets, DeepMind Lab
-
PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration
- Yuda Song, Wen Sun
- Key: sample complexity, kernelized nonlinear regulators, linear MDPs
- ExpEnv: mountain car, antmaze, mujoco
-
Temporal Predictive Coding For Model-Based Planning In Latent Space
- Tung Nguyen, Rui Shu, Tuan Pham, Hung Bui, Stefano Ermon
- Key: temporal predictive coding with a RSSM, latent space
- ExpEnv: deepmind control suite
-
Model-based Reinforcement Learning for Continuous Control with Posterior Sampling
- Ying Fan, Yifei Ming
- Key: regret bound of psrl, mpc
- ExpEnv: continuous cartpole, pendulum swingup, mujoco
-
A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
- Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin
- Key: learning theory, multi-agent, model-based self play, two-player zero-sum Markov games
- ExpEnv: None
Other
-
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
- Yuan Pu, Yazhe Niu, Zhenjie Yang, Jiyuan Ren, Hongsheng Li, Yu Liu. TMLR 2025
- Key: world model, MCTS, model-based reinforcement learning, transformer, latent planning, multitask learning
- ExpEnv: Atari, DMControl, VisualMatch
-
- Yuqi Wang, Jiawei He, Lue Fan, Hongxin Li, Yuntao Chen, Zhaoxiang Zhang CVPR 2024
- Key: AutoDrive world modeling
- ExpEnv: nuScenes
-
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving
-
Masked Trajectory Models for Prediction, Representation, and Control
- Philipp Wu, Arjun Majumdar, Kevin Stone, Yixin Lin, Igor Mordatch, Pieter Abbeel, Aravind Rajeswaran ICLR 2023 Workshop RRL
- Key: offline RL, learning for control, sequence modeling
- ExpEnv: d4rl
-
World Models via Policy-Guided Trajectory Diffusion
- Marc Rigter, Jun Yamada, Ingmar Posner Arxiv 2023
- Key: Diffusion model, world model
- ExpEnv: deepmind control suite, gridworld
-
Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization
- Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters Arxiv 2023
- Key: cumulative rewards uncertainty estimation in MBRL
- ExpEnv: mujoco
-
- Thomas Bi, Raffaello D'Andrea. Arxiv 2023
- Key: Data-Augmented, DreamerV3
- ExpEnv: Real-World Labyrinth Game
-
Mastering Diverse Domains through World Models
- Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap. Arxiv 2023
- Key: DreamerV3, scaling properties of world models
- ExpEnv: deepmind control suite, atari, DMLab, minecraft
-
Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning
- Chuming Li, Ruonan Jia, Jiawei Yao, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang. IJCAI Workshop 2023
- Key: extended policy improvement, model regularization, planning theorem
- ExpEnv: mujoco
Tutorial
- [Video] Csaba Szepesvári - The challenges of model-based reinforcement learning and how to overcome them
- [Blog] Model-Based Reinforcement Learning: Theory and Practice
Codebase
Contributing
We want to keep improving this repo. If you are interested in contributing, please refer to HERE for contribution instructions.
License
Awesome Model-Based RL is released under the Apache 2.0 license.