Awesome Diffusion Model in RL

May 30, 2026

This is a collection of research papers on diffusion models in RL. The repository will be updated continuously to track the frontier of diffusion RL.

Welcome to follow and star!

Overview of Diffusion Model in RL

Diffusion models were introduced to RL by “Planning with Diffusion for Flexible Behavior Synthesis” (Janner et al.). The paper casts trajectory optimization as sampling from a diffusion probabilistic model, which plans by iteratively refining whole trajectories.
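
The idea of planning by iterative refinement can be illustrated with a toy sketch. This is not the method from the paper: `toy_denoiser` is a hypothetical hand-written smoother standing in for a learned denoising network, the states are one-dimensional, and the noise schedule is an arbitrary assumption. It only shows the loop structure: start from a noisy trajectory, refine it repeatedly, and keep the known endpoints fixed at every step.

```python
import random


def toy_denoiser(traj):
    """Hypothetical stand-in for a learned denoising network.

    Averages each interior state with its two neighbours. A real planner
    would call a trained model here instead.
    """
    out = list(traj)
    for i in range(1, len(traj) - 1):
        out[i] = (traj[i - 1] + traj[i] + traj[i + 1]) / 3.0
    return out


def plan(start, goal, horizon=16, n_steps=200, seed=0):
    """Refine a random trajectory into one that connects start and goal."""
    rng = random.Random(seed)
    # Begin from pure noise over the whole trajectory.
    traj = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
    traj[0], traj[-1] = start, goal
    for k in range(n_steps):
        # Assumed schedule: injected noise shrinks linearly to zero.
        sigma = 0.1 * (1.0 - (k + 1) / n_steps)
        traj = toy_denoiser(traj)
        traj = [x + sigma * rng.gauss(0.0, 1.0) for x in traj]
        # Condition on the endpoints by overwriting them after every step.
        traj[0], traj[-1] = start, goal
    return traj


if __name__ == "__main__":
    print(plan(start=0.0, goal=1.0))
```

The whole trajectory is updated at once in every step, rather than being rolled out one state at a time.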

A second line of work uses diffusion models for policy optimization: "Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning" (Wang et al.) proposes Diffusion-QL, which represents the policy as a conditional diffusion model with the state as the condition, approaching offline RL from the policy-optimization perspective.
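
A minimal sketch of sampling an action from a state-conditioned diffusion policy, using the standard DDPM reverse update, is given below. It covers sampling only and is not the Diffusion-QL implementation: `toy_eps_model` is a hypothetical closed-form noise predictor that assumes every dataset action equals `2 * state`, the actions are scalars, and the linear beta schedule is an arbitrary choice. A real policy would use a trained network as the noise predictor.

```python
import math
import random


def make_schedule(n_steps=50, beta_min=1e-4, beta_max=0.02):
    """Linear beta schedule (an assumption) with cumulative alpha products."""
    betas = [beta_min + (beta_max - beta_min) * i / (n_steps - 1)
             for i in range(n_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def toy_eps_model(action, state, t, alpha_bars):
    """Hypothetical noise predictor conditioned on the state.

    Exact for a dataset whose only action in `state` is 2 * state;
    a learned network would replace this function.
    """
    target = 2.0 * state
    return ((action - math.sqrt(alpha_bars[t]) * target)
            / math.sqrt(1.0 - alpha_bars[t]))


def sample_action(state, n_steps=50, seed=0):
    """Run the reverse diffusion chain from Gaussian noise to an action."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(n_steps)
    action = rng.gauss(0.0, 1.0)
    for t in reversed(range(n_steps)):
        eps = toy_eps_model(action, state, t, alpha_bars)
        # DDPM posterior mean computed from the predicted noise.
        mean = ((action - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps)
                / math.sqrt(alphas[t]))
        # No noise is added on the final step.
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0
        action = mean + math.sqrt(betas[t]) * noise
    return action


if __name__ == "__main__":
    print(sample_action(state=0.5))
```

Because the toy predictor is exact for its single-action dataset, the chain ends at `2 * state` up to floating-point error. With a learned predictor the samples would instead follow whatever, possibly multi-modal, action distribution the model has fit.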

Advantages

  1. Bypass the need for bootstrapping in long-term credit assignment.
  2. Avoid the short-sighted behavior caused by discounting future rewards.
  3. Build on diffusion models already widely used in language and vision, which scale easily and adapt to multi-modal data.

Papers

format:
- [title](paper link) [links]
  - author1, author2, and author3...
  - publisher
  - key 
  - code 
  - experiment environment

arXiv

ICML 2026

  • Reparameterization Flow Policy Optimization

    • Hai Zhong, Zhuoran Li, Xun Wang, Longbo Huang
    • Key: flow policy optimization, reparameterization policy gradient, stability regularization, exploration regularization
    • ExpEnv: locomotion and manipulation tasks with rigid and soft bodies under both state and visual inputs
  • Mean Flow Policy Optimization

    • Xiaoyi Dong, Xi Sheryl Zhang, Jian Cheng
    • Key: mean-flow policies, online RL, maximum-entropy RL, efficient flow-based policy optimization
    • ExpEnv: MuJoCo and DeepMind Control Suite benchmarks
  • DADP: Domain Adaptive Diffusion Policy

    • Pengcheng Wang, Qinghang Liu, Haotian Lin, Yiheng Li, Guojian Zhan, Masayoshi Tomizuka, Yixiao Wang
    • Key: domain-adaptive diffusion policy, disentangled domain representation, zero-shot adaptation, diffusion injection
    • ExpEnv: domain-generalization benchmarks across locomotion and manipulation tasks
  • PromptRL: Prompt Matters in RL for Flow-Based Image Generation

    • Fu-Yun Wang, Han Zhang, Michael Gharbi, Hongsheng Li, Taesung Park
    • Key: flow-based image generation, RL post-training, prompt refinement, prompt robustness
    • ExpEnv: GenEval, OCR accuracy, PickScore, and FLUX.1-Kontext image-editing reward evaluations
  • FAIL: Flow Matching Adversarial Imitation Learning for Image Generation

    • Yeyao Ma, Chen Li, Xiaosong Zhang, Han Hu, Weidi Xie
    • Key: adversarial imitation learning, flow matching, reward-free alignment, low-variance pathwise gradients
    • ExpEnv: prompt-following and aesthetic benchmarks, plus discrete image and video generation settings with FLUX fine-tuning
  • Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models

    • Shuchen Xue, Chongjian Ge, Shilong Zhang, Yichen Li, Zhi-Ming Ma
    • Key: advantage-weighted matching, diffusion RL, score/flow matching alignment, lower-variance policy gradients
    • ExpEnv: GenEval, OCR, and PickScore benchmarks on Stable Diffusion 3.5 Medium and FLUX

ICLR 2026

  • Exploratory Diffusion Model for Unsupervised Reinforcement Learning

    • Chengyang Ying, Huayu Chen, Xinning Zhou, Zhongkai Hao, Hang Su, Jun Zhu
    • Key: diffusion exploration policy, score-based intrinsic reward, unsupervised RL exploration
    • ExpEnv: Maze2d and URLB benchmarks
  • Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation

    • Guojian Zhan, Letian Tao, Pengcheng Wang, Yixiao Wang, Yuxin Chen, Yiheng Li, Hongyang Li, Masayoshi Tomizuka, Shengbo Eben Li
    • Key: mean-flow policy, instantaneous velocity constraint, one-step flow action generation
    • ExpEnv: Robomimic and OGBench robotic manipulation tasks
  • Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation

    • Xintong Duan, Yutong He, Fahim Tajwar, Ruslan Salakhutdinov, J Zico Kolter, Jeff Schneider
    • Key: consistency trajectory distillation, reward-aware planner compression, efficient offline diffusion planning
    • ExpEnv: Gym MuJoCo, FrankaKitchen, and long-horizon planning benchmarks
  • Dichotomous Diffusion Policy Optimization

    • Ruiming Liang, Yinan Zheng, Kexin Zheng, Tianyi Tan, Jianxiong Li, Liyuan Mao, Zhihao Wang, Guang Chen, Hangjun Ye, Jingjing Liu, Jinqiao Wang, Xianyuan Zhan
    • Key: dual diffusion-policy optimization, reward-max/reward-min controllable inference, stable policy improvement
    • ExpEnv: ExORL, OGBench, and NAVSIM autonomous-driving VLA evaluation
  • Flow Matching Policy Gradients

    • David McAllister, Songwei Ge, Brent Yi, Chung Min Kim, Ethan Weber, Hongsuk Choi, Haiwen Feng, Angjoo Kanazawa
    • Key: flow matching policy gradient, on-policy optimization, expressive continuous-action policy learning
    • ExpEnv: continuous control benchmarks
  • Flow Actor-Critic for Offline Reinforcement Learning

    • Jongseong Chae, Jongeui Park, Yongjae Shin, Gyeongmin Kim, Seungyul Han, Youngchul Sung
    • Key: flow-based actor-critic, conservative critic regularization, robust offline RL optimization
    • ExpEnv: D4RL and OGBench

NeurIPS 2025

ICML 2025

  • Graph Diffusion for Robust Multi-Agent Coordination

    • Xianghua Zeng, Hang Su, Zhengyi Wang, Zhiyuan LIN
    • Key: multi-agent coordination, offline reinforcement learning, diffusion models, Multi-Agent Reinforcement Learning (MARL), graph diffusion models, policy robustness
    • ExpEnv: Multi-Agent Particle Environments (MPE) (Spread, Tag, World tasks), Multi-Agent MuJoCo (MAMuJoCo) (2-agent halfcheetah, 2-agent ant, 4-agent ant), StarCraft Multi-Agent Challenge (SMAC)
  • DiMa: Understanding the Hardness of Online Matching Problems via Diffusion Models

    • Boyu Zhang, Aocheng Shen, Bing Liu, Qiankun Zhang, Bin Yuan, Jing Wang, Shenghao Liu, Xianjun Deng
    • Key: Online Bipartite Matching (OBM), Diffusion Model, Reinforcement Learning, hardness of combinatorial optimization, DDPMs, shortcut policy gradient (SPG), AI-enhanced combinatorial optimization.
    • ExpEnv: fractional OBM, OBM with random arrivals, OBM with stochastic rewards, thick-z graph instances.
  • RobustLight: Improving Robustness via Diffusion Reinforcement Learning for Traffic Signal Control

    • Mingyuan Li, Jiahao Wang, Guangsheng Yu, Xu Wang, Qianrun Chen, Wei Ni, Lixiang Li, Haipeng Peng
    • Key: reinforcement learning, diffusion, traffic signal control (TSC), robustness, adversarial attacks, missing data, dynamic state infilling.
    • ExpEnv: Cityflow (simulator), JiNan Datasets (JiNan1, JiNan2, JiNan3), HangZhou Datasets (HangZhou1, HangZhou2), New York Datasets (Newyork1, Newyork2).
  • Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks' Internal Representations

    • Aditya Taparia, Som Sagar, Ransalu Senanayake
    • Key: Explainable AI (XAI), Concept Generation, Vision-Language Models, Reinforcement Learning, Preference Learning, RL-based preference optimization (RLPO), TCAV, Generative Models, Understanding Neural Networks' Internal Representations, Diffusion Models (Stable Diffusion/SD), Deep Q-Network (DQN).
    • ExpEnv: GoogleNet, InceptionV3, ViT, Swin (CNN-based and Transformer-based classifiers pre-trained on ImageNet).

ICLR 2025

NeurIPS 2024

  • Adversarial Environment Design via Regret-Guided Diffusion Models

    • Hojun Chung, Junseo Lee, Minsoo Kim, Dohyeong Kim, Songhwai Oh
    • Key: Reinforcement Learning, Unsupervised Environment Design, Diffusion Models
    • ExpEnv: Minigrid, Partially Observable Maze Navigation, 2D Bipedal Locomotion
  • Graph Diffusion Policy Optimization

    • Yijing Liu, Chao Du, Tianyu Pang, Chongxuan Li, Min Lin, Wei Chen
    • Key: Graph Generation, Diffusion Models, Reinforcement Learning
    • ExpEnv: Drug Design, Graph Generation Tasks
    • Code: official
  • PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference

    • Kendong Liu, Zhiyu Zhu, Chuanhao Li, Hui Liu, Huanqiang Zeng, Junhui Hou
    • Key: Image Inpainting, Diffusion Models, Reinforcement Learning, Human Preference Alignment
    • ExpEnv: Image inpainting comparison, image extension, 3D reconstruction
    • Code: official
  • Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models

    • Sangwoong Yoon, Himchan Hwang, Dohyun Kwon, Yung-Kyun Noh, Frank C. Park
    • Key: Diffusion Models, Maximum Entropy Inverse Reinforcement Learning (IRL), Energy-Based Models (EBM), Anomaly Detection
    • ExpEnv: Empirical studies on generative modeling and anomaly detection tasks.
  • Text-Aware Diffusion for Policy Learning

    • Calvin Luo, Mandy He, Zilai Zeng, Chen Sun
    • Key: Reinforcement Learning, Text-Conditioned Diffusion, Zero-Shot Reward, Policy Learning
    • ExpEnv: Humanoid, Dog environments, Meta-World
  • Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient

    • Zechu Li, Rickmer Krohn, Tao Chen, Anurag Ajay, Pulkit Agrawal, Georgia Chalvatzaki
    • Key: Reinforcement Learning, Multimodal Learning, Diffusion Models, Actor-Critic Algorithm
    • ExpEnv: High-dimensional continuous control tasks, Maze navigation with unseen obstacles
  • Model-Based Diffusion for Trajectory Optimization

    • Chaoyi Pan, Zeji Yi, Guanya Shi, Guannan Qu
    • Key: Model-Based Diffusion, Trajectory Optimization, Diffusion Models
    • ExpEnv: Contact-rich Tasks, High-dimensional Humanoids
  • Diffusion for World Modeling: Visual Details Matter in Atari

    • Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, François Fleuret
    • Key: Reinforcement Learning, Diffusion Models, World Modeling, Visual Details
    • ExpEnv: Atari 100k Benchmark, Counter-Strike: Global Offensive
  • MADiff: Offline Multi-agent Learning with Diffusion Models

    • Zhengbang Zhu, Minghuan Liu, Liyuan Mao, Bingyi Kang, Minkai Xu, Yong Yu, Stefano Ermon, Weinan Zhang
    • Key: Offline Reinforcement Learning, Multi-agent Learning, Diffusion Models, Coordination
    • ExpEnv: Multi-agent Learning Tasks
    • Code: official
  • Amortizing Intractable Inference in Diffusion Models for Vision, Language, and Control

    • Siddarth Venkatraman, Moksh Jain, Luca Scimeca, Minsu Kim, Marcin Sendera, Mohsin Hasan, Luke Rowe, Sarthak Mittal, Pablo Lemos, Emmanuel Bengio, Alexandre Adam, Jarrid Rector-Brooks, Yoshua Bengio, Glen Berseth, Nikolay Malkin
    • Key: Diffusion Models, Amortized Inference, Reinforcement Learning, Vision, Language, Multimodal Data
    • ExpEnv: Vision (Classifier Guidance), Language (Infilling under Discrete Diffusion LLM), Multimodal (Text-to-Image Generation), Offline RL Benchmarks
  • Diffusion Actor-Critic with Entropy Regulator

    • Yinuo Wang, Likun Wang, Yuxuan Jiang, Wenjun Zou, Tong Liu, Xujie Song, Wenxuan Wang, Liming Xiao, Jiang Wu, Jingliang Duan, Shengbo Eben Li
    • Key: Reinforcement Learning, Diffusion Models, Entropy Regulation, Multimodal Policy
    • ExpEnv: MuJoCo Benchmarks, Multimodal Tasks
  • Diffusion Spectral Representation for Reinforcement Learning

    • Dmitry Shribak, Chen-Xiao Gao, Yitong Li, Chenjun Xiao, Bo Dai
    • Key: Reinforcement Learning, Diffusion Models, Representation Learning, Markov Decision Processes (MDP), Partially Observable Markov Decision Processes (POMDP)
    • ExpEnv: Various RL Benchmarks (Fully and Partially Observable Settings)

ICML 2024

CVPR 2024

ICLR 2024

NeurIPS 2023

ICML 2023

ICLR 2023

ICRA 2023

NeurIPS 2022

ICML 2022

Codebase

  • GenerativeRL

    • Zhang, Jinouwen and Xue, Rongkun and Niu, Yazhe and Chen, Yun and Chen, Xinyan and Wang, Ruiheng and Liu, Yu
    • Publisher: GitHub
    • Key: Reinforcement Learning, Generative Model, Diffusion Model, Flow Model
    • Code: official
  • CleanDiffuser

    • Zibin Dong and Yifu Yuan and Jianye Hao and Fei Ni and Yi Ma and Pengyi Li and Yan Zheng
    • Publisher: GitHub
    • Key: Reinforcement Learning, Generative Model, Diffusion Model, Flow Model
    • Code: official

Contributing

We want to keep improving this repo. If you are interested in contributing, please refer to HERE for instructions on how to contribute.

License

Awesome Diffusion Model in RL is released under the Apache 2.0 license.