Awesome Papers Merging Generative AI and Reinforcement Learning in Robotics

April 23, 2026


This repository contains a curated list of papers and resources related to the survey titled "The Duality of Generative AI and Reinforcement Learning in Robotics: A Review".

The paper explores the synergy between modern generative AI tools (transformer- and diffusion-based models) and Reinforcement Learning (RL) for advancing robotic intelligence and physical grounding.

We also provide five Excel files (one for each category) that offer detailed summaries of the analyses we performed using the paper's taxonomy. These summaries cover several features of the analyzed papers: framework name, model used, code availability, dataset, type of application, simulation vs. real-world, cross-categories, experiment evaluation, year of publication, and a short description.

Abstract

Our review paper examines the integration of generative AI models, specifically transformer- and diffusion-based models, with reinforcement learning (RL) to advance robotic physical grounding, ensuring robotic actions are based on environment interactions rather than purely computational inference. Our primary focus is on the interplay between generative AI and RL for robotics downstream tasks. Specifically, we investigate: (1) The role of generative AI tools (large language models, vision-language models, diffusion models, and world models) as priors in RL-based robotics. (2) The integration of different input-output modalities from pre-trained modules into the RL training loop. (3) How RL can train generative models for robotic policies, similar to its applications in language models. We then propose a new taxonomy based on our findings. Lastly, we identify key trends and open challenges, accounting for model scalability, generalizability, and grounding. Moreover, we devise architectural trade-offs in RL fine-tuning strategies for generative policies. We also reflect on issues inherent to generative black-box policies, such as safety concerns and failure modes, which could be addressed through learning-based approaches like RL. Our findings suggest that learning-based control techniques will play a crucial role in grounding generative policies within real-world constraints.

(Keywords: Robotics, Generative AI, Foundation model, Reinforcement learning, Physical grounding)

To visualize the evolution of research in this domain, the following figure illustrates the trends in the integration of generative AI and reinforcement learning for robotics.

Figure 1: Trends in generative AI and RL integration for robotics.

The Duality of Generative AI and Reinforcement Learning

The relationship between Reinforcement Learning and state-of-the-art generative models is a central theme of our review. This interplay is a duality with mutual benefits: generative models enhance RL capabilities, and RL helps ground generative policies in real-world applications. This symbiotic relationship is depicted in the following figure.


Figure 2: Duality between RL and generative AI models in robotics.

Taxonomy

Figure 3: Taxonomy.

Generative Tools for RL

Generative Tools for RL explores how various architectures from modern generative AI can be integrated into the RL training loop. We analyze prior work on leveraging generative and foundation models to enhance robotics, focusing on architectures based on Transformer or Diffusion backbones. "As tools" highlights that pre-trained foundation models (like LLMs) are not being retrained end-to-end with the RL agent, but are instead leveraged in a modular way — as plug-and-play components that provide capabilities (such as understanding or generating specific modalities) that the RL agent can use during training or decision making.
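The "as tools" idea can be illustrated with a minimal sketch: a frozen component is only queried inside the RL loop, and only the agent's parameters change. Everything here is an assumption made for illustration: `FrozenRewardModel` is a hypothetical stand-in for a pre-trained model (here a fixed distance-to-goal scorer), the environment is a toy five-cell corridor, and the agent is plain tabular Q-learning rather than any method from the surveyed papers.

```python
import random

GOAL = 4           # rightmost cell of a 5-cell corridor (toy environment)
ACTIONS = (-1, 1)  # step left or step right


class FrozenRewardModel:
    """Hypothetical stand-in for a pre-trained model; it exposes no update method."""

    def __init__(self, goal):
        self._goal = goal  # fixed "knowledge", never changed during RL

    def score(self, state):
        # Dense reward: states closer to the goal score higher.
        return -abs(self._goal - state)


def step(state, action):
    """Move within the corridor, clamping at both ends."""
    return min(max(state + action, 0), GOAL)


def train(reward_model, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning in which the reward comes from the frozen model."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt = step(state, action)
            reward = reward_model.score(nxt)  # the model is queried, not trained
            done = nxt == GOAL
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if done:
                break
    return q


q_table = train(FrozenRewardModel(GOAL))
# Greedy action for each non-goal cell after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)]
```

Swapping `FrozenRewardModel` for a different scorer would leave `train` unchanged, which is the modularity the paragraph above describes.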

Figure 4: Generative AI tools for RL.

For Generative Tools for RL, we categorize papers based on their underlying model architecture, which we refer to as the Base Model; the input and output modalities, referred to as Modality; and finally, the aim of the RL process, which we call the Task.
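As an illustration of these three axes, a paper entry could be represented as a small record and grouped by Task. This is only a sketch: the class and field names are hypothetical and do not mirror the column names of the Excel files, and the modality strings are copied from the generic rows of the Modality table rather than from the individual papers.

```python
from dataclasses import dataclass
from enum import Enum


class BaseModel(Enum):
    LLM = "Large Language Model"
    VLM = "Vision Language Model"
    DIFFUSION = "Diffusion Model"
    WORLD_MODEL = "World Model"
    VIDEO_PREDICTION = "Video Prediction Model"


class Task(Enum):
    REWARD_SIGNAL = "Reward Signal"
    STATE_REPRESENTATION = "State Representation"
    PLANNING_EXPLORATION = "Planning & Exploration"


@dataclass(frozen=True)
class PaperEntry:
    """One paper classified along the three axes (hypothetical schema)."""
    title: str
    base_model: BaseModel
    input_modality: str
    output_modality: str
    task: Task


def group_by_task(entries):
    """Return a dict mapping each Task to the titles filed under it."""
    grouped = {}
    for entry in entries:
        grouped.setdefault(entry.task, []).append(entry.title)
    return grouped


entries = [
    PaperEntry(
        title="Eureka: Human-Level Reward Design via Coding Large Language Models",
        base_model=BaseModel.LLM,
        input_modality="Text",
        output_modality="Text",
        task=Task.REWARD_SIGNAL,
    ),
    PaperEntry(
        title="Planning with Diffusion for Flexible Behavior Synthesis",
        base_model=BaseModel.DIFFUSION,
        input_modality="Low-level/Sensory Data",
        output_modality="Low-level Control Actions",
        task=Task.PLANNING_EXPLORATION,
    ),
]

by_task = group_by_task(entries)
```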

Base Model

The Base Model section classifies the papers according to their backbone architecture, briefly describes their features, and summarizes key aspects in tables. These aspects are important when selecting a tool for the RL tasks. See the Excel tables for a detailed classification:

  1. Large Language Models
  2. Vision Language Models
  3. Diffusion models
  4. World Models
  5. Video Prediction Models

Modality

This section focuses on the classification of the five types of generative AI models used in RL with an emphasis on how their input/output modalities shape their role within RL frameworks. These modalities vary across models: LLMs process text; VLMs combine visual and textual data; diffusion models handle a range of low-level and sensory modalities; world models integrate multiple modalities and generate internal representations. In the following, we analyze how these modality choices translate into trade-offs between abstraction and grounding, diversity and specificity for RL tasks, and ease of integration with RL agents.

| Model Type | Input Modality | Output Modality | Primary Role in RL | Trade-off Focus |
| --- | --- | --- | --- | --- |
| LLMs | Text | Text | Abstract Reasoning: Symbolic processing, task goals, reward signals, high-level objectives, task refinements. | High Abstraction, Less Grounding |
| VLMs | Visual + Text | Reasoning over Visuals | Visual Feedback/Context: Visual scene understanding, reasoning over visual inputs, bridging visual and textual context. | High Abstraction, Moderately Grounded |
| Diffusion Models | Low-level/Sensory Data | Low-level Control Actions | Fine-grained Control: Policy learning, state generation, precise continuous control signals in action space. | High Grounding, Less Abstraction |
| World Models | Multi-modal (Visual, Text, Proprioceptive, etc.) | Multi-modal State Representations, Predictions | Environment Dynamics & Planning: Learning predictive models, rich multi-modal state representations, supporting model-based RL. | Fuses Abstraction & Grounding |

Task

This section explores how generative AI models address key challenges in robotic RL, such as sparse rewards, sample inefficiency, generalization, and goal specification, by enhancing stages like Reward Signal generation, State Representation, and Policy Learning.
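To make the sparse-reward point concrete, one simple way to turn a pre-trained encoder into a dense reward is to score how close an observation embedding is to a goal embedding. The sketch below assumes the embeddings already exist: the three-dimensional vectors are made up for illustration, and `embedding_reward` is a hypothetical helper, not the reward definition of any specific paper listed here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_reward(observation_embedding, goal_embedding):
    """Dense reward: higher when the observation embedding aligns with the goal."""
    return cosine_similarity(observation_embedding, goal_embedding)


# Made-up embeddings standing in for the outputs of a frozen encoder.
goal = [1.0, 0.0, 0.0]  # e.g. embedding of a goal description
far = [0.0, 1.0, 0.0]   # observation unrelated to the goal
near = [0.9, 0.1, 0.0]  # observation close to the goal

reward_far = embedding_reward(far, goal)
reward_near = embedding_reward(near, goal)
```

Unlike a sparse success signal, this score changes smoothly as the observation approaches the goal, so `reward_near` exceeds `reward_far` before the task is actually completed.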

Figure 5: Generative models as information fusion operators across RL tasks.

Reward Signal

1. Learning reward functions with LLMs
  • Augmenting Autotelic Agents with Large Language Models [paper]
  • Intrinsic Language-Guided Exploration for Complex Long-Horizon Robotic Manipulation Tasks [paper]
  • FoMo Rewards: Can we cast foundation models as reward functions? [paper]
  • Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [paper]
  • RLingua: Improving Reinforcement Learning Sample Efficiency in Robotic Manipulations With Large Language Models [paper]
  • Game On: Towards Language Models as RL Experimenters [paper]
  • DrEureka: Language Model Guided Sim-To-Real Transfer [paper]
  • Text2Reward: Reward Shaping with Language Models for Reinforcement Learning [paper]
  • Eureka: Human-Level Reward Design via Coding Large Language Models [paper]
  • Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics [paper]
  • Language to Rewards for Robotic Skill Synthesis [paper]
  • Guiding Pretraining in Reinforcement Learning with Large Language Models [paper]
  • (2025)! Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation With Large Language Models [paper]
  • (2025)! LLM-Guided Reinforcement Learning: Addressing Training Bottlenecks through Policy Modulation [paper]
  • (2025)! LLM Coach: Reward Shaping for Reinforcement Learning-Based Navigation Agent [paper]
  • (2025)! Reward Evolution with Graph-of-Thoughts: A Bi-Level Language Model Framework for Reinforcement Learning [paper]
  • (2025)! Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning [paper]
  • (2025)! Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning [paper]
2. VLMs for reward learning
  • Learning transferable visual models from natural language supervision [paper]
  • Zero-shot text-to-image generation [paper]
  • Vision-Language Models as a Source of Rewards [paper]
  • Language Reward Modulation for Pretraining Reinforcement Learning [paper]
  • RoboCLIP: One Demonstration is Enough to Learn Robot Policies [paper]
  • Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [paper]
  • Affordance-Guided Reinforcement Learning via Visual Prompting [paper]
  • LIV: Language-Image Representations and Rewards for Robotic Control [paper]
  • Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning [paper]
  • Code as Reward: Empowering Reinforcement Learning with VLMs [paper]
  • Zero-Shot Reward Specification via Grounded Natural Language [paper]
  • The dark side of rich rewards: Understanding and mitigating noise in VLM rewards [paper]
  • (2025)! A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning [paper]
  • (2025)! Policy Learning from Large Vision-Language Model Feedback without Reward Modeling [paper]
  • (2025)! VARP: Reinforcement Learning from Vision-Language Model Feedback with Agent Regularized Preferences [paper]
3. Reward learning with diffusion models
  • Diffusion Reward: Learning Rewards via Conditional Video Diffusion [paper]
  • Extracting Reward Functions from Diffusion Models [paper]
  • Diffused Value Function: Value Function Estimation using Conditional Diffusion Models for Control [paper]
  • Reward-Directed Conditional Diffusion Models for Directed Generation and Representation Learning [paper]
  • Learning a Diffusion Model Policy from Rewards via Q-Score Matching [paper]
  • (2025)! TeViR: Text-to-Video Reward with Diffusion Models for Efficient Reinforcement Learning [paper]
  • (2025)! GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning [paper]
4. Learning rewards from video prediction
  • Learning Generalizable Robotic Reward Functions from 'In-The-Wild' Human Videos [paper]
  • Video prediction models as rewards for reinforcement learning [paper]
  • Learning reward functions for robotic manipulation by observing humans [paper]
  • VIP: Towards universal visual reward and representation via value-implicit pre-training [paper]
  • (2025)! VideoWorld: Exploring Knowledge Learning from Unlabeled Videos [paper]
  • (2025)! LuciBot: Automated Robot Policy Learning from Generated Videos [paper]

State Representation

1. Learning representations from videos
  • Foundation Reinforcement Learning: towards Embodied Generalist Agents with Foundation Prior Assistance [paper]
  • Learning Universal Policies via Text-Guided Video Generation [paper]
  • Robotic offline RL from internet videos via value-function pre-training [paper]
  • Where are we in the search for an artificial visual cortex for embodied intelligence? [paper]
  • Foundation Reinforcement Learning (FRL) [paper]
  • (2025)! Video Generators are Robot Policies [paper]
  • (2025)! Pre-Trained Video Generative Models as World Simulators [paper]
2. Foundation world models for model-based RL
  • DayDreamer: World models for physical robot learning [paper]
  • Recurrent World Models Facilitate Policy Evolution [paper]
  • Masked World Models for Visual Control [paper]
  • Multi-View Masked World Models for Visual Robotic Manipulation [paper]
  • RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [paper]
  • iVideoGPT: Interactive VideoGPTs are Scalable World Models [paper]
  • Zero-shot Safety Prediction for Autonomous Robots with Foundation World Models [paper]
  • Genie: Generative Interactive Environments [paper]
  • On the role of forgetting in fine-tuning reinforcement learning models [paper]
  • Improving Transformer World Models for Data-Efficient RL [paper]
  • Investigating online RL in world models [paper]
  • RoboDreamer: Learning Compositional World Models for Robot Imagination [paper]
  • R-AIF: Solving sparse-reward robotic tasks from pixels with active inference and world models [paper]
  • Learning View-invariant World Models for Visual Robotic Manipulation [paper]
  • MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation [paper]
  • Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning [paper]
  • A generalizable egovision multimodal world model for fine-grained ego-motion, object dynamics, and scene composition control [paper]
  • World models for physical robot learning [paper]
  • GenRL: Multimodal Foundation World Models for Generalist Embodied Agents [paper]
  • EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents [paper]
  • Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling [paper]
  • GenSim: Generating Robotic Simulation Tasks via Large Language Models [paper]
  • UniSim: Learning Interactive Real-World Simulators [paper]
  • (2025)! RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation [paper]
  • (2025)! Learning Primitive Embodied World Models: Towards Scalable Robotic Learning [paper]
  • (2025)! GWM: Towards Scalable Gaussian World Models for Robotic Manipulation [paper]
  • (2025)! Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [paper]
  • (2025)! FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation [paper]
  • (2025)! Accelerating Model-Based Reinforcement Learning with State-Space World Models [paper]
  • (2025)! Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator [paper]
  • (2025)! Efficient Reinforcement Learning by Guiding Generalist World Models with Non-Curated Data [paper]
  • (2025)! RLVR-World: Training World Models with Reinforcement Learning [paper]

Planning & Exploration

1. LLMs for exploration
  • Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning [paper]
  • Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration [paper]
  • Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance [paper]
  • ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models [paper]
  • (2025)! LaGR-SEQ: Language-guided reinforcement learning with sample-efficient querying [paper]
  • (2025)! Language-Conditioned Offline RL for Multi-Robot Navigation [paper]
  • (2025)! LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models [paper]
2. VLMs for exploration
  • RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback [paper]
  • Vision-Language Models Provide Promptable Representations for Reinforcement Learning [paper]
  • Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation? [paper]
  • Towards A Unified Agent with Foundation Models [paper]
  • (2025)! Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning [paper]
  • (2025)! Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning [paper]
  • (2025)! VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making [paper]
  • (2025)! ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning [paper]
3. LLMs for planning
  • Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [paper]
  • Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents [paper]
  • Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks [paper]
  • Utilizing Large Language Models for Robot Skill Reward Shaping in Reinforcement Learning [paper]
  • Language Instructed Reinforcement Learning for Human-AI Coordination [paper]
  • Retrieval-Augmented Hierarchical in-Context Reinforcement Learning and Hindsight Modular Reflections for Task Planning with LLMs [paper]
  • LLM Augmented Hierarchical Agents [paper]
  • Real-World Offline Reinforcement Learning from Vision Language Model Feedback [paper]
  • Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning [paper]
  • MotionGPT: Finetuned LLMs are general-purpose motion generators [paper]
  • Prompt, Plan, Perform: LLM-based Humanoid Control via Quantized Imitation Learning [paper]
  • Grounding LLMs for robot task planning using closed-loop state feedback [paper]
  • Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models [paper]
  • Socratic models: Composing zero-shot multimodal reasoning with language [paper]
  • MiniGPT-4: Enhancing vision-language understanding with advanced large language models [paper]
  • (2025)! Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners [paper]
  • (2025)! Multi-Agent Fuzzy Reinforcement Learning With LLM for Cooperative Navigation of Endovascular Robotics [paper]
  • (2025)! Evaluating a Hybrid LLM Q-Learning/DQN Framework for Adaptive Obstacle Avoidance in Embedded Robotics [paper]
4. Diffusion models for planning and exploration
  • Generative adversarial imitation learning [paper]
  • Deterministic sampling-based motion planning: Optimality, complexity, and performance [paper]
  • Planning with Diffusion for Flexible Behavior Synthesis [paper]
  • EDGI: Equivariant Diffusion for Planning with Embodied Agents [paper]
  • Cold Diffusion on the Replay Buffer: Learning to Plan from Known Good States [paper]
  • Refining Diffusion Planner for Reliable Behavior Synthesis by Automatic Detection of Infeasible Plans [paper]
  • Hierarchical Diffuser: Simple Hierarchical Planning with Diffusion [paper]
  • Stitching sub-trajectories with conditional diffusion model for goal-conditioned offline RL [paper]
  • Language Control Diffusion: Efficiently Scaling Through Space, Time, and Tasks [paper]
  • SSD: Sub-trajectory Stitching with Diffusion Model for Goal-Conditioned Offline Reinforcement Learning [paper]
  • DiffSkill: Improving Reinforcement Learning through diffusion-based skill denoiser for robotic manipulation [paper]
  • Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning? [paper]
  • Scaling rectified flow transformers for high-resolution image synthesis [paper]
  • IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies [paper]
  • Learning to Reach Goals via Diffusion [paper]
  • Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling [paper]
  • MADIFF: Offline Multi-agent Learning with Diffusion Models [paper]
  • Diffusion Policy: Visuomotor Policy Learning via Action Diffusion [paper]
  • DiffuseLoco: Real-time legged locomotion control with diffusion from offline datasets [paper]
  • Goal masked diffusion policies for navigation and exploration [paper]
  • Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [paper]
  • Adaptive Online Replanning with Diffusion Models [paper]
  • DiPPeR: Diffusion-based 2D Path Planner applied on Legged Robots [paper]
  • SafeDiffuser: Safe Planning with Diffusion Probabilistic Models via Control Barrier Functions [paper]
  • AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners [paper]
  • Flow Q-learning [paper]
  • High-resolution image synthesis with latent diffusion models [paper]
  • Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning [paper]
  • Efficient Diffusion Policies for Offline Reinforcement Learning [paper]
  • Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching [paper]
  • Robust policy learning via offline skill diffusion [paper]
  • Reasoning with Latent Diffusion in Offline Reinforcement Learning [paper]
  • Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning [paper]
  • MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL [paper]
  • (2025)! COLSON: Controllable Learning-Based Social Navigation via Diffusion-Based Reinforcement Learning [paper]
  • (2025)! Offline Reinforcement Learning with Discrete Diffusion Skills [paper]
  • (2025)! DASP: Hierarchical Offline Reinforcement Learning via Diffusion Autodecoder and Skill Primitive [paper]
  • (2025)! Garment Diffusion Models for Robot-Assisted Dressing [paper]
  • (2025)! Enhancing Exploration With Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation [paper]
  • (2025)! Offline Adaptation of Quadruped Locomotion Using Diffusion Models [paper]
  • (2025)! DiffusionRL: Efficient Training of Diffusion Policies for Robotic Grasping Using RL-Adapted Large-Scale Datasets [paper]
  • (2025)! Motion Planning Diffusion: Learning and Adapting Robot Motion Planning With Diffusion Models [paper]
  • (2025)! GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning [paper]
  • (2025)! Chaos-Augmented Reinforcement Learning With Diffusion Models for Robust Legged Robot Locomotion [paper]
  • (2025)! Generalizable Offline Multiobjective Reinforcement Learning via Preference-Conditioned Diffuser [paper]
  • (2025)! World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation [paper]
  • (2025)! DreamControl: Human-Inspired Whole-Body Humanoid Control for Scene Interaction via Guided Diffusion [paper]
  • (2025)! Continual Diffuser (CoD): Mastering Continual Offline RL With Experience Rehearsal [paper]
  • (2025)! PegasusFlow: Parallel Rolling-Denoising Score Sampling for Robot Diffusion Planner Flow Matching [paper]

RL for Generative Policies

The second primary dimension of our taxonomy examines RL methods used to train generative models, offering a complementary perspective to Generative Tools for RL. Here, we analyze works that employ RL-based approaches to pre-train, fine-tune, or distill generative policies, where reinforcement learning is used directly to optimize models for action generation. We organize our discussion along three secondary dimensions: (i) RL-Based Pre-Training, (ii) RL-Based Fine-Tuning, and (iii) Policy Distillation.
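The core idea of using RL to optimize an already-trained policy can be sketched on a one-step problem with three discrete actions. The example assumes a softmax policy whose starting logits play the role of a pre-trained prior, and it uses the exact gradient of the expected reward instead of sampled rollouts, so it is a deterministic simplification of policy-gradient fine-tuning, not a method from any listed paper. The logits and rewards are made-up values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(logits, rewards):
    """Expected one-step reward under the softmax policy."""
    return sum(p * r for p, r in zip(softmax(logits), rewards))


def policy_gradient_step(logits, rewards, lr):
    """One exact gradient-ascent step on the expected reward.

    For a softmax policy, d E[r] / d logit_a = pi(a) * (r(a) - E[r]).
    """
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))
    return [l + lr * p * (r - baseline) for l, p, r in zip(logits, probs, rewards)]


pretrained_logits = [2.0, 0.0, 0.0]  # hypothetical prior that prefers action 0
rewards = [0.0, 1.0, 0.2]            # hypothetical task reward that prefers action 1

logits = list(pretrained_logits)
for _ in range(200):
    logits = policy_gradient_step(logits, rewards, lr=1.0)

before = expected_reward(pretrained_logits, rewards)
after = expected_reward(logits, rewards)
fine_tuned_probs = softmax(logits)
```

Starting from the prior instead of uniform logits is what distinguishes this from training from scratch: the update only has to shift probability mass away from the action the prior over-weights.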

RL Pre-Training

1. Transformer Policy

  • Multi-agent reinforcement learning is a sequence modeling problem [paper]
  • Hyper-decision transformer for efficient online policy adaptation [paper]
  • Prompt-tuning decision transformer with preference ranking [paper]
  • Pre-training for robots: Offline RL enables learning new tasks from a handful of trials [paper]
  • Think before you act: Unified policy for interleaving language reasoning with actions [paper]
  • Online Foundation Model Selection in Robotics [paper]
  • Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem [paper]
  • A generalist agent [paper]
  • HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [paper]
  • Transformers are adaptable task planners [paper]
  • PACT: Perception-action causal transformer for autoregressive robotics pre-training [paper]
  • LATTE: Language trajectory transformer [paper]
  • Q-Transformer: Scalable offline reinforcement learning via autoregressive Q-functions [paper]
  • Anymorph: Learning transferable polices by inferring agent morphology [paper]
  • (2025)! AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning [paper]
  • (2025)! ManipLVM-R1: Reinforcement Learning for Reasoning in Embodied Manipulation with Large Vision-Language Models [paper]
  • (2025)! Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success [paper]
  • (2025)! MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models [paper]

2. Diffusion Policy

  • Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning [paper]
  • Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [paper]
  • Beyond Conservatism: Diffusion Policies in Offline Multi-agent Reinforcement Learning [paper]
  • Diffusion Policy: Visuomotor Policy Learning via Action Diffusion [paper]
  • Generating Behaviorally Diverse Policies with Latent Diffusion Models [paper]
  • Hierarchical Diffusion for Offline Decision Making [paper]
  • Is Conditional Generative Modeling All You Need for Decision-Making? [paper]
  • Policy Representation via Diffusion Probability Model for Reinforcement Learning [paper]
  • Offline Skill Diffusion for Robust Cross-Domain Policy Learning [paper]
  • Score Regularized Policy Optimization through Diffusion Behavior for Efficient Offline Reinforcement Learning [paper]
  • Policy-Guided Diffusion [paper]
  • Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control [paper]
  • IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies [paper]
  • Revisiting generative policies: A simpler reinforcement learning algorithmic perspective [paper]
  • Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning [paper]
  • (2025)! Non-differentiable Reward Optimization for Diffusion-based Autonomous Motion Planning [paper]
  • (2025)! D3P: Dynamic Denoising Diffusion Policy via Reinforcement Learning [paper]

Generative Policy RL Fine-Tuning

  • Policy Agnostic RL Fine-Tuning Multiple Policy Classes with Actor-Critic RL [paper]
  • Diffusion Policy Policy Optimization [paper]
  • FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning [paper]
  • Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning [paper]
  • (2025)! FDPP: Fine-tune Diffusion Policy with Human Preference [paper]
  • (2025)! Improving Vision-Language-Action Model with Online Reinforcement Learning [paper]
  • (2025)! From Mystery to Mastery: Failure Diagnosis for Improving Manipulation Policies [paper]
  • (2025)! VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning [paper]
  • (2025)! SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning [paper]
  • (2025)! Emergent World Representations in OpenVLA [paper]
  • (2025)! Integrating Diffusion-based Multi-task Learning with Online Reinforcement Learning for Robust Quadruped Robot Control [paper]
  • (2025)! RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning [paper]
  • (2025)! Reinforced Embodied Planning with Verifiable Reward for Real-World Robotic Manipulation [paper]
  • (2025)! IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model [paper]
  • (2025)! ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy [paper]
  • (2025)! VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators [paper]
  • (2025)! ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning [paper]
  • (2025)! What Can RL Bring to VLA Generalization? An Empirical Study [paper]
  • (2025)! Interactive Post-Training for Vision-Language-Action Models [paper]
  • (2025)! RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training [paper]
  • (2025)! Steering Your Diffusion Policy with Latent Space Reinforcement Learning [paper]
  • (2025)! Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps [paper]

Policy Distillation

  • RLDG: Robotic generalist policy distillation via reinforcement learning [paper]
  • Evaluating real-world robot manipulation policies in simulation [paper]
  • (2025)! Refined policy distillation: From VLA generalists to RL experts [paper]
  • (2025)! VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning [paper]
  • (2025)! RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models [paper]
  • (2026)! Jump-Start Reinforcement Learning with Vision-Language-Action Regularization [paper]

Policy Safety (new!)

  • (2025)! SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning [paper]
  • (2025)! Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners [paper]

Citation

If you find our project useful, please cite our paper:

@article{Moroncelli2024TheDO,
  title={The duality of generative AI and reinforcement learning in robotics: A review},
  author={Angelo Moroncelli and Vishal Soni and Marco Forgione and Dario Piga and Blerina Spahiu and Loris Roveda},
  journal={Inf. Fusion},
  year={2024},
  volume={129},
  pages={104003},
  url={https://api.semanticscholar.org/CorpusID:273507428}
}