May 11, 2026
VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
Training VLM agents with multi-turn reinforcement learning
🔥 NeurIPS 2025 🔥
Kangrui Wang*, Pingyue Zhang*, Zihan Wang*, Yaning Gao*, Linjie Li*, Qineng Wang, Hanyang Chen, Chi Wan, Yiping Lu, Zhengyuan Yang, Lijuan Wang, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Yejin Choi, Manling Li
(* equal contribution)
Environments: FrozenLake | Navigation | Sokoban | ManiSkill | SVG
We introduce VAGEN, a multi-turn reinforcement learning framework designed specifically for training vision-language model (VLM) agents. Built on this framework, we propose World Modeling RL, a novel reinforcement learning approach that significantly improves the multi-turn performance of VLMs by explicitly supervising their world-model reasoning process, as shown in Figure 1.
We frame multi-turn VLM agentic tasks as a Partially Observable Markov Decision Process (POMDP), as shown in Figure 2.
Figure 1. Overview of the VAGEN framework.
Figure 2. POMDP formulation of multi-turn VLM agentic tasks.
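For readers new to the formulation, here is the textbook POMDP tuple and RL objective in generic notation. The symbols (including H for the number of turns) are our choice and may not match the paper. One turn corresponds to one time step t. The agent never sees the true state s_t (for example, the full environment state), only the observation o_t (for example, a rendered image), so the VLM policy conditions on the interaction history:

```latex
\begin{aligned}
&\mathcal{M} = (\mathcal{S}, \mathcal{A}, \mathcal{O}, T, \Omega, R, \gamma), \\
&s_{t+1} \sim T(\cdot \mid s_t, a_t), \quad o_t \sim \Omega(\cdot \mid s_t), \quad r_t = R(s_t, a_t), \\
&a_t \sim \pi_\theta(\cdot \mid o_0, a_0, \ldots, a_{t-1}, o_t), \\
&J(\theta) = \mathbb{E}_{\pi_\theta}\Big[\textstyle\sum_{t=0}^{H-1} \gamma^{t} r_t\Big].
\end{aligned}
```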
News
[2026/02] We have migrated the main branch to VAGEN-Lite, a lightweight and clean reimplementation built on VERL agent-loop for easy customization and stable performance. For the previous full-featured release, please visit the vagen-legacy branch.
[2025/12] Introducing VAGEN-Lite: a lightweight and clean reimplementation of VAGEN, built on the VERL agent-loop for easy customization and stable performance.
[2025/09] VAGEN has been accepted to NeurIPS 2025.
[2025/04] We've introduced a new modular design for environments and services in VAGEN:
- Enhanced environment framework for easier creation of custom environments
- New service architecture for efficient distributed training
- Check out our new guides:
  - Creating Environments: the new environment protocol.
  - Creating Services: we now support hosting environments in a separate process.
[2025/03] We release VAGEN, a multi-turn reinforcement learning framework for training VLM Agents!
Installation
conda create -n vagen python=3.12 -y
conda activate vagen
git clone https://github.com/mll-lab-nu/VAGEN.git
cd VAGEN
git submodule update --init --recursive
cd verl
USE_MEGATRON=0 bash scripts/install_vllm_sglang_mcore.sh
pip install --no-deps -e .
cd ..
pip install -e .
pip install "trl==0.26.2"
Quick Start
Training
VAGEN currently supports PPO / GRPO with two multi-turn training paradigms:
Multi-turn Concatenated Training: All turns in a trajectory are concatenated into a single training instance.
# Qwen/Qwen2.5-VL-3B-Instruct
cd VAGEN
bash examples/train/sokoban/train_ppo_qwen25vl3b.sh
# Qwen/Qwen3-VL-4B-Instruct (first install these package versions):
# pip install transformers==4.57.1
# pip install "sglang[all]==0.5.3.post3"
cd VAGEN
bash examples/train/sokoban/train_grpo_qwen3vl4b.sh
# Enable reward-variance-based top-p filtering
cd VAGEN
bash examples/train/frozenlake/train_grpo_qwen25vl3b_filtertopp_vision.sh
Multi-turn Non-Concatenated Training: Each trajectory is split into multiple turn-level training instances.
cd VAGEN
bash examples/train/sokoban/train_ppo_no_concat_qwen25vl3b.sh
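To make the difference between the two paradigms concrete, here is a toy sketch that works on strings instead of token IDs. The Turn dataclass and both helper functions are hypothetical names for illustration, not VAGEN APIs. A real trainer operates on tokenized, image-bearing inputs and typically restricts the loss to the agent's response tokens; that part is omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    observation: str  # stand-in for the (image) observation shown at this turn
    response: str     # the agent's reply (reasoning + action) at this turn


def concatenated_instance(system_prompt: str, turns: List[Turn]) -> str:
    """Concatenated: the whole trajectory becomes ONE training sequence."""
    parts = [system_prompt]
    for turn in turns:
        parts.extend([turn.observation, turn.response])
    return "\n".join(parts)


def non_concatenated_instances(system_prompt: str, turns: List[Turn]) -> List[str]:
    """Non-concatenated: one training sequence per turn, containing only the
    system prompt, that turn's observation, and that turn's response."""
    return ["\n".join([system_prompt, t.observation, t.response]) for t in turns]


if __name__ == "__main__":
    trajectory = [Turn("obs_0", "act_0"), Turn("obs_1", "act_1")]
    print(concatenated_instance("SYSTEM", trajectory))
    print(non_concatenated_instances("SYSTEM", trajectory))
```

As the sketch shows, concatenation keeps the full history in context but the sequence grows with every turn, while the non-concatenated form keeps each instance short at the cost of dropping earlier turns from the input.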
Evaluation
VAGEN supports evaluation using different backends (OpenAI, Claude, Gemini, sglang, vLLM). For details, see vagen/evaluate/README.md.
cd VAGEN
# FrozenLake evaluation with sglang
bash examples/evaluate/frozenlake/eval_qwen25_vl_3b.sh
cd VAGEN
# Sokoban evaluation
bash examples/evaluate/sokoban/run_eval.sh
Customizing Your Environment
To train on your own environment, follow the steps below.
1. Create Your Environment Class
- Use GymImageEnv as the base class.
- Refer to Sokoban for a full implementation example:
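The Sokoban environment is the authoritative reference for the real interface. As a rough, dependency-free illustration of the usual reset/step pattern only, the sketch below defines a hypothetical stand-in base class (HypotheticalImageEnvBase) instead of importing GymImageEnv. The actual method names, return types, observation keys, and whether the methods are async may all differ. The toy LineWalkEnv returns a text grid where a real VLM environment would return an image.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple

Obs = Dict[str, Any]
Info = Dict[str, Any]


class HypotheticalImageEnvBase(ABC):
    """Hypothetical stand-in for GymImageEnv; the real interface may differ."""

    @abstractmethod
    def reset(self, seed: int) -> Tuple[Obs, Info]:
        """Start a new episode and return (observation, info)."""

    @abstractmethod
    def step(self, action_str: str) -> Tuple[Obs, float, bool, Info]:
        """Apply the agent's text action; return (observation, reward, done, info)."""


class LineWalkEnv(HypotheticalImageEnvBase):
    """Toy 1-D task: walk from cell 0 to a goal cell using 'left' / 'right'."""

    def __init__(self, length: int = 5, max_steps: int = 10):
        self.length = length
        self.max_steps = max_steps

    def _render(self) -> str:
        # A real VLM environment would render an image here.
        cells = ["."] * self.length
        cells[self.goal] = "G"
        cells[self.pos] = "P"
        return "".join(cells)

    def reset(self, seed: int) -> Tuple[Obs, Info]:
        rng = random.Random(seed)  # the seed controls the layout (goal position)
        self.goal = rng.randrange(1, self.length)
        self.pos = 0
        self.steps = 0
        return {"obs_str": self._render()}, {}

    def step(self, action_str: str) -> Tuple[Obs, float, bool, Info]:
        self.steps += 1
        action = action_str.strip().lower()
        if action == "right":
            self.pos = min(self.pos + 1, self.length - 1)
        elif action == "left":
            self.pos = max(self.pos - 1, 0)
        # Any other text is treated as an invalid action and leaves the state unchanged.
        success = self.pos == self.goal
        done = success or self.steps >= self.max_steps
        return {"obs_str": self._render()}, (1.0 if success else 0.0), done, {"success": success}
```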
2. Register the Environment
Add your environment entry to:
vagen/configs/env_registry.yaml
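Conceptually, a registry maps the environment name used in configs to the class that implements it. The sketch below assumes a hypothetical "module.path:ClassName" string format and a hypothetical resolve_env_class helper; it does not reproduce the actual schema of env_registry.yaml. The demo entry points at a standard-library class only so that the snippet runs on its own.

```python
import importlib
from typing import Dict

# Hypothetical registry format: environment name -> "module.path:ClassName".
# The real schema of vagen/configs/env_registry.yaml may differ.
EXAMPLE_REGISTRY: Dict[str, str] = {
    "my_env": "my_package.envs.my_env:MyEnv",  # illustrative entry; this module does not exist
}


def resolve_env_class(name: str, registry: Dict[str, str]) -> type:
    """Import and return the class registered under `name`."""
    if name not in registry:
        raise KeyError(f"environment {name!r} is not registered; known: {sorted(registry)}")
    module_path, sep, class_name = registry[name].partition(":")
    if not sep or not class_name:
        raise ValueError(f"expected 'module.path:ClassName', got {registry[name]!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


if __name__ == "__main__":
    # A standard-library class stands in for a real environment so this runs standalone.
    demo_registry = {"demo": "collections:OrderedDict"}
    print(resolve_env_class("demo", demo_registry))
```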
3. Create Configuration Files
Prepare training and validation configs:
train.yaml, val.yaml
You can follow the Sokoban examples as templates:
4. Create a Training Script
Write your training script based on:
More Customization
See the Documentation for more customization options:
- Custom Filter - Trajectory filtering (e.g., Reward Variance (RV) filter in RAGEN)
- Custom Metric - Add W&B logging metrics
- Configuration - Training configuration reference
Useful Configs
Refer to vagen/configs/vagen_multiturn.yaml.
No Concat Mode
# Enable no-concat mode: the input is the system prompt plus the current step's observation
trainer:
  concat_multi_turn: False
# Currently only supported with algorithm.adv_estimator=no_concat_gae
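In no-concat mode each turn is a separate training instance, yet credit for a reward at the end of an episode still has to reach earlier turns. A standard way to do this is generalized advantage estimation (GAE) computed over turns, with one reward and one critic value per turn. The sketch below is that textbook GAE recursion; whether no_concat_gae works exactly this way (for example, how it assigns values to turns or spreads a turn's advantage over its tokens) is an assumption, not something this README confirms. turn_level_gae is a hypothetical helper name.

```python
from typing import List


def turn_level_gae(rewards: List[float], values: List[float],
                   gamma: float = 1.0, lam: float = 1.0) -> List[float]:
    """Standard GAE over the turns of one finished trajectory.

    rewards[t]: reward received after turn t; values[t]: critic estimate at turn t.
    The value after the final turn is taken to be 0 (terminal state).
        delta_t = r_t + gamma * V_{t+1} - V_t
        A_t     = delta_t + gamma * lam * A_{t+1}
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have one entry per turn")
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With gamma = lam = 1, each advantage reduces to the return from that turn minus the turn's value estimate; for example, rewards [0, 0, 1] with all values 0.5 give advantages [0.5, 0.5, 0.5].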
Image Logging
# Warning:
# - If you set a training-data rollout dir AND enable image logging, training images will also be dumped to disk.
# This can consume a large amount of storage very quickly. Monitor disk usage and consider cleanup/limits.
trainer:
  log_image:
    enable: false # set to true to save rollout/validation images to disk
    max_pending: 2 # max concurrent async image dump tasks
    png_compress_level: 0 # PNG compression level (0 = fastest, 9 = smallest)
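The two knobs trade disk and CPU against training throughput: max_pending bounds how many image dumps run concurrently, and png_compress_level is the zlib level used inside the PNG (0 stores the data uncompressed and is fastest, 9 is slowest but smallest). The stdlib-only sketch below illustrates both ideas with a minimal grayscale PNG encoder and an asyncio.Semaphore. encode_gray_png and dump_images are hypothetical names; in this sketch excess dumps wait for a free slot, whereas VAGEN's actual handling of tasks beyond max_pending may differ.

```python
import asyncio
import struct
import zlib
from pathlib import Path
from typing import List, Sequence


def encode_gray_png(pixels: Sequence[Sequence[int]], compress_level: int) -> bytes:
    """Encode a 2-D list of 0-255 values as an 8-bit grayscale PNG (stdlib only)."""
    height, width = len(pixels), len(pixels[0])

    def chunk(tag: bytes, data: bytes) -> bytes:
        # PNG chunk = length, type, data, CRC-32 over type + data.
        body = tag + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))

    # Every scanline starts with filter byte 0 ("None").
    raw = b"".join(b"\x00" + bytes(row) for row in pixels)
    # width, height, bit depth 8, color type 0 (grayscale), compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + chunk(b"IHDR", ihdr)
        + chunk(b"IDAT", zlib.compress(raw, compress_level))
        + chunk(b"IEND", b"")
    )


def _encode_and_write(path: Path, pixels, compress_level: int) -> None:
    path.write_bytes(encode_gray_png(pixels, compress_level))


async def dump_images(images, out_dir: Path, max_pending: int,
                      png_compress_level: int) -> List[Path]:
    """Dump images to disk with at most `max_pending` dumps in flight at once."""
    out_dir.mkdir(parents=True, exist_ok=True)
    slots = asyncio.Semaphore(max_pending)

    async def dump_one(index: int, pixels) -> Path:
        async with slots:  # waits here while max_pending dumps are already running
            path = out_dir / f"image_{index:04d}.png"
            # Encoding and file I/O block, so run them in a worker thread.
            await asyncio.to_thread(_encode_and_write, path, pixels, png_compress_level)
            return path

    return list(await asyncio.gather(*(dump_one(i, img) for i, img in enumerate(images))))
```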
HuggingFace Hub Upload
# export HF_TOKEN=xxx
huggingface_hub:
  hf_save_freq: null # upload every N steps (must be a multiple of trainer.save_freq); null = disabled
  repo_id: null
  private: false
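Because hf_save_freq must be a multiple of trainer.save_freq, every upload step is also a step at which a checkpoint is saved locally. The hypothetical helper below encodes that rule as a sketch of the scheduling logic only; it is not VAGEN's code, and the upload itself is not shown. With the huggingface_hub library, pushing a saved checkpoint directory could be done with HfApi().upload_folder(folder_path=..., repo_id=...), which authenticates via the HF_TOKEN environment variable.

```python
from typing import Optional


def should_upload_to_hub(step: int, save_freq: int, hf_save_freq: Optional[int]) -> bool:
    """Return True if the checkpoint saved at `step` should also be pushed to the Hub."""
    if hf_save_freq is None:  # `null` in the YAML config -> uploading disabled
        return False
    if save_freq <= 0 or hf_save_freq <= 0 or hf_save_freq % save_freq != 0:
        raise ValueError(
            f"hf_save_freq ({hf_save_freq}) must be a positive multiple of "
            f"trainer.save_freq ({save_freq})"
        )
    return step > 0 and step % hf_save_freq == 0
```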
Training Data Filtering
filter:
  name: reward_variance_top_p # see vagen/custom_filter
  filter_kwargs:
    top_p: 0.9
  enable: False # set to True to enable filtering; recommended for GRPO training
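The motivation for reward-variance filtering with GRPO: advantages are computed relative to the other rollouts of the same prompt, so a group whose rollouts all receive the same reward yields zero advantage and contributes no learning signal. The sketch below ranks groups by reward variance and keeps the top top_p fraction. reward_variance_top_p here is a hypothetical standalone function, and reading top_p as "fraction of groups kept, rounded up" is an assumption; check vagen/custom_filter for the exact rule.

```python
import math
from statistics import pvariance
from typing import Dict, List, Sequence


def reward_variance_top_p(groups: Dict[str, Sequence[float]], top_p: float) -> List[str]:
    """Keep the highest-reward-variance groups, covering a `top_p` fraction of them.

    `groups` maps a group id (one prompt / environment seed rolled out several
    times, as in GRPO) to the episode rewards of its rollouts.
    Returns the ids of the groups to keep, highest variance first.
    """
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    ranked = sorted(groups, key=lambda gid: pvariance(groups[gid]), reverse=True)
    n_keep = max(1, math.ceil(top_p * len(ranked)))
    return ranked[:n_keep]
```

For example, with groups whose rollout rewards are [1, 1, 1], [0, 1, 0], and [0, 1, 1, 0] and top_p = 0.5, the sketch keeps the last two groups (highest variance first) and drops the all-success group.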
Known Issues & Fixes
See docs/issues.md
Citation
If you find our framework and paper useful, please consider citing our work:
@inproceedings{wang2025vagen,
  title={VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents},
  author={Kangrui Wang and Pingyue Zhang and Zihan Wang and Yaning Gao and Linjie Li and Qineng Wang and Hanyang Chen and Yiping Lu and Zhengyuan Yang and Lijuan Wang and Ranjay Krishna and Jiajun Wu and Li Fei-Fei and Yejin Choi and Manling Li},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://arxiv.org/abs/2510.16907}
}