Vero: An Open RL Recipe for General Visual Reasoning

June 19, 2026

Vero is a fully open reinforcement learning recipe for training and evaluating multi-task visual reasoning with vision-language models.

The released project combines an RL training stack (vero-rl) and an evaluation harness (vero-eval).

News

2026-06 — Vero accepted at ECCV 2026.
2026-06-17 — Second release. We expanded Vero with new model checkpoints and larger training data: Vero-Qwen35-9B and Vero-Qwen35-9B-Base, plus the Vero-1.6M and Vero-2.5M-unfiltered datasets.
2026-06 — Oral at CVPR 2026. Vero was selected for an oral presentation at the DataMFM workshop (Emerging Directions in Data for Multimodal Foundation Models) at CVPR 2026.

Highlights

600K curated RL samples from 59 datasets across 6 visual reasoning task categories: STEM; Chart & OCR; Spatial & Action; Knowledge & Recognition; Grounding, Counting & Search; and Captioning & Instruction Following
Single-stage RL recipe for visual reasoning with task-routed reward functions
VeroEvalSuite with 30 benchmarks spanning the 6 multimodal reasoning task categories
Support for multiple base model families: Qwen3.5, Qwen2.5-VL, Qwen3-VL, MiMo-VL, Bee, Molmo2
Fully open codebase for training and evaluation

Installation

Clone Repository

git clone https://github.com/zlab-princeton/vero.git
cd vero

Environment Setup

bash scripts/setup_env.sh

This installs PyTorch, vLLM, Transformers, FlashAttention, and both project packages (vero-rl, vero-eval) in editable mode. See scripts/setup_env.sh for the full setup flow.

Data Setup

Dataset Composition

For Vero RL training, the model-run scripts use formatted local data under vero-rl/data by default. Prepare it once with:

python scripts/download_and_format_vero_600k.py

This script downloads or reuses cached data from zlab-princeton/Vero-600k, exports images into vero-rl/data/images/, and writes:

vero-rl/data/vero_600k_train.verl.jsonl
vero-rl/data/vero_600k_val.verl.jsonl

All bash launchers in vero-rl/examples/model_runs/ will pick up those files automatically once they exist.
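Before launching a run, it can help to confirm that both files were written and parse as JSON Lines. The sketch below uses a hypothetical helper, `count_jsonl_records`, which is not part of the repository. It only checks that each non-empty line is valid JSON and assumes nothing about the record schema, which is described in docs/DATA.md.

```python
import json
from pathlib import Path


def count_jsonl_records(path):
    """Return the number of valid JSON records in a JSON Lines file.

    Hypothetical helper, not part of the Vero codebase. Raises
    FileNotFoundError if the file is missing and ValueError on the
    first line that is not valid JSON.
    """
    path = Path(path)
    count = 0
    with path.open("r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
            count += 1
    return count


# File names come from the section above; run from the repository root.
for name in ("vero_600k_train.verl.jsonl", "vero_600k_val.verl.jsonl"):
    target = Path("vero-rl/data") / name
    if target.exists():
        print(f"{target}: {count_jsonl_records(target)} records")
    else:
        print(f"{target}: missing (run scripts/download_and_format_vero_600k.py)")
```

A missing file is reported rather than raised, so the check can run before or after the download step.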

Larger datasets from the second release — zlab-princeton/Vero-1.6M and zlab-princeton/Vero-2.5M-unfiltered — are also available on the Hub; the default training setup uses Vero-600k.

For custom data, Vero expects a specific data format; see docs/DATA.md for the format, curation details, and reward routing metadata.

Quick Start: Evaluation

Evaluation is independent of training — to just run the benchmarks, you can skip the training setup entirely. (For the full benchmark list, see Evaluation Benchmarks.)

Want it hands-off? Point an AI coding agent (Claude Code / Codex) at docs/AGENTS_SETUP.md — a one-file runbook it (or a human) can follow end to end to set up the environment and run the full reproduction.

1. One-time setup. set_paths.sh configures the env, caches, and the judge (JUDGE_MODEL_PATH + API_TYPE), so judge-based tasks work right after sourcing:

cp scripts/set_paths.sh.example set_paths.sh   # edit ROOT_PATH (a roomy disk)
source set_paths.sh                            # HF_HOME, caches, JUDGE_MODEL_PATH, API_TYPE
huggingface-cli login                          # gated datasets (e.g. MMMU_Pro)

2. Evaluate. The model defaults to zlab-princeton/Vero-Qwen3I-8B (override with --model-path):

cd vero-eval

# Smoke test — one rule-based task, 1 GPU, a few samples
bash examples/eval.sh --tasks chartqa_reasoning --limit 5

# Reproduce the FULL suite — all 30 benchmarks, no --limit (judge tasks need 2 GPUs)
bash examples/eval_domain.sh --domain all --num-gpus 2

Choose the --variant that matches the checkpoint type (instruct vs thinking):

Vero checkpoint	Type	`--variant`
`Vero-Qwen25-7B`, `Vero-Qwen3I-8B`	instruct	`reasoning` (default)
`Vero-Qwen3T-8B`, `Vero-MiMo-7B`, `Vero-Qwen35-9B`, `Vero-Qwen35-9B-Base`	thinking	`reasoning_samplingq3`

For a thinking checkpoint, pass the model and its variant explicitly:

bash examples/eval_domain.sh \
    --model-path zlab-princeton/Vero-Qwen3T-8B \
    --domain all --variant reasoning_samplingq3 --num-gpus 2
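When evaluating several checkpoints in a loop, the checkpoint-to-variant table can be encoded once so the wrong variant is never passed by hand. The sketch below is illustrative only: `VARIANT_BY_CHECKPOINT` and `build_eval_command` are hypothetical names, and the command uses just the flags shown in this README.

```python
import shlex

# Checkpoint-to-variant mapping copied from the table above.
VARIANT_BY_CHECKPOINT = {
    "Vero-Qwen25-7B": "reasoning",
    "Vero-Qwen3I-8B": "reasoning",
    "Vero-Qwen3T-8B": "reasoning_samplingq3",
    "Vero-MiMo-7B": "reasoning_samplingq3",
    "Vero-Qwen35-9B": "reasoning_samplingq3",
    "Vero-Qwen35-9B-Base": "reasoning_samplingq3",
}


def build_eval_command(checkpoint, domain="all", num_gpus=2):
    """Build the eval_domain.sh argument list for a released checkpoint.

    Hypothetical helper; raises KeyError for checkpoint names that are
    not in the table.
    """
    variant = VARIANT_BY_CHECKPOINT[checkpoint]
    return [
        "bash", "examples/eval_domain.sh",
        "--model-path", f"zlab-princeton/{checkpoint}",
        "--domain", domain,
        "--variant", variant,
        "--num-gpus", str(num_gpus),
    ]


# Print a copy-pasteable command line (run it from vero-eval/).
print(shlex.join(build_eval_command("Vero-Qwen3T-8B")))
```

The returned list can also be passed directly to `subprocess.run(..., cwd="vero-eval")`.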

Notes. Verify a machine first with bash examples/preflight.sh (optional). The judge comes from JUDGE_MODEL_PATH (set by set_paths.sh); if unset, judge tasks fall back to OpenAI gpt-4o and need GPT_API_KEY. Judge-based tasks need 2 GPUs.
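The judge fallback described in the notes can be checked up front, before a long evaluation fails on a missing key. The following is a minimal sketch of that rule, not the harness's actual logic: `resolve_judge` and its return shape are hypothetical, and `API_TYPE` is not modeled because its accepted values are not listed here.

```python
import os


def resolve_judge(env=None):
    """Decide which judge a run would use, given an environment mapping.

    Sketch of the documented fallback: JUDGE_MODEL_PATH if set, otherwise
    OpenAI gpt-4o, which requires GPT_API_KEY. Hypothetical helper.
    """
    env = os.environ if env is None else env
    judge_path = (env.get("JUDGE_MODEL_PATH") or "").strip()
    if judge_path:
        return {"backend": "local", "model": judge_path}
    if not env.get("GPT_API_KEY"):
        raise RuntimeError(
            "JUDGE_MODEL_PATH is unset, so judge tasks fall back to gpt-4o "
            "and need GPT_API_KEY"
        )
    return {"backend": "openai", "model": "gpt-4o"}
```

Calling `resolve_judge()` right after `source set_paths.sh` should report the local judge. A `RuntimeError` means neither variable is set.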

See docs/EVALUATION.md for benchmark coverage, judge configuration, and evaluation workflows.

Quick Start: Training

First set cache paths (the base model and reward judge download on the fly under HF_HOME) and prepare the repo-local training data:

cp scripts/set_paths.sh.example set_paths.sh   # edit ROOT_PATH (a roomy disk)
source set_paths.sh                            # sets HF_HOME, activates verovlm
python scripts/download_and_format_vero_600k.py

Then launch a training run. TRAIN_FILES, VAL_FILES, and IMAGE_ROOT are optional overrides if you want to point at different formatted data.

export ROOT_PATH="/path/to/data_root"  # for datasets and checkpoints
cd vero-rl
bash examples/model_runs/run_gspo_qwen3vl_instruct_mix_all_llmjudge.sh

The reward judge (Qwen/Qwen3.5-27B by default) downloads on first use; override it with export VLLM_JUDGE_MODEL_PATH=<model>.

Optional dataset overrides:

export TRAIN_FILES="/path/to/train.verl.jsonl"
export VAL_FILES="/path/to/val.verl.jsonl"
export IMAGE_ROOT="/path/to/data_root"
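The override behavior can be pictured as "use the environment variable if set, otherwise the repo-local file". The sketch below illustrates that rule under stated assumptions: `resolve_data_paths` is a hypothetical helper, the launchers' real resolution logic lives in the shell scripts, and the default for `IMAGE_ROOT` is left as `None` because it is not documented here.

```python
import os
from pathlib import Path

# Repo-local defaults written by scripts/download_and_format_vero_600k.py.
DEFAULT_DATA_DIR = Path("vero-rl/data")
DEFAULTS = {
    "TRAIN_FILES": DEFAULT_DATA_DIR / "vero_600k_train.verl.jsonl",
    "VAL_FILES": DEFAULT_DATA_DIR / "vero_600k_val.verl.jsonl",
}


def resolve_data_paths(env=None):
    """Return the data paths a run would use, honoring optional overrides.

    Hypothetical helper. IMAGE_ROOT resolves to None when unset, meaning
    "whatever the launcher defaults to".
    """
    env = os.environ if env is None else env
    resolved = {}
    for name, default in DEFAULTS.items():
        override = env.get(name)
        resolved[name] = Path(override) if override else default
    image_root = env.get("IMAGE_ROOT")
    resolved["IMAGE_ROOT"] = Path(image_root) if image_root else None
    return resolved
```

Printing `resolve_data_paths()` before launching shows at a glance whether a stale `TRAIN_FILES` export from an earlier shell session is still in effect.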

See docs/TRAINING.md for the full training guide.

Model Checkpoints

Pretrained checkpoints are available on Hugging Face via the following links:

Model	Base Model	Parameters	HF Link
`Vero-Qwen35-9B`	Qwen3.5-9B	9B	zlab-princeton/Vero-Qwen35-9B
`Vero-Qwen35-9B-Base`	Qwen3.5-9B-Base	9B	zlab-princeton/Vero-Qwen35-9B-Base
`Vero-Qwen25-7B`	Qwen2.5-VL-7B-Instruct	7B	zlab-princeton/Vero-Qwen25-7B
`Vero-Qwen3I-8B`	Qwen3-VL-8B-Instruct	8B	zlab-princeton/Vero-Qwen3I-8B
`Vero-Qwen3T-8B`	Qwen3-VL-8B-Thinking	8B	zlab-princeton/Vero-Qwen3T-8B
`Vero-MiMo-7B`	MiMo-VL-7B-SFT	7B	zlab-princeton/Vero-MiMo-7B

See docs/MODELS.md for the documented model families, training settings, and inference format.

Evaluation Benchmarks

Vero is evaluated with vero-eval, an evaluation harness built on lmms-eval. It houses VeroEvalSuite, a 30-benchmark suite spanning six task categories:

Chart and OCR
STEM reasoning
Spatial reasoning and action
Knowledge and recognition
Grounding, counting, and visual search
Captioning and instruction following

Task Category	Benchmarks
Chart & OCR	ChartQA-Pro, ChartQA, InfoVQA, CharXiv, ChartMuseum, EvoChart
STEM	MMMU-PRO Standard, MMMU-PRO Vision, MathVision, MathVista
Spatial & Action	Blink, ERQA, GameQA, EmbSpatial, CVBench
Knowledge & Recognition	RealWorldQA, SimpleVQA (English), FVQA, MM-Vet V2
Grounding, Counting & Visual Search	CountBenchQA, CountQA, MMERealWorld, VStarBench, AerialVG, VisualProbe, ScreenSpot, ScreenSpotPro
Captioning & Instruction Following	MM-MTBench, MIABench, MMIFEval

Training

GSPO-based RL launch scripts are provided for the following base models:

Script	Model Family	Base Model
Train Vero-Qwen25-7B	`Vero-Qwen25-7B`	Qwen2.5-VL-7B-Instruct
Train Vero-Qwen3I-8B	`Vero-Qwen3I-8B`	Qwen3-VL-8B-Instruct
Train Vero-MiMo-7B	`Vero-MiMo-7B`	MiMo-VL-7B-SFT

During RL, Vero scores rollouts with task-routed rule-based rewards plus an LLM judge — see Reward for the formula, verifiers, and judge setup.
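To make "task-routed" concrete, the sketch below shows one generic way to dispatch a rollout to a rule-based verifier by task type and send everything else to a judge callable. It is not Vero's reward formula: the task names, the verifiers (`exact_match`, `box_iou`), and `score_rollout` are all hypothetical, chosen only to illustrate the routing idea.

```python
def exact_match(prediction, target):
    """Return 1.0 if the answers match after trimming and lowercasing."""
    return 1.0 if prediction.strip().lower() == target.strip().lower() else 0.0


def box_iou(pred, target):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    def area(box):
        return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

    inter_box = (
        max(pred[0], target[0]), max(pred[1], target[1]),
        min(pred[2], target[2]), min(pred[3], target[3]),
    )
    inter = area(inter_box)
    union = area(pred) + area(target) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical routing table: task type -> rule-based verifier.
RULE_BASED_VERIFIERS = {
    "counting": exact_match,
    "grounding": box_iou,
}


def score_rollout(task, prediction, target, judge=None):
    """Score one rollout with a routed verifier, else with the judge."""
    verifier = RULE_BASED_VERIFIERS.get(task)
    if verifier is not None:
        return verifier(prediction, target)
    if judge is None:
        raise ValueError(f"no rule-based verifier for task {task!r} and no judge given")
    return judge(prediction, target)
```

In this sketch, open-ended tasks such as captioning have no entry in the routing table, so they fall through to the `judge` callable. In a real run that role would be played by the LLM judge server.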

The training scripts auto-detect REPO_ROOT from their location, manage the LLM judge server automatically, and use Hydra-based configs from vero-rl/examples/model_runs/config/. See docs/TRAINING.md for the full training guide.

Repository Structure

Vero/
|-- docs/          Data, training, evaluation, and model documentation
|-- scripts/       Environment setup and data filtering scripts
|-- vero-eval/     Evaluation harness built around lmms-eval
`-- vero-rl/       RL training framework built around veRL

Documentation

Agent Setup Guide — one-file, end-to-end setup + eval runbook for AI coding agents (Claude Code / Codex) or humans
Training Guide
Evaluation Guide
Data Guide
Model Guide

Citation

If you use this repository, please cite:

@article{sarch2026vero,
    title   = {Vero: An Open RL Recipe for General Visual Reasoning},
    author  = {Sarch, Gabriel and Cai, Linrong and Wang, Qunzhong and Wu, Haoyang and Chen, Danqi and Liu, Zhuang},
    year    = {2026},
    journal = {arXiv preprint arXiv:2604.04917},
}

Acknowledgements

This project builds on several strong open-source foundations:

veRL for distributed RL training infrastructure
lmms-eval for multimodal evaluation

License

This project is licensed under the Apache License 2.0.