THE FOLLOWING IS AN OLD VERSION. WE WILL UPDATE THIS SOON.
December 2, 2025 ยท View on GitHub
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards
๐ This is the official implementation of SATORI.

๐ ๏ธ Requirements
To install dependencies:
conda env create -f environment.yaml
We use ms-swift to train the model. Install via:
pip install ms-swift -U
or:
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .
๐ฏ Training
Run the following command to start training:
MAX_PIXELS=401408 \
NPROC_PER_NODE=4 \
swift rlhf \
--rlhf_type grpo \
--model Qwen/Qwen2.5-VL-3B-Instruct \
--external_plugins ./plugin.py \
--reward_funcs external_format external_bbox_acc extern_vqa_acc external_caption_acc \
--use_vllm false \
--vllm_device auto \
--vllm_gpu_memory_utilization 0.6 \
--train_type full \
--torch_dtype bfloat16 \
--dataset <DATA_PATH> \
--max_length 2048 \
--max_completion_length 512 \
--num_train_epochs 1 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--learning_rate 1e-6 \
--gradient_accumulation_steps 2 \
--save_strategy 'steps' \
--eval_strategy 'steps' \
--eval_steps 200 \
--save_steps 200 \
--save_total_limit 2 \
--logging_steps 1 \
--output_dir output/GRPO \
--warmup_ratio 0.01 \
--dataloader_num_workers 4 \
--num_generations 16 \
--temperature 1.0 \
--top_p 0.9 \
--top_k 50 \
--system './prompt.txt' \
--deepspeed zero2 \
--log_completions true \
--vllm_max_model_len 1024 \
--num_iterations 1 \
--num_infer_workers 1 \
--async_generate false \
--beta 0.001 \
Note:
external_formatformats the dataexternal_bbox_acccomputes bounding-box accuracyextern_vqa_acccomputes VQA accuracyexternal_caption_acccomputes caption accuracy (Seeplugin.pyfor details) ๐
๐ Evaluation
We provide an evaluation script in the VLMEvalKit directory. Make sure VLMEvalKit is in your working directory and update the dataset/model paths in config.py:
# File: ./VLMEvalKit/vlmeval/config.py (line 417)
๐โโ๏ธ Run Evaluation
cd VLMEvalKit
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun --nproc-per-node=4 run.py \
--data MMBench MMStar \
--model Qwen2.5-VL-3B-Instruct \
--verbose
๐พ Trained Models
Download here:
- ๐ SATORI-3B
More model architechures and sizes will be released soon! ๐
๐ Dataset: VQA-Verify
We release the VQA-Verify dataset here: link ๐
Acknowledgements
This project adapts from ms-swift and VLMEvalKit. Thanks for their great work! ๐
๐ค Contributing
We welcome your issues, PRs, and feedback! Feel free to open an issue or submit a pull request. ๐