June 23, 2025

Co-Reinforcement Learning for
Unified Multimodal Understanding and Generation

Paper | GitHub | Hugging Face Collection

Introduction

CoRL is a GRPO-based RL framework that jointly improves the generation and understanding capabilities of unified multimodal large language models (ULMs) under a shared policy optimization paradigm. It consists of two stages: a unified RL stage for joint optimization, followed by a refined RL stage for task-specific enhancement.
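To make the "GRPO-based" part concrete, the following is a minimal sketch of the group-relative advantage that GRPO is generally built on: several responses are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. It is not taken from this repository. The function name, the `eps` term, and the sample rewards are illustrative assumptions, and implementations differ on details such as population versus sample standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Hypothetical helper for illustration; `eps` only guards against a zero
    standard deviation when every sample receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled responses to one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group mean get a positive advantage and the rest get a negative one, so no separate value network is needed to form a baseline.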

method overview.

📢 Latest Updates

  • [2025-06-25] 📌 Code of TTRL for MM2T and T2I.
  • [2025-06-01] 📌 Core code of CoRL.

Environment

git clone https://github.com/mm-vl/ULM-R1.git
cd ULM-R1
conda create -n corl python=3.10 -y
conda activate corl
pip install -e .
pip install flash-attn --no-build-isolation
# pip install flash-attn --no-build-isolation --use-pep517

Please refer to install.md for more details.

Training Data

training example for unified RL.

Training Pipeline

  • Unified RL
bash corl/scripts/corl_unified.sh

Test-Time Reinforcement Learning (TTRL)

We simply adapt the TTRL algorithm to multimodal understanding and text-to-image generation, aiming to explore the potential of RL in enhancing both understanding and generation performance at inference time.
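For readers unfamiliar with TTRL: as originally proposed for text-only reasoning, it estimates rewards without ground-truth labels by sampling several answers per prompt, treating the majority answer as a pseudo-label, and rewarding the samples that agree with it. The sketch below illustrates that voting step only. The function name and the sample answers are hypothetical, and it does not describe how the scripts in this repository compute rewards, in particular for image generation.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Return (pseudo_label, rewards) for the answers sampled for one prompt.

    Hypothetical helper: the most frequent answer becomes the pseudo-label,
    and each sample gets reward 1.0 if it matches it, else 0.0.
    """
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards


# Made-up multiple-choice answers from five samples of the same question.
sampled = ["B", "A", "B", "C", "B"]
label, rewards = majority_vote_rewards(sampled)
print(label, rewards)
```

These 0/1 rewards can then be fed to the same policy-gradient update used during training, which is what lets the model keep improving on unlabeled test inputs.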

Tip

required: trl>=0.18.1
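A quick way to confirm the requirement before launching the scripts is to compare the installed `trl` version against the minimum. The snippet below is a simplified sketch using only the standard library. The helper names are made up for illustration, and the parser only reads leading numeric components, so `packaging.version.Version` is the more robust choice when that package is available.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver):
    """Turn '0.18.1' or '0.19.0.dev0' into a tuple of its leading integer parts."""
    parts = []
    for piece in ver.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component (dev/rc suffixes)
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum="0.18.1"):
    """Compare two version strings by their numeric release components."""
    return parse_release(installed) >= parse_release(minimum)


def check_trl():
    """Return True only if trl is installed and at least version 0.18.1."""
    try:
        installed = version("trl")
    except PackageNotFoundError:
        return False
    return meets_minimum(installed)


print("trl OK" if check_trl() else "trl missing or older than 0.18.1")
```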

multimodal understanding

bash ttrl/scripts/mm2t_mmmu.sh
bash ttrl/scripts/mm2t_mmstar.sh
| Model        | MMMU | MMStar |
|--------------|------|--------|
| Janus-Pro-1B | 36.3 | 43.1   |
| + TTRL       | 39.8 | 46.9   |

text-to-image generation

bash ttrl/scripts/t2i_geneval.sh
bash ttrl/scripts/t2i_unieval.sh
| Model        | GenEval | UniEval (UniScore) |
|--------------|---------|--------------------|
| Janus-Pro-1B | 0.73    | 0.370              |
| + TTRL       | 0.76    | 0.455              |
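The reported scores correspond to absolute gains of +3.5 on MMMU, +3.8 on MMStar, +0.03 on GenEval, and +0.085 on UniEval (UniScore) over the Janus-Pro-1B baseline. The short snippet below reproduces that arithmetic from the numbers listed in this README. The `gains` helper is only an illustrative convenience, not part of the repository.

```python
def gains(baseline, improved, digits):
    """Absolute improvement per benchmark, rounded to `digits` decimals."""
    return {name: round(improved[name] - baseline[name], digits) for name in baseline}


# Scores copied from the results reported above.
understanding = gains(
    {"MMMU": 36.3, "MMStar": 43.1},
    {"MMMU": 39.8, "MMStar": 46.9},
    digits=1,
)
generation = gains(
    {"GenEval": 0.73, "UniEval": 0.370},
    {"GenEval": 0.76, "UniEval": 0.455},
    digits=3,
)
print(understanding, generation)
```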

Acknowledgement

Janus-Pro | open-r1-multimodal | R1-V