May 1, 2026
# UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models
Jiaqi Wang1,2*, Haoge Deng2*, Ting Pan2*, Yang Liu2, Chengyuan Wang2, Fan Zhang2, Yonggang Qi1†, Xinlong Wang2†
We propose UDM-GRPO, the first framework to integrate uniform discrete diffusion models (UDMs) with RL. Our method is guided by two key insights: (i) treating the final clean sample as the action provides more accurate and stable optimization signals; and (ii) reconstructing trajectories via the diffusion forward process better aligns probability paths with the pretraining distribution. Additionally, we introduce two strategies, Reduced-Step and CFG-Free, to further improve training efficiency. UDM-GRPO significantly improves the performance of the base model, URSA, across multiple T2I tasks.
## News

- [May 2026] Accepted by ICML 2026 (Spotlight).
- [Apr 2026] Released Paper & Project Page & Model Weights.
## Highlights
- **Novel Approach:** Correcting the action and the trajectory yields the first method to integrate UDM with GRPO.
- **SOTA Performance:** State-of-the-art results across multiple T2I benchmarks.
- **High Efficiency:** Reduced-Step and CFG-Free training strategies.
## Model

| Task | Model |
|---|---|
| GenEval | 🤗 GenEval |
| PickScore | 🤗 PickScore |
## Installation

### 1. Environment Setup
Clone this repository to a local disk and install the dependencies:

```bash
git clone https://github.com/Yovecent/UDM-GRPO.git
cd UDM-GRPO
conda create -n UDMGRPO python=3.10
conda activate UDMGRPO
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -e .
pip install torch==2.5.1 xformers --index-url https://download.pytorch.org/whl/cu124
pip install psutil==7.0.0 flash-attn==2.7.4.post1 --no-build-isolation
```
### 2. Model Download

| Model | Resolution | Download |
|---|---|---|
| URSA-1.7B-IBQ512 | 512x512 | 🤗 Hugging Face |
### 3. Reward Preparation

1. PickScore

You can either let the training code download the PickScore model automatically, or pre-download it yourself.
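As a rough illustration of what this reward computes: PickScore is a CLIP-style model that scores an image by the scaled cosine similarity between the prompt's text embedding and the image's embedding. A minimal pure-Python sketch of that scoring rule on precomputed embeddings (the function name and the default scale are illustrative, not the repository's API):

```python
import math

def pickscore_style_reward(text_emb, image_emb, logit_scale=100.0):
    """Scaled cosine similarity between prompt and image embeddings,
    mirroring how PickScore ranks candidates (higher is better)."""
    dot = sum(t * i for t, i in zip(text_emb, image_emb))
    norm = (math.sqrt(sum(t * t for t in text_emb))
            * math.sqrt(sum(i * i for i in image_emb)))
    return logit_scale * dot / norm

# Perfectly aligned embeddings land at the top of the scale.
print(pickscore_style_reward([1.0, 0.0], [1.0, 0.0]))  # 100.0
```

In GRPO, scores like this are computed per group of sampled images and normalized into relative advantages.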
2. GenEval

Install the dependencies and download the Mask2Former checkpoint:
```bash
# Detection / CLIP dependencies
pip install openmim==0.3.9 open-clip-torch==2.31.0 numpy==1.26.0 opencv-python==4.11.0.86 clip-benchmark==1.6.1

# mmdetection (2.x branch)
mim install mmengine mmcv-full==1.7.2 --no-build-isolation
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection && git checkout 2.x
pip install setuptools==78.1.1
pip install -e . --no-build-isolation

# Object list and Mask2Former checkpoint
mv ../raw_rl_data/object_names.txt .
wget https://download.openmmlab.com/mmdetection/v2.0/mask2former/mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco/mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco_20220504_001756-743b7d99.pth \
  -O ./mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.pth
```
Download timm/vit_large_patch14_clip_224.openai from 🤗 Hugging Face, and set `model_path` in `diffnext.rewards.reward_image.GenEvalScorer` to your mmdetection path.

The mmdetection directory should be laid out as:
```
mmdetection/
├── configs/
│   └── mask2former/
│       └── mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.py
├── mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.pth
├── vit_large_patch14_clip_224.openai/
│   ├── open_clip_config.json
│   ├── pytorch_model.bin
│   └── ...
└── object_names.txt
```
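To catch path mistakes before a long training run, a small check like the following can verify the layout above (a convenience sketch, not part of the repository; the file list matches the tree shown):

```python
from pathlib import Path

# Required GenEval assets, relative to the mmdetection root shown above.
REQUIRED_FILES = [
    "configs/mask2former/mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.py",
    "mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.pth",
    "vit_large_patch14_clip_224.openai/open_clip_config.json",
    "vit_large_patch14_clip_224.openai/pytorch_model.bin",
    "object_names.txt",
]

def missing_files(mmdet_root):
    """Return the required assets that are absent under mmdet_root."""
    root = Path(mmdet_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).exists()]
```

An empty return value means the directory is ready; otherwise the list names exactly what still needs to be downloaded or moved.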
3. OCR

Install PaddleOCR and its dependencies:

```bash
pip install paddlepaddle-gpu==2.6.2
pip install paddleocr==2.9.1
pip install python-Levenshtein
```

Initialize the model once so that its weights are downloaded:

```python
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=False, lang="en", use_gpu=False, show_log=False)
```

Then change the OCR model path in `diffnext.rewards.reward_image.OCRScorer` to your local path.
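The `python-Levenshtein` dependency suggests the OCR reward compares the recognized text against the target string by edit distance. A pure-Python sketch of such a normalized similarity (the actual `OCRScorer` logic may differ):

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def ocr_reward(target, recognized):
    """Similarity in [0, 1]; 1.0 means the OCR output matches exactly."""
    if not target and not recognized:
        return 1.0
    return 1.0 - edit_distance(target, recognized) / max(len(target), len(recognized))

print(ocr_reward("hello world", "hello world"))  # 1.0
```

Normalizing by the longer string keeps the reward bounded, so partial matches still provide a useful gradient signal for GRPO.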
## Data Preparation

### GenEval
```bash
cd raw_rl_data/geneval
python cache.py
```

Then change `train_dataloader.params.dataset` in `ursa_1.7b_ibq512.yaml`.
Follow the same steps for PickScore and OCR.
## Training

### 1. Single-node training
```bash
cd diffnext
accelerate launch --config_file accelerate_configs/4_nodes_deepspeed.yaml \
    --machine_rank 0 --num_machines 1 --num_processes 8 \
    scripts/train.py \
    config="configs/geneval_grpo/ursa_1.7b_ibq512.yaml" \
    experiment.name="ursa_geneval" \
    experiment.output_dir="./experiments/ursa_geneval"
```
Note: If you modify the batch size in the configuration, you must ensure that
`training.batch_size = num_prompts * num_images // num_gpus // num_batches`.
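The constraint can be sanity-checked before launching. The argument names below mirror the quantities in the formula, not necessarily the exact YAML keys:

```python
def check_batch_size(batch_size, num_prompts, num_images, num_gpus, num_batches):
    """Validate the batch-size constraint noted above."""
    expected = num_prompts * num_images // num_gpus // num_batches
    if batch_size != expected:
        raise ValueError(f"training.batch_size should be {expected}, got {batch_size}")
    return expected

# e.g. 8 prompts x 16 images per prompt on 8 GPUs with 2 micro-batches:
print(check_batch_size(8, 8, 16, 8, 2))  # 8
```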
### 2. Multi-node training
```bash
# Master node
sh scripts/geneval_grpo/main.sh

# Other nodes
sh scripts/geneval_grpo/main1.sh
sh scripts/geneval_grpo/main2.sh
sh scripts/geneval_grpo/main3.sh
```
## Evaluations

### GenEval

1. Sample prompt images
```bash
cd diffnext/evaluations/geneval
torchrun --nproc_per_node=8 sample.py \
    --height 512 --width 512 \
    --guidance_scale 1.0 --num_inference_steps 25 \
    --ckpt /path/to/URSA-1.7B-IBQ512 \
    --tdir /path/to/checkpoint-XXXX/transformer/diffusion_pytorch_model.bin \
    --outdir ./output/URSA-1.7B-IBQ512 \
    --distributed
```
2. Evaluation

```bash
<IMAGE_FOLDER>=./output/URSA-1.7B-IBQ512
```

Please refer to the GenEval evaluation guide.
### PickScore

1. Sample prompt images
```bash
cd diffnext/evaluations/pickscore
torchrun --nproc_per_node=8 sample.py \
    --height 512 --width 512 \
    --guidance_scale 1.0 --num_inference_steps 25 \
    --ckpt /path/to/URSA-1.7B-IBQ512 \
    --tdir /path/to/checkpoint-XXXX/transformer/diffusion_pytorch_model.bin \
    --outdir ./output/URSA-1.7B-IBQ512 \
    --distributed
```
2. Evaluation

```bash
python evaluate.py \
    --image_root ./output/URSA-1.7B-IBQ512 \
    --out_file ./output/URSA-1.7B-IBQ512/result.json
```
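To summarize a run, the per-image scores in `result.json` can be averaged. The snippet below assumes a flat `{image_name: score}` mapping, which is a guess at the file's layout rather than the documented output format of `evaluate.py`:

```python
import json
from statistics import mean

def summarize_scores(result_file):
    """Average the per-image PickScore values in result.json.
    Assumes a flat {image_name: score} mapping (an assumption, not
    the documented format)."""
    with open(result_file) as f:
        scores = json.load(f)
    return mean(float(v) for v in scores.values())
```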
## Citation

If you find this repository useful, please consider giving it a star and a citation:
```bibtex
@article{wang2026udmgrpo,
  title={UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models},
  author={Wang, Jiaqi and Deng, Haoge and Pan, Ting and Liu, Yang and Wang, Chengyuan and Zhang, Fan and Qi, Yonggang and Wang, Xinlong},
  journal={arXiv preprint arXiv:2604.18518},
  year={2026}
}

@article{deng2025ursa,
  title={Uniform Discrete Diffusion with Metric Path for Video Generation},
  author={Deng, Haoge and Pan, Ting and Zhang, Fan and Liu, Yang and Luo, Zhuoyan and Cui, Yufeng and Shen, Chunhua and Shan, Shiguang and Zhang, Zhaoxiang and Wang, Xinlong},
  journal={arXiv preprint arXiv:2510.24717},
  year={2025}
}

@article{deng2024nova,
  title={Autoregressive Video Generation without Vector Quantization},
  author={Deng, Haoge and Pan, Ting and Diao, Haiwen and Luo, Zhengxiong and Cui, Yufeng and Lu, Huchuan and Shan, Shiguang and Qi, Yonggang and Wang, Xinlong},
  journal={arXiv preprint arXiv:2412.14169},
  year={2024}
}
```
## Acknowledgement

We thank the following repositories:

- URSA: the base model of UDM-GRPO.
- NOVA: the predecessor of URSA.
- CodeWithGPU: the core of our data loading pipeline.
## License

Code and models are licensed under the Apache License 2.0.
