
November 21, 2025

🦖 TEMPFLOW-GRPO: WHEN TIMING MATTERS FOR GRPO IN FLOW MODELS




πŸ—ΊοΈ Roadmap for TempFlow-GRPO

TempFlow-GRPO (Temporal Flow GRPO) is a principled GRPO framework that captures and exploits the temporal structure inherent in flow-based generation. It introduces two key innovations: (i) a trajectory branching mechanism that provides process rewards by concentrating stochasticity at designated branching points, enabling precise credit assignment without requiring specialized intermediate reward models; and (ii) a noise-aware weighting scheme that modulates policy optimization according to the intrinsic exploration potential of each timestep, prioritizing learning during the high-impact early stages while ensuring stable refinement in later phases. Together, these innovations give the model temporally aware optimization that respects the underlying generative dynamics, leading to state-of-the-art performance in human preference alignment and on standard text-to-image benchmarks.
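To make the two ideas concrete, here is a toy 1-D sketch under stated assumptions; it is not the TempFlow-GRPO implementation. Everything in it is hypothetical: the timestep grid, the velocity field toy_velocity (a stand-in for the learned v_theta), the noise scale sigma_t = a * sqrt(t / (1 - t)) (our reading of the Flow-GRPO-style SDE schedule), and the weighting rule (weights proportional to each step's injected noise sigma_t * sqrt(dt), rescaled to mean 1). Branching is modeled as deterministic ODE steps everywhere except one chosen step, where each branch draws its own noise.

```python
import math
import random
import statistics

# Hypothetical timestep grid: t runs from mostly noise (0.9) to data (0.0).
TIMESTEPS = [0.9, 0.7, 0.5, 0.3, 0.1, 0.0]


def noise_std(t: float, noise_level: float = 0.7) -> float:
    """Assumed Flow-GRPO-style noise scale sigma_t = a * sqrt(t / (1 - t))."""
    return noise_level * math.sqrt(t / (1.0 - t))


def toy_velocity(x: float, t: float) -> float:
    """Hypothetical stand-in for v_theta(x, t); its ODE shrinks x as t -> 0."""
    return x


def euler_step(x: float, t: float, t_next: float) -> float:
    """Deterministic ODE (Euler) step from t to t_next (t_next < t)."""
    return x + (t_next - t) * toy_velocity(x, t)


def branched_rollouts(x_init: float, branch_step: int, num_branches: int, seed: int = 0) -> list[float]:
    """Shared ODE prefix, one noisy step at `branch_step` (one noise draw per
    branch), then deterministic ODE steps to the end. The noisy step is a
    simplified Euler-Maruyama step without the drift correction a real SDE
    sampler would add."""
    rng = random.Random(seed)
    x = x_init
    for i in range(branch_step):  # deterministic prefix, identical for all branches
        x = euler_step(x, TIMESTEPS[i], TIMESTEPS[i + 1])
    t, t_next = TIMESTEPS[branch_step], TIMESTEPS[branch_step + 1]
    finals = []
    for _ in range(num_branches):
        # All exploration is concentrated in this single noisy step.
        noise = noise_std(t) * math.sqrt(t - t_next) * rng.gauss(0.0, 1.0)
        xb = euler_step(x, t, t_next) + noise
        for j in range(branch_step + 1, len(TIMESTEPS) - 1):  # deterministic suffix
            xb = euler_step(xb, TIMESTEPS[j], TIMESTEPS[j + 1])
        finals.append(xb)
    return finals


def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages, normalized within one branch group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def noise_aware_weights() -> list[float]:
    """Assumed weighting: proportional to each step's injected noise
    sigma_t * sqrt(dt), rescaled so the weights average to 1."""
    raw = [noise_std(t) * math.sqrt(t - t_next)
           for t, t_next in zip(TIMESTEPS[:-1], TIMESTEPS[1:])]
    scale = statistics.fmean(raw)
    return [w / scale for w in raw]
```

In this toy, the rollouts branched at step k differ only through the noise drawn at step k, so their group advantages can be credited to that step alone; a policy loss would then scale step k's log-probability term by noise_aware_weights()[k] times the advantage. The weights decrease along the grid, which mirrors the "prioritize early, high-noise steps" idea.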

Ideas and contributions are welcome. Stay tuned!

🆕 News

We present TempFlow-GRPO, an improved Flow-GRPO method. We will release our code soon! 🔥🔥🔥

  • [2025-08-06] We have released the first version of our paper. 🔥🔥🔥
  • [2025-08-11] Thanks to Jie Liu for comments on our paper. We will release the 1024px FLUX RL model this month. 🔥🔥🔥
  • [2025-08-14] Our method also achieves better performance on FLUX at 1024px with HPSv3 (based on Qwen2-VL) as the reward. 🔥🔥🔥
  • [2025-08-20] We have released the first version of our paper on Hugging Face. 🔥🔥🔥
  • [2025-09-12] We will release the second version of our paper next week. 🔥🔥🔥
  • [2025-09-17] We will release the code of our paper. 🔥🔥🔥
  • [2025-10-28] We are very happy to see TempFlow-GRPO used in the video RL of Meituan's Longcat-Video. 🔥🔥🔥
  • [2025-10-28] We are very happy to see TempFlow-GRPO used in the image-editing RL of BAAI's OmniGen2-EditScore. 🔥🔥🔥
  • [2025-11-10] Uploaded the code for QwenImage. 🔥🔥🔥

🚀 Updates

To support research and the open-source community, we will release the entire project, including datasets, training pipelines, and model weights. Our code is based on Flow-GRPO. Thank you for your patience and continued support! 🌟

  • Release arXiv paper
  • Release GitHub repo
  • Release training code
  • Release neat training code
  • Release model checkpoints

📕 Training & Evaluation

Preparation

  1. First, download a reward model (we support CLIP-based PickScore, VLM-based HPSv3, ...) and a base model (SD3.5-M, FLUX.1-dev).
  2. Then, set the noise level in sd3_pipeline_with_logprob_perstep and sd3_pipeline_with_logprob.
  3. Finally, modify the config. We suggest using 24 groups and 48 num groups.
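Step 2 changes a single scalar noise level. As we understand the Flow-GRPO SDE sampler (please verify against sd3_pipeline_with_logprob before relying on this), the standard deviation of the noise injected at timestep t is proportional to that scalar: sigma_t = noise_level * sqrt(t / (1 - t)). The sketch below only evaluates this assumed formula to show how the knob scales exploration across timesteps; sde_std and std_schedule are hypothetical names, not functions from the repository.

```python
import math


def sde_std(t: float, noise_level: float) -> float:
    """Assumed per-step noise std: noise_level * sqrt(t / (1 - t)), t in [0, 1)."""
    if not 0.0 <= t < 1.0:
        raise ValueError("t must lie in [0, 1)")
    return noise_level * math.sqrt(t / (1.0 - t))


def std_schedule(timesteps: list[float], noise_level: float) -> list[float]:
    """Noise std at each timestep for one choice of noise level."""
    return [sde_std(t, noise_level) for t in timesteps]


if __name__ == "__main__":
    grid = [0.9, 0.7, 0.5, 0.3, 0.1]  # hypothetical timesteps, noisy to clean
    for level in (0.5, 0.7, 1.0):
        print(level, [round(s, 3) for s in std_schedule(grid, level)])
```

Under this assumption the noise level scales every step linearly, while the sqrt(t / (1 - t)) factor keeps early (high-t) steps far noisier than late ones.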

Note that we use branch=4 and per-branch exploration=6; you can modify both in our code. We will release a neat version of the code in the next few days.
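One possible reading of these numbers, which we have not confirmed against the code: 4 branches with 6 explorations each give 4 × 6 = 24 samples per group, matching the 24 suggested above, and 48 groups then yield 24 × 48 = 1152 samples in total. The dataclass below just encodes that arithmetic; SamplingBudget and its field names are hypothetical and are not the repository's config keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingBudget:
    """Hypothetical container; field names are illustrative, not real config keys."""
    num_branches: int = 4        # "branch=4"
    samples_per_branch: int = 6  # "per branch exploration=6"
    num_groups: int = 48         # suggested number of groups

    @property
    def group_size(self) -> int:
        # Assumption: one group holds every exploration of every branch.
        return self.num_branches * self.samples_per_branch

    @property
    def total_samples(self) -> int:
        return self.group_size * self.num_groups


budget = SamplingBudget()
print(budget.group_size, budget.total_samples)
```

If you change branch or per-branch exploration, keeping their product equal to the group size you configure avoids a silent mismatch, assuming this reading is correct.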

Training

About Group Strategy

  1. Seed Group
  2. Prompt Group: notes the seed group
  3. Batch Group: global_std=True
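The three strategies differ in which samples share a baseline and a scale when rewards are turned into GRPO advantages. The sketch below is an assumption about their meaning, not the repository's code: each reward is centered on the mean of its group (pass seed IDs for a seed group, prompt IDs for a prompt group), and it is divided either by that group's standard deviation or, with global_std=True, by the standard deviation of the whole batch. compute_advantages is a hypothetical name.

```python
import statistics
from collections import defaultdict


def compute_advantages(rewards: list[float], group_ids: list[str],
                       global_std: bool = False, eps: float = 1e-8) -> list[float]:
    """Center each reward on its group's mean; scale by the group's std, or by
    the batch-wide std when global_std=True (assumed meaning of the flag)."""
    by_group: dict[str, list[float]] = defaultdict(list)
    for r, g in zip(rewards, group_ids):
        by_group[g].append(r)
    means = {g: statistics.fmean(rs) for g, rs in by_group.items()}
    if global_std:
        batch_std = statistics.pstdev(rewards)
        stds = {g: batch_std for g in by_group}
    else:
        stds = {g: statistics.pstdev(rs) for g, rs in by_group.items()}
    return [(r - means[g]) / (stds[g] + eps) for r, g in zip(rewards, group_ids)]


rewards = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
ids = ["a", "a", "a", "b", "b", "b"]  # hypothetical seed or prompt IDs
print(compute_advantages(rewards, ids))
print(compute_advantages(rewards, ids, global_std=True))
```

With per-group scaling, groups "a" and "b" receive identical advantages even though "b" has ten times the reward spread; with a batch-wide scale, the high-variance group dominates the update.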

SD3.5-M

# Flow-GRPO
bash scripts/multi_node/main.sh
# TempFlow-GRPO
bash scripts/multi_node/train_sd3_pr.sh

FLUX.1-dev

# Flow-GRPO
bash scripts/multi_node/train_flux.sh
# TempFlow-GRPO
bash scripts/multi_node/train_flux_pr.sh

QwenImage

# Flow-GRPO
bash scripts/multi_node/train_qwenimage.sh
# TempFlow-GRPO
bash scripts/multi_node/train_qwenimage_pr.sh

📊 Experimental Performance


📺 Visualization

FLUX.1-dev
  • For more details, please read our paper.

Acknowledgements

Flow-GRPO: The first method integrating online reinforcement learning (RL) into flow matching models.

Work was done at WeChat Vision.