README.md

December 15, 2025 · View on GitHub

Edit-R1: Reinforce Image Editing with Diffusion Negative-Aware Finetuning and MLLM Implicit Feedback

📣 News

[2025/10/19]: We release Edit-R1, which employs DiffusionNFT and a training-free reward model derived from pretrained MLLMs to fine-tune diffusion models for image editing. UniWorld-Qwen-Image-Edit-2509 and UniWorld-FLUX.1-Kontext-Dev are open-sourced.

🚀 Environment Set Up

Clone this repository and install packages.

git clone https://github.com/PKU-YuanGroup/Edit-R1.git
cd Edit-R1
conda create -n Edit-R1 python=3.10.16
pip install -e .

🗝️ Train

Deploy vLLM Reward Server

Start the reward server:

python reward_server/reward_server.py

If you want to check the status of the reward server, you can test it by running:

python reward_server/test_reward_server.py

Data Format

Directory structure:

- dataset-dir
  - images/
     - YOUR_IMAGE_DATA
     - ...
  - train_metadata.jsonl
  - test_metadata.jsonl

train_metadata.jsonl and test_metadata.jsonl format:

{"prompt": "PROMPT", "image": "IMAGE_RELATIVE_PATH", "requirement": "TASK_REQUIREMENT"}
...

Configure Training

See config/qwen_image_edit_nft.py and config/kontext_nft.py for available configurations.

Run Training

export REWARD_SERVER=[YOUR_REWARD_SERVICE_IP_ADDR]:12341

torchrun --nproc_per_node=8 \
    scripts/train_nft_qwen_image_edit.py --config config/qwen_image_edit_nft.py:config_name

And you can also refer to the example scripts in examples/.

⚡️ Reproduction

For reproducibility, we provide the reproduction scripts in reproduction/.

See Reproduction Details for more details.

👍 Acknowledgement

DiffusionNFT: Huge thanks for their elegant codebase 🤩!
Flow-GRPO
ImgEdit
UniWorld-V1

🔒 License

See LICENSE for details. The FLUX weights fall under the FLUX.1 [dev] Non-Commercial License.

✏️ Citation

@article{li2025uniworldv2,
    title={Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback},
    author={Li, Zongjian and Liu, Zheyuan and Zhang, Qihui and Lin, Bin and Yuan, Shenghai and Yan, Zhiyuan and Ye, Yang and Yu, Wangbo and Niu, Yuwei and Yuan, Li},
    journal={arXiv preprint arXiv:2510.16888},
    year={2025}
}

@article{lin2025uniworld,
  title={Uniworld: High-resolution semantic encoders for unified visual understanding and generation},
  author={Lin, Bin and Li, Zongjian and Cheng, Xinhua and Niu, Yuwei and Ye, Yang and He, Xianyi and Yuan, Shenghai and Yu, Wangbo and Wang, Shaodong and Ge, Yunyang and others},
  journal={arXiv preprint arXiv:2506.03147},
  year={2025}
}

@article{ye2025imgedit,
  title={Imgedit: A unified image editing dataset and benchmark},
  author={Ye, Yang and He, Xianyi and Li, Zongjian and Lin, Bin and Yuan, Shenghai and Yan, Zhiyuan and Hou, Bohan and Yuan, Li},
  journal={arXiv preprint arXiv:2505.20275},
  year={2025}
}

🎨 Case Comparisons

Original	Prompt	Nano-banana	GPT-4o	Qwen-Image-Edit	UniWorld-V2 (Ours)
	Case 1: `把鸟移动到红框里，删除掉现在的鸟，最后移除红框`				（✅正确执行指令）
	Case 2: `把中间白色衣服戴口罩女生的手势改成OK`				（✅OK手势）
	Case 3: `提取画面中的吉他`				（✅弦钮上二下三）
	Case 4: `把下面的所有文字并改用书法体。中间的“月满中秋”改成“千里团圆”。并且把月亮改成模糊的月饼。`				（✅模糊月饼，✅书法字体）
	Case 5: `让画面中的形象坐在高档西餐厅，双手拿刀叉吃牛排`				（✅人物特征，✅刀叉）
	Case 6: `在中间人物身上添加 3D 网格，精确覆盖衣服褶皱、头发和细节`				（✅精确覆盖）