RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought
January 24, 2026 · View on GitHub
Abstract: Real-World Image Super-Resolution is one of the most challenging task in image restoration. However, existing methods struggle with an accurate understanding of degraded image content, leading to reconstructed results that are both low-fidelity and unnatural. We present RealSR-R1, which empowers the RealSR models with understanding and reasoning capabilities. Inspired by the success of Chain of Thought (CoT) in large language models (LLMs), we simulate the human process of handling degraded images and propose the VLCoT framework, which integrates vision and language reasoning. The framework aims to precisely restore image details by progressively generating more comprehensive text and higher-resolution images. To overcome the challenge of traditional supervised learning CoT failing to generalize to real-world scenarios, we introduce, for the first time, Group Relative Policy Optimization (GRPO) into the Real-World Image Super-Resolution task. We propose VLCoT-GRPO as a solution, which designs four reward functions: (1) Format reward, used to standardize the CoT process; (2) Degradation reward, to incentivize accurate degradation estimation; (3) Understanding reward, to ensure the accuracy of the generated content; and (4) Generation reward, where we propose using a visual expert model to evaluate the quality of generated images, encouraging the model to generate more realistic images. Extensive experiments demonstrate that our proposed RealSR-R1 can generate realistic details and accurately understand image content, particularly in semantically rich scenes or images with severe degradation.

Cold-start
Please refer to Lumina-mGPT for data processing and training code.
🛠️ Setup
git clone https://github.com/Junboooo/RealSR-R1.git
conda create -n RealSR-R1 python=3.10
conda activate RealSR-R1
bash setup.sh
RL Dataset
Please refer to the script in ./src/realsr-r1/create_json.py for dataset preparation.
VLCOTGRPO
After ready the dataset, you can start training using the following example bash script. Our bash scripts are in ./src/scripts/example.py
export DEBUG_MODE="true"
export LOG_PATH="./debug_log_2b_GRPO_coco_base65cate_6k.txt"
# export LD_LIBRARY_PATH=/opt/conda/envs/grpo/lib/python3.10/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH
# export LD_LIBRARY_PATH=/usr/local/cuda-12.1/compat:$LD_LIBRARY_PATH
# export CUDA_HOME=/usr/local/cuda
export DATA_PATH=/work/docker/GRPO/src/realsr-r1/test500_v2
export CKPT_PATH=/work/docker/GRPO/share_models/RealSR-R1_lr5e6/checkpoint-30
export SAVE_PATH=./share_models/RealSR-R1_lr2e6
export PYTHONPATH=$PYTHONPATH:/work/docker/GRPO/src/realsr-r1
#8192
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,6
export CUDA_LAUNCH_BLOCKING=1
torchrun --nproc_per_node="6" \
--nnodes="1" \
--node_rank="0" \
--master_addr="127.0.0.1" \
--master_port="11317" \
src/realsr-r1/src/open_r1/grpo.py \
--output_dir ${SAVE_PATH} \
--model_name_or_path ${CKPT_PATH} \
--dataset_name ${DATA_PATH} \
--deepspeed /work/docker/GRPO/src/realsr-r1/local_scripts/zero3.json \
--max_prompt_length 5000 \
--max_completion_length 7000 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 1 \
--logging_steps 1 \
--bf16 \
--report_to wandb \
--gradient_checkpointing true \
--attn_implementation flash_attention_2 \
--max_pixels 401408 \
--num_train_epochs 2 \
--save_steps 1 \
--learning_rate=2e-06 \
--save_only_model true \
--num_generations 5 \
Results
Quantitative Comparisons (click to expand)
Visual Comparisons (click to expand)
Citation
If RealSR-R1 helps your research or work, please consider citing the following works:
@inproceedings{qiao2024RealSR-R1,
title={RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought},
author={Qiao, Junbo and Cai, Miaomiao and Li, Wei and Liu, Yutong and Huang, Xudong and He, Gaoqi and Xie, Jiao and Hu, Jie and Chen, Xinghao and Lin, Shaohui},
booktitle={arXiv preprint arXiv:2506.16796},
year={2025}
}
Acknowledgement
We sincerely thank projects Visual-RFT, PURE and Lumina-mGPT for providing their open-source resources.