January 8, 2026
Exploiting Diffusion Prior for Task-driven Image Restoration [ICCV 2025]
Jaeha Kim, Junghun Oh, Kyoung Mu Lee
Seoul National University, Korea
:bookmark_tabs: TL;DR: We propose a diffusion-based image restoration method that benefits high-level vision tasks.
Real-world demo results (detection)
| Low-quality inputs | EDTR results |
|---|---|
![]() | ![]() |
![]() | ![]() |
![]() | ![]() |
Instructions are provided below. Try it with your own images! :smile:
If you find EDTR useful, please star the repo. Thank you! :star::pray:
:loudspeaker: Update
- 2025.09.26: Release the real-world EDTR detection model and demo examples. The code has also been simplified.
- 2025.09.01: The code is released.
:gear: Installation
Our code has been tested with PyTorch 2.2.2 and CUDA 11.8.
conda create -n edtr python=3.10
conda activate edtr
conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
python setup.py
Note that the last command automatically downloads the pre-trained Stable Diffusion v2.1 weights.
(2026.01.08) It seems that Stable Diffusion 2.1 checkpoints are no longer publicly downloadable from Hugging Face.
If you have obtained the official Stable Diffusion 2.1 checkpoint (i.e., v2-1_512-nonema-pruned.ckpt),
place it under the weights/ directory to use this project; the code should then work as expected.
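Before running anything, it can help to confirm that the checkpoint sits where the note above says it should. The following is a minimal sketch and not part of the repository: the helper name `find_sd_checkpoint` is hypothetical, and it relies only on the file name and the `weights/` directory mentioned above.

```python
from pathlib import Path

# File name and directory are taken from the note above.
SD_CKPT_NAME = "v2-1_512-nonema-pruned.ckpt"


def find_sd_checkpoint(weights_dir="weights"):
    """Return the checkpoint path if it exists under weights_dir, else None.

    Hypothetical helper for illustration; it is not part of the EDTR code.
    """
    ckpt = Path(weights_dir) / SD_CKPT_NAME
    return ckpt if ckpt.is_file() else None


if __name__ == "__main__":
    ckpt = find_sd_checkpoint()
    if ckpt is None:
        print(f"Missing: place {SD_CKPT_NAME} under weights/")
    else:
        print(f"Found: {ckpt}")
```

Run it from the repository root so that the relative `weights/` path resolves correctly.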
:rocket: Quick start
- Download the pre-trained EDTR model for real-world detection from here and place it in the weights/ directory.
- Run the following command:
CUDA_VISIBLE_DEVICES=0 python demo.py --input inputs/demo --output results/demo
The results will be saved in results/demo/.
NOTE: You can also use your own images as input, but we recommend keeping the input image size below 512×512.
By default, restored images are resized so that the longer axis is 512. You can set a custom upscaling ratio with the --scale option, but keep in mind that excessive values may introduce artifacts.
If you encounter an out-of-GPU-memory error, try using the --vae-encoder-tiled and --cldm-tiled options.
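To get a feel for the default resizing rule, the sketch below estimates the restored size for a given input. It assumes that the aspect ratio is preserved and that sizes are rounded to the nearest integer; `demo.py` may round differently, and the function name `default_output_size` is hypothetical.

```python
def default_output_size(width, height, target_long_side=512):
    """Estimate the restored size when the longer axis is resized to 512.

    Assumption: aspect ratio is preserved and sizes are rounded to the
    nearest integer; the actual demo script may round differently.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # The longer axis determines the implied upscaling ratio.
    scale = target_long_side / max(width, height)
    return round(width * scale), round(height * scale), scale


if __name__ == "__main__":
    w, h, s = default_output_size(256, 128)
    print(w, h, s)  # 512 256 2.0
```

For a 256×128 input, the implied upscaling ratio is 2.0. Inputs whose longer side already exceeds 512 give a ratio below 1, which is one way to see why smaller inputs are recommended.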
:desktop_computer: Inference
Below are the inference instructions for reproducing the results reported in our main manuscript.
Datasets (validation)
We use the CUB-200-2011 dataset for classification and the PASCAL VOC2012 dataset for segmentation and detection.
For evaluation, we generate a synthetic degraded image set derived from these datasets.
You can download our processed (degraded) versions here, or generate them yourself by following these instructions.
If you download our processed versions, please unzip the file and place the degraded datasets under datasets/source, so the structure looks like: datasets/source/CUB200 and datasets/source/VOC.
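A quick way to verify the layout described above is to check for the two expected directories. This is a minimal sketch under the assumption that only the directory names given above matter; the helper `missing_dataset_dirs` is hypothetical and does not inspect the contents of each dataset.

```python
from pathlib import Path

# Directory names are taken from the layout described above.
EXPECTED_DIRS = ("CUB200", "VOC")


def missing_dataset_dirs(source_root="datasets/source"):
    """Return the expected dataset directories that are not present."""
    root = Path(source_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing under datasets/source:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```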
Pretrained Models
We provide pretrained models for EDTR and the comparison methods used in our main manuscript.
Please unzip the file and place it in the proper experiments/ directory. For example, for the EDTR detection model, place the 007_edtr-s4 folder in experiments/det/voc2012/.
| Model Name | Classification (CUB200) | Segmentation (VOC2012) | Detection (VOC2012) |
|---|---|---|---|
| EDTR | download | download | download |
| Oracle, No-restoration, SR4IR, DiffBIR | download | download | download |
Command
You can evaluate the pre-trained EDTR detection model with the following command:
CUDA_VISIBLE_DEVICES=0 accelerate launch --main_process_port 4177 main/det/test_edtr.py --config configs/det/voc2012/test/007_edtr-s4.yaml --save-img
NOTE: You can find all the inference commands, including those for comparison methods, in the script.
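The detection example above suggests that a config at configs/det/voc2012/test/007_edtr-s4.yaml pairs with the folder experiments/det/voc2012/007_edtr-s4. The sketch below generalizes that pairing. Note that this pattern is inferred from the single detection example and is an assumption; other tasks may be organized differently, and the function `experiment_dir_for_config` is hypothetical.

```python
from pathlib import PurePosixPath


def experiment_dir_for_config(config_path):
    """Guess the experiments/ folder that matches a test config path.

    Assumed pattern (inferred from the detection example only):
    configs/<task>/<dataset>/<split>/<name>.yaml
        -> experiments/<task>/<dataset>/<name>
    """
    parts = PurePosixPath(config_path).parts
    if len(parts) != 5 or parts[0] != "configs":
        raise ValueError(f"unexpected config path: {config_path}")
    _, task, dataset, _split, filename = parts
    name = PurePosixPath(filename).stem  # drop the .yaml extension
    return str(PurePosixPath("experiments", task, dataset, name))


if __name__ == "__main__":
    print(experiment_dir_for_config("configs/det/voc2012/test/007_edtr-s4.yaml"))
    # experiments/det/voc2012/007_edtr-s4
```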
:wrench: Train
NOTE: We recommend a GPU setup with at least 4×40GB or 2×80GB of memory due to the large model size. In our experiments, we used either 4×A6000 GPUs or 2×H100 GPUs.
Datasets
| Classification | Segmentation | Detection |
|---|---|---|
| CUB200 (Kaggle) | VOC2012 (Kaggle) | VOC2012 (Kaggle) |
| - | - | COCO2017 |
For downloading the CUB200 and VOC2012 datasets, we recommend using the Kaggle links provided, as the official servers can sometimes be slow or unavailable.
Download the archive.zip file by clicking the "Download" button in the upper-right corner and selecting "Download dataset as zip" (Kaggle login required).
Then, place the file in the datasets/source/ directory and run the following commands:
python datasets/preprocess/cub200.py # for CUB200
python datasets/preprocess/voc2012.py # for VOC2012
These scripts will automatically extract the zip files and reorganize the datasets to match our code.
:warning: Since the downloaded folder names are the same for CUB200 and VOC2012, process them one at a time:
- Download CUB200, then run the preprocessing command for CUB200.
- Download VOC2012, then run the preprocessing command for VOC2012.
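The one-at-a-time workflow above can be guarded with a small check that refuses to build a preprocessing command unless an archive is present. This is a sketch, not part of the repository: it assumes the download is named archive.zip and sits directly in datasets/source/, as described above, and the helper `preprocess_command` is hypothetical. It cannot tell which dataset the archive actually contains, so the download order is still up to you.

```python
import sys
from pathlib import Path

# Script paths are taken from the commands above.
PREPROCESS_SCRIPTS = {
    "cub200": "datasets/preprocess/cub200.py",
    "voc2012": "datasets/preprocess/voc2012.py",
}


def preprocess_command(dataset, source_root="datasets/source"):
    """Build the preprocessing command for one dataset.

    Raises FileNotFoundError if archive.zip is not in source_root, since
    each dataset should be downloaded right before its own preprocessing.
    """
    if dataset not in PREPROCESS_SCRIPTS:
        raise KeyError(f"unknown dataset: {dataset}")
    archive = Path(source_root) / "archive.zip"
    if not archive.is_file():
        raise FileNotFoundError(f"download the {dataset} archive to {archive} first")
    # Use the current interpreter so the active conda environment is reused.
    return [sys.executable, PREPROCESS_SCRIPTS[dataset]]
```

From the repository root, the returned list can be passed to `subprocess.run(..., check=True)`: first for `"cub200"` after downloading CUB200, then for `"voc2012"` after downloading VOC2012.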
NOTE: To train the EDTR model for real-world detection, please refer to this for more details.
Pretrained models
Please download the codeformer_swinir.ckpt and place it in the weights/ directory. (It is used to initialize the SwinIR model, and the link is provided by DiffBIR.)
Command
Below are example training commands for the EDTR detection model.
- Training pre-restoration model:
CUDA_VISIBLE_DEVICES=0,1 accelerate launch --main_process_port 4177 main/det/train_swinir-pre.py --config configs/det/voc2012/train/002_swinir-pre.yaml
- Training EDTR model:
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --main_process_port 4177 main/det/train_edtr.py --config configs/det/voc2012/train/007_edtr-s4.yaml
NOTE: You can find all the training commands in the script.
:star: Citation
Please cite us if our work is useful for your research.
@inproceedings{kim2025exploiting,
title={Exploiting Diffusion Prior for Task-driven Image Restoration},
author={Kim, Jaeha and Oh, Junghun and Lee, Kyoung Mu},
booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
year={2025}
}
:clap: Acknowledgement
This code is based on DiffBIR and SR4IR. We greatly appreciate their awesome work!
:e-mail: Contact
If you have any questions, please feel free to contact me at jhkim97s2@gmail.com.





