May 15, 2025

Effective Diffusion Transformer Architecture for Image Super-Resolution

Kun Cheng* ¹ Lei Yu* ² Zhijun Tu ² Xiao He ¹ Liyu Chen ²
Yong Guo ³ Mingrui Zhu ¹ Nannan Wang ¹ Xinbo Gao ⁴ Jie Hu ²

¹ Xidian University ² Huawei Noah's Ark Lab
³ CBG, Huawei ⁴ Chongqing University of Posts and Telecommunications

🔎 Introduction

We propose DiT-SR, an effective diffusion transformer for real-world image super-resolution:

An effective yet efficient architecture design;
Adaptive Frequency Modulation (AdaFM) for time-step conditioning.

⚙️ Dependencies and Installation

git clone https://github.com/kunncheng/DiT-SR.git
cd DiT-SR

conda create -n DiT_SR python=3.10 -y
conda activate DiT_SR
pip install -r requirements.txt

The training data comprises LSDIR, DIV2K, DIV8K, OutdoorSceneTraining, Flickr2K, and the first 10K face images from FFHQ. The paths of all training images are saved in txt files. For simplicity, you can also train on the LSDIR dataset alone.
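The path lists described above can be produced with a few lines of Python. This is a minimal sketch, not the repository's own tooling: the function name write_image_list, the one-path-per-line layout, and the set of image extensions are all assumptions, so check the dataset entries in the config files for the format the training code actually expects.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your datasets.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}


def write_image_list(image_dir, out_txt, limit=None):
    """Recursively collect image paths under image_dir and write them,
    sorted and one per line, to out_txt. Returns the number of paths written."""
    paths = sorted(
        str(p)
        for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
    if limit is not None:
        # Keep only the first `limit` paths in sorted order,
        # e.g. limit=10000 for a 10K-image subset.
        paths = paths[:limit]
    Path(out_txt).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

For example, write_image_list("data/LSDIR", "data/lsdir_train.txt") would list every image under a hypothetical data/LSDIR folder. Because the subset is taken after a plain string sort, it matches "the first N images" only when file names are zero-padded.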

Pre-trained Models

Download the required checkpoints to the weights folder, including the autoencoder and the other pre-trained models used for loss calculation.

Training Scripts

Real-world Image Super-resolution

torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/realsr_DiT.yaml --save_dir ${save_dir}

Blind Face Restoration

torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/faceir_DiT.yaml --save_dir ${save_dir}
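Both training commands above launch 8 processes on a single node. If your machine has a different number of GPUs, the same invocation can be assembled programmatically. The sketch below is illustrative only: build_train_command is a hypothetical helper, experiments/realsr is a placeholder output directory, and the flags are exactly those shown in the commands above. The provided configs may assume 8 GPUs, so batch-size or learning-rate settings might need adjusting when the process count changes.

```python
import shlex


def build_train_command(cfg_path, save_dir, num_gpus=8):
    """Return the single-node torchrun invocation as an argument list."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "torchrun", "--standalone",
        f"--nproc_per_node={num_gpus}", "--nnodes=1",
        "main.py", "--cfg_path", cfg_path, "--save_dir", save_dir,
    ]


if __name__ == "__main__":
    # "experiments/realsr" is a placeholder, not a path from the repository.
    cmd = build_train_command("configs/realsr_DiT.yaml", "experiments/realsr", num_gpus=4)
    # Print a shell-safe command line; pass `cmd` to subprocess.run to launch it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the arguments safe to hand to subprocess.run without shell quoting issues.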

🚀 Inference and Evaluation

Real-world Image Super-resolution

Real-world datasets: RealSR, RealSet65; Synthetic datasets: LSDIR-Test; Pretrained checkpoints.

bash test_realsr.sh

Blind Face Restoration

Real-world datasets: LFW, WebPhoto, Wider; Synthetic datasets: CelebA-HQ; Pretrained checkpoints.

bash test_faceir.sh

For the synthetic datasets (LSDIR-Test and CelebA-HQ), we are unable to release them due to corporate review restrictions. However, you can generate them yourself using these scripts.

🎓 Citation

If you find our work useful in your research, please consider citing:

@inproceedings{cheng2025effective,
  title={Effective diffusion transformer architecture for image super-resolution},
  author={Cheng, Kun and Yu, Lei and Tu, Zhijun and He, Xiao and Chen, Liyu and Guo, Yong and Zhu, Mingrui and Wang, Nannan and Gao, Xinbo and Hu, Jie},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={3},
  pages={2455--2463},
  year={2025}
}

❤️ Acknowledgement

We sincerely appreciate the code release of the following projects: ResShift, DiT, FFTFormer, SwinIR, SinSR, and BasicSR.