March 22, 2025


Hierarchical Transformer
for Efficient Image Super-Resolution

Xiang Zhang¹ · Yulun Zhang² · Fisher Yu¹

¹ETH Zürich     ²MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University

ECCV 2024 - Oral

[Paper] | [Supp] | [Video] | [🤗Hugging Face] | [Visual Results] | [Models]


Abstract: Transformers have exhibited promising performance in computer vision tasks including image super-resolution (SR). However, popular transformer-based SR methods often employ window self-attention with quadratic computational complexity to window sizes, resulting in fixed small windows with limited receptive fields. In this paper, we present a general strategy to convert transformer-based SR networks to hierarchical transformers (HiT-SR), boosting SR performance with multi-scale features while maintaining an efficient design. Specifically, we first replace the commonly used fixed small windows with expanding hierarchical windows to aggregate features at different scales and establish long-range dependencies. Considering the intensive computation required for large windows, we further design a spatial-channel correlation method with linear complexity to window sizes, efficiently gathering spatial and channel information from hierarchical windows. Extensive experiments verify the effectiveness and efficiency of our HiT-SR, and our improved versions of SwinIR-Light, SwinIR-NG, and SRFormer-Light yield state-of-the-art SR results with fewer parameters, FLOPs, and faster speeds (~7x).

🔥 News

  • 2025-03: 🚀The DF2K version of HiT-SRF (HiT-SRF-DF2K) is released!
  • 2024-09: 🤗HiT-SR is available at 🤗Hugging Face. Thanks to Niels!
  • 2024-08: 🧑‍💻HiT-SRF is available at neosr. Thanks to muslll!
  • 2024-07: 🎉HiT-SR is accepted by ECCV 2024! This repo is released.

🛠️ Setup

  • Python 3.8
  • PyTorch 1.8.0 + Torchvision 0.9.0
  • NVIDIA GPU + CUDA

    git clone https://github.com/XiangZ-0/HiT-SR.git
    cd HiT-SR
    conda create -n HiTSR python=3.8
    conda activate HiTSR
    pip install -r requirements.txt
    python setup.py develop
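
After installation, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the function name `check_environment` and the `EXPECTED` table are hypothetical, and the script only compares version prefixes and reports whether PyTorch can see a CUDA device.

```python
import importlib.util
import sys

# Versions listed in the Setup section of this README.
EXPECTED = {"python": "3.8", "torch": "1.8.0", "torchvision": "0.9.0"}


def check_environment():
    """Return a dict mapping component name to (found_version, ok)."""
    report = {}
    py = "{}.{}".format(*sys.version_info[:2])
    report["python"] = (py, py == EXPECTED["python"])

    for name in ("torch", "torchvision"):
        # find_spec returns None when the package is not installed.
        if importlib.util.find_spec(name) is None:
            report[name] = (None, False)
            continue
        module = importlib.import_module(name)
        version = getattr(module, "__version__", "unknown")
        # Builds may carry a suffix such as "+cu111", so compare the prefix only.
        report[name] = (version, version.startswith(EXPECTED[name]))

    if report["torch"][0] is not None:
        import torch
        # (CUDA version PyTorch was built with, whether a GPU is usable now)
        report["cuda"] = (str(torch.version.cuda), torch.cuda.is_available())
    else:
        report["cuda"] = (None, False)
    return report


if __name__ == "__main__":
    for component, (found, ok) in check_environment().items():
        print(f"{component:12s} found={found!s:14s} ok={ok}")
```

A mismatch does not necessarily mean the code will fail; the listed versions are simply the ones stated in this README.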

💿 Datasets

Training and testing sets can be downloaded as follows:

| Training Set | Testing Set | Visual Results |
| --- | --- | --- |
| DIV2K (800 training images, 100 validation images) [organized training dataset DIV2K: One Drive] | Set5 + Set14 + BSD100 + Urban100 + Manga109 [complete testing dataset: One Drive] | One Drive |

A larger training dataset, DF2K (DIV2K + Flickr2K), can also be used for better performance [DF2K: One Drive].

Download the training and testing datasets and place them in the corresponding folders under datasets/. See datasets/ for details of the directory structure.
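
Once the archives are unpacked, a quick file count can catch incomplete downloads (for example, DIV2K should contain 800 training images). The sketch below is an illustration only: the helper `count_images` and the list of image suffixes are assumptions, and it makes no assumption about folder names because the directory layout is documented in datasets/ rather than here.

```python
from pathlib import Path

# Assumed set of image extensions; adjust if the datasets use others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def count_images(root="datasets"):
    """Map each folder under root (as a relative path) to its number of image files."""
    root = Path(root)
    if not root.is_dir():
        return {}
    counts = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            key = path.parent.relative_to(root).as_posix()
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    for folder, n in count_images().items():
        print(f"{n:6d}  {folder}")
```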

🚀 Models

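| Method | #Param. (K) | FLOPs (G) | Dataset | PSNR (dB) | SSIM | Model Zoo | Visual Results |
| --- | --- | --- | --- | --- | --- | --- | --- |
| HiT-SIR | 792 | 53.8 | Urban100 (x4) | 26.71 | 0.8045 | One Drive | One Drive |
| HiT-SNG | 1032 | 57.7 | Urban100 (x4) | 26.75 | 0.8053 | One Drive | One Drive |
| HiT-SRF | 866 | 58.0 | Urban100 (x4) | 26.80 | 0.8069 | One Drive | One Drive |
| HiT-SRF-DF2K | 866 | 58.0 | Urban100 (x4) | 27.00 | 0.8119 | One Drive | One Drive |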

FLOPs are computed for an output size of 1280x720. The performance of HiT-SRF-DF2K (PSNR/SSIM) is:

| Method | Scale | Set5 | Set14 | B100 | Urban100 | Manga109 |
| --- | --- | --- | --- | --- | --- | --- |
| HiT-SRF-DF2K | x2 | 38.30/0.9615 | 34.06/0.9217 | 32.41/0.9027 | 33.30/0.9387 | 39.67/0.9793 |
| HiT-SRF-DF2K | x3 | 34.79/0.9301 | 30.68/0.8486 | 29.33/0.8113 | 29.16/0.8717 | 34.71/0.9510 |
| HiT-SRF-DF2K | x4 | 32.63/0.8993 | 28.96/0.7899 | 27.78/0.7442 | 27.00/0.8119 | 31.55/0.9203 |
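
Because FLOPs are reported for a fixed 1280x720 output, the low-resolution input that the network actually processes shrinks as the scale factor grows. The arithmetic can be sketched as follows; `lr_input_size` is a hypothetical helper, not a function from this repository.

```python
def lr_input_size(scale, out_w=1280, out_h=720):
    """Return the (width, height) of the LR input as exact quotients of the output size."""
    return out_w / scale, out_h / scale


for scale in (2, 3, 4):
    w, h = lr_input_size(scale)
    print(f"x{scale}: {w:.1f} x {h:.1f}")
# x2: 640.0 x 360.0
# x3: 426.7 x 240.0
# x4: 320.0 x 180.0
```

Note that 1280 is not divisible by 3, so the x3 input width has to be rounded in practice; the rounding convention is not stated here.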

🏋 Training

  • Download the training (DIV2K or DF2K, already processed) and testing (Set5, Set14, BSD100, Urban100, Manga109, already processed) datasets and place them in datasets/.

  • Run the following scripts. The training configuration is in options/Train/.

    # HiT-SIR, input=64x64, 4 GPUs
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SIR_x2.yml --launcher pytorch
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SIR_x3.yml --launcher pytorch
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SIR_x4.yml --launcher pytorch
    
    # HiT-SNG, input=64x64, 4 GPUs
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 basicsr/train.py -opt options/Train/train_HiT_SNG_x2.yml --launcher pytorch
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 basicsr/train.py -opt options/Train/train_HiT_SNG_x3.yml --launcher pytorch
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 basicsr/train.py -opt options/Train/train_HiT_SNG_x4.yml --launcher pytorch
    
    # HiT-SRF, input=64x64, 4 GPUs
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SRF_x2.yml --launcher pytorch
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SRF_x3.yml --launcher pytorch
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SRF_x4.yml --launcher pytorch
    
    # HiT-SRF-DF2K, input=64x64, 4 GPUs
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SRF_x2_DF2K.yml --launcher pytorch
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SRF_x3_DF2K.yml --launcher pytorch
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 basicsr/train.py -opt options/Train/train_HiT_SRF_x4_DF2K.yml --launcher pytorch
    
  • The training experiments will be stored in experiments/.
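
The twelve commands above follow a single naming pattern, so they can also be generated programmatically, for example to feed a job scheduler. This is a small sketch that only builds the command strings; the helper names `train_command` and `all_train_commands` are hypothetical, and the `gpus` parameter is a convenience whose effect on batch size or learning rate is not covered here.

```python
# Master ports exactly as they appear in the commands above.
MASTER_PORTS = {"SIR": 1234, "SNG": 4321, "SRF": 1234}


def train_command(model, scale, df2k=False, gpus=4):
    """Build one training command string following the pattern used above."""
    suffix = "_DF2K" if df2k else ""
    config = f"options/Train/train_HiT_{model}_x{scale}{suffix}.yml"
    return (
        f"python -m torch.distributed.launch --nproc_per_node={gpus} "
        f"--master_port={MASTER_PORTS[model]} basicsr/train.py "
        f"-opt {config} --launcher pytorch"
    )


def all_train_commands():
    """List all twelve commands: three models at three scales, plus HiT-SRF on DF2K."""
    variants = [("SIR", False), ("SNG", False), ("SRF", False), ("SRF", True)]
    return [train_command(m, s, d) for m, d in variants for s in (2, 3, 4)]


if __name__ == "__main__":
    print("\n".join(all_train_commands()))
```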

🧪 Testing

Test with ground-truth images

  • Download the pre-trained models and place them in experiments/pretrained_models/.

    We provide pre-trained models for efficient image SR: HiT-SIR, HiT-SNG, and HiT-SRF (x2, x3, x4).

  • Download the testing datasets (Set5, Set14, BSD100, Urban100, Manga109) and place them in datasets/.

  • Run the following scripts. The testing configuration is in options/Test/ (e.g., test_HiT_SIR_x2.yml).

    Note 1: You can set use_chop: True (default: False) in YML to chop the image for testing.

    # No self-ensemble
    # HiT-SIR, reproduces results in Table 2 of the main paper
    python basicsr/test.py -opt options/Test/test_HiT_SIR_x2.yml
    python basicsr/test.py -opt options/Test/test_HiT_SIR_x3.yml
    python basicsr/test.py -opt options/Test/test_HiT_SIR_x4.yml
    
    # HiT-SNG, reproduces results in Table 2 of the main paper
    python basicsr/test.py -opt options/Test/test_HiT_SNG_x2.yml
    python basicsr/test.py -opt options/Test/test_HiT_SNG_x3.yml
    python basicsr/test.py -opt options/Test/test_HiT_SNG_x4.yml
    
    # HiT-SRF, reproduces results in Table 2 of the main paper
    python basicsr/test.py -opt options/Test/test_HiT_SRF_x2.yml
    python basicsr/test.py -opt options/Test/test_HiT_SRF_x3.yml
    python basicsr/test.py -opt options/Test/test_HiT_SRF_x4.yml
    
    # HiT-SRF-DF2K, reproduces results in the above Models section
    python basicsr/test.py -opt options/Test/test_HiT_SRF_x2_DF2K.yml
    python basicsr/test.py -opt options/Test/test_HiT_SRF_x3_DF2K.yml
    python basicsr/test.py -opt options/Test/test_HiT_SRF_x4_DF2K.yml
    
  • The output is stored in results/. All visual results of our pre-trained models can be accessed via One Drive.
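
Testing against ground-truth images yields the PSNR and SSIM figures reported above. For readers unfamiliar with the metric, the textbook definition of PSNR is sketched below in plain Python. This is not the repository's evaluation code: the actual protocol (which colour channel is used, whether image borders are cropped) is set in the test configuration and is not described here, so numbers from this sketch should not be expected to match the tables.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """PSNR in dB between two equally sized 2-D lists of pixel values."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        if len(ref_row) != len(est_row):
            raise ValueError("rows must have the same length")
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a closer match to the ground truth; identical images give infinite PSNR.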

Test without ground-truth images

  • Download the pre-trained models and place them in experiments/pretrained_models/.

    We provide pre-trained models for efficient image SR: HiT-SIR, HiT-SNG, and HiT-SRF (x2, x3, x4).

  • Put your dataset (single LR images) in datasets/single. Some example images are in this folder.

  • Run the following scripts. The testing configuration is in options/Test/ (e.g., test_single_x2.yml).

    Note 1: The default model is HiT-SRF. You can use other models like HiT-SIR by modifying the YML.

    Note 2: You can set use_chop: True (default: False) in YML to chop the image for testing.

    # Test on your dataset without ground-truth images
    python basicsr/test.py -opt options/Test/test_single_x2.yml
    python basicsr/test.py -opt options/Test/test_single_x3.yml
    python basicsr/test.py -opt options/Test/test_single_x4.yml
    
  • The output is stored in results/.
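
The use_chop option mentioned above processes an image in pieces, which is useful when a full image does not fit in GPU memory. How the repository splits the image is not described here; the sketch below shows one generic approach, an overlapping tile grid, purely to illustrate the idea. The function `tile_boxes` and the default `tile` and `overlap` values are hypothetical.

```python
def tile_boxes(width, height, tile=256, overlap=16):
    """Return (left, top, right, bottom) boxes that together cover a width x height image."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    stride = tile - overlap

    def starts(length):
        # A single tile suffices when the image is no larger than the tile.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)  # last tile sits flush with the border
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


if __name__ == "__main__":
    for box in tile_boxes(600, 300):
        print(box)
```

In a tiled super-resolution pipeline, each box would be upscaled separately and pasted into the output at coordinates multiplied by the scale factor, with the overlapping regions blended or cropped to hide seams.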

📊 Results

We apply our HiT-SR approach to improve SwinIR-Light, SwinIR-NG and SRFormer-Light, corresponding to our HiT-SIR, HiT-SNG, and HiT-SRF. Compared with the original structure, our improved models achieve better SR performance while reducing computational burdens.

  • Performance improvements of HiT-SR (SIR, SNG, and SRF indicate SwinIR-Light, SwinIR-NG, and SRFormer-Light, respectively).

  • Efficiency improvements of HiT-SR (SIR, SNG, and SRF indicate SwinIR-Light, SwinIR-NG, and SRFormer-Light, respectively). The complexity metrics are calculated under x2 upscaling on an A100 GPU, with the output size set to 1280x720.

  • Overall improvements of HiT-SR

  • Convergence improvements of HiT-SR

More detailed results can be found in the paper. All visual results can be downloaded here.

More results
  • Quantitative comparison

  • Qualitative comparison on challenging scenes

📎 Citation

If you find the code helpful in your research or work, please consider citing the following paper.

@inproceedings{zhang2024hitsr,
    title={HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution},
    author={Zhang, Xiang and Zhang, Yulun and Yu, Fisher},
    booktitle={ECCV},
    year={2024}
}

🏅 Acknowledgements

This project is built on DAT, SwinIR, NGramSwin, SRFormer, and BasicSR. Special thanks to the authors for their excellent work!