NSARM

October 17, 2025

NSARM: Next-Scale Autoregressive Modeling for Robust Real-World Image Super-Resolution


Authors: Xiangtao Kong, Rongyuan Wu, Shuaizheng Liu, Lingchen Sun, and Lei Zhang

Abstract

Most recent real-world image super-resolution (Real-ISR) methods employ pre-trained text-to-image (T2I) diffusion models to synthesize the high-quality image either from random Gaussian noise, which yields realistic results but is slow due to iterative denoising, or directly from the input low-quality image, which is efficient but at the price of lower output quality. These approaches train ControlNet or LoRA modules while keeping the pre-trained model fixed, which often introduces over-enhanced artifacts and hallucinations and weakens robustness to inputs with varying degradations. Recent visual autoregressive (AR) models, such as pre-trained Infinity, can provide strong T2I generation capabilities while offering superior efficiency by using the bitwise next-scale prediction strategy. Building upon next-scale prediction, we introduce a robust Real-ISR framework, namely Next-Scale Autoregressive Modeling (NSARM). Specifically, we train NSARM in two stages: a transformation network is first trained to map the input low-quality image to preliminary scales, followed by end-to-end full-model fine-tuning. Such comprehensive fine-tuning enhances the robustness of NSARM in Real-ISR tasks without compromising its generative capability. Extensive quantitative and qualitative evaluations demonstrate that, as a pure AR model, NSARM achieves superior visual results over existing Real-ISR methods while maintaining a fast inference speed. Most importantly, it demonstrates much higher robustness to the quality of input images, showing stronger generalization performance.

:star: If NSARM is helpful to your images or projects, please help star this repo. Thanks! :hugs:

🔎 Overview framework

Demo Image

📌 Quantitative Results

NSARM achieves the best overall performance on perceptual metrics across various datasets.

Demo Image

NSARM demonstrates much higher robustness, showing stronger generalization performance.

Demo Image

NSARM demonstrates substantial inference speed advantages over comparable methods.

Demo Image

📷 Visual Results

NSARM achieves superior visual results over existing Real-ISR methods.

Demo Image


⚙️ Dependencies and Installation

# clone this repository
git clone https://github.com/Xiangtaokong/NSARM
cd NSARM

# create an environment with python >= 3.10
conda create -n NSARM python=3.10
conda activate NSARM
pip install -r requirements.txt 

Alternatively, refer to the environment setups of BasicSR and Infinity.
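Before installing the requirements, it can help to confirm that the active interpreter meets the Python >= 3.10 requirement stated above. The following is a minimal sketch; the helper name `python_is_supported` is hypothetical and not part of this repository.

```python
import sys

# Minimum version taken from the installation steps above.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = sys.version.split()[0]
    if python_is_supported():
        print(f"Python {found} is new enough")
    else:
        print(f"Python {found} is older than {MIN_PYTHON[0]}.{MIN_PYTHON[1]}")
```

Running this inside the activated `NSARM` conda environment should report a supported version if the environment was created as described.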

🚀 Test

Step 1: Download the pre-trained models

Download the pre-trained VAE and T5 models.

Download NSARM:

Baidu Drive. Key: eqhc

Google Drive. (for complete version)

Use the following commands to merge the downloaded parts into a single model file and verify its integrity.

cat NSARM_part_* > NSARM.pth

md5sum NSARM.pth

The MD5 output should be: 16905db52d64fd44c365b6a963a6598d *NSARM.pth
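On systems without `cat` or `md5sum` (for example, plain Windows shells), the same merge and check can be sketched in Python using only the standard library. This is an illustrative equivalent, not a script shipped with the repository. The function names `merge_parts` and `md5_of_file` are hypothetical, and the sketch assumes the part files sort correctly by name, as the shell glob `NSARM_part_*` does for typical split suffixes.

```python
import hashlib
from pathlib import Path

# Expected checksum, copied from the md5sum output quoted above.
EXPECTED_MD5 = "16905db52d64fd44c365b6a963a6598d"


def merge_parts(directory, pattern="NSARM_part_*", output_name="NSARM.pth",
                chunk_size=1 << 20):
    """Concatenate the part files in sorted name order into one file."""
    directory = Path(directory)
    parts = sorted(directory.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory}")
    output = directory / output_name
    with output.open("wb") as dst:
        for part in parts:
            with part.open("rb") as src:
                # Copy in chunks so large checkpoints are never fully in memory.
                while chunk := src.read(chunk_size):
                    dst.write(chunk)
    return output


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


# Example usage, run from the folder holding the downloaded parts:
#   merged = merge_parts(".")
#   print("OK" if md5_of_file(merged) == EXPECTED_MD5 else "checksum mismatch")
```

A mismatch usually means a part is missing or was only partially downloaded, so it is worth re-downloading before editing the test script.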

There are currently some permission issues with uploading the weight files. We will complete the Hugging Face upload as soon as possible.

Step 2: Edit the test script

Edit the file NSARM/scripts/infer.sh.

Set the paths to your models and data, mainly:

infinity_model_path= NSARM.pth
vae_path= VAE path
text_encoder_ckpt= T5 path
--input_info 
--save_dir 
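A wrong path in `infer.sh` typically only surfaces once the script starts loading models, so a quick pre-flight check can save time. The sketch below is a hypothetical helper, not part of the repository: it takes a mapping from the setting names listed above to the paths you entered and reports which ones do not exist. The example paths are placeholders to replace with your own.

```python
from pathlib import Path


def find_missing_paths(paths):
    """Return the setting names whose path does not exist on disk."""
    return [name for name, value in paths.items() if not Path(value).exists()]


if __name__ == "__main__":
    # Placeholder values; use the same paths you wrote into scripts/infer.sh.
    settings = {
        "infinity_model_path": "weights/NSARM.pth",
        "vae_path": "weights/vae.pth",
        "text_encoder_ckpt": "weights/t5",
    }
    missing = find_missing_paths(settings)
    if missing:
        print("Missing paths for:", ", ".join(missing))
    else:
        print("All configured paths exist")
```

The output directory is left out on purpose, since whether the script creates it when absent is not stated here.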

Step 3: Run the command

cd NSARM
bash scripts/infer.sh

The results will be saved to the directory given by --save_dir.

The training code will be released after paper submission.

❤️ Acknowledgments

This project is based on BasicSR and Infinity.

📧 Contact

If you have any questions, please feel free to contact: xiangtao.kong@connect.polyu.hk

🎓 Citations

If our code helps your research or work, please consider citing our paper. The BibTeX reference is:

@article{kong2025nsarm,
  title={NSARM: Next-Scale Autoregressive Modeling for Robust Real-World Image Super-Resolution},
  author={Kong, Xiangtao and Wu, Rongyuan and Liu, Shuaizheng and Sun, Lingchen and Zhang, Lei},
  journal={arXiv preprint arXiv:2510.00820},
  year={2025}
}

🎫 License

This project is released under the Apache 2.0 license.