GRN: Generative Refinement Networks

April 29, 2026


This is the official implementation of the paper Generative Refinement Networks for Visual Synthesis. GRN is neither diffusion nor autoregressive: it is a third way. 🧠 Refines globally like an artist. ⚡ Generates adaptively by complexity. 🏆 New SOTA across image & video. The visual generation paradigm just got rewritten.




🚀 Demo

Try our interactive Text-to-Image demo on 🤗 Hugging Face Space:

GRN T2I Demo

Experience the power of Generative Refinement Networks firsthand by generating images from text prompts directly in your browser!


🌟 Introduction

Diffusion models dominate visual generation, but they spend the same computational effort on every sample regardless of its complexity. Autoregressive (AR) models are complexity-aware, as evidenced by their variable likelihoods, but suffer from lossy tokenization and error accumulation.

We introduce Generative Refinement Networks (GRN), a new visual synthesis paradigm that addresses these issues:

  • Near-lossless tokenization via Hierarchical Binary Quantization (HBQ)
  • Global refinement mechanism that progressively perfects outputs like a human artist
  • Entropy-guided sampling for complexity-aware, adaptive-step generation
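The first and third bullets can be made concrete with two small, generic illustrations. Neither is the paper's method. `successive_binary_quantize` is plain successive-approximation binary coding, used here only to show how a hierarchy of binary codes can approach lossless reconstruction (each extra bit halves the error bound); it is not HBQ. `steps_for` is a hypothetical rule that gives higher-entropy (more uncertain, more complex) predictions a larger step budget; GRN's actual entropy-guided sampler may work differently. All function names and the step range are assumptions.

```python
import math

def successive_binary_quantize(x, levels):
    """Encode x in [-1, 1] as `levels` bits by successive approximation.

    Each bit halves the remaining interval, so |x - approx| <= 2**-levels.
    Generic illustration only, NOT the paper's HBQ.
    """
    bits, approx = [], 0.0
    for k in range(1, levels + 1):
        step = 2.0 ** -k
        bit = 1 if x >= approx else 0
        approx += step if bit else -step
        bits.append(bit)
    return bits, approx

def decode_bits(bits):
    """Rebuild the approximation from the bit sequence (coarse to fine)."""
    approx = 0.0
    for k, bit in enumerate(bits, start=1):
        approx += 2.0 ** -k if bit else -(2.0 ** -k)
    return approx

def entropy(probs):
    """Shannon entropy (in nats) of one token's predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def steps_for(distributions, min_steps=8, max_steps=64):
    """HYPOTHETICAL rule: scale the step budget with mean per-token entropy,
    normalized by the maximum possible entropy log(vocab_size)."""
    mean_h = sum(entropy(d) for d in distributions) / len(distributions)
    max_h = math.log(len(distributions[0]))
    frac = mean_h / max_h if max_h > 0 else 0.0
    return min_steps + round(frac * (max_steps - min_steps))
```

A confident (one-hot) prediction gets the minimum budget, while a uniform one gets the maximum, which is the "complexity-aware" behavior the bullet describes.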

GRN achieves state-of-the-art results on ImageNet reconstruction and class-conditional generation, and scales effectively to text-to-image and text-to-video tasks.


Generative Refinement Framework

Starting from a random token map, GRN refines all input tokens at each step and keeps a growing, randomly selected subset of its predictions. For example, going from the second step to the third, GRN fills six new tokens (pink), keeps two tokens (blue), erases two tokens (yellow), and leaves six positions blank (gray).
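The fill/keep/erase/blank behavior in the figure can be reproduced with a toy loop. This is a minimal sketch under stated assumptions, not the GRN implementation: `toy_predict` is a hypothetical stand-in for the network (it ignores its input and samples random tokens), blank positions are represented as `None`, and the number of kept predictions grows linearly with the step index.

```python
import random

BLANK = None  # an unfilled ("gray") position

def toy_predict(tokens, vocab_size, rng):
    """HYPOTHETICAL stand-in for the GRN network: proposes a token for every
    position. A real model would condition on `tokens` (and the prompt/class)."""
    return [rng.randrange(vocab_size) for _ in tokens]

def refine(num_tokens=16, num_steps=4, vocab_size=8, seed=0):
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size) for _ in range(num_tokens)]  # random start
    history = [tokens]
    for step in range(1, num_steps + 1):
        proposal = toy_predict(tokens, vocab_size, rng)   # refine every position
        k = round(num_tokens * step / num_steps)          # keep more each step
        keep = set(rng.sample(range(num_tokens), k))      # random selection
        tokens = [proposal[i] if i in keep else BLANK for i in range(num_tokens)]
        history.append(tokens)
    return history

def diff(prev, curr):
    """Label each position as in the figure: filled / kept / erased / blank.
    Here 'kept' means occupied at both steps (its value may have been refined)."""
    counts = {"filled": 0, "kept": 0, "erased": 0, "blank": 0}
    for p, c in zip(prev, curr):
        if c is BLANK:
            counts["erased" if p is not BLANK else "blank"] += 1
        else:
            counts["kept" if p is not BLANK else "filled"] += 1
    return counts
```

With 16 tokens and 4 steps, the toy keeps 4, 8, 12, then 16 predictions, and `diff(history[2], history[3])` gives the same kind of per-step breakdown as the figure.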

Class-to-Image Examples
Text-to-Image Examples
Text-to-Video Examples

Open-Source Plan

GRN adopts a minimalist, self-contained design. The implementation uses PyTorch and runs on GPUs.

| Task | Checkpoints | Inference Code | Training Code |
|---|---|---|---|
| T2V | ⬜ | ⬜ | ✅ |
| T2I | ✅ | ⬜ | ✅ |
| C2I | ⬜ | ✅ | ✅ |

📦 Model Zoo

| Model | Checkpoints |
|---|---|
| Tokenizers | ✅ ImageNet Tokenizer, ✅ Joint Image/Video Tokenizer |
| GRN_ind_C2I | ✅ B, ⬜ L (TBD), ⬜ H (TBD), ⬜ G (TBD) |
| GRN_bit_T2I | ✅ GRN_T2I |
| GRN_bit_T2V | ⬜ GRN_T2V (TBD) |

🛠️ Installation

Step 1: Clone the repository

git clone https://github.com/MGenAI/GRN
cd GRN

Step 2: Create conda environment

A suitable conda environment named GRN can be created and activated with:

conda env create -f environment.yaml
conda activate GRN

Troubleshooting

If importing torch fails with "undefined symbol: iJIT_NotifyEvent", reinstall PyTorch 2.5.1 built for CUDA 12.4:

pip uninstall torch
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124

Check this issue for more details.
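After reinstalling, a quick check can confirm which build is active. This is a minimal sketch: the expected release (2.5.1) and local tag (cu124) come from the pip command above, while `parse_torch_version` and `check_torch` are hypothetical helper names. It relies only on `torch.__version__` and `torch.cuda.is_available()`.

```python
import re

def parse_torch_version(version):
    """Split a version string like '2.5.1+cu124' into ((2, 5, 1), 'cu124')."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    local = version.partition("+")[2]  # '' when there is no local tag
    return tuple(int(g) for g in m.groups()), local

def check_torch(expected=(2, 5, 1), expected_local="cu124"):
    """Return a one-line report on the installed torch build."""
    try:
        import torch
    except (ImportError, OSError) as exc:  # e.g. the iJIT_NotifyEvent symbol error
        return f"torch failed to import: {exc}"
    release, local = parse_torch_version(torch.__version__)
    ok = release == expected and local == expected_local
    status = "OK" if ok else "unexpected build"
    return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()} ({status})"

if __name__ == "__main__":
    print(check_torch())
```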


🖼️ Class-to-Image

Dataset

Download the ImageNet dataset and place it under IMAGENET_PATH.
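Before launching a long training run, it can help to sanity-check the dataset directory. This sketch ASSUMES an ImageFolder-style layout (IMAGENET_PATH/train/<class>/ and IMAGENET_PATH/val/<class>/ with 1000 class folders each) and reads IMAGENET_PATH from an environment variable; neither assumption is confirmed by this README, so adjust both to whatever the training scripts actually expect.

```python
import os
from pathlib import Path

def check_imagenet_layout(root, splits=("train", "val"), num_classes=1000):
    """Return {split: problem}; an empty dict means the layout looks OK.
    ASSUMPTION: ImageFolder-style root/<split>/<class_dir>/<images>."""
    problems = {}
    root = Path(root)
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            problems[split] = "missing directory"
            continue
        n = sum(1 for d in split_dir.iterdir() if d.is_dir())
        if n != num_classes:
            problems[split] = f"found {n} class folders, expected {num_classes}"
    return problems

if __name__ == "__main__":
    # HYPOTHETICAL: IMAGENET_PATH exported as an environment variable.
    print(check_imagenet_layout(os.environ.get("IMAGENET_PATH", ".")))
```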

Training

All training scripts are located in scripts/c2i/. We recommend 8 GPUs with 80 GB of memory each (8x80GB) for most models.

| Model | Training Script | GPUs Required |
|---|---|---|
| GRN_ind_B | bash scripts/c2i/train_GRN_ind_B.sh | 8x80GB |
| GRN_bit_B | bash scripts/c2i/train_GRN_bit_B.sh | 8x80GB |
| GRN_ind_L | bash scripts/c2i/train_GRN_ind_L.sh | 8x80GB |
| GRN_ind_H | bash scripts/c2i/train_GRN_ind_H.sh | 16x80GB |
| GRN_ind_G | bash scripts/c2i/train_GRN_ind_G.sh | 32x80GB |

Evaluation

PyTorch pre-trained models are available here.

All evaluation scripts are located in scripts/c2i/. We recommend 8 GPUs with 80 GB of VRAM each.

| Model | Evaluation Script |
|---|---|
| GRN_ind_B | bash scripts/c2i/eval_GRN_ind_B.sh |
| GRN_bit_B | bash scripts/c2i/eval_GRN_bit_B.sh |
| GRN_ind_L | bash scripts/c2i/eval_GRN_ind_L.sh |
| GRN_ind_H | bash scripts/c2i/eval_GRN_ind_H.sh |
| GRN_ind_G | bash scripts/c2i/eval_GRN_ind_G.sh |

We use torch-fidelity to compute FID and Inception Score (IS) against either a reference image folder or precomputed statistics. The reference statistics under grn/utils_c2i/fid_stats are the precomputed ones from JiT.
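For reference, these are the standard definitions of the two metrics (general definitions, not specific to GRN). Here $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of Inception features for the reference and generated images, and $p(y \mid x)$ is the Inception classifier's label distribution for a generated image $x$, with marginal $p(y)$ over generated images. Lower FID and higher IS are better.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)

\mathrm{IS} = \exp\!\left( \mathbb{E}_{x \sim p_g} \,
  D_{\mathrm{KL}}\!\left( p(y \mid x) \,\Vert\, p(y) \right) \right)
```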


🎨 Text-to-Image

Inference

Run python3 t2iv_infer_simple.py, or use the Python API directly:

from PIL import Image
import torch
from grn_pipeline import GRNPipeline

# Load the pipeline from the Hugging Face Hub on CPU, then move it to the GPU
pipeline = GRNPipeline.from_pretrained(hf_repo_id='bytedance-research/GRN', device='cpu')
pipeline = pipeline.to('cuda')

# Generate one 1024x1024 image; a fixed seed makes the result reproducible
result = pipeline(
    prompt="A cute cat playing in the garden",
    guidance_scale=3.0,
    temperature=1.1,
    num_inference_steps=50,
    width=1024,
    height=1024,
    content_type='image',
    seed=42
)

# The first (and only) generated image
image = result.images[0]
image.save('./generated_image.jpg')
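To compare several samples for one prompt, the single-image call can be wrapped in a seed sweep. This is a sketch that ASSUMES the pipeline is called exactly as in the example above (keyword arguments such as `prompt` and `seed`) and returns an object whose `images[0]` has a `save(path)` method. `generate_variants`, `DryRunPipeline`, and the output file pattern are hypothetical names; the dry-run stub only lets you exercise the loop without a GPU or model weights.

```python
from types import SimpleNamespace

def generate_variants(pipeline, prompt, seeds, out_pattern="./generated_{seed}.jpg", **kwargs):
    """Generate and save one image per seed; returns the list of saved paths.
    Extra keyword arguments (guidance_scale, width, ...) are passed through."""
    paths = []
    for seed in seeds:
        result = pipeline(prompt=prompt, seed=seed, **kwargs)
        path = out_pattern.format(seed=seed)
        result.images[0].save(path)
        paths.append(path)
    return paths

class DryRunPipeline:
    """Stand-in pipeline: records calls and 'saved' paths instead of generating."""
    def __init__(self):
        self.calls = []
        self.saved = []

    def __call__(self, **kwargs):
        self.calls.append(kwargs)
        image = SimpleNamespace(save=self.saved.append)  # save() records the path
        return SimpleNamespace(images=[image])
```

With the real pipeline from the example, a call might look like `generate_variants(pipeline, "A cute cat playing in the garden", seeds=[0, 1, 2], guidance_scale=3.0, temperature=1.1, num_inference_steps=50, width=1024, height=1024, content_type='image')`.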

📧 Contact

If you are interested in scaling GRN for image generation, image editing, video generation, video editing, or unified models, please feel free to reach out!

📧 Email: hanjian.thu123@bytedance.com


🤗 Acknowledgements


📝 Citation

If you find our work useful, please consider citing:

@misc{han2026grn,
      title={Generative Refinement Networks for Visual Synthesis}, 
      author={Jian Han and Jinlai Liu and Jiahuan Wang and Bingyue Peng and Zehuan Yuan},
      year={2026},
      eprint={2604.13030},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.13030}, 
}