README.md

February 19, 2025

Scale-wise Text-conditioned AutoRegressive image generation

Important: We have made the weights and code for STAR available in a new repository. Click here to access it!

News

[2025-02] We have released the official codebase and weights on Hugging Face!
[2024-06] The STAR technical report is released.

Introduction

STAR, the first scale-wise text-to-image model based on VAR, supports resolutions from 256×256 to 1024×1024.
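In the scale-wise (next-scale) paradigm inherited from VAR, each autoregressive step predicts a whole token map conditioned on all coarser maps. The minimal NumPy sketch below builds the resulting block-causal attention mask. It assumes VAR's 10-scale schedule for 256×256 images; STAR's own schedule, particularly at 1024×1024, may differ, and `block_causal_mask` is a hypothetical helper rather than part of the STAR codebase.

```python
import numpy as np

# Assumed side lengths of the square token map at each scale (VAR's 256x256 schedule).
SCALES = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)


def block_causal_mask(scales):
    """Return a boolean (L, L) mask where True means "query may attend to key".

    Tokens of scale k see every token of scales 0..k (including their own scale),
    but nothing from finer scales that have not been generated yet.
    """
    # Scale index of every token in the flattened coarse-to-fine sequence.
    scale_ids = np.concatenate([np.full(s * s, k) for k, s in enumerate(scales)])
    # mask[i, j] is True iff the key's scale is not finer than the query's scale.
    return scale_ids[None, :] <= scale_ids[:, None]


mask = block_causal_mask(SCALES)
print(mask.shape)  # (680, 680): 1 + 4 + 9 + ... + 256 = 680 tokens
```

Attention is fully open inside each block because all tokens of one scale are sampled in the same step; only the finer, not-yet-generated scales are hidden.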

By incorporating text conditioning, normalized 2D RoPE, and causal-driven stable sampling, STAR outperforms existing models in fidelity, consistency, and quality, with a faster generation time of 2.21 s for 1024×1024 images on an A100 GPU.
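One plausible reading of "normalized 2D RoPE" is sketched below: each token's row and column index is rescaled to a shared reference frame before the rotary angles are computed, so token maps of every size describe positions on the same spatial axis. This is a NumPy sketch under stated assumptions: the reference size `ref=16`, the cell-centre convention, the RoPE base of 10000, and the split of channels into a row half and a column half are illustrative choices, not STAR's published configuration, and all function names are hypothetical.

```python
import numpy as np


def rope_angles(pos, dim, base=10000.0):
    """1D RoPE angles: one angle per channel pair, shape pos.shape + (dim // 2,)."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return np.asarray(pos, dtype=np.float64)[..., None] * inv_freq


def rotate(x, angles):
    """Rotate channel pairs (x0, x1), (x2, x3), ... of x by the matching angles."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


def normalized_positions(side, ref=16):
    """Row/column coordinates of a side x side map, rescaled to a shared [0, ref] frame.

    Assumption: cell centres are used, so the single token of a 1x1 map sits at
    (ref / 2, ref / 2), the centre of the area tiled by every finer map.
    """
    rows, cols = np.divmod(np.arange(side * side), side)
    scale = ref / side
    return (rows + 0.5) * scale, (cols + 0.5) * scale


def normalized_rope_2d(x, side, ref=16):
    """Apply 2D RoPE to x of shape (side * side, dim); dim must be divisible by 4.

    The first half of the channels is rotated by the normalized row coordinate,
    the second half by the normalized column coordinate.
    """
    half = x.shape[-1] // 2
    r, c = normalized_positions(side, ref)
    return np.concatenate(
        [rotate(x[:, :half], rope_angles(r, half)),
         rotate(x[:, half:], rope_angles(c, half))],
        axis=-1,
    )


rng = np.random.default_rng(0)
x = rng.standard_normal((4 * 4, 8))  # 16 tokens of a 4x4 map, 8 channels each
y = normalized_rope_2d(x, side=4)
print(y.shape)  # (16, 8)
```

Without the rescaling, a 1×1 map and a 16×16 map would use unrelated position ranges (0 versus 0 to 15), leaving the model to learn how the scales line up; with it, the coarse token sits at the centre of the frame that the finer tokens tile.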


Unlike VAR, which focuses on toy category-conditioned auto-regressive generation of 256×256 images, STAR explores the potential of the scale-wise auto-regressive paradigm in real-world scenarios, aiming to make AR models as effective as diffusion models. To achieve this, we:

+ replace the single category token with a text encoder and cross-attention for detailed text guidance;
+ introduce cross-scale normalized RoPE to stabilize structural learning and reduce training costs, unlocking high-resolution training;
+ propose a new sampling method to overcome the intrinsic simultaneous-sampling issue in AR models.

While these approaches have been (partially) explored in diffusion models, we are the first to validate and apply them in auto-regressive image generation, achieving high-resolution, text-conditioned synthesis with performance comparable to Stable Diffusion 2.
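Replacing VAR's single category token with text guidance means that image tokens query the text encoder's output through cross-attention. The single-head NumPy sketch below shows the mechanics; the dimensions, the random (untrained) projection matrices, the 77-token prompt length, and the name `cross_attention` are illustrative assumptions, whereas a real model would use multi-head attention with learned weights inside every transformer block.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(img_tokens, text_tokens, w_q, w_k, w_v):
    """Single-head cross-attention.

    img_tokens:  (N, d_model)  queries come from the image token sequence
    text_tokens: (T, d_text)   keys and values come from the text encoder output
    Returns the attended features (N, d_head) and the attention weights (N, T).
    """
    q = img_tokens @ w_q
    k = text_tokens @ w_k
    v = text_tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # scaled dot-product scores
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model, d_text, d_head = 64, 32, 16
img = rng.standard_normal((680, d_model))  # illustrative: 680 image tokens
text = rng.standard_normal((77, d_text))   # illustrative: a 77-token prompt embedding
w_q = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
w_k = rng.standard_normal((d_text, d_head)) / np.sqrt(d_text)
w_v = rng.standard_normal((d_text, d_head)) / np.sqrt(d_text)

out, attn = cross_attention(img, text, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (680, 16) (680, 77)
```

No causal mask is applied here: the whole prompt is known before generation starts, so image tokens at every scale may attend to every text token.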

[Figure: Framework of STAR]

@article{ma2024star,
  title={STAR: Scale-wise Text-conditioned AutoRegressive image generation}, 
  author={Xiaoxiao Ma and Mohan Zhou and Tao Liang and Yalong Bai and Tiejun Zhao and Biye Li and Huaian Chen and Yi Jin},
  journal={arXiv preprint arXiv:2406.10797},
  year={2024}
}
