VAREdit

February 4, 2026

This is the official repository for the ICLR 2026 paper "Visual Autoregressive Modeling for Instruction-Guided Image Editing".

VAREdit Demo

VAREdit is an advanced image editing model built on the Infinity models, designed for high-quality instruction-based image editing.

Try our online demos: 🤗 VAREdit-8B-1024 and 🤗 VAREdit-8B-512.

🌟 Key Features

  • Strong Instruction Following: Follows instructions more accurately thanks to the autoregressive nature of the model.
  • Efficient Inference: The 8B model edits a 512×512 image in about 0.7 seconds on an H800.
  • Flexible Resolution: Supports 512×512 and 1024×1024 image resolutions.

📊 Model Variants

| Model Variant | Resolution | HuggingFace Model | Time (H800) | VRAM (GB) |
| --- | --- | --- | --- | --- |
| VAREdit-8B-512 | 512×512 | VAREdit-8B-512 | ~0.7s | 50.41 |
| VAREdit-8B-1024 | 1024×1024 | VAREdit-8B-1024 | ~1.99s | 50.41 |

🚀 Quick Start

Prerequisites

Before starting, ensure you have:

  • Python 3.8+
  • CUDA-compatible GPU with sufficient VRAM (8GB+ for 2B model, 24GB+ for 8B model)
  • Required dependencies installed
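The prerequisites above can be checked before installing anything heavy. The following is a minimal sketch, not part of the repository: the helper names (`python_ok`, `vram_ok`, `gpu_total_bytes`) are hypothetical, "GB" is treated as GiB, and PyTorch is assumed to be the CUDA runtime in use. The default threshold mirrors the 24GB figure listed above; the Model Variants table reports 50.41 GB for the 8B checkpoints, so a higher `required_gb` may be the safer choice.

```python
import sys


def python_ok(min_version=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= tuple(min_version)


def vram_ok(total_bytes, required_gb):
    """Return True if `total_bytes` covers `required_gb` (GB treated as GiB, an assumption)."""
    return total_bytes >= required_gb * 1024 ** 3


def gpu_total_bytes(device_index=0):
    """Return the total memory of a CUDA device in bytes, or None if unavailable."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device_index).total_memory


def report(required_gb=24):
    """Print a short summary of the environment checks."""
    print("Python 3.8+:", python_ok())
    total = gpu_total_bytes()
    if total is None:
        print("CUDA GPU: not detected (or PyTorch is not installed)")
    else:
        print(f"VRAM >= {required_gb} GB:", vram_ok(total, required_gb))
```

Calling `report()` for the 8B model, or `report(required_gb=8)` for the 2B model, prints one line per check.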

Installation

  1. Clone the repository
git clone https://github.com/HiDream-ai/VAREdit.git
cd VAREdit
  2. Install dependencies
pip install -r requirements.txt
pip install flash_attn
  3. Download model checkpoints

Download the VAREdit model checkpoints:

# Download from HuggingFace
git lfs install
git clone https://huggingface.co/HiDream-ai/VAREdit
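As an alternative to `git lfs`, the checkpoints can probably be fetched from Python with the `huggingface_hub` package. This is a hedged sketch rather than a documented workflow of this repository: `download_checkpoints` and `checkpoint_path` are hypothetical helpers. The local directory `HiDream-ai/VAREdit` is chosen only to match the paths used under Basic Usage, assuming `load_model` reads them as local filesystem paths, which the README does not state.

```python
from pathlib import PurePosixPath

REPO_ID = "HiDream-ai/VAREdit"    # HuggingFace repo named in the clone command
LOCAL_DIR = "HiDream-ai/VAREdit"  # assumption: mirror the path passed to load_model


def checkpoint_path(filename, local_dir=LOCAL_DIR):
    """Join the local checkpoint directory and a checkpoint file name."""
    return str(PurePosixPath(local_dir) / filename)


def download_checkpoints(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download the whole model repo into `local_dir` and return the local path."""
    # Imported lazily: needs `pip install huggingface_hub` and network access.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

After `download_checkpoints()` finishes, `checkpoint_path("8B-1024.pth")` yields the same string that Basic Usage passes as `model_path`.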

Basic Usage

from infer import load_model, generate_image

model_components = load_model(
    pretrain_root="HiDream-ai/VAREdit",
    model_path="HiDream-ai/VAREdit/8B-1024.pth",
    model_size="8B",
    image_size=1024
)

# Generate edited image
edited_image = generate_image(
    model_components,
    src_img_path="assets/test.jpg",
    instruction="Add glasses to this girl and change hair color to red",
    cfg=3.0,  # Classifier-free guidance scale
    tau=0.1,  # Temperature parameter
    seed=42  # Optional random seed
)

📝 Detailed Configuration

Model Sampling Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| cfg | Classifier-free guidance scale | 3.0 |
| tau | Temperature for sampling | 0.1 |
| seed | Random seed for reproducibility | -1 (random) |
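One way to keep these three parameters together is a small container that carries the documented defaults and turns `seed=-1` into a concrete seed before the call, so a random run can still be logged and repeated. This is an illustrative sketch: `SamplingParams` is a hypothetical helper, not part of `infer.py`, and how `generate_image` itself handles `-1` is not shown in this README.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingParams:
    cfg: float = 3.0  # classifier-free guidance scale (documented default)
    tau: float = 0.1  # sampling temperature (documented default)
    seed: int = -1    # -1 means "pick a random seed"

    def resolved(self, rng=None):
        """Return keyword arguments with seed=-1 replaced by a concrete seed."""
        if rng is None:
            rng = random.Random()
        # Assumption: any non-negative 31-bit integer is an acceptable seed.
        seed = self.seed if self.seed >= 0 else rng.randrange(2 ** 31)
        return {"cfg": self.cfg, "tau": self.tau, "seed": seed}
```

The result can be unpacked into the call from Basic Usage, for example `generate_image(model_components, src_img_path="assets/test.jpg", instruction="...", **SamplingParams(cfg=4.0).resolved())`.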

📂 Project Structure

VAREdit/
├── infer.py              # Main inference script
├── train.py              # Main training script
├── trainer.py            # Main trainer script
├── infinity/             # Core model implementations
│   ├── models/           # Model architectures
│   ├── dataset/          # Data processing utilities
│   └── utils/            # Helper functions
├── tools/                # Additional tools and scripts
│   └── run_infinity.py   # Model execution utilities
├── assets/               # Demo images and resources
└── README.md             # This file

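📊 Performance Benchmarks

| Method | Size | EMU-Edit Bal. | PIE-Bench Bal. | Time (A800) |
| --- | --- | --- | --- | --- |
| InstructPix2Pix | 1.1B | 2.923 | 4.034 | 3.5s |
| UltraEdit | 7.7B | 4.541 | 5.580 | 2.6s |
| OmniGen | 3.8B | 4.674 | 3.492 | 16.5s |
| AnySD | 2.9B | 3.129 | 3.326 | 3.4s |
| EditAR | 0.8B | 3.305 | 4.707 | 45.5s |
| ACE++ | 16.9B | 2.076 | 2.574 | 5.7s |
| ICEdit | 17.0B | 4.785 | 4.933 | 8.4s |
| VAREdit (256px) | 2.2B | 5.565 | 6.684 | 0.5s |
| VAREdit (512px) | 2.2B | 5.662 | 6.996 | 0.7s |
| VAREdit (512px) | 8.4B | 7.892 | 8.105 | 1.2s |
| VAREdit (1024px) | 8.4B | 7.379 | 7.688 | 3.9s |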

Note: The released 8B models are trained longer and on more data.
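To put the timing column in perspective, the per-edit times can be turned into speed ratios against VAREdit (512px, 8.4B). The snippet below is plain arithmetic on the A800 times from the benchmark table and introduces no new measurements; the dictionary and function names are illustrative only.

```python
# Seconds per edit on an A800, copied from the benchmark table.
BASELINE_SECONDS = {
    "InstructPix2Pix (1.1B)": 3.5,
    "UltraEdit (7.7B)": 2.6,
    "OmniGen (3.8B)": 16.5,
    "AnySD (2.9B)": 3.4,
    "EditAR (0.8B)": 45.5,
    "ACE++ (16.9B)": 5.7,
    "ICEdit (17.0B)": 8.4,
}
VAREDIT_512_8B_SECONDS = 1.2  # VAREdit (512px), 8.4B


def speedup(baseline_seconds, reference_seconds=VAREDIT_512_8B_SECONDS):
    """Return how many times longer the baseline takes than the reference."""
    return baseline_seconds / reference_seconds


for name, seconds in BASELINE_SECONDS.items():
    print(f"{name}: {speedup(seconds):.1f}x the time of VAREdit (512px, 8.4B)")
```

For example, ICEdit's 8.4s against 1.2s works out to a factor of 7. These ratios compare latency only and say nothing about the quality scores in the other two columns.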

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use VAREdit in your research, please cite:

@inproceedings{varedit2026,
  title={Visual Autoregressive Modeling for Instruction-Guided Image Editing},
  author={Mao, Qingyang and Cai, Qi and Li, Yehao and Pan, Yingwei and Cheng, Mingyue and Yao, Ting and Liu, Qi and Mei, Tao},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}

🙏 Acknowledgments

Note: This project is under active development. Features and code may change.