
April 5, 2025

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea

Qifan Yu¹*, Wei Chow¹*, Zhongqi Yue²*, Kaihang Pan¹, Yang Wu³, Xiaoyang Wan¹,

Juncheng Li¹, Siliang Tang¹, Hanwang Zhang², Yueting Zhuang¹

¹Zhejiang University, ²Nanyang Technological University, ³Alibaba Group

*Equal Contribution.


🌐 Introduction

AnyEdit is a comprehensive multimodal instruction-based image editing dataset comprising 2.5 million high-quality editing pairs that span more than 20 editing types across five domains. We ensure the diversity and quality of the AnyEdit collection in three ways: diverse initial data, an adaptive editing process, and automated selection of editing results. Using the dataset, we further train AnyEdit Stable Diffusion (AnySD), a novel model with task-aware routing and learnable task embeddings for unified image editing. Comprehensive experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models, opening up prospects for instruction-driven image editing models that support human creativity.

🔥 News

  • [2025.04.05] 🎉 AnyEdit has been accepted to CVPR 2025 (5,5,5) as an Oral presentation.
  • We have released the training and inference scripts and the model weights of AnySD. For details on training AnySD or using it for your own image editing, please refer to our model repo.
  • [2024.12.23] We have finished uploading the AnyEdit datasets with AnyEdit-Test benchmark and the AnyEdit data curation pipelines.

TODO

  • Release AnyEdit datasets.
  • Release AnyEdit-Test Benchmark.
  • Release data curation pipelines.
  • Release inference code.
  • Release training scripts.

💡 Overview

The full training set and dev set are publicly available on Hugging Face. For the test split, we provide only a zip file, to prevent potential data contamination from foundation models crawling the test set for training. Please download the test set here.

We comprehensively categorize image editing tasks into 5 groups based on different editing capabilities:

  • (a) Local Editing, which focuses on region-based editing (green area);
  • (b) Global Editing, which focuses on rendering the full image (yellow area);
  • (c) Camera Move Editing, which focuses on changing the viewpoint rather than the scene (gray area);
  • (d) Implicit Editing, which requires commonsense knowledge to complete complex edits (orange area);
  • (e) Visual Editing, which incorporates additional visual inputs to address the requirements of multimodal editing (blue area).

⭐ Steps for AnyEdit Collection


  1. General Data Preparation
  2. Diverse Instruction Generation
  3. Adaptive Editing Pipelines
  4. Data Quality Enhancement

Instruction Format

{
  "edit": "change the airplane to green",  # editing instruction
  "edited object": "airplane",  # edited object; only for local editing, otherwise None
  "input": "a small airplane sits stationary on a piece of concrete.",  # caption of the original image
  "output": "A green small airplane sits stationary on a piece of concrete.",  # caption of the edited image
  "edit_type": "color_alter",  # editing type
  "visual_input": "None",  # reference image for visual-input instructions, otherwise None
  "image_file": "COCO_train2014_000000521165.jpg",  # file name of the original image
  "edited_file": "xxxxx.png"  # file name of the edited image
}
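The record format above can be loaded into a typed structure. The sketch below is a minimal, hypothetical loader (`EditRecord`, `load_records` and the helper names are illustrative, not part of the AnyEdit code). It assumes each instruction file is a JSON list of such records, with the comments removed, and it treats the string "None", JSON null, and an empty string as missing values, since the example stores "None" as a string.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional

# Keys taken from the example record; the loader itself is a hypothetical sketch.
REQUIRED_KEYS = ("edit", "edited object", "input", "output",
                 "edit_type", "visual_input", "image_file", "edited_file")


def _none_if_missing(value: Any) -> Optional[str]:
    # The example stores the string "None" for absent values; normalize it,
    # JSON null, and "" to Python None.
    if value is None or value in ("None", ""):
        return None
    return str(value)


@dataclass(frozen=True)
class EditRecord:
    edit: str                      # editing instruction
    edited_object: Optional[str]   # only set for local editing
    input_caption: str             # caption of the original image
    output_caption: str            # caption of the edited image
    edit_type: str                 # e.g. "color_alter"
    visual_input: Optional[str]    # reference image, visual editing only
    image_file: str                # file name of the original image
    edited_file: str               # file name of the edited image

    @classmethod
    def from_dict(cls, raw: dict) -> "EditRecord":
        missing = [k for k in REQUIRED_KEYS if k not in raw]
        if missing:
            raise KeyError(f"record is missing keys: {missing}")
        return cls(
            edit=raw["edit"],
            edited_object=_none_if_missing(raw["edited object"]),
            input_caption=raw["input"],
            output_caption=raw["output"],
            edit_type=raw["edit_type"],
            visual_input=_none_if_missing(raw["visual_input"]),
            image_file=raw["image_file"],
            edited_file=raw["edited_file"],
        )


def load_records(path: str) -> List[EditRecord]:
    # Assumes the file holds a JSON list of records in the format above.
    with open(path, encoding="utf-8") as f:
        return [EditRecord.from_dict(r) for r in json.load(f)]


if __name__ == "__main__":
    example = {
        "edit": "change the airplane to green",
        "edited object": "airplane",
        "input": "a small airplane sits stationary on a piece of concrete.",
        "output": "A green small airplane sits stationary on a piece of concrete.",
        "edit_type": "color_alter",
        "visual_input": "None",
        "image_file": "COCO_train2014_000000521165.jpg",
        "edited_file": "xxxxx.png",
    }
    rec = EditRecord.from_dict(example)
    print(rec.edit_type, rec.visual_input)  # color_alter None
```

Failing fast on missing keys keeps malformed records from silently propagating into training.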

Instruction Pipeline


🛠️ Setup for AnyEdit

  1. Create a new conda environment and download the pretrained weights:
     bash setup.sh
  2. Download all of our candidate datasets.
  3. Generate instructions (please refer to CaptionsGenerator).
  4. Pre-filter the target images (before editing):
     CUDA_VISIBLE_DEVICES=2 python pre_filter.py --instruction-path [xx.json] --instruction-type [] --image-root []
  5. Edit the images (refer to scripts for more examples).
  6. Post-filter to obtain the final datasets:
     CUDA_VISIBLE_DEVICES=2 python post_filter.py --instruction-type []
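The pre-filter and post-filter commands above can be chained from a small driver. This is a hypothetical sketch: only the script names and flags come from the commands shown; the helper functions, the argument values in the usage example (`add.json`, `add`, the image root), and the choice of GPU are assumptions, and the image-editing step in between is omitted. By default it only prints the equivalent shell commands (`dry_run=True`).

```python
import os
import shlex
import subprocess
from typing import List, Optional


def pre_filter_cmd(instruction_path: str, instruction_type: str,
                   image_root: str) -> List[str]:
    # Mirrors the pre-filter command shown above.
    return ["python", "pre_filter.py",
            "--instruction-path", instruction_path,
            "--instruction-type", instruction_type,
            "--image-root", image_root]


def post_filter_cmd(instruction_type: str) -> List[str]:
    # Mirrors the post-filter command shown above.
    return ["python", "post_filter.py", "--instruction-type", instruction_type]


def run(cmd: List[str], gpu: str = "0", dry_run: bool = True) -> Optional[int]:
    """Run one filtering step pinned to a single GPU.

    With dry_run=True, only print the equivalent shell command.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    print(f"CUDA_VISIBLE_DEVICES={gpu} {shlex.join(cmd)}")
    if dry_run:
        return None
    # check=True raises CalledProcessError if the script exits non-zero.
    return subprocess.run(cmd, env=env, check=True).returncode


if __name__ == "__main__":
    # Hypothetical values; substitute your own instruction file, type and image root.
    for cmd in (pre_filter_cmd("add.json", "add", "Datasets/coco/train2014"),
                post_filter_cmd("add")):
        run(cmd, gpu="2")
```

Building argument lists instead of shell strings avoids quoting problems with paths that contain spaces.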

🧳 Project Folder Structure

  • Datasets/
    • anyedit_datasets/
      • add
      • remove
      • replace
    • coco/
      • train2014/
        • 0.jpg
        • 1.jpg
    • flux_coco_images/
      • 0.jpg
      • 1.jpg
    • add_postfilter.json
    • remove_postfilter.json
    • replace_postfilter.json
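Given the folder layout above, original and edited images can be paired per edit type. This is a hypothetical sketch (`resolve_pairs` and `SOURCE_DIRS` are illustrative names). It assumes that `<edit_type>_postfilter.json` is a JSON list of records in the Instruction Format, that edited images live under `anyedit_datasets/<edit_type>/`, and that an original image may be found in either `coco/train2014/` or `flux_coco_images/`; none of these conventions is confirmed by the repository.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple

# Candidate folders for original images, taken from the tree above; searching
# both of them is an assumption about how the data is organized.
SOURCE_DIRS = ("coco/train2014", "flux_coco_images")


def resolve_pairs(root: Path, edit_type: str) -> Iterator[Tuple[Path, Path, str]]:
    """Yield (original image, edited image, instruction) for one edit type.

    `root` is the Datasets/ directory.
    """
    records = json.loads(
        (root / f"{edit_type}_postfilter.json").read_text(encoding="utf-8"))
    edited_dir = root / "anyedit_datasets" / edit_type
    for rec in records:
        # Take the first source folder that actually contains the original image.
        original = next((root / d / rec["image_file"] for d in SOURCE_DIRS
                         if (root / d / rec["image_file"]).exists()), None)
        edited = edited_dir / rec["edited_file"]
        if original is None or not edited.exists():
            continue  # skip records whose files are not on disk
        yield original, edited, rec["edit"]
```

For example, `list(resolve_pairs(Path("Datasets"), "add"))` would return the available pairs for the add split under these assumptions.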

🎖️ AnyEdit Editing Results (Part Ⅰ)

Edit Type | Edit Instruction
Action Change | Make the action of the plane to taking off
Add | Include a candle on top of the cake
Appearance Alter | Make the horses wearing garlands
Background Change | Alter the background to a garden
Color Alter | Alter the color of frame to orange
Counting | The number of camels increases to two
Implicit Change | What will happen if the sun never go down?
Material Change | Change the material of kitten like aluminium_foil
Movement | Shift the man in the image
Outpaint | Outpaint the image as you can
Relation | Place two yellow flowers in the middle of the table
Remove | Remove the person on skis
Replace | Replace the elephant with a seal
Resize | Zoom out the giraffes in the image
Rotation Change | Turn the bag counterclockwise
Style Change | Change the style of the image to contrast
Textual Change | Replace the text 'eddie' with 'stobart'
Tune Transfer | Change the season to autumn

🎖️ AnyEdit Editing Results (Part Ⅱ)

Edit Type | Edit Instruction
Visual Bbox | Follow the given bounding box [v*] to remove the skis
Visual Depth | Refer to the given depth image [v*] to remove umbrella
Visual Material Transfer | Change the material of monument like linen
Visual Reference | Replace the elephants to [v*]
Visual Scribble | Refer to the given scribble [v*] to replace the toilet paper with a book
Visual Segment | Follow the given segment image [v*] to remove truck
Visual Sketch | Watch the given sketch [v*] to replace the bananas to apples

📜 Citation

If you find this work useful for your research, please cite our paper and star our git repo:

@article{yu2024anyedit,
  title={AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea},
  author={Yu, Qifan and Chow, Wei and Yue, Zhongqi and Pan, Kaihang and Wu, Yang and Wan, Xiaoyang and Li, Juncheng and Tang, Siliang and Zhang, Hanwang and Zhuang, Yueting},
  journal={arXiv preprint arXiv:2411.15738},
  year={2024}
}