Papers

July 15, 2025


License: MIT

The repository is based on our survey Diffusion Model-Based Image Editing: A Survey (TPAMI 2025).

Yi Huang*, Jiancheng Huang*, Yifan Liu*, Mingfu Yan*, Jiaxi Lv*, Jianzhuang Liu*, Wei Xiong, He Zhang, Liangliang Cao, Shifeng Chen

Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS), Adobe Inc, Apple Inc, Southern University of Science and Technology (SUSTech)

Abstract

Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning to reverse the process of gradually adding noise to images, allowing them to generate high-quality samples from a complex distribution. In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field. We delve into a thorough analysis and categorization of these works from multiple perspectives, including learning strategies, user-input conditions, and the array of specific editing tasks that can be accomplished. In addition, we pay special attention to image inpainting and outpainting, and explore both earlier traditional context-driven and current multimodal conditional methods, offering a comprehensive analysis of their methodologies. To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval, featuring an innovative metric, LMM Score. Finally, we address current limitations and envision some potential directions for future research.
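The "gradually adding noise" process mentioned in the abstract can be illustrated with a small sketch. It assumes the standard DDPM forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with a linear variance schedule; the survey covers many variants, so the schedule values, function names, and the toy four-pixel "image" below are illustrative assumptions rather than anything prescribed by the paper.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * v + sigma * n for v, n in zip(x0, noise)]
    return xt, noise


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [0.5, -0.25, 1.0, 0.0]  # toy "image" of four pixel values (hypothetical)
xt, noise = add_noise(x0, t=999, alpha_bars=alpha_bars, rng=rng)
```

At the last step almost none of the original signal remains, so `xt` is close to pure Gaussian noise. A diffusion model is trained to undo this corruption step by step, typically by predicting `noise` from `xt` and `t`.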

🔖 News!!!

📌 We are actively tracking the latest research and welcome contributions to our repository and survey paper. If your work is relevant, please feel free to contact us.

📰 2025-02-11: 🥳 Congrats, our paper has been accepted by TPAMI 2025!!

📰 2024-10-25: Our benchmark EditEval_v2 is now released.

📰 2024-03-22: The template for computing LMM Score with GPT-4V, along with a corresponding leaderboard comparing several leading methods, is released.

📰 2024-03-14: Our benchmark EditEval_v1 is now released.

📰 2024-03-06: We have established a template for paper submissions. It is accessible via the New Issue button within Issues or by clicking here. Once there, please select the Paper Submission Form and complete it following the guidelines provided.

📰 2024-02-28: Our comprehensive survey paper, summarizing related methods published before February 1, 2024, is now available.

🔍 BibTeX

If you find this work helpful in your research, please consider citing the paper and giving the repository a ⭐.

@article{huang2025diffusion,
  title={Diffusion Model-Based Image Editing: A Survey},
  author={Huang, Yi and Huang, Jiancheng and Liu, Yifan and Yan, Mingfu and Lv, Jiaxi and Liu, Jianzhuang and Xiong, Wei and Zhang, He and Cao, Liangliang and Chen, Shifeng},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2025},
  publisher={IEEE}
}

Papers

Training-Based

Training-Based: Domain-Specific Editing

| Title | Publication | Date |
|---|---|---|
| TexFit: Text-Driven Fashion Image Editing with Diffusion Models | AAAI 2024 | 2024.03 |
| CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation | NeurIPS 2023 | 2023.10 |
| Stylediffusion: Controllable disentangled style transfer via diffusion models | ICCV 2023 | 2023.08 |
| Hierarchical diffusion autoencoders and disentangled image manipulation | WACV 2024 | 2023.04 |
| Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models | arXiv 2023 | 2023.04 |
| Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models | CVPR workshop 2023 | 2022.12 |
| Diffstyler: Controllable dual diffusion for text-driven image stylization | TNNLS 2024 | 2022.11 |
| Diffusion Models Already Have A Semantic Latent Space | ICLR 2022 | 2022.10 |
| Egsde: Unpaired image-to-image translation via energy-guided stochastic differential equations | NeurIPS 2022 | 2022.07 |
| Diffusion autoencoders: Toward a meaningful and decodable representation | CVPR 2022 | 2021.11 |
| Unit-ddpm: Unpaired image translation with denoising diffusion probabilistic models | arXiv 2021 | 2021.04 |
| Diffusionclip: Text-guided diffusion models for robust image manipulation | CVPR 2022 | 2021.01 |

Training-Based: Reference and Attribute Guided Editing

| Title | Publication | Date |
|---|---|---|
| MagicEraser: Erasing Any Objects via Semantics-Aware Control | ECCV 2024 | 2024.10 |
| SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control | CVPR 2024 | 2023.12 |
| A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting | arXiv 2023 | 2023.12 |
| DreamInpainter: Text-Guided Subject-Driven Image Inpainting with Diffusion Models | arXiv 2023 | 2023.12 |
| Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model | ACM MM 2023 | 2023.10 |
| Face Aging via Diffusion-based Editing | BMVC 2023 | 2023.09 |
| Anydoor: Zero-shot object-level image customization | CVPR 2024 | 2023.07 |
| Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model | ICASSP 2024 | 2023.06 |
| Text-to-image editing by image information removal | WACV 2024 | 2023.05 |
| Reference-based Image Composition with Sketch via Structure-aware Diffusion Model | CVPR workshop 2023 | 2023.04 |
| PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor | CVPR 2024 | 2023.03 |
| Imagen editor and editbench: Advancing and evaluating text-guided image inpainting | CVPR 2023 | 2022.12 |
| Smartbrush: Text and shape guided object inpainting with diffusion model | CVPR 2023 | 2022.12 |
| ObjectStitch: Object Compositing With Diffusion Model | CVPR 2023 | 2022.12 |
| Paint by example: Exemplar-based image editing with diffusion models | CVPR 2023 | 2022.11 |

Training-Based: Instructional Editing

| Title | Publication | Date |
|---|---|---|
| UIP2P: Unsupervised Instruction-based Image Editing via Edit Reversibility Constraint | ICCV 2025 | 2024.12 |
| FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction | arXiv 2024 | 2024.09 |
| EditWorld: Simulating World Dynamics for Instruction-Following Image Editing | arXiv 2024 | 2024.05 |
| InstructGIE: Towards Generalizable Image Editing | arXiv 2024 | 2024.03 |
| SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models | CVPR 2024 | 2023.12 |
| InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following | arXiv 2023 | 2023.12 |
| Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation | CVPR 2024 | 2023.12 |
| Emu edit: Precise image editing via recognition and generation tasks | arXiv 2023 | 2023.11 |
| Guiding instruction-based image editing via multimodal large language models | ICLR 2024 | 2023.09 |
| Instructdiffusion: A generalist modeling interface for vision tasks | CVPR 2024 | 2023.09 |
| MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers | arXiv 2023 | 2023.09 |
| ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation | NeurIPS 2023 | 2023.08 |
| Inst-Inpaint: Instructing to Remove Objects with Diffusion Models | arXiv 2023 | 2023.04 |
| HIVE: Harnessing Human Feedback for Instructional Visual Editing | CVPR 2024 | 2023.03 |
| DialogPaint: A Dialog-based Image Editing Model | arXiv 2023 | 2023.01 |
| Learning to Follow Object-Centric Image Editing Instructions Faithfully | EMNLP 2023 | 2023.01 |
| Instructpix2pix: Learning to follow image editing instructions | CVPR 2023 | 2022.11 |

Training-Based: Pseudo-Target Retrieval-Based Editing

| Title | Publication | Date |
|---|---|---|
| Text-Driven Image Editing via Learnable Regions | CVPR 2024 | 2023.11 |
| iEdit: Localised Text-guided Image Editing with Weak Supervision | arXiv 2023 | 2023.05 |
| ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation | arXiv 2023 | 2023.05 |

Testing-Time Finetuning

Testing-Time Finetuning: Denoising Model Finetuning

| Title | Publication | Date |
|---|---|---|
| Kv inversion: Kv embeddings learning for text-conditioned real image action editing | arXiv 2023 | 2023.09 |
| Custom-edit: Text-guided image editing with customized diffusion models | CVPR workshop 2023 | 2023.05 |
| Unitune: Text-driven image editing by fine tuning an image generation model on a single image | ACM TOG 2023 | 2022.10 |

Testing-Time Finetuning: Embeddings Finetuning

| Title | Publication | Date |
|---|---|---|
| Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing | NeurIPS 2023 | 2023.09 |
| Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models | ICCV 2023 | 2023.05 |
| Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models | CVPR 2023 | 2022.12 |
| Null-text inversion for editing real images using guided diffusion models | CVPR 2023 | 2022.11 |

Testing-Time Finetuning: Guidance with Hypernetworks

| Title | Publication | Date |
|---|---|---|
| StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing | arXiv 2023 | 2023.05 |
| Inversion-based creativity transfer with diffusion models | CVPR 2023 | 2022.11 |

Testing-Time Finetuning: Latent Variable Optimization

| Title | Publication | Date |
|---|---|---|
| StableDrag: Stable Dragging for Point-based Image Editing | arXiv 2024 | 2024.03 |
| FreeDrag: Feature Dragging for Reliable Point-based Image Editing | CVPR 2024 | 2023.12 |
| Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing | CVPR 2024 | 2023.11 |
| MagicRemover: Tuning-free Text-guided Image inpainting with Diffusion Models | arXiv 2023 | 2023.10 |
| Dragondiffusion: Enabling drag-style manipulation on diffusion models | ICLR 2024 | 2023.07 |
| DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing | CVPR 2024 | 2023.06 |
| Delta denoising score | ICCV 2023 | 2023.04 |
| Directed Diffusion: Direct Control of Object Placement through Attention Guidance | AAAI 2024 | 2023.02 |
| Diffusion-based Image Translation using disentangled style and content representation | ICLR 2022 | 2022.09 |

Testing-Time Finetuning: Hybrid Finetuning

| Title | Publication | Date |
|---|---|---|
| Forgedit: Text Guided Image Editing via Learning and Forgetting | arXiv 2023 | 2023.09 |
| LayerDiffusion: Layered Controlled Image Editing with Diffusion Models | arXiv 2023 | 2023.05 |
| Sine: Single image editing with text-to-image diffusion models | CVPR 2023 | 2022.12 |
| Imagic: Text-Based Real Image Editing With Diffusion Models | CVPR 2023 | 2022.10 |

Training and Finetuning Free

Training and Finetuning Free: Input Text Refinement

| Title | Publication | Date |
|---|---|---|
| User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques | arXiv 2023 | 2023.06 |
| ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation | arXiv 2023 | 2023.05 |
| InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions | arXiv 2023 | 2023.05 |
| Preditor: Text guided image editing with diffusion prior | arXiv 2023 | 2023.02 |

Training and Finetuning Free: Inversion/Sampling Modification

| Title | Publication | Date |
|---|---|---|
| FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing | arXiv 2024 | 2024.12 |
| Inversion-Free Image Editing with Natural Language | CVPR 2024 | 2023.12 |
| Fixed-point Inversion for Text-to-image diffusion models | arXiv 2023 | 2023.12 |
| Tuning-Free Inversion-Enhanced Control for Consistent Image Editing | arXiv 2023 | 2023.12 |
| The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing | ICLR 2024 | 2023.11 |
| LEDITS++: Limitless Image Editing using Text-to-Image Models | CVPR 2024 | 2023.11 |
| A latent space of stochastic diffusion models for zero-shot image editing and guidance | ICCV 2023 | 2023.10 |
| Effective real image editing with accelerated iterative diffusion inversion | ICCV 2023 | 2023.09 |
| Fec: Three finetuning-free methods to enhance consistency for real image editing | arXiv 2023 | 2023.09 |
| Iterative multi-granular image editing using diffusion models | WACV 2024 | 2023.09 |
| ProxEdit: Improving Tuning-Free Real Image Editing With Proximal Guidance | WACV 2024 | 2023.06 |
| Diffusion self-guidance for controllable image generation | NeurIPS 2023 | 2023.06 |
| Diffusion Brush: A Latent Diffusion Model-based Editing Tool for AI-generated Images | arXiv 2023 | 2023.06 |
| Null-text guidance in diffusion models is secretly a cartoon-style creator | ACM MM 2023 | 2023.05 |
| Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models | arXiv 2023 | 2023.05 |
| An Edit Friendly DDPM Noise Space: Inversion and Manipulations | CVPR 2024 | 2023.04 |
| Training-Free Content Injection Using H-Space in Diffusion Models | WACV 2024 | 2023.03 |
| Edict: Exact diffusion inversion via coupled transformations | CVPR 2023 | 2022.11 |
| Direct inversion: Optimization-free text-driven real image editing with diffusion models | arXiv 2022 | 2022.11 |

Training and Finetuning Free: Attention Modification

| Title | Publication | Date |
|---|---|---|
| KV-Edit: Training-Free Image Editing for Precise Background Preservation | arXiv 2025 | 2025.02 |
| LIME: Localized Image Editing via Attention Regularization in Diffusion Models | WACV 2025 | 2024.12 |
| Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing | CVPR 2024 | 2024.03 |
| HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models | arXiv 2023 | 2023.12 |
| Tf-icon: Diffusion-based training-free cross-domain image composition | ICCV 2023 | 2023.07 |
| Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models | NeurIPS 2023 | 2023.06 |
| Conditional Score Guidance for Text-Driven Image-to-Image Translation | NeurIPS 2023 | 2023.05 |
| MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing | ICCV 2023 | 2023.04 |
| Localizing Object-level Shape Variations with Text-to-Image Diffusion Models | ICCV 2023 | 2023.03 |
| Zero-shot image-to-image translation | ACM SIGGRAPH 2023 | 2023.02 |
| Shape-Guided Diffusion With Inside-Outside Attention | WACV 2024 | 2022.12 |
| Plug-and-play diffusion features for text-driven image-to-image translation | CVPR 2023 | 2022.11 |
| Prompt-to-prompt image editing with cross attention control | ICLR 2023 | 2022.08 |

Training and Finetuning Free: Mask Guidance

| Title | Publication | Date |
|---|---|---|
| Grounded-Instruct-Pix2Pix: Improving Instruction Based Image Editing with Automatic Target Grounding | ICASSP 2024 | 2024.03 |
| MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance | ACM MM 2024 | 2023.12 |
| ZONE: Zero-Shot Instruction-Guided Local Editing | CVPR 2024 | 2023.12 |
| Watch your steps: Local image and scene editing by text instructions | arXiv 2023 | 2023.08 |
| Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models | NeurIPS 2023 | 2023.06 |
| Differential Diffusion: Giving Each Pixel Its Strength | arXiv 2023 | 2023.06 |
| PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing | arXiv 2023 | 2023.06 |
| FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference | AAAI 2024 | 2023.05 |
| Inpaint anything: Segment anything meets image inpainting | arXiv 2023 | 2023.04 |
| Region-aware diffusion for zero-shot text-driven image editing | CVM 2023 | 2023.02 |
| Text-guided mask-free local image retouching | ICME 2023 | 2022.12 |
| Blended diffusion for text-driven editing of natural images | CVPR 2022 | 2021.11 |
| DiffEdit: Diffusion-based semantic image editing with mask guidance | ICLR 2023 | 2022.10 |
| Blended latent diffusion | SIGGRAPH 2023 | 2022.06 |

Training and Finetuning Free: Multi-Noise Redirection

| Title | Publication | Date |
|---|---|---|
| Object-aware Inversion and Reassembly for Image Editing | ICLR 2024 | 2023.10 |
| Ledits: Real image editing with ddpm inversion and semantic guidance | arXiv 2023 | 2023.07 |
| Sega: Instructing diffusion using semantic dimensions | NeurIPS 2023 | 2023.01 |
| The stable artist: Steering semantics in diffusion latent space | arXiv 2022 | 2022.12 |

Benchmark EditEval_v1

EditEval_v1 is a benchmark for evaluating general diffusion-model-based image editing algorithms. It contains 50 high-quality images selected from Unsplash, each accompanied by a source text prompt, a target editing prompt, and a text editing instruction generated by GPT-4V. The benchmark covers the seven most popular editing tasks across the semantic, stylistic, and structural editing categories defined in our paper: object addition, object replacement, object removal, background change, overall style change, texture change, and action change. Click here to download this dataset!

Benchmark EditEval_v2

EditEval_v2 is an enhanced benchmark designed to evaluate general diffusion-model-based image editing algorithms. This version expands upon its predecessor by including 150 high-quality images selected from Unsplash. Each image is paired with a source text prompt, a target editing prompt, and a text editing instruction generated by GPT-4V. EditEval_v2 continues to cover the seven most popular specific editing tasks across semantic, stylistic, and structural editing as defined in our paper: object addition, object replacement, object removal, background change, overall style change, texture change, and action change. Click here to download this dataset!
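As a rough illustration of the benchmark layout described above, one entry could be modelled as a small record holding the image, the two prompts, the instruction, and the task label. The class name, field names, file path, and sample prompts below are hypothetical and do not reflect the actual file format of the released dataset; only the seven task names come from the description.

```python
from collections import defaultdict
from dataclasses import dataclass

# The seven editing tasks named in the benchmark description.
TASKS = (
    "object addition",
    "object replacement",
    "object removal",
    "background change",
    "overall style change",
    "texture change",
    "action change",
)


@dataclass(frozen=True)
class EditEvalEntry:
    """One benchmark item (hypothetical schema, not the official format)."""
    image_path: str
    source_prompt: str
    target_prompt: str
    instruction: str
    task: str

    def __post_init__(self):
        # Reject labels outside the seven documented tasks.
        if self.task not in TASKS:
            raise ValueError(f"unknown editing task: {self.task!r}")


def group_by_task(entries):
    """Bucket entries by task, e.g. to report per-task scores."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.task].append(entry)
    return dict(grouped)


# Made-up sample entry for illustration only.
example = EditEvalEntry(
    image_path="images/0001.jpg",
    source_prompt="a dog sitting on a bench",
    target_prompt="a cat sitting on a bench",
    instruction="Replace the dog with a cat.",
    task="object replacement",
)
by_task = group_by_task([example])
```

Grouping by task this way makes it straightforward to evaluate an editing method separately on each of the seven tasks before averaging.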

Leaderboard

To make LMM Score easy to apply, we provide a template for computing it with GPT-4V. The template comes with step-by-step instructions and all required materials. We also maintain a leaderboard comparing representative methods evaluated with LMM Score on our EditEval_v1 benchmark, which can be found here.
