Papers

July 15, 2025

The repository is based on our survey Diffusion Model-Based Image Editing: A Survey (TPAMI 2025).

Yi Huang*, Jiancheng Huang*, Yifan Liu*, Mingfu Yan*, Jiaxi Lv*, Jianzhuang Liu*, Wei Xiong, He Zhang, Liangliang Cao, Shifeng Chen

Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS), Adobe Inc, Apple Inc, Southern University of Science and Technology (SUSTech)

Abstract

Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning to reverse the process of gradually adding noise to images, allowing them to generate high-quality samples from a complex distribution. In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field. We delve into a thorough analysis and categorization of these works from multiple perspectives, including learning strategies, user-input conditions, and the array of specific editing tasks that can be accomplished. In addition, we pay special attention to image inpainting and outpainting, and explore both earlier traditional context-driven and current multimodal conditional methods, offering a comprehensive analysis of their methodologies. To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval, featuring an innovative metric, LMM Score. Finally, we address current limitations and envision some potential directions for future research.

🔖 News!!!

📌 We are actively tracking the latest research and welcome contributions to our repository and survey paper. If your studies are relevant, please feel free to contact us.

📰 2025-02-11: 🥳 Congrats, our paper has been accepted by TPAMI 2025!!

📰 2024-10-25: Our benchmark EditEval_v2 is now released.

📰 2024-03-22: The template for computing LMM Score using GPT-4V, along with a corresponding leaderboard comparing several leading methods, is released.

📰 2024-03-14: Our benchmark EditEval_v1 is now released.

📰 2024-03-06: We establish a template for paper submissions. This template is accessible by navigating to the New Issue button within Issues or by clicking here. Once there, please select the Paper Submission Form and complete it following the guidelines provided.

📰 2024-02-28: Our comprehensive survey paper, summarizing related methods published before February 1, 2024, is now available.

🔍 BibTeX

If you find this work helpful in your research, please consider citing the paper and giving the repository a ⭐.

@article{huang2025diffusion,
  title={Diffusion Model-Based Image Editing: A Survey},
  author={Huang, Yi and Huang, Jiancheng and Liu, Yifan and Yan, Mingfu and Lv, Jiaxi and Liu, Jianzhuang and Xiong, Wei and Zhang, He and Cao, Liangliang and Chen, Shifeng},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2025},
  publisher={IEEE}
}

Papers
Benchmark EditEval_v1
Template of Computing LMM Score
Leaderboard
Star History

Papers

Training-Based

Training-Based: Domain-Specific Editing

Title	Publication	Date
TexFit: Text-Driven Fashion Image Editing with Diffusion Models	AAAI 2024	2024.03
CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation	NeurIPS 2023	2023.10
Stylediffusion: Controllable disentangled style transfer via diffusion models	ICCV 2023	2023.08
Hierarchical diffusion autoencoders and disentangled image manipulation	WACV 2024	2023.04
Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models	arXiv 2023	2023.04
Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models	CVPR workshop 2023	2022.12
Diffstyler: Controllable dual diffusion for text-driven image stylization	TNNLS 2024	2022.11
Diffusion Models Already Have A Semantic Latent Space	ICLR 2023	2022.10
Egsde: Unpaired image-to-image translation via energy-guided stochastic differential equations	NeurIPS 2022	2022.07
Diffusion autoencoders: Toward a meaningful and decodable representation	CVPR 2022	2021.11
Diffusionclip: Text-guided diffusion models for robust image manipulation	CVPR 2022	2021.10
Unit-ddpm: Unpaired image translation with denoising diffusion probabilistic models	arXiv 2021	2021.04

Training-Based: Reference and Attribute Guided Editing

Title	Publication	Date
MagicEraser: Erasing Any Objects via Semantics-Aware Control	ECCV 2024	2024.10
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control	CVPR 2024	2023.12
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting	arXiv 2023	2023.12
DreamInpainter: Text-Guided Subject-Driven Image Inpainting with Diffusion Models	arXiv 2023	2023.12
Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model	ACM MM 2023	2023.10
Face Aging via Diffusion-based Editing	BMVC 2023	2023.09
Anydoor: Zero-shot object-level image customization	CVPR 2024	2023.07
Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model	ICASSP 2024	2023.06
Text-to-image editing by image information removal	WACV 2024	2023.05
Reference-based Image Composition with Sketch via Structure-aware Diffusion Model	CVPR workshop 2023	2023.04
PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor	CVPR 2024	2023.03
Imagen editor and editbench: Advancing and evaluating text-guided image inpainting	CVPR 2023	2022.12
Smartbrush: Text and shape guided object inpainting with diffusion model	CVPR 2023	2022.12
ObjectStitch: Object Compositing With Diffusion Model	CVPR 2023	2022.12
Paint by example: Exemplar-based image editing with diffusion models	CVPR 2023	2022.11

Training-Based: Instructional Editing

Title	Publication	Date
UIP2P: Unsupervised Instruction-based Image Editing via Edit Reversibility Constraint	ICCV 2025	2024.12
FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction	arXiv 2024	2024.09
EditWorld: Simulating World Dynamics for Instruction-Following Image Editing	arXiv 2024	2024.05
InstructGIE: Towards Generalizable Image Editing	arXiv 2024	2024.03
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models	CVPR 2024	2023.12
InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following	arXiv 2023	2023.12
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation	CVPR 2024	2023.12
Emu edit: Precise image editing via recognition and generation tasks	arXiv 2023	2023.11
Guiding instruction-based image editing via multimodal large language models	ICLR 2024	2023.09
Instructdiffusion: A generalist modeling interface for vision tasks	CVPR 2024	2023.09
MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers	arXiv 2023	2023.09
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation	NeurIPS 2023	2023.08
Inst-Inpaint: Instructing to Remove Objects with Diffusion Models	arXiv 2023	2023.04
HIVE: Harnessing Human Feedback for Instructional Visual Editing	CVPR 2024	2023.03
DialogPaint: A Dialog-based Image Editing Model	arXiv 2023	2023.01
Learning to Follow Object-Centric Image Editing Instructions Faithfully	EMNLP 2023	2023.01
Instructpix2pix: Learning to follow image editing instructions	CVPR 2023	2022.11

Training-Based: Pseudo-Target Retrieval-Based Editing

Title	Publication	Date
Text-Driven Image Editing via Learnable Regions	CVPR 2024	2023.11
iEdit: Localised Text-guided Image Editing with Weak Supervision	arXiv 2023	2023.05
ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation	arXiv 2023	2023.05

Testing-Time Finetuning

Testing-Time Finetuning: Denoising Model Finetuning

Title	Publication	Date
Kv inversion: Kv embeddings learning for text-conditioned real image action editing	arXiv 2023	2023.09
Custom-edit: Text-guided image editing with customized diffusion models	CVPR workshop 2023	2023.05
Unitune: Text-driven image editing by fine tuning an image generation model on a single image	ACM TOG 2023	2022.10

Testing-Time Finetuning: Embeddings Finetuning

Title	Publication	Date
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing	NeurIPS 2023	2023.09
Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models	ICCV 2023	2023.05
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models	CVPR 2023	2022.12
Null-text inversion for editing real images using guided diffusion models	CVPR 2023	2022.11

Testing-Time Finetuning: Guidance with Hypernetworks

Title	Publication	Date
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing	arXiv 2023	2023.05
Inversion-based creativity transfer with diffusion models	CVPR 2023	2022.11

Testing-Time Finetuning: Latent Variable Optimization

Title	Publication	Date
StableDrag: Stable Dragging for Point-based Image Editing	arXiv 2024	2024.03
FreeDrag: Feature Dragging for Reliable Point-based Image Editing	CVPR 2024	2023.12
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing	CVPR 2024	2023.11
MagicRemover: Tuning-free Text-guided Image inpainting with Diffusion Models	arXiv 2023	2023.10
Dragondiffusion: Enabling drag-style manipulation on diffusion models	ICLR 2024	2023.07
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing	CVPR 2024	2023.06
Delta denoising score	ICCV 2023	2023.04
Directed Diffusion: Direct Control of Object Placement through Attention Guidance	AAAI 2024	2023.02
Diffusion-based Image Translation using disentangled style and content representation	ICLR 2023	2022.09

Testing-Time Finetuning: Hybrid Finetuning

Title	Publication	Date
Forgedit: Text Guided Image Editing via Learning and Forgetting	arXiv 2023	2023.09
LayerDiffusion: Layered Controlled Image Editing with Diffusion Models	arXiv 2023	2023.05
Sine: Single image editing with text-to-image diffusion models	CVPR 2023	2022.12
Imagic: Text-Based Real Image Editing With Diffusion Models	CVPR 2023	2022.10

Training and Finetuning Free

Training and Finetuning Free: Input Text Refinement

Title	Publication	Date
User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques	arXiv 2023	2023.06
ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation	arXiv 2023	2023.05
InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions	arXiv 2023	2023.05
Preditor: Text guided image editing with diffusion prior	arXiv 2023	2023.02

Training and Finetuning Free: Inversion/Sampling Modification

Title	Publication	Date
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing	arXiv 2024	2024.12
Inversion-Free Image Editing with Natural Language	CVPR 2024	2023.12
Fixed-point Inversion for Text-to-image diffusion models	arXiv 2023	2023.12
Tuning-Free Inversion-Enhanced Control for Consistent Image Editing	arXiv 2023	2023.12
The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing	ICLR 2024	2023.11
LEDITS++: Limitless Image Editing using Text-to-Image Models	CVPR 2024	2023.11
A latent space of stochastic diffusion models for zero-shot image editing and guidance	ICCV 2023	2023.10
Effective real image editing with accelerated iterative diffusion inversion	ICCV 2023	2023.09
Fec: Three finetuning-free methods to enhance consistency for real image editing	arXiv 2023	2023.09
Iterative multi-granular image editing using diffusion models	WACV 2024	2023.09
ProxEdit: Improving Tuning-Free Real Image Editing With Proximal Guidance	WACV 2024	2023.06
Diffusion self-guidance for controllable image generation	NeurIPS 2023	2023.06
Diffusion Brush: A Latent Diffusion Model-based Editing Tool for AI-generated Images	arXiv 2023	2023.06
Null-text guidance in diffusion models is secretly a cartoon-style creator	ACM MM 2023	2023.05
Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models	arXiv 2023	2023.05
An Edit Friendly DDPM Noise Space: Inversion and Manipulations	CVPR 2024	2023.04
Training-Free Content Injection Using H-Space in Diffusion Models	WACV 2024	2023.03
Edict: Exact diffusion inversion via coupled transformations	CVPR 2023	2022.11
Direct inversion: Optimization-free text-driven real image editing with diffusion models	arXiv 2022	2022.11

Training and Finetuning Free: Attention Modification

Title	Publication	Date
KV-Edit: Training-Free Image Editing for Precise Background Preservation	arXiv 2025	2025.02
LIME: Localized Image Editing via Attention Regularization in Diffusion Models	WACV 2025	2024.12
Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing	CVPR 2024	2024.03
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models	arXiv 2023	2023.12
Tf-icon: Diffusion-based training-free cross-domain image composition	ICCV 2023	2023.07
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models	NeurIPS 2023	2023.06
Conditional Score Guidance for Text-Driven Image-to-Image Translation	NeurIPS 2023	2023.05
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing	ICCV 2023	2023.04
Localizing Object-level Shape Variations with Text-to-Image Diffusion Models	ICCV 2023	2023.03
Zero-shot image-to-image translation	ACM SIGGRAPH 2023	2023.02
Shape-Guided Diffusion With Inside-Outside Attention	WACV 2024	2022.12
Plug-and-play diffusion features for text-driven image-to-image translation	CVPR 2023	2022.11
Prompt-to-prompt image editing with cross attention control	ICLR 2023	2022.08

Training and Finetuning Free: Mask Guidance

Title	Publication	Date
Grounded-Instruct-Pix2Pix: Improving Instruction Based Image Editing with Automatic Target Grounding	ICASSP 2024	2024.03
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance	ACM MM 2024	2023.12
ZONE: Zero-Shot Instruction-Guided Local Editing	CVPR 2024	2023.12
Watch your steps: Local image and scene editing by text instructions	arXiv 2023	2023.08
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models	NeurIPS 2023	2023.06
Differential Diffusion: Giving Each Pixel Its Strength	arXiv 2023	2023.06
PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing	arXiv 2023	2023.06
FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference	AAAI 2024	2023.05
Inpaint anything: Segment anything meets image inpainting	arXiv 2023	2023.04
Region-aware diffusion for zero-shot text-driven image editing	CVM 2023	2023.02
Text-guided mask-free local image retouching	ICME 2023	2022.12
DiffEdit: Diffusion-based semantic image editing with mask guidance	ICLR 2023	2022.10
Blended latent diffusion	SIGGRAPH 2023	2022.06
Blended diffusion for text-driven editing of natural images	CVPR 2022	2021.11

Training and Finetuning Free: Multi-Noise Redirection

Title	Publication	Date
Object-aware Inversion and Reassembly for Image Editing	ICLR 2024	2023.10
Ledits: Real image editing with ddpm inversion and semantic guidance	arXiv 2023	2023.07
Sega: Instructing diffusion using semantic dimensions	NeurIPS 2023	2023.01
The stable artist: Steering semantics in diffusion latent space	arXiv 2022	2022.12

Benchmark EditEval_v1

EditEval_v1 is a benchmark tailored for evaluating general diffusion-model-based image editing algorithms. It contains 50 high-quality images selected from Unsplash, each accompanied by a source text prompt, a target editing prompt, and a text editing instruction generated by GPT-4V. The benchmark covers the seven most popular specific editing tasks across semantic, stylistic, and structural editing as defined in our paper: object addition, object replacement, object removal, background change, overall style change, texture change, and action change. Click here to download this dataset!

Benchmark EditEval_v2

EditEval_v2 is an enhanced benchmark designed to evaluate general diffusion-model-based image editing algorithms. This version expands upon its predecessor by including 150 high-quality images selected from Unsplash. Each image is paired with a source text prompt, a target editing prompt, and a text editing instruction generated by GPT-4V. EditEval_v2 continues to cover the seven most popular specific editing tasks across semantic, stylistic, and structural editing as defined in our paper: object addition, object replacement, object removal, background change, overall style change, texture change, and action change. Click here to download this dataset!
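
As a rough illustration of the benchmark records described above, the following Python sketch models one entry as an image paired with a source prompt, a target prompt, an editing instruction, and one of the seven editing tasks. The class names, field names, file paths, and example prompts are hypothetical and are not taken from the released dataset, and the assumption that each image carries exactly one task label is ours; check the downloaded files for the actual layout.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class EditTask(Enum):
    """The seven editing tasks listed in the benchmark description."""
    OBJECT_ADDITION = "object addition"
    OBJECT_REPLACEMENT = "object replacement"
    OBJECT_REMOVAL = "object removal"
    BACKGROUND_CHANGE = "background change"
    STYLE_CHANGE = "overall style change"
    TEXTURE_CHANGE = "texture change"
    ACTION_CHANGE = "action change"


@dataclass(frozen=True)
class EditEvalSample:
    """One benchmark entry (hypothetical field names, not the dataset schema)."""
    image_path: str      # path to the source image
    source_prompt: str   # text describing the original image
    target_prompt: str   # text describing the desired edited image
    instruction: str     # editing instruction in natural language
    task: EditTask       # assumed: one task label per image


def count_by_task(samples):
    """Return how many samples fall under each editing task."""
    return Counter(sample.task for sample in samples)


# Made-up entries for illustration only; they are not part of EditEval.
samples = [
    EditEvalSample(
        image_path="images/001.jpg",
        source_prompt="a dog sitting on the grass",
        target_prompt="a cat sitting on the grass",
        instruction="Replace the dog with a cat.",
        task=EditTask.OBJECT_REPLACEMENT,
    ),
    EditEvalSample(
        image_path="images/002.jpg",
        source_prompt="a wooden table in a kitchen",
        target_prompt="a marble table in a kitchen",
        instruction="Change the table texture to marble.",
        task=EditTask.TEXTURE_CHANGE,
    ),
]

for task, n in count_by_task(samples).items():
    print(f"{task.value}: {n}")
```

Grouping samples by task in this way makes it straightforward to report results per editing task rather than only as a single aggregate.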

Leaderboard

To make LMM Score easy to apply, we provide a comprehensive template for implementing it with GPT-4V. The template comes with step-by-step instructions and all required materials. Additionally, we construct a leaderboard comparing various representative methods evaluated using LMM Score on our EditEval_v1 benchmark, which can be found here.