
March 19, 2025

EEdit ⚡️: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing

Zexuan Yan*  •  Yue Ma*  •  Chang Zou  •  Wenteng Chen  •  Qifeng Chen  •  Linfeng Zhang
* Equal Contribution  ·  Corresponding Author

arXiv · Project Page

📝 Introduction

EEdit Teaser

Inversion-based image editing is rapidly gaining momentum, but it suffers from significant computational overhead, hindering its application in real-time interactive scenarios. In this paper, we observe that redundancy in inversion-based image editing exists in both the spatial and temporal dimensions, such as unnecessary computation in unedited regions and redundancy in the inversion process.

To tackle these challenges, we propose a practical framework, named EEdit, for efficient image editing. Specifically, we introduce three techniques to address these issues one by one:

  • For spatial redundancy, spatial locality caching computes the edited region and its neighboring regions while skipping the unedited regions
  • Token index preprocessing further accelerates the caching
  • For temporal redundancy, inversion step skipping reuses latents for efficient editing
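The spatial locality caching idea can be illustrated with a toy sketch (hypothetical and simplified, not the repo's actual implementation): tokens inside the edit mask, plus their neighbors within an L1 radius, are recomputed each step, while all other tokens reuse cached features.

```python
# Toy sketch of spatial locality caching (hypothetical, not EEdit's API).
# Tokens in the edit mask and their L1-distance-<=k neighbors are marked
# for recomputation; everything else would reuse cached features.

def tokens_to_refresh(mask, k):
    """mask: 2D list of 0/1 over the token grid; k: L1 neighborhood radius."""
    h, w = len(mask), len(mask[0])
    refresh = set()
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue
            # Enumerate all cells within L1 distance k of the edited token.
            for di in range(-k, k + 1):
                for dj in range(-k + abs(di), k - abs(di) + 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        refresh.add((ni, nj))
    return refresh

# 4x4 token grid with a single edited token at (1, 1); radius 1 adds its
# four direct neighbors, so 5 of 16 tokens are recomputed.
mask = [[0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
print(sorted(tokens_to_refresh(mask, 1)))
# → [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1)]
```

The radius here plays the role that the `cascade_num` parameter plays in the configuration tables below: a larger neighborhood refreshes more tokens around the edit at the cost of speed.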

Our experiments demonstrate an average of 2.46X acceleration without performance drop across a wide range of editing tasks, including prompt-guided image editing, drag-guided editing, and image composition.

🛠️ Installation

conda create -n eedit python=3.12.3
conda activate eedit
pip install -r EEdit/requirements.txt

📥 Checkpoints & Datasets

All model weights and datasets come from open-source, free, and publicly available channels:

We use FLUX-dev as our experimental model. You can obtain it from its official release channels.

We use PIE-BENCH as the prompt-guided dataset; you can refer to the link.

We use the TF-ICON benchmark as the reference-guided dataset; you can refer to the link.

We use DragBench-DR and DragBench-SR as the drag-guided datasets; you can refer to the link.

For masks generated from mapping_file.json, we provide a script as follows:

python MyCodes/myutils.py

When all the data and checkpoints are ready, please arrange them according to the directory structure below so that they are compatible with the scripts.

📁 Checkpoints Structure
weights
├── flux1-dev.safetensors
├── model_index.json
├── scheduler
│   └── scheduler_config.json
├── sd_vae_ft_mse
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── text_encoder
│   ├── config.json
│   └── model.safetensors
├── text_encoder_2
│   ├── config.json
│   ├── model-00001-of-00002.safetensors
│   ├── model-00002-of-00002.safetensors
│   ├── model.safetensors.index-1.json
│   └── model.safetensors.index.json
├── tokenizer
│   ├── merges-1.txt
│   ├── merges.txt
│   ├── special_tokens_map-1.json
│   ├── special_tokens_map.json
│   ├── tokenizer_config-1.json
│   ├── tokenizer_config.json
│   ├── vocab-1.json
│   └── vocab.json
├── tokenizer_2
│   ├── special_tokens_map-1.json
│   ├── special_tokens_map.json
│   ├── spiece.model
│   ├── tokenizer.json
│   └── tokenizer_config.json
├── transformer_config.json
└── vae
    ├── config.json
    └── diffusion_pytorch_model.safetensors
📁 Datasets Structure
input
├── composition
│   ├── Real-Cartoon
│   │   └── 0000 a cartoon animation of a sheep in the forest
│   │       ├── bg58.png
│   │       ├── cp_bg_fg.jpg
│   │       ├── dccf_image.jpg
│   │       ├── fg35_63d22cda1f5b66e8e5aca776.jpg
│   │       ├── fg35_mask.png
│   │       └── mask_bg_fg.jpg
│   ├── Real-Painting
│   ├── Real-Sketch
│   └── Real-Real
│       ...
├── drag_data
│   ├── dragbench-dr
│   │   └── animals
│   │       └── JH_2023-09-14-1820-16
│   │           ├── meta_data.pkl
│   │           ├── meta_data_region.pkl
│   │           ├── original_image.png
│   │           └── user_drag.png
│   └── dragbench-sr
│       └── art_0
│           ├── meta_data.pkl
│           ├── meta_data_region.pkl
│           ├── original_image.png
│           └── user_drag.png
│       ...
├── inpaint
│   ├── annotation_images
│   │   ├── 0_random_140
│   │   │   └── 000000000000.jpg
│   │   ├── 1_change_object_80
│   │   │   └── 1_artificial
│   │   │       ├── 1_animal
│   │   │       │   └── 111000000000.jpg
│   │   │       └── 2_human
│   │   │           └── 112000000000.jpg
│   │   └── 2_add_object_80
│   │       └── 1_artificial
│   │           └── 1_animal
│   │               └── 211000000000.jpg
│   │       ...
│   ├── mapping_file.json
│   └── masks
│       ├── mask-000.png
│       ├── mask-001.png
│       └── ...

🚀 Generation

cd EEdit && source run_gen.sh

✨ Hyper-parameters Guidance

Editing results are affected by many parameters and even the random seed. We do our best to explain the parameters and options that may affect image quality, so that you can produce satisfactory editing results yourself.

Reference-guided editing (Image Composition)
| Parameter | Value | Description |
| --- | --- | --- |
| eta | 0.6 | Controls the strength of inversion injection during denoising. Higher values preserve more of the original image. |
| gamma | 0.6 | Controls the strength of inversion injection during denoising. Higher values preserve more of the original image. |
| blend_ratio | 0 | Deprecated parameter, not in use. |
| start_timestep | 0 | Fixed parameter, no adjustment needed. |
| stop_timestep | 10 | Number of timesteps during which inversion affects denoising. Higher values produce output closer to the original image. |
| use_rf_inversion | true | Fixed parameter, keep as true. |
| use_cache | true / false | Enable caching to accelerate inference; false runs the non-accelerated pipeline. |
| num_inference_steps | 28 | Number of inference steps in the diffusion process. |
| cascade_num | 1/3/5 | Controls the region score bonus for K-L1-distance neighboring regions. No adjustment needed. |
| fresh_ratio | 0.1 | Cache refresh ratio. Fixed parameter; higher is not always better. |
| fresh_threshold | 1/2/3 | Complete refresh interval. Set to 2 if edit results don't follow instructions well. Setting it to 1 disables acceleration. |
| soft_fresh_weight | 0.25 | Fixed parameter, no adjustment needed. |
| tailing_step | 1 | Fixed parameter; higher values reduce speed. |
| inv_skip | 2/3/4 | Inversion step-skipping interval. The default value of 2 is fine. |
| cache_type | "ours_predefine" / "ours_cache" | "ours_predefine" uses token index preprocessing for faster speed; "ours_cache" disables preprocessing. |
Prompt-guided editing
| Parameter | Value | Description |
| --- | --- | --- |
| use_cache | true | Enable the caching mechanism to accelerate inference |
| num_inference_steps | 28 | Total number of denoising steps in the diffusion process |
| cascade_num | 1/3/5 | Number of cascade levels for region scoring |
| fresh_ratio | 0.1 | Ratio of cache entries to refresh each step; higher values reduce speed |
| fresh_threshold | 2/3 | Interval for a complete cache refresh |
| soft_fresh_weight | 0.25 | Weight factor for soft cache refreshing |
| tailing_step | 1 | Step interval for tailing cache updates |
| strength | 1.0 | Fixed. Overall strength of the editing effect |
| inv_skip | 2/3 | Interval for skipping inversion steps |
| eta/gamma | 0.7 | Controls the strength of inversion injection during denoising |
| stop_timestep | 6 | Timestep at which inversion stops influencing denoising |
| mask_timestep | 18 | Timestep at which the editing mask stops being applied |
| cache_type | "ours_predefine" / "ours_cache" | Cache strategy type: "ours_predefine" uses TIP while "ours_cache" does not |
Drag-guided editing (some parameters are omitted)
| Parameter | Value | Description |
| --- | --- | --- |
| drag_class | "copy" / "cut" | Controls region handling: "copy" preserves source-region latents, "cut" swaps source- and target-region latents |
| t_prime_ratio | 0.5 | Controls dragging strength in the (0, 1) range. Higher values reduce dragging strength |
| alpha | 1 | Noise blending ratio in the (0, 1) range applied to inversion latents |
| inv_skip | 2/3 | Interval for skipping inversion steps |
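The difference between the two `drag_class` modes can be illustrated on a toy 1-D "latent" (a hypothetical simplification; the real operation acts on spatial latent regions):

```python
# Toy illustration of drag_class "copy" vs "cut" (hypothetical, simplified
# to a 1-D latent): "copy" keeps the source-region latents in place and
# writes them to the target as well; "cut" swaps source and target latents.

def apply_drag(latent, src, tgt, drag_class):
    out = list(latent)
    if drag_class == "copy":
        for s, t in zip(src, tgt):
            out[t] = latent[s]                     # source preserved, copied to target
    elif drag_class == "cut":
        for s, t in zip(src, tgt):
            out[t], out[s] = latent[s], latent[t]  # source and target swapped
    return out

lat = ["a", "b", "c", "d"]
print(apply_drag(lat, src=[0], tgt=[2], drag_class="copy"))  # → ['a', 'b', 'a', 'd']
print(apply_drag(lat, src=[0], tgt=[2], drag_class="cut"))   # → ['c', 'b', 'a', 'd']
```

In practice, "copy" duplicates content at the drag target while keeping the source intact, whereas "cut" moves it, leaving the target's original content at the source.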

📝 TODO List

  • Release evaluation code
  • Release notebook and Hugging Face demo
  • Develop more user-friendly interaction logic and experience, including Gradio interface

🙏 Acknowledgements

  • Thanks to ToCa for cache implementations
  • Thanks to Diffusers for pipeline implementations
  • Thanks to Region Drag for dragging implementations

📝 BibTeX

@misc{yan2025eeditrethinkingspatial,
      title={EEdit : Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing}, 
      author={Zexuan Yan and Yue Ma and Chang Zou and Wenteng Chen and Qifeng Chen and Linfeng Zhang},
      year={2025},
      eprint={2503.10270},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.10270}, 
}

📧 Contact

yzx_ustc@mail.ustc.edu.cn