BrushEdit
September 3, 2025 · View on GitHub
Please check out our latest DiT-based image customization project IC-Custom, which provides powerful ID-consistent editing capabilities!
This repository contains the implementation of "BrushEdit: All-In-One Image Inpainting and Editing".
Keywords: Image Inpainting, Image Generation, Image Editing, Diffusion Models, MLLM Agent, Instruction-based Editing
TL;DR: BrushEdit is an advanced, unified AI agent for image inpainting and editing.
Main Elements: fully automated / interactive editing.
Yaowei Li1*, Yuxuan Bian3*, Xuan Ju3*, Zhaoyang Zhang2‡, Junhao Zhuang4, Ying Shan2†, Yuexian Zou1†, Qiang Xu3†
1Peking University 2ARC Lab, Tencent PCG 3The Chinese University of Hong Kong 4Tsinghua University
*Equal Contribution ‡Project Lead †Corresponding Author
Project Page | arXiv | Video | Hugging Face Demo | Hugging Face Model
https://github.com/user-attachments/assets/fde82f21-8b36-4584-8460-c109c195e614
4K HD Introduction Video: YouTube.
TODO
- Release the code of BrushEdit. (MLLM-driven agent for image editing and inpainting)
- Release the paper and webpage. More info: BrushEdit
- Release the BrushNetX checkpoint (a more powerful BrushNet).
- Release the Gradio demo.
Pipeline Overview
BrushEdit consists of four main steps: (i) editing category classification: determine the type of edit required; (ii) identification of the primary editing object: identify the main object to be edited; (iii) acquisition of the editing mask and target caption: generate the editing mask and the corresponding target caption; (iv) image inpainting: perform the actual edit. Steps (i) to (iii) use pre-trained MLLMs and detection models to determine the editing type, target object, editing mask, and target caption. Step (iv) performs the edit with the improved dual-branch inpainting model BrushNetX, which inpaints the target areas based on the target caption and editing mask, leveraging the generative power and background-preservation capability of inpainting models.
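The flow is easiest to see as pseudocode. Below is a minimal sketch of the agent loop; the `mllm`, `detector`, `segmenter`, and `inpainter` objects and all method names are illustrative placeholders, not the repo's actual API:

```python
# Hypothetical sketch of the BrushEdit agent loop; every method name below is
# a placeholder standing in for the MLLM, GroundingDINO, SAM, and BrushNetX calls.
def brushedit(image, instruction, mllm, detector, segmenter, inpainter):
    # (i) Classify the editing category (e.g., add, remove, replace).
    edit_type = mllm.classify_edit(image, instruction)
    # (ii) Identify the primary object to be edited.
    target = mllm.identify_object(image, instruction, edit_type)
    # (iii) Detect and segment the object to obtain the editing mask,
    #       and let the MLLM write the target caption.
    boxes = detector.detect(image, target)      # e.g., GroundingDINO
    mask = segmenter.segment(image, boxes)      # e.g., SAM
    caption = mllm.target_caption(image, instruction, edit_type)
    # (iv) Inpaint the masked region conditioned on the target caption.
    return inpainter(image=image, mask=mask, prompt=caption)
```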

Getting Started
Environment Requirement
BrushEdit has been implemented and tested with CUDA 11.8, PyTorch 2.0.1, and Python 3.10.6.
Clone the repo:
git clone https://github.com/TencentARC/BrushEdit.git
We recommend first using conda to create a virtual environment and installing PyTorch following the official instructions. For example:
conda create -n brushedit python=3.10.6 -y
conda activate brushedit
python -m pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
Then, you can install diffusers (implemented in this repo) with:
pip install -e .
After that, you can install the required packages through:
pip install -r app/requirements.txt
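Before downloading checkpoints, it can be worth verifying that the CUDA build of PyTorch is active. A quick sanity check:

```python
import torch

# On the tested configuration this prints 2.0.1+cu118, 11.8, and True.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
```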
Download Checkpoints
Checkpoints of BrushEdit can be downloaded using the following command.
sh app/down_load_brushedit.sh
The ckpt folder contains:
- BrushNetX pretrained checkpoint for Stable Diffusion v1.5 (`brushnetX`)
- Pretrained Stable Diffusion v1.5 checkpoints (e.g., `realisticVisionV60B1_v51VAE` from Civitai). You can use `scripts/convert_original_stable_diffusion_to_diffusers.py` to process other models downloaded from Civitai.
- Pretrained GroundingDINO checkpoint from the official release.
- Pretrained SAM checkpoint from the official release.
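As an alternative to the conversion script, recent diffusers releases can also load single-file Civitai checkpoints directly from Python. A minimal sketch (the input filename is a placeholder):

```python
from diffusers import StableDiffusionPipeline

# Load a single-file .safetensors checkpoint from Civitai and save it in the
# diffusers folder layout expected under models/base_model/.
pipe = StableDiffusionPipeline.from_single_file("downloaded_model.safetensors")
pipe.save_pretrained("models/base_model/downloaded_model")
```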
The checkpoint structure should be like:
|-- models
    |-- base_model
        |-- realisticVisionV60B1_v51VAE
            |-- model_index.json
            |-- vae
            |-- ...
        |-- dreamshaper_8
            |-- ...
        |-- epicrealism_naturalSinRC1VAE
            |-- ...
        |-- meinamix_meinaV11
            |-- ...
        |-- ...
    |-- brushnetX
        |-- config.json
        |-- diffusion_pytorch_model.safetensors
    |-- grounding_dino
        |-- groundingdino_swint_ogc.pth
    |-- sam
        |-- sam_vit_h_4b8939.pth
    |-- vlm
        |-- llava-v1.6-mistral-7b-hf
            |-- ...
        |-- llava-v1.6-vicuna-13b-hf
            |-- ...
        |-- Qwen2-VL-7B-Instruct
            |-- ...
        |-- ...
We provide five base diffusion models:
- Dreamshaper_8 is a versatile model that can generate impressive portraits and landscape images.
- Epicrealism_naturalSinRC1VAE is a realistic-style model that excels at generating portraits.
- HenmixReal_v5c is a model that specializes in generating realistic images of women.
- Meinamix_meinaV11 is a model that excels at generating images in an animated style.
- RealisticVisionV60B1_v51VAE is a highly generalized realistic style model.
The BrushNetX checkpoint is an enhanced version of BrushNet, trained on a more diverse dataset to improve editing capabilities such as deletion and replacement.
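As a rough sketch of how the base model and BrushNetX checkpoints fit together (the class names follow the BrushNet fork of diffusers installed above via `pip install -e .`; treat them as an assumption if your version differs):

```python
import torch
from diffusers import BrushNetModel, StableDiffusionBrushNetPipeline

# Paths follow the checkpoint layout shown above; fp16 halves GPU memory use.
brushnet = BrushNetModel.from_pretrained("models/brushnetX", torch_dtype=torch.float16)
pipe = StableDiffusionBrushNetPipeline.from_pretrained(
    "models/base_model/realisticVisionV60B1_v51VAE",
    brushnet=brushnet,
    torch_dtype=torch.float16,
).to("cuda")
```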
We provide two local VLM models: Qwen2-VL-7B-Instruct and llama3-llava-next-8b-hf. We strongly recommend using GPT-4o for reasoning: after selecting GPT-4o as the VLM model, enter your API key and click the Submit and Verify button. If the output is "success", you can use GPT-4o normally. As a second choice, we recommend the Qwen2-VL model.
You can also download more pretrained VLMs from QwenVL and LLaVA-Next, e.g., via `huggingface_hub` (`hf_hub_download` / `snapshot_download`).
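For example, fetching a VLM into the folder layout shown above (repo id shown for Qwen2-VL; adjust for other models):

```python
from huggingface_hub import snapshot_download

# Download the full model repository into models/vlm/.
snapshot_download(
    repo_id="Qwen/Qwen2-VL-7B-Instruct",
    local_dir="models/vlm/Qwen2-VL-7B-Instruct",
)
```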
Running Scripts
BrushEdit demo
You can run the demo using the script:
sh app/run_app.sh
Demo Features
Fundamental Features:
- Aspect Ratio: Select the aspect ratio of the image. To prevent OOM, 1024px is the maximum resolution.
- VLM Model: Select the VLM model. We use preloaded models to save time. To use other VLM models, download them and uncomment the relevant lines in vlm_template.py from our GitHub repo.
- Generate Mask: Generate a mask for the area that may need to be edited, according to the input instructions.
- Square/Circle Mask: Turn the existing mask into a square or circular one. (A coarser mask gives the model more room for creative editing.)
- Invert Mask: Invert the mask to edit the complementary area.
- Dilation/Erosion Mask: Expand or shrink the mask to include or exclude more area (see the sketch after this list).
- Move Mask: Move the mask to a new position.
- Generate Target Prompt: Generate a target prompt based on the input instructions.
- Target Prompt: A description of the masked area; enter or modify it manually when the VLM-generated content does not meet expectations.
- Blending: Blend BrushNet's output with the original input to preserve the original image's details in unedited areas. (Turning this off works better for removal.)
- Control length: The intensity of editing and inpainting.
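The mask operations above are standard image morphology. A minimal sketch with OpenCV, assuming a binary uint8 mask (the kernel size is illustrative; the demo's actual parameters may differ):

```python
import cv2
import numpy as np

# Toy binary mask: 255 inside the editable region, 0 elsewhere.
mask = np.zeros((512, 512), dtype=np.uint8)
mask[128:384, 128:384] = 255

kernel = np.ones((15, 15), dtype=np.uint8)  # structuring element (size is illustrative)
dilated = cv2.dilate(mask, kernel)          # expand the editable area
eroded = cv2.erode(mask, kernel)            # shrink the editable area
inverted = cv2.bitwise_not(mask)            # invert: edit everything outside the mask
```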
Advanced Features:
- Base Model: Select the base diffusion model. We use preloaded models to save time. To use other base models, download them and add them to the corresponding template file in our GitHub repo.
- Blending: Blend BrushNet's output with the original input to preserve the original image's details in unedited areas. (Turning this off works better for removal.)
- Control length: The intensity of editing and inpainting.
- Num samples: The number of samples to generate.
- Negative prompt: The negative prompt for classifier-free guidance.
- Guidance scale: The guidance scale for classifier-free guidance (see the sketch after this list).
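For reference, the negative prompt and guidance scale interact through standard classifier-free guidance. A minimal illustration of the formula (not code from this repo):

```python
import torch

def classifier_free_guidance(noise_uncond: torch.Tensor,
                             noise_cond: torch.Tensor,
                             guidance_scale: float) -> torch.Tensor:
    # noise_uncond comes from the negative prompt; guidance_scale > 1 pushes
    # the prediction away from it, toward the positive target prompt.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```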
Cite Us
@misc{li2024brushedit,
title={BrushEdit: All-In-One Image Inpainting and Editing},
author={Yaowei Li and Yuxuan Bian and Xuan Ju and Zhaoyang Zhang and Junhao Zhuang and Ying Shan and Yuexian Zou and Qiang Xu},
year={2024},
eprint={2412.10316},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Acknowledgement
Our code is modified from diffusers and BrushNet; thanks to all the contributors!
Contact
For any questions, feel free to email liyaowei01@gmail.com.