BrushEdit
September 3, 2025 · View on GitHub
Please check out our latest DiT-based image customization project IC-Custom, which provides powerful ID-consistent editing capabilities!
This repository contains the implementation of "BrushEdit: All-In-One Image Inpainting and Editing".
Keywords: Image Inpainting, Image Generation, Image Editing, Diffusion Models, MLLM Agent, Instruction-based Editing
TL;DR: BrushEdit is an advanced, unified AI agent for image inpainting and editing.
Main Elements: fully automated / interactive editing.
Yaowei Li1*, Yuxuan Bian3*, Xuan Ju3*, Zhaoyang Zhang2‡, Junhao Zhuang4, Ying Shan2†, Yuexian Zou1†, Qiang Xu3†
1Peking University 2ARC Lab, Tencent PCG 3The Chinese University of Hong Kong 4Tsinghua University
*Equal Contribution ‡Project Lead †Corresponding Author
Project Page | arXiv | Video | Hugging Face Demo | Hugging Face Model
https://github.com/user-attachments/assets/fde82f21-8b36-4584-8460-c109c195e614
4K HD Introduction Video: YouTube.
TODO
- Release the code of BrushEdit. (MLLM-driven agent for image editing and inpainting)
- Release the paper and webpage. More info: BrushEdit
- Release the BrushNetX checkpoint (a more powerful BrushNet).
- Release the Gradio demo.
Pipeline Overview
BrushEdit consists of four main steps: (i) editing category classification: determine the type of edit required; (ii) identification of the primary editing object: identify the main object to be edited; (iii) acquisition of the editing mask and target caption: generate the editing mask and the corresponding target caption; (iv) image inpainting: perform the actual edit. Steps (i) to (iii) use pre-trained MLLMs and detection models to determine the editing type, target object, editing mask, and target caption. Step (iv) performs the edit with the improved dual-branch inpainting model BrushNetX, which inpaints the target areas based on the target caption and editing mask, leveraging the generative power and background-preservation capability of inpainting models.
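The flow is easiest to see as pseudocode. Below is a minimal sketch of the agent loop; the `mllm`, `detector`, `segmenter`, and `inpainter` objects and all method names are illustrative placeholders, not the repo's actual API:

```python
# Hypothetical sketch of the BrushEdit agent loop; every method name below is
# a placeholder standing in for the MLLM, GroundingDINO, SAM, and BrushNetX calls.
def brushedit(image, instruction, mllm, detector, segmenter, inpainter):
    # (i) Classify the editing category (e.g., add, remove, replace).
    edit_type = mllm.classify_edit(image, instruction)
    # (ii) Identify the primary object to be edited.
    target = mllm.identify_object(image, instruction, edit_type)
    # (iii) Detect and segment the object to obtain the editing mask,
    #       and let the MLLM write the target caption.
    boxes = detector.detect(image, target)      # e.g., GroundingDINO
    mask = segmenter.segment(image, boxes)      # e.g., SAM
    caption = mllm.target_caption(image, instruction, edit_type)
    # (iv) Inpaint the masked region conditioned on the target caption.
    return inpainter(image=image, mask=mask, prompt=caption)
```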

Getting Started
Environment Requirement
BrushEdit has been implemented and tested with CUDA 11.8, PyTorch 2.0.1, and Python 3.10.6.
Clone the repo:
git clone https://github.com/TencentARC/BrushEdit.git
We recommend first using conda to create a virtual environment and installing PyTorch following the official instructions. For example:
conda create -n brushedit python=3.10.6 -y
conda activate brushedit
python -m pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
Then, you can install diffusers (implemented in this repo) with:
pip install -e .
After that, you can install the required packages through:
pip install -r app/requirements.txt
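Before downloading checkpoints, it can be worth verifying that the CUDA build of PyTorch is active. A quick sanity check:

```python
import torch

# On the tested configuration this prints 2.0.1+cu118, 11.8, and True.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
```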
Download Checkpoints
Checkpoints of BrushEdit can be downloaded using the following command.
sh app/down_load_brushedit.sh
The ckpt folder contains:
- BrushNetX pretrained checkpoint for Stable Diffusion v1.5 (`brushnetX`)
- Pretrained Stable Diffusion v1.5 checkpoints (e.g., `realisticVisionV60B1_v51VAE` from Civitai). You can use `scripts/convert_original_stable_diffusion_to_diffusers.py` to process other models downloaded from Civitai.
- Pretrained GroundingDINO checkpoint from the official release.
- Pretrained SAM checkpoint from the official release.
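As an alternative to the conversion script, recent diffusers releases can also load single-file Civitai checkpoints directly from Python. A minimal sketch (the input filename is a placeholder):

```python
from diffusers import StableDiffusionPipeline

# Load a single-file .safetensors checkpoint from Civitai and save it in the
# diffusers folder layout expected under models/base_model/.
pipe = StableDiffusionPipeline.from_single_file("downloaded_model.safetensors")
pipe.save_pretrained("models/base_model/downloaded_model")
```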
The checkpoint structure should be like:
|-- models
    |-- base_model
        |-- realisticVisionV60B1_v51VAE
            |-- model_index.json
            |-- vae
            |-- ...
        |-- dreamshaper_8
            |-- ...
        |-- epicrealism_naturalSinRC1VAE
            |-- ...
        |-- meinamix_meinaV11
            |-- ...
        |-- ...
    |-- brushnetX
        |-- config.json
        |-- diffusion_pytorch_model.safetensors
    |-- grounding_dino
        |-- groundingdino_swint_ogc.pth
    |-- sam
        |-- sam_vit_h_4b8939.pth
    |-- vlm
        |-- llava-v1.6-mistral-7b-hf
            |-- ...
        |-- llava-v1.6-vicuna-13b-hf
            |-- ...
        |-- Qwen2-VL-7B-Instruct
            |-- ...
        |-- ...
We provide five base diffusion models:
- Dreamshaper_8 is a versatile model that can generate impressive portraits and landscape images.
- Epicrealism_naturalSinRC1VAE is a realistic-style model that excels at generating portraits.
- HenmixReal_v5c is a model that specializes in generating realistic images of women.
- Meinamix_meinaV11 is a model that excels at generating images in an animated style.
- RealisticVisionV60B1_v51VAE is a highly generalized realistic style model.
The BrushNetX checkpoint is an enhanced version of BrushNet, trained on a more diverse dataset to improve editing capabilities such as deletion and replacement.
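As a rough sketch of how the base model and BrushNetX checkpoints fit together (the class names follow the BrushNet fork of diffusers installed above via `pip install -e .`; treat them as an assumption if your version differs):

```python
import torch
from diffusers import BrushNetModel, StableDiffusionBrushNetPipeline

# Paths follow the checkpoint layout shown above; fp16 halves GPU memory use.
brushnet = BrushNetModel.from_pretrained("models/brushnetX", torch_dtype=torch.float16)
pipe = StableDiffusionBrushNetPipeline.from_pretrained(
    "models/base_model/realisticVisionV60B1_v51VAE",
    brushnet=brushnet,
    torch_dtype=torch.float16,
).to("cuda")
```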
We provide two local VLM models: Qwen2-VL-7B-Instruct and llama3-llava-next-8b-hf. We strongly recommend using GPT-4o for reasoning: after selecting GPT-4o as the VLM model, enter your API key and click the Submit and Verify button. If the output is "success", you can use GPT-4o normally. As a second choice, we recommend the Qwen2-VL model.
You can also download more pretrained VLMs from QwenVL and LLaVA-Next, e.g., via `huggingface_hub` (`hf_hub_download` / `snapshot_download`).
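For example, fetching a VLM into the folder layout shown above (repo id shown for Qwen2-VL; adjust for other models):

```python
from huggingface_hub import snapshot_download

# Download the full model repository into models/vlm/.
snapshot_download(
    repo_id="Qwen/Qwen2-VL-7B-Instruct",
    local_dir="models/vlm/Qwen2-VL-7B-Instruct",
)
```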
Running Scripts
BrushEdit demo
You can run the demo using the script:
sh app/run_app.sh
Demo Features
Fundamental Features:
- Aspect Ratio: Select the aspect ratio of the image. To prevent OOM, 1024px is the maximum resolution.
- VLM Model: Select the VLM model. We use preloaded models to save time. To use other VLM models, download them and uncomment the relevant lines in vlm_template.py from our GitHub repo.
- Generate Mask: Generate a mask for the area that may need to be edited, according to the input instructions.
- Square/Circle Mask: Turn the existing mask into a square or circular one. (A coarser mask gives the model more room for creative editing.)
- Invert Mask: Invert the mask to edit the complementary area.
- Dilation/Erosion Mask: Expand or shrink the mask to include or exclude more area (see the sketch after this list).
- Move Mask: Move the mask to a new position.
- Generate Target Prompt: Generate a target prompt based on the input instructions.
- Target Prompt: A description of the masked area; enter or modify it manually when the VLM-generated content does not meet expectations.
- Blending: Blend BrushNet's output with the original input to preserve the original image's details in unedited areas. (Turning this off works better for removal.)
- Control length: The intensity of editing and inpainting.
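The mask operations above are standard image morphology. A minimal sketch with OpenCV, assuming a binary uint8 mask (the kernel size is illustrative; the demo's actual parameters may differ):

```python
import cv2
import numpy as np

# Toy binary mask: 255 inside the editable region, 0 elsewhere.
mask = np.zeros((512, 512), dtype=np.uint8)
mask[128:384, 128:384] = 255

kernel = np.ones((15, 15), dtype=np.uint8)  # structuring element (size is illustrative)
dilated = cv2.dilate(mask, kernel)          # expand the editable area
eroded = cv2.erode(mask, kernel)            # shrink the editable area
inverted = cv2.bitwise_not(mask)            # invert: edit everything outside the mask
```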
Advanced Features:
- Base Model: Select the base diffusion model. We use preloaded models to save time. To use other base models, download them and add them to the corresponding template file in our GitHub repo.
- Blending: Blend BrushNet's output with the original input to preserve the original image's details in unedited areas. (Turning this off works better for removal.)
- Control length: The intensity of editing and inpainting.
- Num samples: The number of samples to generate.
- Negative prompt: The negative prompt for classifier-free guidance.
- Guidance scale: The guidance scale for classifier-free guidance (see the sketch after this list).
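For reference, the negative prompt and guidance scale interact through standard classifier-free guidance. A minimal illustration of the formula (not code from this repo):

```python
import torch

def classifier_free_guidance(noise_uncond: torch.Tensor,
                             noise_cond: torch.Tensor,
                             guidance_scale: float) -> torch.Tensor:
    # noise_uncond comes from the negative prompt; guidance_scale > 1 pushes
    # the prediction away from it, toward the positive target prompt.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```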
Cite Us
@misc{li2024brushedit,
title={BrushEdit: All-In-One Image Inpainting and Editing},
author={Yaowei Li and Yuxuan Bian and Xuan Ju and Zhaoyang Zhang and Junhao Zhuang and Ying Shan and Yuexian Zou and Qiang Xu},
year={2024},
eprint={2412.10316},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Acknowledgement
Our code is modified from diffusers and BrushNet; thanks to all the contributors!
Contact
For any questions, feel free to email liyaowei01@gmail.com.