MINDiff
July 30, 2025
MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization
📄 Accepted at ICCV 2025 Workshop on P13N
MINDiff is an inference-time method that mitigates overfitting in text-to-image personalization models such as DreamBooth and DreamBooth+LoRA. It uses mask-integrated negative attention to suppress the subject's influence in irrelevant regions. A scale parameter (λ) lets the user balance subject fidelity against prompt alignment during inference.
Results
The following results show the effect of applying MINDiff to a DreamBooth model fine-tuned on Stable Diffusion 1.4.
Varying λ controls the balance between subject fidelity and prompt alignment. Higher λ values suppress the subject's influence more strongly, so the generated images follow the input text prompt more closely.
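To make the role of λ concrete, here is a toy sketch in plain Python. It is not the repository's implementation: the function name `suppress_subject`, the per-location lists, and the formula itself (subtracting a λ-scaled subject term wherever the mask is zero) are assumptions chosen only to illustrate how a larger scale removes more subject influence outside the masked region.

```python
def suppress_subject(base, subject, mask, scale):
    """Toy illustration of mask-weighted suppression (assumed formula).

    base    : per-location output of the ordinary attention (list of floats)
    subject : per-location contribution attributed to the subject (list of floats)
    mask    : 1.0 where the subject belongs, 0.0 in irrelevant regions
    scale   : plays the role of lambda; 0 disables suppression
    """
    if not (len(base) == len(subject) == len(mask)):
        raise ValueError("base, subject and mask must have the same length")
    # Outside the mask (1 - m == 1) the subject term is subtracted in full;
    # inside the mask (1 - m == 0) the base output is left untouched.
    return [b - scale * (1.0 - m) * s for b, s, m in zip(base, subject, mask)]


if __name__ == "__main__":
    base = [1.0, 1.0]      # two spatial locations
    subject = [0.5, 0.5]   # subject contribution at each location
    mask = [1.0, 0.0]      # first location is on the subject, second is not
    for lam in (0.0, 1.0, 2.0):
        print(lam, suppress_subject(base, subject, mask, lam))
```

In this sketch the masked location never changes, while the unmasked location loses more of the subject term as the scale grows, which mirrors the qualitative behavior described above.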
Usage
- Install PyTorch
  This project was tested with the following PyTorch environment:
  - torch==2.3.0
  - CUDA 11.8
  We recommend installing PyTorch using the official instructions:
  👉 Torch
- Clone the repository
      git clone https://github.com/seuleepy/MINDiff.git
      cd MINDiff
      pip install -r requirements.txt
- Prepare a fine-tuned DreamBooth model
  Use any existing DreamBooth model. MINDiff has been tested on models fine-tuned with Stable Diffusion 1.4, 2.1, and SDXL + LoRA.
- Generate an image with MINDiff
  Use the following command:
      bash inference.sh
  Before running the script, you need to provide the following arguments:
  - CUSTOM_MODEL_DIR: Path to your fine-tuned DreamBooth model.
  - modifier_token: The token used during DreamBooth training (e.g., "sks").
  - mask_token: A token from your prompt used to guide mask generation via attention maps. It must be included in the prompt.
  - attn_scale: A float value that controls the strength of suppression. Higher values increase text alignment by reducing subject influence.
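The constraints on these arguments can be checked before launching a run. The helper below is a hypothetical pre-flight check, not part of the repository: the function name `check_inference_args` is made up for illustration, and the whole-word match on the prompt is a simplification (the actual script presumably matches against tokenizer output, which may split words differently).

```python
import string


def check_inference_args(prompt, mask_token, attn_scale):
    """Return a list of problems found in the arguments (empty if none).

    Hypothetical helper: it only encodes the two constraints stated above,
    namely that mask_token appears in the prompt and attn_scale is a float.
    """
    problems = []

    # Compare on lowercase words with surrounding punctuation stripped.
    words = [w.strip(string.punctuation) for w in prompt.lower().split()]
    if mask_token.lower() not in words:
        problems.append(f"mask_token {mask_token!r} does not appear in the prompt")

    # attn_scale usually arrives as a string from a shell script.
    try:
        float(attn_scale)
    except (TypeError, ValueError):
        problems.append(f"attn_scale {attn_scale!r} is not a float")

    return problems


if __name__ == "__main__":
    print(check_inference_args("a sks dog on the beach", "dog", "1.5"))
    print(check_inference_args("a sks dog on the beach", "cat", "high"))
```

An empty list means the arguments satisfy the stated constraints; any returned message points at the argument to fix before running `bash inference.sh`.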