MINDiff

July 30, 2025

MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization

📄 Accepted at ICCV 2025 Workshop on P13N

MINDiff is an inference-time method that mitigates overfitting in text-to-image personalization models such as DreamBooth and DreamBooth+LoRA. It uses mask-integrated negative attention to suppress the subject's influence in irrelevant regions. A scale parameter (λ) lets users balance subject fidelity against prompt alignment during inference.

Results

The following results show the effect of applying MINDiff to a DreamBooth model fine-tuned on Stable Diffusion 1.4.

Varying the value of λ controls the balance between subject fidelity and prompt alignment. Higher λ values lead to stronger suppression of subject influence, resulting in generations that more closely follow the input text prompt.

Usage

Install PyTorch

This project was tested with the following PyTorch environment:
- torch==2.3.0
- CUDA 11.8

We recommend installing PyTorch using the official instructions:

👉 Torch
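
To compare an existing installation against the tested environment, a quick check along the following lines may help. This is a minimal sketch that assumes `python3` is the interpreter for this project; the variable name `torch_info` is only illustrative.

```shell
# Print the installed torch version and the CUDA version it was built against.
# Falls back to a message if python3 or torch is unavailable.
if command -v python3 >/dev/null 2>&1; then
  torch_info=$(python3 -c 'import torch; print(torch.__version__, torch.version.cuda)' 2>/dev/null) \
    || torch_info="torch is not importable in this environment"
else
  torch_info="python3 not found on PATH"
fi
echo "$torch_info"
```

If the printed versions differ from torch 2.3.0 and CUDA 11.8, the project may still run, but that combination is the one the authors report testing.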

Clone the repository

```
git clone https://github.com/seuleepy/MINDiff.git
cd MINDiff
pip install -r requirements.txt
```

Prepare a fine-tuned DreamBooth model

Use any existing DreamBooth model. MINDiff has been tested on models fine-tuned with Stable Diffusion 1.4, 2.1, and SDXL + LoRA.

Generate an image with MINDiff

Use the following command:
```
bash inference.sh
```
Before running the script, you need to provide the following arguments:
- CUSTOM_MODEL_DIR: Path to your fine-tuned DreamBooth model.
- modifier_token: The token used during DreamBooth training (e.g., "sks").
- mask_token: A token from your prompt used to guide mask generation via attention maps. It must be included in the prompt.
- attn_scale: A float value that controls the strength of suppression. Higher values increase text alignment by reducing subject influence.
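
As a sanity check before running the script, the constraints above can be sketched in plain shell. Everything here is hypothetical: the path, the prompt, the `prompt` variable name, and the `attn_scale` value are placeholders, and how `inference.sh` actually receives these arguments should be confirmed in the script itself.

```shell
# Hypothetical example values; replace them with your own.
CUSTOM_MODEL_DIR="./dreambooth-model"        # placeholder path to the fine-tuned model
modifier_token="sks"                         # token used during DreamBooth training
prompt="a photo of sks dog on the beach"     # assumed variable name and example prompt
mask_token="dog"                             # must appear in the prompt
attn_scale="1.0"                             # illustrative value, not a recommended setting

# Check that mask_token occurs in the prompt as a whitespace-delimited word.
# This only approximates how a tokenizer would split the prompt.
case " $prompt " in
  *" $mask_token "*) mask_ok="yes" ;;
  *) mask_ok="no" ;;
esac

# Loose numeric check: rejects empty strings and anything other than digits and dots.
case "$attn_scale" in
  ''|*[!0-9.]*) scale_ok="no" ;;
  *) scale_ok="yes" ;;
esac

echo "mask_token in prompt: $mask_ok; attn_scale looks numeric: $scale_ok"
```

If either check reports `no`, fix the prompt or the scale value before launching `bash inference.sh`.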