MINDiff

July 30, 2025

MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization

📄 Accepted at ICCV 2025 Workshop on P13N

MINDiff is an inference-time method that mitigates overfitting in text-to-image personalization models such as DreamBooth and DreamBooth+LoRA. It uses mask-integrated negative attention to suppress the subject's influence in irrelevant regions. A user-set scale parameter (λ) balances subject fidelity against prompt alignment during inference.

Results

The following results show the effect of applying MINDiff to a DreamBooth model fine-tuned on Stable Diffusion 1.4.

Varying the value of λ controls the balance between subject fidelity and prompt alignment. Higher λ values lead to stronger suppression of subject influence, resulting in generations that more closely follow the input text prompt.
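
To make the role of the scale parameter concrete, here is a toy sketch in plain Python. It is not the repository's implementation: it assumes, purely for illustration, that the output at each spatial location is `base - λ * (1 - mask) * subject`, where the mask comes from thresholding an attention map. The function names, the threshold, and all numbers are hypothetical.

```python
def mask_from_attention(attn_map, threshold=0.5):
    """Binary mask: 1.0 where the attention for the mask token is high.

    Assumption: a simple threshold stands in for whatever mask
    extraction the actual method uses.
    """
    return [1.0 if a >= threshold else 0.0 for a in attn_map]


def suppress_subject(base, subject, mask, lam):
    """Subtract the subject contribution outside the mask, scaled by lam.

    Assumed toy form: out = base - lam * (1 - mask) * subject.
    Inside the mask (mask == 1) the output is left unchanged.
    """
    return [b - lam * (1.0 - m) * s for b, s, m in zip(base, subject, mask)]


if __name__ == "__main__":
    base = [1.0, 1.0, 1.0, 1.0]      # per-location output (toy values)
    subject = [0.5, 0.5, 0.5, 0.5]   # per-location subject contribution
    attn = [0.9, 0.8, 0.1, 0.2]      # attention map for the mask token
    mask = mask_from_attention(attn)
    for lam in (0.0, 1.0, 2.0):
        print(lam, suppress_subject(base, subject, mask, lam))
```

With λ = 0 the toy output is untouched. As λ grows, only the locations outside the mask lose subject contribution, which mirrors the qualitative behavior described above.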

Usage

  1. Install PyTorch. This project was tested with the following PyTorch environment:

    • torch==2.3.0
    • CUDA 11.8

    We recommend installing PyTorch using the official instructions:

    👉 Torch

  2. Clone the repository and install the dependencies

    git clone https://github.com/seuleepy/MINDiff.git
    cd MINDiff
    pip install -r requirements.txt
    
  3. Prepare a fine-tuned DreamBooth model. Use any existing DreamBooth model. MINDiff has been tested on models fine-tuned with Stable Diffusion 1.4, 2.1, and SDXL + LoRA.

  4. Generate an image with MINDiff. Use the following command:

    bash inference.sh
    

    Before running the script, you need to provide the following arguments:

    • CUSTOM_MODEL_DIR: Path to your fine-tuned DreamBooth model.
    • modifier_token: The token used during DreamBooth training (e.g., "sks").
    • mask_token: A token from your prompt used to guide mask generation via attention maps. It must be included in the prompt.
    • attn_scale: A float value that controls the strength of suppression. Higher values increase text alignment by reducing subject influence.
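
Because mask_token must appear in the prompt, a quick sanity check before launching the script can catch a mismatch early. The helper below is a hypothetical sketch and is not part of the repository. It splits the prompt on whitespace, which only approximates how the model's tokenizer sees the text, and the example prompt is made up.

```python
import string


def check_mindiff_args(prompt, modifier_token, mask_token, attn_scale):
    """Return a list of problems found in the arguments (empty if none).

    Hypothetical helper: word matching is a whitespace split with
    punctuation stripped, not the model's real tokenization.
    """
    problems = []
    words = [w.strip(string.punctuation).lower() for w in prompt.split()]
    if mask_token.lower() not in words:
        problems.append(f"mask_token {mask_token!r} does not appear in the prompt")
    if not modifier_token:
        problems.append("modifier_token is empty")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(attn_scale, bool) or not isinstance(attn_scale, (int, float)):
        problems.append("attn_scale must be a number")
    return problems


if __name__ == "__main__":
    # Made-up prompt using the "sks" modifier token from the example above.
    issues = check_mindiff_args("a photo of sks dog on the beach", "sks", "dog", 1.0)
    print(issues or "arguments look consistent")
```

An empty list means the basic constraints hold. It does not guarantee that the chosen mask_token produces a useful attention mask.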