
April 22, 2025

STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models

[⭐ CVPR 2025 Highlight ⭐]

Koushik Srivatsan¹,², Fahad Shamshad², Muzammal Naseer³, Vishal M Patel¹, Karthik Nandakumar²,⁴

¹Johns Hopkins University ²MBZUAI ³Khalifa University ⁴Michigan State University

Overview of STEREO

Our novel two-stage approach robustly erases target concepts from pre-trained text-to-image diffusion models while preserving high utility for benign concepts.

Stage 1 (top): Search Thoroughly Enough fine-tunes the model through iterative concept erasing and concept inversion attacks, collecting a strong set of adversarial prompts.

Stage 2 (bottom): Robustly Erase Once fine-tunes the original model using anchor concepts and the set of strong adversarial prompts from Stage 1 via a compositional objective, maintaining high-fidelity generation of benign concepts while robustly erasing the target concept.
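The control flow of the two stages can be illustrated with a toy sketch. This is not the repository's implementation: the "model" here is just a set of blocked prompts, and `erase`, `invert`, `search_thoroughly_enough`, and `robustly_erase_once` are hypothetical stand-ins for the actual fine-tuning and concept inversion steps. It also assumes that each Stage 1 erasing round sees the adversarial prompts found so far, and it reduces the anchor concepts to a final check rather than a term in the objective.

```python
from typing import FrozenSet, List, Sequence

# Toy stand-in: a "model" is the set of prompts it can no longer render.
Model = FrozenSet[str]


def erase(model: Model, prompts: Sequence[str]) -> Model:
    """Hypothetical erasing step: block every prompt in `prompts`."""
    return model | frozenset(prompts)


def invert(model: Model, target: str) -> str:
    """Hypothetical inversion attack: return a pseudo-token the model does not block yet."""
    i = 0
    while f"<{target}-adv-{i}>" in model:
        i += 1
    return f"<{target}-adv-{i}>"


def search_thoroughly_enough(original: Model, target: str, n_iterations: int) -> List[str]:
    """Stage 1 (toy): alternate erasing and attacking, collecting adversarial prompts."""
    model = original
    adversarial: List[str] = []
    for _ in range(n_iterations):
        # Assumption: each round erases the target plus the prompts found so far.
        model = erase(model, [target, *adversarial])
        adversarial.append(invert(model, target))
    return adversarial


def robustly_erase_once(original: Model, target: str, adversarial: Sequence[str]) -> Model:
    """Stage 2 (toy): restart from the ORIGINAL model and erase everything in one pass."""
    return erase(original, [target, *adversarial])


original: Model = frozenset()
adv = search_thoroughly_enough(original, "nudity", n_iterations=2)
erased = robustly_erase_once(original, "nudity", adv)

# Anchor (benign) prompts should remain renderable after Stage 2.
anchors = ["a photo of a dog"]
print(adv)
print(all(a not in erased for a in anchors))
```

The point of the sketch is the structure: Stage 1 discards its intermediate models and keeps only the adversarial prompts, while Stage 2 starts again from the original model.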

Instructions for code usage :page_facing_up:

Setup :toolbox:

Get Code

git clone https://github.com/koushiksrivats/robust-concept-erasing.git

Build Environment

cd robust-concept-erasing
conda env create -f environment.yml
conda activate stereo

Data Preparation :framed_picture:

Preparing gallery images for training

python generate_images.py --output_dir data/images/train/nudity/ --prompt "A photo of a nude person" --num_images 500

NOTE: The quality of the gallery images is crucial for the performance of the model. We recommend using clear and diverse images depicting the target concept for the gallery.
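Before training, it can help to confirm that the gallery directory holds the expected number of images. The sketch below is one way to do that. It assumes `generate_images.py` writes ordinary image files (`.png`, `.jpg`, `.jpeg`) directly into `--output_dir`, which this README does not state, and `count_gallery_images` is a hypothetical helper, not part of the repository.

```python
import tempfile
from pathlib import Path

# Assumption: gallery images are saved with one of these extensions.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_gallery_images(gallery_dir: str) -> int:
    """Count image files directly inside `gallery_dir` (0 if it does not exist)."""
    root = Path(gallery_dir)
    if not root.is_dir():
        return 0
    return sum(
        1
        for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# Demo on a throwaway directory so the snippet runs without the real gallery.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("0.png", "1.JPG", "notes.txt"):
        (Path(tmp) / name).touch()
    demo_count = count_gallery_images(tmp)

print(demo_count)
```

For the command above, `count_gallery_images("data/images/train/nudity/")` would be compared against the `--num_images` value of 500. A count check says nothing about image quality or diversity, which still need a visual inspection.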

Training :rocket:

python -W ignore train.py --erase_concept 'nudity' --train_method noxattn --train_data_dir data/images/train/nudity/ --learnable_property 'object' --initializer_token 'person' --output_dir stereo_weights/nudity/ --mode both --unet_ckpt_to_attack final_reo_unet.pt --attack_eval_images data/images/eval/nudity/ --compositional_guidance_scale 2 --n_iterations 2 --num_of_adv_concepts 2 --anchor_concept_path utils/anchor_prompts.json
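The training command is long, so when erasing several concepts it may be easier to assemble it programmatically. The sketch below rebuilds the same argument list in Python using only the flags shown above. `build_train_command` is a hypothetical helper, and the per-concept directory layout it derives (`data/images/train/<concept>/`, `data/images/eval/<concept>/`, `stereo_weights/<concept>/`) is an assumption generalized from the nudity example.

```python
import shlex
from typing import List


def build_train_command(
    concept: str,
    initializer_token: str,
    n_iterations: int = 2,
    num_of_adv_concepts: int = 2,
    compositional_guidance_scale: int = 2,
) -> List[str]:
    """Assemble the train.py invocation from the README for a given concept."""
    return [
        "python", "-W", "ignore", "train.py",
        "--erase_concept", concept,
        "--train_method", "noxattn",
        # Assumption: directories follow the layout of the nudity example.
        "--train_data_dir", f"data/images/train/{concept}/",
        "--learnable_property", "object",
        "--initializer_token", initializer_token,
        "--output_dir", f"stereo_weights/{concept}/",
        "--mode", "both",
        "--unet_ckpt_to_attack", "final_reo_unet.pt",
        "--attack_eval_images", f"data/images/eval/{concept}/",
        "--compositional_guidance_scale", str(compositional_guidance_scale),
        "--n_iterations", str(n_iterations),
        "--num_of_adv_concepts", str(num_of_adv_concepts),
        "--anchor_concept_path", "utils/anchor_prompts.json",
    ]


cmd = build_train_command("nudity", "person")

# Print the shell-quoted command; pass `cmd` to subprocess.run(cmd, check=True)
# from the repository root to actually launch training.
print(shlex.join(cmd))
```

Passing the command as a list avoids shell quoting problems if a concept name or path contains spaces.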

Evaluation :bar_chart:

Quick evaluation of the erased model

python generate_images.py --output_dir eval/nudity/ --prompt "A photo of a nude person" --num_images 5 --unet_checkpoint /path/to/your/final_reo_unet.pt
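Since the goal is to erase the target concept while keeping benign generation intact, a quick check can cover both a target prompt and a benign one. The sketch below builds one `generate_images.py` command per prompt using only the flags shown above. The benign prompt, the `eval/benign/` directory, and the `build_eval_command` helper are illustrative assumptions, and the checkpoint path is the same placeholder as in the command above.

```python
import shlex
from typing import Dict, List

# Placeholder path from the README; replace with your own checkpoint.
CHECKPOINT = "/path/to/your/final_reo_unet.pt"


def build_eval_command(
    prompt: str, output_dir: str, checkpoint: str, num_images: int = 5
) -> List[str]:
    """Assemble a generate_images.py invocation for one prompt."""
    return [
        "python", "generate_images.py",
        "--output_dir", output_dir,
        "--prompt", prompt,
        "--num_images", str(num_images),
        "--unet_checkpoint", checkpoint,
    ]


# Output directory -> prompt. The benign entry is a hypothetical example.
prompts: Dict[str, str] = {
    "eval/nudity/": "A photo of a nude person",
    "eval/benign/": "A photo of a dog",
}

commands = [build_eval_command(p, d, CHECKPOINT) for d, p in prompts.items()]

for c in commands:
    print(shlex.join(c))
```

Inspecting both output directories side by side gives a rough sense of erasure and utility, but it is no substitute for the attack evaluations below.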

Attack Evaluation

Please follow the instructions for UnlearnDiffAtk (UD), Ring-A-Bell (RAB), and Circumventing Concept Erasure (CCE) to evaluate the robustness of the erased model.

Citation

If you find our work and this repository useful, please consider giving our repo a star and citing our paper as follows:

@article{srivatsan2024stereo,
  title={STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models},
  author={Srivatsan, Koushik and Shamshad, Fahad and Naseer, Muzammal and Patel, Vishal M and Nandakumar, Karthik},
  journal={arXiv preprint arXiv:2408.16807},
  year={2024}
}

Contact

If you have any questions, please create an issue on this repository or contact us at koushiksrivatsan.ofcl@gmail.com.

Acknowledgement :pray:

Our code is built on top of the ESD repository. We thank the authors for releasing their code.