April 5, 2025

🧹CleanDIFT: Diffusion Features without Noise

Nick Stracke^* · Stefan A. Baumann^* · Kolja Bauer^* · Frank Fundel · Björn Ommer

CompVis @ LMU Munich
CVPR 2025 (Oral)

This repository contains the official implementation of the paper "CleanDIFT: Diffusion Features without Noise".

We propose CleanDIFT, a novel method to extract noise-free, timestep-independent features by enabling diffusion models to work directly with clean input images. Our approach is efficient, training on a single GPU in just 30 minutes.

🚀 Usage

Setup

Clone the repository and install the requirements with pip install -r requirements.txt.

Training

To train a feature extractor yourself, run python train.py. The training script expects your data in ./data as a single-level directory containing images named filename.jpg and corresponding JSON files named filename.json, each of which contains the key caption.
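The expected layout can be checked before starting a training run. The following is a minimal sketch and not part of the repository: the helper name find_invalid_samples is hypothetical, and it assumes only what is described above (a single-level directory of filename.jpg images, each paired with a filename.json that contains the key caption).

```python
import json
from pathlib import Path


def find_invalid_samples(data_dir="./data"):
    """Return (image name, problem) pairs for samples that break the expected layout."""
    problems = []
    # Only the top level is scanned, matching the single-level directory layout.
    for image_path in sorted(Path(data_dir).glob("*.jpg")):
        json_path = image_path.with_suffix(".json")
        if not json_path.is_file():
            problems.append((image_path.name, "missing JSON file"))
            continue
        try:
            metadata = json.loads(json_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append((image_path.name, "JSON file is not valid JSON"))
            continue
        # The JSON file must be an object that contains the key "caption".
        if not isinstance(metadata, dict) or "caption" not in metadata:
            problems.append((image_path.name, "JSON file has no 'caption' key"))
    return problems
```

Calling find_invalid_samples() with no arguments scans ./data; an empty list means every image has a matching JSON file with a caption.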

Feature Extraction

For feature extraction, please refer to one of the notebooks at notebooks. We demonstrate how to extract features and use them for semantic correspondence detection and depth prediction.

Our checkpoints are fully compatible with the diffusers library. If you already have a pipeline using SD 1.5 or SD 2.1 from diffusers, you can simply replace the U-Net state dict:

from diffusers import UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Load the original SD 2.1 U-Net (architecture and weights)
unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="unet")
# Download the CleanDIFT checkpoint and replace the U-Net weights with it
ckpt_pth = hf_hub_download(repo_id="CompVis/cleandift", filename="cleandift_sd21_unet.safetensors")
state_dict = load_file(ckpt_pth)
unet.load_state_dict(state_dict, strict=True)
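Because strict=True makes load_state_dict raise an error when the checkpoint keys and the model keys differ, it can help to compare the two sets of keys first when adapting the snippet to a different pipeline. A minimal sketch follows; diff_state_dict_keys is a hypothetical helper, not part of diffusers or this repository, and it works on any two mappings.

```python
def diff_state_dict_keys(model_state, checkpoint_state):
    """Compare the keys of two state dicts (or any two mappings).

    Returns (missing, unexpected):
      missing    - keys the model expects but the checkpoint lacks
      unexpected - keys in the checkpoint that the model does not have
    """
    model_keys = set(model_state)
    checkpoint_keys = set(checkpoint_state)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected
```

With the objects from the snippet above, this would be called as diff_state_dict_keys(unet.state_dict(), state_dict); both lists should be empty for a compatible checkpoint.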

Combination with Telling Left From Right (TLFR)

Our checkpoints can also be combined with more advanced feature extraction methods than DIFT, such as TLFR. In the geoaware-sc-eval branch, we provide an adapted version of the TLFR codebase that can be used to reproduce our state-of-the-art zero-shot semantic correspondence results.

🎓 Citation

If you use this codebase or otherwise find our work valuable, please cite our paper:

@inproceedings{stracke2025cleandift,
    title={CleanDIFT: Diffusion Features without Noise},
    author={Nick Stracke and Stefan Andreas Baumann and Kolja Bauer and Frank Fundel and Björn Ommer},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2025}
}