Official Implementation of DOODL (End-to-End Diffusion Latent Optimization Improves Classifier Guidance)
April 18, 2023
What is DOODL?
DOODL (Direct Optimization of Diffusion Latents) is a variant of classifier guidance that directly optimizes the diffusion latents x_T instead of using model-based gradients to guide denoising. This is done by leveraging the EDICT algorithm and the MemCNN library to construct a diffusion process that can be backpropagated through at constant memory cost with respect to the number of diffusion steps, without a significant increase in runtime. Having control over this optimization allows a variety of guidance modes to be incorporated. Check out our paper for more details, and don't hesitate to reach out with questions!
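To make the constant-memory idea concrete, here is a minimal toy sketch, not the code in doodl.py. It assumes a scalar pair of latents and a generic additive coupling layer (the kind of invertible block MemCNN supports) in place of the EDICT update, and the functions `f` and `g` are hypothetical stand-ins for the denoising network. Because every layer can be inverted, the backward pass rebuilds each layer's inputs from its outputs rather than storing them, so memory use does not grow with the number of steps. The starting latent is then updated by plain gradient descent on a loss computed from the final output.

```python
import math

# Hypothetical stand-ins for the denoiser; NOT the networks used by DOODL.
def f(u): return 0.1 * math.tanh(u)
def df(u): return 0.1 * (1.0 - math.tanh(u) ** 2)
def g(u): return 0.1 * math.tanh(u)
def dg(u): return 0.1 * (1.0 - math.tanh(u) ** 2)

def forward(x1, x2, steps):
    """Apply `steps` additive coupling layers, keeping only the current state."""
    for _ in range(steps):
        x1 = x1 + f(x2)
        x2 = x2 + g(x1)
    return x1, x2

def backward(y1, y2, d1, d2, steps):
    """Invert each layer to rebuild its inputs, then apply the chain rule.

    (d1, d2) is the loss gradient w.r.t. the final outputs. Returns the
    reconstructed starting latents and the gradient w.r.t. them.
    """
    for _ in range(steps):
        x2 = y2 - g(y1)        # undo the second half-step
        d1 = d1 + d2 * dg(y1)  # total gradient reaching y1
        x1 = y1 - f(x2)        # undo the first half-step
        d2 = d2 + d1 * df(x2)  # total gradient reaching x2
        y1, y2 = x1, x2
    return (y1, y2), (d1, d2)

def loss_and_grad(x1, x2, target, steps):
    """Squared error on the first output, with its gradient w.r.t. the latents."""
    y1, y2 = forward(x1, x2, steps)
    loss = (y1 - target) ** 2
    _, grads = backward(y1, y2, 2.0 * (y1 - target), 0.0, steps)
    return loss, grads

def optimize_latent(x1, x2, target, steps=10, iters=200, lr=0.05):
    """Gradient descent on the starting latents (the toy analogue of x_T)."""
    losses = []
    for _ in range(iters):
        loss, (g1, g2) = loss_and_grad(x1, x2, target, steps)
        losses.append(loss)
        x1, x2 = x1 - lr * g1, x2 - lr * g2
    return (x1, x2), losses
```

Calling `optimize_latent(0.3, -0.2, target=1.0)` returns the optimized latents and the loss history. In the real setting the squared-error loss would be replaced by a guidance signal (for example a classifier loss on the generated image), and the scalars by latent tensors.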
Setup
HF Auth token
Paste a suitable HF Auth Token into a file named hf_auth, with no trailing newline. The token is read by the following code in edict_functions.py:
with open('hf_auth', 'r') as f:
    auth_token = f.readlines()[0].strip()
Example file at ./hf_auth
abc123abc123
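If you want a clearer error when the token file is missing its content, a small helper along these lines may be useful. This is a hypothetical sketch, not part of the repository: the name `read_hf_token` is an assumption, and it only mirrors the first-line-plus-strip behavior of the snippet above. The original one-liner raises an IndexError on an empty file, since `readlines()` returns an empty list; the helper raises a ValueError with a message instead.

```python
from pathlib import Path

def read_hf_token(path="hf_auth"):
    """Hypothetical helper: return the first line of `path`, whitespace-stripped."""
    lines = Path(path).read_text().splitlines()
    if not lines or not lines[0].strip():
        # Covers both an empty file and a blank first line.
        raise ValueError(f"{path} is empty; paste your HF Auth Token into it")
    return lines[0].strip()
```

With the example file above, `read_hf_token()` would return `'abc123abc123'` whether or not the file ends with a newline.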
Environment
Run conda env create -f environment.yaml, then activate the environment (conda activate doodl). Run Jupyter with that environment active.
FGVC models
FGVC models can be downloaded from the WS-DAN repo and saved at fgvc_ws_dan_helpers/checkpoints/
Experimentation
Check out this notebook for examples of how to use DOODL.
Other Files
- doodl.py has the core functionality of DOODL
- my_half_diffusers is a very slightly changed version of the HF Diffusers repo
- fgvc_ws_dan_helpers/ gives access to the WS-DAN model
- memcnn/ is a very lightly modified version of the excellent MemCNN library. Thank you to the original MemCNN authors!
Citation
If you find our work useful in your research, please cite the following works:
@misc{wallace2023endtoend,
      title={End-to-End Diffusion Latent Optimization Improves Classifier Guidance},
      author={Bram Wallace and Akash Gokul and Stefano Ermon and Nikhil Naik},
      year={2023},
      eprint={2303.13703},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
@article{wallace2022edict,
      title={EDICT: Exact Diffusion Inversion via Coupled Transformations},
      author={Wallace, Bram and Gokul, Akash and Naik, Nikhil},
      journal={arXiv preprint arXiv:2211.12446},
      year={2022}
}
License
Our code is BSD-3 licensed. See LICENSE.txt for details.