GENIE: Higher-Order Denoising Diffusion Solvers NeurIPS 2022

April 25, 2023 · View on GitHub

Tim Dockhorn · Arash Vahdat · Karsten Kreis

Paper Project Page

Animation

Requirements

GENIE is built using PyTorch 1.11.0 and CUDA 11.3. Please use the following command to install the requirements:

pip install -r requirements.txt

Optionally, you may also install NVIDIA Apex. The Adam optimizer from this library is faster than PyTorch's native Adam.

Pretrained checkpoints

We provide pre-trained checkpoints for all models presented in the paper. Note that the CIFAR-10 base diffusion model is taken from the ScoreSDE repo.

Description	Checkpoint path
CIFAR-10 base diffusion model	`work_dir/cifar10/checkpoint_8.pth`
CIFAR-10 base GENIE model	`work_dir/cifar10/genie_checkpoint_20000.pth`
Church base diffusion model	`work_dir/church/checkpoint_300000.pth`
Church base GENIE model	`work_dir/church/genie_checkpoint_35000.pth`
Bedroom base diffusion model	`work_dir/bedroom/checkpoint_300000.pth`
Bedroom base GENIE model	`work_dir/bedroom/genie_checkpoint_40000.pth`
ImageNet base diffusion model	`work_dir/imagenet/checkpoint_400000.pth`
ImageNet base GENIE model	`work_dir/imagenet/genie_checkpoint_25000.pth`
Conditional ImageNet base diffusion model	`work_dir/imagenet/cond_checkpoint_400000.pth`
Conditional ImageNet base GENIE model	`work_dir/imagenet/cond_genie_checkpoint_15000.pth`
Cats base diffusion model	`work_dir/cats/base/checkpoint_400000.pth`
Cats base GENIE model	`work_dir/cats/base/genie_checkpoint_20000.pth`
Cats diffusion upsampler	`work_dir/cats/upsampler/checkpoint_150000.pth`
Cats GENIE upsampler	`work_dir/cats/upsampler/genie_checkpoint_20000.pth`

Unconditional sampling

After placing the provided checkpoints at the paths outlined above, you can sample from the base model via:

python main.py --mode eval --config <dataset>.eval --workdir <new_directory> --sampler ttm2

Here, dataset is one of cifar10, church, bedroom, imagenet, or cats. To turn off the GENIE model and sample from the plain diffusion model (via DDIM), simply remove the --sampler ttm2 flag. By default, the above generates 16 samples using a single GPU.

On the cats dataset, we also provide an upsampler, which can be run using the following command:

python main.py --mode eval --config cats.eval_upsampler --workdir <new_directory> --data_folder <folder_with_128x128_samples> --sampler ttm2

Conditional and classifier-free guidance sampling

On ImageNet, we also provide a class-conditional checkpoint, which can be controleld via the --labels flag.

python main.py --mode eval --config imagenet.eval_conditional --workdir output/testing_sampling/imagenet_genie_conditional/v2/ --sampler ttm2 --labels 1000

To generate all samples from the same class, you can set --labels to a single integer between 0 and 999 (inclusive). Alternatively, you can provide a list of labels, for example, --labels 0,87,626,3; note, however, that the length of the list needs to be the same as the total number of generated samples. To sample using random labels, you may set the --labels flag to the number of classes, for ImageNet that would be 1000.

Furthermore, since we provide both class-conditinal and unconditional checkpoints for ImageNet, you can generate samples using classifier-free guidance:

python main.py --mode eval --config imagenet.eval_with_guidance --workdir output/testing_sampling/imagenet_genie_guidance/v3 --sampler ttm2 --labels 1000 --guidance_scale 1.

The --guidance_scale flag should be set to a positive float.

Training your own models

Data preparations

First, create the following two folders:

mkdir -p data/raw/
mkdir -p data/processed/

Afterwards, run the following commands to download and prepare the data used for training.

CIFAR-10

wget -P data/raw/ https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
python dataset_tool.py --source data/raw/cifar-10-python.tar.gz --dest data/processed/cifar10.zip

LSUN Chuch

python lsun/download.py -c church_outdoor
mv church_outdoor_train_lmdb.zip data/raw
mv church_outdoor_val_lmdb.zip data/raw
unzip data/raw/church_outdoor_train_lmdb.zip -d data/raw/
python dataset_tool.py --source=data/raw/church_outdoor_train_lmdb/ --dest=data/processed/church.zip --resolution=128x128

LSUN Bedroom

python lsun/download.py -c bedroom
mv bedroom_train_lmdb.zip data/raw
mv bedroom_val_lmdb.zip data/raw
unzip data/raw/bedroom_train_lmdb.zip -d data/raw/
python dataset_tool.py --source=data/raw/bedroom_train_lmdb/ --dest=data/processed/bedroom.zip --resolution=128x128

AFHQ-v2

wget -N https://www.dropbox.com/s/vkzjokiwof5h8w6/afhq_v2.zip?dl=0
mv 'afhq_v2.zip?dl=0' data/raw/afhq_v2.zip
unzip data/raw/afhq_v2.zip -d data/raw/afhq_v2
python dataset_tool.py --source=data/raw/afhq_v2/train/cat --dest data/processed/cats.zip
python dataset_tool.py --source=data/raw/afhq_v2/train/cat --dest data/processed/cats_128.zip --resolution=128x128

ImageNet

First download the ImageNet Object Localization Challenge, then run the following

python dataset_tool.py --source==data/raw/imagenet/ILSVRC/Data/CLS-LOC/train --dest=data/processed/imagenet.zip --resolution=64x64 --transform=center-crop

FID evaluation

Before training, you should compute FID stats.

python compute_fid_statistics.py --path data/processed/cifar10.zip --file cifar10.npz
python compute_fid_statistics.py --path data/processed/church.zip --file church_50k.npz --max_samples 50000
python compute_fid_statistics.py --path data/processed/bedroom.zip --file bedroom_50k.npz --max_samples 50000
python compute_fid_statistics.py --path data/processed/imagenet.zip --file imagenet.npz
python compute_fid_statistics.py --path data/processed/cats.zip --file cats.npz

Diffusion model training scripts

We provide configurations to reproduce our models here. Feel free to use a different numbers of GPUs than us, however, in that case, you should also change the (per GPU) batch size (config.train.batch_size) in the corresponding config file. To train the base diffusion models, use the following commands:

python main.py --mode train --config church.train_diffusion --workdir <new_directory> --n_gpus_per_node 8
python main.py --mode train --config bedroom.train_diffusion --workdir <new_directory> --n_gpus_per_node 8
python main.py --mode train --config imagenet.train_diffusion --workdir <new_directory> --n_gpus_per_node 8
python main.py --mode train --config imagenet.train_diffusion_conditional.py --workdir <new_directory> --n_gpus_per_node 8
python main.py --mode train --config cats.train_diffusion --workdir <new_directory> --n_gpus_per_node 8
python main.py --mode train --config cats.train_diffusion_upsampler --workdir <new_directory> --n_gpus_per_node 8 --n_nodes 2

To continue an interrupted training run, you may run the following command:

python main.py --mode continue --config <config_file> --workdir <existing_working_directory> --ckpt_path <path_to_checkpoint>

We recommend to use the same number of GPUs (via --n_gpus_per_node) and nodes (via --n_nodes) as in the interrupted run.

Genie model training scripts

Our GENIE models can be trained using the following commands:

python main.py --mode train --config cifar10.train_genie --workdir <new_directory> --n_gpus_per_node 8
python main.py --mode train --config church.train_genie --workdir <new_directory> --n_gpus_per_node 8
python main.py --mode train --config bedroom.train_genie --workdir <new_directory> --n_gpus_per_node 8
python main.py --mode train --config imagenet.train_genie --workdir <new_directory> --n_gpus_per_node 8
python main.py --mode train --config imagenet.train_genie_conditional.py --workdir <new_directory> --n_gpus_per_node 8
python main.py --mode train --config cats.train_genie --workdir <new_directory> --n_gpus_per_node 8
python main.py --mode train --config cats.train_genie_upsampler --workdir <new_directory> --n_gpus_per_node 8 --n_nodes 2

To continue interrupted training runs, use the same syntax as above.

Citation

If you find the provided code or checkpoints useful for your research, please consider citing our NeurIPS paper:

@inproceedings{dockhorn2022genie,
  title={{{GENIE: Higher-Order Denoising Diffusion Solvers}}},
  author={Dockhorn, Tim and Vahdat, Arash and Kreis, Karsten},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

License

The code of this work is made available under the NVIDIA Source Code License. Please see our main LICENSE file.

All pre-trained checkpooints are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

License Dependencies

For any code dependencies related to StyleGAN3 (stylegan3/, torch_utils/, and dnnlib/), the license is the Nvidia Source Code License by NVIDIA Corporation, see StyleGAN3 LICENSE.

The script to download LSUN data has the MIT License.

We use three diffusion model architectures; see below:

Model	License
ScodeSDE	Apache License 2.0
Guided Diffusion	MIT License
PyTorch Diffusion	MIT License