CLR-GAN: Improving GANs Stability and Quality via Consistent Latent Representation and Reconstruction (ECCV 2024)
August 4, 2024
Shengke Sun, Ziqian Luan, Zhanshan Zhao, Shijie Luo and Shuzhen Han
European Conference on Computer Vision (ECCV) 2024
This folder contains the code that implements the Consistent Latent Representation and Reconstruction method on StyleGAN2 and StyleGAN2-ADA, used to reproduce the experimental results reported in the paper.
How to use this code:
Preparing datasets
Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels.
Custom datasets can be created from a folder containing images; see python dataset_tool.py --help for more information. The folder can also be used directly as a dataset, without running it through dataset_tool.py first, but doing so may lead to suboptimal performance.
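The dataset layout described above can be inspected with Python's standard zipfile module. The sketch below assumes the StyleGAN2-ADA convention for dataset.json: a top-level "labels" key holding [file name, label] pairs, or null for an unlabeled dataset. The helper names write_dataset_zip and read_labels are hypothetical and not part of this repository.

```python
import json
import zipfile


def write_dataset_zip(zip_path, images, labels=None):
    """Store PNG bytes in an uncompressed (ZIP_STORED) archive plus dataset.json.

    `images` maps archive names (e.g. "00000/img00000000.png") to raw PNG bytes;
    `labels` optionally maps the same names to integer class labels.
    `zip_path` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for fname, png_bytes in images.items():
            zf.writestr(fname, png_bytes)
        # Assumed layout: {"labels": [[fname, label], ...]} or {"labels": null}.
        entries = None if labels is None else [[f, labels[f]] for f in images]
        zf.writestr("dataset.json", json.dumps({"labels": entries}))


def read_labels(zip_path):
    """Return a {file name: label} dict, or None if the dataset is unlabeled."""
    with zipfile.ZipFile(zip_path) as zf:
        if "dataset.json" not in zf.namelist():
            return None
        meta = json.loads(zf.read("dataset.json"))
    entries = meta.get("labels")
    return None if entries is None else {fname: label for fname, label in entries}
```

Storing members with ZIP_STORED mirrors the "uncompressed ZIP archive" requirement: PNG data is already compressed, so deflating it again would only cost decode time during training.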
Legacy TFRecords datasets are not supported; see below for instructions on how to convert them.
FFHQ:
Step 1: Download the Flickr-Faces-HQ dataset as TFRecords.
Step 2: Extract images from TFRecords using dataset_tool.py from the TensorFlow version of StyleGAN2-ADA:
# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python ../stylegan2-ada/dataset_tool.py unpack \
--tfrecord_dir=~/ffhq-dataset/tfrecords/ffhq --output_dir=/tmp/ffhq-unpacked
Step 3: Create ZIP archive using dataset_tool.py from this repository:
# Original 1024x1024 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq.zip
# Scaled down 256x256 resolution.
#
# Note: --resize-filter=box is required to reproduce FID scores shown in the
# paper. If you don't need to match exactly, it's better to leave this out
# and default to Lanczos. See https://github.com/NVlabs/stylegan2-ada-pytorch/issues/283#issuecomment-1731217782
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq256x256.zip \
--width=256 --height=256 --resize-filter=box
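To make the filter choice concrete: when the scale factor is an integer (1024 to 256 is a factor of 4), a box filter simply averages each non-overlapping 4x4 block of pixels, whereas Lanczos uses a wider windowed-sinc kernel. The NumPy sketch below illustrates only the box case. It is not the code that dataset_tool.py runs, and its rounding may differ from the tool's by one intensity level; box_downsample is a hypothetical name.

```python
import numpy as np


def box_downsample(img, factor):
    """Downscale an H x W x C uint8 image by an integer factor with a box filter.

    Each output pixel is the mean of a non-overlapping factor x factor block,
    which is what a box filter reduces to for an integer scale factor.
    """
    h, w, c = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the scale factor")
    # Split rows and columns into (block index, offset within block) axes.
    blocks = img.reshape(h // factor, factor, w // factor, factor, c).astype(np.float64)
    out = blocks.mean(axis=(1, 3))  # average over the two within-block axes
    return np.round(out).astype(np.uint8)
```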
AFHQ: Download the AFHQ dataset and create a ZIP archive for each class:
python dataset_tool.py --source=~/downloads/afhq/train/cat --dest=~/datasets/afhqcat.zip
python dataset_tool.py --source=~/downloads/afhq/train/dog --dest=~/datasets/afhqdog.zip
python dataset_tool.py --source=~/downloads/afhq/train/wild --dest=~/datasets/afhqwild.zip
LSUN: Download the desired categories from the LSUN project page and convert them to ZIP archives:
python dataset_tool.py --source=~/downloads/lsun/raw/cat_lmdb --dest=~/datasets/lsuncat200k.zip \
--transform=center-crop --width=256 --height=256 --max_images=200000
python dataset_tool.py --source=~/downloads/lsun/raw/car_lmdb --dest=~/datasets/lsuncar200k.zip \
--transform=center-crop-wide --width=512 --height=384 --max_images=200000
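LSUN images are not square, so --transform=center-crop has to discard part of each image before resizing. The sketch below shows the crop geometry under the assumption (based on the option name and StyleGAN2-ADA's dataset_tool.py) that the transform keeps the largest centered square and then resizes it to --width x --height. center_crop_box is a hypothetical helper; center-crop-wide, which targets a wider 512x384 frame, is not modeled here.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square crop.

    Illustrates the geometry of a center crop on a width x height source image;
    the square region would afterwards be resized to the target resolution.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side
```

For a 640x480 source this keeps the central 480x480 region, discarding 80 pixel columns on each side.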
Training new networks
In its most basic form, training new networks boils down to:
python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1 --dry-run
python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1
The first command is optional; it validates the arguments, prints out the training configuration, and exits. The second command kicks off the actual training.
In this example, the results are saved to a newly created directory ~/training-runs/<ID>-mydataset-auto1, controlled by --outdir. The training exports network pickles (network-snapshot-<INT>.pkl) and example images (fakes<INT>.png) at regular intervals (controlled by --snap). For each pickle, it also evaluates FID (controlled by --metrics) and logs the resulting scores in metric-fid50k_full.jsonl (as well as TFEvents if TensorBoard is installed).
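Because every snapshot's FID is appended to metric-fid50k_full.jsonl, picking the best snapshot is a small parsing job. The sketch assumes each line is a JSON object with a "results" dict keyed by metric name and a "snapshot_pkl" field, as in StyleGAN2-ADA's metric logs; check one line of your own log before relying on it. best_snapshot is a hypothetical helper.

```python
import json


def best_snapshot(lines, metric="fid50k_full"):
    """Return (snapshot_pkl, score) with the lowest `metric`, or None if absent.

    `lines` is any iterable of JSONL lines, e.g. an open log file. Each line is
    assumed to look like:
    {"results": {"fid50k_full": 3.1}, "snapshot_pkl": "network-snapshot-....pkl", ...}
    """
    best = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        score = entry.get("results", {}).get(metric)
        if score is None:
            continue  # line logs a different metric
        if best is None or score < best[1]:
            best = (entry.get("snapshot_pkl"), score)
    return best
```

Pass it an open file handle for the log inside the run directory (e.g. ~/training-runs/<ID>-mydataset-auto1/metric-fid50k_full.jsonl, with ~ expanded). Lower FID is better, hence the minimum.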
The name of the output directory reflects the training configuration. For example, 00000-mydataset-auto1 indicates that the base configuration was auto1, meaning that the hyperparameters were selected automatically for training on one GPU. The base configuration is controlled by --cfg:
| Base config | Description |
|---|---|
| auto (default) | Automatically select reasonable defaults based on resolution and GPU count. Serves as a good starting point for new datasets but does not necessarily lead to optimal results. |
| paper256 | Reproduce results for FFHQ and LSUN Church at 256x256 using 1, 2, 4, or 8 GPUs. |
| paper512 | Reproduce results for AFHQ-Cat at 512x512 using 1, 2, 4, or 8 GPUs. |
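The auto configuration derives its hyperparameters from the training resolution and GPU count. Assuming this repository keeps the heuristic of the StyleGAN2-ADA codebase it builds on (verify against train.py before relying on it), the rule can be sketched as below; the function name and dictionary keys are illustrative. For 256x256 on one GPU it gives a minibatch of 16 and an R1 gamma of about 0.82.

```python
def auto_config(res, gpus):
    """Hyperparameters picked by --cfg=auto, assuming StyleGAN2-ADA's heuristic.

    res: training resolution in pixels (e.g. 256 for 256x256 images).
    gpus: number of GPUs.
    """
    # Total minibatch: about 4096/res images per GPU, at most 32 per GPU and
    # 64 overall, but never fewer than one image per GPU.
    mb = max(min(gpus * min(4096 // res, 32), 64), gpus)
    return {
        "batch": mb,
        "mbstd_group": min(mb // gpus, 4),          # minibatch-stddev group size
        "fmaps": 1 if res >= 512 else 0.5,          # multiplier on channel counts
        "lrate": 0.002 if res >= 1024 else 0.0025,  # learning rate
        "gamma": 0.0002 * res ** 2 / mb,            # R1 regularization weight
        "ema_kimg": mb * 10 / 32,                   # generator EMA half-life (kimg)
    }
```

Note that gamma grows with the square of the resolution, which is one reason the options below recommend trying several --gamma values on a new dataset.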
The training configuration can be further customized with additional command line options:
- --aug=noaug disables ADA.
- --cond=1 enables class-conditional training (requires a dataset with labels).
- --mirror=1 amplifies the dataset with x-flips. Often beneficial, even with ADA.
- --resume=ffhq1024 --snap=10 performs transfer learning from FFHQ trained at 1024x1024.
- --resume=~/training-runs/<NAME>/network-snapshot-<INT>.pkl resumes a previous training run.
- --gamma=10 overrides R1 gamma. We recommend trying a couple of different values for each new dataset.
- --aug=ada --target=0.7 adjusts the ADA target value (default: 0.6).
- --augpipe=blit enables pixel blitting but disables all other augmentations.
- --augpipe=bgcfnc enables all available augmentations (blit, geom, color, filter, noise, cutout).
Please refer to python train.py --help for the full list.
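For context on --target: ADA tracks an overfitting heuristic r_t, the mean sign of the discriminator's outputs on real training images, and nudges the augmentation probability p to keep r_t near the target. The sketch follows the update described in the StyleGAN2-ADA paper and codebase; the defaults (adjusting every 4 minibatches, with a step size that lets p move from 0 to 1 within 500k images) are assumptions about this repository, and ada_step is a hypothetical name.

```python
def ada_step(p, rt, target=0.6, batch=32, interval=4, ada_kimg=500):
    """One ADA update of the augmentation probability p (a sketch).

    rt: mean of sign(D(x_real)) over the last `interval` minibatches.
    If rt exceeds the target, the discriminator is judged to be overfitting and
    p grows; if rt is below it, p shrinks. p is clamped from below at 0.
    """
    direction = (rt > target) - (rt < target)  # sign(rt - target) as -1, 0 or 1
    p += direction * (batch * interval) / (ada_kimg * 1000)
    return max(p, 0.0)
```

Raising the target to 0.7 means p only grows once r_t exceeds 0.7, so the same degree of discriminator overconfidence triggers less augmentation than with the default 0.6.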
Model Repository
The following table lists the pre-trained GAN models that can be used to reproduce the experimental results reported in the paper.
| Model | Link |
|---|---|
| CIFAR-10 | Google Drive |
| CelebA | Google Drive |
| AFHQ-Cat | Google Drive |
| LSUN-Church | Google Drive |
| ImageNet (64x64) | Google Drive |
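Assuming the checkpoints above use the same pickle layout as StyleGAN2-ADA (a dict whose "G_ema" entry is the exponential-moving-average generator), they can be loaded and sampled roughly as follows. The helper names, the G(z, c) call signature and the z_dim/c_dim attributes are taken from StyleGAN2-ADA rather than from this repository's documentation, so treat them as assumptions. Unpickling rebuilds the networks from the repository's own classes, so run this from the repository root, and only load pickles from sources you trust.

```python
import pickle


def load_generator(pkl_path):
    """Load the EMA generator from a StyleGAN2-ADA-style network pickle."""
    with open(pkl_path, "rb") as f:
        return pickle.load(f)["G_ema"]  # assumed key, as in StyleGAN2-ADA


def sample_images(G, n=4, seed=0, device="cuda"):
    """Generate n images with shape [n, C, H, W], values roughly in [-1, 1]."""
    import torch  # imported here so the loader above does not depend on it

    G = G.to(device).eval()
    gen = torch.Generator(device=device).manual_seed(seed)
    z = torch.randn([n, G.z_dim], generator=gen, device=device)
    # All-zero labels: ignored by unconditional models (c_dim == 0);
    # conditional models expect one-hot class vectors instead.
    c = torch.zeros([n, G.c_dim], device=device)
    with torch.no_grad():
        return G(z, c)
```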
Acknowledgement
Thanks to the authors of StyleGAN2-ADA for sharing their code.
BibTeX
If you find our work helpful for your research, please consider citing:
@inproceedings{sun2024clrgan,
title = {CLR-GAN: Improving GANs Stability and Quality via Consistent Latent Representation and Reconstruction},
author = {Sun, Shengke and Luan, Ziqian and Zhao, Zhanshan and Luo, Shijie and Han, Shuzhen},
booktitle = {European Conference on Computer Vision},
year = {2024}
}
