MUNIT: Multimodal Unsupervised Image-to-Image Translation
October 13, 2021 · View on GitHub
Paper | Video Demo (1m) | Previous Implementation
This is an improved implementation of MUNIT. The main differences are:
- Use spectral normalization in the generator and the discriminator.
- Use the two-timescale update rule (TTUR) with the discriminator learning rate 0.0004 and the generator learning rate 0.0001.
- Use hinge loss instead of least square loss.
- Use a global residual discriminator instead of multi-resolution patch-wise discriminators for datasets that contain iconic objects and do not require pixel-wise correspondence (e.g., animal faces).
License
Imaginaire is released under NVIDIA Software license. For commercial use, please consult researchinquiries@nvidia.com
Software Installation
For installation, please checkout INSTALL.md.
Hardware Requirement
We trained our models using an NVIDIA DGX1 with 8 V100 32GB GPUs. You can try to use fewer GPUs or reduce the batch size if it does not fit in your GPU memory, but training stability and image quality cannot be guaranteed.
Training
MUNIT prefers the following data structure.
${TRAINING_DATASET_ROOT_FOLDER}
└───images_a
└───001.jpg
└───002.jpg
└───003.jpg
...
└───images_b
└───001.jpg
└───002.jpg
...
Training data preparation
- Here we use dog and cat images in the publicly available animal face datasets (AFHQ) instead. The dataset is available (https://github.com/clovaai/stargan-v2) under Creative Commons BY-NC 4.0 license by NAVER Corporation. Download and extract the data
python scripts/download_dataset.py --dataset afhq_dog2cat
- Preprocess the data into LMDB format
bash scripts/build_lmdb.sh munit afhq_dog2cat
Training command
python -m torch.distributed.launch --nproc_per_node=8 train.py \
--config configs/projects/munit/afhq_dog2cat/ampO1.yaml
Inference
- Download a small set of test examples (5 randomly sampled images from the test set)
python scripts/download_test_data.py --model_name munit
- Or arrange your own data into the following format.
projects/munit/test_data/afhq_dog2cat
└───images_a
└───0001.jpg
└───0002.jpg
└───0003.jpg
...
└───images_b
└───0001.jpg
└───0002.jpg
└───0003.jpg
...
- Translate images from dogs to cats
-
Inference command
python -m torch.distributed.launch --nproc_per_node=1 inference.py \ --config configs/projects/munit/afhq_dog2cat/ampO1.yaml \ --output_dir projects/munit/output/afhq_dog2catOutputs are saved in
projects/munit/output/afhq_dog2cat:

-
To use the style from example cat images, change
random_styleinconfigs/projects/munit/afhq_dog2cat/ampO1.yamltoFalse. -
To translate images in the other direction (from cats to dogs), change
a2btoFalse.
-
Citation
If you use this code for your research, please cite our paper.
@inproceedings{huang2018multimodal,
title={Multimodal Unsupervised Image-to-image Translation},
author={Huang, Xun and Liu, Ming-Yu and Belongie, Serge and Kautz, Jan},
booktitle={European Conference on Computer Vision (ECCV)}},
year={2018}
}