HyperEditor: Achieving Both Authenticity and Cross-Domain Capability in Image Editing via Hypernetworks

May 13, 2024

Paper (arXiv)

Our paper has been accepted to AAAI 2024.

Getting Started

Prerequisites

  • Linux or macOS
  • NVIDIA GPU with CUDA and cuDNN (running on CPU may be possible with some modifications, but is not supported out of the box)
  • Python 3
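
A quick way to check these prerequisites before installing is sketched below. It uses only the Python standard library. The helper name check_prerequisites is hypothetical and not part of this repository, and looking for nvidia-smi on the PATH is only a rough proxy for a working CUDA setup.

```python
import platform
import shutil
import sys


def check_prerequisites():
    """Return coarse prerequisite checks (hypothetical helper, not part of the repo)."""
    return {
        # The README lists Linux or macOS as supported platforms.
        "supported_os": platform.system() in ("Linux", "Darwin"),
        # Python 3 is required.
        "python3": sys.version_info.major == 3,
        # Rough proxy for an installed NVIDIA driver: is nvidia-smi on PATH?
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A passing check does not guarantee that the installed CUDA and cuDNN versions match the ones pinned in the environment file.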

Installation

  • Dependencies: We recommend running this repository using Anaconda. All dependencies for defining the environment are provided in environment/hyperEditor_env.yaml.

Auxiliary Models

In addition, we provide various auxiliary models needed for training your own HyperEditor models from scratch. These include the pretrained e4e encoders into W, pretrained StyleGAN2 generators, and models used for loss computation.

Pretrained W-Encoders

| Path | Description |
| :--- | :--- |
| Faces W-Encoder | Pretrained e4e encoder trained on FFHQ into the W latent space. |

StyleGAN2 Generator

| Path | Description |
| :--- | :--- |
| FFHQ StyleGAN | StyleGAN2 model trained on FFHQ with 1024x1024 output resolution. |

Note: all StyleGAN models are converted from the official TensorFlow models to PyTorch using the conversion script from rosinality.


Other Utility Models

| Path | Description |
| :--- | :--- |
| IR-SE50 Model | Pretrained IR-SE50 model taken from TreB1eN, used for our ID loss and as the encoder backbone in the human face domain. |
| ResNet-34 Model | ResNet-34 model trained on ImageNet, taken from torchvision, used to initialize our encoder backbone. |
| CurricularFace Backbone | Pretrained CurricularFace model taken from HuangYG123, used to compute the ID similarity metric. |
| MTCNN | Weights for the MTCNN model taken from TreB1eN, used to compute the ID similarity metric. (Unpack the tar.gz to extract the 3 model weights.) |

By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models. However, you may use your own paths by changing the corresponding values in configs/paths_config.py.
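
Before training, it can help to confirm that every auxiliary model is where the configuration expects it. The following is a minimal sketch, assuming you pass in a name-to-path mapping that mirrors your own configuration. The function name find_missing_models and the example keys and file names are placeholders, not the repository's actual names.

```python
from pathlib import Path


def find_missing_models(model_paths):
    """Return the names whose files do not exist on disk."""
    return [name for name, path in model_paths.items() if not Path(path).is_file()]


if __name__ == "__main__":
    # Placeholder entries: replace the keys and file names with the values
    # from your own configuration file.
    example_paths = {
        "generator": "pretrained_models/example_generator.pt",
        "id_model": "pretrained_models/example_id_model.pth",
    }
    for name in find_missing_models(example_paths):
        print(f"missing auxiliary model: {name}")
```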

Preparing your Data

In order to train HyperEditor on your own data, you should perform the following steps:

  1. Download the FFHQ and CelebA-HQ datasets.
  2. Update configs/paths_config.py with the necessary data paths and model paths for training and inference.
dataset_paths = {
    'ffhq': '/home/asus/stuFile/FFHQ/FFHQ',
    'celeba_test': '/home/asus/stuFile/celeba_hq'
}
  3. Configure a new dataset under the DATASETS variable defined in configs/data_configs.py. There, you should define the source/target data paths for the train and test sets as well as the transforms to be used for training and inference.
DATASETS = {
    'ffhq_hypernet': {
        'transforms': transforms_config.EncodeTransforms,
        'train_source_root': dataset_paths['ffhq'],
        'train_target_root': dataset_paths['ffhq'],
        'test_source_root': dataset_paths['celeba_test'],
        'test_target_root': dataset_paths['celeba_test']
    }
}
  4. To train with your newly defined dataset, pass its key to the --dataset_type flag, e.g. --dataset_type ffhq_hypernet.
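
A mistyped dataset path usually surfaces only once training starts, so a quick check of the configured directories can save time. The sketch below is an illustration under stated assumptions: the helper names count_images and summarize_dataset_paths are hypothetical, and the set of image extensions is a guess about how the datasets are stored on disk.

```python
from pathlib import Path

# Assumed image extensions; adjust to match your copies of the datasets.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Count image files under root recursively; return None if root is not a directory."""
    root = Path(root)
    if not root.is_dir():
        return None
    return sum(
        1 for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def summarize_dataset_paths(dataset_paths):
    """Map each dataset name to its image count (None for a missing directory)."""
    return {name: count_images(path) for name, path in dataset_paths.items()}


if __name__ == "__main__":
    # Same structure as dataset_paths in the config; the paths are the README's examples.
    dataset_paths = {
        "ffhq": "/home/asus/stuFile/FFHQ/FFHQ",
        "celeba_test": "/home/asus/stuFile/celeba_hq",
    }
    for name, count in summarize_dataset_paths(dataset_paths).items():
        print(f"{name}: {'directory not found' if count is None else f'{count} images'}")
```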

Training HyperEditor

The main training script can be found in scripts/train.py. See options/train_options.py for all training-specific flags. Intermediate training results are saved to opts.exp_dir. This includes checkpoints, train outputs, and test outputs. Additionally, if you have tensorboard installed, you can visualize tensorboard logs in opts.exp_dir/logs.

Here, we provide an example for training on the human faces domain:

python scripts/train.py

Additional Notes

  • To select which generator layers to tune with the hypernetworks, you can use the --layers_to_tune flag.

    • By default, all non-toRGB convolutional layers are altered.
    • When a single model performs a single attribute edit, you can enable the adaptive layer selector with the --choose_layers flag to reduce model complexity.
    • The trade-off parameter of the adaptive layer selector is set with --lambda_std (default: 0.6).
  • To train a model that edits only a single attribute, pass the text pair via --init_text and --target_text, respectively, e.g. 'face' and 'face with smile'.

  • If you are training a model to edit multiple attributes, you can use the --target_text_file flag.

    • --init_text provides the initial text shared by all attributes, such as 'face'.
    • The text file passed to --target_text_file contains one target text condition per line.
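
The notes above can be sketched as a small helper that writes a target text file and assembles a training command. This is a minimal illustration, not the authors' tooling: the helper names are hypothetical, it uses only flags named in this README, and it assumes that --exp_dir is also accepted by the training script. A real run may need further flags from options/train_options.py.

```python
import shlex
from pathlib import Path


def write_target_texts(path, target_texts):
    """Write one target text condition per line, for use with --target_text_file."""
    Path(path).write_text("\n".join(target_texts) + "\n", encoding="utf-8")


def build_train_command(dataset_type, exp_dir, init_text,
                        target_text=None, target_text_file=None):
    """Assemble a scripts/train.py command from flags named in this README."""
    # Single-attribute training uses --target_text; multi-attribute training
    # uses --target_text_file. Exactly one of the two must be given.
    if (target_text is None) == (target_text_file is None):
        raise ValueError("pass exactly one of target_text or target_text_file")
    args = [
        "python", "scripts/train.py",
        "--dataset_type", dataset_type,
        "--exp_dir", exp_dir,  # assumed to be a training flag as well
        "--init_text", init_text,
    ]
    if target_text is not None:
        args += ["--target_text", target_text]
    else:
        args += ["--target_text_file", str(target_text_file)]
    # shlex.join quotes arguments that contain spaces.
    return shlex.join(args)


if __name__ == "__main__":
    # Example values are placeholders.
    print(build_train_command("ffhq_hypernet", "experiments/smile",
                              "face", target_text="face with smile"))
```

For multiple attributes, call write_target_texts first and pass the resulting file path as target_text_file.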

Inference

Inference Script

You can use scripts/inference.py to apply a trained HyperEditor model to a set of images. See options/test_options.py for all inference-specific flags.

Here, we provide an example for inference on the human faces domain:

python scripts/inference.py

Additional Notes

  • The results are saved to the directory given by --exp_dir.
  • The path to the trained HyperEditor model is specified with --checkpoint_path.
  • The path to the test images is specified with --data_path.
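
The three flags above can be combined into an inference call as sketched below. The helper name build_inference_args is hypothetical and the example paths are placeholders. Only the flags listed in these notes are used, so a real run may require additional options from options/test_options.py.

```python
import shlex


def build_inference_args(exp_dir, checkpoint_path, data_path):
    """Return the argument list for scripts/inference.py using the flags named above."""
    return [
        "python", "scripts/inference.py",
        "--exp_dir", exp_dir,                   # where results are saved
        "--checkpoint_path", checkpoint_path,   # trained HyperEditor model
        "--data_path", data_path,               # directory of test images
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute your own.
    args = build_inference_args("experiments/inference",
                                "path/to/checkpoint.pt",
                                "path/to/test_images")
    print(shlex.join(args))
    # To launch the run from Python instead of printing it:
    # import subprocess; subprocess.run(args, check=True)
```

Returning a list rather than a single string lets the arguments be passed to subprocess.run without shell quoting concerns.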

Citation

If you use this code for your research, please cite the following work:

@inproceedings{zhang2024hypereditor,
  title={HyperEditor: Achieving Both Authenticity and Cross-Domain Capability in Image Editing via Hypernetworks},
  author={Zhang, Hai and Wu, Chunwei and Cao, Guitao and Wang, Hailing and Cao, Wenming},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={7},
  pages={7051--7059},
  year={2024}
}