July 22, 2022

A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration

Abstract

Generative adversarial networks (GANs) have drawn enormous attention due to their simple yet effective training mechanism and superior image generation quality. With the ability to generate photo-realistic high-resolution (e.g., $1024\times1024$) images, recent GAN models have greatly narrowed the gap between generated images and real ones. Therefore, many recent works show emerging interest in taking advantage of pre-trained GAN models by exploiting the well-disentangled latent space and the learned GAN priors. We briefly review recent progress on leveraging pre-trained large-scale GAN models from three aspects, i.e., 1) the training of large-scale generative adversarial networks, 2) exploring and understanding the pre-trained GAN models, and 3) leveraging these models for subsequent tasks like image restoration and editing.

Figure 1. Illustration of GAN inversion methods.

Illustration

In this figure, $\mathbf{x}$ and $\mathbf{\hat{x}}$ are the given real image and the generated image, respectively. The red dotted line denotes supervision. The in-domain constraint requires that the generated image $\mathbf{\hat{x}}$ can be inverted back into the latent space. Here, $\mathbf{z}$ is not restricted to the $\mathcal{Z}$ space, and may refer to a more generic latent code (e.g., $\mathbf{w}$, $\mathbf{f}$, etc.).
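As a rough illustration of the reconstruction supervision between $\mathbf{x}$ and $\mathbf{\hat{x}}$, the sketch below inverts an image by gradient descent on the latent code. It is a minimal sketch under strong assumptions: the "generator" is a hypothetical frozen linear map standing in for a pre-trained GAN, the loss is a plain squared error, and the names `generator`, `invert`, and `weights` are illustrative only. It does not reproduce any specific method from the survey.

```python
def generator(z, weights):
    """Toy frozen 'generator': maps a latent code z to an image vector W z."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in weights]


def invert(x, weights, steps=500, lr=0.05):
    """Search for a latent code z that minimizes ||G(z) - x||^2."""
    z = [0.0] * len(weights[0])
    for _ in range(steps):
        x_hat = generator(z, weights)
        residual = [xh - xi for xh, xi in zip(x_hat, x)]
        # Gradient of the squared error with respect to z is 2 * W^T (G(z) - x).
        grad = [
            2.0 * sum(weights[i][j] * residual[i] for i in range(len(x)))
            for j in range(len(z))
        ]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


# Hypothetical generator weights and a ground-truth latent code (assumptions).
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
z_true = [0.8, -0.3]
x = generator(z_true, weights)       # the "real" image to be inverted
z_rec = invert(x, weights)           # recovered latent code
x_hat = generator(z_rec, weights)    # reconstruction from the recovered code
```

With a real pre-trained generator the objective is non-convex, so the choice of initialization and loss terms matters far more than in this toy setting.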

Figure Content (PDF file here)

Figure 2. A Summary of Relevant Papers

If you want to get the raw file, please refer to ProcessOn.com (passcode: 1qaz)

Figure 3. Illustration of recent GAN models (see (a)$\sim$(d)) and the latent spaces of StyleGAN series (see (e)).

Illustration

(a) For PGGAN, the blue part denotes the progressive growing procedure from $4\times4$ to $8\times8$. The components with dashed lines are employed for the fade-in strategy, where $\alpha$ gradually increases to 1. They are discarded when the model grows to a higher resolution. (b) For BigGAN, a specific noise vector is delivered to each layer together with the class embedding, and the model is trained end-to-end without the progressive growing procedure. (c) For StyleGAN, a series of FC layers is deployed to map $\mathbf{z}$ into $\mathbf{w}$. The green part belongs only to StyleGAN2. (d) For StyleGAN3, the generator is substantially modified to improve translation and rotation equivariance. The discriminator is omitted since it is identical to that used in StyleGAN2. (e) For simplicity, we take the StyleGAN series as an example to show the latent spaces in the context of the GAN inversion task.
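The fade-in strategy in (a) can be illustrated with a minimal sketch. It assumes the commonly described form of the blend, in which the upsampled output of the lower-resolution branch and the output of the newly added higher-resolution block are mixed with weight $\alpha$. The function names and the nearest-neighbor upsampling are illustrative assumptions, not code from any of the surveyed models.

```python
def upsample_nearest_2x(image):
    """Double the resolution of a 2-D image (list of rows) by pixel repetition."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fade_in(low_res, high_res, alpha):
    """Blend the upsampled low-res output with the new high-res output.

    alpha = 0 keeps only the old (upsampled) branch; alpha = 1 keeps only
    the new branch, at which point the old branch can be discarded.
    """
    upsampled = upsample_nearest_2x(low_res)
    return [
        [(1.0 - alpha) * u + alpha * h for u, h in zip(up_row, hi_row)]
        for up_row, hi_row in zip(upsampled, high_res)
    ]


# Toy 4x4 and 8x8 single-channel outputs (hypothetical values).
low = [[float(4 * r + c) for c in range(4)] for r in range(4)]
high = [[1.0] * 8 for _ in range(8)]

start = fade_in(low, high, 0.0)    # equals the upsampled 4x4 output
halfway = fade_in(low, high, 0.5)  # equal mix of both branches
end = fade_in(low, high, 1.0)      # equals the 8x8 output
```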

Figure Content (PDF file here)

Table 1. A summary of GAN inversion and methods leveraging pre-trained GANs for image editing and restoration.

Illustration

For the inversion method, "O", "L", and "T" represent optimization-based, learning-based, and training-based (or fine-tuning) methods, respectively, while "/" means that no inversion is performed in the method, and numbers (without square brackets) are the indices of the methods in this table that are used for inversion. Note that the methods are ordered (roughly) by the time they became publicly accessible (e.g., the time of appearance on arXiv, openreview.net, CVF Open Access, etc.).
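The notation of the Inversion Method column can be decoded mechanically. The following is a minimal sketch, assuming cells are comma-separated tokens as in the table; the function name `decode_inversion` and the tuple labels are hypothetical, and any token that does not match the documented notation is passed through as "unknown" rather than guessed.

```python
METHOD_KINDS = {
    "O": "optimization-based",
    "L": "learning-based",
    "T": "training-based (or fine-tuning)",
}


def decode_inversion(cell):
    """Turn an 'Inversion Method' cell such as '47, O' into labeled entries."""
    entries = []
    for token in (part.strip() for part in cell.split(",")):
        if token == "/":
            entries.append(("none", None))        # no inversion performed
        elif token.isdigit():
            entries.append(("reuses", int(token)))  # index of another row
        elif token in METHOD_KINDS:
            entries.append(("kind", METHOD_KINDS[token]))
        else:
            entries.append(("unknown", token))    # not covered by the legend
    return entries


example = decode_inversion("47, O")
```

Here `example` holds a reference to row 47 followed by an optimization-based step, which is how a cell like "47, O" reads under the legend above.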

Abbreviations

$^\ast$ Abbreviations: AD (ADE20K), AF (AFHQ), CA (CelebA), CD (CACD), CF (CIFAR), CH (CelebA-HQ), CM (CelebAMask-HQ), CO (MS COCO), CS (CityScapes), CU (Caltech-UCSD Birds), DA (Danbooru, aka Anime Faces), DF (DeepFashion), FF (FFHQ), FL (Flowers), IN (ImageNet), LF (LFW), LS (LSUN), MF (MetFaces), MN (MNIST), OM (Omniglot), P3 (Places365), PL (Places), PT (Oxford-IIIT Pet, aka Cats and Dogs), RA (RAVDESS), SC (Stanford Cars), SS (Streetscape), SV (SVHN), TR (Transient), UT (UT Zappos50K)

$^\dagger$ Abbreviations: AD (Adversarial Defense), AE (Attribute Editing, i.e., w/o reference), AN (Anomaly Detection), AR (Artifacts Removal), AT (Attribute Transfer, i.e., w/ reference), CO (Image Crossover), [U]DA ([Unsupervised] Domain Adaptation), DN (Image Denoising), FF (Face Frontalization), IC (Image Colorization), IG (Image Generation), IH (Information Hiding), Int (Interpolation), Inv (Inversion), IP (Inpainting), PI (Parsing or Segmentation to Image), SI (Sketch to Image), SR (Image Super-resolution), ST (Style Transfer), TR (Transform and Random Jittering).

$^\ddagger$ Some custom datasets collected or regenerated by the authors are omitted since they are not publicly available or can be generated automatically based on current public datasets.

Table Content

No.	Method	Publication	Backbone	Latent Space	Inversion Method	Dataset $^\ast$	Application $^\dagger$
1	BiGAN (Link) (Code)	ICLR 2017	/	$\mathcal{Z}$	T	MN, IN	Inv
2	ALI (Link) (Code)	ICLR 2017	/	$\mathcal{Z}$	T	CF, SV, CA, IN	Inv, Int
3	Zhu et al. (Link) (Code)	ECCV 2016	DCGAN	$\mathcal{Z}$	L, O	SH, LS, PL $^\ddagger$	Inv, Int, AE
4	IcGAN (Link) (Code)	NeurIPSw 2016	cGAN	$\mathcal{Z}$ , $\mathcal{C}$	L	MN, CA	Inv, AT, AE
5	Creswell et al. (Link) (Code)	T-NNLS 2018	DCGAN, WGAN-GP	$\mathcal{Z}$	O	OM, UT, CA	Inv
6	Lipton et al. (Link) (Code)	ICLRw 2017	DCGAN	$\mathcal{Z}$	O	CA	Inv
7	PGD-GAN (Link) (Code)	ICASSP 2018	DCGAN	$\mathcal{Z}$	O	MN, CA	Inv
8	Ma et al. (Link) (Code)	NeurIPS 2018	DCGAN	$\mathcal{Z}$	O	MN, CA	Inv, IP
9	Suzuki et al. (Link) (Code)	ArXiv 2018	SNGAN, BigGAN, StyleGAN	$\mathcal{F}$	3	IN, FL, FF, DA	CO
10	GANDissection (Link) (Code)	ICLR 2019	PGGAN	$\mathcal{F}$	/	LS, AD	AE, AR
11	NPGD (Link) (Code)	ICCV 2019	DCGAN, SAGAN	$\mathcal{Z}$	L, O	MN, CA, LS	Inv, SR, IP
12	Image2StyleGAN (Link) (Code)	ICCV 2019	StyleGAN	$\mathcal{W}+$	O	FF $^\ddagger$	Inv, Int, AE, ST
13	Bau et al. (Link) (Code)	ICLRw 2019	PGGAN, WGAN-GP, StyleGAN	$\mathcal{Z}$ , $\mathcal{W}$	L, O	LS	Inv
14	GANPaint (Link) (Demo)	ToG 2019	PGGAN	$\mathcal{Z}$ , $\Theta$	L, O, T	LS	Inv, AE
15	InterFaceGAN (Link) (Code)	CVPR 2020	PGGAN, StyleGAN	$\mathcal{Z}$ , $\mathcal{W}$	3, 8	CH	AE, AR
16	GANSeeing (Link) (Code)	ICCV 2019	PGGAN, WGAN-GP, StyleGAN	$\mathcal{Z}$ , $\mathcal{W}$	13	LS	Inv
17	YLG (Link) (Code)	CVPR 2020	SAGAN	$\mathcal{Z}$	O	IN	Inv
18	Image2StyleGAN++ (Link) (Video)	CVPR 2020	StyleGAN	$\mathcal{W}+$ , $\mathcal{N}$	O	LS, FF	Inv, CO, IP, AE, ST
19	mGANPrior (Link) (Code)	CVPR 2020	PGGAN, StyleGAN	$\mathcal{Z}$	O	FF, CH, LS	Inv, IC, SR, IP, DN, AE
20	MimicGAN (Link)	IJCV 2020	DCGAN	$\mathcal{Z}$	O	CA, FF, LF	Inv, UDA, AD, AN
21	PULSE (Link) (Code)	CVPR 2020	StyleGAN	$\mathcal{Z}$	O	FF, CH	Inv, SR
22	DGP (Link) (Code)	ECCV 2020	BigGAN	$\mathcal{Z}$	O, T	IN, P3	Inv, Int, IC, IP, SR, AD, TR, AE
23	StyleGAN2Distillation (Link) (Code)	ECCV 2020	StyleGAN2, pix2pixHD	$\mathcal{W}+$	/	FF	AT, AE
24	EditingInStyle (Link) (Code)	CVPR 2020	PGGAN, StyleGAN, StyleGAN2	$\mathcal{F}$	/	FF, LS	AT
25	StyleRig (Link) (Video)	CVPR 2020	StyleGAN	$\mathcal{W}+$	/	FF	AT
26	ALAE (Link) (Code)	CVPR 2020	StyleGAN	$\mathcal{W}$	T	MN, FF, LS, CH	Inv, AT
27	IDInvert (Link) (Code)	ECCV 2020	StyleGAN	$\mathcal{W}+$	L, O	FF, LS	Inv, Int, AE, CO
28	pix2latent (Link) (Code)	ECCV 2020	BigGAN, StyleGAN2	$\mathcal{Z}$	O	IN, CO, CF, LS	Inv, TR, AE
29	IDDistanglement (Link) (Code)	ToG 2020	StyleGAN	$\mathcal{W}$	L	FF	Inv, AT
30	WhenAndHow (Link)	ArXiv 2020	MLP	$\mathcal{Z}$	O	MN	Inv, IP
31	Guan et al. (Link)	ArXiv 2020	StyleGAN	$\mathcal{W}+$	L, O	CH, CD	Inv, Int, AT, IC
32	SeFa (Link) (Code)	CVPR 2021	PGGAN, BigGAN, StyleGAN	$\mathcal{Z}$	19, 27	FF, CH, LS, IN, SS, DA	AE
33	GH-Feat (Link) (Code)	CVPR 2021	StyleGAN	$\mathcal{S}$	L	MN, FF, LS, IN	Inv, AT, AE
34	pSp (Link) (Code)	CVPR 2021	StyleGAN2	$\mathcal{W}+$	L	FF, AF, CH, CM	Inv, FF, SI, SR
35	StyleFlow (Link) (Code)	ToG 2021	StyleGAN, StyleGAN2	$\mathcal{W}+$	12	FF, LS	AT, AE
36	PIE (Link) (Code)	ToG 2020	StyleGAN	$\mathcal{W}+$	O	FF	AT, AE
37	Bartz et al. (Link) (Code)	BMVC 2020	StyleGAN, StyleGAN2	$\mathcal{Z}$ , $\mathcal{W}+$	L	FF, LS	Inv, DN
38	StyleIntervention (Link)	ArXiv 2020	StyleGAN2	$\mathcal{S}$	O	FF	Inv, AE
39	StyleSpace (Link) (Code)	CVPR 2021	StyleGAN2	$\mathcal{S}$	O	FF, LS	Inv, AE
40	Hijack-GAN (Link) (Code)	CVPR 2021	PGGAN, StyleGAN	$\mathcal{Z}$	/	CH	AE
41	NaviGAN (Link) (Code)	CVPR 2021	pix2pixHD, BigGAN, StyleGAN2	$\Theta$	StyleGAN2	FF, LS, CS, IN	AE
42	GLEAN (Link) (Code)	CVPR 2021	StyleGAN	$\mathcal{W}+$	L	FF, LS	Inv, SR
43	ImprovedGANEmbedding (Link) (Code)	ArXiv 2020	StyleGAN, StyleGAN2	$\mathcal{P}$	O	FF, MF $^\ddagger$	Inv, IC, IP, SR
44	GFPGAN (Link) (Code)	CVPR 2021	StyleGAN2	$\mathcal{W}$	L	FF	Inv, SR
45	EnjoyEditing (Link) (Code)	ICLR 2021	PGGAN, StyleGAN2	$\mathcal{Z}$	12	FF, CA, CH, P3, TR	Inv, AE
46	SAM (Link) (Code)	ToG 2021	StyleGAN	$\mathcal{W}+$	L	CA, CH	AE
47	e4e (Link) (Code)	ToG 2021	StyleGAN2	$\mathcal{W}+$	L	FF, CH, LS, SC	Inv, AE
48	StyleCLIP (Link) (Code)	ICCV 2021	StyleGAN2	$\mathcal{W}+$ , $\mathcal{S}$	47, O	FF, CH, LS, AF	AE
49	LatentComposition (Link) (Code)	ICLR 2021	PGGAN, StyleGAN2	$\mathcal{Z}$	L	FF, CH, LS	Inv, IP, AT
50	GANEnsembling (Link) (Code)	CVPR 2021	StyleGAN2	$\mathcal{W}+$	L, O	CH, SC, PT	Inv, AT
51	ReStyle (Link) (Code)	ICCV 2021	StyleGAN2	$\mathcal{W}+$	L	FF, CH, SC, LS, AF	Inv, AE
52	E2Style (Link) (Code)	T-IP 2022	StyleGAN2	$\mathcal{W}+$	L	FF, CH	Inv, SI, PI, AT, IP, SR, AE, IH
53	GPEN (Link) (Code)	CVPR 2021	StyleGAN2	$\mathcal{W}+$ , $\mathcal{N}$	L	FF, CH	Inv, SR
54	Consecutive (Link) (Code)	ICCV 2021	StyleGAN	$\mathcal{W}+$	O	FF, RA	Inv, Int, AE
55	BDInvert (Link) (Code)	ICCV 2021	StyleGAN, StyleGAN2	$\mathcal{F}$ / $\mathcal{W}+$	O	FF, CH, LS	Inv, AE
56	HFGI (Link) (Code)	CVPR 2022	StyleGAN2	$\mathcal{W}+$ , $\mathcal{F}$	L	FF, CH, SC	Inv, AE
57	VisualVocab (Link) (Code)	ICCV 2021	BigGAN	$\mathcal{Z}$	/	P3, IN	AE
58	HyperStyle (Link) (Code)	CVPR 2022	StyleGAN2	$\mathcal{W}+$	L	FF, CH, AF	Inv, AE, ST
59	GANGealing (Link) (Code)	CVPR 2022	StyleGAN2	$\mathcal{W}$	/	LS, FF, AF, CH, CU	TR
60	HyperInverter (Link) (Code)	CVPR 2022	StyleGAN2	$\mathcal{W}$ , $\Theta$	L	FF, CH, LS	Inv, Int, AE
61	InsetGAN (Link) (Code)	CVPR 2022	StyleGAN2	$\mathcal{W}+$	O	FF, DF $^\ddagger$	CO, IG
62	HairMapper (Link) (Code)	CVPR 2022	StyleGAN2	$\mathcal{W}+$	47	FF, CM $^\ddagger$	AE
63	SAMInv (Link) (Code)	CVPR 2022	BigGAN-deep, StyleGAN2	$\mathcal{W}+$ , $\mathcal{F}$	L	FF, LS, IN	Inv, AE

Contributions

Pull requests are welcome for error correction and content expansion!

Tips:

Tables in LaTeX and Markdown can be generated with tablesgenerator.com.
You can download our table content from here and load it (or your own CSV files) into tablesgenerator.com.
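For contributors who prefer a local workflow, the CSV-to-Markdown step can also be approximated with a few lines of Python. This is a minimal sketch using only the standard `csv` module; the function name `csv_to_markdown` and the sample rows are illustrative, and it assumes that cells contain no `|` characters or line breaks.

```python
import csv
import io


def csv_to_markdown(text, delimiter=","):
    """Render CSV text as a Markdown pipe table (first row is the header)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# A small excerpt of the table, written by hand for illustration.
sample = "No.,Method,Publication\n1,BiGAN,ICLR 2017\n2,ALI,ICLR 2017\n"
markdown = csv_to_markdown(sample)
print(markdown)
```

Passing `delimiter="\t"` handles tab-separated exports in the same way.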

Citation

Please find more details in our paper. If you find it useful, please consider citing:

@article{liu2022pretrainedGANs,
  title={A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration},
  author={Liu, Ming and Wei, Yuxiang and Wu, Xiaohe and Zuo, Wangmeng and Zhang, Lei},
  journal={arXiv preprint arXiv:2207.10309},
  year={2022}
}
