INSTALLATION.md

May 27, 2023

The environment has been tested on Ubuntu 20.04 with Python 3.8 and a CUDA-enabled NVIDIA GPU. We recommend Anaconda or Miniconda for setting up the running environment. All package dependencies are listed in e4s_env.yaml, so the conda environment can be created with the conda env create -f e4s_env.yaml command.
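The environment setup can be wrapped in a small guard, sketched below. The function name create_e4s_env is a hypothetical helper, not part of the repository. The name of the resulting environment comes from the name field inside e4s_env.yaml, which is not shown here, so the activation step uses a placeholder.

```shell
# Create the conda environment described by e4s_env.yaml.
# Returns non-zero when conda or the YAML file is missing.
# (create_e4s_env is an illustrative helper, not a script shipped with the repo.)
create_e4s_env() {
    env_file="${1:-e4s_env.yaml}"
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found; install Anaconda or Miniconda first" >&2
        return 1
    fi
    if [ ! -f "$env_file" ]; then
        echo "environment file not found: $env_file" >&2
        return 1
    fi
    conda env create -f "$env_file"
}

# Usage (from the repository root):
#   create_e4s_env e4s_env.yaml
#   conda env list              # shows the name of the newly created environment
#   conda activate <env-name>   # placeholder: use the name printed above
```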

💡 Hint: If you run into problems installing dlib, consider installing it from conda-forge or building it manually.

If you plan to use SegNeXt-FaceParser as described in Section 1.3.1 below, some extra setup is needed. Follow this link for the SegNeXt-FaceParser installation guide. The combination mmcv-full==1.5.1 + mmcls==0.20.1 + the latest SegNeXt-FaceParser version has been tested.

1.2 Pre-trained model

We provide a pre-trained RGI model that was trained on the FFHQ dataset for 300K iterations. Please fetch the model from this Google Drive link and place it in the pretrained_ckpts/e4s folder.
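Placing a downloaded checkpoint can be sketched as follows. The helper place_ckpt and the ~/Downloads location are assumptions for illustration; the file name iteration_300000.pt is taken from the folder listing in Section 2.

```shell
# Move a downloaded checkpoint into its folder under pretrained_ckpts/,
# creating the folder if needed.
# (place_ckpt is an illustrative helper, not a script shipped with the repo.)
place_ckpt() {
    src="$1"        # path of the downloaded file
    dest_dir="$2"   # target folder, e.g. pretrained_ckpts/e4s
    if [ ! -f "$src" ]; then
        echo "checkpoint not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" && mv "$src" "$dest_dir/"
}

# Usage, assuming the browser saved the file to ~/Downloads (hypothetical path):
#   place_ckpt ~/Downloads/iteration_300000.pt pretrained_ckpts/e4s
```

The same helper would work for the other checkpoints by changing the target folder.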

1.3 Other dependencies

1.3.1 Face Parser

We use a face parser to estimate the facial segmentation. Currently, we provide the following two pre-trained face parsers:

face-parsing.PyTorch (the default one): repo

Please download the pre-trained model here, and place it in the pretrained_ckpts/face_parsing folder.

SegNeXt-FaceParser: repo

Please download the pre-trained SegNeXt model (small | base), and place it in the pretrained_ckpts/face_parsing folder. The corresponding configuration files are already included in the pretrained_ckpts/face_parsing folder.

💡 Hint: FaceVid2Vid and GPEN, described below, are only used for face swapping. If only face editing is needed, skip directly to Section 2.

1.3.2 FaceVid2Vid: paper | unofficial-repo

This face reenactment model drives the source face to show a pose and expression similar to those of the target. Currently, we use zhanglonghao's implementation of FaceVid2Vid, whose pre-trained model can be downloaded here (Vox-256-New). As with the other checkpoints, please put it in the pretrained_ckpts/facevid2vid folder.

1.3.3 GPEN: paper | repo

A face restoration model (GPEN) is used to improve the resolution of the intermediate driven face. You can run the following script to fetch the required models automatically:

cd pretrained_ckpts/gpen
sh ./fetch_gpen_models.sh

Alternatively, you can download the pre-trained models manually as follows:

| Model | Download link | Purpose |
| --- | --- | --- |
| RetinaFace-R50 | https://public-vigen-video.oss-cn-shanghai.aliyuncs.com/robin/models/RetinaFace-R50.pth | face detection |
| RealESRNet_x4 | https://public-vigen-video.oss-cn-shanghai.aliyuncs.com/robin/models/realesrnet_x4.pth | x4 super resolution |
| GPEN-BFR-512 | https://public-vigen-video.oss-cn-shanghai.aliyuncs.com/robin/models/GPEN-BFR-512.pth | GPEN pre-trained model |
| ParseNet | https://public-vigen-video.oss-cn-shanghai.aliyuncs.com/robin/models/ParseNet-latest.pth | face parsing |

Make sure to place these checkpoint files in the pretrained_ckpts/gpen/weights folder.
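The manual download can also be scripted. The sketch below is not the repository's own fetch script; the function names are hypothetical, and it assumes wget is installed (curl -L -o would serve equally well). The URLs are the ones listed in the table above.

```shell
# Base URL and file names copied from the download table.
GPEN_BASE_URL="https://public-vigen-video.oss-cn-shanghai.aliyuncs.com/robin/models"

# Print one download URL per line.
gpen_weight_urls() {
    for name in RetinaFace-R50.pth realesrnet_x4.pth GPEN-BFR-512.pth ParseNet-latest.pth; do
        echo "$GPEN_BASE_URL/$name"
    done
}

# Download every file into the weights folder, skipping files already present.
# A failed download removes the partial file and stops with a non-zero status.
fetch_gpen_weights() {
    dest="${1:-pretrained_ckpts/gpen/weights}"
    mkdir -p "$dest" || return 1
    gpen_weight_urls | while read -r url; do
        file="$dest/$(basename "$url")"
        [ -f "$file" ] && continue
        wget -O "$file" "$url" || { rm -f "$file"; exit 1; }
    done
}

# Usage (from the repository root):
#   fetch_gpen_weights
```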

2. Sanity check

After fetching these checkpoints, your pretrained_ckpts folder should look like this:

pretrained_ckpts/
├── auxiliray (optional for training)
│   ├── model_ir_se50.pth
│   └── model.pth
├── e4s
│   └── iteration_300000.pt
├── face_parsing
│   ├── 79999_iter.pth
│   ├── segnext.tiny.512x512.celebamaskhq.160k.py
│   ├── segnext.tiny.best_mIoU_iter_160000.pth (optional)
│   ├── segnext.base.512x512.celebamaskhq.160k.py
│   ├── segnext.base.best_mIoU_iter_140000.pth (optional)
│   ├── segnext.small.512x512.celebamaskhq.160k.py
│   ├── segnext.small.best_mIoU_iter_140000.pth (optional)
│   ├── segnext.large.512x512.celebamaskhq.160k.py
│   └── segnext.large.best_mIoU_iter_150000.pth (optional)
├── facevid2vid
│   ├── 00000189-checkpoint.pth.tar
│   └── vox-256.yaml
├── gpen
│   ├── fetch_gepn_models.sh
│   └── weights
│       ├── GPEN-BFR-512.pth
│       ├── ParseNet-latest.pth
│       ├── realesrnet_x4.pth
│       └── RetinaFace-R50.pth
├── put_ckpts_accordingly.txt
└── stylegan2 (optional for training)
    └── stylegan2-ffhq-config-f.pt
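The listing above can be checked automatically with a short script, sketched here. The helper check_ckpts is hypothetical. It only looks for the files not marked optional in the listing, and it includes the FaceVid2Vid and GPEN files, which are needed only for face swapping, so trim the list if you only need face editing.

```shell
# Report checkpoint files that are missing under the given root folder.
# Returns 0 when everything is present, 1 otherwise.
# (check_ckpts is an illustrative helper, not a script shipped with the repo.)
check_ckpts() {
    root="${1:-pretrained_ckpts}"
    missing=0
    for f in \
        e4s/iteration_300000.pt \
        face_parsing/79999_iter.pth \
        facevid2vid/00000189-checkpoint.pth.tar \
        facevid2vid/vox-256.yaml \
        gpen/weights/GPEN-BFR-512.pth \
        gpen/weights/ParseNet-latest.pth \
        gpen/weights/realesrnet_x4.pth \
        gpen/weights/RetinaFace-R50.pth
    do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $root/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (from the repository root):
#   check_ckpts pretrained_ckpts && echo "all required checkpoints found"
```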