README.md
November 30, 2025
FLAME Head Tracker
Note
This project depends on other third-party libraries or code, which may be licensed under different terms. When using this project, you are required to comply with the license terms of any dependencies in addition to the MIT License. Please review the licenses of all dependencies before use or distribution.
Current Version: v4.1 (Aug 02, 2025)
Update:
- Improved tracking speed:
- ~0.9s/frame in landmark-based fitting mode (on Nvidia 4090)
- ~1.9s/frame in photometric fitting mode (on Nvidia 4090)
- Supports optimizable camera FOV.
Previous Versions:
- v3.4.1 (https://github.com/PeizhiYan/flame-head-tracker/tree/v3.4.1)
- v3.3 stable (https://github.com/PeizhiYan/flame-head-tracker/tree/v3.3)
- v3.2 stable (https://github.com/PeizhiYan/flame-head-tracker/tree/v3.2)
Supported Features:
| Scenario | Landmarks-based Fitting | Photometric Fitting |
|---|---|---|
| Single-Image Reconstruction | ✅ | ✅ |
| Monocular Video Tracking | ✅ | ✅ |
Usage
Single-Image-Based Reconstruction
Please follow the example in: Example_1_single_image_reconstruction.ipynb
The result `ret_dict` contains the following data:
- `shape` (1, 300): The FLAME shape code.
- `exp` (1, 100): The FLAME expression code.
- `head_pose` (1, 3): The FLAME head pose. Not used (zeros).
- `jaw_pose` (1, 3): The FLAME jaw pose.
- `neck_pose` (1, 3): The FLAME neck pose. Not used (zeros).
- `eye_pose` (1, 6): The FLAME eyeball poses.
- `tex` (1, 50): The FLAME parametric texture code.
- `light` (1, 9, 3): The estimated SH lighting coefficients.
- `cam` (1, 6): The estimated 6DoF camera pose (yaw, pitch, roll, x, y, z).
- `fov` (1): The optimized camera FOV.
- `K` (1, 3, 3): The camera intrinsic matrix (assumes the image size is 256x256).
- `img_rendered` (1, 256, 256, 3): Rendered shape on top of the original image (for visualization purposes only).
- `mesh_rendered` (1, 256, 256, 3): Rendered mesh shape with landmarks (for visualization purposes only).
- `img` (1, 512, 512, 3): The image on which the FLAME model was fit. (If `realign==True`, `img` is identical to `img_aligned`.)
- `img_aligned` (1, 512, 512, 3): The aligned image.
- `parsing` (1, 512, 512): The face semantic parsing result of `img`.
- `parsing_aligned` (1, 512, 512): The face semantic parsing result of `img_aligned`.
- `lmks_68` (1, 68, 2): The 68 Dlib-format face landmarks.
- `lmks_ears` (1, 20, 2): The ear landmarks (only one ear).
- `lmks_eyes` (1, 10, 2): The eye landmarks.
- `blendshape_scores` (1, 52): The facial expression blendshape scores from Mediapipe.
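As a quick sanity check after running the notebook, the shapes listed above can be verified programmatically. The sketch below is not part of the tracker: `EXPECTED_SHAPES` and `check_ret_dict` are hypothetical names, only a subset of the keys is checked, and it assumes every entry can be converted with `np.asarray` (tensors on the GPU may need to be moved to the CPU first).

```python
import numpy as np

# Expected shapes, copied from the list above (a subset, for illustration).
EXPECTED_SHAPES = {
    "shape": (1, 300),
    "exp": (1, 100),
    "head_pose": (1, 3),
    "jaw_pose": (1, 3),
    "neck_pose": (1, 3),
    "eye_pose": (1, 6),
    "tex": (1, 50),
    "light": (1, 9, 3),
    "cam": (1, 6),
    "K": (1, 3, 3),
    "lmks_68": (1, 68, 2),
    "blendshape_scores": (1, 52),
}


def check_ret_dict(ret_dict):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for key, expected in EXPECTED_SHAPES.items():
        if key not in ret_dict:
            problems.append(f"missing key: {key}")
            continue
        actual = np.asarray(ret_dict[key]).shape
        if actual != expected:
            problems.append(f"{key}: expected {expected}, got {actual}")
    return problems


if __name__ == "__main__":
    # Dummy stand-in for a real result, filled with zeros.
    dummy = {k: np.zeros(s, dtype=np.float32) for k, s in EXPECTED_SHAPES.items()}
    print(check_ret_dict(dummy))
```

With a real result, call `check_ret_dict(ret_dict)` and inspect the returned list.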
Monocular Video-Based Tracking
Please follow the example in: Example_2_video_tracking.ipynb
Note
- The results will be saved to the `save_path`. The reconstruction result of each frame will be saved to the corresponding `[frame_id].npz` file.
- Although each `.npz` file contains the shape coefficients and texture coefficients, they are the same in every file (canonical shape and texture). The expression coefficients, jaw pose, eye pose, light, and camera pose are optimized for each frame.
- If `photometric_fitting` is `True`, the canonical texture map will also be saved as a `texture.png` file.
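The per-frame files can be gathered into one array per parameter for downstream use. This is a minimal sketch under stated assumptions, not the tracker's own loader: `load_sequence` is a hypothetical helper, the key names `exp` and `jaw_pose` are assumed to match the single-image result, file names are assumed to be integer frame IDs, and the arrays are assumed to be plain numeric arrays that `np.load` can read without pickling.

```python
from pathlib import Path

import numpy as np


def load_sequence(save_path, keys=("exp", "jaw_pose")):
    """Stack selected per-frame arrays from [frame_id].npz files, ordered by frame id."""
    # Only consider files whose stem is an integer frame id, e.g. "12.npz".
    files = [p for p in Path(save_path).glob("*.npz") if p.stem.isdigit()]
    files.sort(key=lambda p: int(p.stem))
    frames = {k: [] for k in keys}
    for path in files:
        with np.load(path) as data:
            for k in keys:
                frames[k].append(data[k])
    frame_ids = [int(p.stem) for p in files]
    # Each per-frame array is assumed to have a leading batch dimension of 1,
    # so concatenating along axis 0 yields (num_frames, ...).
    stacked = {k: np.concatenate(v, axis=0) for k, v in frames.items() if v}
    return frame_ids, stacked
```

For example, `frame_ids, seq = load_sequence(save_path)` would give `seq["exp"]` with one row per tracked frame, provided the assumptions above hold.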
Environment Setup
Prerequisites:
- GPU: Nvidia GPU (>= 8GB memory recommended). I tested the code on an Nvidia A6000 (48GB) GPU.
- OS: Ubuntu Linux (tested on 22.04 LTS and 24.04 LTS). I haven't tested the code on Windows.
Step 1: Create a conda environment.
conda create --name tracker -y python=3.10
conda activate tracker
Step 2: Install necessary libraries.
Nvidia CUDA compiler (11.7)
conda install -c "nvidia/label/cuda-11.7.1" cuda-toolkit ninja
# (Linux only) ----------
ln -s "$CONDA_PREFIX/lib" "$CONDA_PREFIX/lib64" # to avoid error "/usr/bin/ld: cannot find -lcudart"
# Install NVCC (optional; try this if NVCC was not installed successfully)
conda install -c conda-forge cudatoolkit=11.7 cudatoolkit-dev=11.7
After installation, check the NVCC version (it should be 11.7):
nvcc --version
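The same check can be scripted. The sketch below assumes the usual `nvcc --version` output, which contains a fragment such as `release 11.7`; `parse_nvcc_release` and `installed_nvcc_release` are hypothetical helper names, not part of this repository.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the CUDA release (e.g. "11.7") from `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_nvcc_release():
    """Run `nvcc --version` if nvcc is on PATH; return the release string or None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_nvcc_release()
    if release == "11.7":
        print("nvcc reports CUDA 11.7")
    else:
        print(f"unexpected nvcc release: {release}")
```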
PyTorch (2.0 with CUDA)
pip install torch==2.0.1 torchvision --index-url https://download.pytorch.org/whl/cu117
Now test whether PyTorch can access the CUDA device. The result should be True:
python -c "import torch; print(torch.cuda.is_available())"
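If the one-liner prints False, a slightly more detailed report can help narrow down the cause (for example, a CPU-only PyTorch build). This is an optional sketch; `cuda_report` is a hypothetical helper, and it returns early instead of failing when PyTorch is not installed.

```python
import importlib.util


def cuda_report():
    """Collect basic PyTorch/CUDA facts without crashing if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

With the installation steps above, `built_with_cuda` would be expected to read 11.7.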
Some Python packages
pip install -r requirements.txt
Step 3: Download necessary model files.
Note
Because of copyright concerns, we cannot re-share some model files. Please follow the instructions below to download the necessary model files.
FLAME
- Download FLAME 2020 (fixed mouth, improved expressions, more data) from https://flame.is.tue.mpg.de/ and extract to `./models/FLAME2020`
  - As an alternative to manually downloading, you can run `./download_FLAME.sh` to automatically download and extract the model files.
- Follow https://github.com/TimoBolkart/BFM_to_FLAME to generate the `FLAME_albedo_from_BFM.npz` file and place at `./models/FLAME_albedo_from_BFM.npz`
DECA
- Download `deca_model.tar` from https://docs.google.com/uc?export=download&id=1rp8kdyLPvErw2dTmqtjISRVvQLj6Yzje, and place at `./models/deca_model.tar`
- Download the files from: https://github.com/yfeng95/DECA/tree/master/data, and place at `./models/`
MICA
- Download `mica.tar` from https://drive.google.com/file/d/1bYsI_spptzyuFmfLYqYkcJA6GZWZViNt, and place at `./models/mica.tar`
Mediapipe Face Landmarker
- Download `face_landmarker.task` from https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task, rename as `face_landmarker_v2_with_blendshapes.task`, and save at `./models/face_landmarker.task`
Ear Landmarker (Optional)
If you want to use ear landmarks during the fitting, please download our pre-trained ear landmarker model ear_landmarker.pth from https://github.com/PeizhiYan/flame-head-tracker/releases/download/resource/ear_landmarker.pth, and save at ./models/.
Warning
The ear landmarker model was trained on the i-Bug ear landmarks dataset, which is for RESEARCH purposes ONLY.
The final structure of ./models/ is:
./models
├── 79999_iter.pth                <----- face parsing model
├── deca_model.tar                <----- deca model
├── ear_landmarker.pth            <----- our ear landmarker model
├── face_landmarker.task          <----- mediapipe face landmarker model
├── fixed_displacement_256.npy
├── FLAME2020                     <----- FLAME 2020 model folder
│   ├── female_model.pkl
│   ├── generic_model.pkl
│   ├── male_model.pkl
│   └── Readme.pdf
├── FLAME_albedo_from_BFM.npz     <----- FLAME texture model from BFM_to_FLAME
├── head_template.obj             <----- FLAME head template mesh
├── landmark_embedding.npy
├── mean_texture.jpg
├── mica.tar                      <----- mica model
├── placeholder.txt
├── texture_data_256.npy
├── uv_face_eye_mask.png
└── uv_face_mask.png
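Before running the examples, it can be useful to confirm that the files from the listing above are in place. The following sketch is not shipped with the repository: `LISTED_FILES` and `missing_model_files` are hypothetical names, the optional `ear_landmarker.pth` is deliberately left out, and whether the tracker needs every listed file at runtime is an assumption.

```python
from pathlib import Path

# Paths relative to ./models, taken from the directory listing above.
# The optional ear landmarker, placeholder.txt, and Readme.pdf are not checked.
LISTED_FILES = [
    "79999_iter.pth",
    "deca_model.tar",
    "face_landmarker.task",
    "fixed_displacement_256.npy",
    "FLAME2020/female_model.pkl",
    "FLAME2020/generic_model.pkl",
    "FLAME2020/male_model.pkl",
    "FLAME_albedo_from_BFM.npz",
    "head_template.obj",
    "landmark_embedding.npy",
    "mean_texture.jpg",
    "mica.tar",
    "texture_data_256.npy",
    "uv_face_eye_mask.png",
    "uv_face_mask.png",
]


def missing_model_files(models_dir="./models"):
    """Return the listed files that do not exist under models_dir."""
    root = Path(models_dir)
    return [rel for rel in LISTED_FILES if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All listed model files were found.")
```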
Acknowledgement and Disclaimer
Acknowledgement
Our code is mainly based on the following repositories:
- FLAME: https://github.com/soubhiksanyal/FLAME_PyTorch
- Pytorch3D: https://github.com/facebookresearch/pytorch3d
- DECA: https://github.com/yfeng95/DECA
- MICA: https://github.com/Zielon/MICA
- FLAME Photometric Fitting: https://github.com/HavenFeng/photometric_optimization
- FaceParsing: https://github.com/zllrunning/face-parsing.PyTorch
- Dlib2Mediapipe: https://github.com/PeizhiYan/Mediapipe_2_Dlib_Landmarks
- Face Alignment: https://github.com/1adrianb/face-alignment
- i-Bug Ears (ear landmarks dataset): https://ibug.doc.ic.ac.uk/resources/ibug-ears/
- Ear Landmark Detection: https://github.com/Dryjelly/Face_Ear_Landmark_Detection
- ArcFace (from InsightFace): https://github.com/deepinsight/insightface
- RobustVideoMatting: https://github.com/PeterL1n/RobustVideoMatting
We want to acknowledge the contributions of the authors of these repositories. We do not claim ownership of any code originating from these repositories, and any modifications we have made are solely for our specific use case. All original rights and attributions remain with the respective authors.
Disclaimer
Our code can be used for research purposes, provided that the terms of the licenses of any third-party code, models, or dependencies are followed. For commercial use, the parts of the code we wrote are free to use, but please make sure to obtain permission from the relevant third parties to use their code, models, or dependencies. We do not assume any responsibility for any issues, damages, or liabilities that may arise from the use of this code. Users are responsible for ensuring compliance with any legal requirements, including licensing terms and conditions, and for verifying that the code is suitable for their intended purposes.
Citation
Please consider citing our works if you find this code useful. This code was originally used for "Gaussian Deja-vu" (accepted for WACV 2025 in Round 1) and "ArchitectHead" (accepted for WACV 2026).
@misc{Yan_2026_WACV,
author = {Yan, Peizhi and Ward, Rabab and Tang, Qiang and Du, Shan},
title = {ArchitectHead: Continuous Level of Detail Control for 3D Gaussian Head Avatars},
year = {2025},
note = {Accepted to WACV 2026}
}
@InProceedings{Yan_2025_WACV,
author = {Yan, Peizhi and Ward, Rabab and Tang, Qiang and Du, Shan},
title = {Gaussian Deja-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities},
booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
month = {February},
year = {2025},
pages = {276-286}
}




