ATL-Diff: Audio-Driven Talking Head Generation using Early Landmark Guide Noise Diffusion.

December 10, 2024


Abstract: Audio-driven talking head generation presents significant challenges in creating realistic facial animations that accurately synchronize with audio signals. This paper introduces ATL-Diff, a novel approach that addresses key limitations in existing methods through a three-component framework. The Landmark Generation Module constructs a sequence of facial landmarks from audio. Landmarks Guide Noise adds movement information by distributing noise according to those landmarks, which isolates the audio from the model. The 3D Identity Diffusion network preserves identity characteristics. Experiments on the MEAD and CREMA-D datasets show that ATL-Diff outperforms state-of-the-art techniques across all reported metrics. The approach achieves near real-time processing, generating high-quality facial animations efficiently while preserving individual facial nuances. By linking audio signals to facial movements more precisely, this research advances talking head generation, with promising applications in virtual assistants, education, medical communication, and emerging digital platforms.


Model Architecture

overview architecture
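
The abstract describes Landmarks Guide Noise as distributing noise according to the landmarks. The following is a minimal sketch of that general idea using only the Python standard library, not the implementation in this repository. The function names, the Gaussian weighting around each landmark, and the `sigma` and `base` parameters are all assumptions chosen for illustration.

```python
import math
import random


def landmark_weight_map(height, width, landmarks, sigma=2.0, base=0.1):
    """Return a height x width grid of noise scales in [base, 1].

    Assumption for illustration: each landmark (x, y) contributes a Gaussian
    bump, and a pixel takes the strongest bump among all landmarks.
    """
    weights = []
    for y in range(height):
        row = []
        for x in range(width):
            bump = max(
                (
                    math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / (2 * sigma ** 2))
                    for lx, ly in landmarks
                ),
                default=0.0,
            )
            row.append(base + (1.0 - base) * bump)
        weights.append(row)
    return weights


def landmark_guided_noise(height, width, landmarks, sigma=2.0, base=0.1, seed=0):
    """Sample standard Gaussian noise and scale it by the landmark weight map."""
    rng = random.Random(seed)
    weights = landmark_weight_map(height, width, landmarks, sigma, base)
    return [[rng.gauss(0.0, 1.0) * w for w in row] for row in weights]


if __name__ == "__main__":
    # Hypothetical landmark positions on a 16 x 16 grid.
    noise = landmark_guided_noise(16, 16, [(8, 8), (4, 10)])
    print(len(noise), len(noise[0]))
```

In this sketch the noise variance is highest at the landmark positions and falls back to a small base level elsewhere, so the landmark layout is carried by the noise itself. The actual weighting used by ATL-Diff may differ.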

Qualitative Results

overview architecture

How to train:

  • To train the Landmark Generation Module, follow our work in this repository.
  • Updates coming soon!

How to run inference:

Get the repository:

  • Clone the repository:
    git clone https://github.com/sonvth/ATL-Diff
    cd ATL-Diff
    

Set up the environment:

  • Run these commands in a terminal:
    python3 -m venv .venv
    source .venv/bin/activate
    pip3 install -r requirements.txt
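
Because `pip3 install` installs into whichever Python environment is active, it can help to confirm that the virtual environment is in use before installing the requirements. The sketch below uses only the standard library. The helper name `in_virtualenv` is illustrative and is not part of this repository.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("virtual environment active:", sys.prefix)
    else:
        print("no virtual environment active; packages would install globally")
```

Running this with the interpreter you intend to use for inference shows whether the dependencies will go into `.venv` or into the system installation.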
    

Inference

  • Download the weights from this drive.

  • Put the downloaded folder into the repository.

  • Set the audio source and identity source in config.py.

  • Run this command:

    bash infer.sh
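
The steps above depend on a weights folder, an audio file, and an identity image being in place, so a small pre-flight check can catch a wrong path before inference starts. The sketch below is an assumption-laden illustration: the helper `missing_inputs` and the example paths are hypothetical, and the real locations are whatever you set in config.py and the name of the downloaded folder.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the entries of `paths` that do not exist on disk, in order."""
    return [str(p) for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    # Hypothetical paths: replace them with the values set in config.py and
    # the name of the downloaded weights folder.
    candidates = ["weights", "samples/audio.wav", "samples/identity.png"]
    missing = missing_inputs(candidates)
    if missing:
        print("missing before inference:", ", ".join(missing))
    else:
        print("all inputs found")
```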
    

Please star and follow if this repository is helpful to you.

Authored by sowwn