
November 21, 2025

Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback

Xingpei Ma^* Shenneng Huang^* Jiaran Cai^*† Yuansheng Guan^* Shen Zheng^* Hanfeng Zhao Qiang Zhang Shunsi Zhang

^* Equal contribution ^†Project lead & Corresponding Author

Guangzhou Quwan Network Technology

AAAI 2026

TL;DR: We present Playmate2, which generates high-quality audio-driven videos while addressing two key challenges: temporal coherence in long sequences and multi-character animation. To the best of our knowledge, this is the first training-free approach to enable audio-driven animation for three or more characters without requiring additional data or model modifications.

📰 News

2025/11/21: 🔥🔥🔥 We release the weights and inference code of Playmate2!
2025/11/10: 🎉🎉🎉 Our paper has been accepted and will be presented at AAAI 2026. We plan to release the inference code and model weights for both Playmate and Playmate2 in the coming weeks. Stay tuned and thank you for your patience!
2025/10/15: 🚀🚀🚀 Our paper is publicly available on arXiv.

📸 Showcase

Multi-Character Animation

Singing Videos

Multi-Style Animation

Explore more examples.

Quick Start

🛠️Installation

1. Create a conda environment and install PyTorch and xformers

conda create -n playmate2 python=3.10
conda activate playmate2
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -U xformers==0.0.29 --index-url https://download.pytorch.org/whl/cu124

2. Flash-attn installation:

pip install "misaki[en]"
pip install ninja
pip install psutil
pip install packaging
pip install flash_attn==2.7.4.post1 --no-build-isolation

3. Other dependencies

pip install -r requirements.txt

4. FFmpeg installation

conda install -c conda-forge ffmpeg

or

sudo yum install ffmpeg ffmpeg-devel
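After the four steps, it can help to confirm that the pinned packages actually landed in the environment. The following is a minimal sketch, not part of the repository: `check_pins` and `installed_version` are hypothetical helpers, the pin list is copied from the commands above, and it assumes the installed distribution names match the names passed to `pip`. Local version suffixes such as `+cu124` are ignored when comparing.

```python
from importlib import metadata

# Versions pinned by the install commands in this README.
PINS = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "torchaudio": "2.5.1",
    "xformers": "0.0.29",
    "flash_attn": "2.7.4.post1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each mismatching package to (expected, found).

    Hypothetical helper: a local suffix such as "+cu124" is dropped
    before the comparison.
    """
    problems = {}
    for name, expected in pins.items():
        found = lookup(name)
        base = found.split("+", 1)[0] if found else None
        if base != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

Running the script inside the `playmate2` environment prints one line per package that is missing or at a different version, and nothing if all pins match.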

🧱Model Preparation

Model Download

Models	Download Link	Save Path
Wan2.1-I2V-14B-720P	Huggingface	pretrained_weights/Wan2.1-I2V-14B-720P
chinese-wav2vec2-base	Huggingface	pretrained_weights/chinese-wav2vec2-base
VideoLLaMA3-7B	Huggingface	pretrained_weights/VideoLLaMA3-7B
Our Pretrained Model	Huggingface	pretrained_weights/playmate2

Download models using huggingface-cli:

mkdir pretrained_weights
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir ./pretrained_weights/Wan2.1-I2V-14B-720P
huggingface-cli download TencentGameMate/chinese-wav2vec2-base --local-dir ./pretrained_weights/chinese-wav2vec2-base
huggingface-cli download TencentGameMate/chinese-wav2vec2-base model.safetensors --revision refs/pr/1 --local-dir ./pretrained_weights/chinese-wav2vec2-base
huggingface-cli download DAMO-NLP-SG/VideoLLaMA3-7B --local-dir ./pretrained_weights/VideoLLaMA3-7B
huggingface-cli download PlaymateAI/Playmate2 --local-dir ./pretrained_weights/playmate2
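Once the downloads finish, the four directories should match the Save Path column of the table. A minimal sketch of that check, assuming it is run from the repository root: `missing_model_dirs` is a hypothetical helper, and it only tests that each directory exists and is non-empty, not that the weights inside are complete.

```python
from pathlib import Path

# Save paths copied from the Model Download table.
EXPECTED_DIRS = [
    "pretrained_weights/Wan2.1-I2V-14B-720P",
    "pretrained_weights/chinese-wav2vec2-base",
    "pretrained_weights/VideoLLaMA3-7B",
    "pretrained_weights/playmate2",
]


def missing_model_dirs(root="."):
    """Return the expected directories that are absent or empty under root."""
    missing = []
    for rel in EXPECTED_DIRS:
        path = Path(root) / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_model_dirs():
        print(f"missing or empty: {rel}")
```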

Inference

We recommend an A100 or better GPU for inference.

One person

# --gpu_num: 1 (single GPU) or 3 (multiple GPUs)
python inference.py \
    --gpu_num 1 \
    --image_path examples/images/01.png \
    --audio_path examples/audios/01.wav \
    --prompt_path examples/prompts/01.txt \
    --output_path examples/outputs/01.mp4 \
    --max_size 1280 \
    --id_num 1

Multiple Persons

# N is the number of persons.
# --gpu_num: 1 (single GPU) or 3+N-1 (multiple GPUs)
python inference.py \
    --gpu_num 1 \
    --image_path examples/images/04.png \
    --audio_path examples/audios/04 \
    --mask_path examples/masks/04 \
    --prompt_path examples/prompts/04.txt \
    --output_path examples/outputs/04.mp4 \
    --max_size 1280 \
    --id_num 3
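The GPU-count rule from the command comments can be written out explicitly. The sketch below is illustrative only: `multi_gpu_count` and `build_command` are hypothetical helpers, the flags are the ones shown in the commands above, and it assumes `--id_num` equals the number of persons N, which the example suggests but the README does not state.

```python
import shlex


def multi_gpu_count(num_persons):
    """GPU count for multi-GPU inference, per the README comment: 3 + N - 1."""
    if num_persons < 1:
        raise ValueError("num_persons must be at least 1")
    return 3 + num_persons - 1


def build_command(image, audio, prompt, output, num_persons=1,
                  mask=None, multi_gpu=False, max_size=1280):
    """Assemble the inference.py argument list (assumes --id_num == N)."""
    gpu_num = multi_gpu_count(num_persons) if multi_gpu else 1
    cmd = ["python", "inference.py",
           "--gpu_num", str(gpu_num),
           "--image_path", image,
           "--audio_path", audio]
    if mask is not None:
        # Only the multi-person example passes a mask directory.
        cmd += ["--mask_path", mask]
    cmd += ["--prompt_path", prompt,
            "--output_path", output,
            "--max_size", str(max_size),
            "--id_num", str(num_persons)]
    return cmd


if __name__ == "__main__":
    cmd = build_command("examples/images/04.png", "examples/audios/04",
                        "examples/prompts/04.txt", "examples/outputs/04.mp4",
                        num_persons=3, mask="examples/masks/04", multi_gpu=True)
    print(shlex.join(cmd))
```

With N = 1 the formula gives 3, which agrees with the single-person command; with N = 3 it gives 5.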

📑 Todo List

📝 Citation

If you find our work useful for your research, please consider citing the paper:


@article{ma2025playmate2,
  title={Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback},
  author={Ma, Xingpei and Huang, Shenneng and Cai, Jiaran and Guan, Yuansheng and Zheng, Shen and Zhao, Hanfeng and Zhang, Qiang and Zhang, Shunsi},
  journal={arXiv preprint arXiv:2510.12089},
  year={2025}
}