August 23, 2025

Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement

Jiayi Gao* · Changcheng Hua* · Qingchao Cheng · Yuxin Peng · Yang Liu†

Introduction

Our framework consists of three training-free enhancements. First, we propose ① Face-Aware Prompt Enhancement, which uses GPT-4o to enrich the text prompt with facial details derived from the reference image. Second, we propose ② Prompt-Aware Reference Image Enhancement, which uses an identity-preserving image generator to refine the reference image and resolve its conflicts with the text prompt. This mutual refinement of prompt and image significantly improves input quality before video generation. Finally, we propose ③ ID-Aware Spatiotemporal Guidance Enhancement, which uses unified gradients to jointly optimize identity preservation and video quality during generation. Validated by automatic and human evaluations on a 1,000-video test set, our method outperforms prior work and won first place in the ACM Multimedia 2025 Identity-Preserving Video Generation Challenge, demonstrating state-of-the-art performance and strong generality.

Base Model Information

Model	Download Link	Video Size (frames × height × width)	License
Wan2.1-VACE-1.3B	Hugging Face 🤗 / ModelScope 🤖	~81 × 480 × 832	Apache-2.0
Wan2.1-VACE-14B	Hugging Face 🤗 / ModelScope 🤖	~81 × 720 × 1280	Apache-2.0

Our method is built on Wan2.1-VACE. Please download your preferred base model to <repo-root>/models/.
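The download step can be scripted. The sketch below is a hedged example, not part of this repository. It assumes that the checkpoints are hosted in the Hugging Face repos `Wan-AI/Wan2.1-VACE-1.3B` and `Wan-AI/Wan2.1-VACE-14B`, that `huggingface-cli` (shipped with the `huggingface_hub` package) is installed, and that `run.sh` looks for the weights under `models/<model-name>/`. The helper `download_cmd` and the `MODEL`/`DRY_RUN` variables are hypothetical. By default the script only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch only (not part of this repo). Run from <repo-root>.
# Assumptions: weights are hosted at Hugging Face repo Wan-AI/<model-name>,
# huggingface-cli is installed, and run.sh expects models/<model-name>/.
MODEL="${MODEL:-Wan2.1-VACE-1.3B}"   # or Wan2.1-VACE-14B

# Hypothetical helper: print the download command for a given model name.
download_cmd() {
  printf '%s\n' "huggingface-cli download Wan-AI/$1 --local-dir models/$1"
}

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: only show what would run, so the sketch is safe to execute.
  download_cmd "$MODEL"
else
  huggingface-cli download "Wan-AI/${MODEL}" --local-dir "models/${MODEL}"
fi
```

For example, saving this as `download.sh` (an illustrative name) and running `MODEL=Wan2.1-VACE-14B DRY_RUN=0 bash download.sh` would fetch the 14B checkpoint.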

⚙️ Installation

The codebase was tested with Python 3.10.13, CUDA version 12.4, and PyTorch >= 2.5.1.
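To compare a local environment against these tested versions, a small check like the one below may help. It is a sketch, not part of the repository: `version_ge` is a hypothetical helper built on GNU `sort -V`, and the script only reports versions rather than aborting on a mismatch.

```shell
#!/usr/bin/env bash
# Sketch only (not part of this repo): report local versions against the
# tested setup (Python 3.10.13, CUDA 12.4, PyTorch >= 2.5.1). Needs GNU sort.

# Hypothetical helper. version_ge A B succeeds when version A >= version B:
# `sort -V -C` exits 0 only if "B, A" is already in ascending version order.
version_ge() {
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}

py_ver=$(python -c 'import platform; print(platform.python_version())' 2>/dev/null)
echo "Python:  ${py_ver:-not found} (tested with 3.10.13)"

torch_ver=$(python -c 'import torch; print(torch.__version__)' 2>/dev/null)
torch_ver="${torch_ver%%+*}"   # drop local suffixes such as "+cu124"
cuda_ver=$(python -c 'import torch; print(torch.version.cuda)' 2>/dev/null)

if [ -z "$torch_ver" ]; then
  echo "PyTorch: not installed"
elif version_ge "$torch_ver" 2.5.1; then
  echo "PyTorch: $torch_ver (>= 2.5.1), built for CUDA ${cuda_ver:-unknown} (tested with 12.4)"
else
  echo "PyTorch: $torch_ver (older than the tested 2.5.1)"
fi
```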

Setup for Model Inference

To set up the environment for VACE model inference, run:

git clone https://github.com/ali-vilab/VACE.git && cd VACE
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu124  # If PyTorch is not installed.
pip install -r requirements.txt
pip install wan@git+https://github.com/Wan-Video/Wan2.1  # Required for the Wan2.1-based VACE models used here.

Data

We run our validation experiments on the dataset provided by the ACM Multimedia 2025 Identity-Preserving Video Generation Challenge. Please download it to <repo-root>/data/.

Inference

To generate the videos, run:

sh run.sh
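Since inference needs both the base model in `models/` and the challenge data in `data/`, a quick pre-flight check can catch a missing download before a long run starts. The helper below (`preflight`) is hypothetical and not part of the repository. It only checks that both directories exist and are non-empty, not that their contents are complete or match what `run.sh` expects.

```shell
#!/usr/bin/env bash
# Sketch only (not part of this repo): check that the inputs this README asks
# for are in place before starting inference.

# Hypothetical helper. preflight ROOT fails if ROOT/models or ROOT/data is
# missing or empty. It does not verify that the contents are complete.
preflight() {
  local root="${1:-.}" missing=0 d
  for d in models data; do
    if [ -z "$(ls -A "$root/$d" 2>/dev/null)" ]; then
      echo "missing or empty: $root/$d" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

From `<repo-root>`, `preflight . && sh run.sh` starts inference only when both directories are populated.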

Evaluation

For evaluation, please see <repo-root>/ID_eval_finals/.

Acknowledgement

We are grateful to the authors of several awesome projects, including VACE and Wan.