July 17, 2025
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer [ICCV 2025]
Qingyu Shi · Jianzong Wu · Jinbin Bai · Jiangning Zhang · Lu Qi · Yunhai Tong · Xiangtai Li
PKU, NTU, NUS, ZJU, UC Merced
Introduction
We propose DeT, a tuning-based method that adapts Video Diffusion Transformers (DiT) for motion transfer tasks.

Data Preparation
We provide three examples in the directory ./data:
├── data
    ├── dance-twirl
        ├── videos
            ├── dance-twirl.mp4
        ├── masks
            ├── dance-twirl
                ├── 00000.png
                ...
        ├── trajectories
            ├── dance-twirl.pth
        ├── prompts.txt
        ├── trajectories.txt
        ├── videos.txt
    ├── dog-agility
        ...
    ├── snowboard
        ...
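Before training on your own videos, it can help to check that a dataset directory matches the layout above. The following is a minimal sketch, not part of the DeT repository: the function name `missing_entries` is hypothetical, and the list of required entries is simply read off the example tree, so the training scripts may expect more or less than this.

```python
from pathlib import Path

# Entries every dataset directory contains in the example tree above.
# This list is an assumption read off that tree, not taken from the DeT code.
REQUIRED_DIRS = ("videos", "masks", "trajectories")
REQUIRED_FILES = ("prompts.txt", "trajectories.txt", "videos.txt")


def missing_entries(root):
    """Return the expected entries that are absent from one dataset directory."""
    root = Path(root)
    missing = [f"{d}/" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    # Report a missing video only when the videos folder itself exists,
    # so the same problem is not listed twice.
    videos = root / "videos"
    if videos.is_dir() and not any(videos.glob("*.mp4")):
        missing.append("videos/*.mp4")
    return missing


# An empty list means the directory has every entry checked here.
print(missing_entries("./data/dance-twirl"))
```

For a new dataset, the trajectory entries may be reported as missing until the trajectories have been generated.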
[Optional] You can use your own source videos. Prepare and organize the datasets following the provided examples. You also need to annotate trajectories in each source video, which the dense point tracking loss requires. The commands below download the CoTracker3 checkpoint and generate the trajectories:
cd checkpoints
wget https://hf-mirror.com/facebook/cotracker3/resolve/main/scaled_online.pth
cd ..
python generate_trajectories.py --root ./data/your-data  # should be a directory such as ./data/dance-twirl
Training
Replace the model and data paths in train_cogvideox.sh before running it:
# If you are in mainland China, we recommend using hf-mirror.
export HF_ENDPOINT=https://hf-mirror.com
bash train_cogvideox.sh
Inference
Replace the model and data paths in run_cogvideox.sh before running it:
bash run_cogvideox.sh
MTBench
Download MTBench with:
huggingface-cli download QingyuShi/MTBench --local-dir ./MTBench --repo-type dataset
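The same download can also be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper names `mtbench_download_kwargs` and `download_mtbench` are hypothetical and not part of this repository.

```python
import os


def mtbench_download_kwargs(local_dir="./MTBench"):
    """Arguments mirroring the huggingface-cli command above."""
    return {
        "repo_id": "QingyuShi/MTBench",
        "repo_type": "dataset",
        "local_dir": local_dir,
    }


def download_mtbench(local_dir="./MTBench", endpoint=None):
    """Download MTBench and return the local path of the snapshot."""
    if endpoint:
        # huggingface_hub reads HF_ENDPOINT when it is first imported, so the
        # variable is set before the import below. This has no effect if
        # huggingface_hub was already imported elsewhere in the process.
        os.environ["HF_ENDPOINT"] = endpoint
    from huggingface_hub import snapshot_download

    return snapshot_download(**mtbench_download_kwargs(local_dir))
```

Calling `download_mtbench()` fetches the dataset into ./MTBench. Passing `endpoint="https://hf-mirror.com"` routes the request through the mirror mentioned in the Training section.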
For quicker ablation studies, the repository also includes a lightweight subset, MTBench_subset, which reduces computational overhead.
The evaluation script is located at ./evaluation.py.

Citing DeT
@article{DeT,
  title={Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer},
  author={Shi, Qingyu and Wu, Jianzong and Bai, Jinbin and Zhang, Jiangning and Qi, Lu and Li, Xiangtai and Tong, Yunhai},
  journal={arXiv preprint arXiv:2503.17350},
  year={2025}
}