Boosting Camera Motion Control for Video Diffusion Transformers
November 12, 2025 ยท View on GitHub
Accepted into BMVC 2025. Project Page: https://soon-yau.github.io/CameraMotionGuidance/
Existing camera control methods for U-Net do not work for diffusion transformers. We developed the first camera control for space-time diffuser transformer. Our method, based on classifier-free guidance, restore controllability and boost motion by over 400%.
๐ Camera Motion Guidance
Conventionally, no guidance is used in Classifier-free guidance (CFG) or camera condition is supplied without a motion reference. In training, we randomly set some video be static (repeat first frame to whole sequence) and all-zeros camera condition, which we use as null motion reference. Then we use a seperate camera guidance term to allow for control independent to text conditioning.
๐ฅ Conditioning Camera Poses
Conditioning camera poses.
โ ๏ธ MotionCtrl in DiT โ Uncontrollable Motion
MotionCtrl method in DiT has uncontrollable motion.
๐ง CameraCtrl in DiT โ Limited Motion
CameraCtrl method in DiT has limited motion.
๐ Our Method โ Restored Controllability and Boosted Motion
Our method restores camera controllability with boosted motion.
Citation
@inproceedings{cheong2024cmg,
author = {Cheong, Soon Yau and Mustafa, Armin and Ceylan, Duygu and Gilbert, Andrew and Huang, Chun-hao Paul},
title = {Boosting Camera Motion Control for Video Diffusion Transformers},
booktitle = {Proceedings of the British Machine Vision Conference (BMVC)},
year = {2025}}
Installation
pip install uv
uv python install 3.10
uv sync
source .venv/bin/activate
Download models
Download model folder e.g. /cameratrlcmg from https://huggingface.co/soonyau/cmg/tree/main to "./models"
MODEL_NAME="cameractrlcmg"
huggingface-cli download soonyau/cmg --repo-type=model --local-dir ./models/${MODEL_NAME}$ --include "${MODEL_NAME}$/*"
Disable acceleration extras in config.py as compiling GPU-specific CUDA extensions can fail with incompatible build environment.
enable_flashattn=False,
enable_layernorm_kernel=False,
...
shardformer=False,
Inference
Run the script and pass in the model path e.g. bash scripts/inference.sh models/cameractrlcmg. Change camera pose and text prompt in config.py in model path.