Awesome-Controllable-Video-Diffusion
July 22, 2025
Awesome Controllable Video Generation with Diffusion Models.
Table of Contents
- Pose Control
- Audio Control
- Expression Control
- Universal Control
- Camera Control
- Trajectory Control
- Subject Control
- Area Control
- Video Control
- Brain Control
- ID Control
Pose Control
UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Paper | Project Page
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Paper | Project Page | Code
MikuDance: Animating Character Art with Mixed Motion Dynamics
Paper | Project Page | Code
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
Paper | Project Page | Code
TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation
Paper | Project Page | Code
DynamicPose: A robust image-to-video framework for portrait animation driven by pose sequences
Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
Paper | Project Page | Code
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Paper | Project Page
DreaMoving: A Human Video Generation Framework based on Diffusion Models
Paper | Project Page | Code
MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
Paper | Project Page | Code
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
Paper | Project Page | Code
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
Paper | Project Page | Code
Magic-Me: Identity-Specific Video Customized Diffusion
Paper | Project Page | Code
DisCo: Disentangled Control for Referring Human Dance Generation in Real World
Paper | Project Page | Code
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer
Paper | Project Page
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
Paper | Project Page | Code
Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control
Paper | Project Page
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
Paper | Project Page | Code
MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation
MDM: Human Motion Diffusion Model
Paper | Project Page | Code
Audio Control
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
Paper | Project Page | Code
Every Image Listens, Every Image Dances: Music-Driven Image Animation
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation
Paper | Project Page | Code
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
Paper | Project Page | Code
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
Paper | Project Page | Code
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
Paper | Project Page | Code
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Speech Driven Video Editing via an Audio-Conditioned Diffusion Model
Paper | Project Page | Code
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
Paper | Project Page | Code
Listen, denoise, action! Audio-driven motion synthesis with diffusion models
Paper | Project Page | Code
CoDi: Any-to-Any Generation via Composable Diffusion
Paper | Project Page | Code
Generative Disco: Text-to-Video Generation for Music Visualization
AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper | Project Page | Code
Context-aware Talking Face Video Generation
Expression Control
FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers
Paper | Project Page | Code
X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention
Paper | Project Page | Code
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
Paper | Project Page | Code
SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
Paper | Project Page | Code
DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
Paper | Project Page
Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation
Paper | Project Page | Code
Echomimic: Lifelike audio-driven portrait animations through editable landmark conditions
Paper | Project Page | Code
Universal Control
VACE: All-in-One Video Creation and Editing
Paper | Project Page | Code
ControlNeXt: Powerful and Efficient Control for Image and Video Generation
Paper | Project Page | Code
Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models
Paper | Project Page | Code
ControlVideo: Training-free Controllable Text-to-Video Generation
TrackGo: A Flexible and Efficient Method for Controllable Video Generation
Paper | Project Page | Code
VideoComposer: Compositional Video Synthesis with Motion Controllability
Paper | Project Page | Code
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
Paper | Project Page | Code
UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control
Paper | Project Page | Code
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
Paper | Project Page | Code
VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet
Paper | Project Page | Code
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
Paper | Project Page | Code
Camera Control
MotionMaster: Training-free Camera Motion Transfer For Video Generation
Paper | Project Page | Code
CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion
CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
Paper | Project Page | Code
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
Paper | Project Page | Code
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
Paper | Project Page | Code
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Paper | Project Page
Controlling Space and Time with Diffusion Models
Paper | Project Page
CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
Paper | Project Page
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control
Paper | Project Page
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
Paper | Project Page | Code
Training-free Camera Control for Video Generation
Paper | Project Page
Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text
Paper | Project Page | Code
MotionBooth: Motion-Aware Customized Text-to-Video Generation
DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models
Paper | Project Page
Trajectory Control
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
Paper | Project Page
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models
Paper | Project Page | Code
TrailBlazer: Trajectory Control for Diffusion-Based Video Generation
Paper | Project Page | Code
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
Paper | Project Page | Code
Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Paper | Project Page
Controllable Longer Image Animation with Diffusion Models
Paper | Project Page
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
Paper | Project Page | Code
MotionBooth: Motion-Aware Customized Text-to-Video Generation
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics
Paper | Project Page | Code
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
Paper | Project Page | Code
Generative Image Dynamics
Paper | Project Page
Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation
Video Diffusion Models are Training-free Motion Interpreter and Controller
Paper | Project Page
Subject Control
Phantom: Subject-consistent video generation via cross-modal alignment
Paper | Project Page
Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
Paper | Project Page | Code
ActAnywhere: Subject-Aware Video Background Generation
Paper | Project Page
MotionBooth: Motion-Aware Customized Text-to-Video Generation
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
One-Shot Learning Meets Depth Diffusion in Multi-Object Videos
Area Control
Boximator: Generating Rich and Controllable Motions for Video Synthesis
Paper | Project Page
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts
Paper | Project Page | Code
AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance
Paper | Project Page | Code
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
Paper | Project Page
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion
Paper | Project Page
Video Control
Customizing Motion in Text-to-Video Diffusion Models
Paper | Project Page
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
Paper | Project Page | Code
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Paper | Project Page | Code
Motion Inversion for Video Customization
Paper | Project Page | Code
Brain Control
NeuroCine: Decoding Vivid Video Sequences from Human Brain Activties
ID Control
FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation
Paper | Project Page | Code
Concat-ID: Towards Universal Identity-Preserving Video Synthesis
Paper | Project Page | Code
Ingredients: Blending Custom Photos with Video Diffusion Transformers
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Paper | Project Page | Code
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models
Paper | Project Page | Code
Movie Gen: A Cast of Media Foundation Models
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
Paper | Project Page | Code
ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
Paper | Project Page | Code
VideoBooth: Diffusion-based Video Generation with Image Prompts
Paper | Project Page | Code
Magic-Me: Identity-Specific Video Customized Diffusion