Awesome-Controllable-Video-Diffusion

July 22, 2025

Awesome License: MIT

Awesome Controllable Video Generation with Diffusion Models.

Pose Control

UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer

📄 Paper | 💻 Code

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

📄 Paper | 🌐 Project Page

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

📄 Paper | 🌐 Project Page | 💻 Code

MikuDance: Animating Character Art with Mixed Motion Dynamics

📄 Paper | 🌐 Project Page | 💻 Code

Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control

📄 Paper | 🌐 Project Page | 💻 Code

TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation

📄 Paper | 🌐 Project Page | 💻 Code

DynamicPose: A robust image-to-video framework for portrait animation driven by pose sequences

💻 Code

Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation

📄 Paper

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos

📄 Paper | 🌐 Project Page | 💻 Code

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

📄 Paper | 🌐 Project Page

DreaMoving: A Human Video Generation Framework based on Diffusion Models

📄 Paper | 🌐 Project Page | 💻 Code

MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion

📄 Paper | 🌐 Project Page | 💻 Code

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

📄 Paper | 🌐 Project Page | 💻 Code

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

📄 Paper | 🌐 Project Page | 💻 Code

Magic-Me: Identity-Specific Video Customized Diffusion

📄 Paper | 🌐 Project Page | 💻 Code

DisCo: Disentangled Control for Referring Human Dance Generation in Real World

📄 Paper | 🌐 Project Page | 💻 Code

Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer

📄 Paper | 🌐 Project Page

MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

📄 Paper | 🌐 Project Page | 💻 Code

Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control

📄 Paper | 🌐 Project Page

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

📄 Paper | 🌐 Project Page | 💻 Code

MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation

💻 Code

MDM: Human Motion Diffusion Model

📄 Paper | 🌐 Project Page | 💻 Code

Audio Control

FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

📄 Paper | 🌐 Project Page | 💻 Code

Every Image Listens, Every Image Dances: Music-Driven Image Animation

📄 Paper

MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation

📄 Paper | 🌐 Project Page | 💻 Code

Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

📄 Paper | 🌐 Project Page | 💻 Code

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

📄 Paper | 🌐 Project Page | 💻 Code

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

📄 Paper | 🌐 Project Page | 💻 Code

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

📄 Paper | 💻 Code

Speech Driven Video Editing via an Audio-Conditioned Diffusion Model

📄 Paper | 🌐 Project Page | 💻 Code

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

📄 Paper | 🌐 Project Page | 💻 Code

Listen, denoise, action! Audio-driven motion synthesis with diffusion models

📄 Paper | 🌐 Project Page | 💻 Code

CoDi: Any-to-Any Generation via Composable Diffusion

📄 Paper | 🌐 Project Page | 💻 Code

Generative Disco: Text-to-Video Generation for Music Visualization

📄 Paper

AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion

📄 Paper

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

📄 Paper | 🌐 Project Page | 💻 Code

Context-aware Talking Face Video Generation

📄 Paper

Expression Control

FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers

📄 Paper | 🌐 Project Page | 💻 Code

X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention

📄 Paper | 🌐 Project Page | 💻 Code

HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models

📄 Paper | 🌐 Project Page | 💻 Code

SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers

📄 Paper | 🌐 Project Page | 💻 Code

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

📄 Paper | 🌐 Project Page

Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

📄 Paper | 🌐 Project Page | 💻 Code

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions

📄 Paper | 🌐 Project Page | 💻 Code

Universal Control

VACE: All-in-One Video Creation and Editing

📄 Paper | 🌐 Project Page | 💻 Code

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

📄 Paper | 🌐 Project Page | 💻 Code

Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models

📄 Paper | 🌐 Project Page | 💻 Code

ControlVideo: Training-free Controllable Text-to-Video Generation

📄 Paper | 💻 Code

TrackGo: A Flexible and Efficient Method for Controllable Video Generation

📄 Paper | 🌐 Project Page | 💻 Code

VideoComposer: Compositional Video Synthesis with Motion Controllability

📄 Paper | 🌐 Project Page | 💻 Code

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance

📄 Paper | 🌐 Project Page | 💻 Code

UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control

📄 Paper | 🌐 Project Page | 💻 Code

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models

📄 Paper | 🌐 Project Page | 💻 Code

VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet

📄 Paper | 🌐 Project Page | 💻 Code

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

📄 Paper | 🌐 Project Page | 💻 Code

Camera Control

MotionMaster: Training-free Camera Motion Transfer For Video Generation

📄 Paper | 🌐 Project Page | 💻 Code

CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion

📄 Paper

CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers

📄 Paper

Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

📄 Paper | 🌐 Project Page | 💻 Code

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

📄 Paper | 🌐 Project Page | 💻 Code

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

📄 Paper | 🌐 Project Page | 💻 Code

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

📄 Paper | 🌐 Project Page

Controlling Space and Time with Diffusion Models

📄 Paper | 🌐 Project Page

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

📄 Paper | 🌐 Project Page

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

📄 Paper | 🌐 Project Page

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

📄 Paper | 🌐 Project Page | 💻 Code

Training-free Camera Control for Video Generation

📄 Paper | 🌐 Project Page

Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

📄 Paper | 🌐 Project Page | 💻 Code

MotionBooth: Motion-Aware Customized Text-to-Video Generation

📄 Paper | 💻 Code

DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models

📄 Paper | 🌐 Project Page

Trajectory Control

MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation

📄 Paper | 🌐 Project Page

FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models

📄 Paper | 🌐 Project Page | 💻 Code

TrailBlazer: Trajectory Control for Diffusion-Based Video Generation

📄 Paper | 🌐 Project Page | 💻 Code

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

📄 Paper | 🌐 Project Page | 💻 Code

Tora: Trajectory-oriented Diffusion Transformer for Video Generation

📄 Paper | 🌐 Project Page

Controllable Longer Image Animation with Diffusion Models

📄 Paper | 🌐 Project Page

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

📄 Paper | 🌐 Project Page | 💻 Code

MotionBooth: Motion-Aware Customized Text-to-Video Generation

📄 Paper | 💻 Code

Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics

📄 Paper | 🌐 Project Page | 💻 Code

Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

📄 Paper | 🌐 Project Page | 💻 Code

Generative Image Dynamics

📄 Paper | 🌐 Project Page

Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation

📄 Paper

Video Diffusion Models are Training-free Motion Interpreter and Controller

📄 Paper | 🌐 Project Page

Subject Control

Phantom: Subject-consistent video generation via cross-modal alignment

📄 Paper | 🌐 Project Page

Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos

📄 Paper

Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

📄 Paper | 🌐 Project Page | 💻 Code

ActAnywhere: Subject-Aware Video Background Generation

📄 Paper | 🌐 Project Page

MotionBooth: Motion-Aware Customized Text-to-Video Generation

📄 Paper | 💻 Code

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation

📄 Paper | 💻 Code

One-Shot Learning Meets Depth Diffusion in Multi-Object Videos

📄 Paper

Area Control

Boximator: Generating Rich and Controllable Motions for Video Synthesis

📄 Paper | 🌐 Project Page

Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts

📄 Paper | 🌐 Project Page | 💻 Code

AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance

📄 Paper | 🌐 Project Page | 💻 Code

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

📄 Paper | 🌐 Project Page

Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion

📄 Paper | 🌐 Project Page

Video Control

Customizing Motion in Text-to-Video Diffusion Models

📄 Paper | 🌐 Project Page

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

📄 Paper | 🌐 Project Page | 💻 Code

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

📄 Paper | 🌐 Project Page | 💻 Code

Motion Inversion for Video Customization

📄 Paper | 🌐 Project Page | 💻 Code

Brain Control

NeuroCine: Decoding Vivid Video Sequences from Human Brain Activities

📄 Paper

ID Control

FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation

📄 Paper | 🌐 Project Page | 💻 Code

Concat-ID: Towards Universal Identity-Preserving Video Synthesis

📄 Paper | 🌐 Project Page | 💻 Code

Ingredients: Blending Custom Photos with Video Diffusion Transformers

📄 Paper | 💻 Code

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

📄 Paper | 🌐 Project Page | 💻 Code

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

📄 Paper | 🌐 Project Page | 💻 Code

Movie Gen: A Cast of Media Foundation Models

📄 Paper

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

📄 Paper | 🌐 Project Page | 💻 Code

ID-Animator: Zero-Shot Identity-Preserving Human Video Generation

📄 Paper | 🌐 Project Page | 💻 Code

VideoBooth: Diffusion-based Video Generation with Image Prompts

📄 Paper | 🌐 Project Page | 💻 Code

Magic-Me: Identity-Specific Video Customized Diffusion

📄 Paper | 🌐 Project Page | 💻 Code