If you like our project, please give us a star ⭐ on GitHub for the latest updates.
This repository is dedicated to collecting, organizing, and tracking recent advancements in personalized video generation and editing. It serves as a centralized resource for papers, models, and benchmarks in this rapidly evolving field.
[2024-07-18] We have initiated the repository.
If you want to add your work to this list, please do not hesitate to email jhuang90@ur.rochester.edu or open a pull request.
Markdown format:
* | [**Paper Title**] | Venue | Date | [[paper]](link) [[code]](link) [[project]](link)|
| Title | Venue | Date | Links |
|---|---|---|---|
| PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models | CVPR 2024 | Dec 2023 (arXiv) | Paper - Project - Code |
| VideoBooth: Diffusion-based Video Generation with Image Prompts | CVPR 2024 | Dec 2023 (arXiv) | Paper - Project - Code |
| CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects | arXiv | Jan 18 2024 | Paper - Project |
| DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control | ACMMM 2024 | May 21 2024 | Paper - Project - Code |
| Still-Moving: Customized Video Generation without Customized Video Data | TOG | Jul 11 2024 | Paper - Project |
| CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities | AAAI 2025 | Feb 2025 | Paper - Code |
| Dynamic Concepts Personalization from Single Videos | SIGGRAPH 2025 | Feb 20 2025 | Paper - Page |
| BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation | arXiv | May 11 2025 | Paper |
| Title | Venue | Date | Links |
|---|---|---|---|
| Movie Gen: A Cast of Media Foundation Models | arXiv | Oct 17 2024 | Paper - Project |
| SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner | arXiv | Dec 13 2024 | Paper - Project |
| VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models | arXiv | Dec 27 2024 | Paper - Code |
| Multi-subject Open-set Personalization in Video Generation | CVPR 2025 | Jan 2025 (arXiv) | Paper - Project - Code |
| ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning | arXiv | Jan 2025 | Paper |
| AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance | arXiv | Feb 2025 | Paper - Code |
| Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts | CVPR 2025 | Feb 2025 | Paper |
| Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment | ICCV 2025 | Feb 16 2025 | Paper - Project - Code |
| SkyReels-A2: Compose Anything in Video Diffusion Transformers | arXiv | Apr 3 2025 | Paper - Project - Code |
| CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | arXiv | Mar 13 2025 | Paper |
| MAGREF: Masked Guidance for Any-Reference Video Generation | arXiv | May 29 2025 | Paper - Code |
| Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation | arXiv | Jul 08 2025 | Paper |
| BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration | arXiv | Oct 1 2025 | Paper - Page |
| Kaleido: Open-Sourced Multi-Subject Reference Video Generation Model | arXiv | Oct 21 2025 | Paper - Code |
| First Frame Is the Place to Go for Video Content Customization | arXiv | Nov 19 2025 | Paper - Code |
| Title | Venue | Date | Links |
|---|---|---|---|
| Structure and Content-Guided Video Synthesis with Diffusion Models | ICCV 2023 | Feb 2023 | Paper |
| VideoComposer: Compositional Video Synthesis with Motion Controllability | NeurIPS 2023 | Jun 2023 (arXiv) | Paper - Project - Code |
| DreamVideo: Composing Your Dream Videos with Customized Subject and Motion | CVPR 2024 | Dec 2023 (arXiv) | Paper - Project - Code |
| Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models | ECCV 2024 | Feb 2024 | Paper - Project - Code |
| MotionBooth: Motion-Aware Customized Text-to-Video Generation | NeurIPS 2024 (Spotlight) | Jun 2024 | Paper - Project - Code |
| DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control | arXiv | Oct 17 2024 | Paper - Page |
| MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models | ACMMM 2024 | Dec 2 2024 | Paper - Code |
| Subject-driven Video Generation via Disentangled Identity and Motion | arXiv | Apr 23 2025 | Paper - Code |
| DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization | arXiv | Mar 4 2025 | Paper - Project |
| VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models | CVPR 2025 | Mar 13 2025 | Paper - Project |
| DreamRunner: Fine-Grained Compositional Story-to-Video Generation with Retrieval-Augmented Motion Adaptation | arXiv | Mar 18 2025 | Paper - Project - Code |
| JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation | arXiv | Mar 31 2025 | Paper - Project |
| PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement | arXiv | Jun 9 2025 | Paper |
| CoMo: Compositional Motion Customization for Text-to-Video Generation | arXiv | Oct 27 2025 | Paper - Page |
| MotionStream: Real-Time Video Generation with Interactive Motion Controls | arXiv | Nov 03 2025 | Paper - Page - [Code](https://github.com/alex4727/motionstream) |
| MultiMotion: Multi Subject Video Motion Transfer via Video Diffusion Transformer | arXiv | Dec 08 2025 | Paper |
| Title | Venue | Date | Links |
|---|---|---|---|
| Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation | ICCV 2023 | Dec 22 2022 | Code - Paper |
| Dreamix: Video Diffusion Models are General Video Editors | arXiv | Feb 2023 | Paper - Project |
| Make-A-Protagonist: Generic Video Editing with Visual and Textual Clues | arXiv | May 15 2023 | Paper - Code |
| Towards Consistent Video Editing with Text-to-Image Diffusion Models | NeurIPS 2023 | May 27 2023 | Paper |
| Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance | TVCG 2024 | Jun 2023 | Paper - Code |
| MagicEdit: High-Fidelity and Temporally Coherent Video Editing | arXiv | Aug 28 2023 | Paper - Code - Page |
| Cut-and-Paste: Subject-Driven Video Editing with Attention Control | arXiv | Nov 20 2023 | Paper - Code |
| DragVideo: Interactive Drag-style Video Editing | ECCV 2024 | Dec 3 2023 | Paper - Code |
| AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks | TMLR 2024 | Mar 21 2024 | Paper - Project - Code |
| ReVideo: Remake a Video with Motion and Content Control | NeurIPS 2024 | May 22 2024 | Paper - Project - Code |
| DIVE: Taming DINO for Subject-Driven Video Editing | arXiv | Dec 4 2024 | Paper - Project |
| DreamInsert: Zero-Shot Image-to-Video Object Insertion from A Single Image | arXiv | Mar 13 2025 | Paper |
| Get In Video: Add Anything You Want to the Video | arXiv | May 2025 | Project - Paper |
| Pix2Video: Video Editing using Image Diffusion | ICCV 2023 | Mar 22 2023 | Project - Paper |
| VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control | arXiv | Mar 28 2025 | Project - Paper |
| Lucy Edit: Open-Weight Text-Guided Video Editing | arXiv | Sep 18 2025 | Paper - Github |
| OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models | arXiv | Sep 22 2025 | Paper - Project - Code |
| ContextFlow: Training-Free Video Object Editing via Adaptive Context Enrichment | arXiv | Sep 22 2025 | Paper - Project - Code |
| EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning | arXiv | Sep 24 2025 | Paper |
| IMAGEdit: Let Any Subject Transform | arXiv | Oct 01 2025 | Paper - Project - Code |
| InstructX: Towards Unified Visual Editing with MLLM Guidance | arXiv | Oct 10 2025 | Paper |
| In-Context Learning with Unpaired Clips for Instruction-based Video Editing | arXiv | Oct 16 2025 | Paper - Code |
Look: the unified visual baseline of a piece, covering style, color and lighting, texture/grade, and any VFX choices, that gives it a consistent on-screen feel.
| Title | Venue | Date | Links |
|---|---|---|---|
| VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning | arXiv | Oct 29 2025 | Paper - Project - Code |
| Video-As-Prompt: Unified Semantic Control for Video Generation | arXiv | Oct 28 2025 | Paper - Project - Code |
| Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation | arXiv | Aug 11 2025 | Paper - Project - Code |
| VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer | arXiv | Feb 09 2025 | Paper - Project |
| StyleMaster: Stylize Your Video with Artistic Generation and Translation | CVPR 2025 | Dec 10 2024 | Paper - Project - Code |
| Title | Venue | Date | Links |
|---|---|---|---|
| Magic-Me: Identity-Specific Video Customized Diffusion | arXiv | Mar 20 2024 | Paper - Project - Code |
| ID-Animator: Zero-Shot Identity-Preserving Human Video Generation | arXiv | Apr 23 2024 | Paper - Project - Code |
| PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation | ICCV 2025 | Mar 16 2025 | Paper - Project - Code |
| MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization | arXiv | Mar 16 2025 | Paper - Project - Code |
| Title | Venue | Date | Links |
|---|---|---|---|
| ConsisID: Identity-Preserving Text-to-Video Generation by Frequency Decomposition | CVPR 2025 | Nov 26 2024 | Paper - Code |
| AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation | arXiv | Nov 26 2024 | Paper - Code |
| Ingredients: Blending Custom Photos with Video Diffusion Transformers | arXiv | Jan 3 2025 | Paper - Code |
| Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers | ICCV 2025 | Jan 7 2025 | Paper - Code |
| EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion | arXiv | Jan 23 2025 | Paper - Code |
| SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers | arXiv | Feb 15 2025 | Paper - Page - Code |
| Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts | CVPR 2025 | Feb 4 2025 | Paper - Page |
| FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation | arXiv | Feb 25 2025 | Paper - Project - Code |
| Concat-ID: Towards Universal Identity-Preserving Video Synthesis | arXiv | Mar 18 2025 | Paper - Code |
| Proteus-ID: ID-Consistent and Motion-Coherent Video Customization | arXiv | Jun 30 2025 | Paper - Project |
| From Large Angles to Consistent Faces: Identity-Preserving Video Generation via Mixture of Facial Experts | arXiv | Aug 13 2025 | Paper - Code |
| HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning | arXiv | Sep 10 2025 | Paper - Code - Page |
| Lynx: Towards High-Fidelity Personalized Video Generation | arXiv | Sep 19 2025 | Paper - Project |
| Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation | arXiv | Aug 12 2025 | Paper - Page - Code |
| Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning | arXiv | Oct 17 2025 | Paper - Page - Code |
| ID-Composer: Multi-Subject Video Synthesis with Hierarchical Identity Preservation | arXiv | Nov 1 2025 | Paper |
| ContextAnyone: Context-Aware Diffusion for Character-Consistent Text-to-Video Generation | arXiv | Dec 8 2025 | Paper - Github |
| Title | Venue | Date | Links |
|---|---|---|---|
| BachVid: Training-Free Video Generation with Consistent Background and Character | arXiv | Oct 24 2025 | Paper - Code |
| Scaling Zero-Shot Reference-to-Video Generation | arXiv | Dec 7 2025 | Paper - Code - Project |
| Title | Venue | Date | Links |
|---|---|---|---|
| EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | ECCV 2024 | Feb 27 2024 | Paper - Code - Page |
| EMO2: End-Effector Guided Audio-Driven Avatar Video Generation | ECCV 2024 | Jan 18 2025 | Paper |
| FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis | ACMMM 2025 | Apr 07 2025 | Paper - Project - Code |
| Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation | arXiv | May 28 2025 | Paper - Project - Code |
| SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers | arXiv | Jun 11 2025 | Paper - Project - Code |
| InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions | arXiv | Jun 11 2025 | Paper - Project |
| OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation | arXiv | Jun 23 2025 | Paper - Project - Code |
| MirrorMe: Towards Realtime and High Fidelity Audio-Driven Halfbody Animation | arXiv | Jun 27 2025 | Paper - Project |
| Democratizing High-Fidelity Co-Speech Gesture Video Generation | ICCV 2025 | Jul 09 2025 | Paper - Project - Code |
| StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation | arXiv | Aug 11 2025 | Paper - Project - Code |
| FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation | arXiv | Aug 15 2025 | Paper - Project |
| Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis | arXiv | Sep 11 2025 | Paper - Project |
| Input-Aware Sparse Attention for Real-Time Co-Speech Video Generation | SIGGRAPH Asia | Oct 2 2025 | Paper - Project - Code |
| Paper2Video: Automatic Video Generation from Scientific Papers | arXiv | Oct 6 2025 | Paper - Project - Code |
| Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation | arXiv | Oct 27 2025 | Paper - Project - Code |
| Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback | AAAI | Oct 14 2025 | Paper - Project - Code |
| Title | Venue | Date | Links |
|---|---|---|---|
| Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos | AAAI 2024 | Apr 3 2023 | Paper - Code - Page |
| DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion | ICCV 2023 | Apr 12 2023 | Paper - Code - Page |
| DisCo: Disentangled Control for Realistic Human Dance Generation | CVPR 2024 | Jun 30 2023 | Paper - Code - Page |
| MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion | ICML 2024 | Nov 18 2023 | Paper - Code - Page |
| MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model | CVPR 2024 | Nov 27 2023 | Paper - Code - Page |
| Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation | CVPR 2024 | Nov 28 2023 | Paper - Code - Page |
| Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control | arXiv | Jun 05 2024 | Paper - Page |
| MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance | ICML 2025 | Jun 28 2024 | Paper - Code - Page |
| MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling | CVPR 2025 | Sep 24 2024 | Paper - Code - Page |
| StableAnimator: High-Quality Identity-Preserving Human Image Animation | CVPR 2025 | Sep 24 2024 | Paper - Code - Page |
| DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses | ICCV 2025 | Nov 30 2024 | Paper - Code - Page |
| DisPose: Disentangling Pose Guidance for Controllable Human Image Animation | ICLR 2025 | Dec 12 2024 | Paper - Code - Page |
| Consistent Human Image and Video Generation with Spatially Conditioned Diffusion | arXiv | Dec 19 2024 | Paper - Code |
| DirectorLLM for Human-Centric Video Generation | arXiv | Dec 19 2024 | Paper |
| X-Dyna: Expressive Dynamic Human Image Animation | CVPR 2025 (Highlight) | Jan 17 2025 | Paper - Page - Code |
| HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation | arXiv | Feb 7 2025 | Paper - Page |
| Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance | arXiv | Feb 10 2025 | Paper - Page |
| DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance | arXiv | Apr 20 2025 | Paper - Page |
| TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation | CVPR 2025 | Apr 11 2025 | Paper |
| DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation | arXiv | May 23 2025 | Paper - Page - Code |
| StableAnimator++: Overcoming Pose Misalignment and Face Distortion for Human Image Animation | arXiv | Jul 20 2025 | Paper - Page |
| Wan-Animate: Unified Character Animation and Replacement with Holistic Replication | arXiv | Sep 17 2025 | Paper - Page |
| SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation | arXiv | Nov 24 2025 | Paper - Page - Code |
| Title | Venue | Date | Links |
|---|---|---|---|
| Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation | SIGGRAPH Asia 2024 | Jun 4 2024 | Paper - Page - Code |
| Follow-Your-Emoji-Faster: Towards Efficient, Fine-Controllable, and Expressive Freestyle Portrait Animation | IJCV 2025 | Sep 20 2025 | Paper - Page - Code |
| Title / Benchmark | Venue | Date | Links |
|---|---|---|---|
| ConsisID-Bench - 150 identities & 90 prompts (human-domain) | CVPR 2025 (Highlight) | Nov 2024 | Project - Data |
| MSRVTT-Personalization (Alchemist-Bench) - Multi-subject personalization benchmark | CVPR 2025 | Jan 2025 | Paper - Data/Code |
| VACE-Benchmark - VACE: All-in-One Video Creation and Editing | arXiv 2025 | Mar 2025 | Paper - Data/Code |
| FullBench - FullDiT: Multi-Task Video Generative Foundation Model with Full Attention | arXiv | Mar 25 2025 | Paper - Data |
| A2 Bench - "Elements-to-Video" evaluation benchmark for arbitrary subjects | arXiv | Apr 2025 | Paper - Data/Code |
| OpenS2V-Eval - Fine-grained S2V benchmark (180 prompts, real & synthetic) | arXiv | May 28 2025 | Paper - Project - Code |
| Proteus-Bench | arXiv | Jun 30 2025 | Paper - Project |
| Title / Dataset | Venue | Date | Links |
|---|---|---|---|
| Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset | arXiv | Oct 2025 | Paper - Project - Data |
| ConsisID-Data | CVPR 2025 (Highlight) | Oct 2024 | Paper - Project - Data |
| Any2CapIns | arXiv | Mar 2025 | Paper - Project - Data |
| OpenS2V-5M | arXiv | May 28 2025 | Paper - Project - Data |
| Phantom-Data | arXiv | Jun 23 2025 | Paper - Project - Data |
| SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation | arXiv | Jul 14 2025 | Paper - Project - Data |
| TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation | arXiv | Oct 8 2025 | Paper - Project - Data |
| Title / Dataset | Venue | Date | Links |
|---|---|---|---|
| TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis | arXiv 2025 | Aug 2025 | Paper - Project - Data |
| CustomConcept101 | CVPR 2023 | Dec 2022 | Paper - Project - Data |
| Title / Dataset | Venue | Date | Links |
|---|---|---|---|
| Character Mixing for Video Generation | arXiv 2025 | Oct 06 2025 | Paper - Project - Code |