February 8, 2024
# Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
Jinbo Xing, Menghan Xia*, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu,
Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong
(* corresponding author)
From CUHK and Tencent AI Lab.
IEEE TVCG 2024
## 🔆 Introduction
Make-Your-Video is a customized video generation model conditioned on both text and motion structure (frame-wise depth). It inherits rich visual concepts from an image LDM and supports inference on longer videos.
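To make the structural-guidance idea concrete, here is a minimal, hypothetical sketch (not the released code) of turning a sequence of raw depth maps into a per-frame conditioning tensor. The function name, shapes, and per-frame min-max normalization are assumptions for illustration only:

```python
import numpy as np

def prepare_depth_condition(depth_frames, eps=1e-6):
    """Normalize raw depth maps to [0, 1] per frame so they can act as
    structural conditioning. Hypothetical helper, not part of this repo.

    depth_frames: array of shape (T, H, W) in arbitrary depth units.
    Returns: array of shape (T, 1, H, W), one conditioning channel per frame.
    """
    depth_frames = np.asarray(depth_frames, dtype=np.float32)
    # Per-frame min/max, keeping dims so broadcasting works below.
    d_min = depth_frames.min(axis=(1, 2), keepdims=True)
    d_max = depth_frames.max(axis=(1, 2), keepdims=True)
    norm = (depth_frames - d_min) / (d_max - d_min + eps)
    return norm[:, None, :, :]  # add channel axis: (T, 1, H, W)

# Example: 16 frames of 256x256 depth values.
cond = prepare_depth_condition(np.random.rand(16, 256, 256) * 10.0)
print(cond.shape)  # (16, 1, 256, 256)
```

In the actual model, such a tensor would be fed to the denoising U-Net alongside the text embedding; the sketch only shows the shape and normalization convention.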
## 🤗 Applications

### Real-life scene to video
| Real-life scene | Ours | Text2Video-zero+CtrlNet | LVDMExt+Adapter |
|---|---|---|---|

*(Video results omitted; each row compares the four methods on one prompt.)*

- "A dam discharging water"
- "A futuristic rocket ship on a launchpad, with sleek design, glowing lights"
### 3D scene modeling to video

| 3D scene | Ours | Text2Video-zero+CtrlNet | LVDMExt+Adapter |
|---|---|---|---|

*(Video results omitted; each row compares the four methods on one prompt.)*

- "A train on the rail, 2D cartoon style"
- "A Van Gogh style painting on drawing board in park, some books on the picnic blanket, photorealistic"
- "A Chinese ink wash landscape painting"
### Video re-rendering

| Original video | Ours | SD-Depth | Text2Video-zero+CtrlNet | LVDMExt+Adapter | Tune-A-Video |
|---|---|---|---|---|---|

*(Video results omitted; each row compares the six methods on one prompt.)*

- "A tiger walks in the forest, photorealistic"
- "An origami boat moving on the sea"
- "A camel walking on the snow field, Miyazaki Hayao anime style"
## 🌟 Method Overview

*(Method overview figure omitted.)*
## 📝 Changelog
- [2023.11.30]: 🔥🔥 Release the main model.
- [2023.06.01]: 🔥🔥 Create this repo and launch the project webpage.
## 🧰 Models
| Model | Resolution | Checkpoint |
|---|---|---|
| MakeYourVideo256 | 256x256 | Hugging Face |
It takes approximately 13 seconds and a peak GPU memory of 20 GB to generate one video on a single NVIDIA A100 (40G) GPU.
## ⚙️ Setup

### Install Environment via Anaconda (Recommended)

```bash
conda create -n makeyourvideo python=3.8.5
conda activate makeyourvideo
pip install -r requirements.txt
```
## 💫 Inference

### 1. Command line

- Download the pre-trained depth estimation model from Hugging Face, and put `dpt_hybrid-midas-501f0c75.pt` in `checkpoints/depth/dpt_hybrid-midas-501f0c75.pt`.
- Download the pretrained model via Hugging Face, and put `model.ckpt` in `checkpoints/makeyourvideo_256_v1/model.ckpt`.
- Run the following command in the terminal:

```bash
sh scripts/run.sh
```
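Before launching, it can help to verify both checkpoints are where the script expects them. A small hypothetical pre-flight check (not shipped with this repo) using the paths listed above:

```python
from pathlib import Path

# Expected checkpoint locations, as described in the download steps.
REQUIRED_CHECKPOINTS = [
    "checkpoints/depth/dpt_hybrid-midas-501f0c75.pt",
    "checkpoints/makeyourvideo_256_v1/model.ckpt",
]

def missing_checkpoints(root="."):
    """Return the subset of REQUIRED_CHECKPOINTS not present under root."""
    root = Path(root)
    return [p for p in REQUIRED_CHECKPOINTS if not (root / p).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All checkpoints in place; run: sh scripts/run.sh")
```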
## 👨👩👧👦 Other Interesting Open-source Projects

- VideoCrafter1: Framework for high-quality video generation.
- DynamiCrafter: Open-domain image animation using video diffusion priors.

Play with these projects in the same conda environment!
## 😉 Citation

```bibtex
@article{xing2023make,
  title={Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance},
  author={Xing, Jinbo and Xia, Menghan and Liu, Yuxin and Zhang, Yuechen and Zhang, Yong and He, Yingqing and Liu, Hanyuan and Chen, Haoxin and Cun, Xiaodong and Wang, Xintao and others},
  journal={arXiv preprint arXiv:2306.00943},
  year={2023}
}
```
## 📢 Disclaimer

This repository is developed for RESEARCH purposes only, so it may be used solely for personal, research, or other non-commercial purposes.
## 🌞 Acknowledgement

We gratefully acknowledge the Visual Geometry Group of the University of Oxford for collecting the WebVid-10M dataset, and we follow its corresponding terms of access.