February 8, 2024
# Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
Jinbo Xing, Menghan Xia*, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu,
Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong
(* corresponding author)
From CUHK and Tencent AI Lab.
IEEE TVCG 2024
## 🔆 Introduction
Make-Your-Video is a customized video generation model conditioned on both text and motion structure (frame-wise depth). It inherits rich visual concepts from an image LDM and supports inference on longer videos.
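To make the structural-guidance idea concrete, here is a minimal, hypothetical sketch (not the released code) of turning a sequence of raw depth maps into a per-frame conditioning tensor. The function name, shapes, and per-frame min-max normalization are assumptions for illustration only:

```python
import numpy as np

def prepare_depth_condition(depth_frames, eps=1e-6):
    """Normalize raw depth maps to [0, 1] per frame so they can act as
    structural conditioning. Hypothetical helper, not part of this repo.

    depth_frames: array of shape (T, H, W) in arbitrary depth units.
    Returns: array of shape (T, 1, H, W), one conditioning channel per frame.
    """
    depth_frames = np.asarray(depth_frames, dtype=np.float32)
    # Per-frame min/max, keeping dims so broadcasting works below.
    d_min = depth_frames.min(axis=(1, 2), keepdims=True)
    d_max = depth_frames.max(axis=(1, 2), keepdims=True)
    norm = (depth_frames - d_min) / (d_max - d_min + eps)
    return norm[:, None, :, :]  # add channel axis: (T, 1, H, W)

# Example: 16 frames of 256x256 depth values.
cond = prepare_depth_condition(np.random.rand(16, 256, 256) * 10.0)
print(cond.shape)  # (16, 1, 256, 256)
```

In the actual model, such a tensor would be fed to the denoising U-Net alongside the text embedding; the sketch only shows the shape and normalization convention.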
## 🤗 Applications

### Real-life scene to video
| Real-life scene | Ours | Text2Video-zero+CtrlNet | LVDMExt+Adapter |
|---|---|---|---|

*(Video results omitted; each row compares the four methods on one prompt.)*

- "A dam discharging water"
- "A futuristic rocket ship on a launchpad, with sleek design, glowing lights"
### 3D scene modeling to video

| 3D scene | Ours | Text2Video-zero+CtrlNet | LVDMExt+Adapter |
|---|---|---|---|

*(Video results omitted; each row compares the four methods on one prompt.)*

- "A train on the rail, 2D cartoon style"
- "A Van Gogh style painting on drawing board in park, some books on the picnic blanket, photorealistic"
- "A Chinese ink wash landscape painting"
### Video re-rendering

| Original video | Ours | SD-Depth | Text2Video-zero+CtrlNet | LVDMExt+Adapter | Tune-A-Video |
|---|---|---|---|---|---|

*(Video results omitted; each row compares the six methods on one prompt.)*

- "A tiger walks in the forest, photorealistic"
- "An origami boat moving on the sea"
- "A camel walking on the snow field, Miyazaki Hayao anime style"
## 🌟 Method Overview

*(Method overview figure omitted.)*
## 📝 Changelog
- [2023.11.30]: 🔥🔥 Release the main model.
- [2023.06.01]: 🔥🔥 Create this repo and launch the project webpage.
## 🧰 Models
| Model | Resolution | Checkpoint |
|---|---|---|
| MakeYourVideo256 | 256x256 | Hugging Face |
It takes approximately 13 seconds and a peak GPU memory of 20 GB to generate one video on a single NVIDIA A100 (40G) GPU.
## ⚙️ Setup

### Install Environment via Anaconda (Recommended)

```bash
conda create -n makeyourvideo python=3.8.5
conda activate makeyourvideo
pip install -r requirements.txt
```
## 💫 Inference

### 1. Command line

- Download the pre-trained depth estimation model from Hugging Face, and put `dpt_hybrid-midas-501f0c75.pt` in `checkpoints/depth/dpt_hybrid-midas-501f0c75.pt`.
- Download the pretrained model via Hugging Face, and put `model.ckpt` in `checkpoints/makeyourvideo_256_v1/model.ckpt`.
- Run the following command in the terminal:

```bash
sh scripts/run.sh
```
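Before launching, it can help to verify both checkpoints are where the script expects them. A small hypothetical pre-flight check (not shipped with this repo) using the paths listed above:

```python
from pathlib import Path

# Expected checkpoint locations, as described in the download steps.
REQUIRED_CHECKPOINTS = [
    "checkpoints/depth/dpt_hybrid-midas-501f0c75.pt",
    "checkpoints/makeyourvideo_256_v1/model.ckpt",
]

def missing_checkpoints(root="."):
    """Return the subset of REQUIRED_CHECKPOINTS not present under root."""
    root = Path(root)
    return [p for p in REQUIRED_CHECKPOINTS if not (root / p).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All checkpoints in place; run: sh scripts/run.sh")
```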
## 👨👩👧👦 Other Interesting Open-source Projects

- VideoCrafter1: Framework for high-quality video generation.
- DynamiCrafter: Open-domain image animation using video diffusion priors.

Play with these projects in the same conda environment!
## 😉 Citation

```bibtex
@article{xing2023make,
  title={Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance},
  author={Xing, Jinbo and Xia, Menghan and Liu, Yuxin and Zhang, Yuechen and Zhang, Yong and He, Yingqing and Liu, Hanyuan and Chen, Haoxin and Cun, Xiaodong and Wang, Xintao and others},
  journal={arXiv preprint arXiv:2306.00943},
  year={2023}
}
```
## 📢 Disclaimer

This repository is developed for RESEARCH purposes only, so it may be used solely for personal, research, or other non-commercial purposes.
## 🌞 Acknowledgement

We gratefully acknowledge the Visual Geometry Group of the University of Oxford for collecting the WebVid-10M dataset, and we follow its corresponding terms of access.