Video-Infinity

June 25, 2024

Video-Infinity: Distributed Long Video Generation
Zhenxiong Tan, Xingyi Yang, Songhua Liu, and Xinchao Wang
Learning and Vision Lab, National University of Singapore

TL;DR (Too Long; Didn't Read)

Video-Infinity generates long videos quickly using multiple GPUs without extra training. Feel free to visit our project page for more information and generated videos.

Features

Distributed 🌐: Utilizes multiple GPUs to generate long-form videos.
High-Speed 🚀: Produces 2,300 frames in just 5 minutes.
Training-Free 🎓: Generates long videos without requiring additional training for existing models.

Setup

Installation Environment

conda create -n video_infinity_vc2 python=3.10
conda activate video_infinity_vc2
pip install -r requirements.txt

Usage

Quick Start

Basic Usage

python inference.py --config examples/config.json

Multi-Prompts

python inference.py --config examples/multi_prompts.json

Single GPU

python inference.py --config examples/single_gpu.json

Config

Basic Config

Parameter	Description
`devices`	The list of GPU devices to use.
`base_path`	The path to save the generated videos.

Pipeline Config

Parameter	Description
`prompts`	The list of text prompts. Note: The number of prompts should be greater than the number of GPUs.
`file_name`	The name of the generated video.
`num_frames`	The number of frames to generate on each GPU.
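
As a rough illustration of how `devices` and `num_frames` interact, the sketch below estimates the total length of the generated video. It assumes each GPU contributes exactly `num_frames` frames and that the per-GPU clips are joined without overlap; the helper name `estimate_total_frames` and the example values are hypothetical and not part of the repository.

```python
def estimate_total_frames(devices, num_frames):
    """Estimate the total frame count of the final video.

    Assumption: every GPU in `devices` renders `num_frames` frames and the
    clips are concatenated without overlap.
    """
    if not devices:
        raise ValueError("`devices` must list at least one GPU")
    if num_frames <= 0:
        raise ValueError("`num_frames` must be positive")
    return len(devices) * num_frames


# Hypothetical values, for illustration only.
total = estimate_total_frames(devices=[0, 1, 2, 3], num_frames=24)
print(total)
```

Under these assumptions, adding GPUs lengthens the video while the per-GPU workload stays fixed.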

Video-Infinity Config

Parameter	Description
`*.padding`	The number of local context frames.
`attn.topk`	The number of global context frames for the `Attention` model.
`attn.local_phase`	When the denoising timestep is less than `local_phase.t`, the attention is biased: a `local_bias` is added to the local context frames and a `global_bias` to the global context frames.
`attn.global_phase`	Similar to `local_phase`, but the attention is biased when the denoising timestep is greater than `global_phase.t`.
`attn.token_num_scale`	If `True`, the scale factor is rescaled by the number of tokens. Default is `False`. See this paper for more details.
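
The timestep rule for `attn.local_phase` and `attn.global_phase` can be sketched as below. This is only a minimal illustration of the conditions described in the table, not the repository's implementation: the function name, the dictionary layout (`t`, `local_bias`, `global_bias`), the choice to sum the biases if both phases are active, and the example timesteps are all assumptions.

```python
def select_attention_bias(timestep, local_phase=None, global_phase=None):
    """Return (local_bias, global_bias) to apply at a denoising timestep.

    Each phase is a dict with keys `t`, `local_bias`, and `global_bias`
    (assumed layout). `local_phase` is active when timestep < t and
    `global_phase` when timestep > t. If both are active, their biases
    are summed (an assumption of this sketch).
    """
    local_bias, global_bias = 0.0, 0.0
    if local_phase is not None and timestep < local_phase["t"]:
        local_bias += local_phase["local_bias"]
        global_bias += local_phase["global_bias"]
    if global_phase is not None and timestep > global_phase["t"]:
        local_bias += global_phase["local_bias"]
        global_bias += global_phase["global_bias"]
    return local_bias, global_bias


# Hypothetical phase settings, for illustration only.
local_phase = {"t": 300, "local_bias": 1.0, "global_bias": 0.0}
global_phase = {"t": 700, "local_bias": 0.0, "global_bias": 1.0}

for step in (900, 500, 100):
    print(step, select_attention_bias(step, local_phase, global_phase))
```

With these example values, late (noisy) timesteps favor the global context frames, early ones favor the local context frames, and the timesteps in between are left unbiased.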

How to Set Config

To avoid losing high-frequency information, we recommend keeping the sum of `padding` and `attn.topk` below 24 (roughly the default number of frames in the VideoCrafter2 model). If you want a larger `padding` or `attn.topk`, set `attn.token_num_scale` to `True`.
Higher values of `local_phase.t` and `global_phase.t` produce more stable videos but may reduce their diversity.
More `padding` provides more local context.
A higher `attn.topk` improves the overall stability of the videos.
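
The first recommendation above can be turned into a quick sanity check before launching a run. The sketch below is a hypothetical helper, not part of the repository; it only encodes the "sum below 24 unless `attn.token_num_scale` is `True`" guideline, and the function name and argument names are assumptions.

```python
FRAME_BUDGET = 24  # roughly the default frame count of VideoCrafter2, per the guideline


def check_context_budget(padding, topk, token_num_scale=False):
    """Return a list of warning strings for the padding/topk guideline.

    An empty list means the settings follow the recommendation.
    """
    warnings = []
    total = padding + topk
    if total >= FRAME_BUDGET and not token_num_scale:
        warnings.append(
            f"padding + attn.topk = {total} is not below {FRAME_BUDGET}; "
            "consider setting attn.token_num_scale to True"
        )
    return warnings


# Hypothetical settings, for illustration only.
print(check_context_budget(padding=8, topk=8))
print(check_context_budget(padding=16, topk=16))
print(check_context_budget(padding=16, topk=16, token_num_scale=True))
```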

Citation

@article{
  tan2024videoinf,
  title={Video-Infinity: Distributed Long Video Generation},
  author={Tan, Zhenxiong and Yang, Xingyi and Liu, Songhua and Wang, Xinchao},
  journal={arXiv preprint arXiv:2406.16260},
  year={2024}
}

Acknowledgements

Our project is based on the VideoCrafter2 model. We would like to thank the authors for their excellent work! ❤️