Towards Real-Time Text2Video via CLIP-Guided, Pixel-Level Optimization

October 25, 2022

Peter Schaldenbrand, Zhixuan Liu, and Jean Oh. The Robotics Institute, Carnegie Mellon University

This repository presents an approach to generating videos from a series of language descriptions of the video's content. Currently the only implementation is a Colab notebook, which is linked above.
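As a rough illustration of what "pixel-level optimization" toward a language target can look like, here is a toy sketch. It is not the Colab implementation. The random linear `encode` function stands in for CLIP's image encoder, the random `targets` stand in for text embeddings of the descriptions, and all function names and parameters (`optimize_frame`, `generate_video`, `frames_per_prompt`, `lr`) are hypothetical. Warm-starting each frame from the previous one is also an assumption made for the sketch.

```python
import math
import random


def encode(weights, pixels):
    """Stand-in linear image encoder (NOT CLIP): embedding = weights @ pixels."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def similarity(weights, pixels, target):
    """Cosine similarity between the encoded pixels and a target embedding."""
    emb = encode(weights, pixels)
    dot = sum(e * t for e, t in zip(emb, target))
    return dot / (norm(emb) * norm(target))


def similarity_grad(weights, pixels, target):
    """Analytic gradient of the cosine similarity with respect to each pixel."""
    emb = encode(weights, pixels)
    n_e, n_t = norm(emb), norm(target)
    cos = sum(e * t for e, t in zip(emb, target)) / (n_e * n_t)
    # d cos / d emb, then the chain rule through the linear encoder.
    d_emb = [t / (n_e * n_t) - cos * e / (n_e * n_e) for e, t in zip(emb, target)]
    return [
        sum(weights[i][j] * d_emb[i] for i in range(len(weights)))
        for j in range(len(pixels))
    ]


def optimize_frame(weights, pixels, target, steps=20, lr=0.1):
    """Gradient ascent on the pixels; a step is kept only if it improves the score."""
    best = similarity(weights, pixels, target)
    for _ in range(steps):
        grad = similarity_grad(weights, pixels, target)
        candidate = [p + lr * g for p, g in zip(pixels, grad)]
        score = similarity(weights, candidate, target)
        if score > best:
            pixels, best = candidate, score
        else:
            lr *= 0.5  # step overshot: shrink it and retry
    return pixels


def generate_video(weights, first_frame, targets, frames_per_prompt=4):
    """One target embedding per description; each frame warm-starts from the last."""
    frames, current = [], list(first_frame)
    for target in targets:
        for _ in range(frames_per_prompt):
            current = optimize_frame(weights, current, target)
            frames.append(current)
    return frames


if __name__ == "__main__":
    rng = random.Random(0)
    n_pixels, n_dims = 16, 8
    weights = [[rng.gauss(0, 1) for _ in range(n_pixels)] for _ in range(n_dims)]
    targets = [[rng.gauss(0, 1) for _ in range(n_dims)] for _ in range(2)]
    start = [rng.random() for _ in range(n_pixels)]
    video = generate_video(weights, start, targets)
    print(len(video), similarity(weights, video[-1], targets[-1]))
```

In a real setting the encoder would be a pretrained image-text model and the gradient would come from automatic differentiation rather than a hand-written formula. The toy only shows the overall loop: the pixels themselves are the parameters being updated, and a sequence of descriptions yields a sequence of frames.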

A ballerina frog dancing

Please message Peter at pschalde at andrew dot cmu dot edu with any questions, or open a GitHub issue. Thanks!