Mora: More like Sora for Generalist Video Generation

October 10, 2024 · View on GitHub

🔍 See our newest Video Generation paper: "Mora: Enabling Generalist Video Generation via A Multi-Agent Framework"

📧 Please let us know if you find a mistake or have any suggestions by e-mail: lis221@lehigh.edu

📰 News

🚀️ Oct 9: Our Mora v2 paper update and training code are coming soon.

🚀️ Jun 13: Our code is released!

🚀️ Mar 20: Our paper "Mora: Enabling Generalist Video Generation via A Multi-Agent Framework" is released!

Mora is a multi-agent framework designed to facilitate generalist video generation tasks, leveraging a collaborative approach with multiple visual agents. It aims to replicate and extend the capabilities of OpenAI's Sora.

📹 Demo for Artist Creation

Inspired by "OpenAI Sora: First Impressions", we use Mora to generate the Shy kids video. Although Mora reaches a video duration similar to Sora's (80 seconds), it still lags significantly behind in resolution, object consistency, motion smoothness, etc.

https://github.com/JHL328/test/assets/55661930/abe276f7-12d3-4d24-aff3-7474296e854e

🎥 Demo (1024×576 resolution, 12 seconds and more!)

Mora: A Multi-Agent Framework for Video Generation


Multi-Agent Collaboration: Utilizes several advanced visual AI agents, each specializing in different aspects of the video generation process, to achieve high-quality outcomes across various tasks.
Broad Spectrum of Tasks: Capable of performing text-to-video generation, text-conditional image-to-video generation, extending generated videos, video-to-video editing, connecting videos, and simulating digital worlds, thereby covering an extensive range of video generation applications.
Open-Source and Extendable: Mora’s open-source nature fosters innovation and collaboration within the community, allowing for continuous improvement and customization.
Proven Performance: Experimental results demonstrate Mora's ability to achieve performance that is close to that of Sora in various tasks, making it a compelling open-source alternative for the video generation domain.
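
The multi-agent idea above can be illustrated with a minimal sketch in which each agent is a function that turns one kind of artifact into another, and a task is an ordered chain of agents. This is not Mora's actual code or API: the agent names (`enhance_prompt`, `text_to_image`, `image_to_video`), the `Artifact` type, and the stage order for the text-to-video task are all assumptions made for illustration, and the agents are stubs that only record what they would do.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Artifact:
    """Data handed from one agent to the next (placeholder fields only)."""
    kind: str                     # "text", "image", or "video"
    payload: str                  # stand-in for real pixels or frames
    history: List[str] = field(default_factory=list)


Agent = Callable[[Artifact], Artifact]


def make_agent(name: str, accepts: str, produces: str) -> Agent:
    """Build a stub agent that checks its input kind and tags its output."""
    def run(item: Artifact) -> Artifact:
        if item.kind != accepts:
            raise TypeError(f"{name} expects {accepts}, got {item.kind}")
        return Artifact(produces, f"{name}({item.payload})", item.history + [name])
    return run


# Hypothetical agents; a real system would wrap generative models here.
enhance_prompt = make_agent("enhance_prompt", "text", "text")
text_to_image = make_agent("text_to_image", "text", "image")
image_to_video = make_agent("image_to_video", "image", "video")

# Assumed stage order for one task; other tasks would register their own chains.
PIPELINES: Dict[str, List[Agent]] = {
    "text-to-video": [enhance_prompt, text_to_image, image_to_video],
}


def run_task(task: str, item: Artifact) -> Artifact:
    """Pass the artifact through every agent registered for the task."""
    for agent in PIPELINES[task]:
        item = agent(item)
    return item


if __name__ == "__main__":
    result = run_task("text-to-video", Artifact("text", "a coral reef"))
    print(result.kind, result.history)
```

Keeping each agent behind the same call signature is what makes such a design extendable: a new task is a new entry in `PIPELINES`, and swapping a model only touches one agent.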

Results

Text-to-video generation

Input prompt	Output video
A vibrant coral reef teeming with life under the crystal-clear blue ocean, with colorful fish swimming among the coral, rays of sunlight filtering through the water, and a gentle current moving the sea plants.
A majestic mountain range covered in snow, with the peaks touching the clouds and a crystal-clear lake at its base, reflecting the mountains and the sky, creating a breathtaking natural mirror.
In the middle of a vast desert, a golden desert city appears on the horizon, its architecture a blend of ancient Egyptian and futuristic elements. The city is surrounded by a radiant energy barrier, while in the air, seve

Text-conditional image-to-video generation

Input prompt	Input image	Mora generated Video	Sora generated Video
Monster Illustration in the flat design style of a diverse family of monsters. The group includes a furry brown monster, a sleek black monster with antennas, a spotted green monster, and a tiny polka-dotted monster, all interacting in a playful environment.
An image of a realistic cloud that spells “SORA”.

Extend generated video

Original video	Mora extended video	Sora extended video

Video-to-video editing

Instruction	Original video	Mora edited Video	Sora edited Video
Change the setting to the 1920s with an old school car. make sure to keep the red color.
Put the video in space with a rainbow road

Connect videos

Input previous video	Input next video	Output connect Video

Simulate digital worlds

Mora simulating video	Sora simulating video

Getting Started

Code will be released as soon as possible!

Citation

@article{yuan2024mora,
  title={Mora: Enabling Generalist Video Generation via A Multi-Agent Framework},
  author={Yuan, Zhengqing and Chen, Ruoxi and Li, Zhaoxu and Jia, Haolong and He, Lifang and Wang, Chi and Sun, Lichao},
  journal={arXiv preprint arXiv:2403.13248},
  year={2024}
}

@article{liu2024sora,
  title={Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models},
  author={Liu, Yixin and Zhang, Kai and Li, Yuan and Yan, Zhiling and Gao, Chujie and Chen, Ruoxi and Yuan, Zhengqing and Huang, Yue and Sun, Hanchi and Gao, Jianfeng and others},
  journal={arXiv preprint arXiv:2402.17177},
  year={2024}
}

@misc{openai2024sorareport,
  title={Video generation models as world simulators},
  author={OpenAI},
  year={2024},
  howpublished={https://openai.com/research/video-generation-models-as-world-simulators},
}