README.md

June 6, 2026

Important

🚀 Cosmos 3 Has Arrived

Cosmos 3 is NVIDIA's next-generation foundation model platform for Physical AI. Compared with Cosmos-Transfer1, Cosmos 3 delivers significantly stronger transfer capabilities, enabling higher-fidelity transformation, adaptation, and simulation across diverse domains, sensors, environments, and embodiments.

Beyond improving transfer quality, Cosmos 3 unifies capabilities that previously required multiple specialized models. A single Cosmos 3 model can reason, predict future world states, transfer across domains and modalities, and generate actions and policies for embodied agents within one unified architecture.

This repository is no longer under active development and will receive only limited maintenance updates. Future model releases, features, documentation, and community support will be focused on Cosmos 3.

👉 Visit the new Cosmos home: https://github.com/NVIDIA/Cosmos

There you will find the latest Cosmos 3 models, technical reports, tutorials, benchmarks, and ecosystem updates.

Thank you for your support of Cosmos-Transfer1. We encourage all users to migrate to Cosmos 3 for the latest state-of-the-art Physical AI capabilities.

Product Website | Hugging Face | Paper | Paper Website

Cosmos-Transfer1 is a key branch of Cosmos World Foundation Models (WFMs) specialized for multimodal controllable conditional world generation or world2world transfer. The three main branches of Cosmos WFMs are cosmos-predict, cosmos-transfer, and cosmos-reason. We visualize the architecture of Cosmos-Transfer1 in the following figure.

Cosmos-Transfer1 Architecture Diagram

Cosmos-Transfer1 includes the following:

ControlNet-based single-modality conditional world generation, where a user can generate a visual simulation based on one of the following modalities: segmentation video, depth video, edge video, blur video, LiDAR video, or HDMap video. Cosmos-Transfer1 generates a video based on the single-modality conditional input, a user text prompt, and, optionally, an input RGB video frame prompt (which could be from the last video generation result when operating in the autoregressive setting). We use Cosmos-Transfer1-7B [Modality] to refer to the model operating in this setting. For example, Cosmos-Transfer1-7B [Depth] refers to a depth ControlNet model.
MultiControlNet-based multimodal conditional world generation, where a user can generate a visual simulation based on any combination of segmentation video, depth video, edge video, and blur video (LiDAR video and HDMap video in the AV sample), with a spatiotemporal control map that controls the strength of each modality across space and time. Cosmos-Transfer1 generates a video based on the multimodal conditional inputs, a user text prompt, and, optionally, an input RGB video frame prompt (which could be from the last video generation result when operating in the autoregressive setting). This is the preferred mode of Cosmos-Transfer1. We refer to it as Cosmos-Transfer1-7B.
4KUpscaler for upscaling a 720p-resolution video to a 4K-resolution video.
Post-training scripts for helping Physical AI builders post-train pre-trained Cosmos-Transfer1 for their applications.
Pre-training scripts for helping Physical AI builders train their own Cosmos-Transfer1 models from scratch.
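
To make the idea of a spatiotemporal control map more concrete, here is a minimal NumPy sketch of how per-modality control features might be blended with weights that vary across time and space. It is an illustration only and not the repository's implementation: the function name `fuse_control_features`, the `(T, H, W, C)` feature layout, and the rule that rescales weights wherever their sum exceeds 1 are all assumptions made for this example.

```python
import numpy as np


def fuse_control_features(features, weights, normalize=True):
    """Blend per-modality control features with spatiotemporal weight maps.

    Hypothetical helper for illustration only.
    features: dict of modality name -> array of shape (T, H, W, C)
    weights:  dict of modality name -> scalar or array broadcastable to (T, H, W)
    """
    names = sorted(features)
    shape = features[names[0]].shape[:3]  # (T, H, W)

    # Expand every weight (scalar or map) to a full (T, H, W) map.
    # A modality without an entry in `weights` gets strength 0.
    maps = np.stack([
        np.broadcast_to(np.asarray(weights.get(name, 0.0), dtype=np.float64), shape)
        for name in names
    ])  # (N, T, H, W)

    if normalize:
        # Assumption: where the total strength exceeds 1, rescale to sum to 1.
        total = maps.sum(axis=0, keepdims=True)
        maps = maps / np.maximum(total, 1.0)

    stacked = np.stack([features[name] for name in names])  # (N, T, H, W, C)
    return (maps[..., None] * stacked).sum(axis=0)  # (T, H, W, C)


# Toy inputs: constant "features" so the blend is easy to inspect.
T, H, W, C = 4, 8, 8, 2
features = {
    "depth": np.ones((T, H, W, C)),
    "edge": 2.0 * np.ones((T, H, W, C)),
}

# Depth controls the left half of every frame, edge controls the right half.
mask = np.zeros((T, H, W))
mask[:, :, : W // 2] = 1.0
fused = fuse_control_features(features, {"depth": mask, "edge": 1.0 - mask})

# A single non-zero weight reduces the blend to one modality.
depth_only = fuse_control_features(features, {"depth": 1.0})

# Equal scalar weights of 1.0 are rescaled to 0.5 each under the assumed rule.
balanced = fuse_control_features(features, {"depth": 1.0, "edge": 1.0})
```

In this sketch, passing a scalar applies one strength to the whole clip, while passing a `(T, H, W)` array lets a modality dominate only in chosen regions or frames. Setting a single non-zero weight mirrors the single-modality case described above.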

News

[2025/08] Cosmos-Transfer1-7B Edge Distilled is available! Now you can generate videos in a single diffusion step (vs. 36 steps), significantly speeding up inference. We provide the distillation recipe and training code, so you can even distill your own models! Try it out and tell us what you think!
[2025/05] Cosmos AV Single2MultiView is available! Now you can create dynamic, multi-view clips from just one video. Try it out and tell us what you think!
[2025/04] Post-training is available! Now you can customize Transfer1 models in your own way. Please try it out; we look forward to your feedback.

Example Model Behavior

Cosmos-Transfer LiDAR + HDMap Conditional Inputs -> World

Cosmos-Transfer Multimodal Conditional Inputs -> World

Getting Started

We provide a comprehensive set of examples that illustrate how to perform inference, post-training, and more with Cosmos-Transfer1. Click a relevant example below to start your Cosmos journey.

Installation

Please refer to INSTALL.md for general instructions on environment setup.

Workflow

Robotics Augmentation Workflow: Scene augmentation for robotic manipulation, mapping one synthetic robotics example to multiple realistic examples.

Cosmos-Transfer1 Models

Cosmos-Transfer1-7B: multimodal controllable conditional world generation with adaptive spatiotemporal control map. The supported modalities include segmentation, depth, canny edge, and blur visual.
Cosmos-Transfer1-7B [Depth | Edge | Keypoint | Segmentation | Vis]: single-modality controllable conditional world generation. This refers to Cosmos-Transfer1-7B operating on a single modality, in which case it reduces to a ControlNet.
Cosmos-Transfer1-7B-Sample-AV: multimodal controllable conditional world generation with adaptive spatiotemporal control map specialized for autonomous vehicle applications. The supported modalities include LiDAR and HDMap.
Cosmos-Transfer1-7B [LiDAR | HDMap]: single-modality controllable conditional world generation for autonomous vehicle applications. This refers to Cosmos-Transfer1-7B-Sample-AV operating on a single modality, in which case it reduces to a ControlNet.
Cosmos-Transfer1-7B-4KUpscaler: 4K upscaler that super-resolves 720p videos to 4K videos.

License and Contact

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

This model includes safety and content moderation features powered by Llama Guard 3. Llama Guard 3 is used solely as a content input filter and is subject to its own license.

NVIDIA Cosmos source code is released under the Apache 2 License.

NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com.