README.md

June 6, 2026 · View on GitHub

Important

🚀 Cosmos 3 Has Arrived

Cosmos 3 is NVIDIA's next-generation foundation model platform for Physical AI. Compared with Cosmos-Reason2, Cosmos 3 delivers substantially stronger physical reasoning capabilities while extending beyond reasoning to support world prediction, simulation, transfer, and action generation within a single unified model.

Rather than relying on separate models for reasoning, prediction, transfer, and policy learning, a single Cosmos 3 model can understand the world, reason about physical interactions, predict future outcomes, transform observations across domains, and generate actions for embodied agents. This unified architecture enables stronger performance across a broad range of Physical AI applications, including robotics, autonomous vehicles, and smart spaces.

This repository is no longer under active development and will receive only limited maintenance updates. Future model releases, features, documentation, and community support will be focused on Cosmos 3.

👉 Visit the new Cosmos home: https://github.com/NVIDIA/Cosmos

There you will find the latest Cosmos 3 models, technical reports, tutorials, benchmarks, and ecosystem updates.

Thank you for your support of Cosmos-Reason2. We encourage all users to migrate to Cosmos 3 for the latest state-of-the-art Physical AI capabilities.

NVIDIA Cosmos

🤗 Hugging Face | Cosmos Cookbook

NVIDIA Cosmos Reason – an open, customizable, reasoning vision language model (VLM) for physical AI and robotics - enables robots and vision AI agents to reason like humans, using prior knowledge, physics understanding and common sense to understand and act in the real world. This model understands space, time, and fundamental physics, and can serve as a planning model to reason what steps an embodied agent might take next.

Cosmos Reason excels at navigating the long tail of diverse scenarios of the physical world with spatial-temporal understanding. Cosmos Reason is post-trained with physical common sense and embodied reasoning data with supervised fine-tuning and reinforcement learning. It uses chain-of-thought reasoning capabilities to understand world dynamics without human annotations.

Table of Contents

News!
Model Family
Setup
Inference
Post-Training
Quantization
Troubleshooting
Additional Resources
License and Contact

News!

[June 1, 2026] Cosmos 3 is live! We’re excited to share that Cosmos 3 has launched — NVIDIA’s next-generation family of open omnimodal world foundation models for Physical AI. As part of this launch, the Cosmos GitHub repositories have moved to the main NVIDIA GitHub organization: https://github.com/nvidia/cosmos. Cosmos 3 unifies language, images, video, audio, and actions in a single architecture, enabling developers to build agents that can understand, reason, simulate, and act in the physical world. Please use the new repository as the source of truth for the latest code, models, documentation, and updates.
[April 29, 2026] The Cosmos-Reason2-32B model is now available on HuggingFace, along with its performance benchmarks. The 2B and 8B model performance is also included.
[February 9, 2026] We have Improved documentation and troubleshooting guidance, expanded platform support GB200 and ARM (torchcodec & inference sample fixed), enhanced quantization and training debuggability, and updated CUDA compatibility
[December 19, 2025] We have released the Cosmos-Reason2 models and code for Physical AI common sense and embodied reasoning. The 2B and 8B models are now available on Hugging Face.

Model Family

Setup

This repository only contains documentation/examples/utilities. You do not need it to run inference. See Inference example for a minimal inference example. The following setup instructions are only needed to run the examples in this repository.

Clone the repository:

git clone https://github.com/nvidia-cosmos/cosmos-reason2.git
cd cosmos-reason2

Install one of the following environments:

Virtual Environment

Install system dependencies:

sudo apt-get install curl ffmpeg git git-lfs unzip

curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

Hugging Face CLI

uvx hf auth login

Install the repository:

uv sync --extra cu128
source .venv/bin/activate

CUDA variants:

CUDA Version	Arguments	Notes
CUDA 12.8	`--extra cu128`	NVIDIA Driver
CUDA 13.0	`--extra cu130`	NVIDIA Driver

For DGX Spark and Jetson AGX, you must use CUDA 13.0. Additionally, you must set TRITON_PTXAS_PATH to your system PTXAS:

export TRITON_PTXAS_PATH="/usr/local/cuda/bin/ptxas"

Docker Container

Please make sure you have access to Docker on your machine and the NVIDIA Container Toolkit is installed.

Build the container:

image_tag=$(docker build -f Dockerfile --build-arg=CUDA_VERSION=12.8.1 -q .)

CUDA variants:

CUDA Version	Arguments	Notes
CUDA 12.8	`--build-arg=CUDA_VERSION=12.8.1`	NVIDIA Driver
CUDA 13.0	`--build-arg=CUDA_VERSION=13.0.0`	NVIDIA Driver

For DGX Spark and Jetson AGX, you must use CUDA 13.0.

Run the container:

docker run -it --gpus all --ipc=host --rm -v .:/workspace -v /workspace/.venv -v /workspace/examples/cosmos_rl/.venv -v /root/.cache:/root/.cache -e HF_TOKEN="$HF_TOKEN" $image_tag

Optional arguments:

--ipc=host: Use host system's shared memory, since parallel torchrun consumes a large amount of shared memory. If not allowed by security policy, increase --shm-size (documentation).
-v /root/.cache:/root/.cache: Mount host cache to avoid re-downloading cache entries.
-e HF_TOKEN="$HF_TOKEN": Set Hugging Face token to avoid re-authenticating.

Inference

Minimum GPU Memory

Model	GPU Memory
Cosmos-Reason2-2B	24GB
Cosmos-Reason2-8B	32GB

Tested Platforms

Cosmos-Reason2 works on Hopper and Blackwell. Additional hardware configurations may work but are not officially validated at the time of this release.

Examples have been tested on the following devices:

GPU	CUDA Version	Functionality
NVIDIA H100	12.8	inference/post-training/quantization
NVIDIA GB200	13.0	inference
NVIDIA DGX Spark	13.0	inference
NVIDIA Jetson AGX Thor (Edge)	13.0	Transformers inference. vLLM inference is coming soon!

Transformers

Cosmos-Reason2 is included in transformers>=4.57.0.

Minimal example (sample output):

python scripts/inference_sample.py

Deployment

For deployment and batch inference, we recommend using vllm>=0.11.0.

Online Serving

Start the server in a separate terminal or a background process.

Tip

Docker users: Run docker exec -it <CONTAINER_ID> bash to exec into your container. Find your container ID with docker ps.

vllm serve nvidia/Cosmos-Reason2-2B \
  --allowed-local-media-path "$(pwd)" \
  --max-model-len 16384 \
  --media-io-kwargs '{"video": {"num_frames": -1}}' \
  --reasoning-parser qwen3 \
  --port 8000

Optional arguments:

--max-model-len 16384: Maximum model length to avoid OOM. Recommended range: 8192 - 16384.
--media-io-kwargs '{"video": {"num_frames": -1}}': Allow overriding FPS per sample.
--reasoning-parser qwen3: Parse reasoning trace.
--port 8000: Server port. Change if you encounter Address already in use errors.

Note

First startup takes a couple minutes for model loading and CUDA graph compilation. Subsequent starts are faster with cached graphs.

Once ready, the server will print Application startup complete..

Warning

Remember to stop the server when done! The vllm server consumes significant GPU memory while running. To stop it:

If running in foreground: Press Ctrl+C
If running in background: Find the process with ps aux | grep vllm and kill it with kill <PID>

Caption a video (sample output):

cosmos-reason2-inference online --port 8000 -i prompts/caption.yaml --reasoning --videos assets/sample.mp4 --fps 4

Embodied reasoning with verbose output (sample output):

cosmos-reason2-inference online -v --port 8000 -i prompts/embodied_reasoning.yaml --reasoning --images assets/sample.png

To list available arguments:

cosmos-reason2-inference online --help

Offline Inference

Temporally caption a video and save the input frames to outputs/temporal_localization for debugging (sample output):

cosmos-reason2-inference offline -v --max-model-len 16384 -i prompts/temporal_localization.yaml --videos assets/sample.mp4 --fps 4 -o outputs/temporal_localization

To list available arguments:

cosmos-reason2-inference offline --help

Common arguments:

--model nvidia/Cosmos-Reason2-2B: Model name or path.

Post-Training

Quantization

llmcompressor

Troubleshooting

See troubleshooting guide

Additional Resources

Troubleshooting
Example prompts
Cosmos-Reason2 is based on the Qwen3-VL architecture.
vLLM

License and Contact

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

NVIDIA Cosmos source code is released under the Apache 2 License.

NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com.