README.md
June 6, 2026 Β· View on GitHub
Important
π Cosmos 3 Has Arrived
Cosmos 3 is NVIDIA's next-generation foundation model platform for Physical AI. Compared with Cosmos-Reason2, Cosmos 3 delivers substantially stronger physical reasoning capabilities while extending beyond reasoning to support world prediction, simulation, transfer, and action generation within a single unified model.
Rather than relying on separate models for reasoning, prediction, transfer, and policy learning, a single Cosmos 3 model can understand the world, reason about physical interactions, predict future outcomes, transform observations across domains, and generate actions for embodied agents. This unified architecture enables stronger performance across a broad range of Physical AI applications, including robotics, autonomous vehicles, and smart spaces.
This repository is no longer under active development and will receive only limited maintenance updates. Future model releases, features, documentation, and community support will be focused on Cosmos 3.
π Visit the new Cosmos home: https://github.com/NVIDIA/Cosmos
There you will find the latest Cosmos 3 models, technical reports, tutorials, benchmarks, and ecosystem updates.
Thank you for your support of Cosmos-Reason2. We encourage all users to migrate to Cosmos 3 for the latest state-of-the-art Physical AI capabilities.
π€ Hugging FaceΒ | Cosmos Cookbook
NVIDIA Cosmos Reason β an open, customizable, reasoning vision language model (VLM) for physical AI and robotics - enables robots and vision AI agents to reason like humans, using prior knowledge, physics understanding and common sense to understand and act in the real world. This model understands space, time, and fundamental physics, and can serve as a planning model to reason what steps an embodied agent might take next.
Cosmos Reason excels at navigating the long tail of diverse scenarios of the physical world with spatial-temporal understanding. Cosmos Reason is post-trained with physical common sense and embodied reasoning data with supervised fine-tuning and reinforcement learning. It uses chain-of-thought reasoning capabilities to understand world dynamics without human annotations.
Table of Contents
- News!
- Model Family
- Setup
- Inference
- Post-Training
- Quantization
- Troubleshooting
- Additional Resources
- License and Contact
News!
- [June 1, 2026] Cosmos 3 is live! Weβre excited to share that Cosmos 3 has launched β NVIDIAβs next-generation family of open omnimodal world foundation models for Physical AI. As part of this launch, the Cosmos GitHub repositories have moved to the main NVIDIA GitHub organization: https://github.com/nvidia/cosmos. Cosmos 3 unifies language, images, video, audio, and actions in a single architecture, enabling developers to build agents that can understand, reason, simulate, and act in the physical world. Please use the new repository as the source of truth for the latest code, models, documentation, and updates.
- [April 29, 2026] The Cosmos-Reason2-32B model is now available on HuggingFace, along with its performance benchmarks. The 2B and 8B model performance is also included.
- [February 9, 2026] We have Improved documentation and troubleshooting guidance, expanded platform support GB200 and ARM (torchcodec & inference sample fixed), enhanced quantization and training debuggability, and updated CUDA compatibility
- [December 19, 2025] We have released the Cosmos-Reason2 models and code for Physical AI common sense and embodied reasoning. The 2B and 8B models are now available on Hugging Face.
Model Family
Setup
This repository only contains documentation/examples/utilities. You do not need it to run inference. See Inference example for a minimal inference example. The following setup instructions are only needed to run the examples in this repository.
Clone the repository:
git clone https://github.com/nvidia-cosmos/cosmos-reason2.git
cd cosmos-reason2
Install one of the following environments:
Virtual Environment
Install system dependencies:
sudo apt-get install curl ffmpeg git git-lfs unzip
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uvx hf auth login
Install the repository:
uv sync --extra cu128
source .venv/bin/activate
CUDA variants:
| CUDA Version | Arguments | Notes |
|---|---|---|
| CUDA 12.8 | --extra cu128 | NVIDIA Driver |
| CUDA 13.0 | --extra cu130 | NVIDIA Driver |
For DGX Spark and Jetson AGX, you must use CUDA 13.0. Additionally, you must set TRITON_PTXAS_PATH to your system PTXAS:
export TRITON_PTXAS_PATH="/usr/local/cuda/bin/ptxas"
Docker Container
Please make sure you have access to Docker on your machine and the NVIDIA Container Toolkit is installed.
Build the container:
image_tag=$(docker build -f Dockerfile --build-arg=CUDA_VERSION=12.8.1 -q .)
CUDA variants:
| CUDA Version | Arguments | Notes |
|---|---|---|
| CUDA 12.8 | --build-arg=CUDA_VERSION=12.8.1 | NVIDIA Driver |
| CUDA 13.0 | --build-arg=CUDA_VERSION=13.0.0 | NVIDIA Driver |
For DGX Spark and Jetson AGX, you must use CUDA 13.0.
Run the container:
docker run -it --gpus all --ipc=host --rm -v .:/workspace -v /workspace/.venv -v /workspace/examples/cosmos_rl/.venv -v /root/.cache:/root/.cache -e HF_TOKEN="$HF_TOKEN" $image_tag
Optional arguments:
--ipc=host: Use host system's shared memory, since parallel torchrun consumes a large amount of shared memory. If not allowed by security policy, increase--shm-size(documentation).-v /root/.cache:/root/.cache: Mount host cache to avoid re-downloading cache entries.-e HF_TOKEN="$HF_TOKEN": Set Hugging Face token to avoid re-authenticating.
Inference
Minimum GPU Memory
| Model | GPU Memory |
|---|---|
| Cosmos-Reason2-2B | 24GB |
| Cosmos-Reason2-8B | 32GB |
Tested Platforms
Cosmos-Reason2 works on Hopper and Blackwell. Additional hardware configurations may work but are not officially validated at the time of this release.
Examples have been tested on the following devices:
| GPU | CUDA Version | Functionality |
|---|---|---|
| NVIDIA H100 | 12.8 | inference/post-training/quantization |
| NVIDIA GB200 | 13.0 | inference |
| NVIDIA DGX Spark | 13.0 | inference |
| NVIDIA Jetson AGX Thor (Edge) | 13.0 | Transformers inference. vLLM inference is coming soon! |
Transformers
Cosmos-Reason2 is included in transformers>=4.57.0.
Minimal example (sample output):
python scripts/inference_sample.py
Deployment
For deployment and batch inference, we recommend using vllm>=0.11.0.
Online Serving
Start the server in a separate terminal or a background process.
Tip
Docker users: Run docker exec -it <CONTAINER_ID> bash to exec into your container. Find your container ID with docker ps.
vllm serve nvidia/Cosmos-Reason2-2B \
--allowed-local-media-path "$(pwd)" \
--max-model-len 16384 \
--media-io-kwargs '{"video": {"num_frames": -1}}' \
--reasoning-parser qwen3 \
--port 8000
Optional arguments:
--max-model-len 16384: Maximum model length to avoid OOM. Recommended range: 8192 - 16384.--media-io-kwargs '{"video": {"num_frames": -1}}': Allow overriding FPS per sample.--reasoning-parser qwen3: Parse reasoning trace.--port 8000: Server port. Change if you encounterAddress already in useerrors.
Note
First startup takes a couple minutes for model loading and CUDA graph compilation. Subsequent starts are faster with cached graphs.
Once ready, the server will print Application startup complete..
Warning
Remember to stop the server when done! The vllm server consumes significant GPU memory while running. To stop it:
- If running in foreground: Press
Ctrl+C - If running in background: Find the process with
ps aux | grep vllmand kill it withkill <PID>
Caption a video (sample output):
cosmos-reason2-inference online --port 8000 -i prompts/caption.yaml --reasoning --videos assets/sample.mp4 --fps 4
Embodied reasoning with verbose output (sample output):
cosmos-reason2-inference online -v --port 8000 -i prompts/embodied_reasoning.yaml --reasoning --images assets/sample.png
To list available arguments:
cosmos-reason2-inference online --help
Offline Inference
Temporally caption a video and save the input frames to outputs/temporal_localization for debugging (sample output):
cosmos-reason2-inference offline -v --max-model-len 16384 -i prompts/temporal_localization.yaml --videos assets/sample.mp4 --fps 4 -o outputs/temporal_localization
To list available arguments:
cosmos-reason2-inference offline --help
Common arguments:
--model nvidia/Cosmos-Reason2-2B: Model name or path.
Post-Training
Quantization
Troubleshooting
Additional Resources
- Troubleshooting
- Example prompts
- Cosmos-Reason2 is based on the Qwen3-VL architecture.
- vLLM
License and Contact
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
NVIDIA Cosmos source code is released under the Apache 2 License.
NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com.