FAQ
June 3, 2026 · View on GitHub
Skills:
.agents/skills/cosmos3-setup/SKILL.md·.agents/skills/cosmos3-inference/SKILL.md
A catch-all collection of frequently asked questions, tips, and troubleshooting for the Cosmos3 package. Can't find what you need? Check setup.md for installation issues or inference.md for inference details.
To add a new entry, append it under the most relevant section — or under Miscellaneous if nothing fits.
Table of Contents
Setup and Installation
Q: I get ImportError: cannot import name '_functionalization' from 'torch._C' inside an NGC container
Clear the library path before running anything:
export LD_LIBRARY_PATH=''
This is needed because the NGC PyTorch container ships its own libraries that conflict with the venv-installed versions. See setup.md#pytorch-import-issue.
Q: ModuleNotFoundError: No module named 'cosmos_framework'
Make sure you installed the package:
uv sync --all-extras --group=cu130
source .venv/bin/activate
If already installed, try --reinstall to force a clean state.
Q: Which CUDA version should I use?
CUDA 13.0 is recommended. CUDA 12.8 is also supported. The major version must match between your system CUDA and the installed PyTorch wheels. Check with:
nvidia-smi # system CUDA
python -c "import torch; print(torch.version.cuda)" # PyTorch CUDA
Q: I get a CUDA / NVIDIA driver mismatch error (e.g. CUDA error: no kernel image is available for execution on the device, libcudart.so.* cannot open, The NVIDIA driver on your system is too old)
The installed PyTorch CUDA wheels do not match your system's NVIDIA driver. Check the driver's reported CUDA version with nvidia-smi, then delete .venv/ and uv sync against the matching group (cu130-train if the driver supports CUDA 13.x; cu128-train if it supports CUDA 12.8):
nvidia-smi # check the "CUDA Version" field (top right)
rm -rf .venv
uv sync --all-extras --group=cu130-train --reinstall # or --group=cu128-train
source .venv/bin/activate && export LD_LIBRARY_PATH=
Use the inference-only groups (cu130 / cu128) instead if you don't need the training-only dependencies.
Q: How do I download model checkpoints?
Checkpoints are downloaded automatically from Hugging Face during inference. You need:
- A Hugging Face token with Read permission
- Accepted NVIDIA Open Model License Agreement
HF_TOKENenvironment variable set, oruvx hf auth login
Control the download location with HF_HOME (default: ~/.cache/huggingface). If downloads fail, the commands are printed to the console — run them manually to debug. See setup.md#downloading-base-checkpoints.
Q: fatal error: Python.h: No such file or directory
Reinstall uv and the venv from scratch:
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install --reinstall
rm -rf .venv
uv sync --all-extras --group=cu130 --reinstall
source .venv/bin/activate
Q: How much disk space do I need?
Plan for ~150 GiB free before the first run. A successful first-run inference or training workflow typically consumes:
- Hugging Face cache (
$HF_HOME, default~/.cache/huggingface): ~90 GiB — base checkpoints (e.g. Cosmos3-Nano, Wan2.2 VAE), tokenizers, and any dataset snapshots pulled by training recipes. - uv cache (
$UV_CACHE_DIR, default~/.cache/uv): ~20 GiB — wheels for torch/CUDA dependencies across the install groups (cu130-train,cu128-train, etc.). - Run outputs (
$IMAGINAIRE_OUTPUT_ROOT, training, or your-ooutput dir, inference): ~30 GiB per run — config snapshots, DCP checkpoints saved everysave_freqiterations, callback outputs, optional wandb files.
Actual sizes scale with the model tier (Cosmos3-Super is larger than Cosmos3-Nano), the dataset, and how many checkpoints you keep. To relocate any of these off the system disk, set the corresponding env var before installation/run (e.g. export HF_HOME=/data/hf, export UV_CACHE_DIR=/data/uv, export IMAGINAIRE_OUTPUT_ROOT=/data/cosmos-runs).
Configuration and Defaults
Q: Where are the default inference parameters (guidance, shift, num_steps, etc.)?
Per-modality defaults live in JSON files under cosmos_framework/inference/defaults/<mode>/sample_args.json:
| Mode | Default file |
|---|---|
text2image | cosmos_framework/inference/defaults/text2image/sample_args.json |
text2video | cosmos_framework/inference/defaults/text2video/sample_args.json |
image2video | cosmos_framework/inference/defaults/image2video/sample_args.json |
Action and image/video-to-video modes have parallel files under cosmos_framework/inference/defaults/{image2image,video2video,forward_dynamics,inverse_dynamics,policy}/sample_args.json.
See AGENTS.md for the full config defaults chain.
Q: How do I override a default parameter?
From most temporary to most permanent:
- CLI flag:
--shift 5.0(per-run, applies to all samples) - Sample argument file: set the field in your input JSON (per-sample)
- Custom defaults file: pass
"defaults_file": "my_defaults.json"in your sample argument file (see inference.md#custom-defaults) - Built-in default: edit
cosmos_framework/inference/defaults/<mode>/sample_args.json(permanent change)
Fields set in the sample argument file take precedence over defaults. CLI flags override both.
Q: What is the shift parameter and which value should I use?
shift controls the time-shift in the UniPC diffusion sampler. Higher values produce more detail but can introduce artifacts. Recommended values:
| Model | Recommended shift |
|---|---|
| Cosmos3-Nano (8B) | 10.0 (default) |
| Cosmos3-Super (32B) | 5.0 |
Q: How do I add a new parameter to the inference pipeline?
- Add the field to
SamplingArgsandSamplingOverridesincosmos_framework/inference/args.py - Add its default to each
cosmos_framework/inference/defaults/<mode>/sample_args.json - Wire it through
OmniSampleOverrides.build_sample()incosmos_framework/inference/args.py
Q: What does defaults_file do?
It lets you supply a custom JSON file of default values instead of the built-in presets. The format is the same as the files in cosmos_framework/inference/defaults/. Fields in your sample argument file still take precedence over the custom defaults. See inference.md#custom-defaults.
Inference
Q: How much GPU memory do the models need?
| Model | GPU Memory |
|---|---|
| Cosmos3-Nano (8B) | 32 GB |
| Cosmos3-Super (32B) | 128 GB |
Q: I get torch.cuda.OutOfMemoryError during inference
Try these in order:
-
Reduce allocator fragmentation — usually the cheapest fix:
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True -
Increase
--dp-shard-sizeto shard model weights across more GPUs via FSDP. Inference auto-picks a value that fits the model at ~75% device memory (see_get_dp_shard_sizeincosmos_framework/inference/args.py); passing a larger explicit value drops per-GPU memory at the cost of more all-gather traffic. Requires multi-GPU. -
Lower
--device-memory-utilization(default0.75). The auto-dp_shard_sizeformula isceil(model_memory / device_memory / utilization), so passing e.g.--device-memory-utilization=0.5forces auto-mode to pick a largerdp_shard_sizeand leaves more per-GPU headroom for activations / KV cache. Requires multi-GPU. -
Add
--offload-guardrail-modelsto move the text and video guardrail models to CPU. Frees the GPU memory they would otherwise hold for the full run, at the cost of some extra latency when guardrails are invoked.
See inference.md#torch-cuda-out-of-memory-error for the full troubleshooting section.
Q: What is the difference between latency and throughput parallelism presets?
| Preset | What it does | When to use |
|---|---|---|
latency | Spreads each sample across all GPUs | Interactive / real-time use |
throughput | One sample per GPU in parallel | Large batch jobs |
Q: How do I generate images instead of videos?
Use a text2image input file:
python -m cosmos_framework.scripts.inference -i inputs/omni/t2i.json -o outputs/ --checkpoint-path Cosmos3-Nano
The modality is determined by the input JSON (num_frames=1 for images), not by a separate flag. See inputs/omni/t2i.json for the format.
Q: How many frames can I generate?
Depends on resolution:
| Resolution | Max frames |
|---|---|
| 256p | 400 |
| 480p | 300 |
| 720p | 200 |
Default is 189 frames at 24 FPS (~7.9 seconds).
Q: What input formats does image-to-video support?
Provide a vision_path pointing to an image (.jpg, .jpeg, .png) or a URL. See inputs/omni/i2v.json for the format.
Q: How do I run online inference with Ray?
Install serve dependencies and start the server:
uv pip install -e ".[serve]"
python -m cosmos_framework.inference.ray.serve --parallelism-preset=latency -o outputs/ray_serve --checkpoint-path Cosmos3-Nano
Then submit requests via curl, the submit CLI, or the Gradio UI.
Q: How do I fix a torch.compile error during inference?
Add a command-line argument --no-use-torch-compile
OR
Delete the torchinductor cache under the /tmp directory, rm -rf /tmp/torchinductor_*
Q: I get torch.distributed.DistNetworkError: ... port: 29500 ... EADDRINUSE, address already in use
torchrun defaults its rendezvous to port 29500. The error means that port is already taken on the node — usually because another torchrun job (yours or someone else's on a shared node) is still using it.
Pass a different free port with --master-port, placed before -m (it is a torchrun argument, not an inference argument):
torchrun --nproc-per-node=8 --master-port=29501 -m cosmos_framework.scripts.inference \
--parallelism-preset=throughput \
-i "inputs/omni/t2i.json" \
-o outputs/omni_t2i \
--checkpoint-path Cosmos3-Super-Text2Image \
--seed=0
Any free port works (e.g. 29501, 29510); give each concurrent job on the same node a distinct port. Alternatively, --rdzv-endpoint=localhost:0 lets torchrun auto-pick a free port.
Training
Q: I get torch.cuda.OutOfMemoryError during training (SFT)
Knobs are in the recipe TOML under [model], [model.parallelism], and [dataloader_train]. Try in order:
-
Reduce allocator fragmentation — usually the cheapest fix:
export PYTORCH_ALLOC_CONF=expandable_segments:True(The
_superlaunch shells already export this — seeexamples/launch_sft_*_super.sh.) -
Enable activation checkpointing in
[model.activation_checkpointing]:mode = "full"— checkpoint every transformer block (largest memory savings, trades extra recompute for memory).mode = "selective"— per-op SAC, MoT only (smaller savings, smaller overhead). Falls back to no checkpointing on the VLM path.
-
Raise
[model.parallelism].data_parallel_shard_degreeto shard weights/optimizer state across more ranks via FSDP. Runtime invariant (fromcosmos_framework/utils/vfm/parallelism.py:50-52):data_parallel_replicate_degree × data_parallel_shard_degree == WORLD_SIZEalways holds —context_parallel_shard_degreeandcfg_parallel_shard_degreeare overlay axes that share dp rank slots, not separate mesh dims. Use-1to letdata_parallel_shard_degreeauto-fill fromtorchrunworld size. -
Raise
[model.parallelism].context_parallel_shard_degreeto split the sequence dimension across ranks. Helpful when activations (not weights) drive the OOM — long videos, high resolution. -
Lower
[dataloader_train].max_samples_per_batchto cap samples per micro-batch.Nonelets the packer's token budget decide; setting an explicit small number trades throughput for headroom. -
Enable LoRA on a Cosmos3-Nano recipe. Nano recipes are full-finetune by default (
lora_enabled = false); setting[model].lora_enabled = truetrains low-rank adapters instead of the full weights, dropping optimizer-state memory substantially. The_superrecipes (e.g.vision_sft_super) are already LoRA-only, so this lever doesn't apply there.
See docs/training.md for the full SFT setup and TOML reference ([model.activation_checkpointing], [model.parallelism], [dataloader_train] sections).
Tips and Tricks
Seed reproducibility
Always pass --seed when comparing runs. Without it, a random seed is used each time.
Prompt upsampling
Short prompts produce worse results. Use the built-in prompt upsampler with a vLLM-served Qwen3 model:
python -m cosmos_framework.scripts.upsample_prompts -i "inputs/omni/*.json" -o outputs/upsample_prompts
Batch inference resume
The inference script automatically skips samples whose output files already exist. If a run is interrupted, re-run the same command to resume.
Generate all modalities at once
python -m cosmos_framework.scripts.inference -i "inputs/omni/*.json" -o outputs/ --checkpoint-path Cosmos3-Nano --seed=0
CLI help
All available flags and their current defaults:
python -m cosmos_framework.scripts.inference --help
Miscellaneous
This section is a catch-all for tips that don't fit elsewhere. Add new entries freely.
Q: What are the example scripts in examples/ for?
They illustrate how the inference logic works under the hood — examples/inference.py shows the low-level model API and examples/inference_pipeline.py shows the pipeline API. For production use, prefer python -m cosmos_framework.scripts.inference.
Q: Where are the Ray Serve config files?
cosmos_framework/inference/ray/configs/latency.yaml and cosmos_framework/inference/ray/configs/throughput.yaml. These configure the Ray Serve deployment with different parallelism strategies.