Cosmos Curator - Developer Guide
June 10, 2026 · View on GitHub
- Cosmos Curator - Developer Guide
Development Environment Setup
Please refer to the following section in End User Guide:
- Prerequisites for hardware and software requirements.
- Initial Setup for preparaing configurations files and workspace directories, etc.
- Setup Environment and Install Dependencies for setting up Cosmos Curator.
For developers to contribute back to the repo, install the local git hooks and run a package build smoke test:
./devset.sh
For an interactive development shell, run:
pixi shell -e dev
For an explicit one-time setup without the smoke test, install the pre-commit hook directly:
pixi run -e tools pre-commit install
macOS is not a supported runtime platform for Cosmos Curator pipelines, but Apple Silicon developers can use the
lightweight tools Pixi environment for pre-commit formatting checks and the ./devset.sh bootstrap. Full CPU/GPU
tests and pipeline execution should run on Linux or inside the project container.
Working with Pixi Environments
Cosmos Curator uses Pixi to manage Python environments for local development and Docker images.
The pixi.toml file at the repository root defines all Pixi environments used by Cosmos Curator.
Installing Pixi
Follow the Pixi installation instructions, or run the following commands:
# Assuming your current user has write access to /usr/local/bin.
# If not, you can change ownership of /usr/local to your user: sudo chown -R $USER:$USER /usr/local
wget -qO- https://pixi.sh/install.sh | PIXI_HOME=/usr/local PIXI_NO_PATH_UPDATE=1 sh
This installs Pixi to /usr/local/bin/pixi, which matches the location in the Docker image.
You can verify the installation by running pixi --version.
Adding New Dependencies
When adding new dependencies to a pixi environment:
- Edit the
pixi.tomlfile to add your dependency under the appropriate environment - Run
pixi lockto resolve the new dependency - Rebuild the Docker image
- Test your changes inside the Docker container to ensure compatibility
- Commit both
pixi.tomland the updatedpixi.lockfile to version control to ensure reproducibility
Note: Environment names in pixi use hyphens (e.g., model-download) rather than underscores, as per pixi conventions.
Updating Pixi Environments
To update pixi environments:
- Run
pixi updateto update the environments - Rebuild the Docker image
- Test your changes inside the Docker container to ensure compatibility
Viewing Available Environments
To see all available pixi environments defined in pixi.toml:
# Show detailed information about all environments
pixi info
# Or list the environments only
pixi workspace environment list
Viewing Packages in Pixi Environments
To see all the packages and their versions in a pixi environment:
# For the default environment
pixi list
# For a specific environment
pixi list -e <env-name>
# For a specific package
pixi list -e <env-name> <package-name>
# Example: see the pytorch packages in the 'default' environment
pixi list -e default pytorch
Running Commands in Pixi Environments
While most pipeline execution happens inside Docker containers, you may need to run commands in pixi environments:
# Run in the default environment
pixi run python -m <module>
# Run in a specific environment
pixi run -e <env-name> python -m <module>
# Example: check if PyTorch is CUDA enabled in the default environment
pixi run -e default python -c "import torch; print(torch.cuda.is_available())"
Developer tools support two local workflows. Without activating a shell, use Pixi task aliases from the repository root:
pixi run ruff check
pixi run mypy --pretty
pixi run pytest tests/test_pixi_dev_commands.py
pixi run build
Or activate the developer environment once, then run the tools directly:
pixi shell -e dev
ruff check
mypy --pretty
pytest tests/test_pixi_dev_commands.py
python -m build
The dev environment also includes nvtop and nvitop; run pixi run nvtop or pixi run nvitop for interactive GPU
utilization and memory monitoring on Linux GPU hosts.
Note: For pipeline execution, always use the Docker container as shown in the testing section.
Interactive Slurm Development
For GPU development on a Slurm compute node, use the Interactive Slurm Development Guide. It covers starting the slim container from an interactive allocation, using a live Lustre checkout, running a pipeline command, and troubleshooting common environment issues.
Code Quality Checks
This project uses the following development tools:
- ruff: For code formatting and linting
- mypy: For static type checking
- Pixi: For dependency management
Before submitting any changes, format Python files and run the checks from the repository root:
pixi run ruff format
pixi run ruff check
pixi run mypy --pretty
If you are already inside pixi shell -e dev, omit pixi run and run ruff format, ruff check, and mypy --pretty
directly.
Building the Client package
- The
cosmos-curatorclient can be built as a wheel and installed in a standalone mode, without the need for the rest of the source environment
pixi run build
pip3 install dist/cosmos_curator*.whl
Inside pixi shell -e dev, use python -m build.
Testing
Tests under the tests/ directory can be categorized into 3 levels:
- Unit tests: for testing critical/complex function which are typically CPU-only and can run in the default Pixi environment.
- Model/stage tests: for testing functional correctness of a model, a pipeline stage, and a combination of a few stages, which typically require GPU and should run inside the container.
- End-to-end pipeline tests: for testing the functionality of reference pipelines.
Unit Tests
Run the CPU-only pytest suite from the repository root:
PYTEST_ADDOPTS='--junitxml=report.xml --cov=cosmos_curator --cov-report=term --cov-report=xml:unit-coverage.xml --cov-report=html:unit-htmlcov' \
pixi run pytest -s
================ test session starts ================
configfile: pytest.ini
testpaths: tests
collected 155 items / 26 deselected / 129 selected
tests/client/slurm_cli/test_slurm.py ..................................... [ 36%]
tests/client/slurm_cli/test_start_ray.py ................................... [ 70%]
tests/cosmos_curator/pipelines/video/filtering/motion/test_motion_filter.py . [ 71%]
tests/cosmos_curator/pipelines/video/utils/test_decoder_utils.py ............................. [100%]
================ 102 passed, 6 deselected, 2 warnings in 2.98s ================
Inside pixi shell -e dev, use the same PYTEST_ADDOPTS value with pytest -s.
Model and Stage Tests
Launch the docker container locally and run the gputest task, which runs the
env-marked model and stage tests:
for conda_env in default legacy-transformers; do
cosmos-curator local launch --image-name cosmos-curator --image-tag 1.0.0 --curator-path . \
-- pixi run --as-is -e $conda_env gputest;
done
================ test session starts ================
configfile: pytest.ini
collected 58 items / 40 deselected / 18 selected
tests/cosmos_curator/pipelines/video/clipping/test_fixed_stride_extraction.py ........ [ 44%]
tests/cosmos_curator/pipelines/video/clipping/test_transnetv2_extraction.py ..... [ 72%]
tests/cosmos_curator/pipelines/video/filtering/motion/test_motion_filter.py . [100%]
================ 18 passed, 40 deselected, 3 warning in 14.44 ================
...
================ test session starts ================
configfile: pytest.ini
collected 58 items / 52 deselected / 6 selected
tests/cosmos_curator/pipelines/video/captioning/test_t5_embedding.py . [ 16%]
tests/cosmos_curator/pipelines/video/filtering/aesthetics/test_aesthetic_filter.py ..... [100%]
================ 6 passed, 52 deselected, 2 warning in 30.02 ================
End-to-End Pipeline Tests
Run the reference video pipeline based on instructions in Run the Split-Annotate Pipeline section to make sure everything works.
The CI will test more scenarios.
Pipeline CLI settings (attrs)
Where it lives
- Shared flags (S3 profiles, execution mode, limit, model weights, profiling):
cosmos_curator/pipelines/common_pipeline_settings.py— classCommonPipelineSettings, helperscli(),add_settings_cli_arguments, andCommonPipelineSettings.from_namespace. - Shard-only flags:
cosmos_curator/pipelines/video/shard_pipeline_settings.py— classShardPipelineSettings,add_shard_args. - Parser registration used by multiple entry points:
cosmos_curator/pipelines/pipeline_args.py(add_common_args,add_profiling_args).
How to add a new flag (shard example)
- Add an attrs field on the right class (
ShardPipelineSettingsfor shard-only,CommonPipelineSettingsonly if every consumer should see the flag). - Put validation on the field with
validator=(for examplevalidators.ge(1),validators.in_(choices),validators.min_len(1)). This runs whenever the settings instance is constructed (CLI, NVCF JSON →Namespace, and any code path that builds the class explicitly). - Attach CLI metadata with
metadata=cli(...)on the same field. Typical keys arehelp=,default=, and when neededarg_type=int/float,choices=frozenset(...),required=True,action=oraction=argparse.BooleanOptionalAction, or a customflag="--my-name". The field name becomes argparsedestunless you override the flag string only (the dest stays the field name). - Register flags by calling
add_settings_cli_arguments(parser, YourSettingsClass)(or the thin wrappersadd_shard_args/add_common_args). Do not add a second manualadd_argumentfor the same option. - Build settings at the pipeline entry point from the parsed
Namespace(seeshard()insharding_pipeline.py:CommonPipelineSettings.from_namespace(args)plus shard fields keyed by field name). Pipeline logic should takeShardPipelineSettings(or the relevant settings type), not rawargs, for the refactored paths. - attrs field order: optional fields with defaults cannot appear before required fields. Keep pipeline-specific constraints in mind (see existing comments on
ShardPipelineSettings).
NVCF and Slurm
- NVCF builds
argparse.Namespace(**args)from JSON. Keys must use Python dest names (input_clip_path, not--input-clip-path). Values are not re-parsed through argparse’stype=/choices=, but attrs validators still run when settings are constructed. Put every constraint you need for cloud invokes onvalidator=(and keepchoices=incli()aligned for CLI users). - Slurm usually runs the same
python -m ... run_pipeline ...CLI as local development, so argparse and attrs both apply.
Tests
tests/cosmos_curator/pipelines/video/test_shard_pipeline_settings.pychecks parserdestsets against settings fields and exercises attrs validation after parse.
Other large pipelines may still use legacy patterns until migrated; follow the existing module when changing them.
Best Practices
-
Virtual Environment:
- Always work in a virtual environment
- Avoid using
$HOME/.localfor Python packages
-
Dependencies:
- Maintain the
pyproject.tomlfile - Document any new dependencies
- Use
pixi shell -e devfor local development
Note: You may execute
./devset.shto install local git hooks and run a package build smoke test. - Maintain the
-
Code Quality:
- Write clean, well-documented code
- Use ruff for formatting and linting
- Ensure type hints are properly used and checked with mypy
- Write meaningful commit messages
-
Testing:
- Write tests for new features
- Run existing tests before submitting changes
- Ensure all tests pass
-
Documentation:
- Update documentation when adding new features
- Keep the README files up to date
- Document any API changes
Environment Variables
Cosmos Curator uses several cosmos_curator_* environment variables for
internal communication between the driver process and Ray workers.
Most are set automatically by the framework; developers rarely need to
set them manually.
| Variable | Set by | Purpose |
|---|---|---|
COSMOS_CURATOR_ARTIFACTS_STAGING_DIR | ArtifactDelivery.create() | Shared base directory for artifact staging on each node. Workers inherit this from the driver so all processes agree on the same temp path. Subdirectories (profiling/, traces/) isolate different artifact kinds. |
COSMOS_CURATOR_TRACE_DIR | enable_tracing() | Per-process directory for OTel span files. Defaults to <staging>/traces/ when COSMOS_CURATOR_ARTIFACTS_STAGING_DIR is set, otherwise /tmp/cosmos_curator_traces. |
COSMOS_CURATOR_TRACEPARENT | propagate_trace_context() | W3C traceparent header propagated from the driver's root span so all stage spans join a single distributed trace. |
COSMOS_CURATOR_LOCAL_WORKSPACE_PREFIX | User (optional) | Override the home-directory prefix for the local workspace (~/cosmos_curator_local_workspace). See End User Guide. |
COSMOS_CURATOR_DOCKER_BUILD_ULIMIT | User (optional) | Custom nofile ulimit for Docker image builds (default 65536). |
For details on the artifact delivery mechanism and why env vars are used, see the Artifact Transport Guide.
Contributing
- Create a new branch for your feature/fix
- Make your changes
- Run all code quality checks
- Submit a pull request with a clear description of changes
Troubleshooting
If you encounter issues during development:
-
Environment Issues:
- Ensure you're using the correct Python version
- Verify all dependencies are installed
- Check virtual environment activation
-
Build Issues:
- Clear any cached files
- Rebuild the environment if necessary
- Check for conflicting dependencies
-
Pipeline Stage Debugging:
- Use Stage Replay to debug specific stages without re-running entire pipelines
- Save task inputs from problematic stages and replay them in isolation
- Iterate rapidly on stage logic by replaying saved tasks
Responsible Use of AI Models
Support
For development-related questions or issues:
- Create an issue in the repository
- Contact the development team
- Check existing documentation and issues