CLI Commands Reference
April 24, 2026
This section documents all commands available in the slop-code CLI.
Quick Reference
| Command | Description |
|---|---|
| run | Run agents on benchmarks using the unified config system |
| sync | Install/update the managed problem catalog |
| eval | Evaluate a directory of agent inference results |
| eval-problem | Evaluate a single problem directory |
| eval-snapshot | Evaluate a single snapshot directory |
| infer-problem | Run inference on a single problem |
| metrics | Calculate metrics (static, judge, carry-forward, variance) |
| utils | Utility commands for maintenance and data processing |
| docker | Docker image building utilities |
| problems | Problem inspection and registry commands |
| tools | Interactive tools and case runners |
| viz | Visualization tools (diff viewer) |
Global Options
These options are available on all commands:
| Option | Type | Default | Description |
|---|---|---|---|
| -v, --verbose | flag | 0 | Increase verbosity (repeatable) |
| --seed | int | 42 | Random seed |
| --overwrite | flag | false | Overwrite existing output directory |
| --debug | flag | false | Enable debugging mode |
| --snapshot-dir-name | string | snapshot | Name of the snapshot directory |
Problem catalog location is controlled by the SCBENCH_HOME environment
variable. If unset, it defaults to ~/.cache/scbench.
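To check which catalog directory is in effect, a small shell helper can mirror the documented fallback. This is a sketch, not part of slop-code: the function name scbench_catalog_dir is hypothetical, and it assumes an empty SCBENCH_HOME is treated the same as an unset one, which the CLI itself may not do.

```shell
# Hypothetical helper (not part of slop-code): print the catalog directory
# implied by the documented default of ~/.cache/scbench.
# Assumption: an empty SCBENCH_HOME behaves like an unset one (":-" expansion).
scbench_catalog_dir() {
  printf '%s\n' "${SCBENCH_HOME:-$HOME/.cache/scbench}"
}

# Print the directory the current environment resolves to.
scbench_catalog_dir
```

Running ls "$(scbench_catalog_dir)" then shows what is currently installed there.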
Set SCBENCH_PROBLEMS_PATH to point at a flat local problems directory
(each direct child must contain config.yaml) to bypass the managed release
catalog for problem-loading commands.
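Before pointing SCBENCH_PROBLEMS_PATH at a directory, it can help to confirm the flat layout it expects. The sketch below is a hypothetical helper (check_problems_dir is not a slop-code command); it treats each direct subdirectory as a problem and reports any that lack config.yaml.

```shell
# Hypothetical helper (not part of slop-code): verify that every direct
# subdirectory of a problems directory contains config.yaml.
# Prints one line per offending subdirectory and returns 1 if any are found.
# Plain files at the top level are ignored here; whether slop-code
# tolerates them is not specified.
check_problems_dir() {
  dir=${1:?usage: check_problems_dir DIR}
  rc=0
  for child in "$dir"/*/; do
    [ -d "$child" ] || continue   # glob matched nothing: no subdirectories
    if [ ! -f "${child}config.yaml" ]; then
      printf 'missing config.yaml: %s\n' "${child%/}"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, check_problems_dir "$SCBENCH_PROBLEMS_PATH" && slop-code problems ls validates the directory before listing problems, assuming problems ls is one of the problem-loading commands that honor the override.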
Problem catalog behavior:
- The first command that needs problems bootstraps the latest release if no catalog is installed yet.
- Once a catalog is installed, commands do not auto-update it; run slop-code sync explicitly.
- Resuming a run requires the installed catalog commit to match the run's saved problem_catalog.json metadata.
Command Categories
Core Workflow
Running agents:
# Install/update the managed problem catalog
slop-code sync
# Run with config file
slop-code run --config my_run.yaml --problem file_backup
# Run with CLI flags
slop-code run --model anthropic/sonnet-4.5 --problem file_backup
Evaluating results:
# Evaluate all problems in a run
slop-code eval outputs/my_run
# Evaluate a single problem
slop-code eval-problem outputs/my_run/file_backup
# Evaluate a single snapshot
slop-code eval-snapshot outputs/my_run/file_backup/checkpoint_1/snapshot \
-o outputs/eval -p file_backup -c 1 -e configs/environments/docker-python3.12-uv.yaml
Metrics and Analysis
# Calculate static code quality metrics
slop-code metrics static outputs/my_run
# Run LLM judge evaluation
slop-code metrics judge outputs/my_run -r configs/rubrics/slop.jsonl -m anthropic/sonnet-4.5
# Compute variance across runs
slop-code metrics variance base outputs/runs -o outputs/variance
Utilities
# Backfill reports for existing runs
slop-code utils backfill-reports outputs/my_run
# Combine results from multiple runs
slop-code utils combine-results outputs/all_runs -o outputs/combined.jsonl
Docker Management
# Build base image
slop-code docker build-base configs/environments/docker-python3.12-uv.yaml
# Build agent image
slop-code docker build-agent configs/agents/claude_code-2.0.51.yaml configs/environments/docker-python3.12-uv.yaml
Problem Inspection
# List all problems
slop-code problems ls
# Check problem conversion status
slop-code problems status file_backup
Test Case Runner
# Run pytest tests for a snapshot
slop-code tools run-case -s outputs/snapshot -p file_backup -c 1 -e configs/environments/docker-python3.12-uv.yaml
Visualization
# Launch diff viewer for a run
slop-code viz diff outputs/my_run
Documentation Index
| Document | Description |
|---|---|
| run.md | Comprehensive guide to slop-code run and the configuration system |
| sync.md | Managing the external problem catalog |
| eval.md | Evaluating run directories |
| eval-problem.md | Evaluating single problems |
| eval-snapshot.md | Evaluating single snapshots |
| infer-problem.md | Running inference on single problems |
| metrics.md | All metrics subcommands |
| utils.md | All utility subcommands |
| docker.md | Docker image management |
| problems.md | Problem inspection commands |
| tools.md | Interactive tools |
| viz.md | Visualization tools |
See Also
- Agent Configuration - Agent setup and configuration
- Evaluation System - How evaluation works
- Problem Authoring - Creating and configuring problems