infer-problem
January 6, 2026
Run inference on a single problem with explicit configuration.
Quick Start
slop-code infer-problem file_backup \
-a configs/agents/claude_code-2.0.51.yaml \
-e configs/environments/docker-python3.12-uv.yaml \
-o outputs/test_run \
-prompt configs/prompts/just-solve.jinja \
-m anthropic/sonnet-4.5
Usage
slop-code infer-problem [OPTIONS] PROBLEM_NAME
Arguments
| Argument | Required | Description |
|---|---|---|
| PROBLEM_NAME | Yes | Name of the problem (must exist in the problem directory) |
Options
| Option | Type | Default | Description |
|---|---|---|---|
| -a, --agent | path | required | Path to the agent configuration YAML |
| -e, --environment-config-path | path | required | Path to the environment configuration |
| -o, --output-path | path | required | Path to the output directory |
| -prompt, --prompt-template | path | required | Path to the prompt template |
| -m, --model | string | required | Model in the format {provider}/{model} |
| --provider-api-key-env | string | - | Name of the environment variable to read the API key from, overriding the provider default |
| --thinking | string | - | Thinking budget preset: none, low, medium, high |
| --max-thinking-tokens | int | - | Maximum number of thinking tokens |
| -pass, --pass-policy | enum | ANY | Policy that determines whether a checkpoint passes (any, any-case, all-cases, all-non-error-cases, core-cases, any-core-cases, all-core-cases) |
| --evaluate/--no-evaluate | flag | true | Whether to run evaluation after inference |
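
The `{provider}/{model}` format expected by `-m` can be checked in a wrapper script before the command is launched. The sketch below is only an illustration: `split_model` is a hypothetical helper, not part of slop-code, and it assumes the value is split at the first slash, which the source does not state.

```shell
# split_model is a hypothetical helper, not part of slop-code.
# Assumption: the value is split at the FIRST slash into provider and model.
split_model() {
  case "$1" in
    */*) ;;
    *) echo "expected {provider}/{model}, got: $1" >&2; return 1 ;;
  esac
  provider=${1%%/*}   # text before the first slash
  model=${1#*/}       # text after the first slash
  if [ -z "$provider" ] || [ -z "$model" ]; then
    echo "provider and model must both be non-empty: $1" >&2
    return 1
  fi
  printf '%s %s\n' "$provider" "$model"
}

split_model anthropic/sonnet-4.5   # prints: anthropic sonnet-4.5
```

A value without a slash, or with an empty side such as `/sonnet-4.5`, makes the helper return a non-zero status.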
Behavior
This command provides low-level control over running a single problem. It:
- Loads agent, environment, and prompt configurations
- Resolves API credentials for the specified provider
- Builds a Docker image if the agent requires one
- Runs the agent through all checkpoints
- Optionally evaluates results after inference
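
Because the first step loads three configuration files, a wrapper script may want to confirm that they exist before starting a run. The following is a minimal sketch under that assumption; `check_configs` is a hypothetical helper, not part of slop-code, and it checks readability only, not file contents.

```shell
# check_configs is a hypothetical pre-flight helper, not part of slop-code.
# It reports every path that is missing or unreadable and returns non-zero
# if at least one path failed the check.
check_configs() {
  status=0
  for path in "$@"; do
    if [ ! -r "$path" ]; then
      echo "missing or unreadable: $path" >&2
      status=1
    fi
  done
  return "$status"
}

# Possible usage (paths taken from the Quick Start example):
#   check_configs configs/agents/claude_code-2.0.51.yaml \
#     configs/environments/docker-python3.12-uv.yaml \
#     configs/prompts/just-solve.jinja && slop-code infer-problem ...
```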
Thinking Options
--thinking and --max-thinking-tokens are mutually exclusive:
# Use preset
--thinking medium
# Use explicit token budget
--max-thinking-tokens 10000
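
A script that wraps this command could enforce the same rule before invoking it. The sketch below assumes a hypothetical helper, `thinking_args`, that is not part of slop-code; it takes a preset and a token budget (either may be empty) and prints the single flag to pass through.

```shell
# thinking_args is a hypothetical wrapper helper, not part of slop-code.
# $1 = preset (none, low, medium, high) or empty
# $2 = explicit token budget or empty
thinking_args() {
  if [ -n "$1" ] && [ -n "$2" ]; then
    echo "use either --thinking or --max-thinking-tokens, not both" >&2
    return 1
  fi
  if [ -n "$1" ]; then
    case "$1" in
      none|low|medium|high) printf '%s\n' "--thinking $1" ;;
      *) echo "unknown preset: $1" >&2; return 1 ;;
    esac
  elif [ -n "$2" ]; then
    case "$2" in
      *[!0-9]*) echo "token budget must be an integer: $2" >&2; return 1 ;;
      *) printf '%s\n' "--max-thinking-tokens $2" ;;
    esac
  fi
}

thinking_args medium ""   # prints: --thinking medium
thinking_args "" 10000    # prints: --max-thinking-tokens 10000
```

When both arguments are empty the helper prints nothing, so no thinking flag is passed.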
Examples
Basic run:
slop-code infer-problem file_backup \
-a configs/agents/claude_code-2.0.51.yaml \
-e configs/environments/docker-python3.12-uv.yaml \
-o outputs/test \
-prompt configs/prompts/just-solve.jinja \
-m anthropic/sonnet-4.5
With extended thinking:
slop-code infer-problem file_backup \
-a configs/agents/claude_code-2.0.51.yaml \
-e configs/environments/docker-python3.12-uv.yaml \
-o outputs/test_thinking \
-prompt configs/prompts/just-solve.jinja \
-m anthropic/sonnet-4.5 \
--thinking high
Inference only (no evaluation):
slop-code infer-problem file_backup \
-a configs/agents/claude_code-2.0.51.yaml \
-e configs/environments/docker-python3.12-uv.yaml \
-o outputs/infer_only \
-prompt configs/prompts/just-solve.jinja \
-m anthropic/sonnet-4.5 \
--no-evaluate
Custom API key:
slop-code infer-problem file_backup \
-a configs/agents/claude_code-2.0.51.yaml \
-e configs/environments/docker-python3.12-uv.yaml \
-o outputs/test \
-prompt configs/prompts/just-solve.jinja \
-m anthropic/sonnet-4.5 \
--provider-api-key-env MY_ANTHROPIC_KEY
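
Note that `--provider-api-key-env` takes the name of a variable, not the key itself. A quick check that the variable is actually exported can save a failed run; the sketch below uses a hypothetical helper, `require_env`, which is not part of slop-code.

```shell
# require_env is a hypothetical helper, not part of slop-code.
# It succeeds only if the named environment variable is exported and non-empty.
require_env() {
  if [ -z "$(printenv "$1")" ]; then
    echo "environment variable $1 is not set" >&2
    return 1
  fi
}

# Possible usage:
#   export MY_ANTHROPIC_KEY=...   # the key value is elided here
#   require_env MY_ANTHROPIC_KEY && slop-code infer-problem ... \
#     --provider-api-key-env MY_ANTHROPIC_KEY
```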
Output Structure
OUTPUT_PATH/
└── PROBLEM_NAME/
    ├── checkpoint_1/
    │   ├── snapshot/
    │   │   └── <agent code>
    │   └── evaluation.json  # if --evaluate
    ├── checkpoint_2/
    │   └── ...
    └── agent/
        └── <agent artifacts>
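
Given this layout, a short script can report which checkpoints have been evaluated. The sketch below relies only on the directory and file names shown in the tree; `list_checkpoints` is a hypothetical helper, not part of slop-code, and it does not inspect the contents of `evaluation.json`.

```shell
# list_checkpoints is a hypothetical helper, not part of slop-code.
# $1 = OUTPUT_PATH/PROBLEM_NAME
# Prints one line per checkpoint_* directory, in shell glob (lexical) order.
list_checkpoints() {
  for dir in "$1"/checkpoint_*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    name=$(basename "$dir")
    if [ -f "$dir/evaluation.json" ]; then
      echo "$name evaluated"
    else
      echo "$name not-evaluated"
    fi
  done
}

# Possible usage:
#   list_checkpoints outputs/test_run/file_backup
```

After a run with `--no-evaluate`, every checkpoint would be expected to show as `not-evaluated`.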