infer-problem

January 6, 2026

Run inference on a single problem with explicit configuration.

Quick Start

slop-code infer-problem file_backup \
  -a configs/agents/claude_code-2.0.51.yaml \
  -e configs/environments/docker-python3.12-uv.yaml \
  -o outputs/test_run \
  -prompt configs/prompts/just-solve.jinja \
  -m anthropic/sonnet-4.5

Usage

slop-code infer-problem [OPTIONS] PROBLEM_NAME

Arguments

| Argument | Required | Description |
|----------|----------|-------------|
| `PROBLEM_NAME` | Yes | Name of the problem (must exist in problem directory) |

Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `-a, --agent` | path | required | Path to agent configuration YAML |
| `-e, --environment-config-path` | path | required | Path to environment configuration |
| `-o, --output-path` | path | required | Path to output directory |
| `-prompt, --prompt-template` | path | required | Path to prompt template |
| `-m, --model` | string | required | Model in format `{provider}/{model}` |
| `--provider-api-key-env` | string | - | Override API key environment variable |
| `--thinking` | string | - | Thinking budget: none, low, medium, high |
| `--max-thinking-tokens` | int | - | Maximum thinking tokens |
| `-pass, --pass-policy` | enum | `ANY` | Policy to determine checkpoint pass (any, any-case, all-cases, all-non-error-cases, core-cases, any-core-cases, all-core-cases) |
| `--evaluate/--no-evaluate` | flag | true | Whether to run evaluation |
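
The `-m` value follows the `{provider}/{model}` format. As a minimal sketch, a wrapper script could validate and split such a value before invoking the command. `split_model` is a hypothetical helper, not part of slop-code, and splitting on the first slash is an assumption about how the value is interpreted:

```shell
# Hypothetical helper (not part of slop-code): split "{provider}/{model}".
# Assumption: the provider is everything before the first slash.
split_model() {
  spec="$1"
  provider="${spec%%/*}"   # text before the first "/"
  model="${spec#*/}"       # text after the first "/"
  case "$spec" in
    */*) ;;                # contains a slash, keep going
    *) echo "expected {provider}/{model}, got: $spec" >&2; return 1 ;;
  esac
  if [ -z "$provider" ] || [ -z "$model" ]; then
    echo "expected {provider}/{model}, got: $spec" >&2
    return 1
  fi
  printf '%s %s\n' "$provider" "$model"
}

split_model anthropic/sonnet-4.5   # prints "anthropic sonnet-4.5"
```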

Behavior

This command provides low-level control over running a single problem. It:

1. Loads the agent, environment, and prompt configurations
2. Resolves API credentials for the specified provider
3. Builds a Docker image if the agent requires one
4. Runs the agent through all checkpoints
5. Optionally evaluates the results after inference
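
Because the first step loads three configuration files, a script that calls this command may want to confirm the paths exist before starting a run. The following is a rough sketch of such a check. `check_configs` is a hypothetical helper, not part of slop-code, and the command may well report missing files on its own:

```shell
# Hypothetical pre-flight helper (not part of slop-code).
# Usage: check_configs AGENT_YAML ENVIRONMENT_YAML PROMPT_TEMPLATE
# Returns non-zero if any path is not a readable regular file.
check_configs() {
  status=0
  for path in "$@"; do
    if [ ! -f "$path" ] || [ ! -r "$path" ]; then
      echo "missing or unreadable config: $path" >&2
      status=1
    fi
  done
  return "$status"
}
```

A caller could run `check_configs` with the values it plans to pass to `-a`, `-e`, and `-prompt`, and skip the run if the check fails.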

Thinking Options

`--thinking` and `--max-thinking-tokens` are mutually exclusive:

# Use preset
--thinking medium

# Use explicit token budget
--max-thinking-tokens 10000
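
When the flags are assembled by a script, the conflict can be caught before the command is launched. This is only a sketch: `check_thinking_flags` is a hypothetical helper, not part of slop-code, and it also matches the `--flag=value` spelling on the assumption that a caller might use it:

```shell
# Hypothetical helper (not part of slop-code): fail when both
# --thinking and --max-thinking-tokens appear in the argument list.
check_thinking_flags() {
  has_preset=0
  has_budget=0
  for arg in "$@"; do
    case "$arg" in
      --thinking|--thinking=*) has_preset=1 ;;
      --max-thinking-tokens|--max-thinking-tokens=*) has_budget=1 ;;
    esac
  done
  if [ "$has_preset" -eq 1 ] && [ "$has_budget" -eq 1 ]; then
    echo "error: --thinking and --max-thinking-tokens are mutually exclusive" >&2
    return 1
  fi
  return 0
}

check_thinking_flags --thinking medium   # succeeds: only one flag is present
```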

Examples

Basic run:

slop-code infer-problem file_backup \
  -a configs/agents/claude_code-2.0.51.yaml \
  -e configs/environments/docker-python3.12-uv.yaml \
  -o outputs/test \
  -prompt configs/prompts/just-solve.jinja \
  -m anthropic/sonnet-4.5

With extended thinking:

slop-code infer-problem file_backup \
  -a configs/agents/claude_code-2.0.51.yaml \
  -e configs/environments/docker-python3.12-uv.yaml \
  -o outputs/test_thinking \
  -prompt configs/prompts/just-solve.jinja \
  -m anthropic/sonnet-4.5 \
  --thinking high

Inference only (no evaluation):

slop-code infer-problem file_backup \
  -a configs/agents/claude_code-2.0.51.yaml \
  -e configs/environments/docker-python3.12-uv.yaml \
  -o outputs/infer_only \
  -prompt configs/prompts/just-solve.jinja \
  -m anthropic/sonnet-4.5 \
  --no-evaluate

Custom API key:

slop-code infer-problem file_backup \
  -a configs/agents/claude_code-2.0.51.yaml \
  -e configs/environments/docker-python3.12-uv.yaml \
  -o outputs/test \
  -prompt configs/prompts/just-solve.jinja \
  -m anthropic/sonnet-4.5 \
  --provider-api-key-env MY_ANTHROPIC_KEY
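
`--provider-api-key-env` takes the name of an environment variable, so that variable has to be exported in the shell that launches the command. A small sketch of a check for this follows. `require_exported` is a hypothetical helper, not part of slop-code, and `MY_ANTHROPIC_KEY` is just the example name used above:

```shell
# Hypothetical helper (not part of slop-code): confirm that a variable
# is exported and non-empty. printenv only sees exported variables,
# which matches what a child process would inherit.
require_exported() {
  name="$1"
  if [ -z "$(printenv "$name")" ]; then
    echo "environment variable $name is not exported or is empty" >&2
    return 1
  fi
  return 0
}

# Example: run the check before calling slop-code.
# require_exported MY_ANTHROPIC_KEY || exit 1
```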

Output Structure

OUTPUT_PATH/
└── PROBLEM_NAME/
    ├── checkpoint_1/
    │   ├── snapshot/
    │   │   └── <agent code>
    │   └── evaluation.json  # if --evaluate
    ├── checkpoint_2/
    │   └── ...
    └── agent/
        └── <agent artifacts>
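
Given the layout above, a short script can report which checkpoints have an `evaluation.json`. This is a sketch that relies only on the directory names shown in the tree. `list_evaluations` is a hypothetical helper, not part of slop-code, and it makes no assumption about what the JSON file contains:

```shell
# Hypothetical helper (not part of slop-code).
# Usage: list_evaluations OUTPUT_PATH/PROBLEM_NAME
# Prints one line per checkpoint directory: "<name> evaluated" or "<name> missing".
list_evaluations() {
  run_dir="$1"
  for dir in "$run_dir"/checkpoint_*/; do
    [ -d "$dir" ] || continue          # glob matched nothing
    name=$(basename "$dir")
    if [ -f "${dir}evaluation.json" ]; then
      echo "$name evaluated"
    else
      echo "$name missing"
    fi
  done
}
```

For the Quick Start run, the call would be `list_evaluations outputs/test_run/file_backup`. The glob expands in lexical order, so `checkpoint_10` would be listed before `checkpoint_2`.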