infer-problem

January 6, 2026

Run inference on a single problem with explicit configuration.

Quick Start

slop-code infer-problem file_backup \
  -a configs/agents/claude_code-2.0.51.yaml \
  -e configs/environments/docker-python3.12-uv.yaml \
  -o outputs/test_run \
  -prompt configs/prompts/just-solve.jinja \
  -m anthropic/sonnet-4.5

Usage

slop-code infer-problem [OPTIONS] PROBLEM_NAME

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| PROBLEM_NAME | Yes | Name of the problem (must exist in problem directory) |

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| -a, --agent | path | required | Path to agent configuration YAML |
| -e, --environment-config-path | path | required | Path to environment configuration |
| -o, --output-path | path | required | Path to output directory |
| -prompt, --prompt-template | path | required | Path to prompt template |
| -m, --model | string | required | Model in format {provider}/{model} |
| --provider-api-key-env | string | - | Override API key environment variable |
| --thinking | string | - | Thinking budget: none, low, medium, high |
| --max-thinking-tokens | int | - | Maximum thinking tokens |
| -pass, --pass-policy | enum | ANY | Policy to determine checkpoint pass (any, any-case, all-cases, all-non-error-cases, core-cases, any-core-cases, all-core-cases) |
| --evaluate/--no-evaluate | flag | true | Whether to run evaluation |
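
The -m value combines a provider and a model name in one string. As a rough illustration of that {provider}/{model} shape, the sketch below splits such a string in plain shell. The split_model helper is hypothetical and not part of slop-code, and splitting on the first slash is an assumption about how the string is meant to be read.

```shell
# Hypothetical helper: split a {provider}/{model} string on its first slash.
# Sets the variables `provider` and `model`; returns 1 if the shape is wrong.
split_model() {
  spec="$1"
  case "$spec" in
    ?*/?*) ;;  # at least one character on each side of a slash
    *)
      echo "expected {provider}/{model}, got: $spec" >&2
      return 1
      ;;
  esac
  provider="${spec%%/*}"  # everything before the first slash
  model="${spec#*/}"      # everything after the first slash
}

split_model "anthropic/sonnet-4.5" && echo "provider=$provider model=$model"
# prints: provider=anthropic model=sonnet-4.5
```

A check like this can catch a missing provider prefix before a long run starts.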

Behavior

This command provides low-level control over running a single problem. It:

  1. Loads agent, environment, and prompt configurations
  2. Resolves API credentials for the specified provider
  3. Builds a Docker image if the agent requires one
  4. Runs the agent through all checkpoints
  5. Evaluates the results after inference, unless --no-evaluate is set

Thinking Options

--thinking and --max-thinking-tokens are mutually exclusive:

# Use preset
--thinking medium

# Use explicit token budget
--max-thinking-tokens 10000
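
When the command is driven from a wrapper script, the mutual-exclusion rule can be checked before anything is launched. The sketch below is one way to do that; check_thinking_flags is a hypothetical helper, not something slop-code ships, and matching the --flag=value spelling is a defensive assumption rather than documented behavior.

```shell
# Hypothetical pre-flight check: reject argument lists that contain both
# --thinking and --max-thinking-tokens.
check_thinking_flags() {
  has_preset=0
  has_budget=0
  for arg in "$@"; do
    case "$arg" in
      --thinking|--thinking=*) has_preset=1 ;;
      --max-thinking-tokens|--max-thinking-tokens=*) has_budget=1 ;;
    esac
  done
  if [ "$has_preset" -eq 1 ] && [ "$has_budget" -eq 1 ]; then
    echo "--thinking and --max-thinking-tokens are mutually exclusive" >&2
    return 1
  fi
  return 0
}

# Example: validate the extra arguments before passing them on.
# check_thinking_flags "$@" && slop-code infer-problem "$@"
```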

Examples

Basic run:

slop-code infer-problem file_backup \
  -a configs/agents/claude_code-2.0.51.yaml \
  -e configs/environments/docker-python3.12-uv.yaml \
  -o outputs/test \
  -prompt configs/prompts/just-solve.jinja \
  -m anthropic/sonnet-4.5

With extended thinking:

slop-code infer-problem file_backup \
  -a configs/agents/claude_code-2.0.51.yaml \
  -e configs/environments/docker-python3.12-uv.yaml \
  -o outputs/test_thinking \
  -prompt configs/prompts/just-solve.jinja \
  -m anthropic/sonnet-4.5 \
  --thinking high

Inference only (no evaluation):

slop-code infer-problem file_backup \
  -a configs/agents/claude_code-2.0.51.yaml \
  -e configs/environments/docker-python3.12-uv.yaml \
  -o outputs/infer_only \
  -prompt configs/prompts/just-solve.jinja \
  -m anthropic/sonnet-4.5 \
  --no-evaluate

Custom API key:

slop-code infer-problem file_backup \
  -a configs/agents/claude_code-2.0.51.yaml \
  -e configs/environments/docker-python3.12-uv.yaml \
  -o outputs/test \
  -prompt configs/prompts/just-solve.jinja \
  -m anthropic/sonnet-4.5 \
  --provider-api-key-env MY_ANTHROPIC_KEY
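
Since --provider-api-key-env takes the name of an environment variable, it can help to confirm that the variable is exported and non-empty before starting a run. The sketch below assumes the command reads the key from the named variable at startup; require_env is a hypothetical helper.

```shell
# Hypothetical guard: succeed only if the named variable is exported and non-empty.
# printenv sees exported variables only, which matches what a child process inherits.
require_env() {
  if [ -z "$(printenv "$1")" ]; then
    echo "environment variable $1 is unset or empty" >&2
    return 1
  fi
}

# Example: stop early instead of failing partway through the run.
# require_env MY_ANTHROPIC_KEY || exit 1
```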

Output Structure

OUTPUT_PATH/
└── PROBLEM_NAME/
    ├── checkpoint_1/
    │   ├── snapshot/
    │   │   └── <agent code>
    │   └── evaluation.json  # if --evaluate
    ├── checkpoint_2/
    │   └── ...
    └── agent/
        └── <agent artifacts>
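
Given the layout above, a short script can report which checkpoints were evaluated. This is a minimal sketch that assumes checkpoint directories are named checkpoint_N and that evaluation.json sits directly inside each one, as the tree shows; summarize_run is a hypothetical helper.

```shell
# Hypothetical helper: for each checkpoint directory under OUTPUT_PATH/PROBLEM_NAME,
# print its name and whether evaluation.json is present.
summarize_run() {
  run_dir="$1"
  for ckpt in "$run_dir"/checkpoint_*/; do
    [ -d "$ckpt" ] || continue  # the glob matched nothing
    name=$(basename "$ckpt")
    if [ -f "$ckpt/evaluation.json" ]; then
      echo "$name evaluated"
    else
      echo "$name not-evaluated"
    fi
  done
}

# Example: summarize_run outputs/test_run/file_backup
```

The glob expands in lexicographic order, so checkpoint_10 is listed before checkpoint_2.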

See Also

  • run - Run agents with unified config system (recommended)
  • eval - Evaluate inference results