OSOP Agent Runtime Binding

April 1, 2026

Version: 1.0.0-draft · Status: Draft · Date: 2026-03-31


1. Purpose

This document defines how AI agents consume .osop files as executable standard operating procedures. While the core OSOP spec defines what a process is, this document defines how agents bind to it at execution time.

An OSOP file serves two audiences simultaneously:

| Audience | Reads | Sees |
| --- | --- | --- |
| Human | .osop via editor/docs | Graph, Story, Role views |
| AI Agent | .osop directly | Structured SOP to execute step-by-step |

This dual-mode design is the core value proposition of OSOP: one file, two consumers, zero translation.


2. Agent Execution Model

An agent executing an OSOP workflow follows a plan-act-verify loop, where the .osop file is the plan:

┌──────────────────────────────┐
│         .osop File           │
│  (the plan / the SOP)        │
└──────────────┬───────────────┘
               │
               ▼
┌──────────────────────────────┐
│     Agent Runtime Loop       │
│                              │
│  1. Read current node        │
│  2. Resolve inputs           │
│  3. Execute action           │
│  4. Capture outputs          │
│  5. Evaluate outgoing edges  │
│  6. Advance to next node(s)  │
│  7. Log to .osoplog          │
│  8. Repeat until terminal    │
│                              │
└──────────────────────────────┘

2.1 Node-to-Action Mapping

Each OSOP node type maps to a specific agent action:

| Node Type | Agent Action | Tool / Mechanism |
| --- | --- | --- |
| human | Pause and request human input | Present prompt, wait for response |
| agent | Invoke LLM / sub-agent | API call to model provider |
| api | HTTP/gRPC call | Use http_request tool or MCP tool |
| cli | Execute shell command | Use bash / exec tool |
| db | Execute database query | Use db_query tool or API |
| git | Execute git operation | Use git CLI or API |
| docker | Container operation | Use docker CLI |
| cicd | Trigger CI/CD pipeline | Use platform API (GitHub Actions, etc.) |
| infra | Provision infrastructure | Use terraform / kubectl / etc. |
| mcp | Invoke MCP tool | Direct MCP tool call |
| system | System operation | File I/O, process management |
| event | Wait for or emit event | Webhook listener, event bus |
| gateway | Evaluate routing logic | Evaluate when expressions |
| data | Transform data | Apply mapping / aggregation |
| company | Cross-org message | Send via protocol (API, EDI, email) |
| department | Internal routing | Forward to department queue |
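
One way to realize this mapping is a dispatch table keyed by node type. The sketch below is illustrative only: the `HANDLERS` registry, the `handles` decorator, and the two stub handlers are hypothetical names that are not part of the OSOP spec, and a real runtime would call its actual tools instead of returning stubs.

```python
from typing import Any, Callable, Dict

Node = Dict[str, Any]
Handler = Callable[[Node], Dict[str, Any]]

# Hypothetical registry mapping an OSOP node type to the function that executes it.
HANDLERS: Dict[str, Handler] = {}


def handles(node_type: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one node type."""
    def register(fn: Handler) -> Handler:
        HANDLERS[node_type] = fn
        return fn
    return register


@handles("cli")
def run_cli(node: Node) -> Dict[str, Any]:
    # A real runtime would invoke its bash / exec tool here; this stub only
    # reports the command it would run.
    return {"would_run": node["runtime"]["command"]}


@handles("human")
def ask_human(node: Node) -> Dict[str, Any]:
    # A real runtime would present the prompt and wait for a response.
    return {"prompt": node.get("purpose", "")}


def execute_node(node: Node) -> Dict[str, Any]:
    """Look up the handler for the node's type and run it."""
    handler = HANDLERS.get(node["type"])
    if handler is None:
        raise ValueError(f"no handler registered for node type {node['type']!r}")
    return handler(node)


result = execute_node({"id": "unit_tests", "type": "cli", "runtime": {"command": "npm run test"}})
print(result)  # {'would_run': 'npm run test'}
```

Raising on an unregistered type keeps an unsupported node from being silently skipped.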

2.2 Edge-to-Control-Flow Mapping

Each edge mode maps to an agent control flow pattern:

| Edge Mode | Agent Behavior |
| --- | --- |
| sequential | Execute target after source completes |
| conditional | Evaluate when expression; skip target if false |
| parallel | Fork: execute all parallel targets concurrently (Promise.all / asyncio.gather) |
| loop | While when is true, re-execute target; respect max_iterations |
| event | Block until external event matches when filter |
| fallback | On source failure, execute target instead of propagating error |
| error | On specific error matching when, execute target |
| timeout | On source timeout, execute target |
| compensation | On downstream failure, execute target to undo source's effects |
| message | Send structured message to external system (B2B) |
| dataflow | Pass data from source outputs to target inputs (no control dependency) |
| signal | Wait for external signal before proceeding |
| weighted | Randomly select target based on weight percentages |

3. Agent Reading Protocol

When an agent receives a .osop file, it MUST follow this reading order:

Step 1: Parse Metadata

# Agent extracts:
osop_version: "1.0"     # → Verify compatibility
id: "deploy-service"     # → Use as workflow identifier
name: "Deploy Service"   # → Use as task description
description: "..."       # → Use as context for decision-making

Step 2: Identify Entry Points

Entry points are nodes with no incoming edges, or nodes referenced by triggers:

# Pseudocode
entry_nodes = [n for n in nodes if no_edge_targets(n.id)]
if workflow.triggers:
    entry_nodes = match_trigger(workflow.triggers, current_context)
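
The "no incoming edges" rule can be written as a small runnable function. This is a sketch under the assumption that nodes and edges are plain dictionaries with `id`, `from`, and `to` keys, as in the YAML examples in this document; trigger matching is left out because its semantics are defined elsewhere.

```python
def find_entry_nodes(nodes, edges):
    """Return the ids of nodes that no edge points to."""
    targets = {edge["to"] for edge in edges}
    return [node["id"] for node in nodes if node["id"] not in targets]


# Hypothetical three-node workflow used only for illustration.
nodes = [{"id": "planner"}, {"id": "coder"}, {"id": "reviewer"}]
edges = [
    {"from": "planner", "to": "coder"},
    {"from": "coder", "to": "reviewer"},
]
entry = find_entry_nodes(nodes, edges)
print(entry)  # ['planner']
```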

Step 3: Build Execution Graph

The agent constructs a directed graph from the nodes and edges and performs a topological sort:

# Pseudocode
graph = build_graph(nodes, edges)
execution_order = topological_sort(graph)
parallel_groups = identify_parallel_forks(graph)
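
As a runnable illustration, Python's standard `graphlib` module (3.9+) can produce both the order and the groups of nodes that may run concurrently. A topological order exists only for acyclic graphs, so this sketch assumes that loop edges are set aside before sorting; that choice, the `execution_waves` name, and the sample edge layout are assumptions rather than requirements of the spec.

```python
from graphlib import TopologicalSorter


def execution_waves(node_ids, edges):
    """Group nodes into waves; every node in a wave has all predecessors done."""
    # Map each node to the set of nodes it depends on.
    deps = {node_id: set() for node_id in node_ids}
    for edge in edges:
        if edge.get("mode") != "loop":  # assumption: loop edges are excluded
            deps[edge["to"]].add(edge["from"])

    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if a cycle remains
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


node_ids = ["pr_opened", "ai_code_review", "run_ci", "human_review"]
edges = [
    {"from": "pr_opened", "to": "ai_code_review", "mode": "parallel"},
    {"from": "pr_opened", "to": "run_ci", "mode": "parallel"},
    {"from": "ai_code_review", "to": "human_review", "mode": "sequential"},
    {"from": "run_ci", "to": "human_review", "mode": "sequential"},
    {"from": "human_review", "to": "ai_code_review", "mode": "loop"},
]
waves = execution_waves(node_ids, edges)
print(waves)  # [['pr_opened'], ['ai_code_review', 'run_ci'], ['human_review']]
```

Each inner list is a candidate parallel group: its members have no unmet dependencies on one another.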

Step 4: Resolve Variables

Before executing each node, resolve all ${...} interpolations:

| Namespace | Source |
| --- | --- |
| ${inputs.*} | Workflow inputs or previous node outputs |
| ${outputs.<node_id>.*} | Specific node's output |
| ${secrets.*} | Secrets provider (Vault, env vars, etc.) |
| ${env.*} | Environment variables |
| ${metadata.*} | Runtime metadata (run_id, timestamp, etc.) |
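
A minimal interpolation sketch follows. It assumes that the resolution context is a nested dictionary keyed by namespace and that an undefined reference should fail loudly; the `resolve` function and the sample values are hypothetical.

```python
import re

# Matches ${namespace.path.to.value}; the allowed characters are an assumption.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][\w.-]*)\}")


def resolve(template, context):
    """Replace every ${...} reference with the value found in context."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # KeyError signals an undefined reference
        return str(value)
    return _PLACEHOLDER.sub(lookup, template)


context = {
    "env": {"PROJECT_ROOT": "/srv/app"},
    "outputs": {"pr_opened": {"pr_url": "https://github.com/org/repo/pull/42"}},
}
resolved = resolve("cd ${env.PROJECT_ROOT} && echo ${outputs.pr_opened.pr_url}", context)
print(resolved)  # cd /srv/app && echo https://github.com/org/repo/pull/42
```

A production resolver would also need to keep `${secrets.*}` values out of logs, which this sketch does not attempt.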

Step 5: Execute Node

For each node, the agent:

  1. Checks preconditions — all CEL expressions must evaluate to true
  2. Checks valid_window — current time must be within window
  3. Resolves inputs from upstream outputs or workflow inputs
  4. Executes the action based on type and runtime config
  5. Validates outputs against success_criteria
  6. Records execution to .osoplog

Step 6: Evaluate Outgoing Edges

After a node completes (or fails), the agent evaluates all outgoing edges:

# Pseudocode
for edge in outgoing_edges(current_node):
    if edge.mode == "sequential":
        queue(edge.to)
    elif edge.mode == "conditional":
        if evaluate_cel(edge.when, context):
            queue(edge.to)
    elif edge.mode == "parallel":
        fork(edge.to)
    elif edge.mode == "fallback" and current_node.status == "FAILED":
        queue(edge.to)
    elif edge.mode == "loop":
        if evaluate_cel(edge.when, context):
            queue(edge.to)  # re-execute target while `when` holds; enforce max_iterations
    # ... etc
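
The following runnable sketch covers four of the modes and adds the loop bound. It rests on several assumptions that the spec may define differently: `when` evaluation is delegated to an injected `evaluate` callable standing in for a CEL evaluator, `max_iterations` lives on the edge and defaults to 1, and sequential and parallel edges are followed only when the source completed.

```python
def next_targets(edges, status, evaluate, iterations):
    """Return the node ids to queue after a node finishes with `status`."""
    queued = []
    for edge in edges:
        mode, when = edge["mode"], edge.get("when")
        completed = status == "COMPLETED"
        if mode in ("sequential", "parallel") and completed:
            queued.append(edge["to"])
        elif mode == "conditional" and completed and evaluate(when):
            queued.append(edge["to"])
        elif mode == "fallback" and status == "FAILED":
            queued.append(edge["to"])
        elif mode == "loop" and completed and evaluate(when):
            key = (edge["from"], edge["to"])
            # Stop re-queuing once the edge has been taken max_iterations times.
            if iterations.get(key, 0) < edge.get("max_iterations", 1):
                iterations[key] = iterations.get(key, 0) + 1
                queued.append(edge["to"])
    return queued


# Hypothetical edges and a lookup table standing in for CEL evaluation.
edges = [
    {"from": "run_ci", "to": "deploy", "mode": "sequential"},
    {"from": "run_ci", "to": "notify_team", "mode": "fallback"},
    {"from": "run_ci", "to": "run_ci", "mode": "loop",
     "when": "flaky == true", "max_iterations": 2},
]
truth = {"flaky == true": True}
iterations = {}

first = next_targets(edges, "COMPLETED", truth.get, iterations)
second = next_targets(edges, "COMPLETED", truth.get, iterations)
third = next_targets(edges, "COMPLETED", truth.get, iterations)
failed = next_targets(edges, "FAILED", truth.get, iterations)
print(first, second, third, failed)
```

After two passes the loop edge stops firing even though its condition is still true, which is the behavior `max_iterations` is meant to guarantee.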

4. Node Field Semantics for Agents

4.1 purpose — The Agent's Instruction

The purpose field is the primary instruction an agent uses to understand what to do at each step. It SHOULD be written as a clear, actionable instruction.

# Good: Specific, actionable
purpose: "Query Jira API for all tickets in sprint 42 with status 'In Progress'"

# Bad: Vague
purpose: "Get ticket data"

4.2 explain — Context for Decision-Making

The explain block provides additional context that helps agents make better decisions:

explain:
  why: "We check for duplicates before assignment to avoid wasted effort"
  what: "Compare incoming bug title against open issues using fuzzy match"
  result: "Returns is_duplicate: bool and duplicate_of: string"

| Field | Agent Use |
| --- | --- |
| why | Helps agent understand intent when making judgment calls |
| what | Detailed description of the action to perform |
| result | Expected output format; helps agent structure its response |

4.3 runtime — Execution Configuration

The runtime block contains everything an agent needs to execute the node:

# For agent nodes: model, prompt, tools
runtime:
  model: "claude-sonnet-4-20250514"
  provider: "anthropic"
  system_prompt: "You are a code reviewer..."
  temperature: 0.2
  tools:
    - name: "read_file"
    - name: "search_code"

# For API nodes: endpoint, method, auth
runtime:
  method: POST
  url: "https://api.example.com/tickets"
  headers:
    Authorization: "Bearer ${secrets.JIRA_TOKEN}"

# For CLI nodes: command, env
runtime:
  command: "npm run test"
  working_dir: "${env.PROJECT_ROOT}"

4.4 inputs / outputs — Data Contract

Agents MUST respect input/output schemas for data passing between nodes:

inputs:
  - name: bug_description
    type: string
    required: true
  - name: severity_hint
    type: string
    required: false

outputs:
  - name: category
    schema: "critical | high | medium | low"
  - name: assigned_to
    schema: string

4.5 handoff — Agent-to-Agent Context Transfer

When one agent node hands off to another, the handoff block defines what context to pass:

handoff:
  summary_for_next_node: "I analyzed the bug report and found it's a memory leak in the cache layer. Stack trace points to cache.js:142."
  expected_output: "A pull request fixing the memory leak with tests"
  escalation: "human_review"  # If next agent fails, escalate to this node

4.6 success_criteria — Verification

After execution, agents evaluate success_criteria to determine if the node succeeded:

success_criteria:
  - "All tests pass with 0 failures"
  - "Code coverage >= 80%"
  - "No critical security vulnerabilities"

If any criterion is not met, the node status is FAILED and fallback/error edges are evaluated.

4.7 retry_policy — Automatic Retry

Agents MUST implement retry behavior when specified:

retry_policy:
  max_retries: 3
  strategy: "exponential_backoff"
  backoff_sec: 2

Each retry attempt is logged as a separate entry in the .osoplog with incrementing attempt numbers.
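
The sketch below shows one possible reading of this policy. It assumes that `exponential_backoff` doubles the delay after each failure starting from `backoff_sec` (2 s, 4 s, 8 s for the example above) and that `max_retries: 3` allows up to four attempts in total; neither detail is stated in this document. The `sleep` parameter is injectable so the example runs without waiting.

```python
import time


def backoff_delays(max_retries, backoff_sec):
    """Delay before each retry, doubling every time (assumed interpretation)."""
    return [backoff_sec * (2 ** i) for i in range(max_retries)]


def run_with_retry(action, policy, sleep=time.sleep):
    """Run action, retrying per policy; return one record per attempt."""
    delays = backoff_delays(policy["max_retries"], policy["backoff_sec"])
    records = []
    for attempt in range(1, policy["max_retries"] + 2):
        try:
            outputs = action()
        except Exception as exc:
            records.append({"attempt": attempt, "status": "FAILED",
                            "error": {"message": str(exc)}})
            if attempt <= policy["max_retries"]:
                sleep(delays[attempt - 1])
        else:
            records.append({"attempt": attempt, "status": "COMPLETED",
                            "outputs": outputs})
            break
    return records


calls = {"n": 0}


def flaky():
    # Hypothetical action that fails twice and then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("TEST_FAILURE")
    return {"test_report": "42/42 passed"}


waits = []
policy = {"max_retries": 3, "strategy": "exponential_backoff", "backoff_sec": 2}
records = run_with_retry(flaky, policy, sleep=waits.append)
print([r["status"] for r in records], waits)  # ['FAILED', 'FAILED', 'COMPLETED'] [2, 4]
```

Each element of `records` corresponds to one node record in the .osoplog, with the attempt number incrementing from 1.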


5. MCP Tool Binding

OSOP nodes map directly to Model Context Protocol tool calls. The osop-mcp server exposes these tools:

5.1 Core MCP Tools

| MCP Tool | Description | Maps To |
| --- | --- | --- |
| osop_validate | Validate a .osop file against the schema | Pre-execution check |
| osop_run | Execute a workflow (dry_run, simulated, live) | Full agent loop |
| osop_step | Execute a single node | One iteration of the loop |
| osop_render | Render workflow in a given format (story, graph, agent) | Human output |
| osop_status | Get current execution status | Monitoring |
| osop_log | Write a .osoplog entry | Recording |

5.2 Node-Level MCP Binding

When a node has type: mcp, it directly invokes an MCP tool:

- id: "search_codebase"
  type: mcp
  purpose: "Search the codebase for references to the deprecated API"
  runtime:
    server: "filesystem"
    tool: "search_files"
    arguments:
      pattern: "deprecated_api_v1"
      path: "${env.PROJECT_ROOT}"

5.3 Any Node via MCP

Non-MCP nodes can also be executed via MCP tools if the agent runtime supports it:

# This API node...
- id: "create_ticket"
  type: api
  runtime:
    method: POST
    url: "https://api.jira.com/rest/api/3/issue"

# ...can be executed by an agent using an HTTP MCP tool:
# mcp_call("http_request", { method: "POST", url: "...", body: "..." })

6. .osoplog — Execution Record Format

Every workflow execution produces a .osoplog file — an immutable, append-only record of what happened.

6.1 File Format

| Property | Value |
| --- | --- |
| Extension | .osoplog, .osoplog.yaml, .osoplog.json |
| Encoding | UTF-8 |
| Structure | Single YAML/JSON document |

6.2 Schema

osoplog_version: "1.0"
run_id: "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
workflow_id: "pr-review-pipeline"
workflow_version: "1.2.0"
workflow_hash: "sha256:abc123..."       # Hash of the .osop file at execution time

mode: "live"                            # live | dry_run | simulated
status: "COMPLETED"                     # PENDING | RUNNING | COMPLETED | FAILED | CANCELLED

trigger:
  type: "event"
  source: "github-webhook"
  actor: "developer@example.com"
  timestamp: "2026-03-31T10:00:00Z"

started_at: "2026-03-31T10:00:00Z"
ended_at: "2026-03-31T10:05:23Z"
duration_ms: 323000

# Agent runtime that executed the workflow
runtime:
  agent: "claude-code"                  # Which agent system ran this
  agent_version: "1.0.45"
  model: "claude-sonnet-4-20250514"            # Primary model used
  platform: "macos-14.2"
  session_id: "sess_abc123"

# Node-by-node execution trace
node_records:
  - node_id: "pr_opened"
    node_type: "event"
    attempt: 1
    status: "COMPLETED"
    started_at: "2026-03-31T10:00:00Z"
    ended_at: "2026-03-31T10:00:01Z"
    duration_ms: 1000
    outputs:
      pr_url: "https://github.com/org/repo/pull/42"
      diff_size: 234

  - node_id: "ai_code_review"
    node_type: "agent"
    attempt: 1
    status: "COMPLETED"
    started_at: "2026-03-31T10:00:01Z"
    ended_at: "2026-03-31T10:00:04Z"
    duration_ms: 3200
    inputs:
      diff_content: "[234 lines]"
    outputs:
      review_comments: 3
      risk_level: "medium"
    ai_metadata:
      model: "claude-sonnet-4-20250514"
      provider: "anthropic"
      prompt_tokens: 4500
      completion_tokens: 800
      cost_usd: 0.021
      confidence: 0.87
    tools_used:
      - { tool: "read_file", calls: 3 }
      - { tool: "search_code", calls: 1 }

  - node_id: "run_ci"
    node_type: "cicd"
    attempt: 1
    status: "FAILED"
    started_at: "2026-03-31T10:00:01Z"
    ended_at: "2026-03-31T10:02:30Z"
    duration_ms: 149000
    error:
      code: "TEST_FAILURE"
      message: "3 tests failed in auth.test.ts"
      details: "Expected 200, got 401 on line 42"

  - node_id: "run_ci"
    node_type: "cicd"
    attempt: 2                          # Retry
    status: "COMPLETED"
    started_at: "2026-03-31T10:02:35Z"
    ended_at: "2026-03-31T10:04:50Z"
    duration_ms: 135000
    outputs:
      test_report: "42/42 passed"

# Variables snapshot at end of execution
variables:
  pr_url: "https://github.com/org/repo/pull/42"
  ci_result: "pass"
  review_risk: "medium"

# Human-readable summary
result_summary: "PR #42 reviewed and merged. AI found 3 issues (1 fixed, 2 acknowledged). CI passed on retry."

# Cost tracking
cost:
  total_usd: 0.034
  breakdown:
    - { node_id: "ai_code_review", cost_usd: 0.021 }
    - { node_id: "human_review", cost_usd: 0.0 }
    - { node_id: "run_ci", cost_usd: 0.013 }

6.3 Required Fields

| Field | Type | Description |
| --- | --- | --- |
| osoplog_version | string | Must be "1.0" |
| run_id | string | UUID v4, unique per execution |
| workflow_id | string | Matches the .osop file's id |
| status | string | Final execution status |
| started_at | string | ISO 8601 timestamp |
| node_records | array | At least one node record |

6.4 Node Record Required Fields

| Field | Type | Description |
| --- | --- | --- |
| node_id | string | Matches a node id in the .osop file |
| node_type | string | The node's type |
| attempt | integer | Attempt number (1-indexed) |
| status | string | COMPLETED, FAILED, SKIPPED, TIMED_OUT |
| started_at | string | ISO 8601 timestamp |
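
A lightweight presence check for the required fields in sections 6.3 and 6.4 might look like the sketch below. It assumes the log has already been parsed into a dictionary and only checks that fields exist; it is not a substitute for validating against schema/osoplog.schema.json. The `missing_fields` name is hypothetical.

```python
LOG_REQUIRED = ("osoplog_version", "run_id", "workflow_id",
                "status", "started_at", "node_records")
NODE_REQUIRED = ("node_id", "node_type", "attempt", "status", "started_at")


def missing_fields(log):
    """Return a list of human-readable problems; empty means all fields exist."""
    problems = [f"missing top-level field: {name}"
                for name in LOG_REQUIRED if name not in log]
    records = log.get("node_records") or []
    if "node_records" in log and not records:
        problems.append("node_records must contain at least one record")
    for index, record in enumerate(records):
        problems += [f"node_records[{index}] missing field: {name}"
                     for name in NODE_REQUIRED if name not in record]
    return problems


# Deliberately incomplete log: no run_id, and the node record lacks attempt.
log = {
    "osoplog_version": "1.0",
    "workflow_id": "pr-review-pipeline",
    "status": "COMPLETED",
    "started_at": "2026-03-31T10:00:00Z",
    "node_records": [
        {"node_id": "pr_opened", "node_type": "event",
         "status": "COMPLETED", "started_at": "2026-03-31T10:00:00Z"},
    ],
}
problems = missing_fields(log)
print(problems)
```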

6.5 Optional Metadata Blocks

ai_metadata — Present when node involves LLM:

ai_metadata:
  model: "claude-sonnet-4-20250514"
  provider: "anthropic"
  prompt_tokens: 4500
  completion_tokens: 800
  cost_usd: 0.021
  confidence: 0.87              # Agent's self-assessed confidence (0-1)
  reasoning_trace: "..."        # Optional: chain-of-thought summary

human_metadata — Present when node involves human:

human_metadata:
  actor: "alice@example.com"
  decision: "approved"
  notes: "Looks good, but watch the auth changes"
  response_time_ms: 45000

tools_used — Tools invoked during node execution:

tools_used:
  - tool: "read_file"
    calls: 3
    avg_duration_ms: 12
  - tool: "bash"
    calls: 1
    avg_duration_ms: 2300

7. Iteration Protocol

After multiple executions, agents can analyze .osoplog files to improve the workflow.

7.1 Stats Aggregation

From N execution logs, compute:

stats:
  total_runs: 25
  success_rate: 0.84
  avg_duration_ms: 312000
  node_stats:
    ai_code_review:
      avg_duration_ms: 3200
      failure_rate: 0.04
      avg_cost_usd: 0.021
    run_ci:
      avg_duration_ms: 142000
      failure_rate: 0.16        # ← Hotspot
      common_errors:
        - "TEST_FAILURE: auth.test.ts"
        - "TIMEOUT: integration tests"
    human_review:
      avg_duration_ms: 86400000  # ← Bottleneck (24h average)
      failure_rate: 0.0
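
The aggregation can be sketched as follows. It assumes that each log is a parsed .osoplog dictionary and that per-node rates are computed per attempt (each retry counts as its own record), which is one plausible reading of `failure_rate`; the sample logs are made up for the example.

```python
from collections import defaultdict


def aggregate(logs):
    """Compute run-level and per-node statistics from parsed .osoplog dicts."""
    total = len(logs)
    completed = sum(1 for log in logs if log["status"] == "COMPLETED")
    per_node = defaultdict(lambda: {"attempts": 0, "failures": 0, "duration_ms": 0})
    for log in logs:
        for record in log["node_records"]:
            stats = per_node[record["node_id"]]
            stats["attempts"] += 1
            stats["failures"] += int(record["status"] == "FAILED")
            stats["duration_ms"] += record.get("duration_ms", 0)
    return {
        "total_runs": total,
        "success_rate": completed / total if total else 0.0,
        "node_stats": {
            node_id: {
                "avg_duration_ms": s["duration_ms"] / s["attempts"],
                "failure_rate": s["failures"] / s["attempts"],
            }
            for node_id, s in per_node.items()
        },
    }


logs = [
    {"status": "COMPLETED", "node_records": [
        {"node_id": "run_ci", "status": "FAILED", "duration_ms": 149000},
        {"node_id": "run_ci", "status": "COMPLETED", "duration_ms": 135000},
    ]},
    {"status": "FAILED", "node_records": [
        {"node_id": "run_ci", "status": "FAILED", "duration_ms": 100000},
        {"node_id": "run_ci", "status": "FAILED", "duration_ms": 100000},
    ]},
]
stats = aggregate(logs)
print(stats["success_rate"], stats["node_stats"]["run_ci"])
```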

7.2 Improvement Suggestions

Based on stats, an agent can propose improvements:

| Pattern | Suggestion |
| --- | --- |
| High failure rate on a node | Add retry_policy or fallback edge |
| Slow node blocking the pipeline | Add timeout or parallelize with other work |
| Human node with long wait times | Add auto-approval for low-risk cases |
| Repeated error pattern | Add pre-check node before the failing step |
| Redundant sequential steps | Merge or parallelize |
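
Three of these patterns can be detected mechanically from the stats in section 7.1. In the sketch below the thresholds (10% failure rate, one hour average duration) and the `human_nodes` parameter are arbitrary assumptions chosen for illustration; the spec does not define when a rate counts as "high" or a node as "slow".

```python
def suggest(node_stats, human_nodes=frozenset(),
            max_failure_rate=0.10, max_avg_duration_ms=3_600_000):
    """Return (node_id, suggestion) pairs for nodes that exceed a threshold."""
    suggestions = []
    for node_id, stats in node_stats.items():
        if stats.get("failure_rate", 0.0) > max_failure_rate:
            suggestions.append((node_id, "Add retry_policy or fallback edge"))
        if stats.get("avg_duration_ms", 0) > max_avg_duration_ms:
            if node_id in human_nodes:
                suggestions.append((node_id, "Add auto-approval for low-risk cases"))
            else:
                suggestions.append((node_id, "Add timeout or parallelize with other work"))
    return suggestions


# Values taken from the stats example in section 7.1.
node_stats = {
    "ai_code_review": {"avg_duration_ms": 3200, "failure_rate": 0.04},
    "run_ci": {"avg_duration_ms": 142000, "failure_rate": 0.16},
    "human_review": {"avg_duration_ms": 86400000, "failure_rate": 0.0},
}
suggestions = suggest(node_stats, human_nodes={"human_review"})
print(suggestions)
```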

7.3 Iteration Record

iteration:
  id: "iter-001"
  based_on_runs: ["run-1", "run-2", ..., "run-25"]
  analysis:
    hotspots:
      - node_id: "run_ci"
        issue: "16% failure rate, mostly auth test failures"
        suggestion: "Add retry_policy with max_retries: 2"
    bottlenecks:
      - node_id: "human_review"
        issue: "Average 24h wait time"
        suggestion: "Add auto-approve path for PRs with risk_level == 'low'"
  proposed_changes:
    - type: "add_retry"
      target: "run_ci"
      config:
        max_retries: 2
        strategy: "fixed"
        backoff_sec: 30
    - type: "add_conditional_bypass"
      target: "human_review"
      condition: "risk_level == 'low' && ci_result == 'pass'"
      bypass_to: "auto_merge"
  status: "proposed"            # proposed | approved | applied | rejected

8. Multi-Agent Orchestration

When a workflow contains multiple agent nodes, the runtime must handle agent-to-agent coordination.

8.1 Context Passing

Agent nodes pass context through outputs → inputs via edges:

nodes:
  - id: planner
    type: agent
    runtime:
      model: openclaw-v1
    outputs:
      - name: architecture_doc

  - id: coder
    type: agent
    runtime:
      model: claude-sonnet-4-20250514
    inputs:
      - name: architecture_doc     # ← Receives planner's output

edges:
  - from: planner
    to: coder
    mode: sequential

8.2 Handoff Protocol

When agents hand off to each other, the handoff block standardizes what gets passed:

Agent A completes
    │
    ├─ outputs → stored as node outputs
    ├─ handoff.summary_for_next_node → prepended to Agent B's context
    └─ handoff.expected_output → becomes Agent B's success criterion
    │
    ▼
Agent B receives
    │
    ├─ inputs ← mapped from Agent A's outputs
    ├─ context ← handoff summary
    └─ goal ← expected output
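
The mapping in the diagram can be expressed as a small function. This sketch assumes that outputs are matched to the receiving node's inputs by name; `build_handoff_context` and the sample values are hypothetical.

```python
def build_handoff_context(source_outputs, handoff, target_input_names):
    """Assemble what Agent B receives from Agent A's outputs and handoff block."""
    return {
        # inputs: only the outputs that the next node declares as inputs
        "inputs": {name: source_outputs[name]
                   for name in target_input_names if name in source_outputs},
        # context: the summary that is prepended to Agent B's context
        "context": handoff.get("summary_for_next_node", ""),
        # goal: becomes Agent B's success criterion
        "goal": handoff.get("expected_output", ""),
    }


planner_outputs = {"architecture_doc": "docs/architecture.md", "scratch_notes": "n/a"}
handoff = {
    "summary_for_next_node": "Cache layer needs an eviction policy.",
    "expected_output": "A pull request implementing the design with tests",
}
received = build_handoff_context(planner_outputs, handoff, ["architecture_doc"])
print(received["inputs"])  # {'architecture_doc': 'docs/architecture.md'}
```

Outputs that the receiving node does not declare, such as `scratch_notes` here, are not forwarded.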

8.3 Sub-Agent Isolation

When a node uses subtype: multi-agent, the runtime spawns isolated sub-agents:

- id: research_team
  type: agent
  subtype: multi-agent
  runtime:
    agents:
      - role: "web_researcher"
        model: "claude-sonnet-4-20250514"
        task: "Search for competitor pricing data"
      - role: "data_analyst"
        model: "claude-sonnet-4-20250514"
        task: "Analyze the collected data for trends"
    coordination: "sequential"    # sequential | parallel | consensus
    isolation: true               # Each sub-agent gets clean context

9. Security Considerations

9.1 Permission Model

Before executing any node, the agent runtime MUST check:

  1. Node permissions — Does the agent have the required security.permissions?
  2. Secret access — Are referenced secrets available and authorized?
  3. Classification — Does the agent's clearance level meet the node's classification?
  4. Approval gate — If approval_gate.required, pause and wait for human approval

9.2 Dangerous Action Classification

For cli and bash-equivalent nodes, the agent MUST classify the command:

| Risk Level | Examples | Required Action |
| --- | --- | --- |
| Safe | git status, ls, cat | Execute immediately |
| Moderate | npm install, docker build | Log and execute |
| Dangerous | rm -rf, DROP TABLE, git push --force | Require human approval |
| Blocked | `curl \| bash`, untrusted URLs | Refuse to execute |
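
A pattern-based classifier is one simple way to approximate this table. The regular expressions below are hypothetical and cover only the listed examples, checking for untrusted URLs is not modeled, and treating unknown commands as dangerous is an assumption (fail closed) rather than something this document specifies.

```python
import re

# Ordered from most to least restrictive so the strictest match wins.
RULES = [
    ("blocked", [r"\bcurl\b.*\|\s*(ba)?sh\b", r"\bwget\b.*\|\s*(ba)?sh\b"]),
    ("dangerous", [r"\brm\s+-[a-z]*r[a-z]*f", r"\bDROP\s+TABLE\b",
                   r"\bgit\s+push\b.*--force"]),
    ("moderate", [r"^npm\s+install\b", r"^docker\s+build\b"]),
    ("safe", [r"^git\s+status\b", r"^ls\b", r"^cat\b"]),
]


def classify(command):
    """Return the risk level of a shell command string."""
    for level, patterns in RULES:
        if any(re.search(p, command, re.IGNORECASE) for p in patterns):
            return level
    return "dangerous"  # assumption: unrecognized commands need approval


for cmd in ["git status", "npm install", "rm -rf build/",
            "curl https://example.com/install.sh | bash", "cat notes.txt; rm -rf /"]:
    print(cmd, "->", classify(cmd))
```

Because the rules are ordered strictest first, a command that chains a safe prefix with a destructive suffix is still classified by its most dangerous part.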

9.3 Data Classification Enforcement

Nodes with classification fields restrict data flow:

- id: process_pii
  type: agent
  classification: "confidential"
  # Agent MUST NOT:
  # - Log PII to .osoplog outputs
  # - Pass PII to nodes with lower classification
  # - Send PII to external services without encryption
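
Two of these rules can be sketched in code. The ordering of classification levels below is hypothetical: only "confidential" appears in this document, and the other level names, the `can_flow` and `redact_for_log` helpers, and the redaction placeholder are assumptions.

```python
# Hypothetical ordering from least to most sensitive.
LEVELS = ["public", "internal", "confidential", "restricted"]


def can_flow(source_level, target_level):
    """Data may only flow to a node with an equal or higher classification."""
    return LEVELS.index(target_level) >= LEVELS.index(source_level)


def redact_for_log(outputs, classification, threshold="confidential"):
    """Replace output values with a placeholder when the node is sensitive."""
    if LEVELS.index(classification) >= LEVELS.index(threshold):
        return {name: "[REDACTED]" for name in outputs}
    return dict(outputs)


print(can_flow("confidential", "internal"))      # False
print(can_flow("internal", "confidential"))      # True
print(redact_for_log({"email": "alice@example.com"}, "confidential"))
```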

10. Conformance Requirements

Level 0 — Descriptive (Read-Only)

Agent can read and understand the workflow. No execution.

  • Parse .osop file
  • Explain workflow in natural language
  • Identify entry points and terminal nodes
  • List all node types and edge modes

Level 1 — Validatable

Agent can validate the workflow structure.

  • All Level 0 requirements
  • Validate input/output schemas
  • Check edge connectivity (no orphan nodes)
  • Verify CEL expressions are syntactically valid
  • Validate against JSON Schema

Level 2 — Executable

Agent can execute the workflow.

  • All Level 1 requirements
  • Execute nodes based on type and runtime
  • Evaluate when conditions
  • Handle retry_policy and timeout
  • Produce .osoplog records
  • Implement all 13 edge modes

Level 3 — Observable

Agent produces full telemetry.

  • All Level 2 requirements
  • Emit OpenTelemetry traces
  • Track cost per node (ai_metadata.cost_usd)
  • Compute and report WorkflowStats
  • Generate iteration improvement suggestions

11. Reference Implementation

The reference implementation is the osop-mcp server, which exposes OSOP operations as MCP tools that any compatible agent can use.

{
  "name": "osop",
  "version": "1.0.0",
  "tools": [
    {
      "name": "osop_validate",
      "description": "Validate a .osop workflow file",
      "inputSchema": {
        "type": "object",
        "properties": {
          "workflow": { "type": "string", "description": "YAML content or file path" }
        },
        "required": ["workflow"]
      }
    },
    {
      "name": "osop_run",
      "description": "Execute a workflow",
      "inputSchema": {
        "type": "object",
        "properties": {
          "workflow": { "type": "string" },
          "mode": { "type": "string", "enum": ["live", "dry_run", "simulated"] },
          "inputs": { "type": "object" }
        },
        "required": ["workflow"]
      }
    },
    {
      "name": "osop_step",
      "description": "Execute a single node in a workflow",
      "inputSchema": {
        "type": "object",
        "properties": {
          "workflow": { "type": "string" },
          "node_id": { "type": "string" },
          "inputs": { "type": "object" }
        },
        "required": ["workflow", "node_id"]
      }
    }
  ]
}
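
For illustration, a client could build a call to one of these tools as a JSON-RPC 2.0 `tools/call` request, which is the message shape MCP uses for tool invocation. The sketch below only constructs the request and checks the `required` arguments from the manifest above; transport, the `tool_call` helper, and the `deploy-service.osop` path are assumptions.

```python
import json
from itertools import count

# Required arguments per tool, copied from the inputSchema entries above.
REQUIRED = {
    "osop_validate": ["workflow"],
    "osop_run": ["workflow"],
    "osop_step": ["workflow", "node_id"],
}
_request_ids = count(1)


def tool_call(name, arguments):
    """Build a JSON-RPC request that invokes an osop-mcp tool."""
    missing = [key for key in REQUIRED[name] if key not in arguments]
    if missing:
        raise ValueError(f"{name} is missing required arguments: {missing}")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call("osop_step", {"workflow": "deploy-service.osop", "node_id": "run_ci"})
print(json.dumps(request, indent=2))
```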

This document is part of the OSOP specification. See SPEC.md for the core protocol. For the .osoplog JSON Schema, see schema/osoplog.schema.json.