OSOP Agent Runtime Binding
April 1, 2026 · View on GitHub
Version: 1.0.0-draft Status: Draft Date: 2026-03-31
1. Purpose
This document defines how AI agents consume .osop files as executable standard operating procedures. While the core OSOP spec defines what a process is, this document defines how agents bind to it at execution time.
An OSOP file serves two audiences simultaneously:
| Audience | Reads | Sees |
|---|---|---|
| Human | .osop via editor/docs | Graph, Story, Role views |
| AI Agent | .osop directly | Structured SOP to execute step-by-step |
This dual-mode design is the core value proposition of OSOP: one file, two consumers, zero translation.
2. Agent Execution Model
An agent executing an OSOP workflow follows a plan-act-verify loop, where the .osop file is the plan:
┌──────────────────────────────┐
│ .osop File │
│ (the plan / the SOP) │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ Agent Runtime Loop │
│ │
│ 1. Read current node │
│ 2. Resolve inputs │
│ 3. Execute action │
│ 4. Capture outputs │
│ 5. Evaluate outgoing edges │
│ 6. Advance to next node(s) │
│ 7. Log to .osoplog │
│ 8. Repeat until terminal │
│ │
└──────────────────────────────┘
2.1 Node-to-Action Mapping
Each OSOP node type maps to a specific agent action:
| Node Type | Agent Action | Tool / Mechanism |
|---|---|---|
human | Pause and request human input | Present prompt, wait for response |
agent | Invoke LLM / sub-agent | API call to model provider |
api | HTTP/gRPC call | Use http_request tool or MCP tool |
cli | Execute shell command | Use bash / exec tool |
db | Execute database query | Use db_query tool or API |
git | Execute git operation | Use git CLI or API |
docker | Container operation | Use docker CLI |
cicd | Trigger CI/CD pipeline | Use platform API (GitHub Actions, etc.) |
infra | Provision infrastructure | Use terraform / kubectl / etc. |
mcp | Invoke MCP tool | Direct MCP tool call |
system | System operation | File I/O, process management |
event | Wait for or emit event | Webhook listener, event bus |
gateway | Evaluate routing logic | Evaluate when expressions |
data | Transform data | Apply mapping / aggregation |
company | Cross-org message | Send via protocol (API, EDI, email) |
department | Internal routing | Forward to department queue |
2.2 Edge-to-Control-Flow Mapping
Each edge mode maps to an agent control flow pattern:
| Edge Mode | Agent Behavior |
|---|---|
sequential | Execute target after source completes |
conditional | Evaluate when expression; skip target if false |
parallel | Fork: execute all parallel targets concurrently (Promise.all / asyncio.gather) |
loop | While when is true, re-execute target; respect max_iterations |
event | Block until external event matches when filter |
fallback | On source failure, execute target instead of propagating error |
error | On specific error matching when, execute target |
timeout | On source timeout, execute target |
compensation | On downstream failure, execute target to undo source's effects |
message | Send structured message to external system (B2B) |
dataflow | Pass data from source outputs to target inputs (no control dependency) |
signal | Wait for external signal before proceeding |
weighted | Randomly select target based on weight percentages |
3. Agent Reading Protocol
When an agent receives a .osop file, it MUST follow this reading order:
Step 1: Parse Metadata
# Agent extracts:
osop_version: "1.0" # → Verify compatibility
id: "deploy-service" # → Use as workflow identifier
name: "Deploy Service" # → Use as task description
description: "..." # → Use as context for decision-making
Step 2: Identify Entry Points
Entry points are nodes with no incoming edges, or nodes referenced by triggers:
# Pseudocode
entry_nodes = [n for n in nodes if no_edge_targets(n.id)]
if workflow.triggers:
entry_nodes = match_trigger(workflow.triggers, current_context)
Step 3: Build Execution Graph
The agent constructs a directed graph and performs topological sort:
# Pseudocode
graph = build_graph(nodes, edges)
execution_order = topological_sort(graph)
parallel_groups = identify_parallel_forks(graph)
Step 4: Resolve Variables
Before executing each node, resolve all ${...} interpolations:
| Namespace | Source |
|---|---|
${inputs.*} | Workflow inputs or previous node outputs |
${outputs.<node_id>.*} | Specific node's output |
${secrets.*} | Secrets provider (Vault, env vars, etc.) |
${env.*} | Environment variables |
${metadata.*} | Runtime metadata (run_id, timestamp, etc.) |
Step 5: Execute Node
For each node, the agent:
- Checks
preconditions— all CEL expressions must evaluate totrue - Checks
valid_window— current time must be within window - Resolves
inputsfrom upstream outputs or workflow inputs - Executes the action based on
typeandruntimeconfig - Validates outputs against
success_criteria - Records execution to
.osoplog
Step 6: Evaluate Outgoing Edges
After a node completes (or fails), the agent evaluates all outgoing edges:
# Pseudocode
for edge in outgoing_edges(current_node):
if edge.mode == "sequential":
queue(edge.to)
elif edge.mode == "conditional":
if evaluate_cel(edge.when, context):
queue(edge.to)
elif edge.mode == "parallel":
fork(edge.to)
elif edge.mode == "fallback" and current_node.status == "FAILED":
queue(edge.to)
elif edge.mode == "loop":
if evaluate_cel(edge.when, context):
queue(edge.to) # re-queue source
# ... etc
4. Node Field Semantics for Agents
4.1 purpose — The Agent's Instruction
The purpose field is the primary instruction an agent uses to understand what to do at each step. It SHOULD be written as a clear, actionable instruction.
# Good: Specific, actionable
purpose: "Query Jira API for all tickets in sprint 42 with status 'In Progress'"
# Bad: Vague
purpose: "Get ticket data"
4.2 explain — Context for Decision-Making
The explain block provides additional context that helps agents make better decisions:
explain:
why: "We check for duplicates before assignment to avoid wasted effort"
what: "Compare incoming bug title against open issues using fuzzy match"
result: "Returns is_duplicate: bool and duplicate_of: string"
| Field | Agent Use |
|---|---|
why | Helps agent understand intent when making judgment calls |
what | Detailed description of the action to perform |
result | Expected output format — helps agent structure its response |
4.3 runtime — Execution Configuration
The runtime block contains everything an agent needs to execute the node:
# For agent nodes: model, prompt, tools
runtime:
model: "claude-sonnet-4-20250514"
provider: "anthropic"
system_prompt: "You are a code reviewer..."
temperature: 0.2
tools:
- name: "read_file"
- name: "search_code"
# For API nodes: endpoint, method, auth
runtime:
method: POST
url: "https://api.example.com/tickets"
headers:
Authorization: "Bearer ${secrets.JIRA_TOKEN}"
# For CLI nodes: command, env
runtime:
command: "npm run test"
working_dir: "${env.PROJECT_ROOT}"
4.4 inputs / outputs — Data Contract
Agents MUST respect input/output schemas for data passing between nodes:
inputs:
- name: bug_description
type: string
required: true
- name: severity_hint
type: string
required: false
outputs:
- name: category
schema: "critical | high | medium | low"
- name: assigned_to
schema: string
4.5 handoff — Agent-to-Agent Context Transfer
When one agent node hands off to another, the handoff block defines what context to pass:
handoff:
summary_for_next_node: "I analyzed the bug report and found it's a memory leak in the cache layer. Stack trace points to cache.js:142."
expected_output: "A pull request fixing the memory leak with tests"
escalation: "human_review" # If next agent fails, escalate to this node
4.6 success_criteria — Verification
After execution, agents evaluate success_criteria to determine if the node succeeded:
success_criteria:
- "All tests pass with 0 failures"
- "Code coverage >= 80%"
- "No critical security vulnerabilities"
If any criterion is not met, the node status is FAILED and fallback/error edges are evaluated.
4.7 retry_policy — Automatic Retry
Agents MUST implement retry behavior when specified:
retry_policy:
max_retries: 3
strategy: "exponential_backoff"
backoff_sec: 2
Each retry attempt is logged as a separate entry in the .osoplog with incrementing attempt numbers.
5. MCP Tool Binding
OSOP nodes map directly to Model Context Protocol tool calls. The osop-mcp server exposes these tools:
5.1 Core MCP Tools
| MCP Tool | Description | Maps To |
|---|---|---|
osop_validate | Validate a .osop file against the schema | Pre-execution check |
osop_run | Execute a workflow (dry_run, simulated, live) | Full agent loop |
osop_step | Execute a single node | One iteration of the loop |
osop_render | Render workflow in a given format (story, graph, agent) | Human output |
osop_status | Get current execution status | Monitoring |
osop_log | Write a .osoplog entry | Recording |
5.2 Node-Level MCP Binding
When a node has type: mcp, it directly invokes an MCP tool:
- id: "search_codebase"
type: mcp
purpose: "Search the codebase for references to the deprecated API"
runtime:
server: "filesystem"
tool: "search_files"
arguments:
pattern: "deprecated_api_v1"
path: "${env.PROJECT_ROOT}"
5.3 Any Node via MCP
Non-MCP nodes can also be executed via MCP tools if the agent runtime supports it:
# This API node...
- id: "create_ticket"
type: api
runtime:
method: POST
url: "https://api.jira.com/rest/api/3/issue"
# ...can be executed by an agent using an HTTP MCP tool:
# mcp_call("http_request", { method: "POST", url: "...", body: "..." })
6. .osoplog — Execution Record Format
Every workflow execution produces a .osoplog file — an immutable, append-only record of what happened.
6.1 File Format
| Property | Value |
|---|---|
| Extension | .osoplog, .osoplog.yaml, .osoplog.json |
| Encoding | UTF-8 |
| Structure | Single YAML/JSON document |
6.2 Schema
osoplog_version: "1.0"
run_id: "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
workflow_id: "pr-review-pipeline"
workflow_version: "1.2.0"
workflow_hash: "sha256:abc123..." # Hash of the .osop file at execution time
mode: "live" # live | dry_run | simulated
status: "COMPLETED" # PENDING | RUNNING | COMPLETED | FAILED | CANCELLED
trigger:
type: "event"
source: "github-webhook"
actor: "developer@example.com"
timestamp: "2026-03-31T10:00:00Z"
started_at: "2026-03-31T10:00:00Z"
ended_at: "2026-03-31T10:05:23Z"
duration_ms: 323000
# Agent runtime that executed the workflow
runtime:
agent: "claude-code" # Which agent system ran this
agent_version: "1.0.45"
model: "claude-sonnet-4-20250514" # Primary model used
platform: "macos-14.2"
session_id: "sess_abc123"
# Node-by-node execution trace
node_records:
- node_id: "pr_opened"
node_type: "event"
attempt: 1
status: "COMPLETED"
started_at: "2026-03-31T10:00:00Z"
ended_at: "2026-03-31T10:00:01Z"
duration_ms: 1000
outputs:
pr_url: "https://github.com/org/repo/pull/42"
diff_size: 234
- node_id: "ai_code_review"
node_type: "agent"
attempt: 1
status: "COMPLETED"
started_at: "2026-03-31T10:00:01Z"
ended_at: "2026-03-31T10:00:04Z"
duration_ms: 3200
inputs:
diff_content: "[234 lines]"
outputs:
review_comments: 3
risk_level: "medium"
ai_metadata:
model: "claude-sonnet-4-20250514"
provider: "anthropic"
prompt_tokens: 4500
completion_tokens: 800
cost_usd: 0.021
confidence: 0.87
tools_used:
- { tool: "read_file", calls: 3 }
- { tool: "search_code", calls: 1 }
- node_id: "run_ci"
node_type: "cicd"
attempt: 1
status: "FAILED"
started_at: "2026-03-31T10:00:01Z"
ended_at: "2026-03-31T10:02:30Z"
duration_ms: 149000
error:
code: "TEST_FAILURE"
message: "3 tests failed in auth.test.ts"
details: "Expected 200, got 401 on line 42"
- node_id: "run_ci"
node_type: "cicd"
attempt: 2 # Retry
status: "COMPLETED"
started_at: "2026-03-31T10:02:35Z"
ended_at: "2026-03-31T10:04:50Z"
duration_ms: 135000
outputs:
test_report: "42/42 passed"
# Variables snapshot at end of execution
variables:
pr_url: "https://github.com/org/repo/pull/42"
ci_result: "pass"
review_risk: "medium"
# Human-readable summary
result_summary: "PR #42 reviewed and merged. AI found 3 issues (1 fixed, 2 acknowledged). CI passed on retry."
# Cost tracking
cost:
total_usd: 0.034
breakdown:
- { node_id: "ai_code_review", cost_usd: 0.021 }
- { node_id: "human_review", cost_usd: 0.0 }
- { node_id: "run_ci", cost_usd: 0.013 }
6.3 Required Fields
| Field | Type | Description |
|---|---|---|
osoplog_version | string | Must be "1.0" |
run_id | string | UUID v4, unique per execution |
workflow_id | string | Matches the .osop file's id |
status | string | Final execution status |
started_at | string | ISO 8601 timestamp |
node_records | array | At least one node record |
6.4 Node Record Required Fields
| Field | Type | Description |
|---|---|---|
node_id | string | Matches a node id in the .osop file |
node_type | string | The node's type |
attempt | integer | Attempt number (1-indexed) |
status | string | COMPLETED, FAILED, SKIPPED, TIMED_OUT |
started_at | string | ISO 8601 timestamp |
6.5 Optional Metadata Blocks
ai_metadata — Present when node involves LLM:
ai_metadata:
model: "claude-sonnet-4-20250514"
provider: "anthropic"
prompt_tokens: 4500
completion_tokens: 800
cost_usd: 0.021
confidence: 0.87 # Agent's self-assessed confidence (0-1)
reasoning_trace: "..." # Optional: chain-of-thought summary
human_metadata — Present when node involves human:
human_metadata:
actor: "alice@example.com"
decision: "approved"
notes: "Looks good, but watch the auth changes"
response_time_ms: 45000
tools_used — Tools invoked during node execution:
tools_used:
- tool: "read_file"
calls: 3
avg_duration_ms: 12
- tool: "bash"
calls: 1
avg_duration_ms: 2300
7. Iteration Protocol
After multiple executions, agents can analyze .osoplog files to improve the workflow.
7.1 Stats Aggregation
From N execution logs, compute:
stats:
total_runs: 25
success_rate: 0.84
avg_duration_ms: 312000
node_stats:
ai_code_review:
avg_duration_ms: 3200
failure_rate: 0.04
avg_cost_usd: 0.021
run_ci:
avg_duration_ms: 142000
failure_rate: 0.16 # ← Hotspot
common_errors:
- "TEST_FAILURE: auth.test.ts"
- "TIMEOUT: integration tests"
human_review:
avg_duration_ms: 86400000 # ← Bottleneck (24h average)
failure_rate: 0.0
7.2 Improvement Suggestions
Based on stats, an agent can propose improvements:
| Pattern | Suggestion |
|---|---|
| High failure rate on a node | Add retry_policy or fallback edge |
| Slow node blocking the pipeline | Add timeout or parallelize with other work |
| Human node with long wait times | Add auto-approval for low-risk cases |
| Repeated error pattern | Add pre-check node before the failing step |
| Redundant sequential steps | Merge or parallelize |
7.3 Iteration Record
iteration:
id: "iter-001"
based_on_runs: ["run-1", "run-2", ..., "run-25"]
analysis:
hotspots:
- node_id: "run_ci"
issue: "16% failure rate, mostly auth test failures"
suggestion: "Add retry_policy with max_retries: 2"
bottlenecks:
- node_id: "human_review"
issue: "Average 24h wait time"
suggestion: "Add auto-approve path for PRs with risk_level == 'low'"
proposed_changes:
- type: "add_retry"
target: "run_ci"
config:
max_retries: 2
strategy: "fixed"
backoff_sec: 30
- type: "add_conditional_bypass"
target: "human_review"
condition: "risk_level == 'low' AND ci_result == 'pass'"
bypass_to: "auto_merge"
status: "proposed" # proposed | approved | applied | rejected
8. Multi-Agent Orchestration
When a workflow contains multiple agent nodes, the runtime must handle agent-to-agent coordination.
8.1 Context Passing
Agent nodes pass context through outputs → inputs via edges:
nodes:
- id: planner
type: agent
runtime:
model: openclaw-v1
outputs:
- name: architecture_doc
- id: coder
type: agent
runtime:
model: claude-sonnet-4-20250514
inputs:
- name: architecture_doc # ← Receives planner's output
edges:
- from: planner
to: coder
mode: sequential
8.2 Handoff Protocol
When agents hand off to each other, the handoff block standardizes what gets passed:
Agent A completes
│
├─ outputs → stored as node outputs
├─ handoff.summary_for_next_node → prepended to Agent B's context
└─ handoff.expected_output → becomes Agent B's success criterion
│
▼
Agent B receives
│
├─ inputs ← mapped from Agent A's outputs
├─ context ← handoff summary
└─ goal ← expected output
8.3 Sub-Agent Isolation
When a node uses subtype: multi-agent, the runtime spawns isolated sub-agents:
- id: research_team
type: agent
subtype: multi-agent
runtime:
agents:
- role: "web_researcher"
model: "claude-sonnet-4-20250514"
task: "Search for competitor pricing data"
- role: "data_analyst"
model: "claude-sonnet-4-20250514"
task: "Analyze the collected data for trends"
coordination: "sequential" # sequential | parallel | consensus
isolation: true # Each sub-agent gets clean context
9. Security Considerations
9.1 Permission Model
Before executing any node, the agent runtime MUST check:
- Node permissions — Does the agent have the required
security.permissions? - Secret access — Are referenced
secretsavailable and authorized? - Classification — Does the agent's clearance level meet the node's
classification? - Approval gate — If
approval_gate.required, pause and wait for human approval
9.2 Dangerous Action Classification
For cli and bash-equivalent nodes, the agent MUST classify the command:
| Risk Level | Examples | Required Action |
|---|---|---|
| Safe | git status, ls, cat | Execute immediately |
| Moderate | npm install, docker build | Log and execute |
| Dangerous | rm -rf, DROP TABLE, git push --force | Require human approval |
| Blocked | `curl | bash`, untrusted URLs |
9.3 Data Classification Enforcement
Nodes with classification fields restrict data flow:
- id: process_pii
type: agent
classification: "confidential"
# Agent MUST NOT:
# - Log PII to .osoplog outputs
# - Pass PII to nodes with lower classification
# - Send PII to external services without encryption
10. Conformance Requirements
Level 0 — Descriptive (Read-Only)
Agent can read and understand the workflow. No execution.
- Parse
.osopfile - Explain workflow in natural language
- Identify entry points and terminal nodes
- List all node types and edge modes
Level 1 — Validatable
Agent can validate the workflow structure.
- All Level 0 requirements
- Validate input/output schemas
- Check edge connectivity (no orphan nodes)
- Verify CEL expressions are syntactically valid
- Validate against JSON Schema
Level 2 — Executable
Agent can execute the workflow.
- All Level 1 requirements
- Execute nodes based on
typeandruntime - Evaluate
whenconditions - Handle
retry_policyandtimeout - Produce
.osoplogrecords - Implement all 13 edge modes
Level 3 — Observable
Agent produces full telemetry.
- All Level 2 requirements
- Emit OpenTelemetry traces
- Track cost per node (
ai_metadata.cost_usd) - Compute and report
WorkflowStats - Generate iteration improvement suggestions
11. Reference Implementation
The reference implementation is the osop-mcp server, which exposes OSOP operations as MCP tools that any compatible agent can use.
{
"name": "osop",
"version": "1.0.0",
"tools": [
{
"name": "osop_validate",
"description": "Validate a .osop workflow file",
"inputSchema": {
"type": "object",
"properties": {
"workflow": { "type": "string", "description": "YAML content or file path" }
},
"required": ["workflow"]
}
},
{
"name": "osop_run",
"description": "Execute a workflow",
"inputSchema": {
"type": "object",
"properties": {
"workflow": { "type": "string" },
"mode": { "type": "string", "enum": ["live", "dry_run", "simulated"] },
"inputs": { "type": "object" }
},
"required": ["workflow"]
}
},
{
"name": "osop_step",
"description": "Execute a single node in a workflow",
"inputSchema": {
"type": "object",
"properties": {
"workflow": { "type": "string" },
"node_id": { "type": "string" },
"inputs": { "type": "object" }
},
"required": ["workflow", "node_id"]
}
}
]
}
This document is part of the OSOP specification. See SPEC.md for the core protocol.
For the .osoplog JSON Schema, see schema/osoplog.schema.json.