Supervised agentic workloads on Restate
June 15, 2026
A small demo of the supervisor pattern for an agentic tech-risk remediation workload: a fast executor agent drafts ticket + PR fixes for security findings, a heavy judge agent audits each plan before anything executes, and a policy gate suspends execution until an engineer signs off.
It answers, concretely, the questions about running agentic workloads in an orchestrated runtime:
| Question | Where in this demo |
|---|---|
| Workload identity | Automatic. ctx.request().id in src/remediation.ts — see below |
| Policy gates | The approval durable promise + approve handler in src/remediation.ts |
| Eval hooks | The judge + export eval steps in src/remediation.ts |
| Audit | The journal itself, the status handler, and the export audit record step |
| Guardrails | requiresHumanSignoff() and the judge's checks — it's just code |
| Rogue sub-agents / recalibration | The ESCALATION_LADDER loop in src/remediation.ts — see below |
The external world (LLMs, GitHub, Jira, Slack, LangSmith, SIEM) is stubbed in
src/stubs.ts; every stub is called through ctx.run(...), so swapping in
real calls changes nothing about the orchestration.
Workload identity (automatic)
Every execution gets an invocation ID (e.g.
inv_1bk06Ltr740Q5Torsxjik25zypll3mL12q). Through Restate's journal and
introspection API, that one ID resolves to:
- The journal: every step, decision, judge verdict, and approval, in order, with inputs/outputs (restate invocations describe <id>, or the UI on port 9070).
- The immutable deployment ID: deployments in Restate are immutable. A new code revision is a new deployment, and each invocation is pinned to the deployment it runs on. So the invocation ID tells you the exact code the agent ran.
- The prompt version: pinned per deployment as PROMPT_VERSION and journaled into the workload identity record.
- Evals/traces/audit logs sent elsewhere: every export step includes the identity, so external systems (LangSmith, your SIEM) join back to the exact code + journal + prompts. Restate also emits OpenTelemetry traces keyed by invocation ID.
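As a rough illustration of the identity record described above, the sketch below attaches an identity to a record before it is exported. The field names, the PROMPT_VERSION value, and the buildIdentity / withIdentity helpers are assumptions made for this example; the demo's actual record lives in src/remediation.ts and may be shaped differently.

```typescript
// Hypothetical shape of the workload identity record; field names are assumptions.
interface WorkloadIdentity {
  invocationId: string;  // in the demo this comes from ctx.request().id
  promptVersion: string; // pinned per deployment
}

// Assumed value for illustration only; the demo pins its own PROMPT_VERSION.
const PROMPT_VERSION = "2026-06-15.1";

function buildIdentity(invocationId: string): WorkloadIdentity {
  return { invocationId, promptVersion: PROMPT_VERSION };
}

// Attach the identity to any record shipped to an external system, so the
// receiving side can join back to the journal via the invocation ID.
function withIdentity<T extends object>(
  record: T,
  identity: WorkloadIdentity
): T & { identity: WorkloadIdentity } {
  return { ...record, identity };
}

const identity = buildIdentity("inv_1bk06Ltr740Q5Torsxjik25zypll3mL12q");
const evalRecord = withIdentity({ findingId: "cve-003", verdict: "accepted" }, identity);
console.log(JSON.stringify(evalRecord));
```

The deployment ID is deliberately absent from the record: per the list above, it is resolved from the invocation ID through Restate's introspection API rather than carried along by the handler.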
Supervisor recalibration: rogue sub-agents are shut down, not retried
You have fine-grained control over error handling and retries. Two different failure modes get two different mechanisms:
- Transient infra failure (LLM API flake, network blip): Restate retries the step with the same intent. That's plumbing, invisible to the agent logic.
- Semantic failure (a sub-agent goes rogue: bad plan, runaway execution): this is a supervisor decision, expressed as plain control flow in the workflow. The supervisor deactivates that sub-agent and deduces an alternate path — it never blindly re-runs the rogue one.
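The split between the two failure modes can be sketched as a toy model in plain TypeScript. This is not the Restate SDK: Restate performs the transient retries itself, and StepResult and runStep are hypothetical names used only to show that a semantic failure is returned to the caller instead of being retried.

```typescript
// Toy model, not the Restate SDK: a step either succeeds, fails transiently,
// or fails semantically. Only the transient case is retried.
type StepResult<T> =
  | { kind: "ok"; value: T }
  | { kind: "transient"; reason: string } // infra flake: retry with the same intent
  | { kind: "semantic"; reason: string }; // rogue output: hand to the supervisor

function runStep<T>(step: () => StepResult<T>, maxAttempts: number): StepResult<T> {
  let last: StepResult<T> = { kind: "transient", reason: "not attempted" };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = step();
    if (last.kind !== "transient") return last; // ok or semantic: never retried here
  }
  return last;
}

// A flaky step that fails twice, then succeeds.
let calls = 0;
const flaky = (): StepResult<string> =>
  ++calls < 3 ? { kind: "transient", reason: "LLM API 503" } : { kind: "ok", value: "plan-v1" };

// A rogue step: its failure is semantic, so it must run exactly once.
let rogueCalls = 0;
const rogue = (): StepResult<string> => {
  rogueCalls++;
  return { kind: "semantic", reason: "Plan contains destructive operations." };
};

const flakyResult = runStep(flaky, 5);
const rogueResult = runStep(rogue, 5);
console.log(flakyResult.kind, calls);      // ok 3
console.log(rogueResult.kind, rogueCalls); // semantic 1
```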
How the demo implements it (ESCALATION_LADDER loop in src/remediation.ts):
- Delegation = its own invocation. Each sub-agent call (ctx.serviceClient(subAgents).fastExecutor(...)) is a separate invocation with its own invocation ID and journal, so the sub-agent is observable and governable independently of the supervisor.
- Runaway behavior → shutdown. The call is bounded with .orTimeout(...); on timeout the supervisor calls ctx.cancel(subAgentInvocationId). The sub-agent is shut down (its compensation logic can run), and the supervisor moves to the next rung.
- Rogue output → recalibration. The judge audits every candidate plan. On rejection, the supervisor activates the next agent in the ladder (fast executor → conservative executor) instead of retrying the rogue one.
- Crash-safe decisions. Every decision (which agent was activated, shut down, rejected, and why) is journaled and mirrored to state (attempts). If the supervisor itself crashes or is redeployed mid-recalibration, replay restores its decisions: a sub-agent that was shut down stays shut down, and resume never re-executes it.
- Exhaustion → escalation. If no agent produces an acceptable plan, the workflow terminates with the full attempt trail, for manual handling.
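The control flow of the ladder can be sketched without any Restate APIs. In this toy model the sub-agents and the judge are hypothetical synchronous stubs, and a timeout is represented as a result value rather than a real .orTimeout(...) / ctx.cancel(...) pair; the demo's actual loop in src/remediation.ts is the reference.

```typescript
// Toy model of the supervisor loop; agents and judge are hypothetical stubs.
type AgentName = "fastExecutor" | "conservativeExecutor";
type AgentResult = { kind: "plan"; steps: string[] } | { kind: "timedOut" };
interface Attempt { agent: AgentName; outcome: string }

const ESCALATION_LADDER: AgentName[] = ["fastExecutor", "conservativeExecutor"];

// Stub sub-agents: the fast one proposes a destructive plan for database repos.
function callAgent(agent: AgentName, repo: string): AgentResult {
  if (agent === "fastExecutor" && repo.indexOf("db") !== -1) {
    return { kind: "plan", steps: ["DROP TABLE sessions", "recreate schema"] };
  }
  return { kind: "plan", steps: ["parameterize query", "add regression test"] };
}

// Stub judge: rejects any plan that contains a destructive SQL keyword.
function judgePlan(steps: string[]): { ok: boolean; reason: string } {
  const destructive = steps.some((s) => /\b(DROP|TRUNCATE|DELETE)\b/i.test(s));
  return destructive
    ? { ok: false, reason: "Plan contains destructive operations." }
    : { ok: true, reason: "accepted" };
}

function supervise(repo: string): { plan: string[] | null; attempts: Attempt[] } {
  const attempts: Attempt[] = [];
  for (const agent of ESCALATION_LADDER) {
    const result = callAgent(agent, repo);
    if (result.kind === "timedOut") {
      // In the demo, this is the point where the sub-agent invocation is cancelled.
      attempts.push({ agent, outcome: "shut down: timed out" });
      continue; // move to the next rung, never re-run this agent
    }
    const verdict = judgePlan(result.steps);
    if (verdict.ok) {
      attempts.push({ agent, outcome: "accepted" });
      return { plan: result.steps, attempts };
    }
    attempts.push({ agent, outcome: `rejected: ${verdict.reason}` });
  }
  return { plan: null, attempts }; // exhaustion: escalate with the full trail
}

const outcome = supervise("prod-db");
console.log(JSON.stringify(outcome.attempts));
```

Each rung is visited at most once per run, which is what keeps a rejected or shut-down agent from being re-executed within the same workflow.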
The recalibration trail is part of the audit surface:
curl localhost:8080/RemediationWorkflow/cve-003/status
# { "attempts": [
# { "agent": "fastExecutor", "subAgentInvocationId": "inv_...", "outcome": "rejected: Plan contains destructive operations." },
# { "agent": "conservativeExecutor", "subAgentInvocationId": "inv_...", "outcome": "accepted" } ], ... }
Try it: submit a finding whose repo touches a database (e.g. "repo": "prod-db")
— the fast executor goes rogue with a destructive plan, the judge catches it,
and the supervisor recalibrates to the conservative agent.
Policy gates
A durable step that either auto-clears or hands off to a human:
- requiresHumanSignoff(finding, verdict) is plain, reviewable code.
- If sign-off is needed, the workflow awaits ctx.promise("approval") and suspends: no process held hostage, survives restarts and redeploys, can wait days.
- The approve shared handler resolves the gate; the payload (approver, comment) lands in the durable journal, so you can always answer who signed off on which generated fix, and why.
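A minimal sketch of such a gate as pure functions is shown below. The sign-off rule (high or critical severity, or any risk flag from the judge) is an assumption made for this example and is not taken from the demo; the Finding, Verdict, and GateState types and the policyGate helper are likewise hypothetical. In the demo, the missing-approval case corresponds to the workflow suspending on ctx.promise("approval").

```typescript
type Severity = "low" | "medium" | "high" | "critical";
interface Finding { id: string; severity: Severity; repo: string }
interface Verdict { ok: boolean; riskFlags: string[] }
interface Approval { approved: boolean; approver: string; comment: string }

type GateState =
  | { status: "cleared"; by: string }
  | { status: "awaiting-approval" }
  | { status: "denied"; by: string; comment: string };

// Assumed rule for illustration: severe findings or flagged plans need a human.
function requiresHumanSignoff(finding: Finding, verdict: Verdict): boolean {
  const severe = finding.severity === "high" || finding.severity === "critical";
  return severe || verdict.riskFlags.length > 0;
}

// Either auto-clears, waits for a human, or applies the human's decision.
function policyGate(finding: Finding, verdict: Verdict, approval?: Approval): GateState {
  if (!requiresHumanSignoff(finding, verdict)) return { status: "cleared", by: "policy" };
  if (approval === undefined) return { status: "awaiting-approval" };
  return approval.approved
    ? { status: "cleared", by: approval.approver }
    : { status: "denied", by: approval.approver, comment: approval.comment };
}

const finding: Finding = { id: "cve-005", severity: "high", repo: "ci-infra" };
const verdict: Verdict = { ok: true, riskFlags: [] };

const pending = policyGate(finding, verdict);
const signedOff = policyGate(finding, verdict, {
  approved: true,
  approver: "you@corp.com",
  comment: "Verified, LGTM",
});
console.log(pending.status, signedOff.status);
```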
Eval hooks
The judge agent runs as a durable step. Its verdict is persisted three ways:
- Journal: the step result itself (free, automatic).
- Workflow state: ctx.set("verdict", ...), queryable via the status handler or the introspection API while the workflow runs.
- External eval system: the export eval step ships it to your analytics stack, coupled to the workload identity (code + journal + prompts).
A failing verdict throws TerminalError: the workload halts before any
side effect executes.
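The halt-before-side-effects behavior can be sketched as follows. The sinks are in-memory stand-ins rather than Restate APIs, a plain Error stands in for TerminalError, and the ordering (persist the verdict first, then halt) is an assumption of this sketch; recordVerdict and remediate are hypothetical names. The journal copy of the verdict is not modeled, since Restate records step results automatically.

```typescript
// Toy model of the eval hook; the sinks are in-memory stand-ins, not Restate APIs.
interface JudgeVerdict { ok: boolean; reason: string }
interface EvalSinks {
  state: Record<string, unknown>; // stands in for ctx.set("verdict", ...)
  exported: unknown[];            // stands in for the "export eval" step
}

function recordVerdict(verdict: JudgeVerdict, invocationId: string, sinks: EvalSinks): void {
  sinks.state["verdict"] = verdict;
  sinks.exported.push({ invocationId, verdict });
  if (!verdict.ok) {
    // The demo throws TerminalError here; a plain Error keeps this sketch dependency-free.
    throw new Error(`verdict rejected: ${verdict.reason}`);
  }
}

function remediate(verdict: JudgeVerdict, sinks: EvalSinks, sideEffects: string[]): void {
  recordVerdict(verdict, "inv_example", sinks);
  sideEffects.push("open ticket", "open PR"); // only reached for a passing verdict
}

const sinks: EvalSinks = { state: {}, exported: [] };
const sideEffects: string[] = [];
let halted = false;
try {
  remediate({ ok: false, reason: "Plan contains destructive operations." }, sinks, sideEffects);
} catch (err) {
  halted = true;
}
console.log(halted, sideEffects.length, sinks.exported.length); // true 0 1
```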
Run it
npm install
npx @restatedev/restate-server # terminal 1: Restate server
npm run app # terminal 2: the service (port 9080)
restate deployments register http://localhost:9080 # terminal 3
Human sign-off example
Use the terminal or the Claude-generated UI to submit a request:
Via terminal:
Submit a high-severity finding (pauses at the policy gate):
curl localhost:8080/RemediationWorkflow/cve-005/run --json '{
"id": "cve-005",
"source": "guardduty",
"severity": "high",
"description": "exposed credentials in CI logs",
"repo": "ci-infra"
}'
Audit the suspended workload (or open the UI at http://localhost:9070):
curl localhost:8080/RemediationWorkflow/cve-005/status
restate invocations list
Sign off (the approver metadata becomes part of the durable journal):
curl localhost:8080/RemediationWorkflow/cve-005/approve --json '{
"approved": true, "approver": "you@corp.com", "comment": "Verified, LGTM"
}'
The workflow resumes, opens the ticket + PR, and exports the audit record.
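For scripting the same three calls from TypeScript instead of curl, a small request builder along these lines may help. The URL pattern is the one used by the curl commands above; the workflowRequest helper and the IngressRequest type are hypothetical, and the headers only roughly mirror what curl --json sends.

```typescript
// Hypothetical helper: builds request descriptors for the demo's ingress endpoints.
const INGRESS_URL = "http://localhost:8080";

type Handler = "run" | "status" | "approve";
interface IngressRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

function workflowRequest(workflowKey: string, handler: Handler, payload?: object): IngressRequest {
  const url = `${INGRESS_URL}/RemediationWorkflow/${encodeURIComponent(workflowKey)}/${handler}`;
  if (payload === undefined) {
    // No payload: same as the plain curl call used for the status handler.
    return { url, method: "GET", headers: { Accept: "application/json" } };
  }
  return {
    url,
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "application/json" },
    body: JSON.stringify(payload),
  };
}

const submitReq = workflowRequest("cve-005", "run", {
  id: "cve-005",
  source: "guardduty",
  severity: "high",
  description: "exposed credentials in CI logs",
  repo: "ci-infra",
});
const statusReq = workflowRequest("cve-005", "status");
const approveReq = workflowRequest("cve-005", "approve", {
  approved: true,
  approver: "you@corp.com",
  comment: "Verified, LGTM",
});
console.log(submitReq.url, statusReq.url, approveReq.url);
```

Each descriptor can be passed to an HTTP client of your choice, for example fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).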
Or use the UI:
If you'd rather click than curl, there's a small web UI to submit findings, watch the supervisor recalibrate, and sign off at the policy gate. It's a simple Claude-generated demo UI — nothing Restate-specific, just a thin front-end over the same ingress/admin endpoints you'd hit by hand.
With the Restate server and the service running and the deployment registered as above, start the UI in another terminal (no install or build step: it's a dependency-free Node server):
node ui/server.mjs # serves http://localhost:4321
Then open http://localhost:4321. It proxies to Restate's ingress (:8080) and
admin (:9070); override with RESTATE_INGRESS / RESTATE_ADMIN env vars if
yours run elsewhere.
Rogue agent recalibration
Submit this finding:
curl localhost:8080/RemediationWorkflow/cve-006/run --json '{
"id": "cve-006",
"source": "snyk",
"severity": "low",
"description": "SQL injection in session store",
"repo": "prod-db"
}'
Tests
npm test # requires Docker; runs against a real Restate server
Tests run with alwaysReplay: true, forcing journal replay of every step on
every invocation — non-deterministic handler code fails the test instead of
failing in production.
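Why forced replay catches non-determinism can be seen in a toy journal model. This is not Restate's implementation: makeContext, replayCheck, and the numeric step results are simplifications invented for this sketch, which only shows that a handler whose steps differ between the first execution and the replay no longer matches its journal.

```typescript
// Toy journal, not Restate's implementation.
interface JournalEntry { name: string; result: number }

// In record mode, steps execute and are journaled. In replay mode, recorded
// results are returned and the step name must match the journal.
function makeContext(journal: JournalEntry[], replaying: boolean) {
  let cursor = 0;
  return {
    run(name: string, action: () => number): number {
      if (replaying) {
        const entry = journal[cursor++];
        if (entry === undefined || entry.name !== name) {
          throw new Error(`journal mismatch at step ${cursor}: got "${name}"`);
        }
        return entry.result;
      }
      const result = action();
      journal.push({ name, result });
      return result;
    },
  };
}
type Ctx = ReturnType<typeof makeContext>;

// Deterministic handler: the same steps in the same order on every execution.
function deterministicHandler(ctx: Ctx): number {
  const drafted = ctx.run("draft plan", () => 1);
  return ctx.run("judge plan", () => drafted + 1);
}

// Non-deterministic handler: the step taken depends on state outside the
// journal (a stand-in for branching on Date.now() or Math.random()).
let outsideCounter = 0;
function nonDeterministicHandler(ctx: Ctx): number {
  const branch = outsideCounter++ % 2 === 0 ? "fast path" : "slow path";
  return ctx.run(branch, () => 42);
}

// Execute once to record the journal, then replay against it.
function replayCheck(handler: (ctx: Ctx) => number): boolean {
  const journal: JournalEntry[] = [];
  handler(makeContext(journal, false));
  try {
    handler(makeContext(journal, true));
    return true;
  } catch (err) {
    return false;
  }
}

console.log(replayCheck(deterministicHandler));    // true
console.log(replayCheck(nonDeterministicHandler)); // false
```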
Further materials
- Restate AI examples — durable agents, multi-agent patterns, human-in-the-loop, with Vercel AI SDK / OpenAI Agents SDK / Pydantic AI integrations
- Managing invocations — cancel, kill, pause, resume; the primitives behind sub-agent governance
- Error handling guide — retries vs terminal errors vs sagas/compensations
- Sagas guide — undoing the work of a cancelled/rogue sub-agent
- Invocations & introspection — the journal/identity model used throughout this demo