@agentsonar/oma

May 3, 2026

Your AI agents are burning money right now. Detect agent loops and runaway token spend during your OMA workflows. Stop them before the bill arrives.

AgentSonar integration for Open Multi-Agent (OMA). Bridges OMA task dependencies and trace events to a local AgentSonar Python sidecar over HTTP, so cycles, repetition, and runaway throughput surface in real time as your workflow runs.

New here? The full AgentSonar guide (concepts, all four adapters, examples, FAQ) lives at agentsonar/agentsonar/docs. This README focuses specifically on the OMA TypeScript adapter.

What it detects

Three classes of multi-agent coordination failures, detected purely from the structure of your agent graph (no LLM is asked to evaluate anything):

cyclic_delegation: agent-to-agent delegation cycles that emerge across independent task chains or runs.
repetitive_delegation: the same delegation edge repeated past an exponential-decay threshold.
resource_exhaustion: per-edge throughput bursts beyond a sliding-window limit.
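Both threshold-based classes build on standard streaming techniques. The TypeScript sketch below is illustrative only. It is not AgentSonar's engine, the class names `DecayedEdgeCounter` and `SlidingWindowLimiter` are hypothetical, and the real scoring (including how `repetitive_delegation` applies its z-score test) may differ. It shows an exponentially decayed per-edge counter and a sliding-window per-edge limiter:

```typescript
// Illustrative only: generic versions of the two techniques, not AgentSonar's
// detection engine. Class names are hypothetical.

// Exponentially decayed counter: every event adds 1, and the running total
// halves every `halfLifeSec` seconds (compare the sidecar's --half-life flag).
class DecayedEdgeCounter {
  private value = 0
  private lastTs: number | null = null
  private readonly halfLifeSec: number

  constructor(halfLifeSec: number) {
    this.halfLifeSec = halfLifeSec
  }

  // Record one delegation at time `ts` (in seconds); returns the decayed count.
  add(ts: number): number {
    if (this.lastTs !== null) {
      this.value *= Math.pow(2, -(ts - this.lastTs) / this.halfLifeSec)
    }
    this.value += 1
    this.lastTs = ts
    return this.value
  }
}

// Sliding-window limiter: keeps per-edge timestamps from the last `windowSec`
// seconds and flags the edge once it exceeds `perEdgeLimit` events
// (compare --window-size and --per-edge-limit).
class SlidingWindowLimiter {
  private readonly events = new Map<string, number[]>()
  private readonly windowSec: number
  private readonly perEdgeLimit: number

  constructor(windowSec: number, perEdgeLimit: number) {
    this.windowSec = windowSec
    this.perEdgeLimit = perEdgeLimit
  }

  // Returns true when this event pushes the edge over its limit.
  record(from: string, to: string, ts: number): boolean {
    const key = `${from}->${to}`
    const recent = (this.events.get(key) ?? []).filter((t) => ts - t < this.windowSec)
    recent.push(ts)
    this.events.set(key, recent)
    return recent.length > this.perEdgeLimit
  }
}
```

With a 180 s half-life, two events 180 s apart give a decayed count of 1.5 rather than 2: the older event has lost half its weight, so slow, steady repetition scores lower than a tight burst.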

Output is a standalone HTML report at agentsonar_logs/run-<slug>/report.html.

How this complements OMA's runtime guards

OMA blocks same-chain A → B → A cycles at delegate-tool time (delegate.ts:60). @agentsonar/oma catches the cumulative-graph patterns those guards don't see: cycles and repetition that emerge across independent chains or across runs.
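A small TypeScript illustration of the difference, assuming nothing about either codebase: `chainRevisits` stands in for a per-chain guard, and `findCycle` is a plain depth-first search over the union of edges. All three function names are hypothetical.

```typescript
// Illustrative only: neither OMA's guard nor AgentSonar's detector.
type Edge = [string, string]

// Per-chain guard: true if an agent appears twice in one delegation chain
// (the same-chain A -> B -> A case).
function chainRevisits(chain: string[]): boolean {
  return new Set(chain).size !== chain.length
}

// Turn a chain like ['a', 'b', 'c'] into edges a->b and b->c.
function chainEdges(chain: string[]): Edge[] {
  return chain.slice(1).map((to, i): Edge => [chain[i], to])
}

// Cumulative check: depth-first search over the union of all edges seen so
// far. Returns one cycle as a list of agents, or null if the graph is acyclic.
function findCycle(edges: Edge[]): string[] | null {
  const adjacency = new Map<string, string[]>()
  for (const [from, to] of edges) {
    const targets = adjacency.get(from) ?? []
    targets.push(to)
    adjacency.set(from, targets)
  }

  const state = new Map<string, 'visiting' | 'done'>()
  const path: string[] = []

  const visit = (node: string): string[] | null => {
    state.set(node, 'visiting')
    path.push(node)
    for (const next of adjacency.get(node) ?? []) {
      // A back edge to a node still on the current path closes a cycle.
      if (state.get(next) === 'visiting') return path.slice(path.indexOf(next))
      if (!state.has(next)) {
        const cycle = visit(next)
        if (cycle) return cycle
      }
    }
    path.pop()
    state.set(node, 'done')
    return null
  }

  for (const node of adjacency.keys()) {
    if (!state.has(node)) {
      const cycle = visit(node)
      if (cycle) return cycle
    }
  }
  return null
}

// Each chain passes the per-chain guard on its own...
console.log(chainRevisits(['a', 'b', 'c']), chainRevisits(['c', 'a'])) // false false
// ...but their union contains the cycle a -> b -> c -> a.
console.log(findCycle([...chainEdges(['a', 'b', 'c']), ...chainEdges(['c', 'a'])])?.join(' -> ')) // a -> b -> c
```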

Install

# TypeScript client
npm install @agentsonar/oma

# Python sidecar dependency
pip install agentsonar

The npm package ships with the Python sidecar script bundled at node_modules/@agentsonar/oma/sidecar/sidecar.py. Run it from there or copy it next to your app code:

python node_modules/@agentsonar/oma/sidecar/sidecar.py

Requirements: Node 18+, Python 3.10+.

Heads up: starting in agentsonar 0.4.0, the Python sidecar sends one anonymous session-start event per run (install ID, version, OS, and adapter; no agent content). Telemetry is on by default. To opt out, set AGENTSONAR_TELEMETRY=off or DO_NOT_TRACK=1 in the sidecar's environment before starting it.

Quickstart

import { OpenMultiAgent } from '@jackchen_me/open-multi-agent'
import {
  emitDelegations,
  createTraceHandler,
  shutdown,
  type DelegationTask,
} from '@agentsonar/oma'

const tasks: DelegationTask[] = [
  { title: 'research', description: '...', assignee: 'researcher' },
  { title: 'write',    description: '...', assignee: 'writer',
    dependsOn: ['research'] },
]

const orchestrator = new OpenMultiAgent({
  defaultModel: 'gpt-4o-mini',
  onTrace: createTraceHandler(),
})

const team = orchestrator.createTeam('my-team', { /* ... */ })

await emitDelegations(tasks)            // emit delegation edges before the run
await orchestrator.runTasks(team, tasks)
await shutdown()                        // write the report and close the sidecar

The sidecar must be running. The simplest setup is a separate terminal; for production, spawn it as a subprocess from your application.
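A minimal TypeScript sketch of the subprocess approach. It assumes a `python` executable on PATH (it may be `python3` on your system), agentsonar installed for that interpreter, and the bundled script path from Install, resolved relative to the current working directory. The helpers `startSidecar` and `waitForHealth` are hypothetical, not part of @agentsonar/oma; readiness is detected by polling the sidecar's `GET /health` endpoint.

```typescript
import { spawn, type ChildProcess } from 'node:child_process'

// Anything with a fetch-like shape; injectable so the poller can be tested.
type FetchLike = (url: string) => Promise<{ ok: boolean }>

// Hypothetical helper: poll GET /health until it answers 2xx or time runs out.
async function waitForHealth(
  baseUrl: string,
  timeoutMs = 10_000,
  intervalMs = 200,
  fetchImpl: FetchLike = (url) => fetch(url),
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    try {
      if ((await fetchImpl(`${baseUrl}/health`)).ok) return true
    } catch {
      // Sidecar not listening yet; retry after a short pause.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
  return false
}

// Hypothetical helper: start the bundled sidecar and wait until it is healthy.
async function startSidecar(port = 8787): Promise<ChildProcess> {
  const child = spawn(
    'python', // assumption: may need to be 'python3'
    ['node_modules/@agentsonar/oma/sidecar/sidecar.py', '--port', String(port)],
    { stdio: 'inherit' },
  )
  // Without an 'error' listener, a missing interpreter would crash the host app.
  child.once('error', (err) => console.error(`failed to start sidecar: ${err.message}`))
  if (!(await waitForHealth(`http://localhost:${port}`))) {
    child.kill()
    throw new Error(`AgentSonar sidecar did not become healthy on port ${port}`)
  }
  return child
}
```

Because the sidecar exits on its own after `shutdown()` writes the report, the returned `ChildProcess` mainly matters for cleanup when startup fails.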

Using outside OMA: event-driven buses (Electron, EventEmitter, custom orchestrators)

emitDelegations is built around OMA's task-graph pattern (you pass a list of tasks with dependsOn arrays and it walks the DAG). For setups where agents communicate through a bus or EventEmitter in real time, the right primitive is recordDelegation. It takes one edge directly:

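import { EventEmitter } from 'node:events'
import { recordDelegation } from '@agentsonar/oma'

class AgentBus extends EventEmitter {
  send(from: string, to: string, message: unknown) {
    this.emit(`agent:${to}`, { from, message })
    // Fire-and-forget: never blocks the bus, never throws on network errors.
    // This catch also discards PreventError; inspect the error here instead
    // if you run the sidecar in Prevent Mode.
    recordDelegation(from, to).catch(() => {})
  }
}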

That's the whole integration. The sidecar still runs separately on localhost:8787, your Node app stays Node-only, and every coordination failure (cycles, repetition, cost spikes) shows up in the same HTML report.

Optional metadata for breadcrumbs in the report:

recordDelegation(from, to, {
  metadata: { taskId: 'task-42', sessionId: 'sess-01', via: 'electron_bus' },
}).catch(() => {})

Same safety contract as everything else in the package: a call never blocks longer than timeoutMs (default 2 s), network errors are silently swallowed, and only PreventError propagates (and only when Prevent Mode is enabled on the sidecar).

Run the included demo

Two terminals.

Terminal 1: start the sidecar

python sidecar/sidecar.py

You'll see:

AgentSonar OMA sidecar listening on http://localhost:8787
  POST /ingest     delegation events
  POST /trace      OMA trace events (stashed for cost work)
  POST /shutdown   write report.html + exit
  GET  /health     liveness + current counts

Terminal 2: run the demo

export OPENAI_API_KEY=sk-...
npm run demo

The demo runs a 4-task workflow, researcher → reviewer → writer → researcher, where the last task is a fact-check that returns to the same researcher. The task DAG is linear, but the agent graph forms a 3-node cycle. CycleDetector fires cyclic_delegation on the third edge.

When the demo finishes, the sidecar prints the report path. Open it in a browser to see the graph and detected alerts.
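To see how a linear task DAG can still produce a cyclic agent graph, here is a hypothetical TypeScript reconstruction of the demo's shape. The task titles are made up, `DelegationTask` is redeclared locally with the fields used in the Quickstart, and the edge rule in `agentEdges` (dependency's assignee → dependent task's assignee) is an assumption about how emitDelegations walks dependsOn, not documented behavior.

```typescript
// Local mirror of the fields used in the Quickstart; not imported from the package.
interface DelegationTask {
  title: string
  description: string
  assignee: string
  dependsOn?: string[]
}

// Hypothetical reconstruction of the demo workflow; titles are illustrative.
const demoTasks: DelegationTask[] = [
  { title: 'research', description: '...', assignee: 'researcher' },
  { title: 'review', description: '...', assignee: 'reviewer', dependsOn: ['research'] },
  { title: 'write', description: '...', assignee: 'writer', dependsOn: ['review'] },
  { title: 'fact-check', description: '...', assignee: 'researcher', dependsOn: ['write'] },
]

// Assumption: each dependsOn link becomes an agent-level edge from the
// dependency's assignee to the dependent task's assignee.
function agentEdges(tasks: DelegationTask[]): Array<[string, string]> {
  const byTitle = new Map(tasks.map((t) => [t.title, t] as const))
  const edges: Array<[string, string]> = []
  for (const task of tasks) {
    for (const dep of task.dependsOn ?? []) {
      const upstream = byTitle.get(dep)
      if (upstream) edges.push([upstream.assignee, task.assignee])
    }
  }
  return edges
}

// No task title repeats (linear DAG), but the third edge returns to 'researcher'.
console.log(agentEdges(demoTasks).map(([a, b]) => `${a} -> ${b}`).join(', '))
// researcher -> reviewer, reviewer -> writer, writer -> researcher
```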

What the output looks like

Every run produces a self-contained HTML report at agentsonar_logs/run-<slug>/report.html: no external CSS or JavaScript, no network requests, and a dark mode that respects your system preference. Two top-level tabs organize the view:

1. Coordination Failures: the primary signal. One card per detected failure with severity badge, failure class (hover for a definition), fingerprint, and expandable topology / thresholds / provider-error / downstream-impact blocks. Filter chips at the top let you narrow to Critical or Warning with one click.

(Screenshot: Coordination Failures tab, the primary signal, Sentry-style)

2. Session Activity: INFO-level context, always one click away. Two sub-tabs switch between lenses on the same run:

Edge Activity: every delegation edge the graph saw, with fire count and severity attribution. Red border = edge involved in a critical alert, no border = clean.
Chronological Log: raw event stream with timestamps. Rows color-coded where an alert fired: light red for critical, light orange for warning.

(Screenshot: Session Activity tab, Edge Activity view)

The "Coordination Failures, Raw JSON" drop-down at the bottom of every report carries the same payload as report.json, so you can copy it straight into a dashboard or CI gate without opening a second file.

All four output files land in a per-run session directory under agentsonar_logs/:

File	Written	Purpose
`timeline.jsonl`	Live, flushed on every event	Every event, one JSON object per line. Tail with `tail -f` to watch what's happening as your OMA run progresses.
`alerts.log`	Live, flushed on every alert	Signal-only, human-readable. The "just show me the problems" view.
`report.json`	On `shutdown()`	Structured summary report, deduped + inhibited. Pipe into your dashboard.
`report.html`	On `shutdown()`	The standalone two-tab HTML report shown above.

Configuration

Two config surfaces.

TS client options (Node side)

Passed as options on each emitDelegations / createTraceHandler / shutdown call; endpoint can also come from the AGENTSONAR_ENDPOINT env var.

Option / env var	Default	Purpose
`endpoint` / `AGENTSONAR_ENDPOINT`	`http://localhost:8787`	Sidecar URL.
`timeoutMs`	`2000`	Per-request HTTP timeout in ms.
`debug`	`false`	Log wire activity to stderr.

Detection thresholds (sidecar side)

Pass as CLI flags to the sidecar, or set env vars before starting it. Run python sidecar/sidecar.py --help for the full list.

Flag	Env var	Default	Controls
`--warning-threshold`	`AGENTSONAR_WARNING_THRESHOLD`	`5`	Rotations / events to fire WARNING
`--critical-threshold`	`AGENTSONAR_CRITICAL_THRESHOLD`	`15`	Rotations / events to escalate to CRITICAL
`--per-edge-limit`	`AGENTSONAR_PER_EDGE_LIMIT`	`10`	Max events on one edge in the window
`--global-limit`	`AGENTSONAR_GLOBAL_LIMIT`	`200`	Max total events in the window
`--window-size`	`AGENTSONAR_WINDOW_SIZE`	`180.0`	Rate-limiter sliding window in seconds
`--half-life`	`AGENTSONAR_HALF_LIFE`	`180.0`	`repetitive_delegation` decay half-life
`--z-score-threshold`	n/a	`3.0`	Z-score to fire `repetitive_delegation`
`--resolve-after`	n/a	`60.0`	Seconds before alerts auto-resolve
`--log-dir`	`AGENTSONAR_LOG_DIR`	`.`	Where `agentsonar_logs/` lands
`--port`	`AGENTSONAR_PORT`	`8787`	Sidecar HTTP port
`--no-console`	n/a	off	Suppress alert streaming to stderr
`--no-report`	n/a	off	Skip the HTML/JSON report write
`--report-title`	`AGENTSONAR_REPORT_TITLE`	`"AgentSonar Report"`	HTML report title
`--prevent-cyclic-delegation`	`AGENTSONAR_PREVENT_CYCLIC_DELEGATION`	off	Enable Prevent Mode (see below)
`--prevent-max-rotations`	`AGENTSONAR_PREVENT_MAX_ROTATIONS`	unset	Trip Prevent Mode at exactly N rotations (overrides CRITICAL severity gating)

Resolution order: CLI flag > env var > SDK default.
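The precedence rule can be sketched in a few lines of TypeScript. `resolveSetting` is a hypothetical helper, not the sidecar's actual argument parser, and treating an empty env var as unset is an assumption.

```typescript
// Hypothetical helper mirroring the documented precedence:
// CLI flag > env var > SDK default. Not the sidecar's actual parser.
function resolveSetting<T>(
  cliValue: string | undefined,
  envValue: string | undefined,
  sdkDefault: T,
  parse: (raw: string) => T,
): T {
  if (cliValue !== undefined) return parse(cliValue)
  // Assumption: an empty env var counts as "not set".
  if (envValue !== undefined && envValue !== '') return parse(envValue)
  return sdkDefault
}

// Flag absent, env var set: the env var wins over the SDK default.
const perEdgeLimit = resolveSetting<number>(undefined, '25', 10, Number)
console.log(perEdgeLimit) // 25
```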

Example: tighter thresholds for testing

python sidecar/sidecar.py --warning-threshold 1 --critical-threshold 2

Example: alternate port

python sidecar/sidecar.py --port 9100

Then on the Node side:

await emitDelegations(tasks, { endpoint: 'http://localhost:9100' })

Prevent Mode

Opt-in "circuit breaker" mode. When it is enabled and the sidecar's coordination engine detects a cycle that crosses the trip threshold, the TypeScript client throws an exception into your code, letting you stop a runaway workflow before more tokens are spent.

How it works

Start the sidecar with --prevent-cyclic-delegation.
When a tracked failure (currently cyclic_delegation) crosses CRITICAL severity, the sidecar answers the next /ingest with HTTP 409 + RFC 7807 Problem Details (Content-Type: application/problem+json).
The TS client detects this exact response shape and throws PreventError from emitDelegations() into your code.
Every other failure mode (network errors, plain 409s, malformed responses, 500s) is still silently swallowed; only PreventError ever reaches your try/catch.
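The client-side decision in these steps can be modeled as one pure TypeScript function. This is a hypothetical sketch, not the package's implementation: the name `classifyIngestResponse` and the `Outcome` labels are invented here, and it assumes the client has the HTTP status (or `undefined` after a network error or timeout), the Content-Type header, and the parsed JSON body.

```typescript
// Hypothetical model of the documented decision; the real client may differ.
type Outcome = 'ok' | 'prevent' | 'swallow'

function classifyIngestResponse(
  status: number | undefined, // undefined = network error or timeout
  contentType: string | null,
  body: unknown,
): Outcome {
  if (status === undefined) return 'swallow'
  if (status >= 200 && status < 300) return 'ok'

  // Compare the media type only, ignoring parameters such as "; charset=utf-8".
  const isProblemJson =
    contentType?.split(';')[0].trim().toLowerCase() === 'application/problem+json'
  const extension =
    typeof body === 'object' && body !== null
      ? (body as { agentsonar?: unknown }).agentsonar
      : undefined

  // Only the exact shape (409 + problem+json + agentsonar extension) surfaces
  // as PreventError; plain 409s, 500s, and malformed bodies are swallowed.
  if (status === 409 && isProblemJson && typeof extension === 'object' && extension !== null) {
    return 'prevent'
  }
  return 'swallow'
}
```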

Quickstart

# Sidecar: enable Prevent Mode
python sidecar/sidecar.py --prevent-cyclic-delegation

import {
  emitDelegations,
  shutdown,
  PreventError,
  type DelegationTask,
} from '@agentsonar/oma'

const tasks: DelegationTask[] = [/* ... */]
// orchestrator and team are set up as in the main Quickstart

try {
  await emitDelegations(tasks)
  await orchestrator.runTasks(team, tasks)
} catch (e) {
  if (e instanceof PreventError) {
    console.log(`Stopped: ${e.reason}`)
    console.log(`Cycle:   ${e.cyclePath.join(' -> ')}`)
    console.log(`After:   ${e.rotations} rotations (severity ${e.severity})`)
  } else {
    throw e
  }
} finally {
  await shutdown()
}

Custom trip threshold

By default Prevent Mode trips on CRITICAL severity (= --critical-threshold rotations, default 15). To trip earlier:

python sidecar/sidecar.py --prevent-cyclic-delegation --prevent-max-rotations 5

This trips at exactly rotation 5 regardless of severity gating. Useful for tight test loops or production caps below the default CRITICAL threshold.
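The trip rule reads naturally as a small TypeScript predicate. This is a hypothetical model built from the documented flags, not the sidecar's code; in particular, using `>=` (a level is reached at exactly its threshold) is an assumption consistent with "trips at exactly rotation 5".

```typescript
// Hypothetical model of the documented trip rule; not the sidecar's code.
interface TripConfig {
  warningThreshold: number // --warning-threshold (default 5)
  criticalThreshold: number // --critical-threshold (default 15)
  preventMaxRotations?: number // --prevent-max-rotations (unset by default)
}

type Severity = 'NONE' | 'WARNING' | 'CRITICAL'

function severityFor(rotations: number, cfg: TripConfig): Severity {
  if (rotations >= cfg.criticalThreshold) return 'CRITICAL'
  if (rotations >= cfg.warningThreshold) return 'WARNING'
  return 'NONE'
}

// With --prevent-max-rotations set, trip at exactly that rotation count and
// ignore severity; otherwise trip once the cycle reaches CRITICAL.
function shouldTrip(rotations: number, cfg: TripConfig): boolean {
  if (cfg.preventMaxRotations !== undefined) {
    return rotations >= cfg.preventMaxRotations
  }
  return severityFor(rotations, cfg) === 'CRITICAL'
}
```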

Wire format

The 409 response body follows RFC 7807 Problem Details with an agentsonar extension namespace:

{
  "type": "https://github.com/agentsonar/agentsonar/blob/main/docs/problems/coordination-prevented.md",
  "title": "Coordination Failure Prevented",
  "status": 409,
  "detail": "cyclic_delegation prevented after 15 rotations: a -> b -> c",
  "instance": "/ingest",
  "agentsonar": {
    "failure_class": "cyclic_delegation",
    "severity": "CRITICAL",
    "rotations": 15,
    "cycle_path": ["a", "b", "c"],
    "reason": "cyclic_delegation prevented after 15 rotations: a -> b -> c",
    "timestamp": 1714089600.123
  }
}

Additional response headers:

Cache-Control: no-cache, no-store (discourages proxy caching of this informational state)
X-AgentSonar-Prevent: cyclic_delegation (a cheap observability hook for proxy/log inspection)
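If you inspect this body yourself (for example in a proxy or a test), a typed parser helps. The TypeScript interfaces below are hypothetical, transcribed from the example body above; `parsePreventProblem` maps it onto the fields the Prevent Mode quickstart reads from PreventError (reason, cyclePath, rotations, severity). The `timestamp` field appears to be fractional Unix epoch seconds, which is an assumption.

```typescript
// Hypothetical types transcribed from the example 409 body above.
interface CoordinationPreventedProblem {
  type: string
  title: string
  status: 409
  detail: string
  instance: string
  agentsonar: {
    failure_class: string
    severity: string
    rotations: number
    cycle_path: string[]
    reason: string
    timestamp: number // appears to be fractional Unix epoch seconds
  }
}

interface PreventDetails {
  failureClass: string
  severity: string
  rotations: number
  cyclePath: string[]
  reason: string
}

// Returns null for anything that is not a well-formed prevention problem.
function parsePreventProblem(json: string): PreventDetails | null {
  let parsed: unknown
  try {
    parsed = JSON.parse(json)
  } catch {
    return null
  }
  if (typeof parsed !== 'object' || parsed === null) return null
  const body = parsed as Partial<CoordinationPreventedProblem>
  const ext = body.agentsonar
  // Minimal validation: status, extension object, and a cycle path array.
  if (body.status !== 409 || !ext || !Array.isArray(ext.cycle_path)) return null
  return {
    failureClass: ext.failure_class,
    severity: ext.severity,
    rotations: ext.rotations,
    cyclePath: ext.cycle_path,
    reason: ext.reason,
  }
}
```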

Limitations

Once tripped, the sidecar's tracked state stays tripped. Subsequent emitDelegations calls keep throwing PreventError. To resume detection, restart the sidecar.
Static-DAG cycles only, currently. Prevent Mode trips on cycles visible at emitDelegations time, walked from the dependsOn graph. Cycles that emerge at runtime through the OMA orchestrator's runTasks are forwarded to /trace and stashed for future cost work, but they do not currently feed the detection engine. Wiring the runtime path into detection is on the roadmap.
Sidecar must be reachable. If the sidecar is down, emitDelegations warns once and continues; no PreventError is thrown because detection isn't running.

Sidecar lifecycle

One sidecar process = one observation session = one final report. The model is shaped for short-lived workloads (CLI tools, demos, batch jobs). Match your usage to one of these patterns:

Pattern	Setup
One-shot script (the demo, CLI tools)	Start sidecar in another terminal. Your script calls `shutdown()` at the end → sidecar writes the report and exits.
Long-running web server / app	Start the sidecar once at process startup. Make many `runTasks` calls over its lifetime. Call `shutdown()` ONCE when your process exits, not between runs. All runs accumulate into a single report.
Multiple concurrent sessions	Run a separate sidecar per session, each on a different port via `--port`. Each emits its own report.

If you call shutdown() between runs, the next call has no sidecar to talk to and operates as if it were unreachable (silent no-op). The next run won't be observed unless you start a fresh sidecar first.

The sidecar is stateless across restarts: killing and restarting it loses the in-memory graph for the current session. There's no checkpointing in v1.
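For the long-running pattern, one way to guarantee a single shutdown() is to route every exit path through a memoized callback. A minimal TypeScript sketch with a hypothetical `onceOnExit` helper (not part of @agentsonar/oma); pass it a function that awaits the package's `shutdown()`:

```typescript
// Hypothetical helper, not part of @agentsonar/oma: wraps `finalize` so it
// runs at most once, whether triggered by SIGINT, SIGTERM, or a direct call.
function onceOnExit(finalize: () => Promise<void>): () => Promise<void> {
  let pending: Promise<void> | null = null

  const run = (): Promise<void> => {
    // The first caller starts finalize(); later callers share the same promise.
    if (pending === null) pending = finalize().catch(() => {})
    return pending
  }

  for (const signal of ['SIGINT', 'SIGTERM'] as const) {
    // Installing a handler replaces Node's default exit-on-signal behavior,
    // so exit explicitly once finalize has settled.
    process.once(signal, () => {
      void run().finally(() => process.exit(0))
    })
  }
  return run
}
```

In a server you might call `const finalize = onceOnExit(() => shutdown())` once at startup, run as many `runTasks` calls as you like, and never call `shutdown()` between runs.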

Architecture

The TypeScript client posts two kinds of events to the local Python sidecar over HTTP. The sidecar runs the AgentSonar detection engine and writes a self-contained run report.

flowchart LR
    subgraph oma["Your OMA app (Node.js)"]
        traceFn["OpenMultiAgent<br/>onTrace handler"]
        emitFn["emitDelegations<br/>(walks dependsOn)"]
    end
    subgraph py["AgentSonar sidecar (Python)"]
        engine["monitor_orchestrator()<br/>engine"]
        layers["Detection layers:<br/>cycle / repetitive<br/>rate / SCC"]
        out["agentsonar_logs/<br/>run-&lt;slug&gt;/<br/>report.html<br/>report.json"]
    end
    traceFn -- "POST /trace" --> engine
    emitFn -- "POST /ingest" --> engine
    engine --> layers
    layers --> out

The TypeScript client is fire-and-forget. Every HTTP call has a timeout (timeoutMs, 2 seconds by default), every public function wraps its body in try/catch, and every fetch failure is swallowed silently (with a single console warning the first time the sidecar is unreachable). If the sidecar is down, slow, or throwing, the OMA run completes normally without observability, never with a crash. This invariant is enforced by the test suite (npm test). The one deliberate exception is PreventError, which reaches your code only when Prevent Mode is enabled on the sidecar.
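The contract can be sketched in TypeScript as a single wrapper around `fetch`. This is a hypothetical illustration (`postQuietly` and `PostInit` are invented names), not the package's client, and it omits the PreventError path for brevity. It relies on Node 18+ globals `fetch` and `AbortSignal.timeout`.

```typescript
// Hypothetical sketch of the fire-and-forget contract; the real client may differ.
interface PostInit {
  method: string
  headers: Record<string, string>
  body: string
  signal: AbortSignal
}
type PostFn = (url: string, init: PostInit) => Promise<unknown>

let warnedUnreachable = false

// Returns true if the sidecar answered (any status), false if the request
// failed or timed out. Never rethrows: the host workflow must not crash.
async function postQuietly(
  url: string,
  payload: unknown,
  timeoutMs = 2000,
  post: PostFn = fetch,
): Promise<boolean> {
  try {
    await post(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
      signal: AbortSignal.timeout(timeoutMs), // bound how long the caller waits
    })
    return true
  } catch {
    if (!warnedUnreachable) {
      warnedUnreachable = true // warn only on the first failure
      console.warn(`AgentSonar sidecar unreachable at ${url}; continuing without observability`)
    }
    return false
  }
}
```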

License

Apache-2.0