AGENTS.md

June 2, 2026

Host-neutral guidance for any AI coding agent connected to the deliberation MCP server. This file is standalone on purpose - it is not an include of CLAUDE.md, so it stays portable across hosts (Cursor, Codex, Kiro, Windsurf, Zed, and others). Claude Code users get the same routing from CLAUDE.md and the README; this file is for everyone else.

What deliberation is

A single MCP server that exposes GPT (via the Codex CLI), Gemini 3 (via the Antigravity CLI), Grok (via the xAI API), and OpenRouter models (400+, advisory) as expert subagents. You stay the primary agent. When a task benefits from a second opinion or cross-model review, call one of the tools below, read the result, and apply your own judgment. GPT and Gemini can also implement changes; Grok and OpenRouter only advise.

Tools

Fan-out and single-provider:

  • ask-all - send one question to GPT, Gemini, Grok, and configured OpenRouter models in parallel, get every answer back independently (no cross-talk).
  • consensus - run the FULL multi-round convergence loop server-side with a provider arbiter (blind pass + peer fan-out -> adjudicate -> revise) and get the converged verdict in one call. Depth is consensus.maxRounds (config, default 5); pass maxRounds to override. Pass synthesizeAlways:true for a SINGLE arbiter synthesis pass instead of the loop (best for open questions): it returns a free-text synthesis (the enum verdict and converged/confidence are null, rounds is 1). Set a concrete consensus.arbiter (a provider or openrouter:<alias>) for the server-side pass; in host mode the tool returns the opinions for YOU to synthesize. An optional blind pre-vote (consensus.blindVote) is available on the synthesize path.
  • consensus-step - drive the loop yourself as the arbiter, one action per call: init (returns a sessionId + blind prompt) -> record_blind (your pre-commit verdict) -> dispatch_peers (the server fans out to the panel) -> submit_adjudication (your verdict + per-issue accept/dismiss/defer, each dismiss needs a reason) -> submit_revision (your revised plan), looping until converged or the round cap. State is held server-side by sessionId (ephemeral).
  • ask-gpt / ask-gemini / ask-grok / ask-openrouter - one question to one provider for a single-shot second opinion.
  • panel - return the exact provider names ask-all would dispatch for the current config + expert (enabled built-ins + eligible OpenRouter aliases, fanout cap applied), WITHOUT calling them. Read-only.
  • ask-one { provider, prompt } - one question to ONE provider named by panel (e.g. codex, grok, openrouter:<alias>). The progress pattern: call panel, then issue one ask-one per name in a single turn so they run concurrently and each result lands independently as it finishes - visible per-provider progress with parallel wall-time, instead of the one opaque ask-all call. (The single-call ask-all still works; ask-one is the progressive alternative.)
  • analyze - read-only run analytics. Reads the opt-in debug log (per-model p50/p95/max latency, mean tokens, error rate, reasoning effort) and the session store (verdict agreement rate), then returns advisory tuning suggestions (disable a slow/redundant model in ask-all, lower an OpenRouter model's reasoning, adjust maxFanout). Two lenses reported side by side - timing and agreement are NOT joined. Needs debug.enabled for the timing lens. Writes nothing.
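The panel + ask-one progress pattern described above can be sketched roughly as follows. This is a minimal illustration, not the server's or any host's actual client code: `call_tool` is a hypothetical coroutine standing in for whatever "call an MCP tool" function your host exposes, and the assumption that panel takes no arguments and returns a plain list of provider names is mine. Only the tool names and the `{ provider, prompt }` arguments come from the list above.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical: your MCP client's "call a tool by name" coroutine.
CallTool = Callable[[str, dict[str, Any]], Awaitable[Any]]


async def fan_out(call_tool: CallTool, prompt: str) -> list[tuple[str, Any]]:
    """Call panel once, then issue one ask-one per provider concurrently."""
    # Assumption: panel needs no arguments and returns a list of names.
    names = await call_tool("panel", {})

    async def ask(name: str) -> tuple[str, Any]:
        result = await call_tool("ask-one", {"provider": name, "prompt": prompt})
        return name, result

    finished: list[tuple[str, Any]] = []
    # as_completed yields each call as it settles, so every provider's
    # result is visible as soon as it lands, not when the slowest one ends.
    for next_done in asyncio.as_completed([ask(n) for n in names]):
        name, result = await next_done
        print(f"{name} finished")
        finished.append((name, result))
    return finished
```

Results come back in completion order, so a fast provider is reported before a slow one regardless of its position in the panel.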

Every result carries provider, model, text, ms (wall time), and the effective reasoningEffort (real value for HTTP providers; null for the Codex/Gemini CLIs). HTTP providers (Grok, OpenRouter) also include token usage.
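For hosts that post-process results, the fields listed above could be modeled along these lines. This is a sketch under assumptions: the field names provider, model, text, ms, and reasoningEffort come from the paragraph above, but the `usage` key name, the value types, and the `ProviderResult` / `parse_result` names are hypothetical and should be checked against real server output.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ProviderResult:
    provider: str
    model: str
    text: str
    ms: float                        # wall time in milliseconds
    reasoning_effort: Optional[str]  # None for the CLI-backed providers
    usage: Optional[dict[str, Any]] = None  # assumed key; HTTP providers only


def parse_result(raw: dict[str, Any]) -> ProviderResult:
    """Map one raw result dict onto the dataclass, tolerating absent optionals."""
    return ProviderResult(
        provider=raw["provider"],
        model=raw["model"],
        text=raw["text"],
        ms=raw["ms"],
        reasoning_effort=raw.get("reasoningEffort"),
        usage=raw.get("usage"),  # hypothetical key name for token usage
    )
```

Treating reasoningEffort and token usage as optional keeps one code path for both CLI and HTTP providers.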

Expert personas (call one by name as its own tool, or pass it via the expert argument on the fan-out tools to apply that persona to every delegate):

  • architect - system design, tradeoffs, complex decisions.
  • plan-reviewer - check a plan is executable before work starts.
  • scope-analyst - catch ambiguities and hidden requirements before planning.
  • code-reviewer - bugs, security holes, maintainability on a diff or file.
  • security-analyst - threat modeling and vulnerability assessment.
  • researcher - external libraries, APIs, and best practices, with evidence.
  • debugger - ranked root-cause hypotheses and the smallest safe fix.

Session tools (only useful when sessions.persist is enabled in config; they report "persistence disabled" otherwise). When on, consensus/ask-all return a sessionId:

  • session-get { sessionId } - fetch a recorded run (opinions, verdict, annotations).
  • session-revisit { sessionId } - re-run the recorded question with the current providers/config and save a linked child record. A consensus record replays its mode (the loop, or a synthesize pass).
  • session-annotate { sessionId, note } - append a note to a run's audit trail.

Every fan-out, single-provider, and expert tool takes a prompt. Give it full context: the goal, the relevant code or paths, and any prior attempts. The experts do not share your session, so a self-contained prompt gets a better answer.
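One way to keep prompts self-contained is to assemble them from the three ingredients named above. The helper below is purely illustrative: `build_prompt` and its section labels are hypothetical, and the server does not require any particular layout.

```python
def build_prompt(goal: str, paths: list[str], prior_attempts: list[str]) -> str:
    """Assemble a self-contained prompt: goal, relevant paths, prior attempts."""
    sections = [f"Goal:\n{goal}"]
    if paths:
        listed = "\n".join(f"- {p}" for p in paths)
        sections.append(f"Relevant code or paths:\n{listed}")
    if prior_attempts:
        listed = "\n".join(f"- {a}" for a in prior_attempts)
        sections.append(f"Prior attempts:\n{listed}")
    # Blank lines between sections keep the prompt readable for the expert.
    return "\n\n".join(sections)
```

Empty lists are simply omitted, so a first-time question produces only the goal section.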

Performance + debugging (optional)

These apply to every MCP host, not just Claude Code:

  • Per-provider progress - prefer panel + parallel ask-one (above) when you want to watch each model finish instead of waiting on one opaque ask-all call.
  • Debug log - set "debug": { "enabled": true } in config.json to append one JSON line per provider call and per consensus round to <XDG cache>/deliberation/debug.jsonl (override with DELIBERATION_DEBUG_LOG). It records latency, reasoning effort, HTTP token usage, and voting/approval outcomes - never prompts, responses, or issue text. OFF by default.
  • Live progress notifications - the server declares the MCP logging capability and emits notifications/message per provider as it settles during a fan-out. Hosts that render server log notifications mid-call show this automatically (Claude Code does not - hence the panel + ask-one pattern there).
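If you want to inspect the debug log yourself rather than through analyze, a per-model latency summary might look like the sketch below. The record field names `model` and `ms` are assumptions about the JSONL schema (this file only says the log records latency), the nearest-rank percentile is one common definition and not necessarily what analyze uses, and `percentile` / `latency_by_model` are hypothetical names.

```python
import json
import math
from collections import defaultdict
from typing import Iterable


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def latency_by_model(lines: Iterable[str]) -> dict[str, dict[str, float]]:
    """Summarize p50/p95/max latency per model from JSONL debug lines."""
    samples: dict[str, list[float]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumed field names; records lacking them (for example
        # consensus-round entries) are skipped.
        if "model" in record and "ms" in record:
            samples[record["model"]].append(record["ms"])

    summary: dict[str, dict[str, float]] = {}
    for model, values in samples.items():
        values.sort()
        summary[model] = {
            "p50": percentile(values, 50),
            "p95": percentile(values, 95),
            "max": values[-1],
        }
    return summary
```

The function accepts any iterable of lines, so an open file handle on the debug log can be passed in directly.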

When to delegate

  • Reviewing a plan or an architecture decision before you commit to it.
  • A security review of auth, untrusted input, or a new endpoint.
  • A second opinion when you are unsure, or after a fix has failed twice.
  • Cross-model consensus on a high-stakes or contested call.

Skip delegation for simple edits, the first attempt at a fix, and trivial questions you can answer directly.

Updating

If you run the standalone server via npx -y @antonbabenko/deliberation-mcp, each fresh resolve picks up the latest published version. npx caches resolved packages, so if you keep getting an old build, clear the cache (rm -rf ~/.npm/_npx) or pin/refresh the version in your host's MCP config. (The Claude Code plugin manifest is a separate mechanism and does not affect non-Claude hosts.)