Council Review

May 31, 2026 · View on GitHub

Important

This skill now lives in ngmeyer/skills — a curated library of eight composable Claude Code skills. This standalone repo is archived and frozen at council-review V2.1; the maintained version ships in the monorepo. Install everything with npx skills@latest add ngmeyer/skills.

License: MIT Claude Code GitHub Stars Platform

A Claude Code skill that runs decisions, code, plans, and PRs through a Diverse Multi-Agent Debate (DMAD) council of 5 advisors with distinct reasoning methods. Advisors collaborate, peer-review each other anonymously, and a chairman synthesizes a verdict.

This is the collaborative-DMAD pattern — empirically validated to outperform adversarial multi-agent debate (M3MADBench 2026; DMAD ICLR 2025). For single-critic adversarial stress-testing of a known artifact, use /adversarial-review instead.

Based on Andrej Karpathy's LLM Council and Ole Lehmann's Claude Code adaptation.

How It Works

You -> Question/Code/Plan
         |
   5 Advisors (parallel, independent, distinct reasoning methods)
         |
   5 Peer Reviews (anonymous, cross-review)
         |
   Chairman Synthesis (verdict + dissent preservation + verification)
         |
   Report + Recommendation + "What You Lose"

The 5 Advisors:

AdvisorAngleReasoning Method
The ContrarianWhat will fail?Inversion -- assume it failed, trace backward
First PrinciplesWhat are we actually solving?Decomposition -- break into atomic claims, challenge each
The ExpansionistWhat upside are we missing?Analogy -- what adjacent domain solved this differently?
The OutsiderZero context, fresh eyesNaive questioning -- flag anything requiring insider knowledge
The ExecutorWhat do you do Monday morning?Dependency graphing -- what blocks what?

Each advisor uses a different reasoning method, not just a different angle. This is the key insight from DMAD research: same-model councils need method diversity to avoid converging on the same reasoning patterns.

Install

This skill ships in the ngmeyer/skills library. Install it (and the rest) with the skills.sh installer — it auto-detects your agents and copies the skills into place:

npx skills@latest add ngmeyer/skills

Then use it in Claude Code:

/council-review Should we rewrite our auth layer in Rust?
/council-review docs/plans/v2-migration.md
/council-review https://github.com/org/repo/pull/42
/council-review --quick Is this naming convention worth changing?
/council-review --jury Should we adopt microservices?

Modes

ModeFlagCallsBest For
Full (default)none11High-stakes decisions
Quick--quick4Routine decisions, gut-checks
Adaptive--adaptive6–26Multi-round debate with KS-statistic early stopping (up to 94.5% cost cut on convergent questions)
Confidence--confidence11Confidence-weighted synthesis. Each advisor self-rates and rates peers; chairman weights by calibrated confidence rather than majority vote
Measure diversity--measure-diversity11Score reasoning-footprint overlap; flag theatrical consensus when advisors converge despite different methods
Jury (V2)--jury13+Replace the single chairman with a 3-judge jury, ideally across model families. For close calls and high-stakes verdicts where single-judge reliability isn't enough (jury-of-judges / PoLL)

Quick mode runs 3 advisors (Contrarian, Executor, Outsider) + chairman. No peer review. Fast.

Adaptive mode measures response distribution convergence between rounds via Kolmogorov-Smirnov statistic; halts when shift drops below epsilon for two consecutive rounds. Best for open questions where the council may converge quickly.

Confidence mode has each advisor end with CONFIDENCE: 1-10 and a one-line rationale; peers also rate confidence; the chairman synthesis is weighted by calibrated certainty. Surfaces low-confidence consensus as a yellow flag.

Measure-diversity mode scores reasoning-footprint overlap across responses. High overlap (>60%) is flagged as theatrical consensus — the chairman treats it as a single advisor's opinion.

Jury mode replaces the single chairman with three independent judges (ideally across model families); each synthesizes a verdict and a brief reconciliation step merges them. Use it on close calls where one judge's reliability isn't enough.

V2 note: the maintained version adds a mandatory Devil's-Advocate pass against the emerging consensus (the one configuration shown to reliably induce genuine disagreement), a sycophancy guardrail in the advisor and peer prompts, a Mediating-Assessments step in the chairman synthesis (score independent attributes before the holistic call), and an outside-view/base-rate check. Independent blind A/B: V1 3.8 → V2 4.8. See the changelog in SKILL.md.

Flags compose: /council-review --adaptive --confidence "Should we adopt GraphQL?" runs convergence-stopped, confidence-weighted deliberation.

What It Reviews

InputWhat Happens
A question or decision5 advisors independently analyze tradeoffs, peer-review each other, synthesize a recommendation
An implementation planAdvisors stress-test feasibility, scope, risks, missing pieces, and execution order
A PR or code changeAdvisors review from security, architecture, performance, usability, and pragmatism angles
A file pathReads the file and councils its contents

Why This Works

  1. Method diversity -- Each advisor uses a distinct reasoning method (inversion, decomposition, analogy, naive questioning, dependency graphing). Same-model councils converge without this.
  2. Parallel independence -- Advisors don't see each other's responses. No groupthink.
  3. Anonymous peer review -- Responses are shuffled as A-E before review. No deference to roles.
  4. Forced tension -- The Contrarian must find flaws. The Expansionist must find upside. Coverage is structural.
  5. Disagreement classification -- Chairman distinguishes "value tensions" (both sides valid) from "error catches" (one advisor found a real flaw).
  6. Dissent preservation -- "What You Lose" section explicitly names the cost of ignoring the minority view.
  7. Chairman override -- The synthesizer can side with a minority if their reasoning is strongest.
  8. "What did ALL five miss?" -- The most valuable question in the peer review.

Output Format

## Council Verdict: [Topic]

### Where the Council Agrees
[High-confidence signals -- multiple advisors converged independently]

### Where the Council Clashes
[Value Tension] Both sides valid, depends on priorities...
[Error Catch] One advisor found a real flaw others missed...

### Blind Spots Revealed
[Things only the peer review caught]

### Recommendation
[Clear, actionable. Not "it depends." A real answer.]

### What You Lose
[Cost of following this recommendation. The strongest dissent, preserved.]

### Do This First
[One concrete next step.]
How to verify: [2-3 checks to confirm the recommendation was right]

Auto-Context

The skill automatically reads project files before framing the question:

  • README.md, CLAUDE.md / AGENTS.md
  • Recent git log
  • PR diff (if reviewing a PR)
  • Referenced files

No manual context-pasting needed. Advisors see your project, not a blank slate.

Credits

License

MIT