constitutional-agent Roadmap
April 17, 2026
Current stable: v0.4.0
v0.4.0 — Persistence & Observability (stable — 2026-04-11)
The stable release locks the persistence and observability API surface. No breaking changes from v0.4.0b3.
Included in beta
- `on_evaluate` callback hook: fire side effects on every evaluation
- `on_amendment_ratified` callback hook: trigger downstream actions on governance events
- `history` property: in-memory evaluation log (timestamp, context snapshot, result)
- `Constitution.load("governance.yaml")`: file-based config with YAML hard constraints
- `required: true` on YAML HC entries: an absent key is treated as a violation
- Full YAML operator coverage: `eq`, `ne`, `lt`, `lte`, `gt`, `gte`
- Strict mode with `_KNOWN_GATE_METRICS` overlap detection
- `fria_evidence(context)`: EU AI Act Article 27 FRIA (fundamental rights impact assessment) output; `fria.py` module with `FRIAEvidence`, `fria_summary()`, `fria_narrative()`
- 150 tests, 97% coverage, mypy strict, Python 3.11–3.13
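The hard-constraint semantics listed above (six comparison operators, and `required: true` turning an absent key into a violation) can be illustrated with a minimal sketch. It assumes the YAML file has already been parsed into a list of dicts with hypothetical keys `metric`, `op`, `value`, and `required`; `check_hard_constraints` and `Violation` are illustrative names, and the real schema and evaluator in constitutional-agent may differ.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable

# The six operators named in the roadmap, mapped to Python comparisons.
OPS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


@dataclass(frozen=True)
class Violation:
    metric: str
    reason: str


def check_hard_constraints(
    constraints: list[dict[str, Any]], context: dict[str, Any]
) -> list[Violation]:
    """Return every violated constraint (hypothetical schema: metric/op/value/required)."""
    violations: list[Violation] = []
    for hc in constraints:
        metric, op_name = hc["metric"], hc["op"]
        if op_name not in OPS:
            raise ValueError(f"unknown operator: {op_name!r}")
        if metric not in context:
            # With required: true a missing key is a violation; otherwise the HC is skipped.
            if hc.get("required", False):
                violations.append(Violation(metric, "required key absent"))
            continue
        if not OPS[op_name](context[metric], hc["value"]):
            violations.append(
                Violation(metric, f"{context[metric]!r} {op_name} {hc['value']!r} is false")
            )
    return violations


# Hypothetical constraints, as they might look after loading governance.yaml.
constraints = [
    {"metric": "error_rate", "op": "lte", "value": 0.05},
    {"metric": "human_review", "op": "eq", "value": True, "required": True},
]
print(check_hard_constraints(constraints, {"error_rate": 0.02}))
# → [Violation(metric='human_review', reason='required key absent')]
```

Treating an unknown operator as an error rather than a silent pass mirrors the spirit of strict mode: a misspelled constraint should fail loudly instead of never firing.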
What the beta tested
- Callback API shape: are `on_evaluate` and `on_amendment_ratified` sufficient for real integrations, or do callers need richer event metadata?
- YAML HC expressiveness: do the six operators cover real organizational use cases, or are `range`/`regex` operators needed?
- History fidelity: is a full context snapshot per evaluation the right granularity, or is it too noisy at high evaluation frequency?
Feedback on these three questions shaped what locked into v0.4.0 stable.
v0.5.0 — Multi-Agent Coordination
Governance for systems of agents, not just single agents.
Planned features
- `Coalition` class: evaluate a shared Constitution across N agents and aggregate gate results with configurable quorum rules (all-PASS, majority-PASS, any-FAIL)
- Gate delegation: agent A defers a specific gate to agent B's evaluation result (e.g., EpistemicGate from a dedicated verifier agent)
- Shared amendment log: ratification proposals visible and votable across a coalition
- Cross-agent HC enforcement: hard constraints that span agent boundaries (e.g., "no two agents may spend simultaneously")
Research question
The core design question is whether Coalition governance should be pull-based (each agent queries a central evaluator) or push-based (agents publish gate results, coalition aggregates). Pull is simpler; push is more fault-tolerant. Production data from the HRAO-E system (54 agents) informs this choice — the current implementation uses a push-adjacent pattern via cron.
v0.6.0 — Adaptive Thresholds
Gates that learn from evaluation history.
Planned features
- Threshold advisor — given N evaluations, suggest threshold adjustments that would have caught the top-K misses
- Calibration mode — run gates in observe-only mode for M days, then propose initial thresholds based on observed distribution
- Drift detection — alert when a metric's rolling mean crosses a threshold band, before a gate FAIL occurs
- Amendment auto-proposal — when drift is detected, automatically draft an amendment for human ratification (never auto-ratify)
Hard constraint
Amendment auto-proposal generates a draft in PENDING state. Human ratification is always required — no adaptive threshold change may be ratified autonomously. This is a hard constraint in the library's own governance model.
v1.0.0 — Production Stable
Criteria for 1.0:
- Coalition API stable (v0.5.0 shipped and battle-tested)
- Adaptive thresholds validated against ≥2 real deployments
- Persistence backend pluggable (default: in-memory; adapters: SQLite, PostgreSQL)
- OpenTelemetry span export for gate evaluations
- Zero breaking changes since v0.4.0 stable
- Security audit (third-party)
- Documentation site (not just docstrings)
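The pluggable-persistence criterion implies a small storage interface with interchangeable adapters. A hedged sketch using `typing.Protocol` and the standard-library `sqlite3` module; `HistoryBackend`, `InMemoryBackend`, `SQLiteBackend`, `record_evaluation`, and the record shape are all hypothetical, and a PostgreSQL adapter would implement the same two methods with a different driver.

```python
import json
import sqlite3
from typing import Any, Protocol


class HistoryBackend(Protocol):
    def append(self, record: dict[str, Any]) -> None: ...
    def load_all(self) -> list[dict[str, Any]]: ...


class InMemoryBackend:
    """Default backend: records live only for the life of the process."""

    def __init__(self) -> None:
        self._records: list[dict[str, Any]] = []

    def append(self, record: dict[str, Any]) -> None:
        self._records.append(dict(record))

    def load_all(self) -> list[dict[str, Any]]:
        return [dict(r) for r in self._records]


class SQLiteBackend:
    """Stores each record as a JSON string in a single-table SQLite database."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS history (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )

    def append(self, record: dict[str, Any]) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute("INSERT INTO history (body) VALUES (?)", (json.dumps(record),))

    def load_all(self) -> list[dict[str, Any]]:
        rows = self._conn.execute("SELECT body FROM history ORDER BY id").fetchall()
        return [json.loads(body) for (body,) in rows]


def record_evaluation(backend: HistoryBackend, result: str, context: dict[str, Any]) -> None:
    # Caller code depends only on the protocol, so backends are swappable.
    backend.append({"result": result, "context": context})
```

Structural typing keeps the core library free of database dependencies: an adapter only has to provide matching methods, with no base class to import.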
Not Planned
Items explicitly out of scope to keep the library focused:
- LLM integration — constitutional-agent evaluates metrics, not LLM outputs. LLM output evaluation belongs in a separate layer.
- UI / dashboard — the library emits structured data; visualization is the caller's responsibility.
- Cloud-hosted evaluation — evaluation runs in the caller's process. No SaaS offering planned.
- Preset governance templates — the six-gate architecture is opinionated enough. Domain-specific presets belong in downstream packages.
Versioning Policy
- Patch (0.x.Y): Bug fixes, test additions, docstring improvements. No API changes.
- Minor (0.X.0): New features, backward-compatible. Deprecation warnings for anything being removed.
- Major (X.0.0): Breaking API changes. Minimum 6-month deprecation window from the last minor release.
Pre-1.0, minor versions may include small breaking changes with a changelog entry and migration note.