Claude Bootstrap + Maggy

June 4, 2026

Turn Claude Code into a self-reviewing, test-enforced engineering system that remembers context across sessions — then route work across 13 models from a single dashboard.

Claude Bootstrap is an installable config pack (skills, hooks, rules, templates) for Claude Code. Maggy is the optional local server that adds multi-model routing, a web dashboard, intent-driven protocols, and plugin orchestration. Both live in this repo. Start with Bootstrap; add Maggy when you need the harness.


1100+ tests. 67 skills. 15 MCP tools. Used daily across production codebases.


Who This Is For

  • Solo engineers using Claude Code who want TDD enforcement, quality gates, and memory that survives context compaction — without changing their workflow
  • Teams routing work across Claude, DeepSeek, Kimi, Gemini, and Codex from a single dashboard with cost-aware model selection
  • Platform engineers building AI-assisted developer tooling who need a reference implementation with intent tracking, protocol execution, and plugin architecture

Choose Your Path

| | Claude Bootstrap | Maggy Harness |
|---|---|---|
| What it is | Skills, hooks, rules installed into ~/.claude/ | Local FastAPI server + web dashboard |
| Install time | ~30 seconds | ~5 minutes (Python 3.11+, API keys) |
| Requires | Claude Code (also works with Codex, Kimi, Gemini CLI) | Everything in Bootstrap + Python + optional Docker |
| You get | TDD enforcement, 67 skills, quality gates, ADR reviews, iCPG, Mnemos memory | All of Bootstrap + 13-tier routing, skill protocols, Telos testing, Cortex MCP, plugins, dashboard |

Bootstrap — 30-second install

git clone https://github.com/alinaqi/maggy.git
cd maggy && ./install.sh

Your next Claude Code session picks it up automatically.

Full Harness

cd maggy && pip install -e .
maggy serve   # Dashboard at localhost:8080

See GETTING_STARTED.md for prerequisites, API keys, and config setup.


What It Looks Like in Practice

Routing a task:

You: "review the auth middleware for timing attacks"
→ Blast score: 8/10 (security + architecture)
→ Routed to: Claude (Tier 11)
→ ADR gate: found docs/adr/0003-jwt-strategy.md → injected as context
→ Review runs with full architectural context

Skill Protocol execution:

You: "push to git"
→ Intent matched: git-push protocol
→ ✅ lint       (2.1s)
→ ✅ typecheck   (4.3s)
→ ✅ tests       (11.2s)
→ ✅ stage
→ ✅ commit      [AI-generated: "fix: resolve token refresh race condition"]
→ ✅ push

Fatigue-aware memory:

Session fatigue: 0.61 (PRE-SLEEP)
→ Mnemos: auto-checkpoint written
→ Micro-consolidation: 3 ResultNodes compressed
→ iCPG context injected: 2 ReasonNodes, 1 constraint
→ Context freed: ~18k tokens

The Problem This Solves

You're using Claude Code. It's impressive — but:

  • It picks the most expensive model for everything, including trivial tasks
  • Context fills up, state is lost, you re-explain yourself every session
  • There's no enforcement: code quality, test coverage, and ADR compliance only happen if you remember to ask
  • Running multiple agents on the same repo causes file conflicts
  • You have no visibility into what Claude is actually doing inside your codebase

What Bootstrap Gives You

| Layer | What it does |
|---|---|
| 67 skills | Python, TypeScript, React, React Native, Flutter, Supabase, Firebase, Stripe, Playwright, security, ADRs, cross-agent delegation |
| TDD enforcement | Stop hooks: tests must pass before Claude considers a task done |
| Quality gates | Max 20 lines/function, 3 params, 2 nesting levels. Enforced per file |
| iCPG | Intent-Augmented Code Property Graph. Stores why code exists. 6-dimension drift detection. Prevents duplicate implementations |
| Mnemos | Task-scoped memory with 4-dimension fatigue model. Survives context compaction with typed checkpoints |
| ADR enforcement | Non-trivial changes require an Architectural Decision Record. Missing one? Reverse-engineered from git history |
| Agent teams | 6 agents: Lead, Quality, Security, Review, Merger, Feature |
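
To make the quality-gate limits concrete, here is a minimal sketch of a per-file check written with Python's standard `ast` module. It is not the hook that ships in this repo: the names `check_source` and `nesting_depth`, and the choice of which statements count as a nesting level, are assumptions made for illustration.

```python
import ast

# Limits quoted in the table above; the checker itself is a hypothetical sketch.
MAX_LINES, MAX_PARAMS, MAX_NESTING = 20, 3, 2

# Statement types treated as opening a nesting level (an assumption).
# Note: an `elif` is parsed as an `If` inside `orelse`, so it counts as deeper.
NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def nesting_depth(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest level of nested control-flow blocks under node."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, nesting_depth(child, child_depth))
    return deepest


def check_source(source: str) -> list[str]:
    """Return one message per violated limit in the given Python source."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        length = node.end_lineno - node.lineno + 1
        args = node.args
        params = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
        depth = nesting_depth(node)
        if length > MAX_LINES:
            violations.append(f"{node.name}: {length} lines > {MAX_LINES}")
        if params > MAX_PARAMS:
            violations.append(f"{node.name}: {params} params > {MAX_PARAMS}")
        if depth > MAX_NESTING:
            violations.append(f"{node.name}: nesting {depth} > {MAX_NESTING}")
    return violations


SAMPLE = '''
def ok(a, b):
    return a + b

def too_many(a, b, c, d):
    for x in a:
        if x:
            while b:
                b -= 1
    return c + d
'''

for message in check_source(SAMPLE):
    print(message)
```

A hook built along these lines would run the check on each edited file and block the task while the returned list is non-empty.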

What Maggy Adds

| System | What it does |
|---|---|
| 13-Tier Routing | Semantic blast score (1–10) routes to cheapest capable model. Local Qwen3 classifier → DeepSeek (~80% of tasks) → Kimi → Gemini → Grok → Codex → Claude. Budget-capped with auto-demotion. See Routing details |
| Skill Protocols | YAML-defined workflows in maggy/skills/protocols/. "Push to git" → lint → test → stage → commit → push. Drop a .yaml to add your own |
| Telos | Testing beyond TDD. Three planes: Conformance × Validation × Integrity. A zero in any plane collapses the total score. See Details |
| Cortex MCP | Code intelligence: 10 edge types, cyclomatic complexity, FTS5 search, bidirectional traversal. 15 tools, single SQLite DB. See Benchmarks |
| Polyphony | Docker-isolated parallel agent execution. Second session auto-provisions a workspace. See Spec |
| Engram | Cross-session memory. 7 amnesia types. Persists architectural knowledge across weeks |
| Plugins | Drop-in system. Ships with: Build-in-Public (auto-posts to LinkedIn/X), Telos, GitHub/Asana/Monday providers |
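
The Skill Protocols row describes a fixed sequence of steps. The sketch below shows one way such a sequence could be executed, assuming that a failing step stops the run and the remaining steps are skipped. `Step` and `run_protocol` are hypothetical names, and the steps are plain Python callables here rather than entries parsed from a protocol .yaml file.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[], bool]  # returns True on success


def run_protocol(steps: list[Step]) -> list[tuple[str, str]]:
    """Run steps in order; after the first failure, mark the rest as skipped."""
    report = []
    failed = False
    for step in steps:
        if failed:
            report.append((step.name, "skipped"))
            continue
        ok = step.run()
        report.append((step.name, "ok" if ok else "failed"))
        failed = not ok
    return report


# Stand-ins for real commands; a real step would shell out to a linter, pytest, or git.
git_push = [
    Step("lint", lambda: True),
    Step("tests", lambda: False),  # simulate a failing test run
    Step("stage", lambda: True),
    Step("commit", lambda: True),
    Step("push", lambda: True),
]

for name, status in run_protocol(git_push):
    print(f"{name:8} {status}")
```

Stopping at the first failure is what keeps a broken test run from ever reaching the commit and push steps.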

Model Routing

Every message is scored 1–10 for complexity and classified by task type. The cheapest capable model wins.

| Tier | Model | Role |
|---|---|---|
| T0 | Qwen3 (local) | Classification, triage, free bulk ops |
| T1 | Gemini Flash-Lite | Bulk extraction, CIG pipelines |
| T2 | DeepSeek Flash | Docs, tests, scaffolding |
| T3 | Gemini Flash | Multimodal, vision, audio |
| T4 | DeepSeek Pro | Complex coding, multi-file refactors |
| T5 | Gemini CLI | Multi-file agentic coding |
| T6 | AGY | End-to-end implementation (git + code + test) |
| T7 | Kimi | Long-context analysis, routing alt |
| T8 | Gemini Pro Search | Deep research, Google grounding, 2M context |
| T9 | Grok | Competitor intel, deep reasoning |
| T10 | Codex | Bulk generation, security-sensitive tasks |
| T11 | Claude Sonnet | Quality-critical code, complex debugging |
| T12 | Claude Opus | Architecture, security review, ADR decisions |

Routing is semantic (Qwen3 as local classifier), fatigue-aware, budget-capped, and cascading.
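
As a rough illustration of "cheapest capable model wins" with budget-capped demotion, consider the sketch below. The `max_blast` thresholds, the relative costs, and the shortened ladder are invented for the example and do not reflect Maggy's actual configuration; `Tier` and `route` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_blast: int  # highest blast score this tier is trusted with (hypothetical)
    cost: float     # relative cost per task (hypothetical units)


# Illustrative ladder, ordered cheapest first; thresholds and costs are made up.
LADDER = [
    Tier("T0 Qwen3 (local)", 2, 0.0),
    Tier("T2 DeepSeek Flash", 4, 1.0),
    Tier("T4 DeepSeek Pro", 6, 3.0),
    Tier("T11 Claude Sonnet", 8, 10.0),
    Tier("T12 Claude Opus", 10, 25.0),
]


def route(blast: int, budget_left: float) -> Tier:
    """Pick the cheapest capable tier, demoting if the budget cannot cover it."""
    if not 1 <= blast <= 10:
        raise ValueError("blast score must be between 1 and 10")
    # Index of the first (cheapest) tier that can handle this blast score.
    capable = next(i for i, t in enumerate(LADDER) if t.max_blast >= blast)
    # Auto-demotion: walk down the ladder until a tier fits the remaining budget.
    for tier in reversed(LADDER[: capable + 1]):
        if tier.cost <= budget_left:
            return tier
    return LADDER[0]  # fall back to the free local tier


print(route(8, budget_left=100.0).name)
print(route(8, budget_left=5.0).name)
```

With a healthy budget the score-8 task lands on the first tier rated for it; with only 5 units left, the same task is demoted to the most capable tier the budget still covers.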

Gateway routing with srooter — www.srooter.ai

We've added first-class support for srooter, an Anthropic/OpenAI-compatible LLM gateway that routes your requests across models (Claude, MiniMax, DeepSeek, Kimi, Gemini, Grok, local Qwen). It provides intent-based routing, budget caps, fallbacks, and a usage dashboard, without changing your tools.

Recommended with Maggy, Claude Code, or Codex. Point any of them at the gateway and your traffic is routed for you — no per-tool config:

# Claude Code (or Codex) → srooter
export ANTHROPIC_BASE_URL="https://www.srooter.ai/anthropic"   # or your local gateway
export ANTHROPIC_API_KEY="<your-srooter-key>"
claude        # now routed through srooter

Pick the model you "follow" once with /model-config — Maggy, the route-task hooks, and srooter all honor the same choice. Trivial asks stay on the cheap/local tier; real coding goes to your primary model (e.g. MiniMax-M2.5).


Telos: Testing Beyond TDD

Standard TDD tells you if your code passes tests. Telos tells you if your code fulfills its intent.

IFS (Intent Fidelity Scale) = F1 × F2 × F3

  F1  Conformance: passed / total tests (pytest / vitest)
  F2  Validation:  drift severity (Cortex drift_events)
  F3  Integrity:   IF-3 orphan symbols (no reason edges)
                   IF-4 empty contracts (no pre/post/invariants)
                   IF-6 stale reasons (proposed >7d, never fulfilled)
                   IF-7 scope sprawl (reason scopes >10 files)

A zero in any plane collapses IFS to zero. 100% test pass rate with severe architectural drift = score of 0. This is intentional. See the Telos RFC.
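
The collapse behaviour follows directly from the product. Here is a minimal sketch, assuming each plane has already been normalised to a score in [0, 1]; how Telos maps drift severity or integrity findings onto that range is not specified here, so the F2 and F3 values below are placeholders. The function names are hypothetical.

```python
def conformance(passed: int, total: int) -> float:
    """F1: fraction of tests that passed; an empty suite scores 0 (assumption)."""
    return passed / total if total else 0.0


def ifs(f1: float, f2: float, f3: float) -> float:
    """Intent Fidelity Scale: product of the three plane scores, each in [0, 1]."""
    for score in (f1, f2, f3):
        if not 0.0 <= score <= 1.0:
            raise ValueError("plane scores must lie in [0, 1]")
    return f1 * f2 * f3


# All tests pass, but validation is zero (severe drift): the product is zero.
print(ifs(conformance(120, 120), 0.0, 0.9))

# Moderate scores in two planes compound rather than average.
print(ifs(1.0, 0.5, 0.5))
```

Multiplying instead of averaging is what prevents a strong score in one plane from masking a failure in another.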


Repo Structure

.claude/
  skills/       # 67 skills — Python, TS, React, security, mobile, databases
  hooks/        # TDD enforcement, quality gates, Mnemos lifecycle
  rules/        # Conditional rules by file glob
  templates/    # settings.json, CLAUDE.md, ADR template, PR template

maggy/
  maggy/
    pipeline/   # Unified ChatPipeline orchestrator
    skills/     # Skill injection + YAML protocol engine
    api/        # REST API (chat, routing, plugins, pipeline logs)
    static/     # Web dashboard (vanilla JS, no build step)
    services/   # Routing, memory, execution, Mnemos

cortex-mcp/     # Code intelligence MCP server
  src/cortex/
    structure/  # AST extraction, edge types, complexity
    storage/    # SQLite graph store, FTS5 index

plugins/        # Drop-in plugins (build-in-public, telos, providers)

Tests

cd maggy && python3 -m pytest tests/ -x -q        # 900+ tests
cd cortex-mcp && python3 -m pytest tests/ -q       # 207 tests

What's New in v6.37

  • Skill Protocols — YAML intent-driven workflows. "Push to git" runs lint → test → commit → push automatically
  • Unified Pipeline — single ChatPipeline orchestrator with real-time streaming, fallback, per-request logging
  • Telos — intent-grounded testing with IFS scoring on every project open
  • Cortex MCP — modular edge extraction (Python AST, TypeScript, Git co-change). Elixir support
  • 13-Tier Routing — AGY, Gemini CLI, and Grok added to the routing ladder

See CHANGELOG.md for full history.


Docs

  • Getting Started: Installation, prerequisites, first session walkthrough
  • Architecture v5: System design, routing, dashboard
  • CLI Reference: REPL commands, slash commands, routing
  • Telos RFC: Intent-grounded testing spec
  • Cortex docs: Code intelligence, edge types, MCP tools
  • Cortex benchmarks: Performance vs codebase-memory-mcp
  • Changelog: Version history (current: v6.37.0)

Contributing

Skill PRs welcome. All skills run through the linter before merge:

PYTHONPATH=scripts python3 -m skill_lint --fail-on error skills/your-skill/

See CONTRIBUTING.md for the quality gate checklist.


License

MIT — See LICENSE


Need help scaling AI engineering in your org? LeanAI Ventures — Claude Code & MCP specialists