bkit
June 1, 2026 · View on GitHub
The only Claude Code plugin that verifies AI-generated code against its own design specs.
Three commands. Anyone — even someone vibe-coding for the first time — can ship robust, production-quality software. bkit turns Claude Code into a Context Engineering system: 44 skills, 34 specialist agents, 11 quality gates, and a memory that survives across sessions deliver the right context to the AI at the right moment, so you don't have to know prompts, commands, or PDCA to get high-quality results.
Requirement: bkit requires Claude Code v2.1.143 or later (the strict plugin-manifest path recognizes the official
displayNamefield only from v2.1.143). On older Claude Code you will seeValidation errors: Unrecognized key: "displayName"duringclaude plugin install. Runnpm install -g @anthropic-ai/claude-code@latestto upgrade, or seedocs/06-guide/cc-compatibility.guide.md.
Who bkit is for
| You are… | bkit gives you |
|---|---|
| 🌱 First-time vibe coder — you describe what you want and AI codes for you, but you don't yet know how to tell if the result is correct | A safety net. AI proposes, bkit measures the result against its own design spec, and auto-repairs the gap when it drifts. You can ship without becoming a senior engineer first. |
| 👤 Solo developer / indie builder | A team-in-a-box. /pdca team spawns 4–6 specialist agents in parallel — frontend, backend, QA, security — orchestrated by an AI tech lead. |
| 👥 Team lead planning a release | Sprint Management. /sprint master-plan splits your release into context-budgeted sprints so a single Claude Code session can finish each one without running out of memory, and resumes after any session interruption. |
| 🌐 Non-English speaker | 8-language auto-detection. Type "로그인 기능 만들어줘" or "作成新功能" and bkit picks the right command for you. |
What you'll achieve with bkit
The promise: anyone — including non-developers — can build robust, production-quality software by using bkit. bkit replaces the senior engineer's intuition with a workflow.
| Without bkit | With bkit |
|---|---|
| AI gives you plausible code; you have no way to know if it really matches what you asked for | gap-detector measures match rate between your design spec and the generated code. Below 90 % → bkit auto-repairs (up to 5 cycles). |
| You only spot drift at PR review — by then the cleanup is expensive | 11 Quality Gates halt the workflow before drift compounds (match rate, critical issues, convention, test coverage, security, dataFlow integrity, …) |
| Long projects exceed Claude Code's session window and you lose context | bkit splits the project into context-budgeted Sprints (≤ 75 K tokens each); memory + Task Management lets any session resume where the last one stopped |
| You don't know which command, agent, or skill to use | 8-language auto-trigger + intent-router pick the right skill / agent automatically. You just describe what you want. |
| AI ships code, you ship hope; nobody documents what changed | Docs = Code philosophy: every feature produces a PRD + plan + design + analysis + completion report. The history is the audit trail. |
The 4-step experience
This is the canonical workflow when you use bkit. You describe a release; bkit handles the rest.
flowchart TB
You(["You: describe the release"])
You --> S1["Step 1<br/>/sprint master-plan my-release<br/>--features auth, billing, reports"]
S1 --> Auto1["bkit auto-action<br/>• sprint-master-planner agent investigates code + web in depth<br/>• Splits features into context-budgeted Sprints (≤ 75K tokens each)<br/>• Writes the master plan with Context Anchor"]
Auto1 --> S2["Step 2<br/>You approve the plan"]
S2 --> Auto2["bkit auto-action<br/>• Every sprint registered in Task Management<br/>• Memory saved — survives session clear"]
Auto2 --> S3["Step 3<br/>/sprint start sprint-1"]
S3 --> Auto3["bkit auto-action — full PDCA per feature:<br/>PRD/Plan → Design → Do → Iterate (target 100%, gated) →<br/>QA (gated) → Report. /pdca pm, /pdca team, /pdca qa as needed."]
Auto3 --> Gate{"All quality<br/>gates pass?"}
Gate -- "no, auto-fix" --> Repair["pdca-iterator self-repair<br/>(max 5 cycles)"]
Repair --> Gate
Gate -- "yes" --> Done(["Release-ready code + docs"])
Ctrl["Step 4<br/>/control level 0..4"] -.->|"set how much<br/>runs unattended"| Auto1
Ctrl -.->|"applies"| Auto3
style You fill:#e3f2fd
style S1 fill:#fff3e0
style S2 fill:#fff8e1
style S3 fill:#fff3e0
style Auto3 fill:#fce4ec
style Repair fill:#ffe0b2
style Done fill:#c8e6c9
style Ctrl fill:#f3e5f5
Step 1 — Plan the release in depth
You type: /sprint master-plan my-release --features auth, billing, reports.
bkit's sprint-master-planner agent investigates carefully — not quickly. Depending on what you're building, it reads your existing code base in depth or researches the web, then writes a master plan. Critically, it splits your features into context-budgeted Sprints: each sprint is sized so a single Claude Code session can finish it without overflowing the context window (the default is ≤ 75 K tokens per sprint, dependency-aware via Kahn topological sort + greedy bin-packing).
You can also call specialist agents directly when you want more depth: /pdca pm runs 4 product-management agents in parallel with 43 frameworks; /pdca team spawns a multi-specialist implementation team; /pdca qa runs a 5-agent QA team.
Step 2 — Approve, register, remember
You read the master plan. When you approve, bkit registers every sprint in its Task Management System with the right dependency order (Kahn-topologically sorted). It also writes to memory.
This means: if your laptop crashes, your session clears, or you start a new Claude Code session next week — bkit picks up exactly where you stopped. The plan and progress are durable.
Step 3 — Auto-run each sprint through PDCA
You type: /sprint start sprint-1.
bkit runs the 8-phase Sprint lifecycle (prd → plan → design → do → iterate → qa → report → archived). Inside the do phase, bkit runs the full PDCA loop once per feature:
| Phase | What runs | Output |
|---|---|---|
| PRD / Plan | pm-lead orchestrates 4 PM agents (discovery, strategy, research, prd) with 43 frameworks | Comprehensive PRD + plan with Context Anchor |
| Design | cto-lead proposes 3 architecture options; you pick one (the only required user input) | Detailed design doc |
| Do | /pdca team spawns 4–6 specialist agents in parallel (developer, qa, frontend, backend, security, architect) | Working code |
| Iterate | Target 100 % match between design and code. gap-detector measures; pdca-iterator self-repairs until quality gate M1 (matchRate ≥ 90 %) passes. Max 5 cycles. | Repaired code + iteration report |
| QA | qa-lead runs 4 QA agents through L1–L5 tests + dataFlow integrity (S1 gate, 7-layer hop check) | QA report |
| Report | report-generator writes the completion report | Final report with KPI + lessons learned |
Every phase is gated by quality thresholds. If anything fails — match rate too low, critical issue found, dataFlow broken — bkit pauses and tells you why. You don't have to remember to verify; verification is automatic.
Step 4 — Govern with one dial
/control level N is the single autonomy knob. It decides how far the orchestrator runs before stopping. The same dial controls both Sprint and PDCA — no second knob to manage.
| Level | What it means |
|---|---|
| L0 Manual | Ask me at every phase. Best for your first sprint — inspect each output. |
| L2 Semi-Auto (default) | Plan/Design/Do auto; ask me on QA and Report. |
| L4 Full-Auto | Wake me when the sprint is done. Pauses only on a quality gate failure or one of 4 auto-pause triggers. |
L1 (Guided) and L3 (Auto) sit between. bkit also computes a Trust Score (0–100) from your track record and can recommend a level — but you stay in charge.
How the bkit author actually uses bkit
The 4-step experience above is the canonical flow. Here's the concrete recipe the person who built bkit uses every day — two steps — so you can copy it without needing to understand the full machinery underneath.
Step A — Use Claude Code as a thinking amplifier before the master plan
The master plan is the single most important artifact in a sprint. Garbage in, garbage out. So don't write it alone, and don't let bkit write it cold either.
- Open a fresh Claude Code session.
- Talk through the release in plain language — what's the goal, who is it for, what could go wrong, what depends on what, what's still unknown? Let Claude Code push back.
- Have it investigate — read your repo, search the web, surface edge cases you hadn't considered.
- Then run
/sprint master-plan my-release --features .... By the time you do, your context is already saturated; the master plan reflects what you really meant, not a generic template.
This is what Context Engineering looks like in practice: you don't tune the perfect prompt — you saturate the context first, then let bkit synthesise.
Step B — Hand the rest to Trust Level 4
Once you approve the master plan, the keyboard goes quiet until the final report lands. The bkit author picks one of two patterns based on release size:
| Release size | What you type | What happens |
|---|---|---|
| Small–medium — a few related features, single context budget | /control level 4, then /sprint start <project> once | bkit runs every sprint and every PDCA phase straight through to archived. You read the final report. |
| Large — many sprints, large total token budget, or cross-sprint dependencies | /control level 4, then /sprint start <sprint-id> per sprint in dependency order | Each sprint runs full-auto. You skim the sprint report before launching the next sprint — a natural checkpoint between context windows. |
Trust Level 4 is not "no safety". The 11 quality gates and 4 auto-pause triggers still fire. bkit halts and asks for you only when a measurable rule fails. The dial removes the unnecessary pauses, not the necessary ones.
The outcome, repeatable: AI plans, designs, codes, self-verifies, self-repairs, tests, and writes the completion report. You provide intent — and the final ship/no-ship decision. Nothing else.
The three commands
| Command | When to use | What it spawns | Output |
|---|---|---|---|
/sprint | Multi-feature release (quarter, milestone, multiple linked features) | sprint-master-planner, sprint-orchestrator, sprint-qa-flow, sprint-report-writer | Master plan + 8-phase per sprint + cumulative report |
/pdca | A single feature (or runs inside a sprint per feature) | pm-lead, cto-lead, gap-detector, pdca-iterator, qa-lead, report-generator (any of 34 agents) | PRD + plan + design + code + analysis + report |
/control | Anytime — set autonomy | — | Updates Trust Level scope; affects both /sprint and /pdca |
How bkit closes the AI-coding gap — Context Engineering
bkit is more than commands. It is a Context Engineering system that solves the root cause of AI-coding failures: the AI doesn't have the right context.
| The AI coding problem | bkit's Context Engineering answer |
|---|---|
| AI hallucinates because it doesn't know your conventions | 44 skills (PDCA, Sprint, PM frameworks, …) auto-injected based on intent |
| AI loses focus as the session grows | Memory + Task Management resumes across sessions; Sprints are context-budgeted |
| AI ships code that drifts from the spec | 11 Quality Gates + gap-detector measurement + pdca-iterator auto-repair |
| You only catch bugs at PR review | Phase-by-phase gating: drift is caught at every transition, not at the end |
| You need to remember the right command | 8-language auto-trigger + intent-router; type "login 만들어줘" or "build login" and bkit picks the path |
| AI sessions are ephemeral; no audit trail | Audit log + Token Ledger + Docs = Code philosophy: every decision is on disk |
bkit's three philosophies (from bkit-system/philosophy/core-mission.md)
| Principle | What it means for you |
|---|---|
| Automation First | You don't need to know PDCA, Sprint, or any command. Type what you want; bkit picks the right workflow. The state machine + workflow engine drive the rest. |
| No Guessing | If bkit isn't sure, it checks the docs. If still unsure, it asks you. It never makes up an answer. gap-detector, design-validator, and 11 quality gates enforce this. |
| Docs = Code | Every feature produces docs (PRD + plan + design + analysis + report). The docs are the contract; bkit verifies that the code matches. scripts/docs-code-sync.js enforces 0 drift in CI. |
Quick start
# 1. Install (one time)
claude plugin install bkit
# 2. Enable parallel team execution (optional, recommended)
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# 3. Your very first run — single feature
/pdca pm my-feature # Describes what you want; bkit handles the rest
# 4. When you're ready for a multi-feature release
/sprint master-plan my-release --name "Q2 Launch" --features auth, billing, reports
# (You approve the plan)
/sprint start my-release-s1
Recommended Claude Code runtime: v2.1.150 (conservative, stable) or v2.1.159 (balanced, 112 consecutive compatible). Minimum v2.1.78.
Quality gates — the safety net explained
A "quality gate" is a hard stop that won't let the workflow advance until a measurable condition is true. bkit ships 11 of them. The ones that matter most for new users:
| Gate | What it measures | If it fails |
|---|---|---|
| M1 matchRate | How much of your design actually appears in the code, 0–100 % | Below 90 % → pdca-iterator automatically rewrites the code (up to 5 cycles) |
| M3 critical issues | Security or correctness bugs flagged by code-analyzer | Any critical → workflow pauses, you decide |
| S1 dataFlow integrity | 7-layer check: UI → Client → API → Validation → DB → Response → Client → UI | Below 85 % → 7 hops re-verified one by one |
Full M1–M10 + S1 catalog in README-FULL.md §5.
Architecture at a glance
44 skills · 34 agents · 21 hook events / 24 blocks · 2 MCP servers (19 tools) · 190 lib modules across 22 subdirs · 61 scripts · 40 templates · 118+ test files / 4,000+ test cases. Clean Architecture 4-Layer · Defense-in-Depth 4-Layer · Invocation Contract L1–L5 (226 CI-gated assertions).
Full architecture deep-dive: README-FULL.md §9.
Documentation
| Path | What's there |
|---|---|
| README-FULL.md | Full command reference, deep workflow internals, agent teams, architecture, Skill Evals |
| CHANGELOG.md | Release history (single source of truth) |
| CUSTOMIZATION-GUIDE.md | Override any bkit component in your .claude/ directory |
| AI-NATIVE-DEVELOPMENT.md | The 6 AI-Native principles and how bkit implements them |
bkit-system/philosophy/ | Core mission, Context Engineering, PDCA methodology, AI-Native principles |
docs/06-guide/sprint-management.guide.md | Sprint Management deep-dive (English) |
docs/06-guide/sprint-migration.guide.md | PDCA ↔ Sprint migration mapping (English) |
🌟 Real User Hall of Fame
bkit's quality is measured by how well it responds to real users running
bkit in production on their own projects. This section recognizes
external dogfooders whose precise bug reports + reproduction scripts
have been absorbed directly into bkit's regression test suite
(test/e2e/external-dogfood/).
v2.1.19 (2026-05-30 — first entry)
- @pruge (James Kim) —
dandi-village-ledgerproject. 10 GitHub issues over 1.5 days (#92–#107) driving bkit v2.1.17, v2.1.18 closes + the entire v2.1.19 Quality Maturation Sprint. 5 reproduction scenarios absorbed as E2E tests attest/e2e/external-dogfood/dandi-*.test.js. Seedocs/external-dogfooders/pruge.mdfor the full contribution archive. Thank you for trusting bkit with your production sprint. 🙏
v2.1.20 (2026-05-26 — second entry, first-follower effect validated)
- @bj (정병진) —
bkit v2.1.14 install incident (2026-05-26,
Validation errors: : Unrecognized key: "displayName"). Precise error message + cache path + Cursor IDE environment metadata sharing drove the entire v2.1.20 Marketplace Recovery Sprint (14 features / 3 sub-sprints / 3 new ENH 321/322/323 / 1 new ADR 0011 Plugin Manifest Schema Compliance Policy). Reproduction absorbed attest/e2e/external-dogfood/cc-min-version.test.js(5 TC, Lifecycle Stage 4 Regression Lock achieved). Triggered ADR 0006 § Empirical Validation Gate recovery (~30-day wire delay closed). Seedocs/external-dogfooders/bj.mdfor the full contribution archive. Thank you for sharing the precise error message that scoped the entire sprint correctly. 🙏
🚀 bkit Early Adopter Program
Running bkit on a non-trivial production project and willing to file detailed bug reports with reproductions makes you part of bkit's quality system, not just a bug reporter.
Established v2.1.19 (master plan §15.4 DA-1~DA-4, ENH-318 차별화 7/7).
Benefits:
- 🏆 Public recognition in this README + dedicated archive
- 🔒 Your reproduction scripts become permanent E2E regression tests
- 📊 Your activity directly powers bkit's Trust Score
(
externalDogfoodFeedbackResponseRatecomponent, weight 0.05) - 📝 CHANGELOG attribution on every release where your scenarios were absorbed
- 🤝 Direct line to bkit maintainers — priority issue triage, reproduction-script-first response
How to join: file your first detailed issue at
bkit-claude-code/issues
with bkit version, reproduction steps, expected vs actual behavior,
and file:line references. See
docs/external-dogfooders/_README.md
for the full 5-stage User-Feedback Lifecycle and program structure.
DA-4 acquisition goal: by v2.1.20 (30 days post-v2.1.19 GA), measure dogfooder population. DA-4 status (v2.1.20): N=2 confirmed (@pruge + @bj) — first-follower effect validated. v2.1.21+ continues active outreach (CC marketplace narrative, community engagement) to grow N≥3.
License
Apache 2.0 — see LICENSE and NOTICE. POPUP STUDIO PTE. LTD. · kay@popupstudio.ai