bkit
June 1, 2026
The only Claude Code plugin that verifies AI-generated code against its own design specs.
The 5-minute version is in README.md. This file is the deep one. It exists for people who want to know exactly what /sprint, /pdca, and /control do, which agent runs in which phase, how the 11 quality gates measure things, and how the architecture stops AI from drifting. Release history is in CHANGELOG.md and is not duplicated here.
Requirement: bkit requires Claude Code v2.1.143 or later (the strict plugin-manifest path recognizes the official displayName field only from v2.1.143). On older Claude Code you will see Validation errors: Unrecognized key: "displayName" during claude plugin install. Run npm install -g @anthropic-ai/claude-code@latest to upgrade, or see docs/06-guide/cc-compatibility.guide.md.
Who this file is for
| You are… | Read this file because… |
|---|---|
| 🌱 Vibe coder / non-developer who saw the README and asked "but how does it actually work?" | Sections 0, 4.2, and 7.3 explain the moving parts in plain language with worked examples. You will not need to know Clean Architecture to follow them. |
| 👤 Solo dev evaluating bkit before installing | Sections 2, 3, 5, 8 cover the commands, the quality-gate thresholds, and the agent rosters. |
| 👥 Team lead thinking about adoption | Sections 6, 7, 9 cover Trust Level governance, Sprint planning, and the 4-Layer architecture / Invocation Contract that keeps drift out of the codebase. |
| 🛠️ Plugin author who wants to customise bkit | Section 11 covers the project-local override pattern; sections 9 and 10 cover the contract surface. |
What you'll achieve by working through this file
The bkit promise: anyone can ship robust, production-quality software with AI — even if you've never written a unit test or seen a CI pipeline. This document explains the underlying mechanics so you can trust the workflow when it runs unattended at Trust Level 4.
| If you understand… | …you can rely on |
|---|---|
| Section 4 — Workflow Internals | bkit auto-running every PDCA phase. You'll know what happens in pm, plan, design, do, check, act, qa, report. |
| Section 5 — Quality Gates | bkit pausing before drift compounds. You'll know what M1, M3, M5, S1 measure and what bkit does when they fail. |
| Section 6 — Trust Level | Setting the autonomy dial confidently. You'll know exactly what L2 and L4 do differently. |
| Section 7 — Sprint Management | Releases that span sessions and weeks. You'll know how the Context Sizer keeps each sprint within a single Claude session window. |
| Section 8 — Agent Teams | Knowing which specialist to call. You'll know why /pdca pm, /pdca team, /pdca qa spawn different rosters. |
Table of Contents
- Plain-language Glossary
- Design Philosophy
- The Three Commands
- Command Cheat Sheet
- Workflow Internals
- Quality Gates & Self-Repair
- Trust Level & Control
- Sprint Management Deep-Dive
- Agent Teams
- Architecture
- Skill Evals
- Installation & Customization
- Requirements
- Language Support
- License & Contributing
0. Plain-language Glossary
The terms used in this file, explained for someone who is new to AI-coding.
| Term | What it really means |
|---|---|
| Claude Code | The AI coding tool you talk to. Think of it as a coworker who lives in your terminal. |
| Plugin | Extra capabilities you bolt on to Claude Code. bkit is a plugin. |
| bkit | The plugin you're reading about. It adds 3 commands (/sprint, /pdca, /control), 44 skills, 34 specialist agents, 11 quality checks. |
| Skill | A bundle of instructions Claude Code reads to remember "how do I run a PDCA cycle?" or "how do I plan a sprint?" When you type /pdca, a skill activates. |
| Agent | A specialist version of Claude Code that knows one job very well — e.g. frontend-architect, qa-lead, gap-detector. The agent's job is in its name. |
| Hook | A piece of code that runs around AI actions. bkit's hooks intercept dangerous actions (deleting files, leaking secrets), log every step, and inject context. There are 21 hook events Claude Code fires. |
| Context | Everything the AI knows at the moment it decides what to write — your prompt, the file it just read, the rules it was given, its memory. AI quality is mostly about getting the right context to the AI at the right moment. |
| Context Engineering | A discipline that says: don't try to write the perfect prompt — build a system that gives the AI the right context every time. bkit is a Context Engineering system. |
| PDCA | Plan → Do → Check → Act. A 70-year-old continuous improvement loop. bkit's 9-phase version is pm → plan → design → do → check → act → qa → report → archive. |
| Sprint | A container that groups multiple features under a shared budget and timeline. Each sprint runs through its own 8 phases. A sprint can contain many features, each going through PDCA. |
| Match rate | A percentage measuring how much of your design spec actually appears in the generated code. 90 % means the AI did 90 % of what was specified. bkit auto-repairs anything below that. |
| Quality gate | A hard stop that won't let the workflow advance until a measurable rule is satisfied. bkit has 11 of them. |
| Trust Level (L0–L4) | The dial that decides how far the AI runs unattended. L0 = "ask me at every step", L4 = "wake me when it's done". |
| Auto-pause trigger | A condition that pauses an automated run so a human can intervene. bkit has 4: quality gate fail, iteration exhausted, budget exceeded, phase timeout. |
| Audit log | A permanent record of every important action bkit took, with sensitive data scrubbed out. Lives in .bkit/runtime/audit-log.ndjson. |
| MCP | Model Context Protocol — Anthropic's standard way to plug data sources into Claude Code. bkit ships 2 MCP servers (bkit-pdca, bkit-analysis) exposing 19 tools. |
| Docs = Code | bkit's principle that says "every feature must produce design docs, and the code must match those docs." A CI gate enforces 0 drift. |
1. Design Philosophy
bkit is not a productivity hack. It brings engineering discipline to AI-native development.
The software industry refined how humans write code over decades — version control, code review, CI/CD, testing pyramids. When AI enters the loop, most of that discipline evaporates: developers prompt, accept, ship. Documentation becomes an afterthought. Quality becomes luck.
bkit exists because AI-assisted development deserves the same rigor as traditional engineering.
1.1 What we optimise for
| We optimise for | Over | Concretely |
|---|---|---|
| Process | Output | One feature through proper planning + design + implementation + verification beats ten hacked-together features. The PDCA cycle is the product. |
| Verification | Trust | AI generates plausible code. Plausible is not correct. Every implementation goes through gap analysis. Below 90 % match, the system iterates. We do not ship hope. |
| Context | Prompts | A clever prompt helps once. A systematic context system helps every time. 44 skills + 34 agents + 190 lib modules exist so the AI receives the right context at the right moment. |
| Constraints | Features | Three project levels, not infinite configuration. Fixed 9-phase PDCA and 8-phase Sprint, not a customizable workflow builder. Opinionated defaults eliminate decision fatigue. |
"We do not offer a hundred features. We engineer each one through proper design and verification. That is the difference between a tool and a discipline."
1.2 The three core philosophies (the contract with you)
These come from bkit-system/philosophy/core-mission.md. Every line of bkit code is judged against them.
| # | Principle | What it means for you, the user |
|---|---|---|
| 1 | Automation First | You don't need to memorise PDCA. You don't need to know which skill to call. Describe what you want; the intent-router (lib/orchestrator/intent-router.js) picks the right skill or agent. The state machine drives the rest. Manual is the fallback, not the default. |
| 2 | No Guessing | bkit refuses to fabricate. If gap-detector is unsure, it reads the spec again. If still unsure, it asks you via AskUserQuestion — it does not invent. 11 quality gates + design-validator + code-analyzer enforce this. |
| 3 | Docs = Code | Every feature produces docs: PRD + plan + design + analysis + completion report. The docs are the contract; the code must match. scripts/docs-code-sync.js runs in CI and fails the build if the doc-side counts drift from the code-side counts. |
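The count check in row 3 can be pictured as a comparison between numbers the docs claim and numbers found in the code. The sketch below is illustrative only and is written in Python rather than the JavaScript of scripts/docs-code-sync.js; the regex, the `claimed_counts` and `find_drift` helpers, and the sample counts are assumptions, not bkit's actual implementation.

```python
import re

# Hypothetical pattern: a number followed by a component name, e.g. "44 skills".
COUNT_PATTERN = re.compile(r"(\d+)\s+(skills|agents|quality gates)")


def claimed_counts(doc_text: str) -> dict:
    """Collect the first count the documentation claims for each component."""
    counts = {}
    for number, name in COUNT_PATTERN.findall(doc_text):
        counts.setdefault(name, int(number))
    return counts


def find_drift(doc_text: str, actual: dict) -> list:
    """Return one message per component whose documented count differs from the code."""
    claimed = claimed_counts(doc_text)
    return [
        f"{name}: docs say {claimed[name]}, code has {actual[name]}"
        for name in sorted(claimed)
        if name in actual and claimed[name] != actual[name]
    ]


doc = "bkit ships 44 skills, 34 agents, and 11 quality gates."
in_sync = find_drift(doc, {"skills": 44, "agents": 34, "quality gates": 11})
drifted = find_drift(doc, {"skills": 45, "agents": 34, "quality gates": 11})
print(in_sync)  # empty list: a CI step would pass
print(drifted)  # one message: a CI step would fail the build
```

A CI job built on this idea would exit non-zero whenever `find_drift` returns anything, which is the "0 drift" behaviour the principle describes.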
1.3 Context Engineering — the deeper "why"
Most AI-coding tools focus on prompts. bkit focuses on context. The distinction is the entire reason bkit exists.
A prompt is a single message you send. Context is the entire information environment the AI is working inside — your conventions, your prior decisions, your design spec, the rules you set, the memory of last week's session, the audit trail of what changed.
"bkit is a practical implementation of Context Engineering — a systematic discipline for building, maintaining, and verifying the right context for AI-assisted development." (bkit-system/philosophy/context-engineering.md)
| Symptom of bad context | What goes wrong | How bkit fixes it |
|---|---|---|
| AI hallucinates names, paths, or APIs that don't exist | The AI has no real grounding in your code | Skills auto-load the right grounding (PDCA rules, Sprint state, your style guide). 44 skills, 8-language auto-trigger. |
| AI is confident but wrong | The AI doesn't know what "correct" means in your project | gap-detector measures match rate against the design spec; code-analyzer checks quality scores. Wrong is detected, not assumed away. |
| AI loses focus halfway through | Context window overflows in long sessions | Sprint Management splits work into context-budgeted chunks (≤ 75 K tokens each). Memory + Task Management survive session clears. |
| You ship and then discover the bug | No verification was ever run | 11 quality gates run between phases. Failing a gate auto-pauses. |
| You ship and the new dev can't tell what changed | No audit trail | Audit log + Token Ledger + Docs = Code keep a permanent record. |
1.4 Controllable AI — the principles behind /control
These come from AI-NATIVE-DEVELOPMENT.md. They are why bkit ships a Trust Level dial instead of "AI off / AI on".
| Principle | What it gives you |
|---|---|
| Safe defaults | Out of the box, bkit asks before doing anything destructive. Trust Level starts at L2 (Semi-Auto). |
| Progressive trust | As your matchRate track record improves, bkit's Trust Score can recommend a higher level — but never silently. |
| Full visibility | Every phase emits an audit entry. /sprint status, /sprint watch, /pdca status show you the current state at any moment. |
| Always interruptible | Ctrl+C cancels. /sprint pause halts. pdca-iterator auto-stops after 5 cycles. 4 auto-pause triggers guarantee the run never gets away from you. |
2. The Three Commands
Everything else in bkit — 44 skills, 34 agents, 21 hooks, 11 quality gates, 226+ contract assertions — exists to make these three commands work reliably.
| Command | One-line purpose | When you use it |
|---|---|---|
| /sprint | Group multiple features into a release container, plan them, and run them. | Quarter launch, milestone, 2+ linked features sharing scope/budget/timeline |
| /pdca | Drive a single feature from PRD to release-ready report through 9 phases. | A single feature, or inside a sprint (the orchestrator calls /pdca per feature automatically) |
| /control | One dial setting how much of /sprint and /pdca runs unattended. | Anytime. The setting persists. |
flowchart TB
You(["You"])
You --> SM["/sprint master-plan my-release<br/>--features auth,billing,reports"]
SM --> Auto1["Master plan + task registration"]
Auto1 --> SS["/sprint start sprint-1"]
SS --> Auto2["Sprint 8-phase. Inside 'do':<br/>PDCA 9-phase per feature"]
Auto2 --> Gate{"matchRate ≥ 90%?"}
Gate -- "no — auto-fix" --> Iter["pdca-iterator (max 5)"]
Iter --> Gate
Gate -- "yes" --> Done(["Release-ready"])
Ctrl["/control level 0..4"] -.->|"scopes auto-run"| Auto1
Ctrl -.->|"applies"| Auto2
style You fill:#e3f2fd
style SM fill:#fff3e0
style SS fill:#fff3e0
style Auto2 fill:#fce4ec
style Iter fill:#ffe0b2
style Done fill:#c8e6c9
style Ctrl fill:#f3e5f5
The sprint user journey, step by step
| Step | What you type | What bkit does | Output |
|---|---|---|---|
| 1. Plan | /sprint master-plan my-release --name "Q2 Launch" --features auth,billing,reports | sprint-master-planner writes a Context-Anchor-driven master plan. The Context Sizer (Kahn topological + greedy bin-packing) splits features into ≤ 75 K-token sprints honoring dependencies. | docs/01-plan/features/my-release.master-plan.md + per-sprint prd / plan / design templates; .bkit/state/master-plans/my-release.json; audit entry master_plan_created |
| 2. Register | (automatic in Step 1) | Each sprint in plan.sprints[] becomes a task via TaskCreate. Cross-sprint dependsOn becomes task blockedBy. | One task per sprint, Kahn-ordered. Visible via /sprint list or TaskList |
| 3. Execute | /sprint start my-release-s1 | sprint-orchestrator advances the 8-phase sprint. Inside do, PDCA 9-phase runs once per feature — pm-lead PRD, cto-lead team spawn, gap-detector measure, pdca-iterator repair, qa-lead test, report-generator summarize. | Per-feature artifacts under docs/00-pm/..., docs/01-plan/features/..., docs/02-design/features/..., docs/04-report/features/...; sprint state .bkit/state/sprints/my-release-s1.json |
| 4. Govern | /control level 0..4 (anytime) | The dial scopes how far both Sprint and PDCA phases auto-advance before stopping. Trust Score (0–100) can also recommend a level from your track record. | .bkit/state/trust-profile.json; effective scope mirrored into lib/control/automation-controller.js:SPRINT_AUTORUN_SCOPE (L3 contract test SC-07 enforces the 1:1 mirror) |
Single feature shortcut: skip steps 1–2 and run /pdca pm <feature> directly. Step 4 still applies.
2.1 Why each step exists — the worked detail
The 4 steps map directly to the user experience the bkit author had in mind. Each one is a deliberate answer to a specific failure mode of AI-assisted development.
Step 1 — Plan the release in depth, not in haste.
When you type /sprint master-plan my-release --features auth,billing,reports, bkit does not rush to a plan. The sprint-master-planner agent runs deliberately:
- If your repo is non-trivial, it explores your existing code (Task(Explore)) before assuming anything about your conventions.
- If your feature involves a domain it doesn't already know (e.g., a payment provider you haven't integrated before), it does web research first.
- Then it splits your features into context-budgeted Sprints using:
  - Kahn topological sort — to respect "billing depends on auth" type relationships.
  - Greedy bin-packing — to keep each sprint ≤ 75 K tokens, so a single Claude Code session can finish the sprint without running out of context.
- Finally it writes a master plan with a Context Anchor (WHY / WHO / WHAT / RISK / SUCCESS / SCOPE) that every later phase reads.
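To make the splitting step concrete, here is a minimal Python sketch of a Kahn topological sort followed by a greedy pack under the 75 K-token ceiling. It is not the Context Sizer's code: the function names, the per-feature token estimates, and the choice of a "next-fit" greedy strategy (open a new sprint as soon as the next feature would overflow the current one) are all assumptions made for illustration.

```python
from collections import deque

TOKEN_BUDGET = 75_000  # per-sprint ceiling stated in the text


def kahn_order(deps: dict) -> list:
    """Kahn's algorithm: emit a feature only after everything it depends on."""
    indegree = {feature: len(requires) for feature, requires in deps.items()}
    dependents = {feature: [] for feature in deps}
    for feature, requires in deps.items():
        for required in requires:
            dependents[required].append(feature)
    ready = deque(sorted(f for f, n in indegree.items() if n == 0))
    order = []
    while ready:
        feature = ready.popleft()
        order.append(feature)
        for dependent in dependents[feature]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                ready.append(dependent)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


def pack_sprints(deps: dict, cost: dict, budget: int = TOKEN_BUDGET) -> list:
    """Greedy next-fit packing in dependency order, so no feature precedes its prerequisites."""
    sprints = [[]]
    used = 0
    for feature in kahn_order(deps):
        if cost[feature] > budget:
            raise ValueError(f"{feature} alone exceeds the sprint budget")
        if sprints[-1] and used + cost[feature] > budget:
            sprints.append([])  # current sprint is full: open the next one
            used = 0
        sprints[-1].append(feature)
        used += cost[feature]
    return sprints


# Hypothetical dependency graph and token estimates.
deps = {"auth": [], "billing": ["auth"], "reports": ["billing"]}
cost = {"auth": 40_000, "billing": 30_000, "reports": 50_000}
plan = pack_sprints(deps, cost)
print(plan)  # [['auth', 'billing'], ['reports']]
```

Because features are packed in topological order, a feature's prerequisites always land in the same sprint or an earlier one, which is what lets cross-sprint dependencies become a simple ordering between sprints.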
You can also call specialist agents directly when you want more depth for a particular feature:
| Command | What it spawns | Output |
|---|---|---|
| /pdca pm <feature> | 4 PM agents in parallel (pm-discovery + pm-strategy + pm-research + pm-prd) using 43 product-management frameworks (JTBD, Lean Canvas, SWOT, PESTLE, Porter's, Pre-mortem, Personas, TAM/SAM/SOM, …) | A comprehensive PRD at docs/00-pm/<feature>.prd.md |
| /pdca team <feature> | A multi-specialist implementation team led by cto-lead — 4–6 agents in parallel (developer · qa · frontend · backend · security · architect) | Code + reviews from multiple perspectives |
| /pdca qa <feature> | 5-agent QA team led by qa-lead — test-planner · test-generator · debug-analyst · qa-monitor with Zero Script QA (Docker log analysis) | Full L1–L5 test plan, generated tests, runtime verification |
Step 2 — Approve, register, remember.
After you approve the master plan, bkit takes three durable actions:
- Task Management registration — every sprint becomes a task via the system's TaskCreate tool. The sprint dependency order becomes task blockedBy order. You can see them via /sprint list or via Claude Code's TaskList.
- Memory persistence — the sprint roster is written to .bkit/state/memory.json. If your laptop dies, your session clears, or you start a new Claude Code session next month, the plan is still there.
- Audit entry — a master_plan_created action lands in .bkit/runtime/audit-log.ndjson.
The combination means bkit picks up exactly where it stopped, every single time.
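The audit entry is the easiest of the three to picture, because NDJSON is simply one JSON object per line, appended and never rewritten. The Python sketch below shows that append-only pattern under stated assumptions: the entry fields (`timestamp`, `action`, `details`) and the `sprint_started` action name are hypothetical, and only the file path and the `master_plan_created` action come from this document.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_audit_entry(log_path: Path, action: str, details: dict) -> dict:
    """Append one JSON object as a single line; earlier lines are left untouched."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "details": details,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


def read_audit_log(log_path: Path) -> list:
    """Parse the log back, one entry per non-empty line."""
    with log_path.open(encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]


# Demo in a temporary directory rather than a real project checkout.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".bkit" / "runtime" / "audit-log.ndjson"
    append_audit_entry(path, "master_plan_created", {"project": "my-release"})
    append_audit_entry(path, "sprint_started", {"sprint": "my-release-s1"})
    actions = [entry["action"] for entry in read_audit_log(path)]
print(actions)  # ['master_plan_created', 'sprint_started']
```

An append-only line format is a natural fit for a record that has to survive crashes and session clears: a partially written last line can be discarded without invalidating anything before it.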
Step 3 — Auto-run each sprint through PDCA.
/sprint start sprint-1 advances the 8-phase sprint (prd → plan → design → do → iterate → qa → report → archived). Inside the do phase, the orchestrator runs the full PDCA 9-phase loop once per feature. The default targets are aggressive:
- iterate phase targets matchRate = 100 % (will accept ≥ 90 %, controlled by gate M1)
- qa phase requires all 11 gates to pass, including S1 dataFlow integrity ≥ 85 %
- Below threshold → pdca-iterator auto-fires, up to 5 self-repair cycles
Critically, every transition is gated (see §5). The workflow can't accidentally advance past a failure.
Step 4 — Govern with one dial.
/control level N is the autonomy knob. The same setting governs both Sprint and PDCA — there's no second knob to forget. The dial maps to a stopAfter phase (see §6). Trust Score (0–100) can recommend a level from your track record, but you stay in charge: autoEscalation / autoDowngrade flags in bkit.config.json decide whether bkit may move the dial on its own.
Hook-driven invisible execution: while you read the master plan, Claude Code's 21 hook events are quietly firing — PreToolUse blocks unsafe operations, PostToolUse logs every action, SessionStart restores memory, Stop writes the closing audit entry. You never invoke a hook directly; bkit attaches them automatically through hooks/hooks.json (24 blocks across 21 events). See §4.3 for the lifecycle map.
8-language auto-trigger: skills and agents declare keywords in 8 languages (EN, KO, JA, ZH, ES, FR, DE, IT). If you type "로그인 기능 만들어줘", "创建一个登录功能", or "build a login feature", bkit's intent-router maps each of them to the same skill. You never need to know the English command name.
3. Command Cheat Sheet
Sprint (16 sub-actions)
| Sub-action | Purpose |
|---|---|
| init <id> | Create a sprint manually (without a master plan) |
| master-plan <project> --features ... | Auto-write the master plan + register every sprint as a task |
| start <id> | Run the sprint up to the Trust Level scope |
| status <id> | Current state + triggers + matrix snapshot |
| list | All sprints with phase + status |
| watch <id> | Live dashboard, ticks every 30 s |
| phase <id> --to <phase> | Manual phase transition |
| iterate <id> | matchRate-100 % loop (max 5) |
| qa <id> | 7-Layer S1 dataFlow integrity check |
| report <id> | Cumulative KPI + lessons learned |
| archive <id> | Move to terminal state (forward-only) |
| pause <id> / resume <id> | Stop / restart auto-run |
| fork <id> --new <newId> | Carry incomplete features into a new sprint |
| feature <id> --action list/add/remove --feature <name> | Manage features inside the sprint |
| help | Help text |
PDCA (single feature, 9 phases + utilities)
| Sub-action | Purpose | Spawned agents |
|---|---|---|
| pm <feat> | 4 PM agents in parallel → PRD with 43 frameworks | pm-lead · pm-discovery · pm-strategy · pm-research · pm-prd |
| plan <feat> | Plan with Context Anchor + Module Map | product-manager |
| design <feat> | 3 architecture options (Minimal / Clean / Pragmatic) | cto-lead · frontend-architect · security-architect |
| do <feat> | Implementation (single-agent mode) | bkend-expert · frontend-architect |
| team <feat> | 4–6 specialists in parallel (recommended for do) | cto-lead orchestrates developer · qa · frontend · security · architect |
| check <feat> | Design ↔ code gap analysis | gap-detector |
| iterate <feat> | Auto-fix sub-90 % match | pdca-iterator |
| qa <feat> | L1–L5 test execution | qa-lead · qa-test-planner · qa-test-generator · qa-debug-analyst · qa-monitor |
| report <feat> | KPI + lessons learned | report-generator |
| archive <feat> | Move docs to archive + state cleanup | — |
| status | Current PDCA state across features | — |
| cleanup | Remove stale features (idle > 7 d) | — |
| watch | Live dashboard | — |
Control & utilities
| Command | Purpose |
|---|---|
| /control level 0..4 | Set autonomy (applies to /sprint + /pdca) |
| /control status | Current Trust Level + Trust Score |
| /bkit | List skills, agents, commands |
| /bkit-explore | Browse component tree (5 categories) |
| /pdca-batch | Independent parallel PDCA cycles (no shared scope) |
4. Workflow Internals
4.1 What each PDCA phase does (without you)
PDCA is bkit's 9-phase loop for a single feature. Each phase has a definite output written to disk — that's the "Docs = Code" principle in action. You can stop after any phase, inspect the output, and resume.
| Phase | What bkit auto-runs | Where the result lands | Why this phase exists |
|---|---|---|---|
| pm (product management) | pm-lead spawns 4 PM agents in parallel: discovery (Opportunity Solution Tree + Brainstorm + Assumption Risk) · strategy (JTBD + Lean Canvas + SWOT + PESTLE + Porter's + Growth Loops) · research (Personas + Competitors + TAM/SAM/SOM + Journey Map + ICP) · prd (Pre-mortem + User/Job Stories + Test Scenarios + Stakeholder Map + Battlecards) | docs/00-pm/<feature>.prd.md | So the AI knows who this feature serves, why, and what success looks like — before writing a line of code. |
| plan | product-manager writes the plan with a Context Anchor (WHY / WHO / WHAT / RISK / SUCCESS / SCOPE) + Module Map + Session Guide | docs/01-plan/features/<feature>.plan.md | The Context Anchor is what every later phase reads. It is the single source of truth for intent. |
| design | cto-lead proposes 3 architecture options (Minimal / Clean / Pragmatic). Single AskUserQuestion pause for the choice. | docs/02-design/features/<feature>.design.md | Three options force a real trade-off discussion. You pick one; bkit honours it for the rest of the cycle. |
| do (single-agent mode) | developer / bkend-expert / frontend-architect writes the code on its own | Source files | Best for small, contained features where one specialist is enough. |
| do (team mode, via /pdca team) | cto-lead spawns 4–6 specialists in parallel: developer · qa · frontend · backend · security · architect. Sequential dispatch enforced under ENH-292 to dodge sub-agent caching regressions. | Source files + per-agent review notes | The default for non-trivial work. Multiple perspectives catch bugs single specialists miss. |
| check | gap-detector measures design ↔ code match rate | docs/03-analysis/<feature>.analysis.md | Verifies that what got built matches what was designed. No fabricated progress reports — the percentage is measured, not asserted. |
| act | matchRate ≥ 90 % → advance. < 90 % → pdca-iterator (Evaluator-Optimizer pattern, max 5 cycles) | Iteration log appended to analysis doc | Self-repair. The phase where bkit fixes drift before you ever see it. |
| qa | qa-lead orchestrates 4 QA agents: qa-test-planner (L1–L5 plan) · qa-test-generator (test code) · qa-debug-analyst (runtime errors) · qa-monitor (Zero Script QA via Docker logs) | docs/05-qa/<feature>.qa.md + actual test files | L1 unit · L2 integration · L3 contract · L4 system · L5 E2E. The full pyramid. |
| report | report-generator produces a completion report with KPI + lessons learned + carry items | docs/04-report/features/<feature>.report.md | The audit trail. Next sprint planner reads this to learn from this one. |
| archive | Checkpoint preserved, state cleaned, MEMORY.md appended | .bkit/state/ + docs/archive/ | Closes the loop. The feature is done; the docs are searchable forever. |
Beginner note: you almost never type /pdca check or /pdca act yourself. The orchestrator runs them automatically when the previous phase ends. The only phase that pauses for input (under the default Trust Level L2) is design, where you pick one of the three architecture options.
4.2 Live scenario — Trust L4, autoIterate=true
A realistic 60-minute run with one user input:
10:00 /pdca pm user-auth
└─ pm-lead spawns 4 PM agents in parallel (43 frameworks)
10:08 PRD complete · auto-advance
10:12 /pdca plan (auto) → product-manager → Context Anchor written
10:18 /pdca design (auto) → cto-lead → 3 architecture options
Checkpoint AskUserQuestion: "Minimal / Clean / Pragmatic?" [1 USER INPUT]
10:20 Design confirmed · auto-advance
10:20 /pdca team (auto) → cto-lead spawns 4 specialists
10:45 Implementation complete · auto-advance to check
10:45 /pdca check (auto) → gap-detector → matchRate = 78 %
M1 FAIL (78 < 90) → AUTO-TRIGGER /pdca iterate
10:48 Cycle 1: pdca-iterator patches 7 gaps → re-measure 89 %
10:50 Cycle 2: patches 3 more → re-measure 94 % ✅ EXIT
10:50 /pdca qa (auto) → qa-lead → 4 QA agents L1–L5
10:58 QA PASS · auto-advance
10:58 /pdca report (auto) → report-generator → completion report
11:00 Feature complete.
Total: 60 min · 1 user input · 4–6 parallel agents · 2 self-repair cycles
4.3 Claude Code hooks lifecycle — what runs around every AI action
You don't invoke hooks. They run automatically because bkit attaches them via hooks/hooks.json (24 hook blocks across 21 hook events). Hooks are what make bkit's safety net invisible: you never have to remember to verify, log, or block — Claude Code fires the events, bkit responds.
flowchart TD
A["You type a message"] --> B["UserPromptSubmit hook<br/>intent-router · skill auto-trigger"]
B --> C["AI decides on a tool call"]
C --> D["PreToolUse hook<br/>unified-bash-pre.js · unified-write-pre.js<br/>SAFETY: block unsafe ops"]
D -- "allowed" --> E["Tool executes (Write, Bash, MCP, …)"]
E --> F["PostToolUse hook<br/>unified-bash-post.js · unified-write-post.js · skill-post.js<br/>AUDIT: log, redact secrets"]
F --> G["AI continues / replies"]
G --> H["Stop hook<br/>unified-stop.js · next-action-engine"]
H --> I["Session ends or continues"]
I -.->|"new session"| J["SessionStart hook<br/>memory restore"]
D -- "blocked" --> X["AskUserQuestion fallback"]
style D fill:#ffe0b2
style F fill:#fce4ec
style H fill:#e3f2fd
style X fill:#ffcdd2
| Hook event | bkit script | What it stops or does |
|---|---|---|
| SessionStart | hooks/session-start.js | Restore memory; print Trust Level banner; warn about unfinished sprints |
| UserPromptSubmit | scripts/user-prompt-handler.js | Route intent; auto-trigger the right skill (/pdca pm, /sprint, …) based on 8-language keywords |
| PreToolUse(Bash) | scripts/unified-bash-pre.js | Block dangerous shell (rm -rf /, curl-to-shell, fork-bombs, …) before they run |
| PreToolUse(Write/Edit) | scripts/unified-write-pre.js | Block writes to protected paths; verify the file is in the active sprint scope |
| PreToolUse(Skill) | scripts/skill-pre.js | Inject skill context; verify the skill is allowed at the current Trust Level |
| PostToolUse(Bash) | scripts/unified-bash-post.js | Audit-log the command + exit code; scrub 7 PII patterns |
| PostToolUse(Write/Edit) | scripts/unified-write-post.js | Update docs-code index; recompute drift |
| PostToolUse(Skill) | scripts/skill-post.js | Emit skill-completion telemetry |
| Stop / SubagentStop / SessionEnd | scripts/unified-stop.js | Final audit entry; commit memory; fire the next-action-engine if a Sprint/PDCA phase wants to chain |
| PreCompact | scripts/context-compaction.js | Persist context before Claude Code compresses the conversation; defends against /compact regressions (e.g., #47855 Opus 1M block) |
The point is: safety is not your job. You describe the work; bkit's hooks enforce the invariants.
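As an illustration of the PreToolUse(Bash) row, the Python sketch below checks a shell command against a small deny-list before it would be allowed to run. The three regular expressions and the `check_command` helper are assumptions chosen to match the examples named in the table (rm -rf /, curl-to-shell, fork bombs); the real rules live in scripts/unified-bash-pre.js and are not reproduced here.

```python
import re
from typing import Optional

# Hypothetical deny-list, keyed by the reason reported back to the user.
DANGEROUS_PATTERNS = {
    # rm with both -r and -f flags (any order) aimed at the filesystem root
    "recursive delete of /": re.compile(
        r"\brm\s+-(?=\w*r)(?=\w*f)\w+\s+/(?:\s|$)"
    ),
    # curl or wget output piped straight into sh, bash, or zsh
    "download piped into a shell": re.compile(
        r"\b(?:curl|wget)\b[^|;]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b"
    ),
    # the classic bash fork bomb  :(){ :|:& };:
    "fork bomb": re.compile(r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:"),
}


def check_command(command: str) -> Optional[str]:
    """Return the reason a command is blocked, or None when it may run."""
    for reason, pattern in DANGEROUS_PATTERNS.items():
        if pattern.search(command):
            return reason
    return None


for cmd in ["ls -la", "rm -rf /", "rm -rf ./build", ":(){ :|:& };:"]:
    verdict = check_command(cmd)
    print(f"{cmd!r}: {'BLOCKED (' + verdict + ')' if verdict else 'allowed'}")
```

A deny-list like this is deliberately narrow: `rm -rf ./build` passes while `rm -rf /` does not, so routine cleanup is not interrupted. Pattern matching alone cannot catch every destructive command, which is presumably why the blocked path in the diagram falls back to asking the user.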
4.4 8-language auto-trigger — type in your language, bkit picks the command
Skills and agents declare trigger keywords in 8 languages. The intent-router matches your input against all of them.
| Language | Sample trigger | Skill it activates |
|---|---|---|
| English | "build login feature" | /pdca pm |
| Korean (한국어) | "로그인 기능 만들어줘" | /pdca pm |
| Japanese (日本語) | "ログイン機能を作って" | /pdca pm |
| Chinese (中文) | "创建一个登录功能" | /pdca pm |
| Spanish (Español) | "crear una función de inicio de sesión" | /pdca pm |
| French (Français) | "créer une fonction de connexion" | /pdca pm |
| German (Deutsch) | "Anmeldefunktion erstellen" | /pdca pm |
| Italian (Italiano) | "creare una funzione di accesso" | /pdca pm |
You never need to know the English command name. You also never need to remember which of the 44 skills matches your intent — the keyword detection does that for you.
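Keyword-based routing of this kind can be sketched in a few lines of Python. The table below is hypothetical: it picks one fragment from each sample trigger above and maps them all to /pdca pm, whereas the actual keywords are declared by each skill and agent and matched by the intent-router.

```python
from typing import Optional

# Hypothetical keyword table: one fragment per language, taken from the samples above.
TRIGGERS = {
    "/pdca pm": [
        "login",             # English
        "로그인",             # Korean
        "ログイン",           # Japanese
        "登录",               # Chinese
        "inicio de sesión",  # Spanish
        "connexion",         # French
        "anmelde",           # German
        "accesso",           # Italian
    ],
}


def route(prompt: str) -> Optional[str]:
    """Return the first skill with a keyword contained in the prompt, else None."""
    text = prompt.casefold()  # case-insensitive match, e.g. "Anmeldefunktion"
    for skill, keywords in TRIGGERS.items():
        if any(keyword in text for keyword in keywords):
            return skill
    return None


samples = [
    "build login feature",
    "로그인 기능 만들어줘",
    "ログイン機能を作って",
    "创建一个登录功能",
    "crear una función de inicio de sesión",
    "créer une fonction de connexion",
    "Anmeldefunktion erstellen",
    "creare una funzione di accesso",
]
routed = [route(sample) for sample in samples]
print(routed)
```

Substring matching is the simplest possible strategy and is used here only to show the shape of the lookup; it says nothing about how the real router ranks competing skills or handles ambiguous prompts.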
4.5 When to use which specialist team
Three subcommands of /pdca spawn different rosters. Pick by what you need now, not by phase name.
| You want… | Use | Roster |
|---|---|---|
| Deep product analysis — personas, market sizing, JTBD, pre-mortem before any code | /pdca pm <feature> | pm-lead orchestrates pm-discovery + pm-strategy + pm-research + pm-prd (4 agents, 43 frameworks) |
| Parallel implementation by multiple specialists — frontend, backend, QA, security all working at once | /pdca team <feature> | cto-lead orchestrates 4–6 specialists (developer · qa · frontend · backend · security · architect) |
| Thorough QA — full L1–L5 test plan, generated tests, runtime verification | /pdca qa <feature> | qa-lead orchestrates qa-test-planner + qa-test-generator + qa-debug-analyst + qa-monitor (4 agents) |
| Sprint-level meta-orchestration — multi-feature plan, dependency order, budget | /sprint master-plan <project> | sprint-master-planner (uses Context Sizer) |
| One-shot single-feature run | /pdca pm <feature> then auto-advance under Trust Level | The orchestrator picks the agent per phase |
5. Quality Gates & Self-Repair
A quality gate is a hard stop that won't let the workflow advance until a measurable condition is true. It's the mechanical version of a code reviewer who refuses to merge bad work — except this reviewer never sleeps, never has a bad day, and never lets a metric slide because it's Friday.
Every phase transition is gated. Failure pauses the run and writes an audit entry. You don't have to remember to verify — verification is automatic.
| Gate | Threshold | Triggered when | On failure |
|---|---|---|---|
| M1 matchRate | ≥ 90 % | check phase ends | pdca-iterator auto-fires (Evaluator-Optimizer, max 5 cycles) |
| M2 codeQualityScore | ≥ 80 | post-do | code-analyzer re-runs, user confirmation requested |
| M3 criticalIssue count | 0 | post-do | Immediate pause, user escalation |
| M4 conventionCompliance | ≥ 90 % | post-do | Lint auto-fix attempted |
| M5 testCoverage | ≥ 70 % | post-qa | qa-test-generator adds tests |
| M6 securityScore | ≥ 85 | post-do | security-architect review |
| M7 documentationCompleteness | ≥ 90 % | post-report | Auto-doc generation |
| M8 sprint matchRate | ≥ 85 % | sprint iterate phase | Sprint iterate loops (max 5) |
| M9 contractInvariant | 0 violation | CI gate | Build blocked |
| M10 regressionGuard | 0 new regression | post-iterate | regression-registry registers + monitors |
| S1 dataFlowIntegrity | ≥ 85 % | sprint qa phase | 7-Layer hop re-verified (UI → Client → API → Validation → DB → Response → Client → UI) |
Thresholds live in bkit.config.json. Sprint-specific overrides via sprint.config.{...} at sprint init.
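The table reads naturally as data: each gate is a metric, a comparison, and a threshold. The Python sketch below encodes the eleven thresholds exactly as listed and reports which gates a set of measurements fails. The dictionary layout and the `failing_gates` helper are assumptions for illustration and do not reflect the schema of bkit.config.json.

```python
import operator

# Thresholds copied from the table above; the data layout is an assumption.
GATES = {
    "M1 matchRate": (operator.ge, 90),
    "M2 codeQualityScore": (operator.ge, 80),
    "M3 criticalIssue count": (operator.eq, 0),
    "M4 conventionCompliance": (operator.ge, 90),
    "M5 testCoverage": (operator.ge, 70),
    "M6 securityScore": (operator.ge, 85),
    "M7 documentationCompleteness": (operator.ge, 90),
    "M8 sprint matchRate": (operator.ge, 85),
    "M9 contractInvariant": (operator.eq, 0),
    "M10 regressionGuard": (operator.eq, 0),
    "S1 dataFlowIntegrity": (operator.ge, 85),
}


def failing_gates(metrics: dict) -> list:
    """Return the gates whose measured value does not satisfy its threshold."""
    return [
        name
        for name, (passes, threshold) in GATES.items()
        if name in metrics and not passes(metrics[name], threshold)
    ]


# Hypothetical measurements taken at one phase transition.
measured = {
    "M1 matchRate": 78,
    "M3 criticalIssue count": 0,
    "M5 testCoverage": 72,
    "S1 dataFlowIntegrity": 84,
}
failed = failing_gates(measured)
print(failed)  # ['M1 matchRate', 'S1 dataFlowIntegrity']
```

Only gates with a measurement are evaluated, which mirrors the "Triggered when" column: a gate that fires post-qa has nothing to check during the do phase.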
What each gate protects you from — in plain language
| Gate | What it prevents | A failure looks like |
|---|---|---|
| M1 matchRate | The AI claiming "done" when half the design is missing | "We asked for login + signup + reset; AI shipped login only" → matchRate = 33 %, pdca-iterator fires |
| M2 codeQualityScore | Code that runs but is unreadable / unmaintainable | Long functions, deep nesting, magic numbers → 78 / 100 score, review re-runs |
| M3 criticalIssue count | Bugs that will break production | Hardcoded credentials, SQL injection, null deref → workflow pauses immediately |
| M4 conventionCompliance | Files that don't follow your existing style | Wrong indentation, mixed quotes, broken imports → lint auto-fix runs |
| M5 testCoverage | Code with no tests at all | New module ships with 0 % coverage → qa-test-generator adds tests |
| M6 securityScore | OWASP Top-10 type vulnerabilities | Missing input validation, unsafe deserialization → security-architect reviews |
| M7 documentationCompleteness | Code that nobody can pick up next quarter | API endpoint with no description → auto-doc fills gap |
| M8 sprint matchRate | Sprint declared done with one feature half-built | One of three features stuck at 70 % → sprint iterate loops, max 5 |
| M9 contractInvariant | Architecture decisions silently violated | Someone imports fs into the Domain layer → CI build blocked |
| M10 regressionGuard | A fix that introduces a new bug | Same test fails again after iterate → tracked, monitored |
| S1 dataFlowIntegrity | Front-end form that posts but back-end never receives it | UI → Client → API hop count check fails → 7-Layer trace runs |
Worked example: a failure and the auto-repair
Scenario: you ask for a "user-auth" feature. The AI implements login but forgets the password-reset spec line.
- /pdca check runs.
- gap-detector reads design.md (3 specs: login, signup, reset) and the generated code (2 implemented).
- matchRate = 67 %. M1 fails (threshold 90 %).
- pdca-iterator auto-fires; no user input needed.
- Cycle 1: iterator reads the missing spec, adds the password-reset module → re-measure → 91 %.
- M1 passes → workflow auto-advances to qa.
You only learn this happened from the audit log. The AI fixed its own bug before you saw it.
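The 67 % in the scenario is consistent with a simple ratio of implemented specs to designed specs. The sketch below assumes every spec counts equally; `matchRate` here is a hypothetical helper, and the real gap-detector may measure at a finer granularity.

```javascript
// Share of design specs that have a matching implementation, as a whole percentage.
// Assumption: every spec counts equally; the real gap-detector may weigh items differently.
function matchRate(designSpecs, implementedSpecs) {
  if (designSpecs.length === 0) return 100; // nothing specified, nothing missing
  const implemented = new Set(implementedSpecs);
  const matched = designSpecs.filter((spec) => implemented.has(spec)).length;
  return Math.round((matched / designSpecs.length) * 100);
}

const design = ["login", "signup", "reset"];
console.log(matchRate(design, ["login", "signup"])); // 67
console.log(matchRate(design, ["login"]));           // 33
```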
The self-repair loop
sequenceDiagram
autonumber
participant U as You
participant P as /pdca check
participant GD as gap-detector
participant PI as pdca-iterator
participant Code as Your code
U->>P: /pdca check my-feature
P->>GD: measure design ↔ code
GD-->>P: matchRate = 72%
P->>PI: AUTO-FIRE (no user input)
PI->>Code: Cycle 1 — patch 7 missing items
PI->>GD: re-measure
GD-->>PI: 84%
PI->>Code: Cycle 2 — patch 3 more
PI->>GD: re-measure
GD-->>PI: 91% ✅
PI-->>P: EXIT (90% threshold met)
P-->>U: ✅ check passed, advance to qa
/pdca iterate is not a button you press. When gap-detector measures a matchRate below 90 %, pdca-iterator fires automatically. If the 5th cycle still fails, ITERATION_EXHAUSTED auto-pauses the sprint and escalates to you.
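The loop in the diagram can be sketched as a bounded measure-patch-measure cycle. This is an illustration under stated assumptions, not bkit's implementation: `measure` and `patch` are hypothetical callbacks standing in for gap-detector and pdca-iterator, and the return shape is invented for the example.

```javascript
// Evaluator-Optimizer loop: measure, and while below threshold, patch and re-measure.
// `measure` and `patch` are hypothetical callbacks, not bkit APIs.
function iterateUntilPass(measure, patch, { threshold = 90, maxCycles = 5 } = {}) {
  let rate = measure();
  let cycles = 0;
  while (rate < threshold && cycles < maxCycles) {
    cycles += 1;
    patch(cycles);
    rate = measure();
  }
  return rate >= threshold
    ? { status: "PASSED", rate, cycles }
    : { status: "ITERATION_EXHAUSTED", rate, cycles };
}

// Replays the measurements from the diagram: 72 % -> 84 % -> 91 %.
const readings = [72, 84, 91];
let i = 0;
const result = iterateUntilPass(() => readings[i], () => { i += 1; });
console.log(result); // { status: 'PASSED', rate: 91, cycles: 2 }
```

Capping the loop at a fixed cycle count is what turns a stuck repair into an explicit escalation instead of an endless run.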
6. Trust Level & Control
/control level N is the single autonomy dial. It scopes how far /sprint and /pdca run before stopping — one knob, both surfaces.
| Level | Name | stopAfter | Pick when |
|---|---|---|---|
| L0 | Manual | every phase | First-time user; inspect each output |
| L1 | Guided | plan | Verify scope before AI implements |
| L2 | Semi-Auto | do | Default — Plan/Design/Do auto, QA/Report manual |
| L3 | Auto | qa | Trust implementation, double-check QA |
| L4 | Full-Auto | archived | Fire-and-forget; pauses only on quality-gate failure or auto-pause trigger |
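One way to picture the dial is as a lookup from level to the phase after which control returns to you. The `STOP_AFTER` values are copied from the table above; the `shouldStopAfter` function is a hypothetical sketch, not part of bkit.

```javascript
// stopAfter values copied from the table above; the function itself is a hypothetical sketch.
const STOP_AFTER = { 0: "every phase", 1: "plan", 2: "do", 3: "qa", 4: "archived" };

// True when a run at the given Trust Level should hand control back after `phase`.
function shouldStopAfter(level, phase) {
  const stop = STOP_AFTER[level];
  if (stop === undefined) throw new RangeError(`Unknown trust level: ${level}`);
  return stop === "every phase" || stop === phase;
}

console.log(shouldStopAfter(0, "design")); // true
console.log(shouldStopAfter(2, "design")); // false
console.log(shouldStopAfter(2, "do"));     // true
```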
Trust Score (0–100)
bkit computes a Trust Score from your recent track record (matchRate history, manual-override frequency, gate-pass rate). High scores can auto-escalate the level; low scores auto-downgrade. Override anytime with /control level N.
| Trust Score | Effect |
|---|---|
| ≥ 80 | pdca-fast-track available — auto-approves Checkpoints 1–8 |
| 60–79 | Defaults to L2 (Semi-Auto) |
| < 60 | Defaults to L1 (Guided) |
The autoEscalation and autoDowngrade flags in the automation section of bkit.config.json decide whether bkit may change the level on its own.
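The score bands in the table above can be read as a small lookup. The function name and return shape below are hypothetical; the table does not state a default level for scores of 80 and above, so the sketch leaves it as `null` rather than guessing.

```javascript
// Maps a Trust Score to the effects listed in the table above.
// Assumption: no default level is documented for scores >= 80, so it is left as null.
function trustScoreEffect(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError("Trust Score must be between 0 and 100");
  }
  if (score >= 80) return { defaultLevel: null, fastTrack: true };
  if (score >= 60) return { defaultLevel: 2, fastTrack: false };
  return { defaultLevel: 1, fastTrack: false };
}

console.log(trustScoreEffect(72)); // { defaultLevel: 2, fastTrack: false }
```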
Which level should I pick?
If you're new to bkit, start low and earn trust. The Trust Score climbs automatically as your sprint history accumulates clean matchRate runs; bkit can offer to upgrade you when it's safe.
| Situation | Recommended level |
|---|---|
| Day 1 with bkit, you want to see what each phase produces | L0 Manual — inspect every output |
| You're confident the spec is right but want to review architecture choices | L1 Guided — stop after Plan |
| Daily driver — you trust planning and design but want to check QA yourself | L2 Semi-Auto (default) |
| You've shipped a few features with bkit and the matchRate is consistently > 95 % | L3 Auto — only Report is manual |
| Overnight run, fire-and-forget, you'll review the report tomorrow | L4 Full-Auto — pauses only on a quality-gate failure or auto-pause trigger |
The dial is reversible. Drop it back down whenever you want; the next phase respects the new setting.
7. Sprint Management Deep-Dive
7.1 The 8-phase sprint lifecycle
flowchart LR
prd --> plan --> design --> do
do --> iterate
iterate -- "matchRate < 100%, max 5" --> iterate
iterate --> qa
qa --> report --> archived
style prd fill:#e3f2fd
style iterate fill:#ffe0b2
style qa fill:#fce4ec
style report fill:#e8f5e9
style archived fill:#c8e6c9
| Phase | Output | Agent |
|---|---|---|
| prd | docs/00-pm/<sprint>.prd.md | sprint-master-planner |
| plan | docs/01-plan/features/<sprint>.plan.md | sprint-master-planner |
| design | docs/02-design/features/<sprint>.design.md | sprint-master-planner |
| do | Per-feature PDCA cycles run inside | sprint-orchestrator |
| iterate | docs/03-analysis/<sprint>.iterate.md (per cycle) | pdca-iterator (delegated) |
| qa | docs/05-qa/<sprint>.qa.md (7-Layer S1) | sprint-qa-flow |
| report | docs/04-report/features/<sprint>.report.md | sprint-report-writer |
| archived | Terminal state; sprint state preserved | — |
7.2 The 4 auto-pause triggers
A sprint pauses automatically on any of these. Resume with /sprint resume <id> after fixing the root cause.
| Trigger | Condition | Most common cause |
|---|---|---|
| QUALITY_GATE_FAIL | Any M-gate or S1 fails | matchRate stuck below 90 % after iterate exhausts |
| ITERATION_EXHAUSTED | iterate phase exceeds 5 cycles | Gap too large to auto-fix; needs human intervention |
| BUDGET_EXCEEDED | Token usage > sprint budget (default 1 M) | Feature scope underestimated |
| PHASE_TIMEOUT | Phase exceeds timeout (default 4 h) | Hung or blocked |
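A minimal sketch of how the four conditions could be checked together. The default limits come from the table above; the field names on `state` and the function itself are hypothetical and do not reflect bkit's internal sprint state.

```javascript
// Defaults taken from the table above; field names on `state` are hypothetical.
const DEFAULTS = { maxCycles: 5, tokenBudget: 1_000_000, phaseTimeoutMs: 4 * 60 * 60 * 1000 };

// Returns every trigger whose condition currently holds (an empty array means keep running).
function autoPauseTriggers(state, limits = DEFAULTS) {
  const triggers = [];
  if (state.failedGates.length > 0) triggers.push("QUALITY_GATE_FAIL");
  if (state.iterateCycles > limits.maxCycles) triggers.push("ITERATION_EXHAUSTED");
  if (state.tokensUsed > limits.tokenBudget) triggers.push("BUDGET_EXCEEDED");
  if (state.phaseElapsedMs > limits.phaseTimeoutMs) triggers.push("PHASE_TIMEOUT");
  return triggers;
}

console.log(autoPauseTriggers({
  failedGates: [],
  iterateCycles: 2,
  tokensUsed: 1_200_000,
  phaseElapsedMs: 30 * 60 * 1000,
})); // [ 'BUDGET_EXCEEDED' ]
```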
7.3 Sprint vs PDCA vs pdca-batch — pick one
flowchart TD
Q["What are you building?"]
Q --> Single{"Single feature?"}
Single -- "Yes" --> PDCA["/pdca pm feature<br/>9-phase per feature"]
Single -- "No" --> Shared{"Shared scope/<br/>budget/timeline?"}
Shared -- "Yes" --> Sprint["/sprint master-plan project<br/>8-phase container,<br/>PDCA runs inside"]
Shared -- "No" --> Batch["/pdca-batch<br/>independent parallel cycles"]
Q --> Static{"Non-technical<br/>or static-only?"}
Static -- "Yes" --> Starter["/starter init<br/>no PDCA needed"]
Deep-dive guide: docs/06-guide/sprint-management.guide.md. PDCA ↔ Sprint migration mapping: docs/06-guide/sprint-migration.guide.md.
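The decision flowchart in this section can also be read as a short function. This is a hypothetical helper for illustration only; checking the static-only branch first is an assumption of the sketch, since the flowchart draws both questions from the same root.

```javascript
// Mirrors the decision flowchart in section 7.3. Checking the static-only branch
// first is an assumption; the flowchart draws both questions from the same root.
function chooseWorkflow({ staticOnly = false, singleFeature = false, sharedScope = false } = {}) {
  if (staticOnly) return "/starter init";
  if (singleFeature) return "/pdca pm";
  return sharedScope ? "/sprint master-plan" : "/pdca-batch";
}

console.log(chooseWorkflow({ singleFeature: true })); // /pdca pm
console.log(chooseWorkflow({ sharedScope: true }));   // /sprint master-plan
console.log(chooseWorkflow());                        // /pdca-batch
```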
8. Agent Teams
bkit ships 34 agents organised into specialist teams. Three teams matter most for the daily workflow:
8.1 PM Agent Team — /pdca pm <feature>
Runs before the Plan phase to produce a comprehensive PRD via automated product discovery. Based on pm-skills by Pawel Huryn (MIT).
flowchart LR
PM["pm-lead (opus)"] --> D["pm-discovery (sonnet)"]
PM --> S["pm-strategy (sonnet)"]
PM --> R["pm-research (sonnet)"]
D --> PRD["pm-prd (sonnet)"]
S --> PRD
R --> PRD
PRD --> Out["docs/00-pm/feature.prd.md"]
8.2 CTO-Led Team — /pdca team <feature>
Parallel implementation with multiple specialists.
flowchart TB
CTO["cto-lead (opus)"]
CTO --> FE["frontend-architect"]
CTO --> BE["bkend-expert / enterprise-expert"]
CTO --> QA["qa-strategist"]
CTO --> SEC["security-architect"]
CTO --> PM["product-manager"]
CTO -. "Enterprise +1" .-> Arch["infra-architect"]
| Level | Teammates | Default roster |
|---|---|---|
| Dynamic | 3 | developer · qa · frontend |
| Enterprise | 5 | architect · developer · qa · reviewer · security |
| Enterprise + Sprint (v2.1.13) | 6 | + sprint-orchestrator |
Requirements: CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 + Claude Code v2.1.32+.
8.3 QA Lead Team — /pdca qa <feature>
flowchart LR
QL["qa-lead"] --> TP["qa-test-planner<br/>L1-L5 plan"]
QL --> TG["qa-test-generator<br/>test code"]
QL --> DA["qa-debug-analyst<br/>runtime errors"]
QL --> QM["qa-monitor<br/>Zero Script QA"]
8.4 Sprint Team — added in v2.1.13
| Agent | Role |
|---|---|
| sprint-master-planner | Writes Context-Anchor-driven master plan; invokes Context Sizer |
| sprint-orchestrator | Advances sprint 8 phases; spawns PDCA per feature in do |
| sprint-qa-flow | Runs 7-Layer S1 dataFlow integrity check |
| sprint-report-writer | Aggregates phase + iterate history + KPI + lessons learned |
9. Architecture
9.1 Clean Architecture 4-Layer
flowchart TB
subgraph Presentation
hooks["hooks/ (21 events / 24 blocks)"]
scripts["scripts/ (51 Node.js)"]
skills["skills/ (44)"]
agents["agents/ (34)"]
end
subgraph Infrastructure
infra["lib/infra/ (cc-bridge · telemetry · docs-code-scanner · mcp-port-registry · sprint · …)"]
end
subgraph Application
app["lib/application/ (pdca-lifecycle · sprint-lifecycle · …)"]
ccr["lib/cc-regression/"]
team["lib/team/"]
end
subgraph Domain
domain["lib/domain/ (18 modules · 0 forbidden imports · CI-enforced)"]
end
Presentation --> Infrastructure
Infrastructure --> Application
Application --> Domain
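To make the "0 forbidden imports" rule on the Domain layer concrete, here is a rough sketch of a source scan. It is not bkit's CI check: the deny-list is hypothetical (only `fs` is named in this document as forbidden in Domain), and a regex scan is a simplification of what a real check would more likely do with a parser.

```javascript
// Hypothetical deny-list; only "fs" is named in this document as forbidden in Domain.
const FORBIDDEN_IN_DOMAIN = ["fs", "child_process", "http"];

// Scans source text for require("x") or import ... from "x" that pull in a forbidden module.
// A regex scan is a simplification; a production check would more likely parse the AST.
function findForbiddenImports(source, forbidden = FORBIDDEN_IN_DOMAIN) {
  const pattern = /(?:require\(\s*|from\s+|import\s+)["'](?:node:)?([^"']+)["']/g;
  const hits = [];
  for (const match of source.matchAll(pattern)) {
    if (forbidden.includes(match[1])) hits.push(match[1]);
  }
  return hits;
}

console.log(findForbiddenImports('const fs = require("fs");\nimport x from "./entity.js";'));
// [ 'fs' ]
```

A CI job built on a check like this would fail the build whenever the returned list is non-empty for any file under lib/domain/.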
9.2 Component inventory (v2.1.13, measured 2026-05-12)
| Surface | Count | Notes |
|---|---|---|
| Skills | 44 | +sprint added v2.1.13 |
| Agents | 34 | +4 sprint agents added v2.1.13 (sprint-master-planner · sprint-orchestrator · sprint-qa-flow · sprint-report-writer) |
| Hook events / blocks | 21 / 24 | Invariant maintained |
| MCP servers / tools | 2 / 19 | +3 sprint tools (bkit_sprint_list · bkit_sprint_status · bkit_master_plan_read) |
| Lib modules / subdirs | 190 / 22 | +lib/application/sprint-lifecycle/ (13 modules) + lib/infra/sprint/ (9 modules) |
| Scripts | 51 | +sprint-handler.js (660 LOC) + sprint-memory-writer.js (138 LOC) |
| Templates | 39 | +7 sprint templates |
| Test files / cases | 118+ / 4,000+ | +tests/contract/v2113-sprint-contracts.test.js (10 SC contracts) |
| ACTION_TYPES | 20 | +sprint_paused · sprint_resumed · master_plan_created · task_created |
| CATEGORIES | 11 | +sprint |
| Port↔Adapter pairs | 7 | cc-payload · state-store · regression-registry · audit-sink · token-meter · docs-code-index · mcp-tool |
9.3 Defense-in-Depth 4-Layer
flowchart LR
User["User input"] --> L1["Layer 1: CC built-in sandbox"]
L1 --> L2["Layer 2: bkit PreToolUse hooks<br/>pre-write.js · unified-bash-pre.js<br/>defense-coordinator"]
L2 --> L3["Layer 3: audit-logger sanitizer<br/>OWASP A03/A08 · 7-key PII redaction"]
L3 --> L4["Layer 4: Token Ledger NDJSON<br/>.bkit/runtime/token-ledger.ndjson"]
L4 --> Action["Tool execution"]
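As an illustration of two ideas in the diagram, key-based redaction and NDJSON output, the sketch below masks sensitive fields and serializes one entry per line. The key list and entry fields are hypothetical: the document says seven keys are redacted but does not name them, and this is not the audit-logger's actual code.

```javascript
// Hypothetical key list; the document says seven keys are redacted but does not name them.
const REDACTED_KEYS = ["email", "password", "token"];

// Returns a shallow copy with sensitive top-level keys masked.
function redact(entry, keys = REDACTED_KEYS) {
  const copy = { ...entry };
  for (const key of keys) {
    if (key in copy) copy[key] = "[REDACTED]";
  }
  return copy;
}

// NDJSON is one JSON document per line, so an append-only ledger can be read line by line.
function toNdjsonLine(entry) {
  return JSON.stringify(redact(entry)) + "\n";
}

console.log(toNdjsonLine({ tool: "Write", tokens: 1200, email: "a@example.com" }));
// {"tool":"Write","tokens":1200,"email":"[REDACTED]"}
```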
9.4 3-Layer Orchestration
flowchart TB
user["User message"]
user --> IR["intent-router<br/>feature > skill > agent priority"]
IR --> NA["next-action-engine<br/>Stop-family hook routing"]
NA --> TP["team-protocol<br/>PM / CTO / QA Lead Task spawn"]
NA --> WSM["workflow-state-machine<br/>matchRate SSoT 90"]
TP --> Agents["34 agents"]
WSM --> Phases["PDCA + Sprint phases"]
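The "feature > skill > agent" priority in the intent-router node can be sketched as a ranked pick. The candidate shape (`{ kind, name }`) and the function are hypothetical illustrations, not the router's real interface.

```javascript
// Priority order from the diagram above: feature > skill > agent.
const PRIORITY = ["feature", "skill", "agent"];

// Picks the highest-priority candidate; candidates of the same kind keep their original order.
// The candidate shape ({ kind, name }) is a hypothetical illustration.
function pickIntent(candidates) {
  const ranked = candidates
    .filter((c) => PRIORITY.includes(c.kind))
    .sort((a, b) => PRIORITY.indexOf(a.kind) - PRIORITY.indexOf(b.kind));
  return ranked[0] ?? null;
}

console.log(pickIntent([
  { kind: "agent", name: "qa-lead" },
  { kind: "skill", name: "pdca" },
]));
// { kind: 'skill', name: 'pdca' }
```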
9.5 Invocation Contract L1–L5
| Level | What | Count | Where |
|---|---|---|---|
| L1 | Contract baseline JSON | 94 | tests/contract/baseline.json |
| L2 | Hook attribution smoke | 98 TC | tests/integration/hooks/ |
| L3 | MCP stdio runtime | 42 TC | tests/contract/l3-mcp-stdio.test.js |
| L3 (v2.1.13) | Sprint cross-sprint contracts | 10 TC (SC-01~10) | tests/contract/v2113-sprint-contracts.test.js |
| L5 | E2E shell scenarios | 5 | tests/e2e/run-all.sh |
CI gate contract-check.yml enforces 226+ assertions.
10. Skill Evals
bkit extends Claude Code's Skill Evals into a complete skill lifecycle management system that answers one question: "are my skills still worth keeping?"
10.1 Three layers over native evals
| Layer | Claude Code native | bkit enhancement |
|---|---|---|
| Eval execution | Basic runner | evals/runner.js with benchmark mode + 29 eval definitions |
| A/B testing | Not available | evals/ab-tester.js compares skill performance across models |
| Classification | Not available | All 44 skills classified Workflow / Capability / Hybrid with deprecation-risk scoring |
10.2 Skill classification
| Class | Count | Purpose | What evals measure |
|---|---|---|---|
| Workflow | 17 | Process automation (PDCA, pipelines) | Regression — these skills are permanent |
| Capability | 18 | Model ability augmentation | Parity testing — can the model match the skill's output without it? |
| Hybrid | 1 | Both process + capability | Both regression and parity |
When a model upgrade makes a Capability skill redundant, the Model Parity Test detects it:
# Does the model produce equivalent results without this skill?
node evals/ab-tester.js --parity phase-3-mockup --model claude-opus-4-7
# Compare skill performance between two models
node evals/ab-tester.js --skill pdca --modelA claude-sonnet-4-6 --modelB claude-opus-4-7
# Run all 29 skill evaluations
node evals/runner.js --benchmark
Philosophy: bkit's third principle is No Guessing. Skill Evals replace intuition with measurement.
11. Installation & Customization
11.1 Marketplace install (recommended)
# Step 1: Add bkit marketplace
/plugin marketplace add popup-studio-ai/bkit-claude-code
# Step 2: Install bkit plugin
/plugin install bkit
# Step 3: (Optional) Enable Agent Teams
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
| Plugin | Best for |
|---|---|
| bkit | Full PDCA methodology + Sprint Management for experienced developers |
| bkit-starter | Korean learning guide for first-time Claude Code users |
11.2 Customization (project-local overrides)
Claude Code searches in this priority order:
- Project .claude/ (your customizations, highest priority)
- User ~/.claude/
- Plugin installation (default)
# Step 1: Find the plugin installation
ls ~/.claude/plugins/bkit/
# Step 2: Copy only the file you want to customize
mkdir -p .claude/skills/starter
cp ~/.claude/plugins/bkit/skills/starter/SKILL.md .claude/skills/starter/
# Step 3: Edit; your version overrides the plugin's
Full guide with platform paths + license attribution: CUSTOMIZATION-GUIDE.md.
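The priority order described in this section amounts to a first-match lookup across three roots. The sketch below is a simplified model, not Claude Code's resolver: `resolveOverride` is a hypothetical helper, paths are treated as plain strings (no `~` expansion), and the `exists` check is injected so the example runs without touching the file system.

```javascript
// Search roots in priority order. `exists` is injected so the lookup can be
// exercised without touching the real file system.
function resolveOverride(relativePath, roots, exists) {
  for (const root of roots) {
    const candidate = `${root.replace(/\/+$/, "")}/${relativePath}`;
    if (exists(candidate)) return candidate;
  }
  return null;
}

const roots = [".claude", "~/.claude", "~/.claude/plugins/bkit"];
const onDisk = new Set([
  ".claude/skills/starter/SKILL.md",
  "~/.claude/plugins/bkit/skills/starter/SKILL.md",
]);
console.log(resolveOverride("skills/starter/SKILL.md", roots, (p) => onDisk.has(p)));
// .claude/skills/starter/SKILL.md
```

Because the project copy is found first, the plugin's default is never consulted for that file, which is why copying only the file you want to change is enough.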
⚠️ CC v2.1.113+ Users — ~/.claude/skills/ may be silently deleted on first run (#51234). bkit plugin itself is unaffected (uses ${CLAUDE_PLUGIN_ROOT}/skills/). Back up user custom skills before upgrading.
12. Requirements
| Requirement | Minimum | Recommended | Notes |
|---|---|---|---|
| Claude Code | v2.1.143 | v2.1.150 (conservative) · v2.1.159 (balanced) | 112 consecutive compatible releases since v2.1.34 |
| Node.js | v18+ | — | Hook script execution |
| Agent Teams (optional) | CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 | — | Required for /pdca team |
Troubleshooting: If you see "Failed to load hooks" after install, run claude update.
13. Language Support
bkit auto-detects 8 languages from trigger keywords:
| Language | Trigger sample |
|---|---|
| English | static website, beginner, API design |
| Korean | 정적 웹, 초보자, API 설계 |
| Japanese | 静的サイト, 初心者, API設計 |
| Chinese | 静态网站, 初学者, API设计 |
| Spanish | sitio web estático, principiante |
| French | site web statique, débutant |
| German | statische Webseite, Anfänger |
| Italian | sito web statico, principiante |
Set your reply language with language in .claude/settings.json:
{ "language": "korean" }
Trigger keywords work in any language regardless of the reply setting.
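Keyword-based detection of this kind can be sketched as a substring match against per-language trigger lists. The triggers below are only the samples from the table above, and `detectLanguage` is a hypothetical helper; bkit's actual detection logic and full keyword lists are not shown in this document.

```javascript
// Trigger samples copied from the table above; a real trigger list would be longer.
const TRIGGERS = {
  English: ["static website", "beginner", "api design"],
  Korean: ["정적 웹", "초보자", "api 설계"],
  Japanese: ["静的サイト", "初心者", "api設計"],
  Chinese: ["静态网站", "初学者", "api设计"],
  Spanish: ["sitio web estático", "principiante"],
  French: ["site web statique", "débutant"],
  German: ["statische webseite", "anfänger"],
  Italian: ["sito web statico", "principiante"],
};

// Returns the first language (in table order) with a trigger contained in the message.
// "principiante" appears for both Spanish and Italian, so order decides that case.
function detectLanguage(message) {
  const text = message.toLowerCase();
  for (const [language, triggers] of Object.entries(TRIGGERS)) {
    if (triggers.some((trigger) => text.includes(trigger))) return language;
  }
  return null;
}

console.log(detectLanguage("정적 웹 사이트를 만들고 싶어요")); // Korean
console.log(detectLanguage("Je suis débutant"));            // French
```

The shared Spanish/Italian sample shows why a single keyword can be ambiguous; a longer phrase such as "sito web statico" resolves it.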
14. License & Contributing
| License | Apache 2.0 · LICENSE · NOTICE (required for redistribution) |
| Copyright | 2024–2026 POPUP STUDIO PTE. LTD. |
| Contributing | CONTRIBUTING.md — main requires admin merge + PR review |
| Issues | GitHub Issues |
| Email | contact@popupstudio.ai |
Release history
bkit follows Semantic Versioning. All release notes live in CHANGELOG.md and are not duplicated here.
Made with AI by POPUP STUDIO