bkit

June 1, 2026 · View on GitHub

The only Claude Code plugin that verifies AI-generated code against its own design specs.

The 5-minute version is in README.md. This file is the deep one. It exists for people who want to know exactly what /sprint, /pdca, and /control do, which agent runs in which phase, how the 11 quality gates measure things, and how the architecture stops AI from drifting. Release history is in CHANGELOG.md and is not duplicated here.

License Claude Code Version Author

Requirement: bkit requires Claude Code v2.1.143 or later (the strict plugin-manifest path recognizes the official displayName field only from v2.1.143). On older Claude Code you will see Validation errors: Unrecognized key: "displayName" during claude plugin install. Run npm install -g @anthropic-ai/claude-code@latest to upgrade, or see docs/06-guide/cc-compatibility.guide.md.


Who this file is for

You are…Read this file because…
🌱 Vibe coder / non-developer who saw the README and asked "but how does it actually work?"Sections 0, 4.2, and 7.3 explain the moving parts in plain language with worked examples. You will not need to know Clean Architecture to follow them.
👤 Solo dev evaluating bkit before installingSections 2, 3, 5, 8 cover the commands, the quality-gate thresholds, and the agent rosters.
👥 Team lead thinking about adoptionSections 6, 7, 9 cover Trust Level governance, Sprint planning, and the 4-Layer architecture / Invocation Contract that keeps drift out of the codebase.
🛠️ Plugin author who wants to customise bkitSection 11 covers the project-local override pattern; sections 9 and 10 cover the contract surface.

What you'll achieve by working through this file

The bkit promise: anyone can ship robust, production-quality software with AI — even if you've never written a unit test or seen a CI pipeline. This document explains the underlying mechanics so you can trust the workflow when it runs unattended at Trust Level 4.

If you understand……you can rely on
Section 4 — Workflow Internalsbkit auto-running every PDCA phase. You'll know what happens in pm, plan, design, do, check, act, qa, report.
Section 5 — Quality Gatesbkit pausing before drift compounds. You'll know what M1, M3, M5, S1 measure and what bkit does when they fail.
Section 6 — Trust LevelSetting the autonomy dial confidently. You'll know exactly what L2 and L4 do differently.
Section 7 — Sprint ManagementReleases that span sessions and weeks. You'll know how the Context Sizer keeps each sprint within a single Claude session window.
Section 8 — Agent TeamsKnowing which specialist to call. You'll know why /pdca pm, /pdca team, /pdca qa spawn different rosters.

Table of Contents

  1. Plain-language Glossary
  2. Design Philosophy
  3. The Three Commands
  4. Command Cheat Sheet
  5. Workflow Internals
  6. Quality Gates & Self-Repair
  7. Trust Level & Control
  8. Sprint Management Deep-Dive
  9. Agent Teams
  10. Architecture
  11. Skill Evals
  12. Installation & Customization
  13. Requirements
  14. Language Support
  15. License & Contributing

0. Plain-language Glossary

The terms used in this file, explained for someone who is new to AI-coding.

TermWhat it really means
Claude CodeThe AI coding tool you talk to. Think of it as a coworker who lives in your terminal.
PluginExtra capabilities you bolt on to Claude Code. bkit is a plugin.
bkitThe plugin you're reading about. It adds 3 commands (/sprint, /pdca, /control), 44 skills, 34 specialist agents, 11 quality checks.
SkillA bundle of instructions Claude Code reads to remember "how do I run a PDCA cycle?" or "how do I plan a sprint?" When you type /pdca, a skill activates.
AgentA specialist version of Claude Code that knows one job very well — e.g. frontend-architect, qa-lead, gap-detector. The agent's job is in its name.
HookA piece of code that runs around AI actions. bkit's hooks intercept dangerous actions (deleting files, leaking secrets), log every step, and inject context. There are 21 hook events Claude Code fires.
ContextEverything the AI knows at the moment it decides what to write — your prompt, the file it just read, the rules it was given, its memory. AI quality is mostly about getting the right context to the AI at the right moment.
Context EngineeringA discipline that says: don't try to write the perfect prompt — build a system that gives the AI the right context every time. bkit is a Context Engineering system.
PDCAPlan → Do → Check → Act. A 70-year-old continuous improvement loop. bkit's 9-phase version is pm → plan → design → do → check → act → qa → report → archive.
SprintA container that groups multiple features under a shared budget and timeline. Each sprint runs through its own 8 phases. A sprint can contain many features, each going through PDCA.
Match rateA percentage measuring how much of your design spec actually appears in the generated code. 90 % means the AI did 90 % of what was specified. bkit auto-repairs anything below that.
Quality gateA hard stop that won't let the workflow advance until a measurable rule is satisfied. bkit has 11 of them.
Trust Level (L0–L4)The dial that decides how far the AI runs unattended. L0 = "ask me at every step", L4 = "wake me when it's done".
Auto-pause triggerA condition that pauses an automated run so a human can intervene. bkit has 4: quality gate fail, iteration exhausted, budget exceeded, phase timeout.
Audit logA permanent record of every important action bkit took, with sensitive data scrubbed out. Lives in .bkit/runtime/audit-log.ndjson.
MCPModel Context Protocol — Anthropic's standard way to plug data sources into Claude Code. bkit ships 2 MCP servers (bkit-pdca, bkit-analysis) exposing 19 tools.
Docs = Codebkit's principle that says "every feature must produce design docs, and the code must match those docs." A CI gate enforces 0 drift.

1. Design Philosophy

bkit is not a productivity hack. It brings engineering discipline to AI-native development.

The software industry refined how humans write code over decades — version control, code review, CI/CD, testing pyramids. When AI enters the loop, most of that discipline evaporates: developers prompt, accept, ship. Documentation becomes an afterthought. Quality becomes luck.

bkit exists because AI-assisted development deserves the same rigor as traditional engineering.

1.1 What we optimise for

We optimise forOverConcretely
ProcessOutputOne feature through proper planning + design + implementation + verification beats ten hacked-together features. The PDCA cycle is the product.
VerificationTrustAI generates plausible code. Plausible is not correct. Every implementation goes through gap analysis. Below 90 % match, the system iterates. We do not ship hope.
ContextPromptsA clever prompt helps once. A systematic context system helps every time. 44 skills + 34 agents + 190 lib modules exist so the AI receives the right context at the right moment.
ConstraintsFeaturesThree project levels, not infinite configuration. Fixed 9-phase PDCA and 8-phase Sprint, not a customizable workflow builder. Opinionated defaults eliminate decision fatigue.

"We do not offer a hundred features. We engineer each one through proper design and verification. That is the difference between a tool and a discipline."

1.2 The three core philosophies (the contract with you)

These come from bkit-system/philosophy/core-mission.md. Every line of bkit code is judged against them.

#PrincipleWhat it means for you, the user
1Automation FirstYou don't need to memorise PDCA. You don't need to know which skill to call. Describe what you want; the intent-router (lib/orchestrator/intent-router.js) picks the right skill or agent. The state machine drives the rest. Manual is the fallback, not the default.
2No Guessingbkit refuses to fabricate. If gap-detector is unsure, it reads the spec again. If still unsure, it asks you via AskUserQuestion — it does not invent. 11 quality gates + design-validator + code-analyzer enforce this.
3Docs = CodeEvery feature produces docs: PRD + plan + design + analysis + completion report. The docs are the contract; the code must match. scripts/docs-code-sync.js runs in CI and fails the build if the doc-side counts drift from the code-side counts.

1.3 Context Engineering — the deeper "why"

Most AI-coding tools focus on prompts. bkit focuses on context. The distinction is the entire reason bkit exists.

A prompt is a single message you send. Context is the entire information environment the AI is working inside — your conventions, your prior decisions, your design spec, the rules you set, the memory of last week's session, the audit trail of what changed.

"bkit is a practical implementation of Context Engineering — a systematic discipline for building, maintaining, and verifying the right context for AI-assisted development."bkit-system/philosophy/context-engineering.md

Symptom of bad contextWhat goes wrongHow bkit fixes it
AI hallucinates names, paths, or APIs that don't existThe AI has no real grounding in your codeSkills auto-load the right grounding (PDCA rules, Sprint state, your style guide). 44 skills, 8-language auto-trigger.
AI is confident but wrongThe AI doesn't know what "correct" means in your projectgap-detector measures match rate against the design spec; code-analyzer checks quality scores. Wrong is detected, not assumed away.
AI loses focus halfway throughContext window overflows in long sessionsSprint Management splits work into context-budgeted chunks (≤ 75 K tokens each). Memory + Task Management survive session clears.
You ship and then discover the bugNo verification was ever run11 quality gates run between phases. Failing a gate auto-pauses.
You ship and the new dev can't tell what changedNo audit trailAudit log + Token Ledger + Docs = Code keep a permanent record.

1.4 Controllable AI — the principles behind /control

These come from AI-NATIVE-DEVELOPMENT.md. They are why bkit ships a Trust Level dial instead of "AI off / AI on".

PrincipleWhat it gives you
Safe defaultsOut of the box, bkit asks before doing anything destructive. Trust Level starts at L2 (Semi-Auto).
Progressive trustAs your matchRate track record improves, bkit's Trust Score can recommend a higher level — but never silently.
Full visibilityEvery phase emits an audit entry. /sprint status, /sprint watch, /pdca status show you the current state at any moment.
Always interruptibleCtrl+C cancels. /sprint pause halts. pdca-iterator auto-stops after 5 cycles. 4 auto-pause triggers guarantee the run never gets away from you.

2. The Three Commands

Everything else in bkit — 44 skills, 34 agents, 21 hooks, 11 quality gates, 226+ contract assertions — exists to make these three commands work reliably.

CommandOne-line purposeWhen you use it
/sprintGroup multiple features into a release container, plan them, and run them.Quarter launch, milestone, 2+ linked features sharing scope/budget/timeline
/pdcaDrive a single feature from PRD to release-ready report through 9 phases.A single feature, or inside a sprint (the orchestrator calls /pdca per feature automatically)
/controlOne dial setting how much of /sprint and /pdca runs unattended.Anytime. The setting persists.
flowchart TB
    You(["You"])
    You --> SM["/sprint master-plan my-release<br/>--features auth,billing,reports"]
    SM --> Auto1["Master plan + task registration"]
    Auto1 --> SS["/sprint start sprint-1"]
    SS --> Auto2["Sprint 8-phase. Inside 'do':<br/>PDCA 9-phase per feature"]
    Auto2 --> Gate{"matchRate &ge; 90%?"}
    Gate -- "no — auto-fix" --> Iter["pdca-iterator (max 5)"]
    Iter --> Gate
    Gate -- "yes" --> Done(["Release-ready"])
    Ctrl["/control level 0..4"] -.->|"scopes auto-run"| Auto1
    Ctrl -.->|"applies"| Auto2

    style You fill:#e3f2fd
    style SM fill:#fff3e0
    style SS fill:#fff3e0
    style Auto2 fill:#fce4ec
    style Iter fill:#ffe0b2
    style Done fill:#c8e6c9
    style Ctrl fill:#f3e5f5

The sprint user journey, step by step

StepWhat you typeWhat bkit doesOutput
1. Plan/sprint master-plan my-release --name "Q2 Launch" --features auth,billing,reportssprint-master-planner writes a Context-Anchor-driven master plan. The Context Sizer (Kahn topological + greedy bin-packing) splits features into ≤ 75 K-token sprints honoring dependencies.docs/01-plan/features/my-release.master-plan.md + per-sprint prd / plan / design templates; .bkit/state/master-plans/my-release.json; audit entry master_plan_created
2. Register(automatic in Step 1)Each sprint in plan.sprints[] becomes a task via TaskCreate. Cross-sprint dependsOn becomes task blockedBy.One task per sprint, Kahn-ordered. Visible via /sprint list or TaskList
3. Execute/sprint start my-release-s1sprint-orchestrator advances the 8-phase sprint. Inside do, PDCA 9-phase runs once per featurepm-lead PRD, cto-lead team spawn, gap-detector measure, pdca-iterator repair, qa-lead test, report-generator summarize.Per-feature artifacts under docs/00-pm/..., docs/01-plan/features/..., docs/02-design/features/..., docs/04-report/features/...; sprint state .bkit/state/sprints/my-release-s1.json
4. Govern/control level 0..4 (anytime)The dial scopes how far both Sprint and PDCA phases auto-advance before stopping. Trust Score (0–100) can also recommend a level from your track record..bkit/state/trust-profile.json; effective scope mirrored into lib/control/automation-controller.js:SPRINT_AUTORUN_SCOPE (L3 contract test SC-07 enforces the 1:1 mirror)

Single feature shortcut: skip steps 1–2 and run /pdca pm <feature> directly. Step 4 still applies.

2.1 Why each step exists — the worked detail

The 4 steps map directly to the user experience the bkit author had in mind. Each one is a deliberate answer to a specific failure mode of AI-assisted development.

Step 1 — Plan the release in depth, not in haste.

When you type /sprint master-plan my-release --features auth, billing, reports, bkit does not rush to a plan. The sprint-master-planner agent runs deliberately:

  • If your repo is non-trivial, it explores your existing code (Task(Explore)) before assuming anything about your conventions.
  • If your feature involves a domain it doesn't already know (e.g., a payment provider you haven't integrated before), it does web research first.
  • Then it splits your features into context-budgeted Sprints using:
    • Kahn topological sort — to respect "billing depends on auth" type relationships.
    • Greedy bin-packing — to keep each sprint ≤ 75 K tokens, so a single Claude Code session can finish the sprint without running out of context.
  • Finally it writes a master plan with a Context Anchor (WHY / WHO / WHAT / RISK / SUCCESS / SCOPE) that every later phase reads.

You can also call specialist agents directly when you want more depth for a particular feature:

CommandWhat it spawnsOutput
/pdca pm <feature>4 PM agents in parallel (pm-discovery + pm-strategy + pm-research + pm-prd) using 43 product-management frameworks (JTBD, Lean Canvas, SWOT, PESTLE, Porter's, Pre-mortem, Personas, TAM/SAM/SOM, …)A comprehensive PRD at docs/00-pm/<feature>.prd.md
/pdca team <feature>A multi-specialist implementation team led by cto-lead — 4–6 agents in parallel (developer · qa · frontend · backend · security · architect)Code + reviews from multiple perspectives
/pdca qa <feature>5-agent QA team led by qa-lead — test-planner · test-generator · debug-analyst · qa-monitor with Zero Script QA (Docker log analysis)Full L1–L5 test plan, generated tests, runtime verification

Step 2 — Approve, register, remember.

After you approve the master plan, bkit takes three durable actions:

  1. Task Management registration — every sprint becomes a task via the system's TaskCreate tool. The sprint dependency order becomes task blockedBy order. You can see them via /sprint list or via Claude Code's TaskList.
  2. Memory persistence — the sprint roster is written to .bkit/state/memory.json. If your laptop dies, your session clears, or you start a new Claude Code session next month, the plan is still there.
  3. Audit entry — an master_plan_created action lands in .bkit/runtime/audit-log.ndjson.

The combination means bkit picks up exactly where it stopped, every single time.

Step 3 — Auto-run each sprint through PDCA.

/sprint start sprint-1 advances the 8-phase sprint (prd → plan → design → do → iterate → qa → report → archived). Inside the do phase, the orchestrator runs the full PDCA 9-phase loop once per feature. The default targets are aggressive:

  • iterate phase targets matchRate = 100 % (will accept ≥ 90 %, controlled by gate M1)
  • qa phase requires all 11 gates pass, including S1 dataFlow integrity ≥ 85 %
  • Below threshold → pdca-iterator auto-fires, up to 5 self-repair cycles

Critically, every transition is gated (see §5). The workflow can't accidentally advance past a failure.

Step 4 — Govern with one dial.

/control level N is the autonomy knob. The same setting governs both Sprint and PDCA — there's no second knob to forget. The dial maps to a stopAfter phase (see §6). Trust Score (0–100) can recommend a level from your track record, but you stay in charge: autoEscalation / autoDowngrade flags in bkit.config.json decide whether bkit may move the dial on its own.

Hook-driven invisible execution: while you read the master plan, Claude Code's 21 hook events are quietly firing — PreToolUse blocks unsafe operations, PostToolUse logs every action, SessionStart restores memory, Stop writes the closing audit entry. You never invoke a hook directly; bkit attaches them automatically through hooks/hooks.json (24 blocks across 21 events). See §4.3 for the lifecycle map.

8-language auto-trigger: skills and agents declare keywords in 8 languages (EN, KO, JA, ZH, ES, FR, DE, IT). If you type "로그인 기능 만들어줘", "作成一个登录功能", or "build a login feature" — bkit's intent-router maps to the same skill. You never need to know the English command name.


3. Command Cheat Sheet

Sprint (16 sub-actions)

Sub-actionPurpose
init <id>Create a sprint manually (without a master plan)
master-plan <project> --features ...Auto-write the master plan + register every sprint as a task
start <id>Run the sprint up to the Trust Level scope
status <id>Current state + triggers + matrix snapshot
listAll sprints with phase + status
watch <id>Live dashboard, ticks every 30 s
phase <id> --to <phase>Manual phase transition
iterate <id>matchRate-100 % loop (max 5)
qa <id>7-Layer S1 dataFlow integrity check
report <id>Cumulative KPI + lessons learned
archive <id>Move to terminal state (forward-only)
pause <id> / resume <id>Stop / restart auto-run
fork <id> --new <newId>Carry incomplete features into a new sprint
feature <id> --action list/add/remove --feature <name>Manage features inside the sprint
helpHelp text

PDCA (single feature, 9 phases + utilities)

Sub-actionPurposeSpawned agents
pm <feat>4 PM agents in parallel → PRD with 43 frameworkspm-lead · pm-discovery · pm-strategy · pm-research · pm-prd
plan <feat>Plan with Context Anchor + Module Mapproduct-manager
design <feat>3 architecture options (Minimal / Clean / Pragmatic)cto-lead · frontend-architect · security-architect
do <feat>Implementation (single-agent mode)bkend-expert · frontend-architect
team <feat>4–6 specialists in parallel (recommended for do)cto-lead orchestrates developer · qa · frontend · security · architect
check <feat>Design ↔ code gap analysisgap-detector
iterate <feat>Auto-fix sub-90 % matchpdca-iterator
qa <feat>L1–L5 test executionqa-lead · qa-test-planner · qa-test-generator · qa-debug-analyst · qa-monitor
report <feat>KPI + lessons learnedreport-generator
archive <feat>Move docs to archive + state cleanup
statusCurrent PDCA state across features
cleanupRemove stale features (idle > 7 d)
watchLive dashboard

Control & utilities

CommandPurpose
/control level 0..4Set autonomy (applies to /sprint + /pdca)
/control statusCurrent Trust Level + Trust Score
/bkitList skills, agents, commands
/bkit-exploreBrowse component tree (5 categories)
/pdca-batchIndependent parallel PDCA cycles (no shared scope)

4. Workflow Internals

4.1 What each PDCA phase does (without you)

PDCA is bkit's 9-phase loop for a single feature. Each phase has a definite output written to disk — that's the "Docs = Code" principle in action. You can stop after any phase, inspect the output, and resume.

PhaseWhat bkit auto-runsWhere the result landsWhy this phase exists
pm (product management)pm-lead spawns 4 PM agents in parallel: discovery (Opportunity Solution Tree + Brainstorm + Assumption Risk) · strategy (JTBD + Lean Canvas + SWOT + PESTLE + Porter's + Growth Loops) · research (Personas + Competitors + TAM/SAM/SOM + Journey Map + ICP) · prd (Pre-mortem + User/Job Stories + Test Scenarios + Stakeholder Map + Battlecards)docs/00-pm/<feature>.prd.mdSo the AI knows who this feature serves, why, and what success looks like — before writing a line of code.
planproduct-manager writes the plan with a Context Anchor (WHY / WHO / WHAT / RISK / SUCCESS / SCOPE) + Module Map + Session Guidedocs/01-plan/features/<feature>.plan.mdThe Context Anchor is what every later phase reads. It is the single source of truth for intent.
designcto-lead proposes 3 architecture options (Minimal / Clean / Pragmatic). Single AskUserQuestion pause for the choice.docs/02-design/features/<feature>.design.mdThree options force a real trade-off discussion. You pick one; bkit honours it for the rest of the cycle.
do (single-agent mode)developer / bkend-expert / frontend-architect writes the code on its ownSource filesBest for small, contained features where one specialist is enough.
do (team mode, via /pdca team)cto-lead spawns 4–6 specialists in parallel: developer · qa · frontend · backend · security · architect. Sequential dispatch enforced under ENH-292 to dodge sub-agent caching regressions.Source files + per-agent review notesThe default for non-trivial work. Multiple perspectives catch bugs single specialists miss.
checkgap-detector measures design ↔ code match ratedocs/03-analysis/<feature>.analysis.mdVerifies that what got built matches what was designed. No fabricated progress reports — the percentage is measured, not asserted.
actmatchRate ≥ 90 % → advance. < 90 % → pdca-iterator (Evaluator-Optimizer pattern, max 5 cycles)Iteration log appended to analysis docSelf-repair. The phase where bkit fixes drift before you ever see it.
qaqa-lead orchestrates 4 QA agents: qa-test-planner (L1–L5 plan) · qa-test-generator (test code) · qa-debug-analyst (runtime errors) · qa-monitor (Zero Script QA via Docker logs)docs/05-qa/<feature>.qa.md + actual test filesL1 unit · L2 integration · L3 contract · L4 system · L5 E2E. The full pyramid.
reportreport-generator produces a completion report with KPI + lessons learned + carry itemsdocs/04-report/features/<feature>.report.mdThe audit trail. Next sprint planner reads this to learn from this one.
archiveCheckpoint preserved, state cleaned, MEMORY.md appended.bkit/state/ + docs/archive/Closes the loop. The feature is done; the docs are searchable forever.

Beginner note: you almost never type /pdca check or /pdca act yourself. The orchestrator runs them automatically when the previous phase ends. The only phase that pauses for input (under the default Trust Level L2) is design, where you pick one of the three architecture options.

4.2 Live scenario — Trust L4, autoIterate=true

A realistic 60-minute run with one user input:

10:00  /pdca pm user-auth
       └─ pm-lead spawns 4 PM agents in parallel (43 frameworks)
10:08  PRD complete · auto-advance
10:12  /pdca plan (auto)  → product-manager → Context Anchor written
10:18  /pdca design (auto) → cto-lead → 3 architecture options
       Checkpoint AskUserQuestion: "Minimal / Clean / Pragmatic?"   [1 USER INPUT]
10:20  Design confirmed · auto-advance
10:20  /pdca team (auto) → cto-lead spawns 4 specialists
10:45  Implementation complete · auto-advance to check
10:45  /pdca check (auto) → gap-detector → matchRate = 78 %
       M1 FAIL (78 < 90) → AUTO-TRIGGER /pdca iterate
10:48  Cycle 1: pdca-iterator patches 7 gaps → re-measure 89 %
10:50  Cycle 2: patches 3 more → re-measure 94 % ✅ EXIT
10:50  /pdca qa (auto) → qa-lead → 4 QA agents L1–L5
10:58  QA PASS · auto-advance
10:58  /pdca report (auto) → report-generator → completion report
11:00  Feature complete.
       Total: 60 min · 1 user input · 4–6 parallel agents · 2 self-repair cycles

4.3 Claude Code hooks lifecycle — what runs around every AI action

You don't invoke hooks. They run automatically because bkit attaches them via hooks/hooks.json (24 hook blocks across 21 hook events). Hooks are what make bkit's safety net invisible: you never have to remember to verify, log, or block — Claude Code fires the events, bkit responds.

flowchart TD
    A["You type a message"] --> B["UserPromptSubmit hook<br/>intent-router · skill auto-trigger"]
    B --> C["AI decides on a tool call"]
    C --> D["PreToolUse hook<br/>unified-bash-pre.js · unified-write-pre.js<br/>SAFETY: block unsafe ops"]
    D -- "allowed" --> E["Tool executes (Write, Bash, MCP, …)"]
    E --> F["PostToolUse hook<br/>unified-bash-post.js · unified-write-post.js · skill-post.js<br/>AUDIT: log, redact secrets"]
    F --> G["AI continues / replies"]
    G --> H["Stop hook<br/>unified-stop.js · next-action-engine"]
    H --> I["Session ends or continues"]
    I -.->|"new session"| J["SessionStart hook<br/>memory restore"]
    D -- "blocked" --> X["AskUserQuestion fallback"]

    style D fill:#ffe0b2
    style F fill:#fce4ec
    style H fill:#e3f2fd
    style X fill:#ffcdd2
Hook eventbkit scriptWhat it stops or does
SessionStarthooks/session-start.jsRestore memory; print Trust Level banner; warn about unfinished sprints
UserPromptSubmitscripts/user-prompt-handler.jsRoute intent; auto-trigger the right skill (/pdca pm, /sprint, …) based on 8-language keywords
PreToolUse(Bash)scripts/unified-bash-pre.jsBlock dangerous shell (rm -rf /, curl-to-shell, fork-bombs, …) before they run
PreToolUse(Write/Edit)scripts/unified-write-pre.jsBlock writes to protected paths; verify the file is in the active sprint scope
PreToolUse(Skill)scripts/skill-pre.jsInject skill context; verify the skill is allowed at the current Trust Level
PostToolUse(Bash)scripts/unified-bash-post.jsAudit-log the command + exit code; scrub 7 PII patterns
PostToolUse(Write/Edit)scripts/unified-write-post.jsUpdate docs-code index; recompute drift
PostToolUse(Skill)scripts/skill-post.jsEmit skill-completion telemetry
Stop / SubagentStop / SessionEndscripts/unified-stop.jsFinal audit entry; commit memory; fire the next-action-engine if a Sprint/PDCA phase wants to chain
PreCompactscripts/context-compaction.jsPersist context before Claude Code compresses the conversation; defends /compact regressions (e.g., #47855 Opus 1M block)

The point is: safety is not your job. You describe the work; bkit's hooks enforce the invariants.

4.4 8-language auto-trigger — type in your language, bkit picks the command

Skills and agents declare trigger keywords in 8 languages. The intent-router matches your input against all of them.

LanguageSample triggerSkill it activates
English"build login feature"/pdca pm
Korean (한국어)"로그인 기능 만들어줘"/pdca pm
Japanese (日本語)"ログイン機能を作って"/pdca pm
Chinese (中文)"创建一个登录功能"/pdca pm
Spanish (Español)"crear una función de inicio de sesión"/pdca pm
French (Français)"créer une fonction de connexion"/pdca pm
German (Deutsch)"Anmeldefunktion erstellen"/pdca pm
Italian (Italiano)"creare una funzione di accesso"/pdca pm

You never need to know the English command name. You also never need to remember which of the 44 skills matches your intent — the keyword detection does that for you.

4.5 When to use which specialist team

Three subcommands of /pdca spawn different rosters. Pick by what you need now, not by phase name.

You want…UseRoster
Deep product analysis — personas, market sizing, JTBD, pre-mortem before any code/pdca pm <feature>pm-lead orchestrates pm-discovery + pm-strategy + pm-research + pm-prd (4 agents, 43 frameworks)
Parallel implementation by multiple specialists — frontend, backend, QA, security all working at once/pdca team <feature>cto-lead orchestrates 4–6 specialists (developer · qa · frontend · backend · security · architect)
Thorough QA — full L1–L5 test plan, generated tests, runtime verification/pdca qa <feature>qa-lead orchestrates qa-test-planner + qa-test-generator + qa-debug-analyst + qa-monitor (4 agents)
Sprint-level meta-orchestration — multi-feature plan, dependency order, budget/sprint master-plan <project>sprint-master-planner (uses Context Sizer)
One-shot single-feature run/pdca pm <feature> then auto-advance under Trust LevelThe orchestrator picks the agent per phase

5. Quality Gates & Self-Repair

A quality gate is a hard stop that won't let the workflow advance until a measurable condition is true. It's the mechanical version of a code reviewer who refuses to merge bad work — except this reviewer never sleeps, never has a bad day, and never lets a metric slide because it's Friday.

Every phase transition is gated. Failure pauses the run and writes an audit entry. You don't have to remember to verify — verification is automatic.

GateThresholdTriggered whenOn failure
M1 matchRate≥ 90 %check phase endspdca-iterator auto-fires (Evaluator-Optimizer, max 5 cycles)
M2 codeQualityScore≥ 80post-docode-analyzer re-runs, user confirmation requested
M3 criticalIssue count0post-doImmediate pause, user escalation
M4 conventionCompliance≥ 90 %post-doLint auto-fix attempted
M5 testCoverage≥ 70 %post-qaqa-test-generator adds tests
M6 securityScore≥ 85post-dosecurity-architect review
M7 documentationCompleteness≥ 90 %post-reportAuto-doc generation
M8 sprint matchRate≥ 85 %sprint iterate phaseSprint iterate loops (max 5)
M9 contractInvariant0 violationCI gateBuild blocked
M10 regressionGuard0 new regressionpost-iterateregression-registry registers + monitors
S1 dataFlowIntegrity≥ 85 %sprint qa phase7-Layer hop re-verified (UI → Client → API → Validation → DB → Response → Client → UI)

Thresholds live in bkit.config.json. Sprint-specific overrides via sprint.config.{...} at sprint init.

What each gate protects you from — in plain language

GateWhat it preventsA failure looks like
M1 matchRateThe AI claiming "done" when half the design is missing"We asked for login + signup + reset; AI shipped login only" → matchRate = 33 %, pdca-iterator fires
M2 codeQualityScoreCode that runs but is unreadable / unmaintainableLong functions, deep nesting, magic numbers → 78 / 100 score, review re-runs
M3 criticalIssue countBugs that will break productionHardcoded credentials, SQL injection, null deref → workflow pauses immediately
M4 conventionComplianceFiles that don't follow your existing styleWrong indentation, mixed quotes, broken imports → lint auto-fix runs
M5 testCoverageCode with no tests at allNew module ships with 0 % coverage → qa-test-generator adds tests
M6 securityScoreOWASP Top-10 type vulnerabilitiesMissing input validation, unsafe deserialization → security-architect reviews
M7 documentationCompletenessCode that nobody can pick up next quarterAPI endpoint with no description → auto-doc fills gap
M8 sprint matchRateSprint declared done with one feature half-builtOne of three features stuck at 70 % → sprint iterate loops, max 5
M9 contractInvariantArchitecture decisions silently violatedSomeone imports fs into the Domain layer → CI build blocked
M10 regressionGuardA new bug that fixes anotherSame test fails again after iterate → tracked, monitored
S1 dataFlowIntegrityFront-end form that posts but back-end never receives itUI → Client → API hop count check fails → 7-Layer trace runs

Worked example: a failure and the auto-repair

Scenario: you ask for a "user-auth" feature. The AI implements login but forgets the password-reset spec line.

  1. /pdca check runs. gap-detector reads design.md (3 specs: login, signup, reset) and the generated code (2 implemented).
  2. matchRate = 67 %. M1 fails (threshold 90 %).
  3. pdca-iterator auto-fires — no user input needed.
  4. Cycle 1 — iterator reads the missing spec, adds the password-reset module → re-measure → 91 %.
  5. M1 passes → workflow auto-advances to qa.

You only learn this happened from the audit log. The AI fixed its own bug before you saw it.

The self-repair loop

sequenceDiagram
    autonumber
    participant U as You
    participant P as /pdca check
    participant GD as gap-detector
    participant PI as pdca-iterator
    participant Code as Your code

    U->>P: /pdca check my-feature
    P->>GD: measure design ↔ code
    GD-->>P: matchRate = 72%
    P->>PI: AUTO-FIRE (no user input)
    PI->>Code: Cycle 1 — patch 7 missing items
    PI->>GD: re-measure
    GD-->>PI: 84%
    PI->>Code: Cycle 2 — patch 3 more
    PI->>GD: re-measure
    GD-->>PI: 91% ✅
    PI-->>P: EXIT (90% threshold met)
    P-->>U: ✅ check passed, advance to qa

/pdca iterate is not a button you press. gap-detector detects sub-90 → pdca-iterator fires automatically. If the 5th cycle still fails, ITERATION_EXHAUSTED auto-pauses the sprint and escalates to you.


6. Trust Level & Control

/control level N is the single autonomy dial. It scopes how far /sprint and /pdca run before stopping — one knob, both surfaces.

LevelNamestopAfterPick when
L0Manualevery phaseFirst-time user; inspect each output
L1GuidedplanVerify scope before AI implements
L2Semi-AutodoDefault — Plan/Design/Do auto, QA/Report manual
L3AutoqaTrust implementation, double-check QA
L4Full-AutoarchivedFire-and-forget; pauses only on quality-gate failure or auto-pause trigger

Trust Score (0–100)

bkit computes a Trust Score from your recent track record (matchRate history, manual-override frequency, gate-pass rate). High scores can auto-escalate the level; low scores auto-downgrade. Override anytime with /control level N.

Trust ScoreEffect
≥ 80pdca-fast-track available — auto-approves Checkpoints 1–8
60–79Defaults to L2 (Semi-Auto)
< 60Defaults to L1 (Guided)

autoEscalation and autoDowngrade flags in bkit.config.json:automation decide whether bkit may change the level on its own.

Which level should I pick?

If you're new to bkit, start low and earn trust. The Trust Score climbs automatically as your sprint history accumulates clean matchRate runs; bkit can offer to upgrade you when it's safe.

SituationRecommended level
Day 1 with bkit, you want to see what each phase producesL0 Manual — inspect every output
You're confident the spec is right but want to review architecture choicesL1 Guided — stop after Plan
Daily driver — you trust planning and design but want to check QA yourselfL2 Semi-Auto (default)
You've shipped a few features with bkit and the matchRate is consistently > 95 %L3 Auto — only Report is manual
Overnight run, fire-and-forget, you'll review the report tomorrowL4 Full-Auto — pauses only on a quality-gate failure or auto-pause trigger

The dial is reversible. Drop it back down whenever you want; the next phase respects the new setting.


7. Sprint Management Deep-Dive

7.1 The 8-phase sprint lifecycle

flowchart LR
    prd --> plan --> design --> do
    do --> iterate
    iterate -- "matchRate &lt; 100%, max 5" --> iterate
    iterate --> qa
    qa --> report --> archived

    style prd fill:#e3f2fd
    style iterate fill:#ffe0b2
    style qa fill:#fce4ec
    style report fill:#e8f5e9
    style archived fill:#c8e6c9
PhaseOutputAgent
prddocs/00-pm/<sprint>.prd.mdsprint-master-planner
plandocs/01-plan/features/<sprint>.plan.mdsprint-master-planner
designdocs/02-design/features/<sprint>.design.mdsprint-master-planner
doPer-feature PDCA cycles run insidesprint-orchestrator
iteratedocs/03-analysis/<sprint>.iterate.md (per cycle)pdca-iterator (delegated)
qadocs/05-qa/<sprint>.qa.md (7-Layer S1)sprint-qa-flow
reportdocs/04-report/features/<sprint>.report.mdsprint-report-writer
archivedTerminal state; sprint state preserved

7.2 The 4 auto-pause triggers

A sprint pauses automatically on any of these. Resume with /sprint resume <id> after fixing the root cause.

TriggerConditionMost common cause
QUALITY_GATE_FAILAny M-gate or S1 failsmatchRate stuck below 90 % after iterate exhausts
ITERATION_EXHAUSTEDiterate phase exceeds 5 cyclesGap too large to auto-fix; needs human intervention
BUDGET_EXCEEDEDToken usage > sprint budget (default 1 M)Feature scope underestimated
PHASE_TIMEOUTPhase exceeds timeout (default 4 h)Hung or blocked

7.3 Sprint vs PDCA vs pdca-batch — pick one

flowchart TD
    Q["What are you building?"]
    Q --> Single{"Single feature?"}
    Single -- "Yes" --> PDCA["/pdca pm feature<br/>9-phase per feature"]
    Single -- "No" --> Shared{"Shared scope/<br/>budget/timeline?"}
    Shared -- "Yes" --> Sprint["/sprint master-plan project<br/>8-phase container,<br/>PDCA runs inside"]
    Shared -- "No" --> Batch["/pdca-batch<br/>independent parallel cycles"]
    Q --> Static{"Non-technical<br/>or static-only?"}
    Static -- "Yes" --> Starter["/starter init<br/>no PDCA needed"]

Deep-dive guide: docs/06-guide/sprint-management.guide.md. PDCA ↔ Sprint migration mapping: docs/06-guide/sprint-migration.guide.md.


8. Agent Teams

bkit ships 34 agents organised into specialist teams. Three teams matter most for the daily workflow:

8.1 PM Agent Team — /pdca pm <feature>

Runs before the Plan phase to produce a comprehensive PRD via automated product discovery. Based on pm-skills by Pawel Huryn (MIT).

flowchart LR
    PM["pm-lead (opus)"] --> D["pm-discovery (sonnet)"]
    PM --> S["pm-strategy (sonnet)"]
    PM --> R["pm-research (sonnet)"]
    D --> PRD["pm-prd (sonnet)"]
    S --> PRD
    R --> PRD
    PRD --> Out["docs/00-pm/feature.prd.md"]

8.2 CTO-Led Team — /pdca team <feature>

Parallel implementation with multiple specialists.

flowchart TB
    CTO["cto-lead (opus)"]
    CTO --> FE["frontend-architect"]
    CTO --> BE["bkend-expert / enterprise-expert"]
    CTO --> QA["qa-strategist"]
    CTO --> SEC["security-architect"]
    CTO --> PM["product-manager"]
    CTO -. "Enterprise +1" .-> Arch["infra-architect"]
LevelTeammatesDefault roster
Dynamic3developer · qa · frontend
Enterprise5architect · developer · qa · reviewer · security
Enterprise + Sprint (v2.1.13)6+ sprint-orchestrator

Requirements: CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 + Claude Code v2.1.32+.

8.3 QA Lead Team — /pdca qa <feature>

flowchart LR
    QL["qa-lead"] --> TP["qa-test-planner<br/>L1-L5 plan"]
    QL --> TG["qa-test-generator<br/>test code"]
    QL --> DA["qa-debug-analyst<br/>runtime errors"]
    QL --> QM["qa-monitor<br/>Zero Script QA"]

8.4 Sprint Team — added in v2.1.13

AgentRole
sprint-master-plannerWrites Context-Anchor-driven master plan; invokes Context Sizer
sprint-orchestratorAdvances sprint 8 phases; spawns PDCA per feature in do
sprint-qa-flowRuns 7-Layer S1 dataFlow integrity check
sprint-report-writerAggregates phase + iterate history + KPI + lessons learned

9. Architecture

9.1 Clean Architecture 4-Layer

flowchart TB
    subgraph Presentation
        hooks["hooks/ (21 events / 24 blocks)"]
        scripts["scripts/ (51 Node.js)"]
        skills["skills/ (44)"]
        agents["agents/ (34)"]
    end
    subgraph Infrastructure
        infra["lib/infra/ (cc-bridge · telemetry · docs-code-scanner · mcp-port-registry · sprint · …)"]
    end
    subgraph Application
        app["lib/application/ (pdca-lifecycle · sprint-lifecycle · …)"]
        ccr["lib/cc-regression/"]
        team["lib/team/"]
    end
    subgraph Domain
        domain["lib/domain/ (18 modules · 0 forbidden imports · CI-enforced)"]
    end
    Presentation --> Infrastructure
    Infrastructure --> Application
    Application --> Domain

9.2 Component inventory (v2.1.13, measured 2026-05-12)

SurfaceCountNotes
Skills44+sprint added v2.1.13
Agents34+4 sprint agents added v2.1.13 (sprint-master-planner · sprint-orchestrator · sprint-qa-flow · sprint-report-writer)
Hook events / blocks21 / 24Invariant maintained
MCP servers / tools2 / 19+3 sprint tools (bkit_sprint_list · bkit_sprint_status · bkit_master_plan_read)
Lib modules / subdirs190 / 22+lib/application/sprint-lifecycle/ (13 modules) + lib/infra/sprint/ (9 modules)
Scripts51+sprint-handler.js (660 LOC) + sprint-memory-writer.js (138 LOC)
Templates39+7 sprint templates
Test files / cases118+ / 4,000++tests/contract/v2113-sprint-contracts.test.js (10 SC contracts)
ACTION_TYPES20+sprint_paused · sprint_resumed · master_plan_created · task_created
CATEGORIES11+sprint
Port↔Adapter pairs7cc-payload · state-store · regression-registry · audit-sink · token-meter · docs-code-index · mcp-tool

9.3 Defense-in-Depth 4-Layer

flowchart LR
    User["User input"] --> L1["Layer 1: CC built-in sandbox"]
    L1 --> L2["Layer 2: bkit PreToolUse hooks<br/>pre-write.js · unified-bash-pre.js<br/>defense-coordinator"]
    L2 --> L3["Layer 3: audit-logger sanitizer<br/>OWASP A03/A08 · 7-key PII redaction"]
    L3 --> L4["Layer 4: Token Ledger NDJSON<br/>.bkit/runtime/token-ledger.ndjson"]
    L4 --> Action["Tool execution"]

9.4 3-Layer Orchestration

flowchart TB
    user["User message"]
    user --> IR["intent-router<br/>feature > skill > agent priority"]
    IR --> NA["next-action-engine<br/>Stop-family hook routing"]
    NA --> TP["team-protocol<br/>PM / CTO / QA Lead Task spawn"]
    NA --> WSM["workflow-state-machine<br/>matchRate SSoT 90"]
    TP --> Agents["34 agents"]
    WSM --> Phases["PDCA + Sprint phases"]

9.5 Invocation Contract L1–L5

LevelWhatCountWhere
L1Contract baseline JSON94tests/contract/baseline.json
L2Hook attribution smoke98 TCtests/integration/hooks/
L3MCP stdio runtime42 TCtests/contract/l3-mcp-stdio.test.js
L3 (v2.1.13)Sprint cross-sprint contracts10 TC (SC-01~10)tests/contract/v2113-sprint-contracts.test.js
L5E2E shell scenarios5tests/e2e/run-all.sh

CI gate contract-check.yml enforces 226+ assertions.


10. Skill Evals

bkit extends Claude Code's Skill Evals into a complete skill lifecycle management system: "are my skills still worth keeping?"

10.1 Three layers over native evals

LayerClaude Code nativebkit enhancement
Eval executionBasic runnerevals/runner.js with benchmark mode + 29 eval definitions
A/B testingNot availableevals/ab-tester.js compares skill performance across models
ClassificationNot availableAll 44 skills classified Workflow / Capability / Hybrid with deprecation-risk scoring

10.2 Skill classification

ClassCountPurposeWhat evals measure
Workflow17Process automation (PDCA, pipelines)Regression — these skills are permanent
Capability18Model ability augmentationParity testing — can the model match the skill's output without it?
Hybrid1Both process + capabilityBoth regression and parity

When a model upgrade makes a Capability skill redundant, the Model Parity Test detects it:

# Does the model produce equivalent results without this skill?
node evals/ab-tester.js --parity phase-3-mockup --model claude-opus-4-7

# Compare skill performance between two models
node evals/ab-tester.js --skill pdca --modelA claude-sonnet-4-6 --modelB claude-opus-4-7

# Run all 29 skill evaluations
node evals/runner.js --benchmark

Philosophy: bkit's third principle is No Guessing. Skill Evals replace intuition with measurement.


11. Installation & Customization

# Step 1: Add bkit marketplace
/plugin marketplace add popup-studio-ai/bkit-claude-code

# Step 2: Install bkit plugin
/plugin install bkit

# Step 3: (Optional) Enable Agent Teams
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
PluginBest for
bkitFull PDCA methodology + Sprint Management for experienced developers
bkit-starterKorean learning guide for first-time Claude Code users

11.2 Customization (project-local overrides)

Claude Code searches in this priority order:

  1. Project .claude/ (your customizations — highest priority)
  2. User ~/.claude/
  3. Plugin installation (default)
# Step 1: Find the plugin installation
ls ~/.claude/plugins/bkit/

# Step 2: Copy only the file you want to customize
mkdir -p .claude/skills/starter
cp ~/.claude/plugins/bkit/skills/starter/SKILL.md .claude/skills/starter/

# Step 3: Edit; your version overrides the plugin's

Full guide with platform paths + license attribution: CUSTOMIZATION-GUIDE.md.

⚠️ CC v2.1.113+ Users — ~/.claude/skills/ may be silently deleted on first run (#51234). bkit plugin itself is unaffected (uses ${CLAUDE_PLUGIN_ROOT}/skills/). Back up user custom skills before upgrading.


12. Requirements

RequirementMinimumRecommendedNotes
Claude Codev2.1.78v2.1.150 (conservative) · v2.1.159 (balanced)112 consecutive compatible releases since v2.1.34
Node.jsv18+Hook script execution
Agent Teams (optional)CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1Required for /pdca team

Troubleshooting: If you see "Failed to load hooks" after install, run claude update.


13. Language Support

bkit auto-detects 8 languages from trigger keywords:

LanguageTrigger sample
Englishstatic website, beginner, API design
Korean정적 웹, 초보자, API 설계
Japanese静的サイト, 初心者, API設計
Chinese静态网站, 初学者, API设计
Spanishsitio web estático, principiante
Frenchsite web statique, débutant
Germanstatische Webseite, Anfänger
Italiansito web statico, principiante

Set your reply language with language in .claude/settings.json:

{ "language": "korean" }

Trigger keywords work in any language regardless of the reply setting.


14. License & Contributing

LicenseApache 2.0 · LICENSE · NOTICE (required for redistribution)
Copyright2024–2026 POPUP STUDIO PTE. LTD.
ContributingCONTRIBUTING.mdmain requires admin merge + PR review
IssuesGitHub Issues
Emailcontact@popupstudio.ai

Release history

bkit follows Semantic Versioning. All release notes live in CHANGELOG.md and are not duplicated here.


Made with AI by POPUP STUDIO