Skills - progressive-disclosure capability packs

May 21, 2026 · View on GitHub

Status: active (April 2026). Applies to: all CLI adapters spawned by Bernstein.

Motivation

Loading the full templates/roles/<role>/system_prompt.md into every agent spawn pays the token bill on every retry and fork, whether or not the agent exercises the deep guidance. Across 17 roles the bodies average ~40 lines each.

A capability pack is a directory with SKILL.md (YAML frontmatter + markdown body) and optional references/, scripts/, assets/ siblings. The resolver injects only an index (name + description per skill) into the system prompt; agents pull the full body on demand via load_skill when they decide a capability is relevant.

Directory layout

templates/
  roles/                # legacy bodies (kept for backwards compat)
    backend/
      system_prompt.md
      task_prompt.md
      config.yaml

  skills/               # new skill packs
    backend/
      SKILL.md
      references/
        python-conventions.md
        test-patterns.md
        error-handling.md
      scripts/
        lint.sh
    qa/
      SKILL.md
      references/
        test-strategy.md
        edge-cases.md

Empty buckets (references/, scripts/, assets/) are omitted - the manifest's corresponding field is just an empty list.

SKILL.md format

---
name: backend
description: Python server code, APIs, async, strict typing.
trigger_keywords: [python, backend, async, pyright]
references:
  - python-conventions.md
  - test-patterns.md
  - error-handling.md
scripts:
  - lint.sh
---

# Backend Engineering Skill

You are a backend engineer…

Descriptions stay terse (one line) because the full index ships in every spawn's system prompt. Every byte multiplies by the number of agents Bernstein launches.

Frontmatter schema

Defined by :class:bernstein.core.skills.SkillManifest (Pydantic, extra="forbid" so typos fail loudly):

fieldtypenotes
namestrmatches ^[a-z][a-z0-9-]*$
descriptionstr20-500 chars, shown in the index
trigger_keywordslist[str]optional keyword hints
referenceslist[str]files under <skill>/references/
scriptslist[str]files under <skill>/scripts/
assetslist[str]files under <skill>/assets/
versionstrdefaults to "1.0.0"
authorstr|Noneoptional

Validation failures raise :class:bernstein.core.skills.SkillManifestError with the originating path baked into the message.

Resolution flow

bernstein.core.planning.role_resolver.resolve_role_prompt is called once per spawn. It tries three things in order:

  1. Skill pack - templates/skills/<role>/SKILL.md exists → inject the compact index plus the matched skill body.
  2. Legacy role template - no skill pack, but templates/roles/<role>/system_prompt.md exists → render via the existing Jinja-style engine and inject that.
  3. Fallback stub - neither path found → "You are a <role> specialist."

The resolver is cached per (templates_dir, skills/ mtime) so dev reloads pick up edits but production spawns do not re-parse 17 manifests on every tick.

load_skill MCP tool

Registered by :mod:bernstein.mcp.server under the name load_skill:

async def load_skill(
    name: str,
    reference: str | None = None,
    script: str | None = None,
) -> dict: ...

Returns JSON with:

  • name - echoed back.
  • body - SKILL.md body when reference and script are unset.
  • available_references / available_scripts - always populated.
  • reference_content - the requested reference's raw text (only when reference was passed).
  • script_content - the requested script's raw text.
  • error - populated when the skill / file could not be loaded.

Every invocation emits a skill_loaded WAL event (best-effort) with name, reference, script, source, duration_s, and error fields.

Sources

Skills are aggregated from multiple sources into a single :class:SkillLoader. Name collisions abort startup with :class:DuplicateSkillError - duplicate names across sources are never silently shadowed.

First-party

bernstein/templates/skills/ loaded by :class:bernstein.core.skills.sources.LocalDirSkillSource.

Plugin packs

Register a zero-arg factory under bernstein.skill_sources:

[project.entry-points."bernstein.skill_sources"]
my-data-pack = "my_pack.skills:source"

Where my_pack/skills.py exposes:

from pathlib import Path

from bernstein.core.skills import SkillSource
from bernstein.core.skills.sources import LocalDirSkillSource


def source() -> SkillSource:
    return LocalDirSkillSource(
        Path(__file__).parent / "skills",
        source_name="plugin:my-data-pack",
    )

:func:bernstein.core.skills.sources.load_plugin_sources enumerates the group at loader construction time. Broken factories log a warning and are skipped rather than aborting startup - a noisy third-party bug should not take down the orchestrator.

CLI

bernstein skills list                 # compact table of every skill
bernstein skills show backend         # print SKILL.md body
bernstein skills show backend --reference python-conventions.md
bernstein skills show backend --script lint.sh

Observability

Every successful load_skill invocation emits a structured skill_loaded event. Hook it into the WAL (src/bernstein/core/persistence/wal.py) or a Prometheus metric (skill_load_total{name=..., source=...}, skill_load_duration_seconds{name=...}) to see:

  • Which skills get exercised vs. sit dead.
  • Whether agents converge on a small core set.
  • Which references are worth keeping close and which can be retired.

Dead skills (zero loads in 30 days) become candidates for deprecation.

Migration status

All 17 roles migrated to skill packs:

rolereferencesnotes
backendpython-conventions, test-patterns, error-handling + lint.shfull split
qatest-strategy, edge-casesfull split
securityowasp-top-10, auth-checklist, secrets-handlingfull split
frontenda11y, state-managementfull split
devopsci-patterns, docker-practicesfull split
architectadr-template, decomposition-principlesfull split
docsdocstring-style, doc-structure + check-links.shfull split
retrievalhybrid-search, chunkingfull split
ml-engineerevaluation, reproducibilityfull split
reviewerreview-rubric, feedback-tonefull split
managertask-api, planning-rulesfull split
vppivot-evaluation, cell-decompositionfull split
prompt-engineer-body small, no references
visionary-body is the output schema
analyst-body is the scoring rubric
resolver-single-purpose skill
ci-fixer-single-purpose skill

Legacy templates/roles/<role>/system_prompt.md files remain on disk for backwards compat.