Approved-LLM registry
June 2, 2026
The registry answers a single question every skill that touches
<private-list> content asks at Step 0: "is the active LLM
stack approved to receive this data?" If yes, the skill
proceeds. If no, it stops with a pointer to this doc and the
adopter's <project-config>/privacy-llm.md.
The registry has two tiers: default-approved entries that require no adopter action, and opt-in entries the adopter declares explicitly per the rationale in Why this list is provisional below.
The default-approved entries
These four classes are pre-approved by the framework. An adopter
running with only these does not need to write
<project-config>/privacy-llm.md (the gate auto-detects the
default-approved Claude Code instance and passes).
| Class | Rationale | Examples |
|---|---|---|
| Claude Code itself | The Claude-Code instance running framework skills is treated as an approved privacy model for the data it directly processes. See docs/setup/privacy-llm.md — Claude Code trust boundary for the rationale and the limits of this default. | The agent invoking the skill |
| *.apache.org-hosted endpoints | Anything served from an Apache Software Foundation domain runs on infra under ASF governance: data residency, retention, and access are bounded by the ASF infra agreement. | A future ASF-hosted inference endpoint at e.g. inference.apache.org; an in-tracker endpoint at <project>.apache.org/llm/ |
| Local-only inference | The data never leaves the user's machine. No external party (cloud LLM operator, network operator, log aggregator) can observe it. | Ollama serving a local model, vLLM on the user's workstation, llama.cpp embedded in a CLI helper |
| Air-gapped on-prem | Same rationale as local inference, scaled to a contributor's organisation. The model server runs on infra the adopter operationally controls and which has no path to a third-party LLM operator. | A PMC-hosted inference appliance on a private VLAN |
Detection lives in checker/src/checker/check.py (the
_approve_by_default_rules function); the markdown contract
here is the source-of-truth for what those rules implement, and
the <project-config>/privacy-llm.md declaration shape is what
the checker parses.
The opt-in entries — adopter declares explicitly
Every other LLM endpoint requires the adopter to declare it
explicitly in <project-config>/privacy-llm.md, naming:
- the endpoint URL (or provider product name);
- the data-residency / retention contract that backs the choice (a link to a contract clause, vendor doc, or BAA-equivalent);
- the security-team member who approved the addition (initials + date), so the audit trail is local and visible.
The framework does not ship a curated allow-list of third-party endpoints. The opt-in mechanism puts the choice — and the responsibility — on the adopting project's security team, where ASF policy expects it to live.
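The three required fields lend themselves to a mechanical completeness check. The following is a minimal sketch, not the checker's actual code: the `OptInEntry` record, its field names, the `missing_fields` helper, and the "2 to 5 letters" pattern for initials are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


# Hypothetical record shape; field names are illustrative, not the checker's.
@dataclass(frozen=True)
class OptInEntry:
    endpoint: str        # endpoint URL or provider product name
    residency_link: str  # link to a contract clause, vendor doc, or BAA-equivalent
    approved_by: str     # "<initials> <YYYY-MM-DD>"


# Assumption: initials are 2-5 ASCII letters followed by an ISO date.
_APPROVED_BY = re.compile(r"^[A-Za-z]{2,5} \d{4}-\d{2}-\d{2}$")


def missing_fields(entry: OptInEntry) -> list[str]:
    """Return the names of fields that are empty or malformed."""
    problems = []
    if not entry.endpoint.strip():
        problems.append("endpoint")
    if not entry.residency_link.strip():
        problems.append("residency_link")
    if not _APPROVED_BY.match(entry.approved_by.strip()):
        problems.append("approved_by")
    return problems
```

An entry with an empty result is complete in the sense described above; whether the linked contract is adequate remains a judgment for the adopting project's security team.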
Recipes for the most common opt-in cases (AWS Bedrock with a
data-residency-bounded region, direct Anthropic API with a
no-training agreement, Vertex AI with VPC-SC) are in
docs/setup/privacy-llm.md.
The recipes spell out the data-residency contract each one
implies.
The pre-flight check
Skills that may read <private-list> content (or any private
content beyond <security-list> — recall the redactor handles
third-party PII inside <security-list> mail, leaving the
reporter's own identity intact) run this check at Step 0:
1. Read <project-config>/privacy-llm.md, if present.
2. Build the active-LLM-stack set:
- Claude Code (always present — this is what's running)
- any model named in <project-config>/privacy-llm.md as
"currently configured"
3. For every entry in the stack, decide approved? per:
- Claude Code → ✓ default-approved
- URL whose hostname ends in .apache.org → ✓ default-approved
- Hostname ∈ {localhost, 127.0.0.1, ::1} → ✓ default-approved
- Listed under "approved third-party" with a complete
data-residency note → ✓ adopter-approved
- Anything else → ✗
4. If every entry is approved, proceed.
If any entry is not approved, stop with:
"Skill <name> reads <private-list> content. The active LLM
stack contains <unapproved entry>, which is not in the
framework's default-approved list and is not declared in
<project-config>/privacy-llm.md. See
tools/privacy-llm/models.md and docs/setup/privacy-llm.md."
The check is deliberately conservative: any single unapproved entry in the stack stops the skill. The intent is to make adding a new LLM hop a deliberate act, not something a skill can silently grow into.
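The decision in step 3 can be sketched in a few lines of Python. This is an illustration of the rules as written here, not the contents of check.py: the names `classify` and `unapproved_entries` are hypothetical, the `.apache.org` rule is read as a hostname-suffix match (an assumption), and `adopter_approved` is assumed to hold only entries whose data-residency note is already complete.

```python
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def classify(entry: str, adopter_approved: set[str]) -> str:
    """Return 'default', 'adopter', or 'unapproved' for one stack entry."""
    if entry == "Claude Code":
        return "default"
    # hostname is None for bare provider names such as "AWS Bedrock".
    host = urlsplit(entry).hostname
    if host is not None:
        # Match on the hostname, so a path like /x.apache.org does not qualify.
        if host.endswith(".apache.org") or host in LOOPBACK_HOSTS:
            return "default"
    if entry in adopter_approved:
        return "adopter"
    return "unapproved"


def unapproved_entries(stack: list[str], adopter_approved: set[str]) -> list[str]:
    """Entries that would stop the skill; an empty list means proceed."""
    return [e for e in stack if classify(e, adopter_approved) == "unapproved"]
```

Returning the full list of failing entries, rather than stopping at the first, lets the stop message name every hop that needs a declaration.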
Adopter config — <project-config>/privacy-llm.md
Adopters declare their privacy-LLM posture in a single markdown
file at <project-config>/privacy-llm.md. The framework's
projects/_template/privacy-llm.md
ships a starting point pre-filled with the Claude-Code default;
adopters customise from there.
The file has three sections:
## Currently configured LLM stack
- Claude Code (the agent running framework skills)
<!-- list every additional LLM the adopter has wired into any
skill or tool here, one per line, with the endpoint URL or
provider name. -->
## Approved third-party endpoints (opt-in)
<!-- adopter populates this section per the recipes in
docs/setup/privacy-llm.md. Each entry includes:
- endpoint URL / provider name
- data-residency contract (link)
- approved-by: <initials> <YYYY-MM-DD>
For empty (Claude-Code-only) deployments this section
stays empty. -->
## Private mailing lists for this project
- `<private-list>` # PMC private list
- (any additional PMC-private foundation lists this project's
security team reads, e.g. cross-project security relay lists)
The "Private mailing lists" section is what
tools/ponymail/ reads for the
tools.ponymail.private_lists config knob; the privacy-llm tool
re-uses the same source-of-truth so the two stay in sync.
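To make the three-section shape concrete, here is a rough sketch of how such a file could be split into sections. It is not the checker's parser: `parse_sections` is a hypothetical helper, and it simplifies by skipping HTML comments, collecting only lines that start with `- `, and ignoring wrapped continuation lines.

```python
def parse_sections(text: str) -> dict[str, list[str]]:
    """Map each '## ' heading to the '- ' bullet items that follow it."""
    sections: dict[str, list[str]] = {}
    current = None
    in_comment = False
    for raw in text.splitlines():
        line = raw.strip()
        if in_comment:
            # Stay in comment mode until the closing marker appears.
            in_comment = "-->" not in line
            continue
        if line.startswith("<!--"):
            in_comment = "-->" not in line
            continue
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and line.startswith("- "):
            sections[current].append(line[2:].strip())
    return sections
```

Skipping comments matters because the template's guidance comments themselves contain bullet-like lines that must not be mistaken for declared entries.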
How skills call the gate
Skills call the gate via the privacy-llm-check console script
in checker/. Run it at Step 0 (pre-flight); a
non-zero exit is a hard stop.
# Returns exit code 0 if the active stack is fully approved,
# non-zero with a stderr explanation if not.
uv run --project <framework>/tools/privacy-llm/checker privacy-llm-check \
--reads-private-list # set when the skill may read <private-list>
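Where a skill drives the gate from a Python helper rather than a shell step, the hard-stop behaviour could look roughly like this. The `run_gate` wrapper is hypothetical and not part of the framework; it only assumes what is stated here, namely that any non-zero exit must stop the skill.

```python
import subprocess
import sys


def run_gate(command: list[str]) -> None:
    """Run the gate command; stop the calling process on any non-zero exit."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the checker's explanation, then propagate its exit code.
        sys.stderr.write(result.stderr)
        raise SystemExit(result.returncode)


# Illustrative call, mirroring the shell command above:
# run_gate(["uv", "run", "--project", "<framework>/tools/privacy-llm/checker",
#           "privacy-llm-check", "--reads-private-list"])
```

Raising `SystemExit` rather than returning a boolean makes it harder for a caller to ignore a rejection by accident.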
The checker auto-locates <project-config>/privacy-llm.md:
explicit --config → $PRIVACY_LLM_CONFIG env var → standard
adopter paths (<cwd>/.apache-magpie/privacy-llm.md,
<cwd>/.apache-magpie-overrides/privacy-llm.md). On approval
it prints a one-line banner per stack entry; on rejection it
prints the failing entries to stderr and exits 1. Exit 2 means
the config file could not be located or parsed.
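The lookup order can be illustrated with a short sketch. `locate_config` is a hypothetical helper, not the checker's function, and it makes one assumption the text does not settle: an explicit `--config` or `$PRIVACY_LLM_CONFIG` value that points at a missing file is treated as "not found" rather than falling through to the standard paths.

```python
from pathlib import Path
from typing import Mapping, Optional

# Standard adopter directories, in the order listed above.
STANDARD_DIRS = (".apache-magpie", ".apache-magpie-overrides")


def locate_config(
    explicit: Optional[str], env: Mapping[str, str], cwd: Path
) -> Optional[Path]:
    """Return the config path, or None when nothing usable is found."""
    # An explicit --config value wins over the environment variable.
    chosen = explicit or env.get("PRIVACY_LLM_CONFIG")
    if chosen:
        path = Path(chosen)
        # Assumption: a missing explicit path is an error, not a fall-through.
        return path if path.is_file() else None
    for name in STANDARD_DIRS:
        path = cwd / name / "privacy-llm.md"
        if path.is_file():
            return path
    return None


# Illustrative call: locate_config(args.config, os.environ, Path.cwd())
```

A `None` result corresponds to the exit-2 case described above.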
For <security-list>-only skills the gate-call is also
required as a defence-in-depth measure: even though the body
classification permits Claude-Code-default LLMs, running the
check ensures the adopter's config is in a valid state (no
unconfigured opt-in entries lurking in the active stack) before
any private content flows. The redactor (pii-redact) is
required for every <security-list> body read regardless; see
pii.md for the redaction contract and
wiring.md for how the two mechanisms compose at
the skill level.
Why this list is provisional
There is no ratified ASF Legal Affairs / Privacy policy yet that enumerates approved LLM endpoints for handling foundation private data. The default-approved list above is the working position the framework adopts until such a policy lands. Specifically:
- The "Claude Code itself" default reflects the framework maintainer's current trust posture (per the Claude Code trust boundary section of docs/setup/privacy-llm.md). If ASF Legal subsequently rules that Anthropic-hosted endpoints require a data-processing agreement for foundation private data, the framework will narrow the default and bump the registry version.
- The *.apache.org blanket approval assumes infra-level governance; if a future ASF endpoint runs at *.apache.org but proxies to a third-party LLM, that endpoint may need re-classification.
When ASF Legal does ratify a list, this file becomes the pointer to that list rather than the list itself, and the default-approved entries get re-checked against it. Until then, this file is the framework's source-of-truth for adopters and the rationale-of-record for the choices it encodes.