PRML Cookbook

July 24, 2026

Short, opinionated patterns for using PRML in real ML evaluation pipelines.

This is the field manual for the PRML specification. The spec tells you what a manifest is. The cookbook tells you how to use it without shooting yourself in the foot.

Every pattern is:

One page — read in under three minutes
Self-contained — the example runs end-to-end with the snippets shown
Failure-mode-first — what goes wrong is named before what goes right

Patterns

#	Pattern	When to use
1	Single-shot eval claim	One model, one benchmark, one number — the 90% case.
2	Multi-seed eval claim	When you report mean ± std over N seeds.
3	Streaming Elo / arena eval	Live leaderboards. (Uses v0.2 streaming variant.)
4	Dataset version pinning	Benchmarks evolve; how to commit to a specific revision.
5	CI gate via prml-verify-action	Block PRs that ship a model with a tampered eval claim.
6	Public registry anchoring	When and when not to publish your hash publicly.
7	Revocation	Withdrawing a manifest after publication. (v0.2 feature.)
8	Pre-registration without infrastructure	The minimum-viable workflow: a YAML file and `sha256sum`.
9	RLHF win-rate evaluations	Judge-model comparisons (AlpacaEval, MT-Bench, Arena-Hard).
10	Federated evaluation	Multi-org replication: shared hash, distinct producers, regulator-grade audit trail.
11	PRML + Sigstore for execution integrity	Closes the §8.1 gap: who ran the eval, when, against which exact artefacts.
12	PRML in Hugging Face model cards	Make the accuracy number on a published HF model card verifiable, not trust-me prose.
13	PRML + commit-reveal validation for independence attestation ▶ runnable	Closes the other §8.1 gap: structural proof that independent evaluators couldn't coordinate verdicts. Co-authored with ValiChord.
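
Pattern 8 reduces to hashing a file before the run. The sketch below shows that step in Python, assuming the committed value is the SHA-256 of the manifest file's raw bytes (the digest `sha256sum` prints). The manifest field names and the `preregister` helper are placeholders for illustration, not the PRML schema or tooling, and the spec may define its own canonicalization rules.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical manifest content: field names are illustrative, not the PRML schema.
MANIFEST_TEXT = """\
model: example-model
benchmark: example-benchmark
metric: accuracy
threshold: 0.90
"""


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA-256 of a file's raw bytes, as `sha256sum` would."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large manifests do not need to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preregister(directory: Path) -> str:
    """Write the manifest, then return the digest to record before the run."""
    manifest_path = directory / "claim.prml.yaml"
    # write_bytes avoids platform newline translation, which would change the hash.
    manifest_path.write_bytes(MANIFEST_TEXT.encode("utf-8"))
    return sha256_of_file(manifest_path)


with tempfile.TemporaryDirectory() as tmp:
    committed = preregister(Path(tmp))
    print(committed)  # record this somewhere timestamped before running the eval
```

Writing bytes rather than text is deliberate: a hash over raw bytes is sensitive to line endings, so the file on disk should be byte-identical on every platform.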

Anti-patterns

#	Anti-pattern	Why it bites
A1	Computing the hash after the run	The whole point is committing before.
A2	Editing the manifest "to fix a typo"	Any edit breaks the hash. Use revocation.
A3	Storing private data in the manifest	The hash is published; the manifest content might be too.
A4	Treating the hash as proof of truth	The hash proves commitment, not correctness.
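
Anti-patterns A2 and A4 can be seen in a few lines. This is a sketch, assuming the commitment is a plain SHA-256 over the manifest bytes. `matches_commitment` and the manifest text are hypothetical names used only for illustration.

```python
import hashlib
import hmac


def digest(manifest_bytes: bytes) -> str:
    """Hex SHA-256 of the manifest bytes (assumed commitment scheme)."""
    return hashlib.sha256(manifest_bytes).hexdigest()


def matches_commitment(manifest_bytes: bytes, committed_hex: str) -> bool:
    """True only if the manifest is byte-identical to the committed one."""
    return hmac.compare_digest(digest(manifest_bytes), committed_hex)


# Hypothetical manifest with a typo in the metric name.
original = b"metric: acuracy\nthreshold: 0.90\n"
committed = digest(original)  # recorded before the run

# A2: "fixing the typo" produces a different document with a different hash.
fixed = original.replace(b"acuracy", b"accuracy")

unchanged_ok = matches_commitment(original, committed)
edited_ok = matches_commitment(fixed, committed)
print(unchanged_ok, edited_ok)  # True False
```

A successful match only shows that the manifest is the one that was committed. It says nothing about whether the reported number is correct, which is the point of A4.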

Reference

Identity levels (0–4) — a non-normative ladder for the binding strength between producer and the real-world authoring entity. Used by Pattern 11 and the v0.3 RFC.

Audit & compliance crosswalks

Subcategory-by-subcategory maps from major AI governance frameworks to PRML fields (FULL / PARTIAL / NONE tagged):

EU AI Act Article 12 — code-level pattern for the 2 December 2027 high-risk deadline
NIST AI RMF 1.0 — GOVERN / MAP / MEASURE / MANAGE subcategory map
ISO/IEC 42001:2023 — AI Management System clause-by-clause evidence map

Examples

Working code in examples/:

pytorch-imagenet/ — Full example: PRML manifest before a PyTorch ImageNet eval, hash committed, post-run verification
stable-baselines3-rl/ — RL agent on LunarLander-v2, mean episode reward claim, threshold direction >=
inspect-ai-refusal/ — Refusal-rate eval via Inspect AI, PRML pre-registration via falsify-inspect
huggingface-eval/ — lm-eval-harness integration, multi-task pre-registration
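
The stable-baselines3-rl entry mentions a threshold direction of >=. One way to read that is as a comparison operator applied to the observed metric and the pre-registered threshold. The sketch below works under that assumption. The function name, the set of supported directions, and the numbers are hypothetical and are not taken from the spec or the example code.

```python
import operator

# Assumed mapping from a direction string to a comparison; not from the spec.
DIRECTIONS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def claim_holds(observed: float, threshold: float, direction: str) -> bool:
    """Check an observed metric against a pre-registered threshold."""
    try:
        compare = DIRECTIONS[direction]
    except KeyError:
        raise ValueError(f"unsupported direction: {direction!r}") from None
    return compare(observed, threshold)


# Illustrative values only.
print(claim_holds(212.4, 200.0, ">="))  # True: reward meets the threshold
print(claim_holds(187.9, 200.0, ">="))  # False: reward falls short
print(claim_holds(0.031, 0.05, "<="))   # True: lower-is-better metric
```

Fixing the direction in the manifest before the run matters because it stops a claim from being reinterpreted as "lower is better" after a disappointing result.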

License

Documentation, patterns, examples: CC0 1.0 — public domain dedication. Mirror, fork, modify without attribution.
Any tooling: MIT.

Contributing

Pattern proposals welcome via PR. Each new pattern must:

Solve a real problem someone hit while implementing PRML
Be reproducible — name the tools and their versions
Include a "what doesn't work" section (we are not selling)
Be under 800 words

Open an issue first if you're unsure whether your pattern fits.

Authors

Cüneyt Öztürk
Contact: hello@falsify.dev · falsify.dev

Status

v0.1 stable. v0.2 RFC frozen 2026-05-22 — spec.falsify.dev/v0.2-rfc.
The PRML JSON Schema is in the SchemaStore catalog (merged 2026-05-11), so *.prml.yaml files autocomplete in VS Code, JetBrains, Helix, Zed, and Cursor out of the box.

Contributing

See CONTRIBUTING.md and the good first issue label for scoped work.

Cite the spec: Öztürk, C. (2026). PRML v0.1. Zenodo. https://doi.org/10.5281/zenodo.20177839