PRML Cookbook
June 1, 2026
Short, opinionated patterns for using PRML in real ML evaluation pipelines.
This is the field manual for the PRML specification. The spec tells you what a manifest is. The cookbook tells you how to use it without shooting yourself in the foot.
Every pattern is:
- One page — read in under three minutes
- Self-contained — the example runs end-to-end with the snippets shown
- Failure-mode-first — what goes wrong is named before what goes right
Patterns
| # | Pattern | When to use |
|---|---|---|
| 1 | Single-shot eval claim | One model, one benchmark, one number — the 90% case. |
| 2 | Multi-seed eval claim | When you report mean ± std over N seeds. |
| 3 | Streaming Elo / arena eval | Live leaderboards. (Uses v0.2 streaming variant.) |
| 4 | Dataset version pinning | Benchmarks evolve; how to commit to a specific revision. |
| 5 | CI gate via prml-verify-action | Block PRs that ship a model with a tampered eval claim. |
| 6 | Public registry anchoring | When and when not to publish your hash publicly. |
| 7 | Revocation | Withdrawing a manifest after publication. (v0.2 feature.) |
| 8 | Pre-registration without infrastructure | The minimum-viable workflow: a YAML file and sha256sum. |
| 9 | RLHF win-rate evaluations | Judge-model comparisons (AlpacaEval, MT-Bench, Arena-Hard). |
| 10 | Federated evaluation | Multi-org replication: shared hash, distinct producers, regulator-grade audit trail. |
| 11 | PRML + Sigstore for execution integrity | Closes the §8.1 gap: who ran the eval, when, against which exact artefacts. |
| 12 | PRML in Hugging Face model cards | Make the accuracy number on a published HF model card verifiable, not trust-me prose. |
| 13 | PRML + commit-reveal validation for independence attestation | Closes the other §8.1 gap: structural proof that independent evaluators couldn't coordinate verdicts. Co-authored with ValiChord. |
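
As a rough illustration of Pattern 8 (a YAML file and `sha256sum`), the sketch below computes the same digest in Python. It assumes the commitment is a plain SHA-256 over the raw bytes of the manifest file, which is what `sha256sum` prints; the spec may define its own canonical form. The field names in the manifest are hypothetical placeholders, not the PRML schema.

```python
import hashlib

# Hypothetical manifest content: these field names are illustrative only,
# not taken from the PRML specification.
MANIFEST = b"""\
model: example-model
dataset: example-benchmark
metric: accuracy
threshold: 0.90
"""


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


# Compute the commitment BEFORE running the eval, then record it somewhere
# you cannot quietly rewrite (a commit, a ticket, a timestamped message).
commitment = sha256_hex(MANIFEST)
print(commitment)
```

Running `sha256sum` on a file containing exactly these bytes should print the same hex string, so the Python helper and the command-line tool are interchangeable under this assumption.
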
Anti-patterns
| # | Anti-pattern | Why it bites |
|---|---|---|
| A1 | Computing the hash after the run | The whole point is committing before. |
| A2 | Editing the manifest "to fix a typo" | Any edit breaks the hash. Use revocation. |
| A3 | Storing private data in the manifest | The hash is published; the manifest content might be too. |
| A4 | Treating the hash as proof of truth | The hash proves commitment, not correctness. |
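
Anti-pattern A2 is easy to see in a few lines. The sketch below again assumes a plain SHA-256 over the raw manifest bytes, and the manifest fields are hypothetical. It shows that a one-field edit produces a different digest, so a verifier holding the original commitment rejects the edited file.

```python
import hashlib


def digest(manifest: bytes) -> str:
    """SHA-256 of the raw manifest bytes, as lowercase hex."""
    return hashlib.sha256(manifest).hexdigest()


def matches_commitment(manifest: bytes, committed_hex: str) -> bool:
    """True only if the manifest is byte-for-byte what was committed."""
    return digest(manifest) == committed_hex


# Hypothetical manifest; field names are illustrative, not the PRML schema.
original = b"metric: accuracy\nthreshold: 0.90\n"
committed = digest(original)

# A "small fix" after the fact: lowering the threshold by editing one value.
edited = original.replace(b"0.90", b"0.85")

original_ok = matches_commitment(original, committed)
edited_ok = matches_commitment(edited, committed)
print(original_ok, edited_ok)
```

The check only shows that the file is unchanged since commitment. It says nothing about whether the claim inside it is true, which is the point of A4.
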
Reference
- Identity levels (0–4): a non-normative ladder for the binding strength between `producer` and the real-world authoring entity. Used by Pattern 11 and the v0.3 RFC.
Audit & compliance crosswalks
Subcategory-by-subcategory maps from major AI governance frameworks to PRML fields (FULL / PARTIAL / NONE tagged):
- EU AI Act Article 12 — code-level pattern for the 2 December 2027 high-risk deadline
- NIST AI RMF 1.0 — GOVERN / MAP / MEASURE / MANAGE subcategory map
- ISO/IEC 42001:2023 — AI Management System clause-by-clause evidence map
Examples
Working code in examples/:
- `pytorch-imagenet/`: full example with a PRML manifest written before a PyTorch ImageNet eval, hash committed, post-run verification
- `stable-baselines3-rl/`: RL agent on LunarLander-v2, mean episode reward claim, threshold direction `>=`
- `inspect-ai-refusal/`: refusal-rate eval via Inspect AI, PRML pre-registration via `falsify-inspect`
- `huggingface-eval/`: `lm-eval-harness` integration, multi-task pre-registration
License
- Documentation, patterns, examples: CC0 1.0 — public domain dedication. Mirror, fork, modify without attribution.
- Any tooling: MIT.
Contributing
Pattern proposals welcome via PR. Each new pattern must:
- Solve a real problem someone hit while implementing PRML
- Be reproducible — name the tools and their versions
- Include a "what doesn't work" section (we are not selling)
- Be under 800 words
Open an issue first if you're unsure whether your pattern fits.
Authors
Cüneyt Öztürk
Contact: hello@falsify.dev · falsify.dev
Status
- v0.1 stable. v0.2 RFC open through 2026-05-22 — spec.falsify.dev/v0.2-rfc.
- The PRML JSON Schema is in the SchemaStore catalog (merged 2026-05-11), so `*.prml.yaml` files autocomplete in VS Code, JetBrains, Helix, Zed, and Cursor out of the box.
Contributing
See CONTRIBUTING.md and the good first issue label for scoped work.
Cite the spec: Öztürk, C. (2026). PRML v0.1. Zenodo. https://doi.org/10.5281/zenodo.20177839