Eval Snapshot Workflow
May 27, 2026
This project keeps publishable benchmark numbers in a local gitignored file so README metrics can be updated without committing private baseline files.
Files
- Local metrics (gitignored): ${EVAL_BASELINES_DIR:-~/.cache/origin-eval}/readme_metrics.json
- Tracked template: docs/eval/readme_metrics.example.json
- README updater: scripts/update-readme-eval.py
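The local metrics path relies on the shell's `${VAR:-default}` fallback. As a minimal sketch of the same resolution in Python, assuming the updater wants identical behavior (the function name `metrics_path` is hypothetical and is not taken from scripts/update-readme-eval.py):

```python
import os
from pathlib import Path


def metrics_path(env=None):
    """Resolve the local metrics file the way the shell default would.

    Hypothetical helper for illustration; the real updater script may
    resolve the path differently.
    """
    env = os.environ if env is None else env
    # ${VAR:-default} falls back when the variable is unset OR empty,
    # so an empty string is treated the same as a missing variable.
    base = env.get("EVAL_BASELINES_DIR") or "~/.cache/origin-eval"
    return Path(base).expanduser() / "readme_metrics.json"
```

Passing a plain dict as `env` makes the fallback easy to check without touching the real environment.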
Update flow
- Run benchmark(s) locally and record headline metrics.
- Update ${EVAL_BASELINES_DIR:-~/.cache/origin-eval}/readme_metrics.json.
- Regenerate README snapshot:
  python3 scripts/update-readme-eval.py
- Commit the README and script/docs changes (the local metrics JSON stays untracked).
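The regenerate step can be pictured as replacing one delimited region of the README with a freshly rendered table. The sketch below is an assumption about how such an updater could work, not a description of scripts/update-readme-eval.py: the marker strings and the function name `replace_snapshot` are hypothetical.

```python
import re

# Hypothetical markers; the real script may delimit the snapshot differently.
START = "<!-- eval-snapshot:start -->"
END = "<!-- eval-snapshot:end -->"


def replace_snapshot(readme_text, table):
    """Swap the text between START and END for a freshly rendered table."""
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(readme_text):
        raise ValueError("snapshot markers not found in README")
    block = f"{START}\n{table}\n{END}"
    # A function replacement avoids backslash-escape surprises in `table`.
    return pattern.sub(lambda _match: block, readme_text, count=1)
```

Keeping the markers in the output makes the operation idempotent: rerunning with unchanged metrics leaves the README byte-for-byte identical, so the commit in the last step only shows real changes.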
Notes
- LongMemEval and LoCoMo use Recall@5, MRR, and NDCG@10 as headline fields.
- Current README numbers are retrieval-only, single-run local snapshots unless a reproducibility pass is explicitly documented.
- Name the retrieval mode once in surrounding prose when all rows use the same mode.
- Keep notes in the metrics JSON for maintainer-facing caveats and run metadata; the root README does not render them.
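To illustrate the notes above, here is a minimal sketch of rendering only the headline fields while leaving notes out of the output. The input shape (a mapping from benchmark name to a dict keyed by field name) and the function name `render_rows` are assumptions for illustration; the tracked template docs/eval/readme_metrics.example.json defines the actual schema.

```python
# Headline fields named in the notes above.
HEADLINE_FIELDS = ("Recall@5", "MRR", "NDCG@10")


def render_rows(metrics):
    """Build markdown table lines from {benchmark: {field: value, ...}}.

    The input shape is assumed for illustration, not read from the
    project's real metrics file.
    """
    lines = [
        "| Benchmark | " + " | ".join(HEADLINE_FIELDS) + " |",
        "|---" * (len(HEADLINE_FIELDS) + 1) + "|",
    ]
    for name, entry in metrics.items():
        missing = [f for f in HEADLINE_FIELDS if f not in entry]
        if missing:
            raise KeyError(f"{name} is missing headline fields: {missing}")
        # Only headline fields are rendered; notes and other keys are skipped.
        cells = [f"{entry[f]:.3f}" for f in HEADLINE_FIELDS]
        lines.append(f"| {name} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Failing loudly on a missing headline field keeps a partially filled local file from silently producing an incomplete table.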
Links
- useorigin.app — project home
- useorigin.app#benchmarks — the public benchmark table sourced from this workflow