Core algorithm: evidence, candidate selection, and autorun
May 13, 2026
This document explains the current hermes-curator-evolver algorithm in plain terms, including exactly where embedding and reranking are supported.
For the clean-room rationale on why HyperAgents is not a runtime dependency, and the precise scope of the multi-variant candidate evaluation and staged verifier gate documented below, see hyperagents-design-notes.md.
Short answer
| Path | Uses embedding? | Uses reranker? | Can write skills? | Purpose |
|---|---|---|---|---|
| `bootstrap` | No by default; yes with `--semantic` | No by default; yes with `--semantic` | Installs the timer; the timer can write only low-risk local-agent-created skill notes | One-command setup: backfill sessions + install/enable autorun. |
| `backfill-sessions` | No | No | No | Import existing Hermes `session_*.json` transcripts into `evidence.sqlite` so prior history can inform reports/autorun. |
| `auto-run` / `install-auto` default | No | No | Yes, only bounded low-risk blocks for local agent-created skills when explicitly enabled | Safe automatic skill improvement with deterministic evidence thresholds, provenance write protection, and size guardrails. |
| `auto-run --semantic-candidates` | Yes: `Qwen/Qwen3-Embedding-0.6B` | No unless `--rerank-candidates` | Yes, but only after the same write flags and local-agent-created source gate | Model-assisted ordering of evidence-eligible skills. |
| `auto-run --semantic-candidates --rerank-candidates` | Yes | Yes: `BAAI/bge-reranker-v2-m3` | Yes, but only after the same write flags and local-agent-created source gate | Embedding + reranker ordering of evidence-eligible skills. |
| `candidates --execute-semantic` | Yes: `Qwen/Qwen3-Embedding-0.6B` | No unless `--rerank` | No | Manual/review candidate discovery. |
| `candidates --execute-semantic --rerank` | Yes | Yes: `BAAI/bge-reranker-v2-m3` | No | Better manual/review ranking. |
| `propose --draft-with-model` | Uses Hermes configured chat model | No | No | Draft a reviewable proposal artifact. |
| `apply` | No | No | Yes, after explicit approval/hash/backup gates | Apply reviewed content. |
Default autorun remains model-free. Embedding/rerank autorun is explicit opt-in and can only reorder candidates that already passed the evidence threshold; model output does not generate write content. Unattended apply is provenance-safe by default: only local agent-created skills are writable. Official/bundled, hub-installed, plugin-provided, skills.external_dirs, pinned, unknown sources, and already-over-hard-cap skills may be proposed in dry-run output, but are skipped before write.
Semantic execution is runtime-guarded for local machines: texts are truncated for candidate ranking (HERMES_CURATOR_EVOLVER_SEMANTIC_TEXT_LIMIT, default 512 chars), embedding batches run one at a time, and model runtime device is configurable with HERMES_CURATOR_EVOLVER_SEMANTIC_DEVICE (default auto; set cpu or cuda explicitly if needed). If local model execution fails, auto-run falls back to deterministic evidence ordering instead of crashing.
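The runtime guards described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: the environment variable names and defaults come from the paragraph above, while every function name and the `"deterministic-fallback"` label are hypothetical.

```python
import os

# Environment variable names and defaults come from the text above;
# every function name below is hypothetical.
TEXT_LIMIT_ENV = "HERMES_CURATOR_EVOLVER_SEMANTIC_TEXT_LIMIT"
DEVICE_ENV = "HERMES_CURATOR_EVOLVER_SEMANTIC_DEVICE"
DEFAULT_TEXT_LIMIT = 512


def semantic_text_limit(env=None):
    """Return the truncation limit in characters, defaulting to 512."""
    env = os.environ if env is None else env
    try:
        limit = int(env.get(TEXT_LIMIT_ENV, DEFAULT_TEXT_LIMIT))
    except ValueError:
        return DEFAULT_TEXT_LIMIT  # assumption: bad values fall back to the default
    return limit if limit > 0 else DEFAULT_TEXT_LIMIT


def semantic_device(env=None):
    """Return the requested device string ("auto", "cpu", "cuda", ...)."""
    env = os.environ if env is None else env
    return env.get(DEVICE_ENV, "auto")


def truncate_for_ranking(text, env=None):
    """Cut a text down to the configured limit before it reaches the model."""
    return text[: semantic_text_limit(env)]


def order_with_fallback(eligible, rank_fn):
    """Try model ranking; on any failure keep the deterministic evidence order."""
    try:
        return list(rank_fn(eligible)), "semantic"
    except Exception:
        return list(eligible), "deterministic-fallback"
```

Passing `env` explicitly keeps the helpers testable without touching the real process environment.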
Historical session backfill algorithm
backfill-sessions is implemented in hermes_curator_evolver/backfill.py and fills the same evidence store used by reports and autorun.
1. Read newest `session_*.json` files from `--sessions-dir`.
2. Keep files inside the `--days` lookback window; optionally stop at `--limit` newest files.
3. For each session:
a. Use `session_id`, `session_start`, `model`, and `platform` from the transcript.
b. Import assistant `tool_calls` with matching `tool` responses when available.
c. Import user → assistant text turns for compact context evidence.
d. Import one session completion marker.
4. Skip duplicate tool/turn/session signatures so repeated backfill runs are safe.
5. Return counts; no skills are changed.
Backfill is intentionally model-free. It does not infer missing tool calls from prose; it only records evidence that is present in the Hermes session JSON structure. After backfill, report and auto-run see the imported historical evidence through normal SQLite queries.
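One way to make repeated backfill runs safe is to key each imported event on a stable signature. The sketch below assumes a hypothetical `evidence` table and signature fields; the real schema in `evidence.sqlite` may differ.

```python
import hashlib
import json
import sqlite3


def event_signature(session_id, kind, payload):
    """Stable hash of one evidence event (the field choice is an assumption)."""
    blob = json.dumps([session_id, kind, payload], sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def import_events(conn, events):
    """Insert events, skipping signatures that are already stored."""
    # Hypothetical table layout, used only for this illustration.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS evidence ("
        "signature TEXT PRIMARY KEY, session_id TEXT, kind TEXT, payload TEXT)"
    )
    counts = {"imported": 0, "skipped": 0}
    for session_id, kind, payload in events:
        sig = event_signature(session_id, kind, payload)
        cur = conn.execute(
            "INSERT OR IGNORE INTO evidence VALUES (?, ?, ?, ?)",
            (sig, session_id, kind, json.dumps(payload, sort_keys=True)),
        )
        # rowcount is 0 when the primary key already exists and the row is ignored.
        if cur.rowcount == 1:
            counts["imported"] += 1
        else:
            counts["skipped"] += 1
    conn.commit()
    return counts
```

Running `import_events` twice over the same transcript data imports everything on the first pass and skips everything on the second, which matches the "return counts; no skills are changed" contract.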
Current autorun algorithm
auto-run is implemented in hermes_curator_evolver/auto_evolve.py.
Inputs
- Evidence DB: `~/.hermes/plugins/curator-evolver/data/evidence.sqlite`
- Optional historical source: `~/.hermes/sessions/session_*.json` imported with `backfill-sessions`
- Skills root: default `~/.hermes/skills`
- Lookback window: default `--days 7`
- Optional bootstrap wrapper: `bootstrap`, `bootstrap --semantic`
- Candidate cap: default `--max-skills 3`
- Minimum evidence threshold: default `--min-evidence 2`
- Size guardrails: target a 90k `SKILL.md` soft cap; skip unattended updates when the target `SKILL.md` already exceeds the 100k hard cap
- Optional candidate ordering: `--semantic-candidates`, `--rerank-candidates`
- Auto-apply policy: provenance gate writes only local agent-created skills; `--protect-core-skills` default on adds an extra name-based guard; `--allow-auto-apply-skill <glob>` and `--block-auto-apply-skill <glob>` operate inside that provenance boundary
Steps
1. Build aggregate evidence report for the lookback window.
2. Read report.summary.skills.
3. Build an evidence-eligible candidate set:
skill.event_count >= min_evidence.
4. If semantic/rerank is not requested:
order candidates by deterministic evidence summary order.
5. If semantic/rerank is requested:
a. Build an evidence query from eligible skill counts and recent evidence rows.
b. Run embedding candidate search over SKILL.md files.
c. Optionally run reranker on query/skill pairs.
d. Keep only skills from the evidence-eligible set.
e. Use model scores only to reorder those eligible skills.
6. Discover matching SKILL.md files under the skills directory and classify source provenance.
7. For each selected skill:
a. Read the current SKILL.md.
b. Skip pinned skills.
c. In dry-run, still plan protected sources for review visibility.
d. In approved auto-apply mode, skip any source other than `local-agent-created`: bundled/official, hub-installed, plugin-provided, `skills.external_dirs`, and unknown sources.
e. Apply the extra core/workflow name guard and explicit blocklist/allowlist patterns inside that provenance boundary.
f. Build a per-skill evidence report.
g. Prepare a bounded managed curator-evolver:auto block.
h. If the updated SKILL.md would exceed the 90k soft cap, reduce inline evidence and spill bulky details into references/.
i. If the existing SKILL.md already exceeds the 100k hard cap, skip unattended update with reason skill-content-hard-cap.
j. Preserve all existing skill text outside that block.
8. If --apply-low-risk is not set:
return dry-run plan only.
9. If --apply-low-risk is set but --approve-auto-apply is missing:
refuse to write.
10. If both write flags are set and the policy gate permits the skill:
apply through guarded apply with SHA256 check, backup, post-apply verification, and rollback manifest. Timers installed by `bootstrap`/`install-auto` use the built-in `skill_validate` verifier by default; direct `auto-run` can still provide a custom `--verify-command`.
Pseudocode
report = build_report(store, days=days)
eligible = [
    row.skill_name
    for row in report.summary.skills
    if row.event_count >= min_evidence
]
if semantic_candidates or rerank_candidates:
    query = build_semantic_query(report, eligible)
    ranked = find_skill_candidates(
        query=query,
        skills_dir=skills_dir,
        semantic=True,
        load_models=True,
        load_reranker=rerank_candidates,
    )
    names = [item.skill_name for item in ranked if item.skill_name in eligible][:max_skills]
    names += [name for name in eligible if name not in names]
else:
    names = eligible
for name in names[:max_skills]:
    skill_file = skill_files.get(name)
    original = read(skill_file)
    if pinned(original):
        skip("pinned-skill")
    source = classify_skill_source(skill_file, name)
    if apply_low_risk and approve_auto_apply and source != "local-agent-created":
        skip("source-not-agent-created")
    if apply_low_risk and approve_auto_apply and auto_apply_blocked(name):
        skip("core-skill-auto-apply-protected")
    skill_report = build_report(store, days=days, skill=name)
    prepared = prepare_low_risk_skill_update(
        skill_name=name,
        skill_text=original,
        days=days,
        summary=skill_report.summary,
        evidence_rows=skill_report.skill_evidence,
    )
    if prepared.skipped_reason:
        skip(prepared.skipped_reason)
    if apply_low_risk and approve_auto_apply:
        apply_guarded_patch(
            target_path=skill_file,
            new_content=prepared.content,
            expected_sha256=sha256_file(skill_file),
            backup_root=backup_dir,
            verify_command=verify_command,
        )
        write_support_files(prepared.support_files)
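The candidate-selection part of the pseudocode can be turned into a small runnable sketch. `SkillRow` and `select_names` are hypothetical names, and the model ranking is passed in as a plain list so the example needs no models.

```python
from collections import namedtuple

# Hypothetical stand-in for one row of report.summary.skills.
SkillRow = namedtuple("SkillRow", "skill_name event_count")


def select_names(rows, ranked_names=None, min_evidence=2, max_skills=3):
    """Pick skills to process: evidence decides eligibility, models only reorder."""
    eligible = [r.skill_name for r in rows if r.event_count >= min_evidence]
    if ranked_names is None:
        # Deterministic path: keep the evidence summary order.
        return eligible[:max_skills]
    allowed = set(eligible)
    names = []
    for name in ranked_names:
        # Model-ranked names outside the eligible set are dropped.
        if name in allowed and name not in names:
            names.append(name)
    # Eligible skills the model did not return keep their evidence order.
    names += [n for n in eligible if n not in names]
    return names[:max_skills]
```

For example, with rows `a:5, b:1, c:2, d:3` and a model ranking of `["b", "d", "zzz", "a"]`, the result is `["d", "a", "c"]`: `b` fails the threshold and `zzz` has no evidence, so neither can be selected.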
What gets written
Autorun only writes a managed block like this:
<!-- curator-evolver:auto:start -->
## Auto-curated evidence notes
Low-risk bounded auto-curation generated by `hermes-curator-evolver`.
These notes are evidence summaries for future agents; they do not replace human-authored SOPs.
- Skill: `example-skill`
- Generated at: `...`
- Evidence window: last 7 day(s)
- Tool events: ...
- Skill events: ...
- Error-like events: ...
### Recent evidence
- ...
### Agent guidance
- When this skill is relevant, check these observed signals before choosing a workflow.
- Prefer targeted verification over broad retries when similar errors recur.
- If a repeated issue is understood, replace this evidence note with a concise human-readable SOP update.
<!-- curator-evolver:auto:end -->
If the block already exists, autorun replaces only that managed block. It does not rewrite the rest of the skill. When the block would make SKILL.md too large, autorun keeps a compact pointer in the block and writes bulky evidence to a references/curator-evolver-auto-*.md support file. If the starting SKILL.md is already above the 100k hard cap, unattended update is skipped instead of making the file larger.
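Replacing only the managed block could look like the sketch below. The marker strings and the `skill-content-hard-cap` reason come from this document; the function name, the append-at-end behavior for a first write, and measuring the cap in characters are assumptions.

```python
import re

START = "<!-- curator-evolver:auto:start -->"
END = "<!-- curator-evolver:auto:end -->"
BLOCK_RE = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
HARD_CAP = 100_000  # assumption: the 100k cap is counted in characters


def upsert_managed_block(skill_text, body):
    """Return (new_text, skipped_reason); text outside the block is untouched."""
    if len(skill_text) > HARD_CAP:
        return None, "skill-content-hard-cap"
    block = f"{START}\n{body.strip()}\n{END}"
    if BLOCK_RE.search(skill_text):
        # A function replacement avoids backslash interpretation in the body.
        return BLOCK_RE.sub(lambda _m: block, skill_text, count=1), None
    # Assumption: a first-time block is appended at the end of the file.
    return skill_text.rstrip("\n") + "\n\n" + block + "\n", None
```

Calling it twice with different bodies leaves exactly one managed block in the file, and everything before the start marker is byte-for-byte unchanged.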
Embedding/rerank autorun choice
Default model-free timer
hermes-curator-evolver install-auto --schedule daily --enable
Equivalent auto-run:
hermes-curator-evolver auto-run \
--skills-dir ~/.hermes/skills \
--format json \
--apply-low-risk \
--approve-auto-apply
Semantic/rerank timer
Install optional model dependencies first:
uv pip install --python ~/.hermes/hermes-agent/venv/bin/python -e "$HOME/.hermes/plugins/curator-evolver[semantic]"
Then opt in:
hermes-curator-evolver install-auto \
--schedule daily \
--enable \
--semantic-candidates \
--rerank-candidates
Equivalent auto-run:
hermes-curator-evolver auto-run \
--skills-dir ~/.hermes/skills \
--format json \
--semantic-candidates \
--rerank-candidates \
--apply-low-risk \
--approve-auto-apply
Where embedding and reranking are used
Semantic ranking is implemented in hermes_curator_evolver/semantic.py and reused by both:
- The advisory `candidates` command.
- The opt-in `auto-run --semantic-candidates` / `--rerank-candidates` path.
Plan-only semantic mode for review
hermes-curator-evolver candidates \
--query "gateway plugin restart" \
--skills-dir ~/.hermes/skills \
--semantic
This shows the semantic model plan. It does not download or run models.
Embedding execution
hermes-curator-evolver candidates \
--query "gateway plugin restart" \
--skills-dir ~/.hermes/skills \
--execute-semantic
This loads:
Qwen/Qwen3-Embedding-0.6B
through sentence-transformers and ranks skills by embedding cosine similarity.
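Ranking by embedding cosine similarity reduces to the arithmetic below. This sketch uses tiny hand-written vectors in place of real model output so it runs without downloading anything; the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_cosine(query_vec, skill_vecs):
    """Return (skill_name, score) pairs, best first, ties broken by name."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in skill_vecs.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy vectors standing in for embeddings of the query and each SKILL.md.
toy_skills = {"gateway": [0.9, 0.1], "other": [0.0, 1.0], "mid": [1.0, 1.0]}
ranking = rank_by_cosine([1.0, 0.0], toy_skills)
```

Breaking ties on the skill name keeps the ordering reproducible when two skills score identically.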
Embedding + reranker execution
hermes-curator-evolver candidates \
--query "gateway plugin restart" \
--skills-dir ~/.hermes/skills \
--execute-semantic \
--rerank
This loads:
Qwen/Qwen3-Embedding-0.6B
BAAI/bge-reranker-v2-m3
The embedding model finds likely candidates first, then the reranker scores query/skill pairs.
Safety contract for model-assisted autorun
- Default autorun remains deterministic and model-free.
- `--semantic-candidates` and `--rerank-candidates` are explicit opt-ins.
- Model output can only influence candidate ordering, not write content directly.
- Model-ranked skills must already satisfy `min_evidence`.
- Writes still require `--apply-low-risk --approve-auto-apply`.
- Output includes selection mode, model names, scores, and reasons for each candidate.
- Timer install stays model-free unless the user explicitly opts in.
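The write side of this contract can be read as a small decision function. The sketch below is illustrative only: `write_decision` and the `refused-missing-approval` label are hypothetical, while the `pinned-skill` and `source-not-agent-created` reasons are the ones used in the pseudocode.

```python
def write_decision(apply_low_risk, approve_auto_apply, source, pinned=False):
    """Decide what may happen to one skill; model scores are not an input."""
    if not apply_low_risk:
        return "dry-run"
    if not approve_auto_apply:
        return "refused-missing-approval"  # hypothetical label
    if pinned:
        return "skip:pinned-skill"
    if source != "local-agent-created":
        return "skip:source-not-agent-created"
    return "apply"
```

The function takes no ranking or score argument, which mirrors the point that model output can change ordering but never whether a write is permitted.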
Optional: multi-variant candidate evaluation (--variants N)
auto-run --variants N (default N=1) deterministically generates up to four bounded variants per evidence-eligible skill and picks one winner before any guarded apply.
- Variants only vary knobs that are already inside the bounded mutation policy: number of inline evidence rows, whether evidence spills into `references/`, and which "agent guidance" phrasing leads the managed block.
- Variant 0 is always the prior default, so `--variants 1` is byte-identical to the pre-variants behavior.
- The scorer is deterministic and model-free: it prefers inline strategy over spillover, more hard-cap slack, and smaller diff from the existing skill. Ties break on variant index, so the same input always picks the same winner.
- Dry-run output exposes `candidate.variants[]` summaries (name, spec, size strategy, content chars, score, selected flag) and a `candidate.selected_variant` block for review.
- Auto-apply still goes through the existing source/approval/hash/backup/verification gates and writes only the winner.
This is a clean-room adaptation of the multi-candidate evaluation idea: there is no agent loop, no model selecting variants, and no execution of model-generated content. See hyperagents-design-notes.md for the full rationale.
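A deterministic scorer with an index tie-break might be sketched as follows. The three preferences come from the list above, but treating them as a strict lexicographic priority is an assumption, as are the `Variant` fields and function names; the real scorer may weight them differently.

```python
from dataclasses import dataclass

HARD_CAP = 100_000  # assumption: measured in characters


@dataclass(frozen=True)
class Variant:
    index: int
    strategy: str       # "inline" or "spill"
    content_chars: int  # size of the resulting SKILL.md
    diff_chars: int     # size of the change against the existing skill


def score(variant):
    """Higher is better: inline first, then hard-cap slack, then smaller diff."""
    inline = 1 if variant.strategy == "inline" else 0
    slack = HARD_CAP - variant.content_chars
    return (inline, slack, -variant.diff_chars)


def pick_winner(variants):
    """Best score wins; equal scores resolve to the lowest variant index."""
    return min(variants, key=lambda v: (tuple(-x for x in score(v)), v.index))
```

Because the sort key contains only integers derived from the input, the same set of variants always yields the same winner regardless of list order.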
Optional: staged verifier gate
apply_guarded_patch (and auto-run via --staged-verify or --pre-verify-command) supports a cheap-then-expensive verifier chain after the write happens.
- Stage 1 (`builtin-structural`): in-process check; the post-write file must stay under the 100k hard cap, keep the managed-block markers balanced, and keep parseable frontmatter. No subprocess.
- Stage 2, optional (`pre-verify-command`): a caller-supplied cheap pre-check shell command. Useful for a fast lint or schema validation.
- Stage 3 (`verify-command`): the same expensive verifier callers were passing before (e.g., the bundled `skill_validate` validator or `python -m pytest -q`).
The expensive stage is skipped entirely if any earlier stage fails, and any stage failure triggers the existing rollback path. Backward compatibility: when no --staged-verify / --pre-verify-command is requested, the verifier shape and behavior are unchanged. When staged verification is in use, the result keeps top-level verify.passed / verify.exit_code / verify.output so existing tooling continues to work, and adds a verify.stages[] list with per-stage results plus verify.failed_stage on failure.
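The cheap-then-expensive chain can be illustrated with a short runner. The stage names and the `verify.*` result keys follow the description above; the function names, the simplified frontmatter check, and the `(exit_code, output)` stage shape are assumptions made for this sketch.

```python
import subprocess

START = "<!-- curator-evolver:auto:start -->"
END = "<!-- curator-evolver:auto:end -->"
HARD_CAP = 100_000  # assumption: measured in characters


def structural_check(text):
    """In-process stage 1 check; returns (exit_code, output)."""
    if len(text) > HARD_CAP:
        return 1, "over hard cap"
    if text.count(START) != text.count(END):
        return 1, "unbalanced managed-block markers"
    lines = [line.strip() for line in text.splitlines()]
    # Simplified stand-in for "parseable frontmatter": a '---' ... '---' header.
    if not lines or lines[0] != "---" or "---" not in lines[1:]:
        return 1, "frontmatter not found"
    return 0, "ok"


def shell_stage(command):
    """Wrap a caller-supplied shell command as a stage callable."""
    def check():
        proc = subprocess.run(command, shell=True, capture_output=True, text=True)
        return proc.returncode, proc.stdout + proc.stderr
    return check


def run_stages(stages):
    """Run (name, callable) stages in order and stop at the first failure."""
    results, failed = [], None
    for name, check in stages:
        code, output = check()
        results.append({"name": name, "passed": code == 0,
                        "exit_code": code, "output": output})
        if code != 0:
            failed = name
            break  # later, more expensive stages never run
    last = results[-1] if results else {"exit_code": 0, "output": ""}
    verify = {"passed": failed is None, "exit_code": last["exit_code"],
              "output": last["output"], "stages": results}
    if failed is not None:
        verify["failed_stage"] = failed
    return verify
```

A caller would treat `verify["passed"] is False` as the signal to take the existing rollback path; the top-level keys stay the same whether one stage or three were configured.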
Mental model
Think of the current system as two lanes that now meet at the candidate-ordering step:
Lane A — safe automation default
Evidence counts → deterministic candidate ordering → bounded managed notes + optional reference spillover → guarded apply
Lane B — model-assisted ordering opt-in
Evidence-eligible candidates → embedding/rerank ordering → bounded managed notes + optional reference spillover → guarded apply
Models can improve which eligible skill is considered first, but they cannot bypass evidence thresholds or guarded apply.