Feature catalog

June 27, 2026 · View on GitHub

The full inventory of dirge's capabilities. For a short overview and the headline differentiators, see the top-level README.

Core

  • Multi-provider: OpenRouter, OpenAI, Anthropic, Gemini, DeepSeek, GLM, Ollama, plus custom OpenAI-compatible endpoints. A single providers map declares aliases; top-level role keys (provider, review_provider, escalation_provider, summarization_provider, subagent_provider) point each role at one of those aliases. See config.md.
  • Standard tools: read, write, edit, edit_lines, bash, grep, find_files, glob, list_dir, write_todo_list, issue, apply_patch, repo_overview, session_search, webfetch, websearch, question, memory, skill, spec, task, task_status, tool_search.
  • Hash-anchored editing (edit_lines + read(line_hashes=true)): read can prefix each line with a 3-char content hash (42 a3f: …); edit_lines then replaces a line range by number (start_line/end_line + the expected_hashes for that range + new_text) instead of reproducing the old text. The tool recomputes the hashes from disk and rejects the edit with a per-line diff if any line drifted since the read, so it can't clobber changed content. The hash is a staleness guard, not a locator. On cheaper models (DeepSeek) this cuts retries and output tokens — the model emits line numbers + tiny hashes, not the block it's replacing. Reuses the read-before-edit gate, tree-sitter pre-write validation, and atomic write; the exact-string edit is unchanged.
  • Line-numbered read output: read prefixes each line with right-aligned line numbers (123: content).
  • Environment-aware: system prompt includes OS, shell, working directory, and git branch for context.
  • Semantic code tools (tree-sitter): list_symbols, get_symbol_body, find_definition, find_callers, find_callees — TypeScript/TSX, Python, Clojure (clj/cljs/cljc/edn/bb), Go, Ruby, Rust, Java, C, C++. See semantic.md.
  • Claude-compatible skills: discover skills from .claude/skills/, .opencode/skills/, .agents/skills/, .dirge/skills/. The agent calls the skill tool to load instructions on demand. See skills.md.
  • Spec-driven workflow: align on what before how. The spec tool tracks changes (proposal + requirement deltas + a task checklist with real status) in the session DB; living specs (capability → requirement → scenario) are the current truth, and archiving folds a change's deltas into them in one transaction. The active change is injected into context, and archiving forms a memory of the rationale. Inspect with /spec. SQLite-backed, OpenSpec-inspired. See spec-workflow.md.
  • Native issue tracker: a persistent, agent-facing kanban in the session DB (.dirge/sessions/state.db, issues table) — a stateful extension of the memory model. The issue tool creates/starts/blocks/closes durable tasks (status open/in_progress/blocked/done, priority high/normal/low); unlike the ephemeral write_todo_list, issues persist across sessions. The harness injects the top open issues as a board at the start of each turn (bounded, with a "+N more" hint) so the model works its backlog without polling. Inspect with /issues (/issues list, /issues <id>, /issues search <q>). SQLite-backed, no external tracker. See issues.md.
  • Bash permissions (tree-sitter): parses shell commands to split &&/;/| into individual segments, detects command substitution and complex constructs. See permissions.md.
  • Permission system: a single decision engine (one Policy Decision Point) with four configurable modes, op-based rules (read/edit/execute/network/mcp/…), session allowlists, external directory policies, and a /why decision-trace command. See permissions.md.
  • Session management & compaction: save/load/resume sessions with lineage-aware session search. Resume the most recent with -c, browse-and-pick with -r, or pin a stable id with --session <id> — which resumes that session if it exists and creates it under that exact id otherwise (used by scripts and the shell plugin). /compress (alias /compact) runs an LLM-summarization compaction pass that folds the conversation middle into a structured summary; an explicit /compress forces the pass even when the context is still within limits (automatic compaction stays threshold-gated). Automatic turn-boundary compaction keeps the runtime within the context window: it first caps oversized tool results (per-result token cap) and prunes tool output without an LLM call, and when the context-ratio crosses the high fold threshold it runs the same structured LLM summarization — routed through the auxiliary summarization_provider when one is configured, else the main model. Explicit /compress, preemptive compaction, and reactive overflow recovery use the same summarization route; with Anthropic OAuth and no safe summarizer, reactive overflow can fall back to local prune-only emergency compaction before retrying the main turn. See agent-loop.md.
  • Durable session checkpoint & resume that picks up where it left off (adapted from MiMo-Code): every conversation has a stable identity (origin_id) carried across compaction rotations, and a durable checkpoint row (schema v10) holding the latest structured summary plus the verbatim original request (a write-once anchor that survives re-summarization). A compaction fold rotates the session id and leaves the old file behind, so resuming by the id you started with used to load a stale pre-fold snapshot; resume now resolves any id to the live chain tip, and the session list collapses a folded conversation to one entry instead of one per rotation. Incremental checkpointing (default on) refreshes that checkpoint in the background at 20%-of-window usage thresholds (20/40/60/80% up to 200K windows, denser above; disabled under 25K) without folding, so a resume after a quit/crash recovers fresh state rather than falling back to lossy compaction; disable with incremental_checkpoint = false. An optional compaction_fold_threshold (0.3–0.75) folds — and thus checkpoints — earlier, from more coherent context.
  • Terminal UI: crossterm-based, markdown rendering, soft-wrapping input box, mouse selection/copy, scrollback, reasoning visibility toggle, configurable themes, and --no-color (collapses the whole UI to the terminal's default foreground via a single theme chokepoint). See tui.md and themes.md.
  • Shell integration (the : prefix, zsh): an optional plugin that lets you run :<prompt> from your normal shell prompt — the prompt runs through dirge headlessly, the answer prints, and you stay in the shell. All : commands in a shell share one session (so follow-ups have context); :resume opens the full TUI on it, :new starts a fresh one. Built on --session <id>. See shell-plugin/README.md.
  • Side panels: optional gutters auto-shown at ≥152 cols. Toggle both together with /panel, or pick panes individually with /display (e.g. /display main, /display main|right, /display left|main|right); set a startup default with the display config key. A hidden panel's gutter is reclaimed by the conversation, which widens to use the freed space rather than leaving it blank. The right sidebar shows system load (CPU/MEM), MCP/LSP server status, pending todos, and recently-modified files. The left gutter shows live session vitals when idle — a context-window fill gauge (with a "compaction soon" cue near the auto-fold threshold), a recent-tool-activity ticker, and a git working-tree snapshot (branch + staged/unstaged/untracked + last commit) — and switches to per-subagent status rows when background subagents are running.
  • Mid-execution interjection: type while the agent is running to queue a follow-up message — the runner stops at the next tool-result boundary so it's picked up promptly instead of waiting for the whole multi-turn run. Alt+X drops queued messages (leaving the run going), Ctrl+C cancels both the run and the queue.
  • Prompts system: switch between system prompt modes at runtime (code, plan, review, debug, etc.). See prompts.md.
  • Per-prompt tool restrictions: each prompt (prompts/<name>.md) can declare a deny_tools frontmatter list. The permission checker refuses those tools while the prompt is active — a real security boundary, not a prose gate. plan, review, and review-security ship with edit, write, apply_patch, bash, and webfetch denied. See prompts.md.
  • Agent profiles (opt-in): named personas that bundle a system prompt, a model, and a tool policy, defined in .dirge/agents/<name>.md, ~/.config/dirge/agents/<name>.md, or config.json agents (layered project > global > config). Switch the active persona mid-session with /agent <name> — it applies the profile's prompt + deny_tools/allow_tools (at the permission layer) and swaps the model (same-client), so the right model-for-the-job is one keystroke away. /agent off reverts; /agents lists profiles plus the built-in role routing. The task tool can also spawn a subagent under a profile via task(prompt=…, agent="<name>") — the subagent runs on that profile's model + system prompt, so work fans out to specialized personas. Fully opt-in — no profiles defined means unchanged behavior, and the built-in critic/role routing is untouched. See agents.md.
  • Ask-vs-proceed calibration: the base prompt gives concrete signals for when to ask the user versus proceed — ask only when a wrong guess is costly/irreversible, can't be inferred from the code, and has genuinely divergent interpretations; otherwise proceed with the most reasonable interpretation and state the assumption. Reduces both over-asking and silently-wrong work.
  • Progress updates: for multi-step tool runs the agent gives a brief up-front plan and terse one-line progress notes between major steps, so long runs are steerable — scoped to progress during the run, leaving the terse final reply unchanged.
  • Finish discipline: the base system prompt carries an explicit definition-of-done and a single fast pre-reply self-check (did exactly what was asked, verified it works, no unrequested changes) plus a stop condition — so the agent verifies before claiming done and stops at the request boundary instead of gold-plating.
  • Model-aware steering: the harness detects the active model family and tailors guidance to it. DeepSeek chat models (v3/v4) get an extra preamble fragment — a Plan-Execute-Verify working method, structural-constraint framing (name files/functions/order, not "be modular"), an explicit success/never contract, and an anti-repetition rule (accept an errored or truncated tool result and adapt rather than re-issuing the same call) — appended last so it sits closest to the action boundary, where rules resist drift in long tool-calling loops. Other models, and the DeepSeek reasoner (which ignores the system prompt), are unaffected. Baked-in and automatic; no config key. See prompts.md.
  • Subagent support: task tool spawns a subagent for research or general analysis subtasks. Optionally run the subagent under a named agent profile (task(agent="<name>")) to give it its own model + system prompt.
  • Background shells (Claude-Code-style): the bash tool accepts background: true to run a command detached and unbounded (for dev servers, watch builds, long-running jobs) — it returns immediately with a shell id. The model reads accumulated output incrementally with the bash_output tool and stops the shell with kill_shell (both take the id); an optional timeout auto-kills after N seconds. Shells are tracked in a dedicated registry, capped at 8 concurrent, listed by /tasks, and killed when the session ends. The status bar shows live counts when any are running: agents:N (background subagents) and shells:N (background shells).
  • Memory & self-improvement: persistent per-project memory in the session DB (.dirge/sessions/state.db, memories table — facts under the memory target, anti-patterns under pitfalls), injected into the system prompt as a frozen snapshot. Memory is two-tiered: hot entries inline verbatim; when the inline budget fills, the least-salient entries demote to a breadcrumb index (one line of id + preview each) that the agent dereferences with memory(action='expand') or queries with memory(action='search') (FTS5). Removal archives (tombstones) rather than deletes — restore brings entries back. Legacy MEMORY.md/PITFALLS.md files are imported automatically on first load and parked as *.imported. A global, cross-project memory tier (a single store in the user data dir, default on) holds durable user preferences that follow you across every repo; it's injected into the prompt under its own header, and the memory tool writes/searches it with scope: "global" (default scope stays project). After an idle session, a unified post-session orchestrator runs (in order, fire-and-forget): a background review that extracts learnings into memory + skills, then a skills curator and a memory curator (stale-detection + lifecycle + LLM consolidation, with audit reports under each store's .curator_reports/).
  • MCP support: connect MCP servers for extended tooling (optional compile-time feature).
  • dirge as an MCP server (dirge mcp): run dirge itself as an MCP server so another agent (e.g. Claude Code) can delegate implementation tasks to dirge and review them — the caller plans/architects, dirge implements. Keeps a persistent per-project session (delegate extends it, new_session rotates); each delegation returns a summary + the files it changed for review. Built into the binary (mcp-server feature, default on). See mcp-server.md.
  • File-state rewind: /rewind rolls back the working tree, not just the conversation. Every write/edit/edit_lines/apply_patch (incl. delete/rename) snapshots the touched file's pre-mutation content keyed by the triggering user prompt; rewinding to a prompt restores all files to their pre-prompt state in lockstep with the conversation truncation, so a long autonomous run is safe to unwind. A file created in the rewound region is deleted on restore; a deleted file is recreated. Content is deduplicated through a content-addressed pool. In-memory and process-scoped (works within a live session, not across a restart).
  • Git worktrees: /worktree to create branch-per-task worktrees, /wt-merge, /wt-exit.
  • Loop system: iterative coding loop for long-horizon tasks.
  • Goal gate (--goal, adapted from MiMo-Code): an opt-in natural-language stop condition for autonomous runs (e.g. --goal "all tests pass and changes committed"). At each finalization an independent judge — the configured critic_provider — rules whether the condition holds; if not, its reason re-enters the loop and the run continues, bounded so a mis-stated goal can't spin forever. Off unless both --goal and a critic_provider are set.
  • ACP support (gated): Agent Communication Protocol server for editor integration. ACP locks the active prompt at launch — use --prompt <name> on startup to opt into a restricted mode (the protocol has no mid-session prompt-switch message).
  • Plugin system (Janet, on a dedicated worker thread): hooks across the full session/agent/tool lifecycle. Plugins can intercept tool calls (block/mutate/replace), register slash commands / tools / keyboard shortcuts / LLM providers, augment the system prompt (before-agent-start), transform the message context before each LLM call (transform-context), rewrite finalized assistant messages (message-end), supply custom compaction summaries (on-compact), post notifications, and prompt the user with blocking confirm/select dialogs. See plugins.md.

Robust agent loop

Hardening against the failure modes that plague long sessions and weaker models. See agent-loop.md for the architecture and tool-input-repair.md for the repair layer.

Reasoning & tool-use guidance suite

A set of research-backed, model-agnostic guidance features that steer reasoning and tool use, layered so they compose. All are baked-in (no config keys) and verified wired end-to-end:

  • Few-shot tool-use exemplars (loop): on-topic worked demonstrations retrieved per task and injected before the prompt.
  • Finish discipline (prompt): a single pre-reply self-check + explicit definition-of-done and stop condition.
  • Progress updates (prompt): up-front plan and terse step notes for multi-step runs, distinct from the terse final reply.
  • Ask-vs-proceed calibration (prompt): ask only on costly, un-inferable, genuinely divergent ambiguity; otherwise proceed and state the assumption.
  • In-session reflexion memory (loop): accumulates abandoned approaches and re-surfaces the full list in the repeat-loop guard.
  • Pre-finalization verifier gate (loop): nudges to verify when code was edited but nothing was run to check it.

On top of these, model-aware steering tailors guidance to the active model family (e.g. a DeepSeek-specific fragment). The prompt-layer features live in the base system prompt (assemble_base_preamble); the loop-layer features are wired into run_agent_loop. Each is covered by its own end-to-end "actually used" test, plus cross-feature composition tests.

  • Few-shot tool-use exemplars: a small curated corpus of worked tool-call demonstrations (read-before-edit, locate-then-read, multi-file apply_patch, run-tests-and-adapt, parallel reads). At the start of each task the most relevant 0–3 are retrieved by lexical match (nucleo-matcher scored per token, no embeddings) and injected into the model-facing context just before the prompt — on-topic demonstrations only, nothing for an off-topic task. In-context tool demonstrations are one of the largest reliability levers for open models.
  • Pre-finalization verifier gate: a cheap, signal-based in-loop critic that backs "verify before done" with a mechanism, not just prose. It watches the run for code edits (to source-file extensions) and build/test commands and their outcomes (a failing command is detected from bash's non-zero exit marker — no semantics parsed). At the finalization boundary — when the agent is about to declare done — it injects one soft nudge: fix it if a build/test failed after a code change, or verify it if code was changed but no build/test ran; it stays silent when a build/test ran green. Doc-only edits don't count; bounded to fire at most once per run so it can never loop; no extra LLM call. Optional tier 3 — when a critic_provider is configured, the gate escalates on substantive runs to one bounded LLM critique ("is this actually complete and correct?"); an INCOMPLETE verdict injects the concrete issues and re-enters the loop once. Off by default (no provider, no cost).
  • Tool-input repair layer: catches and fixes common malformed tool calls before they hit the tool — strips null optional fields, parses JSON-string arrays, unwraps markdown links in path fields, applies relational defaults declared in the tool's schema. Failed repairs emit a structured tool_input_invalid log with the original args.
  • Schema-aware contract hints (dirge-hints): per-tool schemas can declare semantic: "absolute_path", relational: [{requires, defaults}], etc. The repair layer reads these to drive automatic defaults + agent-facing Note: text — removing per-tool hardcoded heuristics.
  • Tree-sitter pre-write validation: every write / edit / apply_patch is parsed through the matching tree-sitter grammar before bytes hit disk. Syntactically-broken code is rejected with line/column-precise errors so the model corrects it on the same turn. Languages: Rust, TS/TSX, Python, Go, Ruby, Java, C, C++, Clojure, Bash (each gated on its semantic-<lang> feature). The error is enriched mechanically so the model never counts delimiters by hand: a missing token is named from the grammar ("missing }"), and for brace languages (Rust, C/C++, Go, Java, Python, Lisp) a comment/string/char-literal-aware delimiter-balance summary points at the exact unclosed (/{/[ or the unmatched closer.
  • Dynamic tool_search (opt-in via dynamic_tool_search: true): ships only tool_search + a small always-on set in each request; the model calls tool_search(query) to discover and load more tools on demand. ~30% token savings on MCP-heavy sessions.
  • Disk-backed large-output relay: bash / webfetch outputs over an inline budget (default 8 KiB) are written to ~/.dirge/transient/<pid>/<tool>-<ts>.txt and replaced with a head + ellipsis + tail summary plus a hint to read for specifics. Aged cleanup runs on every relay write.
  • Anthropic prompt-cache positioning: system prompt + tool defs sit at the start of every request (cache-warm prefix); a prompt_cache_prefix tracing event emits per-turn with stable hashes so unexpected prefix drift is observable.
  • Prefix-cache hit accounting (/cache): real provider-reported usage (cached_input_tokens / cache_creation_input_tokens, e.g. DeepSeek prompt_tokens_details.cached_tokens, Anthropic cache reads) is folded into a cumulative per-session counter; /cache prints the session's cumulative prefix-cache hit ratio. Since cheaper models (DeepSeek) bill cached input at a steep discount, the hit ratio is the headline cost lever — this is the instrument that tells you whether the cache-warm prefix is actually landing hits.
  • Dual-client tiering (escalation_provider role): when a tool input fails to repair OR generated code fails the tree-sitter pre-write check, the next model call is routed through a more capable provider. One-shot per failure, capped at 3 per session, surfaced as a dim ↑ escalating to <provider> status line.
  • Context-depth reminders (context_depth_reminder_threshold): tracks consecutive file-touching tool calls on the same file(s); when the streak crosses the threshold (default 8, opt-in), injects a single mid-turn reminder restating the active task + touched files so long runs don't drift.
  • Tool-loop circuit breaker + reflect-then-pivot: a per-tool-call repeat counter trips on the 3rd identical (tool, input) within a 32-call window — catching non-progressing loops without needing model cooperation. On the first trip the suppressed call is answered with a reflect-then-pivot intervention that makes the model diagnose what it tried, name the wrong assumption, and propose fundamentally different approaches (rather than "try again", which tends to reinforce the same failing chain). An in-session reflexion buffer accumulates every approach the model looped on and abandoned this run, and the guard re-surfaces the full list each time it fires — so the model is reminded of all prior dead ends, not just the latest repeat, and doesn't cycle back to one it already gave up on (Reflexion in miniature). When the loop nonetheless gives up (terminal stuck case), a storm-breaker graceful-failure narrative appends a short first-person assistant message naming the tool(s) it looped on and explaining that it stopped — so the user sees a coherent reply instead of an empty turn, and the model carries its own failure account into the next turn (the suppressed calls are still backfilled so history stays well-formed). See agent-loop.md.
  • Cross-turn failure recovery checkpoint: the repeat-loop guard above only catches a model repeating the same call. A separate tracker counts consecutive errored tool results across turns (reset by any success) and, at a streak of 3 distinct failures, injects one structured recovery checkpoint — quoting the recent errors and asking the model to name the shared root cause and take a different next step rather than tweak-and-retry. Re-arms every further 3 failures. When one tool dominates the streak it's named explicitly, pointing the model back at that tool's contract (tool-doc re-grounding, cf. arXiv:2510.17874). Threshold tuned low per the structured-reflection literature (arXiv:2509.18847), where corrective gains concentrate over the first few attempts.
  • "Did you mean?" feedback: actionable hints for the mistakes weaker models actually make. An unknown tool name gets a Levenshtein nearest-match suggestion (ehcoecho); a closed-set (enum) argument violation lists the valid values plus the nearest one; read on a mistyped path suggests the near-miss neighbour in the same directory (parserr.rsparser.rs).
  • Hard read-before-edit gate: a session-lifetime read-set tracks which files the agent has actually read; edit / apply_patch to a path that was never read is refused with a directive to read it first. Mechanical, not advisory — it can't be talked past — so the model never edits a file it's only guessing the contents of. (Ported from vix.)
  • Todo-completion nudge: when the agent tries to end its turn with pending items on its own todo list, a single nudge blocks the premature end_turn and restates the open items, so multi-step tasks aren't abandoned half-done. (Ported from vix.)
  • Mandatory reason/intent fields: the read/grep/glob/find/lsp tools require a short reason field and bash carries anti-misuse intent fields, forcing the model to state why before each navigation call — which both curbs aimless exploration and makes tool traces self-explaining. (Ported from vix.)
  • Thinking-stall watchdog: the request-timeout backstop additionally detects a stalled run and injects a summary-reinjection nudge for graceful recovery, restating the task + progress so a hung reasoning chain restarts instead of dying. (Ported from vix.)
  • Minified tree-sitter read/edit (read_minified / edit_minified): token-efficient file I/O that collapses a source file to its structural skeleton via tree-sitter — aggressive collapse for proven-safe grammars (Rust, Java, Go), gap-preserving collapse (whitespace verbatim, blank lines folded, comments stripped) for whitespace- or ASI-sensitive grammars (Bash, Python, Ruby, Elixir, C/C++, TS, Clojure). edit_minified maps a match in the minified view back to the exact original byte span and syntax-checks before writing. Each language gated on its semantic-<lang> feature; unsupported extensions fall back to a plain read. (Ported from vix.)
  • Phased plan workflow (/plan, opt-in via phased_workflow_enabled): an explicit per-task command that splits a complex request into context-isolated phases — explore (read-only fork) → plan (read-only fork, fed only the explore findings) → implement (a normal streamed turn you watch) → review (a write-disabled reviewer fork that runs the code and emits a machine-parsed verdict; NEEDS_FIX feeds a punch-list back for a bounded re-implement). Faithful to vix's model: a deliberate opt-in for big tasks, never forced — regular chat is unchanged. See agent-loop.md.

NOTE: Windows support is not tested, but feel free to try and open an issue if you encounter any bugs.

Performance

dirge is one of the smallest and most performant coding agents on the market.

  • Lines of code: ~100k LoC
  • Binary size: ~36 MB (the release profile is speed-optimized — opt-level=3 + fat LTO + strip; an opt-level="z" build is ~28 MB if you prefer size)
  • RAM footprint: ~8 MB on an empty session, ~15 MB when working (vs ~300 MB for opencode or other JS-based coding agents)

Tool result caching

Most tool calls (read, write, edit, bash, grep, find_files, list_dir) are cached per agent turn. Repeated calls with identical arguments within the same turn return cached results, avoiding redundant filesystem I/O. The cache clears automatically before each new prompt, and after write/edit/bash so a re-read sees fresh content.

Error recovery

Transient API errors (network, rate limits, Anthropic overloaded_error) are automatically retried with exponential backoff (1s → 2s → 4s, max 3 retries) plus 0–25% jitter so concurrent agents don't retry in lockstep. Auth and unknown errors surface immediately. Context-length errors are not retried — surface a /compress hint instead. Tokens stream live to the chat as they arrive; if a retry fires, the user sees an "(error: …; retrying)" banner and the next attempt's tokens stream in fresh. If any tool calls were already dispatched (side effects applied), the error is surfaced without retrying so a partial-but-applied turn isn't re-run.