README.md

June 10, 2026 · View on GitHub

uvx agentburn — animated demo: TL;DR verdict, burn bars by source, the overnight bill, and what to change

Hermes Agent · OpenClaw · Claude Code — one normalized core, local, read-only, zero dependencies

uvx agentburn

▶ Try it in your browser — no install

What is this, in plain words?

You run an AI assistant — OpenClaw, Hermes Agent, or Claude Code. It works around the clock: answers you in Telegram, runs scheduled jobs at night, spawns helper agents. Every word it reads and writes costs money — and the bill arrives as one number with no explanation.

agentburn is a free tool that reads your assistant's own diary (log files already sitting on your computer) and turns that number into answers:

You ask	It answers	Command
Where does the money actually go?	scheduled jobs 79% · chats 7% · helpers 5% · you 9%	`agentburn`
What happened while I slept?	the overnight bill, isolated and named	`agentburn$
\text{Why} \text{is} \text{it} \text{so} \text{expensive}?	\text{same} \text{file} \text{read} 14 \times , \text{a} \text{broken} \text{tool} \text{retried} 6 \times , \text{wake}-\text{ups} \text{that} \text{did} \text{nothing}	$agentburn why`
What did it do in Telegram?	every function it called there, with counts and errors	`agentburn why --source telegram`
What exactly do I change?	ready config lines + expected saving in dollars	`agentburn fix`
Am I paying for a dying model?	your spend vs the world's 4-week trend, with a cheaper rising alternative	`agentburn drift`
Is my setup even normal?	"your overhead is worse than ~75% of N setups"	`agentburn rank`
Can I just ask the assistant?	yes — install the skill/MCP and ask "where do you burn my money?"	`agentburn mcp`

No accounts. No cloud. Nothing leaves your computer. One command to try: uvx agentburn.

Why this exists

Always-on agents bill you around the clock — and their built-in counters only show totals. Real threads that made this tool:

"73% of every API call is fixed overhead — ~13.9K tokens of tool definitions and system prompt, resent every time." — hermes-agent #4379

"One entrant wrote about waking up to a $47 surprise bill from an overnight run — that's not an exotic failure, it's the default behavior of an unsupervised loop." — dev.to

"I've seen runs where step 3 costs 4× step 1 — no alert, just a bill." — comment, ibid.

agentburn reads the agent's own accounting data (read-only) and answers the question the totals never do: where.

What it answers

Where it burns — by source: cron / subagent / gateway:telegram|discord|whatsapp / cli. Always-on ≠ free: scheduled jobs and gateways spend without you.
🌙 While you slept — the overnight bill, isolated and named (configurable window: --night 23-7).
Fixed overhead — average input tokens per API call per source. The "73% overhead" pattern is visible in one glance; with request dumps enabled, you get the sampled composition (system prompt vs tool definitions vs history).
Subagent rollups — delegation cost chained back to the session that spawned it. Recursion compounds; here is the receipt.
Top tools — which tool results weigh most in your context.
What to do — up to 4 conservative, named recommendations with monthly estimates.

How it compares

	agentburn	ccusage	codeburn	built-in `/usage`
Burn by source (cron · heartbeat · gateways · subagents)	✅	—	—	% only, 7 days, this machine
🌙 the overnight bill, isolated	✅	—	—	—
Behavioral forensics (`why`: loops, retry storms, failed-run cost)	✅	—	—	—
Ready config patches (`fix`, source-verified keys)	✅	—	—	—
Accounting-gap detection (`doctor`, lower-bound honesty)	✅	—	—	—
MCP server (agent answers for its own bill)	✅	—	—	—
Totals / live blocks / many CLIs	basic	✅ best-in-class	✅ TUI, 25 providers	totals

As of June 2026; ccusage and codeburn are excellent at what they do — agentburn deliberately starts where they stop (ccusage scoped per-tool analysis out).

Why trust these numbers

Most token trackers quietly disagree with each other (2–91× in public issue threads). agentburn takes the opposite stance:

Numbers come from the agent's own accounting (~/.hermes/state.db: per-session token counters and cost fields). No scraping, no proxies, no guessing.
Provider-billed costs are shown as-is; Hermes estimates are marked with ~. Mixed data is labeled mixed.
Sessions with messages but zero recorded tokens (known Hermes accounting gaps, e.g. #12023) are detected and reported: totals are then explicitly a lower bound — and fixing the accounting becomes recommendation #1.
Input composition from request dumps is char-proportional and labeled sampled estimate, not truth.

Privacy

Everything runs locally and reads your database read-only. No network calls. No telemetry. The report is yours.

Usage

agentburn                        # every agent on this machine, last 30 days
agentburn --agent openclaw       # just one
agentburn --days 7
agentburn --agent hermes --db /path/to/state.db
agentburn why                    # behavioral forensics: loops, retry storms, idle heartbeats
agentburn why --source telegram  # decompose ONE source: functions called, errors, loops
agentburn --source cron          # cost report for one source only
agentburn explain --model llama3.1   # LLM reads the numbers back to you (local by default)
agentburn --night 23-7           # custom overnight window (local time)
agentburn --budget-month 50 --fail-over   # sentinel for cron/CI
agentburn --json                 # machine-readable, pipe it anywhere
agentburn --no-color

Mechanics

📤 Share your burn (--share). An anonymized card — categories, models and totals only; session titles, paths and content are excluded by construction. Safe to paste into a post; --svg card.svg renders the same card as an image:

🔥 my hermes agent · last 30d
~\$45.50 → ~\$430/mo pace · 1.75M tokens
where it burns: cron 79% · cli 9% · telegram 7% · subagent 5%
🌙 while I slept (00–08): ~\$36.00 — 79% of everything
⚙️ telegram re-sends 20,000 tokens with EVERY call — 2.5× the community norm (≈8k)
— agentburn · local & private

--svg card.svg renders it as an image:

sample burn card

📏 Calibration against public benchmarks. "Is 15k input tokens per call normal?" The report compares your fixed overhead with community-measured references embedded as dated constants (e.g. the Phala always-on-agent benchmark, 2026-03: ≈8k/call baseline). No network — sources are cited inline.

📐 Optimize → prove it (--save-baseline / --compare). Snapshot your pace, change the config (cheaper cron model, trimmed toolsets), then agentburn --compare shows the delta in $/month — pace-normalized, so a 7-day baseline compares honestly with a 30-day window. Every recommendation becomes a testable promise.

🔬 agentburn why — behavioral forensics. report says where it burns; why says why, from the agent's own recorded actions and thoughts:

🔬 agentburn why — openclaw · gateway:telegram

   WHAT IT ACTUALLY DID   browser 34× ≈210K in results · web_search 18× · shell 7× (2 errors)
   RE-READ LOOPS          5× browser(https://news.site/page) — every repeat re-paid in full
   RETRY STORMS           Bash: 3 errors / 6 calls — paying full price for every error
   IDLE HEARTBEATS        4 of 9 heartbeat runs did NOTHING — \$2.40 of pure idle burn
   BURNED ON FAILURES     2 failed runs → ~\$3.90 (timeout, killed)
   THINKS MORE THAN IT WORKS   62% thinking · 84K tokens · "rename files task"

   💡 WHAT TO CHANGE
   1. `/proj/big.md$ \text{was} \text{fetched} 4 \times  \text{in} \text{one} \text{session} ≈32\text{K} \text{tokens} \text{re}-\text{paid} — \text{cache} \text{it}…
$``

Observations with numbers, not verdicts; only tool names, truncated argument keys and counters — message content never leaves the machine (and never enters the report).

**🧠 `agentburn explain` — LLM interpretation, local-first.** The numbers, read back to you in plain language with ranked actions:

```bash
agentburn explain --model llama3.1                      # local ollama — nothing leaves the machine
agentburn explain --llm https://openrouter.ai/api/v1 \
  --model deepseek/deepseek-chat --yes-remote --lang ru # remote: explicit opt-in only

Privacy rules are hard-coded: the default endpoint is localhost (ollama / LM Studio); a remote endpoint requires --yes-remote and receives a redacted summary only — session titles become session-N, file paths shrink to basenames, message content is never in the payload to begin with. Works with any OpenAI-compatible API, zero new dependencies. (Yes — a cost profiler spending ~3K tokens to explain costs. The payload is compact and the answer capped; the irony is acknowledged.)

🧭 agentburn drift — your spend × the world's direction. Are you paying for a model the world is leaving?

🧭 agentburn drift

   YOUR MODELS vs THE WORLD (4-week world trend)
   anthropic/claude-opus-4.6        ~\$341/mo    world -41% ⬊
   deepseek/deepseek-v3.2            ~\$12/mo    world +12% →

   💡 DRIFT ALERTS
   1. claude-opus-4.6: you spend ~\$341/mo; world usage -41% in 4 weeks — the world
      is leaving this model. Rising alternative step-3.5-flash (+180%) is ~98% cheaper.

Your side is computed locally from the agents' own logs; the world side is one read-only GET of token-history's public trend JSON (archived daily from OpenRouter's rankings — deep history unlocks as the archive grows). Nothing about you is sent anywhere; --trends FILE works fully offline. Nobody else joins these two halves.

🩺 agentburn why additions: CRON RUNS — the per-run receipt for every scheduled job (what openclaw #24636 keeps asking for), and CONTEXT THRASH — compactions counted per session, because every compaction silently re-sends a near-full context window.

📊 agentburn rank + --submit — the Burn Index. Anonymous community percentiles of efficiency — the benchmark volume-leaderboards can't be: nothing here rewards burning more.

📊 agentburn rank — you vs the Burn Index

   input tokens per call · cli              you:    15,000   median:    6,200   worse than ~75% of 41
   share of spend at night                  you:     79.0%   median:    12.0%   worse than ~90% of 41
   cache-read share of input volume         you:     61.0%   median:    44.0%   better than ~75% of 41

Joining is consent-by-click: agentburn --submit prints the exact anonymized payload (ratios and a coarse spend band — never raw volumes, titles or paths), then a prefilled GitHub-issue link that you open and submit. A weekly Action aggregates submissions with plausibility bounds (junk and flexing get dropped, not ranked) into public quantiles.

🔧 agentburn fix — from findings to ready config patches (dry-run by design). Not "consider a cheaper model" but the exact file and the exact lines:

🔧 agentburn fix — hermes · DRY-RUN (nothing was changed)

   1. Point Hermes cron jobs at a cheap model
      file   : ~/.hermes/cron/jobs.json
      why    : cron is 79% of spend; maintenance rarely needs a frontier model.
      effect : bulk of ≈\$341/mo moves to cheap-model pricing
      proposed:
        "nightly digest": "model": "deepseek/deepseek-chat"
      ⓘ field verified in hermes-agent cron/jobs.py: per-job `model` override

Patch generators exist only for config keys verified against the agents' source code (Hermes cron/jobs.json, OpenClaw agents.defaults.heartbeat incl. activeHours — the night-burn killer — and lightContext). There is no --apply on purpose: paste it yourself, then prove the saving with --save-baseline → --compare.

🔌 agentburn mcp — your agent answers for its own bill. A zero-dependency MCP stdio server exposing burn_report / burn_why / burn_card. Register it and ask the agent "where do you burn my money?" — it calls the profiler on its own database and explains:

# Claude Code
claude mcp add agentburn -- agentburn mcp
# Hermes / OpenClaw: add an stdio MCP server with command `agentburn mcp`

Prefer skills? There's a ready SKILL.md — drop it into ~/.hermes/skills/agentburn/, ~/.openclaw/skills/agentburn/ or ~/.claude/skills/agentburn/ and just ask the agent "where do you burn my money?".

🩺 agentburn doctor. Trackers disagree because the agent's own accounting has gaps. doctor names the broken combinations (provider × model × source) for zero-usage and unpriced sessions, and generates a ready-to-paste upstream bug report — counters only, no message content.

🚨 Sentinel mode — a budget guard for server agents. Your agent runs 24/7 on a VPS; this watches it:

# alert when overnight burn exceeds \$5/month pace (exit code 1 → any alerting hooks in)
agentburn --agent openclaw --budget-night 5 --fail-over --no-color \
  || notify-send "🚨 agent is burning money at night"

Drop it in cron next to the agent itself — the one-off check becomes a standing guard.

Supported agents

One normalized model, one adapter per agent. Run agentburn and every agent found on the machine gets its own report.

Agent	Status	Data source	Notes
Hermes Agent	✅	`~/.hermes/state.db` (+ optional request dumps)	costs from the agent's own accounting
OpenClaw	✅	`~/.openclaw/agents/*/sessions/sessions.json`	heartbeat is its own category — the famous one; cron / gateways / subagents split out
Claude Code	✅	`~/.claude/projects/**.jsonl`	tokens only, by design: CC doesn't record costs locally and subscription usage has no honest per-token price — we don't invent one

Adapters are ~150 lines over a shared model. Codex CLI / opencode are natural next targets — PRs welcome.

architecture: agent data → adapters → normalized model → report/why/fix/explain/doctor/mcp

token-history — the macro view: daily archive of which agents the world uses (OpenRouter rankings). agentburn is the micro view: where yours burns.

License

MIT

_{mcp-name: io.github.Socialpranker/agentburn}

the token-* family · token-history — which agents the world runs · agentburn — where yours burns

if this saved you a dinner's worth of tokens, a ⭐ helps the next person find it