README.md

June 10, 2026 · View on GitHub

agentburn — where does your AI agent burn money, while you sleep?

PyPI Python zero deps offline checks MIT



uvx agentburn — animated demo: TL;DR verdict, burn bars by source, the overnight bill, and what to change

Hermes Agent · OpenClaw · Claude Code — one normalized core, local, read-only, zero dependencies

uvx agentburn

▶  Try it in your browser — no install


What is this, in plain words?

You run an AI assistant — OpenClaw, Hermes Agent, or Claude Code. It works around the clock: answers you in Telegram, runs scheduled jobs at night, spawns helper agents. Every word it reads and writes costs money — and the bill arrives as one number with no explanation.

agentburn is a free tool that reads your assistant's own diary (log files already sitting on your computer) and turns that number into answers:

You askIt answersCommand
Where does the money actually go?scheduled jobs 79% · chats 7% · helpers 5% · you 9%agentburn
What happened while I slept?the overnight bill, isolated and named`agentburn$
\text{Why} \text{is} \text{it} \text{so} \text{expensive}?\text{same} \text{file} \text{read} 14 \times , \text{a} \text{broken} \text{tool} \text{retried} 6 \times , \text{wake}-\text{ups} \text{that} \text{did} \text{nothing}$agentburn why`
What did it do in Telegram?every function it called there, with counts and errorsagentburn why --source telegram
What exactly do I change?ready config lines + expected saving in dollarsagentburn fix
Am I paying for a dying model?your spend vs the world's 4-week trend, with a cheaper rising alternativeagentburn drift
Is my setup even normal?"your overhead is worse than ~75% of N setups"agentburn rank
Can I just ask the assistant?yes — install the skill/MCP and ask "where do you burn my money?"agentburn mcp

No accounts. No cloud. Nothing leaves your computer. One command to try: uvx agentburn.

Why this exists

Always-on agents bill you around the clock — and their built-in counters only show totals. Real threads that made this tool:

"73% of every API call is fixed overhead — ~13.9K tokens of tool definitions and system prompt, resent every time."hermes-agent #4379

"One entrant wrote about waking up to a $47 surprise bill from an overnight run — that's not an exotic failure, it's the default behavior of an unsupervised loop."dev.to

"I've seen runs where step 3 costs 4× step 1 — no alert, just a bill."comment, ibid.

agentburn reads the agent's own accounting data (read-only) and answers the question the totals never do: where.

What it answers

  • Where it burns — by source: cron / subagent / gateway:telegram|discord|whatsapp / cli. Always-on ≠ free: scheduled jobs and gateways spend without you.
  • 🌙 While you slept — the overnight bill, isolated and named (configurable window: --night 23-7).
  • Fixed overhead — average input tokens per API call per source. The "73% overhead" pattern is visible in one glance; with request dumps enabled, you get the sampled composition (system prompt vs tool definitions vs history).
  • Subagent rollups — delegation cost chained back to the session that spawned it. Recursion compounds; here is the receipt.
  • Top tools — which tool results weigh most in your context.
  • What to do — up to 4 conservative, named recommendations with monthly estimates.

How it compares

agentburnccusagecodeburnbuilt-in /usage
Burn by source (cron · heartbeat · gateways · subagents)% only, 7 days, this machine
🌙 the overnight bill, isolated
Behavioral forensics (why: loops, retry storms, failed-run cost)
Ready config patches (fix, source-verified keys)
Accounting-gap detection (doctor, lower-bound honesty)
MCP server (agent answers for its own bill)
Totals / live blocks / many CLIsbasic✅ best-in-class✅ TUI, 25 providerstotals

As of June 2026; ccusage and codeburn are excellent at what they do — agentburn deliberately starts where they stop (ccusage scoped per-tool analysis out).

Why trust these numbers

Most token trackers quietly disagree with each other (2–91× in public issue threads). agentburn takes the opposite stance:

  • Numbers come from the agent's own accounting (~/.hermes/state.db: per-session token counters and cost fields). No scraping, no proxies, no guessing.
  • Provider-billed costs are shown as-is; Hermes estimates are marked with ~. Mixed data is labeled mixed.
  • Sessions with messages but zero recorded tokens (known Hermes accounting gaps, e.g. #12023) are detected and reported: totals are then explicitly a lower bound — and fixing the accounting becomes recommendation #1.
  • Input composition from request dumps is char-proportional and labeled sampled estimate, not truth.

Privacy

Everything runs locally and reads your database read-only. No network calls. No telemetry. The report is yours.

Usage

agentburn                        # every agent on this machine, last 30 days
agentburn --agent openclaw       # just one
agentburn --days 7
agentburn --agent hermes --db /path/to/state.db
agentburn why                    # behavioral forensics: loops, retry storms, idle heartbeats
agentburn why --source telegram  # decompose ONE source: functions called, errors, loops
agentburn --source cron          # cost report for one source only
agentburn explain --model llama3.1   # LLM reads the numbers back to you (local by default)
agentburn --night 23-7           # custom overnight window (local time)
agentburn --budget-month 50 --fail-over   # sentinel for cron/CI
agentburn --json                 # machine-readable, pipe it anywhere
agentburn --no-color

Mechanics

📤 Share your burn (--share). An anonymized card — categories, models and totals only; session titles, paths and content are excluded by construction. Safe to paste into a post; --svg card.svg renders the same card as an image:

🔥 my hermes agent · last 30d
~\$45.50 → ~\$430/mo pace · 1.75M tokens
where it burns: cron 79% · cli 9% · telegram 7% · subagent 5%
🌙 while I slept (00–08): ~\$36.00 — 79% of everything
⚙️ telegram re-sends 20,000 tokens with EVERY call — 2.5× the community norm (≈8k)
— agentburn · local & private

--svg card.svg renders it as an image:

sample burn card

📏 Calibration against public benchmarks. "Is 15k input tokens per call normal?" The report compares your fixed overhead with community-measured references embedded as dated constants (e.g. the Phala always-on-agent benchmark, 2026-03: ≈8k/call baseline). No network — sources are cited inline.

📐 Optimize → prove it (--save-baseline / --compare). Snapshot your pace, change the config (cheaper cron model, trimmed toolsets), then agentburn --compare shows the delta in $/month — pace-normalized, so a 7-day baseline compares honestly with a 30-day window. Every recommendation becomes a testable promise.

🔬 agentburn why — behavioral forensics. report says where it burns; why says why, from the agent's own recorded actions and thoughts:

🔬 agentburn why — openclaw · gateway:telegram

   WHAT IT ACTUALLY DID   browser 34× ≈210K in results · web_search 18× · shell 7× (2 errors)
   RE-READ LOOPS          5× browser(https://news.site/page) — every repeat re-paid in full
   RETRY STORMS           Bash: 3 errors / 6 calls — paying full price for every error
   IDLE HEARTBEATS        4 of 9 heartbeat runs did NOTHING — \$2.40 of pure idle burn
   BURNED ON FAILURES     2 failed runs → ~\$3.90 (timeout, killed)
   THINKS MORE THAN IT WORKS   62% thinking · 84K tokens · "rename files task"

   💡 WHAT TO CHANGE
   1. `/proj/big.md$ \text{was} \text{fetched} 4 \times  \text{in} \text{one} \text{session} ≈32\text{K} \text{tokens} \text{re}-\text{paid} — \text{cache} \text{it}…
$``

Observations with numbers, not verdicts; only tool names, truncated argument keys and counters — message content never leaves the machine (and never enters the report).

**🧠 `agentburn explain` — LLM interpretation, local-first.** The numbers, read back to you in plain language with ranked actions:

```bash
agentburn explain --model llama3.1                      # local ollama — nothing leaves the machine
agentburn explain --llm https://openrouter.ai/api/v1 \
  --model deepseek/deepseek-chat --yes-remote --lang ru # remote: explicit opt-in only

Privacy rules are hard-coded: the default endpoint is localhost (ollama / LM Studio); a remote endpoint requires --yes-remote and receives a redacted summary only — session titles become session-N, file paths shrink to basenames, message content is never in the payload to begin with. Works with any OpenAI-compatible API, zero new dependencies. (Yes — a cost profiler spending ~3K tokens to explain costs. The payload is compact and the answer capped; the irony is acknowledged.)

🧭 agentburn drift — your spend × the world's direction. Are you paying for a model the world is leaving?

🧭 agentburn drift

   YOUR MODELS vs THE WORLD (4-week world trend)
   anthropic/claude-opus-4.6        ~\$341/mo    world -41% ⬊
   deepseek/deepseek-v3.2            ~\$12/mo    world +12% →

   💡 DRIFT ALERTS
   1. claude-opus-4.6: you spend ~\$341/mo; world usage -41% in 4 weeks — the world
      is leaving this model. Rising alternative step-3.5-flash (+180%) is ~98% cheaper.

Your side is computed locally from the agents' own logs; the world side is one read-only GET of token-history's public trend JSON (archived daily from OpenRouter's rankings — deep history unlocks as the archive grows). Nothing about you is sent anywhere; --trends FILE works fully offline. Nobody else joins these two halves.

🩺 agentburn why additions: CRON RUNS — the per-run receipt for every scheduled job (what openclaw #24636 keeps asking for), and CONTEXT THRASH — compactions counted per session, because every compaction silently re-sends a near-full context window.

📊 agentburn rank + --submit — the Burn Index. Anonymous community percentiles of efficiency — the benchmark volume-leaderboards can't be: nothing here rewards burning more.

📊 agentburn rank — you vs the Burn Index

   input tokens per call · cli              you:    15,000   median:    6,200   worse than ~75% of 41
   share of spend at night                  you:     79.0%   median:    12.0%   worse than ~90% of 41
   cache-read share of input volume         you:     61.0%   median:    44.0%   better than ~75% of 41

Joining is consent-by-click: agentburn --submit prints the exact anonymized payload (ratios and a coarse spend band — never raw volumes, titles or paths), then a prefilled GitHub-issue link that you open and submit. A weekly Action aggregates submissions with plausibility bounds (junk and flexing get dropped, not ranked) into public quantiles.

🔧 agentburn fix — from findings to ready config patches (dry-run by design). Not "consider a cheaper model" but the exact file and the exact lines:

🔧 agentburn fix — hermes · DRY-RUN (nothing was changed)

   1. Point Hermes cron jobs at a cheap model
      file   : ~/.hermes/cron/jobs.json
      why    : cron is 79% of spend; maintenance rarely needs a frontier model.
      effect : bulk of ≈\$341/mo moves to cheap-model pricing
      proposed:
        "nightly digest": "model": "deepseek/deepseek-chat"
      ⓘ field verified in hermes-agent cron/jobs.py: per-job `model` override

Patch generators exist only for config keys verified against the agents' source code (Hermes cron/jobs.json, OpenClaw agents.defaults.heartbeat incl. activeHours — the night-burn killer — and lightContext). There is no --apply on purpose: paste it yourself, then prove the saving with --save-baseline--compare.

🔌 agentburn mcp — your agent answers for its own bill. A zero-dependency MCP stdio server exposing burn_report / burn_why / burn_card. Register it and ask the agent "where do you burn my money?" — it calls the profiler on its own database and explains:

# Claude Code
claude mcp add agentburn -- agentburn mcp
# Hermes / OpenClaw: add an stdio MCP server with command `agentburn mcp`

Prefer skills? There's a ready SKILL.md — drop it into ~/.hermes/skills/agentburn/, ~/.openclaw/skills/agentburn/ or ~/.claude/skills/agentburn/ and just ask the agent "where do you burn my money?".

🩺 agentburn doctor. Trackers disagree because the agent's own accounting has gaps. doctor names the broken combinations (provider × model × source) for zero-usage and unpriced sessions, and generates a ready-to-paste upstream bug report — counters only, no message content.

🚨 Sentinel mode — a budget guard for server agents. Your agent runs 24/7 on a VPS; this watches it:

# alert when overnight burn exceeds \$5/month pace (exit code 1 → any alerting hooks in)
agentburn --agent openclaw --budget-night 5 --fail-over --no-color \
  || notify-send "🚨 agent is burning money at night"

Drop it in cron next to the agent itself — the one-off check becomes a standing guard.

Supported agents

One normalized model, one adapter per agent. Run agentburn and every agent found on the machine gets its own report.

AgentStatusData sourceNotes
Hermes Agent~/.hermes/state.db (+ optional request dumps)costs from the agent's own accounting
OpenClaw~/.openclaw/agents/*/sessions/sessions.jsonheartbeat is its own category — the famous one; cron / gateways / subagents split out
Claude Code~/.claude/projects/**.jsonltokens only, by design: CC doesn't record costs locally and subscription usage has no honest per-token price — we don't invent one

Adapters are ~150 lines over a shared model. Codex CLI / opencode are natural next targets — PRs welcome.

architecture: agent data → adapters → normalized model → report/why/fix/explain/doctor/mcp

token-history — the macro view: daily archive of which agents the world uses (OpenRouter rankings). agentburn is the micro view: where yours burns.

License

MIT

mcp-name: io.github.Socialpranker/agentburn


the token-* family · token-history — which agents the world runs · agentburn — where yours burns

if this saved you a dinner's worth of tokens, a ⭐ helps the next person find it