README.md

June 24, 2026

Token Optimizer

Zero dependencies · Zero telemetry · Python 3.9+ · License: PolyForm Noncommercial

Support this project to keep it open source

Cut the tokens you waste. Keep the work you'd lose.

It runs automatically in the background. You keep working as usual. Run the audit when you want the full picture.

Read the documentation · Quickstart in 2 minutes

The 30-Second Version

Token Optimizer cuts the tokens your AI coding assistant wastes, keeps your work alive across sessions and compactions, and shows you where every dollar went on a live dashboard. Most of it runs automatically. You install it, run the audit once, and the hooks do the rest.

Why not just use Headroom or RTK? They compress command output, which covers 15-25% of your context. Token Optimizer covers that plus the remaining 75-85%: bloated configs, unused skills, stale memory, compaction loss, model misrouting, behavioral waste. Every saving is cache-safe and measured. The dashboard updates after every session, automatically.

Works on Claude Code (CLI and VS Code), OpenCode, OpenClaw, Codex, Hermes, and GitHub Copilot (beta). Windsurf and Cursor are next on the roadmap.

Token Optimizer Quick Scan

Install

Claude Code (recommended):

/plugin marketplace add alexgreensh/token-optimizer
/plugin install token-optimizer@alexgreensh-token-optimizer

Then in Claude Code: /token-optimizer

Enable auto-update after installing. Claude Code ships third-party marketplaces with auto-update off by default. /plugin → Marketplaces tab → select alexgreensh-token-optimizer → Enable auto-update. One-time, 10 seconds.

After install, run /token-optimizer once to set up hooks. From there, everything runs automatically: compression, checkpoints, quality scoring, dashboard updates. You don't need to run any command again unless you want an audit.

Other platforms and install methods

Codex:

codex plugin marketplace add alexgreensh/token-optimizer

Then in the Codex TUI: /plugins and install Token Optimizer. See docs/codex.md.

OpenCode:

opencode plugin token-optimizer-opencode

See opencode/README.md.

OpenClaw:

openclaw plugins install github:alexgreensh/token-optimizer

See openclaw/README.md.

Hermes:

git clone https://github.com/alexgreensh/token-optimizer.git
token-optimizer/install.sh --hermes

See hermes/README.md.

GitHub Copilot (beta):

git clone --depth 1 https://github.com/alexgreensh/token-optimizer.git
cd token-optimizer
bash install.sh --copilot

See docs/copilot.md.

macOS/Linux script install (alternative to plugin):

tmp="$(mktemp -d)"
release_json="$(curl -fsSL https://api.github.com/repos/alexgreensh/token-optimizer/releases/latest)"
tag="$(python3 -c 'import json,sys; print(json.load(sys.stdin)["tag_name"])' <<<"$release_json")"
git clone --branch "$tag" --depth 1 https://github.com/alexgreensh/token-optimizer.git ~/.claude/token-optimizer
bash ~/.claude/token-optimizer/install.sh
rm -rf "$tmp"

Windows users: Use the plugin install only. Do not run install.sh on Windows. If you hit EBUSY errors, close all Claude Code and Git Bash windows, kill lingering git.exe processes, delete C:\Users\<you>\.claude\token-optimizer and C:\Users\<you>\.claude\plugins\marketplaces\alexgreensh-token-optimizer, then retry.

What You Get

Runs automatically, every session, you do nothing:

  • Smart Compaction: checkpoints before auto-compact, restores after
  • Session Continuity: cross-session hints, cold-resume, checkpoint scoring
  • Active Compression: 9 features, all on by default (delta diffs, skeletons, bash compression, search compression, lean-output nudges, quality nudges, loop detection, activity mode, decision extraction)
  • Quality Scoring: 7 signals, real-time, letter grades S to F
  • Session Database: SQLite, 15 tables, full audit trail, zero network
  • Progressive Disclosure: large outputs archived, expand on demand
  • Context Intel Digest: post-compaction re-orientation without re-reads
  • Model Routing Nudges: steers to the right tier for the task

When you ask for it:

  • /token-optimizer: full audit with guided fixes
  • /token-coach: 30-day trend analysis with specific fixes
  • quick: 10-second health check
  • doctor: installation check
  • savings: dollar savings report
  • report: per-component token breakdown
  • dashboard: open the full dashboard
  • memory-review: MEMORY.md structural audit
  • expand: retrieve archived tool result
  • resume-lean: reopen a cold session

Install, run /token-optimizer once, everything else runs automatically.

How It's Different

Most token tools compress command output. That covers 15-25% of your context. The remaining 75-85% goes untouched.

Compression coverage. Headroom and RTK compress bash and command output. Token Optimizer compresses eight surfaces of the output stack. Status: 🟢 supported, 🟡 partial, 🔴 not supported.

| Compression surface | Token Optimizer | Headroom | RTK |
|---|---|---|---|
| Bash / command output (git, tests, lint, build, logs) | 🟢 60+ patterns, credential-safe; 564 → 115 tokens on a pytest run | 🟢 6 algorithms | 🟢 100+ filters |
| Search / grep output | 🟢 Top hits plus a count; 500 lines → 20 | 🔴 | 🔴 |
| Tabular / JSON output (jq, yq, csvtool, mlr) | 🟢 Value-preserving columnar | 🟢 SmartCrusher | 🔴 |
| File re-reads, delta mode | 🟢 Diff only; 2,000-token re-read → ~50 | 🔴 | 🔴 |
| File re-reads, structure map | 🟢 Skeleton of signatures and imports; 720KB → 250 tokens | 🔴 | 🔴 |
| Large tool results (over 4K chars) | 🟢 Archived to disk, expandable on demand | 🔴 | 🔴 |
| Model output verbosity | 🟢 10-15% typical, up to 30-41% measured, cache-safe | 🔴 | 🔴 |
| Structural context (configs, skills, MCP, memory) | 🟢 Per-component audit, each source scored | 🔴 | 🔴 |

RTK reaches the first surface. Headroom reaches the first and the third. Token Optimizer covers all eight, then keeps going into what happens around compression:

  • Three kinds of waste, not one. Structural (bloated configs, unused skills, stale memory), runtime (verbose output, re-reads), and behavioral (model misrouting, cache expiry, retry loops). How each works →
  • Savings survive compaction. Checkpoints before auto-compact, restores after. Without this, compression savings vanish the moment compaction fires.
  • Measures whether it helped. Before/after token deltas, dollar savings across four pricing tiers, quality scores that track degradation. Not just "we compressed stuff."
  • Zero baseline overhead. External process, no always-on instructions in your context, no MCP server, no dependencies, no telemetry.

How Token Optimizer works automatically every session

| | Token Optimizer | Headroom | RTK | context-mode | /context |
|---|---|---|---|---|---|
| Compaction survival | 🟢 Progressive checkpoints, restore, tool output digest | 🔴 | 🔴 | 🟡 Session guide only | 🔴 |
| Session continuity | 🟢 Cross-session hints, cold-resume, checkpoint scoring | 🔴 | 🔴 | 🟡 Session guide | 🔴 |
| Model routing and behavioral coaching | 🟢 11 detectors, subagent cost breakdown, anti-patterns | 🔴 | 🔴 | 🔴 | 🟡 Basic suggestions |
| Keep-Warm (cache TTL refresh) | 🟢 Opt-in ping before cache expiry, tripwire auto-off | 🔴 | 🔴 | 🔴 | 🔴 |
| Historical trend analysis | 🟢 30-day trends, quality/cost/cache/duration correlation, model-switch detection | 🔴 | 🔴 | 🔴 | 🔴 |
| Loop and spin detection | 🟢 Catches behavioral loops before they burn | 🔴 | 🔴 | 🔴 | 🔴 |
| Context quality scoring | 🟢 7-signal quality score with grades | 🔴 | 🔴 | 🔴 | 🟡 Capacity % only |
| Structural waste audit | 🟢 Deep per-component (CLAUDE.md, skills, MCP, memory) | 🔴 | 🔴 | 🔴 | 🟡 Summary only |
| CLAUDE.md and MEMORY.md health | 🟢 8 auditors + attention-curve scoring | 🔴 | 🔴 | 🔴 | 🔴 |
| Measures if compression helped | 🟢 Local telemetry, before/after tokens, dollar savings | 🔴 | 🟡 rtk gain (token counts only) | 🔴 | 🔴 |
| Fleet-level cross-agent analysis | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 |
| Cache-safe | 🟢 Never modifies existing context prefix | 🟡 Proxy mode rewrites in-flight | 🟢 Pre-shell only | 🟡 MCP overhead | 🟢 |
| Zero baseline context overhead | 🟢 External process, no context injection | 🔴 Injects instructions | 🟢 Shell-level only | 🔴 MCP server overhead | 🟢 Native |
| Zero runtime dependencies | 🟢 Pure stdlib (Python/TypeScript) | 🟡 Python + Rust + optional model | 🟢 Single Rust binary | 🟡 SQLite adapter required | 🟢 N/A |
| Zero telemetry | 🟢 | 🟢 | 🟡 Opt-in | 🟡 Varies | 🟢 |
| Multi-platform | 🟢 Claude Code, VS Code, Codex, OpenClaw, OpenCode, Hermes, Copilot | 🟢 Claude Code, Cursor, Codex, Aider, Copilot | 🟢 14 integrations | 🟢 15 integrations | 🔴 Claude Code only |

Every claim is tested against real sessions and a 57-fixture compression suite you can run yourself. Full benchmark methodology and results →

The Dashboard

Token Optimizer Dashboard

One HTML page, auto-regenerates after every session via the SessionEnd hook, no manual trigger needed. Bookmark http://localhost:24842/token-optimizer and it's always current.

Per-turn token breakdowns, cost across four pricing tiers, cache analysis with TTL mix and hit rate, quality scores overlaid on every session, subagent cost breakdown, savings tracker with four non-overlapping pools. Zero setup after install. Full dashboard docs →

What It Saves

Savings come from four non-overlapping pools, tracked in two tiers:

| Pool | What it covers |
|---|---|
| Model routing + caching | Leaner prefix, lighter model mix, cache-write as a routing lever |
| Subagent routing | Sidechain cost optimization (Claude Code only) |
| Compression add-back | Tokens removed by delta mode, structure map, bash/search compression |
| Lean-output add-back | Output tokens never produced due to conciseness nudges |

Two numbers, kept separate:

  • Counted (~$313/mo), logged action by action. Every time Token Optimizer swapped in a lighter model, trimmed a bulky result, or skipped a repeat read, it added it up: smarter habits ~$260/mo, while-you-work compression ~$53/mo. This is the slice metered event by event, so it is smaller and exact.
  • Big picture (~$1,877/mo, ~18%), the full counterfactual. Had you worked the way you did before Token Optimizer (~95% Opus), you would have paid about $10,585/mo versus $8,708/mo now. The gap is mostly a lighter model mix (95% Opus down to 60%, $1,076/mo for main routing + caching), plus cheaper subagents ($741/mo) and the metered compression add-back ($60/mo).

These numbers are never summed. Counted is the floor with hard receipts. Big picture is a model priced against your frozen pre-Token-Optimizer baseline. See the full methodology →
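The figures quoted above reconcile as plain sums. The short check below uses only numbers from this section; the variable names are illustrative and not part of Token Optimizer.

```python
# Monthly figures in dollars, copied from the two bullets above.
counted = {"smarter habits": 260, "while-you-work compression": 53}
big_picture = {
    "main routing + caching": 1076,
    "cheaper subagents": 741,
    "compression add-back": 60,
}
baseline, actual = 10585, 8708  # pre-Token-Optimizer baseline vs. current spend

counted_total = sum(counted.values())
big_picture_total = sum(big_picture.values())
share_of_baseline = big_picture_total / baseline

print(f"Counted: ${counted_total}/mo")
print(f"Big picture: ${big_picture_total}/mo ({share_of_baseline:.0%} of baseline)")
```

The two totals stay in separate variables on purpose, mirroring the rule that they are never summed.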

30-day savings report: ~$313 counted, ~$1,877 big picture

Based on 684 sessions over 30 days (snapshot ending 2026-06-15), priced against a frozen pre-Token-Optimizer baseline (~95% Opus). Your number is your own. See the methodology →

What happens inside a 1M session

Active Compression

Nine features that actively reduce context, all on by default, all automatic, all toggleable from the dashboard or CLI.

Under the hood, PreToolUse hooks intercept every Read and Bash call before its result enters your context. If a file has changed since it was last read, only the diff comes back. If an unchanged code file is re-read, a structural skeleton replaces the full content. If the call is a CLI command, the output is compressed. PostToolUse hooks archive the full original to disk and log a compression event to SQLite. Nothing is lost, and everything is retrievable. You do nothing. The hooks handle it all.

Active Compression overview

| Feature | What it does | Savings |
|---|---|---|
| Delta Mode | Re-reads return only what changed | ~20% on re-reads |
| Structure Map | Unchanged file re-reads return a structural skeleton | ~30% (up to 99% per file) |
| Bash Compression | CLI output condensed to essentials | ~10% |
| Search Compression | grep/web results condensed to top hits + counts | ~15% |
| Lean-Output Nudges | Steers model to concise output when context fills | 10-15% typical, up to 30-41% output reduction |
| Quality Nudges | Warns when context quality drops | Prevents compaction loss |
| Loop Detection | Catches retry loops before they burn tokens | Measured per loop |
| Activity Mode | Adapts compaction to your session phase | Prevents decision loss |
| Decision Extraction | Preserves decisions across compactions | Prevents decision drift |

Toggle from the dashboard Manage tab, CLI (measure.py v5 enable|disable <feature>), or env vars. The v5 verb is a legacy command name that controls current features.

Read how each feature works →

Per-feature details

Delta Mode

When the AI re-reads a file after editing it, the Read call returns only the diff. 65%+ of Read calls in real sessions are re-reads. A 2,000-token file re-read becomes a 50-token diff.

Delta Mode: smart re-reads

Disable: TOKEN_OPTIMIZER_READ_CACHE_DELTA=0
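To make the idea concrete, here is a minimal sketch of a delta re-read using Python's standard difflib. It is not Token Optimizer's hook code; the function name and the choice of one context line are assumptions for illustration.

```python
import difflib


def delta_for_reread(previous: str, current: str, path: str = "file") -> str:
    """Return a unified diff between the cached and the current file contents.

    Hypothetical helper: an empty string means nothing changed.
    """
    diff = difflib.unified_diff(
        previous.splitlines(keepends=True),
        current.splitlines(keepends=True),
        fromfile=f"{path} (cached)",
        tofile=f"{path} (current)",
        n=1,  # one line of context keeps the diff short
    )
    return "".join(diff)


old = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))\n"
new = "def add(a, b):\n    return a + b + 0\n\nprint(add(1, 2))\n"
print(delta_for_reread(old, new, "calc.py"))
```

The diff grows with the size of the edit, not the size of the file, which is why a small change to a large file costs so few tokens.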

Structure Map

When Claude re-reads a code file it already saw and that has not changed since, the Read call is blocked and replaced with a structural summary: function signatures, class hierarchies, imports. A 720KB Python file (180,000 tokens) becomes a 250-token skeleton.

Disable: TOKEN_OPTIMIZER_READ_CACHE_MODE=shadow
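A structural skeleton of this kind can be approximated with the standard ast module. This is a sketch for Python sources only, and the helper names are hypothetical; the real feature may extract more (or different) structure.

```python
import ast


def _signature(fn, indent=""):
    # Render "def name(args): ..." without the function body.
    keyword = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
    return f"{indent}{keyword} {fn.name}({ast.unparse(fn.args)}): ..."


def skeleton(source: str) -> str:
    """Summarize a module as its imports plus class and function signatures."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            lines.append(ast.unparse(node))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append(_signature(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
            lines.extend(
                _signature(item, indent="    ")
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            )
    return "\n".join(lines)
```

Because function bodies are dropped entirely, the skeleton's size depends on the number of definitions rather than the length of the file. `ast.unparse` requires Python 3.9 or later.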

Bash Output Compression

Rewrites common CLI commands to return compressed summaries. Covers lint, log tails, tree, docker pull, long listings, build output, and test runners. A 564-token pytest output becomes 115 tokens.

Bash Output Compression

Disable: TOKEN_OPTIMIZER_BASH_COMPRESS=0

Search Result Compression

When the AI runs grep, rg, or web searches that return long result lists, the output is condensed to the top hits plus a count. A 500-line grep result becomes 20 lines plus a summary.

Disable: TOKEN_OPTIMIZER_BASH_COMPRESS_SEARCH=0
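The "top hits plus a count" shape can be sketched in a few lines. The example below treats the first lines of the output as the top hits, which is an assumption; the function name and the wording of the summary line are also illustrative.

```python
def compress_search(output: str, keep: int = 20) -> str:
    """Keep the first `keep` result lines and summarize the rest as a count."""
    lines = output.splitlines()
    if len(lines) <= keep:
        return output  # short results pass through untouched
    omitted = len(lines) - keep
    summary = f"... {omitted} more matches omitted ({len(lines)} total)"
    return "\n".join(lines[:keep] + [summary])


grep_output = "\n".join(f"src/mod_{i}.py:12: TODO refactor" for i in range(500))
print(compress_search(grep_output).splitlines()[-1])
```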

Lean-Output Nudges

When context fills past 25% and quality drops, a short nudge tells the model to reason deeply but keep visible output lean. Live A/B testing showed a 10-15% typical reduction in output tokens, up to 30-41%, on real prompts. Cache-safe: injected as additionalContext, never modifies the existing prefix.

Disable: TOKEN_OPTIMIZER_VERBOSITY_STEER=0

Quality Nudges

Watches context quality in real time. When the score drops 15+ points or crosses below 60, an inline note enters the context. Claude sees it on the next turn and surfaces the warning or adjusts behavior. Cooldown of 5 minutes, max 3 per session.

Disable: TOKEN_OPTIMIZER_QUALITY_NUDGES=0
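The trigger rule above can be written as a small state machine. This sketch assumes the 15-point drop is measured against the previous reading, which the text does not specify; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NudgeState:
    last_score: float
    last_nudge_at: float = float("-inf")  # seconds; no nudge sent yet
    nudges_sent: int = 0


def should_nudge(state, score, now, drop=15, floor=60, cooldown_s=300, cap=3):
    """Return True when a nudge should fire, updating the state in place."""
    dropped = state.last_score - score >= drop      # fell 15+ points
    crossed = state.last_score >= floor > score     # crossed below 60
    allowed = state.nudges_sent < cap and now - state.last_nudge_at >= cooldown_s
    state.last_score = score
    if (dropped or crossed) and allowed:
        state.last_nudge_at = now
        state.nudges_sent += 1
        return True
    return False
```

Keeping the cooldown and the per-session cap in the same check means a rapidly degrading session produces a few warnings rather than one per turn.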

Loop Detection

Catches the AI getting stuck on a retry loop. Compares the last 4 user messages and last 5 tool results for similarity. Fires at confidence ≥ 0.7, session cap of 2 notes. Savings measured from actual loop turn content.

Disable: TOKEN_OPTIMIZER_LOOP_DETECTION=0
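One way to turn "similarity of recent messages and tool results" into a confidence value is the mean pairwise ratio from difflib.SequenceMatcher. How Token Optimizer actually computes its confidence is not stated here, so the averaging below is an assumption and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def mean_similarity(items):
    """Average pairwise similarity (0.0 to 1.0) across a list of strings."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def loop_confidence(user_messages, tool_results):
    """Combine the last 4 user messages and last 5 tool results into one score."""
    return (mean_similarity(user_messages[-4:]) + mean_similarity(tool_results[-5:])) / 2


messages = ["fix the failing test"] * 4
results = ["AssertionError: expected 3, got 2"] * 5
print(loop_confidence(messages, results) >= 0.7)
```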

Activity Mode Detection

Classifies your session into one of five modes (code, debug, review, infra, general) using the last 10 tool calls. The mode feeds into compaction guidance so PRESERVE/DROP priorities adapt to what you're doing.

Decision Extraction

Detects decision statements in real-time from tool outputs and stores them in the session database. At compaction time, these decisions are injected as CRITICAL DECISIONS that the compaction summary must preserve verbatim. Capped at 10 per session.

Measuring real savings

All compression features log to a local SQLite table. Nothing leaves your machine.

python3 measure.py compression-stats --days 30

The Session Database

Everything Token Optimizer does is backed by two local SQLite databases. Nothing leaves your machine. Zero network calls.

Session database flow: tool calls compressed, archived, logged to SQLite, retrievable

Per-session DB (~/.claude/token-optimizer/snapshots/session-store/<session>.db) holds 8 tables tracking file reads, tool outputs, command outputs, cached content, context intel events, activity log, decision log, and hint serves. WAL mode for concurrent read/write from hook processes. 50MB cap per session.

Trends DB (~/.claude/token-optimizer/snapshots/trends.db) holds 7 tables tracking session history, daily aggregates, skill/model/subagent usage, savings events, and compression events. Indexed by session UUID for O(log n) joins. This is what powers the dashboard, coach mode, and 30-day trend analysis.

Every compression event, every saving, every quality measurement is a row you can query. The dashboard is a read-only view of this data. measure.py compression-stats is a SQL query. Your data is your data.
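Since the data lives in SQLite, any SQLite client can read it. The sketch below shows the kind of aggregate query involved, run against an in-memory database. The table and column names are hypothetical and do not reflect the real trends.db schema; the sample rows reuse example figures from this README.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration only.
conn.execute(
    "CREATE TABLE compression_events ("
    "session_id TEXT, feature TEXT, tokens_before INTEGER, tokens_after INTEGER)"
)
conn.executemany(
    "INSERT INTO compression_events VALUES (?, ?, ?, ?)",
    [
        ("s1", "delta", 2000, 50),
        ("s1", "bash", 564, 115),
        ("s2", "delta", 2000, 50),
    ],
)

# Tokens saved per feature, largest first.
rows = conn.execute(
    "SELECT feature, SUM(tokens_before - tokens_after) AS saved "
    "FROM compression_events GROUP BY feature ORDER BY saved DESC"
).fetchall()
print(rows)
```

To explore the real databases, open them read-only and list the actual tables first, for example with `SELECT name FROM sqlite_master WHERE type = 'table'`.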

Progressive Disclosure

Large tool results (>4KB) are archived to disk and replaced with a short preview plus a retrieval pointer. The full output survives compaction. When the model needs it, it pulls the original via expand, with no command re-run and no lost output.

This isn't just storage. The system tracks how many results were archived vs re-expanded, so you can see the net tokens that stayed collapsed. Re-expansions are netted out of the savings total, so you only count what actually stayed compressed.

python3 measure.py expand --list          # List archived tool results
python3 measure.py expand <tool-use-id>   # Retrieve a specific result
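The archive-and-preview pattern can be sketched as follows. The real tool keys archives by tool-use ID; this example uses a content hash instead, and the 200-character preview, file layout, and function names are all assumptions.

```python
import hashlib
from pathlib import Path

LIMIT_BYTES = 4096  # the ">4KB" threshold described above


def archive_if_large(result: str, archive_dir: Path, preview_chars: int = 200) -> str:
    """Return small results unchanged; archive large ones and return a preview."""
    data = result.encode("utf-8")
    if len(data) <= LIMIT_BYTES:
        return result
    key = hashlib.sha256(data).hexdigest()[:12]  # hypothetical retrieval key
    (archive_dir / f"{key}.txt").write_bytes(data)
    return f"{result[:preview_chars]}\n[archived {len(result)} chars; expand {key}]"


def expand(key: str, archive_dir: Path) -> str:
    """Read the full original back from disk."""
    return (archive_dir / f"{key}.txt").read_bytes().decode("utf-8")
```

Only the preview and the pointer enter the context. The full text stays on disk until something asks for it.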

Session Continuity

Compression matters, but the most important thing Token Optimizer does is keep your work alive across sessions and compactions, automatically.

When you end a session, Token Optimizer checkpoints your state to SQLite: active task, key decisions, modified files, git branch, recent reads. When you start a new session, it scores all recent checkpoints against your prompt and surfaces the most relevant one as a hint. You resume with context, not from zero.

What happens automatically:

  • Cross-session hints: relevance-scored checkpoints surface when you start a new session. The hint includes the prior session's task, decisions, files, and branch. All fenced as RECOVERED DATA, never instructions.
  • Cold-resume-lean: reopen a stale session without paying the full transcript cost. Token Optimizer reconstructs a lean context from its checkpoint. No LLM call, no full-transcript cold-resume. Token-free reconstruction from SQLite.
  • Hint-follow measurement: when a continuity hint surfaces file paths and the model subsequently reads one, Token Optimizer credits an avoided exploratory search. Measured causality, not a guess.

python3 measure.py resume-lean                    # list reopenable cold sessions
python3 measure.py resume-lean <#|session_id> --print  # emit lean context block

Compression savings only stick if your session survives compaction. Session continuity is what makes that happen.

Smart Compaction

When auto-compact fires, 60-70% of your conversation vanishes. Decisions, error-fix sequences, agent state, all gone.

Token Optimizer checkpoints your session before compaction and restores what the summary dropped, automatically, via hooks. It also injects a context intel digest: heuristic summaries of large tool outputs the model already processed (file paths touched, errors seen, line counts). After compaction, the model knows what it saw without re-reading everything.

Activity mode detection classifies your session in real time (code, debug, review, infra, general) using the last 10 tool calls. The mode feeds into compaction guidance so PRESERVE/DROP priorities adapt to what you're doing right now, not a generic heuristic.

Decision extraction captures decision statements from tool outputs as they happen and stores them in the session DB. At compaction time, these are injected as CRITICAL DECISIONS the summary must preserve verbatim. Capped at 10 per session.

Compression savings only stick if your session survives compaction. Saving tokens on git status doesn't help if the next auto-compact wipes out the decision that made you run it.

Quality Nudges and Loop Detection in action

How Smart Compaction works

Progressive Checkpoints

Captures session state at multiple thresholds: 20%, 35%, 50%, 65%, 80% context fill, plus quality drops below 80, 70, 50, and 40. Also snapshots before agent fan-out and after large edit batches. On restore, picks the richest eligible checkpoint.
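The threshold list above amounts to a crossing check between two consecutive readings. A minimal sketch, using the thresholds from this section and a hypothetical function name and label format:

```python
FILL_THRESHOLDS = (20, 35, 50, 65, 80)   # percent of context used
QUALITY_THRESHOLDS = (80, 70, 50, 40)    # quality score floors


def crossed_thresholds(prev_fill, fill, prev_quality, quality):
    """List every threshold crossed between the previous and current reading."""
    triggers = [f"fill>={t}%" for t in FILL_THRESHOLDS if prev_fill < t <= fill]
    triggers += [
        f"quality<{t}" for t in QUALITY_THRESHOLDS if quality < t <= prev_quality
    ]
    return triggers


print(crossed_thresholds(prev_fill=18, fill=36, prev_quality=85, quality=69))
```

Comparing against the previous reading means each threshold fires once, when it is crossed, rather than on every turn spent beyond it.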

Context Intel Digest

After compaction, Token Optimizer injects a digest of the session's largest tool outputs: file paths touched, errors detected, line counts. Generated heuristically in <30ms, no LLM call. The model re-orients without re-reading everything.

python3 measure.py setup-smart-compact    # checkpoint + restore hooks

Output Tokens: Lean vs Verbose

Output tokens are the most expensive part of your session. They cost 5x more than input tokens on Opus and are billed per generation, not per cache read. A verbose response to a simple question burns dollars you never needed to spend.

Token Optimizer handles this automatically with lean-output nudges. When your context fills past 25% and quality starts dropping, a short nudge tells the model to reason deeply but keep visible output lean. Live A/B testing showed a 10-15% typical reduction in output tokens, up to 30-41%, on real prompts.

How it works:

  • The nudge is injected as additionalContext, never modifying the existing prefix, so your cache stays intact
  • It only fires when context is filling up, not when you have plenty of room
  • The model still thinks through the problem; it just produces a more concise visible answer
  • Disable any time: TOKEN_OPTIMIZER_VERBOSITY_STEER=0

This is one of the 9 active compression features, and it's the one that saves on the output side, where tokens cost the most.

Quality Scoring

The quality score tracks two things: Resource Health (how close you are to the degradation cliff) and Session Efficiency (whether your tokens are doing useful work). Letter grades from S to F make triage instant.

As context fills, quality drops. MRCR falls from 93% to 76% between 256K and 1M context. Your AI gets measurably dumber as the window fills. The quality score shows you exactly when that happens.

Real session quality breakdown

The status bar shifts color as quality degrades: green, yellow, orange, red. When quality drops below 75, session duration appears as a warning. Running subagents show with their model and elapsed time.

Status Bar Degradation

python3 measure.py setup-quality-bar      # one-time install

Read how scoring works →

Quality score details
| Score | Signals | What it means |
|---|---|---|
| Resource Health | Context fill, compaction depth, absolute waste tokens | How close you are to the degradation cliff |
| Session Efficiency | Stale reads, bloated results, decision density, agent efficiency | Whether the session is using tokens well right now |

| Grade | Range | Meaning |
|---|---|---|
| S | 90-100 | Peak efficiency |
| A | 80-89 | Healthy, minor optimization possible |
| B | 70-79 | Degradation starting |
| C | 55-69 | Significant waste |
| D | 40-54 | Serious problems |
| F | 0-39 | Context is rotting, immediate action needed |
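The grade bands translate directly into a lookup. The function name below is illustrative; the cutoffs are the ones listed in the grade table.

```python
def grade(score: float) -> str:
    """Map a 0-100 quality score to its letter grade."""
    for floor, letter in ((90, "S"), (80, "A"), (70, "B"), (55, "C"), (40, "D")):
        if score >= floor:
            return letter
    return "F"


for value in (93, 76, 55, 39):
    print(value, grade(value))
```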

Quality bar disappeared? Running Claude Code's /statusline overwrites Token Optimizer's entry. SessionStart auto-restores it. Just start a new session and it's back.

Want it off for good?

python3 measure.py setup-quality-bar --uninstall

Coach Mode

> /token-coach

Tell it your goal. Get back specific, prioritized fixes with exact token savings. It reads 30 days of your session data and surfaces what no single session can show: quality drifting down, sessions getting longer, cache hit rates falling, cost per session climbing.

Every insight is grounded in your actual numbers. "Your short sessions score 68 vs 60 for long ones" hits differently than "consider shorter sessions." Coach Mode also identifies project-level optimization opportunities (skills you never use, MCP servers that load eagerly, CLAUDE.md patterns that break your cache) and teaches you how to fix them so future sessions start leaner.

Read about Coach Mode →

Coach Mode details

Historical patterns detected

| Pattern | What it detects |
|---|---|
| Quality drift | Average quality dropping week over week |
| Session duration creep | Sessions getting longer, filling context faster |
| Cache degradation | Cache hit rate falling (and whether model switches cause it) |
| Grade distribution | Too many D-grade sessions piling up |
| Cost awareness | Cost per session climbing, with routing advice |
| Duration-quality correlation | Short sessions scoring higher, suggesting you break up long ones |
| Compression gap | Shadow-only savings far exceeding active compression |
| Model switching | Frequent mid-session model switches invalidating the prompt cache |

Waste detectors

11 automated detectors analyze your session patterns:

| Detector | What it catches |
|---|---|
| PDF/binary ingestion | Large files consuming context |
| Web search overhead | Too many web results dumped into context |
| Retry churn | Same tool retried 3+ times with errors |
| Tool cascade | 3+ consecutive tool errors |
| Looping | Repeated similar messages |
| Overpowered model | Opus used for simple edits |
| Weak model | Haiku on complex tasks |
| Bad decomposition | Monolithic 500+ word prompts |
| Wasteful thinking | Extended thinking >2x output for small edits |
| Output waste | Verbose responses to simple operations |
| Cache instability | CLAUDE.md patterns that break the prompt cache |

Keep-Warm

Opt-in feature for API-billed sessions. Issues a tiny cache-read ping just before your prompt cache entry would expire, refreshing its TTL. Costs ~0.1x of the prefix versus the 1.25-2x re-write you'd pay on resume. Tripwire auto-off if pings stop paying for themselves.

python3 measure.py keepwarm-enable          # opt in (API billing only)
python3 measure.py keepwarm-report            # net savings, spend, tripwire state
python3 measure.py keepwarm-disable           # opt out any time
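The trade-off behind the tripwire is simple arithmetic: each ping costs about 0.1x of the prefix, while letting the cache expire costs a 1.25-2x re-write. The sketch below compares the two under a simplified model that ignores the cache read you pay on a warm resume. The function name and the example price are assumptions, not Token Optimizer's accounting.

```python
def keepwarm_net_savings(
    prefix_tokens: int,
    input_price_per_mtok: float,
    pings: int,
    write_multiplier: float = 1.25,  # 1.25x to 2x depending on cache TTL
    read_multiplier: float = 0.1,    # cost of one keep-warm ping
) -> float:
    """Dollars saved by pinging instead of re-writing the cache once."""
    base = prefix_tokens / 1_000_000 * input_price_per_mtok
    return base * write_multiplier - base * read_multiplier * pings


# Hypothetical: 100K-token prefix at $15 per million input tokens.
for pings in (3, 12, 13):
    print(pings, round(keepwarm_net_savings(100_000, 15.0, pings), 3))
```

At a 1.25x write cost, the pings stop paying for themselves after 12 in a row without a resume, which is the kind of condition an auto-off tripwire guards against.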

Fleet Auditor

Scans across Claude Code, Codex, and custom transcript setups to find idle burns, model misrouting, and config bloat with dollar savings per finding.

CLAUDE.md Routing Injection

Generate model routing instructions from your actual usage data and inject them into CLAUDE.md. 48-hour staleness guard auto-removes stale advice.

python3 measure.py inject-routing --dry-run   # Preview
python3 measure.py inject-routing              # Inject

FAQ

Can it degrade my context quality?

No. Structural optimization only removes genuinely unused components. Active Compression controls can be disabled with a single command or env var. The quality scoring system tracks degradation in real time.

Does it break the prompt cache?

No. Token Optimizer never touches content already in your context. It works on new content entering your window and on the compaction boundary. Your cache prefix stays intact, which means it saves you money twice: less input per turn, and cheaper cache reads on every turn forward.

Does it send any data anywhere?

No analytics, no telemetry endpoint, no product data leaves your machine. Measurement events are local SQLite rows you own. Zero network calls.

Can it hurt my session?

No. All hooks are non-blocking with fail-open design. If a Token Optimizer script errors, your command runs normally.

Any runtime dependencies?

No. Pure Python stdlib on Claude Code and Codex. TypeScript with zero runtime deps on OpenCode and OpenClaw.

How does install.sh verify file integrity?

Resolves the latest GitHub Release tag, checks out that tag, fetches CHECKSUMS.sha256 from the same release, and verifies every script file. Out-of-band verification means a compromised commit cannot swap both code and checksums simultaneously.
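For readers who want to repeat the check by hand, the verification step can be sketched in Python. This assumes CHECKSUMS.sha256 uses the common `sha256sum` layout (hex digest, whitespace, relative path), which this README does not state; the function name is hypothetical.

```python
import hashlib
from pathlib import Path


def verify_checksums(manifest_text: str, root: Path) -> list[str]:
    """Return the relative paths that are missing or whose SHA-256 differs."""
    mismatched = []
    for line in manifest_text.splitlines():
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        target = root / name
        actual = (
            hashlib.sha256(target.read_bytes()).hexdigest()
            if target.is_file()
            else None
        )
        if actual != expected.lower():
            mismatched.append(name)
    return mismatched
```

An empty list means every file in the manifest matched. The check is only as trustworthy as the source of the manifest, which is why fetching it from the release rather than the checkout matters.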

All Commands

Show all commands
| Command | What it does | Docs |
|---|---|---|
| /token-optimizer | Full audit with 6 parallel agents, guided fixes | → |
| /token-coach | 30-day trend analysis, prioritized fixes | → |
| quick | 10-second health check | → |
| doctor | Installation check, score out of 10 | → |
| dashboard | Open the HTML dashboard | → |
| savings | Dollar savings report | → |
| report | Per-component token breakdown | → |
| quality | Context-quality analysis of live session | → |
| trends | Skill adoption, model mix, overhead over time | → |
| compression-stats | Measured savings from active compression | → |
| memory-review | MEMORY.md structural audit | → |
| git-context | Suggest files for your current diff | → |
| drift | Side-by-side comparison vs your last snapshot | → |
| conversation | Per-message token and cost breakdown | → |
| pricing-tier | View or switch pricing tiers | → |
| expand | Retrieve an archived tool result (progressive disclosure) | → |
| resume-lean | Reopen a cold session with token-free reconstruction | → |

Full CLI reference →

License

PolyForm Noncommercial 1.0.0. Source-available. Personal, research, educational, and non-commercial use requires no license purchase.

Personal / hobby / research / education?

Go for it. Full source, runs locally, no license purchase needed.

Small team (under 5 people OR under $20k/month revenue)?

Small teams get a no-cost commercial license automatically. Just use it.

Started personal, now it's turning into a business?

Your past use is totally fine. The license has a built-in 32-day grace period after any written notice. Reach out for a commercial license when you're ready.

Larger company / commercial use?

Contact Alex Greenshpun or me@alexgreenshpun.com.


Created by Alex Greenshpun.