🪨 Caveman Code

June 3, 2026

The terminal coding agent that talks like a caveman, and burns half the tokens doing it.

Same model. Same task. ~2× fewer tokens than Codex. 20+ providers · plan mode · autopilot loop · MIT.

Install · The Trick · How It Saves Tokens · Why Caveman · Features · SDK


🔥 The trick

Big agent waffle. Waffle cost token. Caveman no waffle.

Asked ▸ why does this component re-render on every keystroke?

| Ordinary agent · ~290 tokens | 🪨 Caveman Code · 31 tokens |
|---|---|
| Great question! A React component can re-render on every keystroke for several reasons. The most common cause is passing a fresh object or function reference as a prop on each render, which defeats React's referential-equality bail-out and forces the child to reconcile again … (three more paragraphs) | New object ref each render. Inline prop = new ref = re-render. Wrap in useMemo. |

Same answer. Same model. Caveman version costs ~9× less to read back, and the agent reads its own context back on every single turn. The saving compounds across the whole session.
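A back-of-envelope sketch of why the saving compounds, assuming every earlier reply stays in context and is re-sent as input on each later turn (no compaction, no prompt-cache discount). The 290 and 31 figures are the ones from the example above; `cumulativeReplyTokens` is a hypothetical helper, not part of Caveman Code.

```typescript
// Back-of-envelope model: every earlier reply is re-sent as input on each later turn.
// Assumes a constant reply size, no compaction, and no prompt-cache discount.
function cumulativeReplyTokens(replyTokens: number, turns: number): number {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    // On turn `turn` the agent writes one new reply and re-reads the (turn - 1) earlier ones.
    total += replyTokens * turn;
  }
  return total;
}

const turns = 15;
const verbose = cumulativeReplyTokens(290, turns); // ~290-token replies
const terse = cumulativeReplyTokens(31, turns); // 31-token caveman replies
console.log({ verbose, terse, saved: verbose - terse });
```

Over 15 turns this gives 34,800 vs 3,720 reply tokens. The ratio stays ~9×, but the absolute gap grows roughly with the square of the session length.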

That is the entire product. Everything below is the coding agent it ships inside.


The proof

25-task MicroBench · gpt-5.5 · xhigh reasoning · 2026-05-18

| Tool | Fresh tokens | Pass rate |
|---|---|---|
| 🪨 caveman | 524k | 14 / 25 |
| codex | 1,010k | 15 / 25 |

1.93× fewer tokens than Codex CLI on identical tasks. Same gpt-5.5 model. Same xhigh reasoning. Pass rate within one task.

No marketing-deck baselines. Each tool spawned as a real child process. Each task verified by a task-specific verify.sh. Raw CSV + per-task logs published.

npx tsx research/evals/run-honest-bench.ts --tools caveman,codex   # reproduce in one command

Raw CSV · Aggregate JSON · Methodology · 25 task prompts
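run-honest-bench.ts itself is not reproduced here. The sketch below only illustrates the protocol described above (spawn the tool as a real child process, then let a task-specific verifier decide), assuming the verifier signals a pass with exit code 0. `BenchTask`, `runTask`, and `passRate` are hypothetical names.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical task shape: a command that runs the agent, then a verifier
// (e.g. `bash verify.sh`) whose exit code alone decides pass/fail.
interface BenchTask {
  id: string;
  agent: { cmd: string; args: string[] };
  verify: { cmd: string; args: string[] };
  cwd?: string;
}

interface BenchResult {
  id: string;
  agentExitCode: number | null;
  passed: boolean;
}

function runTask(task: BenchTask, timeoutMs = 10 * 60_000): BenchResult {
  // Spawn the tool as a real child process, not an in-process mock.
  const agent = spawnSync(task.agent.cmd, task.agent.args, {
    cwd: task.cwd,
    stdio: "ignore",
    timeout: timeoutMs,
  });
  // Whatever the agent claims, only the verifier's exit code counts.
  const verify = spawnSync(task.verify.cmd, task.verify.args, {
    cwd: task.cwd,
    stdio: "ignore",
  });
  return { id: task.id, agentExitCode: agent.status, passed: verify.status === 0 };
}

function passRate(results: BenchResult[]): string {
  const passed = results.filter((r) => r.passed).length;
  return `${passed} / ${results.length}`;
}
```

A task might point `agent` at `caveman` with `["-p", prompt]` and `verify` at `bash` with `["verify.sh"]`.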


Install

npm install -g @juliusbrussee/caveman-code

Installs two binaries: caveman (primary) and caveman-code (alias). Same command, pick either.

export ANTHROPIC_API_KEY=sk-ant-...     # or any supported provider's key
caveman                                 # launch the TUI
caveman "explain this codebase"          # one-shot
caveman -p "summarize this"              # print mode (non-interactive)
caveman goal start "ship feature X"      # autonomous Ralph loop
Other install paths: pnpm · yarn · bun · Docker · OAuth login
pnpm add -g @juliusbrussee/caveman-code
yarn global add @juliusbrussee/caveman-code
bun add -g @juliusbrussee/caveman-code

# Docker
docker run --rm -it -v "$PWD:/work" ghcr.io/juliusbrussee/caveman-code:latest

# No API key? Use a subscription you already pay for:
caveman             # then run /login inside the TUI: Claude Pro · ChatGPT Plus · Copilot · Gemini · Antigravity

CI / headless install: docs/getting-started/installation.md.


Quick Start

caveman                            # interactive TUI
caveman "fix the failing tests"     # start with a prompt
caveman -p "summarize this file"    # non-interactive: print and exit
cat err.log | caveman -p "debug"    # pipe stdin
caveman -c                          # continue last session
caveman -r                          # browse + resume sessions
caveman /plan                       # plan mode: read-only (slash command)
caveman goal start "ship payments v2"   # autonomous Ralph loop

Type / inside the TUI for every slash command. Reference: docs/reference/slash-commands.md.


How It Saves Tokens

Four compression layers hit two separate token sinks: what the model says and what the shell returns. Three are built in and on by default; RTK is an optional add-on.

| Token sink | Layer | What happens | Cut |
|---|---|---|---|
| Model reply | Caveman Mode | Terse technical fragments: no filler, no hedging. Levels lite · full · ultra. | prompt + reply |
| Tool output | Tool Budgets | Per-tool line caps (bash 80 · read 300 · grep 120), ANSI strip, blank-line collapse, semantic JSON/XML extraction. | −67% to −94% |
| Tool output | Read Dedup | Files fingerprinted per session; re-reads return a stub, not the bytes. | −99% on repeats |
| Tool output | RTK | Optional external Rust binary ("Rust Token Killer"); pipes bash output through rtk before it enters context. | −60% to −90% (RTK's own bench) |
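A minimal sketch of the two built-in tool-output layers, assuming tool output arrives as a plain string. The caps (bash 80, read 300, grep 120) come from the table; `applyToolBudget`, `ReadDedup`, the 200-line fallback, the truncation marker, and the stub text are hypothetical. Semantic JSON/XML extraction is left out.

```typescript
import { createHash } from "node:crypto";

// Line caps from the table above; the fallback for other tools is an assumption.
const LINE_CAPS: Record<string, number> = { bash: 80, read: 300, grep: 120 };
const DEFAULT_CAP = 200; // hypothetical

// Matches ANSI CSI escape sequences (colors, cursor movement).
const ANSI_PATTERN = /\x1b\[[0-9;?]*[ -\/]*[@-~]/g;

function applyToolBudget(tool: string, output: string): string {
  const cap = LINE_CAPS[tool] ?? DEFAULT_CAP;
  const lines = output
    .replace(ANSI_PATTERN, "") // strip ANSI color codes
    .split("\n")
    // Collapse runs of blank lines: keep a blank line only if the previous line had text.
    .filter((line, i, all) => line.trim() !== "" || (i > 0 && all[i - 1].trim() !== ""));
  if (lines.length <= cap) return lines.join("\n");
  const omitted = lines.length - cap;
  return [...lines.slice(0, cap), `[... ${omitted} more lines truncated]`].join("\n");
}

// Per-session read dedup: fingerprint content, return a stub on an unchanged re-read.
class ReadDedup {
  private seen = new Map<string, string>(); // path -> sha256 of last content returned

  read(path: string, content: string): string {
    const digest = createHash("sha256").update(content).digest("hex");
    if (this.seen.get(path) === digest) {
      return `[unchanged since last read: ${path}]`;
    }
    this.seen.set(path, digest);
    return content;
  }
}
```

Hashing the content (rather than trusting the path alone) means an edited file is sent again in full, while an unchanged re-read costs only the stub.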

The 120–195-token system-prompt overhead pays for itself after one tool call.

Benchmark: 10 real tool-output fixtures · −86% aggregate

| Fixture | Lines | Reduction |
|---|---|---|
| git diff | 901 | −94% |
| npm ls | 701 | −92% |
| ls recursive | 601 | −90% |
| grep results | 801 | −89% |
| test output | 501 | −88% |
| XML/pom.xml | 382 | −79% |
| docker inspect | 258 | −68% |
| ANSI colored | 97 | −50% |
| read file | 429 | −32% |
| build output | 19 | −18% |
| Aggregate | | −86% |
| Metric | Value |
|---|---|
| Tokens saved (10 fixtures) | ~72,400 tokens (raw output: 337K chars) |
| System-prompt overhead | 120–195 tokens (lite–ultra) |
| Net savings, 15-turn session | +567K tokens (~$1.70, Sonnet) |
| Net savings, 30-turn session | +1.13M tokens (~$6.92, Sonnet) |
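The "pays for itself after one tool call" claim can be checked against the numbers above with plain arithmetic. The $3-per-million-token Sonnet input price is an assumption based on Anthropic's published Sonnet pricing, not a figure from this README; the constant names are illustrative.

```typescript
// Figures taken from the benchmark table above.
const TOKENS_SAVED_10_FIXTURES = 72_400;
const FIXTURES = 10;
const ULTRA_PROMPT_OVERHEAD = 195; // worst case of the 120–195 token range

// Average saving from one compressed tool output.
const avgSavedPerToolCall = TOKENS_SAVED_10_FIXTURES / FIXTURES;
// How many worst-case system-prompt overheads one average saving covers.
const overheadsCovered = Math.floor(avgSavedPerToolCall / ULTRA_PROMPT_OVERHEAD);

// Assumption: Sonnet input pricing of $3 per million tokens.
const SONNET_USD_PER_MTOK = 3;
const fifteenTurnUsd = (567_000 / 1_000_000) * SONNET_USD_PER_MTOK;

console.log(avgSavedPerToolCall, overheadsCovered, fifteenTurnUsd.toFixed(2));
```

That is ~7,240 tokens saved per fixture, about 37 times the ultra-level overhead, and 567K tokens at $3/M lands at ~$1.70, matching the 15-turn row.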
npm run bench:offline   # compression analysis β€” free, <1s
npm run bench:replay    # analyze your real sessions β€” free
npm run bench:live      # A/B with live LLM calls β€” needs API key
Use `/caveman [lite|full|ultra|off]` in the TUI to adjust compression aggressiveness.

Why Caveman Code

| Capability | Caveman | Claude Code | Codex | Aider | opencode |
|---|---|---|---|---|---|
| 4-layer token compression | ✅ | ❌ | ❌ | repo map only | ❌ |
| 20+ providers + OAuth | ✅ | Anthropic | ChatGPT | API keys | ✅ |
| Autonomous goal loop | ✅ | ❌ | ❌ | ❌ | ❌ |
| Autopilot: no permission prompts | ✅ | ❌ | ❌ | ✅ | ❌ |
| Repo map (PageRank, Aider-style) | ✅ | ❌ | ❌ | ✅ | ❌ |
| Architect / editor model split | ✅ | ❌ | ❌ | ✅ | ❌ |
| Session branching + shadow-git checkpoints | ✅ | ❌ | fork only | git only | ❌ |
| Persistent semantic memory (cavemem) | ✅ | MEMORY.md | ❌ | ❌ | ❌ |
| MIT open source | ✅ | closed | Apache-2.0 | Apache-2.0 | ✅ |
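For readers unfamiliar with the Aider-style repo map named in the table: the general idea is to rank files by PageRank over a "file A references something defined in file B" graph, then show the model only the top-ranked files' signatures. The sketch below is a generic power-iteration PageRank over a hypothetical four-file graph, not Caveman Code's implementation; the damping factor 0.85 and 50 iterations are conventional choices, not project values.

```typescript
// Edges point from a file to the files it references (hypothetical example graph).
type Graph = Record<string, string[]>;

function pageRank(graph: Graph, damping = 0.85, iterations = 50): Map<string, number> {
  const nodes = Object.keys(graph);
  const n = nodes.length;
  let rank = new Map(nodes.map((node) => [node, 1 / n] as [string, number]));

  for (let it = 0; it < iterations; it++) {
    // Every file starts each round with the "random jump" share.
    const next = new Map(nodes.map((node) => [node, (1 - damping) / n] as [string, number]));
    for (const node of nodes) {
      const out = graph[node];
      const share = rank.get(node)!;
      if (out.length === 0) {
        // Dangling file (references nothing): spread its rank evenly over all files.
        for (const target of nodes) next.set(target, next.get(target)! + (damping * share) / n);
      } else {
        for (const target of out) next.set(target, next.get(target)! + (damping * share) / out.length);
      }
    }
    rank = next;
  }
  return rank;
}

const repo: Graph = {
  "src/cli.ts": ["src/agent.ts", "src/config.ts"],
  "src/agent.ts": ["src/tools.ts", "src/config.ts"],
  "src/tools.ts": ["src/config.ts"],
  "src/config.ts": [],
};

const ranked = [...pageRank(repo).entries()].sort((a, b) => b[1] - a[1]);
console.log(ranked.map(([file, score]) => `${file} ${score.toFixed(3)}`).join("\n"));
```

The file everything depends on (`src/config.ts`) rises to the top, while the entry point nobody references sinks to the bottom.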

Full table including Crush: docs/comparison.md.


Features

| Feature | Trigger |
|---|---|
| 🤖 Autonomous goal loop: Ralph-style autopilot. Rolling state, per-iteration $/token ledger, shadow-git checkpoints, ranked termination (sentinel · iteration cap · $-cap · no-progress · SIGINT). Resume any time. | `caveman goal start` |
| 🧠 Plan mode: read-only chat. Model sees only read/grep/find/ls, produces a written plan, never edits. Subagents inherit the gate. `/act` to execute. | `/plan` |
| 👥 Subagents: up to 7 parallel, worktree-isolated. Frontmatter agents at `.cave/agents/*.md` (Claude Code superset). Five ship by default. | Task tool |
| 🪞 Architect / editor split: slow model plans, fast model executes. ~3–5× cheaper than a single-model run. | `--architect` · `--editor` |
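A minimal sketch of the goal loop's ranked termination, assuming "ranked" means the stop conditions are checked in the order listed and the first match wins. `LoopState`, `LoopLimits`, `shouldStop`, and the sentinel string are hypothetical; the ledger and shadow-git checkpoints are left out.

```typescript
// Hypothetical per-iteration state for a Ralph-style loop.
interface LoopState {
  iteration: number;
  spentUsd: number;
  lastOutput: string;
  iterationsWithoutProgress: number;
  interrupted: boolean; // would be set by a SIGINT handler
}

interface LoopLimits {
  sentinel: string; // marker the model prints when it believes the goal is done
  maxIterations: number;
  maxUsd: number;
  maxStalledIterations: number;
}

type StopReason = "sentinel" | "iteration-cap" | "dollar-cap" | "no-progress" | "sigint";

// Conditions are evaluated in the order the README lists them; the first match wins.
function shouldStop(s: LoopState, l: LoopLimits): StopReason | null {
  if (s.lastOutput.includes(l.sentinel)) return "sentinel";
  if (s.iteration >= l.maxIterations) return "iteration-cap";
  if (s.spentUsd >= l.maxUsd) return "dollar-cap";
  if (s.iterationsWithoutProgress >= l.maxStalledIterations) return "no-progress";
  if (s.interrupted) return "sigint";
  return null;
}
```

A driver would call `shouldStop` after each iteration, record the reason in the goal's state so `caveman goal resume` can pick up later, and flip `interrupted` from a `process.on("SIGINT", ...)` handler.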

Latest release: plan mode · goal loop · native memory tools · subagent registry. Full history → CHANGELOG.md.

More: sessions · providers · MCP · memory · recipes · daemon · CLI flags

🌳 Sessions, branching, replay

JSONL sessions in ~/.cave/agent/sessions/, organized by working directory. Branching never overwrites history.

caveman -c                    # continue most recent
caveman -r                    # browse and select
caveman --fork <path|id>      # fork into a new file

- /tree: navigate and branch in place (search · fold · page · filter)
- /compact: manual compaction
- /checkpoint + /rollback N: rewind code and conversation together
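A sketch of how an append-only JSONL session can branch without overwriting history, assuming each entry stores a pointer to its parent. The entry shape, `SessionTree`, and the id scheme are hypothetical; the actual on-disk format in ~/.cave/agent/sessions/ may differ.

```typescript
// Hypothetical entry shape: each JSONL line is one entry with a parent pointer.
interface SessionEntry {
  id: string;
  parentId: string | null;
  role: "user" | "assistant";
  text: string;
}

class SessionTree {
  private entries = new Map<string, SessionEntry>();
  private nextId = 1;

  // Append-only: branching from an older entry adds a sibling; nothing is rewritten.
  append(parentId: string | null, role: SessionEntry["role"], text: string): string {
    const id = `e${this.nextId++}`;
    this.entries.set(id, { id, parentId, role, text });
    return id;
  }

  // Walk parent pointers from a leaf back to the root to rebuild one branch.
  branch(leafId: string): SessionEntry[] {
    const path: SessionEntry[] = [];
    let cur = this.entries.get(leafId);
    while (cur) {
      path.push(cur);
      cur = cur.parentId ? this.entries.get(cur.parentId) : undefined;
    }
    return path.reverse();
  }

  // One JSON object per line, in append order.
  toJsonl(): string {
    return [...this.entries.values()].map((e) => JSON.stringify(e)).join("\n");
  }
}
```

Because a branch is just a different leaf, the file only ever grows, and every earlier path stays replayable.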

🌐 20+ providers, 6 OAuth flows

OAuth: Claude Pro/Max · ChatGPT Plus/Pro · GitHub Copilot · Google Gemini · Antigravity · Vertex

API keys: Anthropic · OpenAI · Azure · Vertex · Bedrock · Mistral · Groq · Cerebras · xAI · OpenRouter · Vercel AI Gateway · Hugging Face · Kimi · MiniMax · Z.AI · DeepSeek

Custom: any OpenAI-/Anthropic-/Google-compatible endpoint via ~/.cave/agent/models.json.

🔌 MCP, hooks, skills, commands (Claude Code-compatible)

Authoring formats are a superset of Claude Code's β€” paste your existing config, it works.

| Claude Code | Caveman | Notes |
|---|---|---|
| `~/.claude/settings.json` | `~/.cave/settings.json` | Hooks identical (run as observers, never block) |
| `~/.claude/commands/*.md` | `~/.cave/commands/*.md` | Frontmatter superset |
| `~/.claude/skills/<name>/SKILL.md` | `~/.cave/skills/<name>/SKILL.md` | Identical |
| `~/.claude/agents/<name>.md` | `~/.cave/agents/<name>.md` | Frontmatter superset |
| `.mcp.json` | `.mcp.json` | Same path, no change |
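The path mapping in the table is a straight prefix swap, sketched below under the assumption that nothing else changes when moving files across. `toCavePath` is a hypothetical helper, not a caveman subcommand.

```typescript
import { homedir } from "node:os";
import { isAbsolute, join, relative } from "node:path";

// Maps a Claude Code config path to its Caveman Code counterpart per the table above.
function toCavePath(claudePath: string, home: string = homedir()): string {
  const claudeDir = join(home, ".claude");
  const rel = relative(claudeDir, claudePath);
  // Paths outside ~/.claude (e.g. a project's .mcp.json) keep the same location.
  if (rel.startsWith("..") || isAbsolute(rel)) return claudePath;
  return join(home, ".cave", rel);
}
```

Since the table lists each format as identical or a frontmatter superset, copying every file to its mapped path should be all a migration needs, per that claim.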

MCP transports: stdio · Streamable HTTP · in-process. OAuth 2.1 + PKCE; tokens in OS keychain.

caveman mcp add <name>      # add a server
caveman mcp doctor          # health-check + tool listing
caveman mcp-server          # run caveman itself as an MCP server (Codex-compatible)

🧠 Memory via cavemem

Persistent memory delegated to cavemem (MIT, hybrid BM25 + local vectors). Agent has two native tools: memory_search and memory_save. Relevant recall is auto-injected each turn.

/memory search "auth migration"
/memory consolidate            # cluster recent observations into semantic facts
/memory sync --from claude     # import Claude Code's MEMORY.md
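cavemem's scoring isn't documented here. As a generic illustration of what "hybrid BM25 + local vectors" retrieval usually means, the sketch ranks memories by BM25 and by cosine similarity separately, then merges the two rankings with reciprocal rank fusion (RRF). The fusion method, the `Memory` shape, and the toy embeddings are assumptions; cavemem may combine scores differently.

```typescript
// Generic hybrid retrieval: BM25 keyword ranking + cosine-similarity ranking,
// merged with reciprocal rank fusion (RRF). Not cavemem's actual scoring.
interface Memory {
  id: string;
  text: string;
  embedding: number[]; // toy vectors here; a real store would use a local embedding model
}

const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Okapi BM25 with the usual k1/b defaults.
function bm25Scores(query: string, docs: Memory[], k1 = 1.2, b = 0.75): Map<string, number> {
  const scores = new Map<string, number>();
  if (docs.length === 0) return scores;
  const toks = docs.map((d) => tokenize(d.text));
  const avgLen = toks.reduce((sum, t) => sum + t.length, 0) / docs.length;
  const terms = new Set(tokenize(query));
  docs.forEach((doc, i) => {
    let score = 0;
    for (const term of terms) {
      const df = toks.filter((t) => t.includes(term)).length;
      if (df === 0) continue;
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      const tf = toks[i].filter((t) => t === term).length;
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * toks[i].length) / avgLen));
    }
    scores.set(doc.id, score);
  });
  return scores;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// RRF: each ranking contributes 1 / (k + rank); k = 60 is the commonly used constant.
function hybridSearch(query: string, queryEmbedding: number[], docs: Memory[], k = 60): string[] {
  const bm25 = bm25Scores(query, docs);
  const byKeyword = [...docs].sort((x, y) => bm25.get(y.id)! - bm25.get(x.id)!);
  const byVector = [...docs].sort(
    (x, y) => cosine(queryEmbedding, y.embedding) - cosine(queryEmbedding, x.embedding),
  );
  const fused = new Map<string, number>();
  for (const ranking of [byKeyword, byVector]) {
    ranking.forEach((doc, rank) => fused.set(doc.id, (fused.get(doc.id) ?? 0) + 1 / (k + rank + 1)));
  }
  return [...fused.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```

Fusing ranks instead of raw scores sidesteps the fact that BM25 and cosine scores live on different scales.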

🛠️ Recipes

Declarative multi-step YAML workflows at ~/.cave/recipes/*.yaml. Ten built in: accessibility-audit · add-feature-flag · add-tests · bump-deps · extract-component · migrate-deps · migrate-to-biome · port-to-typescript · release · seo-audit.

/recipe run add-tests src/auth.ts

🖥️ Daemon

caveman serve --port 39245             # start the daemon
caveman attach --host localhost:39245  # attach a TUI

Sessions live in SQLite and survive SSH drops. Prepend & to any prompt to dispatch to a remote caveman worker.

⚙️ CLI flags

| Flag | Description |
|---|---|
| `-c` / `-r` | Continue / browse-resume session |
| `-p`, `--print` | Non-interactive: print and exit |
| `--mode json\|rpc` | Structured output modes |
| `--provider` / `--model` | Provider name / model ID (`:<thinking>` suffix ok) |
| `--thinking <level>` | off · minimal · low · medium · high · xhigh |
| `--architect` / `--editor <model>` | Architect/editor split |
| `--tools <list>` | Enable specific tools |
| `--no-tools` | Disable all built-in tools |
| `--extension <path>` | Load an extension |
| `--no-extensions` | Disable extension discovery |

📋 Slash commands (in TUI)

| Command | Description |
|---|---|
| `/plan` | Toggle plan mode (read-only exploration) |
| `/act` | Execute a saved plan |
| `/caveman [level]` | Adjust token compression (lite · full · ultra · off) |
| `/login`, `/logout` | OAuth authentication |
| `/model` | Switch models |
| `/settings` | Configure theme, thinking, compaction |
| `/resume` | Browse and resume sessions |
| `/tree` | Navigate session history |
| `/checkpoint`, `/rollback N` | Git-like version control |

🚀 Subcommands

| Command | Description |
|---|---|
| `caveman goal start "<text>"` | Autonomous Ralph-style loop |
| `caveman goal resume [id] [--force]` | Resume a paused goal |
| `caveman goal status [id]` | Show goal state and ledger |
| `caveman goal cancel [id]` | Mark goal as cancelled |
| `caveman goal list` | List all goals in project |
| `caveman mcp <subcmd>` | Manage MCP servers |
| `caveman watch [paths]` | File watcher for `// cave!` triggers |
| `caveman exec [flags] "<prompt>"` | Non-interactive CI mode |
| `caveman plugin <subcmd>` | Plugin marketplace |
| `caveman run-recipe <name>` | Run YAML workflow recipes |
| `caveman rollback N` | Revert to checkpoint N |
| `caveman models <subcmd>` | Manage model registry |
| `caveman serve` / `attach` | Daemon mode |
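The exact comment grammar `caveman watch` recognizes isn't spelled out here. Assuming a trigger is a line comment of the form `// cave! <instruction>`, a scanner plus file watcher might look like this; `findCaveTriggers`, `watchForTriggers`, and the "instruction follows the marker" rule are hypothetical.

```typescript
import { readFileSync, watch } from "node:fs";

interface CaveTrigger {
  line: number; // 1-based line number
  instruction: string; // text after the marker (assumed convention)
}

// Assumed syntax: `// cave! <instruction>`, anywhere on a line.
const TRIGGER = /\/\/\s*cave!\s*(.*)$/;

function findCaveTriggers(source: string): CaveTrigger[] {
  const triggers: CaveTrigger[] = [];
  source.split("\n").forEach((text, i) => {
    const match = TRIGGER.exec(text);
    if (match) triggers.push({ line: i + 1, instruction: match[1].trim() });
  });
  return triggers;
}

// Re-scan a file whenever it changes. fs.watch may fire more than once per save,
// so a real watcher would debounce and skip triggers it has already handled.
function watchForTriggers(path: string, onTrigger: (t: CaveTrigger) => void) {
  return watch(path, () => {
    let source: string;
    try {
      source = readFileSync(path, "utf8");
    } catch {
      return; // file may be briefly missing mid-save
    }
    for (const t of findCaveTriggers(source)) onTrigger(t);
  });
}
```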

Env: ANTHROPIC_API_KEY · OPENAI_API_KEY · CAVE_CODING_AGENT_DIR (config dir) · CAVE_CACHE_RETENTION=long (extended prompt cache).


SDK

import { AuthStorage, createAgentSession, ModelRegistry, SessionManager } from "@juliusbrussee/caveman-code";

// One credential store, shared by the session and the model registry.
const authStorage = AuthStorage.create();

const { session } = await createAgentSession({
  sessionManager: SessionManager.inMemory(), // keep the session in memory, not on disk
  authStorage,
  modelRegistry: ModelRegistry.create(authStorage),
});

// Log each message as it arrives, then send a prompt.
session.on("message", (msg) => console.log(msg.role, msg.text));
await session.prompt("Refactor src/auth.ts to use the new TokenStore.");

Talk to a running daemon over HTTP / WS via @juliusbrussee/caveman-sdk. API reference →

TypeScript monorepo, 9 packages β€” full layout in CLAUDE.md.


Acknowledgements

Caveman Code is a heavy fork of pi-code by Mario Zechner. We track upstream and contribute fixes back where generally useful.

| From pi-code (upstream) | Caveman Code's own work |
|---|---|
| Agent runtime · MCP scaffolding · provider OAuth · repo map · slash-command parser · settings manager · skills loader · edit-format renderers · TUI components | Caveman Mode (4-layer compression) · goal loop · plan mode · cavemem integration · /tree session branching · architect/editor split · honest-bench harness |

Also indebted to Aider (repo map + edit-format-per-model), Claude Code (settings/commands/skills/agents/.mcp.json formats β€” adopted verbatim, then extended), Codex (cave-as-MCP-server), RTK (optional bash-output compression layer), and Biome (single-binary lint/format).

Missing credit? Open an issue and we'll fix it fast.


License

MIT © Julius Brussee. Forked from pi-code (MIT © Mario Zechner).

Issues · Releases · Changelog · Docs

Caveman no waste token. Caveman ship.