Caveman Code vs Claude Code, Codex, Aider, Crush, opencode

May 18, 2026 · View on GitHub

This is the comparison table from the v2 master plan, kept current as features land. The pitch is short:

Caveman Code is the only terminal coding agent that beats Claude Code on cost, Aider on context selection, Codex on provider flexibility, and opencode on session UX — in a single MIT-licensed binary.

Capabilities

Axis	Caveman Code v2	Claude Code	Codex	Aider	Crush	opencode
Token compression (3-layer Caveman Mode)	yes (unique)	no	no	repo map only	no	no
20+ provider OAuth (Claude Pro / ChatGPT / Copilot / Gemini)	yes (unique)	Anthropic only	ChatGPT only	env keys only	subset	env keys
Session branching + fork	yes	no	fork only	git only	no	no
Native MCP	yes	yes	yes	no	yes	yes
Native sandbox	yes	partial	yes (best-in-class)	no	partial	partial
Plan mode	yes	yes	yes	architect	no	yes
Repo map (PageRank)	yes	no	no	yes (best-in-class)	no	no
Edit-format-per-model	yes	no	no	yes (best-in-class)	no	no
Worktree-isolated subagents	yes	yes	yes	no	no	no
Daemon / multi-client	yes	no	yes (app-server)	no	no	yes (best-in-class)
Shadow-git checkpoints + `/rollback N`	yes	no	no	git only	no	no
Containerized parallel sessions	yes	no	no	no	no	no
Cost transparency (per-msg $)	yes	partial	partial	yes (best-in-class)	no	no
MIT open source	yes	closed	Apache	Apache	FSL	MIT

Where each agent shines

Claude Code — first-party Anthropic, opinionated UX, polished out-of-box. Best if you only use Claude and don't care about cost.
Codex — OpenAI's terminal agent. Excellent sandbox primitive ("sandbox-as-utility"). Single-vendor by design.
Aider — pioneer of repo map + edit-format-per-model. Strongest at large-codebase context selection. Less ergonomic interactive UX.
Crush — fast, polished TUI (Charm). Mid-session model swap. Smaller ecosystem.
opencode — strong daemon / multi-client story. Newer; ecosystem still maturing.
Cave — borrows the best of all five and adds Caveman Mode compression + 20+ provider OAuth as native, unique differentiators.

Tokens — the headline

25-task MicroBench, gpt-5.5 on both sides, xhigh reasoning (2026-05-18):

Agent	Fresh tokens	Pass rate	Cost
Codex CLI	1,010,185	15/25 (60%)	$0 (codex sub)
Caveman Code	524,703	14/25 (56%)	$1.78

1.93× fewer tokens for ~equivalent pass rate. Reproduce in one command:

npx tsx research/evals/run-honest-bench.ts --tools caveman,codex

Raw CSV and per-task logs live in research/results/. Methodology spawns each CLI as a real child process — no SDK shortcuts.

Format compatibility

Caveman Code is a superset of Claude Code's authoring formats. Concretely, you can paste these directly into ~/.cave/:

~/.claude/settings.json → ~/.cave/settings.json (hooks, permissions, statusLine identical schema)
~/.claude/commands/*.md → ~/.cave/commands/*.md
~/.claude/skills/<name>/SKILL.md → ~/.cave/skills/<name>/SKILL.md
~/.claude/agents/<name>.md → ~/.cave/agents/<name>.md
.mcp.json (Codex / Claude Code standard) is read at the project root

See migration from Claude Code for the step-by-step.

Caveat — these comparisons evolve

Claude Code, Codex, Crush, and opencode all iterate weekly. We pin our compatibility target to Claude Code v2.1.119 schemas with a CI delta check; tracking the others is best-effort. If you spot drift, open an issue.