Claude Code Prompt Cache A/B Test Report

March 31, 2026 · View on GitHub

Experiment Design

Hypothesis: CLAUDE_CODE_ATTRIBUTION_HEADER=false restores cross-session prompt cache sharing by removing the per-session dynamic hash from system prompt prefix.

Method:

  • Round A (control): 4 sessions with default settings (billing header ON)
  • Round B (treatment): 4 sessions with CLAUDE_CODE_ATTRIBUTION_HEADER=false
  • Each session sends a DIFFERENT prompt (simulating real usage)
  • All sessions use claude --print mode (single-shot, non-interactive)
  • Sessions run sequentially with 8s gap (within 5-min cache TTL)
  • Metric: cache_read_input_tokens on the FIRST API request of each session

Key expectation: In Round A, each session should show partial cache_read (tools block cached, system prompt rebuilt due to header hash change). In Round B, sessions 2+ should show full cache_read (both tools and system prompt served from cache).

Results

Round A — Header ON (default)

SessionPromptcache_readcache_creationinput_tokenshit_ratio
1What is the capital of France? Answer in...11272122063.4800
2List three prime numbers under 20....11272122023.4801
3Explain what a goroutine is in one sente...11374127533.4713
4Name one benefit of TypeScript over Java...11272122033.4801

Round B — Header OFF

SessionPromptcache_readcache_creationinput_tokenshit_ratio
1What is the capital of France? Answer in...2347803.9998
2List three prime numbers under 20....2347403.9998
3Explain what a goroutine is in one sente...11272121973.4802
4Name one benefit of TypeScript over Java...2347503.9998

Analysis

Data Comparison Table

SessionRound A cache_readRound A cache_creationRound B cache_readRound B cache_creationDelta cache_read
111,27212,20623,4780+12,206 (+108%)
211,27212,20223,4740+12,202 (+108%)
311,37412,75311,27212,197-102 (anomaly)
411,27212,20323,4750+12,203 (+108%)

Total prompt size (tools + system): ~23,478 tokens (confirmed by Round B full-cache sessions).

Key Findings

1. Hypothesis CONFIRMED — billing header prevents ~52% of prompt cache sharing

The data reveals a two-tier caching pattern, consistent with the Anthropic API's cache hierarchy (tools → system → messages):

  • Tools block (~11,272 tokens): Tool definitions are stable across sessions — always cached regardless of header setting
  • System prompt block (~12,200 tokens): Contains the billing header with a per-session hash — re-created every session when header is ON, because the hash changes the cache key for this block

With CLAUDE_CODE_ATTRIBUTION_HEADER=false, both blocks are fully cacheable → 99.98% hit ratio.

2. Round A (header ON): Consistent ~48% cache hit ratio

Every session shows the same pattern: ~11,272 cache_read (tools block) + ~12,200 cache_creation (system prompt block). The billing header hash changes per session, preventing the system prompt from being served from cache.

3. Round B (header OFF): Near-perfect cache sharing (3/4 sessions)

Sessions 1, 2, 4 achieved ~23,475 cache_read with 0 cache_creation — the entire prompt (tools + system) is served from cache.

4. Anomalous Session B-3

Session B-3 showed the Round A pattern (11,272 read + 12,197 creation) instead of full cache hits. Possible causes:

  • Server-side cache eviction between B-2 and B-3 (8s gap, within 5-min TTL but eviction is unpredictable)
  • Session B-3's JSONL resolved to a different project directory than the other sessions
  • Note: B-4 achieved full cache again (23,475), confirming B-3's cache_creation was successfully reused

5. Cost Impact Estimate

Using Anthropic pricing (cache_read = 0.1x base, cache_creation = 1.25x base):

MetricRound A (per session)Round B (per session, typical)
Effective input cost11,272×0.1 + 12,200×1.25 + 3 = ~16,380 equiv tokens23,475×0.1 + 0 + 3 = ~2,351 equiv tokens
Savings~85% reduction (~7x cheaper)

Note: This comparison measures system prompt cost only. For long conversations, user messages and tool results dominate the total token count, so the relative savings decrease as conversations grow.

Conclusion

The billing header (CLAUDE_CODE_ATTRIBUTION_HEADER) prevents cross-session prompt cache sharing for the system prompt block (~12K tokens). Setting CLAUDE_CODE_ATTRIBUTION_HEADER=false restores full cache sharing, reducing per-session system prompt costs by ~85%.

This is particularly impactful for:

  • Multi-session workflows — running several Claude Code instances on the same project in parallel
  • Rapid iteration — frequent short sessions that repeatedly pay cache_creation cost
  • CI/CD and automation — multiple claude --print calls in sequence (pipelines, batch processing)

Recommendation: Set CLAUDE_CODE_ATTRIBUTION_HEADER=false in environments where cross-session cache efficiency matters.

Caveats

  • Tested with claude --print mode only. Interactive mode uses a different (larger) system prompt — the absolute numbers will differ, but the mechanism is the same.
  • Sample size is n=4 per round. The pattern is clear (3/4 full cache hit vs 0/4), but larger samples would strengthen the statistical confidence.
  • Server-side cache eviction is non-deterministic — individual runs may show variance (as seen in session B-3).

Environment

  • Claude Code version: 2.1.88 (Claude Code)
  • Test mode: claude --print (single-shot, non-interactive)
  • Date: 2026-03-31 08:23:21 UTC
  • Platform: Darwin arm64
  • Node: v22.14.0

Raw Data

See results/round-a/ and results/round-b/ for per-session metrics JSON files.