Find which Claude Code versions ran your cache regression sessions
April 26, 2026
If you've seen cache_creation inflation, quota burning faster than expected, or sessions hitting /usage limits unusually early during March–April 2026, this one-liner tells you which CC version(s) wrote to your local logs and how many 5-minute cache-creation messages each version logged per day.
It works because Claude Code writes the per-message usage block — including cache_creation.ephemeral_5m_input_tokens and ephemeral_1h_input_tokens — into the JSONL session logs under ~/.claude/. The diagnostic was originally posted by @lizthegrey on Issue #46829 (2026-04-25). This page reproduces the command, explains the output, and notes the caveats so you can run it yourself before deciding whether to file your own report, downgrade, or restructure billing.
The command
grep -h -r -E 'ephemeral_.*_input_tokens' ~/.claude \
| jq 'select(
.isSidechain == false
and (.message.model | startswith("claude-haiku") | not)
and .message.usage.cache_creation.ephemeral_5m_input_tokens > 0
) | .timestamp + "," + .version' 2>/dev/null \
| sed 's/T.*,/,/' \
| sort \
| uniq -c
Run it from any shell with jq installed. There are no destructive operations, no network calls, and nothing leaves your machine.
What the filter does
| Clause | Effect |
|---|---|
| `grep -h -r -E 'ephemeral_.*_input_tokens' ~/.claude` | Pulls only lines that mention the 5-minute or 1-hour cache input token counters. Cheaper than streaming every JSONL file through jq. |
| `select(.isSidechain == false ...)` | Drops sub-agent / sidechain calls so the count reflects sessions you actually started. |
| `(.message.model \| startswith("claude-haiku") \| not)` | Excludes messages whose model name starts with `claude-haiku`. |
| `.message.usage.cache_creation.ephemeral_5m_input_tokens > 0` | Keeps only messages where 5-minute cache creation happened, the field that inflated when TTL silently dropped from 1h to 5m. |
| `.timestamp + "," + .version` | Pairs the wall-clock timestamp with the CC version that wrote the line. |
| `sed 's/T.*,/,/'` | Trims the `THH:MM:SS.sssZ` time portion so `sort \| uniq -c` groups rows by date and version. |
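To see what the last three stages do in isolation, here is a minimal sketch that feeds them a few made-up lines in the shape the jq stage emits (one quoted "timestamp,version" string per message). The sample timestamps and the `summarize` helper name are illustrative assumptions, not part of the original command.

```shell
# Hypothetical sample of jq-stage output: one quoted "timestamp,version" string per message.
sample='"2026-04-01T09:15:02.113Z,2.1.81"
"2026-04-01T09:15:40.870Z,2.1.81"
"2026-04-02T18:02:11.004Z,2.1.85"'

# Same tail as the diagnostic: cut everything from the "T" through the comma,
# then count identical "date,version" strings.
summarize() {
  sed 's/T.*,/,/' | sort | uniq -c
}

printf '%s\n' "$sample" | summarize
```

The two messages from the same day and version collapse into one row with a count of 2. The width of the count column depends on your `uniq` implementation.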
Reading the output
Each row is count "YYYY-MM-DD,version". Higher counts on days when you used Claude Code mean more 5-minute cache-creation messages were written that day under that CC version. Example shape:
92 "2026-04-01,2.1.81"
418 "2026-04-02,2.1.81"
317 "2026-04-09,2.1.85"
188 "2026-04-13,2.1.95"
...
(Sample shape from lizthegrey's reply; your numbers will differ.)
Cross-reference the version column with the regression timeline on Issue #46829 (cache TTL) and Issue #46917 (cache_creation inflation, ~20K tokens vs v2.1.98). If most of your high-volume days fall on a version named in those threads, the regression touched you.
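If the per-day rows are hard to scan against a version list, one option is to roll them up per version first. The sketch below is an assumption-laden convenience, not part of the original diagnostic: `per_version` is a hypothetical helper that reads the `uniq -c` output on stdin, and the input rows are the sample rows shown above.

```shell
# Roll per-day rows up to one line per version: "version total_messages active_days".
per_version() {
  awk '{
    gsub(/"/, "", $2)          # drop the JSON quotes jq leaves around each string
    split($2, parts, ",")      # parts[1] = date, parts[2] = version
    total[parts[2]] += $1      # sum the message counts
    days[parts[2]]++           # count distinct day rows for this version
  }
  END {
    for (v in total) printf "%s %d %d\n", v, total[v], days[v]
  }' | sort                    # plain lexical sort, not a true version sort
}

# Sample rows from the example above.
printf '%s\n' \
  '  92 "2026-04-01,2.1.81"' \
  ' 418 "2026-04-02,2.1.81"' \
  ' 317 "2026-04-09,2.1.85"' \
  ' 188 "2026-04-13,2.1.95"' | per_version
# prints:
# 2.1.81 510 2
# 2.1.85 317 1
# 2.1.95 188 1
```

To use it on real data, append `| per_version` to the full diagnostic pipeline.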
What the command does NOT tell you
- Total tokens billed. It counts messages, not tokens. To see token volume for one version, narrow the grep output to that version's lines first, then end the pipeline with `jq '.message.usage.cache_creation.ephemeral_5m_input_tokens' | awk '{s+=$1} END {print s}'` instead of the `sed | sort | uniq -c` tail.
- API vs Max/Pro plan attribution. The JSONL doesn't record which billing path the request used.
- Cache misses caused by sidechain / sub-agent calls. Those are filtered out by design; if you suspect sub-agent token waste, drop the `isSidechain == false` clause.
- Whether the inflation is fully fixed yet. Run the same command on `~/.claude` after upgrading and compare the per-version counts.
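For the token-volume caveat, a variation that sums 5-minute cache-creation tokens for every version in one pass might look like the sketch below. It reuses the same `select(...)` filter as the main command. The `tokens_per_version` name and the tab-separated hand-off to awk are choices made here for illustration, not something taken from the issue threads.

```shell
# Sum ephemeral_5m_input_tokens per CC version.
# Reads JSONL log lines on stdin; prints "version tokens", one line per version.
tokens_per_version() {
  jq -r 'select(
      .isSidechain == false
      and (.message.model | startswith("claude-haiku") | not)
      and .message.usage.cache_creation.ephemeral_5m_input_tokens > 0
    ) | [.version, .message.usage.cache_creation.ephemeral_5m_input_tokens] | @tsv' 2>/dev/null \
  | awk -F'\t' '{ sum[$1] += $2 }
                END { for (v in sum) printf "%s %.0f\n", v, sum[v] }' \
  | sort
}

# Usage against real logs:
#   grep -h -r -E 'ephemeral_.*_input_tokens' ~/.claude | tokens_per_version
```

Like the original, this only reads local files. It still says nothing about how those tokens were billed.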
Adapting the diagnostic
To check the 1-hour path (the field that should dominate when caching works correctly):
grep -h -r -E 'ephemeral_.*_input_tokens' ~/.claude \
| jq 'select(
.isSidechain == false
and (.message.model | startswith("claude-haiku") | not)
and .message.usage.cache_creation.ephemeral_1h_input_tokens > 0
) | .timestamp + "," + .version' 2>/dev/null \
| sed 's/T.*,/,/' | sort | uniq -c
If this row count is small relative to the 5-minute version, your cache was creating short-TTL entries instead of long-TTL ones — exactly the symptom #46829 describes.
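To compute the 5m / total ratio across all logged sessions (a single number you can compare against a threshold; the `+ 1` in the denominator only guards against division by zero when both sums are 0):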
grep -h -r -E 'ephemeral_.*_input_tokens' ~/.claude \
| jq -s 'map(select(.isSidechain == false))
| (map(.message.usage.cache_creation.ephemeral_5m_input_tokens // 0) | add) as $five
| (map(.message.usage.cache_creation.ephemeral_1h_input_tokens // 0) | add) as $hour
| { five: $five, hour: $hour, ratio_5m_of_total: ($five / ($five + $hour + 1)) }'
A ratio above ~0.40 across days when you ran multi-turn sessions is consistent with the regression; below ~0.15 looks healthy.
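To make that threshold check repeatable, the two sums can be passed to a small helper. This is a minimal sketch: `classify_ratio` is a hypothetical name, the thresholds are the approximate ones quoted above, and the "inconclusive" label for the band in between is this sketch's own wording.

```shell
# classify_ratio FIVE HOUR -> prints "<ratio> <verdict>"
# FIVE and HOUR are the summed 5m and 1h cache-creation token counts.
classify_ratio() {
  awk -v five="$1" -v hour="$2" 'BEGIN {
    ratio = five / (five + hour + 1)      # same +1 guard as the jq expression
    if (ratio > 0.40)      verdict = "regression-consistent"
    else if (ratio < 0.15) verdict = "healthy"
    else                   verdict = "inconclusive"   # label assumed for this sketch
    printf "%.2f %s\n", ratio, verdict
  }'
}

classify_ratio 600 400   # prints: 0.60 regression-consistent
classify_ratio 100 900   # prints: 0.10 healthy
```

Treat the verdict as a prompt to look closer, not as proof either way, since the cutoffs are rough.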
Citations
- Issue #46829 — Cache TTL silently regressed from 1h to 5m around early March 2026 (closed not planned on 2026-04-12; comments continued through April).
- Issue #46917 — CC v2.1.100+ inflates cache_creation by ~20K tokens vs v2.1.98 (open; @Adrian-Mteam's W14–W16 series tracked the User-Agent workaround that stopped working between W15 and W16).
- The diagnostic command is reproduced from @lizthegrey's reply on #46829 dated 2026-04-25.
Related reading
If you want a structured framework for deciding whether to stay on Anthropic, fortify, or migrate based on cost ratios derived from this diagnostic, see the April 2026 Claude Code Migration Playbook. The Playbook's Trigger 1 (cache_creation/total ratio over 14 days) and Chapter 4 hooks operationalize the same cache_creation signal this command surfaces.