Known Challenges and Caveats
March 29, 2026
Honest assessment for maintainers and users.
Usage modes
Mode A — Inline (same workflow as the agent)
The action runs as a step directly after the agent step. GitHub context is fully available, timestamps are accurate, token counts come from explicit inputs or the agent's stdout.
- uses: anthropics/claude-code-action@v1
  id: agent
  with:
    anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    prompt: "..."
- uses: agentmeter/agentmeter-action@main
  if: always()
  with:
    api_key: ${{ secrets.AGENTMETER_API_KEY }}
    status: ${{ steps.agent.outcome }}
    model: claude-sonnet-4-5
    input_tokens: ${{ steps.agent.outputs.input_tokens }}
    output_tokens: ${{ steps.agent.outputs.output_tokens }}
No structural challenges; the minor caveats that do apply are listed under Mode A caveats. Works with any agent framework that exposes token counts as outputs or stdout.
Mode B — Companion workflow (workflow_run trigger)
Required for gh-aw and any agent that runs in a separate workflow. The action triggers on workflow_run: completed and uses workflow_run_id to auto-resolve everything.
This is the complex mode and the source of the challenges below.
Mode B challenges
1. Gate is gh-aw-specific
When workflow_run_id is set, the action calls listJobsForWorkflowRun and checks for a job named exactly conclusion before proceeding. This prevents the ~5 duplicate workflow_run firings that gh-aw produces per agent run.
Limitation: Any other multi-job framework with a terminal job named something other than conclusion will pass through unconditionally — one ingest per job completion. Single-job workflows are fine.
Workaround: Use inline mode. A fix on the action side would be to make the gate job name configurable (a gate_job_name input, defaulting to conclusion).
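The gate decision can be sketched as a small pure function. This is one plausible reading of the behaviour described above, not the action's source: the RunJob shape, the decideGate name, and the gateJobName parameter are hypothetical, and the rule "skip while the gate job exists but has not completed" is an assumption.

```typescript
// Hypothetical shape: only the fields this sketch needs from each job entry
// returned by listJobsForWorkflowRun.
interface RunJob {
  name: string;
  status: string; // e.g. "queued", "in_progress", "completed"
}

type GateDecision = "proceed" | "skip";

// Assumed behaviour, not taken from the action's code:
// - gate job present and completed  -> proceed (the final firing)
// - gate job present, not completed -> skip (an earlier duplicate firing)
// - gate job absent                 -> proceed (nothing to gate on)
function decideGate(jobs: RunJob[], gateJobName: string = "conclusion"): GateDecision {
  // Exact name match, mirroring "a job named exactly conclusion".
  const gateJob = jobs.find((job) => job.name === gateJobName);
  if (gateJob === undefined) return "proceed";
  return gateJob.status === "completed" ? "proceed" : "skip";
}
```

Under these assumptions a framework whose terminal job has a different name always takes the "gate job absent" branch, which is why it would ingest on every firing unless the name is configurable.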
2. Token data requires manual lock file patching (gh-aw)
The action reads token counts from an agent-tokens artifact uploaded by the agent job. In gh-aw, the .lock.yml is auto-generated by gh aw compile and doesn't include the steps that produce and upload this artifact, so they must be manually patched in after every recompile.
Workaround: Commit .lock.yml to version control and treat the patch as a diff to reapply. The repo includes scripts/patch-workflows.sh for this. See the README for the exact steps to add.
Better fix: Upstream feature request to gh-aw for native token count outputs.
3. Trigger number is null for non-standard branch names
For workflow_run events, the action resolves the PR/issue number from pull_requests[] on the run object, then falls back to a pulls.list API call by head branch. Issue numbers are only inferred when the branch name matches the gh-aw convention agent/issue-N exactly — this is intentional to avoid misattributing unrelated branches (e.g. feature/fix-issue-12-auth). If the branch doesn't follow this pattern and pull_requests[] is empty, triggerNumber is null and no comment is posted.
Workaround: Pass trigger_number explicitly as an input to override resolution.
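The resolution order described above can be sketched as follows. The TriggerSources shape and both function names are hypothetical; the API lookups are assumed to have already happened, so the sketch only shows the precedence and the exact-match branch rule.

```typescript
// Hypothetical container for values the action would already have gathered.
interface TriggerSources {
  explicitTriggerNumber?: number;     // the trigger_number input, if provided
  runPullRequestNumbers: number[];    // numbers from pull_requests[] on the run
  branchPullRequestNumbers: number[]; // numbers from a pulls.list lookup by head branch
  headBranch: string;
}

// Only the exact gh-aw convention agent/issue-N is accepted, so a branch such
// as feature/fix-issue-12-auth is never attributed to issue 12.
function issueNumberFromBranch(branch: string): number | null {
  const match = /^agent\/issue-(\d+)$/.exec(branch);
  return match ? Number(match[1]) : null;
}

function resolveTriggerNumber(sources: TriggerSources): number | null {
  if (sources.explicitTriggerNumber !== undefined) return sources.explicitTriggerNumber;
  const fromRun = sources.runPullRequestNumbers[0];
  if (fromRun !== undefined) return fromRun;
  const fromBranchLookup = sources.branchPullRequestNumbers[0];
  if (fromBranchLookup !== undefined) return fromBranchLookup;
  // Null here corresponds to the case where no comment is posted.
  return issueNumberFromBranch(sources.headBranch);
}
```

Anchoring the pattern at both ends is the design choice that matters: a looser match would post comments on unrelated issues.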
4. Token data for non-gh-aw workflow_run setups
If running an agent in a separate workflow without gh-aw, upload your own agent-tokens.json artifact in this format:
{
  "input_tokens": 1000,
  "output_tokens": 200,
  "cache_read_tokens": 500,
  "cache_write_tokens": 100
}
The action will pick it up automatically via workflow_run_id. See the README for a complete example.
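A minimal sketch of producing that file's contents from an agent workflow. The buildAgentTokensJson helper is hypothetical, and treating all four fields as required non-negative integers is an assumption; the README remains the reference for what the action accepts.

```typescript
// Field names match the artifact format shown above.
const AGENT_TOKEN_FIELDS = [
  "input_tokens",
  "output_tokens",
  "cache_read_tokens",
  "cache_write_tokens",
] as const;

type AgentTokens = Record<(typeof AGENT_TOKEN_FIELDS)[number], number>;

// Hypothetical helper: validates the counts and returns the JSON text to write
// to agent-tokens.json before uploading it as the agent-tokens artifact.
function buildAgentTokensJson(tokens: AgentTokens): string {
  for (const field of AGENT_TOKEN_FIELDS) {
    const value = tokens[field];
    if (!Number.isInteger(value) || value < 0) {
      throw new Error(`${field} must be a non-negative integer, got ${value}`);
    }
  }
  return JSON.stringify(tokens, null, 2);
}
```

Validating before upload keeps a malformed count from silently turning into missing cost data in the companion workflow.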
Mode A caveats
5. if: always() is the user's responsibility
If the user omits if: always() on the AgentMeter step, failed agent runs won't be tracked. Documentation only — the action can't enforce this.
6. Codex token counts rely on an internal rollout file format
codex exec (via openai/codex-action) does not expose token usage through any documented public API. However, when running without --ephemeral, the Codex CLI writes a rollout JSONL file to:
$CODEX_HOME/sessions/YYYY/MM/DD/rollout-<timestamp>-<uuid>.jsonl
Each line is a JSON event. Token totals appear in token_count events:
{
  "type": "event_msg",
  "payload": {
    "type": "token_count",
    "info": {
      "total_token_usage": {
        "input_tokens": 479565,
        "output_tokens": 7489,
        "cached_input_tokens": 444416
      }
    },
    "rate_limits": null
  }
}
The last token_count event in the file contains cumulative totals for the full run.
How the workflow extracts tokens:
- Set codex-home: /tmp/codex-home on openai/codex-action so the rollout path is known
- After the codex step, find the latest rollout file with find /tmp/codex-home/sessions -name "rollout-*.jsonl" -printf "%T@ %p\n" | sort -rn | head -1
- Grep for "token_count", take the last line, extract fields with jq
- Pass input_tokens, output_tokens, cache_read_tokens as explicit inputs to the AgentMeter step
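The grep and jq step can also be expressed in TypeScript. This is a sketch under stated assumptions: lastTokenCountTotals is a hypothetical name, the caller is assumed to have read the rollout file into a string, and mapping cached_input_tokens to cache_read_tokens is an assumption about how the fields correspond.

```typescript
interface RolloutTokenTotals {
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
}

// Returns the totals from the last parseable token_count event, or null when
// none is found (the "costs show as —" case).
function lastTokenCountTotals(rolloutJsonl: string): RolloutTokenTotals | null {
  let totals: RolloutTokenTotals | null = null;
  for (const line of rolloutJsonl.split("\n")) {
    if (!line.includes('"token_count"')) continue; // cheap pre-filter, like grep
    let event: any;
    try {
      event = JSON.parse(line);
    } catch {
      continue; // tolerate truncated or malformed lines
    }
    const usage = event?.payload?.info?.total_token_usage;
    if (event?.payload?.type !== "token_count" || !usage) continue;
    // Later events overwrite earlier ones because the totals are cumulative.
    totals = {
      input_tokens: Number(usage.input_tokens ?? 0),
      output_tokens: Number(usage.output_tokens ?? 0),
      cache_read_tokens: Number(usage.cached_input_tokens ?? 0), // assumed mapping
    };
  }
  return totals;
}
```

Returning null instead of throwing matches the graceful-failure behaviour: a renamed field or missing file degrades to no cost data instead of a failed workflow.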
Stability caveat: The rollout format is an internal Codex CLI implementation detail, not a versioned public API. A future @openai/codex release could rename fields or restructure events. Failure is graceful — costs show as — if the rollout file is missing or unparseable.
Alternative path (codex exec --json): Running with --json writes JSONL to stdout with turn.completed events containing a usage field. However, openai/codex-action's final-message output reads from the output file, not stdout — so the JSONL stream is not accessible from within the action's step outputs. The tryExtractFromCodexExecJsonl function in token-extractor.ts handles this format for consumers who capture codex exec --json stdout directly.
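For consumers who do capture codex exec --json stdout themselves, a reader for that stream might look like the sketch below. It is not the tryExtractFromCodexExecJsonl implementation: summarizeCodexExecJsonl is a hypothetical name, and a top-level type field as the event discriminator is an assumption. The source does not say whether usage is per-turn or cumulative, so the sketch collects the raw usage objects and leaves aggregation to the caller.

```typescript
interface CodexExecSummary {
  turns: number;                     // number of turn.completed events seen
  usages: Record<string, unknown>[]; // raw usage objects, in stream order
}

function summarizeCodexExecJsonl(stdout: string): CodexExecSummary {
  const summary: CodexExecSummary = { turns: 0, usages: [] };
  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let event: any;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // skip any non-JSON output mixed into stdout
    }
    if (event?.type !== "turn.completed") continue; // assumed discriminator field
    summary.turns += 1;
    if (event.usage && typeof event.usage === "object") {
      summary.usages.push(event.usage);
    }
  }
  return summary;
}
```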
What works regardless of mode
- The action never fails the workflow: all errors are core.warning(), not core.setFailed().
- GITHUB_TOKEN is always available via the github_token input default, so no config is needed.
- Comment upsert (update-in-place) works correctly, including across both old 5-column and current 6-column comment formats.
- All four token types (input, output, cache read, cache write) are tracked when available.
- turns is auto-extracted from agent_output when not provided explicitly: Claude Code JSON (num_turns), Codex exec JSONL (turn.completed count), or regex fallback. The resolved value appears in both the ingest payload and the PR/issue comment.
- Partial token overrides: providing only input_tokens still falls back to artifact or extracted values for the other fields.
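The per-field merge behind partial token overrides can be sketched like this. The mergeTokenCounts name and PartialTokenCounts type are hypothetical, and the fallback argument stands in for whatever the artifact or output extraction produced.

```typescript
const TOKEN_FIELDS = [
  "input_tokens",
  "output_tokens",
  "cache_read_tokens",
  "cache_write_tokens",
] as const;

type PartialTokenCounts = Partial<Record<(typeof TOKEN_FIELDS)[number], number>>;

// Explicit inputs win field by field; unspecified fields keep the fallback value.
function mergeTokenCounts(
  explicit: PartialTokenCounts,
  fallback: PartialTokenCounts,
): PartialTokenCounts {
  const merged: PartialTokenCounts = {};
  for (const field of TOKEN_FIELDS) {
    // ?? rather than || so that an explicit 0 is respected.
    const value = explicit[field] ?? fallback[field];
    if (value !== undefined) merged[field] = value;
  }
  return merged;
}
```

Merging per field, not per object, is what prevents a lone input_tokens override from zeroing the other three counts.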
Status table
| Item | Status | Notes |
|---|---|---|
| Gate is gh-aw-specific | ⚠️ Known limitation | Single-job workflows fine; multi-job non-gh-aw users risk duplicates |
| Lock file patching | ⚠️ Manual step | Must re-patch after every gh aw compile; scripts/patch-workflows.sh automates this |
| Trigger number for non-standard branches | ⚠️ Known limitation | Pass trigger_number explicitly as workaround |
| if: always() enforcement | ⚠️ User error risk | Documentation only |
| Codex token extraction | ⚠️ Internal format | Works in production; rollout format is not a public API — could break on CLI upgrade |
| Token data for non-gh-aw workflow_run users | ✅ Documented | See README and challenge #4 above |
| Zip parsing | ✅ | Uses fflate — proper decompression |
| githubRunId in payload | ✅ | Uses agent run ID when workflow_run_id is set |
| Trigger number resolution | ✅ | pull_requests[] array + pulls.list API fallback; issue branch requires agent/issue-N prefix (gh-aw convention) to prevent misattribution |
| Trigger type resolution | ✅ | issue_comment correctly classified as pr_comment vs issue_comment |
| Status normalization | ✅ | Raw GitHub conclusion mapped internally; custom statuses pass through unchanged |
| Partial token overrides | ✅ | Per-field merge — partial overrides don't zero out unspecified fields |
| Multiple firing dedup | ✅ for gh-aw | Gate on conclusion job name |
| Timestamps / duration | ✅ | Sourced from workflow run API; null when unavailable (falls back to action start/now); duration clamped to ≥0 |
| Workflow name | ✅ | Uses agent workflow name, not companion |
| Comment format migration | ✅ | Old 5-column comments parsed correctly |
| Comment ordering | ✅ | Newest runs displayed first; 5 visible, rest in collapsible section |
| Comment posting | ✅ | Upsert by marker, paginated search, correct PR/issue number |
| GITHUB_TOKEN availability | ✅ | github_token input with default: ${{ github.token }} |
| Node.js version | ✅ | node24 |
| Pricing | ✅ | Fetched from /api/models/pricing; prefix-match fallback for versioned IDs; null cache pricing shows — |
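The prefix-match fallback in the Pricing row can be illustrated with a small lookup. This is a sketch, not the action's code: findPricing is a hypothetical name, and both the direction of the match (a pricing key is a prefix of the versioned model ID) and the longest-prefix tie-break are assumptions.

```typescript
// Generic over the price record so no pricing schema is implied here.
function findPricing<T>(pricing: Map<string, T>, modelId: string): T | null {
  const exact = pricing.get(modelId);
  if (exact !== undefined) return exact;

  // Assumed fallback: choose the longest key that the model ID starts with,
  // so a dated or versioned ID resolves to its base model entry.
  let bestKey: string | null = null;
  for (const key of pricing.keys()) {
    if (modelId.startsWith(key) && (bestKey === null || key.length > bestKey.length)) {
      bestKey = key;
    }
  }
  return bestKey === null ? null : pricing.get(bestKey) ?? null;
}
```

A null result corresponds to the case where cost cannot be computed and the comment shows — instead of a number.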