Known Challenges and Caveats

March 29, 2026

Honest assessment for maintainers and users.

Usage modes

Mode A — Inline (same workflow as the agent)

The action runs as a step directly after the agent step. GitHub context is fully available, timestamps are accurate, and token counts come from explicit inputs or the agent's stdout.

- uses: anthropics/claude-code-action@v1
  id: agent
  with:
    anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    prompt: "..."

- uses: agentmeter/agentmeter-action@main
  if: always()
  with:
    api_key: ${{ secrets.AGENTMETER_API_KEY }}
    status: ${{ steps.agent.outcome }}
    model: claude-sonnet-4-5
    input_tokens: ${{ steps.agent.outputs.input_tokens }}
    output_tokens: ${{ steps.agent.outputs.output_tokens }}

No structural caveats. Works with any agent framework that exposes token counts as step outputs or on stdout. Two minor points are listed under "Mode A caveats".

Mode B — Companion workflow (`workflow_run` trigger)

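Required for gh-aw and any agent that runs in a separate workflow. The companion workflow triggers on workflow_run (types: completed) and passes workflow_run_id, which the action uses to resolve the agent run's status, timestamps, workflow name, trigger number, and token artifact.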

This is the complex mode and the source of the challenges below.

1. Gate is gh-aw-specific

When workflow_run_id is set, the action calls listJobsForWorkflowRun and checks for a job named exactly conclusion before proceeding. This prevents the ~5 duplicate workflow_run firings that gh-aw produces per agent run.

Limitation: Any other multi-job framework with a terminal job named something other than conclusion will pass through unconditionally — one ingest per job completion. Single-job workflows are fine.

Workaround: Use inline mode, or make the gate job name configurable (gate_job_name input, defaulting to conclusion).
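
The gate and the proposed gate_job_name input can be sketched as a pure function over the job list. This is a minimal sketch under assumptions, not the action's actual code: `JobSummary` and `gateAllows` are hypothetical names, the input is a trimmed-down view of what listJobsForWorkflowRun returns, and the "no gate job means pass through" branch is inferred from the limitation described above.

```typescript
// Trimmed-down view of one job entry from the workflow run's job list.
interface JobSummary {
  name: string;
  status: string; // GitHub reports "queued", "in_progress", or "completed"
}

// Hypothetical helper: should this workflow_run firing be ingested?
function gateAllows(jobs: JobSummary[], gateJobName: string = "conclusion"): boolean {
  const gate = jobs.find((job) => job.name === gateJobName);
  // Assumption: a run without the gate job is not gated at all, which is why
  // other multi-job frameworks would pass through on every firing.
  if (gate === undefined) return true;
  // Assumption: only the firing that sees the gate job completed is ingested.
  return gate.status === "completed";
}
```

With a configurable name, a framework whose terminal job is called something else (say `finalize`, a made-up example) could opt into the same dedup by passing that name as the second argument.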

2. Token data requires manual lock file patching (gh-aw)

The action reads token counts from an agent-tokens artifact uploaded by the agent job. In gh-aw, the .lock.yml is auto-generated by gh aw compile and doesn't include the steps that produce and upload this artifact. They must be manually patched in after every recompile.

Workaround: Commit .lock.yml to version control and treat the patch as a diff to reapply. The repo includes scripts/patch-workflows.sh for this. See the README for the exact steps to add.

Better fix: Upstream feature request to gh-aw for native token count outputs.

3. Trigger number is null for non-standard branch names

For workflow_run events, the action resolves the PR/issue number from pull_requests[] on the run object, then falls back to a pulls.list API call by head branch. Issue numbers are only inferred when the branch name matches the gh-aw convention agent/issue-N exactly — this is intentional to avoid misattributing unrelated branches (e.g. feature/fix-issue-12-auth). If the branch doesn't follow this pattern and pull_requests[] is empty, triggerNumber is null and no comment is posted.

Workaround: Pass trigger_number explicitly as an input to override resolution.
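
The resolution order described above can be sketched as follows. This is an illustration under assumptions rather than the action's implementation: `TriggerInputs` and `resolveTriggerNumber` are hypothetical names, the pulls.list fallback is represented by an already-fetched array, and the regular expression encodes the "matches agent/issue-N exactly" rule.

```typescript
// Hypothetical input bundle; field names are illustrative only.
interface TriggerInputs {
  explicit?: number | null; // the trigger_number input, if provided
  pullRequests: { number: number }[]; // pull_requests[] from the run object
  pullsByHeadBranch: { number: number }[]; // pre-fetched pulls.list result for the head branch
  headBranch: string;
}

// Exact match only, so feature/fix-issue-12-auth is never read as issue 12.
const ISSUE_BRANCH = /^agent\/issue-(\d+)$/;

function resolveTriggerNumber(inputs: TriggerInputs): number | null {
  // An explicit override wins over every inferred value.
  if (inputs.explicit !== undefined && inputs.explicit !== null) return inputs.explicit;

  const fromRun = inputs.pullRequests[0];
  if (fromRun !== undefined) return fromRun.number;

  const fromList = inputs.pullsByHeadBranch[0];
  if (fromList !== undefined) return fromList.number;

  const match = ISSUE_BRANCH.exec(inputs.headBranch);
  const digits = match === null ? undefined : match[1];
  return digits === undefined ? null : Number(digits);
}
```

A `null` result corresponds to the case where no comment is posted.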

4. Token data for non-gh-aw `workflow_run` setups

If running an agent in a separate workflow without gh-aw, upload your own agent-tokens.json artifact in this format:

{
  "input_tokens": 1000,
  "output_tokens": 200,
  "cache_read_tokens": 500,
  "cache_write_tokens": 100
}

The action will pick it up automatically via workflow_run_id. See the README for a complete example.
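
Before uploading the artifact, it can help to check that the file has the expected shape. The sketch below is a hypothetical validator, not part of the action: `AgentTokens` and `parseAgentTokens` are illustrative names, and treating missing or non-integer fields as `null` is an assumption about how a consumer might handle partial files.

```typescript
// The four fields from the agent-tokens.json format shown above.
interface AgentTokens {
  input_tokens: number | null;
  output_tokens: number | null;
  cache_read_tokens: number | null;
  cache_write_tokens: number | null;
}

const TOKEN_FIELDS = [
  "input_tokens",
  "output_tokens",
  "cache_read_tokens",
  "cache_write_tokens",
] as const;

// Returns null when the text is not a JSON object; otherwise keeps each field
// that is a non-negative integer and sets the rest to null.
function parseAgentTokens(raw: string): AgentTokens | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;

  const record = data as Record<string, unknown>;
  const result: AgentTokens = {
    input_tokens: null,
    output_tokens: null,
    cache_read_tokens: null,
    cache_write_tokens: null,
  };
  for (const field of TOKEN_FIELDS) {
    const value = record[field];
    if (typeof value === "number" && Number.isInteger(value) && value >= 0) {
      result[field] = value;
    }
  }
  return result;
}
```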

Mode A caveats

5. `if: always()` is the user's responsibility

If the user omits if: always() on the AgentMeter step, failed agent runs won't be tracked. Documentation only — the action can't enforce this.

6. Codex token counts rely on an internal rollout file format

codex exec (via openai/codex-action) does not expose token usage through any documented public API. However, when running without --ephemeral, the Codex CLI writes a rollout JSONL file to:

$CODEX_HOME/sessions/YYYY/MM/DD/rollout-<timestamp>-<uuid>.jsonl

Each line is a JSON event. Token totals appear in token_count events:

{
  "type": "event_msg",
  "payload": {
    "type": "token_count",
    "info": {
      "total_token_usage": {
        "input_tokens": 479565,
        "output_tokens": 7489,
        "cached_input_tokens": 444416
      }
    },
    "rate_limits": null
  }
}

The last token_count event in the file contains cumulative totals for the full run.

How the workflow extracts tokens:

1. Set codex-home: /tmp/codex-home on openai/codex-action so the rollout path is known.
2. After the codex step, find the latest rollout file with `find /tmp/codex-home/sessions -name "rollout-*.jsonl" -printf "%T@ %p\n" | sort -rn | head -1`.
3. Grep for "token_count", take the last line, and extract fields with jq.
4. Pass input_tokens, output_tokens, and cache_read_tokens as explicit inputs to the AgentMeter step.
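
The same "last token_count event wins" logic can be expressed in TypeScript for anyone who prefers it to grep and jq. This is a minimal sketch that assumes the rollout event shape shown above stays as documented here; `CodexTokens` and `extractRolloutTokens` are hypothetical names, and mapping cached_input_tokens to cache_read_tokens follows the input names in the last step.

```typescript
interface CodexTokens {
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
}

// Scans rollout JSONL text and returns the totals from the last token_count
// event, or null when no such event can be parsed.
function extractRolloutTokens(jsonl: string): CodexTokens | null {
  let latest: CodexTokens | null = null;
  for (const line of jsonl.split("\n")) {
    if (!line.includes('"token_count"')) continue; // cheap pre-filter, like grep
    try {
      const event = JSON.parse(line);
      const usage = event?.payload?.info?.total_token_usage;
      if (event?.payload?.type !== "token_count" || !usage) continue;
      latest = {
        input_tokens: Number(usage.input_tokens ?? 0),
        output_tokens: Number(usage.output_tokens ?? 0),
        cache_read_tokens: Number(usage.cached_input_tokens ?? 0),
      };
    } catch {
      // Malformed lines are skipped so a partial file still yields a result.
    }
  }
  return latest;
}
```

Returning `null` instead of throwing matches the graceful-failure behaviour described for this path.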

Stability caveat: The rollout format is an internal Codex CLI implementation detail, not a versioned public API. A future @openai/codex release could rename fields or restructure events. Failure is graceful: costs show as `—` if the rollout file is missing or unparseable.

Alternative path (codex exec --json): Running with --json writes JSONL to stdout with turn.completed events containing a usage field. However, openai/codex-action's final-message output reads from the output file, not stdout — so the JSONL stream is not accessible from within the action's step outputs. The tryExtractFromCodexExecJsonl function in token-extractor.ts handles this format for consumers who capture codex exec --json stdout directly.
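
For consumers who do capture codex exec --json stdout themselves, counting turn.completed events is straightforward. The sketch below is not the tryExtractFromCodexExecJsonl implementation; `countCodexTurns` is a hypothetical helper that relies only on the event type named above and deliberately ignores the usage field, whose exact shape is not described here.

```typescript
// Counts turn.completed events in captured `codex exec --json` stdout.
// Returns null when no such event is found, so callers can fall back.
function countCodexTurns(stdout: string): number | null {
  let turns = 0;
  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const event = JSON.parse(trimmed);
      if (event !== null && typeof event === "object" && event.type === "turn.completed") {
        turns += 1;
      }
    } catch {
      // Non-JSON lines (for example stray log output) are ignored.
    }
  }
  return turns > 0 ? turns : null;
}
```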

What works regardless of mode

The action never fails the workflow — all errors are core.warning(), not core.setFailed().
GITHUB_TOKEN is always available via the github_token input default — no config needed.
Comment upsert (update-in-place) works correctly, including across both old 5-column and current 6-column comment formats.
All four token types (input, output, cache read, cache write) are tracked when available.
turns is auto-extracted from agent_output when not provided explicitly — Claude Code JSON (num_turns), Codex exec JSONL (turn.completed count), or regex fallback. The resolved value appears in both the ingest payload and the PR/issue comment.
Partial token overrides: providing only input_tokens still falls back to artifact or extracted values for the other fields.
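
The per-field merge behind partial token overrides can be illustrated with a short sketch. `TokenCounts` and `mergeTokens` are hypothetical names, and the code only demonstrates the principle (explicit inputs win field by field); the action's real merge may differ in detail.

```typescript
interface TokenCounts {
  input_tokens: number | null;
  output_tokens: number | null;
  cache_read_tokens: number | null;
  cache_write_tokens: number | null;
}

// Explicit inputs win per field; anything left unset falls back to the
// artifact or extracted value. `??` keeps an explicit 0 as a real override.
function mergeTokens(overrides: Partial<TokenCounts>, fallback: TokenCounts): TokenCounts {
  return {
    input_tokens: overrides.input_tokens ?? fallback.input_tokens,
    output_tokens: overrides.output_tokens ?? fallback.output_tokens,
    cache_read_tokens: overrides.cache_read_tokens ?? fallback.cache_read_tokens,
    cache_write_tokens: overrides.cache_write_tokens ?? fallback.cache_write_tokens,
  };
}
```

Using `??` rather than `||` is the design choice that matters: an explicit override of 0 is preserved instead of being replaced by the fallback.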

Status table

| Item | Status | Notes |
| --- | --- | --- |
| Gate is gh-aw-specific | ⚠️ Known limitation | Single-job workflows fine; multi-job non-gh-aw users risk duplicates |
| Lock file patching | ⚠️ Manual step | Must re-patch after every `gh aw compile`; `scripts/patch-workflows.sh` automates this |
| Trigger number for non-standard branches | ⚠️ Known limitation | Pass `trigger_number` explicitly as workaround |
| `if: always()` enforcement | ⚠️ User error risk | Documentation only |
| Codex token extraction | ⚠️ Internal format | Works in production; rollout format is not a public API and could break on CLI upgrade |
| Token data for non-gh-aw `workflow_run` users | ✅ Documented | See README and challenge #4 above |
| Zip parsing | ✅ | Uses `fflate` for proper decompression |
| `githubRunId` in payload | ✅ | Uses agent run ID when `workflow_run_id` is set |
| Trigger number resolution | ✅ | `pull_requests[]` array + `pulls.list` API fallback; issue branch requires `agent/issue-N` prefix (gh-aw convention) to prevent misattribution |
| Trigger type resolution | ✅ | `issue_comment` correctly classified as `pr_comment` vs `issue_comment` |
| Status normalization | ✅ | Raw GitHub conclusion mapped internally; custom statuses pass through unchanged |
| Partial token overrides | ✅ | Per-field merge; partial overrides don't zero out unspecified fields |
| Multiple firing dedup | ✅ for gh-aw | Gate on `conclusion` job name |
| Timestamps / duration | ✅ | Sourced from workflow run API; `null` when unavailable (falls back to action start/now); duration clamped to ≥0 |
| Workflow name | ✅ | Uses agent workflow name, not companion |
| Comment format migration | ✅ | Old 5-column comments parsed correctly |
| Comment ordering | ✅ | Newest runs displayed first; 5 visible, rest in collapsible section |
| Comment posting | ✅ | Upsert by marker, paginated search, correct PR/issue number |
| `GITHUB_TOKEN` availability | ✅ | `github_token` input with `default: ${{ github.token }}` |
| Node.js version | ✅ | node24 |
| Pricing | ✅ | Fetched from `/api/models/pricing`; prefix-match fallback for versioned IDs; `null` cache pricing shows `—` |
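
The prefix-match fallback for versioned model IDs mentioned in the Pricing row could look roughly like the sketch below. Everything here is an assumption for illustration: `ModelPricing`, its field names, `findPricing`, and the longest-prefix rule are hypothetical, and the shape of the `/api/models/pricing` response is not described in this document.

```typescript
// Hypothetical pricing entry; the real API response may use other fields.
interface ModelPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Looks up an exact model ID first, then falls back to the longest table key
// that is a prefix of the ID (so a dated or versioned ID maps to its base model).
function findPricing(modelId: string, table: Record<string, ModelPricing>): ModelPricing | null {
  let bestKey: string | null = null;
  for (const key of Object.keys(table)) {
    if (key === modelId) return table[key] ?? null;
    if (modelId.startsWith(key) && (bestKey === null || key.length > bestKey.length)) {
      bestKey = key;
    }
  }
  return bestKey === null ? null : table[bestKey] ?? null;
}
```

Preferring the longest prefix avoids a short key such as `example-model` shadowing a more specific `example-model-large` entry; both names are made up for this example.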