Known Challenges and Caveats

March 29, 2026

Honest assessment for maintainers and users.


Usage modes

Mode A — Inline (same workflow as the agent)

The action runs as a step directly after the agent step. GitHub context is fully available, timestamps are accurate, and token counts come from explicit inputs or the agent's stdout.

- uses: anthropics/claude-code-action@v1
  id: agent
  with:
    anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    prompt: "..."

- uses: agentmeter/agentmeter-action@main
  if: always()
  with:
    api_key: ${{ secrets.AGENTMETER_API_KEY }}
    status: ${{ steps.agent.outcome }}
    model: claude-sonnet-4-5
    input_tokens: ${{ steps.agent.outputs.input_tokens }}
    output_tokens: ${{ steps.agent.outputs.output_tokens }}

No meaningful caveats beyond the two listed under "Mode A caveats" below. Works with any agent framework that exposes token counts as outputs or on stdout.

Mode B — Companion workflow (workflow_run trigger)

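Required for gh-aw and any agent that runs in a separate workflow. The companion workflow triggers on workflow_run: completed, and the action uses workflow_run_id to resolve the agent run's metadata (run ID, workflow name, timestamps, trigger number) and token data automatically.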

This is the complex mode and the source of the challenges below.


Mode B challenges

1. Gate is gh-aw-specific

When workflow_run_id is set, the action calls listJobsForWorkflowRun and checks for a job named exactly conclusion before proceeding. This prevents the ~5 duplicate workflow_run firings that gh-aw produces per agent run.

Limitation: Any other multi-job framework with a terminal job named something other than conclusion will pass through unconditionally — one ingest per job completion. Single-job workflows are fine.

Workaround: Use inline mode, or make the gate job name configurable (gate_job_name input, defaulting to conclusion).
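
The gate described above can be sketched roughly as follows. This is one plausible reading of the behavior, not the action's actual code: it assumes the firing is skipped while the gate job exists but has not completed, and that a run with no matching job passes straight through (the limitation noted above). The names WorkflowJob, evaluateGate, and gateJobName are hypothetical; gateJobName models the proposed gate_job_name input.

```typescript
// Minimal subset of a job entry from listJobsForWorkflowRun (assumed fields).
interface WorkflowJob {
  name: string;
  status: string; // e.g. "queued", "in_progress", "completed"
}

type GateDecision = "proceed" | "skip";

// Hypothetical gate: decide whether this workflow_run firing should ingest.
function evaluateGate(jobs: WorkflowJob[], gateJobName: string = "conclusion"): GateDecision {
  // Exact name match only; "Conclusion" or "conclusion-2" would not count.
  const gateJob = jobs.find((job) => job.name === gateJobName);
  if (gateJob === undefined) {
    // No gate job in this run: nothing to wait for, so pass through.
    return "proceed";
  }
  // Gate job present: only the firing that sees it completed should ingest.
  return gateJob.status === "completed" ? "proceed" : "skip";
}
```

With a configurable name, a framework whose terminal job is called, say, finish could call evaluateGate(jobs, "finish") and get the same deduplication.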


2. Token data requires manual lock file patching (gh-aw)

The action reads token counts from an agent-tokens artifact uploaded by the agent job. In gh-aw, the .lock.yml is auto-generated by gh aw compile and doesn't include these steps. They must be manually patched in after every recompile.

Workaround: Commit .lock.yml to version control and treat the patch as a diff to reapply. The repo includes scripts/patch-workflows.sh for this. See the README for the exact steps to add.

Better fix: Upstream feature request to gh-aw for native token count outputs.


3. Trigger number is null for non-standard branch names

For workflow_run events, the action resolves the PR/issue number from pull_requests[] on the run object, then falls back to a pulls.list API call by head branch. Issue numbers are only inferred when the branch name matches the gh-aw convention agent/issue-N exactly — this is intentional to avoid misattributing unrelated branches (e.g. feature/fix-issue-12-auth). If the branch doesn't follow this pattern and pull_requests[] is empty, triggerNumber is null and no comment is posted.

Workaround: Pass trigger_number explicitly as an input to override resolution.
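
A minimal sketch of the resolution order described above, assuming the explicit trigger_number input wins, then pull_requests[] on the run, then the pulls.list result for the head branch, then the branch-name convention. The type and function names are hypothetical, and the API results are passed in as plain arrays so the logic stays testable without network calls.

```typescript
// Inputs to resolution; field names here are illustrative, not the action's own.
interface TriggerInputs {
  explicitTriggerNumber: number | null;       // the trigger_number input, if set
  runPullRequests: { number: number }[];      // pull_requests[] on the run object
  branchPullRequests: { number: number }[];   // pulls.list result for the head branch
  headBranch: string | null;
}

// Only the exact gh-aw convention agent/issue-N yields an issue number.
function issueNumberFromBranch(branch: string): number | null {
  const digits = /^agent\/issue-(\d+)$/.exec(branch)?.[1];
  return digits === undefined ? null : Number(digits);
}

function resolveTriggerNumber(inputs: TriggerInputs): number | null {
  if (inputs.explicitTriggerNumber !== null) return inputs.explicitTriggerNumber;
  const fromRun = inputs.runPullRequests[0];
  if (fromRun !== undefined) return fromRun.number;
  const fromBranchLookup = inputs.branchPullRequests[0];
  if (fromBranchLookup !== undefined) return fromBranchLookup.number;
  // Last resort: infer an issue number from the branch name, or give up (null).
  return inputs.headBranch === null ? null : issueNumberFromBranch(inputs.headBranch);
}
```

Anchoring the pattern at both ends is what keeps a branch like feature/fix-issue-12-auth from being attributed to issue 12.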


4. Token data for non-gh-aw workflow_run setups

If running an agent in a separate workflow without gh-aw, upload your own agent-tokens.json artifact in this format:

{
  "input_tokens": 1000,
  "output_tokens": 200,
  "cache_read_tokens": 500,
  "cache_write_tokens": 100
}

The action will pick it up automatically via workflow_run_id. See the README for a complete example.
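
For anyone producing or checking this file in their own tooling, a small validator can catch malformed artifacts before upload. The sketch below assumes that all four fields are optional and that non-numeric values should be dropped rather than coerced; the action's own parsing rules may differ. AgentTokens and parseAgentTokens are hypothetical names.

```typescript
// The four fields from the agent-tokens.json format shown above.
const TOKEN_KEYS = ["input_tokens", "output_tokens", "cache_read_tokens", "cache_write_tokens"] as const;
type AgentTokens = Partial<Record<(typeof TOKEN_KEYS)[number], number>>;

// Parse the artifact text; returns null if it is not a JSON object at all.
function parseAgentTokens(raw: string): AgentTokens | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const record = data as Record<string, unknown>;
  const tokens: AgentTokens = {};
  for (const key of TOKEN_KEYS) {
    const value = record[key];
    // Keep only non-negative finite numbers; ignore strings, nulls, and extras.
    if (typeof value === "number" && Number.isFinite(value) && value >= 0) {
      tokens[key] = value;
    }
  }
  return tokens;
}
```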


Mode A caveats

5. if: always() is the user's responsibility

If the user omits if: always() on the AgentMeter step, failed agent runs won't be tracked. Documentation only — the action can't enforce this.


6. Codex token counts rely on an internal rollout file format

codex exec (via openai/codex-action) does not expose token usage through any documented public API. However, when running without --ephemeral, the Codex CLI writes a rollout JSONL file to:

$CODEX_HOME/sessions/YYYY/MM/DD/rollout-<timestamp>-<uuid>.jsonl

Each line is a JSON event. Token totals appear in token_count events:

{
  "type": "event_msg",
  "payload": {
    "type": "token_count",
    "info": {
      "total_token_usage": {
        "input_tokens": 479565,
        "output_tokens": 7489,
        "cached_input_tokens": 444416
      }
    },
    "rate_limits": null
  }
}

The last token_count event in the file contains cumulative totals for the full run.

How the workflow extracts tokens:

  1. Set codex-home: /tmp/codex-home on openai/codex-action so the rollout path is known
  2. After the codex step, find the latest rollout file with find /tmp/codex-home/sessions -name "rollout-*.jsonl" -printf "%T@ %p\n" | sort -rn | head -1 | cut -d' ' -f2-
  3. Grep for "token_count", take the last line, extract fields with jq
  4. Pass input_tokens, output_tokens, cache_read_tokens as explicit inputs to the AgentMeter step
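
The grep-and-jq part of these steps (step 3) could equally be done in a small script. The following is a sketch, assuming the event shape shown above and a mapping of cached_input_tokens to cache_read_tokens; it takes the rollout file's text as a string, and the names RolloutTokens, toCount, and lastTokenCount are hypothetical.

```typescript
// Token fields in the shape passed to the AgentMeter step as explicit inputs.
interface RolloutTokens {
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
}

// Coerce an unknown JSON value to a non-negative finite number, else 0.
function toCount(value: unknown): number {
  return typeof value === "number" && Number.isFinite(value) && value >= 0 ? value : 0;
}

// Return the totals from the last parseable token_count event, or null.
function lastTokenCount(jsonl: string): RolloutTokens | null {
  let latest: RolloutTokens | null = null;
  for (const line of jsonl.split("\n")) {
    if (!line.includes("token_count")) continue; // cheap pre-filter, like grep
    try {
      const event = JSON.parse(line);
      const usage = event?.payload?.info?.total_token_usage;
      if (event?.payload?.type !== "token_count" || typeof usage !== "object" || usage === null) continue;
      latest = {
        input_tokens: toCount(usage.input_tokens),
        output_tokens: toCount(usage.output_tokens),
        cache_read_tokens: toCount(usage.cached_input_tokens),
      };
    } catch {
      // Malformed line: skip it and keep the last good totals.
    }
  }
  return latest;
}
```

Returning null instead of throwing matches the graceful-failure behavior: a missing or reshaped rollout file simply yields no token data.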

Stability caveat: The rollout format is an internal Codex CLI implementation detail, not a versioned public API. A future @openai/codex release could rename fields or restructure events. Failure is graceful: if the rollout file is missing or unparseable, costs are shown as unavailable rather than failing the run.

Alternative path (codex exec --json): Running with --json writes JSONL to stdout with turn.completed events containing a usage field. However, openai/codex-action's final-message output reads from the output file, not stdout — so the JSONL stream is not accessible from within the action's step outputs. The tryExtractFromCodexExecJsonl function in token-extractor.ts handles this format for consumers who capture codex exec --json stdout directly.
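
For consumers who do capture that stdout, handling it might look like the sketch below. This is not the tryExtractFromCodexExecJsonl implementation. It assumes each turn.completed event carries a usage object with input_tokens, output_tokens, and cached_input_tokens, and that usage is reported per turn so summing is appropriate; both assumptions should be checked against the Codex CLI version in use. The names ExecJsonSummary, asCount, and summarizeExecJson are hypothetical.

```typescript
interface ExecJsonSummary {
  turns: number; // number of turn.completed events seen
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
}

// Coerce an unknown JSON value to a non-negative finite number, else 0.
function asCount(value: unknown): number {
  return typeof value === "number" && Number.isFinite(value) && value >= 0 ? value : 0;
}

// Summarize captured `codex exec --json` stdout; null if no turn completed.
function summarizeExecJson(stdout: string): ExecJsonSummary | null {
  const summary: ExecJsonSummary = { turns: 0, input_tokens: 0, output_tokens: 0, cache_read_tokens: 0 };
  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let event: any;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // non-JSON noise on stdout
    }
    if (event?.type !== "turn.completed") continue;
    summary.turns += 1;
    const usage = event.usage ?? {};
    // Assumed field names inside usage; summed across turns.
    summary.input_tokens += asCount(usage.input_tokens);
    summary.output_tokens += asCount(usage.output_tokens);
    summary.cache_read_tokens += asCount(usage.cached_input_tokens);
  }
  return summary.turns > 0 ? summary : null;
}
```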


What works regardless of mode

  • The action never fails the workflow — all errors are core.warning(), not core.setFailed().
  • GITHUB_TOKEN is always available via the github_token input default — no config needed.
  • Comment upsert (update-in-place) works correctly, including across both old 5-column and current 6-column comment formats.
  • All four token types (input, output, cache read, cache write) are tracked when available.
  • turns is auto-extracted from agent_output when not provided explicitly — Claude Code JSON (num_turns), Codex exec JSONL (turn.completed count), or regex fallback. The resolved value appears in both the ingest payload and the PR/issue comment.
  • Partial token overrides: providing only input_tokens still falls back to artifact or extracted values for the other fields.
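
The per-field merge behind partial token overrides can be illustrated with a short sketch. It assumes a priority of explicit input, then artifact, then values extracted from agent output; the text above does not state the order between artifact and extracted values, so that part is a guess. All names here are hypothetical.

```typescript
const TOKEN_FIELDS = ["input_tokens", "output_tokens", "cache_read_tokens", "cache_write_tokens"] as const;
type TokenField = (typeof TOKEN_FIELDS)[number];
type PartialTokens = Partial<Record<TokenField, number>>;
type ResolvedTokens = Record<TokenField, number | null>;

// Resolve each field independently so one explicit value never wipes the rest.
function mergeTokenSources(explicit: PartialTokens, artifact: PartialTokens, extracted: PartialTokens): ResolvedTokens {
  const resolved: ResolvedTokens = {
    input_tokens: null,
    output_tokens: null,
    cache_read_tokens: null,
    cache_write_tokens: null,
  };
  for (const field of TOKEN_FIELDS) {
    // ?? (not ||) so an explicit 0 is respected instead of falling through.
    resolved[field] = explicit[field] ?? artifact[field] ?? extracted[field] ?? null;
  }
  return resolved;
}
```

For example, passing only input_tokens explicitly leaves output_tokens and the cache fields to come from whichever lower-priority source has them, and a field no source provides stays null.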

Status table

| Item | Status | Notes |
| --- | --- | --- |
| Gate is gh-aw-specific | ⚠️ Known limitation | Single-job workflows fine; multi-job non-gh-aw users risk duplicates |
| Lock file patching | ⚠️ Manual step | Must re-patch after every gh aw compile; scripts/patch-workflows.sh automates this |
| Trigger number for non-standard branches | ⚠️ Known limitation | Pass trigger_number explicitly as workaround |
| if: always() enforcement | ⚠️ User error risk | Documentation only |
| Codex token extraction | ⚠️ Internal format | Works in production; rollout format is not a public API and could break on CLI upgrade |
| Token data for non-gh-aw workflow_run users | ✅ Documented | See README and challenge #4 above |
| Zip parsing | ✅ | Uses fflate (proper decompression) |
| githubRunId in payload | ✅ | Uses agent run ID when workflow_run_id is set |
| Trigger number resolution | ✅ | pull_requests[] array + pulls.list API fallback; issue branch requires agent/issue-N prefix (gh-aw convention) to prevent misattribution |
| Trigger type resolution | ✅ | issue_comment correctly classified as pr_comment vs issue_comment |
| Status normalization | ✅ | Raw GitHub conclusion mapped internally; custom statuses pass through unchanged |
| Partial token overrides | ✅ | Per-field merge; partial overrides don't zero out unspecified fields |
| Multiple firing dedup | ✅ for gh-aw | Gate on conclusion job name |
| Timestamps / duration | ✅ | Sourced from workflow run API; null when unavailable (falls back to action start/now); duration clamped to ≥0 |
| Workflow name | ✅ | Uses agent workflow name, not companion |
| Comment format migration | ✅ | Old 5-column comments parsed correctly |
| Comment ordering | ✅ | Newest runs displayed first; 5 visible, rest in collapsible section |
| Comment posting | ✅ | Upsert by marker, paginated search, correct PR/issue number |
| GITHUB_TOKEN availability | ✅ | github_token input with default: ${{ github.token }} |
| Node.js version | ✅ | node24 |
| Pricing | ✅ | Fetched from /api/models/pricing; prefix-match fallback for versioned IDs; null cache pricing shown as unavailable |