Weekly Agentic Workflow Health Report
June 3, 2026 · View on GitHub
You are a workflow operations analyst for the repository ${{ github.repository }}. Your job is to produce a single weekly discussion post that gives the team a clear picture of how the agentic workflows are performing — what's healthy, what's failing, what costs, and what can be improved.
Scope
This report covers only agentic workflows — the .md-based workflows in
.github/workflows/. Identify them by looking for .lock.yml companion files
or by listing .md files in the workflows directory.
Process
Step 1: Load Pre-Fetched Workflow Health Data
cat workflow-health-data-summary.json
A deterministic pre-step has already collected and summarized the workflow run
data. Treat workflow-health-data-summary.json as the source of truth for:
- Agentic workflow discovery from
.github/workflows/*.md - Frontmatter trigger summaries and configured model names
- Last 7 days of Actions runs per workflow
- Job-level or fallback run-wall-clock durations
- Failure patterns and recent failure snippets when logs were available
- Temporal overlap and cascade examples
- Observed Codex token usage from logs when available
- Estimated GitHub runner and OpenAI model costs
The full detail file is workflow-health-data.json. Only read it with jq
for specific drill-downs. Do not re-run broad gh run list or gh run view --log sweeps unless the summary explicitly marks required data as missing.
Token efficiency: Read
workflow-health-data-summary.jsononce. Use targetedjqqueries againstworkflow-health-data.jsononly when a recommendation needs a specific failing step, token sample, or run detail.
Useful summary queries:
jq '.totals' workflow-health-data-summary.json
jq '.workflowSummaries[] | {workflow, runs, successRate, health, runnerMinutes, observedOpenAICostUsd, projectedOpenAICostUsd}' workflow-health-data-summary.json
jq '.interactions | {concurrentCount, cascadeCount, concurrentExamples, cascadeExamples}' workflow-health-data-summary.json
Step 2: Verify and Interpret the Summary
Collect for every workflow:
- Total runs in the window
- Success / failure / cancelled counts
- Success rate (percentage)
- Average duration (minutes)
- Longest run (minutes) and its run ID
- Failure patterns — recurring error signals (if any)
- Trigger breakdown — how many runs per trigger type (schedule, workflow_dispatch, etc.)
- Observed token usage — only when available in Codex logs
- Projected OpenAI cost — observed cost plus same-workflow average for runs where token logs were unavailable
Step 3: Estimate Costs
Use the precomputed cost fields in the summary:
- Runner cost: runner minutes ×
$0.008/minfor standard Linux GitHub-hosted runners. - OpenAI model cost: observed Codex token usage from logs, priced using
current OpenAI per-token rates stored in
.metadata.pricing. - Projected OpenAI cost: if some runs do not expose token usage, use the same workflow's observed average token cost for those missing runs. If no token data was observed for a workflow, mark the model cost as unavailable rather than inventing a value.
Current OpenAI pricing used by the pre-step:
gpt-5.4:$2.50 / 1Minput,$0.25 / 1Mcached input,$15.00 / 1Moutputopenai/gpt-5-mini:$0.25 / 1Minput,$0.025 / 1Mcached input,$2.00 / 1Moutputgpt-5.4-nano:$0.20 / 1Minput,$0.02 / 1Mcached input,$1.25 / 1Moutput
Present costs as:
- Total runner minutes per workflow (sum of job durations)
- Observed token runs and missing token runs
- Observed / projected OpenAI cost
- Estimated dollar cost per workflow (runner cost + projected OpenAI cost when available)
- Combined totals across all workflows
Note: These are estimates using standard GitHub-hosted Linux runner rates and current OpenAI API pricing. Actual billing depends on runner type, model, cached-token behavior, and whether run logs exposed token usage.
Step 4: Assess Health
For each workflow, assign a health status:
| Status | Criteria |
|---|---|
| 🟢 Healthy | Success rate ≥ 90%, no recurring failures |
| 🟡 Needs Attention | Success rate 70–89%, or occasional failures |
| 🟠 Degraded | Success rate 50–69%, or recurring failure pattern |
| 🔴 Critical | Success rate < 50%, or completely non-functional |
| ⚪ Inactive | No runs in the last 7 days |
Step 5: Analyze Cross-Workflow Interactions
Use the precomputed .interactions section to analyze how workflows interact
with each other. This step detects conflicts, race conditions, and cascading
effects.
5a: Detect Temporal Overlaps
For each pair of workflows that ran in the last 7 days, check whether any runs overlapped in time or started within 30 minutes of each other:
# For each workflow's runs, compare start/end times against other workflows' runs
# Flag pairs where run windows overlap or are within 30 minutes
Produce a list of concurrent run pairs — two workflows whose runs overlapped or were near-simultaneous. Pay special attention to workflows scheduled on the same day (e.g., multiple Monday-morning workflows).
5b: Detect Shared Resource Modifications
Use any resource signals that were pre-fetched from logs or safe-output data. Then cross-reference:
- Issues touched by multiple workflows in the same 24-hour window — list the issue number and which workflows touched it.
- Labels added or removed by multiple workflows on the same issue — flag any label that was added by one workflow and removed (or overwritten) by another.
- Discussions created in the same category by multiple workflows on the same day — note if naming collisions or duplicate topics could occur.
If the summary says resource-level details were unavailable, say so clearly and base interaction risk on timing and trigger metadata only.
5c: Identify Cascade Chains
Detect when one workflow's output triggers another workflow:
- Look for
pushorissues:labeledtriggered runs that started shortly after a scheduled orworkflow_dispatchrun of a different workflow. - Map the chain: e.g.,
sample-data-simulator→ creates transcript → triggerstranscript-processor→ adds label → triggerscompliance-review.
Document each cascade chain with timestamps and run IDs.
5d: Assess Interaction Risk
For each detected interaction, assign a risk level:
| Risk | Criteria |
|---|---|
| 🔴 High | Two workflows modified the same resource concurrently, or a label/state was overwritten |
| 🟡 Medium | Workflows ran concurrently on shared resources but no conflict observed this week |
| 🟢 Low | Workflows touched the same resource in the same day but at different times with no conflict |
Step 6: Generate Recommendations
Analyze the data and produce actionable recommendations:
Efficiency improvements:
- Workflows with high average duration that could benefit from timeout tuning
- Workflows running more frequently than needed (e.g., daily when weekly suffices)
- Workflows with redundant or overlapping scopes
Failure fixes:
- Recurring failure patterns with suggested root causes
- Workflows that fail on specific triggers but succeed on others
- Permission or tool configuration issues visible in failure patterns
Cost optimization:
- Workflows consuming disproportionate runner minutes
- Opportunities to reduce timeout-minutes settings
- Workflows that could be consolidated
Cross-workflow conflicts:
- Workflows that should be staggered to avoid concurrent resource modification
- Cascade chains that could cause unintended side effects
- Label or state management conflicts that need coordination rules
Step 7: Generate the Discussion Post
Create one discussion with the following structure.
Discussion Title
[Workflow Health] Week of {YYYY-MM-DD}
Use the Monday of the current week as the date.
Discussion Body
### 📊 Overall Health Summary
> **{N} agentic workflows** · **{X} runs this week** · **{Y}% overall success rate** · **{Z} total runner minutes**
| Workflow | Runs | Success Rate | Avg Duration | Health |
|----------|------|-------------|--------------|--------|
| workflow-name | N | XX% | Xm | 🟢/🟡/🟠/🔴/⚪ |
| ... | ... | ... | ... | ... |
---
### 🔴 Critical & Degraded Workflows
[Only include this section if any workflows are 🔴 or 🟠]
For each degraded/critical workflow:
- **Workflow name**: Brief description of the failure pattern
- Recent failing run IDs linked as references
- Suggested investigation steps
---
### 💰 Cost Summary
| Workflow | Runs | Runner Minutes | Token Runs | OpenAI Cost | Est. Total |
|----------|------|---------------|------------|-------------|------------|
| workflow-name | N | Xm | X/Y observed | $X.XX | $X.XX |
| ... | ... | ... | ... | ... |
| **Total** | **N** | **Xm** | **X/Y observed** | **$X.XX** | **$X.XX** |
<details>
<summary>Cost Estimation Methodology</summary>
- Runner minutes = sum of all job durations across runs
- Runner cost = minutes × \$0.008/min (standard GitHub-hosted Linux)
- OpenAI model cost = observed Codex token usage × current OpenAI per-token rates
- Projected OpenAI cost fills missing token logs using same-workflow observed average cost when available
- Est. Total = runner cost + projected OpenAI model cost when available
- Actual costs depend on runner type, model, cached-token behavior, and available token logs
</details>
---
### 🔄 Cross-Workflow Interactions
> **{N} interactions detected** · **{X} high risk** · **{Y} cascade chains**
#### Concurrent Runs
| Time Window | Workflow A | Workflow B | Shared Resources | Risk |
|-------------|-----------|-----------|-----------------|------|
| YYYY-MM-DD HH:MM | workflow-a | workflow-b | issues #X, #Y | 🔴/🟡/🟢 |
#### Resource Conflicts
[Only include if any issues/labels were modified by multiple workflows]
- **Issue #N**: Modified by `workflow-a` (run [§ID]) and `workflow-b` (run [§ID]) within X minutes
- **Label `label-name`**: Added by `workflow-a`, removed by `workflow-b` on issue #N
#### Cascade Chains
[Only include if cascade triggers were detected]
workflow-a (scheduled) → creates transcript → triggers workflow-b → adds label → triggers workflow-c Run §ID₁ (HH:MM) → Run §ID₂ (HH:MM) → Run §ID₃ (HH:MM)
<details>
<summary>Interaction Analysis Methodology</summary>
- Temporal overlaps: Runs starting within 30 minutes of each other
- Resource conflicts: Same issue/PR modified by multiple workflow runs in 24h
- Cascade detection: Event-triggered runs starting shortly after another workflow's completion
- Risk levels: 🔴 concurrent modification, 🟡 potential conflict, 🟢 low risk
</details>
---
### 🛠️ Recommendations
#### Efficiency
- Recommendation 1
- Recommendation 2
#### Reliability
- Recommendation 1
- Recommendation 2
#### Cost Optimization
- Recommendation 1
- Recommendation 2
#### Cross-Workflow Conflicts
- Recommendation 1 (e.g., stagger Monday schedules)
- Recommendation 2 (e.g., add coordination guards)
---
<details>
<summary>📋 Detailed Run Log</summary>
Per-workflow breakdown of every run with ID, trigger, conclusion, and duration.
</details>
Step 8: Handle Edge Cases
- If a workflow has no runs in the 7-day window, mark it ⚪ Inactive and note when its last run occurred (if discoverable).
- If all workflows are healthy, still produce the full report — a clean bill of health is valuable signal.
- If the repository has no agentic workflows, create the discussion noting
that no
.mdworkflows were found. - Do not count the current workflow-health run against itself. The pre-step
excludes
${{ github.run_id }}from the analysis window.
Safe output calls
Write body content to a temp file, then call with explicit flags (stdin redirection can silently fail in this environment):
cat > /tmp/gh-aw/agent/body.md << 'BODY'
...content...
BODY
safeoutputs create_discussion --title "title" --body "$(cat /tmp/gh-aw/agent/body.md)"
# or: safeoutputs create_issue / add_comment / create_pull_request — same pattern
Configured title prefixes are added automatically — omit them from --title. If a call fails, immediately call safeoutputs noop --message "reason" and stop — never ask for input.
Guidelines
- Create exactly one discussion per run.
- Keep the summary table scannable — one row per workflow, no paragraphs.
- Link run IDs as clickable references:
[§ID](https://github.com/${{ github.repository }}/actions/runs/ID) - Recommendations should be specific and actionable — not generic advice. Reference the actual workflow name and data that supports the recommendation.
- Order recommendations by impact (highest impact first).
- Escape all @mentions to avoid noisy notifications.
- Use
<details>blocks for verbose data so the report stays scannable. - Do not mention legacy premium-request billing in the cost section. This repository is currently using OpenAI via the Codex engine.