Telemetry, Inspector, and Debugging
May 23, 2026
This doc explains the runtime-debugging surfaces. For actual live-debugging workflows, read dev/agents/accountycat-debugger/SKILL.md first.
Primary Files
- ACShared/Telemetry/TelemetryStore.swift
- AC/Core/BrainService+Telemetry.swift
- AC/Core/TelemetryAdapters.swift
- AC/Services/LLMTelemetryRecorder.swift
- AC/Services/ACDebugBundleService.swift
- AC/UI/DebugSheet.swift
- AC/UI/StatsView.swift
- ACShared/Telemetry/MonitoringStatsSnapshot.swift
- ACShared/Evals/ACEvalModels.swift
- ACShared/Evals/ACEvalStore.swift
- ACInspector/InspectorController.swift
- ACInspector/TelemetryIndexStore.swift
- ACInspector/PromptLabRunner.swift
- ACTests/AgentEvalRunnerTests.swift
- dev/agents/accountycat-eval/SKILL.md
Telemetry Model
TelemetryStore is the source of truth for verbose runtime telemetry.
It manages:
- telemetry sessions
- JSONL event append/load
- prompt/input/output artifacts
- screenshot artifacts and thumbnails
- session heartbeats
- retention cleanup
Sessions live under ~/Library/Application Support/AC/telemetry.
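As a rough illustration of the JSONL append/load responsibility listed above, the following sketch writes one JSON object per line and reads them back. SketchEvent, appendEvent, and loadEvents are hypothetical names invented for this example; the real TelemetryStore event schema and API are not shown in this doc and may differ.

```swift
import Foundation

// Hypothetical event shape, for illustration only.
struct SketchEvent: Codable, Equatable {
    let kind: String
    let timestamp: Date
}

// Append one event as a single newline-terminated JSON line.
func appendEvent(_ event: SketchEvent, to url: URL) throws {
    let encoder = JSONEncoder()
    encoder.dateEncodingStrategy = .secondsSince1970
    var line = try encoder.encode(event)
    line.append(0x0A) // "\n"
    if FileManager.default.fileExists(atPath: url.path) {
        let handle = try FileHandle(forWritingTo: url)
        defer { try? handle.close() }
        try handle.seekToEnd()
        try handle.write(contentsOf: line)
    } else {
        try line.write(to: url)
    }
}

// Load every decodable line; lines that fail to decode are skipped.
func loadEvents(from url: URL) throws -> [SketchEvent] {
    let decoder = JSONDecoder()
    decoder.dateDecodingStrategy = .secondsSince1970
    let text = try String(contentsOf: url, encoding: .utf8)
    return text.split(separator: "\n").compactMap { line in
        try? decoder.decode(SketchEvent.self, from: Data(line.utf8))
    }
}
```

Skipping undecodable lines is one possible design choice for an append-only log, since a partially written last line then does not block loading the rest of the session.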
Important Constraint
Verbose telemetry is effectively Debug-build only.
TelemetryPersistencePolicy.storesVerboseTelemetry(debugMode:) currently returns ACBuild.isDebug, so toggling state.debugMode does not make Release builds behave like Debug builds.
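The constraint can be pictured with a minimal sketch. The type names below carry a Sketch suffix because they are stand-ins, and deriving isDebug from the DEBUG compilation condition is an assumption; only the signature and the "returns ACBuild.isDebug" behavior come from this doc.

```swift
// Stand-in for ACBuild. How the real flag is derived is an assumption here.
enum ACBuildSketch {
    static var isDebug: Bool {
        #if DEBUG
        return true
        #else
        return false
        #endif
    }
}

enum TelemetryPersistencePolicySketch {
    // debugMode is accepted but ignored: the result depends only on the build type.
    static func storesVerboseTelemetry(debugMode: Bool) -> Bool {
        ACBuildSketch.isDebug
    }
}
```

Under this shape, passing debugMode: true in a Release build still yields false, which is why the runtime toggle alone does not enable verbose telemetry.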
Runtime Breadcrumbs vs Source of Truth
Use the right artifact for the job:
- activity.log is for human-readable breadcrumbs
- telemetry events are the source of truth for reconstructing behavior
- debug bundles are portable snapshots for offline triage
- ACInspector is the best local UI for browsing episodes and prompt artifacts
Debug Stats (in-app)
In Debug builds, the main app exposes a developer sheet (hammer icon in the chat panel header) with a Stats tab (StatsView → MonitoringStatsSnapshot.load).
The panel aggregates telemetry for a selectable 24h or 7d window from TelemetryStore. It is intended for monitoring mix triage and rough cost planning, not billing accuracy.
Usage totals
- Total tokens and LLM calls: prefers llm_interaction events (all kinds: monitoring, chat, memory, etc.); falls back to legacy model_output token usage when no interaction events exist in the window.
- Reported cost: shown only when runtimes attach costUSD on token usage records.
- Vision: percentage of token-usage records that included a screenshot. Current llm_interaction events set this from the recorded request image path; legacy fallback model_output records set it from the saved screenshot artifact.
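The preference and fallback described for usage totals could look roughly like the sketch below. UsageRecord, UsageTotals, and usageTotals(from:) are hypothetical; the field names are illustrative rather than the real MonitoringStatsSnapshot schema, and counting one call per record is an assumption.

```swift
// Hypothetical, simplified token-usage record.
struct UsageRecord {
    let eventKind: String        // "llm_interaction" or "model_output"
    let totalTokens: Int
    let includedScreenshot: Bool
}

struct UsageTotals: Equatable {
    let tokens: Int
    let calls: Int
    let visionPercent: Double
}

func usageTotals(from records: [UsageRecord]) -> UsageTotals {
    // Prefer llm_interaction records; use legacy model_output only when none exist.
    let interactions = records.filter { $0.eventKind == "llm_interaction" }
    let source = interactions.isEmpty
        ? records.filter { $0.eventKind == "model_output" }
        : interactions
    let tokens = source.reduce(0) { $0 + $1.totalTokens }
    let withVision = source.filter { $0.includedScreenshot }.count
    let visionPercent = source.isEmpty ? 0 : 100 * Double(withVision) / Double(source.count)
    return UsageTotals(tokens: tokens, calls: source.count, visionPercent: visionPercent)
}
```

Note that the fallback is all-or-nothing in this sketch: legacy records are ignored entirely once any interaction record is present in the window, which avoids double counting.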
Rates
- Monitoring / active hr: evaluation_requested count divided by active monitoring time (see below). Use this for extrapolation.
- Monitoring / wall hr: same count spread across the full calendar window (24h or 7d). Useful only as a lower bound when AC ran briefly.
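To make the difference between the two rates concrete, here is a small worked sketch. monitoringRates is a hypothetical helper, not a function from the codebase, and the numbers are made up for illustration.

```swift
// Hypothetical helper: both rates from one evaluation count.
func monitoringRates(
    evaluationCount: Int,
    activeHours: Double,
    windowHours: Double
) -> (perActiveHour: Double?, perWallHour: Double) {
    let count = Double(evaluationCount)
    // No active time recorded means the active-hour rate is undefined.
    let perActive: Double? = activeHours > 0 ? count / activeHours : nil
    return (perActiveHour: perActive, perWallHour: count / windowHours)
}

// Illustrative numbers: 120 evaluations during 2 active hours in a 24h window.
let rates = monitoringRates(evaluationCount: 120, activeHours: 2, windowHours: 24)
// rates.perActiveHour is 60, rates.perWallHour is 5
```

The wall-hour figure is far smaller here because the same 120 evaluations are spread across 22 hours in which AC was not monitoring.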
Active monitoring time
Derived from telemetry activity markers (observations, evaluations, policy/model stages, monitoring llm_interaction kinds, session heartbeats). Periods end on evaluation_skipped metrics with reason idle, on gaps longer than 30s without activity, or at the window edge. Idle backoff time is excluded.
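The period rules can be sketched as a single pass over sorted markers. ActivityMarker and activeSeconds are hypothetical names; in particular, crediting a period only up to its last marker before a gap (rather than up to 30s into the gap) is an assumption of this sketch, not a statement about MonitoringStatsSnapshot.

```swift
// Hypothetical marker: a timestamp in seconds plus whether it is an idle skip.
struct ActivityMarker {
    let time: Double
    let endsPeriod: Bool   // true for evaluation_skipped with reason idle
}

func activeSeconds(
    markers: [ActivityMarker],
    windowStart: Double,
    windowEnd: Double,
    maxGap: Double = 30
) -> Double {
    // Markers outside the window are dropped, so no period crosses the window edge.
    let inWindow = markers
        .filter { $0.time >= windowStart && $0.time <= windowEnd }
        .sorted { $0.time < $1.time }
    var total = 0.0
    var periodStart: Double? = nil
    var lastSeen = windowStart
    for marker in inWindow {
        if let start = periodStart {
            if marker.time - lastSeen > maxGap {
                // Gap too long: close the period at the last activity seen.
                total += lastSeen - start
                periodStart = marker.endsPeriod ? nil : marker.time
            } else if marker.endsPeriod {
                // Idle skip closes the period at this marker.
                total += marker.time - start
                periodStart = nil
            }
        } else if !marker.endsPeriod {
            periodStart = marker.time
        }
        lastSeen = marker.time
    }
    if let start = periodStart {
        total += lastSeen - start
    }
    return total
}
```

For example, markers at 0s, 10s, and 20s, then at 100s and 110s followed by an idle skip at 120s, yield two periods of 20s each; the 80s gap between them is not counted.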
Cost estimate rows
Extrapolate observed per-active-hour token/call rates to 10h/day and 10h × 7d/week, plus calendar-day averages over the selected window. Short sessions produce noisy projections until enough active monitoring time is recorded.
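The extrapolation is simple arithmetic, sketched below with hypothetical names (UsageProjection, projectUsage). It assumes the calendar-day average is the observed total divided by the number of days in the window, which this doc does not spell out.

```swift
struct UsageProjection: Equatable {
    let perTenHourDay: Double
    let perSeventyHourWeek: Double   // 10h x 7d
    let perCalendarDay: Double
}

// Returns nil when there is no active time to extrapolate from.
func projectUsage(observedTotal: Double, activeHours: Double, windowDays: Double) -> UsageProjection? {
    guard activeHours > 0, windowDays > 0 else { return nil }
    let perActiveHour = observedTotal / activeHours
    return UsageProjection(
        perTenHourDay: perActiveHour * 10,
        perSeventyHourWeek: perActiveHour * 10 * 7,
        perCalendarDay: observedTotal / windowDays
    )
}

// Illustrative numbers: 30,000 tokens observed over 1.5 active hours in a 24h window.
let tokenProjection = projectUsage(observedTotal: 30_000, activeHours: 1.5, windowDays: 1)
```

With these made-up inputs the per-active-hour rate is 20,000 tokens, giving 200,000 per 10h day and 1,400,000 per 70h week. A session of only a few active minutes would divide by a very small activeHours, which is where the noise in short-session projections comes from.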
Refresh with the sheet’s reload control after more runtime; stats are cached per window on AppController.
ACInspector
ACInspector has three big jobs:
- inspect recorded telemetry episodes
- run Prompt Lab scenario replays
- capture and inspect durable eval cases from selected episodes
Prompt Lab lets you:
- import telemetry episodes into structured scenarios
- compare prompt sets, pipeline profiles, and runtime profiles
- inspect rendered prompts and outputs without changing the main app
The Eval Cases tab lists saved eval cases from ~/Library/Application Support/AC/evals/.
Use Create Eval Case from a selected episode to prefill a case from focus, chat,
or chat-action telemetry. Unsupported LLM interaction kinds remain inspectable but
are not eval-capturable until they have a case schema.
Eval case capture stores:
- one case.json per case under evals/cases/<id>/
- copied screenshots/artifacts under that case folder so telemetry cleanup does not delete the eval evidence
- manifest.json, regenerated on save/delete, with agent-readable summaries: id, name, kind, importance, categories, app/title context, screenshot presence, expected outcome summary, and recommended backend
Captured evals compare structured behavior, not exact wording. Focus evals compare assessment/action expectations. Chat evals compare action kinds and optional schedule kind while requiring a non-empty reply by default. Chat-action evals compare normalized action fields such as kind, intent, target, scope, duration, profile fields, memory text containment, and locked flag.
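As an illustration of comparing structure rather than wording, the sketch below checks a chat eval the way the paragraph above describes. ChatExpectation, ChatResult, and chatEvalPasses are hypothetical types and names, not the ones in ACEvalModels.swift, and treating action kinds as order-insensitive is an assumption.

```swift
import Foundation

// Hypothetical expectation for a chat eval case.
struct ChatExpectation {
    let actionKinds: [String]
    let scheduleKind: String?      // nil means the schedule kind is not checked
    var requiresReply = true       // non-empty reply required by default
}

// Hypothetical observed result from a run.
struct ChatResult {
    let reply: String
    let actionKinds: [String]
    let scheduleKind: String?
}

func chatEvalPasses(_ result: ChatResult, expected: ChatExpectation) -> Bool {
    let trimmedReply = result.reply.trimmingCharacters(in: .whitespacesAndNewlines)
    if expected.requiresReply && trimmedReply.isEmpty { return false }
    // Compare which actions were produced, ignoring their order (assumption).
    if result.actionKinds.sorted() != expected.actionKinds.sorted() { return false }
    // Only check the schedule kind when the case specifies one.
    if let wanted = expected.scheduleKind, wanted != result.scheduleKind { return false }
    return true
}
```

Two runs whose replies are worded differently pass equally under this check as long as they produce the same action kinds, while a run with the right actions but an empty reply fails by default.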
Agent Eval Runner
Agents should use dev/agents/accountycat-eval/SKILL.md when changing prompts,
monitoring behavior, chat command parsing, or action resolution. Beyond the
Inspector-captured cases described above, a curated synthetic suite
(ACTests/SyntheticEvalCases.swift, written by the runner's seed command) covers
the bulk of usage. docs/reference/eval-suite.md owns the wider context: what the
evals measure, suite composition, the pass-bar models, results, and known limitations.
Common commands:
swift dev/agents/accountycat-eval/scripts/ac-eval-runner.swift list --json
swift dev/agents/accountycat-eval/scripts/ac-eval-runner.swift list --kind focus --importance high,critical --category false_positive --json
swift dev/agents/accountycat-eval/scripts/ac-eval-runner.swift run --backend local --importance critical,high --limit 30 --json
swift dev/agents/accountycat-eval/scripts/ac-eval-runner.swift run --backend local --ids <case-id> <case-id> --json
Runner execution delegates to ACTests/AgentEvalRunnerTests.swift so evals reuse the
same AC code paths as tests. Online runs are explicit:
AC_EVAL_OPENROUTER_API_KEY=... swift dev/agents/accountycat-eval/scripts/ac-eval-runner.swift run --backend online --online-model <model> --ids <case-id> --json
AC_EVAL_OPENAI_API_KEY=... swift dev/agents/accountycat-eval/scripts/ac-eval-runner.swift run --backend online --online-model <model> --ids <case-id> --json
The eval runner does not read Keychain. The shell wrapper writes any supplied online
API key into the temporary runner request so the xcodebuild-hosted eval path sees
the same credential. Prefer local evals and filtered online slices; eval cases can
contain personal titles, chat messages, and screenshots.
Debug Bundles
ACDebugBundleService exports a compact, agent-readable bundle containing:
- a redacted current-state snapshot
- a summary of recent telemetry
- copied raw telemetry for the selected session
- activity log
- OpenRouter health snapshot when present
Bundles are a good handoff artifact when live inspection is inconvenient.
Live Debugging Entry Point
When the task is "why did AC do this?" or "why is runtime behavior wrong?", start here:
- dev/agents/accountycat-debugger/SKILL.md
- relevant references under dev/agents/accountycat-debugger/references/
- this doc for storage/file-layout context
That skill is the intended triage path for telemetry-heavy debugging.
If You Change This Area
- Preserve enough telemetry to explain decisions after the fact.
- Keep Inspector assumptions aligned with emitted event schemas and artifact names.
- Update debug-bundle summaries when new event kinds become operationally important.
- Keep MonitoringStatsSnapshot and this doc aligned when stats definitions, active-time rules, or cost projection assumptions change.