Harness
May 5, 2026
A condensed roadmap matching the implementation plan at ~/.claude/plans/harness-application-foundation.md. This is the build order; status is tracked in the GitHub issue tracker once we have one.
Phase 0 — Foundation (this commit)
- ✅ git init + .gitignore + README.md.
- ✅ standards/ library (12 numbered files + INDEX + AUDIT_CHECKLIST).
- ✅ docs/ (PRD, ARCHITECTURE, ROADMAP).
- ✅ docs/PROMPTS/ (system prompt, persona defaults, friction vocab).
- ✅ wiki/ scaffold (Home + 17 reference pages).
- ✅ CLAUDE.md root.
- ✅ Design system + primitives + screen drafts already shipped in HarnessDesign/.
Phase 1 — Plumbing ✅ shipped 2026-05-03
Deliverable: an Xcode project that builds, with the service skeletons wired up and unit-tested in isolation.
- Create Harness.xcodeproj (single Mac target, non-sandboxed entitlements). Generated from project.yml via xcodegen.
- Include HarnessDesign/ source in the main app target. (Revised from "Swift Package"; it graduates to a Package later if/when we add a second target.)
- Harness/Core/HarnessPaths.swift: every filesystem-path constant.
- Harness/Core/Models.swift: GoalRequest, ProjectRequest, SimulatorRef, Step, ToolCall, ToolKind, ToolInput, ToolResult, FrictionEvent, FrictionKind, Verdict, RunOutcome, RunEvent, UserApproval.
- Harness/Services/ProcessRunner.swift: actor; cancellation; timeout; streaming variant; explicit Pipe close on defer.
- Harness/Services/ToolLocator.swift: actor; resolves xcrun / xcodebuild / idb / idb_companion / brew; 12h cache.
- Harness/Services/KeychainStore.swift: SecItem* wrapper; convenience methods for the Anthropic key.
- Harness/Services/XcodeBuilder.swift: xcodebuild wrapper; derived data isolated; signing-error mapping; .app artifact pickup.
- Harness/Services/SimulatorDriver.swift: full SimulatorDriving protocol; pixel→point conversion in toPoints (unit-tested); idempotent boot.
- Harness/Services/ClaudeClient.swift: single-shot step(_:); prompt-caching markers; full tool-call parsing; typed ClaudeError cases.
- Harness/Tools/AgentTools.swift: ToolSchema.toolDefinitions(cacheControl:) matching Tool-Schema.
- Tests: 28 across HarnessPaths, ProcessRunner (cancellation, streaming), KeychainStore (round-trip), SimulatorDriver (coord scaling, simctl JSON parsing), AgentTools (schema invariants).
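The pixel→point conversion mentioned for SimulatorDriver can be illustrated with a minimal sketch. The types `PixelPoint` and `DevicePoint`, and the exact signature of `toPoints`, are assumptions made for illustration; the roadmap only names the function, so the shipped API may look different.

```swift
// Hypothetical value types for illustration; the real SimulatorDriver API may differ.
struct PixelPoint { var x: Double; var y: Double }
struct DevicePoint: Equatable { var x: Double; var y: Double }

/// Converts a coordinate measured in screenshot pixels into device points
/// by dividing by the display scale factor (for example 3.0 on an @3x simulator).
func toPoints(_ pixel: PixelPoint, scale: Double) -> DevicePoint {
    precondition(scale > 0, "scale factor must be positive")
    return DevicePoint(x: pixel.x / scale, y: pixel.y / scale)
}

// A tap located at pixel (600, 1200) in an @3x screenshot maps to point (200, 400).
let tap = toPoints(PixelPoint(x: 600, y: 1_200), scale: 3.0)
```

Keeping the conversion a pure function is presumably what makes it cheap to unit-test in isolation, as the list above notes.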
Wiki updates landed: Core-Services.md with shipped status, Build-and-Run.md filled in, Design-System.md reconciled to actual API names (Theme.* / HFont.* / Color.harness*), Simulator-Driver.md linked.
Carries forward into Phase 2: the smoke "boot sim → screenshot → send to Claude → print response" CLI is deferred to Phase 2 — once RunCoordinator exists, it's a thin orchestration on top, not an independent target.
Phase 2 — Agent loop ✅ shipped 2026-05-03
Deliverable: a non-UI run end-to-end. Goal in → loop runs → events.jsonl + screenshots on disk → verdict out.
- Harness/Tools/AgentTools.swift: landed in Phase 1.
- Harness/Domain/AgentLoop.swift: actor; HistoryCompactor (last-6 turns kept full, older screenshots dropped); cycle detector via ScreenshotHasher dHash + tool-call equivalence; parse-failure retry (cap 2); step + token budget short-circuits.
- Harness/Services/RunLogger.swift + Harness/Services/RunLogParser.swift: append-only JSONL with per-row synchronize(); meta.json snapshot; tolerant parser with validateInvariants(_:).
- Harness/Services/RunHistoryStore.swift: SwiftData container with RunRecord + ProjectRef; VersionedSchema + migration plan in place; inMemory() for tests.
- Harness/Domain/RunCoordinator.swift: actor; run(_:approvals:) returns AsyncThrowingStream<RunEvent>; build → boot → install → launch → loop → log → cleanup; step-mode approval gate via AsyncStream<UserApproval>.
- Prompt library: Harness/Core/PromptLibrary.swift + xcodegen resources: docs/PROMPTS type: folder. AgentLoop caches the system prompt after first load.
- Replay test infrastructure: MockLLMClient (scripted-sequence + lookup-closure modes), FakeXcodeBuilder, FakeSimulatorDriver with synthesized solid-color PNGs.
- Replay tests: happy path, cycle detector trip, step budget short-circuit. All green.
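The cycle detector's combination of a dHash with tool-call equivalence can be sketched as follows. This is an illustration under assumptions, not the project's ScreenshotHasher: it assumes the screenshot has already been downscaled to a 9×8 grayscale grid, represents tool calls as plain strings, and uses a 4-bit distance threshold chosen purely for the example.

```swift
/// Computes a 64-bit difference hash (dHash) from a 9x8 grayscale grid.
/// Each bit records whether a pixel is brighter than its right-hand neighbour.
/// Assumption: the caller has already downscaled the screenshot to 8 rows of 9 pixels.
func dHash(_ gray: [[UInt8]]) -> UInt64 {
    precondition(gray.count == 8 && gray.allSatisfy({ $0.count == 9 }),
                 "expected 8 rows of 9 pixels")
    var hash: UInt64 = 0
    for row in gray {
        for col in 0..<8 {
            hash <<= 1
            if row[col] > row[col + 1] { hash |= 1 }
        }
    }
    return hash
}

/// Number of differing bits between two hashes.
func hammingDistance(_ a: UInt64, _ b: UInt64) -> Int {
    (a ^ b).nonzeroBitCount
}

/// Hypothetical cycle check: the same tool call issued against a near-identical screen.
/// The default threshold of 4 bits is illustrative, not a value taken from the project.
func looksLikeCycle(previousHash: UInt64, currentHash: UInt64,
                    previousCall: String, currentCall: String,
                    maxDistance: Int = 4) -> Bool {
    previousCall == currentCall
        && hammingDistance(previousHash, currentHash) <= maxDistance
}
```

Requiring both signals means that repeating a tap is not flagged when the screen actually changed, and an unchanged screen is not flagged when the agent tries a different tool call.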
Wiki updates landed: Core-Services flips RunLogger/RunLogParser/RunHistoryStore/RunCoordinator/AgentLoop/PromptLibrary to ✅; Agent-Loop.md filled with the prose walkthrough; Run-Logger.md filled with implementation detail.
Test count: 50 across 12 suites.
Phase 3 — Visibility ✅ shipped 2026-05-03
Deliverable: full UI shell. End-to-end manual test path open from the goal-input screen through to replay.
- Harness/App/AppState.swift (apiKeyPresent / toolPaths / simulators / defaults) + AppCoordinator.swift (selectedSection, activeRunID, modal flags) + AppContainer.swift (DI root, pending-run hand-off).
- Harness/App/HarnessApp.swift: NavigationSplitView shell, sheet routing for first-run wizard / settings / replay, ⌘N + ⌘, command bindings.
- Harness/App/SidebarView.swift: section picker + tooling health rows.
- Harness/App/FirstRunWizard.swift: API key, xcodebuild + idb health, simulator list, copy-paste install commands.
- Harness/Features/GoalInput/: xcodebuild -list -json scheme resolution, simulator picker, persona/goal text, mode + model + step-budget controls. Hand-off via AppContainer.stagePendingRun(_:).
- Harness/Features/RunSession/: RunSessionViewModel consumes AsyncThrowingStream<RunEvent> from RunCoordinator. Live mirror via simctl screenshot poller @ 3 fps. Step feed scrolls automatically. ApprovalCard wired to step-mode approval gate via AsyncStream<UserApproval>. Stop button (⌘.) cascades cancellation.
- Harness/Features/RunHistory/: SwiftData-backed list with VerdictPill, double-click to open replay, context-menu Reveal-in-Finder + Delete.
- Harness/Features/RunReplay/: RunReplayViewModel parses events.jsonl via RunLogParser, scrubber + ←/→ keys, observation/intent/tool/friction per step.
- Harness/Features/Settings/: API key replace, default model + mode + step budget, tooling re-detect.
- Harness/Domain/Mappers.swift: adapters between production Verdict / ToolKind / FrictionKind / ToolCall and the HarnessDesign Preview* placeholder types the primitives consume. Cheap conversion at the binding layer; lets primitives stay as the design package shipped them.
- Removed HarnessDesign/Screens/*: those were the original "layout drafts with mock data" and now collide with the real Features views by filename. Primitives + DesignSystem stay.
- Bundled docs/PROMPTS/*.md as Resources: project.yml buildPhase: resources pulls the folder into Harness.app/Contents/Resources/PROMPTS/. PromptLibrary reads via Bundle.main.
- Smoke launched: built Harness.app from xcodebuild, open'd it, process visible.
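The adapter idea behind Mappers.swift, converting a production type to a design-package placeholder at the binding layer, might look roughly like the sketch below. The case names of `Verdict` and `PreviewVerdict` are assumptions; the roadmap names the types but does not list their cases.

```swift
// Hypothetical stand-ins: the real Verdict and PreviewVerdict cases may differ.
enum Verdict { case success, failure, blocked }
enum PreviewVerdict: Equatable { case pass, fail, blocked }

extension PreviewVerdict {
    /// Adapts a production verdict to the placeholder type a design primitive consumes.
    /// The exhaustive switch makes the compiler flag any verdict case added later.
    init(_ verdict: Verdict) {
        switch verdict {
        case .success: self = .pass
        case .failure: self = .fail
        case .blocked: self = .blocked
        }
    }
}

// At the binding layer, a view would convert just before handing off to the primitive.
let pill = PreviewVerdict(.success)
```

Because the conversion is a value-type initializer, the primitives never need to import the production model types.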
Build status: clean (Swift 6 strict concurrency, no warnings on the new code). Test status: 50 tests across 12 suites, still all green (Phase 1 + 2 unchanged).
Wiki updates carrying forward: Adding-a-Feature.md will be filled with the GoalInput recipe in a follow-up; FrictionReport deferred (the run-session feed + replay surface friction inline today).
Phase 4 — Polish ✅ shipped 2026-05-04
- Cycle detector + step/token-budget bail-outs verified. (RunCoordinatorReplayTests.cycleDetectorTrips() + stepBudgetShortCircuit().)
- Stop button cascades cancellation reliably. (RunSessionViewModel.stop() → approval gate .stop + runTask.cancel(); endInputSessionRunsOnThrow covers the failure path.)
- Coordinate-overlay visualization (last-tap dot animates correctly across resolutions). (SimulatorMirrorView scales lastTapPoint by frame.width / deviceSize.width; agent and user-forwarded taps both wire RunSessionViewModel.lastTapPoint.)
- Run filtering / search / export. (RunHistoryView.searchable + SegmentedToggle over VerdictFilter; right-click → "Export Run…" zips the run dir via /usr/bin/zip to an NSSavePanel destination.)
- Crash-resilience: partial-run replay loads without crash. (CrashResilienceTests covers mid-row truncation, mid-step truncation, and trailing-garbage scenarios; RunReplayViewModelTests covers the zero-step case.)
- Design-system unification: every feature view consumes HarnessDesign primitives + Theme.* / HFont.* / Color.harness* tokens. No more .red / .green / .orange literals or magic paddings.
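The crash-resilience scenarios above (mid-row truncation, trailing garbage) suggest a parser that skips unreadable rows instead of failing the whole replay. A minimal sketch of that approach follows. `parseTolerantJSONL` and the row keys in the usage note are hypothetical, since the roadmap does not show RunLogParser's code.

```swift
import Foundation

/// Parses JSONL text leniently: rows that are not valid JSON objects are counted
/// and skipped rather than aborting the whole replay.
/// Hypothetical helper; the project's RunLogParser is not shown in this roadmap.
func parseTolerantJSONL(_ text: String) -> (rows: [[String: Any]], skipped: Int) {
    var rows: [[String: Any]] = []
    var skipped = 0
    for line in text.split(separator: "\n", omittingEmptySubsequences: true) {
        let data = Data(line.utf8)
        if let object = try? JSONSerialization.jsonObject(with: data),
           let row = object as? [String: Any] {
            rows.append(row)
        } else {
            skipped += 1 // truncated or garbage row: note it and keep going
        }
    }
    return (rows, skipped)
}
```

For a log whose last row was cut off mid-write, such as `{"kind":"step_started","step":1}` followed by `{"kind":"tool_call","st`, this sketch returns the first row and reports one skipped row, so a replay view could still show the completed steps.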
Deferred from this phase:
- Code-sign + notarize a Developer ID build. Apple Development codesigning works today (see f80bf98 fix(XcodeBuilder,WDABuilder): ad-hoc sign…); the full notarytool + Developer ID + distribution pipeline waits until v1 ship.
Phase 5 — WebDriverAgent migration ✅ shipped 2026-05-03
Replaces idb (broken on iOS 26+ — taps render the green dot but never reach the responder chain) with WebDriverAgent. Same SimulatorDriving protocol surface; only the implementation changes.
- Phase A: vendor appium/WebDriverAgent at v12.2.0 as a git submodule (vendor/WebDriverAgent).
- Phase B: WDABuilder builds + caches the .xctestrun per iOS version under ~/Library/Application Support/Harness/wda-build/iOS-<ver>/. Submodule SHA gates rebuild.
- Phase C: WDARunner spawns / stops the long-running xcodebuild test-without-building. Cancellation flows through the streaming task → SIGTERM.
- Phase D: WDAClient URLSession HTTP client for WDA's W3C and /wda/* endpoints. Retries 5xx + connection-refused; URLProtocol-mocked tests assert request shapes.
- Phase E + F: SimulatorDriver becomes an actor; input methods route to WDAClient. New startInputSession / endInputSession / cleanupWDA. RunCoordinator's lifecycle is now cleanupWDA → boot → install → launch → startInputSession → loop → endInputSession (the last always runs, even on failure).
- Phase G: SimulatorWindowController hides Simulator.app at run start so Harness's mirror is the only visible surface. Toggle via AppState.keepSimulatorVisible.
- Phase H: drop idb / idb_companion from ToolLocator, AppState, FirstRunWizard, SidebarView, Settings. WebDriverAgent readiness shows in their place.
- Phase I: standards / wiki / docs / tests rewritten for the new pipeline.
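The guarantee that endInputSession runs even when an earlier stage fails maps naturally onto Swift's `defer`. The sketch below is a synchronous simplification with a hypothetical `RecordingDriver`. The real RunCoordinator is an actor with async stages, and the roadmap does not say how it implements the guarantee.

```swift
struct LaunchFailed: Error {}

/// Records each lifecycle stage so the ordering can be inspected.
/// Hypothetical stand-in for the real driver.
final class RecordingDriver {
    private(set) var stages: [String] = []
    var failAtLaunch = false

    func run(_ stage: String) throws {
        stages.append(stage)
        if stage == "launch" && failAtLaunch { throw LaunchFailed() }
    }
}

func runLifecycle(on driver: RecordingDriver) throws {
    // defer schedules endInputSession for both the success path and the throwing path.
    defer { _ = try? driver.run("endInputSession") }
    for stage in ["cleanupWDA", "boot", "install", "launch", "startInputSession", "loop"] {
        try driver.run(stage)
    }
}
```

With `failAtLaunch` set, the recorded stages in this sketch are cleanupWDA, boot, install, launch, endInputSession: the later stages are skipped but the teardown still runs.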
Phase 6 — Workspace rework ✅ shipped 2026-05-04
The product question after Phases 1–4 was throughput: composing a run took the user through project picker → scheme → simulator → persona → goal every single time, and after the run, all that context was gone. Phase 6 introduces the missing abstractions so a single context selection persists indefinitely and complex multi-step user tests run in one go.
- Phase A: SwiftData V2 (5d2fcae). New @Models: Application, Persona, Action, ActionChain, ActionChainStep. RunRecord gains optional refs + mirrored lookup-IDs. V1→V2 custom migration backfills one Application per distinct (projectPath, scheme) tuple from existing run history; ProjectRef is folded into Application and dropped. Snapshot value types in Models.swift.
- Phase B: Applications + scope sidebar (9aa2fdb). Sidebar splits into LIBRARY (always) and WORKSPACE (gated on selectedApplicationID). Active Application card sits between them. Applications module ships full CRUD + create/edit sheets + recent-runs panel. ProjectPicker extracted to Harness/Services/ so both Applications create and the run form share the picker. selectedApplicationID persisted in settings.json; stale ids cleared on launch.
- Phase C: Personas library (ce12e65). List/detail UI, create/duplicate/edit/archive flows. Built-ins seeded idempotently from docs/PROMPTS/persona-defaults.md via PromptLibrary.parseMarkdownSections; built-in personas are read-only with a "Duplicate to edit" CTA.
- Phase D: Actions + Action Chains (1b839ff). Two-tab ActionsView with single ActionsViewModel over both collections. Actions: name + prompt + notes + "used in N chains" badge. Chains: drag-to-reorder editable step list with per-step preservesState toggle, draft warning for zero-step chains, broken-link FrictionTag rows for steps pointing at deleted Actions.
- Phase E: Compose Run + chain executor + JSONL v2 (f2947d5). GoalRequest renamed to RunRequest with name / applicationID / personaID / payload: RunPayload (.singleAction / .chain / .ad_hoc). Harness/Domain/ChainExecutor.swift orchestrates multi-leg runs: per-leg AgentLoop reset (cycle detector + step budget reset), per-leg JSONL leg_started / leg_completed rows, preservesState toggle controls whether the simulator reinstalls between legs. Aggregate verdict: all-success → success, any failure/blocked → abort + skip remaining legs. JSONL bumped to v2; parser stays tolerant of v1 logs (wraps them in one virtual leg). TimelineScrubber gains optional leg-boundary ticks. RunHistoryDetailView summary grid grows a "Legs" cell when legs.count > 1. FrictionReportView groups cards by leg for chain runs.
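The aggregate-verdict rule for chains (all legs succeed → success; the first failure or block aborts and skips the rest) can be sketched in a few lines. `LegVerdict` and `runChain` are hypothetical names, and legs are modelled as plain closures rather than per-leg AgentLoop runs.

```swift
// Hypothetical verdict type; the real case names may differ.
enum LegVerdict: Equatable { case success, failure, blocked }

/// Runs legs in order and stops at the first non-success verdict.
/// Returns the aggregate verdict plus how many legs actually executed.
func runChain(_ legs: [() -> LegVerdict]) -> (verdict: LegVerdict, executed: Int) {
    var executed = 0
    for leg in legs {
        executed += 1
        let verdict = leg()
        if verdict != .success {
            return (verdict, executed) // abort; remaining legs are skipped
        }
    }
    return (.success, executed)
}

// The second leg blocks, so the third never runs.
let outcome = runChain([{ .success }, { .blocked }, { .success }])
```

Note that this sketch reports success for an empty chain. The roadmap handles zero-step chains with a draft warning in the editor, so how the executor itself treats them is not specified here.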
Test count progression: 112 → 120 (Phase A) → 125 (B) → 133 (C) → 141 (D) → 155 (E). All green.
Phase 7+ — deferred
See PRD.md "Deferred / future ideas" + docs/DESIGN_BACKLOG.md for tracked follow-ups. Track as GitHub issues once the repo is public.
Tracking
Per-PR status: PRs reference standards touched (e.g., "Standards: 03, 13") in their description and run the audit checklist before requesting review.
Per-phase status: each phase ends with a tagged commit phase-1, phase-2, etc., for easy diff windowing.