Fusion Architecture
July 5, 2026
This document describes the actual architecture of Fusion as implemented in this repository (gsxdsm/fusion). It is intended as a practical onboarding map for developers and AI agents.
1) Overview
Fusion is an AI-orchestrated task board. It takes tasks through a structured lifecycle (planning → todo → in-progress → in-review → done → archived) and automates planning, execution, review, merge, and operational recovery.
At a high level, Fusion is split into:
- Core domain + persistence (`@fusion/core`)
- Execution engine (`@fusion/engine`)
- Dashboard API + SPA (`@fusion/dashboard`)
- CLI + Pi extension (`@runfusion/fusion`)
- Desktop shell (`@fusion/desktop`)
- Mobile shell (`@fusion/mobile`)
- Terminal dashboard (part of `@runfusion/fusion`; see `packages/cli/src/commands/dashboard-tui/`)
Native shells expose a shared host-neutral bridge at window.fusionShell for first-run shell onboarding, connection profile persistence, and active shell mode/profile state. The dashboard consumes window.fusionShell when present and degrades cleanly in plain web/PWA mode.
The dashboard also has a canonical host-context bootstrap layer (packages/dashboard/app/shell-host.ts) that normalizes launch metadata into one discriminated union:
- `{ kind: "browser" }`
- `{ kind: "desktop-shell", mode?, connectionId?, serverUrl?, canOpenConnectionManager? }`
- `{ kind: "mobile-shell", mode?, connectionId?, serverUrl?, canOpenConnectionManager? }`
Detection priority is deterministic: explicit bootstrapped global from shell handoff → shell handoff query params → desktop fallback via window.fusionAPI presence → browser fallback. Shell-only query params are stripped at bootstrap via history.replaceState.
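The detection priority can be sketched as a pure function. This is an illustration only, not the code in `shell-host.ts`: the `LaunchInputs` shape, the `shellHost` and `serverUrl` parameter names, and the `"local" | "remote"` typing of `mode` are assumptions made for the example.

```typescript
// Sketch only: field types and query-parameter names are assumptions.
type ShellFields = {
  mode?: "local" | "remote";
  connectionId?: string;
  serverUrl?: string;
  canOpenConnectionManager?: boolean;
};

type ShellHostContext =
  | { kind: "browser" }
  | ({ kind: "desktop-shell" } & ShellFields)
  | ({ kind: "mobile-shell" } & ShellFields);

// Hypothetical snapshot of everything the bootstrap layer inspects.
interface LaunchInputs {
  bootstrappedGlobal?: ShellHostContext; // explicit global from shell handoff
  handoffParams?: Record<string, string | undefined>; // shell handoff query params
  hasFusionAPI: boolean; // whether window.fusionAPI is present
}

function resolveShellHost(inputs: LaunchInputs): ShellHostContext {
  // 1. An explicit bootstrapped global always wins.
  if (inputs.bootstrappedGlobal) return inputs.bootstrappedGlobal;

  // 2. Shell handoff query params come next.
  const params = inputs.handoffParams ?? {};
  const extras: ShellFields = params.serverUrl ? { serverUrl: params.serverUrl } : {};
  if (params.shellHost === "desktop-shell") return { kind: "desktop-shell", ...extras };
  if (params.shellHost === "mobile-shell") return { kind: "mobile-shell", ...extras };

  // 3. Desktop fallback when only the preload API is visible.
  if (inputs.hasFusionAPI) return { kind: "desktop-shell" };

  // 4. Plain browser / PWA.
  return { kind: "browser" };
}
```

Keeping the resolver free of direct `window` reads makes each priority level testable in isolation; the real bootstrap layer would gather the inputs once and strip shell-only params afterwards.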
React consumers read this through ShellHostProvider / useShellHostContext (packages/dashboard/app/context/ShellHostContext.tsx). Do not add ad-hoc host checks in components.
Dashboard chrome now resolves connection-management capabilities through packages/dashboard/app/shell-native.ts (getShellConnectionNativeResult) and renders status/actions via ShellConnectionStatus. Components should receive derived props from App-level wiring, not read window.fusionAPI/window.fusionShell directly.
Important distinction: NodeContext.isRemote indicates browsing a remote mesh node inside the current dashboard instance; shell host mode: "remote" indicates how native desktop/mobile launched into this dashboard server. These are separate axes and must not be conflated in UI or routing logic.
window.fusionShell bridge contract
Canonical dashboard-side types live in packages/dashboard/app/types/native-shell.d.ts.
Shared bridge methods used by dashboard/mobile/desktop flows:
- `getState()`
- `listProfiles()`
- `saveProfile(profile)`
- `deleteProfile(profileId)`
- `setActiveProfile(profileId)`
- `setDesktopMode(mode)`
- `startQrScan()`
- `openConnectionManager()`
- `subscribe(listener)`
Shared shell state contract (ShellConnectionState):
- `host` (`"web" | "mobile-shell" | "desktop-shell"`)
- `desktopMode` (`"local" | "remote"`, optional)
- `activeProfileId`
- `profiles`
- `localServer` (`status`, optional `port`, optional `error`)
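As a rough illustration of how a consumer might read this state, the sketch below models the listed fields as a TypeScript interface. The canonical types live in `native-shell.d.ts`; here the `ConnectionProfile` shape, the nullability of `activeProfileId`, and the `status` string type are assumptions, and both helper functions are hypothetical.

```typescript
// Sketch only: the profile shape, nullability, and status type are assumptions.
interface ConnectionProfile {
  id: string;
  label: string;
  serverUrl: string;
}

interface ShellConnectionState {
  host: "web" | "mobile-shell" | "desktop-shell";
  desktopMode?: "local" | "remote";
  activeProfileId: string | null;
  profiles: ConnectionProfile[];
  localServer: { status: string; port?: number; error?: string };
}

// Resolve the active profile, tolerating a missing or dangling id.
function getActiveProfile(state: ShellConnectionState): ConnectionProfile | null {
  if (state.activeProfileId === null) return null;
  return state.profiles.find((p) => p.id === state.activeProfileId) ?? null;
}

// Hypothetical one-line summary suitable for a status surface.
// Only label/host metadata is used; no tokens are read or shown.
function describeConnection(state: ShellConnectionState): string {
  if (state.host === "web") return "browser (no native shell)";
  if (state.host === "desktop-shell" && state.desktopMode === "local") {
    const port = state.localServer.port;
    return port === undefined
      ? "desktop local server"
      : `desktop local server on port ${port}`;
  }
  const profile = getActiveProfile(state);
  return profile ? `${state.host} -> ${profile.label}` : `${state.host} (no active profile)`;
}
```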
Desktop-specific bootstrap extension:
- Electron preload also exposes `getDesktopModeState()` for first-run desktop mode selection (`{ isFirstRun, desktopMode }`).
- Electron preload exposes `window.fusionAPI.openConnectionManager()` as the renderer-safe desktop entry point for opening native connection management.
- The dashboard itself does not depend on that preload-only helper for steady-state rendering; it consumes shared shell state via `ShellContext` (`packages/dashboard/app/context/ShellContext.tsx`).
Persistence ownership by host:
- Mobile shell persists connection profiles + active profile with Capacitor Preferences (`packages/mobile/src/plugins/connection-profiles.ts`).
- Desktop shell persists shell settings in app-owned JSON at `app.getPath("userData")/shell-connections.json` (`packages/desktop/src/shell-settings.ts`).
These are shell-owned persistence layers, intentionally separate from Fusion project/global settings.
Shell contract regression matrix (FN-3409)
Cross-package automated tests now lock:
- Mobile shell: first-run remote onboarding inputs (QR/manual + optional token), saved-profile edit/switch, and restore-on-reinit persistence.
- Desktop shell: first-run/last-used mode restore, local-vs-remote startup behavior, and preload bridge channel compatibility for connection management.
- Dashboard shell awareness: canonical per-viewport connection-manager entry placement, browser-safe fallback (no shell-only controls), and host-context/native-helper resolution without ad-hoc window bridge access.
- Sensitive data handling: dashboard-facing native status surfaces expose profile label/origin metadata only; auth tokens are not surfaced.
High-level runtime diagram

                           ┌──────────────────────────────┐
                           │   Human + AI Interactions    │
                           │   (Dashboard SPA, CLI, Pi)   │
                           └──────────────┬───────────────┘
                                          │
                  ┌───────────────────────┼───────────────────────┐
                  │                       │                       │
        ┌─────────▼──────────┐  ┌─────────▼──────────┐  ┌─────────▼──────────┐
        │ Dashboard (API)    │  │ CLI `fn` router    │  │ Pi extension tools │
        │ + React SPA        │  │ + TUI component    │  │ (extension.ts)     │
        │ (lazy-loaded)      │  │ (commands/*)       │  │                    │
        └─────────┬──────────┘  └─────────┬──────────┘  └─────────┬──────────┘
                  └───────────┬───────────┴───────────┬───────────┘
                              │                       │
                    ┌─────────▼───────────────────────▼─────────┐
                    │              Engine Runtime               │
                    │ Scheduler / Planning / Executor / Merger  │
                    │   Heartbeat / Self-healing / Autopilot    │
                    └─────────┬───────────────────────┬─────────┘
                              │                       │
                  ┌───────────▼──────────┐  ┌─────────▼─────────────┐
                  │ @fusion/core         │  │ External systems      │
                  │ stores + types       │  │ git, GitHub, models   │
                  └───────────┬──────────┘  └───────────────────────┘
                              │
            ┌─────────────────▼─────────────────┐
            │ Persistence                       │
            │ - .fusion/fusion.db (SQLite/WAL)  │
            │ - .fusion/tasks/* (PROMPT/logs)   │
            │ - ~/.fusion/fusion-central.db     │
            └───────────────────────────────────┘

2) Monorepo Structure
| Package | Published | Role | Key files |
|---|---|---|---|
| `@fusion/core` | Private | Domain model, stores, SQLite adapters, settings, shared types | `packages/core/src/types.ts`, `store.ts`, `db.ts`, `central-core.ts`, `agent-store.ts` |
| `@fusion/engine` | Private | AI orchestration runtime (planning, scheduler, executor, merger, recovery) | planning processor, `scheduler.ts`, `executor.ts`, `merger.ts`, `project-runtime.ts` |
| `@fusion/dashboard` | Private | Express API server + React app | `packages/dashboard/src/server.ts`, `routes.ts`, `sse.ts`, `websocket.ts`, `packages/dashboard/app/App.tsx` |
| `@runfusion/fusion` | Published | CLI binary (`fn`) + Pi extension | `packages/cli/src/bin.ts`, `commands/*`, `project-resolver.ts`, `extension.ts` |
| `@fusion/desktop` | Private | Electron shell around Fusion dashboard/client | `packages/desktop/src/main.ts`, `ipc.ts`, `preload.ts`, `scripts/build.ts` |
| `@fusion/mobile` | Private | Capacitor + PWA mobile packaging of dashboard assets | `packages/mobile/capacitor.config.ts`, `packages/mobile/src/*` |
| `@fusion/plugin-sdk` | Private | Plugin SDK for building Fusion extensions | `packages/plugin-sdk/src/*` |
3) Package Dependencies
Workspace dependency graph
A ──▶ B means A depends on B.
@fusion/engine ───────────────▶ @fusion/core
@fusion/dashboard ────────────▶ @fusion/core
@fusion/dashboard ────────────▶ @fusion/engine
@runfusion/fusion (CLI) ─────────▶ @fusion/core
@runfusion/fusion (CLI) ─────────▶ @fusion/engine
@runfusion/fusion (CLI) ─────────▶ @fusion/dashboard
@fusion/plugin-sdk (peerDep) ─▶ @fusion/core
@fusion/desktop: no workspace package dependencies
@fusion/mobile: no workspace package dependencies
Concrete references:
- `@fusion/engine` has a workspace dependency on `@fusion/core` (`packages/engine/package.json`)
- `@fusion/dashboard` has workspace dependencies on `@fusion/core` and `@fusion/engine` (`packages/dashboard/package.json`)
- `@runfusion/fusion` has workspace development dependencies on `@fusion/core`, `@fusion/engine`, and `@fusion/dashboard` for composition/build packaging (`packages/cli/package.json`)
- `@fusion/plugin-sdk` declares a peer dependency on `@fusion/core` (`packages/plugin-sdk/package.json`)
- `@fusion/desktop` embeds dashboard assets at build time via script (`packages/desktop/scripts/build.ts`) but does not declare workspace deps in `package.json`
- `@fusion/mobile` triggers dashboard build/sync via scripts (`packages/mobile/package.json`) but does not declare workspace deps in `package.json`
4) Core Package (@fusion/core)
Responsibility
@fusion/core is the shared domain and persistence layer.
Main components
- Types and constants: `packages/core/src/types.ts`
  - Columns: `COLUMNS`
  - Transition map: `VALID_TRANSITIONS`
  - Settings defaults: `DEFAULT_GLOBAL_SETTINGS`, `DEFAULT_PROJECT_SETTINGS`
  - Workflow types (`WorkflowStep`, `WorkflowStepPhase`, etc.)
- TaskStore: `packages/core/src/store.ts`
  - Main task CRUD + lifecycle store
  - Emits board events (`task:created`, `task:moved`, `task:updated`, ...)
  - Hybrid model: SQLite metadata + filesystem blobs under `.fusion/tasks/{id}`
- Database adapter: `packages/core/src/db.ts`
  - SQLite (`node:sqlite`) with WAL mode + foreign keys
  - JSON helpers: `toJson`, `toJsonNullable`, `fromJson`
  - Core schema tables include: `tasks`, `config`, `workflow_steps`, `activityLog`, `archivedTasks`, `automations`, `agents`, `agentHeartbeats`, approval tables (`approval_requests`, `approval_request_audit_events`), `task_documents`, `task_document_revisions`, mission hierarchy tables (`missions`, `milestones`, `slices`, `mission_features`, `mission_events`), goals table (`goals`), plugin/routine tables (`plugins`, `routines`), roadmap tables (`roadmaps`, `roadmap_milestones`, `roadmap_features`), insight tables (`project_insights`, `project_insight_runs`), research tables (`research_runs`, `research_exports`, `research_run_events`), eval tables (`eval_runs`, `eval_task_results`, `eval_run_events`), todo tables (`todo_lists`, `todo_items`), `__meta`
  - Migration-created tables include: `ai_sessions`, `messages`, `agentRatings`, `chat_sessions`, `chat_messages`, `runAuditEvents`, `mission_contract_assertions`, `mission_feature_assertions`, `mission_validator_runs`, `mission_validator_failures`, `mission_fix_feature_lineage`
  - `ai_sessions.status` lifecycle includes `draft` (pre-start planning session), then `generating`, `awaiting_input`, terminal `complete`/`error`
- Roadmap feature ownership: roadmap contracts, ordering/handoff helpers, persistence, routes, and dashboard UI live in `plugins/fusion-plugin-roadmap` (package `@fusion-plugin-examples/roadmap`, plugin id `fusion-plugin-roadmap`) rather than dashboard/core ownership.
- CentralCore: `packages/core/src/central-core.ts`
  - Global project registry, health, central activity feed, global concurrency
  - Backed by `packages/core/src/central-db.ts` (`~/.fusion/fusion-central.db`)
- Specialized stores:
  - `AgentStore` (`agent-store.ts`): filesystem-based agent metadata + heartbeat run history
  - `MissionStore` (`mission-store.ts`): mission/milestone/slice/feature hierarchy
  - `GoalStore` (`goal-store.ts`): strategic goal CRUD with server-enforced 5-active-goal cap
  - `AutomationStore` (`automation-store.ts`): scheduled jobs with global/project scope isolation
  - `MessageStore` (`message-store.ts`): SQLite-backed mailbox/inbox/outbox messaging
  - `ApprovalRequestStore` (`approval-request-store.ts`): durable approval request lifecycle + append-only audit events
  - `ChatStore` (`chat-store.ts`): session/message persistence for agent chat
  - `InsightStore` (`insight-store.ts`): project insight persistence + dedupe/run tracking
  - `ReflectionStore` (`reflection-store.ts`): agent reflection records and performance snapshots
  - `PluginStore` (`plugin-store.ts`): plugin registry/state/settings persistence
  - `RoutineStore` (`routine-store.ts`): recurring routine definitions and run history
  - `TodoStore` (`todo-store.ts`): project-scoped todo lists/items with completion, reorder, and composite list+items queries
  - `EvalStore` (`eval-store.ts`): eval run persistence, per-task eval results with durable snapshots, and append-only run event trails
Approval request system (ApprovalRequestStore)
Schema (migration 68 in db.ts) adds two tables:
- `approval_requests`
  - Identity/lifecycle: `id`, `status`, `requestedAt`, `decidedAt`, `completedAt`, `createdAt`, `updatedAt`
  - Requester snapshot: `requesterActorId`, `requesterActorType`, `requesterActorName`
  - Target action payload: `targetActionCategory`, `targetActionOperation`, `targetActionSummary`, `targetResourceType`, `targetResourceId`, `targetContext` (JSON text)
  - Optional runtime linkage: `taskId`, `runId`
  - Indexes: `idxApprovalRequestsStatusCreatedAt (status, createdAt)`, `idxApprovalRequestsRequesterCreatedAt (requesterActorId, createdAt)`, `idxApprovalRequestsTaskCreatedAt (taskId, createdAt)`
- `approval_request_audit_events`
  - `id`, `requestId`, `eventType`, actor snapshot (`actorId`, `actorType`, `actorName`), optional `note`, `createdAt`
  - `requestId` is a foreign key to `approval_requests(id)` with `ON DELETE CASCADE`
  - Index: `idxApprovalRequestAuditRequestCreatedAt (requestId, createdAt, id)`
Store API (`packages/core/src/approval-request-store.ts`):
- `create(input: ApprovalRequestCreateInput)`: inserts a `pending` request and appends a `created` audit event
- `get(id)`: returns one request or `null`
- `list(input?: ApprovalRequestListInput)`: filters by `status`, `requesterActorId`, `taskId`, `runId`; ordered `createdAt DESC, id DESC`; paginated by `limit`/`offset`
- `getPendingCountsByActor()`: single-pass SQL aggregate (`status='pending'` grouped by `requesterActorId`) used by `/api/agents` pending-approval counters without materializing full request rows
- `decide(requestId, status, input: ApprovalRequestDecisionInput)`: applies `pending -> approved|denied`, stamps `decidedAt`, appends `approved`/`denied` audit event
- `markCompleted(requestId, input: ApprovalRequestCompletionInput)`: applies `approved -> completed`, stamps `completedAt`, appends `completed` audit event
- `getAuditHistory(requestId)`: returns append-only audit rows ordered `createdAt ASC, rowid ASC`
Dashboard approval endpoints (`packages/dashboard/src/routes/register-approval-routes.ts`):
- `GET /api/approvals`
- `GET /api/approvals/:id`
- `POST /api/approvals/:id/decision`
Runtime flow: engine action gate creates/reuses request → pauses task/agent with `pauseReason="awaiting-approval"` → approver calls decision endpoint (`decision: approve|deny`) → request transitions (`pending→approved|denied`) → route resumes matching paused task/agent best-effort → next tool retry consumes `approved` exactly once (then `completed`) or returns structured denial.
Provisioning note: durable `fn_agent_create` / `fn_agent_delete` approvals use `agent_provisioning` policy handling on this same decision route; `fn_spawn_agent` stays under action-gate `task_agent_mutation` because spawned children are ephemeral runtime workers.
Lifecycle contract (types.ts isValidApprovalRequestTransition):
- Primary forward paths: `pending -> approved -> completed` and `pending -> denied`
- Direct `pending -> completed` and all transitions from `denied`/`completed` (except no-op self-transition) are rejected
- Same-state transitions (`from === to`) are treated as valid by the helper even though the intended lifecycle is forward-only
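The lifecycle rules above fit in a small lookup table. The following is a minimal sketch of a helper with the same observable behaviour as described; the table name and the status type alias are assumptions, and the real `isValidApprovalRequestTransition` in `types.ts` may be structured differently.

```typescript
type ApprovalRequestStatus = "pending" | "approved" | "denied" | "completed";

// Forward-only edges: pending -> approved -> completed, and pending -> denied.
const FORWARD_TRANSITIONS: Record<ApprovalRequestStatus, readonly ApprovalRequestStatus[]> = {
  pending: ["approved", "denied"],
  approved: ["completed"],
  denied: [],
  completed: [],
};

function isValidApprovalRequestTransition(
  from: ApprovalRequestStatus,
  to: ApprovalRequestStatus,
): boolean {
  // Same-state transitions are accepted as no-ops, as the contract notes.
  if (from === to) return true;
  return FORWARD_TRANSITIONS[from].indexOf(to) !== -1;
}
```

Callers that need strict forward-only behaviour would have to reject `from === to` themselves, since the helper treats it as valid.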
Secrets Store (SecretsStore)
SecretsStore (packages/core/src/secrets-store.ts) provides encrypted key-value secret persistence for tasks/agents (FN-4791). It is designed so plaintext values are only available at explicit reveal time and are never persisted or logged in plaintext.
Scope model:
- `project` scope stores rows in `secrets` inside `.fusion/fusion.db` (project database, FN-4788).
- `global` scope stores rows in `secrets_global` inside `~/.fusion/fusion-central.db` (central database, FN-4788).
Encryption model:
- Uses `createSecretCipher` from `packages/core/src/secrets-crypto.ts` (FN-4790).
- Cipher is AES-256-GCM with a fresh random nonce per row encryption.
- Key material comes from a `MasterKeyProvider`; resolver flow prefers OS keychain and falls back to `~/.fusion/master.key` when keychain storage is unavailable (FN-4789).
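For readers unfamiliar with the pattern, AES-256-GCM with a fresh nonce per encryption looks roughly like this in Node's built-in `node:crypto`. This is a generic sketch, not the `createSecretCipher` implementation: the `EncryptedRow` layout, the 12-byte nonce length, and the function names are assumptions.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical row layout; the real column names are not shown in this document.
interface EncryptedRow {
  nonce: Buffer;
  ciphertext: Buffer;
  authTag: Buffer;
}

// Encrypt with a new random 12-byte nonce on every call (key must be 32 bytes).
function encryptSecret(key: Buffer, plaintext: string): EncryptedRow {
  const nonce = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, nonce);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { nonce, ciphertext, authTag: cipher.getAuthTag() };
}

// Decrypt and authenticate; throws if the ciphertext, nonce, or tag was altered.
function decryptSecret(key: Buffer, row: EncryptedRow): string {
  const decipher = createDecipheriv("aes-256-gcm", key, row.nonce);
  decipher.setAuthTag(row.authTag);
  const plaintext = Buffer.concat([decipher.update(row.ciphertext), decipher.final()]);
  return plaintext.toString("utf8");
}
```

Because the nonce is regenerated on each call, re-encrypting the same plaintext yields a different row, which is also why value updates rotate the nonce.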
Per-secret policy/metadata:
- `SecretAccessPolicy` is a per-row union: `"auto" | "prompt" | "deny"`.
  - `auto`: policy layer allows direct reads for trusted callers.
  - `prompt`: reads are expected to be approval-gated through the approvals flow.
  - `deny`: programmatic reveal is disallowed by policy.
- Environment materialization metadata is stored on each secret:
  - `envExportable: boolean`
  - `envExportKey: string | null`
  - Engine worktree acquisition now materializes managed env files when `ProjectSettings.secretsEnv.enabled` is true (see `packages/engine/src/worktree-acquisition.ts:345-483` and `packages/engine/src/secrets-env-writer.ts`).
- Read provenance is captured on reveal via `lastReadAt` and `lastReadBy`.
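A reader may find it helpful to see the three policies as a decision function. The sketch below is hypothetical: `resolvePolicy` stands in for `resolveSecretAccessPolicy` (whose real signature is not shown here), the fallback to a global default for rows without an explicit policy is an assumption, and so is routing untrusted `auto` reads to approval.

```typescript
type SecretAccessPolicy = "auto" | "prompt" | "deny";
type RevealDecision = "allow" | "require-approval" | "reject";

// Assumption: a row without an explicit policy inherits the global default.
function resolvePolicy(
  rowPolicy: SecretAccessPolicy | null | undefined,
  globalDefault: SecretAccessPolicy,
): SecretAccessPolicy {
  return rowPolicy ?? globalDefault;
}

// Map the effective policy to what a reveal request should do next.
function decideReveal(policy: SecretAccessPolicy, callerTrusted: boolean): RevealDecision {
  switch (policy) {
    case "auto":
      // Assumption: untrusted callers fall back to the approval flow.
      return callerTrusted ? "allow" : "require-approval";
    case "prompt":
      return "require-approval";
    case "deny":
      return "reject";
  }
}
```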
Error contract:
- `SecretsStoreError` with `code` in `"duplicate-key" | "not-found" | "invalid-policy" | "invalid-key" | "decrypt-failed"`.
Public API surface:
- `listSecrets(scope?: SecretScope): SecretRecord[]`
- `getSecretMetadata(id, scope): SecretRecord | null`
- `createSecret({ scope, key, plaintextValue, description?, accessPolicy?, envExportable?, envExportKey? }): Promise<SecretRecord>`
- `updateSecret(id, scope, patch): Promise<SecretRecord>` (`plaintextValue` updates re-encrypt and rotate nonce)
- `deleteSecret(id, scope): void`
- `revealSecret(id, scope, { agentId?, userId? }): Promise<{ key, plaintextValue }>` (the only decrypting method; updates read provenance)
Settings boundary:
- Global default policy: `GlobalSettings.secretsAccessPolicy` (used by `resolveSecretAccessPolicy`).
- Project-level secrets settings: `ProjectSettings.secretsEnv`. Cross-node sync passphrase state surfaces read-only via `GlobalSettings.secretsSyncPassphraseConfigured` (derived from `hasSyncPassphraseConfigured(secretsStore)` against the reserved `__sync_passphrase__` row in `secrets_global`).
- Agent secret reads are exposed via `fn_secret_get` (`packages/cli/src/extension.ts:1542-1629`).
- Cross-node sync routes ship at `/api/nodes/:id/secrets/push`, `/api/nodes/:id/secrets/pull`, `/api/secrets/sync-receive`, `/api/secrets/sync-export` with inbound Bearer apiKey validation (`packages/dashboard/src/routes/register-secrets-sync-inbound-routes.ts:99-114`, `:181-196`).
Mesh state read path for dashboard topology
- `GET /api/mesh/state` in `packages/dashboard/src/routes/register-mesh-routes.ts` is the authoritative dashboard/API read path for topology.
- Default behavior aggregates a deduped cluster snapshot from the local node plus reachable peers (`includeRemote !== false`) while preserving node-local last-known entries when peers are unreachable.
- `includeRemote=false` is the non-recursive local-only path used for peer fan-out, so cross-node aggregation never recursively calls remote aggregated endpoints.
- Route registration reuses the shared `options?.centralCore` instance when available instead of creating per-request `CentralCore` instances, preserving shared mesh state continuity.
- Nodes UI topology consumes a dedicated `useMeshState` hook that unwraps the `/api/mesh/state` snapshot into `NodeMeshState[]`; `MeshTopology` renders peer relationships directly from each node's `knownPeers` (including remote↔remote links) without fabricating local-star fallback edges.
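The aggregation step can be pictured as a merge over per-peer results, where each result would come from asking that peer for its local-only view (`includeRemote=false`). The sketch below covers only the merge. The `NodeMeshState` fields, the `updatedAt` freshness marker, and the "newest entry wins" dedupe rule are assumptions for illustration; the real route may dedupe differently.

```typescript
// Assumed entry shape; the real NodeMeshState carries more fields.
interface NodeMeshState {
  nodeId: string;
  knownPeers: string[];
  updatedAt: number; // hypothetical freshness marker (epoch ms)
}

// `null` marks a peer that could not be reached during fan-out.
type PeerResult = NodeMeshState[] | null;

function mergeMeshSnapshots(
  localEntries: NodeMeshState[],
  peerResults: PeerResult[],
): NodeMeshState[] {
  const byId = new Map<string, NodeMeshState>();

  // Seed with node-local last-known entries so unreachable peers stay visible.
  for (const entry of localEntries) byId.set(entry.nodeId, entry);

  for (const result of peerResults) {
    if (result === null) continue; // unreachable peer: keep last-known state
    for (const entry of result) {
      const existing = byId.get(entry.nodeId);
      // Dedupe by node id, keeping the fresher entry.
      if (!existing || entry.updatedAt > existing.updatedAt) byId.set(entry.nodeId, entry);
    }
  }
  return Array.from(byId.values());
}
```

Because peers are only ever asked for local state, the merge never receives an already-aggregated snapshot, which is what keeps the fan-out non-recursive.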
Shared mesh-state snapshot helpers
packages/core/src/shared-mesh-state.ts defines a common snapshot envelope for non-task mesh state export/apply:
- Envelope fields: `version`, `exportedAt`, `checksum`, `payload`
- Checksum rule: `sha256(JSON.stringify(payloadWithoutChecksum))`
- Payload families:
  - `TaskMetadataSnapshot` (`tasks` structured metadata only)
  - `MissionHierarchySnapshot` (`missions`, `milestones`, `slices`, `features`, `missionEvents`, `assertions`, `featureAssertionLinks`)
  - `AgentSnapshot` (`agents`, `blockedStates`)
  - `AgentRunSnapshot` (`runs`)
  - `ActivityLogSnapshot` (`entries`)
  - `RunAuditSnapshot` (`entries`)
  - `ProjectSettingsSnapshot` (`global`, `projects`)
  - `AuthMaterialSnapshot` (`providerAuth`, with API-key and OAuth credential shapes)
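The checksum rule can be illustrated with Node's built-in hashing. This is a sketch under stated assumptions: `payloadWithoutChecksum` is read here as the envelope minus its `checksum` field, the digest is hex-encoded, and `sealSnapshot`/`verifySnapshot` are hypothetical helper names rather than the exports of `shared-mesh-state.ts`.

```typescript
import { createHash } from "node:crypto";

interface Unsealed<T> {
  version: number;
  exportedAt: string;
  payload: T;
}

interface SnapshotEnvelope<T> extends Unsealed<T> {
  checksum: string;
}

// sha256 over the JSON form of everything except the checksum itself.
// Key order is fixed explicitly because JSON.stringify is order-sensitive.
function computeChecksum<T>(fields: Unsealed<T>): string {
  const canonical = {
    version: fields.version,
    exportedAt: fields.exportedAt,
    payload: fields.payload,
  };
  return createHash("sha256").update(JSON.stringify(canonical)).digest("hex");
}

function sealSnapshot<T>(
  version: number,
  payload: T,
  exportedAt: string = new Date().toISOString(),
): SnapshotEnvelope<T> {
  return { version, exportedAt, payload, checksum: computeChecksum({ version, exportedAt, payload }) };
}

// Recompute and compare; any change to version, timestamp, or payload fails.
function verifySnapshot<T>(envelope: SnapshotEnvelope<T>): boolean {
  return computeChecksum(envelope) === envelope.checksum;
}
```

Since `JSON.stringify` depends on property order, exporter and importer need to build payload objects with a stable key order for the checksum to match.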
Intentional exclusions from shared snapshots:
- Task/agent blob contents (`PROMPT.md`, task document bodies, attachment bytes, JSONL run logs)
- Instruction-bundle file contents
- Node-local runtime handles and paths (for example worktree/session-file handles)
Chat System
- `ChatStore` (`packages/core/src/chat-store.ts`) and `chat-types.ts` provide session-oriented chat state (`chat_sessions`, `chat_messages` tables)
- Dashboard chat UX lives in `packages/dashboard/app/components/ChatView.tsx` and hooks `useChat.ts` / `useQuickChat.ts`
- Main `useChat` session restore/recovery must not reset the active thread during session-list refresh or `chat:session:updated` metadata churn while a response is in flight.
- `chat_sessions.inFlightGeneration` stores a durable JSON snapshot while generation is active: latest streamed text/thinking, tool-call state, and `replayFromEventId` for SSE resume.
- `ChatManager.sendMessage()` updates that snapshot during streaming (debounced) and clears it on done/error/cancel so stale partial state does not survive completion.
- When the active session is still generating after reload/reconnect (`isGenerating: true`), `useChat`/`useQuickChat` hydrate the UI from `inFlightGeneration` immediately, seed the shared stream handlers with that same text/thinking/tool-call snapshot, then reconnect `/api/chat/sessions/:id/stream` with `Last-Event-ID = replayFromEventId` so newly replayed deltas append to the restored bubble instead of replacing it or re-appending already-known deltas.
- Hooks also auto-reattach if a stale cached session is selected and a later refresh (or session re-fetch) flips `isGenerating` to true with an `inFlightGeneration` snapshot; dedupe is guarded by a last-attached `(sessionId, replayFromEventId)` ref so snapshot checkpoint bumps do not open duplicate SSE streams.
- Attach-triggered message loads may commit the persisted transcript when they match the last attached generation even if React has not yet settled the active-session state/ref. Cache misses during that attach path must preserve the already visible thread so prior user/assistant messages remain visible beside the live streaming assistant response.
- Chat message submission uses SSE streaming responses from dashboard chat routes.
- Direct-chat terminal failures now persist as a distinct assistant message with `metadata.failureInfo` (`summary`, optional `errorClass`, optional `code`, optional `detail`, optional reference metadata) so the chat thread remains the durable primary failure surface after reload/reconnect.
- `ChatManager.sendMessage()` preserves any interrupted partial assistant output as its own message, then appends a separate persisted failure bubble instead of overwriting the partial reply.
- Main-chat optimistic user sends are reconciled against persisted SSE user echoes by content + temp-id replacement, so one user send cannot survive as a duplicate history entry after stream completion.
- `useChat.loadMessages()`/session restore map persisted `metadata.failureInfo` back into `ChatMessageInfo.failureInfo`, and live stream failures append the same assistant-style bubble client-side unless the error is classified as a tab-suspension false positive.
- `ChatView` renders failure bubbles inline with shared error-surface tokens; mailbox references deep-link into the mailbox view, while other failure references keep an inline "View failure details" affordance so reload/reconnect does not strand users in agent logs.
- `ChatView` renders those failure bubbles with inline assistant attribution even for model-only `__fn_agent__` chats, so provider/model failures still read as a response from the active model instead of an anonymous system alert.
- `streamChatResponse()` must flush trailing buffered SSE data on EOF even without a final newline, so terminal `done`/`error` events are not dropped at chunk boundaries.
- Chat generation ownership is isolated by `generationId` (`ChatManager.beginGeneration` + `ChatStreamManager` subscription filters + route preallocation), preventing stale generation terminal events from leaking into a newer active request.
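The EOF-flush requirement is easiest to see in a small parser. The sketch below is a simplified stand-in for `streamChatResponse()`, not its actual code: it takes already-decoded string chunks, handles only `\n` line endings, and the `SseEvent` shape and function names are assumptions.

```typescript
interface SseEvent {
  event: string;
  data: string;
  id?: string;
}

// Parse one blank-line-delimited block; returns null when it carries no data.
function parseBlock(block: string): SseEvent | null {
  let event = "message";
  let id: string | undefined;
  const data: string[] = [];
  for (const line of block.split("\n")) {
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // skip comments (":...") and lines without a field name
    const field = line.slice(0, colon);
    const raw = line.slice(colon + 1);
    const value = raw.charAt(0) === " " ? raw.slice(1) : raw; // strip one leading space
    if (field === "event") event = value;
    else if (field === "data") data.push(value);
    else if (field === "id") id = value;
  }
  if (data.length === 0) return null;
  const parsed: SseEvent = { event, data: data.join("\n") };
  if (id !== undefined) parsed.id = id;
  return parsed;
}

function parseSseChunks(chunks: readonly string[]): SseEvent[] {
  const events: SseEvent[] = [];
  let buffer = "";
  for (const chunk of chunks) {
    buffer += chunk;
    let boundary = buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const parsed = parseBlock(buffer.slice(0, boundary));
      if (parsed) events.push(parsed);
      buffer = buffer.slice(boundary + 2);
      boundary = buffer.indexOf("\n\n");
    }
  }
  // EOF: flush trailing data even though no blank line terminated it.
  if (buffer.trim() !== "") {
    const parsed = parseBlock(buffer);
    if (parsed) events.push(parsed);
  }
  return events;
}
```

Without the final flush, a terminal `done` event that arrives as the last bytes of the stream with no trailing blank line would sit in `buffer` and never reach the handlers.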
Chat Rooms (Dashboard)
- The Rooms tab in `packages/dashboard/app/components/ChatView.tsx` is wired through `useChatRooms` (`packages/dashboard/app/hooks/useChatRooms.ts`).
- `useChatRooms` owns room list fetch/sort, active-room selection, member+message hydration, room creation/deletion, and room message sends.
- The hook subscribes to `/api/events` and consumes `chat:room:created`, `chat:room:updated`, `chat:room:deleted`, `chat:room:member:added`, `chat:room:member:removed`, `chat:room:message:added`, `chat:room:message:updated`, and `chat:room:message:deleted` to keep UI state in sync.
- Room messages persist through `POST /api/chat/rooms/:id/messages`; the route persists the user message first, then calls `ChatManager.sendRoomMessage(...)` to orchestrate room-member responders and persist assistant room replies with `chatStore.addRoomMessage(...)` (including `senderAgentId` for each responder).
- `sendRoomMessage(...)` uses existing room-member + mention resolution rules: mentioned members are direct responders, non-mentioned members are ambient responders (capped by `ROOM_AMBIENT_MAX_RESPONDERS`), and non-member mentions are handled explicitly by the manager instead of silently disappearing.
- Room responder prompt context is compacted deterministically: the newest 12 room messages stay verbatim, while older fetched history is summarized into a structured header (span, participants, and ranked highlights) before prompt size caps are enforced.
- Room-reply generation is now non-silent on failure: if a room has members but no active responders can be resolved, or all responder generations fail/return empty output, `sendRoomMessage(...)` throws `RoomReplyGenerationError` and the route surfaces HTTP 502 instead of returning a silent user-only success.
- `useChatRooms.sendRoomMessage()` now follows direct-chat style optimistic UX: append a temporary local user room message before `POST /api/chat/rooms/:id/messages`, reconcile that temp entry to the persisted user message on success, then refresh authoritative transcript state while continuing `chat:room:message:*` live SSE updates.
- On failures, `useChatRooms.sendRoomMessage()` performs state reconciliation (rollback temp entry or replace with persisted transcript when POST partially succeeded) and rethrows; `ChatView` clears the composer immediately when dispatching a room send, restores the exact prior text only if the send rejects, and owns the single user-facing error toast.
- Mention UI in rooms keeps direct-chat behavior unchanged while adding room affordances:
  - `AgentMentionPopup` receives room membership context and shows members first with a `status-dot` member indicator (`aria-label="Room member"`).
  - With an empty mention filter in room mode, only room members are listed; a hint row prompts the user to type to search non-members.
  - Mention chips rendered in room messages (`ChatView` and `QuickChatFAB`) mark non-members via `chat-mention-chip--non-member`, including `title`/`aria-label` text (`Not a member of {roomName}`) and muted warning-token styling.
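The responder resolution rules described for `sendRoomMessage(...)` can be sketched as a pure planning step. Everything here is illustrative: the function and type names are hypothetical, the value of `ROOM_AMBIENT_MAX_RESPONDERS` is not stated in this document (2 is a placeholder), and picking ambient responders in member order is an assumption.

```typescript
// Placeholder value; the real constant's value is not given in this document.
const ROOM_AMBIENT_MAX_RESPONDERS = 2;

interface ResponderPlan {
  direct: string[];            // mentioned room members
  ambient: string[];           // non-mentioned members, capped
  nonMemberMentions: string[]; // mentions that need explicit handling
}

function planRoomResponders(
  memberIds: readonly string[],
  mentionedIds: readonly string[],
  ambientCap: number = ROOM_AMBIENT_MAX_RESPONDERS,
): ResponderPlan {
  const members = new Set(memberIds);
  const mentioned = new Set(mentionedIds);

  // Mentioned members always respond directly.
  const direct = memberIds.filter((id) => mentioned.has(id));

  // Remaining members respond ambiently, up to the cap (member order is an assumption).
  const ambient = memberIds.filter((id) => !mentioned.has(id)).slice(0, ambientCap);

  // Mentions of agents outside the room are surfaced instead of dropped.
  const nonMemberMentions = mentionedIds.filter((id) => !members.has(id));

  return { direct, ambient, nonMemberMentions };
}
```

A caller could treat an empty `direct` plus empty `ambient` plan for a room that has members as the "no active responders" case that the route reports as an error.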
Agent Companies
- Import/export utilities: `agent-companies-parser.ts`, `agent-companies-exporter.ts`, `agent-companies-types.ts`
- Supports YAML-frontmatter manifests for company/team/agent/project/task/skill definitions
- Includes conversion helpers from parsed manifests to `AgentCreateInput` and export helpers for directory bundles
Project Insights
- `InsightStore` (`insight-store.ts`, `insight-types.ts`) persists extracted project learnings
- Uses fingerprint-based deduplication and run tracking
- Run lifecycle is hardened through `insight-run-executor.ts` + `InsightStore` transition guards:
  - single active run per `projectId + trigger` (`pending|running` conflict)
  - terminal-state immutability for run rows
  - persisted failure classification (`cancelled`, `timed_out`, `retryable_transient`, `non_retryable`) and retry lineage metadata
  - append-only durable event trail in `project_insight_run_events`
- Dashboard routes (`insights-routes.ts`) consume the core executor/store APIs for run start, cancel, retry, and event inspection (`/api/insights/runs/:id/events`)
- `POST /api/insights/run` preserves the single-active-run guarantee while adding orphan recovery for stale `pending|running` rows:
  - a conflicting active row is only auto-recovered when there is no in-memory controller ownership (`activeRunControllers` has no entry) and run age (`startedAt ?? createdAt`) exceeds the grace window (`ORPHAN_GRACE_MS = 30_000`)
  - recovered rows are durably marked `failed` with lifecycle terminal metadata (`terminalReason=failed`, `terminalCause=orphaned_active_run_recovered`, `failureClass=non_retryable`, `retryable=false`) and warning/status events appended to `project_insight_run_events`
  - true live conflicts continue returning HTTP 409 with structured payload details `{ code: "ACTIVE_RUN_CONFLICT", activeRunId, activeRunStatus, trigger }` so the dashboard can hydrate and display the existing active run instead of surfacing a raw backend exception
- `POST /api/insights/:id/create-task` remains a draft-payload endpoint (returns `suggestedTitle`/`suggestedDescription`); the dashboard `InsightsView` now uses that payload to create a real task through the normal app task-creation path (`column: triage`, `sourceType: dashboard_ui`, source metadata indicating insights origin)
- Backed by `project_insights`, `project_insight_runs`, and `project_insight_run_events`
- Architecture invariant: stale `pending`/`running` insight runs auto-recover at dashboard startup and on periodic/drive-by sweeps; active-row conflicts must be evaluated by age plus live `activeRunControllers` ownership instead of assuming all active rows block forever.
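The orphan-recovery condition reduces to a two-part check: no live controller, and age beyond the grace window. A minimal sketch follows. The row shape, the epoch-millisecond timestamps, and the function name are assumptions; only `ORPHAN_GRACE_MS`, `activeRunControllers`, and the `startedAt ?? createdAt` age rule come from the description above.

```typescript
const ORPHAN_GRACE_MS = 30_000;

// Assumed row shape with epoch-millisecond timestamps.
interface InsightRunRow {
  id: string;
  status: "pending" | "running" | "completed" | "failed";
  createdAt: number;
  startedAt?: number;
}

function shouldRecoverOrphan(
  run: InsightRunRow,
  activeRunControllers: ReadonlyMap<string, unknown>,
  now: number,
): boolean {
  // Only active rows can be orphans.
  if (run.status !== "pending" && run.status !== "running") return false;

  // A live in-memory controller means the run is genuinely in progress.
  if (activeRunControllers.has(run.id)) return false;

  // Age is measured from startedAt when present, else createdAt.
  const age = now - (run.startedAt ?? run.createdAt);
  return age > ORPHAN_GRACE_MS;
}
```

When this returns `false` for an active row, the request is a true live conflict and would receive the structured HTTP 409 response instead.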
Research Runs
- `ResearchStore` (`research-store.ts`, `research-types.ts`, `research-settings.ts`) persists bounded research runs, sources/events, exports, lifecycle metadata, and retry/cancel state transitions.
- Backed by `research_runs`, `research_exports`, and `research_run_events`.
- Engine orchestration is implemented in `packages/engine/src/research-orchestrator.ts` + `research-step-runner.ts`.
- Dashboard/API surface is implemented under `/api/research` (`packages/dashboard/src/research-routes.ts`) with `ResearchView.tsx` in the app.
- CLI surface is implemented in `packages/cli/src/commands/research.ts` with six subcommands (`create`, `list`, `show`, `export`, `cancel`, `retry`).
- Agent tool surface is exposed via `packages/cli/src/extension.ts` (`fn_research_run`, `fn_research_list`, `fn_research_get`, `fn_research_cancel`, `fn_research_retry`).
- Boundary contract (FN-3292):
  - `ResearchStore` owns persistence and lifecycle writes (status transitions, lifecycle event log rows, sources/results snapshots).
  - `ResearchStepRunner` owns provider I/O concerns only (provider selection, timeout/abort/provider-error classification, synthesis call execution); it does not read/write run state.
  - `ResearchOrchestrator` owns sequencing and failure policy (phase progression, provider fallback, partial-step continuation, terminal status choice) and interacts with store only through public store methods.
  - Provider substitution must remain data-driven: source metadata can carry provider identity, and fetching should resolve providers per source rather than relying on provider ordering.
- Boundary note: research and insights are parallel subsystems sharing host infrastructure, not one table/store family.
Task Evaluations
- `EvalStore` (`eval-store.ts`, `eval-types.ts`) persists eval runs and task-level eval outcomes.
- Dashboard/API surface is implemented under `/api/evals` (`packages/dashboard/src/evals-routes.ts`) with `EvalsView.tsx` in the app.
- Backed by `eval_runs`, `eval_task_results`, and `eval_run_events`.
- Data model stores structured scoring/evidence/signal payloads plus durable `taskSnapshot` metadata so historical eval results remain readable even if the live task row later changes or is removed.
- Lifecycle safeguards mirror other core stores: deterministic list ordering, transition guards, terminal immutability for run rows, and active-run conflict protection for scheduled/task-completion triggers.
- `eval_task_results` enforces one row per `(runId, taskId)` via a unique index; store writes use upsert semantics to keep reruns idempotent.
- Canonical scoring contract is documented in `docs/evals.md`; authoritative score computation is centralized in `packages/core/src/eval-scoring.ts`.
Scoring authority boundary:
- Authoritative fields: `categoryScores[].finalScore`, `categoryScores[].band`, `categoryScores[].weight`, and `overallScore` (derived by `computeOverallScore`).
- Advisory/model-authored fields: category `aiScore`, category `rationale`, category `evidence`, and `overallRationale` text.
- Evaluator code (`packages/engine/src/evaluator.ts`) may provide AI category inputs, but must route final score computation through core helpers (`normalizeCategoryScore`, `computeOverallScore`) and must not persist AI-provided overall numbers as source of truth.
Hybrid evaluator pipeline (FN-3389/FN-3391):
- Batch selection: runScheduledEvalBatch in core computes a deterministic completed-task window (windowStartExclusive → windowEndInclusive) from the last completed scheduled run.
- Signal summary: collectDeterministicSignals (eval-signal-collector.ts) normalizes timing/workflow/review/log/commit summaries with stable fallbacks for missing metadata.
- Evidence harvesting: collectTaskEvaluationEvidence (packages/engine/src/evaluator-evidence.ts) reads existing task-store/git surfaces (workflowStepResults, documents, task activity log, agent logs, run-audit events, merge/PR metadata) and emits a bounded TaskEvaluationEvidenceBundle with fixed source-group ordering.
- AI review: HybridEvaluatorService (packages/engine/src/evaluator.ts) injects deterministic signals plus a dedicated ## Evidence bundle section into a strict JSON prompt, runs a read-only AI session, validates the JSON payload, and merges AI advisory fields into persisted eval output while preserving core score authority.
- Follow-up policy engine: packages/engine/src/eval-followups.ts normalizes raw evaluator drafts into canonical follow-up suggestions, applies deterministic suppression/dedupe rules, and (policy permitting) materializes triage tasks through TaskStore.createTask() with source provenance back to the parent task and eval run/suggestion IDs.
- Persistence boundary: eval rows persist normalized evidence refs plus bounded excerpts/IDs (not full raw logs or unbounded command output) and structured follow-up lifecycle state (suggested/suppressed/created) including suppression reason or created task linkage. Source drill-down stays in original task/agent/run-audit stores and git history.
- Model resolution (temporary): evaluator model selection first uses an explicit run override pair (provider + modelId together only), then falls back to the existing validator lane (resolveValidatorSettingsModel) until FN-3393 introduces dedicated evaluator settings.
- Scheduled execution wiring: CronRunner intercepts the sentinel command fn eval --scheduled-batch and executes in-process, invoking runScheduledEvalBatch with HybridEvaluatorService; ProjectEngine syncs scheduled eval automation on startup and on relevant settings changes.
Plugin System
- PluginStore (plugin-store.ts) is a facade over two persistence scopes:
  - Global install metadata in central DB table plugin_installs (~/.fusion/fusion-central.db) including manifest/path/settings/schema/dependencies
  - Per-project runtime state in central DB table project_plugin_states keyed by normalized project path (enabled, state, error)
- Legacy project-local plugins rows in .fusion/fusion.db are migrated lazily on plugin-store init/read; migration is idempotent and keeps newest updatedAt install metadata as global canonical data while preserving per-project enablement rows
- Post-FN-3722, the project-local plugins table is legacy read-only migration input; any new install writer targeting it is a bug
- TaskStore.getPluginStore() now propagates the configured globalSettingsDir/central directory so all CLI and dashboard install paths resolve the same central DB
- PluginLoader (plugin-loader.ts) loads/unloads plugin modules using the effective per-project plugin state
- Plugin contributions now include both embedded uiSlots and top-level dashboardViews
- Executor runtime contributions can be provided via executorRuntimeEnv(taskCtx, ctx); see the canonical plugin-authoring contract in docs/PLUGIN_AUTHORING.md §4 "executorRuntimeEnv: task-scoped executor subprocess environment". The engine applies these task-scoped overlays only to executor-spawned user commands, never to git plumbing subprocesses.
- Discovery endpoints:
  - GET /api/plugins/ui-slots
  - GET /api/plugins/dashboard-views
- Dashboard management routes are implemented in packages/dashboard/src/plugin-routes.ts
Prompt Overrides
- prompt-overrides.ts defines prompt key catalogs and per-role override validation
- Provides override resolution/validation helpers (resolvePrompt, resolveRolePrompts, assertValidPromptOverrideMap)
Plugin Prompt Contributions
- Plugin prompt contributions are filtered per surface through PluginRunner.getPromptContributionsForSurface(surface).
- Prompt assembly uses buildPluginPromptSection(surface, pluginRunner) in packages/engine/src/agent-instructions.ts.
- Supported prompt surfaces:
  - executor-system
  - executor-task
  - triage
  - reviewer
  - heartbeat
- Integration points append the built plugin section to the role-specific system/task prompt only when contributions exist, preserving existing prompts when no plugins contribute.
- Executor, heartbeat, and planning (triage) system prompts inject a shared goalContext dynamic layer via the canonical resolveAndEmitGoalContext(...) seam (which uses buildGoalContextSection(...)); when no active goals exist, no goal section is emitted.
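The "append only when contributions exist" rule can be sketched like this. The contribution shape, the section heading text, and the function names ending in `Sketch` are assumptions for illustration; only the surface names and `getPromptContributionsForSurface` come from the description above.

```typescript
type PromptSurface = "executor-system" | "executor-task" | "triage" | "reviewer" | "heartbeat";

// Hypothetical contribution shape.
interface PromptContribution {
  pluginId: string;
  text: string;
}

// Minimal slice of the runner that this sketch needs.
interface PromptContributionSource {
  getPromptContributionsForSurface(surface: PromptSurface): PromptContribution[];
}

// Returns an empty string when no plugin contributes to the surface.
function buildPluginPromptSectionSketch(surface: PromptSurface, runner: PromptContributionSource): string {
  const contributions = runner.getPromptContributionsForSurface(surface);
  if (contributions.length === 0) return "";
  const body = contributions.map((c) => `### ${c.pluginId}\n${c.text}`).join("\n\n");
  // The heading text is an assumption, not the real section title.
  return `## Plugin Instructions\n\n${body}`;
}

// Leaves the base prompt byte-for-byte unchanged when the section is empty.
function appendPluginSectionSketch(basePrompt: string, section: string): string {
  return section === "" ? basePrompt : `${basePrompt}\n\n${section}`;
}
```

Returning an empty section rather than an empty heading is what keeps existing prompts identical for projects with no contributing plugins.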
Agent Permissions
- agent-permissions.ts normalizes permissions and computes effective access state
- Core helpers: normalizePermissions, computeAccessState, ROLE_DEFAULT_PERMISSIONS
Standalone roadmap model
Fusion now has two planning models in core:
- Roadmap hierarchy: Roadmap → RoadmapMilestone → RoadmapFeature
- Mission hierarchy: Mission → Milestone → Slice → Feature → Task
The roadmap model is intentionally lightweight and independent from MissionStore/mission lifecycle semantics. It is meant for standalone planning, ordering, drag-and-drop moves, and future conversion flows into missions or tasks without coupling roadmap data to slice activation, autopilot, or mission status rollups.
Roadmap persistence (FN-1690/FN-1691):
- RoadmapStore provides CRUD operations with atomic reorder/move semantics
- All list queries use deterministic ordering: ORDER BY orderIndex ASC, createdAt ASC, id ASC
- Covering indexes ensure efficient ordered reads without temp B-tree sorts
- Cross-milestone feature moves atomically renumber both source and destination milestone scopes
- FK cascade integrity: deleting a roadmap removes milestones and features
- Export/handoff DTO methods for integration with downstream systems:
  - getRoadmapExport() → RoadmapExportBundle (flat export payload)
  - getMissionPlanningHandoff() → RoadmapMissionPlanningHandoff (mission conversion)
  - listFeatureTaskPlanningHandoffs() → RoadmapFeatureTaskPlanningHandoff[] (all features as task handoffs)
  - getRoadmapFeatureHandoff() → RoadmapFeatureTaskPlanningHandoff (single feature task handoff)
- Pure handoff mapping helpers in roadmap-handoff.ts for read-only transformations
Roadmap handoff contract boundary (FN-1674):
- Handoffs are read-only transformations — no mission/task records are created
- Source lineage is preserved on every emitted item (roadmapId, milestoneId, featureId, titles, order indices)
- Ordering is deterministic using normalizeRoadmapMilestoneOrder and normalizeRoadmapFeatureOrder
- Not-found semantics: store handoff methods throw when roadmapId is unknown; routes map to HTTP 404
- The combined handoff endpoint (GET /:roadmapId/handoff) returns both mission and task handoffs
Key roadmap invariants:
- milestone ordering is scoped to a single roadmap and must remain contiguous + 0-based
- feature ordering is scoped to a single milestone and must remain contiguous + 0-based
- repair/normalization uses deterministic tie-breakers: orderIndex ASC, createdAt ASC, id ASC
- cross-milestone feature moves must renumber both the source and destination milestone deterministically
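These invariants can be made concrete with a pure in-memory sketch of a feature move. `FeatureRow`, `renumber`, and `moveFeature` are hypothetical names; the real RoadmapStore performs the equivalent work atomically inside the database.

```typescript
// Hypothetical row shape for illustration.
interface FeatureRow {
  id: string;
  milestoneId: string;
  orderIndex: number;
  createdAt: number;
}

// Deterministic tie-breakers: orderIndex ASC, createdAt ASC, id ASC.
function compareOrder(a: FeatureRow, b: FeatureRow): number {
  if (a.orderIndex !== b.orderIndex) return a.orderIndex - b.orderIndex;
  if (a.createdAt !== b.createdAt) return a.createdAt - b.createdAt;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Repairs a scope so its indices are contiguous and 0-based.
function renumber(rows: FeatureRow[]): FeatureRow[] {
  return [...rows].sort(compareOrder).map((row, index) => ({ ...row, orderIndex: index }));
}

// Moves a feature and renumbers both the source and destination scopes.
function moveFeature(all: FeatureRow[], featureId: string, toMilestoneId: string, toIndex: number): FeatureRow[] {
  const moving = all.find((f) => f.id === featureId);
  if (moving === undefined) throw new Error(`Unknown feature: ${featureId}`);
  const fromMilestoneId = moving.milestoneId;
  const sameScope = fromMilestoneId === toMilestoneId;

  const source = renumber(all.filter((f) => f.milestoneId === fromMilestoneId && f.id !== featureId));
  const destBase = sameScope ? source : renumber(all.filter((f) => f.milestoneId === toMilestoneId));

  const clamped = Math.max(0, Math.min(toIndex, destBase.length));
  const dest = [...destBase];
  dest.splice(clamped, 0, { ...moving, milestoneId: toMilestoneId });
  const destination = dest.map((row, index) => ({ ...row, orderIndex: index }));

  const untouched = all.filter((f) => f.milestoneId !== fromMilestoneId && f.milestoneId !== toMilestoneId);
  return [...untouched, ...(sameScope ? [] : source), ...destination];
}
```

After a move, neither milestone has a gap or duplicate index, so the ordered list queries stay deterministic.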
Roadmap frontend API contract (plugin namespace):
- Canonical frontend namespace: /api/plugins/fusion-plugin-roadmap/roadmaps
- Roadmaps: GET /, POST /, GET /:roadmapId, PATCH /:roadmapId, DELETE /:roadmapId
- Milestones: GET /:roadmapId/milestones, POST /:roadmapId/milestones, PATCH /milestones/:milestoneId, DELETE /milestones/:milestoneId, POST /:roadmapId/milestones/reorder
- Features: GET /milestones/:milestoneId/features, POST /milestones/:milestoneId/features, PATCH /features/:featureId, DELETE /features/:featureId, POST /milestones/:milestoneId/features/reorder, POST /features/:featureId/move
- Export/Handoff: GET /:roadmapId/export, GET /:roadmapId/handoff, GET /:roadmapId/handoff/mission, GET /:roadmapId/milestones/:milestoneId/features/:featureId/handoff/task
- Canonical roadmap REST namespace is plugin-scoped (/api/plugins/fusion-plugin-roadmap/...), while dashboard maintains a temporary /api/roadmaps compatibility mount that delegates to plugin-owned handlers during migration.
Database schema:
- roadmaps: roadmap metadata (id, title, description, timestamps)
- roadmap_milestones: milestone data with roadmapId FK
- roadmap_features: feature data with milestoneId FK
- idxRoadmapMilestonesRoadmapOrder: covering index for deterministic milestone ordering
- idxRoadmapFeaturesMilestoneOrder: covering index for deterministic feature ordering
Shared utilities
From packages/core/src/index.ts exports (selected high-impact modules):
- Memory + knowledge: memory-backend.ts, memory-compaction.ts, memory-dreams.ts, project-memory.ts, memory-insights.ts, insight-store.ts, insight-types.ts
- Stores and plugin/routine helpers: chat-store.ts, routine-store.ts, plugin-store.ts, plugin-loader.ts, reflection-store.ts
- Execution/runtime helpers: run-command.ts, board.ts, task-merge.ts, archive-db.ts
- Settings + prompts + permissions: settings-schema.ts, prompt-overrides.ts, agent-permissions.ts, agent-prompts.ts
- Node/system infrastructure: node-connection.ts, node-discovery.ts, system-metrics.ts, migration-orchestrator.ts
- Identity/version/extensions: daemon-token.ts, app-version.ts, pi-extensions.ts
- Agent companies import/export: agent-companies-parser.ts, agent-companies-exporter.ts, agent-companies-types.ts
Docker Node Provisioning
Fusion has a managed Docker node provisioning subsystem spanning @fusion/core services and dashboard routes.
Core services:
- DockerClientService (packages/core/src/docker-client.ts)
  - Creates Dockerode clients from host settings.
  - Supports default local daemon, named Docker context, or explicit host with optional TLS fields.
  - Host/TLS inputs: context, host, tlsVerify, tlsCaPath, tlsCertPath, tlsKeyPath.
- DockerProvisioningService (packages/core/src/docker-provisioning.ts)
  - Handles initial container lifecycle actions (provision/deprovision/start/stop/restart/status).
  - Provisioning creates and starts a container first, then route-level orchestration registers metadata/node records.
- MeshConfigGenerator (packages/core/src/mesh-config-generator.ts)
  - Generates mesh env/config, applies config by recreating the container, registers the node into mesh state, then health-checks until online or timeout.
Route boundary (dashboard):
- register-docker-provisioning-routes.ts owns initial container lifecycle endpoints (/api/docker/provision, /api/docker/deprovision, and per-container start/stop/restart/status).
- register-docker-node-routes.ts owns managed-node metadata + mesh configuration endpoints (for example /api/docker/nodes/:managedId/apply-mesh-config and mesh-status checks) after a container is provisioned.
Provisioning lifecycle (implemented flow):
- Container provisioning: dashboard provisioning route calls DockerProvisioningService.provision() to create/start a managed container.
- Mesh config generation: MeshConfigGenerator.generateConfig() resolves API key, reachable URL, and mesh env vars.
- Mesh config application: MeshConfigGenerator.applyConfig() calls DockerClientService.recreateContainer() so env vars are applied to a recreated container.
- Node registration: MeshConfigGenerator.registerInMesh() creates/links a remote NodeConfig entry.
- Health check: mesh registration flow polls checkNodeHealth() until online or timeout.
Port convention:
- Managed Docker mesh-node containers default to 4041 (DEFAULT_CONTAINER_PORT in mesh-config-generator.ts).
- 4040 remains reserved for the production dashboard and should not be documented as the managed mesh-node default.
Memory System
Fusion uses OpenClaw-style project memory files and separates memory into two responsibilities:
- Layered backend runtime memory (memory-backend.ts, project-memory.ts)
  - canonical long-term + layered memory access used by agents and dashboard APIs
- Insight extraction automation (memory-insights.ts, InsightStore)
  - scheduled extraction/pruning workflows over project memory plus insight/audit artifacts
Both systems currently use .fusion/memory/MEMORY.md as the canonical working source-of-truth.
Primary memory files:
- Long-term: .fusion/memory/MEMORY.md
- Daily notes: .fusion/memory/YYYY-MM-DD.md
- Dream processing: .fusion/memory/DREAMS.md
Memory subsystems:
- memory-backend.ts: backend contracts + file/readonly/qmd implementations
- memory-compaction.ts: summarization/compaction automation
- memory-dreams.ts: background dream processing for agent and project memory
- memory-insights.ts + InsightStore: extracted insight synthesis and persistent insight/run storage
Pluggable backends (memory-backend.ts):
| Backend | Type | Capabilities |
|---|---|---|
| FileMemoryBackend | file | Read/Write, Atomic writes, Persistent |
| ReadOnlyMemoryBackend | readonly | Read only, Non-persistent |
| QmdMemoryBackend | qmd | Read/Write, Persistent, CLI-based with file fallback |
Backend registration:
```typescript
import { registerMemoryBackend, resolveMemoryBackend } from "@fusion/core";

// Register custom backend
registerMemoryBackend(customBackend);

// Resolve based on settings
const backend = resolveMemoryBackend(settings);
```
Settings integration:
- memoryEnabled: Toggle controls whether memory instructions are injected into prompts
- memoryBackendType: Select which backend to use (file, readonly, qmd, or custom). Unknown types are accepted and persisted verbatim; runtime resolution falls back to DEFAULT_MEMORY_BACKEND (qmd).
QMD Backend Behavior:
The QMD backend (qmd) delegates read/write I/O to the file backend and schedules background QMD index refreshes. For search, it attempts QMD query first and falls back to local .fusion/memory/ file search when QMD is unavailable, errors, or returns no matches.
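The query-first, file-fallback search order described above can be sketched as follows. The function and type names are hypothetical, and the search callbacks are synchronous only to keep the example short; the real backend is presumably asynchronous.

```typescript
// Hypothetical hit shape.
interface MemoryHit {
  path: string;
  snippet: string;
}

type SearchFn = (query: string) => MemoryHit[];

// Tries QMD first; falls back to local file search when QMD is unavailable,
// throws, or returns no matches.
function searchMemoryWithFallback(query: string, qmdSearch: SearchFn | undefined, fileSearch: SearchFn): MemoryHit[] {
  if (qmdSearch !== undefined) {
    try {
      const hits = qmdSearch(query);
      if (hits.length > 0) return hits;
      // No matches: fall through to the file search.
    } catch {
      // QMD errored: fall through to the file search.
    }
  }
  return fileSearch(query);
}
```

All three failure modes take the same path, so callers never need to know whether QMD was installed, broken, or simply empty.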
QMD-backed memory behavior also applies to agent-private memory workspaces under .fusion/agent-memory/{agentId}/:
- Agent memory search normalizes QMD hit paths (including qmd://..., absolute paths, and relative filenames) into canonical readable workspace paths (MEMORY.md, DREAMS.md, YYYY-MM-DD.md) so results can be passed directly into fn_memory_get.
- Agent-memory writes from tool and non-tool paths (including processAgentMemoryDreams()) schedule agent-specific QMD refreshes so new dreams/long-term updates remain discoverable without manual reindexing.
Dashboard API:
- GET /api/memory/backend: Returns current backend status and capabilities
See Memory Plugin Contract for the full plan.
5) Engine Package (@fusion/engine)
@fusion/engine executes the autonomous workflow.
Agent roles
- Planning: the planning processor generates task plans (PROMPT.md) and selects eligible planning tasks by priority first, then FIFO (createdAt ascending) within each priority tier. If the stuck-task detector kills a not-yet-approved planning session after a non-empty PROMPT.md draft exists, the retry is requeued as needs-replan and seeds the next prompt in revision mode from that draft instead of cold-starting. When PROMPT.md is absent, a non-empty plan task document written through fn_task_document_write is the fallback seed; missing or whitespace-only drafts still cold-start.
- Executor: TaskExecutor (executor.ts) implements tasks in worktrees
- Reviewer: reviewStep() (reviewer.ts) performs plan/code/spec reviews
- Merger: aiMergeTask() (merger.ts) merges approved work
- Task-detail chat / steering comments: TaskStore.addSteeringComment() writes chat steering text to both task.comments and task.steeringComments. The executor still uses steeringComments for live in-session injection, while next-prompt agent lanes read canonical user-authored task.comments: planning/spec generation, spec review, plan/code reviewers, standard merger prompts, and clean-room AI merge + merge-review prompts all surface recent user comments through the shared agent-user-comments.ts formatter.
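The planning selection order from the first bullet (priority first, then FIFO by createdAt) can be expressed as a comparator. Apart from urgent, the priority names and their ranks are assumptions, as is the final tie-break on task ID; `compareForPlanning` and `pickNextPlanningTask` are hypothetical names.

```typescript
// ASSUMPTION: priority names other than "urgent" are illustrative.
type Priority = "urgent" | "high" | "normal" | "low";

const PRIORITY_RANK: Record<Priority, number> = { urgent: 0, high: 1, normal: 2, low: 3 };

interface PlanningCandidate {
  id: string;
  priority: Priority;
  createdAt: string; // ISO-8601 timestamp
}

// Priority tier first, then oldest createdAt, then id as a stable tie-break.
function compareForPlanning(a: PlanningCandidate, b: PlanningCandidate): number {
  const byPriority = PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority];
  if (byPriority !== 0) return byPriority;
  const byAge = Date.parse(a.createdAt) - Date.parse(b.createdAt);
  if (byAge !== 0) return byAge;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

function pickNextPlanningTask(candidates: PlanningCandidate[]): PlanningCandidate | undefined {
  // Copy before sorting so the caller's array is not reordered.
  return [...candidates].sort(compareForPlanning)[0];
}
```

A newer urgent task is picked ahead of an older normal one, while two tasks in the same tier are served oldest first.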
Reviewer verdict recovery contract (FN-4092)
- Reviewer verdicts are APPROVE, REVISE, RETHINK, or UNAVAILABLE.
- For non-pause UNAVAILABLE or non-context reviewer prompt errors, reviewStep() retries once:
  - Prefer configured validator fallback model (validatorFallbackProvider + validatorFallbackModelId, including project overrides), or
  - Retry once on the same model with stricter Verdict: output instructions when no fallback model is configured.
- Pause/engine-pause short-circuits still return UNAVAILABLE immediately and do not spawn/retry reviewer sessions.
- Executor handling in createReviewStepTool() is now explicit:
  - plan/spec UNAVAILABLE is advisory after retry exhaustion (UNAVAILABLE (advisory)), and execution proceeds.
  - code UNAVAILABLE remains blocking; step completion must wait for a usable review verdict.
  - Advisory and blocking paths are both logged to task logs for operator visibility.
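The retry choice and the advisory-versus-blocking split can be sketched as two small pure functions. `planReviewerRetry`, `handleUnavailableAfterRetry`, and the result shapes are hypothetical; only the setting names and the verdict semantics come from the contract above.

```typescript
type ReviewKind = "plan" | "spec" | "code";

interface ValidatorFallbackSettings {
  validatorFallbackProvider?: string;
  validatorFallbackModelId?: string;
}

type RetryPlan =
  | { kind: "fallback-model"; provider: string; modelId: string }
  | { kind: "same-model-strict" };

// Picks the single retry strategy: fallback model when fully configured,
// otherwise the same model with stricter verdict-output instructions.
function planReviewerRetry(settings: ValidatorFallbackSettings): RetryPlan {
  const { validatorFallbackProvider: provider, validatorFallbackModelId: modelId } = settings;
  if (provider !== undefined && provider !== "" && modelId !== undefined && modelId !== "") {
    return { kind: "fallback-model", provider, modelId };
  }
  return { kind: "same-model-strict" };
}

interface UnavailableOutcome {
  proceed: boolean;
  label: string;
}

// Applies only once the retry is exhausted and the verdict is still UNAVAILABLE.
function handleUnavailableAfterRetry(kind: ReviewKind): UnavailableOutcome {
  if (kind === "code") {
    return { proceed: false, label: "UNAVAILABLE" }; // blocking
  }
  return { proceed: true, label: "UNAVAILABLE (advisory)" }; // plan/spec
}
```

Keeping the decision pure makes it easy to log both paths to the task log from one place.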
Scheduling and execution
- Scheduler (scheduler.ts): dependency-aware task scheduling that dispatches eligible todo tasks by priority first, then dependency-unblock fanout within the same priority class (FN-4969), then FIFO (createdAt ascending) with task-id fallback. urgent always stays ahead of lower priorities, and overlap/file-scope blockers are excluded from fanout weighting.
- blockedBy invariant (FN-3924/FN-4091): the field is only durable when it references a current unresolved explicit dependency (or, for dependency-free tasks, an active overlap blocker). Completion gating now validates blockedBy through live task resolution: missing blockers and blockers already in done/archived are treated as stale, while only still-active blockers continue to prevent fn_task_done. If no current blocker remains, scheduler/event reconciliation clears blockedBy to null and re-evaluates from live task state.
- Dependency-cycle invariant (FN-5256): task dependency graphs are acyclic at write time (DependencyCycleError in TaskStore for createTask, createTaskWithReservedId, updateTask, and applyReplicatedTaskCreate) with task:dependency-cycle-rejected audit evidence. Self-healing batch 2 adds reconcileDependencyCycles, which emits task:dependency-cycle-detected, auto-repairs only bounded umbrella-back-edge loops via task:auto-reconciled-dependency-cycle, and leaves ambiguous cycles untouched with task:dependency-cycle-unrepaired for operator inspection.
- Dependency-blocking lease invariant (FN-6292): an in-progress task with unmet scheduling dependencies must not contribute an active file-scope lease in scheduler lease maps. This prevents a holder from queueing its own dependency behind its lease and creating a circular wait.
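A write-time acyclicity check of the kind the dependency-cycle invariant describes might look like the sketch below. `assertNoDependencyCycle` and the map-based graph are hypothetical; the real TaskStore raises DependencyCycleError and records audit evidence, neither of which is modelled here.

```typescript
// graph maps a task ID to the IDs it currently depends on.
// Throws if giving `taskId` the dependencies `newDeps` would close a cycle.
function assertNoDependencyCycle(graph: Map<string, string[]>, taskId: string, newDeps: string[]): void {
  const stack = [...newDeps];
  const seen = new Set<string>();
  while (stack.length > 0) {
    const current = stack.pop();
    if (current === undefined) break;
    // Reaching the task itself through its own dependencies means a cycle.
    if (current === taskId) {
      throw new Error(`Dependency cycle detected for ${taskId}`);
    }
    if (seen.has(current)) continue;
    seen.add(current);
    stack.push(...(graph.get(current) ?? []));
  }
}
```

Running the check before the write is persisted keeps the stored graph acyclic, so later scheduling passes never have to handle a loop created by a normal edit.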
BlockedBy stamping invariants
- Scheduler writes overlap-based blockedBy only when overlap gating is active and there is a live overlapping active scope; otherwise overlap logic does not stamp blockers.
- Active overlap scopes exclude permanently-failed in-review tasks (status === "failed", typically produced by checkStuckBudget() after stuckKillCount > maxStuckKills) so superseding re-implementation tasks are not indefinitely queued behind work that will never merge. (FN-4200)
- Stamping is sticky when valid (FN-3899): if a todo task is already queued behind a blocker that is still active and still overlaps, the scheduler preserves that blocker and skips rewrites.
- When the blocker must change, selection is deterministic: active overlap candidates are ordered by task ID and the first overlapping task is chosen, removing tick-order churn.
- Writes are idempotent: scheduler updates status/blockedBy only when values change, reducing per-tick churn and audit noise.
- Self-healing remains responsible for terminal/missing blocker cleanup (clearStaleBlockedBy()), while scheduler overlap stamping now focuses on stable active-overlap attribution.
- reconcileDependencyBlockingLeases() (FN-6292) unwinds existing dependency/lease circular waits: when an in-progress holder has unmet scheduling dependencies and an unmet dependency is blocked by the holder's stale file-scope lease, self-healing gates the backward move with triple proof, moves the holder back to todo with progress/worktree/resume state preserved, and emits task:reconcile-dependency-blocking-lease (or task:reconcile-dependency-blocking-lease-no-action when proof fails). Engine rebounds do not set userPaused.
- reconcileInReviewUnmetDependencies() (FN-6793/FN-6797) enforces the same dependency invariant after accidental review advancement: unpaused, auto-merge-eligible in-review tasks with live unmet dependencies move back to todo with status: "queued", blockedBy set to the first unmet dependency, and worktree/progress/resume state preserved. Global/engine pause short-circuits the sweep; task pause/user-pause, autoMerge:false, live execution, checkout guards, and failed rebound mutations leave the task untouched with a no-action audit when applicable. Engine rebounds do not set userPaused.
- StepSessionExecutor (step-session-executor.ts): per-step sessions + parallel wave execution
- createTaskUpdateTool() (executor.ts) emits a diagnostic warning when an agent marks step N in-progress while another step on the same task is already in-progress; the update still proceeds so operators get evidence without changing task semantics.
- TaskCompletion (task-completion.ts): completion gate helpers
- SpecStaleness (spec-staleness.ts): stale spec detection utilities
- MissionExecutionLoop (mission-execution-loop.ts): validator/fix loop orchestration
- MissionFeatureSync (mission-feature-sync.ts): feature↔task status synchronization
- MissionAutopilot (mission-autopilot.ts): mission slice auto-progression
Routine + cron automation
- RoutineRunner (routine-runner.ts): executes routine steps
- RoutineScheduler (routine-scheduler.ts): schedules due routines
- CronRunner (cron-runner.ts): cron-based AI/script jobs
- FNXC:Automations 2026-06-27-00:00: Scheduled automations use an atomic claim-then-run CAS in AutomationStore.claimDueSchedule() to advance nextRunAt before execution; this prevents duplicate runs when project/global/all-scope pollers or multiple engine processes observe the same due row.
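The claim-then-run idea can be shown with an in-memory compare-and-swap. The class, row shape, and method signature below are assumptions for illustration; the real AutomationStore.claimDueSchedule() signature is not documented here, and a database implementation would typically express the same check as a conditional UPDATE.

```typescript
// Hypothetical schedule row.
interface ScheduleRow {
  id: string;
  nextRunAt: number; // epoch milliseconds
  intervalMs: number;
}

class InMemoryScheduleStore {
  private readonly rows = new Map<string, ScheduleRow>();

  add(row: ScheduleRow): void {
    this.rows.set(row.id, { ...row });
  }

  // Compare-and-swap: advance nextRunAt only if the row is due and still holds
  // the value the caller observed. Exactly one concurrent claimer can win.
  claimDueSchedule(id: string, observedNextRunAt: number, now: number): boolean {
    const row = this.rows.get(id);
    if (row === undefined) return false;
    if (row.nextRunAt !== observedNextRunAt || row.nextRunAt > now) return false;
    row.nextRunAt = now + row.intervalMs;
    return true;
  }

  nextRunAt(id: string): number | undefined {
    return this.rows.get(id)?.nextRunAt;
  }
}
```

Only the caller that receives true should execute the automation; a second poller that saw the same due row loses the swap and skips the run.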
Sandbox backend seam (FN-4636)
- Engine user-configured command runners now route through packages/engine/src/sandbox/ via a shared SandboxBackend abstraction (resolveSandboxBackend()), currently implemented only by the transparent NativeSandboxBackend passthrough (no behavior change).
- The seam now covers both exec-shaped commands (run) and spawn-shaped verification commands (runStreaming), with packages/engine/src/verification-utils.ts delegating runVerificationCommand/execWithProcessGroup through runStreaming.
- Follow-up chain: FN-4637 (bubblewrap), FN-4638 (sandbox-exec), FN-4639 (settings selection), FN-4640 (run-audit telemetry), FN-4641 (action-gate), FN-4642 (container backends).
- FN-4641 adds dedicated sandbox_provisioning approval-gate plumbing for first-time backend bootstrap. Backends call requireSandboxProvisioningApproval() (packages/engine/src/sandbox/provisioning-gate.ts) from prepare() when prerequisites are missing, and policy is resolved via resolveSandboxProvisioningPolicy() (packages/core/src/sandbox-provisioning-policy.ts). Initial callers land in FN-4637/FN-4638/FN-4642.
- FN-4642 adds an experimental ContainerSandboxBackend (Podman-first, Docker-compatible) plus buildContainerArgv() for rootless container runs. It is opt-in only via explicit resolveSandboxBackend({ backendId: "podman" | "docker" }) and is not wired through settings yet; known prototype limits are no SELinux :Z relabel on bind mounts, no filesystem policy beyond cwd bind-mounting, and a fixed default image (docker.io/library/alpine:3.20) with override via FUSION_SANDBOX_CONTAINER_IMAGE.
Execution context + skills
- SkillResolver (skill-resolver.ts): resolves active skill sets for sessions
- SessionSkillContext (session-skill-context.ts): skill context materialization per run
- ContextLimitDetector (context-limit-detector.ts): context-window pressure checks
- TokenCapDetector (token-cap-detector.ts): token-cap enforcement checks
- PluginRunner (plugin-runner.ts): runtime plugin callback execution
- AgentRuntime (agent-runtime.ts): runtime adapter interface contract
- RuntimeResolution (runtime-resolution.ts): runtime selection and fallback logic
- AgentSessionHelpers (agent-session-helpers.ts): runtime-aware session creation helpers
- AgentActionGate (agent-action-gate.ts): permanent-agent runtime action classification + policy disposition decisions (shared classification source: packages/engine/src/gating-classifications.ts)
Runtime action-gate flow (v1):
- Tool execution wrappers in pi.ts compose wrapToolsWithBoundary() and wrapToolsWithActionGate().
- Non-ephemeral agents receive AgentActionGateContext from executor/heartbeat session creation.
- block and require-approval dispositions intercept before tool side effects.
- require-approval persists durable requests via ApprovalRequestStore, reusing pending requests by dedupe key in targetAction.context.approvalDedupeKey.
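Reusing a pending request by dedupe key can be sketched with an in-memory store. `InMemoryApprovalRequests`, its methods, and the status names are hypothetical and do not reflect the real ApprovalRequestStore API.

```typescript
// ASSUMPTION: status names are illustrative.
type ApprovalStatus = "pending" | "approved" | "denied";

interface ApprovalRequest {
  id: string;
  dedupeKey: string;
  status: ApprovalStatus;
}

class InMemoryApprovalRequests {
  private readonly requests: ApprovalRequest[] = [];

  // Returns the existing pending request for this key, or creates a new one.
  requestApproval(dedupeKey: string): ApprovalRequest {
    const pending = this.requests.find((r) => r.dedupeKey === dedupeKey && r.status === "pending");
    if (pending !== undefined) return pending;
    const created: ApprovalRequest = {
      id: `approval-${this.requests.length + 1}`,
      dedupeKey,
      status: "pending",
    };
    this.requests.push(created);
    return created;
  }

  resolve(id: string, status: "approved" | "denied"): void {
    const request = this.requests.find((r) => r.id === id);
    if (request !== undefined) request.status = status;
  }
}
```

An agent that retries the same gated action while approval is outstanding gets the same request back, so operators see one item rather than a growing queue of duplicates.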
Concurrency, recovery, and resiliency
- AgentSemaphore (concurrency.ts): slot acquisition. Multi-project runtimes share a single manager-owned semaphore for the cross-project globalMaxConcurrent cap, while each InProcessRuntime wraps that pool in a scoped semaphore that tracks only that project's held slots. Engine stop, pauseProject, and stopAll abort in-flight agents, wait the configured stop drain window, then return any residual scoped slots to the shared pool without using a blanket reconcileActiveCount(0), so other projects' active slots are preserved and stopped projects do not starve global capacity.
- RecoveryPolicy (recovery-policy.ts): retry/recovery decision policy
- StuckTaskDetector (stuck-task-detector.ts): inactivity/loop stall detection
- GridlockDetector (gridlock-detector.ts): detects all-blocked todo pipelines and emits notification events (plus explicit clear signals when gridlock resolves)
- TransientErrorDetector (transient-error-detector.ts): retriable error classification
- SelfHealingManager (self-healing.ts): auto-unpause/maintenance recovery actions
  - Batch 1 maintenance now includes reconcile-orphaned-task-dirs (FN-6783), a paused-safe housekeeping step that calls TaskStore.reconcileOrphanedTaskDirs() so valid live .fusion/tasks/{ID}/task.json records missing from the SQLite index become visible without waiting for process restart. The store-level guard skips any ID already present in active, soft-deleted, archived, or tombstoned storage and emits task:reconcile-orphaned-task-dir only for recovered rows.
  - Batch 1 maintenance also includes reconcile-phantom-committed-reservations (FN-7069), which calls TaskStore.reconcilePhantomCommittedReservations() for committed task-ID reservations that have no live/soft-deleted/archived task row and no .fusion/tasks/{ID}/task.json. The sweep prunes orphaned activityLog rows and agents/cascaded agentRuns, preserves runAuditEvents, and keeps the reservation committed per FN-5105 so the ID is permanently reserved rather than resurrected or handed out again.
  - Batch 1 maintenance now includes one fts-maintenance step for both search indexes. The live tasks_fts branch still runs merge every tick, optimize every 4th tick, and rebuild above 32 MiB or 1 MiB × live task count. The archive archived_tasks_fts branch is lighter because archive writes are mostly append-only: merge every 8th tick, optimize every 24th tick, and rebuild above 64 MiB or 512 KiB × archived row count. Each branch is independently guarded by fts5Available and emits task:fts-maintenance run-audit telemetry with distinct target values (tasks_fts vs archived_tasks_fts).
  - AI merge clean-room worktrees are created under the configured worktrees directory's hidden container, <worktreesDir>/.ai-merge/, as fusion-ai-merge-fn-<id>-<random> detached worktrees. When that container is repo-local, its relative path is added to the repo's local git exclude when possible (alongside the legacy .fusion/ai-merge/ entry) so an in-flight clean room does not dirty the integration checkout. After git worktree add and before the merge/review loop, runAiMerge bootstraps the clean room with the shared merge dependency-sync helper: a configured worktreeInitCommand is authoritative and always runs, while unset settings infer pnpm/npm/yarn/bun installs from lockfiles and can skip only when the node_modules/.fusion-install-marker hash still matches. Failures and aborts hard-stop the AI merge before merge agents or verification run, and merge:ai-deps-sync records the command, skip state, and duration. Inline cleanup runs from runAiMerge's clean-room finally for successful lands, empty/no-op finalization, concurrent-advance retries, and thrown/aborted merges. Cleanup canonicalizes the path, attempts git worktree remove --force, always falls back to filesystem removal, then runs git worktree prune so stale or partial registrations (including git worktree add failures) do not dangle. Cleanup emits merge:ai-worktree-cleanup audit events for git-remove, fs-rm, and prune phases; benign already-absent/de-registered paths are treated as idempotent success, while genuine filesystem-removal failures are logged/audited with success: false rather than silently swallowed.
  - Worktrees-dir sweeps that list direct children of <worktreesDir> (pool idle scan, orphan cleanup/reap, self-healing unregistered-orphan reap, and cap enforcement) must exclude the .ai-merge container by name; those one-level sweeps never inspect or recycle clean rooms beneath it. Batch 1 sweeps stale AI merge clean-room worktrees under the new <worktreesDir>/.ai-merge/ root and still scans legacy .fusion/ai-merge/ plus legacy tmpdir() locations for pre-relocation leftovers; candidates are bounded to names starting with fusion-ai-merge-. runAiMerge registers each live clean-room worktree in activeSessionRegistry with kind ai-merge as soon as the directory exists and keeps both raw and canonical paths registered for the duration of the merge, so the dedicated periodic sweep and pre-merge prune defer when either path is active (including concurrent same-task merge attempts). The default age gate is 2 hours; task-aware cleanup uses a 10-minute grace period for done/archived tasks and for genuinely missing/deleted task rows, and every removal path is clamped by the same 10-minute minimum-age floor so a freshly created worktree is never reaped. Transient getTask lookup failures (for example SQLite busy/parse errors) are not treated as deletion evidence; they log a warning, emit lookup-error only if eventually removed, and retain the conservative 2-hour gate. The sweep canonicalizes paths before checking activeSessionRegistry, attempts git worktree remove --force <path> before filesystem removal, runs git worktree prune after cleanup attempts, and emits worktree:tempdir-sweep run-audit telemetry for removal attempts and failures. Fresh directories, active-session paths, and individual removal failures are skipped/logged without aborting the maintenance cycle.
  - recoverGhostReviewTasks() is a fallback only for idle, non-terminal in-review states. Terminal/actionable states (notably status: "failed") are preserved and not auto-kicked back to todo.
  - recoverPausedAbortFailures() clears executor pause/resume abort parks only when the durable row is safe to recover. todo/in-progress rows are requeued for normal scheduling, while clean in-review rows (completed steps, not paused/user-paused/executing, auto-merge eligible, no confirmed or terminal merge evidence) have status/error cleared in place so review progression can continue. User hard-cancel, global/user pause, autoMerge:false, terminal merge, and live-execution guards remain operator-actionable. Successful recovery emits task:auto-recover-paused-abort-park with preservedInReview metadata.
  - Workflow graph pause/resume is node-reentrant for typed engine-internal interruptions. When WorkflowGraphExecutor sees the graph abort signal or a node returns value: "aborted", it stamps the interrupted node and engine-pause abort kind into graph context. TaskExecutor then uses the existing bounded graphResumeRetryCount budget to clear the transient abort, suppress failure notification with an Auto-recovered: task log, and re-enter the graph/task only under the same safety guards: no user/active global pause, no merge/finalize provenance, no genuine node failure, no terminal merge value, no autoMerge:false protected review row, and no active execution owner. Global-pause provenance from the graph-controller abort is re-entrant once the global pause has been lifted because it represents the same in-flight node interruption. Generic legacy pause-abort parks without the typed node marker remain operator-action failures except for the narrow in-review/plan stale-replay shape: hard-cancel pause provenance, node:plan:value === "aborted", no typed interrupted node, no active task/user/global pause, no terminal merge value, no confirmed merge, auto-merge eligibility, and only a clean row or the exact stale plan pause-abort failure. That path logs stale replay ignored, clears only the stale failure state when present, preserves in-review, and never re-enters planning or moves the task to todo.
  - reattach-orphaned-assigned-executions is a forward-resume safety net for durable-agent assignments. During startup recovery and periodic maintenance, after orphaned-agent and stale-heartbeat-run repairs, self-healing finds in-progress tasks with an assignedAgentId whose agent has no active heartbeat run and no active executor session after the orphan grace window. It re-dispatches in place via executor.resumeTaskForAgent(agentId) (the same seam used by clean HeartbeatMonitor.onRunCompleted and guarded by executor double-execution checks), emits task:reattach-orphaned-execution, and never moves the task backward. This complements engine-start executor.resumeOrphaned() and leaves unassigned/role-based execution recovery to the existing startup/limbo/stuck-task paths.
  - Durable Agent.taskId is a running assignment for parked todo/triage task rows only when the agent has live proof: a fresh active heartbeat run or an executor-active/tracked heartbeat signal. Scheduler overlap requeues, task move sync, self-healing, and Reports Health Check share this invariant: stale durable links are cleared or rendered as stale while status: "queued" and overlapBlockedBy remain on the task row so file-scope lease blocking is not weakened. fn_list_agents and fn_agent_show render the linked task column next to Current Task (for example Current Task: FN-1234 (triage) or Current Task: FN-1234 (not active — done)) so parked-column planning ownership is not misread as in-progress execution drift.
  - Mission validation has a dedicated stale-run reaper: startup recovery and Batch 2 maintenance call reapStaleMissionValidatorRuns() when wired by the runtime, using VALIDATOR_RUN_STALE_MAX_AGE_MS (currently 6 hours). The sweep terminates ownerless mission_validator_runs.status='running' rows as error, writes the reap reason into summary, leaves lastValidatorRunId pointing at the now-terminal run, and emits run-audit telemetry with mutationType: "mission:validator-run-reaped" plus runId/featureId/missionId/triggerType/elapsedMs metadata. Active mission features move to loopState="needs_fix" + lastValidatorStatus="error" unless their parent mission is already complete/archived.
Stuck-loop exhaustion terminal contract
When stuck-kill retries are exhausted, checkStuckBudget() marks executor-phase tasks status: "failed", moves them to in-review, and writes an error that starts with STUCK_LOOP_EXHAUSTED:. The error and final task-log line both include the kill count/max and last stuck reason (loop or inactivity). StuckTaskDetector also untracks the task and refuses to re-track it while that failed terminal error remains, preventing further automatic kill/requeue churn. The final log line explicitly states that no further automatic retries will run and directs operators to manually retry, pause, or move the task back to triage to resume work.
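A minimal sketch of the exhaustion contract described above, under stated assumptions. The helper names `applyStuckBudget` and `canTrack`, the `StuckTask` shape, the `>=` budget comparison, and the error wording after the `STUCK_LOOP_EXHAUSTED:` prefix are illustrative and not taken from the repository.

```typescript
type StuckReason = "loop" | "inactivity";

// Hypothetical, simplified task shape for illustration only.
interface StuckTask {
  id: string;
  column: string;
  status?: string;
  error?: string;
  stuckKillCount: number;
}

// Returns the task unchanged while under budget, or a failed in-review copy
// once the kill budget is spent (assumed here to mean count >= max).
function applyStuckBudget(task: StuckTask, maxStuckKills: number, reason: StuckReason): StuckTask {
  if (task.stuckKillCount < maxStuckKills) {
    return task;
  }
  return {
    ...task,
    column: "in-review",
    status: "failed",
    error: `STUCK_LOOP_EXHAUSTED: ${task.stuckKillCount}/${maxStuckKills} stuck kills, last reason: ${reason}`,
  };
}

// Mirrors the detector's refusal to re-track a task parked with the terminal error.
function canTrack(task: StuckTask): boolean {
  return !(task.status === "failed" && (task.error ?? "").startsWith("STUCK_LOOP_EXHAUSTED:"));
}
```

Keying the re-track refusal on the error prefix rather than a separate flag means an operator who clears the error (manual retry) implicitly re-enables tracking.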
Planning-phase stuck kills use the same stuckKillCount / settings.maxStuckKills budget before execution starts. While under budget, a stuck triage requeue resumes from a non-empty on-disk PROMPT.md draft in revision mode and logs resume feedback; if PROMPT.md is absent, it falls back to a non-empty plan task document. Absent or whitespace-only drafts preserve cold-start behavior, and recoverable written drafts continue through prompt-based planning recovery. At budget exhaustion, triage parks the task as status: "failed", paused: true with a STUCK_LOOP_EXHAUSTED: error so a reasoning-looping planner cannot restart indefinitely.
Plan Review reviewer outages use a narrower retry state: triage tasks parked as status: "plan-review-unavailable" already have an existing PROMPT.md, so polling routes them around the full planning agent. Retry rereads the prompt, requires non-empty deterministic-valid content, preserves the file unchanged, and reruns only the Plan Review/finalization path while holding the same global agent semaphore slot as planning/review AI work. APPROVE (or a previously passed Plan Review result) releases the task normally; REVISE/RETHINK moves it to needs-replan; another UNAVAILABLE/error refreshes the backoff. Missing, empty, or deterministically invalid prompts fail with a task log instead of cold-starting planning.
Active fn_run_verification subprocesses are a bounded progress signal (FN-6598). createRunVerificationTool() brackets each command with StuckTaskDetector.beginVerification() / endVerification(); while the command is active and still inside its own timeout plus cleanup grace, the detector suppresses loop and no-progress-churn classification so healthy marathon verification output cannot consume stuck-kill budget. inactivity is not suppressed: the verification runner must continue emitting line output or synthetic heartbeats, and if the process overruns its recorded deadline or never sends an end signal, normal detection resumes.
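The suppression rule above can be sketched as a pure predicate. This is an illustration only: `ActiveVerification`, `isClassificationSuppressed`, and the field names are hypothetical, and the real detector's bookkeeping is not shown in this document.

```typescript
type StuckKind = "loop" | "no-progress-churn" | "inactivity";

// Hypothetical record created at beginVerification() and dropped at endVerification().
interface ActiveVerification {
  startedAt: number;      // epoch ms when the command started
  timeoutMs: number;      // the command's own timeout
  cleanupGraceMs: number; // extra grace for teardown
}

function isClassificationSuppressed(
  kind: StuckKind,
  active: ActiveVerification | undefined,
  now: number,
): boolean {
  if (!active) return false;               // no verification in flight
  if (kind === "inactivity") return false; // inactivity is never suppressed
  const deadline = active.startedAt + active.timeoutMs + active.cleanupGraceMs;
  return now <= deadline;                  // past the deadline, normal detection resumes
}
```

Because the deadline is recorded at start, a verification that never sends its end signal stops suppressing on its own once the deadline passes.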
If loop recovery times out during compact-and-resume and the executor does not unwind within the bounded force-requeue grace window, TaskExecutor.markStuckAborted() now hard-cancels the hung task before clearing execution guards: spawned child agents are terminated, awaitAbortInFlightTaskWork() reaps API/step/workflow/configured-command/subagent/CLI surfaces, completed/in-progress steps are reconciled against committed branch state before any checkout deletion, the task worktree is removed with RemovalReason.ExecutorStuckKilled, stale in-memory worktree/loop/paused/stuck state is cleared, and then the task is moved back to todo with the configured preserveProgressOnStuckRequeue semantics. With preserve-progress enabled, committed step progress is retained; when the branch has no unique commits, affected steps are reset to pending before the worktree/branch are cleared so a retry cannot skip deleted uncommitted-only work. The path preserves the concurrent-recovery guard: if the latest task column is no longer in-progress, it only clears the execution guard and does not reap/remove resources that a self-healing recovery now owns. Task logs distinguish loop detection, compaction timeout, force-kill cleanup start, force-requeue, and cleanup completion/failure.
- recoverMissingWorktreeReviewFailures() is a narrow failed-review recovery: only status: "failed" in-review tasks with the explicit session-start signature "Refusing to start coding agent in missing worktree:" (from assertValidWorktreeSession()) are requeued. Recovery clears stale session metadata (worktree, branch, sessionFile, transient failure state), preserves valid step progress/retry counters, logs the auto-recovery reason, and moves the task back to todo for a clean retry.
- recoverMergeableReviewTasks() only re-enqueues truly eligible tasks; retry-exhausted review tasks are skipped to avoid re-enqueue/no-op loops that keep refreshing updatedAt.
- recoverAlreadyMergedReviewTasks() auto-finalizes retry-exhausted in-review tasks when self-healing can prove their work already landed on the merge target. On this landed-content path it clears soft blockers (paused, stale status: "failed", and residual error) before moving to done; true hard blockers (for example incomplete steps, awaiting-user-review, or failed pre-merge workflow steps) still park the task in stable in-review/failed state with a blocker error instead of entering an auto-finalize loop. Already-merged/tip recovery must prove task ownership before setting mergeDetails.mergeConfirmed or moving to done: accepted evidence is a matching Fusion-Task-Id, matching Fusion-Task-Lineage, a task-ID anchored conventional subject, or a patch-id/tree-equal fallback from the canonical fusion/<task-id> branch whose tip and candidate commit are not explicitly attributed to another task/lineage. Foreign task tips (for example an FN-7143 row pointing at an FN-7187 tip) are rejected in place with [recovery] already-merged rejected ... reason=foreign-task-tip and task:auto-recover-already-merged-rejected audit metadata instead of finalizing the wrong task.
- recoverTransientMergeFailures() handles retry-exhausted in-review merge failures only when classifyTransientMergeError() returns a bounded transient class: lease-handoff-target-not-queued, spurious-concurrent-advance-same-sha, or process-spawn-failure (spawn ENOTDIR, spawn … ENOENT, or a clean-room path reported as "is not a working tree"). Recovery resets mergeRetries, clears transient status/error, increments mergeDetails.transientRecoveryCount, and requeues auto-merge so the next attempt recreates the AI-merge clean room. The budget stays capped by MAX_TRANSIENT_MERGE_RECOVERIES; exhausted tasks remain parked with the merger:transient-failure-budget-exhausted audit path so real structural failures cannot loop forever. FN-6278 makes this recovery mostly after-the-fact insurance for cwd spawn faults: the merge runner now preflights reuse integration roots and repairs/reacquires missing or de-registered task worktrees before the first git spawn, so a stale task.worktree should not consume the transient recovery budget by repeatedly producing spawn git ENOENT.
- reconcileTaskWorktreeMetadata() (FN-4962) reconciles stale task.worktree/task.branch rows against authoritative git worktree list --porcelain branch mappings during startup recovery, periodic maintenance, and completion fan-out. The stage must run before reclaim-stale-active-branches: stale rows rebound to live fusion/<id> worktrees emit task:auto-recover-worktree-metadata-rebound; stale rows with no live branch mapping are nulled (worktree=null, branch=null, baseCommitSha unchanged) and emit task:auto-recover-worktree-metadata-cleared.
- recoverInProgressLimbo() (FN-5219) is the safety net for stranded executor rows: reset/requeue paths must never leave a task in in-progress without a runnable execution context. After metadata reconcile, stale in-progress tasks with null branch, missing/cleared worktree metadata, no live executor claim, and all-pending steps are audited and moved back to todo.
Orphan-only scope-violation auto-recovery
recoverOrphanOnlyScopeViolations() handles the narrow FN-4350 shape without weakening the file-scope invariant: it runs only when all of these predicates hold — task is column === "in-review"; task is failed (status === "failed", with engine/global pause both off); error evidence is a FileScopeViolation (tool_error agent-log payload from formatFileScopeViolationAgentLog, with task.error prefix fallback); task.scopeOverride !== true; task is not actively executing and mergeDetails.mergeConfirmed !== true. It then verifies the task's specific work is already on main using findAlreadyMergedTaskCommit (Fusion-Task-Id trailer / ancestry / patch-id / tree-equality proof). Only when staged files are orphan-only (no declared-scope overlap after excluding .changeset/*) and main-branch proof is positive does it finalize as a no-op (resolutionStrategy: "orphan-discard-no-op"), append an explicit auto-recovery log line, and tear down the task worktree so orphan staging is discarded.
Guardrails: this routine does not retry merges, does not apply to mixed/non-orphan staging, and does not run when no landed-work proof exists (FN-4280 class protection).
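The predicate gate above can be sketched as two small checks. The `ReviewTaskSnapshot` shape and both function names are hypothetical, and treating declared scope as path prefixes is an assumption; the already-on-main proof from findAlreadyMergedTaskCommit is a separate step that is not modeled here.

```typescript
// Hypothetical flattened view of the fields the predicates consult.
interface ReviewTaskSnapshot {
  column: string;
  status?: string;
  enginePaused: boolean;
  globalPaused: boolean;
  hasFileScopeViolationEvidence: boolean;
  scopeOverride?: boolean;
  activelyExecuting: boolean;
  mergeConfirmed?: boolean;
}

// All predicates must hold before the routine even looks at staged files.
function passesOrphanOnlyPredicates(t: ReviewTaskSnapshot): boolean {
  return (
    t.column === "in-review" &&
    t.status === "failed" &&
    !t.enginePaused &&
    !t.globalPaused &&
    t.hasFileScopeViolationEvidence &&
    t.scopeOverride !== true &&
    !t.activelyExecuting &&
    t.mergeConfirmed !== true
  );
}

// Orphan-only: after excluding .changeset/*, no staged file overlaps declared scope.
// Assumption: declared scope entries are files or directory prefixes.
function isOrphanOnlyStaging(stagedFiles: string[], declaredScope: string[]): boolean {
  const relevant = stagedFiles.filter((f) => !f.startsWith(".changeset/"));
  return relevant.every(
    (f) => !declaredScope.some((s) => f === s || f.startsWith(s.endsWith("/") ? s : `${s}/`)),
  );
}
```

Keeping the two checks separate mirrors the guardrails: mixed staging fails the second check even when every task-level predicate passes.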
- FN-4285 decision: tree-equality recovery (rev-parse <base>^{tree} == <task-branch>^{tree}) in findAlreadyMergedTaskCommit closes stranded already-merged branches that evade trailer/ancestry/patch-id matching. FN-7220 tightens the guardrail: patch-id and tree-equal matches imply ownership only from the canonical task branch and are rejected when either the branch tip or candidate merge-target commit carries a foreign Fusion task/lineage trailer.
- No-fn_task_done recovery classification is normalized across executor, restart recovery, and self-healing: detection keys on executor-emitted "without calling fn_task_done" strings (while still tolerating legacy task_done wording), then applies the bounded ladder deterministically (in-session retries → bounded todo requeues with preserved progress when appropriate → terminal surfaced failure when budget is exhausted).
- clearStaleBlockedBy() clears blockedBy (and transient status) on todo tasks when their blocker is missing, done, archived, paused in-review, or failed in-review with merge retries exhausted. FN-3924 extends this with a dependency-integrity guard: if a task has explicit dependencies and blockedBy is not one of the currently unresolved deps, the stale marker is cleared. FN-4091 broadens the sweep to active in-progress and un-paused in-review tasks as well, but those repairs only null blockedBy (they do not rewrite scheduler-owned queued state). FN-5488 adds two fast paths: (1) failed in-review blockers at/above MAX_AUTO_MERGE_RETRIES always fan out unblock recovery with explicit reason codes, and (2) status="merging"|"merging-pr" blockers with no active merger owner are treated as unbacked after a short grace window (unbackedMergingFanoutGraceMs, default 60s) so manual retry/unpause updatedAt refreshes cannot deadlock downstream todos indefinitely. Recovery logs now use Auto-recovered (FN-5488): ... reason=<code> for auditability while preserving FN-4538 overlap-blocking invariants.
- FN-5624 suppresses transient worktree-local .fusion/tasks/<id>/task.json ENOENT session-start failures. When the missing file path is under task.worktree, executor routes through unusable-worktree auto-recovery, skips persisting status: "failed"/error on the task row, and emits [transient-task-json-suppressed] ... reason=missing-task-json-under-worktree. The corresponding self-healing Auto-recovered: log entry keeps notification suppression aligned with the existing /^Auto-recovered:/ grace-window rule.
- inspectBranchConflict() now treats self-owned zero-attribution collisions as reclaimable (instead of foreign) when ownership is proven by task/worktree identity, so stranded self-branches do not enter unrecoverable loops.
- reclaimSelfOwnedBranchConflicts() includes paused branch-conflict-unrecoverable tasks (not just todo/in-progress), clearing paused/error state in one update and requeueing only when parked in in-review.
- FN-6736 adds a phantom executor-binding liveness gate to the same reclaim path. When the only remaining veto is an in-memory executor-active/live-worktree signal, the task is in-progress, the execution age is far beyond grace, checkedOutBy is empty, no active heartbeat/agent row exists, and run-audit activity is stale, self-healing force-clears the phantom executor binding and requeues the task to todo with worktree and progress preserved. Live evidence still wins (FN-4811), missing-worktree limbo remains owned by recoverInProgressLimbo() (FN-5219), and the path does not increment FN-5704 resume-limbo counters.
- Together, recoverAlreadyMergedReviewTasks(), clearStaleBlockedBy(), and paused-aware in-review scheduling prevent merge-deadlock loops by finalizing already-landed work, clearing stale dependency blockers, reclaiming self-owned conflicts, and avoiding paused review cards re-blocking overlap dispatch.
- Merge commit attribution is ownership-aware: a mergeDetails.commitSha is trusted only when reachable from HEAD and attributable to the task via Fusion-Task-Id trailer or task-ID-bearing subject. Reachable-but-unowned SHAs are rejected to prevent sibling done tasks from sharing misleading merge metadata.
- FN-4948 adds a task-worktree pre-commit branch-identity guard: provisioning paths (NativeWorktreeBackend.create, executor branch creation, and StepSessionExecutor.createStepWorktree) install a pre-commit hook plus fusion-task-id metadata under the worktree's git-path. Commits are refused unless HEAD matches fusion/<task-id> or the allowlist (fusion/step-<n>-<slug> by default).
- FN-5089 adds an optional task-worktree commit-msg hook (default enabled via commitMsgHookEnabled) installed by the same provisioning path; when enabled it appends the configured task attribution trailer (defaults to Fusion-Task-Id: <task-id>) without duplicating existing trailers. Attribution remains branch/subject resilient when the hook is disabled.
- FN-4948 extends contamination auto-recovery with an obviously-misrouted bucket: foreign-attributed commits are auto-dropped only when attribution resolves to another task and every changed path is inside .changeset/fn-<foreign-id>-*.md. Any shared/non-namespaced path stays in the unique bucket and escalates to human adjudication. The single-attempt contamination invariant is unchanged.
- ProjectEngine settings lifecycle handlers (project-engine.ts) treat enginePaused as a soft pause: clearing it dispatches runtime resume and, when autoMerge is enabled, performs an in-review eligibility sweep to requeue mergeable review tasks.
- UsageLimitPauser (usage-limit-detector.ts) and withRateLimitRetry (rate-limit-retry.ts)
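As an illustration of the FN-4948 pre-commit branch-identity guard listed above, the decision can be reduced to a branch-name check. The function name, the regular expression used for the default allowlist, and the lowercase task-id form are assumptions; the actual hook is a git pre-commit script whose contents are not shown in this document.

```typescript
// Assumed encoding of the default allowlist pattern fusion/step-<n>-<slug>.
const DEFAULT_BRANCH_ALLOWLIST: RegExp[] = [/^fusion\/step-\d+-[a-z0-9-]+$/];

// Returns true when a commit on `headBranch` should be accepted for `taskId`.
function isCommitAllowed(
  headBranch: string,
  taskId: string,
  allowlist: RegExp[] = DEFAULT_BRANCH_ALLOWLIST,
): boolean {
  if (headBranch === `fusion/${taskId}`) return true; // canonical task branch
  return allowlist.some((pattern) => pattern.test(headBranch));
}
```

Under this sketch a worktree for one task refuses commits made while HEAD points at another task's branch, which is the contamination case the guard targets.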
Worktree and naming helpers
- WorktreePool (worktree-pool.ts): idle worktree reuse
- WorktreeBackend (worktree-backend.ts): abstraction for worktree operations used by acquireTaskWorktree. native (default) preserves existing git worktree behavior (including sibling-branch retry semantics), while resolveWorktreeBackend(settings) selects worktrunk when settings.worktrunk?.enabled === true.
  - Worktrunk path delegates five decisions with per-op timeouts: create (120s), sync (180s), prune (60s), remove (60s), and layout resolution (5s).
  - Direct worktrunk CLI delegates: create → wt switch --create ... --no-hooks --no-cd, remove → wt remove --foreground.
  - Fusion probes the canonical wt binary on $PATH; explicit worktrunk.binaryPath overrides still win when operators pin a different location.
  - Worktrunk-aware fallback implementations where worktrunk lacks a dedicated primitive: sync uses git fetch+rebase semantics, and prune uses git worktree list --porcelain plus per-branch remove calls.
  - Layout precedence: when worktrunk.enabled=true, resolveTaskWorktreePathForBackend(...) defers to backend resolveWorktreePath(...) (using wt config show --format json template data with default {{ repo_path }}/.worktrees/{{ branch | sanitize }} fallback); otherwise it remains byte-identical to FN-4606 resolveTaskWorktreePath(...) behavior.
  - Auto-install remains fail-closed while the pinned release manifest is upstream-pending-verification: the pre-approved install path now rejects missing asset URLs/checksums instead of fabricating a local binary. This preserves the FN-4704/FN-4705 disabled-install contract until a human verifies a real upstream release manifest.
  - FN-5321 generalized this contract into packages/engine/src/external-integrations/manifest.ts (validateExternalIntegrationManifest) plus KNOWN_EXTERNAL_INTEGRATIONS; packages/engine/src/__tests__/external-integrations-registry.test.ts enforces that every registered integration manifest validates, avoids duplicate-segment GitHub hallucinations, and carries canonical binary/upstream metadata.
  - worktrunk.onFailure controls fail-hard vs fallback-native create behavior and emits worktree:worktrunk-* run-audit events for create/fallback paths.
- WorktreeNames (worktree-names.ts): deterministic worktree/branch naming
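The per-op worktrunk timeouts listed above could be held in a single table and applied with a generic race. The constant name, the `layout` key, and the `withOpTimeout` helper are hypothetical; only the durations come from the list.

```typescript
// Durations from the list above, in milliseconds.
const WORKTRUNK_TIMEOUTS_MS = {
  create: 120_000,
  sync: 180_000,
  prune: 60_000,
  remove: 60_000,
  layout: 5_000,
} as const;

type WorktrunkOp = keyof typeof WORKTRUNK_TIMEOUTS_MS;

// Rejects if `work` does not settle within the op's budget; always clears the timer.
function withOpTimeout<T>(op: WorktrunkOp, work: Promise<T>): Promise<T> {
  const ms = WORKTRUNK_TIMEOUTS_MS[op];
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`worktrunk ${op} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (error) => {
      clearTimeout(timer);
      throw error;
    },
  );
}
```

Note that racing a promise does not stop the underlying process; a real backend would also need to terminate the spawned wt command on timeout.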
Observability and reflection
- AgentLogger (agent-logger.ts): structured per-agent run logging
- RunAudit (run-audit.ts): mutation audit tracking (DB/git/filesystem)
  - FN-7214: task:reenter-paused-aborted-workflow-node records executor re-entry after a typed workflow graph node was interrupted by engine pause/resume. Metadata includes nodeId, fromColumn, retry attempt/maxAttempts, abortProvenance, whether the task was preserved in in-review, and the re-entry mode.
  - FN-7220: task:classify-stale-in-review-plan-pause-abort-replay records executor classification of a stale generic in-review plan-node pause/resume replay. Metadata includes nodeId, fromColumn, abortProvenance, whether a stale failure was cleared, graphResumeRetryCount, and mode: "preserved-in-review".
  - FN-7220: task:auto-recover-already-merged-rejected records self-healing rejection of cross-task already-merged/tip metadata. Metadata includes reason (foreign-task-tip, foreign-lineage-tip, or foreign-landed-commit), phase, candidate SHA, candidate owner when known, task branch, and merge target branch.
  - FN-6782/FN-6796: task:auto-recover-paused-abort-park records self-healing recovery of pause-abort operator parks. Metadata includes the source column and whether recovery preserved a clean in-review row instead of requeueing to todo.
  - FN-7069: task:reconcile-phantom-committed-reservation records task-store startup or self-healing cleanup of committed-reservation-without-task phantoms. Metadata includes reservationStatus: "committed" plus pruned activityLog and agents counts; runAuditEvents and the committed reservation are intentionally retained for auditability and ID permanence.
  - FN-7074: task:reservation-commit-rolled-back records preventive create-path rollback when a distributed reservation was committed with the task-row insert but a later create materialization step failed. Metadata includes { reservationId, nodeId, reason: "failed-create", error }; the task row/partial directory are removed and the reservation is moved to aborted so FN-7069 should not need to clean up a new phantom.
  - FN-4956: Layer 3 merge-conflict arbitration now scope-partitions conflicted files before AI resolution. Out-of-scope conflicts are deterministically resolved to the integration branch (git checkout --ours) and unstaged, while only in-scope conflicts flow to AI. Integration branch defaults are resolved via resolveIntegrationBranch(rootDir, settings). Audit events: merge:layer3:foreign-file-skipped and merge:layer3:scope-override-bypass.
  - FN-5655 goal anchoring observability adds database-domain mutation types goal:injection-applied, goal:injection-skipped, and goal:retrieval-invoked so Slice 2 cite-rate tracking has a prompt-independent signal. Metadata uses counts/IDs only (count, lane, toolName, optional truncated/reason/notFound) and never stores prompt bodies or goal titles/descriptions. These events surface through GET /api/agents/:id/runs/:runId/audit and support the existing startTime/endTime filters.
Key diagnostic points (log subsystem tags)
- [self-healing]: startup/maintenance recovery pass outcomes.
- [worktree-metadata-reconcile]: FN-4962 stale task.worktree/task.branch rebind-or-clear decisions and audit emission failures.
- [scheduler], [executor], [merger]: core execution/dispatch/merge lanes.
- [insight-sweeper]: startup/periodic/drive-by stale insight-run recovery outcomes and fail-soft sweep errors.
- Notifier (notifier.ts): legacy ntfy compatibility shim (NtfyNotifier) plus shared ntfy helpers
  - Runtime ownership: NtfyNotifier no longer owns an independent task-lifecycle listener graph; ProjectEngine injects the canonical NotificationService instance so task lifecycle notifications (task:created, task:moved, task:updated, task:merged) are emitted through a single path.
  - Merge dedup safety: all merge-success → done code paths (direct merger completion, owned/no-op auto-finalize, mergeConfirmed fast-path, PR-strategy finalize, and merge-success self-healing finalizers) emit store.emit("task:merged", result) with a merged MergeResult. NotificationService.notifiedEvents remains the single dedup source of truth, so duplicate upstream emits still produce exactly one canonical merged ntfy lifecycle notification per task.
  - Compatibility scope: NtfyNotifier remains responsible for gridlock-only compatibility notifications (notifyGridlock) and legacy helper APIs.
  - Legacy gridlock ntfy delivery is cooldown-throttled: first detection notifies immediately, subsequent detections are suppressed for 15 minutes (even if blocked-task membership changes), and the cooldown resets as soon as gridlock fully clears.
- NotificationService (notification/notification-service.ts): provider lifecycle + event dispatch orchestration
  - Subscribes to task lifecycle events plus mailbox and memory events. task:created dispatches task-created only when task.sourceAgentId is present (agent-created tasks, including fn task-create calls made by agents). message:sent dispatches message:agent-to-user and message:agent-to-agent notification events (with message metadata for deep-links), and manual POST /api/memory/dream processing emits store.emit("memory:dreams-processed", payload) when new DREAMS content is written.
  - failed task notifications are deferred behind a grace window (default 60s) and suppressed when recovery signals arrive (column=done, mergeDetails.mergeConfirmed=true, or status clear with an Auto-recovered: log). Persistent failures still emit exactly once after the window.
- NotificationProvider interface (@fusion/core notification/provider.ts): pluggable provider contract
  - Built-in providers: NtfyNotificationProvider (notification/ntfy-provider.ts), WebhookNotificationProvider (notification/webhook-provider.ts)
- AgentReflection (agent-reflection.ts): reflection extraction and persistence
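The legacy gridlock cooldown described in the Notifier notes above (notify on first detection, suppress for 15 minutes, reset when gridlock clears) can be sketched as a small state holder. The class and method names are hypothetical; only the 15-minute window and the reset-on-clear behavior come from the text.

```typescript
const GRIDLOCK_COOLDOWN_MS = 15 * 60 * 1000;

class GridlockThrottle {
  private lastNotifiedAt: number | undefined;

  // Returns true when this detection should produce a notification.
  // Blocked-task membership is deliberately not an input: changes do not bypass the cooldown.
  onGridlockDetected(now: number): boolean {
    if (this.lastNotifiedAt !== undefined && now - this.lastNotifiedAt < GRIDLOCK_COOLDOWN_MS) {
      return false;
    }
    this.lastNotifiedAt = now;
    return true;
  }

  // Called when gridlock fully clears; the next detection notifies immediately.
  onGridlockCleared(): void {
    this.lastNotifiedAt = undefined;
  }
}
```

Passing `now` in rather than reading the clock inside the class keeps the throttle deterministic under test.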
Heartbeat execution
Implemented in agent-heartbeat.ts:
- HeartbeatMonitor
- HeartbeatTriggerScheduler (timer, assignment, on-demand triggers)
- WakeContext / per-agent runtime config support
Node/mesh runtime services
- NodeHealthMonitor (node-health-monitor.ts): remote node liveness/metrics checks
- PeerExchangeService (peer-exchange-service.ts): peer sync orchestration
- MeshLeaseManager (mesh-lease-manager.ts): canonical abandoned-lease detection + recovery path
Outage ownership boundaries (degraded reads + queued write replay)
- CentralCore owns durable outage state in central persistence (meshSharedSnapshots + meshWriteQueue) and exposes stable assertion methods: recordMeshSnapshot, getLatestMeshSnapshot, enqueueMeshWrite, listPendingMeshWrites, markMeshWriteReplayStarted, markMeshWriteApplied, markMeshWriteFailed, and getMeshDegradedReadState.
- PeerExchangeService owns retryability classification for sync/apply failures, queue insertion for retryable failures, replay execution (replayPendingWritesForNode(targetNodeId)), and observable sync results (queuedWriteId, replaySummary) for partition/replay assertions.
- NodeHealthMonitor provides liveness transitions as replay hints only via deterministic recovery callback onNodeRecovered(nodeId, previousStatus); online is a trigger to attempt replay, not proof that replay succeeded.
- Dashboard mesh routes (register-mesh-routes.ts) preserve GET /api/mesh/state array shape and attach per-node degraded readState metadata so stale fallback data is explicit during partitions.
Mesh task lease ownership and recovery
Task ownership is persisted in shared task metadata so all nodes agree on one canonical lease view. The persisted lease fields are:
- checkedOutBy: owning agent id (compatibility field)
- checkedOutAt: lease acquisition timestamp (compatibility field)
- checkoutNodeId: owning node id
- checkoutRunId: active owning heartbeat/executor run id when known
- checkoutLeaseRenewedAt: last successful lease renewal timestamp
- checkoutLeaseEpoch: monotonic fencing generation used to reject stale owners after recovery
AgentStore.checkoutTask() remains the compatibility entrypoint for ownership claims, but lease replacement is fenced by epoch semantics: only the same live owner can renew idempotently, and stale owner replacement is performed only through the recovery path.
MeshLeaseManager.recoverAbandonedLease(taskId, reason, context) is the single canonical abandoned-work path used by scheduler/self-healing/runtime orchestration. Recovery validates staleness, bumps checkoutLeaseEpoch, clears active-owner fields, logs the reason, and re-queues work for scheduler visibility.
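A minimal sketch of the epoch fencing described above, assuming a simplified in-memory lease record. The field names follow the persisted lease fields; the functions `renewLease` and `recoverAbandoned` and their signatures are hypothetical and omit the staleness validation, logging, and requeue steps.

```typescript
interface Lease {
  checkedOutBy: string | null;
  checkoutNodeId: string | null;
  checkoutRunId: string | null;
  checkoutLeaseRenewedAt: number | null;
  checkoutLeaseEpoch: number;
}

// Only the same live owner holding the current epoch may renew.
function renewLease(lease: Lease, agentId: string, epoch: number, now: number): Lease {
  if (lease.checkedOutBy !== agentId || lease.checkoutLeaseEpoch !== epoch) {
    throw new Error("stale lease owner rejected by epoch fence");
  }
  return { ...lease, checkoutLeaseRenewedAt: now };
}

// Recovery bumps the epoch and clears active-owner fields, fencing out the old owner.
function recoverAbandoned(lease: Lease): Lease {
  return {
    checkedOutBy: null,
    checkoutNodeId: null,
    checkoutRunId: null,
    checkoutLeaseRenewedAt: null,
    checkoutLeaseEpoch: lease.checkoutLeaseEpoch + 1,
  };
}
```

Because the epoch only ever increases, a previous owner that wakes up after recovery presents an old epoch and is rejected even if it still believes it holds the lease.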
A lease is recoverable only when there is no active local executor session for that task and either:
- the owning node is offline or error, or
- the owner heartbeat/run age exceeds max(agentHeartbeatTimeoutMs * 2, 120_000) measured against the most recent lease renewal timestamp.
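The recoverability rule above reduces to a small predicate. The `LeaseCheck` shape and function names are hypothetical; the threshold formula and the offline/error shortcut are taken from the text, and milliseconds are assumed as the unit throughout.

```typescript
interface LeaseCheck {
  hasActiveLocalSession: boolean;
  ownerNodeStatus: string;        // for example "online", "offline", or "error"
  leaseRenewedAt: number;         // epoch ms of the most recent renewal
  now: number;                    // epoch ms
  agentHeartbeatTimeoutMs: number;
}

// Twice the heartbeat timeout, with a floor of two minutes.
function staleThresholdMs(agentHeartbeatTimeoutMs: number): number {
  return Math.max(agentHeartbeatTimeoutMs * 2, 120_000);
}

function isLeaseRecoverable(c: LeaseCheck): boolean {
  if (c.hasActiveLocalSession) return false; // a live local session always vetoes recovery
  if (c.ownerNodeStatus === "offline" || c.ownerNodeStatus === "error") return true;
  return c.now - c.leaseRenewedAt > staleThresholdMs(c.agentHeartbeatTimeoutMs);
}
```

The two-minute floor means a very short heartbeat timeout cannot make leases recoverable within seconds of a missed beat.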
- Canonical replication/write-coordination contract: docs/shared-mesh-protocol.md
  - Defines protocol versioning, write classes, quorum/ack semantics, lease epochs/fencing, offline queue/replay, reconciliation outcomes, restart recovery hooks, and degraded-read staleness metadata.
  - Existing /api/mesh/sync and settings-sync payloads remain the active exchange primitives while follow-on runtime tasks implement full v1 coordinator/quorum behavior.
- Distributed task-ID allocation (packages/core/src/distributed-task-id.ts) is the first mesh-aware coordinated write primitive.
  - Durable state lives in SQLite tables distributed_task_id_state (prefix sequence + authoritative committed count) and distributed_task_id_reservations (reservation lifecycle rows).
  - Reserve/commit/abort execute under a process-local lock and a single SQLite transaction. Lazy reservation expiry cleanup runs inside those same transactions. TaskStore also uses a non-locking commit core inside its own BEGIN IMMEDIATE create transaction so the reservation committed flip and authoritative tasks row insert share one SQLite durability point.
  - Default reservation TTL is 15 * 60 * 1000 ms (15 minutes). Expired/aborted reservations are burned IDs and are never reissued. If a post-insert create step fails after the reservation was committed (for example task.json/PROMPT.md disk materialization, file-scope validation, or duplicate-intake tombstone checks), the failed-create rollback deletes the just-created task row/partial directory, moves the reservation to aborted, recomputes committed reservation counters, and emits task:reservation-commit-rolled-back; the sequence stays burned for FN-5105 ID permanence.
  - committedClusterTaskCount from allocator state is the only authoritative cluster-wide committed-task count. Local task-row counts and ID suffix math are not authoritative.
  - Store open reconciles every known prefix in distributed_task_id_state to max(current nextSequence, max(tasks suffix)+1, max(archivedTasks suffix)+1, max(reservation sequence)+1). This self-heals stale counters before ordinary task creation resumes.
  - Mesh allocator write routes (/api/mesh/task-ids/reserve|commit|abort) return 503 when the coordinator node is unreachable; they never fall back to local-only cluster ID issuance.
- Cluster task creation now uses a strong-write reserve → create → replicate → commit/abort sequence.
  - Ordinary local task creation (TaskStore.createTask(), duplicate, and refine flows) now allocates IDs through the same distributed reserve/commit/abort lifecycle owned by TaskStore; the invariant is distributed_task_id_reservations.status = 'committed' iff a live durable tasks row and task directory landed for that ID. applyReplicatedTaskCreate(...) remains a direct reserved-ID apply path and does not require a local reservation row.
  - POST /api/tasks uses the store-owned allocator path for local creates rather than maintaining a separate route-local allocator implementation.
  - POST /api/tasks reserves a distributed ID, creates the authoritative local task with that reserved ID, then POSTs authenticated replication payloads to peer nodes.
  - All create-class writes now use conflict-raising inserts, not SQLite ON CONFLICT ... DO UPDATE. Existing task rows and .fusion/tasks/{id} contents always win over stale counters or colliding reservations.
  - Local create paths perform a final active+archived existence check immediately before insert. If a reserved FN-* still collides, the reservation is aborted/burned and the create fails loudly instead of rewriting the existing task.
  - Creation self-heals stale overlap state at the route layer: if a reserved FN-* collides with an existing task ("Task ID already exists..." or replicated-create collision), the route aborts that reservation, cleans up partial local state, reserves the next ID, and retries up to a bounded limit.
  - Replica apply uses TaskStore.applyReplicatedTaskCreate(...), which is idempotent by task ID: replaying the same payload returns the existing task without creating duplicates.
  - If an incoming replicated payload conflicts with a different existing task record for the same ID, the apply path returns a deterministic collision error instead of overwriting data.
  - Any replication/coordinator failure aborts the reservation and returns write failure (503), so this path does not report success for local-only partial writes.
- Process lifecycle ownership:
  - `fn serve`/`fn dashboard` start a single process-level `PeerExchangeService` and stop it during shutdown.
  - `CentralCore.startDiscovery()` is invoked from CLI startup only after HTTP bind completes so discovery advertises the actual listening port.
  - `InProcessRuntime` stays project-scoped and intentionally does not own mesh startup/shutdown.
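The reserve/commit/abort lifecycle and the bounded retry on ID collisions described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: `InMemoryIdAllocator`, `createTaskWithRetry`, and the default of three attempts are hypothetical names and values chosen for illustration, not the `TaskStore` API, and replication to peers is left out entirely.

```typescript
// Hypothetical in-memory stand-in for the distributed reservation table.
type ReservationStatus = "reserved" | "committed" | "aborted";

class InMemoryIdAllocator {
  private counter = 0;
  readonly reservations = new Map<string, ReservationStatus>();

  // Reserve the next candidate ID; it is not yet backed by a task row.
  reserve(): string {
    this.counter += 1;
    const id = `FN-${this.counter}`;
    this.reservations.set(id, "reserved");
    return id;
  }

  // Mark the reservation committed once the task has durably landed.
  commit(id: string): void {
    this.reservations.set(id, "committed");
  }

  // Burn the reservation so the ID is never handed out again.
  abort(id: string): void {
    this.reservations.set(id, "aborted");
  }
}

function createTaskWithRetry(
  allocator: InMemoryIdAllocator,
  existingTasks: Set<string>,
  maxAttempts = 3, // assumed bound; the real limit is not stated in this document
): string {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const id = allocator.reserve();
    if (existingTasks.has(id)) {
      // Existing tasks always win: abort this reservation and try the next ID.
      allocator.abort(id);
      continue;
    }
    existingTasks.add(id); // stands in for the conflict-raising insert
    allocator.commit(id);
    return id;
  }
  // Fail loudly instead of overwriting an existing task.
  throw new Error(`Task ID allocation failed after ${maxAttempts} attempts`);
}
```

With existing tasks `FN-1` and `FN-2` and a fresh allocator, the sketch burns two reservations and commits `FN-3`, so a `committed` status only ever appears for an ID whose task actually landed.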
Remote access runtime
Operator setup + troubleshooting guide: Remote Access runbook.
- `remote-access/tunnel-process-manager.ts` owns tunnel lifecycle orchestration with `spawn`-based, non-blocking process supervision.
- `remote-access/types.ts` defines the runtime contract used by downstream API/TUI/headless layers:
  - Providers: `"tailscale" | "cloudflare"`
  - Lifecycle states: `"stopped" | "starting" | "running" | "stopping" | "failed"`
  - Error codes: `invalid_config`, `start_failed`, `stop_failed`, `switch_failed`, `readiness_timeout`, `process_exit`, etc.
- `remote-access/provider-adapters.ts` provides provider-specific command composition + readiness parsing while enforcing config validation.
- Cloudflare has two command variants:
  - Named tunnel mode: `cloudflared tunnel --no-autoupdate run <tunnelName>` (token from env)
  - Quick tunnel mode: `cloudflared tunnel --url http://localhost:<dashboardPort>` (ephemeral `trycloudflare.com` URL, no token)
- Credential inputs are reference-based (`tokenEnvVar`, `credentialsPath`) and validated without logging raw secret values.
- Redaction is applied to command previews and emitted log lines before publishing status/log events.
- Deterministic stop semantics: graceful shutdown (`SIGTERM`) first, bounded wait, then force-kill fallback (`SIGKILL`).
- Safe provider switching is stop-first: active provider fully stops before target start is attempted; failed starts emit `switch_failed` terminal status.
- `ProjectEngine.start()` instantiates a per-project tunnel manager and applies startup restore policy from `remoteAccess.lifecycle`:
  - restore is attempted only when `rememberLastRunning` is true, a prior-running marker exists, provider config is valid, and runtime prerequisites are available.
  - restore skips/failures are non-fatal to engine startup and clear stale running markers to avoid restart loops.
- Manual lifecycle remains explicit: only `startRemoteTunnel()`/`stopRemoteTunnel()` transitions mutate runtime state; provider/settings updates do not auto-start tunnels.
- `ProjectEngine` exposes restore diagnostics via `getRemoteTunnelRestoreDiagnostics()` (`applied|skipped|failed` + machine-readable reason).
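The startup restore policy above is a conjunction of four preconditions. A minimal sketch of that gate follows, assuming a hypothetical `planTunnelRestore` helper; the input field names other than `rememberLastRunning` and the skip-reason strings other than `invalid_config` are invented for illustration and are not taken from the codebase.

```typescript
// Inputs to the restore gate. Only rememberLastRunning is a documented
// setting name; the other field names are illustrative assumptions.
interface RestoreInputs {
  rememberLastRunning: boolean;
  priorRunningMarker: boolean;
  providerConfigValid: boolean;
  prerequisitesAvailable: boolean;
}

interface RestorePlan {
  attempt: boolean;
  // Machine-readable reason when the restore is skipped (hypothetical values).
  skipReason: string | null;
}

function planTunnelRestore(inputs: RestoreInputs): RestorePlan {
  if (!inputs.rememberLastRunning) {
    return { attempt: false, skipReason: "remember_last_running_disabled" };
  }
  if (!inputs.priorRunningMarker) {
    return { attempt: false, skipReason: "no_prior_running_marker" };
  }
  if (!inputs.providerConfigValid) {
    return { attempt: false, skipReason: "invalid_config" };
  }
  if (!inputs.prerequisitesAvailable) {
    return { attempt: false, skipReason: "prerequisites_unavailable" };
  }
  // All four preconditions hold, so a restore may be attempted.
  return { attempt: true, skipReason: null };
}
```

Returning a reason rather than throwing matches the stated policy that skips are non-fatal to engine startup; a caller could surface the reason through the restore diagnostics.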
Multi-runtime support + IPC
- Runtime contracts: `project-runtime.ts`
- Orchestration: `ProjectManager` and `HybridExecutor`
- Runtime implementations:
  - `runtimes/in-process-runtime.ts`
  - `runtimes/child-process-runtime.ts`
  - `runtimes/remote-node-runtime.ts`
- IPC protocol/transport:
  - `ipc/ipc-protocol.ts`
  - `ipc/ipc-host.ts`
  - `ipc/ipc-worker.ts`
  - worker entrypoint: `runtimes/child-process-worker.ts`
6) Dashboard Package (@fusion/dashboard)
Server layer
- Entry exports: `packages/dashboard/src/index.ts`
- Main server factory: `createServer()` in `packages/dashboard/src/server.ts`
- Primary API router: `createApiRoutes()` in `packages/dashboard/src/routes.ts`
Key server capabilities:
- REST APIs for tasks, git, GitHub, agents, missions, planning, automations/routines, settings
- System stats snapshot and vitest process controls APIs (`GET /api/system-stats`, `POST /api/kill-vitest`) exposing dashboard process/system telemetry (including app CPU percentage and host memory rendered as numeric values, radial gauges, and trend sparklines in the Command Center System area), task/agent aggregates, and manual vitest process termination. Host-memory usage is derived from shared OS-available memory (`process.availableMemory()` with an unreliable `freemem` fallback) rather than raw free pages so macOS inactive/cache memory is not counted as used.
- Command Center analytics APIs (`GET /api/command-center/tokens`, `/tools`, `/activity`, `/productivity`, `/team`, `/github`, `/signals`, `/plugin-activations`, `/live`) are project-scoped dashboard routes. `/productivity` reads Lines changed from nullable `task_commit_associations.additions`/`deletions` merge-time or backfilled diff stats, derives estimated `hoursSaved` from that LOC via the exported `HUMAN_LINES_PER_HOUR` rate, and keeps the unavailable sentinel for both fields when no in-range association has stats. `POST /api/command-center/productivity/backfill-loc` is the explicit operator-triggered, dry-run-defaulting local-git backfill for historical NULL stats; it is not run during dashboard rendering or analytics reads. Its `taskDuration` payload aggregates done tasks whose `executionCompletedAt` falls in the selected range, using positive `tasks.cumulativeActiveMs` values for completed count, average, median, p90, and total active execution time; missing qualifying durations remain unavailable rather than zero. `/signals` aggregates real local `incidents` rows for total/open/resolved counts, MTTR, and source/severity/status breakdowns and returns honest empty/unavailable sentinels instead of synthetic signal volume. `/plugin-activations` aggregates persisted plugin/extension load events for the selected range and returns unavailable when no rows exist instead of treating missing history as zero activations.
- Model pricing & cost estimation: Command Center token cost is derived at read time by `packages/core/src/model-pricing.ts` and is not persisted as billing truth. Maintainers still update the built-in `MODEL_PRICING` fallback table in that file; keys are lowercased `${provider}:${model}` with a bare `:model` fallback for callers that only know the model id. Codex runs store the `openai-codex` provider, so those rates must be keyed explicitly as `openai-codex:*` (for example `openai-codex:gpt-5-codex`) rather than relying on the OpenAI provider or bare-model fallback, otherwise Command Center shows their cost as `unavailable`. Each entry stores USD per 1M tokens for input, output, cache-read, and cache-write plus a `source` citation. Bump `pricingAsOf` in the same change as any built-in rate edit, because the dashboard surfaces it as the "prices as of" date and marks entries low-confidence after `PRICING_STALE_AFTER_MS` (approximately 180 days / two quarters) relative to that date. Global `modelPricingOverrides` from Settings take precedence over built-ins using the same exact-key then bare-model lookup order; `POST /api/command-center/pricing/fetch` is the only dashboard network path and fetches LiteLLM's model pricing JSON on explicit user action, parses it through the pure core parser, persists the resulting overrides with fetched metadata, and leaves the prior overrides intact on fetch/parse failure. Unknown models resolve to `unavailable` rather than a guessed price.
- Remote access APIs (`/api/remote/*`) for provider config, activation, tunnel lifecycle, status, token issuance, authenticated URL generation, and QR payload generation
  - Operational runbook (prereqs/security/troubleshooting): `docs/remote-access.md`
  - `/api/remote/tunnel/start`, `/api/remote/tunnel/stop`, and `/api/remote/tunnel/kill-external` cover tunnel lifecycle and external funnel cleanup.
  - `/api/remote/status` includes tunnel status, external funnel detection (`externalTunnel` when managed tunnel is stopped), plus restore diagnostics (`restore.outcome` + `restore.reason`) with parity between dashboard and headless `fn serve` runtimes.
- Remote auth handoff endpoints:
  - `POST /api/remote-access/auth/login-url` (daemon-auth protected) issues a tokenized phone-login URL for either `persistent` or `short-lived` mode.
  - `GET /remote-login?rt=<token>` (public) validates remote token strategy and redirects to dashboard auth handoff (`/?token=<daemonToken>` when daemon auth is enabled, otherwise `/`).
  - Invalid/missing/expired remote tokens return `401` JSON with deterministic codes: `remote_token_invalid`, `remote_token_missing`, `remote_token_expired`.
- Chat APIs (`/api/chat/*`) with streaming response support (`routes.ts`, `chat.ts`)
- Dev-server lifecycle + persistence APIs (`/api/dev-server/*`) backed by:
  - `dev-server-routes.ts` (router factory + per-project runtime registry)
  - `dev-server-process.ts` (`DevServerProcessManager` for spawn/stop/restart/url-detection)
  - `dev-server-store.ts` (durable `.fusion/dev-server.json` state + log ring buffer)
  - `dev-server-detect.ts` (project/workspace script auto-detection + confidence scoring)
  - Note: this hyphenated `dev-server-*` family is the canonical runtime owner today; see `docs/dev-server-module-boundary-audit.md` for the FN-2212 boundary/consolidation audit covering parallel `devserver-*` modules.
- Plugin management routes (`plugin-routes.ts`)
- Insights routes (`insights-routes.ts`)
- Evals routes (`evals-routes.ts`): `/api/evals` read surface for eval result listing/filtering, drill-down detail, and eval run metadata
- Research routes (`research-routes.ts`): `/api/research` surface for runs, details, cancel/retry, exports, create-task, and attach-task actions; supports graceful degradation envelopes via availability payloads when capabilities are unavailable
- Plugin-defined roadmap routes under `plugin-routes.ts` dispatch (`/api/plugins/fusion-plugin-roadmap/...`)
- Project-scoped store reuse via `project-store-resolver.ts`
- Rate limiting (`rate-limit.ts`)
- Static SPA hosting (Vite build output)
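To make the read-time cost estimation from the model pricing item above concrete, here is a rough sketch of the arithmetic: rates are USD per 1M tokens, keys are lowercased `provider:model`, and unknown models yield `unavailable`. The type and function names (`ModelRate`, `TokenUsage`, `estimateCostUsd`, `isPricingStale`) and the exact 180-day constant are assumptions for illustration; the bare-model fallback and the overrides layer are deliberately left out because their precise interplay is not spelled out here.

```typescript
// Hypothetical shape of one pricing entry: USD per 1M tokens plus a citation.
interface ModelRate {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheReadPerMTok: number;
  cacheWritePerMTok: number;
  source: string;
}

interface TokenUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

// Assumed value: the document only says "approximately 180 days".
const STALE_AFTER_MS = 180 * 24 * 60 * 60 * 1000;

// Keys are lowercased `${provider}:${model}`.
function pricingKey(provider: string, model: string): string {
  return `${provider}:${model}`.toLowerCase();
}

// Exact-key lookup only; returns "unavailable" instead of guessing a price.
function estimateCostUsd(
  table: Record<string, ModelRate>,
  provider: string,
  model: string,
  usage: TokenUsage,
): number | "unavailable" {
  const rate = table[pricingKey(provider, model)];
  if (rate === undefined) return "unavailable";
  const totalMicroUsd =
    usage.input * rate.inputPerMTok +
    usage.output * rate.outputPerMTok +
    usage.cacheRead * rate.cacheReadPerMTok +
    usage.cacheWrite * rate.cacheWritePerMTok;
  return totalMicroUsd / 1_000_000;
}

// True when the "prices as of" date is older than the staleness window.
function isPricingStale(pricingAsOf: string, nowMs: number): boolean {
  return nowMs - Date.parse(pricingAsOf) > STALE_AFTER_MS;
}
```

Because the result is computed on every read, editing a rate changes historical estimates too, which is consistent with the note that cost is not persisted as billing truth.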
Runtime diagnostics logging contract
- Dashboard/server runtime diagnostics use the shared `RuntimeLogger` contract (`packages/dashboard/src/runtime-logger.ts`) instead of ad hoc `console.*` calls.
- `createServer()` accepts `ServerOptions.runtimeLogger`; when omitted it defaults to a console-backed logger, preserving readable output in non-TTY/headless modes.
- CLI TTY dashboard sessions inject a logger backed by `DashboardLogSink`, so runtime diagnostics from server/routes are captured in the TUI log buffer.
- Sensitive remote-auth material is never logged raw; route/UI responses mask persistent token values unless explicitly requested by token-generation actions.
- Short-lived remote auth tokens are runtime-ephemeral (in-memory only, cleared on process restart) and TTL-enforced server-side against persisted `remoteAccess.tokenStrategy.shortLived.ttlMs` plus issued expiry metadata.
- Remote login links carry auth material in query params (`rt` then `token` on redirect). Treat links/QR screenshots as secrets: they can leak through history, screenshots, and chat logs; prefer short-lived mode for sharing.
- Intentional startup/banner text in `fn dashboard` and `fn serve` remains direct plain output for readability and backward-compatible scripting behavior.
Headless Node Mode (fn serve / fn daemon)
- Headless runtimes auto-register the current working directory as a project when it is missing from central registry metadata, then continue normal engine startup.
- First-run auto-bootstrap logs one line: `[serve] Auto-registered project "<name>" at <cwd>` (or `[daemon] ...`).
- Primary engine binding order is: `--project <id|name>` → central `defaultProjectId` → cwd project (if registered/started) → first started engine in registry iteration order.
- This enables startup from arbitrary launch directories (systemd, Docker, parent directories, symlinked paths) without requiring cwd to be a registered project.
- `--no-auto-register` still disables cwd registration, but startup only exits when zero engines start across the registry.
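The primary engine binding order above reads naturally as a fallback chain. The sketch below illustrates one way to express it; `EngineEntry`, `BindingInputs`, and `resolvePrimaryEngine` are hypothetical names, and the choice to fall through when `--project` matches nothing is an assumption, since the behavior for an unmatched flag is not described here.

```typescript
// Hypothetical view of one registry entry.
interface EngineEntry {
  projectId: string;
  name: string;
  path: string;
  started: boolean;
}

interface BindingInputs {
  projectFlag?: string; // value of --project <id|name>, if given
  defaultProjectId?: string; // central defaultProjectId, if set
  cwd: string;
  engines: EngineEntry[]; // in registry iteration order
}

function resolvePrimaryEngine(input: BindingInputs): EngineEntry | undefined {
  const { projectFlag, defaultProjectId, cwd } = input;
  // Only started engines are candidates at every step of the chain.
  const started = input.engines.filter((e) => e.started);

  const byFlag =
    projectFlag !== undefined
      ? started.find((e) => e.projectId === projectFlag || e.name === projectFlag)
      : undefined;
  const byDefault =
    defaultProjectId !== undefined
      ? started.find((e) => e.projectId === defaultProjectId)
      : undefined;
  const byCwd = started.find((e) => e.path === cwd);

  // --project → defaultProjectId → cwd project → first started engine.
  return byFlag ?? byDefault ?? byCwd ?? started[0];
}
```

An `undefined` result corresponds to the case where zero engines started, which is the only situation in which startup is described as exiting.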
Real-time channels
- SSE: `/api/events` (`sse.ts`)
  - Emits `task:*`, mission events, AI session updates, automation schedule events (`schedule:created`, `schedule:updated`, `schedule:deleted`, `schedule:run`), and research run lifecycle events (`research:run:created`, `research:run:updated`, `research:run:completed`, `research:run:failed`, `research:run:cancelled`) when available
  - Project-scoped: resolves project context from query param or engine manager
  - Canonical maintainer contract (ownership/lifecycle/scoping/pitfalls and shared-vs-dedicated stream boundaries): `docs/dashboard-realtime.md`
- Chat streaming: `/api/chat/sessions/:id/messages` (`routes.ts` + `chat.ts`)
  - Streams assistant responses as SSE events for chat sessions
  - `done` events include the authoritative persisted assistant message snapshot (`message`) so clients can render final output even when incremental `text` deltas are absent
  - `error` events now allow either the legacy string payload or a structured failure payload matching persisted `metadata.failureInfo`; direct-chat clients normalize both shapes and render failures inline in the thread
- Chat session queries: `/api/chat/sessions` (`routes.ts`)
  - Existing list behavior is unchanged (`status=active|archived|all` returns an array)
  - Quick Chat resume uses targeted lookup params: `agentId`, optional `modelProvider` + `modelId`, plus `resume=1`
  - Validation requires `modelProvider` and `modelId` together; partial model pairs return `400`
  - Targeted lookup returns only the newest matching active session (or `null`) to avoid scanning every active session client-side
- Chat Room API: `/api/chat/rooms*` (`register-chat-room-routes.ts`)
  - `GET /api/chat/rooms` → `200 { rooms }`; query supports `projectId`, `status`, and `agentId`
  - `POST /api/chat/rooms` → `201 { room, members }`; validates `name`, returns `409` on slug collisions
  - `GET/PATCH/DELETE /api/chat/rooms/:id` → room read/update/delete (`404` for unknown room)
  - `GET/POST/DELETE /api/chat/rooms/:id/members[/:agentId]` → member list/add/remove (`400` for invalid body, `404` for unknown room/member)
  - `GET /api/chat/rooms/:id/messages` + `POST /api/chat/rooms/:id/messages` + `DELETE /api/chat/rooms/:id/messages` + `DELETE /api/chat/rooms/:id/messages/:messageId`
    - Room message POST persists the user room message (`201 { message }`), rejects non-null `senderAgentId` for user submissions, then triggers server-side room responder execution that persists assistant room replies via `chatStore.addRoomMessage(...)`
  - `POST /api/chat/rooms/:id/attachments` uploads a room-scoped attachment file and returns `{ attachment }` metadata (`400` invalid mime/size, `404` missing room)
  - `GET /api/chat/rooms/:id/attachments/:filename` streams uploaded room attachments with path-traversal protection
  - `POST /api/chat/rooms/:id/messages/:messageId/attachments` records attachment metadata on an existing room message
  - Error contract follows existing API patterns: `400` validation failures, `404` missing resources, `409` duplicate-slug conflicts, `503` when chat store is unavailable
  - SSE fan-out on `/api/events` now includes: `chat:room:created`, `chat:room:updated`, `chat:room:deleted`, `chat:room:member:added`, `chat:room:member:removed`, `chat:room:message:added`, `chat:room:message:updated`, `chat:room:message:deleted`
  - Room test coverage (planned): FN-3812 tracks the contract-first test matrix for room creation/switching, persisted history, mention routing, and hybrid responder behavior. See `.fusion/tasks/FN-3812/test-plan.md` plus scaffold files: `packages/core/src/__tests__/chat-store.rooms.test.ts`, `packages/dashboard/src/__tests__/chat.rooms.test.ts`, `packages/dashboard/src/__tests__/chat-routes.rooms.test.ts`, and `packages/dashboard/app/components/__tests__/ChatView.rooms.test.tsx`.
- Task log stream: `/api/tasks/:id/logs/stream` (`server.ts`)
  - SSE endpoint for live task log streaming with project scope resolution
- Dev-server stream: `/api/dev-server/logs/stream` (`dev-server-routes.ts`)
  - SSE stream emits `history`, `log`, `stopped`, and `failed` events
  - initial connection replays persisted `logHistory` and then follows live process output
  - companion endpoints: `/api/dev-server/detect`, `/config`, `/status`, `/start`, `/stop`, `/restart`, `/preview-url`
- Badge WebSocket: `/api/ws` (`server.ts`, `websocket.ts`)
  - Scope-keyed channels (`badge:{scopeKey}:{taskId}`) prevent cross-project collisions
- Terminal WebSocket: `/api/terminal/ws` (`server.ts`, `terminal-service.ts`)
  - Project-scoped terminal session validation + safe unscoped fallback
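Two of the rules in the list above are small enough to sketch directly: the paired-parameter validation for Quick Chat resume lookups, and the scope-keyed badge channel name. The helper names (`validateResumeQuery`, `badgeChannel`) and the error message text are hypothetical; only the `400` status, the parameter names, and the `badge:{scopeKey}:{taskId}` format come from the description above.

```typescript
// Query params relevant to the targeted Quick Chat resume lookup.
interface ResumeQuery {
  agentId?: string;
  modelProvider?: string;
  modelId?: string;
  resume?: string;
}

type Validation = { ok: true } | { ok: false; status: 400; error: string };

// modelProvider and modelId must be supplied together or not at all.
function validateResumeQuery(query: ResumeQuery): Validation {
  const hasProvider = query.modelProvider !== undefined;
  const hasModel = query.modelId !== undefined;
  if (hasProvider !== hasModel) {
    return {
      ok: false,
      status: 400,
      error: "modelProvider and modelId must be provided together", // illustrative wording
    };
  }
  return { ok: true };
}

// Including the scope key keeps identical task IDs in different
// projects on separate channels.
function badgeChannel(scopeKey: string, taskId: string): string {
  return `badge:${scopeKey}:${taskId}`;
}
```

For example, `badgeChannel("proj-a", "FN-12")` and `badgeChannel("proj-b", "FN-12")` produce distinct channel names even though the task ID is the same.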
Frontend SPA layer
- App entry: `packages/dashboard/app/main.tsx`
- Root composition: `packages/dashboard/app/App.tsx`
- Core board components: `Board.tsx`, `Column.tsx`, `TaskCard.tsx`, `TaskDetailModal.tsx`, `ListView.tsx`
- Board column ordering (board view only): `todo` cards mirror scheduler pickup order (priority descending, then `createdAt` ascending/FIFO within each priority tier, then task ID ascending). `triage`, `in-progress`, and `archived` use priority descending then task ID ascending, with missing/invalid priority normalized to `normal`. `done` is completion-recency ordered (`columnMovedAt`, then `updatedAt`, then `createdAt`, newest first). In `in-review`, merge-active tasks (`status === "merging"`, `"merging-pr"`, or `"merging-fix"`) are pinned above non-merging tasks, with priority-then-ID ordering within each group.
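The `todo` ordering rule above (priority descending, then `createdAt` ascending, then task ID ascending) can be written as a single comparator. This is a sketch only: the priority level names other than `normal`, the numeric reading of `FN-*` IDs, and the decision to normalize unknown priorities to `normal` in this column are assumptions made for the example.

```typescript
// Hypothetical priority scale; only "normal" is named in the text above.
const PRIORITY_RANK = new Map<string, number>([
  ["low", 0],
  ["normal", 1],
  ["high", 2],
  ["urgent", 3],
]);

interface TodoCard {
  id: string; // e.g. "FN-12"
  priority?: string;
  createdAt: string; // ISO timestamp
}

// Missing or unrecognized priorities are treated as "normal" (rank 1).
function priorityRank(priority: string | undefined): number {
  return PRIORITY_RANK.get(priority ?? "normal") ?? 1;
}

// Assumes IDs end in a numeric suffix and should compare numerically.
function taskNumber(id: string): number {
  const match = /(\d+)$/.exec(id);
  return match ? Number(match[1]) : Number.MAX_SAFE_INTEGER;
}

function compareTodoCards(a: TodoCard, b: TodoCard): number {
  // 1. Higher priority first.
  const byPriority = priorityRank(b.priority) - priorityRank(a.priority);
  if (byPriority !== 0) return byPriority;
  // 2. Older cards first within a priority tier (FIFO).
  const byCreated = Date.parse(a.createdAt) - Date.parse(b.createdAt);
  if (byCreated !== 0) return byCreated;
  // 3. Lower task ID first as the final tiebreaker.
  return taskNumber(a.id) - taskNumber(b.id);
}
```

Usage is the standard `cards.slice().sort(compareTodoCards)`; copying first avoids mutating the array that the board state owns.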
Refinement task routing
- `fn_task_refine` creates child tasks in `column: "triage"` with `sourceType: "task_refine"` and a dependency on the source task. Refinements are never routed directly to `todo`.
- Refinements still require normal triage specification (PROMPT.md with valid `File Scope`) before execution routing.
- To prevent starvation under large same-priority planning backlogs (FN-4647 pattern), triage polling now prefers `task_refine` rows over non-refinement rows as an ordering tiebreaker within the same priority band.
- Starved refinement self-healing sweep (Lane B): `SelfHealingManager.recoverStarvedRefinementTriageTasks()` runs in startup + maintenance sweeps and targets `sourceType: "task_refine"` tasks still in `triage` (`status` `null|planning`) that are unpaused, not actively planning, older than `STARVED_REFINEMENT_RECOVERY_GRACE_MS` (10m), and have observed peer board progress (`STARVED_PEER_PROGRESS_THRESHOLD=3` non-refinement tasks advanced to `todo` after the refinement was created). Remediation is a bounded one-step priority nudge (no direct move-to-todo) with cooldown idempotency (`STARVED_REFINEMENT_ESCALATION_COOLDOWN_MS = grace*4`) and run-audit emission `task:auto-recover-starved-refinement` including `{ taskId, ageMs, peerProgressCount, escalation }` metadata.
- Approval semantics are unchanged: with `requirePlanApproval=true`, refinements stop at `status: "awaiting-approval"`; otherwise they move to `todo` after spec finalization.
- Regression coverage lives in `packages/engine/src/__tests__/triage-refinement-routing.test.ts` and locks four guarantees: bounded promotion under backlog pressure, approval-gate preservation, PROMPT-before-`todo` invariant, and unchanged baseline ordering for non-refinement-only triage sets.
- Task detail surface is shared through `TaskDetailContent` (exported from `TaskDetailModal.tsx`): desktop/tablet `ListView` renders it inline in the split right pane, while mobile and non-list entry points continue using `TaskDetailModal`.
- In desktop split mode, `ListView` now uses a compact sidebar-first control layout (count/actions/summary chips + collapsible "View options" panel) to keep list controls dense alongside the inline detail pane; mobile keeps the card-first flow with a toolbar "View options" entry point for the same visibility/filter toggles.
- Chat system UI: `ChatView.tsx`, `QuickChatFAB.tsx`
- Planning/insight UI: `MissionManager.tsx`, `TodoView.tsx`, `InsightsView.tsx`, `DocumentsView.tsx` (roadmap view is plugin-owned)
- Dev server UI: `DevServerView.tsx` (controls + status/log panel + embedded preview with iframe fallback messaging)
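The eligibility test for the starved refinement sweep described in the list above combines several conditions with three constants. A sketch of that predicate follows. The constant names and values (10 minutes, threshold 3, cooldown of four grace periods) come from the description; the `TriageTask` shape, the helper name `isStarvedRefinement`, measuring age from creation time, and treating the threshold as "at least 3" are assumptions.

```typescript
const STARVED_REFINEMENT_RECOVERY_GRACE_MS = 10 * 60 * 1000; // 10m
const STARVED_PEER_PROGRESS_THRESHOLD = 3;
const STARVED_REFINEMENT_ESCALATION_COOLDOWN_MS =
  STARVED_REFINEMENT_RECOVERY_GRACE_MS * 4;

// Hypothetical projection of the task fields the sweep would need.
interface TriageTask {
  sourceType: string;
  column: string;
  status: string | null;
  paused: boolean;
  activelyPlanning: boolean;
  createdAtMs: number;
  lastEscalatedAtMs: number | null;
}

function isStarvedRefinement(
  task: TriageTask,
  peerProgressCount: number, // non-refinement tasks advanced to todo since creation
  nowMs: number,
): boolean {
  // Cooldown keeps repeated sweeps from nudging the same task again too soon.
  const inCooldown =
    task.lastEscalatedAtMs !== null &&
    nowMs - task.lastEscalatedAtMs < STARVED_REFINEMENT_ESCALATION_COOLDOWN_MS;

  return (
    task.sourceType === "task_refine" &&
    task.column === "triage" &&
    (task.status === null || task.status === "planning") &&
    !task.paused &&
    !task.activelyPlanning &&
    nowMs - task.createdAtMs > STARVED_REFINEMENT_RECOVERY_GRACE_MS &&
    peerProgressCount >= STARVED_PEER_PROGRESS_THRESHOLD &&
    !inCooldown
  );
}
```

A task that passes this check would receive the one-step priority nudge; the predicate itself never moves a task, in keeping with the rule that refinements are not routed directly to `todo`.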
CSS Architecture
The dashboard's CSS is split between a consolidated global stylesheet and modular per-component files:
- Global stylesheet (`packages/dashboard/app/styles.css`, ~4,500 lines)
  - Design tokens (spacing, colors, shadows, transitions, fonts)
  - Primitive component classes (`.btn`, `.card`, `.modal`, `.form-input`)
  - Cross-component `@media` overrides and breakpoint definitions
- Per-component stylesheets (56+ files in `packages/dashboard/app/components/`)
  - Each component has a co-located `ComponentName.css` file
  - Each `ComponentName.tsx` imports its stylesheet: `import "./ComponentName.css";`
  - Component-specific CSS rules live in the component's `.css` file, not in the root stylesheet
Lazy-loaded views (bundle size optimization):
The following 15 views are lazy-loaded via `React.lazy()` with `<Suspense fallback={null}>`:
- `AgentsView`, `TodoView`, `NodesView`, `ChatView`, `MemoryView`, `ResearchView`
- `DevServerView`, `InsightsView`, `DocumentsView`, `SkillsView`
- `SetupWizardModal`, `PluginManager`, `PiExtensionsManager`, `AgentDetailView`

A `prefetchLazyViews()` function runs once on mount via `requestIdleCallback` to warm chunks. Do not make these views eager; bundle size is carefully managed.
Key hooks
- Task + realtime: `useTasks.ts`, `useBadgeWebSocket.ts`, `useAiSessionSync.ts`
- Chat: `useChat.ts`, `useQuickChat.ts`
- Documents/insights/memory: `useDocuments.ts`, `useInsights.ts`, `useMemoryBackendStatus.ts`, `useMemoryData.ts`
- Plugin roadmap state/hooks: owned by `plugins/fusion-plugin-roadmap/src/dashboard/*`
- Dev server: `useDevServer.ts` (status hydration, command controls, reconnect stream handling, project-scope reset)
- Project/agents/setup: `useProjects.ts`, `useCurrentProject.ts`, `useAgents.ts`, `useSetupReadiness.ts`
- UX/platform helpers: `useFavorites.ts`, `useAuthOnboarding.ts`, `useDeepLink.ts`, `useTerminal.ts`
Planning and decomposition features
- Backend planners: `planning.ts`, `subtask-breakdown.ts` (roadmap suggestion generation is plugin-owned)
- UI modals: `PlanningModeModal.tsx`, `SubtaskBreakdownModal.tsx`, milestone interview flows
- Multi-task creation endpoints are wired under planning/subtask routes in `routes.ts`
Health and monitoring endpoints
- Health check: `GET /api/health`
  - Returns liveness status for load balancers and monitoring
  - Response: `{ status: "ok" | "degraded", version: string, uptime: number, database: { healthy: boolean, corruptionDetected: boolean, corruptionErrors: string[], isRunning: boolean, lastCheckedAt: string | null }, taskIdIntegrity: { status: "ok" | "anomaly", checkedAt: string | null, anomalies: [...], recommendedAction: string | null } }`
  - Startup does not block on full `PRAGMA integrity_check(100)`; Fusion schedules it in the background shortly after boot.
  - Background integrity checks are deduplicated process-wide per on-disk SQLite path: multiple `Database` instances sharing the same `fusion.db` join one shared run, and each instance still updates the underlying integrity state (`integrityCheckPending`, `integrityCheckLastRunAt`, `corruptionDetected`, `integrityCheckErrors`) that maps to `database.isRunning`, `database.lastCheckedAt`, `database.healthy`, `database.corruptionDetected`, and `database.corruptionErrors`.
  - Self-healing watches `store.getDatabaseHealth()` during maintenance. Each fresh corruption detection emits a `task:auto-db-corruption-detected` run-audit database event and attempts a `db-corruption-detected` notification through the active notification service (or the ntfy fallback) with a one-hour cooldown between repeats until the health state clears.
  - `POST /api/health/refresh` recomputes the task-ID integrity section on demand and returns the same top-level shape, including the current database corruption fields.
  - No authentication required
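For consumers of the health endpoint, the response shape listed above translates directly into a TypeScript interface. The sketch below pairs it with a hypothetical `healthAlerts` helper showing how a monitoring client might interpret the fields; the helper, its alert strings, and the `unknown[]` typing of `anomalies` (whose element shape is elided above) are illustrative assumptions rather than part of the dashboard code.

```typescript
// Mirrors the documented GET /api/health response shape.
interface HealthResponse {
  status: "ok" | "degraded";
  version: string;
  uptime: number;
  database: {
    healthy: boolean;
    corruptionDetected: boolean;
    corruptionErrors: string[];
    isRunning: boolean;
    lastCheckedAt: string | null;
  };
  taskIdIntegrity: {
    status: "ok" | "anomaly";
    checkedAt: string | null;
    anomalies: unknown[]; // element shape is not specified here
    recommendedAction: string | null;
  };
}

// Hypothetical client-side interpretation: collect human-readable alerts.
function healthAlerts(health: HealthResponse): string[] {
  const alerts: string[] = [];
  if (health.status === "degraded") {
    alerts.push("service degraded");
  }
  if (health.database.corruptionDetected) {
    alerts.push(`database corruption: ${health.database.corruptionErrors.length} error(s)`);
  }
  if (health.taskIdIntegrity.status === "anomaly") {
    const action = health.taskIdIntegrity.recommendedAction ?? "no recommended action";
    alerts.push(`task ID anomaly: ${action}`);
  }
  return alerts;
}
```

Note that `database.lastCheckedAt` can be `null` shortly after boot because the integrity check runs in the background, so a client should not treat a missing timestamp as a failure by itself.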
Custom Provider endpoints
Custom-provider settings routes are registered in register-custom-provider-routes.ts.
| Method | Path | Description |
|---|---|---|
| GET | /api/custom-providers | List configured custom providers from global settings with API keys masked in the response payload. |
| POST | /api/custom-providers | Create a custom provider (name, apiType, baseUrl, optional apiKey and models) and return the new provider with masked API key. |
| PUT | /api/custom-providers/:id | Update an existing custom provider by ID (partial updates supported) and return the sanitized provider payload. |
| DELETE | /api/custom-providers/:id | Delete a custom provider by ID and return a success envelope. |
Project/node path-mapping endpoints
| Method | Path | Description |
|---|---|---|
| GET | /api/projects/:id/path-mappings | List persisted per-node absolute paths for a project (projectNodePathMappings rows keyed by projectId + nodeId). |
| GET | /api/projects/:id/path-mappings/:nodeId | Fetch one project↔node mapping row. |
| PUT | /api/projects/:id/path-mappings/:nodeId | Upsert one mapping row (path body field must be absolute). |
| DELETE | /api/projects/:id/path-mappings/:nodeId | Delete one mapping row if present. |
| GET | /api/nodes/:id/path-mappings | List all project mappings known for a node. |
This API surface is intentionally separate from projects.nodeId (runtime host placement metadata) and from task-level routing defaults (defaultNodeId / Task.nodeId).
Dashboard node onboarding (AddNodeModal → useNodes.register) uses a two-phase flow:
- Register node metadata first via `POST /api/nodes`.
- Persist selected project↔node path mappings with per-project `PUT /api/projects/:id/path-mappings/:nodeId` upserts.
The client treats mapping persistence as part of onboarding success. If mapping writes fail after node creation, onboarding attempts rollback via DELETE /api/nodes/:id and refreshes node state to avoid a silent half-configured node.
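The two-phase onboarding with rollback can be sketched as follows. The `NodeApi` interface is a hypothetical abstraction over the three HTTP calls named above (`POST /api/nodes`, `PUT /api/projects/:id/path-mappings/:nodeId`, `DELETE /api/nodes/:id`); it is not the actual `useNodes` implementation, and the metadata fields and result shape are invented for illustration.

```typescript
// Hypothetical client abstraction over the three endpoints involved.
interface NodeApi {
  registerNode(meta: { name: string; url: string }): Promise<{ id: string }>;
  putPathMapping(projectId: string, nodeId: string, path: string): Promise<void>;
  deleteNode(nodeId: string): Promise<void>;
}

type OnboardResult =
  | { ok: true; nodeId: string }
  | { ok: false; rolledBack: boolean; error: string };

async function onboardNode(
  api: NodeApi,
  meta: { name: string; url: string },
  mappings: Record<string, string>, // projectId -> absolute path on the node
): Promise<OnboardResult> {
  // Phase 1: register node metadata.
  const node = await api.registerNode(meta);

  try {
    // Phase 2: persist each project mapping with an upsert.
    for (const [projectId, path] of Object.entries(mappings)) {
      await api.putPathMapping(projectId, node.id, path);
    }
    return { ok: true, nodeId: node.id };
  } catch (err) {
    // Mapping persistence is part of onboarding success, so try to undo
    // the registration rather than leave a half-configured node behind.
    let rolledBack = true;
    try {
      await api.deleteNode(node.id);
    } catch {
      rolledBack = false;
    }
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, rolledBack, error };
  }
}
```

Reporting `rolledBack` separately lets the caller distinguish a clean failure from one where the node may still exist and node state needs a refresh.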
Node settings sync and update-check endpoints
| Method | Path | Description |
|---|---|---|
| GET | /api/nodes/:id/settings | Fetch settings from a remote node. |
| POST | /api/nodes/:id/settings/push | Push local settings to a remote node. |
| POST | /api/nodes/:id/settings/pull | Pull settings from a remote node. |
| GET | /api/nodes/:id/settings/sync-status | Get sync status and diff summary (includes actionableDenialReason when remote probe fails). |
| POST | /api/nodes/:id/auth/sync | Sync model auth snapshots (push/pull, checksum/version validated). |
| POST | /api/nodes/:id/secrets/push | Push local secrets snapshot to a remote node. |
| POST | /api/nodes/:id/secrets/pull | Pull secrets snapshot from a remote node. |
| POST | /api/settings/sync-receive | Receive pushed settings (inbound). |
| POST | /api/settings/auth-receive | Receive AuthMaterialSnapshot and persist via auth storage. |
| POST | /api/secrets/sync-receive | Receive pushed secrets payload (inbound). |
| GET | /api/secrets/sync-export | Export local secrets sync envelope for remote pull flows. |
| GET | /api/settings/auth-export | Export local AuthMaterialSnapshot. |
| GET | /api/update-check | Read cached/TTL-guarded npm update status for @runfusion/fusion (respects updateCheckEnabled). |
| POST | /api/update-check/refresh | Clear cached update data and force a fresh npm update check. |
| GET | /api/updates/check | Perform an on-demand npm registry check for the latest @runfusion/fusion version (no cache). |
When adding a new node settings/auth sync endpoint, add it to the ENDPOINTS catalog in packages/dashboard/src/__tests__/routes-nodes-sync-contract.test.ts so the auth/error/payload parity matrix covers it. Inbound sync endpoints (including /api/secrets/sync-receive and /api/secrets/sync-export) must validate Authorization: Bearer <apiKey> against the local node API key.
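The bearer-key requirement for inbound sync endpoints can be illustrated with a small header check. This is a sketch, not the route middleware: `extractBearerToken`, `safeEqual`, and `isAuthorizedSyncRequest` are hypothetical helpers, and a Node server would more commonly use `crypto.timingSafeEqual` for the comparison instead of the hand-written loop shown here.

```typescript
// Pull the token out of an "Authorization: Bearer <apiKey>" header value.
function extractBearerToken(header: string | undefined): string | null {
  if (header === undefined) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

// Comparison that does not exit early on the first differing character.
// The length check still reveals whether the lengths match.
function safeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// True only when the header carries a bearer token equal to the local key.
function isAuthorizedSyncRequest(
  authorizationHeader: string | undefined,
  localNodeApiKey: string,
): boolean {
  const token = extractBearerToken(authorizationHeader);
  return token !== null && safeEqual(token, localNodeApiKey);
}
```

A route handler would call `isAuthorizedSyncRequest` before touching the payload and reject the request otherwise, which keeps the auth behavior uniform across the endpoints covered by the parity matrix.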
Agent stats endpoint
| Method | Path | Description |
|---|---|---|
| GET | /api/agents/stats | Aggregate agent/task stats used by operator summaries and capacity-risk signaling. Returns activeCount, assignedTaskCount, completedRuns, failedRuns, successRate, plus idleNonEphemeralCount (idle agents excluding ephemeral/runtime workers via isEphemeralAgent) and todoTaskCount (tasks currently in Todo). |
Docker provisioning endpoints
Initial container provisioning and lifecycle routes are registered by register-docker-provisioning-routes.ts.
| Method | Path | Description |
|---|---|---|
| POST | /api/docker/provision | Provision and start a managed Docker container. |
| POST | /api/docker/deprovision | Stop/remove a managed Docker container. |
| POST | /api/docker/containers/:containerId/start | Start an existing container. |
| POST | /api/docker/containers/:containerId/stop | Stop a running container. |
| POST | /api/docker/containers/:containerId/restart | Restart a container. |
| GET | /api/docker/containers/:containerId/status | Read runtime status for a container. |
Mesh configuration and post-provision managed-node operations are registered separately in register-docker-node-routes.ts (for example /api/docker/nodes/:managedId/apply-mesh-config and /api/docker/nodes/:managedId/mesh-status).
Run Audit API
The run-audit system records every mutation performed by the engine across four domains:
- Database: task:create, task:update, task:move, etc. Node handoff/recovery emits structured events: `node:handoff:parked` (handoff denied/parked), `node:handoff:reassign-local` (local takeover approved), `node:handoff:reassign-any` (any-healthy takeover approved), and `node:lease:recovered` (abandoned lease cleared and task requeued).
- Database / `overseer:intervention` (FN-7519, emission façade FN-7520, runtime wiring FN-7551): the planner-overseer intervention timeline's single canonical mutation type. `target` is the task ID; metadata carries the six intervention field groups (`stage`, `reason`, `action`, `outcome`, optional `attemptCount`/`attemptLimit`, optional `sourceLinks`). Written only via `recordPlannerIntervention` and read via `getPlannerInterventionTimeline`/`parseInterventionEntry` (`packages/core/src/planner-intervention.ts`) so no parallel audit store or timeline-mapping exists; surfaced read-only in the task-detail Intervention Timeline (`GET /tasks/:id/overseer/interventions`). FN-7520 adds the canonical `emitOverseerObservation`/`emitOverseerSteering`/`emitOverseerRecoveryAttempt`/`emitOverseerRetry`/`emitOverseerConfirmation`/`emitOverseerEscalation` emission façade (`packages/core/src/planner-overseer-events.ts`) that fixes each decision-point category's `action`/default `outcome` and funnels through `recordPlannerIntervention`, the single seam FN-7511/FN-7512/FN-7513 call rather than emitting `overseer:intervention` events inline. FN-7551 wires this façade to the LIVE runtime: `ProjectEngine`'s `PlannerOverseerMonitor#onObservation` callback (deduped per `(taskId, stage:signal)`), `buildPlannerRecoveryHandlers` (steering/retry/targeted-fix/confirmation-request, plus `PlannerRecoveryController`'s new optional `onConfirmationResolved` hook for the approve/deny resolution), and `pollPlannerOverseer`'s bounded-recovery-exhaustion escalation (deduped per `(taskId, stage)`), so the intervention timeline now reflects real engine activity, not only synthetic unit-test entries.
- Git: worktree:create, worktree:remove, `worktree:remove-fallback` (metadata `{ fallback: "filesystem-non-empty", error }` when native git removal falls back to filesystem removal + admin prune), commit:create, merge:resolve, merge:audit-failure, `worktree:reanchored`, and worktrunk lifecycle events (`worktree:worktrunk-install|create|sync|prune|remove`, plus `worktree:worktrunk-fallback`, `worktree:worktrunk-failure`, and `worktree:worktrunk-fallback-native`). Worktrunk events share metadata `{ op, binaryPath?, worktreePath?, durationMs?, exitCode?, stderrPreview?, installSource?, prunedCount? }` with `installSource` (`"release-binary" | "cargo"`) limited to successful `worktree:worktrunk-install` events and `prunedCount` limited to successful prune events when known. `worktree:worktrunk-install` is emitted only for true install actions; cache hits, configured `worktrunk.binaryPath` overrides, and `$PATH` resolutions intentionally remain silent. Dirty post-merge audit outcomes emit `merge:audit-failure` with metadata `{ mode, strategy, action, reason, issueCount, duplicateSubjectCount, touchedFileOverlapCount, verificationPassed, auditTargetLabel }`. FN-5279 adds `merge:reuse-handoff-acquired`, `merge:reuse-handoff-refused`, `merge:reuse-handoff-released`, and `merge:reuse-handoff-deferred-to-worktrunk` for task-worktree auto-merge handoff visibility. FN-5351 adds `merge:integration-worktree-state` (pre-handoff checkout/dirty snapshot for resolved integration branch), `merge:cwd-integration-fallback-refused` (terminal refusal park event), and `merge:integration-ref-advance` (integration ref advance outcome telemetry).
- Git / `merge:file-scope-violation`: emitted by the merger when `FileScopeViolationError` aborts a squash. `target` is the task ID; metadata includes `stagedFiles`, `declaredScope`, `resetLabel`, `stagedFileCount`, and `declaredScopeCount`. Consumed by `fileScopeInvariantFailuresPerDay` in `GET /api/health/reliability` (FN-4360).
- Git / `merge:no-op-attribution-mismatch`: emitted by the rebase landed-files attribution guard (FN-5304) when `<rebaseBaseSha>..HEAD` has zero attributable own commits but the source `fusion/<id>` tip still carries attributable own commits. `target` is the task ID; metadata includes `recordedSha`, `rebaseMergeBaseSha`, `sourceBranchRef`, `sourceBranchOwnCommitCount`, and `sourceBranchOwnCommitShas`.
- Git / `merge:no-op-attribution-mismatch-skipped`: emitted when the FN-5304 source-tip guard cannot run because the source branch ref is unavailable (for example already pruned). `target` is the task ID; metadata includes `reason` (`"source-ref-unavailable"`).
- Database / `task:auto-recover-misrouted-foreign-commit`: emitted per dropped misrouted commit during FN-4948 contamination recovery. `target` is the recovering task; metadata carries `{ droppedSha, foreignTaskId, paths }`.
- Database / `task:no-commits-finalize-blocked-incomplete-steps`: emitted by no-op finalize lanes when a `noCommitsExpected` task has no net branch changes but incomplete/skipped steps outweigh done steps. Metadata includes `{ reason, doneCount, incompleteCount, lane, classification?, baseRef? }`; the accompanying task log explains that the task was demoted to `todo` with progress preserved instead of finalized as done.
- Database / `task:orphan-detected-no-action`: emitted by `recoverOrphanedExecutions` (FN-5337) when row metadata looks orphaned after grace windows; annotation-only event with no lifecycle mutation (`in-progress` task stays put).
- Database / `task:reattach-orphaned-execution`: emitted by `reattachOrphanedAssignedExecutions` (FN-6336) when self-healing re-dispatches an idle assigned `in-progress` task forward via `executor.resumeTaskForAgent(agentId)` after proving the assigned agent has no active heartbeat run or active execution.
- Database / `task:reconcile-stale-agent-assignment`: emitted when self-healing or heartbeat reconciliation clears stale durable `Agent.taskId`/`state` for a task parked in `todo`/`triage` without live execution proof. Metadata includes `{ agentId, taskId, taskColumn, agentState, status, blockedBy, overlapBlockedBy, hadFreshRun, hadActiveExecution, reason }`; task queue/lease fields are preserved.
- Database / `task:soft-delete-column-reconciled`: emitted by `reconcileSoftDeletedColumnDrift` (FN-5566, re-land FN-5446) when a soft-deleted row (`deletedAt IS NOT NULL`) is found with legacy `column != 'archived'`; rewrites only `column` (no resurrection), with metadata `{ previousColumn }`.
- Database / `session:runtime-resolved`: emitted once per `createResolvedAgentSession` call with metadata `{ sessionPurpose, runtimeId, wasConfigured, provider, modelId, mockProviderActive, testModeActive, runtimeHint? }` for per-lane runtime/provider attribution.
- Database /
task:reconcile-dependency-blocking-lease— emitted byreconcileDependencyBlockingLeases()(FN-6292) when self-healing rebounds anin-progressholder totodobecause an unmet dependency is blocked by the holder's stale file-scope lease. Metadata includes the dependency ID, blocked-by marker, and unmet dependency list. - Database /
task:reconcile-in-review-unmet-dependencies— emitted byreconcileInReviewUnmetDependencies()(FN-6793/FN-6797) when self-healing rebounds anin-reviewtask to blockedtodobecause one or more declared dependencies are still unmet. Metadata includesunmetDeps,blockedBy, and prior review status; the-no-actioncompanion is emitted when task pause/user-pause,autoMerge:false, live execution/checkout proof, or a failed rebound mutation prevents the backward move. - Database /
task:reconcile-orphaned-task-dir— emitted byTaskStore.reconcileOrphanedTaskDirs()(FN-6783) when store open or self-healing Batch 1 re-imports a valid live.fusion/tasks/{ID}/task.jsondirectory with no SQLite task row anywhere. Metadata includes the recovered ID, column, status, and task JSON path. - Database /
task:*-no-actionbackward-move family (FN-5335) — backward self-healing sweeps now emit annotation-only events when triple proof fails instead of mutating lifecycle state. New mutation types:task:reclaim-pr-conflict-no-action,task:reclaim-self-owned-branch-conflict-no-action,task:auto-rebound-scope-decay-no-action,task:finalize-no-op-review-no-action,task:stale-incomplete-review-no-action,task:ghost-review-no-action,task:stuck-merge-deadlock-no-action,task:no-progress-no-task-done-no-action,task:missing-worktree-review-no-action,task:partial-progress-no-task-done-no-action,task:reconcile-dependency-blocking-lease-no-action. Seedocs/self-healing-backward-move-audit.mdfor per-stage disposition. - Filesystem — file:write, prompt:write, attachment:create, etc.
- Sandbox — backend lifecycle events from
SandboxBackendwiring in executor/merger/routine-runner (sandbox:prepare,sandbox:run,sandbox:failure,sandbox:fallback) introduced after FN-4636.
Events are tied to specific run IDs for end-to-end traceability.
For scheduler concurrency diagnostics, the queued reason now names the active limiter(s) and usage (for example gate=maxConcurrent ...). The reason includes the bindingGates (maxConcurrent/maxWorktrees/semaphore), per-gate { used, limit, slack }, holders, and computed available. holders.maxConcurrent and holders.maxWorktrees are current in-progress task IDs; holders.semaphore mirrors that set but semaphore slots can also be consumed by triage/merge agents outside in-progress. So if semaphore.used exceeds the visible holder list, that usually indicates non-execution agents are legitimately consuming shared capacity (not stale accounting). maxWorktrees is also enforced inside TaskStore.moveTaskInternal when committing an allocated move into in-progress, making it a hard active execution worktree cap even when workflow WIP/maxConcurrent would allow more tasks. These queued-reason logs are transition-only: a newly emitted line indicates the limiter signature changed or the condition cleared and later reappeared, not that a poll loop simply observed the same blocked state again.
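As a rough illustration of the transition-only behavior described above, the sketch below computes per-gate slack and records a queued line only when the limiter signature changes or reappears after clearing. The names QueuedReason, computeQueuedReason, and TransitionOnlyLog are hypothetical, and so is the assumption that slack equals limit minus used; none of this is taken from the scheduler source.

```typescript
type GateName = "maxConcurrent" | "maxWorktrees" | "semaphore";
const GATES: GateName[] = ["maxConcurrent", "maxWorktrees", "semaphore"];

interface GateUsage { used: number; limit: number }
interface QueuedReason {
  bindingGates: GateName[];
  gates: Record<GateName, GateUsage & { slack: number }>;
}

// Assumption: slack = limit - used, and a gate binds when no slack remains.
function computeQueuedReason(usage: Record<GateName, GateUsage>): QueuedReason | null {
  const gates = {} as QueuedReason["gates"];
  for (const name of GATES) {
    gates[name] = { ...usage[name], slack: usage[name].limit - usage[name].used };
  }
  const bindingGates = GATES.filter((name) => gates[name].slack <= 0);
  return bindingGates.length > 0 ? { bindingGates, gates } : null;
}

// Records a line only when the limiter signature changes, or when the
// blocked condition cleared and later reappeared.
class TransitionOnlyLog {
  readonly lines: string[] = [];
  private lastSignature: string | null = null;

  observe(reason: QueuedReason | null): void {
    const signature = reason
      ? reason.bindingGates
          .map((g) => `${g}(${reason.gates[g].used}/${reason.gates[g].limit})`)
          .join(",")
      : null;
    if (signature !== null && signature !== this.lastSignature) {
      this.lines.push(`queued gate=${signature}`);
    }
    this.lastSignature = signature;
  }
}
```

Observing the same blocked state on consecutive polls adds no new line; a line appears again only after the state clears or the binding gates change.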
Run audit endpoints:
- GET /api/agents/:id/runs/:runId/audit — Returns audit trail for a specific agent run
  - Query params: ?domain=database|git|filesystem|sandbox for filtering
  - Requires agent ownership or admin access
- GET /api/health/reliability — Aggregates rolling reliability metrics from run-audit and task activity signals.
  - Query params: ?windowDays=<1..30> (default 7)
  - Response shape: { windowDays, generatedAt, headline, perDay, duration, mergeAttempts } where missing instrumentation/samples surface as null with a reason field.
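For callers of these endpoints, a small URL-building sketch may help. The helper names and the base URL are hypothetical, and the client-side clamping of windowDays to 1..30 is an assumption; this document does not say how the server treats out-of-range values.

```typescript
const DEFAULT_WINDOW_DAYS = 7;

// Assumption: out-of-range or non-finite values are normalized on the client.
function reliabilityUrl(baseUrl: string, windowDays: number = DEFAULT_WINDOW_DAYS): string {
  const days = Number.isFinite(windowDays)
    ? Math.min(30, Math.max(1, Math.trunc(windowDays)))
    : DEFAULT_WINDOW_DAYS;
  return `${baseUrl.replace(/\/+$/, "")}/api/health/reliability?windowDays=${days}`;
}

type AuditDomain = "database" | "git" | "filesystem" | "sandbox";

// Builds the run-audit URL with an optional domain filter.
function runAuditUrl(baseUrl: string, agentId: string, runId: string, domain?: AuditDomain): string {
  const path = `/api/agents/${encodeURIComponent(agentId)}/runs/${encodeURIComponent(runId)}/audit`;
  const query = domain ? `?domain=${domain}` : "";
  return `${baseUrl.replace(/\/+$/, "")}${path}${query}`;
}
```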
7) CLI Package (@runfusion/fusion)
Command entrypoint
- packages/cli/src/bin.ts
  - Bootstraps environment
  - Parses global flags (including --project)
  - Routes subcommands (task, project, settings, git, backup, mission, agent, message, etc.)
Command modules
- packages/cli/src/commands/*
  - Task operations, settings, git wrappers, backup operations, project/node management
- TUI component (packages/cli/src/commands/dashboard-tui/)
  - Ink-based terminal UI (status panel, logs, cursor visibility, tail-follow)
  - Merged from former @fusion/tui package
  - Invoked as part of the fn command (no separate package or pnpm tui command)
Project selection
- packages/cli/src/project-resolver.ts
  - Resolution order: explicit --project → CWD detection (.fusion) → default/fallback logic
  - Integrates CentralCore and ProjectManager
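The resolution order can be sketched as a small pure function. Everything here is illustrative: resolveProject, ResolveInput, and the injected hasFusionDir probe are hypothetical names, and walking up through ancestor directories is an assumption about how CWD detection works rather than a description of project-resolver.ts.

```typescript
type ProjectSource = "flag" | "cwd" | "default";
interface ResolvedProject { source: ProjectSource; path: string }

interface ResolveInput {
  explicitProject?: string;               // value of --project, if given
  cwd: string;                            // POSIX-style absolute path
  hasFusionDir: (dir: string) => boolean; // injected filesystem probe for .fusion
  defaultProject?: string;                // stand-in for the default/fallback logic
}

function resolveProject(input: ResolveInput): ResolvedProject | null {
  // 1) An explicit --project always wins.
  if (input.explicitProject) return { source: "flag", path: input.explicitProject };

  // 2) Assumption: detection walks from the CWD up through its ancestors.
  let dir = input.cwd;
  while (true) {
    if (input.hasFusionDir(dir)) return { source: "cwd", path: dir };
    const parent = dir.slice(0, dir.lastIndexOf("/")) || "/";
    if (parent === dir) break;
    dir = parent;
  }

  // 3) Fall back to a default project when one is configured.
  return input.defaultProject ? { source: "default", path: input.defaultProject } : null;
}
```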
Pi extension
- packages/cli/src/extension.ts
  - Registers tool set for in-chat task/mission operations
  - Uses TaskStore directly for extension-side actions
Binary identity
- Published package defines fn binary (packages/cli/package.json)
- Running fn with no arguments defaults to dashboard (web UI by default)
8) Storage Architecture
Fusion uses a hybrid storage model.
Per-project storage
- SQLite DB: .fusion/fusion.db
- Filesystem blobs (task-local artifacts):
  - .fusion/tasks/{TASK_ID}/PROMPT.md
  - .fusion/tasks/{TASK_ID}/agent.log
  - .fusion/tasks/{TASK_ID}/attachments/*
SQLite schema is initialized in packages/core/src/db.ts and uses:
- WAL mode (PRAGMA journal_mode = WAL)
- Foreign keys (PRAGMA foreign_keys = ON)
- __meta.lastModified for change detection/polling
Central storage (multi-project)
- Central DB: ~/.fusion/fusion-central.db
- Schema in packages/core/src/central-db.ts
  - projects, projectHealth, centralActivityLog, globalConcurrency, nodes, peerNodes, projectNodePathMappings, settingsSyncState, __meta
- projectHealth.inFlightAgentCount and globalConcurrency.currentlyActive are persisted slot/health bookkeeping fields. They are not live read-layer running-agent counts; dashboard and CLI read surfaces derive current running agents from tasks in column === "in-progress" while leaving slot acquire/free semantics and DB column names unchanged.
Memory files
- OpenClaw-style memory workspace:
  - .fusion/memory/MEMORY.md
  - .fusion/memory/YYYY-MM-DD.md
  - .fusion/memory/DREAMS.md
- The legacy top-level memory file is migration-compatibility only (seed/alias behavior) and is not canonical storage.
File-based side stores
Some data remains intentionally filesystem-based:
- Agent instruction bundles and heartbeat markdown: .fusion/agents/* (AgentStore)
Agent/message/approval metadata and history now persist in SQLite tables.
Migration from legacy file storage
- Detection + migration: packages/core/src/db-migrate.ts
- Migrates legacy task/config/log/archive/automation/agent data into SQLite
- Creates .bak backups (for example task.json.bak, config.json.bak, archive.jsonl.bak)
Archive system
- Archived task snapshots are stored in SQLite archivedTasks
- TaskStore archive helpers:
  - archiveTaskAndCleanup()
  - cleanupArchivedTasks()
  - readArchiveLog() / findInArchive()
  - unarchiveTask() with restore behavior
9) Task Lifecycle
Lifecycle constants are defined in packages/core/src/types.ts:
- Columns: planning, todo, in-progress, in-review, done, archived
- Transition rules via VALID_TRANSITIONS
Lifecycle flow
planning
│ (Planning processor writes PROMPT.md)
▼
todo
│ (Scheduler selects task, dependencies satisfied)
▼
in-progress
│ (TaskExecutor runs in worktree)
▼
in-review
│ (implementation complete + pre-merge workflow steps)
▼
done
│
└──────────────▶ archived
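A transition table of this kind is commonly expressed as a map from each column to its allowed targets. The sketch below is illustrative only: EXAMPLE_TRANSITIONS is a hypothetical stand-in, not the contents of VALID_TRANSITIONS in types.ts. Its forward edges follow the flow above, the two backward edges to todo are an assumption based on moves mentioned elsewhere in this document, and restore paths such as unarchive are omitted.

```typescript
const COLUMNS = ["planning", "todo", "in-progress", "in-review", "done", "archived"] as const;
type Column = (typeof COLUMNS)[number];

// Illustrative table only; the real rules live in packages/core/src/types.ts
// and may allow more (or fewer) edges than shown here.
const EXAMPLE_TRANSITIONS: Record<Column, readonly Column[]> = {
  planning: ["todo"],
  todo: ["in-progress"],
  "in-progress": ["in-review", "todo"],
  "in-review": ["done", "todo"],
  done: ["archived"],
  archived: [],
};

// True when the example table lists `to` as a direct target of `from`.
function canTransition(from: Column, to: Column): boolean {
  return EXAMPLE_TRANSITIONS[from].includes(to);
}
```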
Execution detail
- Planning phase: the planning processor generates an executable plan
- Execution phase: TaskExecutor performs implementation, tool calls, tests/build commands
- Review phase: optional reviewStep() workflow depending on prompt review level (bypassed in fast mode)
- Merge phase: aiMergeTask() handles merge strategy and post-merge workflow steps
Fast Mode: Tasks with executionMode: "fast" bypass the review_step tool injection and pre-merge workflow steps. Completion blockers (tests, build, typecheck from PROMPT.md) and post-merge workflow steps remain enforced.
Step status model
Task steps use statuses: pending, in-progress, done, skipped.
Workflow steps
- Defined in project config as WorkflowStep
- Pre-merge steps run in executor (runWorkflowSteps()) — bypassed in fast mode
- Post-merge steps run in merger (runPostMergeWorkflowSteps())
Task pause ownership
- Only explicit user actions pause ordinary tasks: the dashboard/CLI task pause controls and manual in-progress → todo moves. System safety pauses remain reserved for explicit approval waits and bounded guardrails such as token-budget, worktrunk-failure, and dispatch-oscillation protection.
- Agent pause/sleep and heartbeat recovery never pause assigned tasks. Assigned tasks stay in their current column and retain their existing paused/pausedByAgentId state so the scheduler can re-dispatch unpaused work and user-paused work remains intentionally parked.
- Only explicit user unpause actions may clear task.userPaused; engine self-healing, heartbeat/agent resume cascades, and approval resume paths must leave user-paused tasks parked.
User cancel via move-to-todo
- TaskStore.moveTask() accepts moveSource: "user" | "engine" (default "engine") and emits task:moved with source so listeners can distinguish manual moves from engine rebounds.
- Manual in-progress → todo moves (dashboard route /tasks/:id/move with moveSource: "user") atomically set task.userPaused = true; engine/default rebounds do not.
- Any move to in-progress clears task.userPaused in the same store write so explicit redispatch resumes normally.
- TaskExecutor treats manual in-progress → todo as hard cancel: it marks the task as user-canceled, aborts active session types before dispose/termination, and suppresses preserve-resume auto-bounces while logging Execution canceled by user — leaving task in todo.
- Scheduler dispatch loop skips todo tasks with userPaused === true (queues with a user-paused reason) until a user explicitly moves the task back to in-progress.
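The userPaused rules above can be modeled as a pure function over a task row. This is a minimal sketch under stated assumptions: TaskRow, applyMove, and isDispatchable are hypothetical names, and the real TaskStore.moveTask() does far more (locking, events, validation) than this model shows.

```typescript
type Column = "planning" | "todo" | "in-progress" | "in-review" | "done" | "archived";
type MoveSource = "user" | "engine";
interface TaskRow { id: string; column: Column; userPaused: boolean }

// Models only the userPaused side of a move; moveSource defaults to "engine".
function applyMove(task: TaskRow, to: Column, moveSource: MoveSource = "engine"): TaskRow {
  const next: TaskRow = { ...task, column: to };
  if (to === "in-progress") {
    // Any move into in-progress clears the user pause in the same write.
    next.userPaused = false;
  } else if (task.column === "in-progress" && to === "todo" && moveSource === "user") {
    // Only a manual in-progress -> todo move parks the task.
    next.userPaused = true;
  }
  return next;
}

// The dispatch loop skips todo tasks that a user has parked.
function isDispatchable(task: TaskRow): boolean {
  return task.column === "todo" && !task.userPaused;
}
```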
Stalled review detection
@fusion/core computes a heuristic task.stalledReview signal during task hydration (both slim board listings and full/detail reads in TaskStore) by scanning recent task log activity.
Current heuristics (see packages/core/src/stalled-review-detector.ts):
- Reenqueue churn (heuristic: "reenqueue-churn"): at least STALLED_REVIEW_REENQUEUE_THRESHOLD (3) matches of STALLED_REVIEW_REENQUEUE_PATTERN within STALLED_REVIEW_WINDOW_MS (60 minutes).
- Invalid-transition loop (heuristic: "invalid-transition-loop"): at least STALLED_REVIEW_INVALID_TRANSITION_THRESHOLD (2) matches of STALLED_REVIEW_INVALID_TRANSITION_PATTERN in log action/outcome within the same window.
Detection is visibility-only: no scheduler/self-healing actions are triggered by this field. The dashboard TaskCard renders a Stalled badge for in-review tasks when task.stalledReview is present, with the heuristic reason in the tooltip.
Tune sensitivity by adjusting the exported constants in stalled-review-detector.ts. Increase thresholds to reduce noise; decrease thresholds only with incident evidence, because lower values can over-flag transient recovery bursts.
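To make the reenqueue-churn heuristic concrete, here is a minimal sketch that counts matching log entries inside the window. The threshold and window values come from the text above; the regular expression, the LogEntry shape, and the function name are placeholders, since the real STALLED_REVIEW_REENQUEUE_PATTERN and log schema are not shown in this document.

```typescript
const STALLED_REVIEW_REENQUEUE_THRESHOLD = 3;
const STALLED_REVIEW_WINDOW_MS = 60 * 60 * 1000;
// Placeholder pattern: stands in for the real STALLED_REVIEW_REENQUEUE_PATTERN.
const EXAMPLE_REENQUEUE_PATTERN = /re-?enqueue/i;

interface LogEntry { timestamp: number; action: string }
interface StalledReview { heuristic: "reenqueue-churn"; matchCount: number }

// Returns a signal when enough matching entries fall inside the window.
function detectReenqueueChurn(entries: LogEntry[], now: number): StalledReview | undefined {
  const matchCount = entries.filter(
    (e) => now - e.timestamp <= STALLED_REVIEW_WINDOW_MS && EXAMPLE_REENQUEUE_PATTERN.test(e.action),
  ).length;
  return matchCount >= STALLED_REVIEW_REENQUEUE_THRESHOLD
    ? { heuristic: "reenqueue-churn", matchCount }
    : undefined;
}
```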
Workflow-defined columns & traits (experimentalFeatures.workflowColumns)
Behind the workflowColumns flag (accessor: packages/core/src/workflow-columns-settings.ts). With the flag off the legacy pipeline above is authoritative and untouched. The flag default-flips only when the graduation report (below) shows zero drift — a field decision, not yet taken.
Engine as substrate, workflows as policy. The flag inverts the architecture: the engine becomes a capability substrate (worktree/git/session mechanics, persistence, crash recovery, audit, machine resource ceilings — non-configurable) and workflows carry the operating logic as composable column traits. The mechanism/policy line (KTD-4):
- Substrate (engine-owned, never workflow-configurable): AgentSemaphore, checkout leases, worktree/git/session ops, SQLite + WAL, the crash-recovery machinery, the audit trail, the global max-sessions cap, and the three non-configurable lost-work merge guards (no sibling fusion/fn-* target, line-anchored attribution, no modifiedFiles clear on a no-op finalize).
- Policy (workflow/trait-owned): transition validity, WIP/capacity, hold/release, drag meaning, retries, merge strategy, squash posture, file-scope enforcement mode.
Transition authority. moveTaskInternal remains the single transition authority. Flag-on, it swaps the VALID_TRANSITIONS lookup for workflow-resolved column-graph validation (resolveAllowedColumns/workflowHasColumn in workflow-transitions.ts) plus sync trait guards run in-lock; rejections are typed TransitionRejections. VALID_TRANSITIONS and the closed Column/COLUMNS helpers in types.ts are @deprecated while the flag exists — retained as the flag-off authority and the parity oracle, not yet removed.
Trait model. A trait is declarative flags + optional lifecycle hooks (guard, gate, onEnter, onExit, releaseCondition), resolved through one registry (trait-registry.ts, built-ins in builtin-traits.ts). Sync guard and the complete/archived flags are built-in-only; plugin traits (KTD-7) get async hook points only and route through the prompt-session/script machinery. Composition conflicts are rejected at save both in the editor and server-side (assertColumnTraitsValid in createWorkflowDefinition/updateWorkflowDefinition, surfaced as a 400). Capacity is enforced in-txn (KTD-10), never bypassable — not a guard. Enter/exit effects run post-commit, idempotent, guarded by the transitionPending marker; a throwing/missing plugin hook degrades (audit) and never strands the card or wedges the lock.
Graduation. The flag default-flip is gated by computeWorkflowColumnsGraduationReport() (workflow-parity.ts; store method TaskStore.computeWorkflowColumnsGraduationReport), aggregating: five-invariant dual-observe parity, default-workflow transition parity vs VALID_TRANSITIONS (checkTransitionParity), and the U6 dual-accept marker/column disagreement count. ready is true only when all gates pass over a non-empty observation window. The report is the gate; it does not flip the flag.
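The readiness rule reduces to a conjunction over the three gates plus a non-empty observation window. The sketch below only illustrates that shape: the field names are hypothetical and do not reflect the actual report returned by computeWorkflowColumnsGraduationReport().

```typescript
// Hypothetical field names; the real report shape is not shown in this document.
interface GraduationGates {
  observationCount: number;           // size of the observation window
  invariantParityDrift: number;       // five-invariant dual-observe parity drift
  transitionParityMismatches: number; // default-workflow vs VALID_TRANSITIONS
  dualAcceptDisagreements: number;    // marker/column disagreement count
}
interface GraduationReport extends GraduationGates { ready: boolean }

// ready is true only when every gate passes over a non-empty window.
// The report is informational; nothing here flips the flag.
function summarizeGraduation(gates: GraduationGates): GraduationReport {
  const ready =
    gates.observationCount > 0 &&
    gates.invariantParityDrift === 0 &&
    gates.transitionParityMismatches === 0 &&
    gates.dualAcceptDisagreements === 0;
  return { ...gates, ready };
}
```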
Step inversion: steps as workflow-modelable nodes (experimentalFeatures.workflowGraphExecutor)
The columns/traits track moved board policy (transitions, capacity, hold, merge orchestration) onto the substrate/policy line. The step-inversion track extends the same inversion to task steps and to the task shape itself, riding the existing workflowGraphExecutor flag (orthogonal to workflowColumns). With the flag off — and for the default coding workflow always — step policy stays exactly as it is today (the monolithic execute seam, PROMPT.md ### Step N: parsing, in-session fn_review_step verdicts, RETHINK git-reset/session-rewind). The default workflow is the byte-identical parity oracle; inversion is opt-in via custom workflows and a built-in stepwise coding workflow.
One new substrate seam pair. The substrate gains exactly one new capability, expressed as two methods: runTaskStep(task, stepIndex) (run exactly one step inside the task's session and observe its complete Step N commit) and resetStepToBaseline(task, stepIndex, baselineSha, checkpointId?) (the RETHINK mechanics — git reset + session rewind + updateStep(...,"pending")). Both delegate to existing code (extracted from StepSessionExecutor and the legacy RETHINK block); neither reimplements step physics or authors commits. The substrate owns how a step runs and resets; the graph owns when. Baseline/checkpoint state, previously fragile in-memory Maps lost on restart, moves into persisted instance run-state (workflow_run_step_instances, schema v108).
Everything else becomes authored graph structure (policy). Step granularity, per-step plan/code review, the verdict→action mapping, rework/escalation routing, parallelism, and even the existence of PROMPT.md stop being engine law:
- A parse-steps node reads a workflow-declared artifact (PROMPT.md is just the default workflow's declared step-source artifact) and runs a registry parser (step-headings, json-steps, or a plugin-contributed parser) to write Task.steps[]. It is the only graph-side step-list writer and must dominate any foreach. Parsers fail closed to a routable outcome:parse-error.
- A foreach(source:"task-steps") node instantiates an inline template subgraph once per planned step, with mode (sequential/parallel) and isolation (shared/worktree) as explicit axes and per-instance run-state pinned + persisted for crash-safe resume.
- Resume-limbo graph failures are retried only through a narrow persisted counter (Task.graphResumeRetryCount, max 2). The executor classifies a failure as transient only when it happens immediately after the engine restart/unpause resume log marker, reports no graph reason, has no completed step progress, and the task has no durable lastError/failureReason; it clears transient status/error, logs the auto-retry, and schedules one more graph execution. Any explicit graph reason, completed step progress, durable task error, missing resume marker, or exhausted counter remains a genuine status:"failed" disposition and goes to review handoff, preserving the FN-5704 anti-loop contract.
- Paused graph exits are benign only while the task is still in in-progress; that is the user-pause/engine-pause state where preserving the pause without requeueing is intentional. If the graph reports a pause/abort exit after the task has already advanced to another live column (for example in-review after an unpause/resume race), TaskExecutor.handleGraphFailure() surfaces the boundary as operator-actionable failure evidence (status:"failed"/error when no failure is already present, plus a task-log entry) and does not move, rewind, or auto-merge the task unless the graph result carries the typed interrupted-node marker. The exceptions are typed in-flight node pause aborts (FN-7214), completed/no-commit finalize-to-review teardown (FN-6625/FN-6644/FN-6647), and benign merge-seam pause/resume aborts (FN-6735). For FN-7214 node aborts, hard-cancel and lifted global-pause provenance can re-enter the interrupted node through the bounded graphResumeRetryCount path; explicit userPaused, active global pause, merge/finalize provenance, genuine node failures, autoMerge:false human-gated review rows, retry-exhausted tasks, and already-confirmed merges still use the protected operator-action path. For completed finalize handoff, once the persisted task row proves a completed finalize handoff (non-in-progress, all steps done/skipped, no live pause/status/error, and the finalize-to-review log entry), a trailing graph abort resolves as an already-advanced benign graph exit even if volatile completion markers were cleared by teardown/restart and later abort provenance was re-marked from completion-finalize to hard-cancel. For merge-seam aborts, in-review tasks with no persisted status/error and no confirmed merge may re-enter bounded auto-merge retry only when the failed graph node is a merge/request-merge seam, the graph value is not conflict/contamination/foreign/retry-exhaustion evidence, project settings allow auto-merge processing (or the task is a shared-branch local integration member), and the merge retry budget is not exhausted. done and archived remain terminal and keep their column/status, while existing failure details are preserved.
- A step-review node surfaces reviewer verdicts (APPROVE/REVISE/RETHINK/UNAVAILABLE) as outcome edges; rework edges (the only legal graph cycles, bounded per instance) route REVISE/RETHINK back to step-execute, with RETHINK traversal triggering the reset seam.
- A code node runs sandboxed TypeScript (esbuild + child process, clamped timeout, no store handle) for arbitrary computed routing/field logic — the same trust tier as project-local script steps.
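The resume-limbo retry rule in the list above can be read as a small classifier. The following is a sketch under stated assumptions: the GraphFailure shape, its boolean followsResumeMarker field, and the function name are hypothetical; only the conditions and the maximum of 2 retries come from the text.

```typescript
const MAX_GRAPH_RESUME_RETRIES = 2;

// Hypothetical snapshot of what the executor knows when a graph run fails.
interface GraphFailure {
  followsResumeMarker: boolean; // failure came right after the restart/unpause log marker
  graphReason?: string;         // explicit reason reported by the graph, if any
  completedStepCount: number;
  lastError?: string;
  failureReason?: string;
  graphResumeRetryCount: number;
}

type Disposition = { kind: "auto-retry"; nextRetryCount: number } | { kind: "failed" };

function classifyGraphFailure(f: GraphFailure): Disposition {
  // Transient only when every condition from the text holds at once.
  const transient =
    f.followsResumeMarker &&
    !f.graphReason &&
    f.completedStepCount === 0 &&
    !f.lastError &&
    !f.failureReason;
  if (transient && f.graphResumeRetryCount < MAX_GRAPH_RESUME_RETRIES) {
    return { kind: "auto-retry", nextRetryCount: f.graphResumeRetryCount + 1 };
  }
  // Anything else is a genuine failure and goes to review handoff.
  return { kind: "failed" };
}
```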
Task.steps[] stays the physical projection sink. Instance lifecycle transitions write through store.updateStep with explicit indices (projection-first ordering closes the merge-blocker race), so every existing consumer — the merge-blocker, dashboard/TUI step display, reconcileStepsFromGitHistory, lost-work reset — keeps working unchanged. Git reconcile remains authoritative over the instance rows (rows are corrected to match git, never the reverse).
Task shape recast. The task model reduces to core fields (title, description) + standard metadata + workflow-defined custom fields (typed, enum options, render hints; values in tasks.customFields, validated through one store authority with typed rejections). Field-schema edits orphan rather than destroy values. This round ships the field system; recasting existing built-in fields (priority, labels) onto it is a deferred, additive follow-up.
Invariant bar. The five lifecycle invariants (FN-5147 terminal-until-merged, hard-cancel, in-review stall, file-scope, squash) plus the lost-work guard trio remain the non-configurable correctness bar on the stepwise path. The v108 migration is additive; instance rows are prunable; flag-off rollback mid-task converges via the existing fell-back + git-reconcile recovery (the projection is always git-reconcilable).
10) Agent System
Fusion has two complementary agent models:
- Task pipeline agents (planning/executor/reviewer/merger) managed by engine runtime
- Persistent registered agents managed by AgentStore
Persistent agent storage
packages/core/src/agent-store.ts persists to:
- .fusion/agents/{id}.json
- .fusion/agents/{id}-heartbeats.jsonl
- .fusion/agents/{id}-keys.jsonl
- .fusion/agents/{id}-revisions.jsonl
- .fusion/agents/{id}/avatar.{ext} (uploaded avatar image file, served via /api/agents/:id/avatar)
Agent spawning from executor
TaskExecutor supports hierarchical child agents via:
- createSpawnAgentTool()
- runSpawnedChild()
- terminateChildAgent() / terminateAllChildren()
Limits are controlled by project settings (maxSpawnedAgentsPerParent, maxSpawnedAgentsGlobal).
Heartbeat monitoring and triggers
agent-heartbeat.ts provides:
- Health monitoring and run tracking (HeartbeatMonitor)
- Trigger scheduling (HeartbeatTriggerScheduler) for:
  - timer
  - task assignment
  - on-demand runs
- Assignment triggers skipped because a heartbeat run is already active are deferred and re-fired from HeartbeatMonitor.onRunCompleted, preserving the existing completion recovery path while avoiding timer-dependent stalls.
Custom instructions
packages/engine/src/agent-instructions.ts resolves per-agent instruction text/path with path-traversal and extension validation.
Planner overseer monitoring (records-only)
/*
FNXC:PlannerOversight 2026-07-04-00:00:
FN-7511 delivers the monitoring foundation for a planner-oversight layer that watches an in-flight task's
lifecycle without steering it. packages/engine/src/planner-overseer.ts declares the five watched stages
(OVERSEER_WATCHED_STAGES: executor, reviewer, merger, pull-request, workflow-gate), a normalized
OverseerStageObservation model, and a resolveWatchedStage(task) resolver with deterministic precedence
(workflow-gate > pull-request > merger > reviewer > executor) so a task in a compound state resolves to
exactly one stage. PlannerOverseerMonitor#observeTask(task, level) is the gating seam: when the task's
effective planner oversight level (resolveEffectivePlannerOversightLevel, FN-7508/FN-7509/FN-7510) is
"off", nothing is recorded; otherwise exactly one observation is recorded into a bounded per-task ring
buffer (default cap 20) and the optional onObservation callback is invoked best-effort.
*/
ProjectEngine constructs a PlannerOverseerMonitor alongside PrMonitor and exposes it via
getPlannerOverseer(). A bounded setInterval poll (45s cadence, cleared on stop()) walks the
current in-progress/in-review tasks, resolves each task's effective planner oversight level, and
calls observeTask — skipping tasks that resolve to "off" or to no watched stage. Observations for
tasks that leave the in-flight set are dropped from the ring buffer on the next poll.
This layer is records-only: no lifecycle mutation, retry, merge, notification, or external-service call happens here, and it emits no run-audit events or dashboard UI. Steering/recovery, confirmation gates, human-control safeguards, and dashboard/UI/run-audit surfaces are deferred to FN-7512 through FN-7520; this module is the seam those subtasks read observations from.
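The two mechanical pieces of this layer, deterministic stage precedence and a bounded per-task ring buffer, can be sketched briefly. The stage names, the precedence order, and the default cap of 20 come from the text above; pickWatchedStage, ObservationRing, and the candidates input (standing in for whatever task fields the real resolveWatchedStage inspects) are hypothetical.

```typescript
const OVERSEER_WATCHED_STAGES = ["executor", "reviewer", "merger", "pull-request", "workflow-gate"] as const;
type WatchedStage = (typeof OVERSEER_WATCHED_STAGES)[number];

// workflow-gate > pull-request > merger > reviewer > executor
const STAGE_PRECEDENCE: readonly WatchedStage[] = [
  "workflow-gate", "pull-request", "merger", "reviewer", "executor",
];

// Picks exactly one stage for a task in a compound state, or null if none applies.
function pickWatchedStage(candidates: ReadonlySet<WatchedStage>): WatchedStage | null {
  return STAGE_PRECEDENCE.find((stage) => candidates.has(stage)) ?? null;
}

class ObservationRing<T> {
  private readonly byTask = new Map<string, T[]>();
  constructor(private readonly cap: number = 20) {}

  // Appends an observation and evicts the oldest once the cap is exceeded.
  record(taskId: string, observation: T): void {
    const buffer = this.byTask.get(taskId) ?? [];
    buffer.push(observation);
    if (buffer.length > this.cap) buffer.shift();
    this.byTask.set(taskId, buffer);
  }

  get(taskId: string): readonly T[] {
    return this.byTask.get(taskId) ?? [];
  }

  // Drops buffers for tasks that left the in-flight set.
  retain(inFlight: ReadonlySet<string>): void {
    for (const taskId of Array.from(this.byTask.keys())) {
      if (!inFlight.has(taskId)) this.byTask.delete(taskId);
    }
  }
}
```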
Planner overseer bounded autonomous recovery (FN-7512)
/*
FNXC:PlannerOversight 2026-07-04-12:00:
FN-7512 builds the bounded autonomous-recovery layer on top of FN-7511's observation seam. When the
task's effective planner oversight level resolves to "autonomous", the planner overseer may take ONE
of three bounded corrective actions on the task's currently watched stage:
- inject_guidance — post a planner-authored steering comment into the active agent lane.
- retry_step — re-enqueue a stuck/failed step via the existing store retry/re-enqueue path.
- request_targeted_fix — post a steering comment tagged as a targeted-fix request, referencing
the observation's specific error source link.
At every other effective level ("off" / "observe" / "steer") the decision is always "none" — this layer is completely inert unless oversight is "autonomous".
*/
packages/core/src/planner-recovery.ts declares the shared, engine-free recovery vocabulary:
PlannerRecoveryActionKind (inject_guidance | retry_step | request_targeted_fix | none),
PlannerRecoveryObservation (a structural mirror of FN-7511's OverseerStageObservation so the engine
can pass one straight through with no adapter), PlannerRecoveryAttemptState, PlannerRecoveryDecision,
and the pure, never-throw decidePlannerRecovery(input). Decision rules, in order:
- No observation, or oversightLevel !== "autonomous" → "none".
- The per-(taskId, watchedStage) attempt count has reached PLANNER_RECOVERY_MAX_ATTEMPTS (default 3, mirroring MAX_RECOVERY_RETRIES in recovery-policy.ts) → "none", exhausted: true — the layer stops autonomously and the task is left for escalation (FN-7514+ owns the human-control story).
- merger/pull-request stages → "await_confirmation" (FN-7513) with requiresConfirmation: true, sideEffectClass: "merge_pr": these require confirmation and are surfaced, never dispatched, by the bounded layer itself — see "Planner overseer confirmation gate (FN-7513)" below.
- reviewer stage → "inject_guidance".
- executor/workflow-gate stage with signal === "failed" → "request_targeted_fix" when a source link carries a specific fixable error (failed-check/merge-error), else "retry_step".
- Any other executor/workflow-gate signal (stuck/blocked/progressing/awaiting-human) → "inject_guidance".
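Read in order, the rules above form a short decision ladder. The sketch below mirrors that ladder; it is not the actual decidePlannerRecovery. The function name, the flattened argument list, and the { kind } shape of a source link are assumptions made for illustration.

```typescript
type OversightLevel = "off" | "observe" | "steer" | "autonomous";
type WatchedStage = "executor" | "reviewer" | "merger" | "pull-request" | "workflow-gate";
type Signal = "failed" | "stuck" | "blocked" | "progressing" | "awaiting-human";
type Action = "inject_guidance" | "retry_step" | "request_targeted_fix" | "await_confirmation" | "none";

// Assumption: a source link exposes a `kind` naming the error it points at.
interface Observation { watchedStage: WatchedStage; signal: Signal; sourceLinks: { kind: string }[] }
interface Decision { action: Action; exhausted?: boolean; requiresConfirmation?: boolean }

const PLANNER_RECOVERY_MAX_ATTEMPTS = 3;
const FIXABLE_LINK_KINDS = ["failed-check", "merge-error"];

function sketchPlannerDecision(
  level: OversightLevel,
  observation: Observation | null,
  attempts: number,
): Decision {
  if (!observation || level !== "autonomous") return { action: "none" };
  if (attempts >= PLANNER_RECOVERY_MAX_ATTEMPTS) return { action: "none", exhausted: true };

  const stage = observation.watchedStage;
  if (stage === "merger" || stage === "pull-request") {
    // Surfaced for confirmation, never dispatched by the bounded layer.
    return { action: "await_confirmation", requiresConfirmation: true };
  }
  if (stage === "reviewer") return { action: "inject_guidance" };

  // Remaining stages: executor and workflow-gate.
  if (observation.signal === "failed") {
    const fixable = observation.sourceLinks.some((link) => FIXABLE_LINK_KINDS.includes(link.kind));
    return { action: fixable ? "request_targeted_fix" : "retry_step" };
  }
  return { action: "inject_guidance" };
}
```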
packages/engine/src/planner-recovery-controller.ts's PlannerRecoveryController is the dispatcher,
mirroring the AutoRecoveryDispatcher + StuckTaskDetector handler-injection conventions: it holds an
in-memory per-(taskId, watchedStage) attempt registry, calls decidePlannerRecovery, and — only when
an action other than "none" is chosen — dispatches through injected PlannerRecoveryHandlers
(injectGuidance / retryStep / requestTargetedFix, all optional and async), incrementing the
attempt count only on a successful dispatch. tick(task, ctx) is a no-op (returns null) when
task.userPaused === true or when there is no active observation, and never throws — any handler or
snapshot-provider error degrades to a no-op.
ProjectEngine wires one concrete PlannerRecoveryController alongside its PlannerOverseerMonitor,
reusing ONLY existing mechanisms — no new session/tool/merge channel:
- injectGuidance / requestTargetedFix → store.addSteeringComment(taskId, text, "agent") (the same channel the executor's real-time injection listener already watches).
- retryStep → store.moveTask(taskId, "todo", { preserveProgress: true, moveSource: "engine" }) — the same in-progress→todo retry/re-enqueue path auto-recovery and self-healing already use.
controller.tick(task) is called from the SAME bounded 45s poll FN-7511 uses for observeTask, guarded
so it only runs when the resolved effective level is "autonomous" (every other level already
continues before reaching the tick). Attempt state for a task is cleared (controller.clear(taskId))
whenever the task leaves the in-flight in-progress/in-review set, alongside the FN-7511 observation
ring buffer.
Explicit scope boundaries (owned by later subtasks, not this layer): merge/PR actions and
destructive/external-service side effects (FN-7513, confirmation-gated); comprehensive human-pause /
autoMerge:false / human-review terminal safeguards beyond the bare userPaused skip (FN-7514); a
persisted intervention timeline (FN-7519); run-audit/activity events (FN-7520); and any dashboard UI
(FN-7515+).
Planner overseer confirmation gate (FN-7513)
/*
FNXC:PlannerOversight 2026-07-04-13:00:
FN-7513 adds the safety gate deciding which FN-7512 recovery-layer actions may run autonomously versus
which must be blocked behind an explicit, recorded human approval. Merge/PR progression (advancing a
merge, promoting a shared branch, retrying/forcing a merge, opening/updating/merging a pull request) and
any destructive or external-service side effect (branch/worktree deletion, force operations, remote
pushes, third-party GitHub/GitLab calls) are classified confirmation-required, regardless of the
effective oversight level. Bounded recovery (inject_guidance / retry_step / request_targeted_fix on
non-merge/PR stages, FN-7512) is unaffected and remains no-confirmation. The invariant: a gated action
NEVER executes without a recorded, approved PlannerConfirmationRequest.
*/
packages/core/src/planner-confirmation.ts declares the classifier vocabulary:
- PlannerActionSideEffectClass ("bounded_recovery" | "merge_pr" | "destructive_external").
- PlannerConfirmationRequest — { requestId, taskId, watchedStage, sideEffectClass, proposedAction, reason, sourceLinks, requestedAt, status: "pending" | "approved" | "denied", resolvedAt?, resolvedBy? }. Conceptually mirrors TaskMergeDetails.mergeConfirmed (an explicit human approval precedes a side effect) but is its own record — it never reads or writes mergeConfirmed, which stays owned by the merge dispatch path.
- classifyPlannerActionSideEffect({ watchedStage, proposedAction }) — pure, deterministic, never-throw. merger/pull-request stage actions beyond guidance/retry → "merge_pr"; an explicit allow-list of destructive/external action names (branch/worktree delete, force push/merge/delete, remote push, GitHub/GitLab/external-service calls, PR open/merge, shared-branch promotion) → "destructive_external" regardless of stage; everything else → "bounded_recovery". Malformed input or an unrecognized non-bounded action on a non-merge/PR stage fails CLOSED to "destructive_external" rather than silently allowing an unclassified action through.
- requiresPlannerConfirmation(sideEffectClass) — true for "merge_pr"/"destructive_external", false for "bounded_recovery".
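A fail-closed classifier of this kind might look like the sketch below. It is an illustration, not the contents of planner-confirmation.ts: the destructive action identifiers are hypothetical (the document lists categories, not names), and treating only inject_guidance and retry_step as bounded on merge/PR stages is one reading of "beyond guidance/retry".

```typescript
type SideEffectClass = "bounded_recovery" | "merge_pr" | "destructive_external";

const BOUNDED_ACTIONS = new Set(["inject_guidance", "retry_step", "request_targeted_fix"]);
// Hypothetical identifiers standing in for the real allow-list.
const DESTRUCTIVE_ACTIONS = new Set([
  "delete_branch", "delete_worktree", "force_push", "remote_push",
  "open_pull_request", "merge_pull_request", "promote_shared_branch",
]);
const MERGE_STAGES = new Set(["merger", "pull-request"]);

function sketchClassify(
  input: { watchedStage?: unknown; proposedAction?: unknown } | null | undefined,
): SideEffectClass {
  // Malformed input fails closed.
  if (!input || typeof input.watchedStage !== "string" || typeof input.proposedAction !== "string") {
    return "destructive_external";
  }
  const { watchedStage, proposedAction } = input;
  // Destructive/external actions are gated regardless of stage.
  if (DESTRUCTIVE_ACTIONS.has(proposedAction)) return "destructive_external";
  if (MERGE_STAGES.has(watchedStage)) {
    return proposedAction === "inject_guidance" || proposedAction === "retry_step"
      ? "bounded_recovery"
      : "merge_pr";
  }
  // Unrecognized actions on other stages also fail closed.
  return BOUNDED_ACTIONS.has(proposedAction) ? "bounded_recovery" : "destructive_external";
}

function sketchRequiresConfirmation(cls: SideEffectClass): boolean {
  return cls !== "bounded_recovery";
}
```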
packages/core/src/planner-recovery.ts's decidePlannerRecovery now calls the classifier for every
branch and returns requiresConfirmation / sideEffectClass / (for gated decisions) proposedAction on
PlannerRecoveryDecision. The merger/pull-request branch, previously "none", now returns action: "await_confirmation" naming what would run on approval (advance_merge / advance_pull_request); every
other rule (level gate, attempt bound, exhaustion) is unchanged.
packages/engine/src/planner-recovery-controller.ts's PlannerRecoveryController adds the gate:
- A per-(taskId, watchedStage) pending-confirmation registry (getPendingConfirmations(taskId)). It is idempotent: tick never creates a second pending request for a stage that already has one pending.
- requestConfirmation(task, request, ctx) (optional handler): records/surfaces the pending request; it must NOT perform the side effect itself.
- executeMergePrAction(taskId, request, ctx) / executeDestructiveExternalAction(taskId, request, ctx) (optional handlers): invoked ONLY from resolveConfirmation(..., "approved", ...), never from tick.
- resolveConfirmation(taskId, requestId, "approved" | "denied", resolvedBy?): on "approved", dispatches the matching execution handler exactly once and clears the pending request; on "denied", clears the request with no side effect, leaving the task for other escalation (FN-7514+), AND consumes one bounded-recovery attempt for that (taskId, watchedStage) pair (the same shared PLANNER_RECOVERY_MAX_ATTEMPTS budget dispatch() consumes). Without counting denials against the budget, a denied merge/PR/destructive confirmation would resurface as an identical pending request on the very next tick() forever; counting it means repeated denials eventually exhaust the stage (decidePlannerRecovery then returns action: "none", exhausted: true) instead of re-prompting indefinitely. Never throws: handler rejections are logged and swallowed, and the request is still cleared.
- tick(task, ctx): when decidePlannerRecovery returns requiresConfirmation: true, calls requestConfirmation (idempotently) and does NOT invoke any side-effecting handler; bounded-recovery decisions still dispatch exactly as FN-7512 (with the attempt increment). The "autonomous"-only gate and the userPaused skip are preserved.
- clear(taskId) also clears pending confirmations (in addition to attempt state) on terminal task transitions.
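The gate's three properties (idempotent requests, execution only on approval, denials consuming the attempt budget) can be sketched with two maps. This is a simplified, synchronous illustration: ConfirmationGateSketch, its method names, and the MAX_ATTEMPTS value are assumptions, not the PlannerRecoveryController API.

```typescript
interface PendingRequest { requestId: string; taskId: string; watchedStage: string; proposedAction: string }

const MAX_ATTEMPTS = 3; // assumption: stands in for PLANNER_RECOVERY_MAX_ATTEMPTS

class ConfirmationGateSketch {
  private pending = new Map<string, PendingRequest>(); // key: taskId::watchedStage
  private attempts = new Map<string, number>();
  private nextId = 1;
  private execute: (req: PendingRequest) => void;

  constructor(execute: (req: PendingRequest) => void) {
    this.execute = execute;
  }

  // tick side: record (or reuse) a pending request; never runs the side effect.
  request(taskId: string, watchedStage: string, proposedAction: string): PendingRequest | null {
    const key = `${taskId}::${watchedStage}`;
    if ((this.attempts.get(key) ?? 0) >= MAX_ATTEMPTS) return null; // stage exhausted
    const existing = this.pending.get(key);
    if (existing) return existing; // idempotent: one pending request per stage
    const created: PendingRequest = { requestId: `req-${this.nextId++}`, taskId, watchedStage, proposedAction };
    this.pending.set(key, created);
    return created;
  }

  // Human side: the only place the execution handler can run.
  resolve(taskId: string, requestId: string, resolution: "approved" | "denied"): boolean {
    for (const [key, req] of this.pending) {
      if (req.taskId !== taskId || req.requestId !== requestId) continue;
      this.pending.delete(key); // cleared first, so the handler fires at most once
      if (resolution === "approved") {
        try { this.execute(req); } catch (err) { console.warn("handler failed:", err); }
      } else {
        this.attempts.set(key, (this.attempts.get(key) ?? 0) + 1); // denial consumes budget
      }
      return true;
    }
    return false; // unknown or already-resolved request
  }
}
```

Clearing the pending entry before invoking the handler is what keeps a throwing handler from leaving a request that could be approved twice.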
ProjectEngine wires the concrete handlers in buildPlannerRecoveryHandlers: requestConfirmation
posts a [planner-oversight] confirmation required (...) steering comment (reusing the same
addSteeringComment channel as bounded recovery, so a human sees it). executeMergePrAction branches
on request.proposedAction (falling back to request.watchedStage defensively) rather than treating
every approved "merge_pr" request identically: ONLY "advance_merge" (the merger stage) reuses the
EXISTING store.mergeTask(taskId) merge mechanism; "advance_pull_request" (the pull-request stage)
is intentionally a no-op today because no reusable PR-specific advance mechanism exists yet — an
approved PR confirmation must never fall through to a direct task merge/cleanup, which would bypass the
PR workflow entirely. No executeDestructiveExternalAction is wired yet, since FN-7511's observation
model does not currently emit a destructive-action signal; a future task can wire one (and the
PR-specific execution handler) using existing safe helpers when a concrete need arises.
Downstream ownership (not this layer): rendering the pending-confirmation UI/badge (FN-7515+/
FN-7517), comprehensive human-control safeguards beyond userPaused (FN-7514), a persisted intervention
timeline (FN-7519), and run-audit/activity events (FN-7520) all consume the data this gate exposes but
are implemented elsewhere.
Planner overseer human-control guard (FN-7514)
FN-7514 supplies the comprehensive human-control safeguard the FN-7512/FN-7513 layers deferred: the
overseer must be fully inert — no steering, retry, targeted-fix, or FN-7513 confirmation-required action
(merge/PR progression, destructive/external-service side effect) may fire, and no pending confirmation
may even be recorded — whenever a task is (a) user-paused, or (b) ineligible for auto-merge processing
per the FN-5147 autoMerge:false / PR-based human-review terminal contract.
packages/engine/src/overseer-human-control-policy.ts exports the pure predicate
evaluateOverseerHumanControl(task, settings) (no I/O, mirrors the recovery-policy.ts style), returning
{ withhold: boolean; reason?: "user-paused" | "auto-merge-off-human-review" }. It reuses
allowsAutoMergeProcessing from @fusion/core VERBATIM for the auto-merge-off half — never re-derives
the predicate inline. For the pause half, it distinguishes:
- Explicit user pause: task.userPaused === true, OR task.paused === true with NO task.pausedReason (the fn_task_pause tool / TaskStore.pauseTask never stamps a pausedReason).
- Engine/self-healing park (NOT user pause): task.paused === true WITH a pausedReason (every self-healing park path, including branch-conflict-unrecoverable, token_budget_exceeded, in-review-stall-deadlock, and worktrunk_operation_failed, always stamps one).
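The pause distinction above can be expressed as a pure predicate. The sketch below is illustrative only: it takes the auto-merge eligibility as a boolean argument (standing in for the result of allowsAutoMergeProcessing from @fusion/core), and the order in which the two reasons are checked is an assumption.

```typescript
interface TaskLike { userPaused?: boolean; paused?: boolean; pausedReason?: string }
type WithholdReason = "user-paused" | "auto-merge-off-human-review";
interface HumanControlResult { withhold: boolean; reason?: WithholdReason }

function evaluateHumanControlSketch(
  task: TaskLike,
  allowsAutoMerge: boolean, // assumption: precomputed via allowsAutoMergeProcessing
): HumanControlResult {
  // A pause with no pausedReason is treated as an explicit user pause.
  const explicitUserPause = task.userPaused === true || (task.paused === true && !task.pausedReason);
  if (explicitUserPause) return { withhold: true, reason: "user-paused" };
  if (!allowsAutoMerge) return { withhold: true, reason: "auto-merge-off-human-review" };
  // An engine park (paused WITH a reason) does not withhold on its own.
  return { withhold: false };
}
```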
PlannerRecoveryController.tick() (planner-recovery-controller.ts) consults this guard FIRST — before
the snapshot lookup, before decidePlannerRecovery, before FN-7513's confirmation classification. When
withheld, tick() returns null immediately (same contract as the prior bare userPaused check) and
never reaches the point where a pending PlannerConfirmationRequest could be created. A snapshot lookup
AFTER the withhold decision (read-only, for audit metadata only — stage/oversightLevel — never feeding
back into the decision) feeds an optional recordHumanControlWithheld handler, which ProjectEngine
wires to a bounded RunAuditor.database({ type: "overseer:oversight-withheld-human-control", ... })
no-action event (metadata: { taskId, reason, stage, oversightLevel }). The controller dedupes this
emission per (taskId, withheld reason) — a task stuck in the same withheld state across many poll
cycles emits the event once, not on every tick; a reason change (or clear(taskId) on terminal
transition) re-arms it.
ProjectEngine.pollPlannerOverseer fetches global Settings once per poll cycle (not per task) and
threads it through ctx.settings to tick(), so the guard's allowsAutoMergeProcessing check sees the
same settings self-healing already gates lifecycle mutation on.
Downstream ownership (not this layer): the dashboard UI/badges surfacing withheld state (FN-7515+), a persisted intervention timeline (FN-7519), and richer run-audit/activity presentation (FN-7520).
Planner overseer runtime-state exposure (FN-7531)
/*
FNXC:PlannerOversight 2026-07-04-00:00:
FN-7531 closes the data-exposure gap FN-7516 needed: the planner overseer's runtime state (FN-7511's
PlannerOverseerMonitor observations, FN-7512/FN-7513's PlannerRecoveryController attempt/pending-
confirmation registries) was engine-side and in-memory only. This task adds a lightweight, serializable
snapshot and surfaces it on the GET /api/tasks payload so task cards can render an indicator without
a second round-trip.
*/
packages/core/src/planner-overseer-state.ts declares the externally-meaningful five-value enum
PLANNER_OVERSEER_STATES (idle | watching | steering | recovering | awaiting-confirmation), the
serializable PlannerOverseerRuntimeSnapshot interface (state, oversightLevel, watchedStage?,
signal?, attemptCount?, attemptLimit?, pendingConfirmation?, observedAt? — watchedStage/
signal are kept as bare string so the engine's stage taxonomy is not pulled into @fusion/core),
and the pure, never-throw derivePlannerOverseerState(input). Precedence: oversightLevel === "off"
or no active observation → "idle"; a pending confirmation → "awaiting-confirmation" (wins over an
in-flight recovery attempt); a recorded recovery attempt → "recovering"; "steer" level → "steering";
otherwise (observe/autonomous watching, no attempts/pending) → "watching".
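The precedence reads naturally as a chain of early returns. This is a minimal sketch of that ordering; the input shape (DeriveInputSketch and its field names) is hypothetical and simpler than whatever derivePlannerOverseerState actually accepts.

```typescript
type PlannerOverseerState = "idle" | "watching" | "steering" | "recovering" | "awaiting-confirmation";

// Hypothetical input shape for illustration.
interface DeriveInputSketch {
  oversightLevel: string;
  hasActiveObservation: boolean;
  hasPendingConfirmation: boolean;
  attemptCount: number;
}

function deriveStateSketch(input: DeriveInputSketch): PlannerOverseerState {
  if (input.oversightLevel === "off" || !input.hasActiveObservation) return "idle";
  if (input.hasPendingConfirmation) return "awaiting-confirmation"; // wins over in-flight recovery
  if (input.attemptCount > 0) return "recovering";
  if (input.oversightLevel === "steer") return "steering";
  return "watching";
}
```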
ProjectEngine.getPlannerOverseerRuntimeSnapshot(taskId) (delegating to the pure
assemblePlannerOverseerRuntimeSnapshot helper in packages/engine/src/planner-overseer-runtime-snapshot.ts
for testability) reads the latest observation from PlannerOverseerMonitor.getObservations(taskId) plus
PlannerRecoveryController.getPendingConfirmations(taskId)/getAttemptCount(taskId, stage), and returns
null (never throws) when there is no active observation for the task. GET /tasks
(register-task-workflow-routes.ts) additively enriches each returned task with plannerOverseerState
when the engine snapshot is non-null — best-effort, mirroring the existing branchProgress enrichment
block right beside it: any engine error is swallowed and the un-enriched list is returned, and tasks with
no active observation omit the field entirely (byte-identical payload). Task.plannerOverseerState? is a
transient field — engine-populated at serialization time, never persisted to the store or task.json.
FN-7516's TaskCard renders the badge/affordance; this task only provides the field, the engine
accessor, and (since FN-7516 had not yet landed consumption) a minimal guarded read plus a
memo-comparator entry so the card repaints on state change.
11) Multi-Project Architecture
Multi-project orchestration spans core + engine.
Core control plane
- CentralCore (packages/core/src/central-core.ts) maintains:
  - Project registry
  - Health metrics
  - Unified central activity feed
  - Global concurrency state
  - Node registry (local/remote)
  - Per-project/per-node working-directory mappings (projectNodePathMappings)
Engine orchestration
- HybridExecutor (packages/engine/src/hybrid-executor.ts) is the top-level orchestrator
- ProjectManager instantiates per-project runtimes and forwards events with project attribution
- Runtime startup/update resolves ProjectRuntimeConfig.workingDirectory through CentralCore.resolveLocalProjectWorkingDirectory() / resolveProjectWorkingDirectory(projectId, nodeId) using exact projectNodePathMappings rows for the active node; missing mappings are hard failures (no fallback to RegisteredProject.path).
Runtime abstraction
Defined in project-runtime.ts:
- ProjectRuntime interface
- RuntimeStatus and RuntimeMetrics
Implementations:
- InProcessRuntime
- ChildProcessRuntime
- RemoteNodeRuntime
InProcessRuntime.stop() now performs a two-layer executor shutdown: it first aborts detached bash subprocess trees (abortAllSessionBash()), then immediately aborts/disposes in-flight AI task sessions (abortAllInFlight("engine stop")) before entering the drain wait. The post-abort drain window is intentionally short by default (runtimeStopDrainMs, default 2000 ms) and can be set to 0 to skip drain polling in test/CI paths.
IPC protocol (child-process mode)
In packages/engine/src/ipc/ipc-protocol.ts:
- Host commands: START_RUNTIME, STOP_RUNTIME, GET_STATUS, GET_METRICS, PING
- Worker events: TASK_CREATED, TASK_MOVED, TASK_UPDATED, ERROR_EVENT, HEALTH_CHANGED
Multi-project runtime diagram
               HybridExecutor
                      │
              ┌───────┴────────┐
              │ ProjectManager │
              └───┬────────┬───┘
                  │        │
      ┌───────────▼────┐ ┌─▼─────────────────┐
      │InProcessRuntime│ │ChildProcessRuntime│
      │(local process) │ │(fork + IPC host)  │
      └──────┬─────────┘ └──┬────────────────┘
             │              │
    TaskStore/Scheduler     │
                            ▼
                  child-process-worker
                   + InProcessRuntime
Task Routing Architecture
Task dispatch routing is resolved in two layers:
- Task routing resolution (packages/engine/src/effective-node.ts)
  - resolveEffectiveNode(task, settings) applies precedence:
    - Task.nodeId → task-override
    - ProjectSettings.defaultNodeId → project-default
    - no node set → local
- Runtime selection (packages/engine/src/project-manager.ts)
  - child-process isolation always uses ChildProcessRuntime
  - in-process isolation uses RemoteNodeRuntime when the registered project host node is remote
  - otherwise uses InProcessRuntime
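The routing precedence in the first layer amounts to a three-step fallback. A minimal sketch, assuming a result shape that reuses the effectiveNodeId / effectiveNodeSource field names from task metadata (the function name and parameter types are illustrative):

```typescript
type EffectiveNodeSource = "task-override" | "project-default" | "local";
interface EffectiveNodeSketch { effectiveNodeId: string | null; effectiveNodeSource: EffectiveNodeSource }

function resolveEffectiveNodeSketch(
  task: { nodeId?: string | null },
  settings: { defaultNodeId?: string | null },
): EffectiveNodeSketch {
  // Task-level override has the highest precedence.
  if (task.nodeId) return { effectiveNodeId: task.nodeId, effectiveNodeSource: "task-override" };
  // Then the project default.
  if (settings.defaultNodeId) return { effectiveNodeId: settings.defaultNodeId, effectiveNodeSource: "project-default" };
  // No node set anywhere: run locally.
  return { effectiveNodeId: null, effectiveNodeSource: "local" };
}
```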
Dispatch flow in scheduler
Within Scheduler.schedule(), dispatch for todo tasks now runs node gates in this order:
- resolveEffectiveNode() chooses routing source (task-override, project-default, local).
- If a node is selected, validateNodeDispatch checks for a persisted (projectId, nodeId) working-directory mapping (CentralCore.getProjectNodePath).
- Missing/blank mappings block dispatch (task stays in todo) and log "Execution blocked: project has no path mapping for node <id>".
- Only after mapping validation passes does applyUnavailableNodePolicy() evaluate node health and optional fallback-local behavior.
This preserves a clear separation between configuration correctness (mapping exists) and runtime health/failover policy.
Unavailable-node policy
unavailableNodePolicy is a validated/stored project setting (block default, fallback-local allowed) and is enforced during scheduler dispatch when both conditions are true:
- effective routing selected a remote node, and
- SchedulerOptions.nodeHealthMonitor is configured.
Behavior summary:
- block (default): unhealthy node status (offline, error, connecting) blocks dispatch for that poll cycle and keeps the task in todo.
- fallback-local: unhealthy remote node reroutes dispatch to local execution (effectiveNodeId: null, effectiveNodeSource: "local").
- unknown node health (undefined) is treated as allow/continue.
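The behavior summary maps to a small decision function. This sketch is an illustration rather than the scheduler's code: the "online" status name, the DispatchDecisionSketch shape, and the function signature are assumptions.

```typescript
// "online" is an assumed name for the healthy status; undefined means health is unknown.
type NodeHealth = "online" | "offline" | "error" | "connecting" | undefined;
type UnavailableNodePolicy = "block" | "fallback-local";
type DispatchDecisionSketch =
  | { dispatch: true; effectiveNodeId: string | null; effectiveNodeSource: string }
  | { dispatch: false };

function applyUnavailableNodePolicySketch(
  nodeId: string,
  source: string,
  health: NodeHealth,
  policy: UnavailableNodePolicy,
): DispatchDecisionSketch {
  const unhealthy = health === "offline" || health === "error" || health === "connecting";
  // Healthy or unknown health: allow/continue on the selected node.
  if (!unhealthy) return { dispatch: true, effectiveNodeId: nodeId, effectiveNodeSource: source };
  // Unhealthy + fallback-local: reroute to local execution.
  if (policy === "fallback-local") return { dispatch: true, effectiveNodeId: null, effectiveNodeSource: "local" };
  // Unhealthy + block: skip this poll cycle; the task stays in todo.
  return { dispatch: false };
}
```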
Active-task node-override guard
packages/core/src/node-override-guard.ts enforces immutable routing overrides for active tasks:
- validateNodeOverrideChange() blocks node override updates while task column is in-progress
- returns reason task-in-progress
TaskStore.updateTask() applies this guard before persisting nodeId changes.
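A guard of this kind can be sketched in a few lines. The signature and result shape below are hypothetical, and treating an unchanged nodeId as always allowed is an assumption made for the example.

```typescript
type OverrideCheckSketch = { allowed: true } | { allowed: false; reason: "task-in-progress" };

function validateNodeOverrideChangeSketch(
  column: string,
  currentNodeId: string | null,
  nextNodeId: string | null,
): OverrideCheckSketch {
  // Assumption: writing the same value back is not a change and is always allowed.
  if (currentNodeId === nextNodeId) return { allowed: true };
  if (column === "in-progress") return { allowed: false, reason: "task-in-progress" };
  return { allowed: true };
}
```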
Task commit-association API (GET /api/tasks/:id/commit-associations)
Dashboard session-diff route registration (packages/dashboard/src/routes/register-session-diff-routes.ts) now exposes lineage commit associations for task detail views:
- Route: GET /api/tasks/:id/commit-associations
- Project scoping: uses getProjectContext(req) so reads are project-aware like adjacent task diff endpoints.
- 404 behavior: returns { error: "Task not found" } for unknown task ids.
- Response contract:
{
"taskId": "FN-1234",
"lineageId": "uuid-or-null",
"associations": [
{
"commitSha": "abc123...",
"commitSubject": "feat(FN-1234): ...",
"authoredAt": "2026-05-11T02:00:00.000Z",
"matchedBy": "canonical-lineage-trailer | legacy-task-id-trailer | legacy-subject | manual-reconciliation",
"confidence": "canonical | legacy | ambiguous",
"taskIdSnapshot": "FN-1234",
"note": "optional reconciliation note"
}
]
}
confidence is a consumer-facing interpretation aid:
- canonical = immutable lineage trailer match (highest confidence)
- legacy = recovered via legacy task-id/subject matching
- ambiguous = manual reconciliation where historical task-id attribution could be misleading
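For consumers, the relationship between matchedBy and confidence can be typed as below. The value sets come from the response contract; the one-to-one mapping (in particular that every manual-reconciliation match is reported as ambiguous) is an inference from the descriptions above and should be read as an assumption.

```typescript
type MatchedBy =
  | "canonical-lineage-trailer"
  | "legacy-task-id-trailer"
  | "legacy-subject"
  | "manual-reconciliation";
type Confidence = "canonical" | "legacy" | "ambiguous";

// Assumed mapping, inferred from the confidence descriptions.
function confidenceForMatch(matchedBy: MatchedBy): Confidence {
  switch (matchedBy) {
    case "canonical-lineage-trailer":
      return "canonical";
    case "legacy-task-id-trailer":
    case "legacy-subject":
      return "legacy";
    case "manual-reconciliation":
      return "ambiguous";
  }
}
```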
Commit associations also carry optional additions/deletions shortstat counts captured by merge paths or filled later by the explicit POST /api/command-center/productivity/backfill-loc operator backfill. These nullable fields are the Command Center Productivity LOC source: analytics sum additions + deletions only when at least one in-range row has stats, derive estimated human hours saved as round((additions + deletions) / HUMAN_LINES_PER_HOUR, 1), and preserve the — unavailable sentinel for both LOC and hours saved when all matching rows are NULL so unknown historical data is never rendered as 0. The backfill only touches rows where both columns are NULL; malformed SHAs and commit objects unavailable in the local repo stay NULL, so partial historical coverage remains visible until a real local git object supplies stats. The hours-saved field is a conservative estimate, not exact time tracking.
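The LOC aggregation and the unavailable-versus-zero distinction can be sketched as follows. The value of HUMAN_LINES_PER_HOUR is not stated in this document, so the constant below is a placeholder; null stands in for the unavailable sentinel, and the function and type names are illustrative.

```typescript
const HUMAN_LINES_PER_HOUR = 50; // placeholder value; the real constant is defined elsewhere

interface CommitStatsRow { additions: number | null; deletions: number | null }
interface LocSummary { loc: number | null; hoursSaved: number | null }

function summarizeLocSketch(rows: CommitStatsRow[]): LocSummary {
  // Only rows that actually carry stats contribute.
  const withStats = rows.filter((row) => row.additions !== null || row.deletions !== null);
  // All rows NULL (or no rows): report unavailable, never 0.
  if (withStats.length === 0) return { loc: null, hoursSaved: null };
  const loc = withStats.reduce((sum, row) => sum + (row.additions ?? 0) + (row.deletions ?? 0), 0);
  // round(x, 1): one decimal place.
  const hoursSaved = Math.round((loc / HUMAN_LINES_PER_HOUR) * 10) / 10;
  return { loc, hoursSaved };
}
```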
Command Center Productivity task-duration stats use task rows, not commit rows: done tasks completed in range (executionCompletedAt) contribute when cumulativeActiveMs > 0. The aggregator computes completed count plus average, median, p90, and total active execution milliseconds; if no qualifying task exists, the duration metrics use the same unavailable — contract instead of reporting 0.
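A sketch of the duration aggregation, under stated assumptions: the input is the list of cumulativeActiveMs values for done tasks completed in range, the p90 uses the nearest-rank method (the document does not say which percentile method the aggregator uses), and null again stands in for the unavailable sentinel.

```typescript
interface DurationStatsSketch {
  count: number;
  totalMs: number;
  averageMs: number;
  medianMs: number;
  p90Ms: number;
}

function summarizeDurationsSketch(activeMs: number[]): DurationStatsSketch | null {
  // Only tasks with cumulativeActiveMs > 0 qualify.
  const sorted = activeMs.filter((ms) => ms > 0).sort((a, b) => a - b);
  const count = sorted.length;
  if (count === 0) return null; // unavailable, not 0
  const totalMs = sorted.reduce((sum, ms) => sum + ms, 0);
  const mid = Math.floor(count / 2);
  const medianMs = count % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const p90Ms = sorted[Math.ceil(0.9 * count) - 1]; // nearest-rank percentile (assumption)
  return { count, totalMs, averageMs: totalMs / count, medianMs, p90Ms };
}
```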
Done-task files-changed sources of truth
Done-task file-count surfaces intentionally distinguish four data sources:
- /api/tasks/:id/diff (lineage union, authoritative landed diff)
  - This route aggregates the task's landed lineage and returns stats.filesChanged plus the file list used by the Changes tab.
  - Done-task cards and diff views should treat this as the canonical "files changed" source.
- task.mergeDetails.filesChanged / insertions / deletions (final-commit shortstat)
  - These fields describe only the recorded final merge/squash commit shortstat.
  - On done cards, mergeDetails.filesChanged is only a transient loading placeholder until /api/tasks/:id/diff resolves.
- task.mergeDetails.landedFiles (recorded committed file list)
  - When live diff stats are unavailable, done-task cards may fall back to the recorded landed file list length.
  - This remains committed-diff metadata; transient executor worktree captures are not surfaced as a done-card files chip.
- task.modifiedFiles (execution-time worktree snapshot)
  - Captured in the executor worktree during implementation (git diff <base>..HEAD snapshot), before final merge outcomes are known.
  - Can include transient/superset paths that did not land; done-task cards must not use it for the files-changed chip.
FN-4647 decision: mergeDetails shortstat fields remain commit-level metadata. No additional persisted lineage-level summary field is introduced at this time; done-task landed totals continue to be served live via /api/tasks/:id/diff.
Task branch field plumbing (branch + baseBranch)
Task create/update now preserves both branch fields end-to-end:
- Request validation/normalization (dashboard route layer): packages/dashboard/src/routes/register-task-workflow-routes.ts
  - POST /api/tasks accepts branch and baseBranch as string values.
  - PATCH /api/tasks/:id accepts branch and baseBranch as string | null for PATCH-style updates, trims string inputs, and treats empty strings as clears (null).
  - Route handlers reject non-string/non-null payloads with 400.
- Durable persistence (core store layer): packages/core/src/store.ts
  - TaskStore.createTask() persists both branch and baseBranch on task creation.
  - TaskStore.updateTask() preserves existing PATCH semantics where explicit null clears either field.
  - Fields round-trip through JSON and SQLite persistence via the shared task contract in packages/core/src/types.ts.
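The PATCH-side normalization can be sketched as one helper applied to each field. The helper name, result shape, and error message are hypothetical; only the rules (trim, empty string clears, null clears, anything else is a 400) come from the description above.

```typescript
type BranchFieldResult =
  | { ok: true; value: string | null | undefined }
  | { ok: false; status: 400; error: string };

function normalizeBranchFieldSketch(raw: unknown, field: string): BranchFieldResult {
  if (raw === undefined) return { ok: true, value: undefined }; // field absent: leave untouched
  if (raw === null) return { ok: true, value: null }; // explicit clear
  if (typeof raw !== "string") {
    return { ok: false, status: 400, error: `${field} must be a string or null` };
  }
  const trimmed = raw.trim();
  // Empty (or whitespace-only) strings are treated as clears.
  return { ok: true, value: trimmed === "" ? null : trimmed };
}
```

Keeping undefined distinct from null preserves the PATCH semantics: an omitted field is left alone, while an explicit null clears it.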
Routing activity visibility
Routing decisions are visible in task activity/log entries and in task metadata (effectiveNodeId, effectiveNodeSource), and surfaced in dashboard routing UI + fn task show output.
See also:
- Settings Reference → Node Routing settings
- Task Management → Node Routing
- Multi-Project → Node Routing
12) Settings Hierarchy
Settings are split by scope.
Global scope
- File: ~/.fusion/settings.json
- Managed by GlobalSettingsStore (packages/core/src/global-settings.ts)
- Examples: themeMode, colorTheme, default model/provider, notification preferences (ntfy* legacy fields and notificationProviders)
Project scope
- Stored in per-project config (config table + compatibility file .fusion/config.json)
- Includes engine/runtime controls (maxConcurrent, autoMerge, worktree and workflow behavior, etc.)
Merged view
- Settings combines global + project values
- Defaults in DEFAULT_GLOBAL_SETTINGS and DEFAULT_PROJECT_SETTINGS
- Scope key lists in GLOBAL_SETTINGS_KEYS and PROJECT_SETTINGS_KEYS
Model controls
- Per-task model overrides on task fields:
  - modelProvider / modelId
  - validatorModelProvider / validatorModelId
  - planningModelProvider / planningModelId
  - thinkingLevel
- Reusable presets via ModelPreset
- Agent prompt template overrides via agentPrompts
13) Git Integration
Git behavior is implemented primarily in engine executor/merger + dashboard/CLI git APIs.
Git REST API endpoints
Git dashboard routes are registered in register-git-github.ts.
Stranded refinement affordance (Lane C)
Fusion adds an operator-first API surface to diagnose and expedite refinement tasks that remain in Planning (triage) without bypassing plan/approval gates from FN-4657.
| Method | Path | Description |
|---|---|---|
| GET | /api/tasks/stranded-refinements | List stranded refinement diagnostics (sourceType=task_refine, column=triage, paused!=true) with reasons and recommendation. Supports ?freshnessMinutes= (1-1440). |
| GET | /api/tasks/:id/stranded-refinement | Return one refinement diagnostic row plus PROMPT.md presence and dependency-resolution status. |
| POST | /api/tasks/:id/expedite-refinement | Request bounded expedite for a triage refinement. Clears nextRecoveryAt for stale/backoff rows; returns requiresOperatorAction for awaiting-approval/failed/stuck-killed without mutating status. |
Stranded reasons are: untriaged-stale, awaiting-approval, failed, stuck-killed, and recovery-backoff.
Non-bypass guarantees:
- Expedite never moves a task directly to todo.
- Expedite never fabricates/writes PROMPT.md.
- Expedite never clears awaiting-approval (or failed/stuck statuses).
- POST /api/tasks/:id/approve-plan remains the only route that clears awaiting-approval and promotes approved plans.
This complements FN-4657's durable triage routing fix; it does not replace triage specification or plan-approval policy.
| Method | Path | Description |
|---|---|---|
| GET | /api/git/remotes | List GitHub remotes parsed from git remote -v output. |
| GET | /api/git/remotes/detailed | List all remotes with fetch/push URLs. |
| POST | /api/git/remotes | Add a new remote (name, url). |
| DELETE | /api/git/remotes/:name | Remove an existing remote by name. |
| PATCH | /api/git/remotes/:name | Rename a remote (newName). |
| PUT | /api/git/remotes/:name/url | Update a remote URL. |
| GET | /api/git/status | Return branch, short commit, dirty state, and ahead/behind counts. |
| GET | /api/git/commits | Return recent commits (?limit= capped at 100). |
| GET | /api/git/commits/:hash/diff | Return commit stat + patch for a validated commit hash. |
| GET | /api/git/commits/ahead | Return local commits ahead of upstream (empty when upstream is not configured). |
| GET | /api/git/remotes/:name/commits | Return commits for a remote ref (?ref= optional, ?limit= max 50, with remote HEAD/main/master fallback resolution). |
| GET | /api/git/branches | List local branches with current/tracking metadata and last commit date. |
| GET | /api/git/branches/:name/commits | Return commits for a branch (?limit= default 10, max 100). |
| GET | /api/git/worktrees | List worktrees with branch/path metadata and task association when available. |
| POST | /api/git/branches | Create a branch from HEAD or an optional base ref. |
| POST | /api/git/branches/:name/checkout | Checkout an existing branch. |
| DELETE | /api/git/branches/:name | Delete a branch (?force=true allows deleting unmerged branches). |
| POST | /api/git/fetch | Fetch from a remote (remote defaults to origin). |
| POST | /api/git/pull | Pull the current branch (rebase boolean optional) and return structured conflict metadata on merge/rebase conflicts. |
| POST | /api/git/push | Push the current branch. |
| GET | /api/git/stashes | List stash entries. |
| GET | /api/git/stashes/:index/diff | Return stash stat + patch for a validated stash index (404 when missing). |
| POST | /api/git/stashes | Create a stash with an optional message. |
| POST | /api/git/stashes/:index/apply | Apply a stash by index (optionally drop after apply via drop: true). |
| DELETE | /api/git/stashes/:index | Drop a stash by index. |
| GET | /api/git/diff | Return unstaged working-tree diff text. |
| GET | /api/git/diff/file | Return staged or unstaged diff for one file (path + `staged=true |
| GET | /api/git/changes | Return staged and unstaged file change summary. |
| POST | /api/git/stage | Stage specified files. |
| POST | /api/git/unstage | Unstage specified files. |
| POST | /api/git/commit | Create a commit from staged changes with a required message. |
| POST | /api/git/discard | Discard working-tree changes for specified files. |
GitHub tracking lifecycle (task creation + existing-task edits)
Fusion attempts GitHub issue creation when per-task tracking is explicitly enabled (task.githubTracking.enabled === true) and the task is currently unlinked. This runs via a universal post-create hook registered at process startup by dashboard/CLI entrypoints (including engine startup paths) — so it fires for every task-creation path: HTTP routes, pi extension tools (fn_task_create, fn_task_import_github*, fn_delegate_task), CLI commands (fn task add, fn task duplicate, fn task refine), mission/feature triage, automation create-task steps, agent-driven delegation, and routine/cron-created tasks. The hook is best-effort and failures are swallowed with a warning, so task creation is never blocked by GitHub availability. The existing inline maybeCreateTrackingIssue calls in route handlers remain as redundant safety nets and are idempotent (issue_already_linked).
When Fusion does create a tracking issue, it formats the title as [FN-XXXX] Task title and sends a short plain-text body prefixed with Fusion task: FN-XXXX. The body is a bounded summary snippet (not full task prompt content), and Fusion does not include any hyperlink back to the local dashboard. Manual unlink requests (githubTracking.issue: null) do not recreate an issue in that same PATCH request, and disable updates do not create issues. Auth resolution remains strict-mode (token vs gh-cli) but now defensively accepts merged settings shapes where auth keys may appear in global-merged payloads.
When a tracked task later moves to in-progress or done, Fusion posts one short lifecycle comment on the linked tracking issue. These comments always include the Fusion task ID as plain text (Fusion task: FN-XXXX) and never link back to the Fusion app. The in-progress comment stays plain-text; the done comment can additionally include GitHub commit/PR markdown links plus branch, file-change, and merge-timestamp details when that merge context is available on the task. No comment is posted for any other transition.
When a tracked task transitions into done, Fusion closes the linked GitHub issue with state_reason: completed. When a task transitions out of done into any active column (triage, todo, in-progress, in-review), Fusion reopens it with state_reason: reopened. When a tracked task is permanently deleted, Fusion closes the linked GitHub issue with state_reason: not_planned (or deletes it when explicitly requested). Delete-path outcomes emit a github-issue:action store event payload ({ taskId, action, owner, repo, number, outcome, error? }) so success/failure remains observable even after the task row is gone and task activity logs are unwritable. Moves from done to archived leave the issue closed. Tasks without githubTracking.enabled or without a linked issue are unaffected, and GitHub failures are logged to task activity without blocking the move.
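The column-move rules above can be summarized as a pure mapping from a transition to an issue action. This is a sketch under assumptions: the type and function names are illustrative, and the delete path (not_planned) is left out because it is not a column move.

```typescript
type Column = "triage" | "todo" | "in-progress" | "in-review" | "done" | "archived";
type IssueActionSketch =
  | { op: "close"; stateReason: "completed" }
  | { op: "reopen"; stateReason: "reopened" }
  | { op: "none" };

const ACTIVE_COLUMNS: Column[] = ["triage", "todo", "in-progress", "in-review"];

function issueActionForMoveSketch(from: Column, to: Column): IssueActionSketch {
  // Any move into done closes the linked issue as completed.
  if (to === "done" && from !== "done") return { op: "close", stateReason: "completed" };
  // Moving out of done into an active column reopens it.
  if (from === "done" && ACTIVE_COLUMNS.indexOf(to) !== -1) return { op: "reopen", stateReason: "reopened" };
  // Everything else (including done to archived) leaves the issue as it is.
  return { op: "none" };
}
```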
The GitHub tracking state listener now attaches to every registered project store (including projects registered after startup), and each store gets a one-time asynchronous startup reconciliation sweep. That sweep scans bounded done tasks with tracking enabled and closes any linked GitHub issue still open, so missed/momentary failures are caught up without blocking server boot. Source-imported GitHub issues can also be auto-closed when githubCloseSourceIssueOnDone === true: GitHubSourceIssueCloseService listens for task:moved transitions into done and closes open task.sourceIssue links, while GitHubTrackingReconciler.reconcileSourceIssues performs a parallel startup sweep over done tasks with GitHub source metadata to close any source issues still open.
Worktree model
- Each active task runs in an isolated worktree under .worktrees/*
- Executor creates branches like fusion/{task-id} (executor.ts)
- WorktreePool can recycle idle worktrees when enabled
WorktreeBackend abstraction
- Backend contract: WorktreeBackend (packages/engine/src/worktree-backend.ts, re-exported via packages/engine/src/worktree-pool.ts).
- Implementations: NativeWorktreeBackend (Fusion-managed git worktree flow) and WorktrunkWorktreeBackend (delegates to the external wt CLI from max-sixty/worktrunk).
- Backend selection is driven by worktrunk.enabled; when enabled, worktrunk-managed layout overrides worktreesDir for delegated operations.
- Worktrunk layout is authoritative on create: after wt switch --create, Fusion resolves the actual registered worktree path via git worktree list --porcelain and uses that path (instead of assuming resolveTaskWorktreePath alignment).
- Delegated operation surface in the interface: create, sync, prune, remove (plus backend path resolution via resolveWorktreePath).
- Executor acquisition paths (worktree-acquisition.ts) resolve backend selection centrally, so create flow stays backend-agnostic above the pool/acquisition layer.
- Worktree removal is backend-mediated across merger, self-healing, worktree-pool, executor, and step-session cleanup paths via removeWorktree(...) (WorktreeBackend.remove()). Native removal first runs git worktree remove --force; when git reports recoverable on-disk cleanup failures such as Directory not empty, failed to delete, or modified/untracked content, it falls back to async filesystem removal and git worktree prune (pruneWorktreeAdminEntries) so both the directory and dangling admin entry are cleared.
- Self-healing is worktrunk-aware for failure recovery: tasks paused with pausedReason: "worktrunk_operation_failed" are explicitly skipped in reclaim sweeps (self-healing.ts) until operator intervention.
- Failure contract: delegated worktrunk errors preserve stderr context (WorktrunkOperationError) and are handled by worktrunk.onFailure: "fail" pauses the task, while "fallback-native" retries on the native backend and emits one-shot fallback telemetry.
- Install contract: Fusion only auto-installs from a source-of-truth manifest. The shipped placeholder manifest intentionally stays in upstream-pending-verification until a human verifies upstream asset URLs and checksums, so install attempts fail closed rather than guessing release metadata.
Stale index.lock recovery on worktree create
- Native worktree create paths now classify git worktree add failures containing .../index.lock: File exists before falling back to generic branch-conflict handling.
- Classifier gates are deterministic: the lock must exist, be older than the stale threshold (default 30s), not be owned by a live activeSessionRegistry session, and resolve to a normalized lock/worktree path.
- If classified stale, Fusion removes the lock and retries create exactly once.
- If staleness cannot be proven, lock removal is refused and the flow raises StaleWorktreeIndexLockError so task failure messaging can escalate with manual remediation guidance.
- Run-audit events emitted by the create path: worktree:stale-lock-detected, worktree:stale-lock-recovered, worktree:stale-lock-recovery-failed, worktree:stale-lock-refused, worktree:stale-registration-detected, worktree:stale-registration-recovered, worktree:stale-registration-recovery-failed.
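The four classifier gates can be read as a single conjunction: the lock is removable only when every gate passes. A minimal sketch, assuming the filesystem and session-registry lookups have already been reduced to plain facts (the LockFactsSketch shape and function name are hypothetical):

```typescript
// Hypothetical pre-gathered facts about the lock file.
interface LockFactsSketch {
  exists: boolean;
  ageMs: number;
  ownedByLiveSession: boolean;
  normalizedPath: string | null; // null when the lock/worktree path could not be normalized
}

const STALE_THRESHOLD_MS = 30000; // default 30s threshold

function classifyIndexLockSketch(facts: LockFactsSketch, thresholdMs: number = STALE_THRESHOLD_MS): "stale" | "refused" {
  const provenStale =
    facts.exists &&
    facts.ageMs > thresholdMs &&
    !facts.ownedByLiveSession &&
    facts.normalizedPath !== null;
  // Anything short of proof is refused, which leads to the error path rather than deletion.
  return provenStale ? "stale" : "refused";
}
```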
Branch-conflict inspection and auto-reclaim
- inspectBranchConflict classifies branch collisions as stale, stale-resolved, reclaimable, or live-foreign.
- Dispatch preflight (acquireTaskWorktree/executor) now auto-reclaims reclaimable self-owned conflicts and emits branch:auto-reclaim run-audit events with task/branch/worktree/tip/stranded-commit metadata.
- Self-healing also runs reclaimSelfOwnedBranchConflicts() across idle todo + in-progress tasks; successful reclaim keeps stranded commits intact and failed reclaim escalates to in-review/failed with branch-conflict-unrecoverable.
- Cross-task collisions (live-foreign) remain manual by design; operators resolve conflicting branches/worktrees with standard git tooling, then retry the task.
Merge strategies
- Setting type: `MergeStrategy = "direct" | "pull-request"` (`types.ts`)
- `aiMergeTask()` in `merger.ts` performs merge flow
- FN-5782 wires branch-group routing into merge target resolution: tasks with `branchContext.assignmentMode === "shared"` and a resolvable `branch_groups` row merge onto `branch_groups.branchName` (`mergeTarget.source = "branch-group-integration"`) instead of the project default branch; ungrouped and `per-task-derived` tasks keep the existing direct-to-default path unchanged. Merge emits `merge:branch-group-routed` audit telemetry for routed members. FN-5846 extends the same contract to deterministic/self-healing finalize paths (`recoverAlreadyMergedReviewTasks`, interrupted/deadlock/misbound finalizers, and the `mergeConfirmed` fast path): a resolvable shared member is re-routed to the group branch before reachability checks, `mergeTargetSource`/`mergeTargetBranch` are stamped by the finalizer, `recordBranchGroupMemberLanded` is called, and a defensive audit event is emitted if a path would otherwise evaluate the member against the project default branch. FN-5788 adds a callable promotion-decision hook (`evaluateBranchGroupPromotion`) and `merge:branch-group-promotion-gated` telemetry; FN-5830 lands the completion gate + promotion machinery via `evaluateBranchGroupCompletion` and idempotent `promoteBranchGroup` (single shared→default merge/PR with finalized status and PR tracking persistence).
- FN-5279 adds `mergeIntegrationWorktree` for auto-merge only. Default `reuse-task-worktree` hands merger ownership from executor to the merger inside the task worktree after five gates (clean tree, expected branch, no live executor session, canonical branch/worktree binding, lease handoff). Refusals emit `merge:reuse-handoff-refused`, leave the task in `in-review`, and do not silently fall back to project-root merge mode. `cwd-integration-branch` is the explicit opt-in project-root path; `cwd-main` is a deprecated alias normalized to `cwd-integration-branch`. Integration-branch defaults across merger and self-healing flows are resolved dynamically via `resolveIntegrationBranch(rootDir, settings)` (`integrationBranch` → `baseBranch` → `origin/HEAD` → `main`). When `worktrunk.enabled=true`, worktrunk-managed merge/worktree behavior still wins and the handoff path emits a defer event instead of taking over. FN-5363 tightens this path: `acquireMergeQueueLease({ targetTaskId })` is strict (no queue-head fallback), merge queue rows are enqueue/lease-gated to `in-review` tasks, and stale non-review rows are auto-cleaned (including on `in-review` column exit when leases are absent or expired). FN-5353 extends the same contract: merger self-enqueues the target before strict target leasing, null target leases are surfaced as `merge:reuse-handoff-refused` with `reason: "target-not-queued"`, `acquireReuseHandoff` hard-refuses `reason: "worktree-equals-project-root"`, and `resolveMergeIntegrationRoot` returns a missing-worktree sentinel (`rootDir: ""`) so reacquire executes before any reuse gate can misroute against project root. FN-6278 adds a stable cwd preflight before root-derived git spawns: in `reuse-task-worktree` mode, an empty, missing, incomplete, or de-registered `task.worktree` is repaired/reacquired before the first spawn, while `cwd-integration-branch` remains a no-op project-root path.
  FN-5351 adds a production verification trail for integration-branch invariants: `merge:integration-worktree-state`, `merge:cwd-integration-fallback-refused`, and `merge:integration-ref-advance`.
- `merger.ts` also exposes a test-only `__test__` helper object for internal merger unit/integration coverage (for example autostash orphan cleanup behavior)
- Supports workflow-step execution after merge (post-merge phase)
- Deterministic verification now runs a bootstrap preamble (`node scripts/ensure-test-artifacts.mjs`) before configured `testCommand`/`buildCommand`, then self-heals Vite `Failed to resolve entry for package "@fusion/..."` workspace-entry faults by rebuilding the missing package once and retrying the failed command. If that retry still reports the same missing-entry fault, merger raises a typed environment fault and `ProjectEngine` leaves the task in-review (no verificationFailureCount increment or in-progress bounce) so the next recovery sweep can retry after other runs rebuild artifacts.
- FN-4232/FN-4605 extends that bootstrap to cover stale dist consumers comprehensively: `@fusion/{core,dashboard,engine,plugin-sdk}` and `@fusion-plugin-examples/{dependency-graph,hermes-runtime,openclaw-runtime,paperclip-runtime}` are checked for missing/stale artifacts (staleness compares newest `src/` mtime against the oldest required `dist/` artifact mtime for configured packages). Package-level `pretest` hooks in `@fusion/dashboard` and `@fusion-plugin-examples/dependency-graph` invoke the same bootstrap for filtered test runs.
- When stale or missing artifacts are found, the preamble logs `[test-bootstrap] rebuilding workspace dist artifacts (missing or stale): ...`; if rebuild fails, remediation now prints exact artifact-path diagnostics (`[test-bootstrap] missing: ...` / `[test-bootstrap] stale (src newer than dist): ...`) plus the FN-4232/FN-4605 reference and recovery commands.
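The integration-branch fallback order described above (`integrationBranch` → `baseBranch` → `origin/HEAD` → `main`) can be sketched as a pure function. This is a minimal illustration, not the repository's `resolveIntegrationBranch`: the `BranchSettings` shape, the `pickIntegrationBranch` name, and the `originHead` parameter are assumptions made so the example stays self-contained.

```typescript
// Hypothetical settings shape; only the two field names come from the text above.
interface BranchSettings {
  integrationBranch?: string;
  baseBranch?: string;
}

// `originHead` stands in for whatever the real resolver reads from origin/HEAD.
// It is passed in here so the sketch needs no git access.
function pickIntegrationBranch(settings: BranchSettings, originHead?: string): string {
  const candidates = [settings.integrationBranch, settings.baseBranch, originHead];
  for (const candidate of candidates) {
    const trimmed = candidate?.trim();
    // The first non-empty candidate wins; blank strings are treated as unset.
    if (trimmed) return trimmed;
  }
  // Final fallback named in the text.
  return "main";
}
```

Treating blank strings as unset is a choice made for this sketch; the real resolver may validate candidates differently (for example by checking that the ref exists).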
Finalize integrity gate
- Finalize-to-done now runs an ownership classifier with three outcomes: `owned-commit` (task trailer/subject commit proven landed on merge target), `proven-no-op` (zero-ahead branch plus start point reachable from target), and `unproven` (missing ownership evidence, including foreign start-point inheritance).
- `owned-commit` and `proven-no-op` can finalize. `proven-no-op` explicitly reconciles metadata by clearing stale `task.modifiedFiles` and stamping `mergeDetails.noOpMerge=true` with `landedFiles: []`.
- `noCommitsExpected === true` tasks have an additional no-op finalize guard (FN-6461): if a zero-net-change lane reaches finalize with step evidence showing incomplete/skipped work outweighing completed work (`incompleteCount >= doneCount`, with at least one step), the task must not move to `done`. Merger and self-healing write `task.error`, log an operator-visible reason, emit `task:no-commits-finalize-blocked-incomplete-steps`, and move the task back to `todo` with `preserveProgress: true`. All-done no-commits tasks, mostly-done tasks with only a minor skipped tail, zero-step tasks, ordinary tasks, and no-commits tasks with real landed changes keep the existing finalize behavior.
- `unproven` no longer silently completes as done; merger/self-healing emit `task:finalize-unproven-blocked` audit events and auto-retry by requeuing to `todo` for a fresh execution pass.
- Historical cleanup is additive: `reconcileDoneTaskIntegrity()` scans done tasks missing `mergeDetails.commitSha` but still carrying `modifiedFiles`, then either recovers owned commit metadata, clears no-op stale files, or emits `task:integrity-warning` without regressing done tasks back to review. `task:integrity-warning` is transition-only on the persisted warning reason: first warning emits once, repeated sweeps with the same `mergeDetails.integrityWarning.reason` stay silent, and a new warning reason emits again.
- This integrity gate complements FN-4646 landed-file capture (metadata truth source) and FN-4647 dashboard labeling (UI presentation); gate enforcement is in merger/self-healing, while display semantics remain UI-owned.
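The FN-6461 no-commits guard reduces to a small predicate over step counts. The sketch below only illustrates the stated rule (`incompleteCount >= doneCount`, with at least one step); the `StepStatus` values, the function name, and the `hasLandedChanges` flag are hypothetical stand-ins rather than the engine's actual types.

```typescript
// Hypothetical step statuses; anything other than "done" counts as incomplete here.
type StepStatus = "done" | "pending" | "in-progress" | "skipped";

function shouldBlockNoCommitsFinalize(
  noCommitsExpected: boolean,
  hasLandedChanges: boolean,
  steps: StepStatus[],
): boolean {
  // Ordinary tasks and no-commits tasks with real landed changes are out of scope.
  if (!noCommitsExpected || hasLandedChanges) return false;
  // Zero-step tasks keep the existing finalize behavior.
  if (steps.length === 0) return false;
  const doneCount = steps.filter((s) => s === "done").length;
  const incompleteCount = steps.length - doneCount;
  // Block only when incomplete/skipped work is at least as large as completed work.
  return incompleteCount >= doneCount;
}
```

Under these assumptions a task with steps `done, done, skipped` still finalizes (a minor skipped tail), while `done, skipped, skipped` is sent back to `todo`.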
Autostash lifecycle
- Before destructive merge prep, `stashUnrelatedRootDirChanges()` snapshots dirty root-dir edits into `fusion-merger-autostash:<taskId>:<ts>` (plus optional `race-rescue-*` stashes for late writes).
- During verification-fix finalize fallback, `commitOrAmendMergeWithFixes()` now snapshots any still-dirty root-dir state into `fusion-merger-autostash:<taskId>:finalize-reset:<ts>` before its hard reset/clean recovery path, preventing silent mixed-worktree leftovers from being discarded.
- In `aiMergeTask` cleanup, `restoreUnrelatedRootDirChanges()` attempts restore; then `dropAutostashHandle()` runs on every terminal path and drops primary + race-rescue stashes when restoration succeeded or content is no longer live.
- If restore fails with unresolved developer work (`failed`/`conflict-needs-manual`), cleanup uses a keep-if-live rule so still-live stashes are preserved for manual recovery.
- `sweepAutostashOrphans()` keeps its subsumed/live classification for prior-run leftovers, and `sweepStaleAutostashes()` adds an age-based backstop that drops `fusion-merger-autostash:*` entries older than the configured threshold (default 24h).
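The age-based backstop can be pictured as a filter over stash entries. This is a rough sketch under stated assumptions: the `StashEntry` shape, the `selectStaleAutostashes` name, and matching on the stash message with `includes` are illustrative choices, not the actual `sweepStaleAutostashes()` implementation.

```typescript
// Hypothetical view of a stash entry: its message and creation time in epoch ms.
interface StashEntry {
  message: string;
  createdAtMs: number;
}

const AUTOSTASH_MARKER = "fusion-merger-autostash:";
const DEFAULT_MAX_AGE_MS = 24 * 60 * 60 * 1000; // the 24h default named in the text

// Returns the merger autostash entries that are older than the threshold.
// Stashes without the marker are never selected, whatever their age.
function selectStaleAutostashes(
  entries: StashEntry[],
  nowMs: number,
  maxAgeMs: number = DEFAULT_MAX_AGE_MS,
): StashEntry[] {
  return entries.filter(
    (entry) => entry.message.includes(AUTOSTASH_MARKER) && nowMs - entry.createdAtMs > maxAgeMs,
  );
}
```

The sketch only selects candidates; dropping them, and any live/subsumed checks that run first, are left to the surrounding sweep logic.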
Stash Recovery surface
- Orphans are typically residual `fusion-merger-autostash:*` entries from older merge runs where restore could not safely complete.
- Existing task-scoped surfacing remains: merger warnings still log to `mergerLog.warn` and `store.logEntry` for the active merge task.
- New global surfacing adds `merger:autostashOrphans` TaskStore events, engine helpers (`listAutostashOrphans`, `getAutostashDiff`, `applyAutostashBySha`, `dropAutostashBySha`), and dashboard API endpoints under `/api/stash-recovery/*`.
- `merger:autostashOrphans` records now include provenance fields (`sourcePhase`, `detectedByTaskId`, `detectedAt`) so operators can attribute leftovers to the merge phase and surfacing task/session.
- `ProjectEngine` consumes the orphan event stream and auto-creates deduplicated `sourceType: "recovery"` follow-up tasks for live leftovers, so repeated detections do not spam the board.
- Dashboard operators inspect orphan counts, review diffs, apply stashes, and explicitly drop entries with confirmation from Git Manager → Recovery; the recovery controls are part of Git Manager rather than a standalone top-level dashboard view.
- Decision: recovery stays user-gated. Auto-apply was rejected because clean-tree checks are racy, stash placement is ambiguous after source task merge, and apply conflicts can produce hard-to-untangle state. `sweepAutostashOrphans` continues to auto-drop only subsumed entries while preserving live developer work.
Automated follow-up dedup (FN-5232)
- Engine-side automated follow-up creation now routes through `packages/engine/src/verification-followup-dedup.ts` instead of calling `TaskStore.createTask()` directly from recovery/eval/PR-comment paths.
- Verification-style follow-ups stamp `sourceMetadata.verificationFailureSignature`, a deterministic SHA-256 digest over `{ lane, sorted failing test basenames }` (or `lane|no-files` when no files can be parsed). Open matches reuse the existing task and append at most one `[verification recurrence]` log entry per hour; closed/done/archived matches within 24 hours create a fresh task with `sourceMetadata.supersedesTaskId` pointing at the prior task.
- Non-verification automated follow-ups can supply `extraMatchKeys` (for example eval `suggestionId` or PR `prNumber`) so dedup stays deterministic even when no test-file signature exists.
- This layer composes with FN-4892 same-agent intake dedup in `@fusion/core`: engine dedup prevents repeated automated recovery spam up front, while store-side same-agent dedup still archives newly-created near-duplicates when `sourceAgentId` is present.
- Run-audit emits `verification:followup-created` and `verification:followup-deduped` database events with hashed signature metadata only; no raw stdout/stderr or secret material is persisted in the audit payload.
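A deterministic signature of the kind described above could be computed roughly as follows. The text only specifies a SHA-256 digest over the lane and sorted failing test basenames, with a `lane|no-files` fallback; the exact serialization (JSON here), the de-duplication of basenames, and the function name are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";
import { basename } from "node:path";

// Sketch only: the real serialization in verification-followup-dedup.ts may differ.
function verificationFailureSignature(lane: string, failingTestFiles: string[]): string {
  // Reduce paths to basenames, de-duplicate, and sort so input order does not matter.
  const names = Array.from(new Set(failingTestFiles.map((file) => basename(file)))).sort();
  const payload = names.length > 0 ? JSON.stringify({ lane, files: names }) : `${lane}|no-files`;
  return createHash("sha256").update(payload).digest("hex");
}
```

Because only basenames feed the digest, the same failing tests reported from different checkout roots map to the same signature, which is what lets an open follow-up be reused instead of duplicated.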
Conflict handling
merger.ts includes conflict classification and auto-resolution helpers:
- lock files (`LOCKFILE_PATTERNS`)
- generated files (`GENERATED_PATTERNS`)
- whitespace-trivial conflicts
PR and badge integration
- Engine PR monitor: `pr-monitor.ts` and `pr-comment-handler.ts`
- Dashboard GitHub APIs + webhook route in `routes.ts`
- Badge snapshots are streamed via `/api/ws` and `useBadgeWebSocket.ts`
PR checks API
- Tasks now support multiple linked PRs via `Task.prInfos` (canonical list). `Task.prInfo` remains as a back-compat primary mirror and should be treated as `prInfos[0]` when present.
- `GET /api/tasks/:id/pr/checks` returns live PR check data for the task PR:
  - `checks: PrCheckStatus[]` (required and non-required checks)
  - `rollup: "success" | "pending" | "failure" | "unknown"` derived from required checks only (merge-readiness semantics)
  - `lastCheckedAt: string` timestamp for the fetch
- Route behavior matches PR refresh safeguards:
  - `404` when the task has no associated PR
  - `429` when `githubRateLimiter` denies the repo request window, including `retryAfter`/`resetAt` details
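To make the "required checks only" rollup concrete, here is one way such a derivation might look. The `PrCheckSketch` fields and the precedence used (failure, then pending, then success, with `unknown` when no required checks exist) are assumptions; the document only states the four rollup values and that non-required checks are ignored.

```typescript
type Rollup = "success" | "pending" | "failure" | "unknown";

// Hypothetical stand-in for PrCheckStatus; the real type may carry more fields.
interface PrCheckSketch {
  name: string;
  required: boolean;
  state: Rollup;
}

function deriveRollup(checks: PrCheckSketch[]): Rollup {
  // Non-required checks never influence merge readiness.
  const required = checks.filter((check) => check.required);
  if (required.length === 0) return "unknown";
  if (required.some((check) => check.state === "failure")) return "failure";
  if (required.some((check) => check.state === "pending")) return "pending";
  if (required.every((check) => check.state === "success")) return "success";
  return "unknown";
}
```

With this ordering, a failing optional lint check leaves the rollup at `success` as long as every required check passed.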
PR review ingestion and auto-transition
- `POST /api/tasks/:id/pr/refresh` and background `refreshPrInBackground()` refresh every linked PR (`prInfos`) in bounded batches, return a primary entry plus `all` entries, then sync review data into task comments with idempotency on `(source, externalId)`.
- Synced comment sources are `github-review` and `github-review-comment`; refinement auto-creation is skipped for these external comments.
- `GET /api/tasks/:id/pr/reviews` returns the live GitHub review snapshot plus the Fusion-threaded stored review comments for the task.
- `POST /api/tasks/:id/pr/:number/unlink` removes only the task↔PR link (does not close the PR) and returns the updated `prInfos` list.
- When review decision transitions to `CHANGES_REQUESTED` while the task is in `in-review`, Fusion auto-moves the task back to `todo` with `preserveProgress` and `preserveWorktree`, writes a `review-feedback` task document, and records run-audit mutation `pr:changes-requested-auto-move`.
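Idempotency on `(source, externalId)` means that replaying the same refresh must not duplicate comments. A minimal sketch of that property, assuming a hypothetical `ExternalComment` shape and an insert-only merge (whether the real sync also updates existing comment bodies is not stated here):

```typescript
// Hypothetical comment shape; only the two source values come from the text.
interface ExternalComment {
  source: "github-review" | "github-review-comment";
  externalId: string;
  body: string;
}

function syncReviewComments(
  existing: ExternalComment[],
  incoming: ExternalComment[],
): ExternalComment[] {
  const keyOf = (comment: ExternalComment): string => `${comment.source}:${comment.externalId}`;
  const seen = new Set(existing.map(keyOf));
  const merged = [...existing];
  for (const comment of incoming) {
    // Skip anything already stored under the same (source, externalId) pair.
    if (seen.has(keyOf(comment))) continue;
    seen.add(keyOf(comment));
    merged.push(comment);
  }
  return merged;
}
```

Keying on both fields matters because a review and a review comment may share a numeric ID while being different records.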
14) Key Design Decisions
- SQLite + WAL for local-first reliability
  - Chosen for simple deployment and strong transactional behavior
  - WAL mode enables concurrent readers/writers with low ops overhead
- Hybrid persistence (DB + filesystem blobs)
  - Structured metadata in SQLite, large text/artifacts in task directories
  - Keeps DB efficient while preserving inspectable task artifacts
- Git worktree isolation as core execution primitive
  - Prevents cross-task interference
  - Makes concurrent task execution safer
  - Enables deterministic cleanup/retry/recovery
- Agent-as-tool-caller pattern
  - Engine tools (`task_update`, `task_log`, `review_step`, `spawn_agent`, etc.) create explicit, auditable state transitions
  - Prompts are role-specific (`TRIAGE_SYSTEM_PROMPT`, `EXECUTOR_SYSTEM_PROMPT`, etc.)
- Separation of real-time channels by concern
  - SSE for broad board/missions/session state updates (`/api/events`)
  - Dedicated badge WebSocket (`/api/ws`) for lightweight PR/issue badge snapshots
- Multi-project control plane with runtime abstraction
  - `CentralCore` decouples registry/health/concurrency from per-project execution
  - `ProjectRuntime` interface allows multiple isolation strategies (in-process, child-process, remote node)
Source Map (quick navigation)
- Core exports: `packages/core/src/index.ts`
- Engine exports: `packages/engine/src/index.ts`
- Dashboard exports: `packages/dashboard/src/index.ts`
- CLI entry: `packages/cli/src/bin.ts`
- Pi extension: `packages/cli/src/extension.ts`
- Runtime abstraction: `packages/engine/src/project-runtime.ts`
- Multi-project orchestrator: `packages/engine/src/hybrid-executor.ts`
- Task routing resolver: `packages/engine/src/effective-node.ts`
- Node override guard: `packages/core/src/node-override-guard.ts`
PR-backed Review tab state and same-task revision flow
Pull-request auto-merge tasks persist structured review metadata on the task as reviewState.
- `reviewState.source`: `"pull-request"` or `"reviewer-agent"`
- `reviewState.summary`: review decision, reviewer states, required checks, and blocking reasons
- `reviewState.items`: normalized per-review/per-comment records keyed by stable GitHub IDs
- `reviewState.addressing`: per-item lifecycle records (`queued`, `in-progress`, `addressed`, `failed`) with timestamps and optional `stale`
API flow:
- `GET /api/tasks/:id/review` returns canonical `TaskReviewData` (`mode`, `refreshable`, `fetchedAt`, `summary`, `items[]`) for modal load.
- `POST /api/tasks/:id/review/refresh` returns the same `TaskReviewData` shape after re-fetching source data (GitHub PR mode or reviewer-agent direct mode).
- `POST /api/tasks/:id/review/address` records selected review items as queued, appends a deterministic `**PR Review Revision Request**` steering comment payload, clears transient failure/session state, and requeues the same task to `todo` for same-task revision.
UI contract boundary:
- `PrPanel` owns branch/PR lifecycle metadata and automation status.
- `TaskReviewTab` owns review decisions, detailed review items, selection, and addressing progress.
- `TaskComments` remains separate for general discussion.
Retry observability
Fusion derives a per-task retrySummary at read time by aggregating retry counters (stuck-kill, recovery, task_done, workflow-step, verification, post-review-fix, merge-conflict bounce, branch-conflict recovery, reviewer context retry, reviewer fallback retry). The engine emits a structured retry-burned log channel with { taskId, agentId, role, category, attempt, total, breakdown } so token-cost telemetry can correlate retry burn with spend.
Project settings expose per-category caps (maxBranchConflictRecoveries, maxReviewerContextRetries, maxReviewerFallbackRetries) plus a master cap (maxTotalRetriesBeforeFail). When a cap is exceeded, engine code throws RetryStormError; executor terminal failure handling serializes this into task.error so dashboard surfaces can render structured failure details.
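The cap behavior can be sketched as a summary plus a check that throws once a per-category or master cap is exceeded. Everything below is illustrative: the `RetryCaps` shape, the category keys, and this local `RetryStormError` class are stand-ins, not the engine's real definitions.

```typescript
// Local stand-in for the engine's RetryStormError; fields are assumptions.
class RetryStormError extends Error {
  category: string;
  attempt: number;
  cap: number;

  constructor(category: string, attempt: number, cap: number) {
    super(`retry cap exceeded for ${category}: ${attempt} > ${cap}`);
    this.name = "RetryStormError";
    this.category = category;
    this.attempt = attempt;
    this.cap = cap;
  }
}

// Hypothetical cap settings: optional per-category caps plus the master cap.
interface RetryCaps {
  perCategory: Record<string, number>;
  maxTotalRetriesBeforeFail: number;
}

// Aggregates per-category counters into a read-time summary.
function summarizeRetries(counters: Record<string, number>): { total: number; breakdown: Record<string, number> } {
  const total = Object.values(counters).reduce((sum, count) => sum + count, 0);
  return { total, breakdown: { ...counters } };
}

// Throws when any per-category cap or the master cap is exceeded.
function assertWithinCaps(counters: Record<string, number>, caps: RetryCaps): void {
  for (const [category, count] of Object.entries(counters)) {
    const cap = caps.perCategory[category];
    if (cap !== undefined && count > cap) throw new RetryStormError(category, count, cap);
  }
  const { total } = summarizeRetries(counters);
  if (total > caps.maxTotalRetriesBeforeFail) {
    throw new RetryStormError("total", total, caps.maxTotalRetriesBeforeFail);
  }
}
```

A caller in the failure path could catch the error and serialize its fields into `task.error`, which is the handoff the paragraph above describes.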
Lifecycle invariants
This section preserves the detailed lifecycle/self-healing contracts that were formerly in AGENTS.md.
- Orphan `fusion/*` branches: branches with zero unique commits vs `main` are pruned by `cleanupOrphanedBranches` (`branch:orphan-prune`). Branches with unique commits are not auto-rescued; operators inspect and clean them manually via standard git tooling (`git branch -D`, `git worktree remove`, etc.).
- Stale active branches: self-healing's `reclaim-stale-active-branches` stage prunes a `fusion/<task-id>` branch with zero unique commits when no usable worktree mapping exists, then clears `task.branch`/`task.worktree`/`task.baseCommitSha`. It must defer reclaim (emit `branch:stale-active-reclaim-deferred`) when the task worktree is in `activeSessionRegistry`, when `executionStartedAt` is within `STALE_ACTIVE_BRANCH_EXECUTION_GRACE_MS` (10 minutes), or when the mapped worktree has uncommitted changes.
- Worktree metadata reconcile ordering (FN-4962): `reconcile-task-worktree-metadata` must run before `reclaim-stale-active-branches`; stale `task.worktree` metadata is rebound to live `fusion/<task-id>` worktrees when present (`task:auto-recover-worktree-metadata-rebound`) or cleared (`task:auto-recover-worktree-metadata-cleared`) when absent.
- Completion fan-out is synchronous: `SelfHealingManager.reconcileCompletedTask()` runs on `in-review → done`. Downstream stale `blockedBy` links and residual `fusion/<task-id>` branch/worktree artifacts are reconciled immediately, not on a periodic sweep.
- In-review stall deadlock: identical stalls (same code + reason) repeated past `inReviewStallDeadlockThreshold` (default 3) auto-pause with `pausedReason: "in-review-stall-deadlock"` and `status: "failed"`. User-initiated retry paths (dashboard retry, `fn_task_retry`, and CLI `task retry`) clear that automatic deadlock pause so the retry can execute, but they never override explicit/manual pauses or unrelated automatic pause reasons.
- Restart recovery: `RestartRecoveryCoordinator` classifies interrupted `in-progress` runs. Unusable-worktree session-start failures (`missing`, `incomplete`, `unregistered git worktree`) are recoverable; retries are capped at `MAX_WORKTREE_SESSION_RETRIES=3` before escalating.
- Executor pre-session liveness gate (FN-4935/FN-6861): the gate now skips for fresh acquisitions (`acquisition.source === "fresh"`), emits structured `not_usable_task_worktree:<classification>` diagnostics (including canonicalized registered-path snapshots) and a `worktree:incomplete-detected` audit event with `source: "executor-liveness-gate"`, while preserving the existing `taskDoneRetryCount`/`MAX_TASK_DONE_REQUEUE_RETRIES` requeue contract. The project repo root is never a usable task worktree even though it is a legitimately registered Git worktree; `classifyTaskWorktree` returns `repo-root` for canonical root-equal paths, and resume acquisition treats that as self-healable stale metadata by clearing `task.worktree` and creating a fresh checkout under the configured worktrees directory. FN-5772 adds a bounded nested-root self-heal: when `task.worktree` points at a strict descendant of a registered worktree root inside the configured worktrees dir, executor re-anchors `task.worktree` to the git top-level, emits `worktree:reanchored` (`fromPath`, `toPath`, `source`), and proceeds; repo-root/outside-dir/unregistered top-level mismatches still fail. FN-4651 `worktreeSessionRetryCount` remains scoped to the in-review/session-start recovery path.
- Stale self-owned active-session reconcile on conflict cleanup (FN-4973): when executor worktree-conflict cleanup finds only a same-task stale `activeSessionRegistry` entry and no live in-memory `activeWorktrees` binding for that task/path, it must unregister the stale entry before `removeWorktree` (plus one-shot backstop reconcile on same-task `ActiveSessionWorktreeRemovalError` races). Foreign-task entries remain protected by FN-4811 and must never be reconciled by the requesting task.
- Same-task stale removal canonical helper (FN-5346): executor same-task cleanup paths now route pre-removal reconciliation through `reconcileSelfOwnedActiveSessionForRemoval` (via executor helper wiring), so stale self-owned `activeSessionRegistry` residues are cleared only when no live in-memory binding exists, while FN-4811 foreign-owner refusals and live-owner protections remain intact.
- Live worktree conflict fallback (FN-7385): when branch-conflict cleanup is refused because the conflicting path belongs to an active executor/workflow-step session (including same-task process-active sessions, foreign `activeWorktrees` owners, or DB-only live owners), executor acquisition must preserve that path and retry with a fresh generated worktree plus bounded sibling branch. Stale/non-live conflicts still use the existing cleanup/reclaim path, and unrecoverable non-active cleanup failures remain actionable errors.
- Task title/ID drift (FN-4898): active and archived title writes normalize foreign embedded `FN-NNN` tokens via `packages/core/src/task-title-id-drift.ts`. Empty placeholder groups (`()`, `[]`, `{}`) left behind by token stripping are also removed in both `normalizeTitleForTaskId` and `sanitizeTitle` (FN-4978). Lineage is preserved in `sourceParentTaskId` / description markers, not title embeds. FN-5077 extends drift normalization to reject dangling-connector fragments ("Close as duplicate of") so token-stripped residuals never persist as task titles.
- PR-conflict reclaim wiring (FN-4763): GitHub PR refresh now persists normalized `prInfo.mergeable` conflict state and, when conflicting, funnels tasks into self-healing's existing reclaim machinery (`reclaimPrConflictForTask` / `reclaim-pr-conflicts` stage) so branch-conflict handling stays centralized with existing `inspectBranchConflict` outcomes and unrecoverable pause semantics. PR refresh also captures `prInfo.conflictDiagnostics` (conflicting files + suggested local recovery commands) for dashboard surfacing.
- Worktrunk-managed lifecycles: when `worktrunk.enabled`, self-healing defers prune/idle/worktree-cap sweeps to the worktrunk backend; branch-level stale/conflict reclaim stays native. Orphan `fusion/*` branches are operator-managed via standard git tooling (no auto-rescue task filing).
- Post-finalize verification no-op (FN-4944): when auto-merge receives a delayed `VerificationError` after a task is already `done` with `mergeDetails.mergeConfirmed === true` (already-on-main fast-path), it must log one `[verification] ... no action` diagnostic and must not bounce the task back to `in-progress`/`merging-fix`. Defense-in-depth now re-checks the done+mergeConfirmed condition immediately before each verification-failure status write site, and emits `task:post-finalize-verification-no-op` database audit events with failure metadata for forensics.
- Transient auto-merge retry classification (FN-5697): non-conflict auto-merge errors now run through `isTransientError(...)` before terminal parking. Transient provider/network failures (for example `This operation was aborted`, `socket hang up`, and `server_error` payloads) are retried with bounded exponential backoff (5s/10s/20s) and `status=null` for both direct and pull-request merge strategies; once `MAX_AUTO_MERGE_TRANSIENT_RETRIES` is exhausted, tasks are parked `in-review`/`failed` with explicit transient-exhaustion logs.
- Merge-seam abort provenance (FN-6568/FN-6735): workflow graph merge-node failures must not be classified as pause/resume aborts merely because the merge seam hard-canceled an in-flight session. `TaskExecutor` tracks paused-abort provenance separately (`global-pause`, `merge-seam`, `hard-cancel`); genuine user/global pauses still preserve FN-6478/FN-5147 parking, while non-paused merge-seam graph failures (`merge`, `requestMerge`, built-in merge-region node ids, `merge-manual-hold`, and `merge-retry`) route back into the bounded auto-merge retry path instead of being parked `status:"failed"` with `mergeRetries=NULL`. Benign pause/resume aborts at these seams are also retryable when the task is already `in-review`, has no durable failure/status, has not confirmed a merge, remains auto-merge eligible (or is a shared-branch local integration), and has merge retries remaining. Conflict/contamination/foreign-work/retry-exhaustion values, `autoMerge:false` human-gated review tasks, pre-existing failures, global/user pauses, and post-confirmation partial landings remain terminal operator-action evidence.
- Worktree pool exclusivity (FN-4954): `WorktreePool.acquire(taskId)` / `release(path, taskId?)` track a `leased` map so every pooled path is either idle or leased, never both. Cross-task double-lease detection throws `PoolDoubleLeaseError` and emits `worktree:pool-double-lease-detected`; merger Step 8 now detaches HEAD and clears `task.worktree`/`task.branch` before releasing paths back to the pool.
- Stale registration recovery (FN-5056): `NativeWorktreeBackend.create` and `executor.tryCreateWorktree` detect `missing but already registered worktree` failures, run `git worktree prune` (plus `remove --force` / `add -f` fallbacks) before retrying, and emit `worktree:stale-registration-{detected,recovered,recovery-failed}` audit events.
- Raw worktree deletion must be paired with prune (FN-5058): any direct filesystem deletion of a worktree directory (`rm -rf` / `rmSync`) must be followed by best-effort `git worktree prune` via `pruneWorktreeAdminEntries` so `.git/worktrees/*` admin entries are not stranded in a missing-but-registered state (FN-5056 class).
- Meta-task auto-archive safety guards (FN-5064): `auto-archive-meta-resolved` / `auto-archive-meta-stalled` must skip archival (with `task:auto-archive-meta-*-skipped` audits) whenever guard checks detect substantive work signals such as unique branch commits, recent executor activity, pending `taskDoneRetryCount`, merge-in-progress state, or active worktree session. The corresponding `task:auto-archive-meta-resolved-skipped` and `task:auto-archive-meta-stalled-skipped` run-audit rows are transition-only per task+guard-reason signature: emit once on first skip, suppress repeated sweeps while the same reasons persist, clear when the skip no longer applies, and re-emit if a different reason later blocks archival.
- Scheduler fanout tiebreaker (FN-4969): within the same priority class, scheduler dispatch prefers runnable `todo` tasks with the highest active dependency-dependent fanout; `urgent` always outranks lower priorities regardless of fanout, and `overlapBlockedBy`/file-scope overlap blockers are excluded from unblock weight.
- Scheduler overlap priority/age guard (FN-5325): with `groupOverlappingFiles=true`, scheduler now defers a lower-priority (or younger same-priority) candidate when an overlapping queued todo task exists, preserving priority→age→task-id order for overlap serialization without preempting in-progress work. If the inversion is against an already-running lower-priority blocker, scheduler still defers the candidate; the per-pairing audit event was removed in FN-6174 due to zero consumers and table bloat.
- Empty-commit refusal + early empty-own-diff finalize (FN-5345/FN-5377): Fusion task worktrees install a `prepare-commit-msg` hook that refuses `git commit --allow-empty` and other zero-staged-diff commits, preventing verification-only tasks from manufacturing empty handoff commits that defeat the merger's no-op classifier. The hook allows legitimate empty-tree paths (amend, merge, squash, cherry-pick, revert, rebase). Amend detection tokenizes the parent process command line (`ps -o args=` with `/proc/$PPID/cmdline` fallback for Alpine/busybox) and stops at the first message-supplying flag (`-m`/`-F`/`--message`/`--file`) so a commit message containing the substring `--amend` cannot bypass the guard. In `aiMergeTask`, an early empty-own-diff fast-path runs BEFORE any reuse-handoff acquisition: when integration mode is `reuse-task-worktree`, the branch exists, `git rev-list --count <mergeTarget>..<branch>` is > 0, and `git diff --quiet <mergeBase>..<branch>` exits 0, the task auto-finalizes as no-op with `mergeDetails.noOpMerge: true` and emits `task:auto-recover-finalize-already-on-main` with `reason: "empty-own-diff-early-fast-path"`. The fast-path best-effort removes the stranded worktree (FN-4811 same-task/foreign-owner guard) and deletes the `fusion/<id>` branch so empty-own-diff residuals do not accumulate. This unsticks tasks where a stale empty handoff commit combined with drifted worktree↔branch mapping would otherwise wedge the handoff gate with `registered-branch-mismatch`. The explicit `cwd-integration-branch` mode is unchanged (`cwd-main` remains a deprecated alias normalized to it). `classifyOwnedLandedEvidence` also detects empty-own-diff (`aheadCount > 0`, zero net diff) and returns `proven-no-op` so downstream self-healing and post-handoff finalize paths benefit too. Additionally, merger's reuse-fallback path now consults `git worktree list --porcelain` before creating a new worktree: extant usable registrations of `fusion/<id>` are reused directly (rather than blindly `git worktree add -f` producing a duplicate registration), and stale registrations are pruned first.
  The direct-reuse shortcut is guarded by FN-4811 (refuses paths owned by a different task in `activeSessionRegistry`) and FN-4954 (skipped when `recycleWorktrees=true` with a pool attached, so `WorktreePool.acquire` lease bookkeeping stays consistent). Two audit subtypes, `merge:reuse-fallback-pruned-stale-registration` and `merge:reuse-fallback-reused-existing-registration`, replace the prior overloading of `merge:reuse-fallback-new-worktree` for these cases.
- Verified no-op/duplicate executor completion (FN-6275/FN-7488): explicit `fn_task_done` may complete with zero branch commits only when the summary starts with a recognized sentinel (`PREMISE STALE:`, `NO-OP:`, `NOOP:`, `DUPLICATE: FN-NNNN ...`, or `REDUNDANT:`), the task already carries a no-commit contract, or the PROMPT declares a source-free gitignored task-artifact delivery. The source-free path is intentionally narrow: File Scope must be populated and limited to board/task artifacts such as `.fusion/tasks/...`, task documents/logs, or attachments; the prompt must forbid force-adding ignored `.fusion/` artifacts and fabricating empty commits or equivalently state that source-free/gitignored task artifacts are the only deliverables; and any tracked source/docs/config/test/changeset scope keeps the `no_commits` refusal active (even if `.fusion/` artifacts are also listed). These exemptions only relax the `no_commits` invariant; `wrong_toplevel`, `wrong_branch`, pending-step/review refusals, and scope-leak guards still run. Accepted sentinel completions persist `noCommitsExpected: true`, write task-log audit details with marker kind/reason/raw summary/run/agent IDs, and add a task timeline activity so the no-code terminal path remains explainable. Prompt-derived source-free completions log `prompt-derived source-free task-artifact contract` for operator audit. Ordinary zero-commit implementation completions without one of these contracts are still refused.
- In-review branch-binding self-heal (FN-5083/FN-6695): `reconcile-in-review-branch-rebind` runs after `reconcile-task-worktree-metadata` and before `reclaim-stale-active-branches`. It restores `task.branch` (and clears `task.worktree` for fresh acquisition) for `in-review` tasks when exactly one case-insensitive `fusion/<id>` candidate branch has unique commits versus the integration base. Ambiguous candidates emit `task:auto-rebind-skipped` (`reason: "ambiguous-candidates"`) and are never auto-resolved. Unsafe metadata repair is also skipped with `task:auto-rebind-skipped`: `userPaused` preserves authoritative user intent, and `checkedOutBy` preserves live agent checkout ownership. Branch construction across executor/worktree-pool/worktree-acquisition/merger/self-healing canonicalizes to lowercase via `canonicalFusionBranchName`; `fn_task_done` wrong-branch checks now auto-canonicalize case-only mismatches and emit `branch:auto-canonicalize-case`.
- In-review is terminal-until-merged under `autoMerge: false` (FN-5147): when a project sets `settings.autoMerge: false`, `in-review` is the intended resting state until a human merges the PR. No lifecycle-mutating self-healing sweep (`reclaimSelfOwnedBranchConflicts`, `recoverGhostReviewTasks`, `recoverStaleIncompleteReviewTasks`, `recoverInterruptedMergingTasks`, `recoverStuckMergeDeadlocks`, `recoverMissingWorktreeReviewFailures`, `recoverPartialProgressNoTaskDoneFailures`, `recoverCompletionHandoffLimbo`, `recoverPostDoneNonContinuableWedge`, `recoverMergeableReviewTasks`, `recoverMergedReviewTasks`, `recoverAlreadyMergedReviewTasks`, `recoverOrphanOnlyScopeViolations`, `recoverForeignOnlyContaminatedInReviewTasks`, `recoverReviewTasksWithFailedPreMergeSteps`, `finalizeNoOpReviewTasks`, `surfaceInReviewStalls`, `surfaceInReviewStalled`) may move the task out of `in-review`, mark it `paused`/`failed`, or re-enqueue it for execution. Explicit per-task overrides are distinguished by `task.autoMergeProvenance: "user"`; ambiguous legacy rows stamped `autoMerge: true` by the pre-FN-6245 review-entry path are marked `"legacy-stamp"` once and surfaced in run-audit/logs, but are only cleared by the operator-driven `reconcileLegacyAutoMergeStamps({ apply: true })` action. Scoped FN-5819 exception: shared-group members (`branchContext.assignmentMode === "shared"`) are still allowed through the member→`branch_groups.branchName` integration step while `autoMerge` is off; this is a soft pre-integration only and does not permit shared-branch → default-branch promotion. FN-7182 applies the same human-gated treatment to an open `PrInfo.manual` PR created or linked from the dashboard Create PR action: automatic merge queues and self-healing stand down until the PR is closed/merged or handled manually, while pipeline-created PRs without `manual` remain auto-merge eligible. RECONCILE-ONLY sweeps (branch rebind, blocker fan-out, stale-status clears, contamination metadata cleanup, attribution restore, PR refresh, misclassified-failure error clearing) continue to run.
- Auto-merge integration-root default (FN-5279): direct auto-merge now defaults mergeIntegrationWorktree to reuse-task-worktree; merger must pass the reuse handoff gates or emit merge:reuse-handoff-refused and leave the task in in-review without silently falling back to cwd-integration-branch (cwd-main remains a deprecated alias normalized to that mode).
- Orphaned execution sweep is observation-only (FN-5337): recoverOrphanedExecutions only annotates stale in-progress candidates with task:orphan-detected-no-action and [orphan-detected] ... no action (operator-decides) logs. It must never move in-progress/in-review backward to todo or mutate lease/worktree metadata. Proof-based backward recovery remains exclusively in recoverInProgressLimbo (FN-5219), RestartRecoveryCoordinator, recoverMissingWorktreeReviewFailures, and explicit executor/merger failure paths. Reintroducing lifecycle mutation here requires hard git/session proof gating plus CEO+CTO+PM sign-off.
- Self-owned reclaim resume-limbo escalation (FN-5704): reclaimSelfOwnedBranchConflicts tracks resumeLimboCount, resumeLimboTipSha, and resumeLimboStepSignature for in-progress reclaim/unpause loops. If reclaim finds no progress (same tip, same step-status signature, and no active-session signal) for MAX_NO_PROGRESS_RESUME_ATTEMPTS consecutive sweeps, self-healing escalates by moving the task to todo with preserveWorktree: true, preserveProgress: true, and preserveResumeState: true instead of endlessly re-arming resume. Escalation emits task:resume-limbo-escalated run-audit metadata (frozenTipSha, idleMs, resumeAttemptCount, currentStep) and resets the limbo counter.
- Merge-request shadow contract (FN-5741 Phase 1):
mergeRequestContractShadowEnabled defaults OFF. OFF means no writes to merge_requests or completion_handoff_markers and the legacy lifecycle remains authoritative. ON enables write-only shadow persistence: executor/self-healing append task:completion-handoff-accepted marker+record writes only after successful legacy handoffToReview, and merger mirrors merge:request-enqueued plus queued → running → succeeded (or manual-required for autoMerge:false) transitions. Phase 1 never reads these shadow records for column movement, dependency checks, lease arbitration, merge dequeue, or FN-5479/FN-5704 limbo recovery decisions.
- Dual-observe parity seam (FN-5742 Phase 2): with the same flag ON, legacy remains authoritative while shadow reads compute/emit parity telemetry only. Scheduler emits merge:dependency-parity-diff when in-review|done|archived dependency satisfaction diverges from completion-handoff marker satisfaction, and merge:lease-parity-diff when legacy in-review overlap leasing diverges from shadow lease decomposition. Merger emits merge:request-dequeued-shadow (agree/disagree metadata) by comparing legacy dequeue selection to shadow merge-request selection while explicitly skipping manual-required rows. Phase 3 dequeue cutover is gated on sustained parity (low disagreement rate) from these additive events; no lifecycle authority changes in Phase 2.
- Authoritative cutover seam (FN-5743 Phase 3): with the flag ON, merge-request records and completion_handoff_accepted markers become authoritative enforcement signals for dequeue/retry ownership and dependency/lease gates. Accepted handoffs stop stamping in-review executor overlap leases, transient merge retries stay in merge-request state (running → retrying → queued, terminal exhausted|succeeded|cancelled) without todo rebounds, and user hard-cancel (in-review → todo) deterministically cancels pending merge-request records while keeping FN-5147/FN-5704 behavior unchanged.
- No-progress churn terminalization (FN-5168):
StuckTaskDetector now tracks ignored fn_task_update rebuffs via recordIgnoredStepUpdate(taskId) and, after one loop/compact-and-resume recovery has already fired in the same execute() lifecycle, escalates ignoredStepUpdateCount >= 25 to the terminal reason no-progress-churn. SelfHealingManager.checkStuckBudget() maps that reason directly to STUCK_NO_PROGRESS_CHURN, emits task:stuck-no-progress-churn-terminalized with { taskId, ignoredStepUpdateCount, stuckKillStreak, lastReason }, and parks the task in in-review without consuming the normal stuck-kill budget. Under FN-5147 autoMerge: false, that failed in-review task remains terminal-until-merged just like STUCK_LOOP_EXHAUSTED; the new class adds an earlier bounded exit, not a re-execution path.
- Verification-active stuck-loop suppression (FN-6598): fn_run_verification registers a per-task active verification window with StuckTaskDetector. Within the command's own timeout budget, subprocess output/heartbeats are treated as forward progress and suppress only loop/no-progress-churn; the deadline restores normal classification if the command or end callback wedges, and inactivity remains governed by heartbeat flow.
- Todo↔in-progress flapping convergence (FN-5941): live backward-recovery paths now share a getFalsePositiveRequeueSignal(...) guard that suppresses in-progress → todo recovery when any hard liveness proof exists (getExecutingTaskIds, recent active-heartbeat run, checked-out lease, live worktree+branch binding, or recent executionStartedAt inside the relevant grace window). Suppressed candidates emit observation-only task:*no-action audits instead of silently mutating lifecycle state. Scheduler adds a short recentEngineTodoRequeues settle window so engine-sourced requeues cannot be re-dispatched immediately on the same task:moved → todo tick. The durable convergence backstop is the dispatch-oscillation breaker: scheduler reuses task.dispatchStormCount + task.lastDispatchAt as a sliding-window counter (dispatchOscillationThreshold, dispatchOscillationWindowMs) and, when the threshold is exceeded, leaves the task parked in todo, sets paused: true with pausedReason: "dispatch-oscillation", records task:dispatch-oscillation-terminalized, and requires an operator unpause or forward move to reset the counter.
- Landed-files attribution (FN-5103): Rebase-strategy
mergeDetails.landedFiles/filesChanged/insertions/deletions are captured from task-attributable commits only via filterFilesToOwnTaskCommits (subject-prefix + trailer + bracket-prefix evidence), tagged landedFilesAttributionRestricted: true. Zero own commits → landedFiles: [] and noOpVerifiedShortCircuit: true. FN-5304 guard: when <rebaseBaseSha>..HEAD reports zero own commits, merger must also validate the source fusion/<id> tip; if that source tip still has attributable own commits relative to rebaseBaseSha, throw SilentNoOpAttributionMismatchError, refuse writing mergeConfirmed: true, park the task in in-review with status: "failed", and emit merge:no-op-attribution-mismatch. If source ref is unavailable, skip with diagnostic + merge:no-op-attribution-mismatch-skipped (reason: "source-ref-unavailable"). Attribution-helper failures fall back to the unrestricted rebaseBaseSha..sha walk and set landedFilesCaptureFallback: 'attribution-failed'. Self-healing recoverDoneTaskMergeMetadata skips reconcile when landedFilesAttributionRestricted or noOpVerifiedShortCircuit is set so the narrower set is not overwritten with the full range. Squash-strategy capture is unchanged.
- Soft-delete scheduler invalidation (FN-5137): task:deleted events must invalidate AutoClaimSnapshotManager and clear scheduler bookkeeping (pausedTaskIds, failedTaskIds, wasNodeDispatchValidationBlocked, wasNodeBlocked); executor.execute()/resumeOrphaned()/resumeTaskForAgent() refuse any task with deletedAt set.
- Soft-delete in-flight abort (FN-5142): task:deleted must immediately abort/dispose active executor work (activeSessions, activeStepExecutors, activeWorkflowStepSessions, reviewer subagents), interrupt active merge state (mergeAbortController, activeMergeSession, activeMergeTaskId, mergeActive, mergeQueue, pausedReviewTaskIds), and abort triage specify/subagent sessions for that id. Handlers are per-task and idempotent.
- Soft-delete audit + column reconcile (FN-5175): TaskStore.deleteTask records a runAuditEvents row (mutationType: "task:deleted", domain: "database") inside the same transaction that sets deletedAt, and sets "column" = 'archived' on the row. Callers without a heartbeat run context (fn task delete, pi extension, dashboard delete route) pass an auditContext with agentId: "system" and a synthetic runId. The watcher cross-instance emit path does NOT re-record the audit event. The row stays in tasks (not archivedTasks); archiveTask is unchanged.
- Soft-delete resurrection guard (FN-5208):
TaskStore.readTaskJson() must never fall back to .fusion/tasks/<id>/task.json when the DB row exists with deletedAt set; it throws TaskDeletedError instead. atomicCreateTaskJson/atomicWriteTaskJson/atomicWriteTaskJsonWithAudit refuse to upsert a task whose row is currently soft-deleted (unless the in-memory task carries deletedAt itself, for soft-delete maintenance paths), emit a [soft-delete-resurrection-blocked] log line, and record a task:resurrection-blocked run-audit event. Stale in-flight planner/triage writes for a soft-deleted ID surface TaskDeletedError and abort cleanly without emitting task:created.
- Exhausted in-review visibility surfaces (FN-5513/FN-6569): retry-exhausted merge failures (column='in-review', status='failed', mergeRetries >= maxAutoMergeRetries, default 3) can remain soft-deleted for lifecycle safety, but are now intentionally discoverable through opt-in read paths: TaskStore.listExhaustedInReviewTasks({ includeDeleted }), GET /api/tasks/exhausted-in-review, GET /api/tasks/:id?includeDeleted=true, CLI fn_task_show soft-delete fallback marker, CLI fn_task_list({ includeDeleted: true }), and the dashboard ReliabilityView "Exhausted in-review (hidden blockers)" panel. This complements FN-5488/FN-5496 downstream blocker healing by surfacing the upstream blocker without mutating lifecycle state.
- Soft-delete stream verification gate (FN-5153): docs/soft-delete-verification-matrix.md is the authoritative checklist for the FN-5105 → FN-5143 soft-delete stream. Every scenario × layer cell must be GREEN (or have a linked follow-up FN) before the stream is closed; packages/engine/src/__tests__/reliability-interactions/soft-delete-end-to-end.test.ts is the cross-layer regression backstop.
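As an illustration of the FN-5704 resume-limbo escalation described in the list above, the following is a minimal sketch of the no-progress counter, not the engine's implementation. The field names resumeLimboCount, resumeLimboTipSha, resumeLimboStepSignature and the constant name MAX_NO_PROGRESS_RESUME_ATTEMPTS come from this document; the constant's value, the types, and the functions evaluateResumeLimbo and runSweeps are hypothetical. The sketch also assumes the first sweep only records a baseline, so escalation fires on the sweep after the configured number of unchanged observations.

```typescript
// Assumed value for illustration only; the real value is not stated in this document.
const MAX_NO_PROGRESS_RESUME_ATTEMPTS = 3;

interface LimboState {
  resumeLimboCount: number;
  resumeLimboTipSha: string | null;
  resumeLimboStepSignature: string | null;
}

// Hypothetical shape of what one self-healing sweep observes for a task.
interface SweepObservation {
  tipSha: string;
  stepSignature: string;
  hasActiveSession: boolean;
}

interface SweepDecision {
  action: "re-arm-resume" | "escalate-to-todo";
  next: LimboState;
}

const EMPTY_LIMBO: LimboState = {
  resumeLimboCount: 0,
  resumeLimboTipSha: null,
  resumeLimboStepSignature: null,
};

function evaluateResumeLimbo(prev: LimboState, obs: SweepObservation): SweepDecision {
  // "No progress" means: same tip, same step-status signature, no active-session signal.
  const unchanged =
    prev.resumeLimboTipSha === obs.tipSha &&
    prev.resumeLimboStepSignature === obs.stepSignature;
  const noProgress = unchanged && !obs.hasActiveSession;
  // Any progress signal resets the consecutive counter.
  const count = noProgress ? prev.resumeLimboCount + 1 : 0;

  if (count >= MAX_NO_PROGRESS_RESUME_ATTEMPTS) {
    // Escalation resets the limbo counter; the caller would move the task to todo
    // while preserving worktree, progress, and resume state.
    return { action: "escalate-to-todo", next: EMPTY_LIMBO };
  }
  return {
    action: "re-arm-resume",
    next: {
      resumeLimboCount: count,
      resumeLimboTipSha: obs.tipSha,
      resumeLimboStepSignature: obs.stepSignature,
    },
  };
}

// Replays a sequence of sweeps and returns the action taken on each one.
function runSweeps(observations: SweepObservation[]): string[] {
  let state = EMPTY_LIMBO;
  const actions: string[] = [];
  for (const obs of observations) {
    const decision = evaluateResumeLimbo(state, obs);
    actions.push(decision.action);
    state = decision.next;
  }
  return actions;
}
```

Keeping the decision a pure function of the previous state and the current observation makes the bounded-retry property easy to test in isolation from git and session plumbing.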
Reliability interaction backstops
Reliability-layer changes are in scope. Interaction regression backstops live in packages/engine/src/__tests__/reliability-interactions/ — any task that adds or changes a reliability layer must add/update interaction tests there covering each plausible pair with existing layers (merge path, workflow/pre-merge, self-healing, scheduler/watchdog/restart recovery, governance gates).
- FN-4935 backstop: packages/engine/src/__tests__/reliability-interactions/executor-liveness-gate.test.ts guards fresh-acquisition skip behavior, structured liveness classifications, and executor-gate audit/requeue outcomes.
- FN-4887 backstop: packages/engine/src/__tests__/reliability-interactions/foreign-only-contamination-recovery.real-git.test.ts covers composition between bootstrap-misbinding, contamination dispatcher retry, misbound-in-review ordering, and FN-4811 active-session safeguards.
- FN-5039 backstop: packages/engine/src/__tests__/reliability-interactions/worktree-contamination-attribution.real-git.test.ts guards captureModifiedFiles trailer attribution filtering and task:worktree-contamination-detected audit fan-out across rebase contamination, clean, untrailered, and fallback paths.
- FN-4976 backstop: packages/engine/src/__tests__/reliability-interactions/stale-self-owned-session-registry.test.ts guards cleanupConflictingWorktree clearing stale same-task activeSessionRegistry entries before the FN-4811 foreign-owner check, while preserving refusal behavior for foreign owners and live same-task bindings.
- FN-5346 backstop: packages/engine/src/__tests__/reliability-interactions/post-completion-stale-self-owned-binding.test.ts covers post-completion and dep-abort same-task stale-binding cleanup, restart-residue recovery, same-task live-binding refusal, foreign-owner FN-4811 refusal, idempotent repeat sweeps, and FN-4954 lease-map composition.
- FN-4999 backstop: packages/engine/src/__tests__/reliability-interactions/completion-handoff-limbo.test.ts covers the recoverCompletionHandoffLimbo sweep stage (grace window, active-task skip, merge-blocker guard, capped retries, and audit fan-out).
- FN-5889 backstop: packages/engine/src/__tests__/reliability-interactions/post-done-continuation-no-wedge.test.ts covers the post-done step-session non-continuable suppression path plus recoverPostDoneNonContinuableWedge, including bounded self-heal ordering before stall surfacing.
- FN-5345/FN-5377 backstops: packages/engine/src/__tests__/reliability-interactions/merge-reuse-task-worktree.test.ts (FN-5345: empty-own-diff branch auto-finalizes via early fast-path without acquiring reuse handoff) covers the early no-op fast-path under drifted worktree mapping; packages/engine/src/__tests__/real-git/prepare-commit-msg-empty-guard.real-git.test.ts covers the empty-commit refusal hook (refuses --allow-empty, allows amend + real commits, no-op outside fusion worktrees).
- FN-5083 backstop: packages/engine/src/__tests__/reliability-interactions/in-review-branch-rebind.test.ts covers in-review branch rebind composition with metadata-cleared state, idempotent re-sweeps, and ambiguous-candidate skip behavior.
- FN-5093 backstop: packages/engine/src/__tests__/reliability-interactions/in-review-stalled-detector.test.ts covers composition between quiet-window in-review stalled surfacing and adjacent reason-driven/paused/ghost-recovery/auto-merge gating paths.
- FN-5103 backstop: packages/engine/src/__tests__/reliability-interactions/landed-files-attribution.test.ts covers attribution-restricted rebase landed-files capture, verified-short-circuit zero-own-commit capture, and attribution-failure fallback composition.
- FN-5147 backstop: packages/engine/src/__tests__/reliability-interactions/in-review-automerge-off.test.ts covers autoMerge: false + long-quiet in-review + maintenance/startup sweep cycles, asserting no column move / no paused / no status mutation / no requeue, plus explicit regression guards for surfaceInReviewStalls and surfaceInReviewStalled.
- FN-5168/FN-6598 backstop: packages/engine/src/__tests__/reliability-interactions/non-progress-churn.test.ts covers loop→compact recovery followed by ignored-step-update churn escalation, terminal beforeRequeue(false) behavior, audit/log payloads, FN-5147 autoMerge-off composition, and verification-active suppression so healthy fn_run_verification runs do not reach onLoopDetected / stuck-budget handling while the no-verification control still trips.
- FN-5219 backstop: packages/engine/src/__tests__/reliability-interactions/in-progress-limbo-recovery.test.ts covers recoverInProgressLimbo composition with recoverOrphanedExecutions (no double-recovery), reconcile-task-worktree-metadata (live rebindable worktree wins), recoverMissingWorktreeReviewFailures (in-review vs in-progress disjoint), and executor task-id claim skip, plus an explicit FN-5149 reproduction case.
- FN-5704 backstop: packages/engine/src/__tests__/reliability-interactions/reclaim-self-owned-resume-limbo-escalation.test.ts covers bounded no-progress reclaim/resume detection, preserve-work escalation to todo, task:resume-limbo-escalated audit metadata, progress-signal reset behavior, and user-paused/autoMerge-off non-escalation guards.
- FN-5715 backstop: packages/engine/src/__tests__/reliability-interactions/mission-validation-trigger-gap.test.ts locks the mission-validation trigger invariant so done mission-linked tasks still start validation when the mission loop was stopped, startup recovery replays done implementing features with unpassed assertions, and recovery remains idempotent for already-passed features.
- FN-5782 backstop: packages/engine/src/__tests__/reliability-interactions/branch-group-merge-routing.test.ts guards branch-group merge routing so shared members land on branch_groups.branchName, grouped multi-member merges converge on the same integration branch, ungrouped/per-task-derived tasks stay on direct default-branch merge flow, and routed merges emit merge:branch-group-routed audit metadata.
- FN-5788 backstop: packages/engine/src/__tests__/reliability-interactions/branch-group-promotion-gate.test.ts guards the promotion eligibility hook/audit seam so member landings emit merge:branch-group-promotion-gated with deterministic reason metadata (eligible, group-automerge-disabled, settings-automerge-disabled, global-pause, engine-paused) while group branches remain open and do not auto-promote to the default branch.
- FN-5830 backstop: packages/engine/src/__tests__/reliability-interactions/branch-group-promotion.test.ts guards branch-group completion-gate + promotion lifecycle so promotion happens exactly once after all members land, re-calls are idempotent, and gated paths emit merge:branch-group-promotion-gated without default-branch promotion.
- FN-5819/FN-5846 backstop: packages/engine/src/__tests__/reliability-interactions/shared-group-member-integration.test.ts and shared-branch-group-lifecycle.test.ts guard the scoped autoMerge-off exception and deterministic finalize path so shared members integrate into the single group branch, produce mergeTargetSource: "branch-group-integration" / mergeTargetBranch, do not land on main, and are not moved backward by self-healing maintenance.
- FN-5901 backstop: packages/engine/src/__tests__/reliability-interactions/mission-validator-run-reaper.test.ts guards stale mission-validator-run recovery across manual and automatic trigger types, verifies mission:validator-run-reaped audit metadata, ensures archived/complete parents keep their terminal feature state untouched, and proves reaped active features resume validation instead of staying wedged behind abandoned running rows.
- FN-5738 backstop (superseded by FN-5902): packages/engine/src/__tests__/reliability-interactions/mission-validation-trigger-gap.test.ts no longer permits zero-assertion auto-pass. Current coverage proves legacy zero-link features lazily restore a managed assertion, route through validator runs, and do not emit validation_auto_passed_no_assertions during recovery replays.
- FN-5741 backstop: packages/engine/src/__tests__/reliability-interactions/merge-request-shadow-handoff.test.ts guards Phase-1 merge-request contract shadow writes: flag OFF is a no-op, flag ON writes marker/record strictly after legacy handoff, and autoMerge:false remains manual-required without shadow running transitions.
- FN-5742 backstop: packages/engine/src/__tests__/reliability-interactions/dual-observe-merge-seam.test.ts guards Phase-2 dual-observe invariants: legacy dependency satisfaction remains authoritative while parity diffs emit, and shadow dequeue selection never advances manual-required rows.
- FN-5743 backstop: packages/engine/src/__tests__/reliability-interactions/merge-request-cancel-on-hard-cancel.test.ts and packages/core/src/__tests__/merge-request-record.test.ts guard Phase-3 cutover invariants: transient merge retries mutate merge-request state (no column rebound), user hard-cancel after accepted handoff cancels pending merge requests, and non-user rebounds preserve legacy fail-soft semantics.
- FN-5770 backstop: packages/engine/src/__tests__/reliability-interactions/workflow-interpreter-cutover.test.ts guards the interpreter-authoritative lifecycle seam. The cutover remains opt-in (workflowInterpreterAuthoritative default OFF), readiness-gated by clean populated parity summary evidence while retired workflowInterpreterDualObserve settings stay inert, reversible by flipping one flag back OFF, and must preserve file-scope, squash-overlap, autoMerge:false, hard-cancel, and self-healing interaction invariants.
- FN-5337 backstop: packages/engine/src/__tests__/reliability-interactions/orphan-detected-no-requeue.test.ts locks observation-only orphan detection across FN-5279 repro metadata desync, worktree-present and worktree-missing candidates, FN-5219 ordering, FN-5147 in-review isolation, FN-5083 branch-cleared composition, lease-manager non-invocation, and per-sweep idempotent audit emission.
- FN-5256 backstop: packages/engine/src/__tests__/reliability-interactions/dependency-cycle-reconcile.test.ts covers persisted dependency-cycle detection via reconcileDependencyCycles, bounded umbrella-back-edge auto-repair, ambiguous-cycle observe-only behavior, composition ordering with reconcileSelfDefeatingDependencies, and the post-sweep write-time guard invariant. Core write-boundary regressions (FN-5240/5241/5242 signature, indirect cycle, umbrella back-edge rejection) live in packages/core/src/__tests__/store-dependency-cycle.test.ts.
- FN-5223 backstop: packages/engine/src/__tests__/reliability-interactions/engine-active-since-floor.test.ts covers engine-activation floor + grace composition across startup, pause/unpause, global-pause gating, and StuckTaskDetector lifecycle interactions.
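The Phase-3 merge-request invariants guarded by the FN-5743 backstop above (transient retries stay in merge-request state, user hard-cancel cancels pending requests) can be pictured as a small transition table. This is a sketch under stated assumptions rather than the engine's record model: the state names come from this document, while the type, the table, and the helper functions are hypothetical. It also assumes that a running request may move to exhausted when retries run out, and that every non-terminal state counts as "pending" for hard-cancel. The manual-required state used under autoMerge:false is deliberately left out because its outgoing transitions are not described here.

```typescript
type MergeRequestState =
  | "queued"
  | "running"
  | "retrying"
  | "succeeded"
  | "exhausted"
  | "cancelled";

// Terminal states as listed in the Phase-3 description.
const TERMINAL_STATES: readonly MergeRequestState[] = ["succeeded", "exhausted", "cancelled"];

// Hypothetical transition table. running → retrying → queued is from the text;
// the exhausted and cancelled edges are assumptions made for illustration.
const ALLOWED_TRANSITIONS: Record<MergeRequestState, readonly MergeRequestState[]> = {
  queued: ["running", "cancelled"],
  running: ["succeeded", "retrying", "exhausted", "cancelled"],
  retrying: ["queued", "cancelled"],
  succeeded: [],
  exhausted: [],
  cancelled: [],
};

function isTerminal(state: MergeRequestState): boolean {
  return TERMINAL_STATES.indexOf(state) !== -1;
}

function canTransition(from: MergeRequestState, to: MergeRequestState): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// A user hard-cancel (in-review → todo) cancels any request that has not finished;
// already-terminal requests are left untouched.
function onUserHardCancel(state: MergeRequestState): MergeRequestState {
  return isTerminal(state) ? state : "cancelled";
}
```

Because a retry only ever loops back to queued inside this table, no transition here implies a task column change, which matches the "no todo rebounds" invariant described for Phase 3.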
The auto-recovery dispatcher at packages/engine/src/auto-recovery.ts (FN-4533) composes on top of existing layers (FN-4500 fast-path, FN-4508 deterministic branch-conflict, FN-4499 bootstrap-misbinding, FN-4428 contamination, mergeAuditAutoRecovery Stages 1–5, self-healing) to handle six residual classes: file-scope violation at squash, branch misbinding / ghost worktree, verification-fix scope leak, contamination, branch-conflict-unrecoverable residuals, and room-post/message-send failures. Invocation is additive — no existing layer's behavior changes.
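One way to read "additive" in the paragraph above is that existing layers get the first chance at a failure and the dispatcher only classifies what they decline. The sketch below shows that reading; it is not the API of packages/engine/src/auto-recovery.ts. The class slugs, the RecoveryFailure and ExistingLayer types, and dispatchRecovery are all hypothetical names chosen for illustration, and the ordering (existing layers first) is an assumption.

```typescript
// Hypothetical slugs for the six residual classes named in the text.
type ResidualClass =
  | "file-scope-violation-at-squash"
  | "branch-misbinding-or-ghost-worktree"
  | "verification-fix-scope-leak"
  | "contamination"
  | "branch-conflict-unrecoverable"
  | "room-post-or-message-send-failure";

const RESIDUAL_CLASSES: readonly string[] = [
  "file-scope-violation-at-squash",
  "branch-misbinding-or-ghost-worktree",
  "verification-fix-scope-leak",
  "contamination",
  "branch-conflict-unrecoverable",
  "room-post-or-message-send-failure",
];

interface RecoveryFailure {
  taskId: string;
  kind: string;
}

// An existing layer reports whether it handled the failure; its behavior is untouched.
interface ExistingLayer {
  name: string;
  tryHandle: (failure: RecoveryFailure) => boolean;
}

function isResidualClass(kind: string): kind is ResidualClass {
  return RESIDUAL_CLASSES.indexOf(kind) !== -1;
}

// Returns a label describing which path took the failure.
function dispatchRecovery(failure: RecoveryFailure, layers: readonly ExistingLayer[]): string {
  for (const layer of layers) {
    // Existing layers run first and unchanged (assumed ordering).
    if (layer.tryHandle(failure)) {
      return "layer:" + layer.name;
    }
  }
  // Only failures no layer claimed reach the residual classification.
  return isResidualClass(failure.kind) ? "residual:" + failure.kind : "unhandled";
}
```

In this shape, removing the dispatcher leaves every existing layer's outcome exactly as it was, which is the property the paragraph above describes.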