Usage

April 25, 2026

This guide describes how to use pi-computer-use tools from Pi once the extension is installed and macOS permissions are granted.

Core Workflow

Call screenshot first when you already know the target. It selects the controlled window and returns the latest semantic state.

screenshot()
screenshot({ app: "Safari" })
screenshot({ app: "TextEdit", windowTitle: "Untitled" })

When the app or window is ambiguous, discover targets first:

list_apps()
list_windows({ app: "Safari" })
screenshot({ window: "@w1" })

Action tools operate on the current controlled window by default. To switch windows, call screenshot again with an app/window title or a window ref from list_windows. You can also pass window to action tools when you want to make the intended target explicit:

click({ window: "@w1", ref: "@e1" })
keypress({ window: "@w1", keys: ["Enter"] })

Tool results include:

  • target: app, bundle ID, pid, window title, window ID, and optional windowRef.
  • capture: screenshot dimensions, scale factor, stateId, and coordinate space.
  • axTargets: semantic targets such as @e1.
  • execution: strategy, variant, AX/fallback details, and strict-mode compatibility.
  • Optional image content when semantic coverage is weak or fallback recovery is useful.
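The fields above could be modeled roughly as the following TypeScript types. This is a sketch, not the extension's actual schema: only `target`, `capture`, `axTargets`, `execution`, `windowRef`, and `stateId` are names used in this guide, and every other property name and type here is an assumption made for illustration.

```typescript
// Hypothetical shape of a tool result. Property names other than target,
// capture, axTargets, execution, windowRef, and stateId are guesses.
interface ToolResultSketch {
  target: {
    app: string;
    bundleId: string;
    pid: number;
    windowTitle: string;
    windowId: number;
    windowRef?: string; // optional, e.g. "@w1"
  };
  capture: {
    width: number;
    height: number;
    scaleFactor: number;
    stateId: string;
    coordinateSpace: string;
  };
  axTargets: { ref: string; role: string; label: string }[];
  execution?: { strategy: string; variant: string };
}

// Lists the refs a follow-up action could use. An empty list means the
// state has no semantic targets and coordinates are the only option.
function availableRefs(result: ToolResultSketch): string[] {
  return result.axTargets.map((t) => t.ref);
}
```

A caller could check `availableRefs(result).length` before deciding whether a coordinate-based action is needed at all.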

AX Refs First

When the latest state includes AX refs, prefer them over coordinates.

click({ ref: "@e1" })
set_text({ ref: "@e2", text: "hello" })
scroll({ ref: "@e3", scrollY: 600 })

Refs are intentionally short and local to the latest semantic state. If a ref is stale, the bridge tries to reacquire a matching target by role, label, capabilities, and position.
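To make the reacquisition idea concrete, here is a minimal sketch of matching a stale target against the targets in a newer state. The bridge's real matching logic is not documented here, so the `AxNode` type, the `reacquire` function, and the scoring weights are all hypothetical.

```typescript
// Hypothetical description of an AX target; field names are assumptions.
interface AxNode {
  ref: string;
  role: string;
  label: string;
  capabilities: string[];
  x: number;
  y: number;
}

// Picks the candidate that most resembles the stale node. The role must
// match exactly; a matching label and shared capabilities raise the score,
// and distance from the old position applies a small penalty.
function reacquire(stale: AxNode, candidates: AxNode[]): AxNode | undefined {
  let best: AxNode | undefined;
  let bestScore = -Infinity;
  for (const c of candidates) {
    if (c.role !== stale.role) continue;
    const labelScore = c.label === stale.label ? 2 : 0;
    const shared = c.capabilities.filter(
      (cap) => stale.capabilities.indexOf(cap) !== -1
    ).length;
    const dx = c.x - stale.x;
    const dy = c.y - stale.y;
    const distance = Math.sqrt(dx * dx + dy * dy);
    const score = labelScore + shared - distance / 1000;
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}
```

When no candidate shares the stale node's role, the function returns `undefined`, which corresponds to the case where a fresh screenshot is the safer next step.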

Use coordinates only when no matching AX target is available:

click({ x: 320, y: 180, stateId: "..." })

Coordinates are window-relative screenshot pixels from the latest screenshot. Pass stateId from the latest result when you want stale-state validation.
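The ref-first rule can be sketched as a small helper that builds click arguments. The helper name `buildClickArgs`, the `AxTarget` type, and the label-based lookup are assumptions for illustration; only the `ref`, `x`, `y`, and `stateId` argument names come from the examples above.

```typescript
// Hypothetical AX target record; the guide only guarantees refs like "@e1".
type AxTarget = { ref: string; role: string; label: string };

// Either a semantic click by ref or a coordinate click tied to a stateId.
type ClickArgs =
  | { ref: string }
  | { x: number; y: number; stateId: string };

// Returns ref-based arguments when a target with the wanted label exists,
// otherwise falls back to coordinates plus stateId for stale-state checks.
function buildClickArgs(
  axTargets: AxTarget[],
  label: string,
  fallback: { x: number; y: number },
  stateId: string
): ClickArgs {
  for (const t of axTargets) {
    if (t.label === label) return { ref: t.ref };
  }
  return { x: fallback.x, y: fallback.y, stateId };
}
```

The result would then be passed to `click`. Including `stateId` only in the coordinate branch reflects that coordinates are meaningful only relative to one specific screenshot.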

Use image: "always" when visual verification matters, image: "never" to suppress image attachments, or omit it for the default auto behavior.

Writes are serialized per target window internally, so multiple agents can safely queue actions against the same @w ref. Scroll failures include the best available AX reason and recovery guidance when fallback coordinates are required.

Tool Reference

| Tool | Purpose | Prefer |
| --- | --- | --- |
| list_apps | Discover running apps | Before targeting when app names are unknown or ambiguous |
| list_windows | Discover controllable windows, ids, titles, and geometry | Before targeting multi-window apps |
| screenshot | Select or refresh the controlled window | window refs or app/window filters when switching target |
| click | Activate by AX ref or coordinate | ref |
| double_click | Open/select items that require double-click | ref when available |
| move_mouse | Trigger hover behavior | Coordinates |
| drag | Drag path or AX adjust target | ref plus path for adjustable controls |
| scroll | Scroll by AX ref or coordinate | ref |
| keypress | Enter, Escape, Tab, arrows, deletion, shortcuts | Semantic keys when possible |
| type_text | Insert text at current cursor/selection | Use after focusing field |
| set_text | Replace AX text value | ref with canSetValue |
| wait | Pause and refresh state | Polling/loading states |
| arrange_window | Move/resize a window deterministically | Presets such as center_large, left_half, right_half |
| navigate_browser | Navigate a browser window directly | Prefer over address-bar keystrokes when you know the URL |
| computer_actions | Batch obvious actions | Use only when intermediate inspection is unnecessary |

Text Input

Use set_text when replacement semantics are correct:

set_text({ ref: "@e2", text: "new value" })

Use click plus type_text when insertion semantics matter:

click({ ref: "@e2" })
type_text({ text: " inserted text" })

Use keypress for non-text keys:

keypress({ keys: ["Enter"] })
keypress({ keys: ["Command+L"] })
keypress({ keys: ["Tab", "Enter"] })

For shortcut sequences, write each chord as a single string such as Command+L, one string per chord. Split a chord into an array like ["Command", "L"] only when the call sends that one chord and nothing else.
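One way to avoid mixing the two forms is to always build chord strings in code. The helpers below are hypothetical conveniences, not part of pi-computer-use; they only produce the `keys` argument shape shown in the examples above.

```typescript
// Joins modifiers and a key into one chord string, e.g. "Command+L".
function chord(...parts: string[]): string {
  return parts.join("+");
}

// Builds keypress arguments in which every array element is one complete
// chord or single key, pressed in the order given.
function keypressArgs(...chords: string[]): { keys: string[] } {
  return { keys: chords };
}

// Example: focus the address field, then confirm.
const focusThenEnter = keypressArgs(chord("Command", "L"), "Enter");
```

Here `focusThenEnter.keys` holds two entries, the chord string and `"Enter"`, so the sequence never contains a bare modifier as its own element.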

Browser Workflows

For browser work, prefer a dedicated browser window rather than the user's active tab. The extension tries to open an isolated browser window when safe and appropriate.

Common address-field workflow:

computer_actions({
  stateId: "...",
  actions: [
    { type: "keypress", keys: ["Command+L"] },
    { type: "type_text", text: "https://example.com" },
    { type: "keypress", keys: ["Enter"] }
  ]
})

For Safari and Chromium-family browsers, this can use an AX-first path for address replacement and navigation.

If browser_use is disabled, browser screenshots and actions are refused. See configuration.

Batching

computer_actions accepts one to twenty actions and returns one post-action state update.

Good fit:

computer_actions({
  stateId: "...",
  actions: [
    { type: "click", ref: "@e1" },
    { type: "set_text", ref: "@e2", text: "hello" },
    { type: "keypress", keys: ["Enter"] }
  ]
})

Do not batch when the next action depends on seeing the intermediate result.

Each batched action includes execution metadata, including whether it used the stealth or default variant.
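Because a batch must contain between one and twenty actions, a caller may want to check the size before sending it. The following is a minimal sketch: the `BatchAction` union covers only the action types shown in this guide, and `checkBatch` is a hypothetical helper rather than something the extension provides.

```typescript
// Action shapes taken from the examples in this guide; other action types
// may exist but are not modeled here.
type BatchAction =
  | { type: "click"; ref: string }
  | { type: "set_text"; ref: string; text: string }
  | { type: "type_text"; text: string }
  | { type: "keypress"; keys: string[] };

const MAX_BATCH_ACTIONS = 20;

// Returns null when the batch size is acceptable, otherwise a message
// describing which limit was violated.
function checkBatch(actions: BatchAction[]): string | null {
  if (actions.length < 1) return "batch needs at least one action";
  if (actions.length > MAX_BATCH_ACTIONS) return "batch exceeds twenty actions";
  return null;
}
```

A size check like this says nothing about whether batching is appropriate; actions whose next step depends on an intermediate result should still be sent individually.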

Strict AX Mode

Strict AX mode restricts every action to Accessibility (AX) paths that are safe to run in the background.

Allowed when AX support is available:

  • AX press/focus
  • AX value replacement
  • AX scroll
  • AX increment/decrement adjustment
  • Semantic key actions such as confirm/cancel/press

Blocked:

  • Raw pointer events
  • Raw keyboard events
  • Foreground focus fallbacks
  • Cursor takeover
  • Browser window bootstrap that requires non-AX automation
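The two lists above can be read as an allow-list check. The sketch below encodes them with made-up strategy identifiers; the real values reported in `execution` metadata are not documented in this guide, so treat every string here as a placeholder.

```typescript
// Placeholder identifiers for the paths listed above. These names are
// assumptions and do not come from the extension.
type Strategy =
  | "ax_press"
  | "ax_set_value"
  | "ax_scroll"
  | "ax_adjust"
  | "semantic_key"
  | "raw_pointer"
  | "raw_keyboard"
  | "foreground_focus"
  | "cursor_takeover"
  | "non_ax_browser_bootstrap";

// Paths described as allowed when AX support is available.
const STRICT_ALLOWED: Strategy[] = [
  "ax_press",
  "ax_set_value",
  "ax_scroll",
  "ax_adjust",
  "semantic_key"
];

// True when the strategy is on the allow-list; everything else is treated
// as blocked under strict AX mode.
function allowedInStrictMode(strategy: Strategy): boolean {
  return STRICT_ALLOWED.indexOf(strategy) !== -1;
}
```

Using an allow-list rather than a block-list means any path not explicitly named is rejected, which matches the intent of a strict mode.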

Enable strict AX mode with config or environment variables. See configuration.