DesktopCtl

May 18, 2026

Local CLI that lets AI agents observe and control your computer through the screen, mouse, and keyboard. Bring your own AI: any model works, even one without vision.

Runs fully local. No screenshots sent to the cloud.

Learn more at https://desktopctl.com

https://github.com/user-attachments/assets/4321b23e-6706-4792-a911-89e13766ebc0

Why DesktopCtl

Local-first runtime with no cloud dependency
Bring your own AI: works with any desktop AI agent
GPU-accelerated text recognition and computer vision
Selector-first automation (--text, --token) with coordinate fallback
Agent-friendly explicit waits and post-action verification
Stable JSON contracts for agent integrations
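
The "explicit waits and post-action verification" idea can be approximated from the agent side with a generic polling helper. This is only a sketch: wait_until is a hypothetical name, it is not DesktopCtl's built-in wait mechanism (which this README does not show), and it assumes whole-second timeouts and intervals.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper, not part of DesktopCtl.
# Re-runs COMMAND until it exits 0 or about TIMEOUT_SECONDS have elapsed.
# Both durations must be whole seconds because of the shell arithmetic below.
wait_until() (
  timeout="$1"
  interval="$2"
  shift 2
  elapsed=0
  while :; do
    # Success: the condition holds, stop polling.
    if "$@"; then
      return 0
    fi
    # Give up once the time budget is spent.
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
)

# Usage sketch. The grep check assumes the tokenized output contains the
# typed text verbatim, which this README does not confirm:
#   screen_has() { desktopctl screen tokenize --active-window "$1" | grep -q "$2"; }
#   wait_until 10 1 screen_has "$win_id" "Shopping list"
```

The command is always tried at least once, so a timeout of 0 means "check once, do not wait".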

Architecture

DesktopCtl is split into two binaries:

DesktopCtl.app (desktopctld): daemon that owns perception, state, execution, and verification
desktopctl: stateless CLI surface for actions and queries over local IPC

Repository layout:

src/desktop/core - shared protocol and types
src/desktop/daemon - daemon runtime
src/desktop/cli - CLI client

Current Scope

macOS-first
OCR-first perception pipeline
Tokenized screen output for agent grounding
Deterministic CLI primitives for click/type/wait flows

Prerequisites

macOS (current support target)
Rust toolchain (cargo)
just command runner
Accessibility permission for DesktopCtl.app
Screen Recording permission for DesktopCtl.app
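
A quick way to check the command-line prerequisites before building is to look each tool up on PATH. The sketch below uses a hypothetical helper named missing_tools. The tool names passed to it are taken from this README: cargo and just from the list above, make and jq from the Quick Start commands. The Accessibility and Screen Recording permissions cannot be verified this way.

```shell
# missing_tools TOOL...
# Hypothetical helper: prints every TOOL that is not found on PATH,
# one name per line. Prints nothing when all tools are present.
missing_tools() (
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
)

# Usage sketch:
#   missing="$(missing_tools cargo just make jq)"
#   [ -z "$missing" ] || { echo "missing tools: $missing" >&2; exit 1; }
```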

Quick Start

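# Build and install DesktopCtl
make install

# Open Notes and read the window id from the JSON response (requires jq)
raw="$(desktopctl app open Notes --json)"
win_id="$(printf '%s' "$raw" | jq -r '.result.window_id // empty')"
# Press Cmd+F in that window, then type the text
desktopctl keyboard press cmd+f --active-window "$win_id" --no-observe
desktopctl keyboard type "Shopping list" --active-window "$win_id" --no-observe
# Tokenize the window's contents
desktopctl screen tokenize --active-window "$win_id"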
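
In the commands above, the jq filter falls back to an empty string when .result.window_id is missing, so the later commands would run with an empty window id. A minimal guard might look like the sketch below. open_app_window is a hypothetical wrapper, and it assumes desktopctl exits non-zero when the app cannot be opened, which this README does not state.

```shell
# open_app_window APP_NAME
# Hypothetical wrapper around `desktopctl app open ... --json`.
# Prints the window id on success; prints a message to stderr and
# returns 1 when the command fails or the response has no window id.
open_app_window() (
  # Assumption: desktopctl signals failure through its exit status.
  raw="$(desktopctl app open "$1" --json)" || {
    echo "desktopctl app open $1 failed" >&2
    return 1
  }
  # Same filter as the Quick Start: empty output when the field is absent.
  win_id="$(printf '%s' "$raw" | jq -r '.result.window_id // empty')"
  if [ -z "$win_id" ]; then
    echo "no window_id in response for $1" >&2
    return 1
  fi
  printf '%s\n' "$win_id"
)

# Usage sketch:
#   win_id="$(open_app_window Notes)" || exit 1
#   desktopctl screen tokenize --active-window "$win_id"
```

Failing early keeps an agent from sending keystrokes to whatever window happens to be focused.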

Status / Roadmap

Status: active development. The macOS-first CLI and daemon workflows are already usable.

Roadmap:

Reliability for text/token-driven actions and verification loops
Stable machine-readable error codes
Upcoming CLI: doctor, richer window/app introspection, and --explain failure output
Better local computer vision and semantic UI tokenization
Multi-platform support