Features

June 15, 2026

Part of the AgentBox docs. Start at CLAUDE.md. For cloud (--provider daytona) parity + the few cloud-only knobs, see cloud-providers.md §5 and the running status in daytona-backlog.md.

What works today

Full local-Docker lifecycle, also parity-tested on cloud via --provider daytona (see cloud-providers.md):

  • agentbox create — builds the image on first run (or resolves a checkpoint image when --snapshot <ref> is given), detects git repos (root + 1st-level subdirs), collects host-side carry-over (git stash create + untracked ls-files), spins up the container, then seeds /workspace via either seedWorkspace (in-container git worktree add against the bind-mounted .git/ + stash/untracked replay) or seedWorkspaceFromDir (tar-pipe from the host workspace / APFS clone for the no-git case). Checkpoint restore skips both: the image already has /workspace. Mounts the agentbox-claude-config named volume at /home/vscode/.claude and rsyncs the host's ~/.claude into it (additive, host-authoritative). Bind-mounts each main repo's .git/ at its identical absolute host path inside the container so worktree pointer files resolve symmetrically on both sides. --with-env (also on agentbox claude; config key box.withEnv) copies the host's DEFAULT_ENV_PATTERNS files (.env*, .envrc, .dev.vars, secrets.toml, local.settings.json, appsettings.*.json, agentbox.yaml) into /workspace after seeding. This is the host→box reverse of agentbox download env; gitignored files are otherwise excluded by the worktree carry-over's git ls-files --others --exclude-standard. The copy is one-shot at create time, lands in the container's writable layer (persists across stop/start), is best-effort (warn-not-throw), and is recorded as BoxRecord.withEnv and surfaced in agentbox status --inspect. Implemented by copyHostEnvFilesToBox / buildHostEnvFindArgs in packages/sandbox-docker/src/host-export.ts (host find . -print0 | tar → docker exec -i --user 1000:1000 tar -x).
  • carry: in agentbox.yaml — declarative host→box file copy that bypasses .gitignore. Each entry maps a host path (/abs, ~/..., or ./relative-to-project-root) to an explicit in-box destination (/abs or ~/...; ~/ expands to /home/vscode) and accepts mode: (octal), user: (uid), exclude: (tar globs / bare dir names), and optional: true. When copying a directory, heavy regenerable dirs (.git, node_modules, bin, obj, packages, dist, .next, target; DEFAULT_CP_EXCLUDES in apps/cli/src/lib/dir-breakdown.ts) are dropped by default, and exclude: is additive. The resolver enforces no-.. traversal, denies /proc|/sys|/dev|/etc/passwd|/etc/shadow, caps the per-entry size after excludes at box.cpMaxBytes (default 100 MiB, the same limit agentbox cp uses; carry callers pass the effective value into resolveCarry), and flags symlinks whose target leaves $HOME and the project root. On agentbox create / claude / codex / opencode, the host CLI prompts ONCE (@clack/prompts.select → yes / skip just for this box / cancel create), listing every src→dest with size + mode + symlink warnings, then threads the approved set into provider.create as req.carry. Auto-approve with --carry-yes (or AGENTBOX_CARRY_YES=1 for CI); skip with --carry skip (or AGENTBOX_CARRY=skip). agentbox fork is the exception: it sends the carry: block by default (it forwards --carry-yes), because the host is trusted and the box is the untrusted side, so a host→box copy is safe; opt out with agentbox fork --carry skip. -y / --yes does NOT auto-approve carry: non-TTY use of -y with non-empty entries fails loudly and asks for the explicit env var (auditable in CI). The -i (queued background) path runs the same gate on the host at submit time (runQueuedCarryGate), serializes the approved ResolvedCarryEntry[] onto the queue job (QueueJobCreateOpts.carry), and the host-side worker applies them at box-create time, so --carry-yes / --carry skip work identically for -i.
Docker injects via copyCarryPathsToBox (docker cp for files, host-tar + docker exec tar -x for dirs); cloud (Hetzner + Daytona) injects via uploadCarryPaths (host-tar + backend.uploadFile + backend.exec(tar -x)), per-entry isolated. Files land owned by vscode:vscode (uid 1000) when under /home/vscode; an audit summary ({count, entries: [{src, dest, bytes}]}) is recorded on BoxRecord.carry. Use case: develop AgentBox itself inside an AgentBox — carry ~/.agentbox/secrets.env + ~/.agentbox/claude-credentials.json so the in-box agentbox CLI is fully authenticated. Schema: packages/ctl/src/carry.ts. Resolver / prompt / gate: apps/cli/src/lib/carry-resolve.ts, apps/cli/src/carry-prompt.ts, apps/cli/src/lib/carry-gate.ts. Copiers: packages/sandbox-docker/src/host-export.ts:copyCarryPathsToBox, packages/sandbox-cloud/src/carry.ts:uploadCarryPaths. A file carry entry may also set replaceEnvs: true (substitute {{AGENTBOX_*}} whitelist placeholders), replace: (inline {from,to,regex?} rules), and/or rules: (named refs into the top-level replacements: block) — the file is rendered host-side to a temp by renderCarryEntries (@agentbox/sandbox-core/src/carry-render.ts) before the copy (the host source is never modified; the box name is known by then). Named refs are expanded in resolveCarry; replace options are file-only (a dir entry errors).
  • run_once tasks + the replacement engine — a task may declare run_once: true (the supervisor skips it while a SHA-256 of the resolved command matches a marker at <stateDir>/tasks/<name>, default stateDir=/var/lib/agentbox — box rootfs, captured by checkpoints, off /workspace) or run_once: { check: <cmd> } (run the probe first; exit 0 = skip, no marker — for state outside the checkpoint like a containerized DB). run-task --force bypasses both. Handled in TaskRunner.launch (packages/ctl/src/supervisor.ts). The shared, pure replacement engine lives in @agentbox/core (replace.ts: applyReplacements = {{AGENTBOX_*}} whitelist substitution + ordered rules; re-exported by @agentbox/ctl which adds the yaml/fs loaders — kept in core to avoid the sandbox-core → ctl → relay → sandbox-core build cycle). Surfaced three ways: the top-level replacements: block (named rule-sets, parsed in config.ts), agentbox-ctl render <src> [--out|--in-place] [--env] [--rules|--rule|--rule-regex] (in-box declarative sed, packages/ctl/src/commands/render.ts), and the carry replaceEnvs/replace/rules above. render also expands {{AGENTBOX_AUTO_SECRET}} (fresh 32-byte base64url per render) / {{AGENTBOX_AUTO_SECRET:<name>}} (generated once, persisted at <stateDir>/secrets/<name>, reused) — packages/ctl/src/secret.ts, replacing openssl rand in env tasks.
  • Declarative docker image: services — a service may set image: (a bare ref string, or a mapping { name, ports, env, args, container_name }) instead of command:; parseService (packages/ctl/src/config.ts) synthesizes the docker start-or-run shell (the proven examples/express-ready / optima pattern), reused by name across restarts (env baked into -e, no auto-rm). command/image are mutually exclusive; the runner/ready_when/restart/expose machinery is unchanged. The shared writable-state-dir resolver (packages/ctl/src/state-dir.ts) backs both run_once markers and persisted secrets.
  • agentbox attach [box] — agent-agnostic reattach. Probes the box for live tmux sessions (claude / codex / opencode) via one tmux list-sessions -F '#{session_name} #{session_created}' round-trip, picks the running session, and dispatches to the same wrapped-pty path the per-agent attaches use. When 2+ are live: TTY prompts via Clack select; non-TTY falls back to the most recently started. When 0 are live it prints no agent session running in <name> and exits 1 — never auto-starts (use agentbox claude / codex / opencode for that). Flags: --session-name <name>, --attach-in <mode>, -i, --inline. Works for docker and every cloud provider — the cloud branch reuses cloudAgentAttach; the pre-probe is what keeps cloud from auto-creating the tmux session via provider.buildAttach. apps/cli/src/commands/attach.ts.
  • agentbox claude [-- <claude-args>...] — does everything create does, then starts Claude Code in a detached tmux session inside the box and attaches the user's terminal to it. Ctrl+a d detaches; the claude process keeps running. Reattach with agentbox attach <box> (or agentbox claude attach <box> for the per-agent variant, which also auto-starts a fresh session if none is running). Forwards ANTHROPIC_API_KEY / CLAUDE_CODE_OAUTH_TOKEN / CLAUDE_EFFORT / ANTHROPIC_MODEL from host env when set. --isolate-claude-config opts into a per-box agentbox-claude-config-<id> volume.
  • agentbox claude start [box] [-- <claude-args>...] — start a Claude session in an existing box (vs agentbox claude, which creates one). Resolves [box] via the usual auto-pick / index / name / id-prefix chain. Auto-unpauses/starts the container if needed (mirrors shell/code). Re-syncs ~/.claude into the box volume by default (skip with --no-sync-config for speed). Re-runs rebuildPluginNativeDeps (idempotent, gated by a per-plugin marker). If a tmux session with the configured name already exists, it just attaches; otherwise it starts a fresh one. Args after -- are forwarded to claude only when starting a fresh session.
  • agentbox codex [-- <codex-args>...] — the Codex parity of agentbox claude: does everything create does, then launches OpenAI Codex in a detachable tmux session (codex session name; --session-name / config codex.sessionName override). Forwards OPENAI_API_KEY from host env. --isolate-codex-config opts into a per-box agentbox-codex-config-<id> volume. Subcommands mirror claude: agentbox codex start [box] [-- <codex-args>...] (start a session in an existing box, auto-unpause/start, --no-sync-config to skip the ~/.codex resync), agentbox codex attach [box] (attach/start without resyncing), agentbox codex login [-- <args>] (sign in via a throwaway container; defaults to codex login --device-auth, the headless device-code flow; pass -- --api-key for the API-key path). Skips the claude-only steps (setup wizard, plugin rebuild). apps/cli/src/commands/codex.ts. Codex is baked into the base image, but a box built from a checkpoint captured before Codex support (or from an older base image) won't have the binary; ensureCodexInstalled (codex.ts) detects that and runs npm install -g @openai/codex into the box's writable layer at create/start time (mirrors --with-playwright; a fast command -v no-op when codex is already present).
  • agentbox opencode [-- <opencode-args>...] — the OpenCode parity of agentbox codex: creates a box and launches OpenCode (sst/opencode, the multi-provider terminal agent) in a detachable tmux session (opencode session; --session-name / config opencode.sessionName). --isolate-opencode-config opts into a per-box volume. Subcommands mirror codex: agentbox opencode start [box] (auto-unpause/start, --no-sync-config to skip the config resync), agentbox opencode attach [box], agentbox opencode login [-- <args>] (runs the interactive opencode auth login provider picker in a throwaway container; -- --provider anthropic to skip selection). ensureOpencodeInstalled handles stale-checkpoint boxes (mirrors ensureCodexInstalled). apps/cli/src/commands/opencode.ts.
  • agentbox list / inspect — read from ~/.agentbox/state.json and cross-reference docker inspect for live state (running / paused / stopped / missing). inspect surfaces the claude tmux session status (running / not running) when the container is up.
  • agentbox pause / unpause → docker pause / docker unpause.
  • agentbox stop / start → docker stop / docker start. /workspace lives in the container's writable layer, so it survives stop/start without any mount work; start only re-launches agentbox-ctl + dockerd + Xvnc (they die with the container). It first revalidates that each registered worktree's main .git/ still exists on the host (the bind-mount is baked in at create time; if the host dir was deleted while the box was stopped, the restart would otherwise fail with an opaque mount error).
  • agentbox status / logs — proxy into the in-box agentbox-ctl via docker exec (see in-box-supervisor.md). status renders TASKS + SERVICES sections (the service row has a BLOCKED ON column for waiting services) and reports the claude tmux session state (via the claude-session wire op).
  • agentbox wait <box> — blocks until all autostart tasks + services are ready. Thin wrapper over the daemon's wait-ready op; useful for scripted readiness gates.
  • agentbox code <box> — opens VS Code or Cursor Desktop against the box via the Dev Containers extension. Both IDEs are supported transparently: by default the CLI prefers code and falls back to cursor if code isn't in PATH; pass --ide vscode / --ide cursor to force a flavor. Auto-unpauses paused boxes and starts stopped ones (re-launching the ctl/dockerd/Xvnc daemons). Waits for wait-ready (default 120s) unless --no-wait, then writes /workspace/.vscode/tasks.json (sentinel-protected; --regen-tasks to overwrite a user-owned file) so the IDE auto-opens terminal panels tailing each service's log. The launcher uses <cli> --folder-uri "vscode-remote://attached-container+<hex>/workspace" (Cursor inherits the vscode-remote:// scheme as a VS Code fork); --print returns the URI instead. Each box mounts both server volume sets so either IDE can attach to any box without recreating: per-box agentbox-vscode-server-<id> + agentbox-cursor-server-<id> (server binary + TS cache, ~70MB downloaded on first attach), plus the shared agentbox-vscode-extensions + agentbox-cursor-extensions volumes for downloaded extensions across all boxes.
  • agentbox shell <box> [-- <cmd>...] — interactive shell convenience: drops you into bash -l as vscode in /workspace. Auto-unpauses paused boxes and starts stopped ones (same recovery as agentbox code). By default the interactive shell runs inside a detachable tmux session: Ctrl+a d detaches without killing it (the same chord agentbox claude uses, via the shared buildTmuxSessionArgs in packages/sandbox-docker/src/claude.ts), and agentbox shell attach <box> reattaches. --no-tmux (config key shell.tmux: false) runs a plain docker exec shell with no session, so closing the terminal kills it. One-shot -- <cmd> and any non-interactive/piped use are never tmux-wrapped; they stay on a plain docker exec (bash -l -c '<args joined>') so stdout stays machine-readable. --user <name> overrides the in-container user (the tmux server is per-user); --no-login invokes bash without -l. Forwards the host TERM so truecolor/hyperlinks survive. The wrapped-pty footer shows the Control+a d: detach hint only for the tmux-backed shell: runWrappedAttach's detachable flag (default mode === 'claude') drives the chord + footer, decoupled from the mode label.
  • N shells per box — a box can hold multiple shell sessions, not just one (Claude stays exactly one agent per box). tmux is the single source of truth — there is no BoxRecord/state.json shell registry; everything is derived live from docker exec <box> tmux list-sessions. Shell sessions are tmux sessions named with a reserved shell prefix: shell (the default, agentbox shell), shell-2/shell-3/… (auto-numbered, agentbox shell --new), and shell-<label> (named, agentbox shell -n <label>). isShellSessionName/shellSessionName/shellLabel/allocateShellSessionName/parseShellSessionList/listShellSessions/killShellSession (all in packages/sandbox-docker/src/shell-session.ts) implement the prefix scheme — a pure string rule that never collides with the claude agent session or a *-dash grouped sibling. agentbox shell ls <box> lists a box's shells (label / attached / created); agentbox shell kill <box> -n <label> (or --all) drops one. agentbox shell / agentbox shell attach take -n <label> to target a specific shell; the detach notice gains a -n <label> suffix for non-default shells. agentbox list shows a SHELLS count column (listBoxes() runs listShellSessions per running box — [] for paused/stopped, no docker exec reach); agentbox status / agentbox status --inspect add a shells summary. The dashboard's [s] "open shell" action start-or-attaches the tracked tmux shell (so it's the same session the CLI sees) — a full per-box shell picker in the TUI is not built yet.
  • agentbox download [box] — box→host download of /workspace (gitignore-aware; --with-env / download env for gitignored env files). agentbox download config [box]: box→host download of just agentbox.yaml (gitignore-bypassing, fixed pattern ['agentbox.yaml']; a thin specialization of the download env flow via pullToHost with respectGitignore: false; for syncing back an in-box-edited/regenerated config; apps/cli/src/commands/download-config.ts). agentbox download claude [box]: additive box→host download of Claude extensions (skills/, plugins/, agents/, commands/ under ~/.claude). Reads the claude-config volume via a throwaway helper container (the exact reverse of ensureClaudeVolume's forward sync), so the box need not be running. Never overwrites an item already on the host; excludes agentbox-* skills (the box-only agentbox-setup, image-seeded into the claude-config volume by seedSetupSkillIntoVolume and never written to the host's ~/.claude); the two plugin registry JSONs are merged host-side (only box-only keys added, the forward /home/vscode/.claude/plugins/ rewrite reversed). With the shared agentbox-claude-config volume, the download aggregates extensions installed in any box (warned). packages/sandbox-docker/src/claude.ts pullClaudeExtras + pure helpers in claude-pull.ts (library symbols keep the pull name; only the CLI verb was renamed). agentbox download codex [box]: additive box→host download of Codex config (config.toml, auth.json, prompts/) from the codex-config volume into ~/.codex; never overwrites an existing host file (pullCodexConfig in codex.ts). agentbox download opencode [box]: additive box→host download of OpenCode auth.json (→ ~/.local/share/opencode) + opencode.json/agents/commands/themes (→ ~/.config/opencode) from the opencode-config volume (pullOpencodeConfig in opencode.ts).
  • agentbox cp <src> [dst] — file/dir copy between host and box, modeled on docker cp. Direction is picked by which arg carries a <name>: prefix (a : not preceded by /); both-sides or neither-side → usage error. The Docker provider streams a tar pipe (host tar -cf - → docker exec -i tar -xf -, packages/sandbox-docker/src/box-cp.ts) rather than buffering, because execa's default 100 MB maxBuffer otherwise silently fails large copies with "tar: write error". On upload, a follow-up docker exec --user root chown -R 1000:1000 <dst> re-owns the landed files to vscode (best-effort). Heavy regenerable dirs (DEFAULT_CP_EXCLUDES: .git, node_modules, bin, obj, packages, dist, .next, target) are dropped by default → tar --exclude; keep them with --no-default-excludes, add more with repeatable --exclude=<glob|name>. Uploads whose post-exclude size exceeds box.cpMaxBytes (default 100 MiB) are blocked with a du-style tree of the biggest remaining folders/subfolders + a strategy (split per-folder / --exclude / --yes); --yes overrides (apps/cli/src/lib/dir-breakdown.ts:measureCopy). Auto-unpauses/starts the box if needed. The host path is optional on download (defaults to cwd) and required on upload. The box ref accepts the usual id/prefix/name/container/project-index. Cloud providers (packages/sandbox-cloud/src/cloud-cp.ts) thread the same excludes through their tar -czf stage. The in-box agentbox-ctl cp toHost|fromHost forwards --exclude / --no-default-excludes / --yes to the host CLI through the relay (cp.* RPC). Implementation: apps/cli/src/commands/cp.ts, packages/sandbox-docker/src/box-cp.ts, packages/sandbox-cloud/src/cloud-cp.ts.
  • agentbox destroy — force-removes container + volumes + snapshot dir + per-box run dir (~/.agentbox/boxes/<id>/) + state record (prompts unless -y). Per-box claude-config, agentbox-vscode-server-<id>, and agentbox-cursor-server-<id> volumes are removed too; the shared agentbox-claude-config + agentbox-vscode-extensions + agentbox-cursor-extensions volumes are preserved. Each registered git worktree is removed from the host via git worktree remove --force before the box dir is wiped.
  • agentbox prune — drops missing state records; --all also reaps orphan agentbox-* containers / volumes / snapshot dirs (allowlists all three shared volumes — agentbox-claude-config, agentbox-vscode-extensions, agentbox-cursor-extensions — and per-box variants of either kind that belong to a surviving box). --all also sweeps the legacy agentbox-relay container + agentbox/relay:dev image + agentbox-net network left over from the old in-docker relay design.
  • agentbox relay status|stop|start|restart — manage the host relay process. status reads the pidfile + GETs /healthz and renders running / not-responding (zombie) / not-running; --json dumps the RelayStatus shape. stop / start / restart wrap stopRelay() / ensureRelay() (both idempotent — the same helpers self-update uses). Backed by getRelayStatus() in packages/sandbox-docker/src/relay.ts (re-exported from @agentbox/sandbox-docker); CLI in apps/cli/src/commands/relay.ts.
  • agentbox prepare — one-stop "set up the base image / show what's prepared" command. agentbox prepare (no args) prints a status table across all providers: docker's agentbox/box:dev image + the three shared docker volumes (agentbox-claude-config, agentbox-codex-config, agentbox-opencode-config), plus all daytona agentbox* snapshots (state / size / age / (pinned in project) marker) and agentbox* volumes — including the legacy per-agent ones that the daytona path no longer uses (visible reminder to clean them up via the Daytona dashboard). agentbox prepare --provider docker pre-builds the local Dockerfile.box image (idempotent). agentbox prepare --provider daytona [--name X] [-y] builds a layered Image.fromDockerfile().addLocalFile().runCommands() for the three host agent static tarballs and registers it as a named org-scoped snapshot via the documented daytona.snapshot.create({ name, image }) API (daytona.io/docs/en/snapshots), then pins box.image: <name> into the project config — subsequent agentbox create --provider daytona boots in seconds with the agent static config (plugins/skills/marketplaces/settings) already in place. Replaces the old agentbox daytona publish-snapshot (which used _experimental_createSnapshot, broken upstream).
  • agentbox self-update — self-updates the CLI, then refreshes the local runtime. Detects how it was launched (detectExecutionMethod in apps/cli/src/exec-method.ts): npm → npm install -g @madarco/agentbox@latest, pnpm → pnpm add -g @madarco/agentbox@latest, npx/direct (dev clone) → skip the package update with a note. Then it best-effort runs docker image rm -f agentbox/box:dev (the image rebuilds lazily on the next create/claude via ensureImage()) and reloads the relay via stopRelay(). The relay is respawned in-process (ensureRelay()) only when no self-update ran: after a real self-update this process is the stale build, so it just stops the relay and the next box command brings up the new one. -y skips the prompt, --dry-run previews, --skip-self does only the image+relay refresh. stopRelay lives in packages/sandbox-docker/src/relay.ts (reusing the existing pidfile helpers); removeImage lives in docker.ts.
  • Notion integration (relay-gated, host CLI): agentbox-ctl integration notion <op> and the in-box ntn / notion shims proxy a small allowlist of ops (whoami, read-only api passthrough, page.create, page.update) through the host relay to the host's authenticated ntn CLI. Reads pass through; writes prompt the host for approval (the same askPrompt gate as git push / gh pr create). refuseUnsafeApiCall keeps api read-only: GET to any endpoint, plus the read-by-POST endpoints v1/search / v1/databases/{id}/query / v1/data_sources/{id}/query (JSON body via -d). Every other method/endpoint (writes such as v1/pages and v1/comments, -X PATCH/DELETE, and the host-file --input/--file body sources) is refused with exit 65. The method is inferred from the body source, matching real ntn (-d / inline, not gh-style -f / -F), so a body can't slip a write past the gate. The box never holds a Notion token: printenv | grep -i notion inside a box returns nothing. Off by default; enable per project with agentbox config set --project integrations.notion.enabled true (typed config key integrations.notion.enabled in packages/config/src/types.ts). The relay re-reads the layered config on every call, so a flag flip takes effect with no bounce, and a disabled integration is refused before any host process is touched. agentbox doctor reports each integration in a dedicated integrations: group: info for disabled, warn for not-installed / not-logged-in (with install/login hints from the connector descriptor), ok when authed. The connector descriptor lives in packages/integrations/src/connectors/notion.ts; the relay spine in packages/relay/src/integrations.ts (parseIntegrationMethod, assertIntegrationReady, refuseIfIntegrationDisabled, runHostIntegration) is dispatched identically by docker (server.ts) and cloud (host-actions.ts) per the "fix across all providers" rule. Adding a service (Linear / Trello / ClickUp) is one new descriptor file + a one-line registry add, with no relay change.
See integrations.md (design) and notion_backlog.md (per-task status).
  • Linear integration (relay-gated, host CLI): agentbox-ctl integration linear <op> and the in-box linear shim proxy a strict allowlist of ops (whoami, issue.list/issue.mine/issue.view/issue.query, team.list, query-only api GraphQL passthrough, issue.create/issue.update/issue.comment) through the host relay to the host's authenticated linear CLI (@schpet/linear-cli). Same gate model as Notion: reads pass through, writes prompt. The api op's refuseGraphqlNonQuery consumes value-bearing flag values (--variable, --variables-json) so a benign JSON payload isn't misread as a positional, rejects any GraphQL mutation / subscription operation with exit 65 (the GraphQL analogue of refuseApiNonGet), AND refuses --variable key=@<path> host-file loads (the @<path> syntax would let the box exfiltrate host files via GraphQL variables). linear auth token (which would print the raw API token to stdout) and auth login/logout/migrate/default are hard-rejected by the shim and absent from the connector allowlist: three defenses in series. issue delete / team delete / team create are off-list (destructive). issue.comment maps to linear issue comment add (@schpet/linear-cli v2 uses add, not create). Connector descriptor at packages/integrations/src/connectors/linear.ts; shim at packages/sandbox-docker/scripts/linear-shim; typed flag integrations.linear.enabled (default false); the doctor row is driven off ALL_CONNECTORS, so the linear entry lights up automatically. See integrations.md (design) and linear_backlog.md (per-task status).
  • In-box agentbox-ctl git pull|push [-- <args>] (and any tool the agent runs that shells out via this command) — POSTs to the host relay's /rpc, which executes git on the host with the user's SSH agent + gitconfig. Commits made inside the box land in the host's main .git/ immediately (the .git/ is bind-mounted RW at its identical absolute path); git push is the only operation that needs host credentials, hence the RPC.
  • Browser support — Vercel's agent-browser is baked into the box image (npm install -g agent-browser). The Chromium binary it drives is not Chrome for Testing (which has no Linux ARM64 build, and Noble's chromium-browser apt package is a snap stub that doesn't run in containers); it's Playwright's Chromium, which has working linux/arm64 + linux/amd64 builds. It is not baked: ENV AGENT_BROWSER_EXECUTABLE_PATH=/usr/local/bin/chromium points at the chromium-resolver script (packages/sandbox-docker/scripts/chromium-resolver, installed at /usr/local/bin/chromium), which on first launch reuses the newest installed Playwright Chromium and otherwise runs playwright install chromium, preferring the project's pinned Playwright (/workspace/node_modules/.bin/playwright, so the build matches the project's own tests and they share one binary), else the box's global playwright as a fallback downloader. This avoids baking a version-pinned Chromium that goes stale the instant a project pins a different Playwright (the old bug: a baked build masqueraded in ~/.cache/ms-playwright, the project's playwright install fetched a different one, and agents waiting on the baked path hung). Chrome runtime libs (libnss3, libxkbcommon0, libcups2t64, etc.; Noble names with the t64 suffix where applicable) are installed once at image build. Agents inside the box invoke agent-browser directly; sessions/auth/cookies persist under ~/.agent-browser/ in the container's writable layer, so they survive pause/unpause and stop/start and are wiped on destroy. The --with-playwright flag on both agentbox create and agentbox claude additionally runs npm install -g @playwright/cli@latest inside the container at create time (recorded as BoxRecord.withPlaywright and surfaced in agentbox status --inspect); this is a separate package from the playwright runtime baked into the image.
  • Web service port — every box reserves container :80 at create with an unconditional docker run -p 127.0.0.1:0:80 (immutable after docker run, so it's reserved up front even though the expose:-flagged service is usually only known after the in-box wizard writes agentbox.yaml). The ephemeral host port is resolved via docker port and persisted to BoxRecord.webHostPort (re-resolved on every startBox, like vncHostPort, since Docker reallocates it). getBoxEndpoints emits a kind: 'web' endpoint whose URL is the published loopback port (http://127.0.0.1:<webHostPort>) — uniform across engines, not OrbStack-dependent; it's the primary clickable link in agentbox list/status. Until a service declares expose: it renders as web reserved (...). The in-box :80 → expose.port forward is the supervisor-owned WebProxy (see in-box-supervisor.md). Pre-feature boxes (no BoxRecord.webContainerPort) have no reservation and are skipped by startBox — recreate to enable.
  • Portless web URLs (Docker Desktop) — Docker Desktop has no per-container DNS like OrbStack's .orb.local. The first-run prompt (Docker Desktop only, persisted to global config portless.enabled; --portless / --no-portless override) offers to set Portless up: maybePromptPortless (apps/cli/src/portless-prompt.ts) runs npm install -g portless if the CLI is missing, then starts the proxy with portless proxy start --no-tls -p 1355 — a high port + no TLS so it needs no root password and no CA-trust prompt. Consequence: box web apps are served at http://<box-name>.localhost:1355 (http, with a port). An already-running proxy (e.g. the user's own :443 one → https://<box-name>.localhost) is left alone and used as-is. When portless.enabled, createBox registers portless alias <box-name> <webHostPort> and resolves the real URL via portless get, storing it as BoxRecord.portlessUrl (alongside the route name BoxRecord.portlessAlias). startBox re-points the alias + refreshes the URL (Docker reallocates the host port each start); destroyBox removes the route. getBoxEndpoints + agentbox url surface portlessUrl on non-OrbStack engines (OrbStack keeps .orb.local; --loopback forces the 127.0.0.1 URL). All Portless interaction is best-effort — an install/start/alias failure degrades to a printed hint and never blocks create (packages/sandbox-docker/src/portless.ts). The box image ships the portless CLI (Node bumped 22→24 for it) and, when portless.enabled, bind-mounts the host Portless state dir (~/.portless, or the portless.stateDir config override) at /home/vscode/.portless with PORTLESS_STATE_DIR set, so the in-box portless list/get share the host route registry (discovery). In-box portless alias for arbitrary container ports does not work — those ports aren't host-published, so the host proxy can't reach them; only the box's published :80 web port (handled host-side above) is routable.
  • In-box browser on the Portless URL — agentbox screen (and the dashboard's Ctrl-a s) opens a headed Chromium inside the box (shown via VNC) on the box web app. When portless.enabled, that in-box browser loads the same <box-name>.localhost URL the host uses (BoxRecord.portlessUrl) — so the app is one origin from both sides (simplifies Next.js allowedDevOrigins, OAuth, CORS). Chromium hard-codes *.localhost → loopback and ignores /etc/hosts, so the box env (set at docker run by portlessBrowserEnv in portless.ts, when portless.enabled + non-OrbStack) carries AGENT_BROWSER_ARGS=--host-resolver-rules=MAP <box-name>.localhost host.docker.internal — remapping the box's hostname to the host gateway — plus AGENT_BROWSER_IGNORE_HTTPS_ERRORS=1 for a TLS host proxy's self-signed CA. The in-box request thus routes container → host Portless proxy → back into the box, hitting the exact host URL. screen.ts / dashboard openScreen pass record.portlessUrl as the ensureBoxBrowser target only when a web service is exposed (else about:blank / the loopback URL). agentbox url + Ctrl-a u already open portlessUrl host-side (the wrapped-pty shortcuts shell out to agentbox url/screen, so they inherit this).
  • Cloud signed preview URLs — for agentbox create --provider daytona boxes, agentbox url and agentbox screen open a signed preview URL with the auth token embedded in the host (https://{port}-{token}.proxy.daytona.work). The cloud provider's resolveUrl calls CloudBackend.signedPreviewUrl(port, ttl) (Daytona: sb.getSignedPreviewUrl(port, expiresInSeconds)) — distinct from the standard getPreviewLink URL used by the host poller for /bridge/* calls, which keeps the token in the x-daytona-preview-token header. Default TTL is 3600s; override with --ttl <seconds> (1–86400). --loopback is ignored for cloud boxes; the URL always goes through Daytona's proxy.
  • Cloud checkpoints — agentbox checkpoint create -n setup <cloud-box> captures the live Daytona sandbox's filesystem state into a named provider snapshot via sb._experimental_createSnapshot(), and persists a thin manifest under ~/.agentbox/cloud-checkpoints/<backend>/<projectHash-mnemonic>/<name>/manifest.json. Subsequent agentbox create --provider daytona --checkpoint setup (or box.defaultCheckpoint) provisions a new sandbox via client.create({ snapshot: <prefixed-name> }), skipping the workspace-seed step because /workspace is already in the snapshot. Snapshot names are project-prefixed (agentbox-ckpt-<hash>_<mnemonic>-<name>) so multiple projects in the same Daytona org don't collide. agentbox checkpoint ls merges Docker + cloud checkpoints in a single view. The wizard's "starting from checkpoint" announcement is provider-aware: a Docker setup checkpoint won't trigger a misleading skip when creating a cloud box.
  • Cloud editor attach — agentbox code <cloud-box> opens VS Code or Cursor with Remote-SSH attached to the Daytona sandbox's /workspace. --ide vscode (default) / --ide cursor selects the flavor; Claude Code users install the extension inside whichever IDE they pick, and it Just Works once Remote-SSH is up. The CLI mints a fresh 60-minute SSH token via provider.buildAttach(box, 'shell', { noTmux: true }) (which calls Daytona's createSshAccess(60)) and writes a managed block to ~/.ssh/config (from # BEGIN agentbox cloud box <alias> to # END …), so the folder URI is the stable vscode-remote://ssh-remote+agentbox-cloud-<name>/workspace. UserKnownHostsFile /dev/null and LogLevel ERROR are pinned in the block (Vagrant-style: many sandboxes share one DNS name, so persistent known_hosts entries would produce false-positive HostKeyVerificationFailed errors). Token expiry is handled by re-invocation: re-run agentbox code and the block is rewritten with a fresh token; agentbox destroy removes the block. Auto-terminals (/workspace/.vscode/tasks.json) are Docker-only for now; cloud users author their own.
  • Cloud noVNC — agentbox screen <cloud-box> works end-to-end against Daytona. The same VNC stack used by Docker (Xvnc + websockify + noVNC, all baked into Dockerfile.box) is launched inside the sandbox by launchCloudVncDaemon at create time and re-launched on every start (mirrors Docker's launchVncDaemon). A per-box vncPassword is generated host-side, persisted on the cloud BoxRecord, and threaded into the launcher via inline AGENTBOX_VNC_PASSWORD on backend.exec (so stop/start doesn't rely on Daytona preserving sandbox env). The signed preview URL is appended with /vnc.html?autoconnect=1&password=… so the browser jumps straight to the desktop. --no-vnc at create skips the daemon launch; agentbox screen refuses with the same "VNC is disabled" message Docker uses.
  • VNC web client — every box launches Xvnc (TigerVNC) on display :1 plus websockify serving noVNC at container port 6080, by default. ENV DISPLAY=:1 is baked into the image, so any GUI process started inside the box (Chromium via agent-browser, chromium from a shell) renders to that display. The image declares EXPOSE 6080, which is what makes OrbStack's <container-name>.orb.local auto-DNS route correctly. On OrbStack the URL is http://agentbox-<name>.orb.local:6080/vnc.html?autoconnect=1&password=<pw>; Docker Desktop hosts use the auto-allocated host port from docker run -p 127.0.0.1:0:6080 (resolved via docker port after runBox, persisted to BoxRecord.vncHostPort). The password is a random 8-character [A-Za-z0-9] string per box (BoxRecord.vncPassword), embedded in the auto-connect URL. The supervisor script is packages/sandbox-docker/scripts/agentbox-vnc-start, baked at /usr/local/bin/agentbox-vnc-start in the image and launched via docker exec -d --user vscode from launchVncDaemon() (mirrors launchCtlDaemon: best-effort, idempotent, relaunched by startBox() after stop/start because Xvnc dies with the container). Opt out with agentbox create --no-vnc / agentbox claude --no-vnc; BoxRecord.vncEnabled decides whether startBox re-runs the launch. agentbox screen opens this noVNC URL in the host browser (--loopback forces the 127.0.0.1 URL, --print prints it). Distinct from agentbox url, which opens the web app on container :80.
  • Box self-awareness — every box is stamped at docker run with AGENTBOX=1, AGENTBOX_BOX_ID, AGENTBOX_BOX_NAME, AGENTBOX_HOST_WORKSPACE (absolute host path of the workspace cwd — informational, not a mount), and (when the workspace has an agentbox.yaml ancestor) AGENTBOX_PROJECT_ROOT + AGENTBOX_PROJECT_INDEX. The same key=value pairs are written to /etc/agentbox/box.env after runBox (via writeBoxEnvFile in packages/sandbox-docker/src/box-env.ts); /etc/profile.d/agentbox.sh sources it so agentbox shell <box> and any bash -l see the vars even when launched outside docker-run's env. A short prose hint about sandbox constraints (DinD available, no SSH creds → use agentbox-ctl git pull|push) is baked at /etc/claude-code/CLAUDE.md — note that path is not currently a documented Claude Code load location, so today the hint is inspectable-only; in-box agents discover constraints via the env vars.
  • Docker-in-Docker — every box ships with docker.io + iptables and runs an in-box dockerd, launched after the ctl daemon is up via launchDockerdDaemon() (mirrors launchVncDaemon: docker exec -d --user root /usr/local/bin/agentbox-dockerd-start, best-effort, idempotent, relaunched by startBox() because dockerd dies with the container; gated on BoxRecord.dockerVolume so pre-DinD records don't try to start a daemon that isn't installed). The storage driver is selected at runtime by agentbox-dockerd-start, not pinned in the image: the script reuses the driver an already-populated /var/lib/docker was initialized with (dockerd can't switch a populated data root); on a fresh data root it probes the kernel-native overlay2 (mount a test overlay on the data-root filesystem and execve a binary from the merged dir, the exact fuse-overlayfs failure mode) and uses it if it works, else falls back to fuse-overlayfs. It writes the resolved storage-driver into /etc/docker/daemon.json before launching dockerd (the baked daemon.json carries only iptables: true). overlay2 is preferred because fuse-overlayfs is broken on recent kernels: on Docker Desktop's 6.x linuxkit kernel every inner docker run fails at execve() with exec ...: invalid argument. /dev/fuse, SYS_ADMIN, and apparmor:unconfined stay load-bearing: SYS_ADMIN for the overlay2 mount, the full set for the fuse-overlayfs fallback. The vscode user is added to the docker group at image build, so the agent invokes docker without sudo. The data root /var/lib/docker is the per-box named volume agentbox-docker-<id>, removed on destroy. Pass --shared-docker-cache (or set box.dockerCacheShared: true in any config layer) to swap to the shared agentbox-docker-cache volume, which is preserved on destroy and allowlisted in prune --all but mutually exclusive at runtime (only one box can hold dockerd's lock on /var/lib/docker at a time).
The outer container always gets --cap-add=NET_ADMIN, --security-opt=seccomp=unconfined, and --cgroupns=private (in addition to the existing SYS_ADMIN, /dev/fuse, and apparmor:unconfined); --privileged is not used, so the same container runs cloud-portably (E2B/Modal/etc. accept the cap_add path). Three non-obvious bits in agentbox-dockerd-start make this work on OrbStack and Docker Desktop: (1) mount -o remount,rw /sys/fs/cgroup, because the outer engine bind-mounts cgroup v2 read-only and dockerd has to mkdir /sys/fs/cgroup/docker for its own slice; (2) mount -o remount,rw /proc/sys, because dockerd writes to /proc/sys/net/ipv6/conf/<veth>/disable_ipv6 during default-bridge setup and /proc/sys is read-only under the same hardening; (3) rm -f /var/run/docker.{pid,sock} before relaunch, because /var/run lives in the container's writable layer (not a volume), so a stale pidfile from before docker stop survives across start and dockerd refuses to launch ("PID still running": that PID has been reassigned to sleep infinity). Both remounts are SYS_ADMIN-gated and only affect the box's own namespaces, never the host. Cloud parity: Daytona boxes run the same agentbox-dockerd-start script via launchCloudDockerdDaemon (auto-invoked by cloudProvider.create() / start()), so docker works out of the box on the cloud path too, with no opt-in command.
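The web-URL precedence in the Portless bullet above (--loopback first, OrbStack keeps .orb.local, other engines use portlessUrl) can be sketched as a small pure function. This is a minimal illustration, not the real getBoxEndpoints: the BoxWebInfo and Engine types and the pickWebUrl name are hypothetical, and the OrbStack hostname assumes the container is named agentbox-<name>, as in the VNC URL described in the VNC web client bullet.

```typescript
// Minimal fields this sketch needs (hypothetical shape, not the real BoxRecord type).
interface BoxWebInfo {
  name: string;          // box name
  webHostPort?: number;  // host port Docker published for container :80
  portlessUrl?: string;  // BoxRecord.portlessUrl, present when portless.enabled
}

type Engine = "orbstack" | "docker-desktop" | "other";

// Precedence: --loopback, then OrbStack auto-DNS, then Portless, then loopback fallback.
function pickWebUrl(box: BoxWebInfo, engine: Engine, loopback: boolean): string {
  const loopbackUrl =
    box.webHostPort !== undefined ? `http://127.0.0.1:${box.webHostPort}` : undefined;
  if (loopback) {
    if (loopbackUrl === undefined) throw new Error(`box ${box.name} has no published web port`);
    return loopbackUrl;
  }
  // OrbStack keeps its per-container .orb.local name (container assumed to be agentbox-<name>).
  if (engine === "orbstack") return `http://agentbox-${box.name}.orb.local`;
  // Other engines prefer the Portless URL registered at create/start time.
  if (box.portlessUrl !== undefined) return box.portlessUrl;
  if (loopbackUrl !== undefined) return loopbackUrl;
  throw new Error(`box ${box.name} has no reachable web URL`);
}
```

Keeping the choice pure makes the startBox case easy to reason about: only portlessUrl and webHostPort change across restarts, and the precedence stays fixed.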
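The environment that the In-box browser bullet says portlessBrowserEnv injects at docker run can be sketched as follows. Only the two variable names and their values come from the text; the function name, its options object, and the toDockerEnvFlags helper are hypothetical.

```typescript
// Hypothetical stand-in for portlessBrowserEnv in portless.ts.
function portlessBrowserEnvSketch(
  boxName: string,
  opts: { portlessEnabled: boolean; isOrbStack: boolean },
): Record<string, string> {
  // Only needed when the in-box browser must load the host's Portless URL.
  if (!opts.portlessEnabled || opts.isOrbStack) return {};
  return {
    // Chromium resolves *.localhost to loopback and ignores /etc/hosts,
    // so remap the box's hostname to the Docker host gateway at the resolver level.
    AGENT_BROWSER_ARGS: `--host-resolver-rules=MAP ${boxName}.localhost host.docker.internal`,
    // Tolerate a TLS host proxy's self-signed CA.
    AGENT_BROWSER_IGNORE_HTTPS_ERRORS: "1",
  };
}

// Turn the map into docker run arguments, one -e per variable.
function toDockerEnvFlags(env: Record<string, string>): string[] {
  return Object.entries(env).flatMap(([key, value]) => ["-e", `${key}=${value}`]);
}
```

Because the flags are passed as separate argv entries rather than through a shell, the space inside the MAP rule needs no quoting.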
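For cloud boxes, the --ttl handling in the Cloud signed preview URLs bullet amounts to a default plus a range check before calling CloudBackend.signedPreviewUrl(port, ttl). A minimal sketch: parseTtlSeconds, resolveCloudUrl and the CloudBackendLike interface are hypothetical, and rejecting fractional values is an assumption.

```typescript
const DEFAULT_PREVIEW_TTL_S = 3600;
const MAX_PREVIEW_TTL_S = 86_400; // 24 h upper bound for --ttl

// Hypothetical narrow view of the provider backend named in the text.
interface CloudBackendLike {
  signedPreviewUrl(port: number, ttlSeconds: number): Promise<string>;
}

// Parse --ttl <seconds>; undefined means the flag was not given.
function parseTtlSeconds(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_PREVIEW_TTL_S;
  const trimmed = raw.trim();
  // Assumption: only whole seconds are accepted.
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`--ttl must be a whole number of seconds, got "${raw}"`);
  }
  const ttl = Number(trimmed);
  if (ttl < 1 || ttl > MAX_PREVIEW_TTL_S) {
    throw new RangeError(`--ttl must be between 1 and ${MAX_PREVIEW_TTL_S}, got ${ttl}`);
  }
  return ttl;
}

// --loopback plays no part here: cloud URLs always go through the provider proxy.
async function resolveCloudUrl(
  backend: CloudBackendLike,
  port: number,
  rawTtl?: string,
): Promise<string> {
  return backend.signedPreviewUrl(port, parseTtlSeconds(rawTtl));
}
```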
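The naming scheme in the Cloud checkpoints bullet is easy to get subtly wrong (hash and mnemonic joined by an underscore, the checkpoint name by a hyphen), so here it is as code. A minimal sketch: the function names are hypothetical, home is passed in rather than read from os.homedir() to keep it testable, and POSIX paths are assumed.

```typescript
import { posix } from "node:path";

// Project-prefixed provider snapshot name, so projects sharing one Daytona org don't collide.
function cloudSnapshotName(projectHash: string, mnemonic: string, checkpoint: string): string {
  return `agentbox-ckpt-${projectHash}_${mnemonic}-${checkpoint}`;
}

// Host-side manifest path. projectDir is taken verbatim: the text calls it
// <projectHash-mnemonic> without pinning down its exact format.
function cloudManifestPath(
  home: string,
  backend: string,
  projectDir: string,
  checkpoint: string,
): string {
  return posix.join(home, ".agentbox", "cloud-checkpoints", backend, projectDir, checkpoint, "manifest.json");
}

// Parameters for the restore-time create call (client.create({ snapshot }) in the text).
function restoreCreateParams(
  projectHash: string,
  mnemonic: string,
  checkpoint: string,
): { snapshot: string } {
  return { snapshot: cloudSnapshotName(projectHash, mnemonic, checkpoint) };
}
```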
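The managed ~/.ssh/config block in the Cloud editor attach bullet is a marker-delimited upsert: rewrite the block in place if it exists, append it otherwise, and delete it on destroy. A minimal sketch under stated assumptions: the # BEGIN marker wording comes from the text, but the # END marker wording, the SshTarget shape (host name and user are whatever the minted SSH access provides) and all function names are hypothetical. Markers are matched as whole lines so that agentbox-cloud-demo never matches agentbox-cloud-demo2.

```typescript
// Hypothetical connection details minted for one cloud box.
interface SshTarget {
  alias: string;    // e.g. agentbox-cloud-<name>
  hostName: string; // SSH gateway host (assumption)
  user: string;     // token-bearing user (assumption)
}

const beginMarker = (alias: string) => `# BEGIN agentbox cloud box ${alias}`;
// Assumed wording: the text elides the END marker.
const endMarker = (alias: string) => `# END agentbox cloud box ${alias}`;

function renderBlock(t: SshTarget): string[] {
  return [
    beginMarker(t.alias),
    `Host ${t.alias}`,
    `  HostName ${t.hostName}`,
    `  User ${t.user}`,
    // Many sandboxes share one DNS name, so never persist their host keys.
    "  UserKnownHostsFile /dev/null",
    "  LogLevel ERROR",
    endMarker(t.alias),
  ];
}

const toLines = (config: string): string[] =>
  config === "" ? [] : config.replace(/\n$/, "").split("\n");
const fromLines = (lines: string[]): string =>
  lines.length === 0 ? "" : lines.join("\n") + "\n";

// Locate [start, end] of the block for alias, matching markers as whole lines.
function findBlock(lines: string[], alias: string): [number, number] | undefined {
  const start = lines.indexOf(beginMarker(alias));
  if (start === -1) return undefined;
  const end = lines.indexOf(endMarker(alias), start + 1);
  if (end === -1) throw new Error(`unterminated agentbox block for ${alias}`);
  return [start, end];
}

// Insert the block, or refresh it in place (e.g. with a new 60-minute token).
function upsertBlock(config: string, t: SshTarget): string {
  const lines = toLines(config);
  const span = findBlock(lines, t.alias);
  if (span === undefined) return fromLines([...lines, ...renderBlock(t)]);
  lines.splice(span[0], span[1] - span[0] + 1, ...renderBlock(t));
  return fromLines(lines);
}

// Drop the block on destroy; a config without it is returned unchanged.
function removeBlock(config: string, alias: string): string {
  const lines = toLines(config);
  const span = findBlock(lines, alias);
  if (span === undefined) return config;
  lines.splice(span[0], span[1] - span[0] + 1);
  return fromLines(lines);
}

const folderUri = (name: string) => `vscode-remote://ssh-remote+agentbox-cloud-${name}/workspace`;
```

Rewriting in place rather than appending a second Host entry matters because ssh_config uses the first matching value, so a stale token earlier in the file would otherwise win.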
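Per the Cloud noVNC bullet, the browser URL is the signed preview URL with the noVNC entry point and auto-connect parameters appended. A minimal sketch using the standard WHATWG URL class; cloudVncUrl is a hypothetical name. It assumes, as the text describes for Daytona, that the token lives in the host name, because resolving /vnc.html against the base replaces any path and query the signed URL might carry.

```typescript
// Build the noVNC link for a cloud box from its signed preview URL for the VNC port.
function cloudVncUrl(signedPreviewUrl: string, vncPassword: string): string {
  const url = new URL("/vnc.html", signedPreviewUrl);
  url.searchParams.set("autoconnect", "1");
  url.searchParams.set("password", vncPassword); // percent-encoded if ever needed
  return url.toString();
}
```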
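The per-box password and the two local URL shapes from the VNC web client bullet can be sketched together. Assumptions: generateVncPassword, localVncUrl and VncUrlOpts are hypothetical names, and Node's crypto.randomInt is used here because it draws uniformly without modulo bias; the real generator may differ.

```typescript
import { randomInt } from "node:crypto";

const VNC_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

// 8 random characters from [A-Za-z0-9]; randomInt(n) is uniform over [0, n).
function generateVncPassword(length = 8): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += VNC_ALPHABET[randomInt(VNC_ALPHABET.length)];
  }
  return out;
}

interface VncUrlOpts {
  orbstack: boolean;    // engine gives containers <name>.orb.local DNS
  loopback: boolean;    // --loopback flag
  vncHostPort?: number; // BoxRecord.vncHostPort (from docker port after runBox)
}

function localVncUrl(boxName: string, password: string, opts: VncUrlOpts): string {
  const path = `/vnc.html?autoconnect=1&password=${encodeURIComponent(password)}`;
  if (opts.orbstack && !opts.loopback) {
    return `http://agentbox-${boxName}.orb.local:6080${path}`;
  }
  if (opts.vncHostPort === undefined) {
    throw new Error(`box ${boxName} has no published VNC port (created with --no-vnc?)`);
  }
  return `http://127.0.0.1:${opts.vncHostPort}${path}`;
}
```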
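The Box self-awareness bullet stamps the same variables twice: as docker run environment and as /etc/agentbox/box.env for login shells. A minimal sketch of building and rendering those pairs; the BoxIdentity shape and the function names are hypothetical, and the single-quote escaping is an assumption (it keeps a workspace path containing spaces or quotes safe when the file is sourced by /etc/profile.d/agentbox.sh).

```typescript
interface BoxIdentity {
  id: string;
  name: string;
  hostWorkspace: string;                     // informational host path, not a mount
  project?: { root: string; index: number }; // only when an agentbox.yaml ancestor exists
}

function boxEnvPairs(b: BoxIdentity): Array<[string, string]> {
  const pairs: Array<[string, string]> = [
    ["AGENTBOX", "1"],
    ["AGENTBOX_BOX_ID", b.id],
    ["AGENTBOX_BOX_NAME", b.name],
    ["AGENTBOX_HOST_WORKSPACE", b.hostWorkspace],
  ];
  if (b.project !== undefined) {
    pairs.push(["AGENTBOX_PROJECT_ROOT", b.project.root]);
    pairs.push(["AGENTBOX_PROJECT_INDEX", String(b.project.index)]);
  }
  return pairs;
}

// POSIX single-quoting: wrap in '...' and turn each embedded ' into '\''.
function shQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// One KEY='value' per line. Assumes the sourcing script exports them (e.g. via set -a).
function renderBoxEnvFile(pairs: Array<[string, string]>): string {
  return pairs.map(([k, v]) => `${k}=${shQuote(v)}`).join("\n") + "\n";
}
```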
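The storage-driver decision in the Docker-in-Docker bullet lives in the agentbox-dockerd-start shell script; its logic is restated here in TypeScript only for clarity. Assumptions: the overlay2 probe is injected as a callback, and an existing driver is detected from the per-driver subdirectory Docker creates under its data root (e.g. /var/lib/docker/overlay2); the script's actual detection method may differ. The function names are hypothetical.

```typescript
type StorageDriver = "overlay2" | "fuse-overlayfs";

// Assumption: a populated data root contains a subdirectory named after its driver.
function detectExistingDriver(dataRootEntries: string[]): StorageDriver | undefined {
  if (dataRootEntries.includes("overlay2")) return "overlay2";
  if (dataRootEntries.includes("fuse-overlayfs")) return "fuse-overlayfs";
  return undefined;
}

interface Probes {
  // Mount a test overlay on the data-root fs, then execve a binary from the merged dir.
  overlay2Works(): boolean;
}

function selectStorageDriver(dataRootEntries: string[], probes: Probes): StorageDriver {
  // dockerd can't switch drivers on a populated data root, so reuse whatever is there.
  const existing = detectExistingDriver(dataRootEntries);
  if (existing !== undefined) return existing;
  // Fresh data root: prefer kernel-native overlay2, fall back to fuse-overlayfs.
  return probes.overlay2Works() ? "overlay2" : "fuse-overlayfs";
}

// daemon.json written before launch: the baked iptables setting plus the resolved driver.
function renderDaemonJson(driver: StorageDriver): string {
  return JSON.stringify({ iptables: true, "storage-driver": driver }, null, 2) + "\n";
}
```

Note that the probe is skipped entirely for a populated data root, which is also why a shared agentbox-docker-cache volume keeps whatever driver first initialized it.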
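The outer-container flags and the three pre-launch fixes from the Docker-in-Docker bullet's hardening paragraph, collected as data. A minimal sketch: the function names are hypothetical, and AppArmor is written in Docker's current apparmor=unconfined form rather than the legacy colon form used in the prose.

```typescript
// docker run flags for the outer box container; --privileged is deliberately absent.
function dindRunFlags(): string[] {
  return [
    "--cap-add=SYS_ADMIN", // overlay2 test mount and the cgroup / procfs remounts
    "--cap-add=NET_ADMIN", // inner dockerd sets up bridges and iptables rules
    "--device=/dev/fuse", // needed by the fuse-overlayfs fallback
    "--security-opt=apparmor=unconfined",
    "--security-opt=seccomp=unconfined",
    "--cgroupns=private",
  ];
}

// Commands agentbox-dockerd-start runs before (re)launching dockerd, as argv arrays.
function dockerdPrelaunchSteps(): string[][] {
  return [
    ["mount", "-o", "remount,rw", "/sys/fs/cgroup"], // dockerd must mkdir /sys/fs/cgroup/docker
    ["mount", "-o", "remount,rw", "/proc/sys"], // per-veth disable_ipv6 writes
    ["rm", "-f", "/var/run/docker.pid", "/var/run/docker.sock"], // stale state from before docker stop
  ];
}
```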

What's not built yet (don't claim it works)

  • Auto-refresh of the merged host export (inotify-driven agentbox open keeps ~/.agentbox/boxes/<id>/workspace in sync without manual refresh). Today refresh is on-demand only.
  • Codex in the dashboard (agentbox dashboard compositor) — the dashboard's footer/pane wiring is still claude-only; agentbox codex works standalone. (Claude Code + Codex + tmux + agent-browser are all baked into the image; VS Code Server is downloaded on first attach.)
  • Pre-warming the VS Code Server in the image (server version is keyed to host's VS Code Desktop version, so first attach to a fresh box still pays the ~70MB download; subsequent attaches to the same box are instant).
  • Auto-pause-on-idle / auto-stop policy.
  • Exporting the container writable layer on destroy (--export <path> flag). The live merged export under ~/.agentbox/boxes/<id>/workspace is wiped with the box (use agentbox checkpoint create first if you want to keep the state).
  • Additional /rpc methods beyond git.pull / git.push / gh.pr.* / integration.<svc>.<op>. The dispatch is a single switch in packages/relay/src/server.ts — easy to extend (target ideas: git.fetch, npm.publish, anything else that needs host creds).
  • A user-facing agentbox events/agentbox notify CLI on top of the relay's ring buffer. Today you can run agentbox-relay tail (against the host process at 127.0.0.1:8787) or tail -f ~/.agentbox/relay.log.
  • Event-buffer persistence (events are lost on relay restart; the token registry is rehydrated from state.json on next agentbox create, but historical events aren't).
  • Remote providers beyond Daytona (E2B / Modal / Vercel Sandbox). Daytona already works via --provider daytona.
  • Non-macOS host support for the snapshot path (cp -c is APFS-only; Linux fallback to rsync --exclude is TODO).