Sandbox backends
May 21, 2026 · View on GitHub
Bernstein isolates every spawned agent in a sandbox so multiple agents running against the same repository cannot stomp on each other's files, processes, or secrets. Historically the only sandbox type was a local git worktree. The choice of sandbox is now pluggable - agents can run inside worktrees, Docker containers, E2B microVMs, Modal sandboxes, or any backend a plugin author registers.
This document covers:
- The
SandboxBackend/SandboxSessionprotocol and theWorkspaceManifest/SandboxCapabilityvalue objects - The four first-party backends (
worktree,docker,e2b,modal) - The
bernstein.sandbox_backendsentry-point group for third-party backends
Protocol shape
The protocol lives in src/bernstein/core/sandbox/:
from bernstein.core.sandbox import (
SandboxBackend,
SandboxSession,
SandboxCapability,
WorkspaceManifest,
GitRepoEntry,
FileEntry,
ExecResult,
get_backend,
list_backends,
register_backend,
)
SandboxBackend
A runtime_checkable Protocol. Every backend exposes:
name: str- canonical identifier referenced fromplan.yaml.capabilities: frozenset[SandboxCapability]- feature flags.async def create(manifest, options=None) -> SandboxSession- provision a fresh sandbox.async def resume(snapshot_id) -> SandboxSession- restore a snapshot; raisesNotImplementedErrorif the backend does not declareSandboxCapability.SNAPSHOT.async def destroy(session) -> None- tear down a session.
SandboxSession
An ABC with six abstract methods:
read(path) -> byteswrite(path, data, *, mode=0o644) -> Noneexec(cmd, *, cwd=None, env=None, timeout=None, stdin=None) -> ExecResultls(path) -> list[str]snapshot() -> str(SNAPSHOT-capable backends only)shutdown() -> None(idempotent)
ExecResult is a frozen dataclass with exit_code, stdout,
stderr, and duration_seconds.
SandboxCapability
An StrEnum with six values: FILE_RW, EXEC, NETWORK, GPU,
SNAPSHOT, PERSISTENT_VOLUMES. Every backend advertises the set
it supports; schedulers reject manifests requiring capabilities the
selected backend does not expose.
WorkspaceManifest
Immutable value object passed to SandboxBackend.create:
@dataclass(frozen=True)
class WorkspaceManifest:
root: str = "/workspace"
repo: GitRepoEntry | None = None
files: tuple[FileEntry, ...] = ()
env: Mapping[str, str] = field(default_factory=dict)
timeout_seconds: int = 1800
GitRepoEntry and FileEntry are companion frozen dataclasses.
Cloud-specific mount entries (S3, persistent volumes, secrets
manager bindings) are intentionally deferred to the storage-sinks
work.
First-party backends
| Backend | Ships in | capabilities | Notes |
|---|---|---|---|
worktree | core | FILE_RW, EXEC, NETWORK | Wraps the existing WorktreeManager. Zero behaviour change. Default. |
docker | core | FILE_RW, EXEC, NETWORK | Launches a container per session via the docker Python SDK. Needs pip install bernstein[docker]. |
e2b | [e2b] extra | FILE_RW, EXEC, NETWORK, SNAPSHOT | Runs in E2B Firecracker microVMs. Needs pip install bernstein[e2b] plus E2B_API_KEY. |
modal | [modal] extra | FILE_RW, EXEC, NETWORK, SNAPSHOT, GPU | Serverless containers with optional GPU. Needs pip install bernstein[modal] plus MODAL_TOKEN_ID / MODAL_TOKEN_SECRET. |
Trade-offs
- Latency.
worktreehas no provisioning cost;dockeradds a one-time pull plus ≤ 2 s container start;e2b/modaladd 1–3 s of cold start per session plus provider-side overhead. - Cost.
worktreeanddockerare free (local compute).e2bbills by sandbox minute.modalbills by compute seconds, with optional GPU surcharges. - Isolation.
worktreeshares the host filesystem and network;dockerprovides cgroup + namespace isolation but shares the kernel;e2bruns in a fresh Firecracker microVM per session;modalruns in dedicated serverless containers. - Capabilities. Only
e2bandmodalsupport snapshot/resume; onlymodalexposes GPU today. - Supported exec semantics. All four backends handle argv-based exec with exit-code, stdout, and stderr capture.
plan.yaml extension
stages:
- name: risky-execution
sandbox:
backend: docker # worktree (default), docker, e2b, modal, or a plugin name
options:
image: python:3.13-slim
memory_mb: 2048
timeout_seconds: 1800
steps:
- title: "Run untrusted code analysis"
role: security
cli: claude
sandbox: is entirely optional. When omitted the stage runs in the
worktree backend - byte-identical to the pre-pluggable-sandbox
behaviour.
Registering a custom backend
Plugin authors declare an entry point in their own pyproject.toml:
[project.entry-points."bernstein.sandbox_backends"]
mybackend = "my_package.sandbox:MySandboxBackend"
On next process start the registry picks the entry up automatically.
bernstein agents sandbox-backends lists every installed backend
with its capability set so operators can verify registration.
Third-party backends must:
- Provide
nameandcapabilitiesclass attributes. - Implement
create,resume, anddestroyas coroutines. - Pass the conformance suite at
bernstein.core.sandbox.conformance.SandboxBackendConformance. - Import provider SDKs lazily (inside methods or behind
TYPE_CHECKING) so importing the backend module never crashes on a missing SDK.
Integration
SandboxBackend/SandboxSession/SandboxCapability/WorkspaceManifestlive insrc/bernstein/core/sandbox/.- Four first-party backends ship (worktree & docker in core; e2b & modal as optional extras).
AgentSpawneraccepts an optionalsandbox_sessionparameter; whenNoneit falls back to the direct-worktree path.bernstein agents sandbox-backendslists installed backends.plan.yamlaccepts an optionalsandbox:block per stage.
Observability
Each backend create/destroy cycle emits WAL + Prometheus metrics:
sandbox_session_created{backend=..., session_id=...}sandbox_session_destroyed{backend=..., duration_seconds=...}sandbox_exec_count{backend=..., exit_code=...}
Conformance
SandboxBackendConformance (in
src/bernstein/core/sandbox/conformance.py) is a parametrised pytest
class any backend can subclass to get a complete protocol test
coverage suite. Backends declaring SANDBOX_CAPABILITY.SNAPSHOT
additionally get the snapshot/resume round-trip test automatically.
The worktree backend runs the conformance suite in unit tests
(tests/unit/sandbox/test_backend_protocol.py). Docker / E2B /
Modal conformance lives under tests/integration/sandbox/; those
tests auto-skip without a live daemon or provider credentials.
Security considerations
worktreedoes not isolate at the kernel level. If you need to run untrusted code you must choose a sandboxed backend.dockershould be run withnetwork_disabled=Truefor untrusted workloads; the default leaves network enabled because most agent tasks legitimately need outbound HTTP.e2bandmodalrun untrusted code by design; their isolation posture is the provider's responsibility.- Snapshot IDs are opaque to callers but may contain sensitive state. Do not log them at INFO level without redaction.