Design Note: Exec Tool
June 16, 2026
- Status: Draft
- Tracking issue: to be filed
- Author: @shangdinggu
- Last updated: 2026-05-08
- Builds on: 0005-capability-model.md, 0008-agent-sandbox.md, 0021-tool-dispatch.md
This is the most dangerous tool the kernel ships. Letting an LLM agent run arbitrary binaries on the host is exactly the surface that historically turns research papers into security incidents. This RFC defines the boundary that makes it survivable for our single-user, single-host threat model:
- No shell. Ever. `shell=True` never appears. Args are a list, never a string. Eliminates command injection entirely.
- Absolute-path binaries only. `argv[0]` must start with `/` and pass `kernel.cap.check_fs(pid, argv[0], "r")`. No `$PATH` lookup; no relative paths; no symlink chasing beyond what the kernel already permits.
- Env scrub. The child process gets a fixed safe-list (`PATH`, `HOME`, `LANG`, `USER`, `TERM`, `LC_ALL`) plus whatever the caller explicitly passes in `args.env`. Inherited secrets (`ANTHROPIC_API_KEY`, `AWS_*`, `GITHUB_TOKEN`…) are dropped.
- RLIMIT enforced. Wraps existing `run_sandboxed` from RFC 0008. Default policy:
  - cpu_seconds = 60
  - memory_bytes = 512 MB
  - fsize_bytes = 64 MB
  - nofile = 256
  - wall_seconds = 60 (configurable)
- Output cap. stdout / stderr each truncated to 256 KB by default (configurable). The tool returns `stdout_truncated=True` so the LLM knows.
- Capability-gated, opt-in. The tool is NOT registered by `register_builtin_tools`. Operators must call `register_exec_tool(registry)` explicitly. Agents must have `"Exec"` in their `tool_grants`.
The tool is named Exec (not Bash, not Shell) to make
the argv-only contract self-evident in code.
1. Threat model
In scope (the tool defends against):
- Command injection via shell metachars. Eliminated by `shell=False` + argv list. `Exec(["/bin/echo", "; rm -rf /"])` literally runs `echo` with one argument; the semicolon is not interpreted.
- PATH manipulation. The agent can't say `argv=["malicious"]` and have the supervisor find it on a weird path. argv[0] must be absolute.
- Symlink-into-secret. fs_grants enforces what paths are reachable; the agent can't set argv[0]=`/etc/passwd` to read it (also: it's not an executable). And reading files is what the `Read` tool is for, not Exec.
- Env exfiltration. Default scrub drops the daemon's secrets. The agent can't leak them with `Exec(["/bin/sh", "-c", "echo $ANTHROPIC_API_KEY"])` (the tool itself never spawns a shell, and even via `Exec(["/usr/bin/env"])` the secret isn't in the env the child sees).
- Resource exhaustion. RLIMIT_AS bounds memory; RLIMIT_CPU bounds CPU time; wall_seconds bounds wall-clock; nofile bounds fd count; fsize bounds output file growth.
- Long-running command stuck. Wall-clock killer thread SIGTERMs then SIGKILLs after grace.
- Output bomb. stdout/stderr are tail-truncated; no unbounded buffering.
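The command-injection claim above can be checked with a minimal sketch. It assumes nothing about the kernel: it uses only the standard `subprocess` module, and a tiny Python child (via `sys.executable`) stands in for `/bin/echo` so the example runs on any host. The helper name `run_argv` is hypothetical.

```python
import subprocess
import sys

# Stand-in for /bin/echo: a child that prints its first argument verbatim.
# sys.executable is used only so the sketch does not depend on host binaries.
CHILD = [sys.executable, "-c", "import sys; print(sys.argv[1])"]


def run_argv(extra_args):
    """Run CHILD with extra args passed as a list; no shell is involved."""
    proc = subprocess.run(
        CHILD + list(extra_args),
        shell=False,          # the default, spelled out for emphasis
        capture_output=True,
        text=True,
        check=False,
    )
    return proc.stdout


# Shell metacharacters arrive in the child as plain bytes of one argument.
hostile = "; rm -rf /tmp/should-not-run && echo injected"
print(repr(run_argv([hostile])))
```

Because the argument list is handed to the OS as-is, the semicolon, `&&`, and `$VAR` forms are never parsed by anything.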
Out of scope:
- Trusted-but-buggy binary. If the agent is allowed to run `/usr/bin/git` and `/usr/bin/git` itself has a remote-code-execution bug, this RFC doesn't help. The capability model can revoke `Exec` from a misbehaving agent, but the binary's own correctness is its problem.
- Network egress from the child. The child inherits the daemon's network stack. `Exec(["/usr/bin/curl", "evil.com"])` works if curl is in fs_grants and the daemon has internet. RFC 0008 bubblewrap covers this for runner subprocesses but not for one-shot tool exec; a future RFC may add `--unshare-net` to Exec when bubblewrap is available.
- Timing side channels. Out of scope for v1.
- Truly untrusted code. If you don't trust the LLM, don't give it `Exec`. Capability is the gate.
- Untrusted /usr/bin. The kernel assumes `/usr/bin/grep` is what it claims to be. Defense against tampered system binaries is OS-level, not kernel-level.
2. Tool specification
Name
Exec
Args
{
"argv": ["/usr/bin/grep", "-n", "TODO", "/path/to/file"],
"cwd": "/some/dir", // optional; absolute; readable per fs_grants
"env": {"FOO": "bar"}, // optional additive env (after scrub)
"timeout_s": 60, // optional, default 60, max 600
"max_output_bytes": 262144 // optional, default 256 KB, max 4 MB per stream
}
Validation
- `argv`: non-empty list of non-empty strings.
- `argv[0]`: absolute path; `Path(argv[0]).is_file()` must be True before dispatch; agent's fs_grants must include `"r"` on it (handler calls `kernel.cap.check_fs` directly).
- `cwd`: optional; if set, must be absolute and a directory; fs_grants must cover it (mode "r").
- `env`: dict[str, str]. Keys starting with `_` rejected (reserved for kernel use). Values must be strings (no None, no leakage of complex types).
- `timeout_s`: 1 ≤ x ≤ 600.
- `max_output_bytes`: 1 KB ≤ x ≤ 4 MB.
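The validation rules above could be sketched roughly as follows. This is an illustration under assumptions, not the handler itself: `validate_exec_args`, `ExecArgsError`, and the `check_fs(path, mode)` callable (a stand-in for `kernel.cap.check_fs`) are hypothetical names, and every failure is folded into one exception type for brevity, whereas the acceptance criteria in this RFC map fs failures to a separate `ToolFsDenied`.

```python
from pathlib import Path

DEFAULT_TIMEOUT_S = 60
MAX_TIMEOUT_S = 600
DEFAULT_OUTPUT_BYTES = 256 * 1024
MIN_OUTPUT_BYTES = 1024
MAX_OUTPUT_BYTES = 4 * 1024 * 1024


class ExecArgsError(ValueError):
    """Hypothetical error type for rejected Exec args."""


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_exec_args(args, check_fs):
    """Return normalized args, or raise ExecArgsError.

    check_fs(path, mode) -> bool stands in for kernel.cap.check_fs.
    """
    argv = args.get("argv")
    if not isinstance(argv, list) or not argv:
        raise ExecArgsError("argv must be a non-empty list")
    if not all(isinstance(a, str) and a for a in argv):
        raise ExecArgsError("argv items must be non-empty strings")
    if not argv[0].startswith("/"):
        raise ExecArgsError("argv[0] must be an absolute path")
    if not Path(argv[0]).is_file():
        raise ExecArgsError("argv[0] is not an existing file")
    if not check_fs(argv[0], "r"):
        raise ExecArgsError("fs_grants do not cover argv[0]")

    cwd = args.get("cwd")
    if cwd is not None:
        if not isinstance(cwd, str) or not cwd.startswith("/") or not Path(cwd).is_dir():
            raise ExecArgsError("cwd must be an absolute directory")
        if not check_fs(cwd, "r"):
            raise ExecArgsError("fs_grants do not cover cwd")

    env = args.get("env", {})
    if not isinstance(env, dict):
        raise ExecArgsError("env must be a dict")
    for key, value in env.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ExecArgsError("env keys and values must be strings")
        if key.startswith("_"):
            raise ExecArgsError("env keys starting with '_' are reserved")

    timeout_s = args.get("timeout_s", DEFAULT_TIMEOUT_S)
    if not _is_number(timeout_s) or not 1 <= timeout_s <= MAX_TIMEOUT_S:
        raise ExecArgsError("timeout_s must be between 1 and 600")

    max_out = args.get("max_output_bytes", DEFAULT_OUTPUT_BYTES)
    if not isinstance(max_out, int) or isinstance(max_out, bool):
        raise ExecArgsError("max_output_bytes must be an integer")
    if not MIN_OUTPUT_BYTES <= max_out <= MAX_OUTPUT_BYTES:
        raise ExecArgsError("max_output_bytes must be between 1 KB and 4 MB")

    return {"argv": argv, "cwd": cwd, "env": dict(env),
            "timeout_s": timeout_s, "max_output_bytes": max_out}
```

The checks run in the order the list gives them, so a malformed `argv` is rejected before any filesystem or capability lookup happens.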
Result
{
"exit_code": 0,
"stdout": "...",
"stderr": "...",
"stdout_truncated": false,
"stderr_truncated": false,
"duration_s": 0.123,
"timed_out": false
}
stdout/stderr are decoded as UTF-8 with errors="replace" — Exec is text-oriented; binary output isn't expected. (Tools that need binary output should write to a file via Write tool first.)
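A minimal sketch of the per-stream cap and decode step, assuming that truncation keeps the first `max_output_bytes` bytes and discards the rest (the RFC does not spell out which end is kept, so this is an assumption). The helper name `cap_stream` is hypothetical.

```python
def cap_stream(raw: bytes, max_output_bytes: int = 256 * 1024):
    """Return (text, truncated) for one captured stream.

    Assumption: the head of the stream is kept and the remainder dropped.
    """
    truncated = len(raw) > max_output_bytes
    kept = raw[:max_output_bytes] if truncated else raw
    # errors="replace" turns invalid or cut-off byte sequences into U+FFFD
    # instead of raising, matching the text-oriented contract above.
    return kept.decode("utf-8", errors="replace"), truncated


text, was_cut = cap_stream(b"x" * (1024 * 1024), max_output_bytes=1024)
print(len(text), was_cut)
```

Capping on bytes before decoding means a multi-byte character can be split at the boundary; with `errors="replace"` that shows up as a trailing replacement character rather than a decode error, and the decoded length never exceeds the byte cap.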
Capability requirements
- `tool_grants` must include `"Exec"`.
- `fs_grants` must include `("r", argv[0])` (binary must be readable per the agent's grants).
- `fs_grants` must include `("r", cwd)` if `cwd` is set.
- The `requires_fs` field on the Tool registration is empty; the handler does its own fs check (because the args_key has to extract a list element, not a top-level field).
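The gate described above might look like the sketch below. It is a toy model, not the kernel's capability code: `fs_allowed` and `check_exec_grants` are hypothetical names, grants are modelled as `(mode, path)` tuples, and a grant is assumed to cover its path and everything beneath it. The returned strings mirror the `permission_denied` and `fs_denied` error codes named in the acceptance criteria.

```python
from pathlib import PurePosixPath


def fs_allowed(fs_grants, mode, path):
    """True if some (grant_mode, grant_path) covers path for mode.

    Assumption: a grant covers the granted path and its descendants.
    """
    target = PurePosixPath(path)
    if ".." in target.parts:
        # Refuse unnormalized paths so "/usr/bin/../../etc" cannot slip through.
        return False
    for grant_mode, grant_path in fs_grants:
        root = PurePosixPath(grant_path)
        if mode in grant_mode and (target == root or root in target.parents):
            return True
    return False


def check_exec_grants(tool_grants, fs_grants, argv, cwd=None):
    """Return None when allowed, else an error code string."""
    if "Exec" not in tool_grants:
        return "permission_denied"
    if not fs_allowed(fs_grants, "r", argv[0]):
        return "fs_denied"
    if cwd is not None and not fs_allowed(fs_grants, "r", cwd):
        return "fs_denied"
    return None


grants = [("r", "/usr/bin"), ("rw", "/srv/work")]
print(check_exec_grants({"Exec"}, grants, ["/usr/bin/grep", "-n", "TODO"], cwd="/srv/work"))
```

The tool-grant check comes first so an agent without `Exec` learns nothing about which paths its fs_grants would have covered.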
3. Env scrubbing
Default exposed env:
| Key | Value |
|---|---|
| `PATH` | `/usr/local/bin:/usr/bin:/bin` |
| `HOME` | `/tmp` |
| `LANG` | `C.UTF-8` |
| `LC_ALL` | `C.UTF-8` |
| `USER` | (process owner, e.g. `cheetah`) |
| `TERM` | `dumb` |
| `SHELL` | `/bin/sh` |
The caller's env arg is merged on top. The caller can
override PATH etc. (e.g. to give a tool a richer PATH if
needed), but any key it does not name explicitly stays scrubbed.
Anything in os.environ not in the safe-list and not in
args.env is dropped. This is the secret-leak defence.
Reserved key prefix:
- `_*` rejected by `args.env` validation.
- `CC_*` env keys (kernel-internal) are NOT auto-passed; the caller must explicitly add them via `args.env`.
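The scrub-then-merge behaviour in this section can be sketched as below. The values come from the table above; the function name `build_child_env` and its `user` parameter are hypothetical. The key property is that the function never copies from `os.environ`, so nothing is inherited by accident.

```python
import getpass

# Fixed safe-list values, taken from the table in this section.
SAFE_ENV = {
    "PATH": "/usr/local/bin:/usr/bin:/bin",
    "HOME": "/tmp",
    "LANG": "C.UTF-8",
    "LC_ALL": "C.UTF-8",
    "TERM": "dumb",
    "SHELL": "/bin/sh",
}


def build_child_env(extra=None, user=None):
    """Build the child's env from the safe-list plus the caller's args.env.

    The parent's os.environ is never consulted, so secrets and CC_* keys
    only reach the child if the caller passes them explicitly.
    """
    env = dict(SAFE_ENV)
    env["USER"] = user if user is not None else getpass.getuser()
    for key, value in (extra or {}).items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("env keys and values must be strings")
        if key.startswith("_"):
            raise ValueError("env keys starting with '_' are reserved")
        env[key] = value  # caller overrides win, e.g. a richer PATH
    return env


print(sorted(build_child_env({"FOO": "bar"}, user="cheetah")))
```

Passing the result as `env=` to `subprocess.Popen` replaces the child's environment wholesale, which is what makes the safe-list an allow-list and not a deny-list.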
4. Sandbox policy
The handler builds a SandboxPolicy:
policy = SandboxPolicy(
cpu_seconds = max(timeout_s, 1),
memory_bytes = 512 * 1024 * 1024, # 512 MB
fsize_bytes = 64 * 1024 * 1024, # 64 MB
nofile = 256,
wall_seconds = float(timeout_s),
new_session = True,
)
This is passed to run_sandboxed(...) from RFC 0008. The
runtime guarantees stay the same: RLIMIT in the child via
preexec_fn, wall-clock killer in the parent, SIGTERM → 1s
grace → SIGKILL on the process group.
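The wall-clock half of that guarantee (SIGTERM, grace period, SIGKILL on the process group) could be sketched as follows. This is not `run_sandboxed` from RFC 0008: `run_with_wall_clock` is a hypothetical name, the RLIMIT setup in `preexec_fn` is omitted, output is discarded to keep the example short, and a blocking `wait(timeout=...)` stands in for a separate killer thread. POSIX only.

```python
import os
import signal
import subprocess
import sys
import time


def _signal_group(proc, sig):
    """Send sig to the child's process group, ignoring an already-dead group."""
    try:
        os.killpg(proc.pid, sig)  # pgid == pid because of start_new_session
    except ProcessLookupError:
        pass


def run_with_wall_clock(argv, wall_seconds, grace_seconds=1.0):
    """Run argv; on timeout send SIGTERM, wait for grace, then SIGKILL."""
    start = time.monotonic()
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,   # output capture left out of this sketch
        stderr=subprocess.DEVNULL,
        start_new_session=True,      # child leads its own session and group
    )
    timed_out = False
    try:
        proc.wait(timeout=wall_seconds)
    except subprocess.TimeoutExpired:
        timed_out = True
        _signal_group(proc, signal.SIGTERM)
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            _signal_group(proc, signal.SIGKILL)
            proc.wait()
    return {
        "exit_code": proc.returncode,
        "timed_out": timed_out,
        "duration_s": time.monotonic() - start,
    }


sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_wall_clock(sleeper, wall_seconds=0.5)["timed_out"])
```

Signalling the group instead of the single PID matters when the child forks helpers: with `new_session = True` they share the child's group and are terminated together.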
bubblewrap is not auto-applied for v1 — Exec runs in the
supervisor's namespace. A future RFC may add
use_bubblewrap=True to constrain network / filesystem
further when running tools.
5. Audit
The supervisor's existing tool-dispatch audit (RFC 0021 §5)
emits tool.call.dispatched / tool.call.denied events.
Exec doesn't add new event kinds; the events' payload.tool
is "Exec" and payload.args includes the argv list, so
operators can grep the event log for what was run.
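As an illustration of that grep, here is a small filter over an event log. The on-disk format is an assumption: the sketch supposes one JSON object per line with a `kind` field next to `payload`, which RFC 0021 may define differently. Only `payload.tool`, `payload.args`, and the two event kinds are taken from the text above; `exec_events` and `SAMPLE_LOG` are hypothetical.

```python
import json


def exec_events(lines):
    """Yield (kind, argv) for each event whose payload.tool is "Exec".

    Assumes JSON Lines input with "kind" and "payload" keys per event.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        payload = event.get("payload", {})
        if payload.get("tool") != "Exec":
            continue
        yield event.get("kind"), payload.get("args", {}).get("argv")


# Made-up sample events, for illustration only.
SAMPLE_LOG = [
    '{"kind": "tool.call.dispatched", "payload": {"tool": "Exec", '
    '"args": {"argv": ["/usr/bin/grep", "-n", "TODO", "/path/to/file"]}}}',
    '{"kind": "tool.call.dispatched", "payload": {"tool": "Read", '
    '"args": {"path": "/path/to/file"}}}',
    '{"kind": "tool.call.denied", "payload": {"tool": "Exec", '
    '"args": {"argv": ["/usr/bin/curl", "evil.com"]}}}',
]

for kind, argv in exec_events(SAMPLE_LOG):
    print(kind, argv)
```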
6. Backwards compatibility
- Strictly additive new file `kernel/tools/exec_tool.py`.
- `register_builtin_tools` is unchanged; existing setups with no Exec capability see no new behaviour.
- `register_exec_tool(registry)` is the explicit opt-in.
7. Open questions
- Should `stdin` be supported? v1 says no: the agent's text-only IPC model doesn't have a clean way to pass binary stdin. A future RFC may add `args.stdin`.
- Should we run via bubblewrap when available? Lean yes for v1.1, but it's a separate threat-model decision (we'd want net deny by default, bind_rw on a scratch dir). Keep simple for now.
- Output-stream interleaving. v1 returns stdout and stderr separately; an interleaved "what came out when" view is a future enhancement.
8. Acceptance criteria
A PR claiming this RFC must demonstrate all of the following:
- `register_exec_tool` is NOT called by `register_builtin_tools`; Exec is opt-in.
- Argv validation rejects: non-list, list with non-str, empty list, relative path argv[0], non-existent argv[0].
- `shell=True` is never used; metachars in args don't shell-expand: `Exec(["/bin/echo", "; pwd"])` outputs `"; pwd"`.
- Capability denied (no `"Exec"` in tool_grants) → tool_response.error=permission_denied.
- fs denied on argv[0] → handler raises ToolFsDenied → tool_response.error=fs_denied.
- Env scrub: a child started without `args.env` and with `ANTHROPIC_API_KEY` set in the parent does not see it.
- Timeout: `Exec(["/bin/sleep", "10"], timeout_s=1)` returns timed_out=True in under 5 seconds.
- Output cap: a binary that prints 1 MB of stdout under `max_output_bytes=1024` returns stdout_truncated=True with stdout length ≤ 1024.
- End-to-end: a run via runner_main + supervisor succeeds and the dispatched audit event is present.
- No file outside `kernel/`, `tests/`, `docs/RFC/` is modified.