AGENTS.md
June 12, 2026 · View on GitHub
Coding-agent guidance
Style
Telegraph. Drop filler/grammar. Min tokens.
Critical Thinking
Fix root cause. Unsure: read more code; if stuck, ask w/ short options. Unrecognized changes: assume other agent; keep going. If causes issues, stop + ask.
Principles
- Always make sure you are working on top of latest main from remote. Especially in worktrees: fetch
origin/main, then rebase or recreate the worktree on top of it before editing. - Important decisions as comments on top of file
- Code testable, smoke testable, runnable locally
- Small, incremental PR-sized changes
- No backward compat needed (internal code)
- Write failing test before fixing bug
Specs
specs/ contains feature specifications. New code should comply with these or propose changes.
| Spec | Description |
|---|---|
| architecture | Core interpreter architecture, module structure |
| parser | Bash syntax parser design |
| vfs | Virtual filesystem abstraction |
| testing | Testing strategy and patterns |
| builtins | Builtin command design (trait, ShellRef, ExecutionPlan) |
| security-testing | Fail-point injection for security testing |
| threat-model | Security threats and mitigations |
| parallel-execution | Threading model, Arc usage |
| documentation | Rustdoc guides, embedded markdown |
| release-process | Version tagging, crates.io + PyPI + npm publishing |
| limitations | Negative spec: intentional gaps (L-* IDs), partial features, POSIX stance |
| tool-contract | Public LLM Tool trait contract |
| git-support | Sandboxed git operations on VFS |
| python-builtin | Embedded Python via Monty, security, resource limits |
| eval | LLM evaluation harness, dataset format, scoring |
| maintenance | Pre-release maintenance requirements |
| python-package | Python package, PyPI wheels, platform matrix |
| scripted-tool-orchestration | Compose ToolDef+callback pairs into OrchestratorTool via bash scripts |
| ssh-support | Sandboxed SSH/SCP/SFTP operations |
| zapcode-runtime | Embedded TypeScript via ZapCode, VFS bridging, resource limits |
| request-signing | Transparent Ed25519 request signing (bot-auth) per RFC 9421 |
| interactive-shell | Interactive REPL mode with rustyline line editing |
| sqlite-builtin | Embedded SQLite via Turso (MemoryIO + VfsIO backends, dot-commands) |
| coreutils-args-port | Codegen port of uutils clap definitions + uucore modules |
| credential-injection | Per-host HTTP credential injection without exposing secrets |
| performance-results | Benchmark/eval result locations and /benches site aggregation contract |
| emscripten-wheels | Reduced-feature Pyodide/Emscripten Python wheel |
Documentation
- Public docs live in
docs/— user-facing articles (security, guides, etc.) - Rustdoc guides live in
crates/bashkit/docs/as markdown files - Rustdoc guides embedded via
include_str!(seespecs/documentation.md) - Edit
crates/bashkit/docs/*.md, not the doc modules inlib.rs - Add "See also" cross-links when creating new guides
- Run
cargo doc --opento preview rustdoc changes
Bashkit Principles
- All design decisions in
specs/- no undocumented choices - Everything runnable and testable - no theoretical code
- Don't stop until e2e works - verify before declaring done
- Examples tested in CI - must pass
- No silent deferral -
TODOorWTFcomment with explanation - Verify crate assumptions before planning to use them
Cloud Agent Setup
./scripts/init-cloud-env.sh # Install just + gh
just build # Build project
Secrets via Doppler
DOPPLER_TOKEN is pre-configured. Fetch secrets with:
curl -s "https://api.doppler.com/v3/configs/config/secret?name=SECRET_NAME" \
-u "$DOPPLER_TOKEN:" | python3 -c "import sys,json; print(json.load(sys.stdin)['value']['raw'])"
Common secrets: GITHUB_TOKEN, ANTHROPIC_API_KEY
For gh CLI, set GH_TOKEN:
export GH_TOKEN=$(curl -s "https://api.doppler.com/v3/configs/config/secret?name=GITHUB_TOKEN" \
-u "$DOPPLER_TOKEN:" | python3 -c "import sys,json; print(json.load(sys.stdin)['value']['raw'])")
Local Dev
just --list # All commands
just build # Build
just test # Run tests
just check # fmt + clippy + test
just pre-pr # Pre-PR checks
Do not run cargo test --all-features as a single invocation. Not
exercised in CI; statically links every embedded interpreter and enables
failpoints (global state, needs --test-threads=1); the parallel link
step exceeds sandbox/cloud-runner memory and the supervisor kills the
shell. Use just check / just pre-pr, or slice by feature the way CI
does (see .github/workflows/ci.yml):
cargo test --workspace --lib --bins --tests --features http_client,ssh,sqlite
cargo test --workspace --doc --features http_client,ssh,sqlite
cargo test --features realfs -p bashkit --test realfs_tests -p bashkit-cli
cargo test --features failpoints --test security_failpoint_tests -- --test-threads=1
cargo test --test proptest_security -- --test-threads=1
cargo test --features ssh -p bashkit --test ssh_builtin_tests
Python is tested via pytest (.github/workflows/python.yml), TypeScript
via cargo run --example typescript_external_functions --features typescript.
Bashkit's integration tests live under crates/bashkit/tests/integration/
and are aggregated by crates/bashkit/tests/integration/main.rs into a single binary.
New behavioral tests go there. A small number of files stay as top-level
tests/*.rs because they need their own binary (process-global env
mutation, --test-threads=1, ssh-only feature isolation) — the list and
criteria live in specs/testing.md.
Rust
- Stable Rust, version pinned in
rust-toolchain.toml(bump deliberately; match the new version indtolnay/rust-toolchain@<version>refs across.github/workflows/*so CI can't be broken by a same-day rustc release) cargo fmtandcargo clippy -- -D warnings- License checks:
cargo deny check(seedeny.toml)
Stderr from builtins must not leak internal Debug shapes
TM-INF-022 in specs/threat-model.md. No {:?}/{:#?} in
crates/bashkit/src/builtins/; use Display or a domain formatter
(reference: format_compile_errors in builtins/jq/errors.rs); cap
diagnostics ≤ 1 KB; test-only Debug needs // debug-ok: <reason>.
Enforced by cargo test (static scan, per-tool assert_no_leak, fuzz
invariants) — see bashkit::testing rustdoc for the layers.
New library-wrapping builtin: add a no_leak_* test (see jq/tests.rs);
fuzz targets must use bashkit::testing::fuzz_exec(...), not bare
bash.exec(...).
Benches
Two distinct harnesses, two distinct result locations — keep them separate.
- Criterion benches for the
bashkitcrate live incrates/bashkit/benches/. Run viacargo bench --bench <name>orjust bench-parallel/just bench-sqlite. Historical results go incrates/bashkit/benches/results/, namedcriterion-<bench>-<moniker>-<timestamp>.md. Thebench-parallel.shandbench-sqlite.shscripts write there. bashkit-benchharness (bashkit vs real bash) lives incrates/bashkit-bench/. Run viajust bench. Historical results go incrates/bashkit-bench/results/, namedbench-<runner>-<moniker>-<timestamp>.{json,md}.
When adding a new criterion bench, save its first run under
crates/bashkit/benches/results/ so future runs have a baseline to diff against.
Do not mix criterion .md files into crates/bashkit-bench/results/.
Python
- Python package in
crates/bashkit-python/ - Linter/formatter:
ruff(config inpyproject.toml) ruff check crates/bashkit-pythonandruff format --check crates/bashkit-python- Tests:
pytest crates/bashkit-python/tests/ -v(requiresmaturin developfirst) - CI:
.github/workflows/python.yml(lint, test on 3.9/3.12/3.13, build wheel)
Pre-PR Checklist
just pre-pr(runs 2-4 automatically)cargo fmt --checkcargo clippy --all-targets --all-features -- -D warningsjust test(feature-sliced; nevercargo test --all-featuresin one invocation — see Local Dev)- Unit tests cover both positive (expected behavior) and negative (error handling, edge cases) scenarios
- Security tests if change touches user input, parsing, sandboxing, or permissions (see
specs/security-testing.md) - Compatibility/differential tests if change affects Bash behavior parity (compare against real Bash)
- Rebase on main:
git fetch origin main && git rebase origin/main(for worktrees: verify the worktreeHEADis on latestorigin/mainbefore editing) - Update specs if behavior changes
- CI green before merge
- Resolve all PR comments
cargo bench --bench parallel_executionif touching Arc/async/Interpreter/builtins (seespecs/parallel-execution.md)just bench-sqliteif touching the sqlite builtin or its VFS/IO bridge (seespecs/sqlite-builtin.md)just benchif changes might impact performance (interpreter, builtins, tools)ruff check crates/bashkit-python && ruff format --check crates/bashkit-pythonif touching Python code
CI
- GitHub Actions. Check via
ghtool. - NEVER merge when CI is red. No exceptions.
Commits
Conventional Commits: type(scope): description
Types: feat, fix, docs, refactor, test, chore
- Updates to
specs/andAGENTS.md: usechoretype - NEVER add links to Claude sessions in PR body or commits
Commit Attribution
All commits (incl. merge commits) attributed to the real human user —
never a bot/agent identity, no AI Co-authored-by trailers, no
"generated by" text. Verify git config user.name/user.email are
human before committing; if missing or bot-like, set from
$GIT_USER_NAME/$GIT_USER_EMAIL. If those are also missing, stop and
ask — never commit with a default/bot identity. Pre-push script warns on
bot-like author names.
PRs
Squash and Merge. Use PR template if exists.
NEVER add links to Claude sessions in PR body or commits. Never attribute commit or merge commit to coding agents, always use real user.
- Prefer small, shippable PRs. Split large changes into independent, reviewable units.
- When asked to create separate PRs, follow that instruction—do not bundle unrelated changes.
See CONTRIBUTING.md for details.