Chroot Integration Tests Analysis

February 25, 2026

This document analyzes every chroot integration test file in the gh-aw-firewall project: what each test validates, how it maps to real-world usage, and where coverage gaps remain.


Test Infrastructure Overview

Execution Model

All chroot tests use AwfRunner.runWithSudo(), which invokes sudo -E node dist/cli.js with preserved environment variables (PATH, HOME, GOROOT, CARGO_HOME, JAVA_HOME, DOTNET_ROOT). Each invocation spins up a full Docker Compose stack (Squid proxy + agent container).

Batch Runner Optimization

Tests that share the same allowDomains config are batched into a single AWF container invocation using runBatch(). It concatenates the commands into a single bash script separated by delimiter tokens, then parses per-command results from the combined output. Across the suite, this reduces ~73 container startups to ~27.
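
The delimiter approach can be sketched as follows. This is a minimal illustration, not the project's runBatch() implementation: the token format and the names buildBatchScript and parseBatchOutput are assumptions made for the example.

```typescript
// Hypothetical sketch of delimiter-based batching (names and tokens are assumptions).
interface BatchCommand { name: string; command: string; }
interface BatchResult { name: string; exitCode: number; stdout: string; }

const START = "===BATCH_START:";
const EXIT = "===BATCH_EXIT:";

// Build one bash script that runs every command in a subshell and
// brackets its combined output with a start token and an exit-code token.
function buildBatchScript(commands: BatchCommand[]): string {
  return commands
    .map((c) =>
      [
        `echo "${START}${c.name}==="`,
        `( ${c.command} ) 2>&1`,
        `echo "${EXIT}${c.name}:$?==="`,
      ].join("\n"),
    )
    .join("\n");
}

// Split the combined output back into per-command results.
function parseBatchOutput(output: string): BatchResult[] {
  const results: BatchResult[] = [];
  const pattern = /===BATCH_START:(.+?)===\n([\s\S]*?)===BATCH_EXIT:\1:(\d+)===/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(output)) !== null) {
    results.push({
      name: m[1],
      exitCode: Number(m[3]),
      stdout: m[2].replace(/\n$/, ""),
    });
  }
  return results;
}
```

Running each command in a subshell keeps an exit in one command from terminating the whole batch, which is what allows one container to stand in for many.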

Custom Matchers

  • toSucceed() - exit code 0
  • toFail() - non-zero exit code
  • toExitWithCode(n) - specific exit code
  • toAllowDomain(domain) / toBlockDomain(domain) - Squid log inspection
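
As a rough illustration of the Squid log inspection behind toAllowDomain() and toBlockDomain(), the sketch below classifies a domain from access-log text. It assumes each log line carries a Squid result code such as TCP_TUNNEL/200 or TCP_DENIED/403 plus the requested host; the project's actual log format and matcher internals may differ, and domainVerdict is a hypothetical name.

```typescript
// Hypothetical helper: decide whether a domain was allowed or blocked
// based on Squid access-log lines (log layout is an assumption).
type Verdict = "allowed" | "blocked" | "unseen";

function domainVerdict(accessLog: string, domain: string): Verdict {
  let verdict: Verdict = "unseen";
  for (const line of accessLog.split("\n")) {
    // Plain substring match; a stricter version would match the host field exactly.
    if (line.indexOf(domain) === -1) continue;
    // Any denial for this domain is treated as decisive.
    if (/TCP_DENIED\//.test(line)) return "blocked";
    // Any other Squid result code (TCP_TUNNEL, TCP_MISS, ...) counts as allowed.
    if (/TCP_[A-Z_]+\/\d{3}/.test(line)) verdict = "allowed";
  }
  return verdict;
}
```

A matcher could then pass when the verdict is "allowed" (or "blocked") and fail on "unseen", so that a request which never reached the proxy is not mistaken for a block.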

Chroot Architecture Under Test

The agent container mounts the host filesystem at /host, then calls chroot /host so all paths resolve naturally. Key features:

  • Selective path mounting (not full / mount by default)
  • Empty writable $HOME with specific subdirectory overlays
  • Dynamic /proc mount via mount -t proc (not static bind mount)
  • Capability drop (NET_ADMIN, SYS_CHROOT, SYS_ADMIN) before user code runs
  • UID/GID remapping to match host user

1. chroot-languages.test.ts

Purpose: Verifies that host-installed language runtimes are accessible through the chroot filesystem. Critical for GitHub Actions runners where tools are pre-installed at the host level.

Test Cases

Batched Quick Checks (single container invocation)

| Test | Command | What It Validates |
|---|---|---|
| Python version | python3 --version | Python3 binary accessible via chroot PATH |
| Python inline | python3 -c "print(2 + 2)" | Python interpreter executes inline scripts |
| Python stdlib | python3 -c "import json, os, sys; ..." | Python standard library modules load correctly |
| pip version | pip3 --version | pip package manager accessible |
| Node.js version | node --version | Node.js binary accessible |
| Node.js inline | node -e "console.log(2 + 2)" | Node.js evaluates inline JS |
| Node.js modules | node -e "require('os').platform()" | Node.js built-in modules resolve |
| npm version | npm --version | npm binary accessible |
| npx version | npx --version | npx binary accessible |
| Go version | go version | Go binary accessible |
| Go env | go env GOVERSION | Go environment properly configured |
| Java version | java --version | JDK accessible (fallback: java -version) |
| .NET version | dotnet --version | .NET SDK accessible |
| .NET info | dotnet --info | .NET runtime information available |
| Unix utils | which bash && which ls && which cat | Core Unix utilities accessible |
| Git version | git --version | Git binary accessible |
| curl version | curl --version | curl binary accessible |

Individual Tests (separate containers)

| Test | What It Validates |
|---|---|
| Java compile + run | Creates Hello.java, compiles with javac, runs with java - validates full JDK toolchain |
| Java stdlib (java.util) | Compiles and runs code using java.util.Arrays and java.util.List |
| .NET create + run | dotnet new console + dotnet restore + dotnet run - validates full SDK workflow (requires NuGet domains) |

Real-World Mapping

| Test Area | Real-World Scenario |
|---|---|
| Python | Claude/Copilot agents installing Python packages, running Python scripts in AI-generated code |
| Node.js/npm | Copilot CLI itself is a Node.js tool; agents run npm install, build JS projects |
| Go | Agents building Go projects (common in GitHub Actions context) |
| Java | Agents compiling Java projects with Maven/Gradle (enterprise workflows) |
| .NET | Agents building .NET projects, NuGet restore for dependencies |
| Git | Every agent workflow uses git (clone, commit, push) |
| curl | Agents fetching APIs, downloading artifacts |

Gaps and Missing Coverage

  1. No Rust compile test - Rust is covered in chroot-package-managers.test.ts, but only through cargo --version, cargo search, and rustc --version. No cargo build or rustc compile test exists here, despite Rust being a primary language for AWF users.

  2. No Python virtual environment test - Real agents frequently create venvs (python3 -m venv). The chroot filesystem might not handle venv creation correctly (symlinks, activation scripts).

  3. No TypeScript compilation test - tsc or tsx are common in agent workflows but never tested.

  4. No Bun runtime test - Bun is explicitly supported in entrypoint.sh (AWF_BUN_INSTALL) but has no corresponding test.

  5. No multi-language interaction test - Real agents often chain languages (e.g., Python script calling a Node.js tool), which could fail if PATH ordering is wrong.

  6. No dynamic library loading test - Tests only check binary execution. Shared library loading (ld.so.cache, /lib64/) is implicitly tested but not explicitly verified.

  7. Java version check uses fallback pattern - java --version 2>&1 || java -version 2>&1 catches both formats, but doesn't verify which Java version is found (could pick up wrong JDK).

  8. Soft failures on network tests - The .NET test wraps its assertions in an if (result.success) guard, so the test passes even when .NET cannot reach NuGet. This hides real failures.
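
For gap 7, a version check could parse the output rather than only asserting success. The following is a minimal sketch; parseJavaMajor is a hypothetical helper, and the formats it handles (the legacy "1.8.0_x" scheme and the modern "17.0.9" scheme) are the common ones, not an exhaustive list.

```typescript
// Hypothetical helper: extract the Java major version from the output of
// `java -version` or `java --version`. Returns null if nothing matches.
function parseJavaMajor(output: string): number | null {
  const m = /(?:version\s+"|openjdk\s+|java\s+)(\d+)(?:\.(\d+))?/.exec(output);
  if (m === null) return null;
  const first = Number(m[1]);
  // Legacy scheme: "1.8.0_392" means Java 8.
  return first === 1 && m[2] !== undefined ? Number(m[2]) : first;
}
```

A test could then compare the parsed value with the JDK that JAVA_HOME is expected to point at, instead of accepting whichever java happens to be first on PATH.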


2. chroot-package-managers.test.ts

Purpose: Validates that package managers can perform network operations through the firewall with proper domain whitelisting. Tests both online (with allowed domains) and offline behaviors.

Test Cases

pip (Python)

| Test | Domains Allowed | What It Validates |
|---|---|---|
| pip list | pypi.org, files.pythonhosted.org | Lists installed packages (verifies pip can read local package DB) |
| pip index versions | pypi.org, files.pythonhosted.org | Queries PyPI registry through firewall |
| pip show pip | localhost only | Shows package info without network (offline capability) |

npm (Node.js)

| Test | Domains Allowed | What It Validates |
|---|---|---|
| npm config list | registry.npmjs.org | npm configuration accessible |
| npm view chalk version | registry.npmjs.org | npm queries registry through firewall |
| npm view (blocked) | localhost only | npm registry access is blocked without domain whitelisting |

Rust (cargo)

| Test | Domains Allowed | What It Validates |
|---|---|---|
| cargo version | crates.io, static.crates.io, index.crates.io | Cargo binary accessible via chroot |
| cargo search serde | crates.io, static.crates.io, index.crates.io | Cargo can search crates.io through firewall |
| rustc version | localhost only | rustc binary accessible (offline) |

Java (maven)

| Test | Domains Allowed | What It Validates |
|---|---|---|
| java version | localhost only | Java runtime accessible |
| javac version | localhost only | Java compiler accessible |
| mvn version | repo.maven.apache.org, repo1.maven.org | Maven binary accessible with repository domains |

.NET (dotnet/nuget)

| Test | Domains Allowed | What It Validates |
|---|---|---|
| dotnet list-sdks | localhost only | SDK listing works offline |
| dotnet list-runtimes | localhost only | Runtime listing works offline |
| dotnet create + build | api.nuget.org, nuget.org, dotnetcli.azureedge.net | Full project lifecycle with NuGet restore |
| dotnet restore (blocked) | localhost only | NuGet restore fails without domain whitelisting |

Ruby (gem/bundler)

| Test | Domains Allowed | What It Validates |
|---|---|---|
| ruby version | localhost only | Ruby binary accessible |
| gem list (local) | localhost only | Lists locally installed gems |
| gem version | rubygems.org, index.rubygems.org | gem binary accessible with registry domains |
| bundler version | rubygems.org, index.rubygems.org | Bundler binary accessible |
| gem search rails | rubygems.org, index.rubygems.org | gem can search rubygems.org through firewall |

Go modules

| Test | Domains Allowed | What It Validates |
|---|---|---|
| go env GOPATH GOPROXY | proxy.golang.org, sum.golang.org | Go module proxy configuration correct |
| go mod init + tidy | localhost only | Go module initialization works offline |

Real-World Mapping

| Test Area | Real-World Scenario |
|---|---|
| pip + PyPI | Copilot/Claude agents running pip install for Python dependencies in AI-generated code |
| npm + registry | Agents running npm install for JS/TS projects; Copilot CLI itself needs npm |
| cargo + crates.io | Agents building Rust projects, adding dependencies with cargo add |
| maven | Agents building Java enterprise projects with Maven |
| dotnet + NuGet | Agents building .NET projects, adding NuGet packages |
| gem + rubygems | Agents working with Ruby projects, installing gems |
| go modules | Agents working with Go projects, fetching module dependencies |
| Blocking tests | Ensures firewall actually blocks unauthorized network access - critical security property |

Gaps and Missing Coverage

  1. No pip install test - Tests query PyPI index but never actually install a package. pip install requests through the firewall would be a more realistic test.

  2. No npm install test - Tests npm view but never npm install. Real agents always install packages.

  3. No cargo build/add test - Tests cargo search but never cargo add or cargo build with dependencies.

  4. No Gradle test - Maven is tested but Gradle (also very common in Java) is completely absent. entrypoint.sh even pre-seeds ~/.gradle/gradle.properties for proxy config but this is never tested.

  5. No sbt/Scala test - JVM proxy flags are set via JAVA_TOOL_OPTIONS for sbt but never tested.

  6. No pip blocking test - npm and .NET have explicit "blocked without domain" tests, but pip does not. There's no test verifying that pip install fails when PyPI is not whitelisted.

  7. No cargo blocking test - Same gap as pip - no test verifying cargo is blocked without crates.io domains.

  8. No gem install test - Tests gem search but never gem install. Real-world Ruby workflows install gems.

  9. Soft failure pattern - Multiple tests use if (result.exitCode === 0) or if (result.success) guards, meaning the test passes even on failure. This is appropriate for CI flakiness tolerance but masks real regressions.

  10. No proxy configuration verification - Tests verify tools can reach registries but don't verify proxy env vars are correctly set. A test checking echo $HTTP_PROXY would confirm proxy configuration.


3. chroot-edge-cases.test.ts

Purpose: Validates edge cases, security features, error handling, and shell compatibility within the chroot environment.

Test Cases

General Checks (batched)

| Test | Command | What It Validates |
|---|---|---|
| PATH preserved | echo $PATH | PATH includes /usr/bin and /bin |
| HOME set | echo $HOME | HOME env var points to a valid path |
| /usr readable | ls /usr/bin | Host /usr/bin accessible through chroot |
| /etc readable | cat /etc/passwd | Host /etc/passwd accessible (contains "root") |
| /tmp writable | Write + read + delete in /tmp | Temp directory is writable |
| Docker socket hidden | Check /var/run/docker.sock | Docker socket is NOT accessible (security) |
| NET_ADMIN dropped | iptables -L | Cannot list iptables rules (permission denied) |
| chroot prevented | chroot / /bin/true | Cannot use chroot command (capability dropped) |
| Shell pipes | echo "hello" \| grep hello | Pipe operator works in chroot |
| Shell redirect | Write via > and read back | Redirection works in chroot |
| Command substitution | echo "Today is $(date +%Y)" | $() substitution works |
| Compound commands | echo "first" && echo "second" && echo "third" | && chaining works |
| Non-root user | id -u | UID is not 0 (running as non-root) |
| Username set | whoami | Username is not "root" |

Working Directory Handling (individual tests)

| Test | What It Validates |
|---|---|
| Respect container-workdir | pwd with containerWorkDir: '/tmp' returns /tmp |
| Fallback for nonexistent dir | pwd with nonexistent containerWorkDir falls back to home |

Exit Code Propagation (individual tests)

| Test | What It Validates |
|---|---|
| Exit code 0 | exit 0 propagates correctly |
| Exit code 1 | exit 1 propagates correctly |
| Failed command | false returns exit code 1 |
| Command not found | nonexistent_command_xyz123 returns exit code 127 |

Network Firewall Enforcement (individual tests)

| Test | What It Validates |
|---|---|
| Allow HTTPS | curl -s -o /dev/null -w "%{http_code}" https://api.github.com succeeds with whitelisted domain |
| Block HTTPS | curl -s --connect-timeout 5 https://example.com fails when example.com not whitelisted |
| Block HTTP | curl -f --connect-timeout 5 http://example.com fails when example.com not whitelisted |

Real-World Mapping

| Test Area | Real-World Scenario |
|---|---|
| PATH/HOME | Every agent command depends on correct environment variables |
| /usr, /etc access | Agents need host binaries and system configs |
| /tmp writable | Build tools, compilers, and agents use temp files extensively |
| Docker socket hidden | Prevents agents from escaping the firewall by spawning unrestricted containers |
| Capability drop | Prevents agents from modifying iptables to bypass firewall |
| Shell features | Agents execute complex shell commands with pipes, redirects, and substitution |
| Non-root execution | Security requirement - agents must not run as root |
| Working directory | --container-workdir sets where agent commands execute (typically the repo checkout) |
| Exit codes | AWF must faithfully propagate agent exit codes for CI/CD pass/fail determination |
| Network enforcement | Core firewall functionality - allow whitelisted, block everything else |

Gaps and Missing Coverage

  1. No --env passthrough test - Test for custom environment variables is explicitly skipped (test.skip). This is a significant gap since --env is a real CLI feature.

  2. No SYS_ADMIN capability drop test - Tests verify NET_ADMIN and SYS_CHROOT are dropped but don't test SYS_ADMIN (which is dropped in chroot mode per entrypoint.sh).

  3. No signal handling test - No test for SIGTERM/SIGINT propagation. The entrypoint has explicit signal handling (trap cleanup_and_exit TERM INT) but this is never tested.

  4. No symlink resolution test - Chroot mode relies on host symlinks resolving correctly inside the chroot (e.g., /lib -> usr/lib on merged-/usr distributions). No test verifies symlinks work correctly.

  5. No large output test - No test for commands producing large stdout/stderr, which could test buffer handling.

  6. No credential hiding test - The selective mounting hides credential files via /dev/null overlays, but no test verifies that cat ~/.docker/config.json or cat ~/.ssh/id_rsa returns empty/fails.

  7. No DNS resolution test - DNS configuration is complex in chroot mode (resolv.conf backup/restore, Docker embedded DNS + external DNS). No test verifies DNS queries resolve correctly.

  8. No concurrent process test - No test running multiple processes simultaneously in the chroot, which could reveal issues with /proc, temp files, or resource sharing.

  9. No signal exit code test - Tests check exit codes 0, 1, and 127, but not the 128+N codes a shell reports when a process is killed by signal N (e.g., 143 for SIGTERM).

  10. No timeout propagation test - No test verifying that AWF's timeout mechanism works and propagates correctly.
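
To make the 128+N convention from gap 9 concrete, the sketch below decodes a shell exit code into a readable description. describeExitCode is a hypothetical helper and the signal table covers only a few common Linux signal numbers.

```typescript
// Hypothetical helper: interpret a shell exit code, including 128+N signal codes.
const SIGNALS: Record<number, string> = {
  1: "SIGHUP",
  2: "SIGINT",
  9: "SIGKILL",
  15: "SIGTERM",
};

function describeExitCode(code: number): string {
  if (code === 0) return "success";
  if (code === 126) return "command found but not executable";
  if (code === 127) return "command not found";
  if (code > 128 && code <= 128 + 64) {
    const n = code - 128;
    return `terminated by ${SIGNALS[n] ?? `signal ${n}`}`;
  }
  return `failed with exit code ${code}`;
}

// 128 + 15 (SIGTERM) = 143
const sigtermDescription = describeExitCode(143); // "terminated by SIGTERM"
```

A signal-propagation test could assert that AWF exits with 143 after the agent receives SIGTERM, alongside the existing checks for 0, 1, and 127.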


4. chroot-copilot-home.test.ts

Purpose: Verifies that the GitHub Copilot CLI can access and write to ~/.copilot directory in chroot mode. Essential for package extraction, configuration storage, and log management.

Test Cases (all batched, single container)

| Test | Command | What It Validates |
|---|---|---|
| Write to ~/.copilot | Create dir + write file + read back | Basic write access to ~/.copilot |
| Nested directories | Create ~/.copilot/pkg/linux-x64/0.0.405/marker.txt | Deep directory creation (mimics Copilot package extraction) |
| Permissions | touch + rm in ~/.copilot | File creation and deletion work (correct ownership) |

Real-World Mapping

| Test | Real-World Scenario |
|---|---|
| Write file | Copilot CLI writes configuration files on first run |
| Nested directories | Copilot CLI extracts bundled packages to ~/.copilot/pkg/<platform>/<version>/ |
| Permissions | Copilot CLI needs to manage its own files (create, update, delete) |

Gaps and Missing Coverage

  1. No file persistence test - Tests write and read within the same invocation. No test verifies files persist between AWF invocations (which they should, as ~/.copilot is bind-mounted from host).

  2. No ~/.copilot/logs test - Copilot CLI writes logs to ~/.copilot/logs/ which is separately mounted (${config.workDir}/agent-logs:${effectiveHome}/.copilot/logs:rw). No test verifies log writing works.

  3. No ownership/UID test - Files should be owned by the AWF user (not root). No test checks ls -la ~/.copilot/test/file.txt for correct ownership.

  4. No concurrent write test - No test for atomic file writes (important for config files).

  5. No symlink within ~/.copilot test - Copilot may create symlinks; no test verifies this works.

  6. No .claude.json creation test - entrypoint.sh creates ~/.claude.json when CLAUDE_CODE_API_KEY_HELPER is set. This is never tested.

  7. No other home subdirectory tests - ~/.cache, ~/.config, ~/.local, ~/.anthropic, ~/.claude are all mounted but only ~/.copilot is tested for write access.
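
For the ownership gap (item 3), one option is to run GNU stat inside the chroot and compare numeric owners against the expected UID. The sketch below assumes the command stat -c '%u:%g %n' was run on the files of interest; parseStatLine and filesNotOwnedBy are hypothetical helper names.

```typescript
// Hypothetical helpers: parse `stat -c '%u:%g %n' <files>` output and
// report paths whose owner differs from the expected UID.
interface Ownership { uid: number; gid: number; path: string; }

function parseStatLine(line: string): Ownership | null {
  const m = /^(\d+):(\d+) (.+)$/.exec(line.trim());
  return m === null ? null : { uid: Number(m[1]), gid: Number(m[2]), path: m[3] };
}

function filesNotOwnedBy(statOutput: string, expectedUid: number): string[] {
  const offenders: string[] = [];
  for (const line of statOutput.split("\n")) {
    const entry = parseStatLine(line);
    if (entry !== null && entry.uid !== expectedUid) offenders.push(entry.path);
  }
  return offenders;
}
```

An empty result would indicate that every listed file is owned by the remapped user. A root-owned entry (UID 0) would point at a problem in the UID/GID remapping.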


5. chroot-procfs.test.ts

Purpose: Validates the dynamic /proc filesystem mount in chroot mode. This is a regression test for commit dda7c67 which replaced a static /proc/self bind mount with mount -t proc.

Background

Without the dynamic proc mount:

  • .NET CLR fails: "Cannot execute dotnet when renamed to bash"
  • JVM misreads /proc/self/exe and /proc/cpuinfo
  • Rustup proxy binaries appear as bash instead of the actual binary

Test Cases

Batch 1: Quick /proc checks (single container)

| Test | Command | What It Validates |
|---|---|---|
| /proc/self/exe resolves | readlink /proc/self/exe | Returns a real path (not "bash") |
| Different binaries differ | bash -c "readlink ..." vs python3 -c "readlink ..." | Different binaries see different /proc/self/exe |
| /proc/cpuinfo | cat /proc/cpuinfo \| head -10 | CPU info accessible (needed by JVM, .NET GC) |
| /proc/meminfo | cat /proc/meminfo \| head -5 | Memory info accessible (needed by JVM, .NET GC) |
| /proc/self/status | cat /proc/self/status \| head -5 | Process status accessible |

Batch 2: Java /proc tests (single container)

| Test | Command | What It Validates |
|---|---|---|
| Java reads /proc/self/exe | Java program reads /proc/self/exe via Files.readSymbolicLink | JVM sees itself as "java", not "bash" |
| Java availableProcessors | Java program reads Runtime.availableProcessors() | JVM correctly reads /proc/cpuinfo for CPU count |

Real-World Mapping

| Test | Real-World Scenario |
|---|---|
| /proc/self/exe resolution | .NET CLR reads /proc/self/exe to find itself (required for startup). JVM reads it for identity. Rustup proxy reads it to determine which tool to invoke. |
| /proc/cpuinfo | JVM uses CPU count for thread pool sizing. .NET GC uses it for heap sizing. |
| /proc/meminfo | JVM and .NET use memory info for heap/GC configuration. |
| Different binary resolution | Ensures the procfs mount is truly dynamic (not cached from parent shell) |
| Java /proc/self/exe | Specific regression test - JVM was misidentifying itself as bash, causing startup issues |

Gaps and Missing Coverage

  1. No .NET /proc/self/exe test - .NET was the original motivation for the fix, but only Java has a /proc/self/exe verification test. A dotnet program reading /proc/self/exe would be valuable.

  2. No Rust/rustup /proc/self/exe test - Rustup proxies use /proc/self/exe to determine which tool to invoke. No test verifies this.

  3. No /proc/self/environ test - The one-shot-token security feature unsets sensitive tokens from /proc/1/environ. No test verifies tokens are actually cleared.

  4. No /proc/self/maps test - Some runtimes read memory maps; not tested.

  5. No /proc isolation test - The dynamic proc mount should be container-scoped (only container processes visible). No test verifies that host PIDs are NOT visible.

  6. No /proc/self/fd test - File descriptor access via /proc is used by some tools; not tested.

  7. No Node.js /proc test - Node.js uses /proc for certain operations (e.g., process.memoryUsage(), os.cpus()). No test verifies Node's /proc access.

  8. Soft failure pattern on Java tests - Both Java /proc tests use if (r.exitCode === 0) guard, meaning they pass even if Java compilation fails.
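
For the token-clearing gap (item 3), a test could read /proc/1/environ inside the container and check that no sensitive variable still carries a value. The sketch below only covers the parsing step. leakedTokens is a hypothetical helper, and the variable names passed to it are up to the caller, since this document does not list which tokens AWF clears.

```typescript
// Hypothetical helper: /proc/<pid>/environ is a NUL-separated list of
// KEY=value entries. Return the sensitive names that still have a value.
function leakedTokens(environ: string, sensitiveNames: string[]): string[] {
  const present: string[] = [];
  for (const entry of environ.split("\0")) {
    const eq = entry.indexOf("=");
    if (eq <= 0) continue; // skip empty or malformed entries
    const name = entry.slice(0, eq);
    const value = entry.slice(eq + 1);
    if (sensitiveNames.indexOf(name) !== -1 && value.length > 0) present.push(name);
  }
  return present;
}
```

An empty array would mean none of the named variables are readable from the process environment. Any returned name identifies a token that survived.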


Cross-File Gap Analysis

High-Priority Missing Tests

| Gap | Severity | Affected Scenarios |
|---|---|---|
| Credential hiding verification | Critical | No test verifies /dev/null overlays on ~/.docker/config.json, ~/.ssh/id_rsa, etc. Prompt injection defense is untested. |
| Signal handling (SIGTERM/SIGINT) | High | No test for graceful shutdown and cleanup. Real AWF runs in CI with timeout which sends SIGTERM. |
| DNS resolution in chroot | High | Complex DNS setup (resolv.conf backup/restore, Docker embedded DNS) is completely untested. |
| Package installation (pip/npm/cargo) | High | Tests only query registries but never install packages. Real agents install packages constantly. |
| --env passthrough | Medium | Skipped test. Custom env vars are a core feature for passing API keys to agents. |
| One-shot token protection | Medium | /proc/1/environ token clearing is never tested. Security feature with no regression test. |
| Bun runtime | Medium | Explicitly supported in entrypoint.sh but never tested. |
| Gradle build tool | Medium | Proxy config pre-seeded by entrypoint.sh but never tested. |
| ~/.claude.json creation | Medium | Created by entrypoint.sh for Claude Code API auth but never tested. |

Test Pattern Issues

  1. Soft failure masking - Many tests use if (result.success) or if (r.exitCode === 0) guards that silently pass on failure. While appropriate for CI flakiness, these should at minimum log a warning when the underlying check is skipped.

  2. No negative security tests - Security features (capability drop, Docker socket hiding, credential hiding) lack comprehensive negative testing. Among the capability drops, only NET_ADMIN and SYS_CHROOT are verified; SYS_ADMIN is not, and credential hiding has no test at all.

  3. No cleanup verification - entrypoint.sh has extensive cleanup logic (resolv.conf restoration, hosts file cleanup, script file deletion). None of this is tested.

  4. No --mount custom volume test - Custom volume mounts passed via --mount flag are never tested in chroot context.
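
One way to address the soft-failure pattern in item 1 is to make the three possible outcomes explicit, so that a skipped check is reported rather than counted as a pass. The sketch below is illustrative only: CommandResult, Outcome, and evaluate are hypothetical names, not part of the project's test helpers.

```typescript
// Hypothetical sketch: separate "tool missing" (legitimate skip) from
// "command failed" (real failure) instead of guarding with if (result.success).
interface CommandResult { success: boolean; exitCode: number; stdout: string; }

type Outcome =
  | { status: "passed" }
  | { status: "failed"; reason: string }
  | { status: "skipped"; reason: string };

function evaluate(
  result: CommandResult,
  toolAvailable: boolean,
  check: (r: CommandResult) => boolean,
): Outcome {
  if (!toolAvailable) {
    return { status: "skipped", reason: "tool not installed on this runner" };
  }
  if (!result.success) {
    return { status: "failed", reason: `command exited with ${result.exitCode}` };
  }
  return check(result)
    ? { status: "passed" }
    : { status: "failed", reason: "unexpected output" };
}
```

With this shape, a suite can log or count every "skipped" outcome, so that a run that skipped its checks is distinguishable from one that verified them.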

Recommended New Tests

  1. Credential exfiltration test - Verify cat ~/.docker/config.json, cat ~/.ssh/id_rsa, cat ~/.config/gh/hosts.yml all return empty or fail.
  2. Package install test - pip install requests, npm install chalk, cargo add serde through the firewall.
  3. DNS resolution test - nslookup github.com or dig github.com inside the chroot.
  4. Signal propagation test - Send SIGTERM to AWF process, verify cleanup runs.
  5. --env passthrough test - Pass custom env var, verify it's accessible in chroot.
  6. Token clearing test - Verify /proc/1/environ doesn't contain sensitive tokens after agent starts.
  7. Bun runtime test - bun --version and bun run inside chroot.
  8. Gradle proxy test - Verify ~/.gradle/gradle.properties contains proxy settings.
  9. .claude.json test - Set CLAUDE_CODE_API_KEY_HELPER, verify file is created correctly.
  10. Home subdirectory write tests - Verify ~/.cache, ~/.config, ~/.local are writable.
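
As a sketch of how the credential exfiltration check in item 1 might be evaluated, the helper below takes the results of attempting to read each credential file and returns the paths that leaked content. ReadAttempt and exposedCredentials are hypothetical names; how the reads are executed (for example, through a batched cat) is left to the test harness.

```typescript
// Hypothetical helper: a credential file counts as hidden if reading it
// fails or yields no content (what a /dev/null overlay would produce).
interface ReadAttempt { path: string; exitCode: number; stdout: string; }

function exposedCredentials(attempts: ReadAttempt[]): string[] {
  return attempts
    .filter((a) => a.exitCode === 0 && a.stdout.trim().length > 0)
    .map((a) => a.path);
}

const exposed = exposedCredentials([
  { path: "~/.docker/config.json", exitCode: 0, stdout: "" },  // empty read: hidden
  { path: "~/.ssh/id_rsa", exitCode: 1, stdout: "" },          // read failed: hidden
]);
// exposed is [] here; any returned path means the file's content was readable.
```

The test would assert that the returned list is empty for every credential path the selective mounting is meant to hide.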