Dive into Claude Code

June 19, 2026 · View on GitHub

High-level system structure of Claude Code

Paper arXiv License Stars

English | 中文

A comprehensive source-level architectural analysis of Claude Code (v2.1.88, ~1,900 TypeScript files, ~512K lines of code), combined with a curated collection of community analyses, a design-space guide for agent builders, and cross-system comparisons.

Tip

TL;DR -- Only 1.6% of Claude Code's codebase is AI decision logic. The other 98.4% is deterministic infrastructure -- permission gates, context management, tool routing, and recovery logic. The agent loop is a simple while-loop; the real engineering complexity lives in the systems around it. This repo dissects that architecture and distills it into actionable design guidance for anyone building AI agent systems.


Table of Contents

From Our Paper

Beyond the Paper


Key Highlights

  • 98.4% Infrastructure, 1.6% AI -- The agent loop is a simple while-loop; the real complexity is permission gates, context management, and recovery logic.
  • 5 Values → 13 Principles → Implementation -- Every design choice traces back to human authority, safety, reliability, capability, and adaptability.
  • Defense in Depth with Shared Failure Modes -- 7 safety layers, but all share performance constraints. 50+ subcommands bypass security analysis.
  • 4 CVEs Reveal a Pre-Trust Window -- Extensions execute before the trust dialog appears.
  • The Cross-Cutting Harness Resists Reimplementation -- The loop is easy to copy; hooks, classifier, compaction, and isolation are not.

Reading Guide

If you are a...Start hereThen read
Agent BuilderBuild Your Own AgentArchitecture Deep Dive
Security ResearcherSafety and PermissionsArchitecture: Safety Layers
Product ManagerKey HighlightsValues and Principles
ResearcherFull Paper (arXiv)Community Resources

1,884 files · ~512K lines · v2.1.88 · 7 safety layers · 5 compaction stages · 54 tools · 27 hook events · 4 extension mechanisms · 7 permission modes


Architecture at a Glance

Claude Code answers four design questions that every production coding agent must face:

QuestionClaude Code's Answer
Where does reasoning live?Model reasons; harness enforces. ~1.6% AI, 98.4% infrastructure.
How many execution engines?One queryLoop for all interfaces (CLI, SDK, IDE).
Default safety posture?Deny-first: deny > ask > allow. Strictest rule wins.
Binding resource constraint?~200K (older models) / 1M (Claude 4.6 series) context window. 5 compaction layers before every model call.

The system decomposes into 7 components (User → Interfaces → Agent Loop → Permission System → Tools → State & Persistence → Execution Environment) across 5 architectural layers.

5-layer subsystem decomposition

Note

For the full architectural deep dive -- 7 safety layers, 9-step turn pipeline, 5-layer compaction, and more -- see docs/architecture.md.

↑ Back to top


Values and Design Principles

The architecture traces from 5 human values through 13 design principles to implementation:

ValueCore Idea
Human Decision AuthorityHumans retain control via principal hierarchy. When a 93% prompt-approval rate revealed approval fatigue, response was restructured boundaries, not more warnings.
Safety, Security, PrivacySystem protects even when human vigilance lapses. 7 independent safety layers.
Reliable ExecutionDoes what was meant. Gather-act-verify loop. Graceful recovery.
Capability Amplification"A Unix utility, not a product." 98.4% is deterministic infrastructure enabling the model.
Contextual AdaptabilityCLAUDE.md hierarchy, graduated extensibility, trust trajectories that evolve over time.
The 13 Design Principles
PrincipleDesign Question
Deny-first with human escalationShould unrecognized actions be allowed, blocked, or escalated?
Graduated trust spectrumFixed permission level, or spectrum users traverse over time?
Defense in depthSingle safety boundary, or multiple overlapping ones?
Externalized programmable policyHardcoded policy, or externalized configs with lifecycle hooks?
Context as scarce resourceSingle-pass truncation or graduated pipeline?
Append-only durable stateMutable state, snapshots, or append-only logs?
Minimal scaffolding, maximal harnessInvest in scaffolding or operational infrastructure?
Values over rulesRigid procedures or contextual judgment with deterministic guardrails?
Composable multi-mechanism extensibilityOne API or layered mechanisms at different costs?
Reversibility-weighted risk assessmentSame oversight for all, or lighter for reversible actions?
Transparent file-based config and memoryOpaque DB, embeddings, or user-visible files?
Isolated subagent boundariesShared context/permissions, or isolation?
Graceful recovery and resilienceFail hard, or recover silently?

The paper also applies a sixth evaluative lens -- long-term capability preservation -- citing evidence that developers in AI-assisted conditions score 17% lower on comprehension tests.

↑ Back to top


The Agentic Query Loop

Runtime turn flow

The core is a ReAct-pattern while-loop: assemble context → call model → dispatch tools → check permissions → execute → repeat. Implemented as an AsyncGenerator yielding streaming events.

Before every model call, five compaction shapers run sequentially (cheapest first): Budget Reduction → Snip → Microcompact → Context Collapse → Auto-Compact.

9-step pipeline per turn: Settings resolution → State init → Context assembly → 5 pre-model shapers → Model call → Tool dispatch → Permission gate → Tool execution → Stop condition

Two execution paths:

  • StreamingToolExecutor -- begins executing tools as they stream in (latency optimization)
  • Fallback runTools -- classifies tools as concurrent-safe or exclusive

Recovery: Max output token escalation (3 retries), reactive compaction (once per turn), prompt-too-long handling, streaming fallback, fallback model

5 stop conditions: No tool use, max turns, context overflow, hook intervention, explicit abort

↑ Back to top


Safety and Permissions

Permission gate

7 permission modes form a graduated trust spectrum: plandefaultacceptEditsauto (ML classifier) → dontAskbypassPermissions (+ internal bubble).

Deny-first: A broad deny always overrides a narrow allow. 7 independent safety layers from tool pre-filtering through shell sandboxing to hook interception. Permissions are never restored on resume -- trust is re-established per session.

Warning

Shared failure modes: Defense-in-depth degrades when layers share constraints. Per-subcommand parsing causes event-loop starvation -- commands exceeding 50 subcommands bypass security analysis entirely to prevent the REPL from freezing.

More details: authorization pipeline, auto-mode classifier, CVEs

Authorization pipeline: Pre-filtering (strip denied tools) → PreToolUse hooks → Deny-first rule evaluation → Permission handler (4 branches: coordinator, swarm worker, speculative classifier, interactive)

Auto-mode classifier (yoloClassifier.ts): Separate LLM call with internal/external permission templates. Two-stage: fast-filter + chain-of-thought.

Pre-trust execution window: 2 patched CVEs share this root cause -- hooks and MCP servers execute during initialization before the trust dialog appears, creating a structurally privileged attack window outside the deny-first pipeline.

↑ Back to top


Extensibility

Three injection points: assemble, model, execute

Four mechanisms at graduated context costs: Hooks (zero) → Skills (low) → Plugins (medium) → MCP (high). Three injection points in the agent loop: assemble() (what the model sees), model() (what it can reach), execute() (whether/how actions run).

Tool pool assembly (5-step): Base enumeration (up to 54 tools) → Mode filtering → Deny pre-filtering → MCP integration → Deduplication

27 hook events across 5 categories with 4 execution types (shell, LLM-evaluated, webhook, subagent verifier)

Plugin manifest accepts 10 component types: commands, agents, skills, hooks, MCP servers, LSP servers, output styles, channels, settings, user config

Skills: SKILL.md with 15+ YAML frontmatter fields. Key difference -- SkillTool injects into current context; AgentTool spawns isolated context.

↑ Back to top


Context and Memory

Context construction

9 ordered sources build the context window. CLAUDE.md instructions are delivered as user context (probabilistic compliance), not system prompt (deterministic). Memory is file-based (no vector DB) -- fully inspectable, editable, version-controllable.

4-level CLAUDE.md hierarchy: Managed (/etc/) → User (~/.claude/) → Project (CLAUDE.md, .claude/rules/) → Local (CLAUDE.local.md, gitignored)

5-layer compaction (graduated lazy-degradation): Budget reduction → Snip → Microcompact → Context Collapse (read-time projection, non-destructive) → Auto-Compact (full model summary, last resort)

Memory retrieval: LLM-based scan of memory-file headers, selects up to 5 relevant files. No embeddings, no vector similarity.

↑ Back to top


Subagent Delegation

Subagent architecture

6 built-in types (Explore, Plan, General-purpose, Guide, Verification, Statusline) + custom agents via .claude/agents/*.md. Sidechain transcripts: only summaries return to parent (parent's context is protected from subagent verbosity). Three isolation modes: worktree, remote, in-process. Coordination via POSIX flock().

SkillTool vs AgentTool: SkillTool injects into current context (cheap). AgentTool spawns isolated context (expensive, but prevents context explosion).

Permission override: Subagent permissionMode applies UNLESS parent is in bypassPermissions/acceptEdits/auto (explicit user decisions always take precedence).

Custom agents: YAML frontmatter supports tools, disallowedTools, model, effort, permissionMode, mcpServers, hooks, maxTurns, skills, memory scope, background flag, isolation mode.

↑ Back to top


Session Persistence

Session persistence and context compaction

Three channels: append-only JSONL transcripts, global prompt history, subagent sidechains. Permissions never restored on resume -- trust is re-established per session. Design favors auditability over query power.

Chain patching: Compact boundaries record headUuid/anchorUuid/tailUuid. The session loader patches the message chain at read time. Nothing is destructively edited on disk.

Checkpoints: File-history checkpoints for --rewind-files, stored at ~/.claude/file-history/<sessionId>/.

↑ Back to top


New Signals in the Agent Design Space

New agent-system developments reinforce the same lesson surfaced by Claude Code: agent capability is not a model property alone. It emerges from the runtime, context layer, execution boundary, tool supply chain, human control surface, and evaluation loop around the model.

Design ImplicationWhat it means for agent buildersRepresentative signals
Runtime and control plane are first-class design concernsDurable execution, checkpoints, sandboxes, agent inventory, policy, and observability should be designed as user-visible system surfaces, not hidden deployment plumbing.Cursor cloud agents, Google Managed Agents, Microsoft Agent 365
Context is managed infrastructurePrompts, files, skills, IDE indexes, workspace state, memory namespaces, and interpreter state need lifecycle, provenance, review, and rollback.LangChain Context Hub, AWS AgentCore, Anthropic managed-agent memory
Execution boundary is the safety boundaryPermissions, network reachability, filesystem access, credential custody, tenant isolation, and OS sandboxing are core architecture, not late-stage hardening.Codex Windows sandbox, Running Codex safely, Anthropic self-hosted sandboxes
Tools and skills are a supply chainMCP servers, skills, plugins, and agent-to-agent protocols need registries, allowlists, identity, semantic review, versioning, and revocation.NSA MCP security, GitHub MCP allowlists, A2A milestone
Humans become managers and verifiersAgent products should support goals, plans, approvals, interrupts, reviewable diffs, escalation, and constrained multi-agent write authority.Codex from anywhere, Copilot cloud agent, Cognition multi-agents
Observability must close the improvement loopTraces should feed evaluation, failure clustering, policy enforcement, and prompt/tool repair rather than ending as passive logs.LangSmith Engine, OpenAI agent improvement loop, AWS AgentCore Evaluations

These signals do not replace Claude Code's design space; they make its boundaries clearer. The agent loop is the small part. The harness around it is where most capability, safety, and reliability decisions now live. For month-level source notes, see docs/agent-design-space-source-notes_zh.md.

↑ Back to top


Build Your Own AI Agent: A Design Guide

Not a coding tutorial. A guide to the design decisions you must make, derived from architectural analysis.

Every production agent must navigate these decisions:

DecisionThe QuestionKey Insight
Reasoning placementHow much logic in the model vs. harness?As models converge in capability, the harness becomes the differentiator.
Safety postureHow do you prevent harmful actions?Defense-in-depth fails when layers share failure modes.
Context managementWhat does the model see?Design for context scarcity from day one. Graduated > single-pass.
ExtensibilityHow do extensions plug in?Not all extensions need to consume context tokens.
Subagent architectureShared or isolated context?Agent teams in plan mode cost ~7× tokens. Subagent summary-only returns prevent context blow-up.
Session persistenceWhat carries over?Never restore permissions on resume. Auditability > query power.

Read the full guide: docs/build-your-own-agent.md

↑ Back to top


Cross-System Comparison: Claude Code vs OpenClaw vs Hermes-Agent

The same recurring design questions admit different architectural answers when the deployment context changes. The table below contrasts Claude Code v2.1.88 with two notable peers — OpenClaw, a local-first multi-channel personal-assistant gateway, and NousResearch/hermes-agent, a self-improving multi-deployment agent — across the six design dimensions Section 10 of the paper uses for the OpenClaw comparison. Cells are source-grounded; this is not a feature scoreboard.

Design DimensionClaude Code (v2.1.88) StarOpenClaw StarHermes-Agent Star
System scope & deploymentPer-user CLI / SDK / IDE interface for coding; one queryLoop async generator across entry points.Local-first WebSocket gateway (default port 18789, loopback-bound by default; other binds available); routes ~23 messaging surfaces to an embedded agent runtime; companion apps for macOS, iOS, Android.Three entry points: hermes (interactive CLI), hermes-agent (programmatic runtime), hermes-acp (ACP server); gateway adapters route messages to per-session AIAgent instances cached LRU-style (max 128, 1 h idle TTL); also runs as MCP server via hermes mcp serve.
Trust model & securityDeny-first per-action evaluation; 7 permission modes; LLM-based auto-mode classifier (yoloClassifier / sideQuery); session-scoped permission state (session bypass flag, app allowlist state) is not restored on resume.Single trusted operator per gateway; DM pairing codes, sender allowlists, gateway authentication; per-agent allow / deny tool policy; opt-in sandboxing via Docker / SSH / OpenShell, off by default; non-main mode sandboxes only non-main sessions; hostile multi-tenant isolation explicitly not supported.Dangerous-command pattern detection with per-session approval state; CLI interactive prompts and gateway async prompts; auxiliary-LLM smart approval auto-approves low-risk commands; permanent allowlist persisted in config.yaml; subagent worker threads default to auto-deny dangerous commands (opt-in subagent_auto_approve for batch / cron runs).
Agent runtime & toolsSingle queryLoop async generator with streamed event yields; environment- and feature-gated tool registry; before-API compaction (Snip, Microcompact, Context Collapse, Auto-Compact) runs conditionally, with Auto-Compact first attempting session-memory compaction.Embedded agent runtime inside the gateway's RPC dispatch (the agent RPC validates parameters, accepts immediately, runs asynchronously, and streams lifecycle / stream events back over the gateway protocol); per-session queue serialization with an optional global lane.While-loop with explicit per-turn iteration budget and grace-call slot; per-turn checkpoint dedup; gateway step_callback hook fires on each iteration; auxiliary-model context compression summarizes middle turns while protecting head and tail.
Extension architectureFour mechanisms at graduated context cost: hooks → skills → plugins → MCP; 27 hook events; 10 plugin component types.Manifest-first plugin system with 12 documented capability categories; central registry exposes tools, channels, provider setup, hooks, HTTP routes, CLI commands, services; separate skills layer with multiple sources (workspace highest precedence) plus the ClawHub public registry; openclaw mcp provides both an MCP server surface and an outbound client registry for other MCP servers.12 bundled plugins under plugins/ (context_engine, disk-cleanup, example-dashboard, google_meet, hermes-achievements, image_gen, kanban, memory, observability, platforms, spotify, strike-freedom-cockpit); MCP server (mcp_serve.py) exposes 10 tools; ACP adapter (acp_adapter/) exposes Hermes as an ACP server.
Memory & context4-level CLAUDE.md hierarchy; before-API compaction (Snip, Microcompact, Context Collapse, Auto-Compact); LLM-based selection from file-based Markdown memory files.Workspace bootstrap files (AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md) plus conditional BOOTSTRAP.md / HEARTBEAT.md / MEMORY.md; separate memory system (MEMORY.md, daily notes under memory/YYYY-MM-DD.md, optional DREAMS.md); hybrid vector + keyword search when an embedding provider is configured; experimental dreaming for long-term promotion; pluggable compaction providers.SQLite state store with FTS5 full-text search and WAL-mode concurrent readers; sessions linked by parent_session_id chains for compression-triggered splits; 8 swappable memory backends under plugins/memory/ (byterover, hindsight, holographic, honcho, mem0, openviking, retaindb, supermemory); auxiliary-LLM compression as a separate context-management layer.
Multi-agent architectureSub-agent delegation via sidechain transcripts; 6 built-in agent definitions (availability conditional on build / mode) plus custom; a single summary message returns to parent (in-process / viewable transcript cases preserve more internal detail); agent-isolation settings include worktree and remote, with an in-process teammate backend in the swarm path.Two layers. (1) Multi-agent routing: per-channel isolated agents with their own workspace, auth profiles, session store, and model configuration, dispatched via deterministic binding rules. (2) Sub-agent delegation: maxSpawnDepth range 1–5, default 1, recommended 2; tool policy varies by depth; project vision (VISION.md) rejects agent-hierarchy frameworks as the default.delegate_task tool spawns child AIAgent instances in a ThreadPoolExecutor (parent blocks until children complete); each child has fresh conversation history, its own task_id, and a restricted toolset (DELEGATE_BLOCKED_TOOLS strips delegate_task, clarify, memory, send_message, execute_code); default depth MAX_DEPTH = 1 (configurable up to cap 3); default 3 concurrent children.

What this contrast reveals. Three observations follow from the table. First, deployment context drives the rest of the design: a per-user coding CLI converges on per-action approval and a single execution loop, a multi-channel gateway converges on perimeter trust and channel-bound agents, and a multi-deployment messaging-and-cloud agent converges on opt-in container/cloud isolation, an LLM-based smart approval, and a swappable-backend memory layer. Second, the extension layer is where each system most clearly differentiates: Claude Code stratifies four mechanisms by context cost, OpenClaw treats extension as registry-managed capabilities at the gateway, and Hermes-Agent ships bundled plugins plus dual MCP server / ACP server surfaces other agents can connect to. Third, memory architectures sit on a spectrum: file-based and inspectable Markdown (Claude Code), file-based plus optional vector + experimental dreaming (OpenClaw), or full-text indexed (FTS5) plus eight swappable plugin backends including dedicated vector / RAG providers (Hermes-Agent). The table is best read not as a scoreboard but as three different fixed points in the same design space.

↑ Back to top


Community Projects & Research

A curated map of the repos, reimplementations, and academic papers surrounding Claude Code's architecture.

Official Anthropic Resources

Primary sources referenced throughout the paper — Anthropic's own engineering and research publications, plus product documentation.

Research & Engineering Blogs

ArticleTopic
Building Effective AgentsFoundational: simple composable patterns over heavy frameworks.
Effective Context Engineering for AI AgentsContext curation and token-budget management.
Prompt Caching with ClaudeCache reads at 10% cost, writes at 125%; 5-min default TTL. The platform feature that makes Claude Code's cache-aware compaction architecturally meaningful.
Harness Design for Long-Running Application DevelopmentHarness architecture for autonomous full-stack dev; multi-agent patterns.
Claude Code Auto Mode: A Safer Way to Skip PermissionsML-classifier approval automation; source of the 93% approval-rate finding.
Beyond Permission Prompts: Making Claude Code More Secure and AutonomousSandbox-based security; 84% reduction in permission prompts.
How We Contain Claude Across ProductsContainment across claude.ai, Claude Code, and Cowork (May 2026); Claude Code's human-in-the-loop sandbox, approval fatigue, and capping the blast radius.
Measuring AI Agent Autonomy in PracticeLongitudinal usage: auto-approve rates grow from ~20% to 40%+ with experience.
Our Framework for Developing Safe and Trustworthy AgentsGovernance framework for responsible agent deployment.
When AI Builds ItselfAnthropic Institute on recursive self-improvement: AI accelerating AI development, the direction-setting and research-taste gaps, and governance scenarios.
Scaling Managed Agents: Decoupling the Brain from the HandsHosted-service architecture separating reasoning, execution, and session.
An Update on Recent Claude Code Quality ReportsPostmortem on three bugs behind perceived quality drops: a reasoning-effort default, a cache optimization bug, and a system-prompt change.
Introducing Claude Opus 4.8May 2026 model update: sharper judgment and honesty (~4x fewer unremarked code flaws), longer autonomous runs; introduces dynamic workflows in research preview.
Claude Fable 5 and Claude Mythos 5June 2026 Mythos-class tier sitting above Opus; Fable 5 is the general-use configuration (risky queries fall back to Opus 4.8), with state-of-the-art software-engineering and agentic-coding performance. Access was suspended globally on June 12, 2026 (see next row).
Statement on Suspending Access to Fable 5 and Mythos 5Anthropic's statement on suspending Fable 5 and Mythos 5. A US export-control directive (June 12, 2026) restricted access for foreign nationals, but Anthropic disabled both models for all users worldwide, just days after launch. A rare case of regulation forcing a deployed frontier model offline, and a concrete example of the compliance and safety pressures that agent systems face in deployment.

Product Documentation

DocumentTopic
How Claude Code WorksOfficial overview of the agent loop, tools, and terminal automation.
PermissionsTiered permission system, modes, granular rules.
Hooks27-event hook reference, execution models, lifecycle events.
MemoryCLAUDE.md hierarchy, auto memory, learned preferences.
Sub-agentsSpecialized isolated assistants, custom prompts, tool access.
Orchestrate Subagents at Scale with Dynamic WorkflowsClaude writes a JavaScript orchestration script; a background runtime fans out to up to 1,000 subagents, with intermediate state held in script variables outside the context window (v2.1.154+, research preview).
What's New in Claude Opus 4.8Mid-conversation system messages (prompt-cache-preserving), lower cacheable-prompt minimum, fewer compactions and better compaction recovery.
Claude Code CHANGELOGRelease notes; dynamic workflows and Opus 4.8 land in v2.1.154.

Architecture Analysis

Deep dives into Claude Code's internal design.

RepositoryDescription
ComeOnOliver/claude-code-analysis StarComprehensive reverse-engineering: source tree structure, module boundaries, tool inventories, and architectural patterns.
alejandrobalderas/claude-code-from-source Star18-chapter technical book (~400 pages). All original pseudocode, no proprietary source.
liuup/claude-code-analysis StarChinese-language deep-dive — startup flow, query main loop, MCP integration, multi-agent architecture.
sanbuphy/claude-code-source-code StarQuadrilingual analysis (EN/JA/KO/ZH) — multi-domain reports covering telemetry, codenames, KAIROS, unreleased tools.
cablate/claude-code-research StarIndependent research on internals, Agent SDK, and related tooling.
Yuyz0112/claude-code-reverse StarVisualize Claude Code's LLM interactions — log parser and visual tool to trace prompts, tool calls, and compaction.
Piebald-AI/claude-code-system-prompts StarVersion-tracked prompt corpus across 170+ Claude Code releases — main system prompt, builtin tool descriptions, sub-agent prompts (Plan/Explore/Task), and ~40 system reminders. Updated within minutes of each release.

Open-Source Reimplementations

Clean-room rewrites and buildable research forks.

RepositoryDescription
chauncygu/collection-claude-code-source-code StarMeta-collection of community Claude Code source artifacts -- includes claw-code (Rust port), nano-claude-code (Python), and the extracted original source archive.
777genius/claude-code-working StarWorking reverse-engineered CLI. Runnable with Bun, 450+ chunk files, 31 feature flags polyfilled.
T-Lab-CUHKSZ/claude-code StarCUHK-Shenzhen buildable research fork — reconstructed build system from raw TypeScript snapshot.
ruvnet/open-claude-code StarNightly auto-decompile rebuild — 903+ tests, 25 tools, 4 MCP transports, 6 permission modes.
Enderfga/openclaw-claude-code StarOpenClaw plugin — unified ISession interface for Claude/Codex/Gemini/Cursor. Multi-agent council.
memaxo/claude_code_re StarReverse engineering from minified bundles — deobfuscation of the publicly distributed cli.js file.
agentforce314/clawcodex StarPython rebuild with multi-provider LLM support.

Claude Code Guides & Learning

Tutorials and hands-on learning paths for Claude Code itself.

RepositoryDescription
shareAI-lab/learn-claude-code Star"Bash is all you need" — 19-chapter 0-to-1 course with runnable Python agents, web platform. ZH/EN/JA.
FlorianBruniaux/claude-code-ultimate-guide StarBeginner-to-power-user guide with production-ready templates, agentic workflow guides, and cheatsheets.
affaan-m/everything-claude-code StarAgent harness optimization — skills, instincts, memory, security, and research-first development.

General Harness Engineering Design Space Resources

External resources that complement this paper's design-space analysis — concept essays, curricula, and code that illuminate the harness layer as an engineering practice.

RepositoryDescription
deusyu/harness-engineering StarLearning archive — original concept essays, independent thinking pieces, and curated translations of harness-engineering writing; from concept to independent practice.
walkinglabs/learn-harness-engineering StarProject-based English course with PDF coursebooks, syllabus, and capstone, organized around five harness subsystems: instructions, state, verification, scope, and session lifecycle.
china-qijizhifeng/agentic-harness-engineering StarObservability system that auto-evolves a coding agent's harness — a meta-agent reads execution traces and rewrites system prompts, tools, middleware, skills, sub-agents, and memory.
ZhangHanDong/harness-engineering-from-cc-to-ai-coding StarThe "Horse Book" (《马书》) — Chinese mdBook framing Claude Code v2.1.88 as a Harness Engineering case study; covers architecture, prompt engineering, context management, prompt cache, security, and lessons for builders.

Blog Posts & Technical Articles

ArticleWhat Makes It Valuable
Marco Kotrotsos — "Claude Code Internals" (15-part series)Most systematic pre-leak analysis. Architecture, agent loop, permissions, sub-agents, MCP, telemetry.
Alex Kim — "The Claude Code Source Leak"Anti-distillation mechanisms, frustration detection, Undercover Mode, ~250K wasted API calls/day.
Haseeb Qureshi — Cross-agent architecture comparisonClaude Code vs Codex vs Cline vs OpenCode — architecture-level comparison.
George Sung — "Tracing Claude Code's LLM Traffic"Complete system prompts and full API logs. Discovered dual-model usage (Opus + Haiku).
Agiflow — "Reverse Engineering Prompt Augmentation"5 prompt augmentation mechanisms backed by actual network traces.
Engineer's Codex — "Diving into the Source Code Leak"Modular system prompt, ~40 tools, large query/tool subsystem, anti-distillation.
MindStudio — "Three-Layer Memory Architecture"In-context memory, MEMORY.md pointer index, CLAUDE.md static config. Best single resource on memory.
WaveSpeed — "Claude Code Architecture: Leaked Source Deep Dive"512K-line TS source deep dive; context compression and anti-distillation.
Zain Hasan — "Inside Claude Code: An Architecture Deep Dive"Layered architecture, 5 entry modes, multi-agent walkthrough.
Addy Osmani — "Agent Harness Engineering"Frames harness engineering as a discipline with named primitives (filesystem/git state, sandboxes, AGENTS.md memory, compaction, planning loops, hooks); cites Claude Code as the canonical mature example.
Andrej Karpathy — "Sequoia Ascent 2026"Argues for "agentic engineering": humans orchestrate and verify rather than write code. "LLMs and reinforcement learning automate what you can verify"; "you can outsource your thinking, but you can't outsource your understanding."

Cross-Vendor Code-Agent Engineering

Official engineering posts from other vendors building code agents — useful for seeing how the same design questions are answered outside Claude Code.

ResourceVendorWhat's Notable
Harness Engineering: Leveraging Codex in an Agent-First WorldOpenAIFrames the "harness" as the constraints, feedback loops, and documentation that make agents reliable; reports a roughly 1M-line beta built with essentially no hand-written code.
Best Practices for Coding with AgentsCursorArticulates an agent harness as three components — Instructions, Tools, and Model — orchestrated per model.
Build with Google AntigravityGoogleAgent-first platform: a Manager surface for asynchronous multi-agent orchestration, with Artifacts (plans, screenshots, recordings) as the verification mechanism instead of raw logs.
Codex Security: Now in Research PreviewOpenAIApplication-security agent that builds a project-specific threat model, then finds and pressure-tests vulnerabilities in sandboxed validation environments.
PaperVenueRelevance
Architectural Design Decisions in AI Agent HarnessesarXivSource-grounded study of 70 agent-system projects identifying recurring design dimensions; closest contemporary peer to this paper's design-space framing.
Decoding the Configuration of AI Coding AgentsarXivEmpirical study of 328 Claude Code configuration files — SE concerns and co-occurrence patterns.
On the Use of Agentic Coding ManifestsarXivAnalyzed 253 CLAUDE.md files from 242 repos — structural patterns in operational commands.
Context Engineering for Multi-Agent Code AssistantsarXivMulti-agent workflow combining multiple LLMs for code generation.
OpenHands: An Open Platform for AI Software DevelopersICLR 2025Primary academic reference for open-source AI coding agents.
SWE-Agent: Agent-Computer InterfacesNeurIPS 2024Docker-based coding agent with custom agent-computer interface.

How This Paper Differs

While the projects above focus on engineering reverse-engineering or practical reimplementation, this paper provides a systematic values → principles → implementation analytical framework — tracing five human values through thirteen design principles to specific source-level choices, and using OpenClaw comparison to reveal that cross-cutting integrative mechanisms, not modular features, are the true locus of engineering complexity.

See the full curated list with more resources: docs/related-resources.md

↑ Back to top


Other Notable AI Agent Projects

A broader map of the agent design space surrounding Claude Code. The Cross-System Comparison above analyzes the three closest peers (Claude Code, OpenClaw, Hermes-Agent) in depth; the entries below give wider context across coding-agent peers, frameworks, memory systems, harness extensions, the MCP ecosystem, and specialized agents.

Coding Agent CLIs and IDE Harnesses

RepositoryLaunchFocus
openclaw/openclaw StarJan 2026Local-first personal AI assistant across messaging platforms. (Section 10 analysis)
NousResearch/hermes-agent StarFeb 2026Self-improving personal agent with cross-session memory. (Section 10 analysis)
opensquilla/opensquilla StarJun 2026Token-efficient microkernel personal agent across CLI, Web UI, and chat channels; ML-classifier routing across four model cost tiers, local Markdown+SQLite memory (MEMORY.md plus dated notes with keyword and vector recall), and Bubblewrap/Seatbelt sandbox.
pewdiepie-archdaemon/odysseus StarJun 2026Self-hosted, local-first AI workspace from PewDiePie: autonomous agents with tools, MCP, and shell access, plus memory, deep research, and hardware-aware model serving. AGPL-3.0.
sst/opencode StarJun 2025Provider-agnostic terminal coding agent with ACP integration.
Aider-AI/aider Star2023Pair-program with LLMs in the terminal; works with most popular models.
continuedev/continue Star2023Source-controlled AI checks for IDEs with an open-source Continue CLI.
google-gemini/gemini-cli Star2025Google's open-source terminal coding agent with ReAct loop and MCP support.
openai/codex Star2025OpenAI's local terminal coding agent in Rust.
OpenHands/OpenHands Star2024Open SWE agent platform (formerly OpenDevin) with sandboxed runtime.
cline/cline Star2024VS Code agent with explicit Plan/Act oversight loop.
block/goose Star2025Block's open-source, editor-agnostic agent with MCP-style extensions.
charmbracelet/crush Star2025Agentic coding TUI in Go with multi-LLM provider abstraction.
RooCodeInc/Roo-Code Star2024VS Code multi-agent dev-team with Architect, Coder, and Reviewer modes.
bytedance/trae-agent Star2025ByteDance's modular SWE-bench-oriented agent for software engineering tasks.
github/copilot-cli Star2026GitHub Copilot's GA agentic terminal CLI; plans, builds, reviews.
badlogic/pi-mono StarAug 2025Monorepo coding-agent toolkit — unified LLM API, TUI + web UI; OpenClaw embeds the pi-coding-agent SDK from here.

Agent Frameworks and Orchestration

RepositoryLaunchFocus
geekan/MetaGPT Star2023Role-based multi-agent software-company simulation (ICLR 2024 oral).
microsoft/autogen Star2023Microsoft Research multi-agent conversation framework (COLM 2024).
langchain-ai/langgraph Star2024Stateful graph-based multi-agent orchestration with checkpointing.
openai/openai-agents-python Star2024OpenAI's lightweight multi-agent framework with handoffs and guardrails.
crewAIInc/crewAI Star2023Lean Python framework for role-based multi-agent collaboration, independent of LangChain.
openai/symphony StarFeb 2026OpenAI's orchestration for isolated, autonomous implementation runs.
ComposioHQ/agent-orchestrator Star2025Orchestration layer for parallel AI agents with git worktree isolation.
coleam00/Archon StarFeb 2025Deterministic harness — YAML-defined workflows with execution audit trail.
bytedance/deer-flow Star2026ByteDance's long-horizon "SuperAgent" harness: subagents, memory, sandboxes, skills, and a message gateway; a ground-up rewrite on LangGraph/LangChain.

Memory and Persistent Context

RepositoryLaunchFocus
mem0ai/mem0 Star2024Production memory layer with LoCoMo and LongMemEval benchmarks (arXiv:2504.19413).
letta-ai/letta Star2023Stateful-agent platform with OS-style hierarchical memory paging (formerly MemGPT, COLM 2024).
MemPalace/mempalace Star2026Local-first memory system for AI agents.

Skills and Harness Extensions

RepositoryLaunchFocus
addyosmani/agent-skills Star202522 lifecycle skills + slash commands (/spec, /plan, /build, /test, /review, /ship).
obra/superpowers Star2025Cross-harness mandatory-workflow skills framework (Claude Code, OpenCode, Codex).
mattpocock/skills Star2026Author's everyday .claude/skills collection for real engineering -- composable TDD, diagnose, and to-issues/to-prd skills; model-agnostic, targeting Claude Code, Codex, and other coding agents.
multica-ai/andrej-karpathy-skills Star2026Single CLAUDE.md encoding Andrej Karpathy's four LLM-coding rules (think before coding, simplicity first, surgical changes, goal-driven execution); installable as a plugin or per-project.
lsdefine/GenericAgent Star2025Minimal self-evolving autonomous agent framework — 9 atomic tools + ~100-line ReAct loop.

MCP Ecosystem

RepositoryLaunchFocus
PrefectHQ/fastmcp Star2024Pythonic framework for building MCP servers and clients; de facto SDK.
upstash/context7 Star2025Up-to-date library-documentation MCP server for LLMs and AI code editors.
microsoft/playwright-mcp Star2024Microsoft's official MCP server using accessibility-tree snapshots.

Specialized and Domain Agents

RepositoryLaunchFocus
666ghj/MiroFish StarMar 2026Multi-agent swarm-intelligence simulation engine.
multica-ai/multica Star2026Managed-agents platform for task assignment and skill compounding.
HKUDS/nanobot StarFeb 2026Ultra-lightweight personal AI agent from HKU-DS.
HKUDS/OpenHarness StarApr 2026Open agent harness with built-in personal agent (Ohmo); academic harness reference.
karpathy/autoresearch StarMar 2026Andrej Karpathy's autonomous AI-agent loop running nanochat training research on a single GPU.
HKUDS/CLI-Anything StarMar 2026"Making ALL Software Agent-Native" — wraps arbitrary software as agent-callable tools.
Panniantong/Agent-Reach StarFeb 2026CLI giving agents access to Twitter, Reddit, YouTube, GitHub, Bilibili, Xiaohongshu.
agentscope-ai/QwenPaw StarFeb 2026Personal AI assistant from the AgentScope team.
cft0808/edict StarFeb 2026OpenClaw-based multi-agent orchestration on Tang-dynasty Three Departments and Six Ministries (三省六部制) bureaucracy.

↑ Back to top


Star History Chart

Citation

@article{diveclaudecode2026,
  title={Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems},
  author={Jiacheng Liu, Xiaohan Zhao, Xinyi Shang, and Zhiqiang Shen},
  year={2026},
  eprint={2604.14228},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
}

License

This work is licensed under CC BY-NC-SA 4.0.