Research Foundations
March 12, 2026 · View on GitHub
Every Hoofy feature is grounded in published research. This document maps each capability to the specific research that informed it — what it recommends, and how Hoofy implements it.
Anthropic Engineering
Building Effective Agents (Dec 2024)
Foundational patterns for agent design. Distinguishes workflows from agents, introduces the concept of Agent-Computer Interface (ACI), and establishes that tool design matters as much as prompt design.
| Recommendation | Hoofy Implementation |
|---|---|
| "Agent-Computer Interface (ACI) is as important as HCI" — tool descriptions and parameters are critical for AI usability | All 38 tools use consistent sdd_* and mem_* namespacing with self-documenting parameter descriptions |
| "Do the simplest thing that works" — avoid over-engineering agent systems | Adaptive change pipeline selects only the stages needed (4-7 stages based on type x size), instead of forcing a one-size-fits-all workflow |
| Orchestrator-worker pattern for complex tasks | Project pipeline uses sequential orchestration: propose → specify → clarify → design → tasks → validate |
| Evaluator-optimizer pattern for iterative refinement | Clarity Gate blocks pipeline advancement until clarity score meets threshold, forcing iterative requirement refinement |
Effective Context Engineering for AI Agents (Sep 2025)
The most relevant article for Hoofy's memory system. Defines context as a finite resource with diminishing marginal returns, and presents strategies for managing it.
| Recommendation | Hoofy Implementation |
|---|---|
| "Structured note-taking / agentic memory" — agent writes notes persisted outside the context window, pulls them back later | mem_save persists observations to SQLite with FTS5 full-text search. mem_context and mem_search retrieve them in future sessions |
| "Progressive disclosure" — agents discover context layer by layer, keeping only what's necessary | mem_search → mem_timeline → mem_get pattern: search first, drill into timeline, then read full content |
| "Sub-agent architectures" — specialized sub-agents with clean context windows, return condensed summaries | Knowledge graph traversal via mem_get(id, depth) pulls relations from any observation. namespace parameter on memory tools (mem_save, mem_progress, mem_search, mem_context, mem_compact) enables opt-in isolation — each sub-agent tags observations with its namespace, reads only its own notes, while the orchestrator omits namespace to see everything |
| "Hybrid strategy" — some data retrieved up front, other data explored just-in-time | mem_context loads recent history at session start (up front). mem_search retrieves specific memories on demand (just-in-time) |
| "Context is a finite resource" — treat it like an attention budget | 5 read-heavy tools support `detail_level: summary |
| "You have to be smart about managing what goes into context" — stale and redundant data degrades performance over time | mem_compact identifies stale observations (older than N days) and batch soft-deletes them. Optionally creates a "compaction_summary" observation to preserve key knowledge. Two-step workflow: identify candidates → review → compact with summary |
| "Context is a finite resource" — token budgets must be managed explicitly, not just by verbosity level | 5 read-heavy tools (mem_context, mem_search, mem_timeline, sdd_get_context, sdd_context_check) accept max_tokens to hard-cap response size. Token estimation uses len(text)/4 heuristic (O(1), no tokenizer dependency). Every response includes a 📏 ~N tokens footer. Budget-capped responses prepend ⚡ Budget-capped notice. Complementary to detail_level — one controls content type, the other controls total output size |
Writing Effective Tools for Agents — with Agents (Sep 2025)
Direct guidance on tool design for AI agents. Covers namespacing, consolidation, response format, truncation, and token efficiency.
| Recommendation | Hoofy Implementation |
|---|---|
| "Namespacing tools with prefixes helps delineate boundaries" | mem_* memory tools, sdd_* project tools, sdd_change* change tools, plus standalone sdd_explore, sdd_suggest_context, sdd_review — clear boundaries between systems |
| "Return only high-signal information, avoid cryptic UUIDs" | Tool responses include human-readable summaries, not raw database rows. detail_level parameter lets the AI request only the verbosity needed |
| "Tools should be self-contained, robust to error, extremely clear" | Each tool has comprehensive parameter descriptions with examples in the tool definition |
| "Truncate tool responses, but always include total counts" | mem_search, mem_context, and mem_timeline append navigation hints ("📊 Showing X of Y") when results are capped by limit. NavigationHint() returns empty string when all results are shown (no noise) |
How We Built Our Multi-Agent Research System (Jun 2025)
Architecture lessons from Anthropic's multi-agent Research feature. Key insights on token efficiency, orchestration, and memory management.
| Recommendation | Hoofy Implementation |
|---|---|
| "Long-horizon conversation management: agents summarize completed phases, store in external memory" | mem_session(action="end", summary=...) captures structured summaries at session end for future sessions |
| "Subagents output to filesystem to minimize 'game of telephone'" | All pipeline artifacts are written to sdd/*.md files on disk, not passed through conversation history |
| "Each sub-agent works independently with its own context" — parallel agents need memory isolation | namespace parameter provides opt-in memory scoping. Sub-agents tag observations with namespace="subagent/<task-id>", reads filter by namespace. Orchestrator omits namespace to see all. Convention: subagent/<task-id> or agent/<role> |
| "Token usage explains 80% of performance variance" — more tokens does not equal better results | Topic key upsert (mem_save with topic_key) prevents memory duplication. One observation per topic, always current |
Effective Harnesses for Long-Running Agents (Nov 2025)
Solutions for agents that work across multiple context windows. Introduces the initializer agent pattern, incremental progress, and structured handoffs.
| Recommendation | Hoofy Implementation |
|---|---|
| "Each session: read progress, read git log, run basic test, then start new work" | mem_progress persists structured JSON progress docs that survive context compaction. Auto-read at session start, upserted during work. One active progress per project via topic_key. mem_context provides recent observations for broader session context |
| "Feature list in JSON (not Markdown) — model less likely to inappropriately change JSON" | Pipeline state persisted in sdd/sdd.json (JSON), not markdown. mem_progress content is validated JSON — the model is less likely to corrupt structured data than free-form markdown |
| "Agent commits to git with descriptive messages after each feature" | Change pipeline enforces incremental delivery: one active change at a time, verify stage before completion |
| "Initializer agent sets up environment on first run" | sdd_init_project creates the sdd/ directory structure, sdd.json config, and templates — environment scaffolding before any work begins |
Claude Code: Best Practices for Agentic Coding (Apr 2025)
Best practices for getting the most out of AI coding assistants. Covers CLAUDE.md, custom instructions, and structured workflows.
| Recommendation | Hoofy Implementation |
|---|---|
| Use CLAUDE.md for persistent project context | Context-check stage scans CLAUDE.md, AGENTS.md, CONTRIBUTING.md and other convention files for conflicts with the current change |
| Structure specifications before coding | Full greenfield pipeline (propose → specify → business rules → clarity gate → design → tasks → validate) enforces specs before any code is written |
Academic Research
Codified Context: Infrastructure for AI Agents in a Complex Codebase (Lulla 2026)
Empirical analysis of meta-infrastructure (AGENTS.md, custom instructions, codified context) for AI coding agents in production codebases. Studies 6,088 SWE tasks and shows that codified context is a first-class engineering artifact, not just documentation.
| Finding | Hoofy Implementation |
|---|---|
| AGENTS.md associated with 29% less runtime and 17% less token consumption | Hoofy's AGENTS.md is actively scanned by sdd_context_check and sdd_suggest_context — codified context is used as input, not just documentation |
| Compact constitutions (~660 lines) outperform monolithic instructions | Server instructions reduced from 733 lines to ~160 lines. Detailed guidance moved to 6 on-demand MCP prompts (/sdd-stage-guide, /sdd-memory-guide, /sdd-change-guide, /sdd-bootstrap-guide) loaded only when needed |
| 80%+ of agent prompts are ≤100 words — short, focused interactions dominate | sdd_suggest_context is designed for short "what should I read?" queries. sdd_review takes a brief change description and returns a structured checklist |
| 4.3% overhead for meta-infrastructure (context files) — small cost for significant gains | SDD artifacts (sdd/*.md) and convention files add minimal overhead while preventing hallucinations and rework |
| 24.2% knowledge-to-code ratio — nearly 1/4 of repo content is context/documentation | Hoofy's pipeline generates specs, business rules, design docs, and task breakdowns as first-class artifacts alongside code |
| Ad-hoc review was more used than formal review stages | sdd_review is a standalone tool, not a pipeline stage — can be used at any time without starting a change flow (ADR captured for this decision) |
Industry Research
Requirements Engineering & Specification
| Source | What it says | Hoofy Implementation |
|---|---|---|
| METR 2025 | Experienced developers were 19% slower with unstructured AI despite feeling 20% faster | Hoofy enforces structured specification — the AI cannot skip specs for non-trivial changes |
| DORA 2025 | 7.2% delivery instability increase for every 25% AI adoption without foundational practices | Pipeline stages (context-check, clarity gate, verify) provide the foundational practices DORA identifies as missing |
| McKinsey 2025 | Top performers see 16-30% productivity gains only with structured specification and communication | SDD pipeline is structured specification and communication — proposal, requirements, design, tasks |
| IEEE 720574 | Fixing a requirement error in production costs 10-100x more than during requirements phase | Clarity Gate catches ambiguities in the requirements phase, before any code is written |
| IREB & IEEE 29148 | Industry standards for structured requirements elicitation and traceability | Server instructions implement IEEE 29148 Requirements Smells heuristics for the AI to follow during specification |
| Business Rules Group | Business Rules Manifesto — rules are first-class citizens, not buried in code | Business-rules stage uses BRG taxonomy (Definitions, Facts, Constraints, Derivations) to extract declarative rules from requirements |
| EARS | Easy Approach to Requirements Syntax — sentence templates that eliminate ambiguity | Server instructions use EARS patterns (When/While/Where/If-Then) for the AI to follow when writing requirements |
| DDD Ubiquitous Language | A shared language eliminates translation errors between business and technical domains | Business-rules stage builds a glossary as part of the Ubiquitous Language, used across all pipeline artifacts |
This document is updated as new features are added. Every feature must cite its research source before shipping.