Research Foundations

March 12, 2026

Every Hoofy feature is grounded in published research. This document maps each capability to the specific research that informed it — what it recommends, and how Hoofy implements it.

Anthropic Engineering

Building Effective Agents (Dec 2024)

Foundational patterns for agent design. Distinguishes workflows from agents, introduces the concept of Agent-Computer Interface (ACI), and establishes that tool design matters as much as prompt design.

| Recommendation | Hoofy Implementation |
|---|---|
| "Agent-Computer Interface (ACI) is as important as HCI": tool descriptions and parameters are critical for AI usability | All 38 tools use consistent `sdd_*` and `mem_*` namespacing with self-documenting parameter descriptions |
| "Do the simplest thing that works": avoid over-engineering agent systems | Adaptive change pipeline selects only the stages needed (4-7 stages based on type × size) instead of forcing a one-size-fits-all workflow |
| Orchestrator-worker pattern for complex tasks | Project pipeline uses sequential orchestration: propose → specify → clarify → design → tasks → validate |
| Evaluator-optimizer pattern for iterative refinement | Clarity Gate blocks pipeline advancement until the clarity score meets a threshold, forcing iterative requirement refinement |
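The orchestrator and evaluator-optimizer rows combine naturally: a fixed stage sequence whose clarify step cannot be left until the requirements score well enough. A minimal Python sketch, assuming a hypothetical `Pipeline` class, a hypothetical 0-100 clarity score, and a hypothetical threshold of 70; how Hoofy actually computes the clarity score is not shown here.

```python
from dataclasses import dataclass, field

# Project pipeline stage order, as listed in the table above.
STAGES = ["propose", "specify", "clarify", "design", "tasks", "validate"]

# Hypothetical threshold on a hypothetical 0-100 clarity scale.
CLARITY_THRESHOLD = 70


class GateBlocked(Exception):
    """Raised when the clarity score is too low to leave the clarify stage."""


@dataclass
class Pipeline:
    stage_index: int = 0
    clarity_score: int = 0
    completed: list = field(default_factory=list)

    @property
    def stage(self) -> str:
        return STAGES[self.stage_index]

    def advance(self) -> str:
        """Move to the next stage; the clarify stage is guarded by the gate."""
        if self.stage == "clarify" and self.clarity_score < CLARITY_THRESHOLD:
            # Evaluator-optimizer loop: refine requirements, re-score, retry.
            raise GateBlocked(
                f"clarity score {self.clarity_score} < {CLARITY_THRESHOLD}"
            )
        if self.stage_index == len(STAGES) - 1:
            raise RuntimeError("pipeline is already at its final stage")
        self.completed.append(self.stage)
        self.stage_index += 1
        return self.stage


pipeline = Pipeline()
pipeline.advance()  # propose -> specify
pipeline.advance()  # specify -> clarify
try:
    pipeline.advance()  # blocked: the score is still 0
except GateBlocked as err:
    print("blocked:", err)
pipeline.clarity_score = 85  # after another refinement round
print(pipeline.advance())  # design
```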

Effective Context Engineering for AI Agents (Sep 2025)

The most relevant article for Hoofy's memory system. Defines context as a finite resource with diminishing marginal returns, and presents strategies for managing it.

| Recommendation | Hoofy Implementation |
|---|---|
| "Structured note-taking / agentic memory": the agent writes notes persisted outside the context window and pulls them back later | `mem_save` persists observations to SQLite with FTS5 full-text search; `mem_context` and `mem_search` retrieve them in future sessions |
| "Progressive disclosure": agents discover context layer by layer, keeping only what's necessary | `mem_search` → `mem_timeline` → `mem_get` pattern: search first, drill into the timeline, then read full content |
| "Sub-agent architectures": specialized sub-agents with clean context windows return condensed summaries | Knowledge graph traversal via `mem_get(id, depth)` pulls relations from any observation. A `namespace` parameter on memory tools (`mem_save`, `mem_progress`, `mem_search`, `mem_context`, `mem_compact`) enables opt-in isolation: each sub-agent tags observations with its namespace and reads only its own notes, while the orchestrator omits `namespace` to see everything |
| "Hybrid strategy": some data is retrieved up front, other data is explored just in time | `mem_context` loads recent history at session start (up front); `mem_search` retrieves specific memories on demand (just in time) |
| "Context is a finite resource": treat it like an attention budget | Five read-heavy tools support `detail_level` (e.g. `summary`), letting the AI request only the verbosity it needs |
| "You have to be smart about managing what goes into context": stale and redundant data degrades performance over time | `mem_compact` identifies stale observations (older than N days) and batch soft-deletes them, optionally creating a `compaction_summary` observation to preserve key knowledge. Two-step workflow: identify candidates → review → compact with summary |
| "Context is a finite resource": token budgets must be managed explicitly, not just by verbosity level | Five read-heavy tools (`mem_context`, `mem_search`, `mem_timeline`, `sdd_get_context`, `sdd_context_check`) accept `max_tokens` to hard-cap response size. Token estimation uses a `len(text)/4` heuristic (O(1), no tokenizer dependency). Every response includes a `📏 ~N tokens` footer, and budget-capped responses prepend a `⚡ Budget-capped` notice. Complementary to `detail_level`: one controls content type, the other controls total output size |
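The `max_tokens` cap and the `len(text)/4` estimate can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Hoofy's code: `render` is hypothetical, the notice wording after `⚡ Budget-capped` is assumed, the sketch drops whole result items rather than cutting text mid-item, and Python's `len` counts characters rather than bytes.

```python
from __future__ import annotations

from typing import Optional

# Prefix quoted in the table; the rest of the notice line is assumed.
BUDGET_NOTICE = "⚡ Budget-capped"


def estimate_tokens(text: str) -> int:
    """The len(text)/4 heuristic: roughly four characters per token, no tokenizer."""
    return len(text) // 4


def render(items: list[str], max_tokens: Optional[int] = None) -> str:
    """Join result items, stopping once the next item would exceed max_tokens.

    The budget covers the result items; the notice and footer add a few tokens.
    """
    kept: list[str] = []
    used = 0
    for item in items:
        cost = estimate_tokens(item)
        if max_tokens is not None and used + cost > max_tokens:
            break
        kept.append(item)
        used += cost
    body = "\n".join(kept)
    if len(kept) < len(items):
        body = f"{BUDGET_NOTICE}: {len(kept)} of {len(items)} items fit\n{body}"
    # Every response ends with the estimated size of what was returned.
    return f"{body}\n📏 ~{estimate_tokens(body)} tokens"


print(render(["a" * 40, "b" * 40, "c" * 40], max_tokens=20))
```

Because `detail_level` and `max_tokens` are independent, a caller can ask for summaries and still cap the total, which is the complementary split the table describes.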

Writing Effective Tools for Agents — with Agents (Sep 2025)

Direct guidance on tool design for AI agents. Covers namespacing, consolidation, response format, truncation, and token efficiency.

| Recommendation | Hoofy Implementation |
|---|---|
| "Namespacing tools with prefixes helps delineate boundaries" | `mem_*` memory tools, `sdd_*` project tools, `sdd_change*` change tools, plus standalone `sdd_explore`, `sdd_suggest_context`, and `sdd_review`: clear boundaries between systems |
| "Return only high-signal information, avoid cryptic UUIDs" | Tool responses include human-readable summaries, not raw database rows. The `detail_level` parameter lets the AI request only the verbosity needed |
| "Tools should be self-contained, robust to error, extremely clear" | Each tool has comprehensive parameter descriptions with examples in the tool definition |
| "Truncate tool responses, but always include total counts" | `mem_search`, `mem_context`, and `mem_timeline` append navigation hints ("📊 Showing X of Y") when results are capped by `limit`. `NavigationHint()` returns an empty string when all results are shown (no noise) |
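The truncation rule (always report totals, stay silent when nothing was cut) fits in one helper. The Python sketch below mirrors `NavigationHint()` under a hypothetical snake_case name, and its `search` uses a case-insensitive substring match in place of FTS5; only the `📊 Showing X of Y` text comes from the table above.

```python
from __future__ import annotations


def navigation_hint(shown: int, total: int) -> str:
    """Hint appended when `limit` capped the results; empty when nothing was cut."""
    if shown >= total:
        return ""  # all results shown: no noise
    return f"📊 Showing {shown} of {total}"


def search(observations: list[str], query: str, limit: int = 10) -> str:
    # Case-insensitive substring match stands in for SQLite FTS5 here.
    matches = [o for o in observations if query.lower() in o.lower()]
    page = matches[:limit]
    lines = [f"- {m}" for m in page]
    hint = navigation_hint(len(page), len(matches))
    if hint:
        lines.append(hint)
    return "\n".join(lines)


print(search(["auth bug", "auth refactor", "db index"], "auth", limit=1))
# - auth bug
# 📊 Showing 1 of 2
```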

How We Built Our Multi-Agent Research System (Jun 2025)

Architecture lessons from Anthropic's multi-agent Research feature. Key insights on token efficiency, orchestration, and memory management.

| Recommendation | Hoofy Implementation |
|---|---|
| "Long-horizon conversation management: agents summarize completed phases, store in external memory" | `mem_session(action="end", summary=...)` captures structured summaries at session end for future sessions |
| "Subagents output to filesystem to minimize 'game of telephone'" | All pipeline artifacts are written to `sdd/*.md` files on disk, not passed through conversation history |
| "Each sub-agent works independently with its own context": parallel agents need memory isolation | The `namespace` parameter provides opt-in memory scoping. Sub-agents tag observations with `namespace="subagent/<task-id>"` and reads filter by namespace; the orchestrator omits `namespace` to see all. Convention: `subagent/<task-id>` or `agent/<role>` |
| "Token usage explains 80% of performance variance": more tokens do not equal better results | Topic key upsert (`mem_save` with `topic_key`) prevents memory duplication: one observation per topic, always current |
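Topic key upsert and namespace scoping are both simple rules on top of the observation store. A hypothetical in-memory `MemoryStore` in Python illustrates them; the real store is SQLite, and scoping `topic_key` uniqueness per namespace is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    id: int
    content: str
    topic_key: Optional[str] = None
    namespace: Optional[str] = None


class MemoryStore:
    """In-memory stand-in for the SQLite-backed observation store."""

    def __init__(self) -> None:
        self._obs: dict[int, Observation] = {}
        self._next_id = 1

    def save(self, content: str, topic_key: Optional[str] = None,
             namespace: Optional[str] = None) -> int:
        # Topic key upsert: overwrite the existing observation for this topic
        # (scoped per namespace here, which is an assumption).
        if topic_key is not None:
            for obs in self._obs.values():
                if obs.topic_key == topic_key and obs.namespace == namespace:
                    obs.content = content
                    return obs.id
        obs_id = self._next_id
        self._next_id += 1
        self._obs[obs_id] = Observation(obs_id, content, topic_key, namespace)
        return obs_id

    def search(self, query: str, namespace: Optional[str] = None) -> list[Observation]:
        # namespace=None (the orchestrator) sees everything; otherwise filter.
        return [
            obs for obs in self._obs.values()
            if query.lower() in obs.content.lower()
            and (namespace is None or obs.namespace == namespace)
        ]


store = MemoryStore()
store.save("Auth uses JWT", topic_key="architecture/auth")
store.save("Auth uses sessions", topic_key="architecture/auth")  # same id, replaced
store.save("Auth token TTL is 15 min", namespace="subagent/task-42")
print(len(store.search("auth")))                                # 2
print(len(store.search("auth", namespace="subagent/task-42")))  # 1
```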

Effective Harnesses for Long-Running Agents (Nov 2025)

Solutions for agents that work across multiple context windows. Introduces the initializer agent pattern, incremental progress, and structured handoffs.

| Recommendation | Hoofy Implementation |
|---|---|
| "Each session: read progress, read git log, run basic test, then start new work" | `mem_progress` persists structured JSON progress docs that survive context compaction. They are auto-read at session start and upserted during work, with one active progress doc per project via `topic_key`. `mem_context` provides recent observations for broader session context |
| "Feature list in JSON (not Markdown); model less likely to inappropriately change JSON" | Pipeline state is persisted in `sdd/sdd.json` (JSON), not Markdown. `mem_progress` content is validated JSON, since the model is less likely to corrupt structured data than free-form Markdown |
| "Agent commits to git with descriptive messages after each feature" | Change pipeline enforces incremental delivery: one active change at a time, with a verify stage before completion |
| "Initializer agent sets up environment on first run" | `sdd_init_project` creates the `sdd/` directory structure, `sdd.json` config, and templates: environment scaffolding before any work begins |
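Validating progress as JSON before saving it is what keeps the model from drifting into free-form Markdown. A minimal Python sketch with a hypothetical `ProgressStore`, assuming the progress document must be a JSON object and using a hypothetical `progress/<project>` key in place of `topic_key`:

```python
from __future__ import annotations

import json
from typing import Optional


class ProgressStore:
    """One progress document per project (hypothetical in-memory stand-in)."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def upsert(self, project: str, content: str) -> None:
        # Validate before persisting: free-form Markdown is rejected.
        try:
            doc = json.loads(content)
        except json.JSONDecodeError as exc:
            raise ValueError(f"progress must be valid JSON: {exc}") from exc
        if not isinstance(doc, dict):
            raise ValueError("progress must be a JSON object")
        # A fixed per-project key plays the role of topic_key, so a new write
        # replaces the previous document instead of adding a second one.
        self._docs[f"progress/{project}"] = doc

    def read(self, project: str) -> Optional[dict]:
        """What a new session would load before starting work."""
        return self._docs.get(f"progress/{project}")


progress = ProgressStore()
progress.upsert("demo", '{"done": ["specify"], "next": "design"}')
progress.upsert("demo", '{"done": ["specify", "design"], "next": "tasks"}')
print(progress.read("demo")["next"])  # tasks
```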

Claude Code: Best Practices for Agentic Coding (Apr 2025)

Best practices for getting the most out of AI coding assistants. Covers CLAUDE.md, custom instructions, and structured workflows.

| Recommendation | Hoofy Implementation |
|---|---|
| Use `CLAUDE.md` for persistent project context | Context-check stage scans `CLAUDE.md`, `AGENTS.md`, `CONTRIBUTING.md`, and other convention files for conflicts with the current change |
| Structure specifications before coding | Full greenfield pipeline (propose → specify → business rules → clarity gate → design → tasks → validate) enforces specs before any code is written |
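Only the discovery half of the context-check stage is easy to show without guessing: locating the convention files before they are compared against the change. The Python sketch below assumes the files sit at the repository root, uses a hypothetical `find_convention_files` helper, and does not attempt the conflict check itself.

```python
from __future__ import annotations

from pathlib import Path

# The three files named in the table; the "other convention files" are not listed here.
CONVENTION_FILES = ["CLAUDE.md", "AGENTS.md", "CONTRIBUTING.md"]


def find_convention_files(root: str) -> dict[str, str]:
    """Return {filename: contents} for each convention file found at the repo root."""
    found: dict[str, str] = {}
    for name in CONVENTION_FILES:
        path = Path(root) / name
        if path.is_file():
            found[name] = path.read_text(encoding="utf-8")
    return found


for name, text in find_convention_files(".").items():
    print(f"{name}: {len(text.splitlines())} lines to check against the change")
```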

Academic Research

Codified Context: Infrastructure for AI Agents in a Complex Codebase (Lulla 2026)

Empirical analysis of meta-infrastructure (AGENTS.md, custom instructions, codified context) for AI coding agents in production codebases. Studies 6,088 SWE tasks and shows that codified context is a first-class engineering artifact, not just documentation.

| Finding | Hoofy Implementation |
|---|---|
| `AGENTS.md` associated with 29% less runtime and 17% less token consumption | Hoofy's `AGENTS.md` is actively scanned by `sdd_context_check` and `sdd_suggest_context`: codified context is used as input, not just documentation |
| Compact constitutions (~660 lines) outperform monolithic instructions | Server instructions reduced from 733 lines to ~160 lines. Detailed guidance moved to 6 on-demand MCP prompts (including `/sdd-stage-guide`, `/sdd-memory-guide`, `/sdd-change-guide`, and `/sdd-bootstrap-guide`) loaded only when needed |
| 80%+ of agent prompts are ≤100 words: short, focused interactions dominate | `sdd_suggest_context` is designed for short "what should I read?" queries. `sdd_review` takes a brief change description and returns a structured checklist |
| 4.3% overhead for meta-infrastructure (context files): a small cost for significant gains | SDD artifacts (`sdd/*.md`) and convention files add minimal overhead while preventing hallucinations and rework |
| 24.2% knowledge-to-code ratio: nearly 1/4 of repo content is context/documentation | Hoofy's pipeline generates specs, business rules, design docs, and task breakdowns as first-class artifacts alongside code |
| Ad-hoc review was used more than formal review stages | `sdd_review` is a standalone tool, not a pipeline stage; it can be used at any time without starting a change flow (an ADR captures this decision) |

Industry Research

Requirements Engineering & Specification

| Source | What it says | Hoofy Implementation |
|---|---|---|
| METR 2025 | Experienced developers were 19% slower with unstructured AI despite feeling 20% faster | Hoofy enforces structured specification: the AI cannot skip specs for non-trivial changes |
| DORA 2025 | 7.2% increase in delivery instability for every 25% increase in AI adoption without foundational practices | Pipeline stages (context-check, clarity gate, verify) provide the foundational practices DORA identifies as missing |
| McKinsey 2025 | Top performers see 16-30% productivity gains only with structured specification and communication | The SDD pipeline is structured specification and communication: proposal, requirements, design, tasks |
| IEEE 720574 | Fixing a requirement error in production costs 10-100x more than during the requirements phase | Clarity Gate catches ambiguities in the requirements phase, before any code is written |
| IREB & IEEE 29148 | Industry standards for structured requirements elicitation and traceability | Server instructions implement IEEE 29148 Requirements Smells heuristics for the AI to follow during specification |
| Business Rules Group | Business Rules Manifesto: rules are first-class citizens, not buried in code | Business-rules stage uses the BRG taxonomy (Definitions, Facts, Constraints, Derivations) to extract declarative rules from requirements |
| EARS | Easy Approach to Requirements Syntax: sentence templates that eliminate ambiguity | Server instructions use EARS patterns (When/While/Where/If-Then) for the AI to follow when writing requirements |
| DDD Ubiquitous Language | A shared language eliminates translation errors between business and technical domains | Business-rules stage builds a glossary as part of the Ubiquitous Language, used across all pipeline artifacts |
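EARS templates are regular enough that a rough shape check is possible. The Python sketch below is a hypothetical linter, not part of Hoofy (whose server instructions guide the AI rather than run code); it recognizes only the five basic EARS templates by their leading keyword and ignores complex combinations.

```python
from __future__ import annotations

import re
from typing import Optional

# The five basic EARS templates, each recognized by its leading keyword.
EARS_PATTERNS = [
    ("unwanted behavior", re.compile(r"^If .+, then the .+ shall .+", re.IGNORECASE)),
    ("event-driven", re.compile(r"^When .+, the .+ shall .+", re.IGNORECASE)),
    ("state-driven", re.compile(r"^While .+, the .+ shall .+", re.IGNORECASE)),
    ("optional feature", re.compile(r"^Where .+, the .+ shall .+", re.IGNORECASE)),
    ("ubiquitous", re.compile(r"^The .+ shall .+", re.IGNORECASE)),
]


def classify(requirement: str) -> Optional[str]:
    """Name the EARS template a sentence follows, or None if it fits none."""
    text = requirement.strip()
    for name, pattern in EARS_PATTERNS:
        if pattern.match(text):
            return name
    return None  # not EARS-shaped: a candidate for rewriting


print(classify("When the user saves, the editor shall persist the draft."))  # event-driven
print(classify("The system should be fast."))  # None
```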

This document is updated as new features are added. Every feature must cite its research source before shipping.