awesome-agent-harness 🤖

May 29, 2026

A curated list of agent harnesses, agent frameworks, workflow frameworks, and emerging agent protocols.

awesome-agent-harness is a curated map of the modern AI agent ecosystem. It focuses on developer-facing agent products, agent-building frameworks, workflow orchestration tools, and the protocols shaping how agents interact with tools, UIs, and each other.

This repository is designed for builders, researchers, and anyone trying to answer questions like:

  • Which agent products are worth tracking right now? 👀
  • Which framework should I choose to build an agent system? 🧰
  • Where is the boundary between agent frameworks and workflow frameworks? 🔀
  • Which protocols may become core ecosystem standards? 🌐

📅 The Evolution Timeline (2026)

The emergence of Harness Engineering marks a paradigm shift in the AI Agent landscape: moving from “Prompt Tuning & Model Obsession” to “Rigid Scaffolding & Environment Constraints”.

Here is how the discipline converged in early 2026:

timeline
    title Harness Engineering Evolution (2026)
    2026-02-05 : Mitchell Hashimoto (HashiCorp)
               : Coined the philosophy & "The Engineering Ratchet"
    2026-02-11 : OpenAI Technical Memo
               : 1M lines of code delivered via agent swarm
    2026-03-10 : Viv Trivedy (LangChain)
               : Formalized the formula: Agent = Model + Harness
    2026-03-24 : Anthropic Engineering Team
               : Long-Horizon Breakthroughs & Context Firewalls
    2026-04-19 : Addy Osmani (Google) & Industry
               : Definition of the "Harness Gap" & HaaS Convergence

1. The Spark: Philosophy & The Ratchet Effect

  • Date: 2026-02-05
  • Key Figure: Mitchell Hashimoto (Co-founder of HashiCorp)
  • Milestone: In "My AI Adoption Journey", Hashimoto borrowed the concept of a "test harness" from traditional software engineering and proposed the "Engineer the Harness" philosophy: whenever an agent slips, stop blaming the model weights or tweaking prompts. Instead, spend the time engineering a rigid environmental constraint so the agent can never make that exact mistake again. This introduced the concept of the Engineering Ratchet to AI development.
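One way to picture the ratchet is as a registry of checks that only ever grows. The sketch below is an illustration under stated assumptions, not code from Hashimoto's essay: the `Ratchet` class, the `Check` type, and the force-push incident are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A check inspects a proposed agent action and returns an error message,
# or None when the action is allowed. (Hypothetical type for illustration.)
Check = Callable[[str], Optional[str]]


@dataclass
class Ratchet:
    """Accumulates constraints; checks are added over time, never removed."""
    checks: List[Check] = field(default_factory=list)

    def add_constraint(self, check: Check) -> None:
        self.checks.append(check)

    def validate(self, action: str) -> List[str]:
        """Return every violation message for the proposed action."""
        violations = []
        for check in self.checks:
            message = check(action)
            if message is not None:
                violations.append(message)
        return violations


def forbid_force_push(action: str) -> Optional[str]:
    # Added after a hypothetical incident where an agent force-pushed to main.
    if "git push" in action and "--force" in action:
        return "force pushes are blocked by the harness"
    return None


ratchet = Ratchet()
ratchet.add_constraint(forbid_force_push)
```

Each new failure mode becomes one more function passed to `add_constraint`, so the same mistake is rejected by the environment rather than discouraged by a prompt.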

2. The Validation: Enterprise-Scale Production

  • Date: 2026-02-11
  • Key Organization: OpenAI
  • Milestone: Published the report "Harness engineering: leveraging Codex in an agent-first world", a public post-mortem of a project in which an autonomous agent swarm delivered 1 million lines of production code in 5 months with zero human coding. It showed that Hashimoto's harness-constraint philosophy scales to large, complex software architectures.

3. The Anatomy: Standardization & The Formula

  • Date: 2026-03-10
  • Key Figure: Viv Trivedy (LangChain Team)
  • Milestone: Published "The Anatomy of an Agent Harness". Trivedy condensed scattered industry practices into a single architectural equation: $\text{Agent} = \text{Model} + \text{Harness}$. He mapped out the six foundational components of a modern harness (Filesystem/Git, Boxed Bash, Context Compaction, Lifecycle Hooks, Search, and Multi-Agent Orchestration), transforming the black magic of agent tuning into a rigorous engineering discipline.
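To make the equation concrete, here is a minimal sketch in which the model is just a function and the harness owns the loop, the tools, and the step budget. It assumes a made-up `call <tool> <argument>` reply convention and a scripted stand-in for the model; none of these names come from the article.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# The model is reduced to a function from a transcript to the next message.
Model = Callable[[List[str]], str]


@dataclass
class Harness:
    """Everything around the model: tools, step budget, and the loop itself."""
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    max_steps: int = 5

    def run(self, model: Model, task: str) -> str:
        transcript = [f"task: {task}"]
        for _ in range(self.max_steps):
            reply = model(transcript)
            transcript.append(reply)
            # Hypothetical convention: "call <tool> <argument>" requests a tool.
            # The sketch assumes such replies are well formed.
            if reply.startswith("call "):
                _, name, arg = reply.split(" ", 2)
                tool = self.tools.get(name)
                result = tool(arg) if tool else f"unknown tool: {name}"
                transcript.append(f"result: {result}")
            else:
                return reply  # Any other reply is treated as the final answer.
        return "step budget exhausted"


def scripted_model(transcript: List[str]) -> str:
    # Stand-in for a real LLM: call a tool once, then answer with its result.
    if not any(line.startswith("result: ") for line in transcript):
        return "call upper hello"
    return "done: " + transcript[-1].removeprefix("result: ")


harness = Harness(tools={"upper": str.upper})
answer = harness.run(scripted_model, "shout hello")
```

Swapping `scripted_model` for a different function changes the model without touching the harness, which is the separation the equation describes. The example needs Python 3.9 or later for `str.removeprefix`.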

4. The Long-Horizon Breakthrough: Context Firewalls

  • Date: 2026-03-24
  • Key Organization: Anthropic Engineering Team
  • Milestone: Published "Harness design for long-running application development". As agents tackled multi-day tasks, the industry hit the wall of Context Rot (reasoning decay as context windows fill up). Anthropic responded with two patterns for long-running work:
    • Full Context Resets: Tearing down bloated sessions and rebuilding them from a compact, structured Hand-off File.
    • Planner/Generator/Evaluator Splits: Enforcing a Sprint Contract at the harness level to prevent agents from grading their own work ("GANs for code").
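The two patterns above can be sketched in a few functions. This is an illustrative sketch only: the hand-off fields, the `max_messages` threshold, and the shape of the contract (a list of boolean checks) are assumptions, not Anthropic's actual format.

```python
import json
from typing import Callable, List


def build_handoff(history: List[str], open_tasks: List[str]) -> str:
    """Condense a long session into a compact, structured hand-off document."""
    return json.dumps(
        {
            "completed_steps": len(history),
            "last_message": history[-1] if history else "",
            "open_tasks": open_tasks,
        },
        indent=2,
    )


def reset_context(handoff_json: str) -> List[str]:
    """Rebuild a fresh session seeded only with the hand-off contents."""
    handoff = json.loads(handoff_json)
    return [
        f"Resuming after {handoff['completed_steps']} earlier steps.",
        f"Last message: {handoff['last_message']}",
        "Open tasks: " + "; ".join(handoff["open_tasks"]),
    ]


def maybe_reset(history: List[str], open_tasks: List[str],
                max_messages: int = 50) -> List[str]:
    """Keep short sessions as they are; tear down and rebuild long ones."""
    if len(history) <= max_messages:
        return history
    return reset_context(build_handoff(history, open_tasks))


def evaluate(artifact: str, contract: List[Callable[[str], bool]]) -> bool:
    """Evaluator role: checks agreed up front, run outside the generator."""
    return all(check(artifact) for check in contract)


long_history = [f"step {i}" for i in range(60)]
fresh = maybe_reset(long_history, ["write tests"])
```

Because `evaluate` takes the contract as an argument, the generator cannot loosen the acceptance criteria it is judged against.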

5. The Convergence: Harness-as-a-Service (HaaS)

  • Date: 2026-04-19
  • Key Figure/Trend: Addy Osmani (Google) & Cloud Providers
  • Milestone: Addy Osmani published the overview "Agent Harness Engineering", declaring that "The gap between what today's models can do and what you see them doing is largely a harness gap." This led directly to the HaaS (Harness-as-a-Service) era. With the launch of the Claude Agent SDK and OpenAI Agents SDK, the industry shifted from building raw LLM completion loops to configuring robust, managed agent runtimes out of the box.

“Every component in a harness encodes an assumption about what the model can’t do on its own.” (Anthropic) As models evolve, the scaffolding doesn't shrink; it moves to higher ceilings.

Agent Harness 🚀

Agent harnesses are end-user or developer-facing products that package model access, tools, execution loops, memory, planning, coding assistance, browser automation, or task execution into a usable experience.

| Product | Release Date | Developer / Organization | Open Source |
| --- | --- | --- | --- |
| Cursor | 2023-01 | Anysphere | No |
| AutoGPT | 2023-03 | Significant Gravitas | Yes |
| BabyAGI | 2023-04 | Yohei Nakajima | Yes |
| Cody | 2023-05 | Sourcegraph | No |
| Aider | 2023-05 | Paul Gauthier | Yes |
| Sweep | 2023-05 | Sweep AI | Yes |
| Continue | 2023-06 | Continue, Inc. | Yes |
| GPT Engineer | 2023-06 | Lovable | Yes |
| GPT Pilot | 2023-08 | Pythagora | Yes |
| Tongyi Lingma | 2023-10 | Alibaba | No |
| v0.dev | 2023-10 | Vercel | No |
| Plandex | 2024-02 | Plandex | Yes |
| Devin | 2024-03 | Cognition Labs | No |
| OpenHands (OpenDevin) | 2024-03 | All Hands AI | Yes |
| Amazon Q Developer | 2024-04 | Amazon | No |
| SWE-agent | 2024-04 | Princeton University NLP Group | Yes |
| Cline | 2024-06 | Cline | Yes |
| OpenCode | 2024-07 | Anomaly Innovations | Yes |
| PearAI | 2024-07 | PearAI | Yes |
| Void | 2024-08 | Void Editor Team | Yes |
| Replit Agent | 2024-09 | Replit | No |
| Pythagora | 2024-10 | Pythagora | No |
| Bolt.new | 2024-10 | StackBlitz | No |
| bolt.diy | 2024-10 | StackBlitz | Yes |
| OpenCUA | 2024-10 | ModelBest | Yes |
| Lovable | 2024-11 | Lovable | No |
| Windsurf | 2024-11 | Codeium -> Cognition Labs | No |
| Amp | 2024-11 | Sourcegraph | No |
| browser-use | 2024-11 | Browser Use Inc. | Yes |
| agent-browser | 2024-12 | Emergence | Yes |
| Roo Code | 2025-01 | Roo Code | Yes |
| Trae | 2025-01 | ByteDance | No |
| Pi Coding Agent | 2025-01 | Mario Zechner | Yes |
| Goose | 2025-01 | Block | Yes |
| Crush | 2025-01 | Crush AI | Yes |
| Claude Code | 2025-02 | Anthropic | Yes* |
| GitHub Copilot | 2025-02 | GitHub | Yes |
| Manus | 2025-03 | Butterfly Effect | No |
| OpenManus | 2025-03 | MetaGPT | Yes |
| Genspark Super Agent | 2025-04 | Genspark | No |
| Codex CLI | 2025-04 | OpenAI | Yes |
| Gemini CLI | 2025-06 | Google | Yes |
| CodeBuddy | 2025-07 | Tencent | No |
| trae-agent | 2025-07 | ByteDance | Yes |
| Qwen Code | 2025-07 | Alibaba | Yes |
| Deep Agents | 2025-07 | LangChain | Yes |
| Qoder | 2025-08 | Alibaba | No |
| Open SWE | 2025-08 | LangChain | Yes |
| AstrBot | 2025-09 | AstrBotDevs | Yes |
| OpenClaw | 2025-11 | Peter Steinberger | Yes |
| NanoClaw | 2026-01 | NanoCo | Yes |
| nanobot | 2026-02 | HKUDS | Yes |
| Hermes | 2026-02 | Nous Research | Yes |
| Warp | 2026-04 | Warp | Yes |
| Reasonix | 2026-05 | esengine | Yes |

Yes* indicates a partially open-source, open-core, or otherwise limited open-source model.

Agent Framework 🧠

Agent frameworks are developer toolkits for building agent systems. They usually cover capabilities like prompt orchestration, tool calling, memory, planning, state management, evaluation, and multi-agent coordination.

| Framework | Developer / Organization | Language | Release Date |
| --- | --- | --- | --- |
| Haystack | deepset | Python | 2019-11 |
| LlamaIndex | LlamaIndex | Python / TypeScript | 2022-11 |
| DSPy | Stanford NLP | Python | 2023-01 |
| Semantic Kernel | Microsoft | C# / Python / Java | 2023-03 |
| Camel-AI | KAUST / Open Source | Python | 2023-03 |
| Agno (Phidata) | Agno team | Python | 2023-05 |
| Vercel AI SDK | Vercel | TypeScript | 2023-06 |
| Instructor | Jason Liu | Python / TypeScript / Go / Rust | 2023-07 |
| MetaGPT | MetaGPT Team / DeepWisdom | Python | 2023-07 |
| AutoGen | Microsoft Research | Python / C# | 2023-09 |
| ModelScope-Agent | Alibaba | Python | 2023-09 |
| Letta (MemGPT) | Letta / UC Berkeley team | Python | 2023-10 |
| CrewAI | CrewAI Inc. | Python | 2023-11 |
| LangGraph | LangChain | Python / TypeScript | 2024-01 |
| Rig | Jetpack.io | Rust | 2024-04 |
| Bee Agent Framework | IBM | TypeScript | 2024-08 |
| Eino | ByteDance | Go | 2024-10 |
| PydanticAI | Pydantic | Python | 2024-12 |
| Pi Agent Core | Mario Zechner | TypeScript | 2025-01 |
| OpenAI Agents SDK | OpenAI | TypeScript / Node / Python / Go | 2025-03 |
| Google ADK | Google | Python / Java / TypeScript / Go | 2025-04 |
| Claude Agent SDK | Anthropic | Python / TypeScript | 2025-06 |
| Microsoft Agent Framework | Microsoft | Python / C# | 2025-10 |

Workflow Framework 🔄

Workflow frameworks are useful for orchestration, scheduling, stateful execution, observability, and visual flow design. They are not always agent-first, but they often serve as the execution backbone for agent systems.

| Framework | Developer / Organization | Language | Release Date |
| --- | --- | --- | --- |
| Prefect | Prefect Technologies | Python | 2019-03 |
| Temporal | Temporal Technologies | Go / Python / Java / TypeScript | 2020-10 |
| Hamilton | Stitch Fix / DagWorks | Python | 2021-10 |
| Yao | IQS | Go / JavaScript | 2022-11 |
| Langflow | DataStax | Python | 2023-04 |
| Flowise | FlowiseAI | TypeScript | 2023-04 |
| Dify | LangGenius | Python | 2023-05 |
| Coze | ByteDance | Go | 2023-12 |
| Burr | DagWorks | Python | 2024-03 |

AgentOps / Observability 📈

AgentOps and observability tooling help teams monitor, debug, evaluate, and improve agent behavior in production. These tools typically provide traces, session replay, cost/token monitoring, prompt/version tracking, and quality evaluation pipelines.

Common capability buckets:

  • Tracing & replay: inspect each step (prompt, tool call, model response, latency) of an agent run.
  • Evaluation: run online/offline eval sets, track regressions, and compare prompt/model/agent versions.
  • Cost & performance: monitor token usage, model spend, error rate, and tail latency over time.
  • Dataset & feedback loops: collect production conversations, annotate failure cases, and feed them back into evals.
  • Governance: prompt/version history, experiment lineage, and auditability for incident reviews.

| Platform | Developer / Organization | Focus | First Public Release |
| --- | --- | --- | --- |
| Helicone | Helicone | LLM proxy analytics, request logging, spend monitoring, and caching/rate controls | 2023-05 |
| TruLens | TruEra (now Snowflake) | LLM app evaluation/guardrails with feedback functions and quality metrics | 2023-05 |
| Langfuse | Langfuse | Open-source LLM/app observability, traces, prompt management, datasets, and evals | 2023-07 |
| LangSmith | LangChain | Agent tracing, debugging, test/eval pipelines, and experiment comparison | 2023-07 |
| AgentOps | AgentOps / Comet | Agent runtime observability, tracing, cost/performance monitoring, and eval workflows | 2023-09 |
| Arize Phoenix | Arize AI | Open-source observability and evaluation for LLM apps (traces, spans, evals) | 2023-10 |
| Braintrust | Braintrust Data | Evaluation-first workflow for prompts/apps with experiment tracking and scoring | 2023-11 |
| Weights & Biases Weave | Weights & Biases | Prompt/app tracing, experiment analysis, and evaluation workflows | 2024-02 |

Selection notes (quick heuristic):

  • Pick open-source/self-hosted first (e.g., Langfuse, Phoenix, TruLens) when data residency or internal compliance is strict.
  • Pick evaluation-first platforms (e.g., Braintrust, LangSmith, Weave) when your bottleneck is quality iteration speed.
  • Pick a proxy-centric layer (e.g., Helicone) when you mainly need model usage analytics and cost control with minimal app changes.
  • Use a hybrid stack in larger teams: proxy for spend controls + tracing/eval platform for quality and debugging.
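For readers new to these tools, the tracing and cost buckets described in this section boil down to recording one span per step and aggregating over them. The sketch below is vendor-neutral and does not use any listed platform's API; the `Span` and `Trace` names and the per-1k-token prices are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One step of an agent run: a model call or a tool call."""
    name: str
    latency_s: float
    input_tokens: int = 0
    output_tokens: int = 0
    error: bool = False


@dataclass
class Trace:
    spans: List[Span] = field(default_factory=list)

    def record(self, span: Span) -> None:
        self.spans.append(span)

    def total_tokens(self) -> int:
        return sum(s.input_tokens + s.output_tokens for s in self.spans)

    def error_rate(self) -> float:
        if not self.spans:
            return 0.0
        return sum(s.error for s in self.spans) / len(self.spans)

    def cost_usd(self, usd_per_1k_input: float, usd_per_1k_output: float) -> float:
        # Prices are supplied by the caller; real prices vary by model.
        input_tokens = sum(s.input_tokens for s in self.spans)
        output_tokens = sum(s.output_tokens for s in self.spans)
        return (input_tokens * usd_per_1k_input
                + output_tokens * usd_per_1k_output) / 1000


trace = Trace()
trace.record(Span("plan", latency_s=0.8, input_tokens=100, output_tokens=50))
trace.record(Span("tool:search", latency_s=0.2, error=True))
trace.record(Span("answer", latency_s=1.1, input_tokens=200, output_tokens=25))
```

Hosted platforms add storage, replay, and dashboards on top, but the per-step record is the common core.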

Protocol 🌐

Protocols, conventions, and interface patterns worth watching in the agent ecosystem.

Dates below refer to the first public spec, announcement, or launch that I could verify. For a few newer protocols, the date is best treated as the earliest public appearance rather than a formal standards milestone.

| Protocol | Initiated By | First Public Release | What It Covers |
| --- | --- | --- | --- |
| llms.txt | Jeremy Howard / Answer.AI | 2024-09 | LLM-readable website discovery and content guidance |
| MCP | Anthropic | 2024-11 | Agent / model connection to tools, data, and external systems |
| ACP | Zed and JetBrains | 2025-03 | An open standard that enables any agent to integrate seamlessly with any editing environment |
| AG-UI | CopilotKit / AG-UI community | 2025-04 | Real-time agent-to-user interaction between agent backends and frontends |
| A2A | Google | 2025-04 | Agent-to-agent collaboration and task delegation |
| ANP | ANP Open Community | 2025-05 | An open protocol stack for the Agentic Web, covering decentralized identity (DID), service discovery, end-to-end encrypted messaging, and agent payments |
| AGENTS.md | OpenAI-led industry working group; now stewarded by the Agentic AI Foundation | 2025-08 | Project-level instructions for coding agents |
| AP2 | Google | 2025-09 | An open protocol for secure, agent-led AI commerce |
| A2UI | Google with contributions from CopilotKit and the open-source community | 2025-09 | Agent-generated, declarative UI rendered natively across clients |
| Agent Skills | Anthropic | 2025-10 | Portable skills and reusable capability packs for agents |
| DESIGN.md | Google (via Google Stitch) | 2026-03 | Agent-readable design system rules (colors, typography, spacing, patterns) to enforce visual consistency in AI-generated UI |
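As a small illustration of what "connection to tools" looks like on the wire, MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The sketch below only builds such a request. It is based on my reading of the public MCP specification, and the tool name `search_docs` and its arguments are made-up placeholders.

```python
import json
from typing import Any, Dict


def make_tool_call(request_id: int, tool_name: str,
                   arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request for MCP's tools/call method."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# "search_docs" is a hypothetical tool; a real server advertises its own tools.
message = make_tool_call(1, "search_docs", {"query": "agent harness"})
```

A real client would first complete the protocol's initialization handshake and discover available tools before sending a request like this; consult the specification for the full lifecycle.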

Roadmap 🗺️

This repository can be expanded into a stronger long-term awesome list with:

  1. Official website and GitHub links for every entry where both exist
  2. One-line descriptions for each project
  3. Tags like coding, browser, research, multi-agent, cloud, and cli
  4. Comparison dimensions such as deployment model, tool use, memory, local model support, and collaboration model
  5. A richer protocol section with references and short explanations
  6. Related resources and adjacent awesome lists

Contributing 🤝

Contributions are welcome. Feel free to open an Issue or Pull Request to:

  • Add a missing project
  • Fix a category or release date
  • Improve naming consistency
  • Add links, descriptions, or tags
  • Expand the protocol section

Suggested contribution format:

| Project | Category | Developer / Organization | Language | Open Source | Release Date | Link | Notes |

License 📄

This repository is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

You are free to share and adapt the material for any purpose, including commercial use, as long as appropriate attribution is given.
