awesome-agent-harness 🤖
May 29, 2026
A curated list of agent harnesses, agent frameworks, workflow frameworks, and emerging agent protocols.
awesome-agent-harness is a curated map of the modern AI agent ecosystem. It focuses on developer-facing agent products, agent-building frameworks, workflow orchestration tools, and the protocols shaping how agents interact with tools, UIs, and each other.
This repository is designed for builders, researchers, and anyone trying to answer questions like:
- Which agent products are worth tracking right now? 👀
- Which framework should I choose to build an agent system? 🧰
- Where is the boundary between agent frameworks and workflow frameworks? 🔀
- Which protocols may become core ecosystem standards? 🌐
📅 The Evolution Timeline (2026)
The emergence of Harness Engineering marks a paradigm shift in the AI Agent landscape: moving from “Prompt Tuning & Model Obsession” to “Rigid Scaffolding & Environment Constraints”.
Here is how the discipline converged in early 2026:
timeline
title Harness Engineering Evolution (2026)
2026-2-5 : Mitchell Hashimoto (HashiCorp)
: Coined the philosophy & "The Engineering Ratchet"
2026-2-11 : OpenAI Technical Memo
: 1M lines of code delivered via agent swarm
2026-3-10 : Viv Trivedy (LangChain)
: Formalized the formula: Agent = Model + Harness
2026-3-24 : Anthropic Engineering Team
: Long-Horizon Breakthroughs & Context Firewalls
2026-4-19 : Addy Osmani (Google) & Industry
: Definition of the "Harness Gap" & HaaS Convergence
1. The Spark: Philosophy & The Ratchet Effect
- Date: 2026-2-5
- Key Figure: Mitchell Hashimoto (Co-founder of HashiCorp)
- Milestone: In his post "My AI Adoption Journey", Hashimoto borrowed the concept of a "test harness" from traditional software engineering and proposed the "Engineer the Harness" philosophy: whenever an agent slips, stop blaming the model weights or tweaking prompts. Instead, spend time engineering a rigid environmental constraint so the agent can never make that exact mistake again. This introduced the concept of the Engineering Ratchet to AI development.
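The ratchet idea can be sketched in a few lines of Python. This is a minimal illustration, not code from Hashimoto's post: the `RatchetHarness` class, the `Guard` type, and the `no_force_push` rule are all hypothetical names chosen for the example. The point is only that every observed failure becomes a permanent check that runs before any future action.

```python
from typing import Callable, List, Optional

# A guard inspects a proposed action and returns an error message, or None if it is fine.
Guard = Callable[[str], Optional[str]]


class RatchetHarness:
    """Hypothetical harness that only ever gains constraints, never loses them."""

    def __init__(self) -> None:
        self.guards: List[Guard] = []

    def add_guard(self, guard: Guard) -> None:
        # Each observed failure becomes a permanent check: the ratchet clicks forward.
        self.guards.append(guard)

    def check(self, action: str) -> List[str]:
        # Collect every violation instead of stopping at the first one.
        return [msg for g in self.guards if (msg := g(action)) is not None]


def no_force_push(action: str) -> Optional[str]:
    # Illustrative rule, added after an agent once rewrote shared history.
    return "force push is blocked" if "push --force" in action else None


harness = RatchetHarness()
harness.add_guard(no_force_push)
print(harness.check("git push --force origin main"))  # ['force push is blocked']
print(harness.check("git push origin main"))          # []
```

There is deliberately no `remove_guard` method: in this sketch the constraint set can only grow, which is what makes it a ratchet.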
2. The Validation: Enterprise-Scale Production
- Date: 2026-2-11
- Key Organization: OpenAI
- Milestone: Published the report "Harness engineering: leveraging Codex in an agent-first world", a public post-mortem of a project in which an autonomous agent swarm delivered 1 million lines of production code in 5 months with zero human coding. It showed that Hashimoto's harness-constraint philosophy scales to massive, complex software architectures.
3. The Anatomy: Standardization & The Formula
- Date: 2026-3-10
- Key Figure: Viv Trivedy (LangChain Team)
- Milestone: Published "The Anatomy of an Agent Harness". Viv formalized the chaotic industry practices into an elegant architectural equation: Agent = Model + Harness. He mapped out the 6 foundational atomic components of a modern harness (Filesystem/Git, Boxed Bash, Context Compaction, Lifecycle Hooks, Search, and Multi-Agent Orchestration), transforming the black magic of agent tuning into a rigorous engineering discipline.
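As a rough illustration of the formula Agent = Model + Harness, the six components can be modelled as fields on a dataclass. This is a sketch under stated assumptions: the field names, the string placeholders, and the `echo_model` stand-in are invented for the example and do not come from the article.

```python
from dataclasses import dataclass, fields
from typing import Callable, List


@dataclass
class Harness:
    """The six components listed above, modelled as plain labels (an assumption)."""
    filesystem: str = "git worktree"
    shell: str = "sandboxed bash"
    compaction: str = "summarize old turns"
    hooks: str = "pre/post tool hooks"
    search: str = "code search"
    orchestration: str = "sub-agents"


@dataclass
class Agent:
    # Agent = Model + Harness: the model is any text-in, text-out callable.
    model: Callable[[str], str]
    harness: Harness

    def components(self) -> List[str]:
        # List the harness parts that surround the model.
        return [f.name for f in fields(self.harness)]


def echo_model(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"echo: {prompt}"


agent = Agent(model=echo_model, harness=Harness())
print(len(agent.components()))  # 6
```

Swapping `echo_model` for a different callable changes the model without touching the harness, which is the separation the formula describes.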
4. The Long-Horizon Breakthrough: Context Firewalls
- Date: 2026-3-24
- Key Organization: Anthropic Engineering Team
- Milestone: Published "Harness design for long-running application development". As agents tackled multi-day tasks, the industry hit the wall of Context Rot (reasoning decay as context windows fill up). Anthropic responded with two paradigms for long-running work:
- Full Context Resets: Tearing down bloated sessions and rebuilding them from a compact, structured Hand-off File.
- Planner/Generator/Evaluator Splits: Enforcing a Sprint Contract at the harness level to prevent agents from grading their own work ("GANs for code").
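The two ideas above can be sketched together in a toy loop. This is a simplified illustration, not Anthropic's implementation: the `Session` class, the message-count budget (a stand-in for a token budget), and the one-line hand-off format are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """Hypothetical session that tracks context as a list of messages."""
    messages: List[str] = field(default_factory=list)
    budget: int = 5  # assumed cap, counted in messages rather than tokens

    def is_bloated(self) -> bool:
        return len(self.messages) >= self.budget


def write_handoff(session: Session) -> str:
    # A real hand-off file would be structured; here it keeps only a count and the last step.
    return f"HANDOFF: done={len(session.messages)} last={session.messages[-1]}"


def reset(session: Session) -> Session:
    # Full context reset: discard the old session and seed a fresh one from the hand-off.
    return Session(messages=[write_handoff(session)], budget=session.budget)


def evaluate(output: str) -> bool:
    # Separate evaluator role: the generator never grades its own work.
    return output.startswith("step")


session = Session()
resets = 0
for i in range(12):
    output = f"step {i}"  # stand-in for the generator's work
    if evaluate(output):
        session.messages.append(output)
    if session.is_bloated():
        session = reset(session)
        resets += 1

print(resets, len(session.messages))  # 2 4
```

After each reset the new session starts from a single hand-off line, so the context never grows past the budget no matter how long the task runs.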
5. The Convergence: Harness-as-a-Service (HaaS)
- Date: 2026-4-19
- Key Figure/Trend: Addy Osmani (Google) & Cloud Providers
- Milestone: Addy Osmani published the overview "Agent Harness Engineering", declaring that "The gap between what today's models can do and what you see them doing is largely a harness gap." This led directly to the HaaS (Harness-as-a-Service) era. With the launch of the Claude Agent SDK and OpenAI Agents SDK, the industry shifted from building raw LLM completion loops to configuring robust, managed agent runtimes out of the box.
“Every component in a harness encodes an assumption about what the model can’t do on its own.” — Anthropic. As models evolve, the scaffolding doesn't shrink—it moves to higher ceilings.
Contents 📚
- Agent Harness
- Agent Framework
- Workflow Framework
- AgentOps / Observability
- Protocol
- Roadmap
- Contributing
- License
Agent Harness 🚀
Agent harnesses are end-user or developer-facing products that package model access, tools, execution loops, memory, planning, coding assistance, browser automation, or task execution into a usable experience.
| Product | Release Date | Developer / Organization | Open Source |
|---|---|---|---|
| Cursor | 2023-01 | Anysphere | No |
| AutoGPT | 2023-03 | Significant Gravitas | Yes |
| BabyAGI | 2023-04 | Yohei Nakajima | Yes |
| Cody | 2023-05 | Sourcegraph | No |
| Aider | 2023-05 | Paul Gauthier | Yes |
| Sweep | 2023-05 | Sweep AI | Yes |
| Continue | 2023-06 | Continue, Inc. | Yes |
| GPT Engineer | 2023-06 | Lovable | Yes |
| GPT Pilot | 2023-08 | Pythagora | Yes |
| Tongyi Lingma | 2023-10 | Alibaba | No |
| v0.dev | 2023-10 | Vercel | No |
| Plandex | 2024-02 | Plandex | Yes |
| Devin | 2024-03 | Cognition Labs | No |
| OpenHands (OpenDevin) | 2024-03 | All Hands AI | Yes |
| Amazon Q Developer | 2024-04 | Amazon | No |
| SWE-agent | 2024-04 | Princeton University NLP Group | Yes |
| Cline | 2024-06 | Cline | Yes |
| OpenCode | 2024-07 | Anomaly Innovations | Yes |
| PearAI | 2024-07 | PearAI | Yes |
| Void | 2024-08 | Void Editor Team | Yes |
| Replit Agent | 2024-09 | Replit | No |
| Pythagora | 2024-10 | Pythagora | No |
| Bolt.new | 2024-10 | StackBlitz | No |
| bolt.diy | 2024-10 | StackBlitz | Yes |
| OpenCUA | 2024-10 | ModelBest | Yes |
| Lovable | 2024-11 | Lovable | No |
| Windsurf | 2024-11 | Codeium -> Cognition Labs | No |
| Amp | 2024-11 | Sourcegraph | No |
| browser-use | 2024-11 | Browser Use Inc. | Yes |
| agent-browser | 2024-12 | Emergence | Yes |
| Roo Code | 2025-01 | Roo Code | Yes |
| Trae | 2025-01 | ByteDance | No |
| Pi Coding Agent | 2025-01 | Mario Zechner | Yes |
| Goose | 2025-01 | Block | Yes |
| Crush | 2025-01 | Crush AI | Yes |
| Claude Code | 2025-02 | Anthropic | Yes* |
| GitHub Copilot | 2025-02 | GitHub | Yes |
| Manus | 2025-03 | Butterfly Effect | No |
| OpenManus | 2025-03 | MetaGPT | Yes |
| Genspark Super Agent | 2025-04 | Genspark | No |
| Codex CLI | 2025-04 | OpenAI | Yes |
| Gemini CLI | 2025-06 | Google | Yes |
| CodeBuddy | 2025-07 | Tencent | No |
| trae-agent | 2025-07 | ByteDance | Yes |
| Qwen Code | 2025-07 | Alibaba | Yes |
| Deep Agents | 2025-07 | LangChain | Yes |
| Qoder | 2025-08 | Alibaba | No |
| Open SWE | 2025-08 | LangChain | Yes |
| AstrBot | 2025-09 | AstrBotDevs | Yes |
| OpenClaw | 2025-11 | Peter Steinberger | Yes |
| NanoClaw | 2026-01 | NanoCo | Yes |
| nanobot | 2026-02 | HKUDS | Yes |
| Hermes | 2026-02 | Nous Research | Yes |
| Warp | 2026-04 | Warp | Yes |
| Reasonix | 2026-05 | esengine | Yes |
Yes* indicates a partially open-source, open-core, or otherwise limited open-source model.
Agent Framework 🧠
Agent frameworks are developer toolkits for building agent systems. They usually cover capabilities like prompt orchestration, tool calling, memory, planning, state management, evaluation, and multi-agent coordination.
| Framework | Developer / Organization | Language | Release Date |
|---|---|---|---|
| Haystack | deepset | Python | 2019-11 |
| LlamaIndex | LlamaIndex | Python / TypeScript | 2022-11 |
| DSPy | Stanford NLP | Python | 2023-01 |
| Semantic Kernel | Microsoft | C# / Python / Java | 2023-03 |
| Camel-AI | KAUST / Open Source | Python | 2023-03 |
| Agno (Phidata) | Agno team | Python | 2023-05 |
| Vercel AI SDK | Vercel | TypeScript | 2023-06 |
| Instructor | Jason Liu | Python / TypeScript / Go / Rust | 2023-07 |
| MetaGPT | MetaGPT Team / DeepWisdom | Python | 2023-07 |
| AutoGen | Microsoft Research | Python / C# | 2023-09 |
| ModelScope-Agent | Alibaba | Python | 2023-09 |
| Letta (MemGPT) | Letta / UC Berkeley team | Python | 2023-10 |
| CrewAI | CrewAI Inc. | Python | 2023-11 |
| LangGraph | LangChain | Python / TypeScript | 2024-01 |
| Rig | 0xPlaygrounds | Rust | 2024-04 |
| Bee Agent Framework | IBM | TypeScript | 2024-08 |
| Eino | ByteDance | Go | 2024-10 |
| PydanticAI | Pydantic | Python | 2024-12 |
| Pi Agent Core | Mario Zechner | TypeScript | 2025-01 |
| OpenAI Agents SDK | OpenAI | TypeScript / Node / Python / Go | 2025-03 |
| Google ADK | Google | Python / Java / TypeScript / Go | 2025-04 |
| Claude Agent SDK | Anthropic | Python / TypeScript | 2025-06 |
| Microsoft Agent Framework | Microsoft | Python / C# | 2025-10 |
Workflow Framework 🔄
Workflow frameworks are useful for orchestration, scheduling, stateful execution, observability, and visual flow design. They are not always agent-first, but they often serve as the execution backbone for agent systems.
| Framework | Developer / Organization | Language | Release Date |
|---|---|---|---|
| Prefect | Prefect Technologies | Python | 2019-03 |
| Temporal | Temporal Technologies | Go / Python / Java / TypeScript | 2020-10 |
| Hamilton | Stitch Fix / DagWorks | Python | 2021-10 |
| Yao | IQS | Go / JavaScript | 2022-11 |
| Langflow | DataStax | Python | 2023-04 |
| Flowise | FlowiseAI | TypeScript | 2023-04 |
| Dify | LangGenius | Python | 2023-05 |
| Coze | ByteDance | Go | 2023-12 |
| Burr | DagWorks | Python | 2024-03 |
AgentOps / Observability 📈
AgentOps and observability tooling help teams monitor, debug, evaluate, and improve agent behavior in production. These tools typically provide traces, session replay, cost/token monitoring, prompt/version tracking, and quality evaluation pipelines.
Common capability buckets:
- Tracing & replay: inspect each step (prompt, tool call, model response, latency) of an agent run.
- Evaluation: run online/offline eval sets, track regressions, and compare prompt/model/agent versions.
- Cost & performance: monitor token usage, model spend, error rate, and tail latency over time.
- Dataset & feedback loops: collect production conversations, annotate failure cases, and feed them back into evals.
- Governance: prompt/version history, experiment lineage, and auditability for incident reviews.
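To make the tracing and cost buckets concrete, here is a minimal in-memory sketch. It does not use the API of any platform listed below: the `Span` and `Trace` classes and the flat per-1k-token price are assumptions for illustration only.

```python
import time
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    name: str         # e.g. "model_call" or "tool_call"
    tokens: int       # tokens consumed by this step
    latency_s: float  # wall-clock duration of the step


class Trace:
    """Hypothetical in-memory trace; real platforms export spans to a backend."""

    def __init__(self, usd_per_1k_tokens: float) -> None:
        self.spans: List[Span] = []
        self.usd_per_1k_tokens = usd_per_1k_tokens  # assumed flat price

    def record(self, name: str, tokens: int, started: float) -> None:
        # Latency is measured from the caller-supplied start time.
        self.spans.append(Span(name, tokens, time.perf_counter() - started))

    def total_tokens(self) -> int:
        return sum(s.tokens for s in self.spans)

    def cost_usd(self) -> float:
        return self.total_tokens() / 1000 * self.usd_per_1k_tokens


trace = Trace(usd_per_1k_tokens=0.5)
t0 = time.perf_counter()
trace.record("model_call", tokens=1200, started=t0)
trace.record("tool_call", tokens=300, started=t0)
print(trace.total_tokens(), trace.cost_usd())  # 1500 0.75
```

The same span records can feed replay (ordered steps), cost monitoring (token sums), and latency tracking, which is why most of the tools below build on a trace/span model.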
| Platform | Developer / Organization | Focus | First Public Release |
|---|---|---|---|
| Helicone | Helicone | LLM proxy analytics, request logging, spend monitoring, and caching/rate controls | 2023-05 |
| TruLens | TruEra (now Snowflake) | LLM app evaluation/guardrails with feedback functions and quality metrics | 2023-05 |
| Langfuse | Langfuse | Open-source LLM/app observability, traces, prompt management, datasets, and evals | 2023-07 |
| LangSmith | LangChain | Agent tracing, debugging, test/eval pipelines, and experiment comparison | 2023-07 |
| AgentOps | AgentOps / Comet | Agent runtime observability, tracing, cost/performance monitoring, and eval workflows | 2023-09 |
| Arize Phoenix | Arize AI | Open-source observability and evaluation for LLM apps (traces, spans, evals) | 2023-10 |
| Braintrust | Braintrust Data | Evaluation-first workflow for prompts/apps with experiment tracking and scoring | 2023-11 |
| Weights & Biases Weave | Weights & Biases | Prompt/app tracing, experiment analysis, and evaluation workflows | 2024-02 |
Selection notes (quick heuristic):
- Pick open-source/self-hosted first (e.g., Langfuse, Phoenix, TruLens) when data residency or internal compliance is strict.
- Pick evaluation-first platforms (e.g., Braintrust, LangSmith, Weave) when your bottleneck is quality iteration speed.
- Pick a proxy-centric layer (e.g., Helicone) when you mainly need model usage analytics and cost control with minimal app changes.
- Use a hybrid stack in larger teams: proxy for spend controls + tracing/eval platform for quality and debugging.
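The heuristic above can be written down as a small lookup function. This is only a sketch: the function name, the `bottleneck` labels, the order in which the rules are checked, and the specific pair returned for the hybrid case are assumptions, not recommendations from any vendor.

```python
from typing import List


def suggest_stack(strict_compliance: bool, bottleneck: str, large_team: bool) -> List[str]:
    """Map the selection notes onto example platforms from the table.

    `bottleneck` is an assumed label: "quality" or "cost".
    """
    if strict_compliance:
        # Open-source / self-hosted options first.
        return ["Langfuse", "Arize Phoenix", "TruLens"]
    if large_team:
        # Hybrid: a proxy for spend controls plus a tracing/eval platform (illustrative pair).
        return ["Helicone", "LangSmith"]
    if bottleneck == "quality":
        return ["Braintrust", "LangSmith", "Weights & Biases Weave"]
    if bottleneck == "cost":
        return ["Helicone"]
    return []  # no rule matched; the notes give no default


print(suggest_stack(strict_compliance=False, bottleneck="cost", large_team=False))
# ['Helicone']
```

Checking compliance first reflects the "pick open-source/self-hosted first" wording; a team with different priorities could reorder the branches.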
Protocol 🌐
Protocols, conventions, and interface patterns worth watching in the agent ecosystem.
Dates below refer to the first public spec, announcement, or launch that I could verify. For a few newer protocols, the date is best treated as the earliest public appearance rather than a formal standards milestone.
| Protocol | Initiated By | First Public Release | What It Covers |
|---|---|---|---|
| llms.txt | Jeremy Howard / Answer.AI | 2024-09 | LLM-readable website discovery and content guidance |
| MCP | Anthropic | 2024-11 | Agent / model connection to tools, data, and external systems |
| ACP | Zed and JetBrains | 2025-03 | An open standard that enables any agent to integrate seamlessly with any editing environment |
| AG-UI | CopilotKit / AG-UI community | 2025-04 | Real-time agent-to-user interaction between agent backends and frontends |
| A2A | Google | 2025-04 | Agent-to-agent collaboration and task delegation |
| ANP | ANP Open Community | 2025-05 | An open protocol stack for the Agentic Web, covering decentralized identity (DID), service discovery, end-to-end encrypted messaging, and agent payments |
| AGENTS.md | OpenAI-led industry working group; now stewarded by the Agentic AI Foundation | 2025-08 | Project-level instructions for coding agents |
| AP2 | Google | 2025-09 | An open protocol for secure, agent-led AI commerce |
| A2UI | Google with contributions from CopilotKit and the open-source community | 2025-09 | Agent-generated, declarative UI rendered natively across clients |
| Agent Skills | Anthropic | 2025-10 | Portable skills and reusable capability packs for agents |
| DESIGN.md | Google (via Google Stitch) | 2026-03 | Agent-readable design system rules (colors, typography, spacing, patterns) to enforce visual consistency in AI-generated UI |
Roadmap 🗺️
This repository can be expanded into a stronger long-term awesome list with:
- Official website and GitHub links for every entry where both exist
- One-line descriptions for each project
- Tags like `coding`, `browser`, `research`, `multi-agent`, `cloud`, and `cli`
- Comparison dimensions such as deployment model, tool use, memory, local model support, and collaboration model
- A richer protocol section with references and short explanations
- Related resources and adjacent awesome lists
Contributing 🤝
Contributions are welcome. Feel free to open an Issue or Pull Request to:
- Add a missing project
- Fix a category or release date
- Improve naming consistency
- Add links, descriptions, or tags
- Expand the protocol section
Suggested contribution format:
| Project | Category | Developer / Organization | Language | Open Source | Release Date | Link | Notes |
License 📄
This repository is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
You are free to share and adapt the material for any purpose, including commercial use, as long as appropriate attribution is given.