Awesome Agentic Engineering Resources
June 18, 2026
A curated list of high-signal resources — articles, books, courses, cookbooks, papers, playbooks, benchmarks, talks, podcasts, and newsletters — for agentic engineering and AI engineering.
This is a resources list, not a tools list. Open-source tools for building agentic systems live in the sister list awesome-production-agentic-systems; production ML tooling lives in awesome-production-machine-learning. This list covers the learning, design, and operational resources that sit alongside those tools — including both:
- Agentic engineering focuses on using AI agents to do software engineering (Copilot, Cursor, Claude Code, Aider, Cline, Windsurf, Codex; spec-driven development; context engineering; agent IDE rules and memory files; SWE benchmarks).
- AI / agentic systems engineering focuses on building agentic and LLM-powered systems (architecture, RAG, memory, tool use & MCP, orchestration, multi-agent coordination, evaluation, observability, guardrails, safety, fine-tuning, inference, product/UX, economics, teams).
You can keep up to date by watching this repo for the monthly releases summarising newly added resources 🤩
This list was proposed in EthicalML/awesome-production-machine-learning#709 as a sister list focused on resources rather than tools.
Legend
Resources are tagged with icons so you can scan and filter at a glance:
| Icon | Meaning |
|---|---|
| ⭐ | Editors' pick — start here |
| 🆓 | Free to access |
| 💰 | Paid |
| 📘 | Book |
| 🧑🎓 | Course |
| 🎥 | Video / talk |
| 🎧 | Audio / podcast |
| 📄 | Paper |
| 🛠️ | Hands-on cookbook / tutorial |
| 📋 | Playbook / design-pattern catalog |
| 🧪 | Benchmark / leaderboard |
| 🏗️ | Reference implementation / case study |
| 📰 | Newsletter |
Quick links to sections on this page
Topic Coverage Matrix
Resources are organised as a matrix: the top-level sections above (rows) are resource types, and each section is sub-divided by topic. The 21 topics, T1–T21, are shared across sections. This lets you read vertically ("what papers exist on RAG?") or horizontally ("where do I find resources on Coding Agents?").
Topics:
| # | Topic |
|---|---|
| T1 | Coding Agents & AI-Assisted Development (Copilot, Cursor, Claude Code, Aider, Cline, Windsurf, Codex) |
| T2 | Spec-Driven Development & Context Engineering (AGENTS.md, spec-kit, rules files) |
| T3 | Agent IDE Rules, Memory Files & Developer Workflows |
| T4 | SWE Benchmarks & Coding Evaluation |
| T5 | Autonomous Software Agents & Long-Horizon Engineering Tasks |
| T6 | LLM Application Architecture & System Design |
| T7 | Prompt Engineering |
| T8 | Retrieval-Augmented Generation (RAG) |
| T9 | Memory Systems & Long-Context |
| T10 | Tool Use, Function Calling & MCP |
| T11 | Orchestration, Planning & Design Patterns |
| T12 | Multi-Agent Systems & Coordination |
| T13 | Evaluation & Testing |
| T14 | Observability, Tracing & Debugging |
| T15 | Guardrails & Security (prompt injection, jailbreaks, red-teaming) |
| T16 | Safety, Alignment & Responsible AI |
| T17 | Fine-tuning, Post-training, RLHF & Reasoning Training |
| T18 | Inference, Serving, Cost & Latency |
| T19 | Voice, Multi-modal & Embodied Agents |
| T20 | Product, UX & Human-AI Interaction Design |
| T21 | Economics, Teams, Hiring & Org Design |
Coverage (● = populated, ○ = opportunistic / partial, — = out of scope for that row):
| Row \ Topic | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | T11 | T12 | T13 | T14 | T15 | T16 | T17 | T18 | T19 | T20 | T21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Core & Foundations | ● | ● | ○ | ○ | ○ | ● | ● | ● | ○ | ● | ● | ○ | ● | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ |
| Communities | ● | ○ | ○ | ○ | ○ | ● | ● | ● | ○ | ● | ● | ○ | ● | ● | ○ | ● | ● | ● | ○ | ● | ● |
| Courses | ● | ○ | ○ | ● | ○ | ● | ● | ● | ○ | ● | ● | ● | ● | ● | ● | ● | ● | ● | ○ | ○ | ○ |
| Books | ● | ○ | ○ | — | ○ | ● | ● | ● | ○ | ● | ● | ○ | ● | ○ | ● | ● | ● | ● | ○ | ● | ● |
| Articles & Essays | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● |
| Tutorials & Cookbooks | ● | ● | ● | ○ | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ○ | ● | ● | ● | ○ | — |
| Playbooks & Patterns | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ○ | ● | ○ | ● | ● |
| Papers & Research | ● | ○ | — | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ○ |
| Benchmarks | ● | — | — | ● | ● | ○ | ○ | ● | ○ | ● | ● | ● | ● | ○ | ● | ● | ○ | ● | ● | ○ | — |
| Reference Impls | ● | ● | ● | ● | ● | ● | ○ | ● | ● | ● | ● | ● | ● | ● | ● | ○ | ● | ● | ● | ● | ● |
| Talks & Conferences | ● | ● | ○ | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● |
| Podcasts | ● | ○ | ○ | ○ | ● | ● | ● | ● | ○ | ● | ● | ● | ● | ● | ● | ● | ● | ● | ○ | ● | ● |
| Newsletters | ● | ○ | ○ | ○ | ○ | ● | ● | ● | ○ | ● | ● | ○ | ● | ● | ● | ● | ● | ● | ○ | ● | ● |
The Trending / What's New, Milestones Timeline, Governance & Responsible AI, Product / UX / Economics, and Teams, Hiring & Org Design sections collapse across topics and are presented as curated lists rather than matrix cells.
Contributing to the list
Please review our CONTRIBUTING.md before submitting a PR — it explains the one-line description style, how to pick the right row/topic cell, and the quality bar for inclusion. Thank you to the community for supporting the list's growth 🚀
Want to receive recurring updates on this repo and other advancements?
You can join the Machine Learning Engineer newsletter. Join over 70,000 ML professionals and enthusiasts who receive weekly curated articles & tutorials on production Machine Learning.
Also check out Awesome Production Agentic Systems and Awesome Production Machine Learning, the sister lists of open-source tools for agentic systems and production ML respectively.
Main Content
⭐ Trending / What's New
Rotating pinned items: the most-discussed agentic & AI-engineering resources of the current cycle. Refreshed regularly — see CONTRIBUTING.md for nomination criteria.
- ⭐ 🆓 A practical guide to building agents — OpenAI (2025). 30-page PDF covering when (and when not) to build agents, tool design, guardrails, and human-in-the-loop patterns.
- 🆓 AGENTS.md — Community standard (2025) for per-repo agent instructions, now read by Claude Code, Codex, Aider, Cursor, Cline, Windsurf and others.
- ⭐ 🆓 Building effective agents — Anthropic (2024). The most-cited reference for agent design patterns (augmented LLM, prompt chaining, routing, parallelisation, orchestrator-workers, evaluator-optimiser, autonomous agents). Start here before any other agent reading.
- 🆓 Claude Code: Best practices for agentic coding — Anthropic (2025). CLAUDE.md, slash-commands, headless mode, custom permissions — the canonical how-to-use-Claude-Code reference.
- 🆓 How to build an agent — Thorsten Ball / Amp (2025). Viral step-by-step implementation of a tool-using coding agent in ~400 lines of Go, demystifying "what is an agent" in code.
- ⭐ 🆓 How we built our multi-agent research system — Anthropic (2025). Production retrospective on Claude's multi-agent research mode: orchestrator/subagent split, prompt engineering for agents, evaluation and failure modes.
- ⭐ 🆓 The bitter lesson of AI agents / Agentic Coding: The Future of Software Development with Agents — Armin Ronacher (2025). Widely-shared essays on what it actually feels like to ship with agentic coding tools day-to-day.
- 🆓 The new code — Sean Grove / OpenAI on Latent Space (2025). Specs-as-code: the spec is the new artefact, models are the compiler. Heavily cited in the AGENTS.md / spec-kit discussion.
🧭 Core & Foundations
Canonical "what is agentic engineering / AI engineering" reading. Start here.
T1 · Coding Agents & AI-Assisted Development
- ⭐ 🆓 Building effective agents — Anthropic. The reference taxonomy of agent design patterns (workflows vs. agents).
- ⭐ 🆓 Claude Code: Best practices for agentic coding — Anthropic. CLAUDE.md, tools, slash-commands, headless mode.
- 🆓 Here's how I use LLMs to help me write code — Simon Willison. Grounded, practice-first account of daily LLM-assisted development.
- 🆓 How to build an agent — Thorsten Ball. A working coding agent in ~400 lines; the clearest "agents are not magic" walkthrough.
T2 · Spec-Driven Development & Context Engineering
- 🆓 AGENTS.md — Community standard for per-repo agent instructions.
- 🆓 spec-kit — GitHub's toolkit and essay set on spec-driven development with coding agents.
- ⭐ 🆓 The new code — Sean Grove (OpenAI) on Latent Space. The canonical "specs are the new code" essay.
- 🆓 The rise of "context engineering" — LangChain. Why prompt engineering became context engineering.
T6 · LLM Application Architecture & System Design
- ⭐ 📘 💰 AI Engineering — Chip Huyen (O'Reilly, 2025). The textbook for building LLM applications end-to-end.
- 🆓 Emerging Architectures for LLM Applications — a16z. The widely-shared reference diagram for the LLM app stack.
- ⭐ 🆓 Patterns for Building LLM-based Systems & Products — Eugene Yan. Evaluation, RAG, fine-tuning, caching, guardrails, defensive UX, collecting feedback — the reference pattern catalogue.
- 🆓 What We Learned from a Year of Building with LLMs — Yan, Bischof, Frye, Husain, Liu, Shankar (2024). Tactical, operational, and strategic lessons distilled from shipping.
T7 · Prompt Engineering
- 🆓 Anthropic: Prompt engineering overview — Anthropic's practical guide for Claude.
- 🆓 OpenAI: Prompt engineering — OpenAI official guide.
- ⭐ 🆓 Prompt Engineering — Lilian Weng (OpenAI). The systematic taxonomy.
- 🆓 Prompt Engineering Guide — DAIR.AI. Continuously updated, with per-technique deep-dives.
T8 · Retrieval-Augmented Generation (RAG)
- ⭐ 🆓 Advanced RAG Techniques / Pinecone Learn — Pinecone. The hub for RAG primers and patterns.
- 🆓 RAG is more than just embedding search — Jason Liu. Systems-view RAG: query understanding, tool routing, evaluation.
- ⭐ 📄 🆓 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — Lewis et al. (2020). The original RAG paper.
- 🆓 Retrieval-Augmented Generation for LLMs: A Survey — Gao et al. (2023). The reference survey.
T10 · Tool Use, Function Calling & MCP
- 🆓 Function calling guide — OpenAI. The canonical reference for structured tool calls.
- ⭐ 🆓 Introducing the Model Context Protocol — Anthropic (2024). The canonical introduction to MCP.
- ⭐ 🆓 Model Context Protocol — Specification — Open protocol docs and SDKs.
- 📄 🆓 Toolformer: Language Models Can Teach Themselves to Use Tools — Schick et al. (2023). The foundational tool-use paper.
T11 · Orchestration, Planning & Design Patterns
- ⭐ 🆓 Building effective agents — Anthropic. The orchestration pattern taxonomy.
- 🆓 LLM Powered Autonomous Agents — Lilian Weng. The canonical deep-dive on planning, memory, and tool use in agent loops.
- 📄 🆓 ReAct: Synergizing Reasoning and Acting in Language Models — Yao et al. (2022). The foundational reason+act loop.
- 📄 🆓 The Rise and Potential of LLM Based Agents: A Survey — Xi et al. (2023). Survey of agent architectures and components.
T13 · Evaluation & Testing
- 📄 🆓 Judging LLM-as-a-Judge — Zheng et al. (2023). The foundational LLM-as-judge paper (MT-Bench, Chatbot Arena).
- 🆓 Task-Specific LLM Evals that Do & Don't Work — Eugene Yan. A pragmatic survey of eval techniques per task type.
- 🆓 Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences — Shankar et al. (2024). How to make LLM-judges trustworthy.
- ⭐ 🆓 Your AI Product Needs Evals — Hamel Husain. The most-cited essay on why and how to build evals for LLM products.
🗓️ Milestones Timeline
Dated, field-defining events that shaped agentic & AI engineering.
| Date | Event | Reference |
|---|---|---|
| 2017-06 | Transformer architecture introduced | Attention Is All You Need |
| 2020-05 | GPT-3 shows in-context learning at scale | Language Models are Few-Shot Learners |
| 2020-05 | RAG framework introduced | RAG for Knowledge-Intensive NLP |
| 2021-06 | GitHub Copilot preview launches — first mainstream AI coding assistant | GitHub blog |
| 2022-01 | Chain-of-Thought prompting | Wei et al. |
| 2022-03 | InstructGPT / RLHF | Ouyang et al. |
| 2022-09 | Simon Willison coins "prompt injection" as a durable threat category | SW blog |
| 2022-10 | ReAct: reasoning + acting agent loop | Yao et al. |
| 2022-11 | ChatGPT release — mainstream adoption inflection | OpenAI |
| 2023-03 | GPT-4 release | OpenAI |
| 2023-03 | HuggingGPT / Toolformer-era tool use | Toolformer |
| 2023-03 | LangChain & LlamaIndex hit mainstream | — |
| 2023-05 | Voyager: open-ended agents in Minecraft | Voyager |
| 2023-10 | SWE-bench released — real-world coding eval | SWE-bench |
| 2023-12 | Mixture-of-experts open models (Mixtral) | Mistral |
| 2024-03 | Devin demo — autonomous software agent pitch | Cognition |
| 2024-05 | GPT-4o: native multi-modal + realtime voice | OpenAI |
| 2024-08 | SWE-bench Verified launched | OpenAI |
| 2024-09 | o1 reveals reasoning-model era | OpenAI |
| 2024-11 | Model Context Protocol (MCP) announced | Anthropic |
| 2024-12 | Anthropic's "Building effective agents" published | Anthropic |
| 2025-02 | Claude Code launched as a research preview | Anthropic |
| 2025-05 | AGENTS.md published as cross-agent standard | agents.md |
| 2025-06 | GitHub spec-kit / "new code" essays formalise spec-driven dev | spec-kit |
👥 Communities
Discords, Slacks, forums, and meetups where practitioners gather.
- 🆓 AI Dev Board — Community-curated hub for AI engineering resources and discussions.
- 🆓 AI Engineer World's Fair / Latent Space Discord — Practitioner community anchoring the AI Engineer conference series.
- 🆓 Anthropic Discord — Official Claude / Claude Code / MCP community.
- 🆓 Cursor Community Forum — User-driven forum for Cursor rules, MCP, and workflows.
- 🆓 EleutherAI Discord — Open research community; strong training/interpretability discussion.
- 🆓 Hacker News — Filter for "LLM", "agent", "Claude", "Cursor" — where engineering-side essays trend.
- 🆓 Hugging Face Discord & Forums — Transformers, TRL, PEFT, model-hub discussions.
- 🆓 LangChain Discord — Heavy day-to-day Q&A on agent orchestration, RAG, evaluation, MCP.
- 🆓 LlamaIndex Discord — RAG-centric builder community with active reference-impl discussion.
- 🆓 MLOps Community — Slack + podcast + meetups; the biggest practitioner community at the ops/engineering intersection. Active agent and LLM-ops channels.
- 🆓 r/LocalLLaMA — The definitive open-weights / local-inference forum; fastest signal for new models, quantisation, and serving.
- 🆓 r/MachineLearning — Academic and practitioner mix; where new papers and threads get dissected.
🧑🎓 Courses
Structured courses — free and paid, university and industry.
T1 · Coding Agents & AI-Assisted Development
- ⭐ 🧑🎓 🆓 AI Python for Beginners — DeepLearning.AI (Andrew Ng). Gateway to AI-assisted coding.
- 🧑🎓 🆓 GitHub Copilot Fundamentals — Microsoft Learn. Official training path.
- 🧑🎓 🆓 Pair Programming with a Large Language Model — DeepLearning.AI + Google.
T4 · SWE Benchmarks & Coding Evaluation
- 🧑🎓 🆓 Evaluating and Debugging Generative AI — DeepLearning.AI + W&B. Covers coding-eval mechanics.
- 🧑🎓 🆓 Mastering LLMs: Evals — Hamel Husain & Shreya Shankar (Maven). Companion evals-for-LLMs curriculum.
- 🧑🎓 🆓 SWE-bench tutorial — Princeton NLP. Free, self-paced walk-through of running and scoring coding evals.
T6 · LLM Application Architecture & System Design
- 🧑🎓 🆓 Building Systems with the ChatGPT API — DeepLearning.AI + OpenAI.
- 🧑🎓 🆓 CS25: Transformers United — Stanford. Seminal deep-dive seminar series.
- ⭐ 🧑🎓 🆓 LLM Bootcamp — Full Stack Deep Learning. Free 2-day bootcamp on building LLM apps end-to-end.
T7 · Prompt Engineering
- 🧑🎓 🆓 Anthropic Prompt Engineering Interactive Tutorial — Anthropic. Hands-on, notebook-based.
- ⭐ 🧑🎓 🆓 ChatGPT Prompt Engineering for Developers — Andrew Ng & Isa Fulford (OpenAI).
- 🧑🎓 🆓 Prompt Engineering Guide (DAIR.AI) — Self-paced, continuously updated.
T8 · Retrieval-Augmented Generation (RAG)
- 🧑🎓 🆓 Advanced Retrieval for AI with Chroma — DeepLearning.AI.
- 🧑🎓 🆓 Building and Evaluating Advanced RAG Applications — DeepLearning.AI + LlamaIndex + TruEra.
- 🧑🎓 🆓 LangChain Chat with Your Data — DeepLearning.AI + LangChain.
- 🧑🎓 💰 Systematically Improving RAG Applications — Jason Liu on Maven.
T10 · Tool Use, Function Calling & MCP
- 🧑🎓 🆓 Functions, Tools and Agents with LangChain — DeepLearning.AI + LangChain.
- 🧑🎓 🆓 Introduction to MCP — Anthropic official quickstart.
- 🧑🎓 🆓 MCP: Build Rich-Context AI Apps with Anthropic — DeepLearning.AI + Anthropic.
T11 · Orchestration, Planning & Design Patterns
- 🧑🎓 🆓 AI Agentic Design Patterns with AutoGen — DeepLearning.AI + Microsoft.
- 🧑🎓 🆓 AI Agents in LangGraph — DeepLearning.AI + LangChain.
- 🧑🎓 🆓 Hugging Face Agents Course — Hugging Face. Free, certifying course on agent fundamentals.
T12 · Multi-Agent Systems
- 🧑🎓 🆓 Building Agentic RAG with LlamaIndex — DeepLearning.AI + LlamaIndex.
- 🧑🎓 🆓 Multi AI Agent Systems with crewAI — DeepLearning.AI + crewAI.
- 🧑🎓 🆓 Practical Multi AI Agents and Advanced Use Cases with crewAI — DeepLearning.AI.
T13 · Evaluation & Testing
- ⭐ 🧑🎓 💰 AI Evals For Engineers & PMs — Hamel Husain & Shreya Shankar on Maven. The industry-standard evals cohort course.
- 🧑🎓 🆓 Automated Testing for LLMOps — DeepLearning.AI + CircleCI.
- 🧑🎓 🆓 Quality and Safety for LLM Applications — DeepLearning.AI + WhyLabs.
T14 · Observability, Tracing & Debugging
- 🧑🎓 🆓 Evaluating LLMs with Arize — Arize course hub.
- 🧑🎓 🆓 LangSmith Academy — LangChain. Free self-paced LangSmith courses covering tracing and evals.
- 🧑🎓 🆓 LLMOps — DeepLearning.AI + Google Cloud.
T15 · Guardrails & Security
- 🧑🎓 🆓 Prompt Injection Attacks (Learn Prompting) — Learn Prompting. Open course covering injection/jailbreak taxonomies.
- 🧑🎓 🆓 Red Teaming LLM Applications — DeepLearning.AI + Giskard.
- 🧑🎓 🆓 Safe and Reliable AI via Guardrails — DeepLearning.AI + Guardrails AI.
T16 · Safety, Alignment & Responsible AI
- 🧑🎓 🆓 AI Safety Fundamentals — BlueDot Impact. The standard entry curriculum.
- 🧑🎓 🆓 ARENA (Alignment Research Engineer Accelerator) — Hands-on alignment / interpretability.
- 🧑🎓 🆓 Intro to AI Safety, Remastered — Richard Ngo / BlueDot. Free reading curriculum.
T17 · Fine-tuning, Post-training & RLHF
- ⭐ 🧑🎓 🆓 Finetuning Large Language Models — DeepLearning.AI + Lamini.
- 🧑🎓 🆓 Hugging Face NLP Course (incl. RLHF chapter) — Hugging Face.
- 🧑🎓 🆓 Reinforcement Learning from Human Feedback — DeepLearning.AI + Google Cloud.
T18 · Inference, Serving, Cost & Latency
- 🧑🎓 🆓 CUDA Mode lectures — Community lectures on GPU inference internals.
- 🧑🎓 🆓 Efficiently Serving LLMs — DeepLearning.AI + Predibase.
- 🧑🎓 🆓 Quantization Fundamentals with Hugging Face — DeepLearning.AI + HF.
📘 Books
Published and in-progress books covering agentic & AI engineering.
T1 · Coding Agents & AI-Assisted Development
- ⭐ 📘 💰 AI-Assisted Programming — Tom Taulli (O'Reilly, 2024). Practical coverage of Copilot/Cursor/Claude workflows.
- 📘 💰 Prompt Engineering for Generative AI — James Phoenix & Mike Taylor (O'Reilly, 2024). Includes heavy coverage of code-generation prompting patterns.
T6 · LLM Application Architecture & System Design
- ⭐ 📘 💰 AI Engineering: Building Applications with Foundation Models — Chip Huyen (O'Reilly, 2025). The reference textbook for the field.
- 📘 💰 Designing Machine Learning Systems — Chip Huyen (O'Reilly, 2022). The prior-generation canonical ML-systems text; still essential for data/infra context.
- 📘 💰 Generative AI on AWS — Chris Fregly, Antje Barth, Shelbee Eigenbrode (O'Reilly, 2023).
T7 · Prompt Engineering
- 📘 💰 Prompt Engineering for LLMs — John Berryman & Albert Ziegler (O'Reilly, 2024). From Copilot's original tech-lead.
- 📘 🆓 The Prompt Report — Schulhoff et al. (2024). A 76-page survey that effectively functions as a book-length prompting reference.
T8 · RAG
- 📘 💰 Building LLM Apps — Valentina Alto (Wiley, 2024). RAG-heavy application text.
- 📘 💰 RAG Made Simple - Nir Diamant (2025). A visual, code-free walkthrough of 22 retrieval-augmented generation techniques explained through diagrams and analogies.
- 📘 💰 RAG-Driven Generative AI — Denis Rothman (Packt, 2024).
T10 · Tool Use & MCP
- 📘 💰 Building Intelligent Apps with OpenAI — Olivier Caelen & Marie-Alice Blete (O'Reilly, 2024). Heavy function-calling coverage.
T11 · Orchestration & Design Patterns
- 📘 💰 Generative AI with LangChain — Ben Auffarth (Packt, 2023). Orchestration patterns end-to-end.
T13 · Evaluation
- 📘 💰 Prompt Engineering for Generative AI — Phoenix & Taylor (O'Reilly, 2024). Chapter-length eval coverage.
T15 · Guardrails & Security
- 📘 💰 Generative AI Security — Ken Huang et al. (Apress, 2024).
- 📘 💰 The Developer's Playbook for Large Language Model Security — Steve Wilson (O'Reilly, 2024). OWASP LLM Top 10 project lead's book.
T16 · Safety, Alignment & Responsible AI
- 📘 💰 Human Compatible — Stuart Russell (2019). The foundational alignment argument.
- 📘 💰 The Alignment Problem — Brian Christian (2020). The canonical popular-press primer.
T17 · Fine-tuning & Post-training
- ⭐ 📘 💰 Build a Large Language Model (From Scratch) — Sebastian Raschka (Manning, 2024). The reference hands-on text.
- 📘 💰 Hands-On Large Language Models — Jay Alammar & Maarten Grootendorst (O'Reilly, 2024).
T18 · Inference & Serving
- 📘 💰 Efficient Processing of Deep Neural Networks — Sze et al. (Morgan & Claypool). Hardware/inference reference.
T20 · Product & UX
- 📘 💰 Designing Machine Learning Systems — Chip Huyen. Includes pragmatic product/UX chapters.
- 📘 💰 Human-AI Interaction Design — IxDF topic hub.
T21 · Economics, Teams & Org
- 📘 💰 Managing Machine Learning Projects — Simon Thompson (Manning).
- 📘 🆓 The Pragmatic Engineer's AI coverage — Gergely Orosz. Regularly-updated editorial that functions as a rolling book on AI-engineering org design.
✍️ Articles & Essays
Long-form writing from canonical authors and engineering teams.
T1 · Coding Agents & AI-Assisted Development
- 🆓 Agentic Coding: The Future of Software Development — Armin Ronacher.
- ⭐ 🆓 Here's how I use LLMs to help me write code — Simon Willison.
- 🆓 Revenge of the junior developer — Steve Yegge (Sourcegraph).
- 🆓 The death of the stubborn developer — Steve Yegge.
T2 · Spec-Driven Development & Context Engineering
- 🆓 Context Engineering — LangChain.
- 🆓 Spec-driven development with AI — GitHub Blog.
- ⭐ 🆓 The new code — Sean Grove / Latent Space.
- 🆓 The rise of "context engineering" — LangChain.
T3 · Agent IDE Rules, Memory Files & Workflows
- 🆓 Aider: Tips for using with large codebases — Aider docs.
- ⭐ 🆓 Claude Code: Best practices for agentic coding — Anthropic.
- 🆓 Cursor rules directory — Community catalogue of `.cursorrules` files.
- 🆓 My Claude Code setup — widely-shared CLAUDE.md + slash-command playbook.
T4 · SWE Benchmarks & Coding Evaluation
- ⭐ 🆓 Introducing SWE-bench Verified — OpenAI.
- 🆓 Measuring an AI system's ability to do ML R&D — METR.
- 🆓 The leaderboard illusion — Singh et al. on bench-gaming.
- 🆓 Why we built Terminal-Bench — Stanford / Laude.
T5 · Autonomous Software Agents
- 🆓 Devin, a software engineer — Cognition.
- 🆓 Don't build multi-agents — Cognition. Contrarian but important counterpoint to multi-agent maximalism.
- ⭐ 🆓 How we built our multi-agent research system — Anthropic.
- 🆓 SWE-agent: Agent-Computer Interfaces — Princeton NLP writeup.
T6 · LLM Application Architecture
- 🆓 Emerging Architectures for LLM Applications — a16z.
- ⭐ 🆓 Patterns for Building LLM-based Systems & Products — Eugene Yan.
- 🆓 Twelve factor agents — HumanLayer. The "12-factor app" equivalent for agent apps.
- 🆓 What We Learned from a Year of Building with LLMs — Yan/Bischof/Frye/Husain/Liu/Shankar.
T7 · Prompt Engineering
- 🆓 A guide to prompting Claude — Anthropic.
- ⭐ 🆓 Prompt Engineering — Lilian Weng.
- 🆓 Prompting is programming — Eugene Yan.
- 🆓 The prompt report — Learn Prompting team summary of their 76-page survey.
T8 · Retrieval-Augmented Generation (RAG)
- 🆓 Advanced RAG Techniques — Pinecone.
- 🆓 How to improve your RAG system's performance — Anyscale.
- 🆓 Practical considerations in RAG application design — Eugene Yan.
- ⭐ 🆓 RAG is more than just embedding search — Jason Liu.
T9 · Memory Systems & Long-Context
- 🆓 Extending Context Length in LLMs — Hugging Face.
- ⭐ 🆓 Lost in the Middle: How Language Models Use Long Contexts — Liu et al.
- 🆓 Memory for agents — LangChain.
- 🆓 The agentic memory stack — Letta (MemGPT).
T10 · Tool Use, Function Calling & MCP
- 🆓 Designing MCP servers that agents actually use — Phil Schmid.
- 🆓 Function calling with LLMs: a practical guide — DAIR.AI.
- ⭐ 🆓 Introducing the Model Context Protocol — Anthropic.
- 🆓 Tool use is eating the world — Latent Space.
T11 · Orchestration & Design Patterns
- 🆓 Agent design patterns — Andrew Ng, The Batch series.
- 🆓 AI agent frameworks — Latent Space comparative review.
- 🆓 Building effective agents — Anthropic.
- ⭐ 🆓 LLM Powered Autonomous Agents — Lilian Weng.
T12 · Multi-Agent Systems & Coordination
- 🆓 AutoGen: Enabling next-gen LLM applications — Microsoft.
- 🆓 Don't build multi-agents — Cognition.
- ⭐ 🆓 How we built our multi-agent research system — Anthropic.
- 🆓 Multi-agent workflows — LangChain.
T13 · Evaluation & Testing
- 🆓 Creating a LLM-as-a-Judge that drives business results — Hamel Husain.
- 🆓 LLM evals: everything I learned in 12 months — Shreya Shankar.
- 🆓 Task-specific LLM evals that do & don't work — Eugene Yan.
- ⭐ 🆓 Your AI product needs evals — Hamel Husain.
T14 · Observability, Tracing & Debugging
- 🆓 How Honeycomb uses LLMs for product experiences — Phillip Carter.
- 🆓 Logfire: observability for the LLM era — Pydantic.
- ⭐ 🆓 So you want to build an LLM observability platform — Hamel Husain (subsection of evals post; foundational).
- 🆓 The OpenTelemetry Gen AI semantic conventions — OTel.
T15 · Guardrails & Security
- 🆓 OWASP Top 10 for LLM Applications — OWASP.
- ⭐ 🆓 Prompt injection series — Simon Willison. Canonical ongoing series.
- 🆓 Red teaming LLMs — Hugging Face.
- 🆓 Universal and Transferable Adversarial Attacks on Aligned LLMs — Zou et al. (GCG attack).
T16 · Safety, Alignment & Responsible AI
- 🆓 Anthropic's Responsible Scaling Policy — Anthropic.
- ⭐ 🆓 Core Views on AI Safety — Anthropic.
- 🆓 Preparedness Framework — OpenAI.
- 🆓 Scalable oversight via debate & recursive reward modelling — DeepMind Safety Research.
T17 · Fine-tuning, Post-training & RLHF
- ⭐ 🆓 Ahead of AI — Sebastian Raschka. The canonical fine-tuning / post-training deep-dives.
- 🆓 DPO: Your language model is secretly a reward model — Rafailov et al.
- 🆓 The alignment handbook — Hugging Face.
- 🆓 The Novice's LLM Training Guide — Community reference.
T18 · Inference, Serving, Cost & Latency
- 🆓 Everything I've learned about efficient LLM inference — Baseten engineering blog.
- 🆓 GPU performance for LLM inference — vLLM team blog.
- 🆓 LLM Inference Speed of Light — Arseny Kapoulkine.
- ⭐ 🆓 Transformer Inference Arithmetic — Kipply.
T19 · Voice, Multi-modal & Embodied Agents
- 🆓 Building a voice agent with LiveKit — LiveKit Agents docs.
- ⭐ 🆓 Hello GPT-4o — OpenAI.
- 🆓 Moshi: a speech-text foundation model — Kyutai.
- 🆓 Voice-first LLM products — Latent Space.
T20 · Product, UX & Human-AI Interaction
- 🆓 Building products with AI: UX lessons / thesephist.com essays — Linus Lee.
- 🆓 Generative AI: Design Patterns (NNGroup) — Nielsen Norman Group.
- ⭐ 🆓 Maggie Appleton essays — Canonical AI-UX thinking.
- 🆓 Microsoft HAX guidelines for human-AI interaction — Microsoft Research.
T21 · Economics, Teams, Hiring & Org Design
- 🆓 16 Changes to the Way Enterprises Build Software with AI — a16z.
- 🆓 a16z AI canon — a16z.
- ⭐ 🆓 AI engineering org design — Gergely Orosz, Pragmatic Engineer.
- 🆓 Building an AI team — Eugene Yan.
🛠️ Tutorials & Cookbooks
Hands-on, code-first guides and official cookbooks from model providers and framework authors.
T1 · Coding Agents & AI-Assisted Development
- 🛠️ 🆓 Aider tutorials — Aider docs.
- ⭐ 🛠️ 🆓 Claude Code cookbook — Anthropic.
- 🛠️ 🆓 Continue.dev recipes — Continue.
T2 · Spec-Driven Development
- 🛠️ 🆓 AGENTS.md examples — Example `AGENTS.md` files for common stacks.
- 🛠️ 🆓 GitHub spec-kit — The official spec-driven-development toolkit.
T3 · Agent IDE Rules & Workflows
- 🛠️ 🆓 awesome-cursorrules — Curated `.cursorrules` examples.
- 🛠️ 🆓 Claude Code slash-commands cookbook — Anthropic.
T5 · Autonomous Software Agents
- 🛠️ 🆓 OpenHands (formerly OpenDevin) — All Hands AI.
- 🛠️ 🆓 SWE-agent quickstart — Princeton NLP.
T6 · LLM Application Architecture
- 🛠️ 🆓 Anthropic Cookbook — Claude recipes.
- 🛠️ 🆓 Gemini API Cookbook — Google.
- 🛠️ 🆓 Hugging Face Open-Source AI Cookbook — Hugging Face.
- ⭐ 🛠️ 🆓 OpenAI Cookbook — The reference recipe library for OpenAI APIs.
T7 · Prompt Engineering
- 🛠️ 🆓 Anthropic prompt-engineering interactive tutorial — Notebook-based.
- 🛠️ 🆓 Prompt Engineering Guide notebooks — DAIR.AI.
T8 · Retrieval-Augmented Generation (RAG)
- 🛠️ 🆓 Advanced RAG notebooks — Nir Diamant. 30+ advanced RAG recipes.
- 🛠️ 🆓 LangChain RAG from scratch — LangChain.
- ⭐ 🛠️ 🆓 LlamaIndex tutorials — LlamaIndex.
- 🛠️ 🆓 Pinecone RAG handbook — Pinecone.
T9 · Memory Systems
- 🛠️ 🆓 LangGraph memory — LangChain.
- 🛠️ 🆓 Letta (MemGPT) cookbook — Letta.
- 🛠️ 🆓 Mem0 quickstart — Mem0.
T10 · Tool Use & MCP
- 🛠️ 🆓 awesome-mcp-servers — Community reference-servers catalogue.
- ⭐ 🛠️ 🆓 MCP quickstart — Anthropic.
- 🛠️ 🆓 OpenAI function calling cookbook — OpenAI.
T11 · Orchestration & Patterns
- 🛠️ 🆓 Anthropic building-effective-agents examples — Anthropic.
- 🛠️ 🆓 LangGraph tutorials — LangChain.
- 🛠️ 🆓 LlamaIndex agent tutorials — LlamaIndex.
T12 · Multi-Agent Systems
- 🛠️ 🆓 AutoGen notebook gallery — Microsoft.
- 🛠️ 🆓 CrewAI examples — CrewAI.
- 🛠️ 🆓 LangGraph multi-agent examples — LangChain.
T13 · Evaluation & Testing
- ⭐ 🛠️ 🆓 Hamel Husain's evals repo — Companion code to the evals course.
- 🛠️ 🆓 LangSmith evals tutorials — LangChain.
- 🛠️ 🆓 RAGAS tutorials — RAG-specific eval cookbook.
T14 · Observability
- 🛠️ 🆓 Arize Phoenix tutorials — Arize.
- 🛠️ 🆓 Langfuse cookbook — Langfuse.
- 🛠️ 🆓 Logfire LLM tracing tutorials — Pydantic.
T15 · Guardrails & Security
- 🛠️ 🆓 Guardrails AI cookbook — Guardrails AI.
- 🛠️ 🆓 NVIDIA NeMo Guardrails — NVIDIA.
- 🛠️ 🆓 Prompt injection CTFs (Gandalf) — Lakera. Hands-on red-team practice.
T17 · Fine-tuning & Post-training
- 🛠️ 🆓 Axolotl examples — Axolotl.
- 🛠️ 🆓 Hugging Face TRL tutorials — TRL.
- ⭐ 🛠️ 🆓 Unsloth notebooks — Fast fine-tuning recipes.
T18 · Inference & Serving
- 🛠️ 🆓 llama.cpp server — ggerganov.
- 🛠️ 🆓 TensorRT-LLM tutorials — NVIDIA.
- 🛠️ 🆓 vLLM examples — vLLM.
T19 · Voice & Multimodal
- 🛠️ 🆓 LiveKit Agents examples — LiveKit.
- 🛠️ 🆓 OpenAI Realtime API cookbook — OpenAI.
- 🛠️ 🆓 Pipecat — Daily. Voice-agent framework with extensive cookbook.
📋 Playbooks & Design-Pattern Catalogs
Opinionated, prescriptive guides distilling design patterns and operational practices.
- 📋 🆓 12-Factor Agents — HumanLayer. Opinionated operational principles for agent apps (T6/T11).
- 📋 🆓 A practical guide to building agents — OpenAI PDF (T11).
- 📋 🆓 a16z AI canon — a16z (T20/T21).
- 📋 🆓 Agentic UX — 11 runtime lifecycle patterns for supervised delegation, organized before/while/after an agent acts, with interactive mockups, production screenshots, and an MCP server for coding agents (Daniel Albinsson, 2025).
- 📋 🆓 Anthropic's prompt engineering overview — Anthropic (T7).
- ⭐ 📋 🆓 Building effective agents — Anthropic. The canonical pattern taxonomy (T11).
- 📋 🆓 Claude Code: best practices for agentic coding — Anthropic (T1/T3).
- 📋 🆓 Instructor's RAG patterns — Jason Liu (T8).
- 📋 🆓 LangGraph design patterns — LangChain (T11/T12).
- 📋 🆓 LLM observability playbook — Hamel Husain (T13/T14).
- 📋 🆓 MITRE ATLAS — Adversarial Threat Landscape for AI Systems (T15).
- 📋 🆓 NIST AI Risk Management Framework — NIST (T16).
- 📋 🆓 OpenAI's prompt-engineering playbook — OpenAI (T7).
- 📋 🆓 OWASP Top 10 for LLM Applications — OWASP. The security-pattern catalogue (T15).
- ⭐ 📋 🆓 Patterns for Building LLM-based Systems & Products — Eugene Yan (T6).
- 📋 🆓 Prompt-injection defence patterns — Simon Willison (T15).
- 📋 🆓 RAG-Fusion, HyDE, and other advanced retrieval patterns — Nir Diamant (T8).
- 📋 🆓 The LLM inference playbook — Anyscale (T18).
- 📋 🆓 UX design patterns for AI products — Nielsen Norman Group (T20).
- ⭐ 📋 🆓 What We Learned from a Year of Building with LLMs — Yan/Bischof/Frye/Husain/Liu/Shankar (T6/T13).
📄 Papers & Research
Foundational papers, surveys, and benchmark papers. Includes a dated milestone-papers table.
Milestone Papers
T1 · Coding Agents & T4 · SWE Benchmarks
- 📄 🆓 AutoCodeRover: Autonomous Program Improvement — Zhang et al.
- 📄 🆓 BigCodeBench — Zhuo et al.
- 📄 🆓 LiveCodeBench — Jain et al.
- 📄 🆓 SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering — Yang et al.
- 📄 🆓 SWE-bench: Can LMs Resolve Real-World GitHub Issues? — Jimenez et al.
T5 · Autonomous SWE Agents
- 📄 🆓 Agentless: Demystifying LLM-based Software Engineering Agents — Xia et al.
- 📄 🆓 OpenHands / OpenDevin — All Hands AI.
- 📄 🆓 Voyager: An Open-Ended Embodied Agent with LLMs — Wang et al.
T6 · App Architecture
- 📄 🆓 Emerging Architectures for LLM Applications — a16z.
- 📄 🆓 The Prompt Report — Schulhoff et al.
T7 · Prompt Engineering
- 📄 🆓 Chain-of-Thought Prompting Elicits Reasoning — Wei et al.
- 📄 🆓 Large Language Models are Zero-Shot Reasoners — Kojima et al. ("Let's think step by step").
- 📄 🆓 Self-Consistency Improves CoT — Wang et al.
- 📄 🆓 Tree of Thoughts — Yao et al.
T8 · RAG
- 📄 🆓 Dense Passage Retrieval — Karpukhin et al.
- 📄 🆓 Precise Zero-Shot Dense Retrieval without Relevance Labels (HyDE) — Gao et al.
- 📄 🆓 RAG for LLMs: A Survey — Gao et al.
- 📄 🆓 Retrieval-Augmented Generation for Knowledge-Intensive NLP — Lewis et al.
- 📄 🆓 Self-RAG: Learning to Retrieve, Generate, and Critique — Asai et al.
T9 · Memory
- 📄 🆓 Generative Agents: Interactive Simulacra of Human Behavior — Park et al.
- 📄 🆓 Lost in the Middle — Liu et al.
- 📄 🆓 MemGPT: Towards LLMs as Operating Systems — Packer et al.
T10 · Tool Use & MCP
- 📄 🆓 Berkeley Function-Calling Leaderboard — UC Berkeley.
- 📄 🆓 Gorilla: LLM Connected with Massive APIs — Patil et al.
- 📄 🆓 MRKL Systems — Karpas et al.
- 📄 🆓 Toolformer — Schick et al.
T11 · Orchestration & Patterns
- 📄 🆓 ReAct: Synergizing Reasoning and Acting — Yao et al.
- 📄 🆓 Reflexion: Language Agents with Verbal Reinforcement Learning — Shinn et al.
- 📄 🆓 Self-Refine: Iterative Refinement with Self-Feedback — Madaan et al.
- 📄 🆓 The Rise and Potential of LLM-based Agents: A Survey — Xi et al.
T12 · Multi-Agent
- 📄 🆓 A Survey on LLM-based Autonomous Agents — Wang et al.
- 📄 🆓 AutoGen — Wu et al.
- 📄 🆓 CAMEL: Communicative Agents for Mind Exploration — Li et al.
- 📄 🆓 MetaGPT — Hong et al.
T13 · Evaluation
- 📄 🆓 HELM: Holistic Evaluation of Language Models — Liang et al.
- 📄 🆓 Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena — Zheng et al.
- 📄 🆓 Who Validates the Validators? — Shankar et al.
T14 · Observability
T15 · Guardrails & Security
- 📄 🆓 Many-shot Jailbreaking — Anthropic.
- 📄 🆓 Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection — Greshake et al.
- 📄 🆓 Universal and Transferable Adversarial Attacks on Aligned LLMs (GCG) — Zou et al.
T16 · Safety & Alignment
- 📄 🆓 Concrete Problems in AI Safety — Amodei et al.
- 📄 🆓 Constitutional AI — Bai et al.
- 📄 🆓 Scalable Agent Alignment via Reward Modeling — Leike et al.
T17 · Fine-tuning & Post-training
- 📄 🆓 Constitutional AI / RLAIF — Bai et al.
- 📄 🆓 Direct Preference Optimization (DPO) — Rafailov et al.
- 📄 🆓 LoRA: Low-Rank Adaptation — Hu et al.
- 📄 🆓 QLoRA — Dettmers et al.
- 📄 🆓 Training LMs to follow instructions with human feedback (InstructGPT) — Ouyang et al.
T18 · Inference & Serving
- 📄 🆓 Efficient Memory Management for LLM Serving with PagedAttention (vLLM) — Kwon et al.
- 📄 🆓 Fast Inference from Transformers via Speculative Decoding — Leviathan et al.
- 📄 🆓 FlashAttention — Dao et al.
- 📄 🆓 SGLang: Efficient Execution of Structured Language Model Programs — Zheng et al.
T19 · Voice & Multimodal
- 📄 🆓 Moshi — Kyutai.
- 📄 🆓 Robust Speech Recognition via Large-Scale Weak Supervision (Whisper) — Radford et al.
- 📄 🆓 Seamless: Multilingual Expressive and Streaming Speech Translation — Meta.
T20 · Product & UX
- 📄 🆓 Guidelines for Human-AI Interaction — Amershi et al. (Microsoft Research, CHI 2019).
🧪 Benchmarks & Leaderboards
Public benchmarks and leaderboards for coding agents, tool use, RAG, evaluation, and more.
T1 / T4 · Coding Agents & SWE Benchmarks
- 🧪 🆓 BigCodeBench — Practical programming with diverse function calls.
- 🧪 🆓 HumanEval+ / EvalPlus — Strengthened HumanEval.
- 🧪 🆓 LiveCodeBench — Rolling contamination-free coding benchmark.
- 🧪 🆓 MLE-bench — OpenAI. Kaggle-style ML engineering benchmark.
- ⭐ 🧪 🆓 SWE-bench — Real-world GitHub-issue resolution benchmark; the Verified subset is the de facto industry standard.
- 🧪 🆓 Terminal-Bench — Stanford / Laude. Long-horizon terminal task benchmark.
T5 · Autonomous Agents
- 🧪 🆓 AgentBench — Tsinghua. Broad agent capability benchmark.
- 🧪 🆓 GAIA — General AI Assistants benchmark.
- 🧪 🆓 MLE-bench — ML-engineering agents.
- 🧪 🆓 OSWorld — Desktop OS-controlling agents.
- 🧪 🆓 WebArena / VisualWebArena — Web-navigation agents.
T8 · RAG
- 🧪 🆓 ARES — Automated RAG evaluation.
- 🧪 🆓 BEIR — Zero-shot IR benchmark.
- 🧪 🆓 MTEB — Massive Text Embedding Benchmark.
- 🧪 🆓 RAGAS — Framework and leaderboard for RAG eval.
T10 · Tool Use & Function Calling
- 🧪 🆓 API-Bank — Alibaba. Tool-augmented assistants.
- 🧪 🆓 τ-bench — Sierra. Tool-agent-user interaction benchmark.
- 🧪 🆓 Berkeley Function-Calling Leaderboard (BFCL) — UC Berkeley.
T11 · Orchestration / T12 · Multi-Agent
- 🧪 🆓 AgentBench — General agent-capability.
- 🧪 🆓 AgentBoard — HKUST. Analytic, fine-grained agent eval.
T13 · Evaluation
- 🧪 🆓 Chatbot Arena / LMSYS Arena — Human-preference leaderboard.
- 🧪 🆓 HELM — Stanford CRFM. Holistic evaluation.
- 🧪 🆓 MMLU-Pro — Harder MMLU.
- 🧪 🆓 MT-Bench — LLM-as-judge multi-turn.
T15 · Guardrails & Security
- 🧪 🆓 AdvBench / HarmBench — CAIS. Adversarial / red-team benchmarks.
- 🧪 🆓 JailbreakBench — Chao et al.
- 🧪 🆓 PurpleLlama CyberSecEval — Meta.
T16 · Safety & Alignment
- 🧪 🆓 BBQ — Bias benchmark.
- 🧪 🆓 ToxiGen — Toxicity.
- 🧪 🆓 TruthfulQA — Truthfulness benchmark.
T18 · Inference
- 🧪 🆓 LLMPerf — Anyscale. Throughput/latency tool.
- 🧪 🆓 MLPerf Inference — MLCommons. Industry-standard serving benchmark.
T19 · Voice & Multimodal
- 🧪 🆓 Dynabench speech — Live speech-model benchmarks.
- 🧪 🆓 MMMU — Multimodal multidiscipline benchmark.
- 🧪 🆓 Video-MME — Video understanding.
🏗️ Reference Implementations & Case Studies
Public production write-ups and canonical reference repositories that teach by example.
T1 / T3 · Coding Agents & IDE Rules
- 🏗️ 🆓 Aider — Reference terminal coding agent with detailed engineering blog.
- ⭐ 🏗️ 🆓 Claude Code — Anthropic's reference agentic CLI.
- 🏗️ 🆓 Cline — Open-source autonomous coding agent.
- 🏗️ 🆓 OpenHands — All Hands AI. Open-source autonomous SWE agent.
T2 · Spec-Driven Dev
- 🏗️ 🆓 GitHub spec-kit — Reference spec-driven toolkit.
T5 · Autonomous SWE Agents
- 🏗️ 🆓 Agentless — Minimal agentless baseline that outperformed prior open-source agents on SWE-bench Lite.
- 🏗️ 🆓 AutoCodeRover — NUS.
- 🏗️ 🆓 SWE-agent — Princeton NLP. Reference agent for SWE-bench.
T6 · App Architecture
- 🏗️ 🆓 LangChain templates — Reference app scaffolds.
- 🏗️ 🆓 Open Interpreter — Reference local code-execution agent.
- 🏗️ 🆓 Quivr — Reference full-stack RAG assistant.
T8 · RAG
- 🏗️ 🆓 GraphRAG — Microsoft Research.
- ⭐ 🏗️ 🆓 LlamaIndex — Reference RAG framework; docs double as case studies.
- 🏗️ 🆓 RAGFlow — Production-grade RAG reference.
- 🏗️ 🆓 Verba — Weaviate reference RAG app.
T9 · Memory
- 🏗️ 🆓 Letta (MemGPT) — Reference agentic-memory implementation.
- 🏗️ 🆓 Mem0 — Reference memory layer.
- 🏗️ 🆓 Zep — Long-term memory store.
T10 · Tool Use & MCP
- 🏗️ 🆓 Anthropic MCP reference servers — The canonical reference MCP servers.
- 🏗️ 🆓 awesome-mcp-servers — Community catalogue of MCP server implementations.
T11 / T12 · Orchestration & Multi-Agent
- 🏗️ 🆓 AutoGen — Microsoft.
- 🏗️ 🆓 CrewAI — Reference role-based multi-agent.
- 🏗️ 🆓 LangGraph — Reference graph-based orchestration.
- 🏗️ 🆓 Pydantic AI — Type-safe agent framework.
T13 · Evaluation
- 🏗️ 🆓 DeepEval — Reference eval framework.
- 🏗️ 🆓 EleutherAI lm-evaluation-harness — Standard offline-eval harness.
- 🏗️ 🆓 RAGAS — RAG-specific evaluation.
T14 · Observability
- 🏗️ 🆓 Arize Phoenix — Open-source tracing + evals.
- 🏗️ 🆓 Langfuse — Open-source LLM observability.
- 🏗️ 🆓 OpenLLMetry — OTel-based LLM instrumentation.
- 🏗️ 🆓 Future AGI — Open-source platform to trace, evaluate, simulate, guard, and auto-improve AI agents.
T15 · Guardrails & Security
- 🏗️ 🆓 Guardrails AI — Reference guardrails framework.
- 🏗️ 🆓 NVIDIA NeMo Guardrails — Programmable guardrails.
- 🏗️ 🆓 Rebuff — Prompt-injection defence reference.
T17 · Fine-tuning
- 🏗️ 🆓 Axolotl — Reference fine-tuning framework.
- 🏗️ 🆓 Hugging Face alignment-handbook — Reference RLHF/DPO recipes.
- 🏗️ 🆓 LLaMA-Factory — Unified fine-tuning toolkit.
- 🏗️ 🆓 Unsloth — Fast LoRA/QLoRA reference.
T18 · Inference & Serving
- 🏗️ 🆓 llama.cpp — Reference CPU/GPU local inference.
- 🏗️ 🆓 SGLang — Structured generation serving.
- 🏗️ 🆓 TensorRT-LLM — NVIDIA reference optimised serving.
- ⭐ 🏗️ 🆓 vLLM — Reference high-throughput LLM serving.
T19 · Voice & Multimodal
- 🏗️ 🆓 LiveKit Agents — Voice-agent reference.
- 🏗️ 🆓 Pipecat — Daily's voice-agent framework.
- 🏗️ 🆓 Ultravox — Real-time speech LM.
T20 · Product & UX
- 🏗️ 🆓 assistant-ui — Reference React components for AI chat.
- 🏗️ 🆓 Open WebUI — Reference local chat UI.
- 🏗️ 🆓 Vercel AI SDK — Reference AI-UI patterns and streaming.
🎥 Talks, Workshops & Conferences
Recorded talks, workshops, and conference series worth watching.
Conference series
- ⭐ 🎥 🆓 AI Engineer Summit / World's Fair — The definitive practitioner conference; full talks on YouTube.
- 🎥 🆓 COLM — Conference on Language Modeling. New dedicated LM venue.
- 🎥 🆓 LlamaCon — Meta's open-source LLM conference.
- 🎥 🆓 MLSys — Core ML-systems conference (inference, serving).
- 🎥 🆓 NeurIPS / ICML / ICLR — Core ML research venues; most papers include recorded talks.
Canonical talks
- ⭐ 🎥 🆓 1hr Talk: Intro to LLMs (Nov 2023) — Andrej Karpathy. Later updated and expanded as "Deep Dive into LLMs".
- ⭐ 🎥 🆓 Intro to LLMs — Andrej Karpathy. The reference "how LLMs work" talk.
- ⭐ 🎥 🆓 Let's build GPT: from scratch, in code — Andrej Karpathy.
- 🎥 🆓 Stanford CS25: Transformers United — Full lecture series.
- 🎥 🆓 State of GPT — Andrej Karpathy (Microsoft Build 2023).
T1 · Coding Agents
- 🎥 🆓 Cursor: Building the AI-first IDE — Cursor team channel.
- 🎥 🆓 Mastering Claude Code — Anthropic (Boris Cherny).
- 🎥 🆓 The future of AI coding — Latent Space talk archives.
T4 · SWE Benchmarks
- 🎥 🆓 SWE-bench at NeurIPS — Carlos Jimenez.
T6 · App Architecture
- 🎥 🆓 Emerging architectures for LLM applications — a16z (video + post).
- 🎥 🆓 State of AI Engineering — Latent Space keynotes.
T7 · Prompt Engineering
- 🎥 🆓 Anthropic: Prompt Engineering for Business Performance — Anthropic.
- 🎥 🆓 ChatGPT Prompt Engineering for Developers — Andrew Ng + OpenAI.
T8 · RAG
- 🎥 🆓 RAG at scale — LangChain channel series.
- 🎥 🆓 Systematically improving RAG applications — Jason Liu.
T10 · MCP
- 🎥 🆓 MCP at AI Engineer Summit — AI Engineer.
- 🎥 🆓 Model Context Protocol deep dive — Anthropic.
T11 / T12 · Orchestration & Multi-Agent
- 🎥 🆓 Andrew Ng: What's next for AI agentic workflows — Sequoia AI Ascent 2024.
- 🎥 🆓 LangGraph: multi-agent workflows — LangChain.
T13 · Evaluation
- 🎥 🆓 Evaluating LLM-based applications — Josh Tobin (Databricks Data + AI Summit).
- 🎥 🆓 LLM Evals: MT-Bench and Chatbot Arena — LMSYS.
T14 · Observability
- 🎥 🆓 OpenTelemetry for LLMs — KubeCon / OTel community talks.
T15 / T16 · Security & Safety
- 🎥 🆓 Anthropic AI safety research — Anthropic channel.
- 🎥 🆓 Simon Willison on prompt injection — Talks + essays hub.
T17 · Fine-tuning
- 🎥 🆓 Fine-tuning workshop — Hamel Husain channel.
- 🎥 🆓 Let's reproduce GPT-2 / build the GPT tokenizer — Karpathy channel.
T18 · Inference
- 🎥 🆓 CUDA Mode lectures — Community GPU/kernel series.
- 🎥 🆓 vLLM: high-throughput LLM serving — Anyscale / UC Berkeley talks.
T19 · Voice & Multimodal
- 🎥 🆓 LiveKit voice-agent talks — LiveKit.
- 🎥 🆓 OpenAI Realtime API demos — OpenAI.
T20 · Product & UX
- 🎥 🆓 AI UX: the next frontier — NNGroup.
- 🎥 🆓 Linus Lee: tools for thought — Talks archive.
T21 · Economics & Teams
- 🎥 🆓 a16z AI portfolio talks — a16z.
- 🎥 🆓 The Pragmatic Engineer on AI teams — Gergely Orosz.
🎧 Podcasts
Recurring podcasts with strong agentic & AI-engineering coverage.
- 🎧 🆓 Cognitive Revolution — Nathan Labenz. Weekly AI engineering + strategy.
- 🎧 🆓 Dwarkesh Podcast — Dwarkesh Patel. Deep interviews with top researchers.
- 🎧 🆓 Gradient Dissent — Weights & Biases. Applied-ML interviews.
- 🎧 🆓 Interconnects — Nathan Lambert. RLHF / post-training focus.
- ⭐ 🎧 🆓 Latent Space — swyx & Alessio. The AI-engineering podcast of record; guests include most major AI-lab engineers.
- 🎧 🆓 Lex Fridman Podcast — Long-form interviews with AI-lab CEOs and researchers.
- 🎧 🆓 Machine Learning Street Talk — Tim Scarfe. Technical deep-dives.
- 🎧 🆓 MLOps Community podcast — Demetrios Brinkmann. Operations-focused case studies of ML and LLM systems in production.
- 🎧 🆓 No Priors — Sarah Guo & Elad Gil. Founders / researchers.
- ⭐ 🎧 🆓 Practical AI — Daniel Whitenack & Chris Benson. Long-running, practitioner-first.
- 🎧 🆓 Pragmatic Engineer — Gergely Orosz. AI-engineering org/hiring coverage.
- 🎧 🆓 The TWIML AI Podcast — Sam Charrington. One of the longest-running ML interview series.
- 🎧 🆓 Unsupervised Learning — Redpoint. AI-founder / operator conversations.
📰 Newsletters
Curated newsletters and blogs, from daily digests to occasional long-form posts.
- 📰 🆓 Ahead of AI — Sebastian Raschka. LLM research + fine-tuning deep-dives.
- 📰 🆓 Ben's Bites — Daily digest; founder-friendly.
- 📰 🆓 Chip Huyen's Blog — Occasional long-form on AI engineering.
- 📰 🆓 DiamantAI — Nir Diamant. Practical AI engineering and generative AI: RAG, agents, and LLM application patterns explained simply.
- 📰 🆓 Eugene Yan — Pattern / eval / RAG deep-dives.
- 📰 🆓 Hamel's Blog — Evals + applied LLMs.
- ⭐ 📰 🆓 Import AI — Jack Clark (Anthropic co-founder). Policy + research.
- 📰 🆓 Interconnects — Nathan Lambert. RLHF / post-training.
- 📰 🆓 Last Week in AI — Weekly recap.
- ⭐ 📰 🆓 Latent Space — swyx. The AI-engineering newsletter of record.
- 📰 🆓 Machine Learning Engineer Newsletter — Alejandro Saucedo. Weekly production-ML curation.
- 📰 🆓 MLOps Community newsletter — MLOps Community.
- 📰 🆓 Simon Willison's Weblog — RSS/email. Near-daily coverage of tools and agents.
- ⭐ 📰 🆓 The Batch — Andrew Ng / DeepLearning.AI. Weekly AI-engineering digest.
- 📰 🆓 The Data Exchange — Ben Lorica.
- 📰 🆓 The Pragmatic Engineer — Gergely Orosz. AI-engineering hiring/org coverage.
- 📰 🆓 TLDR AI — Daily headlines.
🛡️ Governance, Safety & Responsible AI
Policy frameworks, safety research, red-teaming resources, and responsible-AI guidance.
Policy & frameworks
- 🆓 EU AI Act — European Commission. Official text + implementation timeline.
- ⭐ 🆓 NIST AI Risk Management Framework (AI RMF 1.0) — NIST. The foundational US framework.
- 🆓 NIST Generative AI Profile (NIST AI 600-1) — NIST.
- 🆓 OECD AI Principles — International reference.
- 🆓 UK AI Safety Institute reports — UK AISI.
Lab safety & responsible scaling
- 🆓 Anthropic Core Views on AI Safety — Anthropic.
- ⭐ 🆓 Anthropic Responsible Scaling Policy — Anthropic.
- 🆓 Google DeepMind: Frontier Safety Framework — Google DeepMind.
- 🆓 OpenAI Preparedness Framework — OpenAI.
Security & red-teaming
- 🆓 HarmBench — CAIS.
- 🆓 MITRE ATLAS — Adversarial threat landscape for AI systems.
- 🆓 NIST Adversarial ML Taxonomy (NIST AI 100-2) — NIST.
- ⭐ 🆓 OWASP Top 10 for LLM Applications — OWASP.
- 🆓 Simon Willison's prompt-injection series — Simon Willison.
Responsible AI practice
- 🆓 Fairlearn — Open-source fairness toolkit.
- 🆓 Google Responsible AI practices — Google.
- 🆓 Microsoft Responsible AI Standard — Microsoft.
- 🆓 Partnership on AI — Multi-stakeholder org with published frameworks and incident database.
Papers & research
- 📄 🆓 Concrete Problems in AI Safety — Amodei et al.
- 📄 🆓 Constitutional AI — Bai et al.
- 📄 🆓 Red Teaming Language Models with Language Models — Perez et al.
- 📄 🆓 Sleeper Agents — Hubinger et al. (Anthropic).
🎨 Product, UX & Economics of AI
Going beyond engineering: designing for AI, human-AI interaction, and the economics of LLM applications.
Design & UX
- 🆓 Apple Human Interface Guidelines — Generative AI — Apple.
- 🆓 Google's People + AI Guidebook — Google PAIR.
- ⭐ 🆓 Guidelines for Human-AI Interaction — Amershi et al. (Microsoft Research). The canonical design heuristics.
- 🆓 Linus Lee — Essays on interfaces for tools for thought.
- 🆓 Maggie Appleton — Essays on the UX of agentic, malleable software.
- 🆓 NNGroup: Generative AI design patterns — Nielsen Norman Group.
Economics & business
- ⭐ 🆓 a16z: The Economic Case for Generative AI — a16z.
- 🆓 Artificial Analysis — Cross-provider pricing/latency/quality dashboards.
- 🆓 Epoch AI — Data on compute, cost, and scaling trends.
- 🆓 Latent Space on unit economics — Latent Space.
- 🆓 Stanford AI Index Report — Stanford HAI. Annual deep economic + research snapshot.
Product strategy
- 🆓 16 Changes to the Way Enterprises Build Software with AI — a16z.
- 🆓 AI product strategy — Lenny's Newsletter (AI tag).
- 🆓 Every Inc — Prose-heavy essays on AI product + consumer LLM UX.
🧑🤝🧑 Teams, Hiring & Org Design
How organisations structure AI-engineering work, hire for it, and operate sustainably.
- 🆓 a16z AI canon — a16z. Curated reading list for people building AI teams.
- 🆓 Building the AI Engineer role — swyx / Latent Space. The foundational essay defining "AI Engineer" as a discipline.
- 🆓 Chip Huyen: Machine learning in production — Org-design questions from production ML.
- 🆓 DeepLearning.AI AI Engineer Hiring Report — The Batch periodic coverage.
- 🆓 Emmanuel Ameisen: Building ML Powered Applications — Book + blog on AI-team building.
- 🆓 Eugene Yan: Team size and velocity — Eugene Yan.
- 🆓 GitHub: The AI-native developer — GitHub's research on workflows / productivity.
- 🆓 Shreya Shankar: Operationalizing ML — Shreya Shankar.
- 🆓 Staff Engineer — AI org posts — Will Larson & community.
- ⭐ 🆓 The Pragmatic Engineer — AI tag — Gergely Orosz. AI-engineering hiring + org design.
- 🆓 What is an AI Engineer? — Applied LLMs consortium.
How to suggest a resource
Please use one of the issue templates (resource suggestion, broken link, or trending nomination) or open a pull request following the guidance in CONTRIBUTING.md. The curation methodology and update cadence are documented in NOTES.md.
Update cadence
Weekly: PR triage and broken-link fixes. Monthly: trending rotation and new-resource batches. Quarterly: full thoroughness pass against the checklist in NOTES.md.
License
To the extent possible under law, the contributors have waived all copyright and related or neighboring rights to this work.

