Awesome Agentic Engineering Resources

June 18, 2026

A curated list of high-signal resources — articles, books, courses, cookbooks, papers, playbooks, benchmarks, talks, podcasts, and newsletters — for agentic engineering and AI engineering.

This is a resources list, not a tools list. Open-source tools for building agentic systems live in the sister list awesome-production-agentic-systems; production ML tooling lives in awesome-production-machine-learning. This list covers the learning, design, and operational resources that sit alongside those tools — including both:

Agentic engineering focuses on using AI agents to do software engineering (Copilot, Cursor, Claude Code, Aider, Cline, Windsurf, Codex; spec-driven development; context engineering; agent IDE rules and memory files; SWE benchmarks).
AI / agentic systems engineering focuses on building agentic and LLM-powered systems (architecture, RAG, memory, tool use & MCP, orchestration, multi-agent coordination, evaluation, observability, guardrails, safety, fine-tuning, inference, product/UX, economics, teams).

You can keep up to date by watching this repo for the monthly releases summarising newly added resources 🤩

This list was proposed in EthicalML/awesome-production-machine-learning#709 as a sister list focused on resources rather than tools.

Legend

Resources are tagged with icons so you can scan and filter at a glance:

Icon	Meaning
⭐	Editors' pick — start here
🆓	Free to access
💰	Paid
📘	Book
🧑‍🎓	Course
🎥	Video / talk
🎧	Audio / podcast
📄	Paper
🛠️	Hands-on cookbook / tutorial
📋	Playbook / design-pattern catalog
🧪	Benchmark / leaderboard
🏗️	Reference implementation / case study
📰	Newsletter

Quick links to sections on this page


⭐ Trending / What's New	🧭 Core & Foundations	🗓️ Milestones Timeline
👥 Communities	🧑‍🎓 Courses	📘 Books
✍️ Articles & Essays	🛠️ Tutorials & Cookbooks	📋 Playbooks & Patterns
📄 Papers & Research	🧪 Benchmarks & Leaderboards	🏗️ Reference Implementations
🎥 Talks & Conferences	🎧 Podcasts	📰 Newsletters
🛡️ Governance, Safety & Responsible AI	🎨 Product, UX & Economics of AI	🧑‍🤝‍🧑 Teams, Hiring & Org Design

Topic Coverage Matrix

Resources are organised as a matrix: the top-level sections above (rows) are resource types, and each section is sub-divided by topic (columns). The 21 topics, T1–T21, are shared across sections. This lets you read along a row ("which topics do the Papers cover?"), down a column ("where do I find resources on Coding Agents?"), or jump to a single cell ("what papers exist on RAG?").

Topics:

#	Topic
T1	Coding Agents & AI-Assisted Development (Copilot, Cursor, Claude Code, Aider, Cline, Windsurf, Codex)
T2	Spec-Driven Development & Context Engineering (AGENTS.md, spec-kit, rules files)
T3	Agent IDE Rules, Memory Files & Developer Workflows
T4	SWE Benchmarks & Coding Evaluation
T5	Autonomous Software Agents & Long-Horizon Engineering Tasks
T6	LLM Application Architecture & System Design
T7	Prompt Engineering
T8	Retrieval-Augmented Generation (RAG)
T9	Memory Systems & Long-Context
T10	Tool Use, Function Calling & MCP
T11	Orchestration, Planning & Design Patterns
T12	Multi-Agent Systems & Coordination
T13	Evaluation & Testing
T14	Observability, Tracing & Debugging
T15	Guardrails & Security (prompt injection, jailbreaks, red-teaming)
T16	Safety, Alignment & Responsible AI
T17	Fine-tuning, Post-training, RLHF & Reasoning Training
T18	Inference, Serving, Cost & Latency
T19	Voice, Multi-modal & Embodied Agents
T20	Product, UX & Human-AI Interaction Design
T21	Economics, Teams, Hiring & Org Design

Coverage (● = populated, ○ = opportunistic / partial, — = out of scope for that row):

Row \ Topic	T1	T2	T3	T4	T5	T6	T7	T8	T9	T10	T11	T12	T13	T14	T15	T16	T17	T18	T19	T20	T21
Core & Foundations	●	●	○	○	○	●	●	●	○	●	●	○	●	○	○	○	○	○	○	○	○
Communities	●	○	○	○	○	●	●	●	○	●	●	○	●	●	○	●	●	●	○	●	●
Courses	●	○	○	●	○	●	●	●	○	●	●	●	●	●	●	●	●	●	○	○	○
Books	●	○	○	—	○	●	●	●	○	●	●	○	●	○	●	●	●	●	○	●	●
Articles & Essays	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●
Tutorials & Cookbooks	●	●	●	○	●	●	●	●	●	●	●	●	●	●	●	○	●	●	●	○	—
Playbooks & Patterns	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	○	●	○	●	●
Papers & Research	●	○	—	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	○
Benchmarks	●	—	—	●	●	○	○	●	○	●	●	●	●	○	●	●	○	●	●	○	—
Reference Impls	●	●	●	●	●	●	○	●	●	●	●	●	●	●	●	○	●	●	●	●	●
Talks & Conferences	●	●	○	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●	●
Podcasts	●	○	○	○	●	●	●	●	○	●	●	●	●	●	●	●	●	●	○	●	●
Newsletters	●	○	○	○	○	●	●	●	○	●	●	○	●	●	●	●	●	●	○	●	●

The Trending / What's New, Milestones Timeline, Governance & Responsible AI, Product / UX / Economics, and Teams, Hiring & Org Design sections collapse across topics and are presented as curated lists rather than matrix cells.

Contributing to the list

Please review our CONTRIBUTING.md before submitting a PR — it explains the one-line description style, how to pick the right row/topic cell, and the quality bar for inclusion. Thank you to the community for supporting the list's growth 🚀

Want to receive recurring updates on this repo and other advancements?

You can join the Machine Learning Engineer newsletter. Join over 70,000 ML professionals and enthusiasts who receive weekly curated articles & tutorials on production Machine Learning.
Also check out Awesome Production Agentic Systems and Awesome Production Machine Learning, the sister lists of open-source tools for agentic systems and production ML respectively.

Main Content

⭐ Trending / What's New

Rotating pinned items: the most-discussed agentic & AI-engineering resources of the current cycle. Refreshed regularly — see CONTRIBUTING.md for nomination criteria.

⭐ 🆓 A practical guide to building agents — OpenAI (2025). 30-page PDF covering when (and when not) to build agents, tool design, guardrails, and human-in-the-loop patterns.
🆓 AGENTS.md — Community standard (2025) for per-repo agent instructions, now read by Claude Code, Codex, Aider, Cursor, Cline, Windsurf and others.
⭐ 🆓 Building effective agents — Anthropic (2024). The most-cited reference for agent design patterns (augmented LLM, prompt chaining, routing, parallelisation, orchestrator-workers, evaluator-optimiser, autonomous agents). Start here before any other agent reading.
🆓 Claude Code: Best practices for agentic coding — Anthropic (2025). CLAUDE.md, slash-commands, headless mode, custom permissions — the canonical how-to-use-Claude-Code reference.
🆓 How to build an agent — Thorsten Ball / Amp (2025). Viral step-by-step implementation of a tool-using coding agent in ~400 lines of Go, demystifying "what is an agent" in code.
⭐ 🆓 How we built our multi-agent research system — Anthropic (2025). Production retrospective on Claude's multi-agent research mode: orchestrator/subagent split, prompt engineering for agents, evaluation and failure modes.
⭐ 🆓 The bitter lesson of AI agents / Agentic Coding: The Future of Software Development with Agents — Armin Ronacher (2025). Widely-shared essays on what it actually feels like to ship with agentic coding tools day-to-day.
🆓 The new code — Sean Grove / OpenAI on Latent Space (2025). Specs-as-code: the spec is the new artefact, models are the compiler. Heavily cited in the AGENTS.md / spec-kit discussion.

🧭 Core & Foundations

Canonical "what is agentic engineering / AI engineering" reading. Start here.

T1 · Coding Agents & AI-Assisted Development

⭐ 🆓 Building effective agents — Anthropic. The reference taxonomy of agent design patterns (workflows vs. agents).
⭐ 🆓 Claude Code: Best practices for agentic coding — Anthropic. CLAUDE.md, tools, slash-commands, headless mode.
🆓 Here's how I use LLMs to help me write code — Simon Willison. Grounded, practice-first account of daily LLM-assisted development.
🆓 How to build an agent — Thorsten Ball. A working coding agent in ~400 lines; the clearest "agents are not magic" walkthrough.

T2 · Spec-Driven Development & Context Engineering

🆓 AGENTS.md — Community standard for per-repo agent instructions.
🆓 spec-kit — GitHub's toolkit and essay set on spec-driven development with coding agents.
⭐ 🆓 The new code — Sean Grove (OpenAI) on Latent Space. The canonical "specs are the new code" essay.
🆓 The rise of "context engineering" — LangChain. Why prompt engineering became context engineering.

T6 · LLM Application Architecture & System Design

⭐ 📘 💰 AI Engineering — Chip Huyen (O'Reilly, 2025). The textbook for building LLM applications end-to-end.
🆓 Emerging Architectures for LLM Applications — a16z. The widely-shared reference diagram for the LLM app stack.
⭐ 🆓 Patterns for Building LLM-based Systems & Products — Eugene Yan. Evaluation, RAG, fine-tuning, caching, guardrails, defensive UX, collecting feedback — the reference pattern catalogue.
🆓 What We Learned from a Year of Building with LLMs — Yan, Bischof, Frye, Husain, Liu, Shankar (2024). Tactical, operational, and strategic lessons distilled from shipping.

T7 · Prompt Engineering

🆓 Anthropic: Prompt engineering overview — Anthropic's practical guide for Claude.
🆓 OpenAI: Prompt engineering — OpenAI official guide.
⭐ 🆓 Prompt Engineering — Lilian Weng (OpenAI). The systematic taxonomy.
🆓 Prompt Engineering Guide — DAIR.AI. Continuously updated, with per-technique deep-dives.

T8 · Retrieval-Augmented Generation (RAG)

⭐ 🆓 Advanced RAG Techniques / Pinecone Learn — Pinecone. The hub for RAG primers and patterns.
🆓 RAG is more than just embedding search — Jason Liu. Systems-view RAG: query understanding, tool routing, evaluation.
⭐ 📄 🆓 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — Lewis et al. (2020). The original RAG paper.
🆓 Retrieval-Augmented Generation for LLMs: A Survey — Gao et al. (2023). The reference survey.

T10 · Tool Use, Function Calling & MCP

🆓 Function calling guide — OpenAI. The canonical reference for structured tool calls.
⭐ 🆓 Introducing the Model Context Protocol — Anthropic (2024). The canonical introduction to MCP.
⭐ 🆓 Model Context Protocol — Specification — Open protocol docs and SDKs.
📄 🆓 Toolformer: Language Models Can Teach Themselves to Use Tools — Schick et al. (2023). The foundational tool-use paper.

T11 · Orchestration, Planning & Design Patterns

⭐ 🆓 Building effective agents — Anthropic. The orchestration pattern taxonomy.
🆓 LLM Powered Autonomous Agents — Lilian Weng. The canonical deep-dive on planning, memory, and tool use in agent loops.
📄 🆓 ReAct: Synergizing Reasoning and Acting in Language Models — Yao et al. (2022). The foundational reason+act loop.
📄 🆓 The Rise and Potential of LLM Based Agents: A Survey — Xi et al. (2023). Survey of agent architectures and components.

T13 · Evaluation & Testing

📄 🆓 Judging LLM-as-a-Judge — Zheng et al. (2023). The foundational LLM-as-judge paper (MT-Bench, Chatbot Arena).
🆓 Task-Specific LLM Evals that Do & Don't Work — Eugene Yan. A pragmatic survey of eval techniques per task type.
🆓 Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences — Shankar et al. (2024). How to make LLM-judges trustworthy.
⭐ 🆓 Your AI Product Needs Evals — Hamel Husain. The most-cited essay on why and how to build evals for LLM products.

🗓️ Milestones Timeline

Dated, field-defining events that shaped agentic & AI engineering.

Date	Event	Reference
2017-06	Transformer architecture introduced	Attention Is All You Need
2020-05	GPT-3 shows in-context learning at scale	Language Models are Few-Shot Learners
2020-05	RAG framework introduced	RAG for Knowledge-Intensive NLP
2021-06	GitHub Copilot preview launches — first mainstream AI coding assistant	GitHub blog
2022-01	Chain-of-Thought prompting	Wei et al.
2022-03	InstructGPT / RLHF	Ouyang et al.
2022-09	Simon Willison coins "prompt injection" as a durable threat category	SW blog
2022-10	ReAct: reasoning + acting agent loop	Yao et al.
2022-11	ChatGPT release — mainstream adoption inflection	OpenAI
2023-03	GPT-4 release	OpenAI
2023-03	HuggingGPT / Toolformer-era tool use	Toolformer
2023-03	LangChain & LlamaIndex hit mainstream	—
2023-05	Voyager: open-ended agents in Minecraft	Voyager
2023-10	SWE-bench released — real-world coding eval	SWE-bench
2023-12	Mixture-of-experts open models (Mixtral)	Mistral
2024-03	Devin demo — autonomous software agent pitch	Cognition
2024-05	GPT-4o: native multi-modal + realtime voice	OpenAI
2024-08	SWE-bench Verified launched	OpenAI
2024-09	o1 reveals reasoning-model era	OpenAI
2024-11	Model Context Protocol (MCP) announced	Anthropic
2024-12	Anthropic's "Building effective agents" published	Anthropic
2025-02	Claude Code research preview launches	Anthropic
2025-05	AGENTS.md published as cross-agent standard	agents.md
2025-06	GitHub spec-kit / "new code" essays formalise spec-driven dev	spec-kit

👥 Communities

Discords, Slacks, forums, and meetups where practitioners gather.

🆓 AI Dev Board — Community-curated hub for AI engineering resources and discussions.
🆓 AI Engineer World's Fair / Latent Space Discord — Practitioner community anchoring the AI Engineer conference series.
🆓 Anthropic Discord — Official Claude / Claude Code / MCP community.
🆓 Cursor Community Forum — User-driven forum for Cursor rules, MCP, and workflows.
🆓 EleutherAI Discord — Open research community; strong training/interpretability discussion.
🆓 Hacker News — Filter for "LLM", "agent", "Claude", "Cursor" — where engineering-side essays trend.
🆓 Hugging Face Discord & Forums — Transformers, TRL, PEFT, model-hub discussions.
🆓 LangChain Discord — Heavy day-to-day Q&A on agent orchestration, RAG, evaluation, MCP.
🆓 LlamaIndex Discord — RAG-centric builder community with active reference-impl discussion.
🆓 MLOps Community — Slack + podcast + meetups; the biggest practitioner community at the ops/engineering intersection. Active agent and LLM-ops channels.
🆓 r/LocalLLaMA — The definitive open-weights / local-inference forum; fastest signal for new models, quantisation, and serving.
🆓 r/MachineLearning — Academic and practitioner mix; where new papers and threads get dissected.

🧑‍🎓 Courses

Structured courses — free and paid, university and industry.

T1 · Coding Agents & AI-Assisted Development

⭐ 🧑‍🎓 🆓 AI Python for Beginners — DeepLearning.AI (Andrew Ng). Gateway to AI-assisted coding.
🧑‍🎓 🆓 GitHub Copilot Fundamentals — Microsoft Learn. Official training path.
🧑‍🎓 🆓 Pair Programming with a Large Language Model — DeepLearning.AI + Google.

T4 · SWE Benchmarks & Coding Evaluation

🧑‍🎓 🆓 Evaluating and Debugging Generative AI — DeepLearning.AI + W&B. Covers coding-eval mechanics.
🧑‍🎓 🆓 Mastering LLMs: Evals — Hamel Husain & Shreya Shankar (Maven). Companion evals-for-LLMs curriculum.
🧑‍🎓 🆓 SWE-bench tutorial — Princeton NLP. Free, self-paced walk-through of running and scoring coding evals.

T6 · LLM Application Architecture & System Design

🧑‍🎓 🆓 Building Systems with the ChatGPT API — DeepLearning.AI + OpenAI.
🧑‍🎓 🆓 CS25: Transformers United — Stanford. Seminal deep-dive seminar series.
⭐ 🧑‍🎓 🆓 LLM Bootcamp — Full Stack Deep Learning. Free 2-day bootcamp on building LLM apps end-to-end.

T7 · Prompt Engineering

🧑‍🎓 🆓 Anthropic Prompt Engineering Interactive Tutorial — Anthropic. Hands-on, notebook-based.
⭐ 🧑‍🎓 🆓 ChatGPT Prompt Engineering for Developers — Andrew Ng & Isa Fulford (OpenAI).
🧑‍🎓 🆓 Prompt Engineering Guide (DAIR.AI) — Self-paced, continuously updated.

T8 · Retrieval-Augmented Generation (RAG)

🧑‍🎓 🆓 Advanced Retrieval for AI with Chroma — DeepLearning.AI.
🧑‍🎓 🆓 Building and Evaluating Advanced RAG Applications — DeepLearning.AI + LlamaIndex + TruEra.
🧑‍🎓 🆓 LangChain Chat with Your Data — DeepLearning.AI + LangChain.
🧑‍🎓 💰 Systematically Improving RAG Applications — Jason Liu on Maven.

T10 · Tool Use, Function Calling & MCP

🧑‍🎓 🆓 Functions, Tools and Agents with LangChain — DeepLearning.AI + LangChain.
🧑‍🎓 🆓 Introduction to MCP — Anthropic official quickstart.
🧑‍🎓 🆓 MCP: Build Rich-Context AI Apps with Anthropic — DeepLearning.AI + Anthropic.

T11 · Orchestration, Planning & Design Patterns

🧑‍🎓 🆓 AI Agentic Design Patterns with AutoGen — DeepLearning.AI + Microsoft.
🧑‍🎓 🆓 AI Agents in LangGraph — DeepLearning.AI + LangChain.
🧑‍🎓 🆓 Hugging Face Agents Course — Hugging Face. Free, certifying course on agent fundamentals.

T12 · Multi-Agent Systems

🧑‍🎓 🆓 Building Agentic RAG with LlamaIndex — DeepLearning.AI + LlamaIndex.
🧑‍🎓 🆓 Multi AI Agent Systems with crewAI — DeepLearning.AI + crewAI.
🧑‍🎓 🆓 Practical Multi AI Agents and Advanced Use Cases with crewAI — DeepLearning.AI.

T13 · Evaluation & Testing

⭐ 🧑‍🎓 💰 AI Evals For Engineers & PMs — Hamel Husain & Shreya Shankar on Maven. The industry-standard evals cohort course.
🧑‍🎓 🆓 Automated Testing for LLMOps — DeepLearning.AI + CircleCI.
🧑‍🎓 🆓 Quality and Safety for LLM Applications — DeepLearning.AI + WhyLabs.

T14 · Observability, Tracing & Debugging

🧑‍🎓 🆓 Evaluating LLMs with Arize — Arize course hub.
🧑‍🎓 🆓 LangSmith Academy — LangChain. Free self-paced LangSmith courses covering tracing and evals.
🧑‍🎓 🆓 LLMOps — DeepLearning.AI + Google Cloud.

T15 · Guardrails & Security

🧑‍🎓 🆓 Prompt Injection Attacks (Learn Prompting) — Learn Prompting. Open course covering injection/jailbreak taxonomies.
🧑‍🎓 🆓 Red Teaming LLM Applications — DeepLearning.AI + Giskard.
🧑‍🎓 🆓 Safe and Reliable AI via Guardrails — DeepLearning.AI + Guardrails AI.

T16 · Safety, Alignment & Responsible AI

🧑‍🎓 🆓 AI Safety Fundamentals — BlueDot Impact. The standard entry curriculum.
🧑‍🎓 🆓 ARENA (Alignment Research Engineer Accelerator) — Hands-on alignment / interpretability.
🧑‍🎓 🆓 Intro to AI Safety, Remastered — Richard Ngo / BlueDot. Free reading curriculum.

T17 · Fine-tuning, Post-training & RLHF

⭐ 🧑‍🎓 🆓 Finetuning Large Language Models — DeepLearning.AI + Lamini.
🧑‍🎓 🆓 Hugging Face NLP Course (incl. RLHF chapter) — Hugging Face.
🧑‍🎓 🆓 Reinforcement Learning from Human Feedback — DeepLearning.AI + Google Cloud.

T18 · Inference, Serving, Cost & Latency

🧑‍🎓 🆓 CUDA Mode lectures — Community lectures on GPU inference internals.
🧑‍🎓 🆓 Efficiently Serving LLMs — DeepLearning.AI + Predibase.
🧑‍🎓 🆓 Quantization Fundamentals with Hugging Face — DeepLearning.AI + HF.

📘 Books

Published and in-progress books covering agentic & AI engineering.

T1 · Coding Agents & AI-Assisted Development

⭐ 📘 💰 AI-Assisted Programming — Tom Taulli (O'Reilly, 2024). Practical coverage of Copilot/Cursor/Claude workflows.
📘 💰 Prompt Engineering for Generative AI — James Phoenix & Mike Taylor (O'Reilly, 2024). Includes heavy coverage of code-generation prompting patterns.

T6 · LLM Application Architecture & System Design

⭐ 📘 💰 AI Engineering: Building Applications with Foundation Models — Chip Huyen (O'Reilly, 2025). The reference textbook for the field.
📘 💰 Designing Machine Learning Systems — Chip Huyen (O'Reilly, 2022). The prior-generation canonical ML-systems text; still essential for data/infra context.
📘 💰 Generative AI on AWS — Chris Fregly, Antje Barth, Shelbee Eigenbrode (O'Reilly, 2023).

T7 · Prompt Engineering

📘 💰 Prompt Engineering for LLMs — John Berryman & Albert Ziegler (O'Reilly, 2024). From Copilot's original tech-lead.
📘 🆓 The Prompt Report — Schulhoff et al. (2024). A 76-page survey that effectively functions as a book-length prompting reference.

T8 · RAG

📘 💰 Building LLM Apps — Valentina Alto (Wiley, 2024). RAG-heavy application text.
📘 💰 RAG Made Simple — Nir Diamant (2025). A visual, code-free walkthrough of 22 retrieval-augmented generation techniques explained through diagrams and analogies.
📘 💰 RAG-Driven Generative AI — Denis Rothman (Packt, 2024).

T10 · Tool Use & MCP

📘 💰 Building Intelligent Apps with OpenAI — Olivier Caelen & Marie-Alice Blete (O'Reilly, 2024). Heavy function-calling coverage.

T11 · Orchestration & Design Patterns

📘 💰 Generative AI with LangChain — Ben Auffarth (Packt, 2023). Orchestration patterns end-to-end.

T13 · Evaluation

📘 💰 Prompt Engineering for Generative AI — Phoenix & Taylor (O'Reilly, 2024). Chapter-length eval coverage.

T15 · Guardrails & Security

📘 💰 Generative AI Security — Ken Huang et al. (Apress, 2024).
📘 💰 The Developer's Playbook for Large Language Model Security — Steve Wilson (O'Reilly, 2024). OWASP LLM Top 10 project lead's book.

T16 · Safety, Alignment & Responsible AI

📘 💰 Human Compatible — Stuart Russell (2019). The foundational alignment argument.
📘 💰 The Alignment Problem — Brian Christian (2020). The canonical popular-press primer.

T17 · Fine-tuning & Post-training

⭐ 📘 💰 Build a Large Language Model (From Scratch) — Sebastian Raschka (Manning, 2024). The reference hands-on text.
📘 💰 Hands-On Large Language Models — Jay Alammar & Maarten Grootendorst (O'Reilly, 2024).

T18 · Inference & Serving

📘 💰 Efficient Processing of Deep Neural Networks — Sze et al. (Morgan & Claypool). Hardware/inference reference.

T20 · Product & UX

📘 💰 Designing Machine Learning Systems — Chip Huyen. Includes pragmatic product/UX chapters.
📘 💰 Human-AI Interaction Design — IxDF topic hub.

T21 · Economics, Teams & Org

📘 💰 Managing Machine Learning Projects — Simon Thompson (Manning).
📘 🆓 The Pragmatic Engineer's AI coverage — Gergely Orosz. Regularly-updated editorial that functions as a rolling book on AI-engineering org design.

✍️ Articles & Essays

Long-form writing from canonical authors and engineering teams.

T1 · Coding Agents & AI-Assisted Development

🆓 Agentic Coding: The Future of Software Development — Armin Ronacher.
⭐ 🆓 Here's how I use LLMs to help me write code — Simon Willison.
🆓 Revenge of the junior developer — Steve Yegge (Sourcegraph).
🆓 The death of the stubborn developer — Steve Yegge.

T2 · Spec-Driven Development & Context Engineering

🆓 Context Engineering — LangChain.
🆓 Spec-driven development with AI — GitHub Blog.
⭐ 🆓 The new code — Sean Grove / Latent Space.
🆓 The rise of "context engineering" — LangChain.

T3 · Agent IDE Rules, Memory Files & Workflows

🆓 Aider: Tips for using with large codebases — Aider docs.
⭐ 🆓 Claude Code: Best practices for agentic coding — Anthropic.
🆓 Cursor rules directory — Community catalogue of .cursorrules files.
🆓 My Claude Code setup — widely-shared CLAUDE.md + slash-command playbook.

T4 · SWE Benchmarks & Coding Evaluation

⭐ 🆓 Introducing SWE-bench Verified — OpenAI.
🆓 Measuring an AI system's ability to do ML R&D — METR.
🆓 The leaderboard illusion — Singh et al. on bench-gaming.
🆓 Why we built Terminal-Bench — Stanford / Laude.

T5 · Autonomous Software Agents

🆓 Devin, a software engineer — Cognition.
🆓 Don't build multi-agents — Cognition. Contrarian but important counterpoint to multi-agent maximalism.
⭐ 🆓 How we built our multi-agent research system — Anthropic.
🆓 SWE-agent: Agent-Computer Interfaces — Princeton NLP writeup.

T6 · LLM Application Architecture

🆓 Emerging Architectures for LLM Applications — a16z.
⭐ 🆓 Patterns for Building LLM-based Systems & Products — Eugene Yan.
🆓 Twelve factor agents — HumanLayer. The "12-factor app" equivalent for agent apps.
🆓 What We Learned from a Year of Building with LLMs — Yan/Bischof/Frye/Husain/Liu/Shankar.

T7 · Prompt Engineering

🆓 A guide to prompting Claude — Anthropic.
⭐ 🆓 Prompt Engineering — Lilian Weng.
🆓 Prompting is programming — Eugene Yan.
🆓 The prompt report — Learn Prompting team summary of their 76-page survey.

T11 · Orchestration & Design Patterns

🆓 Agent design patterns — Andrew Ng, The Batch series.
🆓 AI agent frameworks — Latent Space comparative review.
🆓 Building effective agents — Anthropic.
⭐ 🆓 LLM Powered Autonomous Agents — Lilian Weng.

T12 · Multi-Agent Systems & Coordination

🆓 AutoGen: Enabling next-gen LLM applications — Microsoft.
🆓 Don't build multi-agents — Cognition.
⭐ 🆓 How we built our multi-agent research system — Anthropic.
🆓 Multi-agent workflows — LangChain.

T13 · Evaluation & Testing

🆓 Creating a LLM-as-a-Judge that drives business results — Hamel Husain.
🆓 LLM evals: everything I learned in 12 months — Shreya Shankar.
🆓 Task-specific LLM evals that do & don't work — Eugene Yan.
⭐ 🆓 Your AI product needs evals — Hamel Husain.

T14 · Observability, Tracing & Debugging

🆓 How Honeycomb uses LLMs for product experiences — Phillip Carter.
🆓 Logfire: observability for the LLM era — Pydantic.
⭐ 🆓 So you want to build an LLM observability platform — Hamel Husain (subsection of evals post; foundational).
🆓 The OpenTelemetry Gen AI semantic conventions — OTel.

T15 · Guardrails & Security

🆓 OWASP Top 10 for LLM Applications — OWASP.
⭐ 🆓 Prompt injection series — Simon Willison. Canonical ongoing series.
🆓 Red teaming LLMs — Hugging Face.
🆓 Universal and Transferable Adversarial Attacks on Aligned LLMs — Zou et al. (GCG attack).

T16 · Safety, Alignment & Responsible AI

🆓 Anthropic's Responsible Scaling Policy — Anthropic.
⭐ 🆓 Core Views on AI Safety — Anthropic.
🆓 Preparedness Framework — OpenAI.
🆓 Scalable oversight via debate & recursive reward modelling — DeepMind Safety Research.

T17 · Fine-tuning, Post-training & RLHF

⭐ 🆓 Ahead of AI — Sebastian Raschka. The canonical fine-tuning / post-training deep-dives.
🆓 DPO: Your language model is secretly a reward model — Rafailov et al.
🆓 The alignment handbook — Hugging Face.
🆓 The Novice's LLM Training Guide — Community reference.

T18 · Inference, Serving, Cost & Latency

🆓 Everything I've learned about efficient LLM inference — Baseten engineering blog.
🆓 GPU performance for LLM inference — vLLM team blog.
🆓 LLM Inference Speed of Light — Arseny Kapoulkine.
⭐ 🆓 Transformer Inference Arithmetic — Kipply.

T19 · Voice, Multi-modal & Embodied Agents

🆓 Building a voice agent with LiveKit — LiveKit Agents docs.
⭐ 🆓 Hello GPT-4o — OpenAI.
🆓 Moshi: a speech-text foundation model — Kyutai.
🆓 Voice-first LLM products — Latent Space.

T20 · Product, UX & Human-AI Interaction

🆓 Building products with AI: UX lessons / thesephist.com essays — Linus Lee.
🆓 Generative AI: Design Patterns (NNGroup) — Nielsen Norman Group.
⭐ 🆓 Maggie Appleton essays — Canonical AI-UX thinking.
🆓 Microsoft HAX guidelines for human-AI interaction — Microsoft Research.

T21 · Economics, Teams, Hiring & Org Design

🆓 16 Changes to the Way Enterprises Build Software with AI — a16z.
🆓 a16z AI canon — a16z.
⭐ 🆓 AI engineering org design — Gergely Orosz, Pragmatic Engineer.
🆓 Building an AI team — Eugene Yan.

🛠️ Tutorials & Cookbooks

Hands-on, code-first guides and official cookbooks from model providers and framework authors.

T1 · Coding Agents & AI-Assisted Development

🛠️ 🆓 Aider tutorials — Aider docs.
⭐ 🛠️ 🆓 Claude Code cookbook — Anthropic.
🛠️ 🆓 Continue.dev recipes — Continue.

T2 · Spec-Driven Development

🛠️ 🆓 AGENTS.md examples — Example AGENTS.md files for common stacks.
🛠️ 🆓 GitHub spec-kit — The official spec-driven-development toolkit.

T3 · Agent IDE Rules & Workflows

🛠️ 🆓 awesome-cursorrules — Curated .cursorrules examples.
🛠️ 🆓 Claude Code slash-commands cookbook — Anthropic.

T5 · Autonomous Software Agents

🛠️ 🆓 OpenHands (formerly OpenDevin) — All Hands AI.
🛠️ 🆓 SWE-agent quickstart — Princeton NLP.

T6 · LLM Application Architecture

🛠️ 🆓 Anthropic Cookbook — Claude recipes.
🛠️ 🆓 Gemini API Cookbook — Google.
🛠️ 🆓 Hugging Face Open-Source AI Cookbook — Hugging Face.
⭐ 🛠️ 🆓 OpenAI Cookbook — The reference recipe library for OpenAI APIs.

T7 · Prompt Engineering

🛠️ 🆓 Anthropic prompt-engineering interactive tutorial — Notebook-based.
🛠️ 🆓 Prompt Engineering Guide notebooks — DAIR.AI.

T8 · Retrieval-Augmented Generation (RAG)

🛠️ 🆓 Advanced RAG notebooks — Nir Diamant. 30+ advanced RAG recipes.
🛠️ 🆓 LangChain RAG from scratch — LangChain.
⭐ 🛠️ 🆓 LlamaIndex tutorials — LlamaIndex.
🛠️ 🆓 Pinecone RAG handbook — Pinecone.

T9 · Memory Systems

🛠️ 🆓 LangGraph memory — LangChain.
🛠️ 🆓 Letta (MemGPT) cookbook — Letta.
🛠️ 🆓 Mem0 quickstart — Mem0.

T10 · Tool Use & MCP

🛠️ 🆓 awesome-mcp-servers — Community reference-servers catalogue.
⭐ 🛠️ 🆓 MCP quickstart — Anthropic.
🛠️ 🆓 OpenAI function calling cookbook — OpenAI.

T11 · Orchestration & Patterns

🛠️ 🆓 Anthropic building-effective-agents examples — Anthropic.
🛠️ 🆓 LangGraph tutorials — LangChain.
🛠️ 🆓 LlamaIndex agent tutorials — LlamaIndex.

T12 · Multi-Agent Systems

🛠️ 🆓 AutoGen notebook gallery — Microsoft.
🛠️ 🆓 CrewAI examples — CrewAI.
🛠️ 🆓 LangGraph multi-agent examples — LangChain.

T13 · Evaluation & Testing

⭐ 🛠️ 🆓 Hamel Husain's evals repo — Companion code to the evals course.
🛠️ 🆓 LangSmith evals tutorials — LangChain.
🛠️ 🆓 RAGAS tutorials — RAG-specific eval cookbook.

T14 · Observability

🛠️ 🆓 Arize Phoenix tutorials — Arize.
🛠️ 🆓 Langfuse cookbook — Langfuse.
🛠️ 🆓 Logfire LLM tracing tutorials — Pydantic.

T15 · Guardrails & Security

🛠️ 🆓 Guardrails AI cookbook — Guardrails AI.
🛠️ 🆓 NVIDIA NeMo Guardrails — NVIDIA.
🛠️ 🆓 Prompt injection CTFs (Gandalf) — Lakera. Hands-on red-team practice.

T17 · Fine-tuning & Post-training

🛠️ 🆓 Axolotl examples — Axolotl.
🛠️ 🆓 Hugging Face TRL tutorials — TRL.
⭐ 🛠️ 🆓 Unsloth notebooks — Fast fine-tuning recipes.

T18 · Inference & Serving

🛠️ 🆓 llama.cpp server — ggerganov.
🛠️ 🆓 TensorRT-LLM tutorials — NVIDIA.
🛠️ 🆓 vLLM examples — vLLM.

T19 · Voice & Multimodal

🛠️ 🆓 LiveKit Agents examples — LiveKit.
🛠️ 🆓 OpenAI Realtime API cookbook — OpenAI.
🛠️ 🆓 Pipecat — Daily. Voice-agent framework with extensive cookbook.

📋 Playbooks & Design-Pattern Catalogs

Opinionated, prescriptive guides distilling design patterns and operational practices.

📋 🆓 12-Factor Agents — HumanLayer. Opinionated operational principles for agent apps (T6/T11).
📋 🆓 A practical guide to building agents — OpenAI PDF (T11).
📋 🆓 a16z AI canon — a16z (T20/T21).
📋 🆓 Agentic UX — 11 runtime lifecycle patterns for supervised delegation, organized before/while/after an agent acts, with interactive mockups, production screenshots, and an MCP server for coding agents (Daniel Albinsson, 2025).
📋 🆓 Anthropic's prompt engineering overview — Anthropic (T7).
⭐ 📋 🆓 Building effective agents — Anthropic. The canonical pattern taxonomy (T11).
📋 🆓 Claude Code: best practices for agentic coding — Anthropic (T1/T3).
📋 🆓 Instructor's RAG patterns — Jason Liu (T8).
📋 🆓 LangGraph design patterns — LangChain (T11/T12).
📋 🆓 LLM observability playbook — Hamel Husain (T13/T14).
📋 🆓 MITRE ATLAS — Adversarial Threat Landscape for AI Systems (T15).
📋 🆓 NIST AI Risk Management Framework — NIST (T16).
📋 🆓 OpenAI's prompt-engineering playbook — OpenAI (T7).
📋 🆓 OWASP Top 10 for LLM Applications — OWASP. The security-pattern catalogue (T15).
⭐ 📋 🆓 Patterns for Building LLM-based Systems & Products — Eugene Yan (T6).
📋 🆓 Prompt-injection defence patterns — Simon Willison (T15).
📋 🆓 RAG-Fusion, HyDE, and other advanced retrieval patterns — Nir Diamant (T8).
📋 🆓 The LLM inference playbook — Anyscale (T18).
📋 🆓 UX design patterns for AI products — Nielsen Norman Group (T20).
⭐ 📋 🆓 What We Learned from a Year of Building with LLMs — Yan/Bischof/Frye/Husain/Liu/Shankar (T6/T13).

📄 Papers & Research

Foundational papers, surveys, and benchmark papers. Includes a dated milestone-papers table.

Milestone Papers

Date	Keywords	Institution	Paper
2017-06	Transformer	Google	Attention Is All You Need
2018-10	BERT	Google	BERT: Pre-training of Deep Bidirectional Transformers
2020-05	GPT-3, ICL	OpenAI	Language Models are Few-Shot Learners
2020-05	RAG	Meta	RAG for Knowledge-Intensive NLP Tasks
2021-06	LoRA	Microsoft	LoRA: Low-Rank Adaptation of LLMs
2022-01	CoT	Google	Chain-of-Thought Prompting
2022-03	InstructGPT / RLHF	OpenAI	Training LMs to follow instructions with human feedback
2022-10	ReAct	Princeton / Google	ReAct: Synergizing Reasoning and Acting
2022-12	Constitutional AI	Anthropic	Constitutional AI
2023-02	Toolformer	Meta	Toolformer: LMs Can Teach Themselves to Use Tools
2023-03	Reflexion	Northeastern	Reflexion
2023-03	Self-Refine	CMU	Self-Refine: Iterative Refinement
2023-05	Tree of Thoughts	Princeton	Tree of Thoughts
2023-05	QLoRA	UW	QLoRA: Efficient Finetuning of Quantized LLMs
2023-05	Voyager	NVIDIA / Caltech	Voyager: Open-Ended Embodied Agent
2023-05	DPO	Stanford	DPO: Your LM Is Secretly a Reward Model
2023-06	LLM-as-Judge	UC Berkeley	Judging LLM-as-a-Judge
2023-07	Generative Agents	Stanford / Google	Generative Agents: Interactive Simulacra
2023-07	Lost in the Middle	Stanford	Lost in the Middle
2023-07	GCG	CMU	Universal and Transferable Adversarial Attacks
2023-08	AutoGen	Microsoft	AutoGen: Enabling Multi-Agent Conversations
2023-09	Agent survey	Fudan	The Rise and Potential of LLM-based Agents
2023-10	SWE-bench	Princeton	SWE-bench: Can LMs Resolve Real-World Issues?
2023-11	GAIA	Meta / HF	GAIA: Benchmark for General AI Assistants
2023-12	RAG Survey	Tongji	RAG for LLMs: A Survey
2024-04	Many-shot jailbreaking	Anthropic	Many-shot Jailbreaking
2024-05	SWE-agent	Princeton	SWE-agent: Agent-Computer Interfaces
2024-06	Prompt Report	Maryland	The Prompt Report
2024-06	τ-bench	Sierra	τ-bench: Tool-Agent-User benchmark
2024-09	o1 / reasoning	OpenAI	Learning to Reason with LLMs

T1 / T4 · Coding Agents & SWE Benchmarks

📄 🆓 AutoCodeRover: Autonomous Program Improvement — Zhang et al.
📄 🆓 BigCodeBench — Zhuo et al.
📄 🆓 LiveCodeBench — Jain et al.
📄 🆓 SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering — Yang et al.
📄 🆓 SWE-bench: Can LMs Resolve Real-World GitHub Issues? — Jimenez et al.

T5 · Autonomous SWE Agents

📄 🆓 Agentless: Demystifying LLM-based Software Engineering Agents — Xia et al.
📄 🆓 OpenHands / OpenDevin — All Hands AI.
📄 🆓 Voyager: An Open-Ended Embodied Agent with LLMs — Wang et al.

T6 · App Architecture

📄 🆓 Emerging Architectures for LLM Applications — a16z.
📄 🆓 The Prompt Report — Schulhoff et al.

T7 · Prompt Engineering

📄 🆓 Chain-of-Thought Prompting Elicits Reasoning — Wei et al.
📄 🆓 Large Language Models are Zero-Shot Reasoners — Kojima et al. ("Let's think step by step").
📄 🆓 Self-Consistency Improves CoT — Wang et al.
📄 🆓 Tree of Thoughts — Yao et al.

T8 · RAG

📄 🆓 Dense Passage Retrieval — Karpukhin et al.
📄 🆓 Precise Zero-Shot Dense Retrieval without Relevance Labels (HyDE) — Gao et al.
📄 🆓 RAG for LLMs: A Survey — Gao et al.
📄 🆓 Retrieval-Augmented Generation for Knowledge-Intensive NLP — Lewis et al.
📄 🆓 Self-RAG: Learning to Retrieve, Generate, and Critique — Asai et al.

T9 · Memory

📄 🆓 Generative Agents: Interactive Simulacra of Human Behavior — Park et al.
📄 🆓 Lost in the Middle — Liu et al.
📄 🆓 MemGPT: Towards LLMs as Operating Systems — Packer et al.

T10 · Tool Use & MCP

📄 🆓 Berkeley Function-Calling Leaderboard — UC Berkeley.
📄 🆓 Gorilla: LLM Connected with Massive APIs — Patil et al.
📄 🆓 MRKL Systems — Karpas et al.
📄 🆓 Toolformer — Schick et al.

T11 · Orchestration & Patterns

📄 🆓 ReAct: Synergizing Reasoning and Acting — Yao et al.
📄 🆓 Reflexion: Language Agents with Verbal Reinforcement Learning — Shinn et al.
📄 🆓 Self-Refine: Iterative Refinement with Self-Feedback — Madaan et al.
📄 🆓 The Rise and Potential of LLM-based Agents: A Survey — Xi et al.

T12 · Multi-Agent

📄 🆓 A Survey on LLM-based Autonomous Agents — Wang et al.
📄 🆓 AutoGen — Wu et al.
📄 🆓 CAMEL: Communicative Agents for Mind Exploration — Li et al.
📄 🆓 MetaGPT — Hong et al.

T13 · Evaluation

📄 🆓 HELM: Holistic Evaluation of Language Models — Liang et al.
📄 🆓 Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena — Zheng et al.
📄 🆓 Who Validates the Validators? — Shankar et al.

T14 · Observability

📄 🆓 OpenTelemetry Semantic Conventions for Generative AI — OTel.

T15 · Guardrails & Security

📄 🆓 Many-shot Jailbreaking — Anthropic.
📄 🆓 Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection — Greshake et al.
📄 🆓 Universal and Transferable Adversarial Attacks on Aligned LLMs (GCG) — Zou et al.

T16 · Safety & Alignment

📄 🆓 Concrete Problems in AI Safety — Amodei et al.
📄 🆓 Constitutional AI — Bai et al.
📄 🆓 Scalable Agent Alignment via Reward Modeling — Leike et al.

T17 · Fine-tuning & Post-training

📄 🆓 Constitutional AI / RLAIF — Bai et al.
📄 🆓 Direct Preference Optimization (DPO) — Rafailov et al.
📄 🆓 LoRA: Low-Rank Adaptation — Hu et al.
📄 🆓 QLoRA — Dettmers et al.
📄 🆓 Training LMs to follow instructions with human feedback (InstructGPT) — Ouyang et al.

T18 · Inference & Serving

📄 🆓 Efficient Memory Management for LLM Serving with PagedAttention (vLLM) — Kwon et al.
📄 🆓 Fast Inference from Transformers via Speculative Decoding — Leviathan et al.
📄 🆓 FlashAttention — Dao et al.
📄 🆓 SGLang: Efficient Execution of Structured Language Model Programs — Zheng et al.

T19 · Voice & Multimodal

📄 🆓 Moshi — Kyutai.
📄 🆓 Robust Speech Recognition via Large-Scale Weak Supervision (Whisper) — Radford et al.
📄 🆓 Seamless: Multilingual Expressive and Streaming Speech Translation — Meta.

T20 · Product & UX

📄 🆓 Guidelines for Human-AI Interaction — Amershi et al. (Microsoft Research, CHI 2019).

🧪 Benchmarks & Leaderboards

Public benchmarks and leaderboards for coding agents, tool use, RAG, evaluation, and more.

T1 / T4 · Coding Agents & SWE Benchmarks

🧪 🆓 BigCodeBench — Practical programming with diverse function calls.
🧪 🆓 HumanEval+ / EvalPlus — Strengthened HumanEval.
🧪 🆓 LiveCodeBench — Rolling contamination-free coding benchmark.
🧪 🆓 MLE-bench — OpenAI. Kaggle-style ML engineering benchmark.
⭐ 🧪 🆓 SWE-bench — Real-world GitHub-issue resolution benchmark; Verified subset is the de-facto industry standard.
🧪 🆓 Terminal-Bench — Stanford / Laude. Long-horizon terminal task benchmark.

T5 · Autonomous Agents

🧪 🆓 AgentBench — Tsinghua. Broad agent capability benchmark.
🧪 🆓 GAIA — General AI Assistants benchmark.
🧪 🆓 MLE-bench — ML-engineering agents.
🧪 🆓 OSWorld — Desktop OS-controlling agents.
🧪 🆓 WebArena / VisualWebArena — Web-navigation agents.

T8 · RAG

🧪 🆓 ARES — Automated RAG evaluation.
🧪 🆓 BEIR — Zero-shot IR benchmark.
🧪 🆓 MTEB — Massive Text Embedding Benchmark.
🧪 🆓 RAGAS — Framework and leaderboard for RAG eval.

T10 · Tool Use & Function Calling

🧪 🆓 API-Bank — Alibaba. Tool-augmented assistants.
🧪 🆓 τ-bench — Sierra. Tool-agent-user interaction benchmark.
🧪 🆓 Berkeley Function-Calling Leaderboard (BFCL) — UC Berkeley.

T11 / T12 · Orchestration & Multi-Agent

🧪 🆓 AgentBench — General agent-capability.
🧪 🆓 AgentBoard — HKUST. Analytic, fine-grained agent eval.

T13 · Evaluation

🧪 🆓 Chatbot Arena / LMSYS Arena — Human-preference leaderboard.
🧪 🆓 HELM — Stanford CRFM. Holistic evaluation.
🧪 🆓 MMLU-Pro — Harder MMLU.
🧪 🆓 MT-Bench — LLM-as-judge multi-turn.

T15 · Guardrails & Security

🧪 🆓 AdvBench / HarmBench — CAIS. Adversarial / red-team benchmarks.
🧪 🆓 JailbreakBench — Chao et al.
🧪 🆓 PurpleLlama CyberSecEval — Meta.

T16 · Safety & Alignment

🧪 🆓 BBQ — Bias benchmark.
🧪 🆓 ToxiGen — Toxicity.
🧪 🆓 TruthfulQA — Truthfulness benchmark.

T18 · Inference

🧪 🆓 LLMPerf — Anyscale. Throughput/latency tool.
🧪 🆓 MLPerf Inference — MLCommons. Industry-standard serving benchmark.

T19 · Voice & Multimodal

🧪 🆓 Dynabench speech — Live speech-model benchmarks.
🧪 🆓 MMMU — Multimodal multidiscipline benchmark.
🧪 🆓 VideoMME — Video understanding.

🏗️ Reference Implementations & Case Studies

Public production write-ups and canonical reference repositories that teach by example.

T1 / T3 · Coding Agents & IDE Rules

🏗️ 🆓 Aider — Reference terminal coding agent with detailed engineering blog.
⭐ 🏗️ 🆓 Claude Code — Anthropic's reference agentic CLI.
🏗️ 🆓 Cline — Open-source autonomous coding agent.
🏗️ 🆓 OpenHands — All Hands AI. Open-source autonomous SWE agent.

T2 · Spec-Driven Dev

🏗️ 🆓 GitHub spec-kit — Reference spec-driven toolkit.

T5 · Autonomous SWE Agents

🏗️ 🆓 Agentless — Simple non-agentic pipeline that outperformed prior open-source agents on SWE-bench Lite.
🏗️ 🆓 AutoCodeRover — NUS.
🏗️ 🆓 SWE-agent — Princeton NLP. Reference agent for SWE-bench.

T6 · App Architecture

🏗️ 🆓 LangChain templates — Reference app scaffolds.
🏗️ 🆓 Open Interpreter — Reference local code-execution agent.
🏗️ 🆓 Quivr — Reference full-stack RAG assistant.

T8 · RAG

🏗️ 🆓 GraphRAG — Microsoft Research.
⭐ 🏗️ 🆓 LlamaIndex — Reference RAG framework; docs double as case studies.
🏗️ 🆓 RAGFlow — Production-grade RAG reference.
🏗️ 🆓 Verba — Weaviate reference RAG app.

T9 · Memory

🏗️ 🆓 Letta (MemGPT) — Reference agentic-memory implementation.
🏗️ 🆓 Mem0 — Reference memory layer.
🏗️ 🆓 Zep — Long-term memory store.

T10 · Tool Use & MCP

🏗️ 🆓 Anthropic MCP reference servers — The canonical reference MCP servers.
🏗️ 🆓 awesome-mcp-servers — Community catalogue of MCP server implementations.

T11 / T12 · Orchestration & Multi-Agent

🏗️ 🆓 AutoGen — Microsoft.
🏗️ 🆓 CrewAI — Reference role-based multi-agent.
🏗️ 🆓 LangGraph — Reference graph-based orchestration.
🏗️ 🆓 Pydantic AI — Type-safe agent framework.

T13 · Evaluation

🏗️ 🆓 DeepEval — Reference eval framework.
🏗️ 🆓 EleutherAI lm-evaluation-harness — Standard offline-eval harness.
🏗️ 🆓 RAGAS — RAG-specific evaluation.

T14 · Observability

🏗️ 🆓 Arize Phoenix — Open-source tracing + evals.
🏗️ 🆓 Future AGI — Open-source platform to trace, evaluate, simulate, guard, and auto-improve AI agents.
🏗️ 🆓 Langfuse — Open-source LLM observability.
🏗️ 🆓 OpenLLMetry — OTel-based LLM instrumentation.

T15 · Guardrails & Security

🏗️ 🆓 Guardrails AI — Reference guardrails framework.
🏗️ 🆓 NVIDIA NeMo Guardrails — Programmable guardrails.
🏗️ 🆓 Rebuff — Prompt-injection defence reference.

T17 · Fine-tuning

🏗️ 🆓 Axolotl — Reference fine-tuning framework.
🏗️ 🆓 Hugging Face alignment-handbook — Reference RLHF/DPO recipes.
🏗️ 🆓 LLaMA-Factory — Unified fine-tuning toolkit.
🏗️ 🆓 Unsloth — Fast LoRA/QLoRA reference.

T18 · Inference & Serving

🏗️ 🆓 llama.cpp — Reference CPU/GPU local inference.
🏗️ 🆓 SGLang — Structured generation serving.
🏗️ 🆓 TensorRT-LLM — NVIDIA reference optimised serving.
⭐ 🏗️ 🆓 vLLM — Reference high-throughput LLM serving.

T19 · Voice & Multimodal

🏗️ 🆓 LiveKit Agents — Voice-agent reference.
🏗️ 🆓 Pipecat — Daily's voice-agent framework.
🏗️ 🆓 Ultravox — Real-time speech LM.

T20 · Product & UX

🏗️ 🆓 assistant-ui — Reference React components for AI chat.
🏗️ 🆓 Open WebUI — Reference local chat UI.
🏗️ 🆓 Vercel AI SDK — Reference AI-UI patterns and streaming.

🎥 Talks, Workshops & Conferences

Recorded talks, workshops, and conference series worth watching.

Conference series

⭐ 🎥 🆓 AI Engineer Summit / World's Fair — The definitive practitioner conference; full talks on YouTube.
🎥 🆓 COLM — Conference on Language Modeling. New dedicated LM venue.
🎥 🆓 LlamaCon — Meta's open-source LLM conference.
🎥 🆓 MLSys — Core ML-systems conference (inference, serving).
🎥 🆓 NeurIPS / ICML / ICLR — Core ML research venues; most papers include recorded talks.

Canonical talks

⭐ 🎥 🆓 1hr Talk: Intro to LLMs (Nov 2023) — Andrej Karpathy. Since updated by his "Deep Dive into LLMs like ChatGPT".
⭐ 🎥 🆓 Intro to LLMs — Andrej Karpathy. The reference "how LLMs work" talk.
⭐ 🎥 🆓 Let's build GPT: from scratch, in code — Andrej Karpathy.
🎥 🆓 Stanford CS25: Transformers United — Full lecture series.
🎥 🆓 State of GPT — Andrej Karpathy (Microsoft Build 2023).

T1 · Coding Agents

🎥 🆓 Cursor: Building the AI-first IDE — Cursor team channel.
🎥 🆓 Mastering Claude Code — Anthropic (Boris Cherny).
🎥 🆓 The future of AI coding — Latent Space talk archives.

T4 · SWE Benchmarks

🎥 🆓 SWE-bench at NeurIPS — Carlos Jimenez.

T6 · App Architecture

🎥 🆓 Emerging architectures for LLM applications — a16z (video + post).
🎥 🆓 State of AI Engineering — Latent Space keynotes.

T7 · Prompt Engineering

🎥 🆓 Anthropic: Prompt Engineering for Business Performance — Anthropic.
🎥 🆓 ChatGPT Prompt Engineering for Developers — Andrew Ng + OpenAI.

T8 · RAG

🎥 🆓 RAG at scale — LangChain channel series.
🎥 🆓 Systematically improving RAG applications — Jason Liu.

T10 · MCP

🎥 🆓 MCP at AI Engineer Summit — AI Engineer.
🎥 🆓 Model Context Protocol deep dive — Anthropic.

T11 / T12 · Orchestration & Multi-Agent

🎥 🆓 Andrew Ng: What's next for AI agentic workflows — Sequoia AI Ascent 2024.
🎥 🆓 LangGraph: multi-agent workflows — LangChain.

T13 · Evaluation

🎥 🆓 Evaluating LLM-based applications — Josh Tobin (Databricks Data + AI Summit).
🎥 🆓 LLM Evals: MT-Bench and Chatbot Arena — LMSYS.

T14 · Observability

🎥 🆓 OpenTelemetry for LLMs — KubeCon / OTel community talks.

T15 / T16 · Security & Safety

🎥 🆓 Anthropic AI safety research — Anthropic channel.
🎥 🆓 Simon Willison on prompt injection — Talks + essays hub.

T17 · Fine-tuning

🎥 🆓 Fine-tuning workshop — Hamel Husain channel.
🎥 🆓 Let's reproduce GPT-2 / build the GPT tokenizer — Karpathy channel.

T18 · Inference

🎥 🆓 CUDA Mode lectures — Community GPU/kernel series.
🎥 🆓 vLLM: high-throughput LLM serving — Anyscale / UC Berkeley talks.

T19 · Voice & Multimodal

🎥 🆓 LiveKit voice-agent talks — LiveKit.
🎥 🆓 OpenAI Realtime API demos — OpenAI.

T20 · Product & UX

🎥 🆓 AI UX: the next frontier — NNGroup.
🎥 🆓 Linus Lee: tools for thought — Talks archive.

T21 · Economics & Teams

🎥 🆓 a16z AI portfolio talks — a16z.
🎥 🆓 The Pragmatic Engineer on AI teams — Gergely Orosz.

🎧 Podcasts

Recurring podcasts with strong agentic & AI-engineering coverage.

🎧 🆓 Cognitive Revolution — Nathan Labenz. Weekly AI engineering + strategy.
🎧 🆓 Dwarkesh Podcast — Dwarkesh Patel. Deep interviews with top researchers.
🎧 🆓 Gradient Dissent — Weights & Biases. Applied-ML interviews.
🎧 🆓 Interconnects — Nathan Lambert. RLHF / post-training focus.
⭐ 🎧 🆓 Latent Space — swyx & Alessio. The AI-engineering podcast of record; guests include most major AI-lab engineers.
🎧 🆓 Lex Fridman Podcast — Long-form interviews with AI-lab CEOs and researchers.
🎧 🆓 Machine Learning Street Talk — Tim Scarfe. Technical deep-dives.
🎧 🆓 MLOps Community podcast — Demetrios Brinkmann. Operations-focused production case studies.
🎧 🆓 No Priors — Sarah Guo & Elad Gil. Founders / researchers.
⭐ 🎧 🆓 Practical AI — Daniel Whitenack & Chris Benson. Long-running, practitioner-first.
🎧 🆓 Pragmatic Engineer — Gergely Orosz. AI-engineering org/hiring coverage.
🎧 🆓 The TWIML AI Podcast — Sam Charrington. One of the longest-running ML interview series.
🎧 🆓 Unsupervised Learning — Redpoint. AI-founder / operator conversations.

📰 Newsletters

Curated newsletters and blogs, from daily digests to occasional long-form posts.

📰 🆓 Ahead of AI — Sebastian Raschka. LLM research + fine-tuning deep-dives.
📰 🆓 Ben's Bites — Daily digest; founder-friendly.
📰 🆓 Chip Huyen's Blog — Occasional long-form on AI engineering.
📰 🆓 DiamantAI — Nir Diamant. Practical AI engineering and generative AI: RAG, agents, and LLM application patterns explained simply.
📰 🆓 Eugene Yan — Pattern / eval / RAG deep-dives.
📰 🆓 Hamel's Blog — Evals + applied LLMs.
⭐ 📰 🆓 Import AI — Jack Clark (Anthropic co-founder). Policy + research.
📰 🆓 Interconnects — Nathan Lambert. RLHF / post-training.
📰 🆓 Last Week in AI — Weekly recap.
⭐ 📰 🆓 Latent Space — swyx. The AI-engineering newsletter of record.
📰 🆓 Machine Learning Engineer Newsletter — Alejandro Saucedo. Weekly production-ML curation.
📰 🆓 MLOps Community newsletter — MLOps Community.
📰 🆓 Simon Willison's Weblog — RSS/email. Near-daily coverage of tools and agents.
⭐ 📰 🆓 The Batch — Andrew Ng / DeepLearning.AI. Weekly AI-engineering digest.
📰 🆓 The Data Exchange — Ben Lorica.
📰 🆓 The Pragmatic Engineer — Gergely Orosz. AI-engineering hiring/org coverage.
📰 🆓 TLDR AI — Daily headlines.

🛡️ Governance, Safety & Responsible AI

Policy frameworks, safety research, red-teaming resources, and responsible-AI guidance.

Policy & frameworks

🆓 EU AI Act — European Commission. Official text + implementation timeline.
⭐ 🆓 NIST AI Risk Management Framework (AI RMF 1.0) — NIST. The foundational US framework.
🆓 NIST Generative AI Profile (NIST AI 600-1) — NIST.
🆓 OECD AI Principles — International reference.
🆓 UK AI Safety Institute reports — UK AISI.

Lab safety & responsible scaling

🆓 Anthropic Core Views on AI Safety — Anthropic.
⭐ 🆓 Anthropic Responsible Scaling Policy — Anthropic.
🆓 Google DeepMind: Frontier Safety Framework — Google DeepMind.
🆓 OpenAI Preparedness Framework — OpenAI.

Security & red-teaming

🆓 HarmBench — CAIS.
🆓 MITRE ATLAS — Adversarial threat landscape for AI systems.
🆓 NIST Adversarial ML Taxonomy (NIST AI 100-2) — NIST.
⭐ 🆓 OWASP Top 10 for LLM Applications — OWASP.
🆓 Simon Willison's prompt-injection series — Simon Willison.

Responsible AI practice

🆓 Fairlearn — Open-source fairness toolkit.
🆓 Google Responsible AI practices — Google.
🆓 Microsoft Responsible AI Standard — Microsoft.
🆓 Partnership on AI — Multi-stakeholder org with published frameworks and incident database.

Papers & research

📄 🆓 Concrete Problems in AI Safety — Amodei et al.
📄 🆓 Constitutional AI — Bai et al.
📄 🆓 Red Teaming Language Models with Language Models — Perez et al.
📄 🆓 Sleeper Agents — Hubinger et al. (Anthropic).

🎨 Product, UX & Economics of AI

Going beyond engineering: designing for AI, human-AI interaction, and the economics of LLM applications.

Design & UX

🆓 Apple Human Interface Guidelines — Generative AI — Apple.
🆓 Google's People + AI Guidebook — Google PAIR.
⭐ 🆓 Guidelines for Human-AI Interaction — Amershi et al. (Microsoft Research). The canonical design heuristics.
🆓 Linus Lee — Essays on interfaces and tools for thought.
🆓 Maggie Appleton — Essays on the UX of agentic, malleable software.
🆓 NNGroup: Generative AI design patterns — Nielsen Norman Group.

Economics & business

⭐ 🆓 a16z: The Economic Case for Generative AI — a16z.
🆓 Artificial Analysis — Cross-provider pricing/latency/quality dashboards.
🆓 Epoch AI — Data on compute, cost, and scaling trends.
🆓 Latent Space on unit economics — Latent Space.
🆓 Stanford AI Index Report — Stanford HAI. Annual deep economic + research snapshot.

Product strategy

🆓 16 Changes to the Way Enterprises Are Building and Buying Generative AI — a16z.
🆓 AI product strategy — Lenny's Newsletter (AI tag).
🆓 Every Inc — Prose-heavy essays on AI product + consumer LLM UX.

🧑‍🤝‍🧑 Teams, Hiring & Org Design

How organisations structure AI-engineering work, hire for it, and operate sustainably.

🆓 a16z AI canon — a16z. Curated reading list for people building AI teams.
🆓 Building the AI Engineer role — swyx / Latent Space. The foundational essay defining "AI Engineer" as a discipline.
🆓 Chip Huyen: Machine learning in production — Org-design questions from production ML.
🆓 DeepLearning.AI AI Engineer Hiring Report — The Batch periodic coverage.
🆓 Emmanuel Ameisen: Building ML Powered Applications — Book + blog on AI-team building.
🆓 Eugene Yan: Team size and velocity — Eugene Yan.
🆓 GitHub: The AI-native developer — GitHub's research on workflows / productivity.
🆓 Shreya Shankar: Operationalizing ML — Shreya Shankar.
🆓 Staff Engineer — AI org posts — Will Larson & community.
⭐ 🆓 The Pragmatic Engineer — AI tag — Gergely Orosz. AI-engineering hiring + org design.
🆓 What is an AI Engineer? — Applied LLMs consortium.