README.md

May 10, 2026 Β· View on GitHub

Awesome Prompt Engineering πŸ§™β€β™‚οΈ

A hand-curated collection of resources for Prompt Engineering and Context Engineering β€” covering papers, tools, models, APIs, benchmarks, courses, and communities for working with Large Language Models.

https://promptslab.github.io

   Master Prompt Engineering. Join the Course at https://promptslab.github.io

Awesome License PRs Welcome Community Last Updated


πŸš€ Start Here

New to prompt engineering? Follow this path:

  1. Learn the basics β†’ ChatGPT Prompt Engineering for Developers (free, ~90 min)
  2. Read the guide β†’ Prompt Engineering Guide by DAIR.AI (open-source, comprehensive)
  3. Study provider docs β†’ OpenAI Prompt Engineering Guide Β· Anthropic Prompt Engineering Guide
  4. Understand where the field is heading β†’ Anthropic: Effective Context Engineering for AI Agents
  5. Read the research β†’ The Prompt Report β€” taxonomy of 58+ prompting techniques from 1,500+ papers

Table of Contents


Papers

πŸ“„

Major Surveys

Prompt Optimization and Automatic Prompting

Prompt Compression

Reasoning Advances

In-Context Learning

Agentic Prompting and Multi-Agent Systems

Multimodal Prompting

Structured Output and Format Control

Prompt Injection and Security

Applications of Prompt Engineering

Text-to-Image Generation

Text-to-Music/Audio Generation

Foundational Papers (Pre-2024)

These papers established the core concepts that modern prompt engineering builds on:


Tools and Code

πŸ”§

Prompt Management and Testing

NameDescriptionLink
PromptfooOpen-source CLI for testing, evaluating, and red-teaming LLM prompts. YAML configs, CI/CD integration, adversarial testing. ~9K+ ⭐GitHub
PromptifySolve NLP Problems with LLM's & Easily generate different NLP Task prompts for popular generative models like GPT, PaLM, and more with Promptify[Github]
AgentaOpen-source LLM developer platform for prompt management, evaluation, human feedback, and deployment.GitHub
PromptLayerVersion, test, and monitor every prompt and agent with robust evals, tracing, and regression sets.Website
HeliconeProduction prompt monitoring and optimization platform.Website
LangGPTFramework for structured and meta-prompt design. 10K+ ⭐GitHub
ChainForgeVisual toolkit for building, testing, and comparing LLM prompt responses without code.GitHub
LMQLA query language for LLMs making complex prompt logic programmable.GitHub
PromptotypePlatform for developing, testing, and managing structured LLM prompts.Website
PromptPandaAI-powered prompt management system for streamlining prompt workflows.Website
Promptimize AIBrowser extension to automatically improve user prompts for any AI model.Website
PROMPTMETHEUSWeb-based "Prompt Engineering IDE" for iteratively creating and running prompts.Website
Better PromptTest suite for LLM prompts before pushing to production.GitHub
OpenPromptOpen-source framework for prompt-learning research.GitHub
Prompt SourceToolkit for creating, sharing, and using natural language prompts.GitHub
Prompt EngineNPM utility library for creating and maintaining prompts for LLMs (Microsoft).GitHub
PromptInjectFramework for quantitative analysis of LLM robustness to adversarial prompt attacks.GitHub
LynxPromptSelf-hostable platform for managing AI IDE config files (.cursorrules, CLAUDE.md, copilot-instructions.md). Web UI, REST API, CLI, and federated blueprint marketplace for 30+ AI coding assistants.GitHub
flomptVisual AI prompt builder that decomposes prompts into 12 semantic blocks (role, context, constraints, examples, etc.) and compiles them into optimized XML. Browser extension for ChatGPT/Claude/Gemini, and MCP server for Claude Code agents. Free, open-source.Website

LLM Evaluation Tools

NameDescriptionLink
DeepEvalOpen-source evaluation framework covering RAG, agents, and conversations with CI/CD integration. ~7K+ ⭐GitHub
RagasRAG evaluation with knowledge-graph-based test set generation and 30+ metrics. ~8K+ ⭐GitHub
LangSmithLangChain's platform for debugging, testing, evaluating, and monitoring LLM applications.Website
LangfuseOpen-source LLM observability with tracing, prompt management, and human annotation. ~7K+ ⭐GitHub
BraintrustEnd-to-end AI evaluation platform, SOC2 Type II certified.Website
Arize AI / PhoenixReal-time LLM monitoring with drift detection and tracing.GitHub
TruLensEvaluating and explaining LLM apps; tracks hallucinations, relevance, groundedness.GitHub
InspectAIPurpose-built for evaluating agents against benchmarks (UK AISI).GitHub
OpikEvaluate, test, and ship LLM applications across dev and production lifecycles.GitHub
EvalViewCLI tool for testing multi-step AI agents with YAML test cases, regression detection, and production monitoring.GitHub

Agent Frameworks

NameDescriptionLink
LangChain / LangGraphMost widely adopted LLM app framework; LangGraph adds graph-based multi-step agent workflows. ~100K+ / ~10K+ ⭐GitHub · LangGraph
CrewAIRole-playing AI agent orchestration with 700+ integrations. ~44K+ ⭐GitHub
AutoGen (AG2)Microsoft's multi-agent conversational framework. ~40K+ ⭐GitHub
DSPyStanford's framework for programming LLMs with automatic prompt/weight optimization. ~22K+ ⭐GitHub
OpenAI Agents SDKOfficial agent framework with function calling, guardrails, and handoffs. ~10K+ ⭐GitHub
Semantic KernelMicrosoft's AI framework powering M365 Copilot; C#, Python, Java. ~24K+ ⭐GitHub
LlamaIndexData framework for RAG and agent capabilities. ~40K+ ⭐GitHub
HaystackOpen-source NLP framework with pipeline architecture for RAG and agents. ~20K+ ⭐GitHub
Agno (formerly Phidata)Python agent framework with microsecond instantiation. ~20K+ ⭐GitHub
SmolagentsHugging Face's minimalist code-centric agent framework (~1000 LOC). ~15K+ ⭐GitHub
Pydantic AIType-safe agent framework using Pydantic for structured validation. ~8K+ ⭐GitHub
MastraTypeScript AI agent framework with assistants, RAG, and observability. ~20K+ ⭐GitHub
Google ADKAgent Development Kit deeply integrated with Gemini and Google Cloud.GitHub
Strands Agents (AWS)Model-agnostic framework with deep AWS integrations.GitHub
LangflowNode-based visual agent builder with drag-and-drop. ~50K+ ⭐GitHub
n8nWorkflow automation with AI agent capabilities and 400+ integrations. ~60K+ ⭐GitHub
DifyAll-in-one backend for agentic workflows with tool-using agents and RAG.GitHub
PraisonAIMulti-AI Agents framework with 100+ LLM support, MCP integration, and built-in memory.GitHub
NeurolinkMulti-provider AI agent framework unifying 12+ providers with workflow orchestration.GitHub
ComposioConnect 100+ tools to AI agents with zero setup.GitHub

Prompt Optimization Tools

NameDescriptionLink
DSPyMultiple optimizers (MIPROv2, BootstrapFewShot, COPRO) for automatic prompt tuning. ~22K+ ⭐GitHub
TextGradAutomatic differentiation via text (Stanford). ~2K+ ⭐GitHub
OPROGoogle DeepMind's optimization by prompting.GitHub

Red Teaming and Prompt Security

NameDescriptionLink
Garak (NVIDIA)LLM vulnerability scanner for hallucination, injection, and jailbreaks β€” the "nmap for LLMs." ~3K+ ⭐GitHub
PyRIT (Microsoft)Python Risk Identification Tool for automated red-teaming. ~3K+ ⭐GitHub
DeepTeam40+ vulnerabilities, 10+ attack methods, OWASP Top 10 support.GitHub
LLM GuardSecurity toolkit for LLM I/O validation. ~2K+ ⭐GitHub
NeMo Guardrails (NVIDIA)Programmable guardrails for conversational systems. ~5K+ ⭐GitHub
Guardrails AIDefine strict output formats (JSON schemas) to ensure system reliability.Website
LakeraAI security platform for real-time prompt injection detection.Website
Purple Llama (Meta)Open-source LLM safety evaluation including CyberSecEval.GitHub
GPTFuzzAutomated jailbreak template generation achieving >90% success rates.GitHub
RebuffOpen-source tool for detection and prevention of prompt injection.GitHub
AgentSeal"Open-source scanner that runs 150 attack probes to test AI agents for prompt injection and extraction vulnerabilities."GitHub

MCP (Model Context Protocol)

MCP is an open standard developed by Anthropic (Nov 2024, donated to Linux Foundation Dec 2025) for connecting AI assistants to external data sources and tools through a standardized interface. It has 97M+ monthly SDK downloads and has been adopted by GitHub, Google, and most major AI providers.

NameDescriptionLink
MCP SpecificationThe core protocol specification and SDKs. ~15K+ ⭐GitHub
MCP Reference ServersOfficial implementations: fetch, filesystem, GitHub, Slack, Postgres.GitHub
FastMCP (Python)High-level Pythonic framework for building MCP servers. ~5K+ ⭐GitHub
GitHub MCP ServerGitHub's official MCP server for repo, issue, PR, and Actions interaction. ~15K+ ⭐GitHub
Awesome MCP ServersCurated list of 10,000+ community MCP servers. ~30K+ ⭐GitHub
Context7MCP server providing version-specific documentation to reduce code hallucination.GitHub
GitMCPCreates remote MCP servers for any GitHub repo by changing the domain.Website
MCP InspectorVisual testing tool for MCP server development.GitHub

Vibe Coding and AI Coding Assistants

🟒 = Open Source Β· πŸ”΅ = Commercial Β· 🟣 = Open Source + Commercial (open core with paid cloud/API)

CLI-Based Coding Agents

Terminal-native agentic tools that understand your codebase and execute multi-step tasks.

NameDescriptionTypeLink
Claude CodeAnthropic's agentic coding CLI; understands full codebases and executes complex multi-step tasks via natural language.πŸ”΅Docs
OpenAI Codex CLIOpen-source terminal coding agent from OpenAI; lightweight, local-first, with sandboxed code execution. ~68K+ ⭐🟣GitHub
Gemini CLIGoogle's open-source terminal AI agent with 1M-token context window and Google Search grounding. ~96K+ ⭐🟣GitHub
Qwen CodeOpen-source terminal AI agent optimized for Qwen3-Coder; multi-protocol support (OpenAI/Anthropic/Gemini APIs), 1,000 free requests/day. ~21K+ ⭐🟒GitHub
AiderAI pair programming in terminal with deep Git integration; maps entire codebases and auto-commits changes. ~42K+ ⭐🟒GitHub
OpenCodePowerful open-source AI coding agent with beautiful TUI; supports nearly all AI model providers. ~120K+ ⭐🟒GitHub
GooseExtensible open-source AI agent from Block (Square/Cash App); installs, executes, edits, and tests with any LLM. ~29K+ ⭐🟒GitHub
CrushGlamorous agentic coding agent from Charmbracelet with multi-model support, LSP integration, and beautiful terminal UI. ~9K+ ⭐🟒GitHub
Amazon Q Developer CLIAgentic chat experience in terminal from AWS; transitioning to Kiro CLI.🟣GitHub
AmpSourcegraph's agentic coding tool (Cody successor); works across CLI and IDE.πŸ”΅Website
Junie CLIJetBrains' LLM-agnostic coding agent CLI (beta 2026); supports all major model providers.πŸ”΅Website
Autohand Code CLISelf-evolving autonomous terminal coding agent with multi-provider LLM support, 40+ tools, and modular skills system.🟒GitHub

AI Code Editors / IDEs

Standalone editors or IDE forks with deep AI integration.

NameDescriptionTypeLink
CursorLeading AI-native code editor (VS Code fork); Composer generates entire apps from natural language, agentic multi-file edits.πŸ”΅Website
WindsurfAI-powered IDE (VS Code fork) with proprietary Cascade agent and SWE-1.5 model; acquired by Cognition AI.πŸ”΅Website
ZedHigh-performance editor in Rust with native AI features, Zeta edit prediction, and Agent Client Protocol support. ~77K+ ⭐🟒GitHub
TraeFree AI-powered IDE from ByteDance ("The Real AI Engineer") with Builder Mode; provides free access to Claude, GPT-4o, and DeepSeek.πŸ”΅Website
Google AntigravityGoogle's agent-first IDE (VS Code fork) with Manager view for orchestrating multiple agents in parallel; powered by Gemini.πŸ”΅Website
KiroAWS's spec-driven agentic AI IDE (VS Code fork); turns prompts into specs, then working code, docs, and tests.πŸ”΅Website
PearAIOpen-source AI code editor (VS Code fork) with Continue-based chat and completions. ~40K+ ⭐🟒GitHub
VoidOpen-source Cursor alternative (VS Code fork); any model or local hosting with change visualization. ~28K+ ⭐🟒GitHub
MeltyOpen-source chat-first AI code editor with multi-file editing and deep Git integration. ~7K+ ⭐🟒GitHub
EmdashOpen-source agentic dev environment (YC W26) for running multiple coding agents in parallel in isolated Git worktrees.🟒GitHub

IDE Extensions / Plugins

Plugins for VS Code, JetBrains, Neovim, and other editors.

NameDescriptionTypeLink
GitHub CopilotMost widely adopted AI coding assistant; inline completions, chat, and agentic coding agent across VS Code, JetBrains, Neovim.πŸ”΅Website
ClineAutonomous coding agent in VS Code with human-in-the-loop approvals; file editing, terminal commands, and browser use. ~59K+ ⭐🟒GitHub
ContinueOpen-source VS Code and JetBrains extension for creating custom, modular AI dev systems; any model. ~32K+ ⭐🟒GitHub
CodySourcegraph-powered AI assistant that pulls context from local and remote codebases; VS Code, JetBrains, Visual Studio.πŸ”΅Website
CodeiumFree AI coding extension for 40+ IDEs with completions, chat, and search across 70+ languages.🟣Website
Amazon Q DeveloperAWS's AI coding assistant with completions, inline chat, and agent mode; deep AWS integration.🟣Website
Gemini Code AssistGoogle's IDE extension powered by Gemini with completions, Next Edit Predictions, and inline diffs; free for individuals.🟣Website
TabninePrivacy-focused AI assistant trained on permissive-licensed OSS; supports all major IDEs with on-premises deployment.πŸ”΅Website
Augment CodeEnterprise AI coding assistant with 200K-token Context Engine for deep codebase understanding.πŸ”΅Website
QodoAI code review and quality platform with multi-agent architecture; test generation, code review, CI/CD enforcement.🟣Website
CodeGeeXOpen-source multilingual code generation model supporting 20+ languages with VS Code and JetBrains extensions. ~11K+ ⭐🟒GitHub
TabbySelf-hosted open-source AI coding assistant (Copilot alternative); runs entirely on your infrastructure. ~25K+ ⭐🟒GitHub

AI Coding Platforms / Cloud Agents

Browser-based or cloud-hosted agents that build, test, and deploy autonomously.

NameDescriptionTypeLink
DevinFirst fully autonomous cloud-based AI software engineer; plans, codes, tests, and opens PRs independently.πŸ”΅Website
Replit AgentCloud-native AI agent that autonomously builds, tests, and deploys full-stack apps in-browser; 50+ languages.πŸ”΅Website
bolt.newAI-powered web dev agent; prompt, run, edit, and deploy full-stack apps directly in the browser via WebContainers. ~15K+ ⭐🟒GitHub
bolt.diyCommunity fork of bolt.new with extended features and broader LLM flexibility. ~12K+ ⭐🟒GitHub
LovableFull-stack apps from natural language with built-in Supabase, auth, and one-click deploy; fastest European startup to $20M ARR.πŸ”΅Website
v0Vercel's AI platform for generating high-quality React/Next.js UI components from natural language.πŸ”΅Website
GitHub Copilot WorkspaceCloud-based coding environment with plan, brainstorm, and repair agents; included with paid Copilot plans.πŸ”΅Website
Firebase StudioGoogle's agentic cloud-based development environment.πŸ”΅Website

Open-Source Coding Agent Frameworks

Frameworks and research projects for building autonomous coding agents.

NameDescriptionTypeLink
OpenHandsLeading open-source platform for cloud coding agents; consistently top on SWE-bench. Formerly OpenDevin. ~69K+ ⭐🟒GitHub
SWE-agentTakes a GitHub issue and automatically fixes it using a custom agent-computer interface. [NeurIPS 2024] ~19K+ ⭐🟒GitHub
Open SWELangChain's async cloud-hosted coding agent framework built on LangGraph with Slack/Linear integration. ~8K+ ⭐🟒GitHub
DevikaOpen-source agentic software engineer; breaks down instructions, researches, and writes code. Devin alternative. ~18K+ ⭐🟒GitHub
AutoCodeRoverAutonomous program improvement combining LLMs with fault localization for GitHub issue resolution. ~2.8K+ ⭐🟒GitHub
AgentlessSimple three-phase approach (localize β†’ repair β†’ validate) to solving software development problems. ~2K+ ⭐🟒GitHub
DevonOpen-source pair programmer SWE agent with code writing, planning, and research; supports Claude, GPT-4, Llama, Ollama. ~3.5K+ ⭐🟒GitHub

Other Notable Repositories

NameDescriptionLink
Prompt Engineering Guide (DAIR.AI)The definitive open-source guide and resource hub. 3M+ learners. ~55K+ ⭐GitHub
Awesome ChatGPT Prompts / Prompts.chatWorld's largest open-source prompt library. 1000s of prompts for all major models.GitHub
12-Factor AgentsPrinciples for building production-grade LLM-powered software. ~17K+ ⭐GitHub
NirDiamant/Prompt_Engineering22 hands-on Jupyter Notebook tutorials. ~3K+ ⭐GitHub
Context Engineering RepositoryFirst-principles handbook for moving beyond prompt engineering to context design.GitHub
AI Agent System Prompts LibraryCollection of system prompts from production AI coding agents (Claude Code, Gemini CLI, Cline, Aider, Roo Code).GitHub
Awesome Vibe CodingCurated list of 245+ tools and resources for building software through natural language prompts.GitHub
OpenAI CookbookOfficial recipes for prompts, tools, RAG, and evaluations.GitHub
EmbedchainFramework to create ChatGPT-like bots over your dataset.GitHub
ThoughtSourceFramework for the science of machine thinking.GitHub
PromptextExtracts and formats code context for AI prompts with token counting.GitHub
Price Per TokenCompare LLM API pricing across 200+ models.Website
OpenPawCLI tool (npx pawmode) that turns Claude Code into a personal assistant by generating system prompts (CLAUDE.md + SOUL.md) with personality, memory, and 38 skill routers.GitHub
Think BetterOpen-source CLI that permanently injects 10 structured decision frameworks (MECE, Issue Trees, Pre-Mortems) and 12 cognitive bias detectors into AI assistant prompts. Go, MIT.GitHub

APIs

πŸ’»

OpenAI

ModelContextPrice (Input/Output per 1M tokens)Key Feature
GPT-5.2 / 5.2 Thinking400K$1.75 / $14Latest flagship, 90% cached discount, configurable reasoning
GPT-5.1400K$1.25 / $10Previous generation flagship
GPT-4.1 / 4.1 mini / nano1M$2 / $8Best non-reasoning model, 40% faster and 80% cheaper than GPT-4o
o3 / o3-pro200KVariesReasoning models with native tool use
o4-mini200KCost-efficientFast reasoning, best on AIME at its cost class
GPT-OSS-120B / 20B128K$0.03 / $0.30First open-weight models, Apache 2.0

Key features: Responses API, Agents SDK, Structured Outputs, function calling, prompt caching (90% discount), Batch API (50% discount), MCP support. Platform Docs

Anthropic (Claude)

ModelContextPrice (Input/Output per 1M tokens)Key Feature
Claude Opus 4.61M (beta)$5 / $25Most powerful, state-of-the-art coding and agentic tasks
Claude Sonnet 4.5200K$3 / $15Best coding model, 61.4% OSWorld (computer use)
Claude Haiku 4.5200KFast tierNear-frontier, fastest model class
Claude Opus 4 / Sonnet 4200K$15/$75 (Opus)Opus: 72.5% SWE-bench, Sonnet 4 powers GitHub Copilot

Key features: Extended Thinking with tool use, Computer Use, MCP (originated here), prompt caching, Claude Code CLI, available on AWS Bedrock and Google Vertex AI. API Docs

Google (Gemini)

ModelContextPrice (Input/Output per 1M tokens)Key Feature
Gemini 3 Pro Preview1M$2 / $12Most intelligent Google model, deployed to 2B+ Search users
Gemini 2.5 Pro1M$1.25 / $10Best for coding/agentic tasks, thinking model
Gemini 2.5 Flash / Flash-Lite1M$0.30/$1.50 Β· $0.10/$0.40Price-performance leaders

Key features: Thinking (all 2.5+ models), Google Search grounding, code execution, Live API (real-time audio/video), context caching. Google AI Studio

Meta (Llama)

ModelArchitectureContextKey Feature
Llama 4 Scout109B MoE / 17B active10MFits single H100, multimodal, open-weight
Llama 4 Maverick400B MoE / 17B active, 128 experts1MBeats GPT-4o, open-weight
Llama 3.3 70BDense128KMatches Llama 3.1 405B

Available on 25+ cloud partners, Hugging Face, and inference APIs. Llama

Other Notable Providers

ProviderDescriptionLink
Mistral AIMistral Large 3 (675B MoE), Devstral 2, Ministral 3. Apache 2.0.Website
DeepSeekV3.2 (671B MoE), R1 (reasoning, MIT license). $0.15/$0.75 per 1M tokens.Website
xAI (Grok)Grok 4.1 Fast: 2M context, $0.20/$0.50 per 1M tokens.Website
CohereCommand A (111B, 256K context), Embed v4, Rerank 4.0. Excels at RAG.Website
Together AI200+ open models with sub-100ms latency.Website
GroqLPU hardware with ~300+ tokens/sec inference.Website
Fireworks AIFast inference with HIPAA + SOC2 compliance.Website
OpenRouterUnified API for 300+ models from all providers.Website
CerebrasWafer-scale chips with best total response time.Website
Perplexity AISearch-augmented API with citations.Website
Amazon BedrockManaged multi-model service with Claude, Llama, Mistral, Cohere.Website
Hugging Face InferenceAccess to open models via API.Website

Datasets and Benchmarks

πŸ’Ύ

Major Benchmarks (2024–2026)

NameDescriptionLink
Chatbot Arena / LM Arena6M+ user votes for Elo-rated pairwise LLM comparisons. De facto standard for human preference.Website
MMLU-Pro12,000+ graduate-level questions across 14 domains. NeurIPS 2024 Spotlight.GitHub
GPQA448 "Google-proof" STEM questions; non-expert validators achieve only 34%.arXiv
SWE-bench VerifiedHuman-validated 500-task subset for real-world GitHub issue resolution.Website
SWE-bench Pro1,865 tasks across 41 professional repos; best models score only ~23%.Leaderboard
Humanity's Last Exam (HLE)2,500 expert-vetted questions; top AI scores only ~10–30%.Website
BigCodeBench1,140 coding tasks across 7 domains; AI achieves ~35.5% vs. 97% human success.Leaderboard
LiveBenchContamination-resistant with frequently updated questions.Paper
FrontierMathResearch-level math; AI solves only ~2% of problems.Research
ARC-AGI v2Abstract reasoning measuring fluid intelligence.Research
IFEvalInstruction-following evaluation with formatting/content constraints.arXiv
MLE-benchOpenAI's ML engineering evaluation via Kaggle-style tasks.GitHub
PaperBenchEvaluates AI's ability to replicate 20 ICML 2024 papers from scratch.GitHub

Leaderboards and Meta-Benchmarks

NameDescriptionLink
Hugging Face Open LLM Leaderboard v2Evaluates open models on MMLU-Pro, GPQA, IFEval, MATH.Leaderboard
Artificial Analysis Intelligence Index v3Aggregates 10 evaluations.Website
SEAL by Scale AIHosts SWE-bench Pro and agentic evaluations.Leaderboard

Prompt and Instruction Datasets

NameDescriptionLink
P3 (Public Pool of Prompts)Prompt templates for 270+ NLP tasks used to train T0 and similar models.HuggingFace
System Prompts Dataset944 system prompt templates for agent workflows (by Daniel Rosehill, Aug 2025).HuggingFace
OpenAssistant Conversations (OASST)161,443 messages in 35 languages with 461,292 quality ratings.HuggingFace
UltraChat / UltraFeedbackLarge-scale synthetic instruction and preference datasets for alignment training.HuggingFace
SoftAge Prompt Engineering Dataset1,000 diverse prompts across 10 categories for benchmarking prompt performance.HuggingFace
Text Transformation Prompt LibraryComprehensive collection of text transformation prompts (May 2025).HuggingFace
Writing Prompts~300K human-written stories paired with prompts from r/WritingPrompts.Kaggle
Midjourney PromptsText prompts and image URLs scraped from MidJourney's public Discord.HuggingFace
CodeAlpaca-20k20,000 programming instruction-output pairs.HuggingFace
ProPEX-RAGDataset for prompt optimization in RAG workflows.HuggingFace
NanoBanana Trending Prompts1,000+ curated AI image prompts from X/Twitter, ranked by engagement.GitHub

Red Teaming and Adversarial Datasets

NameDescriptionLink
HarmBench510 harmful behaviors across standard, contextual, copyright, and multimodal categories.Website
JailbreakBenchOpen robustness benchmark for jailbreaking with 100 prompts.Research
AgentHarm110 malicious agent tasks across 11 harm categories.arXiv
DecodingTrust243,877 prompts evaluating trustworthiness across 8 perspectives.Research
SafetyPrompts.comAggregator tracking 50+ safety/red-teaming datasets.Website

Models

🧠

Frontier Models (2025–2026)

ModelProviderContextKey Strength
GPT-5.2OpenAI400KGeneral intelligence, 100% AIME 2025
Claude Opus 4.6Anthropic1M (beta)Coding, agentic tasks, extended thinking
Gemini 3 ProGoogle1M#1 LMArena (~1500 Elo), multimodal
Grok 4.1xAI2M#2 LMArena (1483 Elo), low hallucination
Mistral Large 3Mistral AI256KBest open-weight (675B MoE/41B active), Apache 2.0
DeepSeek-V3.2DeepSeek128KBest value (671B MoE/37B active), MIT license
Llama 4 MaverickMeta1MBeats GPT-4o (400B MoE/17B active), open-weight

Reasoning Models

ModelKey Detail
OpenAI o3 / o3-pro87.7% GPQA Diamond. Native tool use.
OpenAI o4-miniBest AIME at its cost class with visual reasoning.
DeepSeek-R1 / R1-0528Open-weight, RL-trained. 87.5% on AIME 2025. MIT license.
QwQ (Qwen with Questions)32B reasoning model. Apache 2.0. Comparable to R1.
Gemini 2.5 Pro/Flash (Thinking)Built-in reasoning with configurable thinking budget.
Claude Extended ThinkingHybrid mode with visible chain-of-thought and tool use.
Phi-4 Reasoning / Plus14B reasoning models rivaling much larger models. Open-weight.
GPT-OSS-120BOpenAI's open-weight with CoT. Near-parity with o4-mini. Apache 2.0.

Notable Open-Source Models

ModelProviderKey Detail
Qwen3-235B-A22BAlibabaFlagship MoE. Strong reasoning/code/multilingual. Apache 2.0. Most downloaded family on HuggingFace.
Gemma 3Google270M to 27B. Multimodal. 128K context. 140+ languages.
OLMo 2/3Allen AIFully open (data, code, weights, logs). OLMo 2 32B surpasses GPT-3.5. Apache 2.0.
SmolLM3-3BHugging FaceOutperforms Llama-3.2-3B. Dual-mode reasoning. 128K context.
Kimi K2Moonshot AI32B active. Open-weight. Tailored for coding/agentic use.
Llama 4 ScoutMeta109B MoE/17B active. 10M token context. Fits single H100.

Code-Specialized Models

ModelKey Detail
Qwen3-Coder (480B-A35B)69.6% SWE-bench β€” milestone for open-source coding. 256K context. Apache 2.0.
Devstral 2 (123B)72.2% SWE-bench Verified. 7x more cost-efficient than Claude Sonnet.
Codestral 25.01Mistral's code model. 80+ languages. Fill-in-the-Middle support.
DeepSeek-Coder-V2236B MoE / 21B active. 338 programming languages.
Qwen 2.5-Coder7B/32B. 92 programming languages. 88.4% HumanEval. Apache 2.0.

Foundational Models (Historical Reference)

These models established key concepts but are largely superseded for practical use:

ModelProviderSignificance
GLM-130BTsinghuaOpen bilingual English/Chinese LLM (2023)
Falcon 180BTIILarge open generative model (2023)
Mixtral 8x7BMistral AIPioneered MoE architecture for open models (2023)
GPT-NeoX-20BEleutherAIEarly open autoregressive LLM
GPT-J-6BEleutherAIEarly open causal language model

AI Content Detectors

πŸ”Ž

Leading Commercial Detectors

NameAccuracyKey FeatureLink
GPTZero99% claimed10M+ users, #1 on G2 (2025). Detects GPT-4/5, Gemini, Claude, Llama. Free tier available.Website
Originality.ai98–100% (peer-reviewed)Consistently rated most accurate. Combines AI detection + plagiarism + fact checking. From $14.95/month.Website
Turnitin AI Detection98%+ on unmodified AI textDominant in academia. Launched AI bypasser/humanizer detection (Aug 2025). Institutional licensing.Website
Copyleaks99%+ claimedEnterprise tool detecting AI in 30+ languages. LMS integrations.Website
Winston AI99.98% claimedOCR for scanned documents, AI image/deepfake detection. 11 languages.Website
Pangram Labs99.3% (COLING 2025)Highest score in COLING 2025 Shared Task. 100% TPR on "humanized" text. 97.7% adversarial robustness.Website

Free and Research Detectors

NameDescriptionLink
BinocularsOpen-source research detector using cross-perplexity between two LLMs.arXiv
DetectGPT / Fast-DetectGPTStatistical method comparing log-probabilities of original text vs. perturbations.arXiv
Openai DetectorAI classifier for indicating AI-written text (OpenAI Detector Python wrapper)[GitHub]
Sapling AI DetectorFree browser-based detector (up to 2,000 chars). 97% accuracy in some studies.Website
QuillBot AI DetectorFree, no sign-up required.Website
Writer AI Content DetectorFree tool with color-coded results.Website
ZeroGPTPopular free detector evaluated in multiple academic studies.Website

Watermarking Approaches

NameDescriptionLink
SynthID (Google DeepMind)Watermarking for AI text, images, and audio via statistical token sampling. Deployed in Google products.Website
OpenAI Text WatermarkingDeveloped but still experimental as of 2025. Research shows fragility concerns.Experimental

Important caveat: No detector claims 100% accuracy. Mixed human/AI text remains hardest to detect (50–70% accuracy). Adversarial robustness varies widely. The AI detection market is projected to grow from ~$2.3B (2025) to $15B by 2035.


Books

πŸ“–

Prompt Engineering

TitleAuthor(s)PublisherYear
Prompt Engineering for LLMsJohn Berryman & Albert ZieglerO'Reilly2024
Prompt Engineering for Generative AIJames Phoenix & Mike TaylorO'Reilly2024
Prompt Engineering for LLMsThomas R. CaldwellIndependent2025

LLM Application Development

TitleAuthor(s)PublisherYear
AI Engineering: Building Applications with Foundation ModelsChip HuyenO'Reilly2025
Build a Large Language Model (From Scratch)Sebastian RaschkaManning2024
Building LLMs for ProductionLouis-FranΓ§ois Bouchard & Louie PetersO'Reilly2024
LLM Engineer's HandbookPaul Iusztin & Maxime LabonnePackt2024
The Hundred-Page Language Models BookAndriy BurkovSelf-Published2025

AI Agents

TitleAuthor(s)PublisherYear
Building Applications with AI AgentsMichael AlbadaO'Reilly2025
AI Agents and ApplicationsRoberto InfanteManning2025
AI Agents in ActionMicheal LanhamManning2025

Production, Reliability, and Security

TitleAuthor(s)PublisherYear
LLMs in ProductionChristopher Brousseau & Matthew SharpManning2025
Building Reliable AI SystemsRush ShahaniManning2025
The Developer's Playbook for LLM SecuritySteve WilsonO'Reilly2024

Courses

πŸ‘©β€πŸ«

Free Short Courses

University and Platform Courses

Free Platform Courses

Learn Prompting Courses


Tutorials and Guides

πŸ“š

Official Provider Guides

Community and Independent Guides


Videos

πŸŽ₯


Communities

🀝

Discord Servers

  • Learn Prompting β€” 40,000+ members. Largest PE Discord with courses, hackathons, HackAPrompt competitions.
  • PromptsLab Discord - Community
  • Midjourney β€” 1M+ members. Primary hub for text-to-image prompt sharing.
  • OpenAI Discord β€” Official community with channels for GPTs, Sora, DALL-E, and API help.
  • Anthropic Discord β€” Official Claude community for AI development collaboration.
  • Hugging Face Discord β€” Model discussions, library support, community events.
  • FlowGPT β€” 33K+ members. 100K+ prompts across ChatGPT, DALL-E, Stable Diffusion, Claude.

Reddit

  • r/PromptEngineering β€” Dedicated subreddit for prompt crafting techniques and discussions.
  • r/ChatGPT β€” 10M+ members. Primary hub for ChatGPT users and prompt sharing.
  • r/LocalLLaMA β€” Highly technical community for running open-source LLMs locally.
  • r/ClaudeAI β€” Anthropic's Claude community: prompt sharing, API tips, model comparisons.
  • r/MachineLearning β€” Academic-oriented ML research discussions.
  • r/OpenAI β€” OpenAI product and API discussions.
  • r/StableDiffusion β€” 450K+ members for AI art prompting and workflows.
  • r/ChatGPTPromptGenius β€” 35K+ members sharing and refining prompts.

Forums and Platforms

GitHub Organizations

  • LangChain β€” Open-source LLM app framework. 100K+ stars.
  • Promptslab β€” Generative Models | Prompt-Engineering | LLMs
  • Hugging Face β€” Central hub: Transformers, Diffusers, Datasets, TRL.
  • DSPy (Stanford NLP) β€” Growing community for systematic prompt optimization.
  • OpenAI β€” Open-source models, benchmarks, and tools.

πŸ”¬ Autonomous Research & Self-Improving Agents

Auto-synced from awesome-autoresearch Β· Last synced: 2026-05-10

General-Purpose Descendants

  • kayba-ai/recursive-improve β€” Recursive self-improvement framework where agents capture execution traces, analyze failure patterns, and apply targeted fixes with keep-or-revert evaluation.
  • vukrosic/auto-research β€” Docs-only control plane for an open autonomous AI research lab β€” file-based operating model for human direction and agent execution.
  • uditgoenka/autoresearch β€” Claude Code skill that generalizes autoresearch into a reusable loop for software, docs, security, shipping, debugging, and other measurable goals.
  • leo-lilinxiao/codex-autoresearch β€” Codex-native autoresearch skill with resume support, lessons across runs, optional parallel experiments, and mode-specific workflows.
  • supratikpm/gemini-autoresearch β€” Gemini CLI skill that generalises autoresearch to any measurable goal. Gemini-native: uses Google Search grounding as a live verification source inside the loop, true headless overnight mode via --yolo --prompt, and 1M token context. Also works in Antigravity IDE via .agents/skills/.
  • davebcn87/pi-autoresearch β€” pi extension plus dashboard for persistent experiment loops, live metrics, confidence tracking, and resumable autoresearch sessions.
  • drivelineresearch/autoresearch-claude-code β€” Claude Code plugin/skill port of pi-autoresearch, with a clean experiment-loop workflow and a concrete biomechanics case study.
  • greyhaven-ai/autocontext β€” Closed-loop control plane for repeated agent improvement, with evaluation, persistent knowledge, staged validation, and optional distillation into cheaper local runtimes.
  • jmilinovich/goal-md β€” Generalizes autoresearch into a GOAL.md pattern for repos where the agent must first construct a measurable fitness function before it can optimize.
  • james-s-tayler/lazy-developer β€” Claude Code skill that orchestrates autoresearch across a prioritized sequence of optimization goals (coverage, test speed, build speed, complexity, LOC, performance) using GOAL.md as the engine. Supports standalone and Ralph Mode multi-instance execution.
  • mutable-state-inc/autoresearch-at-home β€” Collaborative fork of upstream autoresearch that adds experiment claiming, shared best-config syncing, hypothesis exchange, and swarm-style coordination across many single-GPU agents.
  • zkarimi22/autoresearch-anything β€” Generalizes autoresearch to any measurable metric β€” system prompts, API performance, landing pages, test suites, config tuning, SQL queries. "If you can measure it, you can optimize it."
  • Entrpi/autoresearch-everywhere β€” Cross-platform expansion that auto-detects hardware config and starts the loop. The "glue and generalization" half of autoresearch.
  • ShengranHu/ADAS β€” Automated Design of Agentic Systems β€” ICLR 2025. Meta-agents that invent novel agent architectures by programming them in code.
  • MaximeRobeyns/self_improving_coding_agent β€” SICA: Self-Improving Coding Agent that edits its own codebase. ICLR 2025 Workshop paper demonstrating scaffold-level self-improvement on coding benchmarks.
  • peterskoett/self-improving-agent β€” Alternative self-improving agent architecture with reflection and meta-learning cycles.
  • metauto-ai/HGM β€” Huxley-GΓΆdel Machine for coding agents β€” applies self-improvement to SWE-bench performance via meta-level optimization.
  • gepa-ai/gepa β€” GEPA (Genetic-Pareto) β€” ICLR 2026 Oral. Reflective prompt evolution that outperforms RL (GRPO) on benchmarks. Optimizes any textual parameters against any metric using natural language reflection.
  • MrTsepa/autoevolve β€” GEPA-inspired autoresearch for self-play: mutate code strategies, evaluate head-to-head, rate with Elo/Bradley-Terry, branch from the Pareto front. Agent reads match traces to target mutations. Works as a Claude Code skill.
  • HKUDS/ClawTeam β€” Agent swarm intelligence for autoresearch β€” spawns parallel GPU research directions, distributes work across agents, aggregates results.
  • Orchestra-Research/AI-Research-SKILLs β€” Comprehensive skill library including autoresearch orchestration with two-loop architecture (inner optimization + outer synthesis).
  • WecoAI/aideml β€” AIDE: Tree-search ML engineering agent that autonomously improves model performance via iterative code generation and evaluation.
  • weco.ai β€” Weco: Cloud platform for AIDE with observability, experiment tracking, and managed runs β€” brings the autoresearch loop into production.

Research-Agent Systems

  • aiming-lab/AutoResearchClaw β€” End-to-end research pipeline that turns a topic into literature review, experiments, analysis, peer review, and paper drafts; broader than autoresearch, but clearly in the same lineage.
  • OpenRaiser/NanoResearch β€” End-to-end autonomous research engine that plans experiments, generates code, runs jobs locally or on SLURM, analyzes real results, and writes papers grounded in those outputs.
  • wanshuiyin/Auto-claude-code-research-in-sleep β€” Markdown-first research workflows for Claude Code and other agents, centered on autonomous literature review, experiments, paper iteration, and cross-model critique.
  • skyllwt/OmegaWiki β€” Wiki-centric full-lifecycle research platform built on Claude Code, realizing Karpathy's LLM-Wiki vision. 20+ skills cover the full loop: ingest β†’ ideate β†’ novelty check β†’ experiment design / run / eval β†’ paper writing. Research state lives in a structured knowledge wiki with an interactive graph.
  • Sibyl-Research-Team/AutoResearch-SibylSystem β€” Fully autonomous AI scientist built on Claude Code, with explicit AutoResearch lineage, multi-agent research iteration, GPU experiment execution, and a self-evolving outer loop.
  • eimenhmdt/autoresearcher β€” Early open-source package for automating scientific workflows, currently centered on literature-review generation with an ambition toward broader autonomous research.
  • hyperspaceai/agi β€” Distributed, peer-to-peer research network where autonomous agents run experiments, gossip findings, maintain CRDT leaderboards, and archive results to GitHub across multiple research domains.
  • Human-Agent-Society/CORAL β€” CORAL: Autonomous multi-agent evolution for open-ended discovery (arXiv:2604.01658). Long-running agents with shared persistent memory, asynchronous execution, and heartbeat-based interventions; SOTA on 10 math/algorithmic/systems tasks.
  • SakanaAI/AI-Scientist β€” The AI Scientist: First comprehensive system for fully automatic scientific discovery. From idea generation to paper writing with minimal human supervision.
  • SakanaAI/AI-Scientist-v2 β€” Workshop-level automated scientific discovery via agentic tree search. Removes template dependency from v1, generalizes across research domains.
  • AweAI-Team/AiScientist β€” AiScientist: long-horizon ML research lab with hierarchical orchestration and File-as-Bus coordination β€” workspace files act as the durable system of record. Drives autonomous paper-reproduction (PaperBench) and competition-style MLE-Bench iteration loops under fixed compute/time budgets. (arXiv 2604.13018)
  • HKUDS/AI-Researcher β€” NeurIPS 2025 paper. Full end-to-end research automation: hypothesis β†’ experiments β†’ manuscript β†’ peer review. Production version at novix.science.
  • openags/Auto-Research β€” OpenAGS: Orchestrates a team of AI agents across the full research lifecycle β€” lit review, hypothesis generation, experiments, manuscript writing, and peer review.
  • SamuelSchmidgall/AgentLaboratory β€” End-to-end autonomous research workflow: idea β†’ literature review β†’ experiments β†’ report. Supports both autonomous and co-pilot modes.
  • AgentRxiv β€” Collaborative autonomous research framework where agent laboratories share a preprint server to build on each other's work iteratively.
  • JinheonBaek/ResearchAgent β€” Iterative research idea generation over scientific literature with LLMs. Multi-agent review and feedback loops.
  • du-nlp-lab/MLR-Copilot β€” Autonomous ML research framework β€” generates ideas, implements experiments, analyzes results.
  • MASWorks/ML-Agent β€” Reinforcing LLM agents for autonomous ML engineering. Learns from trial and error to improve model performance.
  • PouriaRouzrokh/LatteReview β€” Low-code Python package for automated systematic literature reviews via AI-powered agents.
  • LitLLM/LitLLM β€” AI-powered literature review assistant using RAG for accurate, well-structured related-work sections in academic writing.
  • Agent Laboratory β€” Three-phase research pipeline: Literature Review β†’ Experimentation β†’ Report Writing, with specialized agents for each phase.
  • WecoAI/aideml β€” AIDE: AI-Driven Exploration β€” tree-search-based ML engineering agent that automates experiment design, code generation, and evaluation. Treats ML engineering as code optimization against any metric.

Platform Ports & Hardware Forks

  • gianfrancopiana/openclaw-autoresearch β€” OpenClaw port of pi-autoresearch; autonomous experiment loop for any optimization target with statistical confidence scoring.
  • miolini/autoresearch-macos β€” Widely adopted macOS fork that adapts upstream autoresearch for Apple Silicon / MPS while preserving the original loop shape.
  • trevin-creator/autoresearch-mlx β€” MLX-native Apple Silicon port that keeps the upstream fixed-budget val_bpb loop while removing the PyTorch/CUDA dependency entirely.
  • jsegov/autoresearch-win-rtx β€” Windows-native RTX fork focused on consumer NVIDIA GPUs, with explicit VRAM floors and a practical desktop setup path.
  • iii-hq/n-autoresearch β€” Multi-GPU autoresearch infrastructure with structured experiment tracking, adaptive search strategy, crash recovery, and queryable orchestration around the classic train.py loop.
  • lucasgelfond/autoresearch-webgpu β€” Browser/WebGPU port that lets agents generate training code, run experiments in-browser, and feed results back into the loop without a Python setup.
  • tonitangpotato/autoresearch-engram β€” Fork with persistent cognitive memory β€” frequency-weighted retrieval of cross-session knowledge for improved experiment continuity.
  • Colab/Kaggle T4 port - Adapts autoresearch for free T4 GPUs (Google Colab / Kaggle) with zero cost and zero local setup. Key changes: Flash Attention 3 β†’ PyTorch SDPA, removes H100-only kernel dependency. (upstream issue #208)
  • ArmanJR-Lab/autoautoresearch β€” Jetson AGX Orin port with a director β€” a Go binary that acts as a "creative director" injecting novelty (arxiv papers + DeepSeek Reasoner) into the loop to escape local minima. Includes multi-experiment comparison (baseline vs director-guided) with detailed stall analysis.

Domain-Specific Adaptations

  • mattprusak/autoresearch-genealogy β€” Applies the autoresearch pattern to genealogy, using structured prompts, archive guides, source checks, and vault workflows to iteratively expand and verify family-history research.
  • ArchishmanSengupta/autovoiceevals β€” Uses adversarial callers plus keep-or-revert prompt edits to harden voice AI agents across Vapi, Smallest AI, and ElevenLabs.
  • chrisworsey55/atlas-gic β€” Applies the autoresearch keep-or-revert loop to trading agents, optimizing prompts and portfolio orchestration against rolling Sharpe ratio instead of model loss.
  • RightNow-AI/autokernel β€” Applies the autoresearch loop to GPU kernel optimization: profile bottlenecks, edit one kernel, benchmark, keep or revert, repeat.
  • Agent-Analytics/autoresearch-growth β€” Applies autoresearch to landing-page positioning and A/B test candidates, using analytics snapshots and measured experiment results to seed subsequent rounds.
  • Rkcr7/autoresearch-sudoku β€” Enhanced autoresearch workflow where an AI agent iteratively rewrites and benchmarks a Rust sudoku solver, ultimately beating leading human-built solvers on hard benchmark sets.
  • jeongph/autospec β€” Reads natural-language business rules and autonomously builds a Spring Boot service with tests via the keep-or-revert loop. Evaluates with Gradle build + JUnit XML. 119-line skeleton to 950 lines in 5 cycles.

Evaluation & Benchmarks

  • snap-stanford/MLAgentBench β€” Benchmark suite for evaluating AI agents on ML experimentation tasks. 13 tasks from CIFAR-10 to BabyLM.
  • openai/mle-bench β€” OpenAI's benchmark for measuring how well AI agents perform at ML engineering.
  • chchenhui/mlrbench β€” MLR-Bench: Evaluating AI agents on open-ended ML research. 201 tasks from NeurIPS/ICLR/ICML workshops.
  • gersteinlab/ML-Bench β€” Evaluates LLMs and agents for ML tasks on repository-level code.
  • THUDM/AgentBench β€” Comprehensive benchmark for LLM-as-Agent evaluation across 8 distinct environments. ICLR 2024.

How to Contribute

We welcome contributions to this list! Before contributing, please take a moment to review our contribution guidelines. These guidelines will help ensure that your contributions align with our objectives and meet our standards for quality and relevance.

What we're looking for:

  • New high-quality papers, tools, or resources with a brief description of why they matter
  • Updates to existing entries (broken links, outdated information)
  • Corrections to star counts, pricing, or model details
  • Translations and accessibility improvements

Quality standards:

  • All tools should be actively maintained (updated within the last 6 months)
  • Papers should be from peer-reviewed venues or have significant community adoption
  • Datasets should be publicly accessible
  • Please include a one-line description explaining why the resource is valuable

Thank you for your interest in contributing to this project!


Maintained by PromptsLab Β· Star this repo if you find it useful!