LLM Providers

March 15, 2026 · View on GitHub

🏥 Quick Return to Emergency Room

You are in a specialist desk.
For full triage and doctors on duty, return here:

Think of this page as a sub-room.
If you want full consultation and prescriptions, go back to the Emergency Room lobby.

This page helps you choose between LLM vendors and fix provider-looking bugs that are actually schema, retrieval, orchestration, or eval drift. If you are new, start with the Orientation table and the FAQ. If you are debugging, jump to the Fix Hub.


Orientation: who is who

ProviderWhat it isTypical use caseLink
OpenAIGPT-4/4o from OpenAI Inc.Direct API, fastest model accessopenai.md
Azure OpenAIMicrosoft enterprise wrapper for OpenAI modelsVNet, compliance, enterprise billingazure_openai.md
AnthropicThe company behind ClaudeSafety-focused platformanthropic.md
Claude (Anthropic)The model family from AnthropicLong context, tool use, JSON controlanthropic_claude.md
Google GeminiGoogle DeepMind multimodal modelsMultimodal chat, reasoninggemini.md
Google Vertex AIGoogle Cloud AI platform that hosts Gemini and morePipelines, deployment, governancegoogle_vertex_ai.md
MistralEU startup with efficient open-weight models (e.g., Mixtral MoE)Cost/perf, open ecosystemmistral.md
Meta LLaMAMeta open-weight model familyLocal or private deployment, llama.cppmeta_llama.md
CohereEnterprise NLP API and embeddingsRAG stacks, enterprise NLPcohere.md
DeepSeekCN player with infra-optimized long-context modelsCost-efficient, long windowsdeepseek.md
Kimi (Moonshot)CN chat-first models, very large parameter claimsConsumer chat focuskimi.md
GroqHardware vendor: LPUs for transformer inferenceUltra-low latency serving (not a model)groq.md
xAI GrokxAI model familyX/Twitter integration, general chatgrok_xai.md
AWS BedrockAWS gateway to many models via one APIEnterprises already on AWSaws_bedrock.md
OpenRouterCommunity model aggregator, OpenAI-style endpointTry many models via one API keyopenrouter.md
Together AIAggregator + infra for open weights and fine-tunesFast hosting, tuning servicestogether.md
MiniMaxCN AI lab with long-context models (204K), OpenAI-compatible APICost-efficient chat, RAG, agent workflowsminimax.md

FAQ for newcomers

OpenAI vs Azure OpenAI — are they the same?
Same models, different packaging. OpenAI = direct API and fastest releases. Azure OpenAI = Microsoft billing, VNet, compliance, data residency.

Anthropic vs Claude — why two pages?
Anthropic is the company. Claude is the model family. We separate because “platform issues” and “model quirks” often need different fixes.

Gemini vs Vertex AI — what is the relation?
Gemini is a model. Vertex AI is Google Cloud’s platform that runs Gemini and provides pipelines, eval, and deployment features.

What makes Mistral special?
Efficient open-weights and MoE designs. Good cost/perf. Easy to host in your own infra.

Meta LLaMA vs local LLaMA
Meta releases the weights. Community tools like llama.cpp let you run them locally on CPU or GPU.

Groq LPU vs GPU
GPU is general purpose. LPU is a chip specialized for transformer inference. You get very low latency for chat workloads.

Bedrock vs OpenRouter vs Together
Bedrock is an AWS enterprise gateway. OpenRouter is a community aggregator with OpenAI-style API. Together is an infra host for open weights with training and fine-tune options.


Open these first


Core acceptance targets

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage ≥ 0.70 for the target section
  • λ remains convergent across three paraphrases and two seeds
  • E_resonance stays flat on long windows

Fix Hub — typical provider symptoms → exact fix

SymptomLikely causeOpen this
JSON mode breaks, invalid objectsSchema too loose or nested tool callsData Contracts, Logic Collapse
Tool calls loop or stallAgent role drift, missing timeoutsMulti-Agent Problems, Role-drift deep dive
High similarity yet wrong snippetMetric mismatch or fragmented storeEmbedding ≠ Semantic, Vectorstore Fragmentation
Answers flip between runsPrompt headers reorder and λ flipsContext Drift, Retrieval Traceability
Hybrid retrievers worse than singleQuery parsing split, mis-weighted rerankQuery Parsing Split, Rerankers
Jailbreaks or bluffingOverconfidence and missing fencesBluffing Controls, Retrieval Traceability

Fix in 60 seconds

  1. Measure ΔS
    Compute ΔS(question, retrieved) and ΔS(retrieved, expected anchor). Stable < 0.40, transitional 0.40–0.60, risk ≥ 0.60.

  2. Probe λ_observe
    Vary top-k and prompt headers. If λ flips, lock the schema and apply a BBAM variance clamp.

  3. Apply the module
    Retrieval drift → BBMC + Data Contracts
    Reasoning collapse → BBCR bridge + BBAM
    Dead ends in long runs → BBPF alternate paths

  4. Verify
    Coverage ≥ 0.70 on three paraphrases. λ convergent on two seeds.


Quick-Start Downloads

ToolLink3-step setup
WFGY 1.0 PDFEngine Paper1) Download 2) Upload to your LLM 3) Ask “Answer using WFGY +
TXT OS (plain text OS)TXTOS.txt1) Download 2) Paste into any LLM chat 3) Type “hello world” to boot

Explore More

LayerPageWhat it’s for
⭐ ProofWFGY Recognition MapExternal citations, integrations, and ecosystem proof
⚙️ EngineWFGY 1.0Original PDF tension engine and early logic sketch (legacy reference)
⚙️ EngineWFGY 2.0Production tension kernel for RAG and agent systems
⚙️ EngineWFGY 3.0TXT based Singularity tension engine (131 S class set)
🗺️ MapProblem Map 1.0Flagship 16 problem RAG failure taxonomy and fix map
🗺️ MapProblem Map 2.0Global Debug Card for RAG and agent pipeline diagnosis
🗺️ MapProblem Map 3.0Global AI troubleshooting atlas and failure pattern map
🧰 AppTXT OS.txt semantic OS with fast bootstrap
🧰 AppBlah Blah BlahAbstract and paradox Q&A built on TXT OS
🧰 AppBlur Blur BlurText to image generation with semantic control
🏡 OnboardingStarter VillageGuided entry point for new users

If this repository helped, starring it improves discovery so more builders can find the docs and tools.
GitHub Repo stars