AI Provider Guides
May 26, 2026
Complete setup guides for all supported AI providers.
Free Tier Providers
Start with zero cost using these free-tier options:
Hugging Face
100,000+ open-source models
- Free inference API
- Largest model collection
- Fully open source
- Models by task: chat, classification, NER, summarization
Google AI Studio
Gemini models with generous free tier
- 1,500 requests/day free
- Fast Gemini 2.0 Flash
- 15 requests/minute
- Pay-as-you-go option
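The two limits listed above (15 requests/minute and 1,500 requests/day) can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: `FreeTierQuota` and `tryAcquire` are hypothetical names, and the rolling 60-second and 24-hour windows are an assumption that may not match how the provider resets its counters.

```typescript
// Hypothetical client-side guard for the free-tier limits listed above.
// Assumes rolling 60-second and 24-hour windows, which may differ from
// how the provider actually resets its counters.
class FreeTierQuota {
  private timestamps: number[] = [];
  private perMinute: number;
  private perDay: number;

  constructor(perMinute: number = 15, perDay: number = 1500) {
    this.perMinute = perMinute;
    this.perDay = perDay;
  }

  /** Records the request and returns true only if both windows have room. */
  tryAcquire(nowMs: number): boolean {
    const dayAgo = nowMs - 24 * 60 * 60 * 1000;
    const minuteAgo = nowMs - 60 * 1000;
    // Forget requests that have left the daily window.
    this.timestamps = this.timestamps.filter((t) => t > dayAgo);
    const inLastMinute = this.timestamps.filter((t) => t > minuteAgo).length;
    if (inLastMinute >= this.perMinute || this.timestamps.length >= this.perDay) {
      return false;
    }
    this.timestamps.push(nowMs);
    return true;
  }
}
```

Calling `tryAcquire(Date.now())` before each request returns `false` once either window is full, at which point a caller could wait or fall back to another provider.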
Direct AI Providers
Access leading AI models directly from their creators:
Anthropic
Claude models with API key or OAuth authentication
- Claude 4.5 Opus/Sonnet/Haiku, Claude 4.0 Opus/Sonnet
- API key or OAuth (Pro/Max subscription)
- Extended thinking for deep reasoning
- 200K context window, multimodal support
Enterprise Providers
Production-grade providers for enterprise deployments:
Azure OpenAI
Enterprise AI with Microsoft Azure
- SOC2, HIPAA, ISO 27001 compliant
- Multi-region deployment (30+ regions)
- Private endpoints with VNet
- Enterprise SLAs
Google Vertex AI
Google Cloud ML platform
- GCP integration
- IAM, VPC, service accounts
- Global deployment
- Gemini, PaLM, Codey models
AWS Bedrock
Serverless AI on AWS
- 13 foundation models (Claude, Llama, Mistral)
- IAM, VPC integration
- Multi-region (us-east-1, eu-west-1, ap-southeast-1)
- Pay-per-use pricing
Compliance-Focused
Providers with specific compliance certifications:
Mistral AI
European AI with GDPR compliance
- EU data residency
- GDPR compliant by default
- Open source models
- Cost-effective
Hosted Inference Providers
Access frontier models via hosted cloud inference APIs:
DeepSeek
deepseek-chat (V3) and deepseek-reasoner (R1)
- deepseek-chat: high-quality general chat at low cost
- deepseek-reasoner: R1 chain-of-thought reasoning model
- API key from platform.deepseek.com
- Aliases: `ds`
NVIDIA NIM
400+ models via NVIDIA's hosted and self-hosted inference platform
- Llama 3.3 70B Instruct (default), Mistral, Nemotron, and 400+ catalog models
- NIM-specific extras: top_k, min_p, repetition_penalty, reasoning_budget
- API key from build.nvidia.com
- Also supports self-hosted NIM endpoints via `NVIDIA_NIM_BASE_URL`
- Aliases: `nim`, `nvidia`
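The hosted versus self-hosted choice described above comes down to which base URL a client ends up using. A rough sketch of that resolution follows. The function name `resolveNimBaseUrl` is hypothetical, and the hosted default URL is an assumption about NVIDIA's OpenAI-compatible endpoint that should be checked against NVIDIA's own documentation.

```typescript
// Assumption: NVIDIA's hosted, OpenAI-compatible endpoint. Verify before use.
const HOSTED_NIM_BASE_URL = "https://integrate.api.nvidia.com/v1";

// Returns the self-hosted URL when NVIDIA_NIM_BASE_URL is set and non-empty,
// otherwise the assumed hosted default.
function resolveNimBaseUrl(env: Record<string, string | undefined>): string {
  const override = (env["NVIDIA_NIM_BASE_URL"] || "").trim();
  const url = override.length > 0 ? override : HOSTED_NIM_BASE_URL;
  return url.replace(/\/+$/, ""); // drop trailing slashes so paths join cleanly
}
```

In a Node.js process this would typically be called as `resolveNimBaseUrl(process.env)`.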
xAI Grok
Grok 3 / 3 Mini / 2 / 2 Vision via api.x.ai
- Grok 3: flagship reasoning + coding
- Grok 3 Mini: faster + cheaper
- Grok 2 Vision: multimodal text + images
- API key from console.x.ai
- Aliases: `grok`
Groq
Sub-100ms inference via LPU acceleration
- <100ms time to first token (TTFT): fastest hosted inference available
- Llama 3.3 70B Versatile (default), Llama 3.1 8B Instant, Mixtral, Gemma 2
- Llama 3.2 vision-preview variants for multimodal
- API key from console.groq.com/keys
Together AI
Hosted open-model gateway
- Llama 3.3 / 3.1 (8B-405B), Mixtral, Qwen 2.5, DeepSeek R1/V3, WizardLM
- Turbo variants for low latency
- API key from api.together.xyz/settings/api-keys
- Aliases: `together`
Fireworks AI
Fast open-model serving
- Llama v3.1 70B/405B, Mixtral 8x22B, Qwen 2.5 Coder, DeepSeek V3
- Phi-3-Vision and Llama 3.2 vision variants
- API key from fireworks.ai/account/api-keys
Perplexity
Sonar models with built-in web grounding
- sonar / sonar-pro / sonar-reasoning / sonar-deep-research
- Built-in web search + citations
- API key from perplexity.ai/settings/api
- Aliases: `pplx`
Cloudflare Workers AI
Edge-served open models
- Lowest cost tier: bills per "neuron", not per token
- Llama 3.3 70B FP8, Llama 3.1, Mistral, Qwen, Gemma
- Token from dash.cloudflare.com/profile/api-tokens (Workers AI Read+Write)
- Requires both `CLOUDFLARE_API_KEY` and `CLOUDFLARE_ACCOUNT_ID`
- Aliases: `workers-ai`, `cf-ai`
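Because Workers AI needs two settings rather than one, it can help to fail early with a message that names whichever value is absent. A minimal sketch, assuming the environment is passed in as a plain object; `readCloudflareCredentials` and `CloudflareCredentials` are hypothetical names introduced for illustration.

```typescript
interface CloudflareCredentials {
  apiKey: string;
  accountId: string;
}

// Reads both required settings and reports every missing one in a single error.
function readCloudflareCredentials(
  env: Record<string, string | undefined>,
): CloudflareCredentials {
  const apiKey = (env["CLOUDFLARE_API_KEY"] || "").trim();
  const accountId = (env["CLOUDFLARE_ACCOUNT_ID"] || "").trim();

  const missing: string[] = [];
  if (!apiKey) missing.push("CLOUDFLARE_API_KEY");
  if (!accountId) missing.push("CLOUDFLARE_ACCOUNT_ID");
  if (missing.length > 0) {
    throw new Error("Cloudflare Workers AI is missing: " + missing.join(", "));
  }
  return { apiKey, accountId };
}
```

Collecting all missing names before throwing avoids the fix-one-rerun-fix-the-next loop when neither variable is set.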
Cohere
Command R / R+ chat + Embed v3 / Rerank v3 (RAG-essential)
- Command R+ flagship + Command R + Command R7B
- Embed v3 (English / multilingual) + Rerank v3: top-tier RAG
- API key from dashboard.cohere.com/api-keys
Replicate
Multi-modal gateway: LLM + image + video + avatar + music in one auth
- One `REPLICATE_API_TOKEN` for 5 modalities
- Llama 3.1 70B/405B, Mistral, Mixtral
- FLUX 1.1 Pro, SDXL, Stable Diffusion 3.5
- Wan-Alpha video, MuseTalk avatar, MusicGen music
- Token from replicate.com/account/api-tokens
Embedding-Only Providers
Specialised embedding providers for RAG / retrieval pipelines (no chat):
Voyage AI
Top-tier RAG embeddings
- voyage-3-large flagship; voyage-3.5 default; voyage-code-3 for code
- voyage-multilingual-2 + domain-tuned (finance, law)
- API key from dash.voyageai.com/api-keys
Jina AI
Embeddings + reranking
- jina-embeddings-v3 multilingual flagship
- jina-reranker-v2 for retrieval reranking
- jina-colbert-v2 late-interaction retrieval
- API key from jina.ai
Direct Image Generation
Specialised image-gen providers (in addition to Vertex Imagen / OpenAI DALL-E / Anthropic / Bedrock):
Stability AI
Stable Image Ultra/Core + SD 3.5 family
- Stable Image Ultra (flagship), Core (fast), SD 3.5 Large/Large-Turbo/Medium
- PNG output, aspect-ratio + negative-prompt + seed support
- API key from platform.stability.ai/account/keys
- Aliases: `stability-ai`, `sd`
Ideogram
Strong typography + design-focused image generation
- V3 default; V2/V2-Turbo/V1 also supported
- magic_prompt + style + aspect_ratio controls
- API key from developer.ideogram.ai
Recraft
Vector / illustration-focused image generation
- recraftv3 (raster), recraftv3-svg (vector), recraftv2
- OpenAI-compat shape + style + size controls
- API token from recraft.ai/api
Local Providers
Run models entirely on your own hardware; no API key or internet connection is required for inference:
LM Studio
Run any supported model locally with a GUI app
- Download and run models via the LM Studio desktop application
- Auto-discovers the loaded model from `/v1/models` (no model name required)
- OpenAI-compatible API at `http://localhost:1234/v1` by default
- No API key needed for local use (key optional for reverse-proxy setups)
- Aliases: `lmstudio`, `lms`
llama.cpp
High-performance local inference via llama-server
- Run GGUF models with llama-server at `http://localhost:8080/v1` by default
- Auto-discovers the loaded model from `/v1/models`
- Tool support requires the `--jinja` flag when starting llama-server
- No API key needed for local use (key optional for reverse-proxy setups)
- Aliases: `llama.cpp`
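Both local servers expose an OpenAI-compatible `/v1/models` listing, which is what makes auto-discovery possible. The sketch below shows one way the JSON body of that response could be reduced to a model name. It assumes the usual OpenAI list shape (`{ data: [{ id: ... }] }`) and simply takes the first entry. `pickLoadedModel` is a hypothetical helper, and the selection rule actually used by the SDK is not documented here.

```typescript
// Only the fields this sketch reads from an OpenAI-compatible /v1/models body.
interface ModelList {
  data?: Array<{ id?: unknown }>;
}

// Returns the id of the first listed model, or undefined if none is usable.
function pickLoadedModel(body: unknown): string | undefined {
  if (typeof body !== "object" || body === null) return undefined;
  const data = (body as ModelList).data;
  if (!Array.isArray(data)) return undefined;
  for (const entry of data) {
    const id = entry ? entry.id : undefined;
    if (typeof id === "string" && id.length > 0) {
      return id;
    }
  }
  return undefined;
}
```

The input would be the parsed JSON from a GET request to `http://localhost:1234/v1/models` (LM Studio) or `http://localhost:8080/v1/models` (llama-server). An `undefined` result suggests that no model is loaded yet.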
Aggregators & Proxies
Access multiple providers through unified interfaces:
OpenRouter
300+ models from 60+ providers
- Single API for all major providers (Anthropic, OpenAI, Google, Meta, etc.)
- Automatic failover and routing
- Competitive pricing with cost optimization
- Zero lock-in: switch models instantly
- Usage tracking dashboard
- Free models available
OpenAI Compatible
OpenRouter, vLLM, LocalAI, and more
- 100+ models through OpenRouter
- Local deployment with vLLM
- Self-hosted with LocalAI
- Drop-in OpenAI replacement
LiteLLM
100+ providers through proxy
- Unified API for 100+ providers
- Load balancing and fallbacks
- Cost tracking
- Model routing
Voice Providers {#voice-providers}
Synthesize speech, transcribe audio, or run live voice sessions. Voice providers are separate from LLM providers: they handle audio I/O rather than text generation.
Text-to-Speech (TTS)
OpenAI TTS
Highest-quality text-to-speech
- Voices: alloy, echo, fable, onyx, nova, shimmer
- Models: tts-1 (fast) and tts-1-hd (high quality)
- Formats: MP3, WAV, OGG, Opus
- Auth: API Key (`OPENAI_API_KEY`)
ElevenLabs
Best multilingual and voice-cloning TTS
- Supports 30+ languages with natural prosody
- Custom voice cloning from short audio samples
- Formats: MP3, WAV (raw PCM, surfaced as `pcm16`), Opus (Ogg container)
- Auth: API Key (`ELEVENLABS_API_KEY`)
Google TTS
1M characters/month free tier
- Generous free tier for standard voices
- 380+ voices across 50+ languages
- Formats: MP3, WAV, OGG
- Auth: Service Account
Azure TTS
Enterprise TTS with full SSML support
- Fine-grained prosody control via SSML
- 400+ neural voices, 140+ languages
- Formats: MP3, WAV (PCM), Opus (Ogg container)
- Auth: API Key + Region
Fish Audio
Low-cost TTS with 15s voice cloning
- ~80% cheaper than ElevenLabs
- Custom voice from 15 seconds of reference audio
- 14 languages
- Formats: MP3, WAV, PCM16 (raw)
- Auth: API Key (`FISH_AUDIO_API_KEY`)
Cartesia
Low-latency Sonic models: synchronous + streaming
- Sub-second turnaround on the synchronous `/tts/bytes` endpoint
- Separate WebSocket streaming flow via `CartesiaStream` (voice server)
- Voice cloning via dashboard upload
- Formats: MP3 (44.1 kHz), WAV (PCM s16le @ 44.1 kHz), PCM16 (raw @ 24 kHz)
- Auth: API Key (`CARTESIA_API_KEY`)
Speech-to-Text (STT)
Whisper (OpenAI)
Highest transcription accuracy
- Best-in-class accuracy on diverse audio
- Multilingual with automatic language detection
- Formats: WAV, MP3, M4A, FLAC, OGG, OPUS, WEBM, MP4, MPEG, MPGA
- Auth: API Key (`OPENAI_API_KEY`)
Deepgram
Real-time streaming transcription via WebSocket
- Sub-300 ms word-level results over WebSocket
- REST batch and WebSocket streaming modes
- Formats: WAV, MP3, OGG, FLAC
- Auth: API Key (`DEEPGRAM_API_KEY`)
Google STT
125+ languages with speaker diarization
- Best fit for existing Google Cloud users
- Speaker diarization and multi-channel audio
- Formats: WAV, FLAC, MP3, OGG
- Auth: API Key (`GOOGLE_AI_API_KEY` / `GEMINI_API_KEY`) or Service Account (`GOOGLE_APPLICATION_CREDENTIALS`)
Azure STT
Enterprise STT with custom model training
- Batch transcription and custom model support
- Compliance controls for regulated industries
- Formats: WAV (PCM), Ogg/Opus; convert MP3 to WAV first
- Auth: API Key + Region
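The STT format lists above differ enough (Azure STT, for instance, does not take MP3) that a pre-flight check can save a failed upload. The following sketch encodes those lists in a lookup table. The provider keys and the `needsConversion` helper are hypothetical, and the check looks only at the file extension, so it cannot confirm details such as whether a WAV file is actually PCM.

```typescript
// Input formats per STT provider, copied from the lists above.
// The provider keys are hypothetical labels used only in this sketch.
const STT_INPUT_FORMATS: Record<string, string[]> = {
  whisper: ["wav", "mp3", "m4a", "flac", "ogg", "opus", "webm", "mp4", "mpeg", "mpga"],
  deepgram: ["wav", "mp3", "ogg", "flac"],
  "google-stt": ["wav", "flac", "mp3", "ogg"],
  "azure-stt": ["wav", "ogg", "opus"],
};

// Returns true when the file extension is not in the provider's accepted list.
function needsConversion(provider: string, fileName: string): boolean {
  const formats = STT_INPUT_FORMATS[provider];
  if (!formats) {
    throw new Error("Unknown STT provider: " + provider);
  }
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  return formats.indexOf(ext) === -1;
}
```

For example, `needsConversion("azure-stt", "call.mp3")` flags the file for conversion to WAV, while the same file passes for `"whisper"`.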
Realtime Voice
Realtime providers maintain a persistent bidirectional WebSocket connection, enabling low-latency spoken conversation with the AI model.
OpenAI Realtime
Low-latency bidirectional voice over WebSocket
- Full-duplex audio stream with GPT-4o
- Voice activity detection (VAD) built-in
- Formats: WAV, Opus
- Auth: API Key (`OPENAI_API_KEY`)
Gemini Live
Google's native realtime voice API
- Native multimodal realtime session with Gemini
- Supports audio + video input simultaneously
- Formats: WAV, Opus
- Auth: API Key (`GOOGLE_AI_API_KEY` or `GEMINI_API_KEY`)
Video Generation
Image-to-video and text-to-video providers (use via `output: { mode: "video" }`):
- Vertex Veo 3.1 (default): `--videoProvider vertex`
- Kling (PiAPI): `--videoProvider kling` (details)
- Runway (Gen-3 Alpha / Gen-4 Turbo): `--videoProvider runway`
- Replicate (Wan-Alpha + many others): `--videoProvider replicate` (guide)
See the Video Generation feature page for the full SDK / CLI surface.
Avatar / Lip-Sync Generation
Talking-head video synthesis from a portrait image + audio (use via `output: { mode: "avatar" }`):
- D-ID: `--avatarProvider d-id` (text-driven via Microsoft voices, or audio-driven)
- HeyGen: `--avatarProvider heygen` (HeyGen avatar catalog id required)
- Replicate (MuseTalk): `--avatarProvider replicate` or `musetalk` (guide)
See docs/provider-integration/21-adding-new-modality.md for the architectural pattern.
Music / Sound Generation
Music + sound-effect generation (use via `output: { mode: "music" }`):
- Beatoven.ai: `--musicProvider beatoven` (royalty-free background music)
- ElevenLabs Music: `--musicProvider elevenlabs-music` (short SFX / loops up to 22s; same `ELEVENLABS_API_KEY` as TTS)
- Lyria 3 Pro (Google): `--musicProvider lyria`
- Replicate (MusicGen): `--musicProvider replicate` or `musicgen` (guide)
Quick Comparison
| Provider | Free Tier | Enterprise | GDPR | Latency | Best For |
|---|---|---|---|---|---|
| Anthropic | Limited | โ | โ | Low | Reasoning, coding, Claude |
| Hugging Face | โ | โ | โ | Medium | Open source, experimentation |
| Google AI | โ | โ | โ | Low | Free tier, Gemini |
| Mistral AI | โ | โ | โ | Low | EU compliance, cost |
| OpenRouter | โ | โ | Varies | Low | Multi-model, automatic failover |
| OpenAI Compatible | Varies | โ | Varies | Varies | Flexibility, local deployment |
| LiteLLM | โ | โ | Varies | Low | Multi-provider, unified API |
| Azure OpenAI | โ | โ | โ | Low | Enterprise, Microsoft ecosystem |
| Vertex AI | โ | โ | โ | Low | Enterprise, GCP ecosystem |
| AWS Bedrock | โ | โ | โ | Low | Enterprise, AWS ecosystem |
| DeepSeek | โ | โ | โ | Low | Cost-effective reasoning, R1 model |
| NVIDIA NIM | โ | โ | Varies | Low | NVIDIA-hosted or self-hosted LLMs |
| LM Studio | โ (Local) | โ | โ | Varies | Local GUI model management |
| llama.cpp | โ (Local) | โ | โ | Varies | High-performance local GGUF inference |
| OpenAI TTS | โ | โ | โ | Low | High-quality TTS (tts-1-hd) |
| ElevenLabs | โ | โ | Varies | Low | Multilingual TTS, voice cloning |
| Google TTS | โ | โ | โ | Low | Cost-effective TTS, 1M chars free |
| Azure TTS | โ | โ | โ | Low | Enterprise TTS, SSML support |
| Fish Audio | โ | โ | Varies | Low | Low-cost TTS, voice cloning, 14 langs |
| Cartesia | โ | โ | Varies | Low | Low-latency Sonic models |
| Whisper | โ | โ | โ | Low | Best STT accuracy |
| Deepgram | โ | โ | Varies | Low | Real-time STT streaming (WebSocket) |
| Google STT | โ | โ | โ | Low | STT for GCP users, 125+ languages |
| Azure STT | โ | โ | โ | Low | Enterprise STT, custom models |
| OpenAI Realtime | โ | โ | โ | Low | Realtime bidirectional voice |
| Gemini Live | โ | โ | โ | Low | Realtime voice + video (Gemini) |
Setup Strategies
Strategy 1: Free Tier First (Recommended for Development)
=== "SDK Usage"

    ```typescript
    import { NeuroLink } from "@juspay/neurolink";

    // Google AI (free tier) is tried first; OpenAI is the paid fallback.
    const ai = new NeuroLink({
      providers: [
        {
          name: "google-ai",
          priority: 1,
          config: { apiKey: process.env.GOOGLE_AI_API_KEY },
          quotas: { daily: 1500 }, // matches the 1,500 requests/day free tier
        },
        {
          name: "openai",
          priority: 2,
          config: { apiKey: process.env.OPENAI_API_KEY },
        },
      ],
      // Move to the next provider on failure or when the quota is used up.
      failoverConfig: { enabled: true, fallbackOnQuota: true },
    });

    const result = await ai.generate({
      input: { text: "Hello world" },
    });
    ```

=== "CLI Usage"

    ```bash
    # Set up environment variables
    export GOOGLE_AI_API_KEY="your-key"
    export OPENAI_API_KEY="your-key"

    # Use with automatic failover
    npx @juspay/neurolink generate "Hello world" \
      --provider google-ai
    ```
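To make the "free tier first" idea concrete, here is a small stand-alone sketch of quota-aware provider selection. It is an illustration of the concept only, not NeuroLink's failover implementation: `ProviderEntry`, `pickProvider`, and the `usedToday` counter are hypothetical, and a real system would also react to request errors.

```typescript
interface ProviderEntry {
  name: string;
  priority: number; // lower number is tried first
  dailyQuota?: number; // undefined means no known daily cap
}

// Returns the highest-priority provider that still has daily quota left.
function pickProvider(
  providers: ProviderEntry[],
  usedToday: Record<string, number>,
): ProviderEntry | undefined {
  // Copy before sorting so the caller's array is left untouched.
  const ordered = providers.slice().sort((a, b) => a.priority - b.priority);
  for (const p of ordered) {
    const used = usedToday[p.name] || 0;
    if (p.dailyQuota === undefined || used < p.dailyQuota) {
      return p;
    }
  }
  return undefined;
}
```

With the two providers from Strategy 1, requests go to `google-ai` until its 1,500 daily requests are used, after which `openai` is selected.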
Strategy 2: Multi-Region Enterprise
```typescript
import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink({
  providers: [
    {
      name: "azure-us",
      region: "us-east",
      config: {
        /* Azure US */
      },
    },
    {
      name: "azure-eu",
      region: "eu-west",
      config: {
        /* Azure EU */
      },
    },
    {
      name: "bedrock-us",
      region: "us-east",
      config: {
        /* Bedrock US */
      },
    },
  ],
  // Route each request according to observed provider latency.
  loadBalancing: "latency-based",
});
```
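The configuration does not spell out what `"latency-based"` does internally. One plausible reading is "send the next request to whichever provider has recently been fastest", which could be sketched as follows. `LatencySample` and `pickLowestLatency` are hypothetical names, and the plain average used here is an assumption. A production balancer would more likely use a decaying average and health checks.

```typescript
interface LatencySample {
  provider: string;
  latencyMs: number;
}

// Returns the provider with the lowest mean latency across the given samples.
function pickLowestLatency(samples: LatencySample[]): string | undefined {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const s of samples) {
    const t = totals[s.provider] || (totals[s.provider] = { sum: 0, count: 0 });
    t.sum += s.latencyMs;
    t.count += 1;
  }

  let best: string | undefined;
  let bestAvg = Infinity;
  for (const name of Object.keys(totals)) {
    const avg = totals[name].sum / totals[name].count;
    if (avg < bestAvg) {
      best = name;
      bestAvg = avg;
    }
  }
  return best;
}
```

Feeding it recent measurements for `azure-us`, `azure-eu`, and `bedrock-us` yields the name of the provider to try first.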
Strategy 3: GDPR Compliance
```typescript
import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink({
  providers: [
    {
      name: "mistral",
      priority: 1,
      config: { apiKey: process.env.MISTRAL_API_KEY },
    },
    {
      name: "azure-eu",
      priority: 2,
      config: {
        /* Azure EU region */
      },
    },
  ],
  // Restrict processing to providers that keep data in the EU.
  compliance: {
    framework: "GDPR",
    dataResidency: "EU",
  },
});
```
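A `dataResidency` setting implies that providers outside the required region are excluded before any request is routed. The sketch below illustrates that filtering step under stated assumptions: `RegionalProvider`, its `dataResidency` tag, and `providersForResidency` are hypothetical, and the tags are declared by whoever configures the list, so the code enforces them but cannot verify them.

```typescript
type Residency = "EU" | "US" | "GLOBAL";

interface RegionalProvider {
  name: string;
  priority: number; // lower number is tried first
  dataResidency: Residency; // declared by the operator, not verified here
}

// Keeps only providers tagged with the required residency, in priority order.
function providersForResidency(
  providers: RegionalProvider[],
  required: Residency,
): RegionalProvider[] {
  return providers
    .filter((p) => p.dataResidency === required) // filter returns a new array
    .sort((a, b) => a.priority - b.priority);
}
```

Applied to a list containing `mistral` (EU), `azure-eu` (EU), and `azure-us` (US) with `required` set to `"EU"`, only the first two remain, with `mistral` ahead of `azure-eu`.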
Next Steps
- Choose a provider based on your requirements (free tier, compliance, region)
- Follow the setup guide to get your API key
- Configure NeuroLink with the provider
- Test the integration with a simple request
- Add failover for production reliability
Related Documentation
- Multi-Provider Failover - High availability patterns
- Cost Optimization - Reduce costs by 80-95%
- Compliance & Security - GDPR, SOC2, HIPAA
- Load Balancing - Distribution strategies
- Voice Providers Comparison - TTS, STT, and Realtime capability matrix
- Voice Provider Selection - Choosing the right voice provider