AI Provider Guides

May 26, 2026

Complete setup guides for all supported AI providers.


Free Tier Providers

Start with zero cost using these free-tier options:

Hugging Face

100,000+ open-source models

  • Free inference API
  • Largest model collection
  • Fully open source
  • Models by task: chat, classification, NER, summarization

Setup Guide →

Google AI Studio

Gemini models with generous free tier

  • 1,500 requests/day free
  • Fast Gemini 2.0 Flash
  • 15 requests/minute
  • Pay-as-you-go option

Setup Guide →
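
The two free-tier limits listed for Google AI Studio (15 requests/minute and 1,500 requests/day) can be checked on the client before a request is sent. The following is a minimal sketch rather than a NeuroLink feature: `FreeTierGuard` is a hypothetical helper, and it assumes rolling one-minute and 24-hour windows, whereas the provider may reset its daily quota at a fixed time.

```typescript
// Hypothetical client-side guard for the free-tier limits quoted above.
// Assumes rolling one-minute and 24-hour windows.
const MINUTE_MS = 60 * 1000;
const DAY_MS = 24 * 60 * MINUTE_MS;

class FreeTierGuard {
  private readonly perMinute: number;
  private readonly perDay: number;
  private accepted: number[] = []; // timestamps (ms) of accepted requests

  constructor(perMinute: number = 15, perDay: number = 1500) {
    this.perMinute = perMinute;
    this.perDay = perDay;
  }

  /** Records the request and returns true if both limits still have room. */
  tryAcquire(now: number = Date.now()): boolean {
    // Forget requests older than 24 hours.
    this.accepted = this.accepted.filter((t) => now - t < DAY_MS);
    const inLastMinute = this.accepted.filter((t) => now - t < MINUTE_MS).length;
    if (this.accepted.length >= this.perDay || inLastMinute >= this.perMinute) {
      return false;
    }
    this.accepted.push(now);
    return true;
  }
}
```

Calling `tryAcquire()` before each request and routing to a fallback provider when it returns `false` keeps quota errors off the request path.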


Direct AI Providers

Access leading AI models directly from their creators:

Anthropic

Claude models with API key or OAuth authentication

  • Claude 4.5 Opus/Sonnet/Haiku, Claude 4.0 Opus/Sonnet
  • API key or OAuth (Pro/Max subscription)
  • Extended thinking for deep reasoning
  • 200K context window, multimodal support

Setup Guide →


๐Ÿข Enterprise Providers

Production-grade providers for enterprise deployments:

Azure OpenAI

Enterprise AI with Microsoft Azure

  • ๐Ÿ”’ SOC2, HIPAA, ISO 27001 compliant
  • ๐ŸŒ Multi-region deployment (30+ regions)
  • ๐Ÿ›ก๏ธ Private endpoints with VNet
  • ๐Ÿ’ผ Enterprise SLAs

Setup Guide โ†’

Google Vertex AI

Google Cloud ML platform

  • โ˜๏ธ GCP integration
  • ๐Ÿ” IAM, VPC, service accounts
  • ๐ŸŒ Global deployment
  • ๐ŸŽฏ Gemini, PaLM, Codey models

Setup Guide โ†’

AWS Bedrock

Serverless AI on AWS

  • ๐Ÿ“ฆ 13 foundation models (Claude, Llama, Mistral)
  • ๐Ÿ” IAM, VPC integration
  • ๐ŸŒ Multi-region (us-east-1, eu-west-1, ap-southeast-1)
  • ๐Ÿ’ฐ Pay-per-use pricing

Setup Guide โ†’


๐ŸŒ Compliance-Focused

Providers with specific compliance certifications:

Mistral AI

European AI with GDPR compliance

  • ๐Ÿ‡ช๐Ÿ‡บ EU data residency
  • โœ… GDPR compliant by default
  • ๐Ÿ”“ Open source models
  • ๐Ÿ’ฐ Cost-effective

Setup Guide โ†’


Hosted Inference Providers

Access frontier models via hosted cloud inference APIs:

DeepSeek

deepseek-chat (V3) and deepseek-reasoner (R1)

  • deepseek-chat: high-quality general chat at low cost
  • deepseek-reasoner: R1 chain-of-thought reasoning model
  • API key from platform.deepseek.com
  • Aliases: ds

Setup Guide →

NVIDIA NIM

400+ models via NVIDIA's hosted and self-hosted inference platform

  • Llama 3.3 70B Instruct (default), Mistral, Nemotron, and 400+ catalog models
  • NIM-specific extras: top_k, min_p, repetition_penalty, reasoning_budget
  • API key from build.nvidia.com
  • Also supports self-hosted NIM endpoints via NVIDIA_NIM_BASE_URL
  • Aliases: nim, nvidia

Setup Guide →

xAI Grok

Grok 3 / 3 Mini / 2 / 2 Vision via api.x.ai

  • Grok 3: flagship reasoning + coding
  • Grok 3 Mini: faster + cheaper
  • Grok 2 Vision: multimodal text + images
  • API key from console.x.ai
  • Aliases: grok

Setup Guide →

Groq

Sub-100ms inference via LPU acceleration

  • <100ms TTFT: fastest hosted inference available
  • Llama 3.3 70B Versatile (default), Llama 3.1 8B Instant, Mixtral, Gemma 2
  • Llama 3.2 vision-preview variants for multimodal
  • API key from console.groq.com/keys

Setup Guide →

Together AI

Hosted open-model gateway

  • Llama 3.3 / 3.1 (8B–405B), Mixtral, Qwen 2.5, DeepSeek R1/V3, WizardLM
  • Turbo variants for low latency
  • API key from api.together.xyz/settings/api-keys
  • Aliases: together

Fireworks AI

Fast open-model serving

  • Llama v3.1 70B/405B, Mixtral 8x22B, Qwen 2.5 Coder, DeepSeek V3
  • Phi-3-Vision and Llama 3.2 vision variants
  • API key from fireworks.ai/account/api-keys

Perplexity

Sonar models with built-in web grounding

  • sonar / sonar-pro / sonar-reasoning / sonar-deep-research
  • Built-in web search + citations
  • API key from perplexity.ai/settings/api
  • Aliases: pplx

Cloudflare Workers AI

Edge-served open models

  • Lowest cost tier: bills per "neuron", not per token
  • Llama 3.3 70B FP8, Llama 3.1, Mistral, Qwen, Gemma
  • Token from dash.cloudflare.com/profile/api-tokens (Workers AI Read+Write)
  • Requires both CLOUDFLARE_API_KEY and CLOUDFLARE_ACCOUNT_ID
  • Aliases: workers-ai, cf-ai
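
Because Cloudflare Workers AI needs two settings rather than one, a pre-flight check can report a missing value before the first request fails. This is a minimal sketch: the two variable names come from the list above, while `readCloudflareCredentials` and `CloudflareCredentials` are hypothetical names used only for illustration.

```typescript
// Hypothetical pre-flight check; only the two variable names come from the docs.
interface CloudflareCredentials {
  apiKey: string;
  accountId: string;
}

function readCloudflareCredentials(
  env: Record<string, string | undefined>,
): CloudflareCredentials {
  const apiKey = env["CLOUDFLARE_API_KEY"];
  const accountId = env["CLOUDFLARE_ACCOUNT_ID"];

  // Collect every missing name so the error lists all of them at once.
  const missing: string[] = [];
  if (!apiKey) missing.push("CLOUDFLARE_API_KEY");
  if (!accountId) missing.push("CLOUDFLARE_ACCOUNT_ID");

  if (!apiKey || !accountId) {
    throw new Error(`Workers AI is missing: ${missing.join(", ")}`);
  }
  return { apiKey, accountId };
}
```

In a Node.js process this would typically be called as `readCloudflareCredentials(process.env)` at startup.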

Cohere

Command R / R+ chat + Embed v3 / Rerank v3 (RAG-essential)

  • Command R+ flagship + Command R + Command R7B
  • Embed v3 (English / multilingual) + Rerank v3: top-tier RAG
  • API key from dashboard.cohere.com/api-keys

Replicate

Multi-modal gateway: LLM + image + video + avatar + music in one auth

  • One REPLICATE_API_TOKEN for 5 modalities
  • Llama 3.1 70B/405B, Mistral, Mixtral
  • FLUX 1.1 Pro, SDXL, Stable Diffusion 3.5
  • Wan-Alpha video, MuseTalk avatar, MusicGen music
  • Token from replicate.com/account/api-tokens

Setup Guide →


๐Ÿ” Embedding-Only Providers

Specialised embedding providers for RAG / retrieval pipelines (no chat):

Voyage AI

Top-tier RAG embeddings

  • ๐Ÿ“Š voyage-3-large flagship; voyage-3.5 default; voyage-code-3 for code
  • ๐ŸŒ voyage-multilingual-2 + domain-tuned (finance, law)
  • ๐Ÿ”‘ API key from dash.voyageai.com/api-keys

Setup Guide โ†’

Jina AI

Embeddings + reranking

  • ๐Ÿ“Š jina-embeddings-v3 multilingual flagship
  • ๐Ÿ”„ jina-reranker-v2 for retrieval reranking
  • ๐Ÿ” jina-colbert-v2 late-interaction retrieval
  • ๐Ÿ”‘ API key from jina.ai

Direct Image Generation

Specialised image-gen providers (in addition to Vertex Imagen / OpenAI DALL-E / Anthropic / Bedrock):

Stability AI

Stable Image Ultra/Core + SD 3.5 family

  • Stable Image Ultra (flagship), Core (fast), SD 3.5 Large/Large-Turbo/Medium
  • PNG output, aspect-ratio + negative-prompt + seed support
  • API key from platform.stability.ai/account/keys
  • Aliases: stability-ai, sd

Setup Guide →

Ideogram

Strong typography + design-focused image generation

  • V3 default; V2/V2-Turbo/V1 also supported
  • magic_prompt + style + aspect_ratio controls
  • API key from developer.ideogram.ai

Recraft

Vector / illustration-focused image generation

  • recraftv3 (raster), recraftv3-svg (vector), recraftv2
  • OpenAI-compat shape + style + size controls
  • API token from recraft.ai/api

Local Providers

Run models entirely on your own hardware, with no API key or internet connection required for inference:

LM Studio

Run any supported model locally with a GUI app

  • Download and run models via the LM Studio desktop application
  • Auto-discovers the loaded model from /v1/models (no model name required)
  • OpenAI-compatible API at http://localhost:1234/v1 by default
  • No API key needed for local use (key optional for reverse-proxy setups)
  • Aliases: lmstudio, lms

Setup Guide →

llama.cpp

High-performance local inference via llama-server

  • Run GGUF models with llama-server at http://localhost:8080/v1 by default
  • Auto-discovers the loaded model from /v1/models
  • Tool support requires the --jinja flag when starting llama-server
  • No API key needed for local use (key optional for reverse-proxy setups)
  • Aliases: llama.cpp

Setup Guide →
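
Both local providers expose an OpenAI-compatible server, so model discovery amounts to reading `/v1/models`. The sketch below illustrates the idea and is not NeuroLink's own implementation: `pickLoadedModel` and `discoverModel` are hypothetical names, the response is assumed to follow the OpenAI list shape (`{ data: [{ id }] }`), and the first entry is assumed to be the model to use. It needs a runtime with a global `fetch` (for example Node.js 18 or later).

```typescript
// Shape of an OpenAI-style model list; only the fields used here are declared.
interface ModelList {
  data: { id: string }[];
}

/** Picks the first model id from an OpenAI-style /v1/models response. */
function pickLoadedModel(list: ModelList): string {
  const first = list.data[0];
  if (first === undefined) {
    throw new Error("The local server reported no models");
  }
  return first.id;
}

/** Queries a local server; the default URL is LM Studio's documented default. */
async function discoverModel(
  baseUrl: string = "http://localhost:1234/v1",
): Promise<string> {
  const response = await fetch(`${baseUrl}/models`);
  if (!response.ok) {
    throw new Error(`Model discovery failed: HTTP ${response.status}`);
  }
  return pickLoadedModel((await response.json()) as ModelList);
}
```

For llama-server, the same call would use `discoverModel("http://localhost:8080/v1")`.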


Aggregators & Proxies

Access multiple providers through unified interfaces:

OpenRouter

300+ models from 60+ providers

  • Single API for all major providers (Anthropic, OpenAI, Google, Meta, etc.)
  • Automatic failover and routing
  • Competitive pricing with cost optimization
  • Zero lock-in: switch models instantly
  • Usage tracking dashboard
  • Free models available

Setup Guide →

OpenAI Compatible

OpenRouter, vLLM, LocalAI, and more

  • 100+ models through OpenRouter
  • Local deployment with vLLM
  • Self-hosted with LocalAI
  • Drop-in OpenAI replacement

Setup Guide →

LiteLLM

100+ providers through proxy

  • Unified API for 100+ providers
  • Load balancing and fallbacks
  • Cost tracking
  • Model routing

Setup Guide →


Voice Providers {#voice-providers}

Synthesize speech, transcribe audio, or run live voice sessions. Voice providers are separate from LLM providers: they handle audio I/O rather than text generation.

Text-to-Speech (TTS)

OpenAI TTS

Highest-quality text-to-speech

  • Voices: alloy, echo, fable, onyx, nova, shimmer
  • Models: tts-1 (fast) and tts-1-hd (high quality)
  • Formats: MP3, WAV, OGG, Opus
  • Auth: API Key (OPENAI_API_KEY)

Setup Guide →

ElevenLabs

Best multilingual and voice-cloning TTS

  • Supports 30+ languages with natural prosody
  • Custom voice cloning from short audio samples
  • Formats: MP3, WAV (raw PCM, surfaced as pcm16), Opus (Ogg container)
  • Auth: API Key (ELEVENLABS_API_KEY)

Setup Guide →

Google TTS

1M characters/month free tier

  • Generous free tier for standard voices
  • 380+ voices across 50+ languages
  • Formats: MP3, WAV, OGG
  • Auth: Service Account

Setup Guide →

Azure TTS

Enterprise TTS with full SSML support

  • Fine-grained prosody control via SSML
  • 400+ neural voices, 140+ languages
  • Formats: MP3, WAV (PCM), Opus (Ogg container)
  • Auth: API Key + Region

Setup Guide →

Fish Audio

Low-cost TTS with 15s voice cloning

  • ~80% cheaper than ElevenLabs
  • 15-second reference audio → custom voice
  • 14 languages
  • Formats: MP3, WAV, PCM16 (raw)
  • Auth: API Key (FISH_AUDIO_API_KEY)

Setup Guide →

Cartesia

Low-latency Sonic models: synchronous + streaming

  • Sub-second turnaround on the synchronous /tts/bytes endpoint
  • Separate WebSocket streaming flow via CartesiaStream (voice server)
  • Voice cloning via dashboard upload
  • Formats: MP3 (44.1 kHz), WAV (PCM s16le @ 44.1 kHz), PCM16 (raw @ 24 kHz)
  • Auth: API Key (CARTESIA_API_KEY)

Setup Guide →


Speech-to-Text (STT)

Whisper (OpenAI)

Highest transcription accuracy

  • Best-in-class accuracy on diverse audio
  • Multilingual with automatic language detection
  • Formats: WAV, MP3, M4A, FLAC, OGG, OPUS, WEBM, MP4, MPEG, MPGA
  • Auth: API Key (OPENAI_API_KEY)

Setup Guide →

Deepgram

Real-time streaming transcription via WebSocket

  • Sub-300 ms word-level results over WebSocket
  • REST batch and WebSocket streaming modes
  • Formats: WAV, MP3, OGG, FLAC
  • Auth: API Key (DEEPGRAM_API_KEY)

Setup Guide →

Google STT

125+ languages with speaker diarization

  • Best fit for existing Google Cloud users
  • Speaker diarization and multi-channel audio
  • Formats: WAV, FLAC, MP3, OGG
  • Auth: API Key (GOOGLE_AI_API_KEY / GEMINI_API_KEY) or Service Account (GOOGLE_APPLICATION_CREDENTIALS)

Setup Guide →

Azure STT

Enterprise STT with custom model training

  • Batch transcription and custom model support
  • Compliance controls for regulated industries
  • Formats: WAV (PCM), Ogg/Opus (convert MP3 to WAV first)
  • Auth: API Key + Region

Setup Guide →


Realtime Voice

Realtime providers maintain a persistent bidirectional WebSocket connection, enabling low-latency spoken conversation with the AI model.

OpenAI Realtime

Low-latency bidirectional voice over WebSocket

  • Full-duplex audio stream with GPT-4o
  • Voice activity detection (VAD) built-in
  • Formats: WAV, Opus
  • Auth: API Key (OPENAI_API_KEY)

Setup Guide →

Gemini Live

Google's native realtime voice API

  • Native multimodal realtime session with Gemini
  • Supports audio + video input simultaneously
  • Formats: WAV, Opus
  • Auth: API Key (GOOGLE_AI_API_KEY or GEMINI_API_KEY)

Setup Guide →


Video Generation

Image-to-video and text-to-video providers (use via output: { mode: "video" }):

  • Vertex Veo 3.1 (default): --videoProvider vertex
  • Kling (PiAPI): --videoProvider kling (details)
  • Runway (Gen-3 Alpha / Gen-4 Turbo): --videoProvider runway
  • Replicate (Wan-Alpha and many others): --videoProvider replicate (guide)

See Video Generation feature page for the full SDK / CLI surface.


Avatar / Lip-Sync Generation

Talking-head video synthesis from a portrait image + audio (use via output: { mode: "avatar" }):

  • D-ID: --avatarProvider d-id (text-driven via Microsoft voices, or audio-driven)
  • HeyGen: --avatarProvider heygen (HeyGen avatar catalog id required)
  • Replicate (MuseTalk): --avatarProvider replicate or musetalk (guide)

See docs/provider-integration/21-adding-new-modality.md for the architectural pattern.


Music / Sound Generation

Music + sound-effect generation (use via output: { mode: "music" }):

  • Beatoven.ai: --musicProvider beatoven (royalty-free background music)
  • ElevenLabs Music: --musicProvider elevenlabs-music (short SFX / loops up to 22s; same ELEVENLABS_API_KEY as TTS)
  • Lyria 3 Pro (Google): --musicProvider lyria
  • Replicate (MusicGen): --musicProvider replicate or musicgen (guide)

Quick Comparison

| Provider | Free Tier | Enterprise | GDPR | Latency | Best For |
| --- | --- | --- | --- | --- | --- |
| Anthropic | Limited | ✅ | ✅ | Low | Reasoning, coding, Claude |
| Hugging Face | ✅ | ❌ | ✅ | Medium | Open source, experimentation |
| Google AI | ✅ | ✅ | ✅ | Low | Free tier, Gemini |
| Mistral AI | ❌ | ✅ | ✅ | Low | EU compliance, cost |
| OpenRouter | ✅ | ✅ | Varies | Low | Multi-model, automatic failover |
| OpenAI Compatible | Varies | ✅ | Varies | Varies | Flexibility, local deployment |
| LiteLLM | ❌ | ✅ | Varies | Low | Multi-provider, unified API |
| Azure OpenAI | ❌ | ✅ | ✅ | Low | Enterprise, Microsoft ecosystem |
| Vertex AI | ❌ | ✅ | ✅ | Low | Enterprise, GCP ecosystem |
| AWS Bedrock | ❌ | ✅ | ✅ | Low | Enterprise, AWS ecosystem |
| DeepSeek | ❌ | ✅ | ❌ | Low | Cost-effective reasoning, R1 model |
| NVIDIA NIM | ❌ | ✅ | Varies | Low | NVIDIA-hosted or self-hosted LLMs |
| LM Studio | ✅ (Local) | ❌ | ✅ | Varies | Local GUI model management |
| llama.cpp | ✅ (Local) | ❌ | ✅ | Varies | High-performance local GGUF inference |
| OpenAI TTS | ❌ | ✅ | ✅ | Low | High-quality TTS (tts-1-hd) |
| ElevenLabs | ❌ | ✅ | Varies | Low | Multilingual TTS, voice cloning |
| Google TTS | ✅ | ✅ | ✅ | Low | Cost-effective TTS, 1M chars free |
| Azure TTS | ❌ | ✅ | ✅ | Low | Enterprise TTS, SSML support |
| Fish Audio | ❌ | ✅ | Varies | Low | Low-cost TTS, voice cloning, 14 langs |
| Cartesia | ❌ | ✅ | Varies | Low | Low-latency Sonic models |
| Whisper | ❌ | ✅ | ✅ | Low | Best STT accuracy |
| Deepgram | ❌ | ✅ | Varies | Low | Real-time STT streaming (WebSocket) |
| Google STT | ❌ | ✅ | ✅ | Low | STT for GCP users, 125+ languages |
| Azure STT | ❌ | ✅ | ✅ | Low | Enterprise STT, custom models |
| OpenAI Realtime | ❌ | ✅ | ✅ | Low | Realtime bidirectional voice |
| Gemini Live | ❌ | ✅ | ✅ | Low | Realtime voice + video (Gemini) |

Setup Strategies

=== "SDK Usage"

    ```typescript
    import { NeuroLink } from "@juspay/neurolink";

    // Prefer the Google AI free tier; fall back to OpenAI when its quota runs out.
    const ai = new NeuroLink({
      providers: [
        {
          name: "google-ai",
          priority: 1,
          config: { apiKey: process.env.GOOGLE_AI_API_KEY },
          quotas: { daily: 1500 },
        },
        {
          name: "openai",
          priority: 2,
          config: { apiKey: process.env.OPENAI_API_KEY },
        },
      ],
      failoverConfig: { enabled: true, fallbackOnQuota: true },
    });

    const result = await ai.generate({
      input: { text: "Hello world" },
    });
    ```

=== "CLI Usage"

    ```bash
    # Set up environment variables
    export GOOGLE_AI_API_KEY="your-key"
    export OPENAI_API_KEY="your-key"

    # Use with automatic failover
    npx @juspay/neurolink generate "Hello world" \
      --provider google-ai
    ```

Strategy 2: Multi-Region Enterprise

```typescript
const ai = new NeuroLink({
  providers: [
    {
      name: "azure-us",
      region: "us-east",
      config: {
        /* Azure US */
      },
    },
    {
      name: "azure-eu",
      region: "eu-west",
      config: {
        /* Azure EU */
      },
    },
    {
      name: "bedrock-us",
      region: "us-east",
      config: {
        /* Bedrock US */
      },
    },
  ],
  loadBalancing: "latency-based",
});
```
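
To make the "latency-based" setting concrete, one possible selection rule is to route to the provider with the lowest average observed latency. The sketch below shows that rule only; it is an assumption about what such a policy could look like, not a description of how NeuroLink implements `loadBalancing`, and `LatencySample` and `pickFastest` are hypothetical names.

```typescript
// One latency observation for a named provider (hypothetical shape).
interface LatencySample {
  provider: string;
  ms: number;
}

/** Returns the provider with the lowest mean latency, or undefined if no samples exist. */
function pickFastest(samples: LatencySample[]): string | undefined {
  // Accumulate a running sum and count per provider.
  const totals = new Map<string, { sum: number; count: number }>();
  for (const sample of samples) {
    const entry = totals.get(sample.provider) ?? { sum: 0, count: 0 };
    entry.sum += sample.ms;
    entry.count += 1;
    totals.set(sample.provider, entry);
  }

  let best: string | undefined = undefined;
  let bestAverage = Infinity;
  for (const [provider, entry] of Array.from(totals)) {
    const average = entry.sum / entry.count;
    if (average < bestAverage) {
      bestAverage = average;
      best = provider;
    }
  }
  return best;
}
```

A production policy would usually also age out old samples and skip providers that are currently failing; both are left out here to keep the rule visible.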

Strategy 3: GDPR Compliance

```typescript
const ai = new NeuroLink({
  providers: [
    {
      name: "mistral",
      priority: 1,
      config: { apiKey: process.env.MISTRAL_API_KEY },
    },
    {
      name: "azure-eu",
      priority: 2,
      config: {
        /* Azure EU region */
      },
    },
  ],
  compliance: {
    framework: "GDPR",
    dataResidency: "EU",
  },
});
```

Next Steps

  1. Choose a provider based on your requirements (free tier, compliance, region)
  2. Follow the setup guide to get your API key
  3. Configure NeuroLink with the provider
  4. Test the integration with a simple request
  5. Add failover for production reliability
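
Step 5 can be pictured as trying providers in priority order until one succeeds. The following is a generic sketch of that pattern under stated assumptions: `Attempt` and `withFailover` are hypothetical names, each `call` stands in for a real provider request, and this is not how NeuroLink's `failoverConfig` is implemented.

```typescript
// One candidate provider: lower priority numbers are tried first.
interface Attempt<T> {
  name: string;
  priority: number;
  call: () => Promise<T>;
}

/** Tries each provider in priority order and returns the first successful result. */
async function withFailover<T>(
  attempts: Attempt<T>[],
): Promise<{ provider: string; value: T }> {
  const ordered = [...attempts].sort((a, b) => a.priority - b.priority);
  const errors: string[] = [];

  for (const attempt of ordered) {
    try {
      return { provider: attempt.name, value: await attempt.call() };
    } catch (err) {
      // Remember why this provider failed, then move on to the next one.
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${attempt.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed (${errors.join("; ")})`);
}
```

Collecting the per-provider errors makes the final failure message useful when every provider in the chain is down.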