Sayna

April 29, 2026

A high-performance real-time voice processing server providing unified Speech-to-Text (STT) and Text-to-Speech (TTS) services through WebSocket and REST APIs.

Quick Start

docker run -d \
  -p 3001:3001 \
  -e DEEPGRAM_API_KEY=your-key \
  saynaai/sayna

The server will be available at http://localhost:3001.

Supported Providers

  • Deepgram - STT and TTS
  • ElevenLabs - STT and TTS
  • Google Cloud - STT and TTS (WaveNet, Neural2, Studio voices)
  • Microsoft Azure - STT and TTS (400+ neural voices)
  • Cartesia - STT and TTS

Environment Variables

Server Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| HOST | Bind address | 0.0.0.0 |
| PORT | Server port | 3001 |
| CACHE_PATH | Cache directory for models and audio | /app/cache |
| RUST_LOG | Log level (error, warn, info, debug, trace) | info |
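
To illustrate how the defaults in the table above apply, the following Python sketch resolves the four server variables from an environment mapping. The helper name `resolve_server_config` is hypothetical and not part of Sayna; it only mirrors the documented defaults and may differ from how the server itself parses its configuration.

```python
import os
from typing import Mapping

# Defaults copied from the Server Configuration table.
SERVER_DEFAULTS = {
    "HOST": "0.0.0.0",
    "PORT": "3001",
    "CACHE_PATH": "/app/cache",
    "RUST_LOG": "info",
}


def resolve_server_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return effective settings, falling back to the documented defaults.

    Hypothetical helper for launcher scripts; Sayna does not ship it.
    """
    config = {name: env.get(name, default) for name, default in SERVER_DEFAULTS.items()}
    port = int(config["PORT"])  # raises ValueError for a non-numeric PORT
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    config["PORT"] = port
    return config
```

Calling `resolve_server_config({})` returns the defaults unchanged, with `PORT` converted to an integer.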

Provider API Keys

| Variable | Description |
|----------|-------------|
| DEEPGRAM_API_KEY | Deepgram API key |
| ELEVENLABS_API_KEY | ElevenLabs API key |
| GOOGLE_APPLICATION_CREDENTIALS | Path to Google Cloud service account JSON |
| AZURE_SPEECH_SUBSCRIPTION_KEY | Azure Speech subscription key |
| AZURE_SPEECH_REGION | Azure region (default: eastus) |
| CARTESIA_API_KEY | Cartesia API key |

LiveKit Integration

| Variable | Description | Default |
|----------|-------------|---------|
| LIVEKIT_URL | LiveKit server WebSocket URL | ws://localhost:7880 |
| LIVEKIT_PUBLIC_URL | Public LiveKit URL for clients | - |
| LIVEKIT_API_KEY | LiveKit API key | - |
| LIVEKIT_API_SECRET | LiveKit API secret | - |

Authentication (Optional)

| Variable | Description | Default |
|----------|-------------|---------|
| AUTH_REQUIRED | Enable authentication | false |
| AUTH_API_SECRETS_JSON | API secrets JSON array ([{id, secret}]) | - |
| AUTH_API_SECRET | Legacy single API secret | - |
| AUTH_API_SECRET_ID | Legacy API secret id for AUTH_API_SECRET | default |
| AUTH_SERVICE_URL | External auth service URL | - |
| AUTH_SIGNING_KEY_PATH | Path to JWT signing key | - |
| AUTH_TIMEOUT_SECONDS | Auth request timeout | 5 |

Minimal multi-secret example:

AUTH_API_SECRETS_JSON='[{"id":"default","secret":"sk_test_default_123"}]'
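
For more than one secret, it can be easier to generate the value than to write it by hand. The sketch below builds the `[{id, secret}]` array with the Python standard library. The function name, the `sk_` prefix, the secret length, and the uniqueness check on ids are illustrative assumptions, not Sayna requirements.

```python
import json
import secrets


def make_auth_secrets_json(ids: list[str], nbytes: int = 32) -> str:
    """Build an AUTH_API_SECRETS_JSON value with one random secret per id.

    The [{id, secret}] shape comes from the table above; everything else
    (prefix, length, unique ids) is an assumption of this sketch.
    """
    if len(set(ids)) != len(ids):
        raise ValueError("secret ids must be unique")
    entries = [{"id": i, "secret": "sk_" + secrets.token_urlsafe(nbytes)} for i in ids]
    # Compact separators keep the value on a single line for .env files.
    return json.dumps(entries, separators=(",", ":"))


print("AUTH_API_SECRETS_JSON='" + make_auth_secrets_json(["default", "mobile"]) + "'")
```

The printed line can be pasted into an env file or passed with `-e`; the secrets differ on every run.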

Recording Storage (Optional)

The same configuration drives both LiveKit Egress uploads and Sayna's GET /recording/{stream_id} download endpoint. Pick one backend.

Common variables:

| Variable | Description |
|----------|-------------|
| RECORDING_BACKEND | s3 or gcs (auto-detected from the RECORDING_S3_* or RECORDING_GCS_* vars when unset) |
| RECORDING_PREFIX | Object key prefix; objects are keyed as {prefix}/{stream_id}/audio.ogg |
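
The key layout above can be sketched as a small Python helper, which is handy when locating a recording directly in the bucket. The function name is hypothetical, and trimming extra slashes from the prefix is an assumption of this sketch rather than documented Sayna behavior.

```python
def recording_object_key(prefix: str, stream_id: str) -> str:
    """Return the object key for a stream's recording.

    Follows the {prefix}/{stream_id}/audio.ogg layout from the table above.
    """
    if not stream_id or "/" in stream_id:
        raise ValueError("stream_id must be a non-empty single path segment")
    # Assumption: leading/trailing slashes on the prefix are not significant.
    parts = [prefix.strip("/"), stream_id, "audio.ogg"]
    return "/".join(p for p in parts if p)


print(recording_object_key("recordings/prod", "stream-123"))
# recordings/prod/stream-123/audio.ogg
```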

Amazon S3 (or S3-compatible: MinIO, Cloudflare R2):

| Variable | Description |
|----------|-------------|
| RECORDING_S3_BUCKET | Bucket name |
| RECORDING_S3_REGION | Region |
| RECORDING_S3_ACCESS_KEY | Access key |
| RECORDING_S3_SECRET_KEY | Secret key |
| RECORDING_S3_ENDPOINT | Optional custom endpoint URL |
| RECORDING_S3_FORCE_PATH_STYLE | true for MinIO/R2 (default: false) |

Google Cloud Storage (provide exactly one credentials variable):

| Variable | Description |
|----------|-------------|
| RECORDING_GCS_BUCKET | Bucket name |
| RECORDING_GCS_CREDENTIALS_PATH | Path to a service-account JSON file |
| RECORDING_GCS_CREDENTIALS_JSON | Inline service-account JSON (alternative to the path) |
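
Before starting the container, a pre-flight check can catch a misconfigured recording backend. The Python sketch below follows the rules stated in this section: an explicit RECORDING_BACKEND wins, otherwise the backend is inferred from the variable family that is set, and GCS needs exactly one credentials variable. The function name is hypothetical, and rejecting a mix of S3 and GCS variables is an assumption; the server's own detection logic may differ.

```python
from typing import Mapping, Optional

GCS_CREDENTIAL_VARS = ("RECORDING_GCS_CREDENTIALS_PATH", "RECORDING_GCS_CREDENTIALS_JSON")


def detect_recording_backend(env: Mapping[str, str]) -> Optional[str]:
    """Return "s3", "gcs", or None when no recording storage is configured."""
    explicit = env.get("RECORDING_BACKEND", "").strip().lower()
    if explicit:
        if explicit not in ("s3", "gcs"):
            raise ValueError(f"unsupported RECORDING_BACKEND: {explicit}")
        backend: Optional[str] = explicit
    else:
        has_s3 = any(k.startswith("RECORDING_S3_") and v for k, v in env.items())
        has_gcs = any(k.startswith("RECORDING_GCS_") and v for k, v in env.items())
        if has_s3 and has_gcs:
            # Assumption: mixing both families is treated as a mistake.
            raise ValueError("both S3 and GCS variables are set; pick one backend")
        backend = "s3" if has_s3 else "gcs" if has_gcs else None

    if backend == "gcs":
        provided = [name for name in GCS_CREDENTIAL_VARS if env.get(name)]
        if len(provided) != 1:
            raise ValueError("provide exactly one GCS credentials variable")
    return backend
```

Passing `os.environ` checks the current shell; passing a plain dict makes the function easy to test.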

Docker Compose

version: "3.9"
services:
  sayna:
    image: saynaai/sayna
    ports:
      - "3001:3001"
    environment:
      DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
      ELEVENLABS_API_KEY: ${ELEVENLABS_API_KEY}
      CACHE_PATH: /data/cache
    volumes:
      - sayna-cache:/data/cache

volumes:
  sayna-cache: {}

With LiveKit

version: "3.9"
services:
  livekit:
    image: livekit/livekit-server:latest
    environment:
      LIVEKIT_KEYS: "devkey:secret"
    ports:
      - "7880:7880"

  sayna:
    image: saynaai/sayna
    depends_on:
      - livekit
    ports:
      - "3001:3001"
    environment:
      DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
      ELEVENLABS_API_KEY: ${ELEVENLABS_API_KEY}
      LIVEKIT_URL: ws://livekit:7880
      LIVEKIT_PUBLIC_URL: ws://localhost:7880
      LIVEKIT_API_KEY: devkey
      LIVEKIT_API_SECRET: secret
      CACHE_PATH: /data/cache
    volumes:
      - sayna-cache:/data/cache

volumes:
  sayna-cache: {}

Persistent Cache

Mount a volume to /data/cache (or your configured CACHE_PATH) to persist:

  • VAD and turn detection model assets
  • Cached TTS audio outputs

docker run -d \
  -p 3001:3001 \
  -e DEEPGRAM_API_KEY=your-key \
  -e CACHE_PATH=/data/cache \
  -v sayna-cache:/data/cache \
  saynaai/sayna

Health Check

curl http://localhost:3001/
# Returns: {"status":"OK"}
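
In scripts and CI jobs it is often useful to wait until the server is ready. The following Python sketch polls the same endpoint using only the standard library. The function names, polling interval, and timeout are choices made for this example, not part of Sayna.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True if the body matches the documented {"status":"OK"} response."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "OK"


def wait_until_healthy(base_url: str = "http://localhost:3001", timeout: float = 30.0) -> bool:
    """Poll GET / until it reports OK or the timeout (in seconds) elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/", timeout=2) as resp:
                if is_healthy(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet
        time.sleep(0.5)
    return False
```

`wait_until_healthy()` returns `True` once the server answers with the expected body and `False` if the timeout is reached first.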

API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| / | GET | Health check |
| /ws | WebSocket | Real-time voice processing |
| /voices | GET | List available TTS voices |
| /speak | POST | Generate speech from text |
| /livekit/token | POST | Generate LiveKit participant token |
| /livekit/webhook | POST | LiveKit event webhook |
Audio-Disabled Mode

Run without provider API keys for development and testing:

docker run -d -p 3001:3001 saynaai/sayna

Then send a WebSocket config with audio_disabled: true:

{
  "type": "config",
  "config": {
    "audio_disabled": true
  }
}
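
A minimal Python sketch of sending this message is shown below. It assumes the third-party `websockets` package is installed and that the server is reachable at ws://localhost:3001/ws; the function names are hypothetical. What the server sends back is not covered here, so the sketch only sends the config and closes the connection.

```python
import asyncio
import json


def build_config_message(audio_disabled: bool = True) -> str:
    """Serialize the config message shown above."""
    return json.dumps({"type": "config", "config": {"audio_disabled": audio_disabled}})


async def send_config(url: str = "ws://localhost:3001/ws") -> None:
    """Open the /ws endpoint and send the audio-disabled config.

    Requires the third-party `websockets` package (pip install websockets).
    """
    import websockets  # imported lazily so build_config_message works without it

    async with websockets.connect(url) as ws:
        await ws.send(build_config_message())


# Usage (needs a running server):
# asyncio.run(send_config())
```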

Image Variants

The default image includes:

  • VAD + Turn Detection - Silero-VAD voice activity detection with ONNX-based turn detection
  • Noise Filter - DeepFilterNet noise suppression

Pre-downloaded model assets are included in the image.

Architecture

  • amd64 (x86_64)
  • arm64 (aarch64)

Documentation

For complete documentation, visit https://docs.sayna.ai

Source Code

https://github.com/saynaai/sayna