Sayna
April 29, 2026
A high-performance real-time voice processing server providing unified Speech-to-Text (STT) and Text-to-Speech (TTS) services through WebSocket and REST APIs.
Quick Start
docker run -d \
-p 3001:3001 \
-e DEEPGRAM_API_KEY=your-key \
saynaai/sayna
The server will be available at http://localhost:3001.
Supported Providers
- Deepgram - STT and TTS
- ElevenLabs - STT and TTS
- Google Cloud - STT and TTS (WaveNet, Neural2, Studio voices)
- Microsoft Azure - STT and TTS (400+ neural voices)
- Cartesia - STT and TTS
Environment Variables
Server Configuration
| Variable | Description | Default |
|---|---|---|
| HOST | Bind address | 0.0.0.0 |
| PORT | Server port | 3001 |
| CACHE_PATH | Cache directory for models and audio | /app/cache |
| RUST_LOG | Log level (error, warn, info, debug, trace) | info |
Provider API Keys
| Variable | Description |
|---|---|
| DEEPGRAM_API_KEY | Deepgram API key |
| ELEVENLABS_API_KEY | ElevenLabs API key |
| GOOGLE_APPLICATION_CREDENTIALS | Path to Google Cloud service account JSON |
| AZURE_SPEECH_SUBSCRIPTION_KEY | Azure Speech subscription key |
| AZURE_SPEECH_REGION | Azure region (default: eastus) |
| CARTESIA_API_KEY | Cartesia API key |
LiveKit Integration
| Variable | Description | Default |
|---|---|---|
| LIVEKIT_URL | LiveKit server WebSocket URL | ws://localhost:7880 |
| LIVEKIT_PUBLIC_URL | Public LiveKit URL for clients | - |
| LIVEKIT_API_KEY | LiveKit API key | - |
| LIVEKIT_API_SECRET | LiveKit API secret | - |
Authentication (Optional)
| Variable | Description | Default |
|---|---|---|
| AUTH_REQUIRED | Enable authentication | false |
| AUTH_API_SECRETS_JSON | API secrets JSON array ([{id, secret}]) | - |
| AUTH_API_SECRET | Legacy single API secret | - |
| AUTH_API_SECRET_ID | Legacy API secret id for AUTH_API_SECRET | default |
| AUTH_SERVICE_URL | External auth service URL | - |
| AUTH_SIGNING_KEY_PATH | Path to JWT signing key | - |
| AUTH_TIMEOUT_SECONDS | Auth request timeout | 5 |
Minimal multi-secret example:
AUTH_API_SECRETS_JSON='[{"id":"default","secret":"sk_test_default_123"}]'
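When several clients each need their own secret, the array can be generated rather than hand-edited. The following is a minimal sketch, assuming only the `[{id, secret}]` shape shown in the table above. The helper name `build_auth_secrets_json`, the second `mobile` entry, and the rule that ids must be unique are illustrative assumptions, not documented Sayna behavior.

```python
import json
import shlex


def build_auth_secrets_json(secrets):
    """Serialize (id, secret) pairs into the [{id, secret}] array shape."""
    seen = set()
    entries = []
    for secret_id, secret in secrets:
        if not secret_id or not secret:
            raise ValueError("each entry needs a non-empty id and secret")
        # Assumption: ids are expected to be unique, so duplicates are rejected.
        if secret_id in seen:
            raise ValueError(f"duplicate secret id: {secret_id}")
        seen.add(secret_id)
        entries.append({"id": secret_id, "secret": secret})
    # Compact separators keep the value on a single line for env files.
    return json.dumps(entries, separators=(",", ":"))


value = build_auth_secrets_json([
    ("default", "sk_test_default_123"),
    ("mobile", "sk_test_mobile_456"),  # hypothetical second client
])
# shlex.quote wraps the JSON in single quotes so a POSIX shell passes it through unchanged.
print(f"AUTH_API_SECRETS_JSON={shlex.quote(value)}")
```

Generating the value this way avoids quoting mistakes, which are easy to make when JSON is embedded in a shell variable.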
Recording Storage (Optional)
The same configuration drives both LiveKit Egress uploads and Sayna's
GET /recording/{stream_id} download endpoint. Pick one backend.
Common variables:
| Variable | Description |
|---|---|
| RECORDING_BACKEND | s3 or gcs (auto-detected from the RECORDING_S3_* or RECORDING_GCS_* variables when unset) |
| RECORDING_PREFIX | Object key prefix; recordings are stored at {prefix}/{stream_id}/audio.ogg |
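To locate a recording directly in the bucket, the object key can be derived from the documented `{prefix}/{stream_id}/audio.ogg` pattern. This is a small sketch under that pattern only. The function name `recording_object_key` is hypothetical, and trimming slashes around the prefix is an assumption about how to avoid empty path segments, not a statement of how Sayna normalizes the value.

```python
def recording_object_key(prefix: str, stream_id: str) -> str:
    """Build the object key for a stream's recording from the documented pattern."""
    if not stream_id:
        raise ValueError("stream_id must not be empty")
    # Assumption: slashes around the prefix are trimmed to avoid "//" in the key.
    cleaned = prefix.strip("/")
    # An empty prefix is dropped rather than producing a leading slash (also an assumption).
    parts = [part for part in (cleaned, stream_id, "audio.ogg") if part]
    return "/".join(parts)


print(recording_object_key("recordings", "abc123"))
```

The resulting key can be passed to any S3 or GCS client when inspecting uploads outside of Sayna.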
Amazon S3 (or S3-compatible: MinIO, Cloudflare R2):
| Variable | Description |
|---|---|
| RECORDING_S3_BUCKET | Bucket name |
| RECORDING_S3_REGION | Region |
| RECORDING_S3_ACCESS_KEY | Access key |
| RECORDING_S3_SECRET_KEY | Secret key |
| RECORDING_S3_ENDPOINT | Optional custom endpoint URL |
| RECORDING_S3_FORCE_PATH_STYLE | true for MinIO/R2 (default: false) |
Google Cloud Storage (provide exactly one credentials variable):
| Variable | Description |
|---|---|
| RECORDING_GCS_BUCKET | Bucket name |
| RECORDING_GCS_CREDENTIALS_PATH | Path to a service-account JSON file |
| RECORDING_GCS_CREDENTIALS_JSON | Inline service-account JSON (alternative to the path) |
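The "exactly one credentials variable" rule is easy to violate when settings come from several env files. The following pre-flight check is a sketch that runs outside Sayna before the container starts. The function name `check_gcs_recording_env` is hypothetical, and treating the bucket as mandatory is an assumption; Sayna's own validation may differ.

```python
import os

GCS_CREDENTIAL_VARS = (
    "RECORDING_GCS_CREDENTIALS_PATH",
    "RECORDING_GCS_CREDENTIALS_JSON",
)


def check_gcs_recording_env(env):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    # Assumption: a bucket name is required whenever the GCS backend is used.
    if not env.get("RECORDING_GCS_BUCKET"):
        problems.append("RECORDING_GCS_BUCKET is not set")
    # Exactly one of the two credentials variables should be non-empty.
    provided = [name for name in GCS_CREDENTIAL_VARS if env.get(name)]
    if len(provided) != 1:
        problems.append(
            "set exactly one of " + ", ".join(GCS_CREDENTIAL_VARS)
            + f" (found {len(provided)})"
        )
    return problems


for problem in check_gcs_recording_env(os.environ):
    print(f"recording config: {problem}")
```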
Docker Compose
version: "3.9"
services:
  sayna:
    image: saynaai/sayna
    ports:
      - "3001:3001"
    environment:
      DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
      ELEVENLABS_API_KEY: ${ELEVENLABS_API_KEY}
      CACHE_PATH: /data/cache
    volumes:
      - sayna-cache:/data/cache
volumes:
  sayna-cache: {}
With LiveKit
version: "3.9"
services:
  livekit:
    image: livekit/livekit-server:latest
    environment:
      LIVEKIT_KEYS: "devkey:secret"
    ports:
      - "7880:7880"
  sayna:
    image: saynaai/sayna
    depends_on:
      - livekit
    ports:
      - "3001:3001"
    environment:
      DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
      ELEVENLABS_API_KEY: ${ELEVENLABS_API_KEY}
      LIVEKIT_URL: ws://livekit:7880
      LIVEKIT_PUBLIC_URL: ws://localhost:7880
      LIVEKIT_API_KEY: devkey
      LIVEKIT_API_SECRET: secret
      CACHE_PATH: /data/cache
    volumes:
      - sayna-cache:/data/cache
volumes:
  sayna-cache: {}
Persistent Cache
Mount a volume at your configured CACHE_PATH (/data/cache in the examples here; the default is /app/cache) to persist:
- VAD and turn detection model assets
- Cached TTS audio outputs
docker run -d \
-p 3001:3001 \
-e DEEPGRAM_API_KEY=your-key \
-e CACHE_PATH=/data/cache \
-v sayna-cache:/data/cache \
saynaai/sayna
Health Check
curl http://localhost:3001/
# Returns: {"status":"OK"}
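The same check can be scripted, for example as a readiness probe in a deployment script. This is a minimal sketch using only the Python standard library. It assumes the response body shown above (`{"status":"OK"}`); the function names `parse_health` and `is_healthy` and the 2-second timeout are illustrative choices.

```python
import json
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True only when the body is JSON with status == "OK"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "OK"


def is_healthy(base_url: str = "http://localhost:3001", timeout: float = 2.0) -> bool:
    """Call GET / on the server and report whether it answered with the OK payload."""
    try:
        with urllib.request.urlopen(f"{base_url}/", timeout=timeout) as response:
            return parse_health(response.read())
    except OSError:  # connection refused, timeout, or an HTTP error status
        return False
```

Calling `is_healthy()` in a short retry loop is one way to wait for the container to finish starting before sending traffic.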
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| / | GET | Health check |
| /ws | WebSocket | Real-time voice processing |
| /voices | GET | List available TTS voices |
| /speak | POST | Generate speech from text |
| /livekit/token | POST | Generate LiveKit participant token |
| /livekit/webhook | POST | LiveKit event webhook |
Audio-Disabled Mode
Run without provider API keys for development and testing:
docker run -d -p 3001:3001 saynaai/sayna
Then send a WebSocket config with audio_disabled: true:
{
"type": "config",
"config": {
"audio_disabled": true
}
}
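In a test harness the config message is often built in code rather than pasted as a literal. The sketch below produces the same JSON as above with the standard library; the function name `audio_disabled_config` is hypothetical, and no fields beyond those shown in this README are assumed.

```python
import json


def audio_disabled_config() -> str:
    """Serialize the config message that turns audio processing off."""
    message = {
        "type": "config",
        "config": {
            # Python's True is emitted as JSON true.
            "audio_disabled": True,
        },
    }
    return json.dumps(message)


print(audio_disabled_config())
```

The resulting string can be sent as a text frame by any WebSocket client connected to the /ws endpoint, for example ws://localhost:3001/ws with the port mapping used here.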
Image Variants
The default image includes:
- VAD + Turn Detection - Silero-VAD voice activity detection with ONNX-based turn detection
- Noise Filter - DeepFilterNet noise suppression
Pre-downloaded model assets are included in the image.
Architecture
- amd64 (x86_64)
- arm64 (aarch64)
Documentation
For complete documentation, visit https://docs.sayna.ai