API Reference
December 1, 2025
The server provides OpenAI-compatible endpoints for text-to-speech generation.
Base URL
http://localhost:8000
Request Model
All TTS endpoints accept the following request parameters:
{
"model": "voxcpm-0.5b", // Model identifier (fixed)
"input": "Text to synthesize", // Required: Text to generate speech for
"voice": "voice_name", // Optional: Use cached voice
"prompt_wav_path": "/path/to/audio.wav", // Optional: Path to prompt audio file
"prompt_text": "Transcription of prompt audio", // Optional: Text matching prompt audio
"response_format": "wav", // Optional: Audio format (wav, mp3, flac, opus, aac, pcm)
"max_length": 2048, // Optional: Max generated sequence length (1-2048)
"cfg_value": 2.0, // Optional: Classifier-free guidance (0.0-10.0)
"inference_timesteps": 10 // Optional: Diffusion steps (1-100)
}
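The ranges annotated above can be checked on the client before a request is sent. The following is a minimal Python sketch: `build_tts_request` is a hypothetical helper, not part of the server, and it assumes the documented bounds are inclusive.

```python
ALLOWED_FORMATS = {"wav", "mp3", "flac", "opus", "aac", "pcm"}


def build_tts_request(text, voice=None, response_format="wav",
                      max_length=2048, cfg_value=2.0, inference_timesteps=10):
    """Return a request body dict; raise ValueError on out-of-range values."""
    if not text:
        raise ValueError("input is required")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 1 <= max_length <= 2048:
        raise ValueError("max_length must be in 1-2048")
    if not 0.0 <= cfg_value <= 10.0:
        raise ValueError("cfg_value must be in 0.0-10.0")
    if not 1 <= inference_timesteps <= 100:
        raise ValueError("inference_timesteps must be in 1-100")

    body = {
        "model": "voxcpm-0.5b",
        "input": text,
        "response_format": response_format,
        "max_length": max_length,
        "cfg_value": cfg_value,
        "inference_timesteps": inference_timesteps,
    }
    if voice is not None:
        body["voice"] = voice  # optional: name of a cached voice
    return body
```

Serializing the returned dict with `json.dumps` gives a body with the same fields as the request model, without the inline comments.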
Voice Selection
You have two options for voice control:
- Cached Voices: Use pre-computed voice embeddings
  - Set the voice parameter to a cached voice name
  - Available voices can be listed via the /voices endpoint
  - The prompt_wav_path and prompt_text parameters are ignored
- Custom Voice Cloning: Provide your own audio prompt
  - Set prompt_wav_path to the path of a local WAV file
  - Set prompt_text to the exact transcription of the audio
  - If prompt_wav_path is empty, speech is generated with a random voice
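The precedence described above can be summarized in a few lines of Python. This is only an illustration of the documented rules, not the server's code, and `resolve_voice_mode` is a hypothetical name.

```python
def resolve_voice_mode(request):
    """Classify a request body as 'cached', 'clone', or 'random'."""
    if request.get("voice"):
        # A cached voice wins; prompt_wav_path and prompt_text are ignored.
        return "cached"
    if request.get("prompt_wav_path"):
        # Custom cloning from a local WAV file and its transcription.
        return "clone"
    # No cached voice and no prompt audio: a random voice is used.
    return "random"
```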
Parameters
- max_length: Maximum generated sequence length (1-2048). Each unit corresponds to roughly 0.04 seconds of audio.
- cfg_value: Classifier-free guidance strength (0.0-10.0).
- inference_timesteps: Number of diffusion steps (1-100); defaults to 10.
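Because each max_length unit maps to about 0.04 seconds, an upper bound on audio duration can be estimated directly. The sketch below treats that figure as exact, which is an approximation, and `max_audio_seconds` is a hypothetical helper.

```python
SECONDS_PER_UNIT = 0.04  # approximate value quoted in this reference


def max_audio_seconds(max_length):
    """Estimate the longest audio, in seconds, that max_length allows."""
    return max_length * SECONDS_PER_UNIT
```

With the maximum max_length of 2048, this comes to roughly 82 seconds of audio.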
Endpoints
1. Generate Speech (File)
POST /v1/audio/speech
Generates a complete audio file and returns it for download.
Request:
curl -X POST "http://localhost:8000/v1/audio/speech" \
-H "Content-Type: application/json" \
-d '{
"model": "voxcpm-0.5b",
"input": "Hello, this is a test of the VoxCPM TTS system.",
"response_format": "wav"
}'
Response: Binary audio file with appropriate Content-Type header
Supported formats: wav, mp3, flac, opus, aac, pcm
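The same call can be made from Python with only the standard library. This is a minimal sketch that assumes the server is reachable at the base URL above. `make_speech_request` and `save_speech` are hypothetical helper names, and error handling is omitted.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # base URL from this reference


def make_speech_request(text, response_format="wav"):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    body = {
        "model": "voxcpm-0.5b",
        "input": text,
        "response_format": response_format,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text, out_path, response_format="wav"):
    """Send the request and write the returned audio bytes to out_path."""
    request = make_speech_request(text, response_format)
    with urllib.request.urlopen(request) as response:
        with open(out_path, "wb") as audio_file:
            audio_file.write(response.read())
```

For example, `save_speech("Hello, this is a test.", "hello.wav")` would write the response body to `hello.wav` when the server is running.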
2. Stream Speech (Real-time)
POST /v1/audio/speech/stream
Streams audio chunks in real-time for low-latency playback.
Request:
curl -X POST "http://localhost:8000/v1/audio/speech/stream" \
-H "Content-Type: application/json" \
-H "Accept: application/octet-stream" \
-d '{
"model": "voxcpm-0.5b",
"input": "This speech will be streamed in real-time.",
"response_format": "pcm"
}'
Response: Streaming binary audio data (16-bit PCM, 16kHz)
Headers:
X-Sample-Rate: Sample rate of the audio (typically 16000)
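The stream is described as raw 16-bit PCM, so a client has to add a container before most players can open the result. A minimal Python sketch follows. Both helper names are hypothetical, and mono, little-endian samples are assumptions that this reference does not state.

```python
import io
import wave


def sample_rate_from_headers(headers, default=16000):
    """Read X-Sample-Rate from a header mapping, falling back to 16000."""
    return int(headers.get("X-Sample-Rate", default))


def pcm_chunks_to_wav(chunks, sample_rate=16000, channels=1):
    """Wrap raw 16-bit PCM chunks in a WAV container and return the bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)    # assumption: mono output
        wav_file.setsampwidth(2)           # 16-bit samples are 2 bytes each
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            wav_file.writeframes(chunk)
    return buffer.getvalue()
```

In practice the chunks would come from reading the streaming response in a loop, and the sample rate from the response headers.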
3. Play on Server
POST /v1/audio/speech/playback
Generates speech and plays it directly on the server with progress indicators.
Request:
{
"model": "voxcpm-0.5b",
"input": "This will play on the server"
}
Response:
{
"status": "success",
"message": "Audio playback completed on server",
"duration_seconds": 5.23
}
4. Cancel Generation
POST /v1/audio/speech/cancel
Cancels the currently running audio generation.
Request:
curl -X POST "http://localhost:8000/v1/audio/speech/cancel"
Response:
{
"status": "success",
"message": "Cancellation signal sent to Job 123"
}
5. Create Custom Voice
POST /v1/voices
Creates a new cached voice from a prompt audio file.
Request:
{
"voice_name": "my_custom_voice",
"prompt_wav_path": "/path/to/prompt.wav",
"prompt_text": "Transcription of the prompt audio", // Can be raw text or path to .txt file
"replace": false
}
Response:
{
"status": "success",
"message": "Voice 'my_custom_voice' created successfully."
}
6. List Available Voices
GET /voices
Returns a list of available cached voice names.
Request:
curl -X GET "http://localhost:8000/voices"
Response:
{
"voices": ["voice1", "voice2", "voice3"],
"count": 3,
"cache_directory": "assets/caches"
}
7. Health Check
GET /health
Returns server status and current processing state.
Request:
curl -X GET "http://localhost:8000/health"
Response:
{
"status": "healthy",
"is_processing": true,
"current_job_id": 123,
"queue_pending": false,
"model": "voxcpm-0.5b"
}
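A client can use this response to decide whether to submit a new job right away. The sketch below is one possible interpretation of the fields shown. `is_idle` is a hypothetical helper, and treating a pending queue as "busy" is an assumption.

```python
import json


def is_idle(health_json):
    """Return True when the server is healthy and has no active or queued work."""
    state = json.loads(health_json)
    return (
        state.get("status") == "healthy"
        and not state.get("is_processing")
        and not state.get("queue_pending")
    )
```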
8. Web Playground
GET /
Interactive web interface for testing the TTS functionality.
Access at: http://localhost:{PORT}
CLI Arguments
The server accepts the following command-line arguments:
- --port: Port to run the server on (default: 8000)
- --host: Host to bind the server to (default: 0.0.0.0)
- --cache-dir: Directory for custom voice caches (default: ~/.cache/ane_tts)
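For readers wrapping the server in their own launcher, the documented flags and defaults could be mirrored with `argparse`. The server's actual argument parser is not shown in this reference, so the following is only a sketch of an equivalent parser, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags and their defaults."""
    parser = argparse.ArgumentParser(description="TTS server options (sketch)")
    parser.add_argument("--port", type=int, default=8000,
                        help="Port to run the server on")
    parser.add_argument("--host", default="0.0.0.0",
                        help="Host to bind the server to")
    parser.add_argument("--cache-dir", default="~/.cache/ane_tts",
                        help="Directory for custom voice caches")
    return parser
```

Calling `build_parser().parse_args(["--port", "9000"])` overrides only the port and leaves the other values at their documented defaults.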