Active Call API Documentation
June 12, 2026
This document describes the WebSocket and REST API endpoints provided by Active Call.
Base URL
All API endpoints are relative to the server base URL.
Authentication
Most endpoints require WebSocket upgrade for real-time communication.
WebSocket Call Endpoints
The following three endpoints establish WebSocket connections for different voice communication protocols:
1. WebSocket Call Handler
Endpoint: GET /call
Description: Establishes a WebSocket connection for voice call handling with audio stream transmitted via WebSocket.
Parameters:
- id (optional, string): Session ID. If not provided, a new UUID will be generated (prefixed with s.).
- dump_events (optional, boolean): Enable event dumping to file. Default: true.
- ping_interval (optional, number): Interval in seconds to send Ping events. Default: 20. Set to 0 to disable.
- server_side_track (optional, string): Override server-side track ID.
Response: WebSocket connection upgrade
Usage:
const ws = new WebSocket('ws://localhost:8080/call?id=session123&dump_events=true&ping_interval=20');
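The query string can also be assembled programmatically instead of by hand. The following is a minimal sketch, assuming the server listens on ws://localhost:8080 as in the usage line above; buildCallUrl is a hypothetical helper for illustration, not part of Active Call.

```javascript
// Hypothetical helper: build a call endpoint URL from the documented query parameters.
function buildCallUrl(base, path, params = {}) {
  const url = new URL(path, base);
  for (const [key, value] of Object.entries(params)) {
    // Skip unset parameters so the server falls back to its documented defaults.
    if (value !== undefined && value !== null) {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

const callUrl = buildCallUrl('ws://localhost:8080', '/call', {
  id: 'session123',
  dump_events: true,
  ping_interval: 20,
});
// callUrl === 'ws://localhost:8080/call?id=session123&dump_events=true&ping_interval=20'
```

The same helper works for /call/webrtc and /call/sip, since all three endpoints accept the same parameters.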
2. WebRTC Call Handler
Endpoint: GET /call/webrtc
Description: Establishes a WebSocket connection for WebRTC call handling with audio stream transmitted via WebRTC RTP.
Note: WebRTC requires a Secure Context. Ensure you are accessing your web client via HTTPS or 127.0.0.1, otherwise the browser will not enable WebRTC functionality.
Parameters:
- id (optional, string): Session ID. If not provided, a new UUID will be generated (prefixed with s.).
- dump_events (optional, boolean): Enable event dumping to file. Default: true.
- ping_interval (optional, number): Interval in seconds to send Ping events. Default: 20. Set to 0 to disable.
- server_side_track (optional, string): Override server-side track ID.
Response: WebSocket connection upgrade
Usage:
const ws = new WebSocket('ws://localhost:8080/call/webrtc?id=session123&dump_events=true');
3. SIP Call Handler
Endpoint: GET /call/sip
Description: Establishes a WebSocket connection for SIP call handling with audio stream transmitted via SIP/RTP.
Parameters:
- id (optional, string): Session ID. If not provided, a new UUID will be generated (prefixed with s.).
- dump_events (optional, boolean): Enable event dumping to file. Default: true.
- ping_interval (optional, number): Interval in seconds to send Ping events. Default: 20. Set to 0 to disable.
- server_side_track (optional, string): Override server-side track ID.
Response: WebSocket connection upgrade
Usage:
const ws = new WebSocket('ws://localhost:8080/call/sip?id=session123&dump_events=true');
WebSocket Communication Flow
sequenceDiagram
participant Client
participant RustPBX
participant MediaEngine
participant ASR/TTS
Client->>RustPBX: WebSocket Connect
RustPBX->>Client: Connection Established
Client->>RustPBX: Send Command (JSON)
RustPBX->>MediaEngine: Process Command
MediaEngine->>ASR/TTS: Audio Processing
ASR/TTS->>MediaEngine: Processing Results
MediaEngine->>RustPBX: Generate Events
RustPBX->>Client: Send Events (JSON)
Note over Client,RustPBX: Audio Stream Flow
Client->>RustPBX: Audio Data (Binary/WebRTC/SIP)
RustPBX->>MediaEngine: Process Audio
MediaEngine->>Client: Audio Response
WebRTC Call Flow
sequenceDiagram
participant Client
participant RustPBX
participant WebRTC Engine
participant ICE Servers
Client->>RustPBX: WebSocket Connect (/call/webrtc)
RustPBX->>Client: Connection Established
Client->>RustPBX: Send Invite Command with SDP Offer
RustPBX->>WebRTC Engine: Create PeerConnection
RustPBX->>ICE Servers: Get ICE Servers
WebRTC Engine->>RustPBX: Generate SDP Answer
RustPBX->>Client: Send Answer Event with SDP
Client->>RustPBX: Set Remote Description
Note over Client,RustPBX: WebRTC Media Flow
Client->>RustPBX: RTP Audio Packets (PCM/PCMA/PCMU/G722)
RustPBX->>Client: RTP Audio Response
Client->>RustPBX: Send TTS/Play Commands
RustPBX->>Client: Send Audio Events
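The client half of the WebRTC flow can be sketched as follows. This is an illustration under assumptions, not the project's reference client: it assumes the SDP offer travels in the offer field of the invite option and that the answer event carries the SDP in its sdp field, as described elsewhere in this document. The function names are hypothetical, and whether the server expects ICE candidates to be fully gathered before the offer is sent is not stated here.

```javascript
// Build the invite command that carries the browser's SDP offer.
function buildWebRtcInvite(offerSdp, option = {}) {
  return { command: 'invite', option: { ...option, offer: offerSdp } };
}

// Apply an "answer" event to the peer connection; other events are ignored.
// Returns null for non-answer events, otherwise whatever setRemoteDescription returns.
function applyAnswerEvent(pc, event) {
  if (event.event !== 'answer') return null;
  return pc.setRemoteDescription({ type: 'answer', sdp: event.sdp });
}

// Browser-side wiring (defined but not executed here).
// Assumes pc is an RTCPeerConnection with a microphone track already added
// and ws is an open WebSocket connected to /call/webrtc.
async function startWebRtcCall(ws, pc) {
  ws.onmessage = (msg) => {
    // JSON events arrive as text messages.
    if (typeof msg.data === 'string') {
      applyAnswerEvent(pc, JSON.parse(msg.data));
    }
  };
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  ws.send(JSON.stringify(buildWebRtcInvite(offer.sdp)));
}
```

Keeping the command builder and the answer handler as plain functions makes them testable without a browser; only startWebRtcCall touches browser APIs.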
SIP Call Flow
sequenceDiagram
participant Client
participant RustPBX
participant SIP UA
participant SIP Server
Client->>RustPBX: WebSocket Connect (/call/sip)
RustPBX->>Client: Connection Established
Client->>RustPBX: Send Invite Command with Caller/Callee
RustPBX->>SIP UA: Create SIP Dialog
SIP UA->>SIP Server: Send INVITE Request
SIP Server->>SIP UA: Send 200 OK with SDP Answer
RustPBX->>Client: Send Answer Event with SDP
Client->>RustPBX: Set Remote Description
Note over SIP UA,SIP Server: SIP/RTP Media Flow
SIP UA->>SIP Server: RTP Audio Packets (PCM/PCMA/PCMU/G722)
SIP Server->>SIP UA: RTP Audio Response
Client->>RustPBX: Send TTS/Play Commands
RustPBX->>Client: Send Audio Events
Voice Stream Communication Methods
1. WebSocket Audio Stream (/call)
- Audio Format: PCM, PCMA, PCMU, G722
- Transport: WebSocket binary messages
- Usage: Direct audio streaming over WebSocket connection
- Advantages: Simple, low latency, works through firewalls
2. WebRTC Audio Stream (/call/webrtc)
- Audio Format: PCM, PCMA, PCMU, G722
- Transport: WebRTC RTP over UDP
- Usage: Browser-compatible, NAT traversal
- Advantages: Browser native support, adaptive bitrate
3. SIP Audio Stream (/call/sip)
- Audio Format: PCM, PCMA, PCMU, G722
- Transport: SIP/RTP over UDP
- Usage: Traditional telephony integration
- Advantages: Standard telephony protocol, PBX integration
MediaPass Feature
MediaPass provides bidirectional audio streaming between RustPBX and an external WebSocket server. It lets an external service receive the call audio and send audio back during a call.
MediaPass Configuration
The mediaPass option in CallOption configures the WebSocket connection for audio streaming:
{
"mediaPass": {
"url": "ws://localhost:9090/media",
"inputSampleRate": 16000,
"outputSampleRate": 16000,
"packetSize": 2560
}
}
MediaPass Fields:
- url (string): WebSocket URL to connect to for media streaming
- inputSampleRate (number): Sample rate of audio received from the WebSocket server (also the sample rate of the track)
- outputSampleRate (number): Sample rate of audio sent to the WebSocket server
- packetSize (number, optional): Size in bytes of each packet sent to the WebSocket server. Default: 2560.
- ptime (number, optional): If set, the server buffers the input audio and plays it out at the ptime period
MediaPass Example Usage
Example 1: Basic MediaPass Setup
{
"command": "invite",
"option": {
"caller": "sip:alice@rustpbx.com",
"callee": "sip:bob@rustpbx.com",
"codec": "g722",
"mediaPass": {
"url": "ws://ai-server.rustpbx.com:9090/audio",
"inputSampleRate": 16000,
"outputSampleRate": 16000,
"packetSize": 1280
},
"asr": {
"provider": "tencent",
"language": "zh-CN",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"modelType": "16k_zh",
"samplerate": 16000
}
}
}
Example 2: MediaPass with AI Voice Processing
{
"command": "accept",
"option": {
"caller": "sip:caller@rustpbx.com",
"callee": "sip:agent@rustpbx.com",
"codec": "pcmu",
"denoise": true,
"agc": {},
"mediaPass": {
"url": "ws://ai-voice-processor.rustpbx.com:8090/stream",
"inputSampleRate": 8000,
"outputSampleRate": 16000,
"packetSize": 2560
},
"vad": {
"type": "webrtc",
"samplerate": 16000,
"speechPadding": 250,
"silencePadding": 100,
"voiceThreshold": 0.5
},
"recorder": {
"recorderFile": "/recordings/call_with_ai.wav",
"samplerate": 16000,
"ptime": 200
}
}
}
MediaPass WebSocket Protocol
The external WebSocket server should handle binary audio data in PCM format:
- Receiving Audio: RustPBX sends PCM audio data as binary WebSocket messages at the configured outputSampleRate
- Sending Audio: The WebSocket server can send PCM audio data back to RustPBX at the configured inputSampleRate
- Audio Format: Raw PCM data, signed 16-bit little-endian
- Packet Size: Configurable via the packetSize parameter (default: 2560 bytes)
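To make the packet arithmetic concrete: with signed 16-bit samples, a 2560-byte packet holds 1280 samples, which is 80 ms of audio at 16000 Hz if the stream is mono. The sketch below shows that calculation plus a decode and encode round trip for a Node.js Buffer. Mono audio is an assumption (the channel count is not stated above), and all function names are hypothetical.

```javascript
const BYTES_PER_SAMPLE = 2; // signed 16-bit PCM

// Duration in milliseconds of one packet, assuming mono audio.
function packetDurationMs(packetSize, sampleRate) {
  return (packetSize * 1000) / (BYTES_PER_SAMPLE * sampleRate);
}

// Decode a binary message (Node.js Buffer) into 16-bit samples.
function decodePcm16le(buffer) {
  const samples = new Int16Array(Math.floor(buffer.length / BYTES_PER_SAMPLE));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = buffer.readInt16LE(i * BYTES_PER_SAMPLE);
  }
  return samples;
}

// Encode 16-bit samples back into a little-endian Buffer.
function encodePcm16le(samples) {
  const buffer = Buffer.alloc(samples.length * BYTES_PER_SAMPLE);
  samples.forEach((sample, i) => buffer.writeInt16LE(sample, i * BYTES_PER_SAMPLE));
  return buffer;
}

// Example processing step: halve the amplitude of every sample in a packet.
function attenuate(buffer) {
  return encodePcm16le(decodePcm16le(buffer).map((s) => Math.trunc(s / 2)));
}

const defaultPacketMs = packetDurationMs(2560, 16000); // 80
```

An external server would apply a step like attenuate to each binary message it receives and send the result back as another binary message.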
MediaPass Flow Diagram
sequenceDiagram
participant Caller
participant RustPBX
participant AI_Server
participant Callee
Caller->>RustPBX: Audio Stream
RustPBX->>AI_Server: PCM Audio (WebSocket)
AI_Server->>AI_Server: Process Audio (ASR/AI/TTS)
AI_Server->>RustPBX: Processed Audio (WebSocket)
RustPBX->>Callee: Processed Audio Stream
Note over Caller,Callee: Bidirectional AI-enhanced communication
WebSocket Commands
Commands are sent as JSON messages through the WebSocket connection. All timestamps are in milliseconds. Each command follows a common structure with the command field indicating the operation type.
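Because every command shares the same envelope, a small wrapper keeps call sites short. This is a minimal sketch: sendCommand is a hypothetical helper, and ws stands for any object with a send method, such as an open WebSocket.

```javascript
// Hypothetical helper: serialize one command and send it as a JSON text message.
function sendCommand(ws, command, fields = {}) {
  // The command name is written last so it cannot be overwritten by a stray field.
  const message = JSON.stringify({ ...fields, command });
  ws.send(message);
  return message;
}

// Example with a stand-in socket that records what would be sent.
const sentMessages = [];
const recordingSocket = { send: (text) => sentMessages.push(text) };
sendCommand(recordingSocket, 'hangup', { reason: 'user_requested', initiator: 'user' });
```

With a real connection, pass the WebSocket returned by new WebSocket(...) once its open event has fired.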
Core Call Management Commands
Invite Command
Purpose: Initiates a new outbound call.
Fields:
- command (string): Always "invite"
- option (CallOption): Call configuration parameters
{
"command": "invite",
"option": {
"caller": "sip:alice@rustpbx.com",
"callee": "sip:bob@rustpbx.com",
"offer": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n...",
"codec": "g722",
"denoise": true,
"asr": {
"provider": "tencent",
"language": "zh-CN",
"appId": "app_id",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"modelType": "16k_zh",
"samplerate": 16000,
"startWhenAnswer": true
},
"tts": {
"provider": "tencent",
"speaker": "xiaoyan",
"volume": 5,
"speed": 1.0,
"emotion": "neutral"
}
}
}
Accept Command
Purpose: Accepts an incoming call.
Fields:
- command (string): Always "accept"
- option (CallOption): Call configuration parameters
{
"command": "accept",
"option": {
"caller": "sip:alice@rustpbx.com",
"callee": "sip:bob@rustpbx.com",
"codec": "g722",
"recorder": {
"recorderFile": "/path/to/recording.wav",
"samplerate": 16000,
"ptime": 200
}
}
}
Reject Command
Purpose: Rejects an incoming call.
Fields:
- command (string): Always "reject"
- reason (string): Reason for rejection
- code (number, optional): SIP response code
{
"command": "reject",
"reason": "Busy",
"code": 486
}
Ringing Command
Purpose: Sends ringing response for incoming call.
Note: If a recorder is set in the ringing command, the recorder option in the subsequent accept command will not override the recorder settings from the ringing phase.
Fields:
- command (string): Always "ringing"
- recorder (RecorderOption, optional): Call recording configuration
  - recorderFile (string): Path to the recording file
  - samplerate (number): Recording sample rate in Hz (default: 16000)
  - ptime (number): Packet time in milliseconds (default: 200)
- earlyMedia (boolean): Enable early media during ringing
- ringtone (string, optional): Custom ringtone URL
{
"command": "ringing",
"recorder": {
"recorderFile": "/path/to/recording.wav",
"samplerate": 16000,
"ptime": 200
},
"earlyMedia": true,
"ringtone": "http://rustpbx.com/ringtone.wav"
}
Media Control Commands
TTS Command
Purpose: Converts text to speech and plays audio.
Fields:
- command (string): Always "tts"
- text (string): Text to synthesize
- speaker (string, optional): Speaker voice name
- playId (string, optional): Unique identifier for this TTS session. If the same playId is used, it will not interrupt the previous playback.
- autoHangup (boolean, optional): If true, the call will be automatically hung up after TTS playback is finished.
- streaming (boolean, optional): If true, indicates streaming text input (like LLM streaming output).
- endOfStream (boolean, optional): If true, indicates the input text is finished (used with streaming).
- waitInputTimeout (number, optional): Maximum time to wait for user input in seconds
- option (SynthesisOption, optional): TTS provider specific options
- base64 (boolean, optional): If true, text contains base64-encoded PCM samples at a 16000 Hz sample rate. Do not use this feature with streaming TTS.
{
"command": "tts",
"text": "Hello, this is a test message",
"speaker": "xiaoyan",
"playId": "unique_play_id",
"autoHangup": false,
"streaming": false,
"endOfStream": false,
"waitInputTimeout": 30,
"option": {
"provider": "tencent",
"speaker": "xiaoyan",
"volume": 5,
"speed": 1.0
}
}
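For streaming input such as LLM output, the streaming and endOfStream fields can be combined with a shared playId. The sketch below turns a list of text segments into a sequence of tts commands and flags the last one as the end of the stream. It is an illustration under assumptions: the function name is hypothetical, and the exact segmenting the server expects is not specified in this document.

```javascript
// Hypothetical helper: one tts command per text segment, all sharing a playId.
function buildStreamingTtsCommands(playId, segments) {
  return segments.map((text, i) => ({
    command: 'tts',
    text,
    playId,
    streaming: true,
    // Only the final segment marks the input text as finished.
    endOfStream: i === segments.length - 1,
  }));
}

const ttsCommands = buildStreamingTtsCommands('reply-1', [
  'Hello, ',
  'this is a ',
  'test message',
]);
// Each entry would be sent as its own JSON message, in order.
```

Reusing one playId for all segments follows the field description above: a repeated playId does not interrupt the previous playback.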
Play Command
Purpose: Plays audio from a URL.
Fields:
- command (string): Always "play"
- url (string): URL of audio file to play (supports HTTP/HTTPS URLs). This URL will be returned as playId in the trackEnd event.
- autoHangup (boolean, optional): If true, the call will be automatically hung up after playback is finished.
- waitInputTimeout (number, optional): Maximum time to wait for user input in seconds
{
"command": "play",
"url": "http://rustpbx.com/audio.mp3",
"autoHangup": false,
"waitInputTimeout": 30
}
Interrupt Command
Purpose: Interrupts current TTS or audio playback.
Fields:
- command (string): Always "interrupt"
- graceful (boolean, optional): If true, waits for the current TTS command to finish playing before stopping. Default: false.
- fadeOutMs (number, optional): Fade-out duration in milliseconds before stopping playback.
{
"command": "interrupt",
"graceful": false
}
Pause Command
Purpose: Pauses current server-side file/TTS playback without ending the track.
Pause targets the current server-side playback track. It freezes playback progress; it does not emit trackEnd while paused.
{
"command": "pause"
}
Resume Command
Purpose: Resumes paused playback.
Resume continues paused server-side file/TTS playback from the paused playback position.
{
"command": "resume"
}
Call Transfer Commands
Refer Command
Purpose: Transfers the call to another party (SIP REFER).
Fields:
- command (string): Always "refer"
- caller (string): Caller identity for the transfer
- callee (string): Address of Record (AOR) of the transfer target (e.g., sip:bob@rustpbx.com)
- options (ReferOption, optional): Transfer configuration
  - denoise (boolean, optional): Enable noise reduction
  - agc (AGCOption, optional): Enable Automatic Gain Control (AGC); use {} for defaults
  - timeout (number, optional): Transfer timeout in seconds
  - moh (string, optional): Music on hold URL to play during transfer
  - asr (TranscriptionOption, optional): Automatic Speech Recognition configuration
    - provider (string): ASR provider (e.g., "tencent", "aliyun", "openai")
    - secretId (string): Provider secret ID
    - secretKey (string): Provider secret key
    - region (string, optional): Provider region
    - model (string, optional): ASR model to use
  - autoHangup (boolean, optional): Automatically hang up after transfer completion
  - sip (SipOption, optional): SIP configuration
    - username (string): SIP username
    - password (string): SIP password
    - realm (string): SIP realm/domain
    - headers (object, optional): Additional SIP headers
{
"command": "refer",
"caller": "sip:alice@rustpbx.com",
"callee": "sip:charlie@rustpbx.com",
"options": {
"denoise": true,
"agc": {},
"timeout": 30,
"moh": "http://rustpbx.com/hold_music.wav",
"asr": {
"provider": "tencent",
"language": "zh-CN",
"appId": "app_id",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"modelType": "16k_zh",
"bufferSize": 4000,
"samplerate": 16000,
"endpoint": "https://api.rustpbx.com",
"extra": {
"custom_param": "value"
},
"startWhenAnswer": true
},
"autoHangup": true,
"sip": {
"username": "transfer_user",
"password": "transfer_password",
"realm": "rustpbx.com",
"headers": {
"X-Transfer-Source": "pbx"
}
}
}
}
Message Command
Purpose: Sends an in-dialog SIP MESSAGE with a MIME body. This is useful for metadata that needs to reach the operator side during an active SIP call; some SIP proxies can forward these messages as SMS.
Fields:
- command (string): Always "message"
- body (string): SIP MESSAGE body to send. text is accepted as a deprecated alias.
- contentType (string, optional): SIP message content type. Default: text/plain;charset=utf-8
- headers (object, optional): Additional SIP headers
- refer (boolean, optional): If true, send on the active refer dialog instead of the main call dialog
{
"command": "message",
"body": "customer_id=12345 status=verified",
"contentType": "text/plain;charset=utf-8",
"headers": {
"X-Meta-Source": "active-call"
}
}
Audio Bridge Commands
Bridge Command
Purpose: Connects audio between this active call session and another active call session.
The bridge is a media-only operation. It creates separate internal bridge tracks for the two sessions and forwards audio packets between them. It does not replace the normal server-side track used by TTS/play/refer, and it does not send SIP signaling or hang up either call. Call lifecycle remains controlled by each call's own WebSocket client, SIP peer, or explicit hangup command.
Fields:
- command (string): Always "bridge"
- targetSessionId (string): Session ID of the other active call
{
"command": "bridge",
"targetSessionId": "session-b"
}
Notes:
- Send the command on either session's WebSocket after both sessions are established.
- The target session must still be active and must not be the same session.
- Re-sending bridge for the same pair replaces the existing bridge tracks for that pair.
- If either call ends, its bridge track stops and the peer bridge task exits; the other call remains active until its own client or SIP peer ends it.
Unbridge Command
Purpose: Removes the audio bridge between this active call session and another active call session.
unbridge removes the internal bridge tracks from both sessions when both sessions are active. If the target session has already ended, it removes the local bridge track only. It is safe to send after one side has already hung up.
Fields:
- command (string): Always "unbridge"
- targetSessionId (string): Session ID of the other call
{
"command": "unbridge",
"targetSessionId": "session-b"
}
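The bridge rules above (the target must be a different session, and unbridge is safe to send after a hangup) can be captured in two small builders. This is a minimal sketch with hypothetical function names; the client-side check only mirrors the documented constraint and does not replace server validation.

```javascript
// Build a bridge command, rejecting the documented invalid case of bridging a session to itself.
function buildBridgeCommand(localSessionId, targetSessionId) {
  if (!targetSessionId) {
    throw new Error('targetSessionId is required');
  }
  if (localSessionId === targetSessionId) {
    throw new Error('a session cannot be bridged to itself');
  }
  return { command: 'bridge', targetSessionId };
}

// Build the matching unbridge command.
function buildUnbridgeCommand(targetSessionId) {
  return { command: 'unbridge', targetSessionId };
}

const bridgeCommand = buildBridgeCommand('session-a', 'session-b');
const unbridgeCommand = buildUnbridgeCommand('session-b');
// JSON.stringify(bridgeCommand) === '{"command":"bridge","targetSessionId":"session-b"}'
```

Either command is sent on session-a's WebSocket as a JSON text message.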
Audio Track Control Commands
Mute Command
Purpose: Mutes a specific audio track.
Fields:
- command (string): Always "mute"
- trackId (string, optional): Track ID to mute (if not specified, mutes all tracks)
{
"command": "mute",
"trackId": "track-123"
}
Unmute Command
Purpose: Unmutes a specific audio track.
Fields:
- command (string): Always "unmute"
- trackId (string, optional): Track ID to unmute (if not specified, unmutes all tracks)
{
"command": "unmute",
"trackId": "track-123"
}
Session Management Commands
Hangup Command
Purpose: Ends the call.
Fields:
- command (string): Always "hangup"
- reason (string, optional): Reason for hanging up
- initiator (string, optional): Who initiated the hangup (user, system, etc.)
- headers (object, optional): Additional SIP headers to include in the BYE request (SIP calls only)
{
"command": "hangup",
"reason": "user_requested",
"initiator": "user",
"headers": {
"X-Hangup-Cause": "normal"
}
}
History Command
Purpose: Adds a conversation history entry.
Fields:
- command (string): Always "history"
- speaker (string): Speaker identifier
- text (string): Conversation text
{
"command": "history",
"speaker": "user",
"text": "Hello, I need help with my account"
}
CallOption Object Structure
The CallOption object is used in invite and accept commands and contains the following fields:
{
"denoise": true,
"agc": {},
"offer": "SDP offer string",
"callee": "sip:callee@rustpbx.com",
"caller": "sip:caller@rustpbx.com",
"recorder": {
"recorderFile": "/path/to/recording.wav",
"samplerate": 16000,
"ptime": 200
},
"asr": {
"provider": "tencent",
"language": "zh-CN",
"appId": "app_id",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"modelType": "16k_zh",
"bufferSize": 4000,
"samplerate": 16000,
"endpoint": "https://api.rustpbx.com",
"extra": {
"custom_param": "value"
},
"startWhenAnswer": true
},
"vad": {
"type": "webrtc",
"samplerate": 16000,
"speechPadding": 250,
"silencePadding": 100,
"ratio": 0.5,
"voiceThreshold": 0.5,
"maxBufferDurationSecs": 50,
"silenceTimeout": null,
"endpoint": null,
"secretKey": null,
"secretId": null
},
"tts": {
"samplerate": 16000,
"provider": "tencent",
"speed": 1.0,
"appId": "app_id",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"volume": 5,
"speaker": "1345",
"codec": "pcm",
"subtitle": true,
"emotion": "neutral",
"endpoint": "https://api.rustpbx.com",
"extra": {
"custom_param": "value"
},
"cacheKey": "cache_key_example"
},
"mediaPass": {
"url": "ws://localhost:9090/media",
"inputSampleRate": 16000,
"outputSampleRate": 16000,
"packetSize": 2560
},
"handshakeTimeout": 30,
"enableIpv6": false,
"inactivityTimeout": 50,
"sip": {
"username": "user",
"password": "password",
"realm": "rustpbx.com",
"headers": {
"X-Custom-Header": "value"
}
},
"extra": {
"custom_field": "custom_value"
},
"codec": "g722",
"eou": {
"type": "tencent",
"endpoint": "https://api.rustpbx.com",
"secretKey": "your_secret_key",
"secretId": "your_secret_id",
"timeout": 5000
}
}
CallOption Fields:
- denoise (boolean, optional): Enable noise reduction for audio processing
- agc (AGCOption, optional): Enable Automatic Gain Control (AGC); use {} for defaults or specify fields
- offer (string, optional): SDP offer string for WebRTC/SIP negotiation
- callee (string, optional): Callee's SIP URI or phone number (e.g., "sip:bob@rustpbx.com")
- caller (string, optional): Caller's SIP URI or phone number (e.g., "sip:alice@rustpbx.com")
- recorder (RecorderOption, optional): Call recording configuration
  - recorderFile (string): Path to the recording file
  - samplerate (number): Recording sample rate in Hz (default: 16000)
  - ptime (number): Packet time in milliseconds (default: 200)
- asr (TranscriptionOption, optional): Automatic Speech Recognition configuration
  - provider (string): ASR provider ("tencent", "aliyun", "voiceapi")
  - language (string, optional): Language code (e.g., "zh-CN", "en-US")
  - appId (string, optional): Application ID for the ASR service
  - secretId (string, optional): Secret ID for authentication
  - secretKey (string, optional): Secret key for authentication
  - modelType (string, optional): ASR model type (e.g., "16k_zh", "8k_en")
  - bufferSize (number, optional): Audio buffer size in bytes
  - samplerate (number, optional): Audio sample rate for ASR processing
  - endpoint (string, optional): Custom ASR service endpoint URL
  - extra (object, optional): Additional provider-specific parameters
  - startWhenAnswer (boolean, optional): Start ASR when call is answered
- agc (AGCOption, optional): Automatic Gain Control configuration (WebRTC AGC2); use {} for defaults. Requires vad to be configured upstream, because AGC reads the per-frame speech probability written by the VAD.
  - headroomDb (number, optional): Target headroom below 0 dBFS in dB (default: 5.0)
  - maxGainDb (number, optional): Maximum gain in dB (default: 50.0)
  - initialGainDb (number, optional): Initial gain in dB applied before the speech-level estimator is confident (default: 15.0)
  - maxGainChangeDbPerSecond (number, optional): Maximum gain change in dB per second; controls both attack and release (default: 6.0)
  - maxOutputNoiseLevelDbfs (number, optional): Noise floor cap in dBFS above which AGC will not amplify (default: -50.0)
  - adjacentSpeechFramesThreshold (number, optional): Number of consecutive 10 ms speech sub-frames required before gain increase is allowed (default: 12, ≈120 ms)
  - enableLimiter (boolean, optional): Run the soft-knee limiter after the adaptive gain stage (default: true)
- vad (VADOption, optional): Voice Activity Detection configuration
  - type (string): VAD algorithm type ("silero")
  - samplerate (number): Audio sample rate for VAD processing (default: 16000)
  - speechPadding (number): Padding before speech detection in milliseconds (default: 250)
  - silencePadding (number): Padding after silence detection in milliseconds (default: 100)
  - ratio (number): Voice detection ratio threshold (default: 0.5)
  - voiceThreshold (number): Voice energy threshold (default: 0.5)
  - maxBufferDurationSecs (number): Maximum buffer duration in seconds (default: 50)
  - silenceTimeout (number, optional): Timeout for silence detection in milliseconds
  - endpoint (string, optional): Custom VAD service endpoint
  - secretKey (string, optional): Secret key for VAD service authentication
  - secretId (string, optional): Secret ID for VAD service authentication
- tts (SynthesisOption, optional): Text-to-Speech configuration
  - samplerate (number, optional): TTS output sample rate in Hz
  - provider (string, optional): TTS provider ("tencent", "aliyun", "deepgram", "supertonic"). Default: "aliyun" for Chinese (zh), "supertonic" for English (en).
  - speed (number, optional): Speech speed multiplier (default: 1.0)
  - appId (string, optional): Application ID for TTS service
  - secretId (string, optional): Secret ID for authentication
  - secretKey (string, optional): Secret key for authentication
  - volume (number, optional): Speech volume level (1-10)
  - speaker (string, optional): Voice speaker name (e.g., "xiaoyan", "xiaoyun")
  - codec (string, optional): Audio codec for TTS output
  - subtitle (boolean, optional): Enable subtitle generation
  - emotion (string, optional): Speech emotion ("neutral", "sad", "happy", "angry", "fear", "news", "story", "radio", "poetry", "call", "sajiao", "disgusted", "amaze", "peaceful", "exciting", "aojiao", "jieshuo")
  - endpoint (string, optional): Custom TTS service endpoint URL
  - extra (object, optional): Additional provider-specific parameters
  - maxConcurrentTasks (number, optional): Maximum number of concurrent tasks for non-streaming tts commands
- mediaPass (MediaPassOption, optional): Media pass-through configuration for external audio processing
  - url (string): WebSocket URL for media streaming
  - inputSampleRate (number): Sample rate of audio received from WebSocket server
  - outputSampleRate (number): Sample rate of audio sent to WebSocket server
  - packetSize (number, optional): Packet size sent to WebSocket server in bytes (default: 2560)
- subscribe (boolean, optional): Enable real-time audio subscription for non-WebSocket calls (SIP/WebRTC). If true, audio will be pushed via the control WebSocket using binary frames with a 1-byte track header (0x00 for caller, 0x01 for callee).
- handshakeTimeout (number, optional): Timeout for connection handshake in seconds (e.g., 30)
- enableIpv6 (boolean, optional): Enable IPv6 support for networking
- inactivityTimeout (number, optional): Timeout for audio inactivity in seconds
- sip (SipOption, optional): SIP protocol configuration
  - username (string): SIP username for authentication
  - password (string): SIP password for authentication
  - realm (string): SIP realm/domain
  - headers (object, optional): Additional SIP headers as key-value pairs
- extra (object, optional): Additional custom parameters as key-value pairs
- codec (string, optional): Audio codec for WebSocket calls ("pcmu", "pcma", "g722", "pcm")
- eou (EouOption, optional): End of Utterance detection configuration
  - type (string, optional): EOU detection provider
  - endpoint (string, optional): Custom EOU service endpoint URL
  - secretKey (string, optional): Secret key for EOU service authentication
  - secretId (string, optional): Secret ID for EOU service authentication
  - timeout (number, optional): Maximum timeout for EOU detection in milliseconds
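When subscribe is enabled, each binary frame on the control WebSocket starts with the 1-byte track header described above. The sketch below splits such a frame into its track and audio payload. The function name is hypothetical, and the encoding of the payload after the header is not specified in this document, so it is returned as raw bytes.

```javascript
// Header values documented for the subscribe option.
const TRACK_BY_HEADER = { 0x00: 'caller', 0x01: 'callee' };

// Split one subscribed audio frame into its track label and raw audio bytes.
// Accepts a Uint8Array (including a Node.js Buffer) or an ArrayBuffer.
function parseSubscribedAudioFrame(data) {
  const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);
  if (bytes.length < 1) {
    throw new Error('empty frame');
  }
  const track = TRACK_BY_HEADER[bytes[0]];
  if (track === undefined) {
    throw new Error('unknown track header: ' + bytes[0]);
  }
  // subarray creates a view, so the payload is not copied.
  return { track, payload: bytes.subarray(1) };
}

const frame = parseSubscribedAudioFrame(Uint8Array.of(0x01, 10, 20));
// frame.track === 'callee', frame.payload holds the two remaining bytes
```

In a browser, set ws.binaryType = 'arraybuffer' so that binary messages arrive in a form this function accepts directly.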
ReferOption Object Structure
The ReferOption object is used in the refer command and contains the following fields:
{
"denoise": true,
"timeout": 30,
"moh": "http://rustpbx.com/hold_music.wav",
"asr": {
"provider": "tencent",
"language": "zh-CN",
"appId": "app_id",
"secretId": "your_secret_id",
"secretKey": "your_secret_key",
"modelType": "16k_zh",
"bufferSize": 4000,
"samplerate": 16000,
"endpoint": "https://api.rustpbx.com",
"extra": {
"custom_param": "value"
},
"startWhenAnswer": true
},
"autoHangup": true,
"sip": {
"username": "transfer_user",
"password": "transfer_password",
"realm": "rustpbx.com",
"headers": {
"X-Transfer-Source": "pbx"
}
}
}
Fields:
- denoise (boolean, optional): Enable noise reduction during transfer
- agc (AGCOption, optional): Enable Automatic Gain Control (AGC); use {} for defaults or specify fields
- timeout (number, optional): Transfer timeout in seconds
- moh (string, optional): Music on hold URL to play during transfer
- asr (TranscriptionOption, optional): Automatic Speech Recognition configuration
- autoHangup (boolean, optional): Automatically hang up after transfer completion
- sip (SipOption, optional): SIP configuration for the transfer
WebSocket Events
Events are received as JSON messages from the server. All timestamps are in milliseconds. Each event contains an event field that indicates the event type, and most events include a trackId field to identify the associated audio track.
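Since every event names its type in the event field, a client can route messages with a lookup table. This is a minimal sketch: createEventDispatcher is a hypothetical helper, and treating non-string messages as audio rather than events is an assumption based on the binary audio transport described earlier.

```javascript
// Hypothetical helper: route incoming JSON events to per-type handlers.
function createEventDispatcher(handlers, onUnknown = () => {}) {
  return function dispatch(raw) {
    // Assumption: binary messages carry audio, so only text messages are parsed.
    if (typeof raw !== 'string') return null;
    const event = JSON.parse(raw);
    if (Object.prototype.hasOwnProperty.call(handlers, event.event)) {
      handlers[event.event](event);
    } else {
      onUnknown(event);
    }
    return event;
  };
}

// Example: collect hangup reasons and log anything unhandled.
const hangupReasons = [];
const dispatchEvent = createEventDispatcher(
  { hangup: (e) => hangupReasons.push(e.reason) },
  (e) => console.log('unhandled event type:', e.event),
);
dispatchEvent('{"event":"hangup","timestamp":1640995200000,"reason":"user_requested"}');
```

With a real connection, the dispatcher would be called from the WebSocket message handler, for example ws.onmessage = (msg) => dispatchEvent(msg.data).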
Call Lifecycle Events
Incoming Event
Triggered when: An incoming call is received (SIP calls only).
Fields:
- event (string): Always "incoming"
- trackId (string): Unique identifier for the audio track. Used to identify which track generated this event.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- caller (string): Caller's SIP URI or phone number
- callee (string): Callee's SIP URI or phone number
- sdp (string): SDP offer from the caller
{
"event": "incoming",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"caller": "sip:alice@rustpbx.com",
"callee": "sip:bob@rustpbx.com",
"sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}
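A client that wants to answer such a call can derive its replies from the event. The sketch below builds a ringing command followed by an accept command that reuses the caller and callee from the incoming event. The function name is hypothetical, and sending ringing before accept is an illustrative choice; this document does not state that the sequence is required.

```javascript
// Hypothetical helper: commands a client might send in response to an incoming event.
function buildAnswerSequence(incoming, option = {}) {
  if (incoming.event !== 'incoming') {
    throw new Error('expected an incoming event');
  }
  return [
    { command: 'ringing', earlyMedia: false },
    {
      command: 'accept',
      // Caller-supplied option fields (codec, asr, tts, ...) extend the defaults.
      option: { caller: incoming.caller, callee: incoming.callee, ...option },
    },
  ];
}

const answerSequence = buildAnswerSequence(
  {
    event: 'incoming',
    trackId: 'track-abc123',
    caller: 'sip:alice@rustpbx.com',
    callee: 'sip:bob@rustpbx.com',
  },
  { codec: 'g722' },
);
// answerSequence[0].command === 'ringing', answerSequence[1].command === 'accept'
```

To decline instead, a reject command with a reason and an optional SIP code takes the place of both entries.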
Answer Event
Triggered when: Call is answered and SDP negotiation is complete.
Fields:
- event (string): Always "answer"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- sdp (string): SDP answer from the server
{
"event": "answer",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"sdp": "v=0\r\no=- 1234567890 2 IN IP4 127.0.0.1\r\n..."
}
Reject Event
Triggered when: Call is rejected.
Fields:
- event (string): Always "reject"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- reason (string): Reason for rejection
- code (number, optional): SIP response code
{
"event": "reject",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"reason": "Busy",
"code": 486
}
Ringing Event
Triggered when: Call is ringing (SIP calls only).
Fields:
- event (string): Always "ringing"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- earlyMedia (boolean): Whether early media is available
{
"event": "ringing",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"earlyMedia": false
}
Hangup Event
Triggered when: Call is ended.
Fields:
- event (string): Always "hangup"
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- reason (string, optional): Reason for hangup
- initiator (string, optional): Who initiated the hangup (user, system, etc.)
- startTime (string): ISO 8601 timestamp when call started
- hangupTime (string): ISO 8601 timestamp when call ended
- answerTime (string, optional): ISO 8601 timestamp when call was answered
- ringingTime (string, optional): ISO 8601 timestamp when call started ringing
- from (Attendee, optional): Information about the caller
- to (Attendee, optional): Information about the callee
- extra (object, optional): Additional call metadata
{
"event": "hangup",
"timestamp": 1640995200000,
"reason": "user_requested",
"initiator": "user",
"startTime": "2024-01-01T12:00:00Z",
"hangupTime": "2024-01-01T12:05:30Z",
"answerTime": "2024-01-01T12:00:05Z",
"ringingTime": "2024-01-01T12:00:02Z",
"from": {
"username": "alice",
"realm": "rustpbx.com",
"source": "sip:alice@rustpbx.com"
},
"to": {
"username": "bob",
"realm": "rustpbx.com",
"source": "sip:bob@rustpbx.com"
},
"extra": {
"call_quality": "good",
"network_type": "wifi"
}
}
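The ISO 8601 fields make it straightforward to derive call durations on the client. A minimal sketch, with a hypothetical function name, applied to the timestamps from the example above:

```javascript
// Hypothetical helper: derive durations from a hangup event's ISO 8601 timestamps.
function summarizeHangup(event) {
  const start = Date.parse(event.startTime);
  const end = Date.parse(event.hangupTime);
  // answerTime is optional: it is absent when the call was never answered.
  const answered = event.answerTime ? Date.parse(event.answerTime) : null;
  return {
    answered: answered !== null,
    totalSeconds: (end - start) / 1000,
    talkSeconds: answered === null ? 0 : (end - answered) / 1000,
  };
}

const callSummary = summarizeHangup({
  event: 'hangup',
  startTime: '2024-01-01T12:00:00Z',
  hangupTime: '2024-01-01T12:05:30Z',
  answerTime: '2024-01-01T12:00:05Z',
});
// callSummary → { answered: true, totalSeconds: 330, talkSeconds: 325 }
```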
Voice Activity Detection Events
Speaking Event
Triggered when: Voice activity detection detects speech start.
Fields:
- event (string): Always "speaking"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- startTime (number): When speech started in milliseconds since Unix epoch
- isFiller (boolean, optional): Whether this speech segment is a filler word
- confidence (number, optional): Confidence score of the voice detection (0.0–1.0)
{
"event": "speaking",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"startTime": 1640995200000,
"isFiller": false,
"confidence": 0.95
}
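One common use of the speaking event is barge-in: stopping TTS playback when the user starts to talk. The sketch below decides whether to emit an interrupt command for a given event. It is illustrative only: the function name is hypothetical, and the 0.5 confidence threshold and the choice to ignore filler words are assumptions, not documented behavior.

```javascript
// Hypothetical barge-in policy: return an interrupt command, or null to keep playing.
function bargeInCommand(event, { minConfidence = 0.5, fadeOutMs } = {}) {
  if (event.event !== 'speaking') return null;
  // Assumption: filler words should not cut off playback.
  if (event.isFiller) return null;
  // confidence is optional; only apply the threshold when it is present.
  if (typeof event.confidence === 'number' && event.confidence < minConfidence) {
    return null;
  }
  const command = { command: 'interrupt', graceful: false };
  if (fadeOutMs !== undefined) command.fadeOutMs = fadeOutMs;
  return command;
}

const bargeIn = bargeInCommand(
  { event: 'speaking', trackId: 'track-abc123', isFiller: false, confidence: 0.95 },
  { fadeOutMs: 200 },
);
// bargeIn → { command: 'interrupt', graceful: false, fadeOutMs: 200 }
```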
Silence Event
Triggered when: Voice activity detection detects silence.
Fields:
- event (string): Always "silence"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- startTime (number): When silence started in milliseconds since Unix epoch
- duration (number): Duration of silence in milliseconds
{
"event": "silence",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"startTime": 1640995195000,
"duration": 5000
}
AI and Speech Processing Events
Answer Machine Detection Event
Triggered when: Answer machine detection algorithm identifies automated response.
Fields:
- event (string): Always "answerMachineDetection"
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- startTime (number): Detection window start time in milliseconds since Unix epoch
- endTime (number): Detection window end time in milliseconds since Unix epoch
- text (string): Detected automated message text
{
"event": "answerMachineDetection",
"timestamp": 1640995200000,
"startTime": 1640995200000,
"endTime": 1640995205000,
"text": "Hello, you have reached ABC Company. Please leave a message..."
}
EOU (End of Utterance) Event
Triggered when: End-of-utterance detection determines that the user has finished speaking.
Fields:
- event (string): Always "eou"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- completed (boolean): Whether the utterance was completed normally
- interruptPoint (string, optional): Position in TTS subtitle text where the interruption occurred
{
"event": "eou",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"completed": true,
"interruptPoint": null
}
ASR Final Event
Triggered when: ASR provides final transcription result.
Fields:
- event (string): Always "asrFinal"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- index (number): ASR result sequence number
- startTime (number, optional): Start time of speech in milliseconds since Unix epoch
- endTime (number, optional): End time of speech in milliseconds since Unix epoch
- text (string): Final transcribed text
- isFiller (boolean, optional): Whether this result is a filler word
- confidence (number, optional): Confidence score (0.0–1.0)
- taskId (string, optional): ASR provider task identifier
{
"event": "asrFinal",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"index": 1,
"startTime": 1640995200000,
"endTime": 1640995205000,
"text": "Hello, how can I help you today?",
"isFiller": false,
"confidence": 0.98,
"taskId": "asr-task-001"
}
ASR Delta Event
Triggered when: ASR provides partial transcription result (streaming mode).
Fields:
- event (string): Always "asrDelta"
- trackId (string): Unique identifier for the audio track.
- index (number): ASR result sequence number
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- startTime (number, optional): Start time of speech in milliseconds since Unix epoch
- endTime (number, optional): End time of speech in milliseconds since Unix epoch
- text (string): Partial transcribed text
- isFiller (boolean, optional): Whether this result is a filler word
- confidence (number, optional): Confidence score (0.0–1.0)
- taskId (string, optional): ASR provider task identifier
{
"event": "asrDelta",
"trackId": "track-abc123",
"index": 1,
"timestamp": 1640995200000,
"startTime": 1640995200000,
"endTime": 1640995203000,
"text": "Hello, how can",
"isFiller": false,
"confidence": 0.85
}
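As a minimal sketch, a client might fold asrDelta and asrFinal events into a running transcript as shown here. It assumes that a later event with the same index replaces the earlier text for that index, and that a final result supersedes any deltas; neither behaviour is specified in this document. The TranscriptBuffer class is a hypothetical helper, not part of the API.

```javascript
// Hypothetical helper: keeps the latest text seen for each ASR result index.
class TranscriptBuffer {
  constructor() {
    this.segments = new Map(); // index -> { text, final }
  }

  // Apply one parsed event; anything that is not an ASR event is ignored.
  apply(event) {
    if (event.event !== 'asrDelta' && event.event !== 'asrFinal') return;
    const existing = this.segments.get(event.index);
    // Assumption: once a final result arrived, later deltas for that index are stale.
    if (existing && existing.final && event.event === 'asrDelta') return;
    this.segments.set(event.index, {
      text: event.text,
      final: event.event === 'asrFinal',
    });
  }

  // Join the segments in ascending index order.
  text() {
    return [...this.segments.keys()]
      .sort((a, b) => a - b)
      .map((index) => this.segments.get(index).text)
      .join(' ');
  }
}

const buffer = new TranscriptBuffer();
buffer.apply({ event: 'asrDelta', trackId: 'track-abc123', index: 1, text: 'Hello, how can' });
buffer.apply({ event: 'asrFinal', trackId: 'track-abc123', index: 1, text: 'Hello, how can I help you today?' });
console.log(buffer.text());
```

Keying on index rather than appending text keeps the buffer correct even if a delta is delivered more than once.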
Audio Track Events
Track Start Event
Triggered when: Audio track starts (TTS, file playback, etc.).
Fields:
- event (string): Always "trackStart"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- playId (string, optional): For TTS command, this is the playId from the TTS command. For Play command, this is the URL from the Play command.
{
"event": "trackStart",
"trackId": "track-tts-456",
"timestamp": 1640995200000,
"playId": "llm-001"
}
Track End Event
Triggered when: Audio track ends (TTS finished, file playback finished, etc.).
Fields:
- event (string): Always "trackEnd"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- duration (number): Duration of track in milliseconds
- ssrc (number): RTP Synchronization Source identifier
- playId (string, optional): For TTS command, this is the playId from the TTS command. For Play command, this is the URL from the Play command.
{
"event": "trackEnd",
"trackId": "track-tts-456",
"timestamp": 1640995230000,
"duration": 30000,
"ssrc": 1234567890,
"playId": "llm-001"
}
Interruption Event
Triggered when: Current playback is interrupted by user input or another command.
Fields:
- event (string): Always "interruption"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- playId (string, optional): For TTS command, this is the playId from the TTS command. For Play command, this is the URL from the Play command.
- subtitle (string, optional): Current TTS text being played when interrupted
- position (number, optional): Word index position in the subtitle when interrupted
- totalDuration (number): Total duration of the TTS content in milliseconds
- current (number): Elapsed time since start of TTS when interrupted in milliseconds
{
"event": "interruption",
"trackId": "track-tts-456",
"timestamp": 1640995215000,
"playId": "llm-001",
"subtitle": "Hello, this is a long message that was interrupted",
"position": 5,
"totalDuration": 30000,
"current": 15000
}
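The fields of an interruption event can be combined to estimate how much of the prompt the caller actually heard. The sketch below assumes that position counts whitespace-separated words from the start of subtitle; the document does not state how words are delimited. summarizeInterruption is a hypothetical helper name.

```javascript
// Summarise an interruption event: fraction of audio played and the words spoken so far.
function summarizeInterruption(event) {
  // Guard against a zero or missing totalDuration to avoid dividing by zero.
  const played = event.totalDuration > 0 ? event.current / event.totalDuration : 0;

  let spoken = null;
  if (typeof event.subtitle === 'string' && Number.isInteger(event.position)) {
    // Assumption: position is a count of whitespace-separated words already played.
    spoken = event.subtitle.split(/\s+/).slice(0, event.position).join(' ');
  }
  return { played, spoken };
}

const summary = summarizeInterruption({
  event: 'interruption',
  trackId: 'track-tts-456',
  subtitle: 'Hello, this is a long message that was interrupted',
  position: 5,
  totalDuration: 30000,
  current: 15000,
});
console.log(summary);
```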
User Input Events
DTMF Event
Triggered when: DTMF tone is detected.
Fields:
- event (string): Always "dtmf"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- digit (string): DTMF digit (0-9, *, #, A-D)
{
"event": "dtmf",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"digit": "1"
}
System Events
Ping Event
Triggered when: Server sends a periodic heartbeat to keep the connection alive.
Fields:
- event (string): Always "ping"
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- payload (string, optional): ISO 8601 timestamp of the ping

The client should respond with a WebSocket Pong frame (most WebSocket clients handle this automatically). The server sends a Ping every ping_interval seconds (default: 20). Set ping_interval=0 to disable.
{
"event": "ping",
"timestamp": 1640995200000,
"payload": "2024-01-01T12:00:00Z"
}
Hold Event
Triggered when: A call is placed on hold or taken off hold.
Fields:
- event (string): Always "hold"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- onHold (boolean): true if call is now on hold, false if taken off hold
{
"event": "hold",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"onHold": true
}
Inactivity Event
Triggered when: Audio inactivity timeout expires (no audio activity detected for inactivityTimeout seconds).
Fields:
- event (string): Always "inactivity"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
{
"event": "inactivity",
"trackId": "track-abc123",
"timestamp": 1640995200000
}
FunctionCall Event
Triggered when: A function/tool call is made by the AI agent (Playbook mode).
Fields:
- event (string): Always "functionCall"
- trackId (string): Unique identifier for the audio track.
- callId (string): Unique identifier for this function call
- name (string): Name of the function being called
- arguments (string): JSON-encoded arguments string for the function
- timestamp (number): Event timestamp in milliseconds since Unix epoch
{
"event": "functionCall",
"trackId": "track-abc123",
"callId": "call-uuid-123",
"name": "get_weather",
"arguments": "{\"city\": \"Beijing\"}",
"timestamp": 1640995200000
}
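Because arguments arrives as a JSON-encoded string rather than an object, a client has to parse it before invoking anything. The following sketch shows one way to do that with a local handler table. The handlers registry, the handleFunctionCall name, and the shape of the returned object are all hypothetical; how a result is reported back to the session is not covered in this section.

```javascript
// Hypothetical registry of local handlers, keyed by function name.
const handlers = {
  get_weather: ({ city }) => `Weather requested for ${city}`,
};

// Parse the JSON-encoded arguments and run the matching handler.
function handleFunctionCall(event, registry) {
  // Only accept names defined directly on the registry (not inherited ones).
  if (!Object.prototype.hasOwnProperty.call(registry, event.name)) {
    return { callId: event.callId, ok: false, error: `unknown function: ${event.name}` };
  }
  let args;
  try {
    args = JSON.parse(event.arguments);
  } catch (err) {
    return { callId: event.callId, ok: false, error: 'invalid arguments JSON' };
  }
  return { callId: event.callId, ok: true, result: registry[event.name](args) };
}

const outcome = handleFunctionCall(
  {
    event: 'functionCall',
    trackId: 'track-abc123',
    callId: 'call-uuid-123',
    name: 'get_weather',
    arguments: '{"city": "Beijing"}',
  },
  handlers
);
console.log(outcome);
```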
Metrics Event
Triggered when: Performance metrics are available.
Fields:
- event (string): Always "metrics"
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- key (string): Metric key (e.g., "ttfb.asr.tencent", "completed.asr.tencent")
- duration (number): Duration in milliseconds
- data (object): Additional metric data
{
"event": "metrics",
"timestamp": 1640995200000,
"key": "ttfb.asr.tencent",
"duration": 150,
"data": {
"index": 1,
"provider": "tencent",
"model": "16k_zh"
}
}
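Metrics events lend themselves to simple aggregation on the client. As an illustrative sketch, the function below groups events by key and reports the count, mean, and maximum duration for each; summarizeMetrics is a hypothetical name and the choice of statistics is only an example.

```javascript
// Group metrics events by key and compute count, mean and max duration (ms).
function summarizeMetrics(events) {
  const totals = new Map(); // key -> { count, total, max }
  for (const e of events) {
    if (e.event !== 'metrics') continue; // skip every other event type
    const entry = totals.get(e.key) ?? { count: 0, total: 0, max: 0 };
    entry.count += 1;
    entry.total += e.duration;
    entry.max = Math.max(entry.max, e.duration);
    totals.set(e.key, entry);
  }
  const summary = new Map();
  for (const [key, { count, total, max }] of totals) {
    summary.set(key, { count, mean: total / count, max });
  }
  return summary;
}

const stats = summarizeMetrics([
  { event: 'metrics', key: 'ttfb.asr.tencent', duration: 150 },
  { event: 'metrics', key: 'ttfb.asr.tencent', duration: 250 },
  { event: 'ping', timestamp: 1640995200000 },
]);
console.log(stats.get('ttfb.asr.tencent'));
```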
Error Event
Triggered when: An error occurs during processing.
Fields:
- event (string): Always "error"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- sender (string): Component that generated the error (asr, tts, media, etc.)
- error (string): Error message description
- code (number, optional): Error code
{
"event": "error",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"sender": "asr",
"error": "Connection timeout to ASR service",
"code": 408
}
Add History Event
Triggered when: A conversation history entry is added.
Fields:
- event (string): Always "addHistory"
- sender (string, optional): Component that added the history entry
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- speaker (string): Speaker identifier (user, assistant, system, etc.)
- text (string): Conversation text
{
"event": "addHistory",
"sender": "system",
"timestamp": 1640995200000,
"speaker": "user",
"text": "Hello, I need help with my account"
}
Binary Event (Audio Data)
Triggered when: Binary audio data is sent (WebSocket calls or calls with subscribe: true).
Fields:
- event (string): Always "binary"
- trackId (string): Unique identifier for the audio track. For subscribed SIP/WebRTC calls, Caller uses server-side-track id, Callee uses the session ID.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- data (array): Binary audio data bytes. In subscribe mode, the first byte is the track index (0 for Caller, 1 for Callee) followed by original PCM data.
{
"event": "binary",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"data": [/* binary audio data array */]
}
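In subscribe mode the leading byte has to be stripped before the audio is used. The sketch below separates the track index from the PCM payload. splitSubscribedFrame is a hypothetical helper, and the code deliberately leaves the PCM bytes undecoded because the sample format is not stated here.

```javascript
// Split a subscribe-mode frame into its track index and the remaining PCM bytes.
function splitSubscribedFrame(data) {
  const bytes = Uint8Array.from(data); // accepts a plain array or a typed array
  if (bytes.length === 0) {
    throw new RangeError('empty frame: missing track index byte');
  }
  const trackIndex = bytes[0];
  // 0 is the Caller and 1 is the Callee; anything else is reported as unknown.
  const role = trackIndex === 0 ? 'caller' : trackIndex === 1 ? 'callee' : 'unknown';
  return { trackIndex, role, pcm: bytes.subarray(1) };
}

const frame = splitSubscribedFrame([1, 0x10, 0x20, 0x30, 0x40]);
console.log(frame.role, frame.pcm.length);
```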
Other Event
Triggered when: Custom or extension events are generated.
Fields:
- event (string): Always "other"
- trackId (string): Unique identifier for the audio track.
- timestamp (number): Event timestamp in milliseconds since Unix epoch
- sender (string): Component that generated the event
- extra (object, optional): Additional event data as key-value pairs
{
"event": "other",
"trackId": "track-abc123",
"timestamp": 1640995200000,
"sender": "custom_plugin",
"extra": {
"custom_field": "custom_value",
"plugin_version": "1.0.0"
}
}
Attendee Object Structure
The Attendee object appears in call events and contains participant information:
{
"username": "alice",
"realm": "rustpbx.com",
"source": "sip:alice@rustpbx.com"
}
Fields:
- username (string): Username portion of the SIP URI
- realm (string): Domain/realm portion of the SIP URI
- source (string): Full SIP URI or phone number
REST API Endpoints
4. List Active Calls
Endpoint: GET /list
Description: Returns a list of all currently active calls.
Parameters: None
Response:
{
"active_calls": [
{
"id": "s.session-id",
"callType": "webrtc",
"cs.option": { ... },
"ringTime": "2024-01-01T12:00:02Z",
"startTime": "2024-01-01T12:00:05Z"
}
]
}
Usage:
curl http://localhost:8080/list
5. Kill Call
Endpoint: GET /kill/{id}
Description: Terminates a specific active call by its session ID.
Parameters:
- id (path parameter, string): The session ID of the call to terminate.
Response:
{ "status": "killed", "id": "s.session123" }
If the session is not found:
{ "status": "not_found", "id": "s.session123" }
Usage:
curl http://localhost:8080/kill/s.session123
6. Send Command
Endpoint: POST /command/{id}
Description: Sends a command to a specific active call by its session ID. Accepts the same command objects as the WebSocket command interface.
Parameters:
- id (path parameter, string): The session ID of the target call.
Request Body: A command object (see WebSocket Commands for the full list).
{ "command": "tts", "text": "Hello, how can I help you?" }
Response:
{ "status": "sent", "id": "s.session123" }
If the session is not found:
{ "status": "not_found", "id": "s.session123" }
Usage:
curl -X POST http://localhost:8080/command/s.session123 \
-H "Content-Type: application/json" \
-d '{"command": "hangup", "reason": "normal", "initiator": "server"}'
7. Get ICE Servers
Endpoint: GET /iceservers
Description: Returns ICE servers configuration for WebRTC connections.
Parameters: None
Response:
[
{
"urls": ["stun:stun.l.google.com:19302"],
"username": null,
"credential": null
},
{
"urls": ["turn:restsend.com:3478"],
"username": "username",
"credential": "password"
}
]
Usage:
curl http://localhost:8080/iceservers
8. Stream Events
Endpoint: GET /events/{id}
Description: Opens a Server-Sent Events (SSE) stream for a specific active call, delivering real-time session events and commands as they occur.
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| id | string | Active call/track ID |
Response: text/event-stream;charset=utf-8
The stream emits two SSE event types:
| SSE Event | Data |
|---|---|
| event | JSON-serialized SessionEvent (same as WebSocket events) |
| command | JSON-serialized command sent to the session |
The stream closes when the call ends (channel closed). Lagged messages are silently skipped.
Errors:
| Status | Description |
|---|---|
| 404 | No active call found for given id |
Usage:
curl -N http://localhost:8080/events/{id}
Example output:
event: event
data: {"event":"answer","trackId":"track-abc","timestamp":1700000000000}
event: command
data: {"command":"tts","text":"Hello, how can I help you?"}
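Clients that cannot use a built-in EventSource can parse the stream text themselves. The sketch below relies on the standard SSE framing, in which each message is terminated by a blank line, and turns a chunk of stream text into typed messages. parseSse and parseSseBlock are hypothetical helper names, and the sample input is illustrative.

```javascript
// Parse one SSE message block into { type, payload }, or null if it carries no data.
function parseSseBlock(block) {
  let type = 'message'; // SSE default when no "event:" line is present
  const dataLines = [];
  for (const line of block.split(/\r?\n/)) {
    if (line.startsWith('event:')) {
      type = line.slice('event:'.length).trim();
    } else if (line.startsWith('data:')) {
      dataLines.push(line.slice('data:'.length).trim());
    }
  }
  if (dataLines.length === 0) return null; // comment or keep-alive block
  return { type, payload: JSON.parse(dataLines.join('\n')) };
}

// Split stream text on blank lines and parse every block that carries data.
function parseSse(text) {
  return text
    .split(/\r?\n\r?\n/)
    .map(parseSseBlock)
    .filter((message) => message !== null);
}

const sample =
  'event: event\n' +
  'data: {"event":"answer","trackId":"track-abc","timestamp":1700000000000}\n' +
  '\n' +
  'event: command\n' +
  'data: {"command":"tts","text":"Hello, how can I help you?"}\n' +
  '\n';

for (const message of parseSse(sample)) {
  console.log(message.type, message.payload);
}
```

A real client would also need to buffer partial blocks across network chunks, which this sketch omits.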
9. Playbook API
List Playbooks
Endpoint: GET /api/playbooks
Description: Returns a list of all available playbook files in config/playbook/.
Response:
[
{ "name": "demo.md", "updated": "2024-01-01T12:00:00Z" },
{ "name": "simple-demo-en.md", "updated": "2024-01-02T08:00:00Z" }
]
Usage:
curl http://localhost:8080/api/playbooks
Get Playbook
Endpoint: GET /api/playbooks/{name}
Description: Returns the content of a specific playbook file.
Parameters:
- name (path parameter, string): Playbook filename (e.g., demo.md)
Response: Plain text content of the playbook file.
Usage:
curl http://localhost:8080/api/playbooks/demo.md
Save Playbook
Endpoint: POST /api/playbooks/{name}
Description: Creates or updates a playbook file.
Parameters:
- name (path parameter, string): Playbook filename (e.g., my-playbook.md)
- Body: Plain text playbook content
Response: 200 OK on success.
Usage:
curl -X POST http://localhost:8080/api/playbooks/my-playbook.md \
-H "Content-Type: text/plain" \
--data-binary @my-playbook.md
Run Playbook
Endpoint: POST /api/playbook/run
Description: Associates a playbook with a future WebSocket session. When the session connects, the playbook will automatically be loaded.
Request Body (JSON):
{
"playbook": "demo.md",
"type": "webrtc",
"to": "sip:bob@example.com"
}
Or with inline content:
{
"content": "---\nname: inline-demo\n...",
"type": "webrtc"
}
Fields:
- playbook (string): Playbook filename to load from config/playbook/
- content (string): Inline YAML playbook content (alternative to playbook)
- type (string, optional): Call type hint
- to (string, optional): Callee address
Response:
{ "session_id": "s.uuid-here" }
Use the returned session_id as the id parameter when connecting the WebSocket.
Usage:
curl -X POST http://localhost:8080/api/playbook/run \
-H "Content-Type: application/json" \
-d '{"playbook": "demo.md"}'
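Once the run request returns a session_id, the client connects to one of the call endpoints with that value as the id query parameter. A small sketch of building that URL is shown here; callUrlForSession is a hypothetical helper, and the default path /call/webrtc is only an example choice among the documented call endpoints.

```javascript
// Build the WebSocket URL for a session returned by POST /api/playbook/run.
function callUrlForSession(baseUrl, sessionId, path = '/call/webrtc') {
  const url = new URL(path, baseUrl);
  url.searchParams.set('id', sessionId); // the server matches the playbook by this id
  return url.toString();
}

const wsUrl = callUrlForSession('ws://localhost:8080', 's.uuid-here');
console.log(wsUrl);
// A browser client would then open the connection with: new WebSocket(wsUrl)
```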
List Records
Endpoint: GET /api/records
Description: Returns a list of call event records (.events.jsonl files in the recorder directory).
Response:
[
{ "id": "s.session-uuid", "date": "2024-01-01T12:00:00Z", "duration": "0s", "status": "completed" }
]
Usage:
curl http://localhost:8080/api/records
Error Handling
All endpoints return appropriate HTTP status codes:
- 200 OK: Success
- 400 Bad Request: Invalid parameters
- 404 Not Found: Resource not found
- 500 Internal Server Error: Server error
WebSocket connections may be closed with specific close codes indicating the reason for disconnection.
Notes
- All WebSocket endpoints support real-time bidirectional communication
- Call sessions are automatically cleaned up when the WebSocket connection is closed
- Event dumping to file can be disabled by setting the dump_events=false query parameter
- ICE servers are automatically configured based on server configuration
- Audio codecs are automatically negotiated based on capabilities
- VAD (Voice Activity Detection) events are sent for speech detection
- ASR (Automatic Speech Recognition) provides real-time transcription
- TTS (Text-to-Speech) supports streaming synthesis
- All event timestamp fields are in milliseconds since the Unix epoch
- trackId is used to identify which audio track generated an event
- playId prevents interruption of previous TTS playback when the same ID is used. For TTS commands, playId is the specified identifier; for Play commands, playId is the URL
- Session IDs generated by the server are prefixed with s. (WebSocket sessions) or c. (CLI outbound calls)
- The ping_interval parameter controls heartbeat frequency (default 20s). Set to 0 to disable
- autoHangup automatically ends the call after TTS/playback completion