FunASR OpenAI-Compatible API Server

May 25, 2026

Drop-in replacement for OpenAI's /v1/audio/transcriptions endpoint. Works with any agent framework that supports the OpenAI audio API.

Quick Start

pip install funasr fastapi uvicorn python-multipart
python server.py --model sensevoice --device cuda --port 8000

The server takes roughly 20 seconds to start while the model loads. Check readiness with GET /health.
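Because startup takes a while, scripts that launch the server usually need to wait before sending audio. The following is a minimal sketch of such a wait loop using only the standard library. It assumes that GET /health answers with HTTP 200 once the server is usable; the function names `health_ok` and `wait_until_ready` are illustrative and not part of the example server.

```python
import time
import urllib.request
from typing import Callable


def health_ok(base_url: str) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeouts, and HTTP error statuses all land here
        # (urllib.error.URLError is a subclass of OSError).
        return False


def wait_until_ready(
    base_url: str = "http://localhost:8000",
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    probe: Callable[[str], bool] = health_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health probe until it succeeds or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready()` right after starting `server.py` blocks until the server responds or a minute passes. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.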

Need copy-paste integration snippets for Python SDK, JavaScript/TypeScript, HTTP clients, agent tools, a browser demo, Postman, OpenAPI imports, Kubernetes deployment, or Dify/n8n-style workflows? See Client recipes, JavaScript/TypeScript recipes, Gradio browser demo, workflow recipes, the Chinese workflow recipes, the Postman collection, the OpenAPI spec, the security and gateway guide, and the Kubernetes deployment template.

End-to-end smoke test

In another terminal, download a public sample and verify both health and transcription:

bash smoke_test.sh
# Cross-platform alternative without curl/bash:
python smoke_test.py

Equivalent manual commands:

curl -L https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav -o sample.wav
curl http://localhost:8000/health
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json

Browser demo with Gradio

If you want a local browser UI for upload or microphone testing, run the API server first and then launch the optional Gradio frontend:

pip install gradio
python gradio_app.py --base-url http://localhost:8000

The browser demo calls the same OpenAI-compatible API endpoints as the smoke tests. See Gradio browser demo for Docker, Kubernetes, and production notes.

Usage with OpenAI SDK (Python)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Basic transcription (the with-block closes the file after the upload)
with open("meeting.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="sensevoice",  # or "paraformer", "paraformer-en", "fun-asr-nano"
        file=audio,
    )
print(result.text)

# With timestamps/segments
with open("meeting.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="sensevoice",
        file=audio,
        response_format="verbose_json",
    )
# Returns: text, segments (with start/end/speaker), duration
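To make the verbose_json output easier to scan, the segments can be rendered one per line. This is a rough sketch that works on the response as a plain dict, for example the parsed JSON body from curl. The key names `segments`, `start`, `end`, `speaker`, and `text` are assumptions based on the comment above and on OpenAI's verbose format; check them against an actual response from your server.

```python
def format_segments(payload: dict) -> list:
    """Render each segment of a verbose_json payload as one readable line."""
    lines = []
    for seg in payload.get("segments") or []:
        start = float(seg.get("start", 0.0))
        end = float(seg.get("end", 0.0))
        prefix = f"[{start:.2f}s - {end:.2f}s]"
        speaker = seg.get("speaker")
        if speaker is not None:
            # Speaker labels are optional; include them only when present.
            prefix += f" speaker {speaker}:"
        text = str(seg.get("text", "")).strip()
        lines.append(f"{prefix} {text}")
    return lines


# Hypothetical payload, made up for illustration only.
sample = {
    "text": "hello world",
    "duration": 1.5,
    "segments": [{"start": 0.0, "end": 1.5, "speaker": 0, "text": " hello world "}],
}
for line in format_segments(sample):
    print(line)  # [0.00s - 1.50s] speaker 0: hello world
```

When using the OpenAI SDK, the result is an object rather than a dict; converting it with `result.model_dump()` should give a comparable structure, although extra fields such as the speaker label may depend on the SDK version.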

Usage with curl

curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=sensevoice

# With verbose output
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=sensevoice \
  -F response_format=verbose_json
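For environments where neither curl nor the OpenAI SDK is available, the same multipart upload can be built by hand. The sketch below uses only the Python standard library and mirrors the form fields from the curl commands above (`file`, `model`, `response_format`). The helper names are illustrative, and the code does not escape unusual characters in file names.

```python
import os
import urllib.request
import uuid


def build_multipart(fields: dict, filename: str, file_bytes: bytes,
                    file_field: str = "file") -> tuple:
    """Encode text fields plus one file as multipart/form-data.

    Returns (body, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode(),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, base_url: str = "http://localhost:8000",
               model: str = "sensevoice", response_format: str = "json") -> str:
    """POST one audio file to /v1/audio/transcriptions and return the raw body."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(
            {"model": model, "response_format": response_format},
            os.path.basename(path),
            fh.read(),
        )
    req = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `transcribe("audio.wav", response_format="verbose_json")` should return the same body as the second curl command.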

Available Models

Model	Speed (GPU)	Speed (CPU)	Languages	Features
`sensevoice`	170x realtime	17x realtime	zh/en/ja/ko/yue	Emotion detection
`paraformer`	120x realtime	15x realtime	zh/en	Punctuation
`paraformer-en`	120x realtime	15x realtime	en	English only
`fun-asr-nano`	17x realtime	3.6x realtime	31 languages	LLM-based, timestamps
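As a rough planning aid, the speed column can be turned into an expected processing time. The sketch below assumes that "Nx realtime" means N seconds of audio are processed per second of compute, so the estimate is the audio duration divided by N. It ignores model loading and upload time, and real throughput depends on hardware.

```python
# Realtime factors copied from the table above.
SPEEDS = {
    "sensevoice": {"gpu": 170.0, "cpu": 17.0},
    "paraformer": {"gpu": 120.0, "cpu": 15.0},
    "paraformer-en": {"gpu": 120.0, "cpu": 15.0},
    "fun-asr-nano": {"gpu": 17.0, "cpu": 3.6},
}


def estimated_seconds(audio_seconds: float, model: str, device: str) -> float:
    """Estimate processing time as audio duration / realtime factor."""
    return audio_seconds / SPEEDS[model][device]


# One hour of audio on each model and device.
for model, by_device in SPEEDS.items():
    for device in by_device:
        print(f"{model:14s} {device}: {estimated_seconds(3600, model, device):7.1f} s")
```

Under these assumptions, an hour of audio takes about 21 seconds with sensevoice on a GPU and about 1000 seconds with fun-asr-nano on a CPU.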

API Endpoints

Endpoint	Method	Description
`/v1/audio/transcriptions`	POST	Transcribe audio (OpenAI-compatible)
`/v1/models`	GET	List available models
`/health`	GET	Health check + loaded models
`/docs`	GET	Interactive API documentation (Swagger)

Prefer no-code API checks? Use the Gradio browser demo for local upload or microphone testing, or import the Postman collection and run health, model-list, and transcription requests from Postman. For API gateways, developer portals, or client generation, use the OpenAPI spec.

Agent Framework Integration

Works with LangChain, LlamaIndex, AutoGen, CrewAI, Semantic Kernel, Dify, n8n, or any framework that uses the OpenAI audio API. See Client recipes and JavaScript/TypeScript recipes for SDK and agent-tool patterns, plus workflow recipes for low-code HTTP nodes and webhook workers (also available in Chinese).

LangChain Example

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")

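def transcribe_for_agent(audio_path: str) -> str:
    """Tool function for a LangChain agent: transcribe one audio file to text."""
    # The with-block closes the file handle once the upload finishes.
    with open(audio_path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="sensevoice", file=audio
        )
    return result.text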

Docker Deployment

Build the example image from this directory. The default image starts in CPU mode so it can be used as a portable smoke test.

cd examples/openai_api
cp .env.example .env

docker compose up --build

Equivalent one-off docker run command:

docker build -t funasr-api .

docker run --rm -p 8000:8000 \
  -e FUNASR_DEVICE=cpu \
  -e FUNASR_MODEL=sensevoice \
  funasr-api

For GPU hosts, use the NVIDIA Container Toolkit and a CUDA-capable PyTorch/FunASR image. After adapting the image dependencies for CUDA, run the same server with FUNASR_DEVICE=cuda:

docker run --rm --gpus all -p 8000:8000 \
  -e FUNASR_DEVICE=cuda \
  -e FUNASR_MODEL=sensevoice \
  funasr-api

Verify the container from another terminal:

BASE_URL=http://localhost:8000 bash smoke_test.sh
python smoke_test.py --base-url http://localhost:8000

Kubernetes Deployment

Before sharing the service across a team or exposing it through a gateway, review the security and gateway guide for TLS, authentication, upload limits, rate limits, and logging.

For an internal cluster service with persistent model cache, health probes, and a private ClusterIP, start from the Kubernetes deployment template. Build and push the example image, apply the manifests, then verify through kubectl port-forward with python smoke_test.py --base-url http://localhost:8000.

Keep the default CPU mode until you have built a CUDA-capable image and configured GPU scheduling for your cluster.

Configuration

Arg	Default	Description
`--host`	0.0.0.0	Bind address
`--port`	8000	Port
`--device`	cuda	Device (cuda/cpu/mps)
`--model`	sensevoice	Pre-load model at startup

Docker environment variables:

Env	Default	Description
`FUNASR_PORT`	8000	Container port passed to `server.py`
`FUNASR_DEVICE`	cpu	Container device mode; set to `cuda` only when the image has CUDA-capable dependencies
`FUNASR_MODEL`	sensevoice	Model alias loaded at container startup
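The two tables relate to each other in a simple way: each container variable supplies the value for one server.py argument. The sketch below illustrates that mapping with the defaults listed above. It is not the image's actual entrypoint, which is not shown here, and the function name `server_args_from_env` is hypothetical.

```python
import os

# Environment variable -> (server.py flag, container default), per the tables above.
ENV_TO_FLAG = {
    "FUNASR_PORT": ("--port", "8000"),
    "FUNASR_DEVICE": ("--device", "cpu"),
    "FUNASR_MODEL": ("--model", "sensevoice"),
}


def server_args_from_env(env=None) -> list:
    """Build a server.py argument list from FUNASR_* environment variables."""
    env = os.environ if env is None else env
    args = []
    for name, (flag, default) in ENV_TO_FLAG.items():
        # Treat an unset or empty variable as "use the default".
        args += [flag, env.get(name) or default]
    return args


print(server_args_from_env({"FUNASR_DEVICE": "cuda"}))
# ['--port', '8000', '--device', 'cuda', '--model', 'sensevoice']
```

Note that the container default for the device is cpu, while running server.py directly defaults to cuda.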

Troubleshooting

If CUDA is unavailable, use --device cpu for a slower but simpler smoke test.
If port 8000 is occupied, start with --port 9000 and run BASE_URL=http://localhost:9000 bash smoke_test.sh or python smoke_test.py --base-url http://localhost:9000.
If the model download is slow, retry on a stable network or pre-download the model from ModelScope/Hugging Face.
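To find out whether a port is already taken before starting the server, a quick local check can help. This is a small standard-library sketch, not part of the example scripts: it tries to bind the port and reports whether that succeeds. The helper name `port_is_free` is illustrative.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
    return True


def first_free_port(candidates=(8000, 9000, 9001)) -> int:
    """Return the first candidate port that can be bound, or raise if none can."""
    for port in candidates:
        if port_is_free(port):
            return port
    raise RuntimeError("no free port among the candidates")
```

The result is only a snapshot, since another process could claim the port between the check and the server start, so treat it as a hint rather than a guarantee.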