Migrate from Whisper or Cloud ASR to FunASR

May 25, 2026 · View on GitHub

Use this guide when you already have a Whisper, OpenAI/Cloud ASR, or custom speech pipeline and want to decide whether FunASR is worth switching to. The goal is not to prove a benchmark with one sample file; it is to compare quality, speed, cost, and deployment fit on audio that looks like your real workload.

When FunASR is a good fit

FunASR is usually worth evaluating when you need one or more of these properties:

Private or self-hosted transcription where audio should stay inside your environment.
High-throughput long-form transcription for meetings, archives, media, or call recordings.
Speaker-aware transcripts with VAD, punctuation, timestamps, and diarization in one pipeline.
An OpenAI-compatible audio endpoint for agents, Dify, LangChain, AutoGen, or internal apps.
Streaming ASR or live captions with WebSocket/runtime service support.
CPU-viable smoke tests before moving to GPU deployment.

Stay on your current pipeline if you need a managed service with no operations work, a vendor SLA, or a language/domain that your own benchmark shows FunASR does not handle well enough yet.

Fast evaluation plan

Pick 20-50 representative audio files. Include short clips, long recordings, noisy samples, different speakers, and the languages or dialects you care about.
Run your current Whisper or cloud ASR pipeline exactly as you use it in production. Save transcripts, latency, cost, and failure cases.
Run FunASR locally with the README quick start, or use the migration benchmark example to measure a representative audio folder. Then choose a deployment path from the deployment matrix.
Compare output with human review or your normal WER/CER process. Do not compare only one clean demo file.
Run the OpenAI-compatible API smoke test if your application already uses OpenAI-style clients.
Record warmup time, model download time, device, GPU/CPU type, batch size, and audio duration separately from steady-state throughput.

Feature mapping

Existing workflow	FunASR path	What to validate
Whisper file transcription	README quick start and model selection guide with SenseVoice, Paraformer, or Fun-ASR-Nano	Transcript quality, timestamps, speed, model download, CPU/GPU behavior.
Whisper plus pyannote	`spk_model="cam++"` with VAD and punctuation	Speaker labels, speaker changes, overlapping speech, long silences.
OpenAI audio API or cloud batch ASR	OpenAI-compatible API example	`/v1/audio/transcriptions`, response format, client compatibility, upload limits.
Dify/LangChain/AutoGen agent audio	Client recipes or MCP server	Tool latency, file handling, auth boundary, error reporting.
Live captions or call-center streaming	Runtime service docs	Chunking, endpointing, reconnects, backpressure, partial/final result behavior.
Subtitle generation	Subtitle example	Segment readability, line length, speaker labels, SRT/VTT compatibility.
Offline archive processing	Batch ASR example	Manifest handling, retries, progress logs, throughput, failed-file recovery.

Minimal local comparison

Install FunASR and run the same file you used for your baseline:

pip install funasr

from funasr import AutoModel

model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",
    spk_model="cam++",
    device="cuda",  # use "cpu" for a portable smoke test
)
result = model.generate(input="sample.wav")
print(result)

For an API-style comparison:

pip install funasr fastapi uvicorn python-multipart
funasr-server --model sensevoice --device cuda

curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json

If you want a repeatable folder-level benchmark, run examples/migration/benchmark_funasr.py to produce results.jsonl and summary.md for your own audio set. For a container smoke test, start from examples/openai_api/docker-compose.yml and verify it with BASE_URL=http://localhost:8000 bash examples/openai_api/smoke_test.sh.

Quality and speed checklist

Track these fields for both the old pipeline and FunASR:

Audio duration, language, domain, sample rate, channel count, and speaker count.
Model name, model version, FunASR version, Python/PyTorch/CUDA versions, and Docker image tag if used.
Hardware, device mode, batch size, streaming chunk size, and whether warmup/model download is excluded.
WER/CER or human review notes for names, numbers, punctuation, diarization, timestamps, and domain terms.
Latency, throughput, GPU/CPU memory, cost per hour of audio, and failed-file rate.
Operational requirements: authentication, upload limits, TLS, logs, monitoring, retries, and retention rules.

Rollout checklist

Keep the old pipeline available until FunASR passes your representative benchmark.
Start with an internal endpoint or batch job before exposing a public API.
Add request IDs and log audio duration, model, device, latency, and error type for every request.
Pin the model alias and deployment command in your runbook.
Test noisy audio, silence, overlapping speakers, long files, non-UTF-8 filenames, and network interruptions.
Open a Deployment Help issue with your command, logs, model, device, and sample characteristics if you hit a blocker.

If FunASR replaces or complements your existing ASR stack, consider opening a Migration Benchmark Report or showcase issue. Migration reports with hardware, speed, quality notes, and deployment details help new users choose the right path faster.