Migrate from Whisper or Cloud ASR to FunASR
May 25, 2026
Use this guide when you already have a Whisper, OpenAI/Cloud ASR, or custom speech pipeline and want to decide whether FunASR is worth switching to. The goal is not to prove a benchmark with one sample file; it is to compare quality, speed, cost, and deployment fit on audio that looks like your real workload.
When FunASR is a good fit
FunASR is usually worth evaluating when you need one or more of these properties:
- Private or self-hosted transcription where audio should stay inside your environment.
- High-throughput long-form transcription for meetings, archives, media, or call recordings.
- Speaker-aware transcripts with VAD, punctuation, timestamps, and diarization in one pipeline.
- An OpenAI-compatible audio endpoint for agents, Dify, LangChain, AutoGen, or internal apps.
- Streaming ASR or live captions with WebSocket/runtime service support.
- CPU-viable smoke tests before moving to GPU deployment.
Stay on your current pipeline if you need a managed service with no operations work, a vendor SLA, or a language/domain that your own benchmark shows FunASR does not handle well enough yet.
Fast evaluation plan
- Pick 20-50 representative audio files. Include short clips, long recordings, noisy samples, different speakers, and the languages or dialects you care about.
- Run your current Whisper or cloud ASR pipeline exactly as you use it in production. Save transcripts, latency, cost, and failure cases.
- Run FunASR locally with the README quick start, or use the migration benchmark example to measure a representative audio folder. Then choose a deployment path from the deployment matrix.
- Compare output with human review or your normal WER/CER process. Do not compare only one clean demo file.
- Run the OpenAI-compatible API smoke test if your application already uses OpenAI-style clients.
- Record warmup time, model download time, device, GPU/CPU type, batch size, and audio duration separately from steady-state throughput.
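The last step above, keeping warmup separate from steady-state throughput, can be sketched as a small timing harness. This is a minimal sketch, not part of FunASR: `benchmark` and the `transcribe` callable are hypothetical names, and `transcribe` is assumed to wrap whichever pipeline you are measuring (old or new) behind the same one-argument interface.

```python
import time
from statistics import mean


def benchmark(transcribe, files, durations_s, warmup_runs=1):
    """Time `transcribe(path)` per file, reporting warmup separately.

    transcribe  -- hypothetical callable wrapping either pipeline
    files       -- list of audio paths
    durations_s -- audio duration in seconds for each path, same order
    """
    if not files:
        raise ValueError("need at least one file")

    # Warmup: the first calls often include model load and cache fill.
    start = time.perf_counter()
    for _ in range(warmup_runs):
        transcribe(files[0])
    warmup_s = time.perf_counter() - start

    # Steady state: time every file after the warmup calls.
    latencies = []
    for path in files:
        start = time.perf_counter()
        transcribe(path)
        latencies.append(time.perf_counter() - start)

    total_audio = sum(durations_s)
    total_time = sum(latencies)
    return {
        "warmup_s": warmup_s,
        "mean_latency_s": mean(latencies),
        "max_latency_s": max(latencies),
        # Real-time factor: processing time / audio time (lower is faster).
        "rtf": total_time / total_audio if total_audio else float("inf"),
    }
```

Running the same harness against both pipelines on the same files keeps the comparison symmetric. Model download time is not covered here and is best measured once, separately.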
Feature mapping
| Existing workflow | FunASR path | What to validate |
|---|---|---|
| Whisper file transcription | README quick start and model selection guide with SenseVoice, Paraformer, or Fun-ASR-Nano | Transcript quality, timestamps, speed, model download, CPU/GPU behavior. |
| Whisper plus pyannote | spk_model="cam++" with VAD and punctuation | Speaker labels, speaker changes, overlapping speech, long silences. |
| OpenAI audio API or cloud batch ASR | OpenAI-compatible API example | /v1/audio/transcriptions, response format, client compatibility, upload limits. |
| Dify/LangChain/AutoGen agent audio | Client recipes or MCP server | Tool latency, file handling, auth boundary, error reporting. |
| Live captions or call-center streaming | Runtime service docs | Chunking, endpointing, reconnects, backpressure, partial/final result behavior. |
| Subtitle generation | Subtitle example | Segment readability, line length, speaker labels, SRT/VTT compatibility. |
| Offline archive processing | Batch ASR example | Manifest handling, retries, progress logs, throughput, failed-file recovery. |
Minimal local comparison
Install FunASR and run the same file you used for your baseline:
```bash
pip install funasr
```

```python
from funasr import AutoModel

model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",
    spk_model="cam++",
    device="cuda",  # use "cpu" for a portable smoke test
)
result = model.generate(input="sample.wav")
print(result)
```
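To compare against your baseline, it helps to store both pipelines' output in one format instead of reading printed results. The sketch below is an illustration under an assumption: it treats the return value of `generate` as a list of dicts with a `"text"` field and an optional `"key"` field, which you should confirm against the FunASR version you install. `results_to_jsonl` is a hypothetical helper name.

```python
import json


def results_to_jsonl(results, out_path, pipeline="funasr"):
    """Write one JSON line per result so both pipelines share a format.

    Assumes each item is a dict with a "text" field and, optionally, a
    "key" field naming the input; adjust if your version differs.
    """
    rows = []
    for index, item in enumerate(results):
        rows.append({
            "pipeline": pipeline,
            "key": item.get("key", f"item-{index}"),
            "text": item.get("text", ""),
        })
    with open(out_path, "w", encoding="utf-8") as handle:
        for row in rows:
            # ensure_ascii=False keeps non-Latin transcripts readable.
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")
    return rows
```

Writing the baseline transcripts with the same helper (for example with `pipeline="whisper"`) gives you two files that can be joined on `key` for review.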
For an API-style comparison:
```bash
pip install funasr fastapi uvicorn python-multipart

# Start the server and leave it running.
funasr-server --model sensevoice --device cuda

# From a second terminal, send the same sample file.
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json
```
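If your application already uses the `openai` Python package, the same request can be sketched by pointing the client at the local server. This is a hedged example, not the project's documented client: the `/v1` base URL and `sensevoice` model alias are taken from the curl call above, the placeholder API key assumes the local server does not check it, and `summarize_transcription` assumes the response follows OpenAI's `verbose_json` layout (`text` plus `segments` with `start`/`end`). Both function names are hypothetical.

```python
def transcribe_via_api(path, base_url="http://localhost:8000/v1", model="sensevoice"):
    """Send one file to an OpenAI-compatible transcription endpoint."""
    # Imported lazily so the helper below works without the package installed.
    from openai import OpenAI

    # A local server may ignore the key, but the client requires a value.
    client = OpenAI(base_url=base_url, api_key="not-needed")
    with open(path, "rb") as audio:
        response = client.audio.transcriptions.create(
            model=model,
            file=audio,
            response_format="verbose_json",
        )
    return response.model_dump()


def summarize_transcription(payload):
    """Reduce a verbose_json-style payload to fields worth comparing.

    Assumes the OpenAI layout: top-level "text" plus optional "segments"
    with "start", "end", and "text"; a server may omit or rename these.
    """
    segments = payload.get("segments") or []
    return {
        "text": payload.get("text", ""),
        "segment_count": len(segments),
        "last_end_s": max((seg.get("end", 0.0) for seg in segments), default=0.0),
    }
```

Comparing `summarize_transcription` output from your current provider and from the local server is a quick way to spot missing segments or timestamps before testing a full client integration.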
If you want a repeatable folder-level benchmark, run examples/migration/benchmark_funasr.py to produce results.jsonl and summary.md for your own audio set. For a container smoke test, start from examples/openai_api/docker-compose.yml and verify it with BASE_URL=http://localhost:8000 bash examples/openai_api/smoke_test.sh.
Quality and speed checklist
Track these fields for both the old pipeline and FunASR:
- Audio duration, language, domain, sample rate, channel count, and speaker count.
- Model name, model version, FunASR version, Python/PyTorch/CUDA versions, and Docker image tag if used.
- Hardware, device mode, batch size, streaming chunk size, and whether warmup/model download is excluded.
- WER/CER or human review notes for names, numbers, punctuation, diarization, timestamps, and domain terms.
- Latency, throughput, GPU/CPU memory, cost per hour of audio, and failed-file rate.
- Operational requirements: authentication, upload limits, TLS, logs, monitoring, retries, and retention rules.
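For the WER/CER field, a plain edit-distance implementation is enough for a first pass when no scoring tool is already in place. The sketch below is a generic Levenshtein-based calculation, not FunASR's own scorer. It applies no text normalization (case, punctuation, number formatting), so normalize both transcripts the same way your usual process does before scoring.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring whitespace; useful for e.g. Chinese."""
    ref_chars = list("".join(reference.split()))
    if not ref_chars:
        raise ValueError("reference transcript is empty")
    hyp_chars = list("".join(hypothesis.split()))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Score the old pipeline and FunASR against the same human reference. A difference between the two hypotheses alone does not say which one is wrong.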
Rollout checklist
- Keep the old pipeline available until FunASR passes your representative benchmark.
- Start with an internal endpoint or batch job before exposing a public API.
- Add request IDs and log audio duration, model, device, latency, and error type for every request.
- Pin the model alias and deployment command in your runbook.
- Test noisy audio, silence, overlapping speakers, long files, non-UTF-8 filenames, and network interruptions.
- Open a Deployment Help issue with your command, logs, model, device, and sample characteristics if you hit a blocker.
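The request-ID and logging item in the checklist above can be sketched as a thin wrapper around whatever function performs the transcription. This is an illustrative pattern using only the standard library. `log_request`, the `transcribe` callable, and the logger name `asr.requests` are hypothetical, and the record fields simply mirror the ones listed in the checklist.

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("asr.requests")


def log_request(transcribe, path, audio_duration_s, model, device):
    """Run one transcription and emit a structured log record for it."""
    record = {
        "request_id": uuid.uuid4().hex,
        "audio_duration_s": audio_duration_s,
        "model": model,
        "device": device,
        "error_type": None,
    }
    start = time.perf_counter()
    result = None
    try:
        result = transcribe(path)
    except Exception as exc:
        # Record the failure class; the caller sees result=None.
        record["error_type"] = type(exc).__name__
    record["latency_s"] = round(time.perf_counter() - start, 4)
    logger.info(json.dumps(record))
    return result, record
```

This version swallows the exception so a batch job can continue. A request handler would more likely log the record and then re-raise so the client receives an error response.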
Share the result
If FunASR replaces or complements your existing ASR stack, consider opening a Migration Benchmark Report or showcase issue. Migration reports with hardware, speed, quality notes, and deployment details help new users choose the right path faster.