Whisper Benchmark for Apple Silicon
August 4, 2025 ยท View on GitHub
A comprehensive benchmarking tool to compare different Whisper implementations optimized for Apple Silicon, focusing on speed while maintaining accuracy. Now supports 8 different implementations including native Swift frameworks and MLX-based solutions.
Example output
Test results from MacBook Pro M4 24GB:
=== Benchmark Summary for 'large' model ===
Implementation Avg Time (s) Parameters
--------------------------------------------------------------------------------
fluidaudio-coreml 0.1935 model=parakeet-tdt-0.6b-v2-coreml, backend=FluidAudio Swift Bridge, platform=Apple Silicon
"Which is the fastest transcription on my Mac?"
parakeet-mlx 0.4995 model=parakeet-tdt-0.6b-v2, implementation=parakeet-mlx, platform=Apple Silicon (MLX)
"Which is the fastest transcription on my Mac?"
mlx-whisper 1.0230 model=whisper-large-v3-turbo, quantization=none
"Which is the fastest transcription on my Mac?"
insanely-fast-whisper 1.1324 model=whisper-large-v3-turbo, device_id=mps, batch_size=12, compute_type=float16, quantization=4bit
"Which is the fastest transcription on my Mac?"
whisper.cpp 1.2293 model=large-v3-turbo-q5_0, coreml=True, n_threads=4
"Which is the fastest transcription on my Mac?"
lightning-whisper-mlx 1.8160 model=large, batch_size=12, quant=none
"which is the fastest transcription on my Mac"
whisperkit 2.2190 model=large-v3, backend=WhisperKit Swift Bridge, platform=Apple Silicon
"Which is the fastest transcription on my Mac?"
whisper-mps 5.3722 model=large, backend=whisper-mps, device=mps, language=None
"Which is the fastest transcription on my Mac?"
faster-whisper 6.9613 model=large-v3-turbo, device=cpu, compute_type=int8, beam_size=1, cpu_threads=12, original_model_requested=large
"Which is the fastest transcription on my Mac?"
Demo: https://x.com/anvanvan/status/1913624854584037443
Overview
This tool measures transcription performance across different implementations of the same base model (e.g., all variants of "small"). It helps you find the fastest Whisper implementation on your Apple Silicon Mac for a given base model. With 8 different implementations including native Swift frameworks, MLX-based solutions, and CPU-optimized variants, you can find the perfect balance of speed and accuracy for your use case.
Features
- Live speech recording with automatic audio preprocessing
- 8 different Whisper implementations with Apple Silicon optimizations
- Base model selection (tiny, small, base, medium, large) with automatic fallback chains
- Transcription quality comparison - see actual transcription text alongside performance metrics
- Apple Silicon-specific optimizations with up to 16% performance improvements
- Native Swift bridge support for WhisperKit and FluidAudio frameworks
- Comprehensive parameter reporting showing actual models and configurations used
Implementations Tested
๐ Native Apple Silicon Implementations
-
WhisperKit โก Apple Silicon Native
- Source: https://github.com/argmaxinc/WhisperKit
- Technology: Native Swift + CoreML, optimized for Apple Neural Engine
- Performance: Fastest implementation, leverages on-device inference
- Bridge: Custom Swift bridge for seamless Python integration
-
FluidAudio CoreML โก Apple Silicon Native
- Source: https://github.com/FluidInference/FluidAudio
- Technology: Native Swift + CoreML with Parakeet TDT models
- Performance: Real-time streaming ASR with ~110x RTF on M4 Pro
- Bridge: Custom Swift bridge with internal timing measurement
๐ฅ MLX-Accelerated Implementations
-
mlx-whisper
- Source: https://github.com/ml-explore/mlx-examples
- Technology: Apple MLX framework for unified memory optimization
- Models: Quantized MLX models (4-bit, 8-bit) for memory efficiency
-
Parakeet MLX โก New Implementation
- Source: https://github.com/nvidia/parakeet
- Technology: NVIDIA Parakeet models running on Apple MLX
- Performance: Optimized for Apple Silicon with MLX acceleration
-
lightning-whisper-mlx โก Apple Silicon Optimized
- Source: https://github.com/lightning-AI/lightning-whisper
- Optimization: 4-bit quantization enabled, optimized batch sizing
- Performance: Significant memory efficiency improvements
โก GPU/MPS-Accelerated Implementations
-
insanely-fast-whisper โก Apple Silicon Optimized
- Source: https://github.com/Vaibhavs10/insanely-fast-whisper
- Optimization: Adaptive batch sizing, SDPA attention, memory layout optimization
- Performance: 16.0% faster on Apple Silicon with MPS acceleration
-
whisper-mps โก New Implementation
- Source: https://github.com/AtomGradient/whisper-mps
- Technology: Direct MPS (Metal Performance Shaders) acceleration
- Performance: Optimized for Apple Silicon GPU acceleration
๐ฅ๏ธ CPU-Optimized Implementations
-
faster-whisper โก Apple Silicon Optimized
- Source: https://github.com/SYSTRAN/faster-whisper
- Optimization: Dynamic CPU thread allocation, Apple Accelerate framework
- Performance: 2.0% faster with intelligent core detection (performance vs efficiency cores)
-
whisper.cpp + CoreML
- Source: https://github.com/abdeladim-s/pywhispercpp
- Technology: C++ implementation with optional CoreML acceleration
- Performance: Excellent baseline performance with CoreML support
Installation
# Clone the repository
git clone https://github.com/anvanvan/mac-whisper-speedtest.git
cd mac-whisper-speedtest
# Install dependencies (Python 3.11+ required)
uv sync
# Build Swift bridges (required for native implementations)
# WhisperKit bridge
cd tools/whisperkit-bridge && swift build -c release && cd ../..
# FluidAudio bridge (optional - only if you want FluidAudio support)
cd tools/fluidaudio-bridge && swift build -c release && cd ../..
Usage
# Run benchmark with default settings (small model, all implementations)
.venv/bin/mac-whisper-speedtest
# Run benchmark with a specific model
.venv/bin/mac-whisper-speedtest --model small
# Run benchmark with specific implementations
.venv/bin/mac-whisper-speedtest --model small --implementations "WhisperKitImplementation,FluidAudioCoreMLImplementation,ParakeetMLXImplementation"
# Test only the fastest native implementations
.venv/bin/mac-whisper-speedtest --model small --implementations "WhisperKitImplementation,FluidAudioCoreMLImplementation"
# Test MLX-based implementations
.venv/bin/mac-whisper-speedtest --model small --implementations "MLXWhisperImplementation,ParakeetMLXImplementation,LightningWhisperMLXImplementation"
# Run benchmark with more runs for statistical accuracy
.venv/bin/mac-whisper-speedtest --model small --num-runs 5
Features
Universal Transcription Display ๐ฏ
All implementations display their transcription results in the benchmark summary, allowing you to compare both performance and transcription quality across different implementations.
Key features:
- โ Universal: Works with all 9 supported implementations including native Swift bridges
- โ Smart formatting: Long text is truncated, empty results show "(no transcription)"
- โ Clean display: Consistent indentation and formatting across all implementations
- โ Model transparency: Shows actual models used (including fallback substitutions)
- โ Bridge timing: Native implementations report internal transcription time (excluding bridge overhead)
- โ Performance focus: Transcription display doesn't interfere with timing comparisons
Requirements
- macOS with Apple Silicon (M1/M2/M3/M4) - Required for optimal performance
- macOS 14.0+ (for WhisperKit and FluidAudio native support)
- Xcode 15.0+ (for building Swift bridges)
- Python 3.11+ (updated requirement for latest dependencies)
- PyAudio and its dependencies (for audio recording)
- Swift Package Manager (for building native bridges)
- Various Whisper implementations (installed automatically via uv)
Project Structure
mac-whisper-speedtest/
โโโ pyproject.toml # Updated dependencies (Python 3.11+)
โโโ docs/
โ โโโ APPLE_SILICON_OPTIMIZATIONS.md # Detailed optimization guide
โโโ src/
โ โโโ mac_whisper_speedtest/
โ โโโ __init__.py
โ โโโ audio.py # Audio recording/processing
โ โโโ benchmark.py # Enhanced benchmarking with transcription display
โ โโโ implementations/ # Individual implementation wrappers
โ โ โโโ __init__.py # Implementation registry (9 implementations)
โ โ โโโ base.py # Abstract base class
โ โ โโโ coreml.py # WhisperCpp with CoreML
โ โ โโโ faster.py # Faster Whisper (Apple Silicon optimized)
โ โ โโโ insanely.py # Insanely Fast Whisper (Apple Silicon optimized)
โ โ โโโ mlx.py # MLX Whisper
โ โ โโโ lightning.py # Lightning Whisper MLX (4-bit quantization)
โ โ โโโ whisperkit.py # WhisperKit (Swift bridge)
โ โ โโโ fluidaudio_coreml.py # FluidAudio CoreML (Swift bridge)
โ โ โโโ parakeet_mlx.py # Parakeet MLX (new implementation)
โ โ โโโ whisper_mps.py # whisper-mps (new implementation)
โ โโโ cli.py # Command line interface
โโโ tools/
โ โโโ whisperkit-bridge/ # Swift bridge for WhisperKit
โ โ โโโ Package.swift
โ โ โโโ Sources/whisperkit-bridge/main.swift
โ โโโ fluidaudio-bridge/ # Swift bridge for FluidAudio
โ โโโ Package.swift
โ โโโ Sources/fluidaudio-bridge/main.swift
โโโ tests/
โ โโโ test_model_params.py # Model parameter validation tests
โ โโโ test_parakeet_integration.py # Parakeet MLX integration tests
โโโ README.md
Version 2.0 (Latest)
- โ 4 new implementations: WhisperKit, FluidAudio CoreML, Parakeet MLX, whisper-mps
- โ Native Swift bridges: Direct integration with macOS frameworks
- โ Enhanced Apple Silicon optimizations: Up to 16% performance improvements
- โ Transcription quality comparison: See actual transcription text in results
- โ Model fallback chains: Automatic fallback for unavailable models (e.g., large โ large-v3-turbo โ large-v3)
- โ Python 3.11+ support: Updated dependencies and requirements
License
MIT