faster-whisper

May 10, 2026

faster_whisper is the default RealtimeSTT transcription engine. It uses CTranslate2 through the faster-whisper package and supports the familiar Whisper model names plus local CTranslate2 model directories.

Install

Install the faster-whisper extra:

pip install "RealtimeSTT[faster-whisper]"

If you are working from a source checkout:

python -m pip install -e ".[faster-whisper]"
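
To confirm that the extra installed correctly, a quick check along these lines can help. This is a minimal sketch using only the standard library. The helper name is_installed is hypothetical, and the module names are assumed from the import lines used in this document.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumed top-level module names for RealtimeSTT and the faster-whisper package.
for name in ("RealtimeSTT", "faster_whisper"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

A "missing" line suggests the install ran in a different Python environment than the one executing the check.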

Basic Use

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="faster_whisper",
    model="small.en",
    device="cuda",
    compute_type="default",
)

For CPU:

recorder = AudioToTextRecorder(
    model="tiny.en",
    device="cpu",
    compute_type="int8",
)

Model Behavior

Known model names such as tiny, tiny.en, base, small, medium, large-v1, and large-v2 are downloaded automatically by faster-whisper. Use download_root to control the cache/download directory:

recorder = AudioToTextRecorder(
    model="small.en",
    download_root="models/faster-whisper",
)

You can also pass a path to a local CTranslate2-converted model directory as model.
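
The choice between a model name and a local directory can be sketched as follows. The helper resolve_model is hypothetical and not part of RealtimeSTT. It assumes that a converted CTranslate2 model directory contains a model.bin file, which is worth verifying against your own converted model.

```python
from pathlib import Path


def resolve_model(model: str) -> str:
    """Return an absolute path for a local model directory, else the value unchanged."""
    candidate = Path(model).expanduser()
    # Assumption: a CTranslate2-converted model directory contains model.bin.
    if candidate.is_dir() and (candidate / "model.bin").is_file():
        return str(candidate.resolve())
    # Anything else is treated as a model name for faster-whisper to download.
    return model


# A plain model name is returned as-is unless a matching directory exists.
print(resolve_model("small.en"))
```

The returned value can then be passed as model=... to AudioToTextRecorder.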

GPU Notes

Use device="cuda" for GPU inference. gpu_device_index accepts either a single integer or a list of GPU IDs; pass a list to load the model on several GPUs where the setup supports it.

compute_type controls CTranslate2 precision and quantization. Common values include:

  • default
  • float16
  • float32
  • int8
  • int8_float16

CPU runs are usually more practical with small models and compute_type="int8".
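
The guidance above can be turned into a small selection helper. This is a sketch under stated assumptions: choose_device_settings and detect_cuda are hypothetical names, float16 is picked for GPU only as one of the common values listed above (some GPUs may need a different value), and the CUDA check relies on ctranslate2.get_cuda_device_count() when CTranslate2 is importable.

```python
def choose_device_settings(cuda_available: bool) -> dict:
    """Pick device and compute_type following the notes above."""
    if cuda_available:
        # Assumption: float16 suits the GPU; fall back to "default" if it does not.
        return {"device": "cuda", "compute_type": "float16"}
    # int8 quantization keeps CPU inference practical.
    return {"device": "cpu", "compute_type": "int8"}


def detect_cuda() -> bool:
    """Best-effort CUDA check via CTranslate2; False if it cannot be loaded."""
    try:
        import ctranslate2
    except (ImportError, OSError):
        return False
    return ctranslate2.get_cuda_device_count() > 0


settings = choose_device_settings(detect_cuda())
print(settings)
```

The resulting dictionary can be unpacked into the recorder, for example AudioToTextRecorder(model="small.en", **settings).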

Common Options

| RealtimeSTT parameter | faster-whisper mapping |
| --- | --- |
| model | WhisperModel(model_size_or_path=...) |
| download_root | WhisperModel(download_root=...) |
| device | WhisperModel(device=...) |
| compute_type | WhisperModel(compute_type=...) |
| gpu_device_index | WhisperModel(device_index=...) |
| beam_size | model.transcribe(beam_size=...) |
| batch_size | Enables BatchedInferencePipeline when greater than 0. |
| language | Passed as the transcription language when set. |
| initial_prompt | Passed as initial_prompt. |
| suppress_tokens | Passed as suppress_tokens. |
| faster_whisper_vad_filter | Passed as vad_filter. |
| normalize_audio | Normalizes audio before transcription when enabled. |
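
To make the mapping concrete, the sketch below splits a dictionary of RealtimeSTT options into the keyword arguments that would reach WhisperModel(...) and model.transcribe(...). It only illustrates the table and is not RealtimeSTT's internal code. The names MODEL_KEYS, TRANSCRIBE_KEYS, and split_options are hypothetical, and treating an empty language string as "not set" is an assumption.

```python
# Left: RealtimeSTT parameter. Right: faster-whisper keyword from the table.
MODEL_KEYS = {
    "model": "model_size_or_path",
    "download_root": "download_root",
    "device": "device",
    "compute_type": "compute_type",
    "gpu_device_index": "device_index",
}
TRANSCRIBE_KEYS = {
    "beam_size": "beam_size",
    "language": "language",
    "initial_prompt": "initial_prompt",
    "suppress_tokens": "suppress_tokens",
    "faster_whisper_vad_filter": "vad_filter",
}


def split_options(options: dict) -> tuple[dict, dict, bool]:
    """Return (WhisperModel kwargs, transcribe kwargs, use batched pipeline)."""
    model_kwargs = {MODEL_KEYS[k]: v for k, v in options.items() if k in MODEL_KEYS}
    # Unset values (None or an empty string) are left out of the transcribe call.
    transcribe_kwargs = {
        TRANSCRIBE_KEYS[k]: v
        for k, v in options.items()
        if k in TRANSCRIBE_KEYS and v not in (None, "")
    }
    use_batched = options.get("batch_size", 0) > 0
    return model_kwargs, transcribe_kwargs, use_batched


model_kwargs, transcribe_kwargs, use_batched = split_options(
    {"model": "small.en", "device": "cuda", "gpu_device_index": 0,
     "beam_size": 5, "language": "", "faster_whisper_vad_filter": True,
     "batch_size": 16}
)
print(model_kwargs)
print(transcribe_kwargs)
print(use_batched)
```

normalize_audio is left out because the table describes it as a preprocessing step rather than a faster-whisper argument.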

Realtime Suggestions

Use a smaller realtime model than the final model:

recorder = AudioToTextRecorder(
    model="small.en",
    enable_realtime_transcription=True,
    realtime_model_type="tiny.en",
    realtime_processing_pause=0.15,
)

For a single shared model, set use_main_model_for_realtime=True. This saves memory but can reduce responsiveness when final and realtime work contend for the same model.
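
The two realtime setups can be compared side by side with a small options builder. This is a sketch, not a recommended configuration: realtime_options and build_recorder are hypothetical helpers, and the parameter names and values are taken from the examples in this section.

```python
def realtime_options(final_model: str, share_model: bool,
                     realtime_model: str = "tiny.en") -> dict:
    """Build recorder keyword arguments for either realtime setup."""
    options = {
        "model": final_model,
        "enable_realtime_transcription": True,
        "realtime_processing_pause": 0.15,
    }
    if share_model:
        # One model serves both final and realtime work: less memory, more contention.
        options["use_main_model_for_realtime"] = True
    else:
        # A separate, smaller model handles realtime updates.
        options["realtime_model_type"] = realtime_model
    return options


def build_recorder(options: dict):
    """Create the recorder; imported lazily so the builder works without RealtimeSTT."""
    from RealtimeSTT import AudioToTextRecorder
    return AudioToTextRecorder(**options)


print(realtime_options("small.en", share_model=False))
print(realtime_options("small.en", share_model=True))
```

Calling build_recorder(realtime_options("small.en", share_model=True)) would then create a recorder that uses the single shared model.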

Troubleshooting

  • If CUDA libraries fail to load, reinstall the PyTorch and torchaudio builds that match the CUDA version installed on the machine.
  • If model downloads fail, set download_root to a writable directory and test network access to the Hugging Face Hub.
  • If realtime text lags, use a smaller realtime model, lower beam_size_realtime, increase realtime_processing_pause, or switch realtime to a CPU-friendly engine.
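
For the download case, a quick way to rule out a permissions problem is to test whether the intended download_root can be created and written to. The helper is_writable_dir is hypothetical and uses only the standard library. It does not test network access to the Hugging Face Hub.

```python
import tempfile
from pathlib import Path


def is_writable_dir(path: str) -> bool:
    """Create the directory if needed and try writing a temporary file in it."""
    target = Path(path)
    try:
        target.mkdir(parents=True, exist_ok=True)
        # The temporary file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=target):
            pass
    except OSError:
        return False
    return True


# Demonstrated in a throwaway directory; substitute your own download_root.
with tempfile.TemporaryDirectory() as tmp:
    print(is_writable_dir(str(Path(tmp) / "models" / "faster-whisper")))
```

If the check returns False for your directory, point download_root at a location the current user can write to before retrying the download.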