README.md

June 16, 2026

WhisperSubs is a Jellyfin plugin that automatically generates subtitles for your media library using local AI models. All transcription runs entirely on your server -- no audio data ever leaves your network. Your media stays private.

Features

  • Fully Local Processing -- Audio is transcribed on your hardware using whisper.cpp. No cloud APIs, no external services, no data exfiltration.
  • Built-in Engine Setup -- Download whisper-cli binaries and models directly from the plugin settings page on Linux. No manual installation needed for most users.
  • Automatic Language Detection -- Reads audio stream metadata to detect the spoken language and generate matching subtitles. Falls back to whisper's built-in language detection when tags are absent.
  • Forced Subtitles -- Detect and transcribe only foreign-language dialogue (e.g., French lines in an English movie) via VAD-based speech segmentation and per-chunk language detection.
  • Lyrics Generation (Experimental) -- Generate .lrc lyrics files for music libraries via whisper transcription. Jellyfin picks up .lrc files automatically.
  • GPU Acceleration -- Supports CUDA (NVIDIA), Vulkan (Intel / AMD / NVIDIA), and ROCm (AMD) for significantly faster transcription.
  • Priority Queue -- Manual requests are queued with priority and processed before scheduled items. Queue persists across restarts.
  • Real-time Progress -- Live progress banner in the admin UI showing current item, phase (extracting audio, transcribing), per-file progress, and overall stats.
  • Subtitle Resume -- If transcription is interrupted, it resumes from the last timestamp rather than starting over.
  • Admin Dashboard UI -- Browse libraries, view items, manage the whisper engine, and trigger subtitle generation directly from the Jellyfin admin panel.
  • Scheduled Tasks -- Enable automatic scanning so new media gets subtitles without manual intervention. Runs daily at 2:00 AM and on startup by default.
  • Per-Library Control -- Choose which libraries are monitored for automatic subtitle generation.
  • Multiple Output Formats -- Generates .srt subtitles, .forced.generated.srt forced subtitles, and .lrc lyrics, all placed alongside your media and auto-detected by Jellyfin.

Prerequisites

| Dependency | Details |
| --- | --- |
| Jellyfin | 10.11.0 or later |
| FFmpeg | Bundled with Jellyfin (/usr/lib/jellyfin-ffmpeg/ffmpeg) or available in PATH. Used to extract audio from media files. |
| whisper.cpp | The whisper-cli binary. On Linux, the plugin can download this automatically from the settings page. Otherwise, install manually -- see Installing whisper.cpp. |
| Whisper Model | A GGML model file. The plugin can download models automatically from the settings page, or download manually from Hugging Face. |

Quick start (Linux): After installing the plugin, go to Dashboard > Plugins > WhisperSubs. The Whisper Engine section lets you download both the binary and a model with one click each. The manual steps below are only needed for non-Linux platforms or custom setups.

Installation

  1. In Jellyfin, go to Dashboard > Plugins > Repositories.
  2. Add a new repository with this URL:
    https://geiserx.github.io/whisper-subs/manifest.json
    
  3. Go to Catalog, find WhisperSubs, and click Install.
  4. Restart Jellyfin.

Manual Installation

  1. Build from source:
    dotnet build --configuration Release
    
  2. Copy WhisperSubs.dll to your Jellyfin plugins directory:
    /var/lib/jellyfin/plugins/WhisperSubs/
    
  3. Restart Jellyfin.

Installing whisper.cpp

The plugin requires whisper.cpp for transcription. Choose the method that matches your setup.

Option A: Pre-built Binary

  1. Download the latest release for your platform from whisper.cpp releases.
  2. Extract and place the whisper-cli binary somewhere persistent (e.g., /opt/whisper/).
  3. Download a model:
    mkdir -p /opt/whisper/models
    
    # Base model (~148 MB) -- fast, good for quick transcription
    wget -O /opt/whisper/models/ggml-base.bin \
      https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
    
    # Large V3 Turbo (~1.6 GB) -- best accuracy with reasonable speed (recommended)
    wget -O /opt/whisper/models/ggml-large-v3-turbo.bin \
      https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
    
  4. In the plugin settings, set Whisper Binary Path to /opt/whisper/whisper-cli and Whisper Model Path to the model file.

Option B: Build from Source (CPU only)

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build -DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release -j$(nproc)
# Binary will be at build/bin/whisper-cli

Option C: Build from Source with GPU Acceleration

See GPU Acceleration below for detailed instructions.

Docker / Container Setups

If Jellyfin runs in a Docker container, whisper.cpp must be accessible inside the container. The recommended approach is to bind-mount a host directory containing the binary and model:

# docker-compose.yml
services:
  jellyfin:
    image: jellyfin/jellyfin
    volumes:
      - /opt/whisper:/opt/whisper:ro   # whisper-cli binary + models
      # ... your other volumes

Then configure the plugin with:

  • Whisper Binary Path: /opt/whisper/whisper-cli
  • Whisper Model Path: /opt/whisper/models/ggml-large-v3-turbo.bin

Note: The binary must be compiled for the same architecture as the container (typically x86_64 Linux). Download the linux-x64 release asset or build inside a matching environment.

Container Library Requirements

The plugin's built-in binary downloader fetches pre-built whisper-cli binaries. These require runtime libraries that are not included in the default Jellyfin Docker image:

| Variant | Required packages | Install command |
| --- | --- | --- |
| CPU | libgomp1 | apt install libgomp1 |
| CPU (Compatibility) | none (self-contained) | n/a |
| Vulkan | libgomp1, libvulkan1, mesa-vulkan-drivers | apt install libgomp1 libvulkan1 mesa-vulkan-drivers |
| CUDA 12 | libgomp1 + NVIDIA Container Toolkit on host | See CUDA section |
| ROCm | libgomp1 + ROCm runtime | See ROCm docs |

Minimal containers (TrueNAS Scale, slim Docker): The default cpu build links libgomp1 (OpenMP) for best performance. If that library is missing, the plugin automatically falls back to the noavx build, which is compiled with OpenMP off and has no such dependency — so transcription still works out of the box, just without the small OpenMP speed-up.

Older / low-power CPUs (no AVX): The cpu, vulkan and cuda12 builds are compiled with AVX/AVX2 instructions and will crash with an illegal instruction error (exit 132) on CPUs that lack them — common on budget NAS boxes, Atom/Celeron chips and some VMs. The plugin detects missing AVX support and automatically uses the noavx (Compatibility) build instead, both when recommending a variant and as a download-time fallback. You can also pick "CPU (Compatibility)" manually on the setup page.

For the faster build, install libgomp1 persistently via your container's entrypoint or Dockerfile:

apt-get update -qq && apt-get install -y -qq --no-install-recommends libgomp1 && rm -rf /var/lib/apt/lists/*

The plugin's setup page will detect missing libraries and warn you before downloading.
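
The two checks described above (AVX support and the presence of libgomp1) can be approximated by hand before choosing a variant. The following is a minimal sketch, not the plugin's actual detection code: the function names are hypothetical, it assumes an x86 Linux system whose /proc/cpuinfo has a flags line, and it assumes ldconfig is available to look up libgomp.so.1.

```shell
# Hypothetical helpers; the plugin's real detection logic may differ.

# Succeeds if the given cpuinfo file (default: /proc/cpuinfo) lists the avx flag.
has_avx() {
  grep -m1 '^flags' "${1:-/proc/cpuinfo}" | grep -qw 'avx'
}

# Succeeds if the dynamic linker cache contains libgomp.so.1 (OpenMP runtime).
has_libgomp() {
  ldconfig -p 2>/dev/null | grep -q 'libgomp\.so\.1'
}

# Prints "cpu" when both AVX and libgomp are present, otherwise "noavx",
# mirroring the fallback behaviour described in the text.
suggest_variant() {
  if has_avx "${1:-/proc/cpuinfo}" && has_libgomp; then
    echo cpu
  else
    echo noavx
  fi
}
```

Run suggest_variant inside the container (for example through docker exec) so that it reports the container's libraries rather than the host's.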

Verifying the Installation

# If in PATH:
whisper-cli --help

# If using an absolute path:
/opt/whisper/whisper-cli --help

# Inside a Docker container:
docker exec jellyfin /opt/whisper/whisper-cli --help

GPU Acceleration

whisper.cpp supports GPU offloading via Vulkan (Intel, AMD, and some NVIDIA GPUs), CUDA (NVIDIA), and ROCm (AMD). GPU acceleration dramatically reduces transcription time, especially with larger models.

Docker users: Passing the GPU device (e.g., /dev/dri) to a container is not enough -- the container also needs the matching userspace libraries installed. The auto-setup wizard detects both the device and the library and will fall back to CPU if the library is missing.

| Backend | Device | Required library | Install command (Debian/Ubuntu) |
| --- | --- | --- | --- |
| CUDA | /dev/nvidia0 | libcuda.so.1 | nvidia-container-toolkit (host) |
| Vulkan | /dev/dri | libvulkan.so.1 + ICD JSON | apt install libvulkan1 mesa-vulkan-drivers (also needs /usr/share/vulkan/icd.d/*.json) |
| ROCm | /dev/kfd | libamdhip64.so | apt install rocm-hip-runtime |

Vulkan (Intel / AMD)

Vulkan is the best option for Intel iGPUs (e.g., UHD 770) and AMD GPUs. It works through the Mesa Vulkan drivers.

Building whisper.cpp with Vulkan

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build \
  -DGGML_VULKAN=ON \
  -DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release -j$(nproc)
# Binary: build/bin/whisper-cli

Important: The CMake flag is -DGGML_VULKAN=ON (not -DWHISPER_VULKAN). This is a common source of confusion.

Runtime Dependencies

The Vulkan binary requires these libraries at runtime:

| Package (Debian/Ubuntu) | Purpose |
| --- | --- |
| libvulkan1 | Vulkan loader |
| mesa-vulkan-drivers | Intel (ANV) and AMD (RADV) Vulkan ICDs |
| libgomp1 | OpenMP threading |

apt-get install -y libvulkan1 mesa-vulkan-drivers libgomp1

Docker: GPU Passthrough for Vulkan

To use an Intel or AMD GPU inside a Docker container:

services:
  jellyfin:
    image: jellyfin/jellyfin
    devices:
      - /dev/dri:/dev/dri    # GPU render nodes
    volumes:
      - /opt/whisper:/opt/whisper:ro

The container also needs the Vulkan runtime libraries. If using the official Jellyfin image (Debian-based), install them on startup:

    entrypoint:
      - /bin/bash
      - -c
      - |
        dpkg -s libvulkan1 > /dev/null 2>&1 || \
          (apt-get update -qq && \
           apt-get install -y -qq --no-install-recommends \
             libvulkan1 mesa-vulkan-drivers libgomp1 > /dev/null 2>&1 && \
           rm -rf /var/lib/apt/lists/*)
        exec /jellyfin/jellyfin

Verify GPU detection inside the container:

# Should show your GPU (e.g., "Intel(R) UHD Graphics 770")
docker exec jellyfin apt-get update -qq && \
  docker exec jellyfin apt-get install -y -qq vulkan-tools && \
  docker exec jellyfin vulkaninfo --summary

Building Inside Docker (ABI Compatibility)

When Jellyfin runs in a container, the whisper binary must be compiled against matching system libraries. Build inside a container with the same base image:

# On the Docker host:
docker run --rm -v /opt/whisper:/output debian:trixie bash -c '
  apt-get update && apt-get install -y git cmake g++ libvulkan-dev &&
  git clone https://github.com/ggerganov/whisper.cpp.git /tmp/whisper &&
  cd /tmp/whisper &&
  cmake -B build -DGGML_VULKAN=ON -DBUILD_SHARED_LIBS=OFF &&
  cmake --build build --config Release -j$(nproc) &&
  cp build/bin/whisper-cli /output/whisper-cli
'

CUDA (NVIDIA)

For NVIDIA GPUs with CUDA support:

Building whisper.cpp with CUDA

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build \
  -DGGML_CUDA=ON \
  -DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release -j$(nproc)

Docker: NVIDIA GPU Passthrough

services:
  jellyfin:
    image: jellyfin/jellyfin
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - /opt/whisper:/opt/whisper:ro

Requires the NVIDIA Container Toolkit.

Verifying GPU Acceleration

After configuring GPU support, trigger a transcription and check the Jellyfin logs. You should see:

# Vulkan
whisper_backend_init_gpu: using Vulkan0 backend

# CUDA
whisper_backend_init_gpu: using CUDA0 backend

If you see no GPU found or using CPU backend, the binary was not built with GPU support or the runtime drivers are missing.
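
Scanning the logs by eye works, but the check can also be scripted. This is a rough sketch that relies only on the log lines shown above; the function name is hypothetical, and it assumes the relevant log text can be piped to it (for instance from docker logs, if your Jellyfin logs go to the container's output).

```shell
# Hypothetical helper: prints the backend name (e.g. Vulkan, CUDA) from the
# last whisper_backend_init_gpu line read on stdin, or nothing if none is found.
gpu_backend_from_log() {
  sed -n 's/.*whisper_backend_init_gpu: using \([A-Za-z]*\)[0-9]* backend.*/\1/p' | tail -n 1
}

# Example (assumes a container named "jellyfin"):
#   docker logs jellyfin 2>&1 | gpu_backend_from_log
```

Empty output means no GPU backend line was found, which points to a CPU-only binary or missing runtime drivers.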

Model Recommendations

| Model | Size | Speed (CPU) | Speed (GPU) | Quality | Use Case |
| --- | --- | --- | --- | --- | --- |
| ggml-large-v3-turbo-q5_0.bin | 574 MB | Moderate | Fast | Excellent | Recommended. Best quality/size ratio. |
| ggml-large-v3-turbo.bin | 1.6 GB | Slow | Fast | Excellent | Full-precision turbo. Slightly better quality, 3x larger. |
| ggml-medium-q5_0.bin | 539 MB | Moderate | Fast | Very good | Similar size to turbo-q5 but slower and less accurate. |
| ggml-medium.bin | 1.5 GB | Moderate | Fast | Very good | Full-precision medium model. |
| ggml-small.bin | 488 MB | Fast | Very fast | Good | Faster inference, lower accuracy. |
| ggml-base.bin | 148 MB | Fast | Very fast | Fair | Lightweight. Fast but noticeably less accurate. |
| ggml-tiny.bin | 78 MB | Very fast | Very fast | Basic | Smallest model. Only for testing or constrained environments. |

The Q5 quantized models offer nearly identical quality to their F16 counterparts at a fraction of the size. ggml-large-v3-turbo-q5_0 is the default when downloading from the plugin settings page.

Configuration

After installation, navigate to Dashboard > Plugins > WhisperSubs to configure:

| Setting | Description |
| --- | --- |
| Default Language | Auto-detect reads the language from each file's audio stream metadata and generates matching subtitles. Choose a specific language to force it for all transcriptions. |
| Subtitle Mode | Full, Forced Only, or Full + Forced. See Subtitle Modes below. |
| Enable Auto-Generation | When enabled, the scheduled task will scan selected libraries and generate subtitles for items that lack them. |
| Enabled Libraries | Select which libraries should be monitored for automatic subtitle generation. |
| Enable Lyrics Generation | When enabled, music libraries are scanned and audio tracks receive .lrc lyrics files (experimental -- whisper is optimized for speech, not singing). |
| Whisper Binary Path | (Advanced) Absolute path to the whisper-cli binary. Leave empty to use the auto-downloaded binary or search PATH. |
| Whisper Model Path | (Advanced) Absolute path to the GGML model file. Leave empty to use the auto-downloaded model. |
| Whisper Thread Count | (Advanced) Number of CPU threads for whisper inference. 0 = whisper default (4). Set to your CPU core count for faster transcription. |

Subtitle Modes

| Mode | What it generates | Performance |
| --- | --- | --- |
| Full (default) | Complete transcription of all speech | Fast -- single whisper run per audio track |
| Forced Only | Only foreign-language dialogue (e.g., French lines in an English movie) | Slow -- see below |
| Full + Forced | Both files per track | Slowest -- runs both pipelines |

Performance warning for Forced / Full + Forced modes: Forced subtitle generation uses a multi-step pipeline: audio extraction, VAD-based speech segmentation, then per-chunk language detection on every ~30-second segment of the movie. For a 2-hour film this means ~240 individual whisper calls just for detection, before any transcription begins. On CPU, this phase alone can take 10--20+ minutes per movie. GPU acceleration helps significantly.

If you don't need forced subtitles (most users don't), use Full mode for much faster processing.
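
The ~240 figure above follows directly from the film length divided by the chunk length. As a small worked example (the function name is hypothetical, and the 30-second chunk size is the approximate value quoted above, not a confirmed constant):

```shell
# detection_calls <duration_seconds> [chunk_seconds]
# Prints the number of chunks, rounding up; chunk length defaults to 30 s.
detection_calls() {
  echo $(( ($1 + ${2:-30} - 1) / ${2:-30} ))
}

detection_calls 7200   # 2-hour film -> 240
```

A 90-minute film (5400 seconds) gives 180 detection calls by the same arithmetic.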

Language Handling

The plugin supports three language modes:

  1. Auto-detect (recommended) -- The plugin uses FFprobe to read the audio stream's language tag and maps it to a two-letter code (e.g., spa -> es, eng -> en). Subtitles are generated in the language that matches the audio. If a file has multiple audio tracks in different languages, subtitles are generated for each one.

  2. Whisper auto-detection -- When no language metadata is available, the request falls through to whisper's built-in language detection (-l auto), which analyzes the first 30 seconds of audio.

  3. Forced language -- Set a specific language code (e.g., es) in the configuration or per-request via the API. This overrides detection and tells whisper to transcribe in that language.
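
To see what the auto-detect mode has to work with, the language tags can be read manually with ffprobe. The sketch below is an illustration under assumptions: the function names are hypothetical, the mapping covers only a handful of languages, and the plugin's own mapping table may differ.

```shell
# Print one language tag per audio stream (untagged streams may give an empty line).
audio_language_tags() {
  ffprobe -v error -select_streams a \
    -show_entries stream_tags=language -of csv=p=0 "$1"
}

# Map a three-letter tag to the two-letter code passed to whisper-cli (-l).
# Only a few languages are listed; anything else falls back to "auto".
to_whisper_lang() {
  case "$1" in
    eng) echo en ;;
    spa) echo es ;;
    fra|fre) echo fr ;;
    deu|ger) echo de ;;
    *) echo auto ;;
  esac
}

# Example:
#   audio_language_tags Movie.mkv | while read -r tag; do to_whisper_lang "$tag"; done
```

Falling back to auto for unknown or missing tags matches the second mode in the list, where whisper's built-in detection takes over.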

Subtitle Timing

whisper.cpp emits subtitle segments back-to-back with no gaps, so the next line can appear on screen during the pause before it is actually spoken. The plugin corrects this, and the relevant settings are on by default:

| Setting | What it does |
| --- | --- |
| Enable VAD | Runs whisper-cli with its native Silero Voice Activity Detection (--vad), so each cue starts at the real speech onset rather than during the preceding silence. The Silero VAD model (~865 KB) is auto-downloaded into the plugin's whisper/vad/ data directory on first use. This is the primary speech-onset mechanism. |
| Align subtitles to speech | Older, energy-based fallback. Snaps each subtitle's start to the detected speech onset using a quick FFmpeg silence-detection pass over the audio. Used only when Enable VAD is off (native VAD handles this more reliably). |
| Compensate audio start offset | Shifts all subtitle timestamps by the audio stream's container start time, keeping subtitles in sync when a file's audio doesn't begin exactly at 0:00. |

These corrections apply only to locally-generated subtitles (whisper-cli) -- both full and translated subtitles. They do not affect the remote Whisper API or forced subtitles.

With native VAD enabled (the default), no extra FFmpeg pass is needed. The FFmpeg silence-detection alignment only runs as a fallback when VAD is disabled.
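
The start-offset compensation amounts to adding a fixed number of milliseconds to every SRT timestamp. A minimal sketch of that idea, assuming ffprobe reports the offset as the stream's start_time; the function names are hypothetical and this is not the plugin's code:

```shell
# Print the first audio stream's container start time in seconds.
audio_start_time() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=start_time -of csv=p=0 "$1"
}

# shift_srt_time <HH:MM:SS,mmm> <offset_ms>
# Prints the timestamp moved by offset_ms, clamped at zero.
shift_srt_time() {
  awk -v t="$1" -v d="$2" 'BEGIN {
    split(t, p, /[:,]/)
    ms = ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4] + d
    if (ms < 0) ms = 0
    printf "%02d:%02d:%02d,%03d\n", int(ms / 3600000), int(ms / 60000) % 60, int(ms / 1000) % 60, ms % 1000
  }'
}

shift_srt_time "00:00:01,500" 250   # -> 00:00:01,750
```

Applying the same offset to both the start and end time of every cue preserves cue durations while moving the whole track.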

Usage

Admin Dashboard

The plugin adds a dedicated page to the Jellyfin admin dashboard (accessible from Dashboard > Plugins > WhisperSubs, or from the main sidebar menu). From there you can:

  • Configure the plugin settings (language, subtitle mode, binary/model paths, enabled libraries).
  • Manage the whisper engine -- download binaries (CPU / Vulkan / CUDA / ROCm) and models directly from the UI.
  • Browse all libraries and their items.
  • See which items already have subtitles (green check / orange cross).
  • Select a language for subtitle generation (auto-detect or any specific language).
  • Generate subtitles for individual items with a single click.
  • Monitor progress -- a live banner shows the current item, processing phase, and queue depth.

REST API

All endpoints require Jellyfin admin authentication. Setup endpoints additionally require elevated privileges.

Library & Items

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /Plugins/WhisperSubs/Libraries | List all media libraries |
| GET | /Plugins/WhisperSubs/Libraries/{libraryId}/Items | List items in a library (supports startIndex and limit) |
| POST | /Plugins/WhisperSubs/Items/{itemId}/Generate?language=auto | Queue subtitle generation (priority) |
| GET | /Plugins/WhisperSubs/Items/{itemId}/AudioLanguages | Detect audio languages in a media file |
| GET | /Plugins/WhisperSubs/Items/{itemId}/Status | Check subtitle generation status |

Queue & Task

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /Plugins/WhisperSubs/Queue | Queue status: current item, progress, phase, remaining count |
| POST | /Plugins/WhisperSubs/RunTask | Trigger the scheduled subtitle generation task |
| GET | /Plugins/WhisperSubs/Models | List downloaded models with active/size info |

Engine Setup (requires elevated privileges)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /Plugins/WhisperSubs/Setup/Status | Binary/model status, GPU detection, platform info |
| GET | /Plugins/WhisperSubs/Setup/BinaryVariants | Available binary variants for this platform |
| POST | /Plugins/WhisperSubs/Setup/DownloadBinary?variant=cpu | Download whisper-cli binary |
| GET | /Plugins/WhisperSubs/Setup/AvailableModels | Model catalog with sizes and descriptions |
| POST | /Plugins/WhisperSubs/Setup/DownloadModel?name=... | Download a model from Hugging Face |
| GET | /Plugins/WhisperSubs/Setup/Progress | Download progress (percent, message, errors) |
| POST | /Plugins/WhisperSubs/Setup/Models/{filename}/Activate | Set a downloaded model as active |
| DELETE | /Plugins/WhisperSubs/Setup/Models/{filename} | Delete a downloaded model |

The language parameter of the Generate endpoint accepts auto (default) or any ISO 639-1 code (en, es, fr, etc.).
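
The endpoints can be called with curl. The sketch below makes several assumptions: the server URL and API key are placeholders, the helper names are hypothetical, and it authenticates with a Jellyfin API key in the Authorization header, which may or may not satisfy the elevated-privilege requirement of the Setup endpoints on your server.

```shell
# Assumed values; replace with your server URL and an admin API key.
JELLYFIN_URL="${JELLYFIN_URL:-http://localhost:8096}"
JELLYFIN_API_KEY="${JELLYFIN_API_KEY:-changeme}"

# generate_url <itemId> [language] -> URL of the Generate endpoint
generate_url() {
  printf '%s/Plugins/WhisperSubs/Items/%s/Generate?language=%s\n' \
    "${JELLYFIN_URL%/}" "$1" "${2:-auto}"
}

# ws_request <METHOD> <url> -> performs the authenticated request
ws_request() {
  curl -fsS -X "$1" \
    -H "Authorization: MediaBrowser Token=\"${JELLYFIN_API_KEY}\"" "$2"
}

# Examples (the item ID is a placeholder):
#   ws_request POST "$(generate_url <itemId> es)"
#   ws_request GET  "${JELLYFIN_URL%/}/Plugins/WhisperSubs/Queue"
```

Polling the Queue endpoint after a Generate call is one way to follow progress without opening the admin UI.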

Scheduled Task

A scheduled task named Generate Subtitles is registered under the WhisperSubs category. It can be configured in Dashboard > Scheduled Tasks with your preferred schedule or triggered manually. The task:

  1. Scans all enabled libraries (or all libraries if none are explicitly selected).
  2. Finds video items that lack subtitles.
  3. Generates subtitles using the configured default language (auto-detect by default).
  4. Reports progress in the Jellyfin task UI.

Skipping Already-Subtitled Media

The auto-generation task skips media that already has a usable subtitle in the needed language, so an already-subtitled library is not needlessly re-processed. For the translation pass, an existing English subtitle track -- embedded or external -- counts as already translated and is skipped. Forced (foreign-dialogue-only) and image-based subtitle tracks do not count as satisfying the need, so full subtitles are still generated when only those are present.

The following settings control this behavior. The skip toggles and Generate original-language subtitles are on by default; translation and the image-subtitle toggle are off by default:

| Setting | What it does |
| --- | --- |
| Skip media that already has subtitles | Skips media that already has a usable subtitle in the needed language, including an existing English subtitle when translating. |
| Ignore forced subtitles when skipping | Forced subtitle tracks do not count as "already subtitled", so full subtitles are still generated when only forced tracks exist. |
| Generate original-language subtitles | The main switch: transcribe each title in its own spoken language -- a Korean film gets Korean, an English film gets English. On by default. |
| Count image-based subtitles as present | When on, existing image-based subtitles (PGS/VOBSUB) count as "already subtitled" and no text subtitle is generated. Off by default, since image subs can't be searched or edited. |
| Also create an English subtitle when a title has none | Translation to English. For a title whose audio isn't English and that has no English subtitle, additionally produce one. Skips titles that already have English audio or an English subtitle. Off by default. |

What Whisper can produce: Whisper transcribes the speech it hears, so it writes a subtitle in the title's own audio language — that's the Generate original-language subtitles switch (an English film naturally gets English subtitles here). The one thing it can additionally do is translate to English: that's the separate Translation toggle, which only ever adds an English subtitle to a foreign-language title that doesn't already have one. There are no other targets — English is the only language Whisper translates into.

Scope: the skip logic and these toggles apply to the scheduled auto-generation task and bulk "Generate all" actions. A manual "Generate" on a single item always transcribes (it bypasses the skip and the original-language toggle), so you can force fresh subtitles for a file even when it already has some — e.g. to replace a poor embedded track.

Note: detection reads each item's subtitle streams from Jellyfin's library metadata, so a recent library scan keeps it accurate. If an item hasn't been scanned yet, the plugin errs toward generating rather than wrongly skipping.
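
The skip rules above can be summarized as a per-track predicate. This is an illustrative sketch only: the variable and function names are hypothetical, the codec strings are the names ffprobe uses for PGS and VOBSUB (the plugin reads Jellyfin metadata instead), and the real implementation may weigh the toggles differently.

```shell
# Toggle defaults mirror the text above; the variable names are hypothetical.
IGNORE_FORCED="${IGNORE_FORCED:-1}"        # 1: forced tracks do not count
COUNT_IMAGE_SUBS="${COUNT_IMAGE_SUBS:-0}"  # 0: image-based tracks do not count

# track_satisfies <needed_lang> <track_lang> <forced:0|1> <codec>
# Succeeds if an existing subtitle track means generation can be skipped.
track_satisfies() {
  [ "$1" = "$2" ] || return 1
  if [ "$3" = 1 ] && [ "$IGNORE_FORCED" = 1 ]; then
    return 1
  fi
  case "$4" in
    hdmv_pgs_subtitle|dvd_subtitle)   # ffprobe codec names for PGS / VOBSUB
      [ "$COUNT_IMAGE_SUBS" = 1 ] || return 1 ;;
  esac
  return 0
}
```

An item would be skipped when at least one of its subtitle tracks passes this predicate; a manual "Generate" ignores the check entirely.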

How It Works

  1. Language Detection -- FFprobe reads the audio stream metadata to determine the spoken language(s).
  2. Audio Extraction -- FFmpeg extracts a 16 kHz mono WAV track from the media file.
  3. Transcription -- The extracted audio is passed to whisper.cpp, which produces an SRT subtitle file. For forced subtitles, the audio is first segmented via VAD (silence detection), then each ~30-second chunk is language-classified before selectively transcribing only foreign-language segments.
  4. Output -- Files are saved alongside the original media:
    • Full subtitles: Movie.es.generated.srt
    • Forced subtitles: Movie.es.forced.generated.srt
    • Lyrics: Song.lrc
  5. Metadata Refresh -- The item's metadata is refreshed so Jellyfin picks up the new files immediately.

Temporary audio files are cleaned up automatically after processing. Items that have already been processed are tracked with marker files (.noforeignlang) to avoid redundant work on subsequent scans.
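
Steps 2 to 4 can be reproduced by hand, which is useful for checking a binary and model outside Jellyfin. The following is a sketch under assumptions, not the plugin's implementation: the function names are hypothetical, ffmpeg and whisper-cli are assumed to be in PATH, and only the default audio stream is extracted.

```shell
# subtitle_path <media_file> <lang> -> output path following the naming above
subtitle_path() {
  printf '%s.%s.generated.srt\n' "${1%.*}" "$2"
}

# transcribe <media_file> <lang> <model_file>
# Runs in a subshell so its temporary variables do not leak.
transcribe() (
  tmpdir=$(mktemp -d) || exit 1
  out=$(subtitle_path "$1" "$2")
  # Extract 16 kHz mono 16-bit PCM audio.
  # whisper-cli's -osrt writes <prefix>.srt, so -of gets the path without .srt.
  ffmpeg -nostdin -loglevel error -y -i "$1" -vn -ac 1 -ar 16000 \
    -c:a pcm_s16le "$tmpdir/audio.wav" &&
    whisper-cli -m "$3" -f "$tmpdir/audio.wav" -l "$2" -osrt -of "${out%.srt}"
  status=$?
  rm -rf "$tmpdir"   # clean up the temporary audio
  exit "$status"
)

# Example:
#   transcribe /media/Movie.mkv es /opt/whisper/models/ggml-large-v3-turbo.bin
```

After a manual run like this, a library scan or metadata refresh is still needed before Jellyfin lists the new subtitle file.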

Roadmap

See ROADMAP.md for planned features and design details.

Other Jellyfin Projects by GeiserX

  • smart-covers — Cover extraction for books, audiobooks, comics, magazines, and music libraries with online fallback
  • quality-gate — Restrict users to specific media versions based on configurable path-based policies
  • jellyfin-encoder — Automatic 720p HEVC/AV1 transcoding service with hardware acceleration
  • jellyfin-telegram-channel-sync — Sync Jellyfin access with Telegram channel membership

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for the full text.