README.md

June 16, 2026

WhisperSubs is a Jellyfin plugin that automatically generates subtitles for your media library using local AI models. All transcription runs entirely on your server -- no audio data ever leaves your network. Your media stays private.

Features

Fully Local Processing -- Audio is transcribed on your hardware using whisper.cpp. No cloud APIs, no external services, no data exfiltration.
Built-in Engine Setup -- Download whisper-cli binaries and models directly from the plugin settings page on Linux. No manual installation needed for most users.
Automatic Language Detection -- Reads audio stream metadata to detect the spoken language and generate matching subtitles. Falls back to whisper's built-in language detection when tags are absent.
Forced Subtitles -- Detect and transcribe only foreign-language dialogue (e.g., French lines in an English movie) via VAD-based speech segmentation and per-chunk language detection.
Lyrics Generation (Experimental) -- Generate .lrc lyrics files for music libraries via whisper transcription. Jellyfin picks up .lrc files automatically.
GPU Acceleration -- Supports CUDA (NVIDIA), Vulkan (Intel / AMD / NVIDIA), and ROCm (AMD) for significantly faster transcription.
Priority Queue -- Manual requests are queued with priority and processed before scheduled items. Queue persists across restarts.
Real-time Progress -- Live progress banner in the admin UI showing current item, phase (extracting audio, transcribing), per-file progress, and overall stats.
Subtitle Resume -- If transcription is interrupted, it resumes from the last timestamp rather than starting over.
Admin Dashboard UI -- Browse libraries, view items, manage the whisper engine, and trigger subtitle generation directly from the Jellyfin admin panel.
Scheduled Tasks -- Enable automatic scanning so new media gets subtitles without manual intervention. Runs daily at 2:00 AM and on startup by default.
Per-Library Control -- Choose which libraries are monitored for automatic subtitle generation.
Multiple Output Formats -- Generates .srt subtitles, .forced.generated.srt forced subtitles, and .lrc lyrics, all placed alongside your media and auto-detected by Jellyfin.

Prerequisites

Dependency	Details
Jellyfin	10.11.0 or later
FFmpeg	Bundled with Jellyfin (`/usr/lib/jellyfin-ffmpeg/ffmpeg`) or available in `PATH`. Used to extract audio from media files.
whisper.cpp	The `whisper-cli` binary. On Linux, the plugin can download this automatically from the settings page. Otherwise, install manually -- see Installing whisper.cpp.
Whisper Model	A GGML model file. The plugin can download models automatically from the settings page, or download manually from Hugging Face.

Quick start (Linux): After installing the plugin, go to Dashboard > Plugins > WhisperSubs. The Whisper Engine section lets you download both the binary and a model with one click each. The manual steps below are only needed for non-Linux platforms or custom setups.

Installation

From the Jellyfin Plugin Repository (Recommended)

In Jellyfin, go to Dashboard > Plugins > Repositories.

Add a new repository with this URL:

https://geiserx.github.io/whisper-subs/manifest.json

Go to Catalog, find WhisperSubs, and click Install.
Restart Jellyfin.

Manual Installation

Build from source:
```
dotnet build --configuration Release
```
Copy WhisperSubs.dll to your Jellyfin plugins directory:
```
/var/lib/jellyfin/plugins/WhisperSubs/
```
Restart Jellyfin.

Installing whisper.cpp

The plugin requires whisper.cpp for transcription. Choose the method that matches your setup.

Option A: Pre-built Binary (Recommended for most users)

Download the latest release for your platform from whisper.cpp releases.
Extract and place the whisper-cli binary somewhere persistent (e.g., /opt/whisper/).

Download a model:

```
mkdir -p /opt/whisper/models

# Base model (~148 MB) -- fast, good for quick transcription
wget -O /opt/whisper/models/ggml-base.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin

# Large V3 Turbo (~1.6 GB) -- best accuracy with reasonable speed (recommended)
wget -O /opt/whisper/models/ggml-large-v3-turbo.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
```

In the plugin settings, set Whisper Binary Path to /opt/whisper/whisper-cli and Whisper Model Path to the model file.

Option B: Build from Source (CPU only)

```
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build -DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release -j$(nproc)
# Binary will be at build/bin/whisper-cli
```

Option C: Build from Source with GPU Acceleration

See GPU Acceleration below for detailed instructions.

Docker / Container Setups

If Jellyfin runs in a Docker container, whisper.cpp must be accessible inside the container. The recommended approach is to bind-mount a host directory containing the binary and model:

```
# docker-compose.yml
services:
  jellyfin:
    image: jellyfin/jellyfin
    volumes:
      - /opt/whisper:/opt/whisper:ro   # whisper-cli binary + models
      # ... your other volumes
```

Then configure the plugin with:

Whisper Binary Path: /opt/whisper/whisper-cli
Whisper Model Path: /opt/whisper/models/ggml-large-v3-turbo.bin

Note: The binary must be compiled for the same architecture as the container (typically x86_64 Linux). Download the linux-x64 release asset or build inside a matching environment.

Container Library Requirements

The plugin's built-in binary downloader fetches pre-built whisper-cli binaries. These require runtime libraries that are not included in the default Jellyfin Docker image:

Variant	Required packages	Install command
CPU	`libgomp1`	`apt install libgomp1`
CPU (Compatibility)	none (self-contained)	—
Vulkan	`libgomp1`, `libvulkan1`, `mesa-vulkan-drivers`	`apt install libgomp1 libvulkan1 mesa-vulkan-drivers`
CUDA 12	`libgomp1` + NVIDIA Container Toolkit on host	See CUDA section
ROCm	`libgomp1` + ROCm runtime	See ROCm docs

Minimal containers (TrueNAS Scale, slim Docker): The default cpu build links libgomp1 (OpenMP) for best performance. If that library is missing, the plugin automatically falls back to the noavx build, which is compiled with OpenMP off and has no such dependency — so transcription still works out of the box, just without the small OpenMP speed-up.

Older / low-power CPUs (no AVX): The cpu, vulkan and cuda12 builds are compiled with AVX/AVX2 instructions and will crash with an illegal instruction error (exit 132) on CPUs that lack them — common on budget NAS boxes, Atom/Celeron chips and some VMs. The plugin detects missing AVX support and automatically uses the noavx (Compatibility) build instead, both when recommending a variant and as a download-time fallback. You can also pick "CPU (Compatibility)" manually on the setup page.

For the faster build, install libgomp1 persistently via your container's entrypoint or Dockerfile:

```
apt-get update -qq && apt-get install -y -qq --no-install-recommends libgomp1 && rm -rf /var/lib/apt/lists/*
```

The plugin's setup page will detect missing libraries and warn you before downloading.
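The fallback rules described above can be sketched as a small decision function. This is only an illustration under stated assumptions: `pick_variant`, `host_cpu_flags`, and `host_has_libgomp` are hypothetical helper names, and the plugin's real detection code is not shown in this README.

```shell
# Hypothetical helper, not the plugin's code: suggest a whisper-cli variant.
# $1 = CPU flags string (as in /proc/cpuinfo), $2 = "yes" if libgomp1 is installed
pick_variant() {
  case " $1 " in
    *" avx "*|*" avx2 "*) ;;            # AVX present: the cpu build can run
    *) echo "noavx"; return 0 ;;        # no AVX: the cpu build would exit with 132
  esac
  if [ "$2" = "yes" ]; then
    echo "cpu"                          # OpenMP build, needs libgomp1
  else
    echo "noavx"                        # self-contained build, no OpenMP dependency
  fi
}

# Gather both inputs on a Linux host using standard tools.
host_cpu_flags() { grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2; }
host_has_libgomp() {
  if ldconfig -p 2>/dev/null | grep -q 'libgomp\.so\.1'; then echo yes; else echo no; fi
}
```

On a real host, `pick_variant "$(host_cpu_flags)" "$(host_has_libgomp)"` prints the suggested variant name.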

Verifying the Installation

```
# If in PATH:
whisper-cli --help

# If using an absolute path:
/opt/whisper/whisper-cli --help

# Inside a Docker container:
docker exec jellyfin /opt/whisper/whisper-cli --help
```

GPU Acceleration

whisper.cpp supports GPU offloading via Vulkan (Intel, AMD, and some NVIDIA GPUs), CUDA (NVIDIA), and ROCm (AMD). GPU acceleration dramatically reduces transcription time, especially with larger models.

Docker users: Passing the GPU device (e.g., /dev/dri) to a container is not enough -- the container also needs the matching userspace libraries installed. The auto-setup wizard detects both the device and the library and will fall back to CPU if the library is missing.

Backend	Device	Required library	Install command (Debian/Ubuntu)
CUDA	`/dev/nvidia0`	`libcuda.so.1`	`nvidia-container-toolkit` (host)
Vulkan	`/dev/dri`	`libvulkan.so.1` + ICD JSON	`apt install libvulkan1 mesa-vulkan-drivers` (also needs `/usr/share/vulkan/icd.d/*.json`)
ROCm	`/dev/kfd`	`libamdhip64.so`	`apt install rocm-hip-runtime`
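To check the device and library pairs from the table by hand, something like the following may help. It is a rough sketch: `backend_ready` is a hypothetical helper, and it only tests that the device node exists and that the dynamic linker knows the library, which may differ from what the auto-setup wizard actually checks.

```shell
# Hypothetical check: a backend is usable only if the device node exists AND
# the userspace library is registered with the dynamic linker.
# $1 = device path, $2 = library file name
backend_ready() {
  [ -e "$1" ] || { echo "no"; return 0; }
  if ldconfig -p 2>/dev/null | grep -q -F "$2"; then
    echo "yes"
  else
    echo "no"
  fi
}

# One line of output per backend, in table order: CUDA, Vulkan, ROCm.
backend_ready /dev/nvidia0 libcuda.so.1
backend_ready /dev/dri     libvulkan.so.1
backend_ready /dev/kfd     libamdhip64.so
```

Run it inside the container (for example through `docker exec`), since the host and the container can give different answers.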

Vulkan (Intel / AMD)

Vulkan is the best option for Intel iGPUs (e.g., UHD 770) and AMD GPUs. It works through the Mesa Vulkan drivers.

Building whisper.cpp with Vulkan

```
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build \
  -DGGML_VULKAN=ON \
  -DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release -j$(nproc)
# Binary: build/bin/whisper-cli
```

Important: The CMake flag is -DGGML_VULKAN=ON (not -DWHISPER_VULKAN). This is a common source of confusion.

Runtime Dependencies

The Vulkan binary requires these libraries at runtime:

Package (Debian/Ubuntu)	Purpose
`libvulkan1`	Vulkan loader
`mesa-vulkan-drivers`	Intel (ANV) and AMD (RADV) Vulkan ICDs
`libgomp1`	OpenMP threading

```
apt-get install -y libvulkan1 mesa-vulkan-drivers libgomp1
```

Docker: GPU Passthrough for Vulkan

To use an Intel or AMD GPU inside a Docker container:

```
services:
  jellyfin:
    image: jellyfin/jellyfin
    devices:
      - /dev/dri:/dev/dri    # GPU render nodes
    volumes:
      - /opt/whisper:/opt/whisper:ro
```

The container also needs the Vulkan runtime libraries. If using the official Jellyfin image (Debian-based), install them on startup:

```
    entrypoint:
      - /bin/bash
      - -c
      - |
        dpkg -s libvulkan1 > /dev/null 2>&1 || \
          (apt-get update -qq && \
           apt-get install -y -qq --no-install-recommends \
             libvulkan1 mesa-vulkan-drivers libgomp1 > /dev/null 2>&1 && \
           rm -rf /var/lib/apt/lists/*)
        exec /jellyfin/jellyfin
```

Verify GPU detection inside the container:

```
# Should show your GPU (e.g., "Intel(R) UHD Graphics 770")
docker exec jellyfin apt-get update -qq && \
  docker exec jellyfin apt-get install -y -qq vulkan-tools && \
  docker exec jellyfin vulkaninfo --summary
```

Building Inside Docker (ABI Compatibility)

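When Jellyfin runs in a container, the whisper binary must be compiled against matching system libraries. Build inside a container that uses the same base image as your Jellyfin container (the example below uses debian:trixie; adjust the tag if your image differs):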

# On the Docker host:
docker run --rm -v /opt/whisper:/output debian:trixie bash -c '
  apt-get update && apt-get install -y git cmake g++ libvulkan-dev &&
  git clone https://github.com/ggerganov/whisper.cpp.git /tmp/whisper &&
  cd /tmp/whisper &&
  cmake -B build -DGGML_VULKAN=ON -DBUILD_SHARED_LIBS=OFF &&
  cmake --build build --config Release -j$(nproc) &&
  cp build/bin/whisper-cli /output/whisper-cli
'

CUDA (NVIDIA)

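For NVIDIA GPUs with CUDA support, build whisper.cpp with the CUDA backend and, if Jellyfin runs in Docker, pass the GPU through to the container.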

Building whisper.cpp with CUDA

```
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build \
  -DGGML_CUDA=ON \
  -DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release -j$(nproc)
```

Docker: NVIDIA GPU Passthrough

```
services:
  jellyfin:
    image: jellyfin/jellyfin
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - /opt/whisper:/opt/whisper:ro
```

This configuration requires the NVIDIA Container Toolkit to be installed on the Docker host.

Verifying GPU Acceleration

After configuring GPU support, trigger a transcription and check the Jellyfin logs. You should see:

```
# Vulkan
whisper_backend_init_gpu: using Vulkan0 backend

# CUDA
whisper_backend_init_gpu: using CUDA0 backend
```

If you see `no GPU found` or `using CPU backend` instead, the binary was not built with GPU support or the runtime drivers are missing.
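To avoid scanning the log by eye, the backend name can be pulled out of those lines with a small filter. This is a sketch that assumes the log format shown above; `detect_backend` is a hypothetical helper name, and where your Jellyfin logs live depends on your setup.

```shell
# Reads log text on stdin and prints the backend named in the first
# "whisper_backend_init_gpu: using <Name><index> backend" line, or "none".
detect_backend() {
  found=$(sed -n 's/.*whisper_backend_init_gpu: using \([A-Za-z]*\)[0-9]* backend.*/\1/p' | head -n 1)
  echo "${found:-none}"
}

# Demo with the Vulkan line from above; prints "Vulkan".
printf '%s\n' 'whisper_backend_init_gpu: using Vulkan0 backend' | detect_backend
```

With Docker, `docker logs jellyfin 2>&1 | detect_backend` is one way to feed it the server output.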

Model Recommendations

Model	Size	Speed (CPU)	Speed (GPU)	Quality	Use Case
`ggml-large-v3-turbo-q5_0.bin`	574 MB	Moderate	Fast	Excellent	Recommended. Best quality/size ratio.
`ggml-large-v3-turbo.bin`	1.6 GB	Slow	Fast	Excellent	Full-precision turbo. Slightly better quality, 3x larger.
`ggml-medium-q5_0.bin`	539 MB	Moderate	Fast	Very good	Similar size to turbo-q5 but slower and less accurate.
`ggml-medium.bin`	1.5 GB	Moderate	Fast	Very good	Full-precision medium model.
`ggml-small.bin`	488 MB	Fast	Very fast	Good	Faster inference, lower accuracy.
`ggml-base.bin`	148 MB	Fast	Very fast	Fair	Lightweight. Fast but noticeably less accurate.
`ggml-tiny.bin`	78 MB	Very fast	Very fast	Basic	Smallest model. Only for testing or constrained environments.

The Q5 quantized models offer nearly identical quality to their F16 counterparts at a fraction of the size. ggml-large-v3-turbo-q5_0 is the default when downloading from the plugin settings page.
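For manual setups, the recommended quantized model can be fetched the same way as the models in the installation section. The sketch below assumes the quantized files sit under the same Hugging Face path as `ggml-base.bin` and `ggml-large-v3-turbo.bin`; `model_url` and `download_model` are hypothetical helper names.

```shell
# Assumption: every model in the table is served from the same Hugging Face
# path pattern as the URLs shown in the installation section.
model_url() {
  echo "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/$1"
}

# $1 = model file name, $2 = target directory (default /opt/whisper/models)
download_model() {
  dest=${2:-/opt/whisper/models}
  mkdir -p "$dest" && wget -O "$dest/$1" "$(model_url "$1")"
}

# Print the URL for the default model without downloading anything.
model_url ggml-large-v3-turbo-q5_0.bin
```

Calling `download_model ggml-large-v3-turbo-q5_0.bin` performs the download; point Whisper Model Path at the resulting file afterwards.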

Configuration

After installation, navigate to Dashboard > Plugins > WhisperSubs to configure:

Setting	Description
Default Language	`Auto-detect` reads the language from each file's audio stream metadata and generates matching subtitles. Choose a specific language to force it for all transcriptions.
Subtitle Mode	Full, Forced Only, or Full + Forced. See Subtitle Modes below.
Enable Auto-Generation	When enabled, the scheduled task will scan selected libraries and generate subtitles for items that lack them.
Enabled Libraries	Select which libraries should be monitored for automatic subtitle generation.
Enable Lyrics Generation	When enabled, music libraries are scanned and audio tracks receive `.lrc` lyrics files (experimental -- whisper is optimized for speech, not singing).
Whisper Binary Path	(Advanced) Absolute path to the `whisper-cli` binary. Leave empty to use the auto-downloaded binary or search `PATH`.
Whisper Model Path	(Advanced) Absolute path to the GGML model file. Leave empty to use the auto-downloaded model.
Whisper Thread Count	(Advanced) Number of CPU threads for whisper inference. `0` = whisper default (4). Set to your CPU core count for faster transcription.

Subtitle Modes

Mode	What it generates	Performance
Full (default)	Complete transcription of all speech	Fast -- single whisper run per audio track
Forced Only	Only foreign-language dialogue (e.g., French lines in an English movie)	Slow -- see below
Full + Forced	Both files per track	Slowest -- runs both pipelines

Performance warning for Forced / Full + Forced modes: Forced subtitle generation uses a multi-step pipeline: audio extraction, VAD-based speech segmentation, then per-chunk language detection on every ~30-second segment of the movie. For a 2-hour film this means ~240 individual whisper calls just for detection, before any transcription begins. On CPU, this phase alone can take 10--20+ minutes per movie. GPU acceleration helps significantly.

If you don't need forced subtitles (most users don't), use Full mode for much faster processing.
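The ~240 figure is simply the runtime divided by the chunk length. A quick estimate for other runtimes, assuming one detection call per 30-second chunk (the real count depends on how VAD segments the audio, and `detection_calls` is a hypothetical helper):

```shell
# Ceiling division: number of chunks in a runtime given in seconds.
# $1 = duration in seconds, $2 = chunk length in seconds (default 30)
detection_calls() {
  duration_s=$1
  chunk_s=${2:-30}
  echo $(( (duration_s + chunk_s - 1) / chunk_s ))
}

detection_calls 7200   # 2-hour film: prints 240
detection_calls 5400   # 90-minute film: prints 180
```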

Language Handling

The plugin supports three language modes:

Auto-detect (recommended) -- The plugin uses FFprobe to read the audio stream's language tag (e.g., spa → es, eng → en). Subtitles are generated in the language that matches the audio. If a file has multiple audio tracks in different languages, subtitles are generated for each one.
Whisper auto-detection -- When no language metadata is available, the request falls through to whisper's built-in language detection (-l auto), which analyzes the first 30 seconds of audio.
Forced language -- Set a specific language code (e.g., es) in the configuration or per-request via the API. This overrides detection and tells whisper to transcribe the audio as that language.
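The tag-to-code step of auto-detect can be pictured as a lookup with a fallback to `auto`. The sketch below is illustrative only: the mapping covers a handful of languages, `to_whisper_lang` and `audio_language_tags` are hypothetical names, and the plugin's actual table is not shown in this README.

```shell
# Map an audio stream language tag (ISO 639-2) to the two-letter code passed
# to whisper. Illustrative subset; empty or undetermined tags fall back to "auto".
to_whisper_lang() {
  case "$1" in
    eng) echo en ;;
    spa) echo es ;;
    fra|fre) echo fr ;;
    deu|ger) echo de ;;
    ""|und) echo auto ;;
    ??) echo "$1" ;;      # already a two-letter code: pass it through
    *) echo auto ;;       # unknown tag: let whisper detect the language
  esac
}

# Print one language tag per audio stream of a media file (requires ffprobe).
audio_language_tags() {
  ffprobe -v error -select_streams a \
    -show_entries stream_tags=language -of csv=p=0 "$1"
}
```

For a file with Spanish and English tracks, `audio_language_tags Movie.mkv` would be expected to list `spa` and `eng`, which map to `es` and `en`.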

Subtitle Timing

whisper.cpp emits subtitle segments back-to-back with no gaps, so the next line can appear on screen during the pause before it is actually spoken. The plugin corrects this, and the relevant settings are on by default:

Setting	What it does
Enable VAD	Runs whisper-cli with its native Silero Voice Activity Detection (`--vad`), so each cue starts at the real speech onset rather than during the preceding silence. The Silero VAD model (~865 KB) is auto-downloaded into the plugin's `whisper/vad/` data directory on first use. This is the primary speech-onset mechanism.
Align subtitles to speech	Older, energy-based fallback. Snaps each subtitle's start to the detected speech onset using a quick FFmpeg silence-detection pass over the audio. Used only when Enable VAD is off (native VAD handles this more reliably).
Compensate audio start offset	Shifts all subtitle timestamps by the audio stream's container start time, keeping subtitles in sync when a file's audio doesn't begin exactly at 0:00.

These corrections apply only to locally-generated subtitles (whisper-cli) -- both full and translated subtitles. They do not affect the remote Whisper API or forced subtitles.

With native VAD enabled (the default), no extra FFmpeg pass is needed. The FFmpeg silence-detection alignment only runs as a fallback when VAD is disabled.
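As an illustration of what Compensate audio start offset amounts to, the sketch below adds a fixed offset in milliseconds to SRT timestamps. It is not the plugin's implementation; `shift_ts` and `shift_cue` are hypothetical helpers, and the offset value is assumed to come from the audio stream's start time.

```shell
# Shift one SRT timestamp (HH:MM:SS,mmm) by an offset in milliseconds.
# Results below zero are clamped to 00:00:00,000.
shift_ts() {
  awk -v ts="$1" -v off="$2" 'BEGIN {
    split(ts, p, "[:,]")
    total = (p[1] * 3600 + p[2] * 60 + p[3]) * 1000 + p[4] + off
    if (total < 0) total = 0
    printf "%02d:%02d:%02d,%03d\n", int(total / 3600000), int(total / 60000) % 60, int(total / 1000) % 60, total % 1000
  }'
}

# Shift both ends of a cue timing line ("start --> end").
shift_cue() {
  start=${1%% -->*}
  end=${1##*--> }
  echo "$(shift_ts "$start" "$2") --> $(shift_ts "$end" "$2")"
}

shift_cue "00:00:01,000 --> 00:00:02,500" 1500   # prints 00:00:02,500 --> 00:00:04,000
```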

Usage

Admin Dashboard

The plugin adds a dedicated page to the Jellyfin admin dashboard (accessible from Dashboard > Plugins > WhisperSubs, or from the main sidebar menu). From there you can:

Configure the plugin settings (language, subtitle mode, binary/model paths, enabled libraries).
Manage the whisper engine -- download binaries (CPU / Vulkan / CUDA / ROCm) and models directly from the UI.
Browse all libraries and their items.
See which items already have subtitles (green check / orange cross).
Select a language for subtitle generation (auto-detect or any specific language).
Generate subtitles for individual items with a single click.
Monitor progress -- a live banner shows the current item, processing phase, and queue depth.

REST API

All endpoints require Jellyfin admin authentication. Setup endpoints additionally require elevated privileges.

Library & Items

Method	Endpoint	Description
`GET`	`/Plugins/WhisperSubs/Libraries`	List all media libraries
`GET`	`/Plugins/WhisperSubs/Libraries/{libraryId}/Items`	List items in a library (supports `startIndex` and `limit`)
`POST`	`/Plugins/WhisperSubs/Items/{itemId}/Generate?language=auto`	Queue subtitle generation (priority)
`GET`	`/Plugins/WhisperSubs/Items/{itemId}/AudioLanguages`	Detect audio languages in a media file
`GET`	`/Plugins/WhisperSubs/Items/{itemId}/Status`	Check subtitle generation status

Queue & Task

Method	Endpoint	Description
`GET`	`/Plugins/WhisperSubs/Queue`	Queue status: current item, progress, phase, remaining count
`POST`	`/Plugins/WhisperSubs/RunTask`	Trigger the scheduled subtitle generation task
`GET`	`/Plugins/WhisperSubs/Models`	List downloaded models with active/size info

Engine Setup (requires elevated privileges)

Method	Endpoint	Description
`GET`	`/Plugins/WhisperSubs/Setup/Status`	Binary/model status, GPU detection, platform info
`GET`	`/Plugins/WhisperSubs/Setup/BinaryVariants`	Available binary variants for this platform
`POST`	`/Plugins/WhisperSubs/Setup/DownloadBinary?variant=cpu`	Download whisper-cli binary
`GET`	`/Plugins/WhisperSubs/Setup/AvailableModels`	Model catalog with sizes and descriptions
`POST`	`/Plugins/WhisperSubs/Setup/DownloadModel?name=...`	Download a model from HuggingFace
`GET`	`/Plugins/WhisperSubs/Setup/Progress`	Download progress (percent, message, errors)
`POST`	`/Plugins/WhisperSubs/Setup/Models/{filename}/Activate`	Set a downloaded model as active
`DELETE`	`/Plugins/WhisperSubs/Setup/Models/{filename}`	Delete a downloaded model

The language parameter of the Generate endpoint accepts auto (default) or any ISO 639-1 code (en, es, fr, etc.).
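The endpoints can be called with curl. The sketch below makes several assumptions: the server URL and API key are placeholders, the `Authorization: MediaBrowser Token="..."` header is Jellyfin's general API-key scheme rather than something this README specifies, and `generate_url`, `items_url`, and `api` are hypothetical helper names.

```shell
# Placeholder values: set these to your server URL and an admin API key.
JELLYFIN_URL=${JELLYFIN_URL:-http://localhost:8096}
JELLYFIN_API_KEY=${JELLYFIN_API_KEY:-changeme}

# $1 = item id, $2 = language (default auto)
generate_url() {
  echo "$JELLYFIN_URL/Plugins/WhisperSubs/Items/$1/Generate?language=${2:-auto}"
}

# $1 = library id, $2 = startIndex (default 0), $3 = limit (default 50)
items_url() {
  echo "$JELLYFIN_URL/Plugins/WhisperSubs/Libraries/$1/Items?startIndex=${2:-0}&limit=${3:-50}"
}

# $1 = HTTP method, $2 = full URL
api() {
  curl -fsS -X "$1" -H "Authorization: MediaBrowser Token=\"$JELLYFIN_API_KEY\"" "$2"
}

# Example calls (they need a running server, so they are left commented out):
# api GET  "$(items_url LIBRARY_ID 0 25)"
# api POST "$(generate_url ITEM_ID es)"
# api GET  "$JELLYFIN_URL/Plugins/WhisperSubs/Queue"
```

`LIBRARY_ID` and `ITEM_ID` stand for real ids, which the Libraries and Items endpoints return.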

Scheduled Task

A scheduled task named Generate Subtitles is registered under the WhisperSubs category. It can be configured in Dashboard > Scheduled Tasks with your preferred schedule or triggered manually. The task:

Scans all enabled libraries (or all libraries if none are explicitly selected).
Finds video items that lack subtitles.
Generates subtitles using the configured default language (auto-detect by default).
Reports progress in the Jellyfin task UI.

Skipping Already-Subtitled Media

The auto-generation task skips media that already has a usable subtitle in the needed language, so an already-subtitled library is not needlessly re-processed. For the translation pass, an existing English subtitle track -- embedded or external -- counts as already translated and is skipped. Forced (foreign-dialogue-only) and image-based subtitle tracks do not count as satisfying the need, so full subtitles are still generated when only those are present.

The following settings control this behavior. The skip toggles and Generate original-language subtitles are on by default; translation and the image-subtitle toggle are off by default:

Setting	What it does
Skip media that already has subtitles	Skips media that already has a usable subtitle in the needed language, including an existing English subtitle when translating.
Ignore forced subtitles when skipping	Forced subtitle tracks do not count as "already subtitled", so full subtitles are still generated when only forced tracks exist.
Generate original-language subtitles	The main switch: transcribe each title in its own spoken language — a Korean film gets Korean, an English film gets English. On by default.
Count image-based subtitles as present	When on, existing image-based subtitles (PGS/VOBSUB) count as "already subtitled" and no text subtitle is generated. Off by default, since image subs can't be searched or edited.
Also create an English subtitle when a title has none	Translation to English. For a title whose audio isn't English and that has no English subtitle, additionally produce one. Skips titles that already have English audio or an English subtitle. Off by default.

What Whisper can produce: Whisper transcribes the speech it hears, so it writes a subtitle in the title's own audio language — that's the Generate original-language subtitles switch (an English film naturally gets English subtitles here). The one thing it can additionally do is translate to English: that's the separate Translation toggle, which only ever adds an English subtitle to a foreign-language title that doesn't already have one. There are no other targets — English is the only language Whisper translates into.

Scope: the skip logic and these toggles apply to the scheduled auto-generation task and bulk "Generate all" actions. A manual "Generate" on a single item always transcribes (it bypasses the skip and the original-language toggle), so you can force fresh subtitles for a file even when it already has some — e.g. to replace a poor embedded track.

Note: detection reads each item's subtitle streams from Jellyfin's library metadata, so a recent library scan keeps it accurate. If an item hasn't been scanned yet, the plugin errs toward generating rather than wrongly skipping.
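The per-track rule behind the skip logic can be summarized in a few lines. This is a simplified reading of the settings above, not the plugin's code: `track_counts_as_subtitled` is a hypothetical helper, and it assumes Ignore forced subtitles when skipping is left on.

```shell
# Does an existing subtitle track satisfy the need for a given language?
# $1 = needed language, $2 = track language, $3 = forced track? (yes/no),
# $4 = image-based track? (yes/no),
# $5 = "Count image-based subtitles as present" setting (default no)
track_counts_as_subtitled() {
  [ "$2" = "$1" ] || return 1        # a different language never counts
  [ "$3" = "yes" ] && return 1       # forced tracks are ignored
  if [ "$4" = "yes" ] && [ "${5:-no}" != "yes" ]; then
    return 1                         # PGS/VOBSUB ignored unless the setting is on
  fi
  return 0
}

# A normal English text track means the item is skipped.
if track_counts_as_subtitled en en no no; then echo "skip"; else echo "generate"; fi
```

An item is skipped only when at least one of its tracks passes this check; a manual Generate bypasses it entirely.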

How It Works

Language Detection -- FFprobe reads the audio stream metadata to determine the spoken language(s).
Audio Extraction -- FFmpeg extracts a 16 kHz mono WAV track from the media file.
Transcription -- The extracted audio is passed to whisper.cpp, which produces an SRT subtitle file. For forced subtitles, the audio is first segmented via VAD (silence detection), then each ~30-second chunk is language-classified before selectively transcribing only foreign-language segments.
Output -- Files are saved alongside the original media:
- Full subtitles: Movie.es.generated.srt
- Forced subtitles: Movie.es.forced.generated.srt
- Lyrics: Song.lrc
Metadata Refresh -- The item's metadata is refreshed so Jellyfin picks up the new files immediately.

Temporary audio files are cleaned up automatically after processing. Items that have already been processed are tracked with marker files (.noforeignlang) to avoid redundant work on subsequent scans.
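For a single file, the extraction and transcription steps correspond roughly to the commands below. This is a hand-run approximation rather than the plugin's code: `subtitle_path` and `transcribe` are hypothetical helpers, and the exact FFmpeg and whisper-cli arguments the plugin passes may differ.

```shell
# Output name following the naming scheme listed above.
# $1 = media path, $2 = language code, $3 = "forced" for forced subtitles
subtitle_path() {
  base=${1%.*}
  if [ "${3:-}" = "forced" ]; then
    echo "$base.$2.forced.generated.srt"
  else
    echo "$base.$2.generated.srt"
  fi
}

# $1 = media file, $2 = language code, $3 = model path
transcribe() {
  out=$(subtitle_path "$1" "$2")
  tmpdir=$(mktemp -d) || return 1
  # Extract a 16 kHz mono WAV, then let whisper-cli write <output base>.srt.
  ffmpeg -nostdin -y -i "$1" -vn -ac 1 -ar 16000 -c:a pcm_s16le "$tmpdir/audio.wav" &&
    whisper-cli -m "$3" -f "$tmpdir/audio.wav" -l "$2" -osrt -of "${out%.srt}"
  status=$?
  rm -rf "$tmpdir"                   # clean up the temporary audio
  return $status
}
```

For example, `transcribe /media/Movie.mkv es /opt/whisper/models/ggml-large-v3-turbo.bin` would write `/media/Movie.es.generated.srt`.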

Roadmap

See ROADMAP.md for planned features and design details.

Other Jellyfin Projects by GeiserX

smart-covers — Cover extraction for books, audiobooks, comics, magazines, and music libraries with online fallback
quality-gate — Restrict users to specific media versions based on configurable path-based policies
jellyfin-encoder — Automatic 720p HEVC/AV1 transcoding service with hardware acceleration
jellyfin-telegram-channel-sync — Sync Jellyfin access with Telegram channel membership

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for the full text.