README.md
June 16, 2026
WhisperSubs is a Jellyfin plugin that automatically generates subtitles for your media library using local AI models. All transcription runs entirely on your server -- no audio data ever leaves your network. Your media stays private.
Features
- Fully Local Processing -- Audio is transcribed on your hardware using whisper.cpp. No cloud APIs, no external services, no data exfiltration.
- Built-in Engine Setup -- Download whisper-cli binaries and models directly from the plugin settings page on Linux. No manual installation needed for most users.
- Automatic Language Detection -- Reads audio stream metadata to detect the spoken language and generate matching subtitles. Falls back to whisper's built-in language detection when tags are absent.
- Forced Subtitles -- Detect and transcribe only foreign-language dialogue (e.g., French lines in an English movie) via VAD-based speech segmentation and per-chunk language detection.
- Lyrics Generation (Experimental) -- Generate .lrc lyrics files for music libraries via whisper transcription. Jellyfin picks up .lrc files automatically.
- GPU Acceleration -- Supports CUDA (NVIDIA), Vulkan (Intel / AMD / NVIDIA), and ROCm (AMD) for significantly faster transcription.
- Priority Queue -- Manual requests are queued with priority and processed before scheduled items. Queue persists across restarts.
- Real-time Progress -- Live progress banner in the admin UI showing current item, phase (extracting audio, transcribing), per-file progress, and overall stats.
- Subtitle Resume -- If transcription is interrupted, it resumes from the last timestamp rather than starting over.
- Admin Dashboard UI -- Browse libraries, view items, manage the whisper engine, and trigger subtitle generation directly from the Jellyfin admin panel.
- Scheduled Tasks -- Enable automatic scanning so new media gets subtitles without manual intervention. Runs daily at 2:00 AM and on startup by default.
- Per-Library Control -- Choose which libraries are monitored for automatic subtitle generation.
- Multiple Output Formats -- Generates .srt subtitles, .forced.generated.srt forced subtitles, and .lrc lyrics, all placed alongside your media and auto-detected by Jellyfin.
Prerequisites
| Dependency | Details |
|---|---|
| Jellyfin | 10.11.0 or later |
| FFmpeg | Bundled with Jellyfin (/usr/lib/jellyfin-ffmpeg/ffmpeg) or available in PATH. Used to extract audio from media files. |
| whisper.cpp | The whisper-cli binary. On Linux, the plugin can download this automatically from the settings page. Otherwise, install manually -- see Installing whisper.cpp. |
| Whisper Model | A GGML model file. The plugin can download models automatically from the settings page, or download manually from Hugging Face. |
Quick start (Linux): After installing the plugin, go to Dashboard > Plugins > WhisperSubs. The Whisper Engine section lets you download both the binary and a model with one click each. The manual steps below are only needed for non-Linux platforms or custom setups.
Installation
From the Jellyfin Plugin Repository (Recommended)
- In Jellyfin, go to Dashboard > Plugins > Repositories.
- Add a new repository with this URL: https://geiserx.github.io/whisper-subs/manifest.json
- Go to Catalog, find WhisperSubs, and click Install.
- Restart Jellyfin.
Manual Installation
- Build from source: dotnet build --configuration Release
- Copy WhisperSubs.dll to your Jellyfin plugins directory: /var/lib/jellyfin/plugins/WhisperSubs/
- Restart Jellyfin.
Installing whisper.cpp
The plugin requires whisper.cpp for transcription. Choose the method that matches your setup.
Option A: Pre-built Binary (Recommended for most users)
- Download the latest release for your platform from whisper.cpp releases.
- Extract and place the whisper-cli binary somewhere persistent (e.g., /opt/whisper/).
- Download a model:
mkdir -p /opt/whisper/models
# Base model (~148 MB) -- fast, good for quick transcription
wget -O /opt/whisper/models/ggml-base.bin \
    https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
# Large V3 Turbo (~1.6 GB) -- best accuracy with reasonable speed (recommended)
wget -O /opt/whisper/models/ggml-large-v3-turbo.bin \
    https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo.bin
- In the plugin settings, set Whisper Binary Path to /opt/whisper/whisper-cli and Whisper Model Path to the model file.
Option B: Build from Source (CPU only)
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build -DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release -j$(nproc)
# Binary will be at build/bin/whisper-cli
Option C: Build from Source with GPU Acceleration
See GPU Acceleration below for detailed instructions.
Docker / Container Setups
If Jellyfin runs in a Docker container, whisper.cpp must be accessible inside the container. The recommended approach is to bind-mount a host directory containing the binary and model:
# docker-compose.yml
services:
  jellyfin:
    image: jellyfin/jellyfin
    volumes:
      - /opt/whisper:/opt/whisper:ro # whisper-cli binary + models
      # ... your other volumes
Then configure the plugin with:
- Whisper Binary Path: /opt/whisper/whisper-cli
- Whisper Model Path: /opt/whisper/models/ggml-large-v3-turbo.bin
Note: The binary must be compiled for the same architecture as the container (typically x86_64 Linux). Download the linux-x64 release asset or build inside a matching environment.
Container Library Requirements
The plugin's built-in binary downloader fetches pre-built whisper-cli binaries. These require runtime libraries that are not included in the default Jellyfin Docker image:
| Variant | Required packages | Install command |
|---|---|---|
| CPU | libgomp1 | apt install libgomp1 |
| CPU (Compatibility) | none (self-contained) | — |
| Vulkan | libgomp1, libvulkan1, mesa-vulkan-drivers | apt install libgomp1 libvulkan1 mesa-vulkan-drivers |
| CUDA 12 | libgomp1 + NVIDIA Container Toolkit on host | See CUDA section |
| ROCm | libgomp1 + ROCm runtime | See ROCm docs |
Minimal containers (TrueNAS Scale, slim Docker): The default cpu build links libgomp1 (OpenMP) for best performance. If that library is missing, the plugin automatically falls back to the noavx build, which is compiled with OpenMP off and has no such dependency, so transcription still works out of the box, just without the small OpenMP speed-up.
Older / low-power CPUs (no AVX): The cpu, vulkan and cuda12 builds are compiled with AVX/AVX2 instructions and will crash with an illegal instruction error (exit 132) on CPUs that lack them. This is common on budget NAS boxes, Atom/Celeron chips and some VMs. The plugin detects missing AVX support and automatically uses the noavx (Compatibility) build instead, both when recommending a variant and as a download-time fallback. You can also pick "CPU (Compatibility)" manually on the setup page.
For the faster build, install libgomp1 persistently via your container's entrypoint or Dockerfile:
apt-get update -qq && apt-get install -y -qq --no-install-recommends libgomp1 && rm -rf /var/lib/apt/lists/*
The plugin's setup page will detect missing libraries and warn you before downloading.
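The fallback rules described above can be summarised as a small decision function. This is a minimal sketch, not the plugin's actual detection code: the function name recommend_cpu_variant and its two arguments are hypothetical, and it assumes an x86 Linux host where /proc/cpuinfo exposes a "flags" line.

```shell
# Hypothetical helper that mirrors the documented fallback rules.
# $1 = path to a cpuinfo-style file (default: /proc/cpuinfo)
# $2 = "yes" if libgomp.so.1 is installed, anything else otherwise
recommend_cpu_variant() {
    cpuinfo="${1:-/proc/cpuinfo}"
    has_libgomp="${2:-no}"
    # CPU features are listed as space-separated words on the "flags" line.
    if ! grep -E '^flags' "$cpuinfo" | grep -qw 'avx'; then
        echo "noavx"   # no AVX: the regular builds would exit with code 132
    elif [ "$has_libgomp" != "yes" ]; then
        echo "noavx"   # OpenMP runtime missing: use the self-contained build
    else
        echo "cpu"
    fi
}

# Example (the ldconfig check is one possible way to look for libgomp):
#   recommend_cpu_variant /proc/cpuinfo \
#     "$(ldconfig -p 2>/dev/null | grep -q 'libgomp\.so\.1' && echo yes || echo no)"
```

The function takes its inputs as arguments rather than probing the system itself, so the same rules can be checked against a saved cpuinfo file from another machine.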
Verifying the Installation
# If in PATH:
whisper-cli --help
# If using an absolute path:
/opt/whisper/whisper-cli --help
# Inside a Docker container:
docker exec jellyfin /opt/whisper/whisper-cli --help
GPU Acceleration
whisper.cpp supports GPU offloading via Vulkan (Intel, AMD, and some NVIDIA GPUs), CUDA (NVIDIA), and ROCm (AMD). GPU acceleration dramatically reduces transcription time, especially with larger models.
Docker users: Passing the GPU device (e.g., /dev/dri) to a container is not enough -- the container also needs the matching userspace libraries installed. The auto-setup wizard detects both the device and the library and will fall back to CPU if the library is missing.
| Backend | Device | Required library | Install command (Debian/Ubuntu) |
|---|---|---|---|
| CUDA | /dev/nvidia0 | libcuda.so.1 | nvidia-container-toolkit (host) |
| Vulkan | /dev/dri | libvulkan.so.1 + ICD JSON | apt install libvulkan1 mesa-vulkan-drivers (also needs /usr/share/vulkan/icd.d/*.json) |
| ROCm | /dev/kfd | libamdhip64.so | apt install rocm-hip-runtime |
Vulkan (Intel / AMD)
Vulkan is the best option for Intel iGPUs (e.g., UHD 770) and AMD GPUs. It works through the Mesa Vulkan drivers.
Building whisper.cpp with Vulkan
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build \
-DGGML_VULKAN=ON \
-DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release -j$(nproc)
# Binary: build/bin/whisper-cli
Important: The CMake flag is -DGGML_VULKAN=ON (not -DWHISPER_VULKAN). This is a common source of confusion.
Runtime Dependencies
The Vulkan binary requires these libraries at runtime:
| Package (Debian/Ubuntu) | Purpose |
|---|---|
| libvulkan1 | Vulkan loader |
| mesa-vulkan-drivers | Intel (ANV) and AMD (RADV) Vulkan ICDs |
| libgomp1 | OpenMP threading |
apt-get install -y libvulkan1 mesa-vulkan-drivers libgomp1
Docker: GPU Passthrough for Vulkan
To use an Intel or AMD GPU inside a Docker container:
services:
  jellyfin:
    image: jellyfin/jellyfin
    devices:
      - /dev/dri:/dev/dri # GPU render nodes
    volumes:
      - /opt/whisper:/opt/whisper:ro
The container also needs the Vulkan runtime libraries. If using the official Jellyfin image (Debian-based), install them on startup:
    entrypoint:
      - /bin/bash
      - -c
      - |
        dpkg -s libvulkan1 > /dev/null 2>&1 || \
        (apt-get update -qq && \
        apt-get install -y -qq --no-install-recommends \
        libvulkan1 mesa-vulkan-drivers libgomp1 > /dev/null 2>&1 && \
        rm -rf /var/lib/apt/lists/*)
        exec /jellyfin/jellyfin
Verify GPU detection inside the container:
# Should show your GPU (e.g., "Intel(R) UHD Graphics 770")
docker exec jellyfin apt-get update -qq && \
docker exec jellyfin apt-get install -y -qq vulkan-tools && \
docker exec jellyfin vulkaninfo --summary
Building Inside Docker (ABI Compatibility)
When Jellyfin runs in a container, the whisper binary must be compiled against matching system libraries. Build inside a container with the same base image:
# On the Docker host:
docker run --rm -v /opt/whisper:/output debian:trixie bash -c '
apt-get update && apt-get install -y git cmake g++ libvulkan-dev &&
git clone https://github.com/ggerganov/whisper.cpp.git /tmp/whisper &&
cd /tmp/whisper &&
cmake -B build -DGGML_VULKAN=ON -DBUILD_SHARED_LIBS=OFF &&
cmake --build build --config Release -j$(nproc) &&
cp build/bin/whisper-cli /output/whisper-cli
'
CUDA (NVIDIA)
For NVIDIA GPUs with CUDA support:
Building whisper.cpp with CUDA
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build \
-DGGML_CUDA=ON \
-DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release -j$(nproc)
Docker: NVIDIA GPU Passthrough
services:
  jellyfin:
    image: jellyfin/jellyfin
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - /opt/whisper:/opt/whisper:ro
Requires the NVIDIA Container Toolkit.
Verifying GPU Acceleration
After configuring GPU support, trigger a transcription and check the Jellyfin logs. You should see:
# Vulkan
whisper_backend_init_gpu: using Vulkan0 backend
# CUDA
whisper_backend_init_gpu: using CUDA0 backend
If you see "no GPU found" or "using CPU backend", the binary was not built with GPU support or the runtime drivers are missing.
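To avoid reading the log by eye, the messages above can be matched with a short filter. This is a sketch under assumptions: detect_whisper_backend is a hypothetical helper name, it only recognises the message fragments quoted in this section, and it expects the log text on standard input.

```shell
# Hypothetical helper: classify a log (read from stdin) by the backend message it contains.
detect_whisper_backend() {
    log="$(cat)"
    case "$log" in
        *"using Vulkan"[0-9]*" backend"*)        echo "vulkan" ;;
        *"using CUDA"[0-9]*" backend"*)          echo "cuda" ;;
        *"no GPU found"*|*"using CPU backend"*)  echo "cpu" ;;
        *)                                       echo "unknown" ;;
    esac
}

# Example, assuming the container is named "jellyfin" as elsewhere in this README:
#   docker logs jellyfin 2>&1 | detect_whisper_backend
```

An "unknown" result usually just means no transcription has run since the log was last rotated, so trigger one before checking.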
Model Recommendations
| Model | Size | Speed (CPU) | Speed (GPU) | Quality | Use Case |
|---|---|---|---|---|---|
| ggml-large-v3-turbo-q5_0.bin | 574 MB | Moderate | Fast | Excellent | Recommended. Best quality/size ratio. |
| ggml-large-v3-turbo.bin | 1.6 GB | Slow | Fast | Excellent | Full-precision turbo. Slightly better quality, 3x larger. |
| ggml-medium-q5_0.bin | 539 MB | Moderate | Fast | Very good | Similar size to turbo-q5 but slower and less accurate. |
| ggml-medium.bin | 1.5 GB | Moderate | Fast | Very good | Full-precision medium model. |
| ggml-small.bin | 488 MB | Fast | Very fast | Good | Faster inference, lower accuracy. |
| ggml-base.bin | 148 MB | Fast | Very fast | Fair | Lightweight. Fast but noticeably less accurate. |
| ggml-tiny.bin | 78 MB | Very fast | Very fast | Basic | Smallest model. Only for testing or constrained environments. |
The Q5 quantized models offer nearly identical quality to their F16 counterparts at a fraction of the size. ggml-large-v3-turbo-q5_0 is the default when downloading from the plugin settings page.
Configuration
After installation, navigate to Dashboard > Plugins > WhisperSubs to configure:
| Setting | Description |
|---|---|
| Default Language | Auto-detect reads the language from each file's audio stream metadata and generates matching subtitles. Choose a specific language to force it for all transcriptions. |
| Subtitle Mode | Full, Forced Only, or Full + Forced. See Subtitle Modes below. |
| Enable Auto-Generation | When enabled, the scheduled task will scan selected libraries and generate subtitles for items that lack them. |
| Enabled Libraries | Select which libraries should be monitored for automatic subtitle generation. |
| Enable Lyrics Generation | When enabled, music libraries are scanned and audio tracks receive .lrc lyrics files (experimental -- whisper is optimized for speech, not singing). |
| Whisper Binary Path | (Advanced) Absolute path to the whisper-cli binary. Leave empty to use the auto-downloaded binary or search PATH. |
| Whisper Model Path | (Advanced) Absolute path to the GGML model file. Leave empty to use the auto-downloaded model. |
| Whisper Thread Count | (Advanced) Number of CPU threads for whisper inference. 0 = whisper default (4). Set to your CPU core count for faster transcription. |
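To find a value for Whisper Thread Count, you can ask the system how many processors are available. The helper name cpu_core_count is hypothetical; note that both commands report logical processors, which may be higher than the number of physical cores on CPUs with SMT/Hyper-Threading.

```shell
# Hypothetical helper: print the number of available logical processors.
cpu_core_count() {
    # nproc ships with GNU coreutils; getconf is a fallback for other systems.
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

# Example:
#   cpu_core_count
```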
Subtitle Modes
| Mode | What it generates | Performance |
|---|---|---|
| Full (default) | Complete transcription of all speech | Fast -- single whisper run per audio track |
| Forced Only | Only foreign-language dialogue (e.g., French lines in an English movie) | Slow -- see below |
| Full + Forced | Both files per track | Slowest -- runs both pipelines |
Performance warning for Forced / Full + Forced modes: Forced subtitle generation uses a multi-step pipeline: audio extraction, VAD-based speech segmentation, then per-chunk language detection on every ~30-second segment of the movie. For a 2-hour film this means ~240 individual whisper calls just for detection, before any transcription begins. On CPU, this phase alone can take 10--20+ minutes per movie. GPU acceleration helps significantly.
If you don't need forced subtitles (most users don't), use Full mode for much faster processing.
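The ~240 figure above is simply the runtime divided by the chunk length, rounded up. A minimal sketch of that arithmetic (forced_detection_calls is a hypothetical name, and the 30-second default is the approximate chunk size quoted above, not a guaranteed constant):

```shell
# Hypothetical helper: estimate the number of per-chunk language-detection calls.
# $1 = duration in seconds, $2 = chunk length in seconds (default: 30)
forced_detection_calls() {
    duration_s="$1"
    chunk_s="${2:-30}"
    # Integer ceiling division: a trailing partial chunk still needs one call.
    echo $(( ($duration_s + $chunk_s - 1) / $chunk_s ))
}

# Example: a 2-hour film
#   forced_detection_calls 7200
```

VAD segmentation may skip silent stretches, so treat the result as an upper-bound estimate rather than an exact count.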
Language Handling
The plugin supports three language modes:
- Auto-detect (recommended) -- The plugin uses FFprobe to read the audio stream's language tag (e.g., spa→es, eng→en). Subtitles are generated in the language that matches the audio. If a file has multiple audio tracks in different languages, subtitles are generated for each one.
- Whisper auto-detection -- When no language metadata is available, the request falls through to whisper's built-in language detection (-l auto), which analyzes the first 30 seconds of audio.
- Forced language -- Set a specific language code (e.g., es) in the configuration or per-request via the API. This overrides detection and tells whisper to transcribe in that language.
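The tag-to-code mapping and the fall-through to whisper's own detection can be illustrated with a small lookup. This is a sketch only: to_whisper_lang is a hypothetical name, and the table covers just a handful of ISO 639-2 codes for illustration, whereas the plugin presumably handles many more.

```shell
# Hypothetical helper: map an audio-stream language tag to a whisper language code.
# Unknown or missing tags fall back to "auto" (whisper's built-in detection).
to_whisper_lang() {
    case "$1" in
        eng)     echo "en" ;;
        spa)     echo "es" ;;
        fra|fre) echo "fr" ;;
        deu|ger) echo "de" ;;
        ??)      echo "$1" ;;    # already a two-letter code, pass through
        *)       echo "auto" ;;  # no usable tag (e.g. "und" or empty)
    esac
}

# Example:
#   to_whisper_lang spa
```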
Subtitle Timing
whisper.cpp emits subtitle segments back-to-back with no gaps, so the next line can appear on screen during the pause before it is actually spoken. The plugin corrects this, and the relevant settings are on by default:
| Setting | What it does |
|---|---|
| Enable VAD | Runs whisper-cli with its native Silero Voice Activity Detection (--vad), so each cue starts at the real speech onset rather than during the preceding silence. The Silero VAD model (~865 KB) is auto-downloaded into the plugin's whisper/vad/ data directory on first use. This is the primary speech-onset mechanism. |
| Align subtitles to speech | Older, energy-based fallback. Snaps each subtitle's start to the detected speech onset using a quick FFmpeg silence-detection pass over the audio. Used only when Enable VAD is off (native VAD handles this more reliably). |
| Compensate audio start offset | Shifts all subtitle timestamps by the audio stream's container start time, keeping subtitles in sync when a file's audio doesn't begin exactly at 0:00. |
These corrections apply only to locally-generated subtitles (whisper-cli) -- both full and translated subtitles. They do not affect the remote Whisper API or forced subtitles.
With native VAD enabled (the default), no extra FFmpeg pass is needed. The FFmpeg silence-detection alignment only runs as a fallback when VAD is disabled.
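To make the "Compensate audio start offset" idea concrete, the sketch below shifts every timestamp in an SRT file by a fixed number of milliseconds. It illustrates the concept only and is not the plugin's implementation; shift_srt is a hypothetical name, and the script assumes standard HH:MM:SS,mmm timestamps.

```shell
# Hypothetical helper: shift all SRT cue times by $1 milliseconds (may be negative).
# Reads SRT on stdin, writes the shifted SRT to stdout.
shift_srt() {
    awk -v off="$1" '
    # Convert "HH:MM:SS,mmm" to milliseconds.
    function to_ms(t,   p) {
        split(t, p, /[:,]/)
        return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    # Convert milliseconds back to "HH:MM:SS,mmm", clamping at zero.
    function fmt(ms,   h, m, s) {
        if (ms < 0) ms = 0
        h = int(ms / 3600000); ms -= h * 3600000
        m = int(ms / 60000);   ms -= m * 60000
        s = int(ms / 1000);    ms -= s * 1000
        return sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
    }
    # Timing lines look like: start --> end
    / --> / { print fmt(to_ms($1) + off) " --> " fmt(to_ms($3) + off); next }
    { print }
    '
}

# Example: delay every cue by 250 ms
#   shift_srt 250 < Movie.es.generated.srt > shifted.srt
```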
Usage
Admin Dashboard
The plugin adds a dedicated page to the Jellyfin admin dashboard (accessible from Dashboard > Plugins > WhisperSubs, or from the main sidebar menu). From there you can:
- Configure the plugin settings (language, subtitle mode, binary/model paths, enabled libraries).
- Manage the whisper engine -- download binaries (CPU / Vulkan / CUDA / ROCm) and models directly from the UI.
- Browse all libraries and their items.
- See which items already have subtitles (green check / orange cross).
- Select a language for subtitle generation (auto-detect or any specific language).
- Generate subtitles for individual items with a single click.
- Monitor progress -- a live banner shows the current item, processing phase, and queue depth.
REST API
All endpoints require Jellyfin admin authentication. Setup endpoints additionally require elevated privileges.
Library & Items
| Method | Endpoint | Description |
|---|---|---|
| GET | /Plugins/WhisperSubs/Libraries | List all media libraries |
| GET | /Plugins/WhisperSubs/Libraries/{libraryId}/Items | List items in a library (supports startIndex and limit) |
| POST | /Plugins/WhisperSubs/Items/{itemId}/Generate?language=auto | Queue subtitle generation (priority) |
| GET | /Plugins/WhisperSubs/Items/{itemId}/AudioLanguages | Detect audio languages in a media file |
| GET | /Plugins/WhisperSubs/Items/{itemId}/Status | Check subtitle generation status |
Queue & Task
| Method | Endpoint | Description |
|---|---|---|
| GET | /Plugins/WhisperSubs/Queue | Queue status: current item, progress, phase, remaining count |
| POST | /Plugins/WhisperSubs/RunTask | Trigger the scheduled subtitle generation task |
| GET | /Plugins/WhisperSubs/Models | List downloaded models with active/size info |
Engine Setup (requires elevated privileges)
| Method | Endpoint | Description |
|---|---|---|
| GET | /Plugins/WhisperSubs/Setup/Status | Binary/model status, GPU detection, platform info |
| GET | /Plugins/WhisperSubs/Setup/BinaryVariants | Available binary variants for this platform |
| POST | /Plugins/WhisperSubs/Setup/DownloadBinary?variant=cpu | Download whisper-cli binary |
| GET | /Plugins/WhisperSubs/Setup/AvailableModels | Model catalog with sizes and descriptions |
| POST | /Plugins/WhisperSubs/Setup/DownloadModel?name=... | Download a model from HuggingFace |
| GET | /Plugins/WhisperSubs/Setup/Progress | Download progress (percent, message, errors) |
| POST | /Plugins/WhisperSubs/Setup/Models/{filename}/Activate | Set a downloaded model as active |
| DELETE | /Plugins/WhisperSubs/Setup/Models/{filename} | Delete a downloaded model |
The language parameter accepts auto (default), or any ISO 639-1 code (en, es, fr, etc.).
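As an illustration, the Generate and Queue endpoints listed above could be called with curl roughly as follows. Treat this as a sketch: the helper names are hypothetical, the server URL, API key and item ID are placeholders, and the Authorization header format is an assumption based on Jellyfin's usual API-key scheme rather than something this README specifies. The response bodies are not documented here, so they are printed as-is.

```shell
# Hypothetical helper: build the Generate URL for an item.
# $1 = server base URL, $2 = item ID, $3 = language (default: auto)
whispersubs_generate_url() {
    printf '%s/Plugins/WhisperSubs/Items/%s/Generate?language=%s\n' \
        "${1%/}" "$2" "${3:-auto}"
}

# Hypothetical helper: queue subtitle generation for one item.
# $1 = base URL, $2 = admin API key, $3 = item ID, $4 = language
queue_subtitles() {
    curl -fsS -X POST \
        -H "Authorization: MediaBrowser Token=\"$2\"" \
        "$(whispersubs_generate_url "$1" "$3" "$4")"
}

# Hypothetical helper: print the current queue status.
# $1 = base URL, $2 = admin API key
queue_status() {
    curl -fsS \
        -H "Authorization: MediaBrowser Token=\"$2\"" \
        "${1%/}/Plugins/WhisperSubs/Queue"
}

# Example (placeholders, not real values):
#   queue_subtitles http://localhost:8096 "$JELLYFIN_API_KEY" "$ITEM_ID" es
#   queue_status    http://localhost:8096 "$JELLYFIN_API_KEY"
```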
Scheduled Task
A scheduled task named Generate Subtitles is registered under the WhisperSubs category. It can be configured in Dashboard > Scheduled Tasks with your preferred schedule or triggered manually. The task:
- Scans all enabled libraries (or all libraries if none are explicitly selected).
- Finds video items that lack subtitles.
- Generates subtitles using the configured default language (auto-detect by default).
- Reports progress in the Jellyfin task UI.
Skipping Already-Subtitled Media
The auto-generation task skips media that already has a usable subtitle in the needed language, so an already-subtitled library is not needlessly re-processed. For the translation pass, an existing English subtitle track -- embedded or external -- counts as already translated and is skipped. Forced (foreign-dialogue-only) and image-based subtitle tracks do not count as satisfying the need, so full subtitles are still generated when only those are present.
The following settings control this skip behavior. The skip toggles and Generate original-language subtitles are on by default; translation and the image-subtitle toggle are off by default:
| Setting | What it does |
|---|---|
| Skip media that already has subtitles | Skips media that already has a usable subtitle in the needed language, including an existing English subtitle when translating. |
| Ignore forced subtitles when skipping | Forced subtitle tracks do not count as "already subtitled", so full subtitles are still generated when only forced tracks exist. |
| Generate original-language subtitles | The main switch: transcribe each title in its own spoken language — a Korean film gets Korean, an English film gets English. On by default. |
| Count image-based subtitles as present | When on, existing image-based subtitles (PGS/VOBSUB) count as "already subtitled" and no text subtitle is generated. Off by default, since image subs can't be searched or edited. |
| Also create an English subtitle when a title has none | Translation to English. For a title whose audio isn't English and that has no English subtitle, additionally produce one. Skips titles that already have English audio or an English subtitle. Off by default. |
What Whisper can produce: Whisper transcribes the speech it hears, so it writes a subtitle in the title's own audio language — that's the Generate original-language subtitles switch (an English film naturally gets English subtitles here). The one thing it can additionally do is translate to English: that's the separate Translation toggle, which only ever adds an English subtitle to a foreign-language title that doesn't already have one. There are no other targets — English is the only language Whisper translates into.
Scope: the skip logic and these toggles apply to the scheduled auto-generation task and bulk "Generate all" actions. A manual "Generate" on a single item always transcribes (it bypasses the skip and the original-language toggle), so you can force fresh subtitles for a file even when it already has some — e.g. to replace a poor embedded track.
Note: detection reads each item's subtitle streams from Jellyfin's library metadata, so a recent library scan keeps it accurate. If an item hasn't been scanned yet, the plugin errs toward generating rather than wrongly skipping.
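The way the two "does this track count?" toggles interact can be sketched as a predicate. This is an illustration of the rules described above, not the plugin's code: the function name, the three track kinds and the argument order are all hypothetical, and language matching is deliberately left out.

```shell
# Hypothetical predicate: does an existing subtitle track count as "already subtitled"?
# $1 = track kind: text | forced | image
# $2 = "Ignore forced subtitles when skipping" (default: yes)
# $3 = "Count image-based subtitles as present" (default: no)
track_counts_as_subtitled() {
    case "$1" in
        text)   return 0 ;;                    # regular text subtitles always count
        forced) [ "${2:-yes}" != "yes" ] ;;    # counts only if forced tracks are not ignored
        image)  [ "${3:-no}" = "yes" ] ;;      # counts only if the image toggle is on
        *)      return 1 ;;
    esac
}

# Example with default settings:
#   track_counts_as_subtitled forced || echo "full subtitles will still be generated"
```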
How It Works
- Language Detection -- FFprobe reads the audio stream metadata to determine the spoken language(s).
- Audio Extraction -- FFmpeg extracts a 16 kHz mono WAV track from the media file.
- Transcription -- The extracted audio is passed to whisper.cpp, which produces an SRT subtitle file. For forced subtitles, the audio is first segmented via VAD (silence detection), then each ~30-second chunk is language-classified before selectively transcribing only foreign-language segments.
- Output -- Files are saved alongside the original media:
  - Full subtitles: Movie.es.generated.srt
  - Forced subtitles: Movie.es.forced.generated.srt
  - Lyrics: Song.lrc
- Metadata Refresh -- The item's metadata is refreshed so Jellyfin picks up the new files immediately.
Temporary audio files are cleaned up automatically after processing. Items that have already been processed are tracked with marker files (.noforeignlang) to avoid redundant work on subsequent scans.
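For debugging, the extraction and transcription steps can be approximated by hand. The sketch below is not the plugin's pipeline: both function names are hypothetical, it handles only full subtitles from the default audio stream, and it assumes ffmpeg and a whisper-cli binary are available at the paths you pass in. The output names follow the convention listed above.

```shell
# Hypothetical helper: derive the output path for a media file.
# $1 = media file (with extension), $2 = language code, $3 = kind: full | forced | lyrics
subtitle_output_path() {
    base="${1%.*}"   # strip the media file extension
    case "$3" in
        forced) echo "$base.$2.forced.generated.srt" ;;
        lyrics) echo "$base.lrc" ;;
        *)      echo "$base.$2.generated.srt" ;;
    esac
}

# Hypothetical helper: extract 16 kHz mono WAV audio, then transcribe it to SRT.
# $1 = media file, $2 = language code, $3 = whisper-cli path, $4 = model path
transcribe_manually() {
    tmpdir="$(mktemp -d)"
    out="$(subtitle_output_path "$1" "$2" full)"
    ffmpeg -y -i "$1" -vn -ac 1 -ar 16000 -c:a pcm_s16le "$tmpdir/audio.wav" &&
        # -of takes the output path without extension; -osrt appends ".srt"
        "$3" -m "$4" -f "$tmpdir/audio.wav" -l "$2" -osrt -of "${out%.srt}"
    status=$?
    rm -rf "$tmpdir"   # clean up the temporary audio
    return "$status"
}

# Example:
#   transcribe_manually /media/Movie.mkv es /opt/whisper/whisper-cli \
#       /opt/whisper/models/ggml-large-v3-turbo.bin
```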
Roadmap
See ROADMAP.md for planned features and design details.
Other Jellyfin Projects by GeiserX
- smart-covers — Cover extraction for books, audiobooks, comics, magazines, and music libraries with online fallback
- quality-gate — Restrict users to specific media versions based on configurable path-based policies
- jellyfin-encoder — Automatic 720p HEVC/AV1 transcoding service with hardware acceleration
- jellyfin-telegram-channel-sync — Sync Jellyfin access with Telegram channel membership
License
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for the full text.