YouDub WebUI

May 26, 2026

QQ group: 618246010

An open-source video localization tool proven in a real creator workflow.

YouDub WebUI turns a single YouTube or Bilibili video into a dubbed video in the target language. It downloads the source video, separates vocals from background audio, transcribes speech, translates the transcript, generates new voiceover, mixes audio, burns subtitles, and produces a final video that can be played or downloaded from the web UI.

The most mature path is YouTube English -> Chinese dubbing. The app also supports Bilibili Chinese -> English dubbing through the same task pipeline.

Chinese README: README.md

Proven Creator Workflow

Author's Bilibili channel: 黑纹白斑马 — 800K+ followers, 20K+ videos. The whole channel is translated and dubbed with YouDub WebUI, covering technology, games, science, animals, history, and more.

This is not just a demo runner. YouDub WebUI is built for creators, developers, and small teams who want to own a complete local video-localization pipeline while keeping the system simple enough to inspect, debug, and modify.

Demo

The samples below were generated by this project and can be played directly on GitHub. The left side is the original video, and the right side is the automatically dubbed version. The dubbed videos include target-language voiceover and subtitles while preserving the original background music and sound effects.

1. Jensen Huang on Nvidia's Competition

YouTube source · YouTube Shorts · English -> Chinese

Original English | Chinese dubbed

https://github.com/user-attachments/assets/befd11ca-e720-4faa-b4e0-d89bfe73df87

https://github.com/user-attachments/assets/bf01f912-eec8-4e0d-8698-0f69283a73e7

2. How much YT paid me for 129 million shorts views

YouTube source · Long-form landscape video · English -> Chinese · The embedded clip shows the first 40 seconds; the full version is available in the demo-assets Release

Original English | Chinese dubbed

https://github.com/user-attachments/assets/bd02936f-cf3c-4e4b-85b5-0410d38f69f5

https://github.com/user-attachments/assets/158de60a-7de4-4ddf-b3d8-478d0423aee6

Quick Start

1. Prepare the runtime

Verified and recommended runtime:

  • Windows 10/11 + PowerShell 5.1+: recommended for local development and covered first in this README.
  • Linux / WSL2 / macOS: backend and frontend commands are provided for POSIX shells. CUDA, FFmpeg, PyTorch, and audio dependencies still need to match your platform.
  • CUDA GPU: recommended for complete video processing. DEVICE=cpu can be used for some flows, but transcription, separation, and TTS will be very slow; with DEVICE=mps, Whisper automatically falls back to CPU to avoid the MPS float64 limitation.

Base dependencies:

  • Python 3.12.
  • Node.js 20+.
  • FFmpeg / ffprobe available on PATH.
  • A working YouTube proxy when processing YouTube videos.
  • Netscape-format YouTube cookies, recommended for YouTube videos.
  • An OpenAI-compatible Chat Completions base URL, API key, and model name.
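
The local prerequisites above can be checked before installing anything. The following is a minimal sketch rather than part of the project: `check_prerequisites` is a hypothetical helper, it only confirms the Python version and that the listed executables are on PATH, and it does not verify the Node.js 20+ version requirement, the proxy, the cookies, or the API credentials.

```python
import shutil
import sys


def check_prerequisites(required_tools=("ffmpeg", "ffprobe", "node", "npm")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The README asks for Python 3.12 specifically.
    if sys.version_info[:2] != (3, 12):
        problems.append(
            f"Python 3.12 expected, found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # shutil.which returns None when an executable is not found on PATH.
    for tool in required_tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("PROBLEM:", issue)
    print("OK" if not issues else f"{len(issues)} problem(s) found")
```

Run it with the interpreter you intend to use for the virtual environment, so the version check reflects the right Python install.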

The first run may download or load large ASR, TTS, and audio-processing models. Leave enough disk space and time for that setup.

Platform notes:

  • Windows PowerShell uses .venv\Scripts\...; do not copy .venv/bin/... commands there.
  • macOS/Linux use .venv/bin/....
  • If multiple Python installs exist, check py -0p on Windows or python3.12 --version on macOS/Linux first.
  • Proxy settings, cookies, model cache, and work folders stay local. Quote paths that contain spaces, or put them in .env.

Common system dependency examples:

# Windows PowerShell (choose a package manager already available on your machine)
winget install Gyan.FFmpeg
winget install OpenJS.NodeJS.LTS

# Ubuntu / Debian / WSL2
sudo apt update
sudo apt install -y ffmpeg nodejs npm

# macOS (Homebrew)
brew install ffmpeg node

If your system package manager does not provide Python 3.12, install it from python.org, pyenv, conda/mamba, or your distro's recommended channel. The important part is to create the virtual environment with Python 3.12.

2. Clone

The clone commands are the same on Windows PowerShell, macOS, and Linux:

git clone https://github.com/liuzhao1225/YouDub-webui.git
cd YouDub-webui
git submodule update --init --recursive

Demucs is included as a source submodule, so do not skip git submodule update.

3. Install dependencies

Windows PowerShell

Python:

py -3.12 -m venv .venv
.\.venv\Scripts\python.exe -m pip install -U pip
.\.venv\Scripts\pip.exe install -i https://mirrors.aliyun.com/pypi/simple/ -r requirements.txt

Frontend:

npm --prefix apps/web install --registry=https://registry.npmmirror.com

macOS / Linux / WSL2

Python:

python3.12 -m venv .venv
.venv/bin/python -m pip install -U pip
.venv/bin/pip install -i https://mirrors.aliyun.com/pypi/simple/ -r requirements.txt

Frontend:

npm --prefix apps/web install --registry=https://registry.npmmirror.com

Try the Aliyun mirror first. If a specific Python package is temporarily unavailable there, retry only that package with the Tsinghua mirror instead of mixing multiple mirrors in one resolver command.

Optional: NVIDIA CUDA GPU

If you want Whisper, Demucs, or VoxCPM to use an NVIDIA GPU, install the CUDA-enabled PyTorch wheels before installing requirements.txt, that is, before the pip install -r requirements.txt command shown above:

Windows PowerShell:

.\.venv\Scripts\pip.exe install -r requirements-pytorch-cu128.txt

Linux / WSL2:

.venv/bin/pip install -r requirements-pytorch-cu128.txt

requirements-pytorch-cu128.txt uses PyTorch's cu128 wheel index by default. Different NVIDIA driver or CUDA environments may need a different PyTorch CUDA build, so use the official PyTorch installation page when you need a matching command. CPU users and macOS users do not need this step; set DEVICE=cpu in .env when CUDA-enabled PyTorch is not installed.

Verify that CUDA is actually available after installation (on Windows PowerShell, replace .venv/bin/python with .\.venv\Scripts\python.exe):

.venv/bin/python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

4. Configure

Windows PowerShell:

Copy-Item env.txt.example .env

macOS / Linux / WSL2:

cp env.txt.example .env

The application reads .env at runtime. Do not commit API keys, cookies, downloaded media, or generated artifacts.

Common environment variables:

| Variable | Purpose |
| --- | --- |
| WORKFOLDER | Per-task media, audio segments, and intermediate artifacts. |
| MODEL_CACHE_DIR | ModelScope cache directory, used by VoxCPM2 by default. |
| DEVICE | Model runtime device, for example auto, cuda, cuda:0, mps, mps:0, or cpu; auto selects CUDA, then MPS, then CPU. |
| DEMUCS_DEVICE / WHISPER_DEVICE | Optional component-level device overrides. Empty values use DEVICE. Whisper falls back to CPU when MPS is selected because word timestamp alignment depends on float64 DTW, which MPS does not support. |
| OPENAI_BASE_URL | OpenAI-compatible API endpoint, for example https://api.openai.com/v1. |
| OPENAI_API_KEY | API key used by the translation stage. |
| OPENAI_MODEL | Chat Completions model used by the translation stage. |
| OPENAI_TRANSLATE_CONCURRENCY | Parallel requests during translation. Default: 50. |
| YTDLP_PROXY_PORT | Local proxy port used by yt-dlp, for example 7890. |
| HTTP_PROXY / ALL_PROXY | yt-dlp reads HTTP_PROXY when no UI proxy port is set; HTTPX/OpenAI SDK also reads these environment proxies. |
| NO_PROXY | Comma-separated proxy bypass list. Include localhost,127.0.0.1,::1 when using a local OpenAI-compatible service so local requests stay direct. |
| VOXCPM_MODEL / VOXCPM_MODEL_DIR | VoxCPM2 ModelScope model ID or local model directory. VoxCPM currently selects CUDA/MPS/CPU inside the upstream package, and task logs report it as voxcpm=library-auto. |
| VOXCPM_LOAD_DENOISER / VOXCPM_CFG_VALUE / VOXCPM_INFERENCE_TIMESTEPS / VOXCPM_MIN_REFERENCE_MS | VoxCPM2 inference controls. |
| CORS_ALLOW_ORIGINS / CORS_ALLOW_ORIGIN_REGEX | Additional frontend origins. |
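
The device rules described for DEVICE and WHISPER_DEVICE can be summarized in a few lines. This is a minimal sketch of the documented behavior, not the project's actual code: `resolve_device` and `resolve_whisper_device` are hypothetical names, and the availability flags are passed in as plain booleans so the example runs without PyTorch installed.

```python
def resolve_device(requested, cuda_available, mps_available):
    """Resolve a DEVICE-style value to a concrete device string."""
    requested = (requested or "auto").strip().lower()
    if requested != "auto":
        return requested  # explicit values such as "cuda:0" or "cpu" are kept
    # "auto" prefers CUDA, then MPS, then CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def resolve_whisper_device(device, whisper_override=""):
    """Apply the optional override, then fall back to CPU when MPS is selected."""
    chosen = (whisper_override or "").strip().lower() or device
    # Word-timestamp alignment relies on float64 DTW, which MPS does not support.
    return "cpu" if chosen.startswith("mps") else chosen


if __name__ == "__main__":
    # With PyTorch installed, the flags could come from
    # torch.cuda.is_available() and torch.backends.mps.is_available().
    device = resolve_device("auto", cuda_available=False, mps_available=True)
    print(device, resolve_whisper_device(device))
```

Keeping the availability probes outside the function makes the selection logic easy to test on a machine without a GPU.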

Common localhost, LAN, and Tailscale :3000 frontend origins are allowed by default. If you open the frontend through a custom hostname, add the full origin to CORS_ALLOW_ORIGINS, for example http://youdub.example.com:3000.
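
To judge whether a given frontend URL is likely to need an explicit CORS_ALLOW_ORIGINS entry, a rough check like the one below can help. It is only an illustration under stated assumptions: the function name is hypothetical, "LAN" is interpreted as loopback or private address ranges, Tailscale is interpreted as the 100.64.0.0/10 range, and the project's real default rule may differ.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumption: Tailscale addresses come from the 100.64.0.0/10 shared range.
TAILSCALE_NET = ipaddress.ip_network("100.64.0.0/10")


def is_default_allowed_origin(origin, port=3000):
    """Guess whether an origin matches the localhost/LAN/Tailscale :3000 defaults."""
    parts = urlsplit(origin)
    try:
        port_ok = parts.port == port
    except ValueError:  # raised by urlsplit for a malformed port
        return False
    if parts.scheme not in ("http", "https") or not port_ok:
        return False
    host = parts.hostname or ""
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A custom hostname: add its full origin to CORS_ALLOW_ORIGINS.
        return False
    return ip.is_loopback or ip.is_private or ip in TAILSCALE_NET


if __name__ == "__main__":
    for candidate in ("http://192.168.1.20:3000", "http://youdub.example.com:3000"):
        print(candidate, is_default_allowed_origin(candidate))
```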

5. Run

Windows PowerShell

Backend:

.\.venv\Scripts\uvicorn.exe backend.app.main:app --reload --host 0.0.0.0 --port 8000

Frontend:

npm --prefix apps/web run dev -- --hostname 0.0.0.0 --port 3000

macOS / Linux / WSL2

Backend:

.venv/bin/uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

Frontend:

npm --prefix apps/web run dev -- --hostname 0.0.0.0 --port 3000

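By default, the frontend calls same-origin /api/... URLs and Next.js proxies them to http://127.0.0.1:8000. If the backend is not on local port 8000, set NEXT_SERVER_API_BASE_URL when starting the frontend. The example below uses POSIX shell syntax; in PowerShell, set $env:NEXT_SERVER_API_BASE_URL first and then run the same npm command: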

NEXT_SERVER_API_BASE_URL=http://192.168.1.10:8000 npm --prefix apps/web run dev -- --hostname 0.0.0.0 --port 3000

Open:

http://localhost:3000

When opening the app from LAN, WSL2, or another machine, use the actual frontend host IP or hostname, for example http://192.168.1.20:3000. The backend listens on 0.0.0.0:8000, and the frontend listens on 0.0.0.0:3000.

Using the Web UI

  1. Open Settings in the top-right corner.
  2. Paste Netscape-format YouTube cookies.
  3. Set the yt-dlp proxy port, such as 7890 or 20171.
  4. Enter the OpenAI base URL and API key.
  5. Click Get models to fetch model IDs, or enter a model manually.
  6. Tune Translate concurrency based on your API provider's rate limits.
  7. Return to the home page and submit a YouTube URL or Bilibili URL.
  8. Open the task detail page to watch stage progress, logs, and the final video.

API keys and cookies are masked in the UI. The backend does not return plaintext cookie content to the frontend.
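
One way to picture this masking is sketched below. It is an assumption-laden illustration, not the backend's implementation: `mask_secret`, `settings_for_frontend`, and the dictionary keys are hypothetical, and the exact masking format in the real UI may differ.

```python
def mask_secret(value, visible=4):
    """Keep a few leading and trailing characters and hide the rest."""
    if not value:
        return ""
    if len(value) <= visible * 2:
        return "*" * len(value)  # too short to reveal anything safely
    return f"{value[:visible]}{'*' * 8}{value[-visible:]}"


def settings_for_frontend(settings):
    """Build a response payload that never contains plaintext secrets."""
    return {
        "openai_base_url": settings.get("openai_base_url", ""),
        "openai_api_key": mask_secret(settings.get("openai_api_key", "")),
        # Cookies are reported only as present or absent, never echoed back.
        "youtube_cookies_set": bool(settings.get("youtube_cookies")),
    }


if __name__ == "__main__":
    print(settings_for_frontend({
        "openai_base_url": "https://api.openai.com/v1",
        "openai_api_key": "sk-1234567890abcdef",
        "youtube_cookies": "# Netscape HTTP Cookie File",
    }))
```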

Exporting YouTube cookies

A convenient option is the open-source Chrome extension Get cookies.txt LOCALLY, which keeps cookies on your machine:

  1. Install and enable the extension in Chrome.
  2. Sign in to https://www.youtube.com.
  3. On a youtube.com page, click the extension icon and choose Export -> Netscape to get cookies.txt.
  4. Paste the full file content into the YouTube cookie field in Settings.
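
Before pasting, it can be useful to confirm that the exported text really is in Netscape format: one cookie per line with seven tab-separated fields (domain, subdomain flag, path, secure flag, expiry, name, value). The parser below is a minimal sketch for that check; `parse_netscape_cookies` is a hypothetical helper and is not how the project or yt-dlp reads the file.

```python
def parse_netscape_cookies(text):
    """Parse Netscape cookies.txt content into a list of dicts, or raise ValueError."""
    cookies = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # "#HttpOnly_" prefixes a real cookie line; other "#" lines are comments.
        if line.startswith("#HttpOnly_"):
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            raise ValueError(
                f"line {line_no}: expected 7 tab-separated fields, got {len(fields)}"
            )
        domain, subdomains, path, secure, expires, name, value = fields
        cookies.append({
            "domain": domain,
            "include_subdomains": subdomains.upper() == "TRUE",
            "path": path,
            "secure": secure.upper() == "TRUE",
            "expires": int(expires or 0),  # non-numeric expiry raises ValueError
            "name": name,
            "value": value,
        })
    return cookies


if __name__ == "__main__":
    sample = "# Netscape HTTP Cookie File\n.youtube.com\tTRUE\t/\tTRUE\t1790000000\tSID\tabc\n"
    parsed = parse_netscape_cookies(sample)
    print(len(parsed), "cookie(s);", {c["domain"] for c in parsed})
```

A ValueError here usually means the export used a different format, such as JSON, or that tabs were converted to spaces during copy and paste.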

Only process videos you have the right to download, transform, and publish.

Pipeline

YouTube / Bilibili URL
  -> yt-dlp downloads one video
  -> Demucs separates vocals and background audio
  -> Whisper transcribes speech with word timestamps
  -> Sentence and timing normalization
  -> OpenAI-compatible API preprocesses the full transcript and translates sentences in parallel
  -> Original vocals are split into per-sentence reference clips
  -> VoxCPM2 generates target-language voiceover
  -> Voiceover timing is aligned and mixed with background audio
  -> FFmpeg burns subtitles and renders the final mp4
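
The parallel translation step, together with the OPENAI_TRANSLATE_CONCURRENCY setting, can be pictured as a bounded fan-out. The following is a minimal sketch and not the project's code: `translate_all` and `fake_translate` are hypothetical names, and a real `translate_one` would send a request to an OpenAI-compatible Chat Completions endpoint instead of returning a placeholder string.

```python
import asyncio


async def translate_all(sentences, translate_one, concurrency=50):
    """Translate sentences concurrently, at most `concurrency` at a time, keeping order."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(sentence):
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            return await translate_one(sentence)

    # gather returns results in the same order as its arguments.
    return await asyncio.gather(*(guarded(s) for s in sentences))


async def fake_translate(sentence):
    """Stand-in for a real API request (hypothetical placeholder)."""
    await asyncio.sleep(0)
    return f"[zh] {sentence}"


if __name__ == "__main__":
    result = asyncio.run(
        translate_all(["Hello.", "How are you?"], fake_translate, concurrency=2)
    )
    print(result)
```

Because results come back in input order, each translated sentence can be matched to its original timing without extra bookkeeping. Lowering the concurrency value is the usual response to provider rate-limit errors.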

Highlights

  • Real end-to-end workflow: URL in, final video out. No manual audio slicing, subtitle editing, or video rendering steps.
  • Two source paths: YouTube English -> Chinese is the primary mature workflow; Bilibili Chinese -> English is wired into the same task pipeline.
  • Local-first storage: SQLite state, cookies, logs, intermediate artifacts, and final videos stay on your machine.
  • Observable task progress: Task history, stage status, stage duration, logs, and errors are visible in the web UI.
  • Resume after failure: Failed tasks can resume from the failed stage, reusing cached outputs from stages that already succeeded.
  • Rerun and clean up: Rerun a task from scratch, or delete its database row, log file, and session directory under workfolder/.
  • Inspect the result: Successful tasks expose an inline video player and an mp4 download link.
  • Settings in the UI: YouTube cookies, yt-dlp proxy port, OpenAI base URL, API key, model name, and translation concurrency can be maintained from Settings.
  • Hackable architecture: The pipeline is serial and module boundaries are clear, making it practical to replace ASR, translation, TTS, or subtitle rendering.
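
The serial pipeline and resume-after-failure behavior listed above can be illustrated with a small runner. This is a simplified sketch under stated assumptions, not the project's worker: `run_pipeline` and the stages.json marker file are hypothetical, and the real application tracks task state in SQLite.

```python
import json
from pathlib import Path


def run_pipeline(session_dir, stages):
    """Run (name, func) stages in order, skipping stages that already succeeded.

    Returns the names of the stages executed during this call.
    """
    session_dir = Path(session_dir)
    session_dir.mkdir(parents=True, exist_ok=True)
    state_file = session_dir / "stages.json"  # hypothetical completion marker
    done = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    executed = []
    for name, func in stages:
        if name in done:
            continue  # reuse the cached output from an earlier run
        func(session_dir)  # an exception here leaves earlier stages marked done
        done.add(name)
        state_file.write_text(json.dumps(sorted(done)))
        executed.append(name)
    return executed


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        stages = [("download", lambda d: None), ("separate", lambda d: None)]
        print(run_pipeline(tmp, stages))  # first call runs both stages
        print(run_pipeline(tmp, stages))  # second call finds nothing left to run
```

Rerunning a task from scratch corresponds to deleting the session directory, which also removes the marker file in this sketch.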

Tech Stack

  • Frontend: Next.js App Router, shadcn/ui, Tailwind CSS, Lucide icons
  • Backend: FastAPI, SQLite, in-process background worker
  • Download: yt-dlp
  • Source separation: Demucs source submodule
  • ASR: openai-whisper, defaulting to large-v3-turbo
  • Translation: OpenAI-compatible Chat Completions API
  • TTS: VoxCPM2
  • Media processing: FFmpeg, pydub, librosa, audiostretchy

Development and Tests

Backend tests:

Windows PowerShell:

.\.venv\Scripts\pytest.exe backend/tests

macOS / Linux / WSL2:

.venv/bin/pytest backend/tests

Frontend checks:

npm --prefix apps/web run lint
npm --prefix apps/web run build

Main project directories:

backend/app/       FastAPI API, task worker, pipeline, and model adapters
backend/tests/     Backend unit tests
apps/web/          Next.js WebUI
scripts/           Helper scripts
submodule/demucs/  Demucs source submodule

Project Status and Contributing

YouDub WebUI is still an MVP, but it already supports a real creator's daily video-localization workflow. The current priority is to keep the shortest end-to-end path stable, keep the architecture readable, and make the project easy to run and modify.

Contributions are welcome:

  • Improve installation and model download experience.
  • Add more ASR, TTS, or translation backends.
  • Improve subtitle styling, portrait/landscape layouts, and voice timing alignment.
  • Make YouTube / Bilibili downloading more robust.
  • Improve task management, artifact management, and failure recovery.
  • Add runtime notes for more platforms.

If this project is useful to you, please Star it, Fork it, open Issues or PRs, and share it with people interested in AI video localization, open-source tools, and cross-language content.

Community

QQ group: 618246010


License

This project is licensed under the Apache License 2.0. See LICENSE.
