YouDub WebUI
May 26, 2026 · View on GitHub
YouDub WebUI
QQ group: 618246010
An open-source video localization tool proven in a real creator workflow.
YouDub WebUI turns a single YouTube or Bilibili video into a dubbed video in the target language. It downloads the source video, separates vocals from background audio, transcribes speech, translates the transcript, generates new voiceover, mixes audio, burns subtitles, and produces a final video that can be played or downloaded from the web UI.
The most mature path is YouTube English -> Chinese dubbing. The app also supports Bilibili Chinese -> English dubbing through the same task pipeline.
中文 README: README.md
Proven Creator Workflow
Author's Bilibili channel: 黑纹白斑马 — 800K+ followers, 20K+ videos. The whole channel is translated and dubbed with YouDub WebUI, covering technology, games, science, animals, history, and more.
This is not just a demo runner. YouDub WebUI is built for creators, developers, and small teams who want to own a complete local video-localization pipeline while keeping the system simple enough to inspect, debug, and modify.
Demo
The samples below were generated by this project and can be played directly on GitHub. The left side is the original video, and the right side is the automatically dubbed version. The dubbed videos include target-language voiceover and subtitles while preserving the original background music and sound effects.
1. Jensen Huang on Nvidia's Competition
YouTube source · YouTube Shorts · English -> Chinese
| Original English | Chinese dubbed |
|---|---|
|
https://github.com/user-attachments/assets/befd11ca-e720-4faa-b4e0-d89bfe73df87 |
https://github.com/user-attachments/assets/bf01f912-eec8-4e0d-8698-0f69283a73e7 |
2. How much YT paid me for 129 million shorts views
YouTube source · Long-form landscape video · English -> Chinese · The embedded clip shows the first 40 seconds; the full version is available in the demo-assets Release
| Original English | Chinese dubbed |
|---|---|
|
https://github.com/user-attachments/assets/bd02936f-cf3c-4e4b-85b5-0410d38f69f5 |
https://github.com/user-attachments/assets/158de60a-7de4-4ddf-b3d8-478d0423aee6 |
Quick Start
1. Prepare the runtime
Verified and recommended runtime:
- Windows 10/11 + PowerShell 5.1+: recommended for local development and covered first in this README.
- Linux / WSL2 / macOS: backend and frontend commands are provided for POSIX shells. CUDA, FFmpeg, PyTorch, and audio dependencies still need to match your platform.
- CUDA GPU: recommended for complete video processing.
DEVICE=cpucan be used for some flows, but transcription, separation, and TTS will be very slow; withDEVICE=mps, Whisper automatically falls back to CPU to avoid the MPS float64 limitation.
Base dependencies:
- Python 3.12.
- Node.js 20+.
- FFmpeg / ffprobe available on
PATH. - A working YouTube proxy when processing YouTube videos.
- Netscape-format YouTube cookies, recommended for YouTube videos.
- An OpenAI-compatible Chat Completions base URL, API key, and model name.
The first run may download or load large ASR, TTS, and audio-processing models. Leave enough disk space and time for that setup.
Platform notes:
- Windows PowerShell uses
.venv\Scripts\...; do not copy.venv/bin/...commands there. - macOS/Linux use
.venv/bin/.... - If multiple Python installs exist, check
py -0pon Windows orpython3.12 --versionon macOS/Linux first. - Proxy settings, cookies, model cache, and work folders stay local. Quote paths that contain spaces, or put them in
.env.
Common system dependency examples:
# Windows PowerShell (choose a package manager already available on your machine)
winget install Gyan.FFmpeg
winget install OpenJS.NodeJS.LTS
# Ubuntu / Debian / WSL2
sudo apt update
sudo apt install -y ffmpeg nodejs npm
# macOS (Homebrew)
brew install ffmpeg node
If your system package manager does not provide Python 3.12, install it from python.org, pyenv, conda/mamba, or your distro's recommended channel. The important part is to create the virtual environment with Python 3.12.
2. Clone
The clone commands are the same on Windows PowerShell, macOS, and Linux:
git clone https://github.com/liuzhao1225/YouDub-webui.git
cd YouDub-webui
git submodule update --init --recursive
Demucs is included as a source submodule, so do not skip git submodule update.
3. Install dependencies
Windows PowerShell
Python:
py -3.12 -m venv .venv
.\.venv\Scripts\python.exe -m pip install -U pip
.\.venv\Scripts\pip.exe install -i https://mirrors.aliyun.com/pypi/simple/ -r requirements.txt
Frontend:
npm --prefix apps/web install --registry=https://registry.npmmirror.com
macOS / Linux / WSL2
Python:
python3.12 -m venv .venv
.venv/bin/python -m pip install -U pip
.venv/bin/pip install -i https://mirrors.aliyun.com/pypi/simple/ -r requirements.txt
Frontend:
npm --prefix apps/web install --registry=https://registry.npmmirror.com
Use Aliyun first. If a specific Python package is temporarily unavailable there, retry only that package with the Tsinghua mirror instead of mixing multiple mirrors in one resolver command.
Optional: NVIDIA CUDA GPU
If you want Whisper, Demucs, or VoxCPM to use an NVIDIA GPU, install the CUDA-enabled PyTorch wheels before installing requirements.txt:
Windows PowerShell:
.\.venv\Scripts\pip.exe install -r requirements-pytorch-cu128.txt
Linux / WSL2:
.venv/bin/pip install -r requirements-pytorch-cu128.txt
requirements-pytorch-cu128.txt uses PyTorch's cu128 wheel index by default. Different NVIDIA driver or CUDA environments may need a different PyTorch CUDA build, so use the official PyTorch installation page when you need a matching command. CPU users and macOS users do not need this step; set DEVICE=cpu in .env when CUDA-enabled PyTorch is not installed.
Verify that CUDA is actually available after installation:
.venv/bin/python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
4. Configure
Windows PowerShell:
Copy-Item env.txt.example .env
macOS / Linux / WSL2:
cp env.txt.example .env
The application reads .env at runtime. Do not commit API keys, cookies, downloaded media, or generated artifacts.
Common environment variables:
| Variable | Purpose |
|---|---|
WORKFOLDER | Per-task media, audio segments, and intermediate artifacts. |
MODEL_CACHE_DIR | ModelScope cache directory, used by VoxCPM2 by default. |
DEVICE | Model runtime device, for example auto, cuda, cuda:0, mps, mps:0, or cpu; auto selects CUDA, then MPS, then CPU. |
DEMUCS_DEVICE / WHISPER_DEVICE | Optional component-level device overrides. Empty values use DEVICE. Whisper falls back to CPU when MPS is selected because word timestamp alignment depends on float64 DTW, which MPS does not support. |
OPENAI_BASE_URL | OpenAI-compatible API endpoint, for example https://api.openai.com/v1. |
OPENAI_API_KEY | API key used by the translation stage. |
OPENAI_MODEL | Chat Completions model used by the translation stage. |
OPENAI_TRANSLATE_CONCURRENCY | Parallel requests during translation. Default: 50. |
YTDLP_PROXY_PORT | Local proxy port used by yt-dlp, for example 7890. |
HTTP_PROXY / ALL_PROXY | yt-dlp reads HTTP_PROXY when no UI proxy port is set; HTTPX/OpenAI SDK also reads these environment proxies. |
NO_PROXY | Comma-separated proxy bypass list. Include localhost,127.0.0.1,::1 when using a local OpenAI-compatible service so local requests stay direct. |
VOXCPM_MODEL / VOXCPM_MODEL_DIR | VoxCPM2 ModelScope model ID or local model directory. VoxCPM currently selects CUDA/MPS/CPU inside the upstream package, and task logs report it as voxcpm=library-auto. |
VOXCPM_LOAD_DENOISER / VOXCPM_CFG_VALUE / VOXCPM_INFERENCE_TIMESTEPS / VOXCPM_MIN_REFERENCE_MS | VoxCPM2 inference controls. |
CORS_ALLOW_ORIGINS / CORS_ALLOW_ORIGIN_REGEX | Additional frontend origins. |
Common localhost, LAN, and Tailscale :3000 frontend origins are allowed by default. If you open the frontend through a custom hostname, add the full origin to CORS_ALLOW_ORIGINS, for example http://youdub.example.com:3000.
5. Run
Windows PowerShell
Backend:
.\.venv\Scripts\uvicorn.exe backend.app.main:app --reload --host 0.0.0.0 --port 8000
Frontend:
npm --prefix apps/web run dev -- --hostname 0.0.0.0 --port 3000
macOS / Linux / WSL2
Backend:
.venv/bin/uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000
Frontend:
npm --prefix apps/web run dev -- --hostname 0.0.0.0 --port 3000
By default, the frontend calls same-origin /api/... URLs and Next.js proxies them to http://127.0.0.1:8000. If the backend is not on local port 8000, set NEXT_SERVER_API_BASE_URL when starting the frontend, for example:
NEXT_SERVER_API_BASE_URL=http://192.168.1.10:8000 npm --prefix apps/web run dev -- --hostname 0.0.0.0 --port 3000
Open:
http://localhost:3000
When opening the app from LAN, WSL2, or another machine, use the actual frontend host IP or hostname, for example http://192.168.1.20:3000. The backend listens on 0.0.0.0:8000, and the frontend listens on 0.0.0.0:3000.
Using the Web UI
- Open Settings in the top-right corner.
- Paste Netscape-format YouTube cookies.
- Set the yt-dlp proxy port, such as
7890or20171. - Enter the OpenAI base URL and API key.
- Click
Get modelsto fetch model IDs, or enter a model manually. - Tune
Translate concurrencybased on your API provider's rate limits. - Return to the home page and submit a YouTube URL or Bilibili URL.
- Open the task detail page to watch stage progress, logs, and the final video.
API keys and cookies are masked in the UI. The backend does not return plaintext cookie content to the frontend.
Exporting YouTube cookies
A convenient option is the open-source Chrome extension Get cookies.txt LOCALLY, which keeps cookies on your machine:
- Install and enable the extension in Chrome.
- Sign in to
https://www.youtube.com. - On a youtube.com page, click the extension icon and choose
Export->Netscapeto getcookies.txt. - Paste the full file content into the YouTube cookie field in Settings.
Only process videos you have the right to download, transform, and publish.
Pipeline
YouTube / Bilibili URL
-> yt-dlp downloads one video
-> Demucs separates vocals and background audio
-> Whisper transcribes speech with word timestamps
-> Sentence and timing normalization
-> OpenAI-compatible API preprocesses the full transcript and translates sentences in parallel
-> Original vocals are split into per-sentence reference clips
-> VoxCPM2 generates target-language voiceover
-> Voiceover timing is aligned and mixed with background audio
-> FFmpeg burns subtitles and renders the final mp4
Highlights
- Real end-to-end workflow: URL in, final video out. No manual audio slicing, subtitle editing, or video rendering steps.
- Two source paths: YouTube English -> Chinese is the primary mature workflow; Bilibili Chinese -> English is wired into the same task pipeline.
- Local-first storage: SQLite state, cookies, logs, intermediate artifacts, and final videos stay on your machine.
- Observable task progress: Task history, stage status, stage duration, logs, and errors are visible in the web UI.
- Resume after failure: Failed tasks can resume from the failed stage, reusing cached outputs from stages that already succeeded.
- Rerun and clean up: Rerun a task from scratch, or delete its database row, log file, and session directory under
workfolder/. - Inspect the result: Successful tasks expose an inline video player and an mp4 download link.
- Settings in the UI: YouTube cookies, yt-dlp proxy port, OpenAI base URL, API key, model name, and translation concurrency can be maintained from Settings.
- Hackable architecture: The pipeline is serial and module boundaries are clear, making it practical to replace ASR, translation, TTS, or subtitle rendering.
Tech Stack
- Frontend: Next.js App Router, shadcn/ui, Tailwind CSS, Lucide icons
- Backend: FastAPI, SQLite, in-process background worker
- Download: yt-dlp
- Source separation: Demucs source submodule
- ASR: openai-whisper, defaulting to
large-v3-turbo - Translation: OpenAI-compatible Chat Completions API
- TTS: VoxCPM2
- Media processing: FFmpeg, pydub, librosa, audiostretchy
Development and Tests
Backend tests:
Windows PowerShell:
.\.venv\Scripts\pytest.exe backend/tests
macOS / Linux / WSL2:
.venv/bin/pytest backend/tests
Frontend checks:
npm --prefix apps/web run lint
npm --prefix apps/web run build
Main project directories:
backend/app/ FastAPI API, task worker, pipeline, and model adapters
backend/tests/ Backend unit tests
apps/web/ Next.js WebUI
scripts/ Helper scripts
submodule/demucs/ Demucs source submodule
Project Status and Contributing
YouDub WebUI is still an MVP, but it already supports a real creator's daily video-localization workflow. The current priority is to keep the shortest end-to-end path stable, keep the architecture readable, and make the project easy to run and modify.
Contributions are welcome:
- Improve installation and model download experience.
- Add more ASR, TTS, or translation backends.
- Improve subtitle styling, portrait/landscape layouts, and voice timing alignment.
- Make YouTube / Bilibili downloading more robust.
- Improve task management, artifact management, and failure recovery.
- Add runtime notes for more platforms.
If this project is useful to you, please Star it, Fork it, open Issues or PRs, and share it with people interested in AI video localization, open-source tools, and cross-language content.
Community
QQ group: 618246010
License
This project is licensed under the Apache License 2.0. See LICENSE.