stream-translator

February 27, 2023

Command line utility to transcribe or translate audio from livestreams in real time. Uses streamlink to get livestream URLs from various services and OpenAI's whisper for transcription/translation. This script is inspired by audioWhisper which transcribes/translates desktop audio.

Prerequisites

  1. Install and add ffmpeg to your PATH
  2. Install CUDA on your system. If you installed a different version of CUDA than 11.3, change cu113 in requirements.txt accordingly. You can check the installed CUDA version with nvcc --version.

Setup

  1. Set up a virtual environment.
  2. git clone https://github.com/fortypercnt/stream-translator.git
  3. pip install -r requirements.txt
  4. Make sure that PyTorch is installed with CUDA support. Whisper will probably not run in real time on a CPU.
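
A quick way to check the last step is to ask PyTorch directly. The snippet below is a minimal sketch and not part of this repository; the helper name `cuda_status` is made up for illustration. It only relies on `torch.cuda.is_available()`, `torch.version.cuda` and `torch.cuda.get_device_name()`.

```python
import importlib.util


def cuda_status() -> str:
    """Return a short, human-readable summary of PyTorch's CUDA support."""
    # Avoid a hard failure when PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch

    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} is installed, but CUDA is not available"

    # torch.version.cuda is the CUDA version PyTorch was built against.
    return (
        f"PyTorch {torch.__version__} with CUDA {torch.version.cuda} "
        f"on {torch.cuda.get_device_name(0)}"
    )


if __name__ == "__main__":
    print(cuda_status())
```

If the message says CUDA is not available, the installed PyTorch wheel is most likely a CPU-only build or was built for a different CUDA version than the one on your system.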

Command-line usage

python translator.py URL --flags

By default, the URL can be of the form twitch.tv/forsen and streamlink is used to obtain the .m3u8 link which is passed to ffmpeg. See streamlink plugins for info on all supported sites.
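
To illustrate how a quality option might be chosen, here is a small sketch. It assumes the stream options arrive as a mapping from quality names to streams, which is what `streamlink.streams(url)` returns. The helper `pick_quality` and its fallback order are hypothetical and are not necessarily what translator.py does.

```python
def pick_quality(available, preferred="audio_only", fallbacks=("best", "worst")):
    """Return the first quality name that exists in `available`, or None.

    The fallback order is an assumption made for this example.
    """
    for name in (preferred, *fallbacks):
        if name in available:
            return name
    return None


# Stand-in for the mapping returned by streamlink.streams(url).
# Real values are stream objects; placeholder strings are used here.
example_streams = {
    "audio_only": "<stream>",
    "720p": "<stream>",
    "best": "<stream>",
    "worst": "<stream>",
}

print(pick_quality(example_streams))                            # audio_only
print(pick_quality({"720p": "<stream>", "best": "<stream>"}))   # best
```

Services that do not offer an audio-only stream would fall through to "best" in this sketch, which costs more bandwidth but still gives ffmpeg an audio track to extract.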

| --flags | Default Value | Description |
|---|---|---|
| --model | small | Select model size. See here for available models. |
| --task | translate | Whether to transcribe the audio (keep original language) or translate to English. |
| --language | auto | Language spoken in the stream. See here for available languages. |
| --interval | 5 | Interval between calls to the language model in seconds. |
| --history_buffer_size | 0 | Seconds of previous audio/text to use for conditioning the model. Set to 0 to just use audio from the last interval. Note that this can easily lead to repetition/loops if the chosen language/model settings do not produce good results to begin with. |
| --beam_size | 5 | Number of beams in beam search. Set to 0 to use greedy algorithm instead (faster but less accurate). |
| --best_of | 5 | Number of candidates when sampling with non-zero temperature. |
| --preferred_quality | audio_only | Preferred stream quality option. "best" and "worst" should always be available. Type "streamlink URL" in the console to see quality options for your URL. |
| --disable_vad | | Set this flag to disable additional voice activity detection by Silero VAD. |
| --direct_url | | Set this flag to pass the URL directly to ffmpeg. Otherwise, streamlink is used to obtain the stream URL. |
| --use_faster_whisper | | Set this flag to use faster_whisper implementation instead of the original OpenAI implementation. |
| --faster_whisper_model_path | whisper-large-v2-ct2/ | Path to a directory containing a Whisper model in the CTranslate2 format. |
| --faster_whisper_device | cuda | Set the device to run faster-whisper on. |
| --faster_whisper_compute_type | float16 | Set the quantization type for faster_whisper. See here for more info. |
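
The flags and defaults listed above can be mirrored with `argparse` to see how they combine. This is a sketch for illustration only, not the parser from translator.py. The integer types and the `choices` for `--task` are assumptions inferred from the descriptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags and their defaults."""
    parser = argparse.ArgumentParser(prog="translator.py")
    parser.add_argument("URL", help="Stream URL, e.g. twitch.tv/forsen")
    parser.add_argument("--model", default="small")
    # Assumption: only these two tasks are accepted.
    parser.add_argument("--task", default="translate",
                        choices=["transcribe", "translate"])
    parser.add_argument("--language", default="auto")
    # Assumption: the numeric options are parsed as integers.
    parser.add_argument("--interval", type=int, default=5)
    parser.add_argument("--history_buffer_size", type=int, default=0)
    parser.add_argument("--beam_size", type=int, default=5)
    parser.add_argument("--best_of", type=int, default=5)
    parser.add_argument("--preferred_quality", default="audio_only")
    # Flags without a default value are plain on/off switches.
    parser.add_argument("--disable_vad", action="store_true")
    parser.add_argument("--direct_url", action="store_true")
    parser.add_argument("--use_faster_whisper", action="store_true")
    parser.add_argument("--faster_whisper_model_path",
                        default="whisper-large-v2-ct2/")
    parser.add_argument("--faster_whisper_device", default="cuda")
    parser.add_argument("--faster_whisper_compute_type", default="float16")
    return parser


args = build_parser().parse_args(
    ["twitch.tv/forsen", "--task", "transcribe", "--interval", "3"]
)
print(args.task, args.interval, args.model, args.disable_vad)
# transcribe 3 small False
```

Any flag that is not given on the command line keeps the default from the table, so the example above still uses the small model with voice activity detection enabled.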

Using faster-whisper

faster-whisper provides significant performance improvements over the original OpenAI implementation (roughly 4x faster while using about half the memory). To use it, follow the instructions here to install faster-whisper and convert your models to CTranslate2 format. Then run the CLI with --use_faster_whisper and set --faster_whisper_model_path to the location of your converted model.
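
As a rough sketch, a faster-whisper run could be assembled and launched from Python as shown below. The helper `faster_whisper_command` is hypothetical. It only uses the flags documented above, and its defaults are copied from the flags table.

```python
import shlex
import sys


def faster_whisper_command(url, model_path="whisper-large-v2-ct2/",
                           device="cuda", compute_type="float16"):
    """Return the argument list for a faster-whisper run of translator.py."""
    return [
        sys.executable, "translator.py", url,
        "--use_faster_whisper",
        "--faster_whisper_model_path", model_path,
        "--faster_whisper_device", device,
        "--faster_whisper_compute_type", compute_type,
    ]


cmd = faster_whisper_command("twitch.tv/forsen")

# Show the command without the interpreter path for readability.
print(shlex.join(cmd[1:]))

# To actually launch it, run this from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list instead of a single string avoids quoting problems when the model path contains spaces.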