Whisper Transcribe
July 24, 2025
A voice-to-text transcription tool that allows you (or me, anyway) to use voice like a keyboard in any text field. This is my own variant of nerd-dictation, using whisper.cpp for transcription. The main reason I'm not using nerd-dictation is that I want capitalization and punctuation in transcribed text. The basic flow is:
- Run the background app (use the .desktop file in your autostart directory to make it always run). This will put an icon in your system tray that allows you to toggle transcription on or off.
- Turn on transcription with the icon, or by configuring a keyboard shortcut. I use ctrl+alt+m.
- Put your cursor anywhere and speak to type. You can of course continue to type with your regular keyboard as well. The voice keyboard should never enter newline characters, so as to avoid accidentally executing an action on your behalf.
- Turn off transcription whenever you want to with the system tray icon or hotkey.
Dependencies
System packages
sudo apt update && sudo apt install libsdl2-dev xdotool
Python packages
pip install PyQt5
Build Instructions
- Clone whisper.cpp and build the required libraries:
cd /where/you/like/to/clone/repos
gh repo clone ggml-org/whisper.cpp
cd whisper.cpp
cmake -B build \
  -DWHISPER_SDL2=ON \
  -DGGML_CUDA=ON \
  -DWHISPER_BUILD_EXAMPLES=OFF \
  -DWHISPER_BUILD_TESTS=OFF \
  -DWHISPER_BUILD_SERVER=OFF
cmake --build build --target whisper common common-sdl --config Release
Remove -DGGML_CUDA=ON if you don't have (or don't want to use) a GPU.
- Download transcription and voice activity detection models. From the root of your whisper.cpp repo:
sh ./models/download-ggml-model.sh base.en
sh ./models/download-vad-model.sh silero-v5.1.2
- Clone this repo in your ~/.config:
cd $HOME/.config
gh repo clone geraschenko/whisper-transcribe
- Symlink models to current directory:
ln -s /path/to/whisper.cpp/models models
- Build the transcribe binary. Update WHISPER_CPP_DIR in the Makefile to point to your /path/to/whisper.cpp and run make from this directory:
make
Installation
- Set up autostart (choose one option):
Option A: Copy the desktop file
cp whisper-transcribe.desktop ~/.config/autostart/
Option B: Create a symlink
ln -s "$(pwd)/whisper-transcribe.desktop" ~/.config/autostart/
- Configure hotkey:
  - Open Settings → Keyboard → Keyboard Shortcuts
  - Add a new custom shortcut:
    - Name: Whisper Transcribe Toggle
    - Command: $HOME/.config/whisper-transcribe/whisper-transcribe-toggle
    - Shortcut: Ctrl+Alt+M (or your preferred combination)
Usage
Starting the application
The application will start automatically on login (if you set up autostart). To start manually:
python3 whisper-transcribe.py
Using transcription
- System tray icon: Look for the whisper-transcribe icon in your system tray
  - Inactive: Transcription is off, ready to start
  - Active: Transcription is on and will enter text whenever there's a short gap in speech.
- Toggle transcription:
  - Use your configured hotkey (e.g., Ctrl+Alt+M)
  - Or left-click the system tray icon
  - Or right-click the tray icon → "Toggle Transcription"
- Speak: When active (red), speak and text will appear where your cursor is.
- Stop: Press the hotkey again or click the tray icon to stop.
- Select audio input device: Your choice is remembered in ~/.config/whisper-transcribe/config.json. If your preferred device is available, we use it. Otherwise, the default device is used.
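The device fallback described above can be sketched roughly as follows. This is only an illustration, not the project's code: the key name `input_device` and both function names are hypothetical, and the real layout of config.json may differ.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "whisper-transcribe" / "config.json"


def load_preferred_device(path=CONFIG_PATH):
    """Return the remembered device name, or None if nothing usable is stored."""
    try:
        config = json.loads(Path(path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(config, dict):
        return None
    # "input_device" is a hypothetical key name, not taken from the project.
    return config.get("input_device")


def choose_device(preferred, available, default):
    """Use the remembered device when it is currently available, else the default."""
    return preferred if preferred in available else default
```

For example, `choose_device("USB Mic", ["Built-in"], "Built-in")` falls back to `"Built-in"` because the remembered microphone is not plugged in.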
Exiting the application
- Right-click the system tray icon → "Quit"
- Or press Ctrl+C in the terminal if running manually
How it works
graph TD
A(User presses hotkey) --> |runs| B[whisper-transcribe-toggle]
B --> |sends SIGUSR1| C[whisper-transcribe.py<br/>Qt App]
C --> |updates| D(System Tray Icon)
C --> |runs| E[transcribe binary]
H(Audio input) --> |capture| E
E --> |pipes text| F[xdotool]
F --> |types| G(Text appears at cursor)
style B fill:#49b
style C fill:#a7b
style E fill:#798
style F fill:#c80
The transcription work is done by the transcribe binary, which captures audio
from a given input device and outputs text to stdout. It starts collecting audio
when it detects speech, and continues collecting until there's a 500ms interval
with no speech, at which point it transcribes everything it's collected and
sends the text to stdout.
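To make the segmentation rule concrete, here is a small Python sketch of the "collect until 500ms of silence" logic. It is not the binary's actual code (that is built against whisper.cpp): it assumes audio arrives in fixed-length frames, each already labelled speech or non-speech by some voice activity detector, and the 20ms frame length and the function name are made up for illustration.

```python
FRAME_MS = 20      # assumed frame length, not taken from the project
SILENCE_MS = 500   # gap that ends an utterance, as described above


def split_utterances(speech_flags, frame_ms=FRAME_MS, silence_ms=SILENCE_MS):
    """Group frames into utterances separated by long enough silences.

    speech_flags holds one boolean per audio frame (True means speech).
    Returns (start_frame, end_frame) pairs, with end_frame exclusive.
    """
    flags = list(speech_flags)
    frames_needed = silence_ms // frame_ms
    utterances = []
    start = None      # first frame of the utterance being collected
    silent_run = 0    # consecutive non-speech frames seen while collecting
    for i, is_speech in enumerate(flags):
        if start is None:
            if is_speech:
                start, silent_run = i, 0
            continue
        silent_run = 0 if is_speech else silent_run + 1
        if silent_run >= frames_needed:
            # The gap is long enough: close the utterance, dropping the silence.
            utterances.append((start, i + 1 - silent_run))
            start = None
    if start is not None:
        # Sketch-only choice: flush whatever is pending when the input ends.
        utterances.append((start, len(flags) - silent_run))
    return utterances
```

Each returned pair marks a chunk that would be handed to the transcription model in one go. Short pauses inside a sentence stay within a single chunk because they never reach the silence threshold.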
The whisper-transcribe.py Qt app handles the system tray icon. It's also
responsible for starting and stopping the transcribe binary and piping the
output text to xdotool, which "types" the text in as if it were input by a
keyboard.
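As a rough sketch of that piping step, the Python below turns each line of transcribed text into an `xdotool type` call. The helper names are hypothetical and this is not the app's actual code. Collapsing all whitespace (including newlines) into single spaces is one way to keep the voice keyboard from ever pressing Enter, and appending a trailing space between chunks is an assumption made here so consecutive utterances do not run together.

```python
import subprocess


def sanitize(text):
    """Collapse newlines, tabs and repeated spaces into single spaces."""
    return " ".join(text.split())


def xdotool_command(text):
    """Build the argv that types `text` at the current cursor position."""
    # "--" ends option parsing so text starting with "-" is typed literally.
    return ["xdotool", "type", "--", text]


def type_stream(lines, run=subprocess.run):
    """Type each transcribed line; `run` is injectable so it can be tested."""
    for line in lines:
        text = sanitize(line)
        if text:
            # Trailing space is a sketch-only choice to separate utterances.
            run(xdotool_command(text + " "), check=False)
```

In a real pipeline `lines` could be the stdout of the transcribe process, read line by line.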
The whisper-transcribe-toggle script controls whisper-transcribe.py by
sending it SIGUSR1.
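The signal-based toggle can be sketched in a few lines of Python. This is an illustration under assumptions rather than the project's code: the class and function names are invented, and how the toggle script discovers the app's PID is not described here, so `send_toggle` simply takes one.

```python
import os
import signal


class TranscriptionToggle:
    """Flip an on/off flag each time the process receives SIGUSR1."""

    def __init__(self):
        self.active = False
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGUSR1, self._on_signal)

    def _on_signal(self, signum, frame):
        self.active = not self.active


def send_toggle(pid):
    """What a toggle script would do once it knows the app's PID."""
    os.kill(pid, signal.SIGUSR1)
```

One caveat when combining this with Qt: Python-level signal handlers only run when the interpreter gets control, so an app sitting in the Qt event loop generally needs something like a periodic timer to let the handler fire promptly.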
Troubleshooting
Hotkey not working
- Check if another application is using the same hotkey
- Verify the toggle script path in keyboard shortcut settings
- Test the toggle script manually:
./whisper-transcribe-toggle
No system tray icon
- Ensure your desktop environment supports system tray
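- If using vanilla GNOME, which has no system tray by default, install a GNOME Shell extension that provides one (for example, AppIndicator and KStatusNotifierItem Support)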
- Check if PyQt5 is properly installed
Transcription not working
- Verify the transcribe binary exists:
ls build/transcribe
- Check if xdotool is installed:
which xdotool
- Test the binary manually:
./build/transcribe
Permission issues
- Ensure scripts are executable:
chmod +x whisper-transcribe-toggle whisper-transcribe.py
- Check file paths in desktop file and scripts
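The manual checks in this section can also be bundled into one small script. The following is a hypothetical helper, not something shipped with the repo. It assumes the layout used in this README (a `build/transcribe` binary and the two scripts in the repo root) and only reports problems without fixing them.

```python
import os
import shutil


def check_setup(repo_dir):
    """Return a list of human-readable problems found under repo_dir."""
    problems = []
    binary = os.path.join(repo_dir, "build", "transcribe")
    if not os.path.isfile(binary):
        problems.append(f"missing binary: {binary}")
    if shutil.which("xdotool") is None:
        problems.append("xdotool not found on PATH")
    for name in ("whisper-transcribe-toggle", "whisper-transcribe.py"):
        path = os.path.join(repo_dir, name)
        if not os.path.isfile(path):
            problems.append(f"missing script: {path}")
        elif not os.access(path, os.X_OK):
            problems.append(f"not executable: {path}")
    return problems


if __name__ == "__main__":
    for problem in check_setup(os.getcwd()):
        print(problem)
```

Run from the repo directory, it prints nothing when every check passes.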
Development
The application uses:
- Qt5 for system tray and GUI components
- Unix signals (SIGUSR1) for inter-process communication
- Subprocess management for the transcription pipeline
- Process groups for clean shutdown of all child processes
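As an illustration of the process-group approach, the sketch below starts a child in its own session (which also makes it the leader of a new process group) and later signals the whole group, so that anything the child spawned goes down with it. The function names are made up, and the app may set up its groups differently, for example with `os.setsid`.

```python
import os
import signal
import subprocess


def start_pipeline(argv):
    """Start a child in a new session, making it a process group leader."""
    return subprocess.Popen(argv, start_new_session=True)


def stop_pipeline(proc, timeout=2.0):
    """Terminate the child and everything in its process group."""
    try:
        # With start_new_session=True the group ID equals the child's PID.
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        return proc.poll()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Escalate if the group ignored SIGTERM.
        os.killpg(proc.pid, signal.SIGKILL)
        return proc.wait()
```

Signalling the group rather than a single PID matters when the child is itself a shell pipeline, since killing only the shell could leave the downstream processes running.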