OpenWispr: Global Voice Dictation for Linux

August 14, 2025

OpenWispr is a voice-to-text dictation tool that allows you to type in any application on your Linux desktop simply by speaking. Hold down a hotkey, say what you want to type, and release. The text will appear wherever your cursor is.

This tool is designed to work on modern Linux distributions using the Wayland display server (such as recent versions of Ubuntu) but is also compatible with X11.

It is application-independent: typing is simulated at the input-device level through ydotool's virtual keyboard, so it works with any app that accepts keyboard input.
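To make the input-level approach concrete, here is a minimal Python sketch of how transcribed text could be handed to ydotool. It assumes the `ydotool type` subcommand is on your PATH and the ydotoold daemon is running. The helper names `build_type_command` and `type_text` are hypothetical and are not taken from this project's code.

```python
import subprocess


def build_type_command(text: str) -> list[str]:
    """Build the argv for `ydotool type`.

    The "--" marker follows the usual getopt convention so that text starting
    with "-" is not parsed as an option (assumed to hold for ydotool).
    """
    return ["ydotool", "type", "--", text]


def type_text(text: str) -> None:
    """Type `text` into whichever window currently has keyboard focus."""
    if not text:
        return  # nothing to type
    # Passing a list (no shell) avoids quoting problems with spoken punctuation.
    subprocess.run(build_type_command(text), check=True)
```

Because the keystrokes come from a virtual keyboard rather than from a toolkit-specific API, the target application does not need to cooperate in any way.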

Setup Instructions

Follow these steps carefully to set up the tool and its dependencies.

Step 1: Clone This Repository

First, get the project files onto your local machine.

git clone https://github.com/imsidharthj/VoxType.git
cd VoxType

Step 2: Install System Build Dependencies

We need some essential tools to build ydotool (our virtual keyboard) from source.

sudo apt update
sudo apt install git cmake scdoc build-essential

Step 3: Build and Install ydotool

ydotool is the core utility that simulates keyboard presses. We will build it from its official source for maximum compatibility.

Clone the ydotool repository

git clone https://github.com/ReimuNotMoe/ydotool.git
cd ydotool

Create a build directory and compile the tool

mkdir build
cd build
cmake ..
make -j "$(nproc)"

Install the compiled tool to your system

sudo make install

Return to the project directory

cd ../..

Step 4: Set Up the ydotool Service

ydotool requires a background service (daemon) to be running.

Reload the systemd manager configuration

systemctl --user daemon-reload

Start the ydotool service

systemctl --user start ydotoold.service

(Optional) Check the status to ensure it's running

systemctl --user status ydotoold.service

Expected Output: You should see Active: active (running). If it says inactive or failed, a system reboot after the next step often resolves the issue.

● ydotoold.service - Starts ydotoold Daemon
     Loaded: loaded (/usr/lib/systemd/user/ydotoold.service; static)
     Active: active (running) since Thu 2025-08-07 11:30:00 IST; 5s ago
   Main PID: 12345 (ydotoold)
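If you would rather check the daemon from a script than read the status output by eye, the following sketch shows one way to do it. It relies on `systemctl --user is-active`, which prints a one-word state. The function names `is_active` and `ydotoold_running` are hypothetical helpers, not part of this project.

```python
import subprocess


def is_active(state: str) -> bool:
    """Interpret the one-word state printed by `systemctl is-active`."""
    return state.strip() == "active"


def ydotoold_running(unit: str = "ydotoold.service") -> bool:
    """Ask the user-level systemd instance whether the unit is active."""
    result = subprocess.run(
        ["systemctl", "--user", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # is-active exits non-zero when the unit is not active
    )
    return is_active(result.stdout)
```

Calling `ydotoold_running()` before starting dictation gives an early, readable failure instead of silent missing keystrokes.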

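Step 5: Configure Critical Permissions (Very Important!)

Add your current user to the 'input' group. The $USER variable automatically uses your username. The new group membership only takes effect after you log out and back in, or reboot.

Step 6: Install Python Dependencies

The requirements.txt file covers four groups of packages: audio recording and processing, hotkey detection on Wayland/X11, AI-based speech-to-text, and simulating keyboard input (the primary method). Once the file is in place, install everything with:

pip install -r requirements.txt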
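Before the first run, a quick pre-flight check can catch the most common setup problems. The sketch below is an illustration only: it checks that `ydotool` is on your PATH and that the current session belongs to the 'input' group. The names `user_in_group` and `preflight` are hypothetical. On Debian/Ubuntu-style systems a user is commonly added to a group with `sudo usermod -aG input $USER`, but treating that as the exact command this guide intends is an assumption.

```python
import grp
import os
import shutil


def user_in_group(group_name: str) -> bool:
    """Return True if the current process belongs to the named group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # the group does not exist on this system
    # getgroups() lists supplementary groups; also compare the effective gid.
    return gid in os.getgroups() or gid == os.getegid()


def preflight() -> list[str]:
    """Collect human-readable descriptions of any setup problems found."""
    problems = []
    if shutil.which("ydotool") is None:
        problems.append("ydotool was not found on PATH")
    if not user_in_group("input"):
        problems.append(
            "this session is not in the 'input' group "
            "(log out and back in after adding the user)"
        )
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("Setup problem:", problem)
```

An empty list from `preflight()` means both checks passed. It does not prove that the daemon is running or that the microphone works.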

▶️ How to Use

Once all setup steps are complete, you can run the dictation service.

Start the Service: Open a terminal in the project directory and run:

python3 whisper.py

You will see a confirmation that the service is running and listening for the hotkey.

📝 Workflow Guide

Focus the cursor in any input box (web browser, text editor, chat app, etc.).
Press and hold the Left Ctrl + Left Alt keys.
Speak clearly into your microphone.
Release the keys when done.
The tool will automatically transcribe your speech and type the text into the input box at your cursor's location.
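The hold-to-talk behaviour described above can be modelled as a small state machine. This is a sketch under assumptions, not the project's actual hotkey code: the class name `PushToTalk` and the key labels "ctrl" and "alt" are hypothetical, and a real listener would map its own key events onto `on_press` and `on_release`.

```python
class PushToTalk:
    """Track a key chord and report when recording should start or stop."""

    def __init__(self, chord=("ctrl", "alt")):
        self.chord = frozenset(chord)  # keys that must all be held
        self.held = set()              # keys currently held down
        self.recording = False

    def on_press(self, key: str):
        """Return "start" the moment the full chord becomes held."""
        self.held.add(key)
        if not self.recording and self.chord <= self.held:
            self.recording = True
            return "start"
        return None  # partial chord or auto-repeat of a held key

    def on_release(self, key: str):
        """Return "stop" as soon as any chord key is released."""
        self.held.discard(key)
        if self.recording and not self.chord <= self.held:
            self.recording = False
            return "stop"
        return None
```

Tracking the set of held keys, rather than reacting to single events, keeps keyboard auto-repeat from restarting a recording that is already in progress.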

Dictate Anywhere:

Click on any input box in any application (web browser, text editor, etc.).

Press and hold the Left Ctrl + Left Alt keys.

You will see a "🔴 Recording..." message in your terminal.

Speak clearly.

Release the keys.

The terminal will show "⏹️ Processing..." and then the transcribed text will be typed out at your cursor's location.
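The record, process, and type sequence in the steps above can be sketched as a single function. This is a hypothetical outline rather than the code in whisper.py: `dictate_once` and its `record`, `transcribe`, and `type_text` parameters are illustrative names, and the real recorder, speech model, and typing backend are passed in as callables so the flow can be tried without a microphone.

```python
from typing import Callable


def dictate_once(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    type_text: Callable[[str], None],
    log: Callable[[str], None] = print,
) -> str:
    """Run one record -> transcribe -> type cycle and return the text."""
    log("🔴 Recording...")
    audio = record()  # expected to block until the hotkey is released
    log("⏹️ Processing...")
    text = transcribe(audio).strip()
    if text:  # skip typing when nothing was recognised
        type_text(text)
    return text
```

For a dry run you could call `dictate_once(lambda: b"", lambda audio: "hello", print)`, which prints both status messages and then the text that would have been typed.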

Stop the Service: To stop the tool, go back to the terminal where it is running and press Ctrl + C.

🎥 Demo Video

See a demonstration of OpenWispr in action:
Loom Video Demo
