speech2keys

November 5, 2025

A fast, lightweight Linux tool that converts speech to text and types it into any window using OpenAI's Whisper API.

Features

Fast startup: Optimized Rust binary starts in milliseconds
Wayland native: Built-in virtual keyboard support with automatic compositor detection
Multi-compositor support: Works natively on Sway, Hyprland, and similar compositors, and on KDE Plasma via fallback tools
Smart stopping: Automatically stops after 8 seconds of silence
Single instance: Press your hotkey once to start recording and again to stop
Visual feedback: Desktop notifications show recording status (KDE Plasma compatible)

Prerequisites

Rust toolchain (for building):

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Build dependencies (development libraries):

# Fedora
sudo dnf install alsa-lib-devel wayland-devel wayland-protocols-devel libxkbcommon-devel

# Arch
sudo pacman -S alsa-lib wayland wayland-protocols libxkbcommon

# Ubuntu/Debian
sudo apt install libasound2-dev libwayland-dev wayland-protocols libxkbcommon-dev

# Using Homebrew (Linux)
brew install alsa-lib wayland wayland-protocols libxkbcommon

Runtime dependencies (KDE Plasma users only):

KDE Plasma/KWin does not support the standard Wayland virtual keyboard protocol. You need one of these fallback tools:

# Option 1: kwtype (recommended for KDE Plasma)
sudo dnf install kwtype  # Fedora
# Or build from: https://github.com/Sporif/KWtype

# Option 2: wtype (works on most compositors)
sudo dnf install wtype   # Fedora
sudo pacman -S wtype     # Arch
sudo apt install wtype   # Ubuntu/Debian

Note: Compositors like Sway, Hyprland, and Cosmic don't need external tools - they work out of the box!

OpenAI API key:
- Sign up at https://platform.openai.com
- Create an API key
- Add it to your shell profile (e.g., ~/.bashrc or ~/.zshrc):
```
export OPENAI_API_KEY="sk-..."
```

Installation

Clone the repository, then build from its directory:

cd /path/to/speech2keys
cargo build --release

Note: If you installed dependencies via Homebrew, you need to set PKG_CONFIG_PATH and RUSTFLAGS:

PKG_CONFIG_PATH="/home/linuxbrew/.linuxbrew/lib/pkgconfig:$PKG_CONFIG_PATH" \
RUSTFLAGS="-L /home/linuxbrew/.linuxbrew/lib" \
cargo build --release

The binary will be at target/release/speech2keys (approximately 3.8 MB).

(Optional) Install to your PATH:

sudo cp target/release/speech2keys /usr/local/bin/

Usage

Automatic Compositor Detection

speech2keys automatically detects the best text injection method for your compositor:

First choice: Native Wayland virtual keyboard protocol (Sway, Hyprland, Cosmic, Niri, etc.)
KDE Plasma fallback: kwtype command if available
Universal fallback: wtype command if available

You'll see a log message on startup indicating which method was selected.
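
The priority order above can be sketched as a small pure function. This is only an illustration of the documented order, not the project's actual code: the `Probe` struct and its field names are hypothetical stand-ins for whatever checks the real program performs against the compositor and PATH.

```rust
/// Text injection back ends named in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InjectionMethod {
    /// Native Wayland virtual keyboard protocol.
    NativeVirtualKeyboard,
    /// External `kwtype` command (KDE Plasma fallback).
    Kwtype,
    /// External `wtype` command (universal fallback).
    Wtype,
}

/// Hypothetical probe results. A real program would fill these in by
/// querying the compositor and searching PATH for the fallback tools.
#[derive(Debug, Clone, Copy)]
pub struct Probe {
    pub native_protocol_available: bool,
    pub kwtype_on_path: bool,
    pub wtype_on_path: bool,
}

/// Return the first available method in the documented priority order,
/// or `None` when nothing usable was found.
pub fn select_method(probe: Probe) -> Option<InjectionMethod> {
    if probe.native_protocol_available {
        Some(InjectionMethod::NativeVirtualKeyboard)
    } else if probe.kwtype_on_path {
        Some(InjectionMethod::Kwtype)
    } else if probe.wtype_on_path {
        Some(InjectionMethod::Wtype)
    } else {
        None
    }
}
```

Keeping the decision separate from the probing makes the fallback order easy to test without a running compositor.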

Command Line

Simply run:

speech2keys

The program will:

Auto-detect the best injection method for your compositor
Start recording from your default microphone
Show a notification that it's recording
Transcribe speech and type it into the active window
Stop after 8 seconds of silence

To stop early, run speech2keys again (it will signal the existing instance to stop).
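
One way a second invocation could signal a running instance is a Unix domain socket. The sketch below is an assumption for illustration only: this README does not say which mechanism speech2keys actually uses, and the names `Role`, `claim_or_signal`, and `stop_requested` are hypothetical.

```rust
use std::io;
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;

/// What this process should do after checking for a running instance.
pub enum Role {
    /// No other instance answered; we own the socket and should record.
    Primary(UnixListener),
    /// Another instance was running and has been asked to stop.
    SignalledExisting,
}

/// Connect to the socket if an instance is listening (the connection
/// itself is the stop signal); otherwise become the primary instance.
pub fn claim_or_signal(socket_path: &Path) -> io::Result<Role> {
    if UnixStream::connect(socket_path).is_ok() {
        return Ok(Role::SignalledExisting);
    }
    // Nobody is listening: remove a stale socket file, if any, then bind.
    match std::fs::remove_file(socket_path) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    Ok(Role::Primary(UnixListener::bind(socket_path)?))
}

/// Non-blocking check the primary could poll inside its recording loop.
pub fn stop_requested(listener: &UnixListener) -> io::Result<bool> {
    listener.set_nonblocking(true)?;
    match listener.accept() {
        Ok(_) => Ok(true),
        Err(e) if e.kind() == io::ErrorKind::WouldBlock => Ok(false),
        Err(e) => Err(e),
    }
}
```

A PID file plus a Unix signal would work as well; the socket variant is shown because it needs only the standard library.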

KDE Plasma Global Shortcut

Open System Settings → Shortcuts → Custom Shortcuts
Click Edit → New → Global Shortcut → Command/URL
Set:
- Trigger: Your preferred key combo (e.g., Meta+Shift+S)
- Action: /path/to/speech2keys (or just speech2keys if in PATH)
Click Apply

Now you can press your hotkey to start/stop recording from anywhere!

Example Workflow

Press your hotkey (e.g., Meta+Shift+S)
See notification: "Recording... Press the hotkey again to stop."
Start speaking
Watch as your words appear in the active window
Stop speaking for 8 seconds, or press the hotkey again

Configuration

Change Language

Edit src/transcribe.rs and change the language parameter:

.language("en") // Change to "es", "fr", "de", etc.

Then rebuild:

cargo build --release

Adjust Silence Timeout

Edit src/transcribe.rs and change:

const SILENCE_TIMEOUT_SECS: u64 = 8; // Change to desired seconds
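
To show how such a timeout might be applied, here is a minimal sketch that accumulates the duration of consecutive quiet buffers. It is not taken from src/transcribe.rs: the RMS-based detection, the `SILENCE_RMS_THRESHOLD` value, and the `SilenceTracker` type are all assumptions.

```rust
use std::time::Duration;

/// Mirrors the constant shown in this section.
const SILENCE_TIMEOUT_SECS: u64 = 8;

/// Hypothetical level below which a buffer counts as silence
/// (samples are assumed to be f32 in the range -1.0..=1.0).
const SILENCE_RMS_THRESHOLD: f32 = 0.01;

/// Root mean square of a buffer; 0.0 for an empty buffer.
pub fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// Tracks how long the input has been continuously quiet.
pub struct SilenceTracker {
    silent_for: Duration,
}

impl SilenceTracker {
    pub fn new() -> Self {
        Self { silent_for: Duration::ZERO }
    }

    /// Feed one mono buffer. Returns true once the accumulated silence
    /// reaches the timeout; any loud buffer resets the counter.
    pub fn push(&mut self, samples: &[f32], sample_rate: u32) -> bool {
        let buffer_len =
            Duration::from_secs_f64(samples.len() as f64 / f64::from(sample_rate));
        if rms(samples) < SILENCE_RMS_THRESHOLD {
            self.silent_for += buffer_len;
        } else {
            self.silent_for = Duration::ZERO;
        }
        self.silent_for >= Duration::from_secs(SILENCE_TIMEOUT_SECS)
    }
}
```

Counting audio duration rather than wall-clock time keeps the behavior independent of how quickly buffers are delivered.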

Adjust Transcription Chunk Size

Edit src/transcribe.rs and change:

const CHUNK_DURATION_SECS: u64 = 2; // Smaller = text appears sooner, larger = more context per request (often more accurate)
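
As a rough illustration of what the chunk duration controls, the sketch below splits a buffer of mono samples into chunk-sized slices. The helper names are hypothetical and the real code may buffer audio differently.

```rust
/// Mirrors the constant shown in this section.
const CHUNK_DURATION_SECS: u64 = 2;

/// Number of mono samples in one chunk at the given sample rate.
pub fn samples_per_chunk(sample_rate: u32) -> usize {
    sample_rate as usize * CHUNK_DURATION_SECS as usize
}

/// Split recorded samples into chunk-sized slices; the final slice may
/// be shorter when the recording does not divide evenly.
pub fn split_into_chunks(samples: &[f32], sample_rate: u32) -> Vec<&[f32]> {
    // `chunks` panics on a size of zero, so clamp to at least one sample.
    let size = samples_per_chunk(sample_rate).max(1);
    samples.chunks(size).collect()
}
```

At 16 kHz, for example, a 2-second chunk holds 32,000 samples, so 5 seconds of audio yields two full chunks and one half-length chunk.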

Troubleshooting

"OPENAI_API_KEY environment variable not set"

Make sure you've exported your API key in your shell profile and restarted your terminal/session.
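
For reference, a startup check that could produce this message might look like the sketch below. The function names are hypothetical, and treating a blank value as missing is an assumption rather than documented behavior.

```rust
use std::env;

/// The error text quoted in this section.
pub const MISSING_KEY_MESSAGE: &str = "OPENAI_API_KEY environment variable not set";

/// Accept a key only when it is present and not blank.
pub fn validate_api_key(raw: Option<String>) -> Result<String, &'static str> {
    match raw {
        Some(key) if !key.trim().is_empty() => Ok(key),
        _ => Err(MISSING_KEY_MESSAGE),
    }
}

/// Read the key from the process environment.
pub fn api_key_from_env() -> Result<String, &'static str> {
    validate_api_key(env::var("OPENAI_API_KEY").ok())
}
```

Because the variable is read from the process environment, a hotkey launched by the desktop session only sees the key if the session itself was started with it exported.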

"Failed to create Wayland virtual keyboard client"

Make sure you're running on Wayland (check with: echo $XDG_SESSION_TYPE)
Ensure you have the required runtime libraries installed (libxkbcommon, wayland-client)
Check that your Wayland compositor supports the virtual keyboard protocol
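
The session check can also be done programmatically. The sketch below is an illustration only: the function names are hypothetical, and consulting WAYLAND_DISPLAY in addition to XDG_SESSION_TYPE is an assumption that goes beyond what this README describes.

```rust
/// Returns true when the environment looks like a Wayland session.
/// `session_type` is the value of XDG_SESSION_TYPE and `wayland_display`
/// the value of WAYLAND_DISPLAY; each is `None` when the variable is unset.
pub fn looks_like_wayland(session_type: Option<&str>, wayland_display: Option<&str>) -> bool {
    let type_says_wayland = session_type.map_or(false, |t| t.eq_ignore_ascii_case("wayland"));
    let display_set = wayland_display.map_or(false, |d| !d.is_empty());
    type_says_wayland || display_set
}

/// Apply the check to the current process environment.
pub fn current_session_is_wayland() -> bool {
    let session_type = std::env::var("XDG_SESSION_TYPE").ok();
    let wayland_display = std::env::var("WAYLAND_DISPLAY").ok();
    looks_like_wayland(session_type.as_deref(), wayland_display.as_deref())
}
```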

No audio input

Check your default microphone in system settings
Test with: pactl list sources short or pw-cli list-objects

Text not appearing

Verify you're on Wayland (check with: echo $XDG_SESSION_TYPE)
Check that the target window has focus
Some applications may not accept virtual keyboard input

Cost Estimate

OpenAI Whisper API pricing (as of 2024):

~$0.006 per minute of audio
Example: 5 minutes of daily use = ~$0.03/day = ~$0.90/month
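
The arithmetic behind this estimate, as a small sketch. The rate is the approximate figure quoted above and the function name is hypothetical; check current pricing before relying on the result.

```rust
/// Approximate rate quoted in this section, in US dollars per minute.
const USD_PER_MINUTE: f64 = 0.006;

/// Estimated cost of `minutes_per_day` of audio over `days` days.
pub fn estimated_cost_usd(minutes_per_day: f64, days: u32) -> f64 {
    minutes_per_day * f64::from(days) * USD_PER_MINUTE
}
```

With 5 minutes a day this gives about $0.03 for one day and about $0.90 for a 30-day month, matching the example.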

Technical Details

Language: Rust
Audio capture: cpal (ALSA backend; works with PipeWire/PulseAudio through their ALSA compatibility layers)
Transcription: OpenAI Whisper API via async-openai
Keystroke injection: wrtype library (Wayland virtual keyboard protocol)
Notifications: notify-rust (D-Bus)

License

MIT

Contributing

Issues and pull requests welcome!