speech2keys

November 5, 2025

A fast, lightweight Linux tool that converts speech to text and types it into any window using OpenAI's Whisper API.

Features

  • Fast startup: Optimized Rust binary starts in milliseconds
  • Wayland native: Built-in virtual keyboard support with automatic compositor detection
  • Multi-compositor support: Works natively on Sway, Hyprland, and other compositors that implement the virtual keyboard protocol, and on KDE Plasma via fallback tools (kwtype or wtype)
  • Smart stopping: Automatically stops after 8 seconds of silence
  • Single instance: Press your hotkey once to start recording and again to stop
  • Visual feedback: Desktop notifications show recording status (KDE Plasma compatible)

Prerequisites

  1. Rust toolchain (for building):

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    
  2. Build dependencies (development libraries):

    # Fedora
    sudo dnf install alsa-lib-devel wayland-devel wayland-protocols-devel libxkbcommon-devel
    
    # Arch
    sudo pacman -S alsa-lib wayland wayland-protocols libxkbcommon
    
    # Ubuntu/Debian
    sudo apt install libasound2-dev libwayland-dev wayland-protocols libxkbcommon-dev
    
    # Using Homebrew (Linux)
    brew install alsa-lib wayland wayland-protocols libxkbcommon
    
  3. Runtime dependencies (KDE Plasma users only):

    KDE Plasma/KWin does not support the standard Wayland virtual keyboard protocol. You need one of these fallback tools:

    # Option 1: kwtype (recommended for KDE Plasma)
    sudo dnf install kwtype  # Fedora
    # Or build from: https://github.com/Sporif/KWtype
    
    # Option 2: wtype (works on most compositors)
    sudo dnf install wtype   # Fedora
    sudo pacman -S wtype     # Arch
    sudo apt install wtype   # Ubuntu/Debian
    

    Note: Compositors like Sway, Hyprland, and Cosmic don't need external tools; they work out of the box.

  4. OpenAI API key:

    • Sign up at https://platform.openai.com
    • Create an API key
    • Add to your shell profile (e.g., ~/.bashrc or ~/.zshrc):
      export OPENAI_API_KEY="sk-..."
      

Installation

  1. Clone and build:

    cd /path/to/speech2keys
    cargo build --release
    

    Note: If you installed dependencies via Homebrew, you need to set PKG_CONFIG_PATH and RUSTFLAGS:

    PKG_CONFIG_PATH="/home/linuxbrew/.linuxbrew/lib/pkgconfig:$PKG_CONFIG_PATH" \
    RUSTFLAGS="-L /home/linuxbrew/.linuxbrew/lib" \
    cargo build --release
    
  2. The binary will be at target/release/speech2keys (approximately 3.8 MB).

  3. (Optional) Install to your PATH:

    sudo cp target/release/speech2keys /usr/local/bin/
    

Usage

Automatic Compositor Detection

speech2keys automatically detects the best text injection method for your compositor:

  1. First choice: Native Wayland virtual keyboard protocol (Sway, Hyprland, Cosmic, Niri, etc.)
  2. KDE Plasma fallback: kwtype command if available
  3. Universal fallback: wtype command if available

You'll see a log message on startup indicating which method was selected.
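The priority order above can be sketched in Rust as follows. This is a minimal illustration, not speech2keys' actual code: the names `InjectionMethod`, `select_method`, `command_on_path`, and `fallback_command` are hypothetical, the native-protocol check is reduced to a boolean (a real implementation would look for the virtual keyboard manager global advertised by the compositor), and passing the text as a single argument to kwtype is an assumption about its command line.

```rust
use std::env;
use std::process::Command;

/// Hypothetical enum of the three injection methods, in priority order.
#[derive(Debug, PartialEq)]
enum InjectionMethod {
    NativeVirtualKeyboard,
    Kwtype,
    Wtype,
}

/// True if a file named `cmd` exists in some directory listed in $PATH
/// (this does not check the executable bit).
fn command_on_path(cmd: &str) -> bool {
    env::var_os("PATH")
        .map(|paths| env::split_paths(&paths).any(|dir| dir.join(cmd).is_file()))
        .unwrap_or(false)
}

/// Picks the first available method. `has_virtual_keyboard` stands in for the
/// real Wayland capability check, which is assumed rather than implemented here.
fn select_method(
    has_virtual_keyboard: bool,
    on_path: impl Fn(&str) -> bool,
) -> Option<InjectionMethod> {
    if has_virtual_keyboard {
        Some(InjectionMethod::NativeVirtualKeyboard)
    } else if on_path("kwtype") {
        Some(InjectionMethod::Kwtype)
    } else if on_path("wtype") {
        Some(InjectionMethod::Wtype)
    } else {
        None
    }
}

/// Builds (but does not run) the external command for a fallback method.
/// Passing the text as one argument is assumed to work for both tools.
fn fallback_command(method: &InjectionMethod, text: &str) -> Option<Command> {
    let program = match method {
        InjectionMethod::Kwtype => "kwtype",
        InjectionMethod::Wtype => "wtype",
        InjectionMethod::NativeVirtualKeyboard => return None,
    };
    let mut cmd = Command::new(program);
    cmd.arg(text);
    Some(cmd)
}

fn main() {
    match select_method(false, command_on_path) {
        Some(method) => {
            println!("Selected injection method: {:?}", method);
            if let Some(cmd) = fallback_command(&method, "hello") {
                println!("Would run: {:?}", cmd);
            }
        }
        None => eprintln!("No injection method available"),
    }
}
```

Passing the PATH lookup in as a closure keeps the selection logic testable without touching the real filesystem.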

Command Line

Simply run:

speech2keys

The program will:

  1. Auto-detect the best injection method for your compositor
  2. Start recording from your default microphone
  3. Show a notification that it's recording
  4. Transcribe speech and type it into the active window
  5. Stop after 8 seconds of silence

To stop early, run speech2keys again (it will signal the existing instance to stop).
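The README does not say how the second invocation signals the first. One common pattern, assumed here purely for illustration, is a Unix domain socket: the first instance binds a socket file, and a later invocation that manages to connect sends a stop message and exits. The socket path and the `Role`/`acquire_or_signal` names are hypothetical.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::{Path, PathBuf};

/// Result of trying to become the single running instance.
enum Role {
    /// First instance: the listener will receive stop requests.
    Primary(UnixListener),
    /// Another instance was already running and has been asked to stop.
    StopRequested,
}

fn acquire_or_signal(sock: &Path) -> io::Result<Role> {
    // A successful connect means another instance is listening: ask it to stop.
    if let Ok(mut stream) = UnixStream::connect(sock) {
        stream.write_all(b"stop")?;
        return Ok(Role::StopRequested);
    }
    // Nobody is listening: remove any stale socket file and bind a fresh one.
    let _ = std::fs::remove_file(sock);
    Ok(Role::Primary(UnixListener::bind(sock)?))
}

/// Hypothetical socket location under $XDG_RUNTIME_DIR (or the temp dir).
fn socket_path() -> PathBuf {
    std::env::var_os("XDG_RUNTIME_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(std::env::temp_dir)
        .join("speech2keys-demo.sock")
}

fn main() -> io::Result<()> {
    let path = socket_path();
    match acquire_or_signal(&path)? {
        Role::StopRequested => println!("Asked the running instance to stop"),
        Role::Primary(listener) => {
            println!("Recording; run again to stop");
            // A real program would record on another thread; this demo just
            // blocks until a stop message arrives.
            let (mut conn, _) = listener.accept()?;
            let mut msg = String::new();
            conn.read_to_string(&mut msg)?;
            if msg == "stop" {
                println!("Stop requested");
            }
            std::fs::remove_file(&path)?;
        }
    }
    Ok(())
}
```

Using a successful connect as the liveness test means a socket file left behind by a crash is detected as stale and replaced, which a bare PID file would not handle on its own.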

KDE Plasma Global Shortcut

  1. Open System Settings → Shortcuts → Custom Shortcuts
  2. Click Edit → New → Global Shortcut → Command/URL
  3. Set:
    • Trigger: Your preferred key combo (e.g., Meta+Shift+S)
    • Action: /path/to/speech2keys (or just speech2keys if in PATH)
  4. Click Apply

Now you can press your hotkey to start/stop recording from anywhere!

Example Workflow

  1. Press your hotkey (e.g., Meta+Shift+S)
  2. See notification: "Recording... Press the hotkey again to stop."
  3. Start speaking
  4. Watch as your words appear in the active window
  5. Stop speaking for 8 seconds, or press the hotkey again

Configuration

Change Language

Edit src/transcribe.rs and change the language parameter, which takes an ISO-639-1 language code:

.language("en") // Change to "es", "fr", "de", etc.

Then rebuild:

cargo build --release

Adjust Silence Timeout

Edit src/transcribe.rs and change:

const SILENCE_TIMEOUT_SECS: u64 = 8; // Change to desired seconds
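How silence is detected isn't documented here. A minimal sketch, assuming mono f32 samples and a simple RMS threshold (the `SILENCE_RMS_THRESHOLD` value and the `SilenceDetector` type are hypothetical), counts consecutive quiet samples and reports when they add up to `SILENCE_TIMEOUT_SECS`:

```rust
const SILENCE_TIMEOUT_SECS: u64 = 8;
/// Hypothetical RMS level below which a buffer counts as silence.
const SILENCE_RMS_THRESHOLD: f32 = 0.01;

/// Root-mean-square level of a buffer of samples in [-1.0, 1.0].
fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// Tracks how long the (mono) input has been continuously silent.
struct SilenceDetector {
    sample_rate: u32,
    silent_samples: u64,
}

impl SilenceDetector {
    fn new(sample_rate: u32) -> Self {
        Self { sample_rate, silent_samples: 0 }
    }

    /// Feeds one buffer; returns true once silence has lasted SILENCE_TIMEOUT_SECS.
    fn push(&mut self, buffer: &[f32]) -> bool {
        if rms(buffer) < SILENCE_RMS_THRESHOLD {
            self.silent_samples += buffer.len() as u64;
        } else {
            // Any loud buffer resets the countdown.
            self.silent_samples = 0;
        }
        self.silent_samples >= SILENCE_TIMEOUT_SECS * self.sample_rate as u64
    }
}

fn main() {
    let mut detector = SilenceDetector::new(16_000);
    let speech = [0.5f32; 1600]; // 0.1 s of loud signal at 16 kHz
    let quiet = [0.0f32; 1600]; // 0.1 s of silence
    detector.push(&speech);
    let mut buffers = 1;
    while !detector.push(&quiet) {
        buffers += 1;
    }
    println!("Stopped after {} quiet buffers", buffers);
}
```

Counting samples instead of reading a wall clock keeps the timeout independent of how large the buffers delivered by the audio backend are.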

Adjust Transcription Chunk Size

Edit src/transcribe.rs and change:

const CHUNK_DURATION_SECS: u64 = 2; // Smaller = text appears sooner, larger = more accurate
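To illustrate what `CHUNK_DURATION_SECS` controls, here is a hypothetical buffering sketch: captured samples accumulate until a full chunk (duration × sample rate × channels samples) is available, and each complete chunk would then be sent for transcription. The `Chunker` type is illustrative only, not speech2keys' actual code.

```rust
const CHUNK_DURATION_SECS: u64 = 2;

/// Accumulates captured samples and hands out fixed-length chunks.
struct Chunker {
    chunk_len: usize,
    pending: Vec<f32>,
}

impl Chunker {
    fn new(sample_rate: u32, channels: u16) -> Self {
        let chunk_len = CHUNK_DURATION_SECS as usize * sample_rate as usize * channels as usize;
        Self { chunk_len, pending: Vec::new() }
    }

    /// Appends samples and returns every complete chunk now available.
    fn push(&mut self, samples: &[f32]) -> Vec<Vec<f32>> {
        self.pending.extend_from_slice(samples);
        let mut chunks = Vec::new();
        while self.pending.len() >= self.chunk_len {
            // split_off keeps the first chunk_len samples in `pending`
            // and returns the remainder.
            let rest = self.pending.split_off(self.chunk_len);
            chunks.push(std::mem::replace(&mut self.pending, rest));
        }
        chunks
    }

    /// Returns the leftover partial chunk, e.g. when recording stops.
    fn flush(&mut self) -> Option<Vec<f32>> {
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}

fn main() {
    let mut chunker = Chunker::new(16_000, 1);
    // Simulate the audio callback delivering 0.5 s buffers for 5 s.
    for _ in 0..10 {
        for chunk in chunker.push(&[0.0f32; 8_000]) {
            println!("chunk ready: {} samples", chunk.len());
        }
    }
    if let Some(rest) = chunker.flush() {
        println!("final partial chunk: {} samples", rest.len());
    }
}
```

With shorter chunks each request carries less audio, so text arrives sooner but Whisper has less context to work with.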

Troubleshooting

"OPENAI_API_KEY environment variable not set"

Make sure you've exported your API key in your shell profile and restarted your terminal/session.

"Failed to create Wayland virtual keyboard client"

  • Make sure you're running on Wayland (check with: echo $XDG_SESSION_TYPE)
  • Ensure you have the required runtime libraries installed (libxkbcommon, wayland-client)
  • Check that your Wayland compositor supports the virtual keyboard protocol
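The first two checks in this section can be automated. The following is a minimal preflight sketch covering the API key and the session type; the `preflight` function is hypothetical and not part of speech2keys, and it only inspects environment variables, not the compositor itself.

```rust
use std::env;

/// Hypothetical preflight check mirroring the troubleshooting entries above.
/// Returns human-readable problems; an empty list means both checks passed.
fn preflight(api_key: Option<&str>, session_type: Option<&str>) -> Vec<String> {
    let mut problems = Vec::new();
    if api_key.map_or(true, |k| k.trim().is_empty()) {
        problems.push("OPENAI_API_KEY environment variable not set".to_string());
    }
    if session_type != Some("wayland") {
        problems.push(format!(
            "XDG_SESSION_TYPE is '{}', expected 'wayland'",
            session_type.unwrap_or("<unset>")
        ));
    }
    problems
}

fn main() {
    let key = env::var("OPENAI_API_KEY").ok();
    let session = env::var("XDG_SESSION_TYPE").ok();
    let problems = preflight(key.as_deref(), session.as_deref());
    if problems.is_empty() {
        println!("Environment looks OK");
    } else {
        for p in &problems {
            eprintln!("warning: {p}");
        }
    }
}
```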

No audio input

  • Check your default microphone in system settings
  • Test with: pactl list sources short or pw-cli list-objects

Text not appearing

  • Verify you're on Wayland (check with: echo $XDG_SESSION_TYPE)
  • Check that the target window has focus
  • Some applications may not accept virtual keyboard input

Cost Estimate

OpenAI Whisper API pricing (as of 2024):

  • ~$0.006 per minute of audio
  • Example: 5 minutes of daily use = ~$0.03/day = ~$0.90/month
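The estimate is simple arithmetic (minutes per day × days × price per minute); this small hypothetical helper lets you plug in your own usage. The rate is the 2024 figure quoted above and may have changed.

```rust
/// Whisper API price per minute of audio, as quoted above (2024); check current pricing.
const PRICE_PER_MINUTE_USD: f64 = 0.006;

/// Estimated monthly cost for a given number of audio minutes per day.
fn monthly_cost(minutes_per_day: f64, days_per_month: f64) -> f64 {
    minutes_per_day * days_per_month * PRICE_PER_MINUTE_USD
}

fn main() {
    let daily = 5.0 * PRICE_PER_MINUTE_USD;
    let monthly = monthly_cost(5.0, 30.0);
    println!("~${:.2}/day, ~${:.2}/month", daily, monthly);
}
```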

Technical Details

  • Language: Rust
  • Audio capture: cpal (ALSA backend; PipeWire and PulseAudio are reached through their ALSA compatibility layers)
  • Transcription: OpenAI Whisper API via async-openai
  • Keystroke injection: wrtype library (Wayland virtual keyboard protocol), with kwtype/wtype command fallbacks
  • Notifications: notify-rust (D-Bus)
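The details above don't specify the audio format sent to the Whisper API. Since the API accepts uploaded audio files such as WAV, one plausible approach, assumed here, is to wrap each chunk of captured f32 samples in an in-memory 16-bit PCM WAV file before upload. `encode_wav` is a hypothetical std-only helper; a crate such as hound could do the same job.

```rust
/// Encodes interleaved f32 samples in [-1.0, 1.0] as a 16-bit PCM WAV file in memory.
fn encode_wav(samples: &[f32], sample_rate: u32, channels: u16) -> Vec<u8> {
    let bits_per_sample: u16 = 16;
    let block_align = channels * bits_per_sample / 8;
    let byte_rate = sample_rate * block_align as u32;
    let data_len = (samples.len() * 2) as u32;

    let mut out = Vec::with_capacity(44 + data_len as usize);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // file size minus 8
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for &s in samples {
        // Clamp, then scale to the signed 16-bit range.
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

fn main() {
    let sample_rate = 16_000u32;
    // One second of a 440 Hz tone at half amplitude.
    let samples: Vec<f32> = (0..sample_rate)
        .map(|i| 0.5 * (2.0 * std::f32::consts::PI * 440.0 * i as f32 / sample_rate as f32).sin())
        .collect();
    let wav = encode_wav(&samples, sample_rate, 1);
    println!("WAV size: {} bytes", wav.len());
}
```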

License

MIT

Contributing

Issues and pull requests are welcome!