Deepgram Voice Keyboard for Linux

November 21, 2025 · View on GitHub

alt text

Status: Working prototype! Successfully types into applications on Wayland (KDE Plasma 25.10)

A system-level voice-to-text/dictation application for Ubuntu Linux using Deepgram's cloud STT API. Types wherever your cursor is - works across all applications on Wayland.

Installation

  1. Download the latest .deb package from Releases

  2. Install:

sudo dpkg -i voice-keyboard_0.1.0_amd64.deb
sudo apt-get install -f  # Fix dependencies if needed
  1. Configure your Deepgram API key (choose one method):

    Method 1 - Via config file (recommended):

    # Edit the config file
    nano ~/.config/voice-keyboard/config.toml
    
    # Add your API key:
    # api_key = "your-deepgram-api-key-here"
    

    Method 2 - Via environment variable:

    export DEEPGRAM_API_KEY="your-api-key-here"
    # Add to ~/.bashrc or ~/.zshrc for persistence
    
  2. Run:

# Test STT functionality
voice-keyboard-launcher --test-stt

# Or launch GUI (default)
voice-keyboard-launcher

Or launch from your application menu.

Option 2: Build from Source

  1. Set your Deepgram API key:
export DEEPGRAM_API_KEY="your-api-key-here"
  1. Run the application:
./run.sh --test-stt

The run script will automatically:

  • Create a Python virtual environment (for future GUI components)
  • Build the Rust application
  • Run with proper privilege handling (sudo for keyboard creation, drops to user for audio)

Building a Debian Package

Use the unified package.sh script for all package management:

# Build only
./package.sh

# Build and install
./package.sh install

# Build and reinstall (removes old version first)
./package.sh reinstall

# Update version and build
./package.sh update 0.2.0

# Update version and install
./package.sh update 0.2.0 --install

# Show help
./package.sh --help

See PACKAGING.md for detailed packaging documentation.

Validated

✓ Text input into Kate text editor on Wayland (KDE Plasma 25.10) ✓ Cloud-based STT via Deepgram API ✓ Real-time transcription with on-the-fly punctuation

Project Structure

Voice-Typing-Ubuntu-App/
├── app/                    # Main application code (Rust)
│   ├── src/               # Source files
│   │   ├── main.rs        # Entry point with privilege handling
│   │   ├── audio_input.rs # Audio capture via CPAL
│   │   ├── stt_client.rs  # Deepgram WebSocket client
│   │   ├── virtual_keyboard.rs # Virtual keyboard & typing logic
│   │   ├── input_event.rs # Linux input event definitions
│   │   ├── api_spend.rs   # API cost tracking
│   │   └── gui/           # Qt GUI components
│   ├── Cargo.toml         # Dependencies and build config
│   └── target/            # Build artifacts
├── package.sh             # Unified package management (build/install/update)
├── run.sh                 # Development script to build and run
├── CLAUDE.md              # AI assistant context
├── PACKAGING.md           # Debian packaging documentation
└── README.md              # This file

Roadmap

See PROJECT_NOTES.md for the complete feature roadmap and project context.

MVP Features (In Priority Order)

  1. Hotkey support - Configurable activation key (default: F13), tap-to-start/tap-to-stop, with PTT option
  2. Microphone handling - Device selection with persistent settings
  3. Model selection - Starting with Deepgram, future: multiple cloud providers
  4. GUI - Simple interface with system tray integration
  5. Type anywhere - ✓ Works on Wayland across all applications
  6. API spend monitoring - Track usage costs with hourly polling

Future Enhancements

  • Mic level monitoring with dB display and alerts
  • Post-processing for transcriptions
  • Custom dictionary/replacements
  • Long-form STT with chunking for extended recordings

Why This Project?

After a year of voice typing becoming my default input method, I needed an OS-level dictation solution that:

  • Works with cloud STT APIs (not local inference)
  • Types anywhere on Wayland (the critical technical hurdle)
  • Supports quality real-time transcription with punctuation
  • Provides cost visibility for API usage

This builds on Deepgram's voice-keyboard-linux starter which successfully solved the Wayland typing challenge.

Technical Details

  • Language: Rust (core), Python (future GUI)
  • STT Provider: Deepgram cloud API
  • OS: Ubuntu 25.10 with KDE Plasma on Wayland
  • Audio: PipeWire via CPAL
  • Input: Virtual keyboard device via uinput

Credits

Based on Deepgram's voice-keyboard-linux starter project.