Hyprvoice - Voice-Powered Typing for Wayland/Hyprland

March 15, 2026

26 voice models, cloud and local, built for Hyprland dictation.

Press a toggle key, speak, and get instant text input. Built natively for Wayland/Hyprland with clean PipeWire capture and robust text injection.

Highlights

26 speech-to-text models across cloud and local providers, including whisper.cpp.
Optional LLM post-processing for grammar, punctuation, and more.
Toggle workflow with optional status notifications and cancel support.
Text injection via ydotool, wtype, and clipboard fallback with clipboard restore.
Guided onboarding and a full configure menu with hot-reload.
Personalization through a custom prompt and keywords, sent to both the LLM and the voice model.
Wispr Flow-level quality, but open source and built for Linux.
Support for streaming models for low-latency transcription.

Voice Providers and Models

All supported speech-to-text providers and models:

OpenAI (cloud)

whisper-1 (batch)
gpt-4o-transcribe (batch)
gpt-4o-mini-transcribe (batch)
gpt-4o-realtime-preview (streaming)

Groq (cloud)

whisper-large-v3
whisper-large-v3-turbo

ElevenLabs (cloud)

scribe_v1 (batch)
scribe_v2 (batch)
scribe_v2_realtime (streaming)

whisper-cpp (local)

English-only: tiny.en, base.en, small.en, medium.en
Multilingual: tiny, base, small, medium, large-v1, large-v2, large-v3, large-v3-turbo

Installation (AUR)

yay -S hyprvoice-bin
# or
paru -S hyprvoice-bin

The package installs system dependencies and the systemd user service. You'll still need an API key for a cloud provider, or whisper.cpp for local transcription. Onboarding will guide you through the choice.

Quick Start

Run onboarding:

hyprvoice onboarding

Enable and start the service:

systemctl --user enable --now hyprvoice.service

Add a keybinding (Hyprland example):

bind = SUPER, R, exec, hyprvoice toggle

See Hyprland Keybindings for push-to-talk and other patterns.

Test voice input:

hyprvoice toggle

Run hyprvoice configure anytime for advanced settings.

Hyprland Keybindings

Simple toggle

# ~/.config/hypr/hyprland.conf
bind = SUPER, R, exec, hyprvoice toggle

Each press toggles between recording and idle.

Push-to-talk (hold-to-record)

Combine both bind types to get hold-to-record behavior — press to start, release to stop:

# ~/.config/hypr/hyprland.conf
bind  = SUPER, R, exec, hyprvoice toggle   # key down → start recording
bindr = SUPER, R, exec, hyprvoice toggle    # key up   → stop and transcribe

This gives a walkie-talkie feel: hold the key while speaking, release when done. The daemon receives two toggle commands — the first starts recording, the second stops it and triggers transcription.

`bind` vs `bindr`

Keyword	Fires on
`bind`	Key press (down)
`bindr`	Key release (up)

With bindr, the command runs when the key is released rather than when it is pressed, so fewer keys are likely to still be held down when text is injected. This can reduce interference from modifier keys (SUPER, CTRL, etc.) during text injection.

Commands

Core CLI

hyprvoice onboarding
hyprvoice configure
hyprvoice serve
hyprvoice toggle
hyprvoice cancel
hyprvoice status
hyprvoice version
hyprvoice stop

Model management (whisper-cpp)

hyprvoice model list
hyprvoice model list --provider whisper-cpp
hyprvoice model download base.en
hyprvoice model remove base.en

Model testing (E2E)

hyprvoice test-models
hyprvoice test-models --audio /path/to/sample.wav --output test-models.json

Service management

systemctl --user status hyprvoice.service
systemctl --user restart hyprvoice.service
journalctl --user -u hyprvoice.service -f

Configuration

Configuration lives in ~/.config/hyprvoice/config.toml and hot-reloads automatically.

First-time setup: hyprvoice onboarding
Full TUI editor: hyprvoice configure

Docs

docs/config.md - configuration reference and examples
docs/providers.md - provider and model details
docs/architecture.md - architecture and adapter overview
docs/structure.md - code map and entry points
docs/testing.md - integration testing with test-models

Troubleshooting

Daemon Issues

Daemon won't start:

# Check if already running
hyprvoice status

# Check for stale files
ls -la ~/.cache/hyprvoice/

# Clean up and restart
rm -f ~/.cache/hyprvoice/hyprvoice.pid
rm -f ~/.cache/hyprvoice/control.sock
hyprvoice serve
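
Before deleting these files it may be worth confirming that no daemon is actually alive. The sketch below is not part of Hyprvoice: `is_stale_pidfile` is a hypothetical helper, and it assumes the PID file holds a single numeric process ID, which this README does not confirm.

```bash
# Hypothetical helper (not a Hyprvoice command).
# Assumption: the PID file contains one numeric process ID.
# Returns 0 when the file is missing, malformed, or names a dead process.
is_stale_pidfile() {
  local pidfile="$1" pid
  [ -f "$pidfile" ] || return 0              # no file: nothing to protect
  pid="$(tr -d '[:space:]' < "$pidfile")"
  case "$pid" in
    ''|*[!0-9]*) return 0 ;;                 # empty or non-numeric: treat as stale
  esac
  # kill -0 delivers no signal; it only reports whether the process exists.
  # It also fails for processes owned by another user, which should not
  # apply to a per-user daemon.
  if kill -0 "$pid" 2>/dev/null; then
    return 1                                 # process is alive: not stale
  fi
  return 0
}
```

With the helper loaded, the cleanup can be guarded as `is_stale_pidfile ~/.cache/hyprvoice/hyprvoice.pid && rm -f ~/.cache/hyprvoice/hyprvoice.pid ~/.cache/hyprvoice/control.sock`.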

Command not found:

# Check installation
which hyprvoice

# Add to PATH if using ~/.local/bin
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
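
Running the echo line more than once appends duplicate entries to ~/.bashrc. A small guard, sketched here with a hypothetical helper named `add_path_line`, appends the line only when it is not already present.

```bash
# Hypothetical helper: append the PATH export to an rc file exactly once.
add_path_line() {
  local rcfile="$1"
  # Single quotes keep $HOME and $PATH literal so they expand at login time.
  local line='export PATH="$HOME/.local/bin:$PATH"'
  touch "$rcfile"
  # -q quiet, -x whole-line match, -F fixed string (no regex interpretation)
  grep -qxF "$line" "$rcfile" || printf '%s\n' "$line" >> "$rcfile"
}
```

For example, `add_path_line ~/.bashrc` followed by `source ~/.bashrc` has the same effect as the commands above but is safe to repeat.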

Audio Issues

No audio recording:

# Check PipeWire is running
systemctl --user status pipewire

# Test microphone
pw-record --help
pw-record test.wav

# Check microphone permissions and levels

Audio device issues:

# List available audio devices
pw-cli list-objects | grep -A5 -B5 Audio

# Check microphone is not muted in system settings

Notification Issues

No desktop notifications:

# Test notify-send directly
notify-send "Test" "This is a test notification"

# Install if missing
sudo pacman -S libnotify  # Arch
sudo apt install libnotify-bin  # Ubuntu/Debian

Text Injection Issues

Text not appearing:

Ensure the cursor is in a text field when you toggle recording off.

Check that the wtype and wl-clipboard tools are installed:

# Test wtype directly
wtype "test text"

# Test clipboard tools
echo "test" | wl-copy
wl-paste

Verify that your Wayland compositor supports text input protocols.
Check the injection backends in your configuration (a fallback chain is the most robust).
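
To illustrate what a fallback chain means, here is a minimal shell sketch that tries backends in order and stops at the first one that succeeds. It is not Hyprvoice's implementation: `try_backends`, `via_wtype`, and `via_clipboard` are hypothetical names, ydotool and clipboard restore are left out, and the clipboard backend only copies the text rather than pasting it.

```bash
# try_backends TEXT BACKEND...: run each backend with TEXT until one succeeds.
# Prints the name of the backend that worked; returns 1 if all of them fail.
try_backends() {
  local text="$1" backend
  shift
  for backend in "$@"; do
    if "$backend" "$text" >/dev/null; then
      printf '%s\n' "$backend"
      return 0
    fi
  done
  return 1
}

# Type the text with wtype, reading it from stdin ("-").
via_wtype() {
  command -v wtype >/dev/null 2>&1 && printf '%s' "$1" | wtype -
}

# Place the text on the Wayland clipboard for a manual paste.
via_clipboard() {
  command -v wl-copy >/dev/null 2>&1 && printf '%s' "$1" | wl-copy
}
```

Calling `try_backends "test text" via_wtype via_clipboard` shows which backend works on your system, which can help narrow down where injection is failing.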

Clipboard issues:

# Install wl-clipboard if missing
sudo pacman -S wl-clipboard  # Arch
sudo apt install wl-clipboard  # Ubuntu/Debian

# Test clipboard functionality
wl-copy "test text"
wl-paste
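
The checks in this troubleshooting section all rely on a handful of external tools. As a convenience, the following sketch reports which of them are absent from PATH. `missing_tools` is a hypothetical helper, not a Hyprvoice command.

```bash
# missing_tools NAME...: print each command that is not found on PATH.
# Returns 0 when everything is present, 1 when at least one is missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_tools pw-record notify-send wtype wl-copy wl-paste` prints nothing when all the tools used above are installed.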

Debug Mode

# Run daemon with verbose output
hyprvoice serve

# Follow logs from the systemd service (or watch the output of hyprvoice serve directly)
journalctl --user -u hyprvoice.service -f

# Test individual commands
hyprvoice toggle
hyprvoice status

Architecture Overview

Hyprvoice uses a daemon + pipeline architecture for efficient resource management:

Control Daemon: Lightweight IPC server managing lifecycle
Pipeline: Stateful audio processing (recording → transcribing → processing → injecting)
State Machine: idle → recording → transcribing → processing → injecting → idle

System Architecture

flowchart LR
  subgraph Client
    CLI["CLI/Tool"]
  end
  subgraph Daemon
    D["Control Daemon (lifecycle + IPC)"]
  end
  subgraph Pipeline
    A["Audio Capture"]
    T["Transcribing"]
    I["Injecting (wtype + clipboard)"]
  end
  N["notify-send/log"]

  CLI -- unix socket --> D
  D -- start/stop --> A
  A -- frames --> T
  T -- status --> D
  D -- events --> N
  D -- inject action --> T
  T --> I
  I -->|done| D

stateDiagram-v2
  [*] --> idle
  idle --> recording: toggle
  recording --> transcribing: first_frame
  transcribing --> processing: llm_enabled
  transcribing --> injecting: llm_disabled
  processing --> injecting: inject_action
  injecting --> idle: done
  recording --> idle: abort
  injecting --> idle: abort
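
The transitions in the state diagram can be written as a lookup, which makes it easy to check which events are valid in which state. This is an illustrative sketch in shell, not Hyprvoice's implementation. `next_state` and `walk_events` are hypothetical names; the state and event names are copied from the diagram.

```bash
# next_state STATE EVENT: print the state that follows, or return 1 when the
# diagram defines no such transition.
next_state() {
  case "$1:$2" in
    idle:toggle)                     echo recording ;;
    recording:first_frame)           echo transcribing ;;
    transcribing:llm_enabled)        echo processing ;;
    transcribing:llm_disabled)       echo injecting ;;
    processing:inject_action)        echo injecting ;;
    injecting:done)                  echo idle ;;
    recording:abort|injecting:abort) echo idle ;;
    *)                               return 1 ;;
  esac
}

# walk_events START EVENT...: apply events in order and print the final state.
walk_events() {
  local state="$1" event
  shift
  for event in "$@"; do
    state="$(next_state "$state" "$event")" || return 1
  done
  printf '%s\n' "$state"
}
```

A full session without LLM post-processing is `walk_events idle toggle first_frame llm_disabled done`, which ends back in idle. An event that the diagram does not define for the current state, such as `next_state idle done`, returns a non-zero status.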

How It Works

Toggle recording → Pipeline starts, audio capture begins
Audio streaming → PipeWire frames buffered for transcription
Toggle stop → Recording ends, transcription starts
LLM processing → Text cleaned up (if enabled)
Text injection → Result typed or copied to clipboard
Return to idle → Pipeline cleaned up, ready for next session
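
The first three steps can be driven from a terminal without a keybinding, which is handy for testing. The sketch below uses only the `hyprvoice toggle` command documented above. The `dictate_for` function and the `HYPRVOICE` variable are hypothetical conveniences introduced here.

```bash
# Command to run. Override HYPRVOICE to point at another binary or a test stub.
HYPRVOICE="${HYPRVOICE:-hyprvoice}"

# dictate_for SECONDS: start recording, wait, then stop so that transcription
# and injection can run. Defaults to 5 seconds.
dictate_for() {
  local seconds="${1:-5}"
  "$HYPRVOICE" toggle || return 1   # first toggle: idle -> recording
  sleep "$seconds"                  # speak during this window
  "$HYPRVOICE" toggle               # second toggle: stop and transcribe
}
```

Because the result is typed into the focused window, something like `sleep 2; dictate_for 5` leaves time to switch to a text field before recording begins.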

Data Flow

toggle (daemon) → create pipeline → recording
First frame arrives → transcribing (the daemon may send the Transcribing notification later)
Audio frames → audio buffer (collect all audio during session)
Second toggle during transcribing → transcribe collected audio
If LLM enabled → processing → clean up text with LLM
injecting → type or paste text
Complete → idle; pipeline stops; daemon clears reference
Notifications at key transitions

License

MIT License - see LICENSE.md for details.