Hyprvoice - Voice-Powered Typing for Wayland/Hyprland
March 15, 2026
26 voice models, cloud and local, built for Hyprland dictation.
Press a toggle key, speak, and get instant text input. Built natively for Wayland/Hyprland with clean PipeWire capture and robust text injection.
Highlights
- 26 speech-to-text models across cloud and local providers, including whisper.cpp.
- Optional LLM post-processing for grammar, punctuation, and more.
- Toggle workflow with optional status notifications and cancel support.
- Text injection via ydotool, wtype, and clipboard fallback with clipboard restore.
- Guided onboarding and a full configure menu with hot-reload.
- Personalization through a custom prompt and keywords, sent to both the LLM and the voice model.
- Wispr Flow quality, but open source and built for Linux.
- Streaming model support for low-latency transcription.
Voice Providers and Models
All supported speech-to-text providers and models:
OpenAI (cloud)
- whisper-1 (batch)
- gpt-4o-transcribe (batch)
- gpt-4o-mini-transcribe (batch)
- gpt-4o-realtime-preview (streaming)
Groq (cloud)
- whisper-large-v3
- whisper-large-v3-turbo
Mistral (cloud)
voxtral-mini-latest
ElevenLabs (cloud)
- scribe_v1 (batch)
- scribe_v2 (batch)
- scribe_v2_realtime (streaming)
whisper-cpp (local)
- English-only: tiny.en, base.en, small.en, medium.en
- Multilingual: tiny, base, small, medium, large-v1, large-v2, large-v3, large-v3-turbo
Deepgram (cloud)
- flux-general-en
- nova-3
- nova-2
Installation (AUR)
yay -S hyprvoice-bin
# or
paru -S hyprvoice-bin
The package installs system dependencies and the systemd user service. You'll still need an API key for a cloud provider, or whisper.cpp for local transcription. Onboarding will guide you through the choice.
Quick Start
- Run onboarding:
hyprvoice onboarding
- Enable and start the service:
systemctl --user enable --now hyprvoice.service
- Add a keybinding (Hyprland example):
bind = SUPER, R, exec, hyprvoice toggle
See Hyprland Keybindings for push-to-talk and other patterns.
- Test voice input:
hyprvoice toggle
Run hyprvoice configure anytime for advanced settings.
Hyprland Keybindings
Simple toggle
# ~/.config/hypr/hyprland.conf
bind = SUPER, R, exec, hyprvoice toggle
Each press toggles between recording and idle.
Push-to-talk (hold-to-record)
Combine both bind types to get hold-to-record behavior — press to start, release to stop:
# ~/.config/hypr/hyprland.conf
bind = SUPER, R, exec, hyprvoice toggle # key down → start recording
bindr = SUPER, R, exec, hyprvoice toggle # key up → stop and transcribe
This gives a walkie-talkie feel: hold the key while speaking, release when done. The daemon receives two toggle commands — the first starts recording, the second stops it and triggers transcription.
bind vs bindr
| Keyword | Fires on |
|---|---|
| bind | Key press (down) |
| bindr | Key release (up) |
With bindr, modifier keys (SUPER, CTRL, etc.) are fully released before the command executes. This can prevent modifiers from interfering with text injection.
Commands
Core CLI
hyprvoice onboarding
hyprvoice configure
hyprvoice serve
hyprvoice toggle
hyprvoice cancel
hyprvoice status
hyprvoice version
hyprvoice stop
Model management (whisper-cpp)
hyprvoice model list
hyprvoice model list --provider whisper-cpp
hyprvoice model download base.en
hyprvoice model remove base.en
Model testing (E2E)
hyprvoice test-models
hyprvoice test-models --audio /path/to/sample.wav --output test-models.json
Service management
systemctl --user status hyprvoice.service
systemctl --user restart hyprvoice.service
journalctl --user -u hyprvoice.service -f
Configuration
Configuration lives in ~/.config/hyprvoice/config.toml and hot-reloads automatically.
- First-time setup:
hyprvoice onboarding
- Full TUI editor:
hyprvoice configure
Docs
- docs/config.md - configuration reference and examples
- docs/providers.md - provider and model details
- docs/architecture.md - architecture and adapter overview
- docs/structure.md - code map and entry points
- docs/testing.md - integration testing with test-models
Troubleshooting
Common Issues
Daemon Issues
Daemon won't start:
# Check if already running
hyprvoice status
# Check for stale files
ls -la ~/.cache/hyprvoice/
# Clean up and restart
rm -f ~/.cache/hyprvoice/hyprvoice.pid
rm -f ~/.cache/hyprvoice/control.sock
hyprvoice serve
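Before deleting the files above, it can help to confirm that the PID file really is stale. The following is a minimal sketch, not part of Hyprvoice: the helper name pid_file_is_stale is hypothetical, and it assumes the PID file holds a single numeric PID, which this README does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 (stale) when the PID file exists
# but does not name a running process.
# Assumption: the file contains a single numeric PID.
pid_file_is_stale() {
  local file="$1" pid
  [ -f "$file" ] || return 1          # no file, nothing to clean up
  pid="$(tr -d '[:space:]' < "$file")"
  case "$pid" in
    ''|*[!0-9]*) return 0 ;;          # empty or non-numeric content
  esac
  ! kill -0 "$pid" 2>/dev/null        # stale if the process cannot be signalled
}

pid_file="$HOME/.cache/hyprvoice/hyprvoice.pid"
if pid_file_is_stale "$pid_file"; then
  echo "stale PID file, removing: $pid_file"
  rm -f "$pid_file" "$HOME/.cache/hyprvoice/control.sock"
else
  echo "no stale PID file found; leaving files in place"
fi
```

kill -0 sends no signal; it only checks whether the process exists and can be signalled by the current user, which fits a daemon running as a systemd user service.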
Command not found:
# Check installation
which hyprvoice
# Add to PATH if using ~/.local/bin
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
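Running the echo line above more than once appends duplicate entries to ~/.bashrc. A small check like the one below, offered as a sketch with a hypothetical helper name (path_contains), tells you whether the directory is already on PATH before you add it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 when the given directory is already an entry in PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if path_contains "$HOME/.local/bin"; then
  echo "$HOME/.local/bin is already in PATH"
else
  echo "$HOME/.local/bin is missing from PATH"
fi
```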
Audio Issues
No audio recording:
# Check PipeWire is running
systemctl --user status pipewire
# Test microphone
pw-record --help
pw-record test.wav
# Check microphone permissions and levels
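To make the microphone test non-interactive, the recording can be bounded with timeout and the result checked for sample data. This is a rough sketch under stated assumptions: the helper wav_has_samples and the /tmp path are hypothetical, and the 44-byte threshold is the size of a canonical PCM WAV header, so the check only shows that something was captured, not that the input is audible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 when the file is larger than a bare
# 44-byte canonical WAV header, which suggests sample data was written.
wav_has_samples() {
  local file="$1" size
  [ -f "$file" ] || return 1
  size="$(wc -c < "$file")"
  [ "$size" -gt 44 ]
}

if command -v pw-record >/dev/null 2>&1; then
  out=/tmp/hyprvoice-mic-test.wav
  # Record for about three seconds; timeout exits with 124 by design.
  timeout 3 pw-record "$out" || true
  if wav_has_samples "$out"; then
    echo "microphone capture wrote audio data to $out"
  else
    echo "recording is empty; check the input device and mute state" >&2
  fi
fi
```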
Audio device issues:
# List available audio devices
pw-cli list-objects | grep -A5 -B5 Audio
# Check microphone is not muted in system settings
Notification Issues
No desktop notifications:
# Test notify-send directly
notify-send "Test" "This is a test notification"
# Install if missing
sudo pacman -S libnotify # Arch
sudo apt install libnotify-bin # Ubuntu/Debian
Text Injection Issues
Text not appearing:
- Ensure the cursor is in a text field when toggling off recording.
- Check that the wtype and wl-clipboard tools are installed:
  # Test wtype directly
  wtype "test text"
  # Test clipboard tools
  echo "test" | wl-copy
  wl-paste
- Verify that your Wayland compositor supports text input protocols.
- Check the injection backends in your configuration (the fallback chain is most robust).
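To see which injection tools a fallback chain could actually reach on your system, a quick availability check can help. The sketch below is illustrative only and is not Hyprvoice's own selection logic: first_available is a hypothetical helper, and wl-copy stands in for the clipboard fallback.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first command from the argument list
# that is found in PATH; return 1 if none is found.
first_available() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      return 0
    fi
  done
  return 1
}

# Order follows the backends named in this README.
if backend="$(first_available ydotool wtype wl-copy)"; then
  echo "first available injection tool: $backend"
else
  echo "no injection tool found; install wtype or wl-clipboard" >&2
fi
```

Being present in PATH does not guarantee a tool works. For example, ydotool also needs its ydotoold daemon running, so test the tool directly as shown above.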
Clipboard issues:
# Install wl-clipboard if missing
sudo pacman -S wl-clipboard # Arch
sudo apt install wl-clipboard # Ubuntu/Debian
# Test clipboard functionality
wl-copy "test text"
wl-paste
Debug Mode
# Run daemon with verbose output
hyprvoice serve
# Check logs from systemd service (or just see results from hyprvoice serve)
journalctl --user -u hyprvoice.service -f
# Test individual commands
hyprvoice toggle
hyprvoice status
Architecture Overview
Hyprvoice uses a daemon + pipeline architecture for efficient resource management:
- Control Daemon: Lightweight IPC server managing lifecycle
- Pipeline: Stateful audio processing (recording → transcribing → processing → injecting)
- State Machine:
idle → recording → transcribing → processing → injecting → idle
System Architecture
flowchart LR
subgraph Client
CLI["CLI/Tool"]
end
subgraph Daemon
D["Control Daemon (lifecycle + IPC)"]
end
subgraph Pipeline
A["Audio Capture"]
T["Transcribing"]
I["Injecting (wtype + clipboard)"]
end
N["notify-send/log"]
CLI -- unix socket --> D
D -- start/stop --> A
A -- frames --> T
T -- status --> D
D -- events --> N
D -- inject action --> T
T --> I
I -->|done| D
stateDiagram-v2
[*] --> idle
idle --> recording: toggle
recording --> transcribing: first_frame
transcribing --> processing: llm_enabled
transcribing --> injecting: llm_disabled
processing --> injecting: inject_action
injecting --> idle: done
recording --> idle: abort
injecting --> idle: abort
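The transitions in the state diagram can be read as a lookup from (state, event) to the next state. The following sketch encodes exactly those transitions in a hypothetical next_state function. It illustrates the diagram and is not the daemon's actual implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map (state, event) to the next state,
# using only the transitions listed in the state diagram.
next_state() {
  local state="$1" event="$2"
  case "$state:$event" in
    idle:toggle)                     echo recording ;;
    recording:first_frame)           echo transcribing ;;
    transcribing:llm_enabled)        echo processing ;;
    transcribing:llm_disabled)       echo injecting ;;
    processing:inject_action)        echo injecting ;;
    injecting:done)                  echo idle ;;
    recording:abort|injecting:abort) echo idle ;;
    *) echo "invalid transition: $state on $event" >&2; return 1 ;;
  esac
}

# Walk one session with LLM post-processing disabled.
state=idle
for event in toggle first_frame llm_disabled done; do
  state="$(next_state "$state" "$event")" || exit 1
  echo "$state"
done
```

The loop prints recording, transcribing, injecting, and idle in turn. Any pair not listed in the diagram is rejected with a non-zero exit status.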
How It Works
- Toggle recording → Pipeline starts, audio capture begins
- Audio streaming → PipeWire frames buffered for transcription
- Toggle stop → Recording ends, transcription starts
- LLM processing → Text cleaned up (if enabled)
- Text injection → Result typed or copied to clipboard
- Return to idle → Pipeline cleaned up, ready for next session
Data Flow
- toggle (daemon) → create pipeline → recording
- First frame arrives → transcribing (daemon may notify Transcribing later)
- Audio frames → audio buffer (collect all audio during session)
- Second toggle during transcribing → transcribe collected audio
- If LLM enabled → processing → clean up text with LLM
- injecting → type or paste text
- Complete → idle; pipeline stops; daemon clears reference
- Notifications at key transitions
License
MIT License - see LICENSE.md for details.