dicti
June 15, 2026
Local, offline live dictation for Linux. Tap a key, talk, and your words appear as you speak, transcribed by whisper.cpp on your own machine and typed into whatever window has focus. No cloud, no account, no network.
I built this because I was used to a good dictation app on the Mac and wanted the same thing on Linux. Tested on Debian + GNOME (X11).
Status: v0.3 (alpha). Live streaming dictation; works well day to day, rough edges remain. See the ROADMAP.
Features
- Push-to-talk via a global key (the "Copilot"/AI key, or any key you bind).
- Live streaming: text appears as you speak, refined with full context (whisper re-transcribes the whole utterance each pass, so quality matches batch). Append-only, so it never rewrites text behind your cursor. Batch mode is one config line away (mode = "batch").
- No silence hallucinations: whisper-server runs with silero VAD (padded so it doesn't clip the first word), and a word must also repeat across passes before it's typed, so the stock "thanks for watching" guesses on pauses never reach the screen.
- Fully offline: whisper.cpp medium model, GPU-accelerated via Vulkan.
- Long sessions (1-hour cap) with silence auto-stop after a few minutes of real quiet.
- Universal insertion: types the transcript with ydotool, so it works in plain editors, IDEs and terminals alike. The transcript is also left on the clipboard as a safety net.
- A single top-bar indicator (GNOME Shell extension): idle / listening / transcribing.
- Multilingual auto-detect (e.g. English + Polish).
How it works
[your key] -> keyd -> Super+Shift+Alt+F12 -> GNOME shortcut -> dictate-toggle
-> unix socket -> dictation daemon -> pw-record -> /tmp WAV
-> HTTP -> whisper-server (Vulkan) -> transcript -> ydotool type -> focused window
In streaming mode the daemon re-transcribes the whole utterance so far every ~2s (whisper always has full context, so quality matches batch) and types only the words that have stabilised across passes. It is append-only, so text already typed is never rewritten. whisper-server's VAD drops silence inside the window (padded so quiet onsets survive), and the repeat-across-passes rule is a second filter against hallucinations.
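The "type only what has stabilised" rule can be sketched as follows. This is a minimal illustration, not dicti's actual code: it assumes "stabilised" means a word sits in the common prefix of two consecutive passes, and the names StableTyper and common_prefix_len are hypothetical.

```python
from dataclasses import dataclass, field


def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading words on which two transcripts agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class StableTyper:
    """Hypothetical helper: decides which words are safe to type after each pass."""

    committed: list[str] = field(default_factory=list)  # words already typed
    previous: list[str] = field(default_factory=list)   # words from the last pass

    def feed(self, transcript: str) -> list[str]:
        """Take the newest full-utterance transcript, return the words to type now."""
        current = transcript.split()
        # Assumption: a word is stable once two consecutive passes agree on it.
        stable = common_prefix_len(self.previous, current)
        self.previous = current
        # Append-only: never look back before what was already typed. If whisper
        # revises an earlier word, the slice is empty and nothing is rewritten.
        new_words = current[len(self.committed):stable]
        self.committed.extend(new_words)
        return new_words
```

Under these assumptions, a guess that shows up in only one pass and is gone in the next never reaches the commit step, which is the effect the repeat-across-passes rule is after.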
Two user services (whisper-server, dictation) plus the GNOME Shell extension
dicti@local for the indicator. The daemon mirrors its state to
$XDG_RUNTIME_DIR/dictation.state so the indicator can follow it.
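A script could follow the daemon the same way the indicator does, by reading the state file. The sketch below assumes the file holds the state name as plain text (idle, listening or transcribing); the actual file format is not documented here, so treat that as a guess. The function names are hypothetical.

```python
import os
from pathlib import Path

# States shown by the indicator; assumed to match what the daemon writes.
KNOWN_STATES = {"idle", "listening", "transcribing"}


def state_path(env=None) -> Path:
    """Location of the mirrored state file under $XDG_RUNTIME_DIR."""
    env = os.environ if env is None else env
    # /run/user/<uid> is the usual systemd default when the variable is unset.
    runtime_dir = env.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
    return Path(runtime_dir) / "dictation.state"


def read_state(path: Path) -> str:
    """Return the current state, or "unknown" if the file is missing or odd."""
    try:
        text = path.read_text(encoding="utf-8").strip().lower()
    except FileNotFoundError:
        return "unknown"
    first = text.split()[0] if text else ""
    return first if first in KNOWN_STATES else "unknown"
```

Calling read_state(state_path()) in a status-bar script or a shell prompt hook would then report whether dictation is active, provided the plain-text assumption holds.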
Requirements
- Debian/Ubuntu-family distro (apt), PipeWire audio (pw-record).
- A Vulkan-capable GPU (integrated is fine). CPU fallback works but is ~4-5x slower.
- GNOME Shell (tested on 48) for the indicator extension.
- xclip and xprop (x11-utils): used to insert accented/non-ASCII characters like Polish ąęóśżźćń via a paste (ydotool 1.x can only type ASCII keycodes), choosing Ctrl+V or Ctrl+Shift+V based on the focused app. The installer pulls them in.
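To make the type-versus-paste split concrete, here is one way it could be planned. This is an illustration only: dicti may paste the whole transcript instead of splitting it into runs, and the terminal class names below are assumptions, not a list taken from the project.

```python
from itertools import groupby

# Assumed examples of window classes that expect Ctrl+Shift+V for paste.
TERMINAL_CLASSES = {"gnome-terminal-server", "kitty", "alacritty"}


def plan_insertion(text: str) -> list[tuple[str, str]]:
    """Split text into ("type", run) and ("paste", run) steps.

    ASCII runs can be typed as keycodes; non-ASCII runs go through the clipboard.
    """
    return [
        ("type" if is_ascii else "paste", "".join(chars))
        for is_ascii, chars in groupby(text, key=str.isascii)
    ]


def paste_shortcut(wm_class: str) -> str:
    """Pick the paste chord for the focused window's class (as xprop reports it)."""
    return "ctrl+shift+v" if wm_class.lower() in TERMINAL_CLASSES else "ctrl+v"
```

For example, plan_insertion("zażółć now") yields a typed "za", a pasted "żółć", and a typed " now".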
Install
git clone https://github.com/tksimson/dicti.git
cd dicti
bash install/install.sh
The guided installer runs the phases in order (system packages, keyd remap, build
whisper.cpp + download/quantize the model, the user services, ydotool, the GNOME shortcut,
the indicator extension). You will be asked to log out/in once after the first phase so
input-group membership takes effect. Each phase is also runnable on its own from
install/00..07.
If your dictation key isn't a Copilot/AI key, run sudo evtest (or wev on Wayland) to
find its KEY_* name and edit keyd/default.conf.
Usage
- Tap your bound key to start listening, tap again to transcribe and insert.
- Pause to think freely; it won't cut off until a few minutes of real silence.
- dictate-toggle [START|STOP|TOGGLE|CANCEL|STATUS] controls the daemon from the CLI.
- Left-click the indicator to toggle, right-click for a menu.
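If you want to drive dictation from another program, wrapping the CLI is probably the simplest route. The sketch below only uses the command and action names listed above. It assumes dictate-toggle is on your PATH and that the action is optional, as the square brackets suggest. What STATUS prints is not specified here, so the output is returned untouched.

```python
import subprocess

ACTIONS = ("START", "STOP", "TOGGLE", "CANCEL", "STATUS")


def build_command(action=None) -> list[str]:
    """Build the argv for dictate-toggle, validating the action name."""
    if action is None:
        return ["dictate-toggle"]  # assumed: no argument means the default action
    name = action.upper()
    if name not in ACTIONS:
        raise ValueError(f"unknown action {action!r}, expected one of {ACTIONS}")
    return ["dictate-toggle", name]


def run_action(action=None) -> str:
    """Run dictate-toggle and return whatever it wrote to stdout."""
    result = subprocess.run(
        build_command(action), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

For instance, run_action("cancel") could be bound to a second shortcut to discard a recording.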
Configuration
Edit ~/.config/dicti/config.toml (the installer seeds one from
config/config.toml.example; copy that file yourself if yours is missing). It sets mode
(streaming or batch), the streaming phrase tuning, silence_timeout_sec, max_record_sec,
language, paste_method, the silence thresholds, and the transcript cleanup flags.
Restart after editing: systemctl --user restart dictation.
Tip: whisper transcribes one language per pass. language = "auto" works well for mixed
use now that streaming keeps full context, but if a quiet or ambiguous voice gets detected
wrong, pin it, e.g. language = "pl".
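Among the settings above, silence_timeout_sec and the silence thresholds govern the auto-stop. One plausible shape for that check is an RMS level test with a running quiet timer, sketched below. The class name, the 16-bit native-endian mono PCM format and the threshold semantics are all assumptions, not the daemon's actual logic.

```python
import array
import math


def rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of 16-bit native-endian PCM samples."""
    samples = array.array("h")
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class SilenceWatch:
    """Hypothetical helper: reports when quiet has lasted past the timeout."""

    def __init__(self, threshold: float, timeout_sec: float) -> None:
        self.threshold = threshold      # levels below this count as silence
        self.timeout_sec = timeout_sec  # how much continuous quiet ends the session
        self.quiet_sec = 0.0

    def update(self, frame: bytes, frame_sec: float) -> bool:
        """Feed one audio frame; return True once recording should stop."""
        if rms(frame) >= self.threshold:
            self.quiet_sec = 0.0        # any speech resets the timer
        else:
            self.quiet_sec += frame_sec
        return self.quiet_sec >= self.timeout_sec
```

Because any frame above the threshold resets the timer, ordinary thinking pauses do not accumulate toward the timeout.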
Troubleshooting
- Slow transcription / "Dictation degraded" popup: whisper-server fell back to CPU. Run systemctl --user restart whisper-server; check journalctl --user -u whisper-server.
- Nothing types: make sure ydotoold runs and YDOTOOL_SOCKET is set, and that you are in the input group (log out/in after install).
- No top-bar icon: reload GNOME Shell (Alt+F2, r on X11; log out/in on Wayland), then gnome-extensions enable dicti@local.
- Live logs: journalctl --user -u dictation -u whisper-server -f.
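The "Nothing types" checks can be bundled into a quick self-test. This is a convenience sketch, not part of dicti. It assumes pgrep is available for the process check and that your session's groups reflect the input-group membership. The function names are hypothetical.

```python
import grp
import os
import shutil
import subprocess


def typing_problems(env, group_names, have_ydotoold: bool) -> list[str]:
    """Return a human-readable list of likely reasons nothing gets typed."""
    problems = []
    if not have_ydotoold:
        problems.append("ydotoold is not running")
    if not env.get("YDOTOOL_SOCKET"):
        problems.append("YDOTOOL_SOCKET is not set")
    if "input" not in group_names:
        problems.append("user is not in the input group (log out/in after install)")
    return problems


def current_group_names() -> set[str]:
    """Names of the groups this process belongs to."""
    names = set()
    for gid in os.getgroups():
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # gid without a name entry
    return names


def ydotoold_running() -> bool:
    """True if a process named exactly ydotoold exists (needs pgrep)."""
    if shutil.which("pgrep") is None:
        return False  # cannot tell without pgrep; reported as not running
    done = subprocess.run(["pgrep", "-x", "ydotoold"], stdout=subprocess.DEVNULL)
    return done.returncode == 0


if __name__ == "__main__":
    found = typing_problems(os.environ, current_group_names(), ydotoold_running())
    for problem in found:
        print(problem)
```

An empty result means the three preconditions hold, in which case the live logs are the next place to look.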
License
MIT, see LICENSE. Uses OpenAI's Whisper model via whisper.cpp; respect the model's license/terms.