WhisperCrabs

March 9, 2026 · View on GitHub

Floating voice-to-text and text-to-speech tool for Linux, macOS, and Windows.

Click to record, click to transcribe, text copied to clipboard. Copy any text and click to hear it read aloud. Setup customized commands to hit record and stop. Play sound when transcription ready.

Local-first — runs entirely on your machine with whisper.cpp for speech-to-text and Piper for text-to-speech. No cloud required. Also supports any OpenAI-compatible API endpoint (Groq, Ollama, OpenRouter, LM Studio, LocalAI, etc.).

AI Agent-Ready — fully controllable via D-Bus.

Works with OpenCrabs, OpenClaw, and any AI agent that can run shell commands.

Simple setup: download binary, launch, switch provider via D-Bus.

WhisperCrabs button states

WhisperCrabs on macOS

Privacy

WhisperCrabs has no account, no telemetry, and no background processes. Your microphone is never accessed until you explicitly click the record button. Audio is captured in-memory, never written to disk. Only the transcribed text is stored locally in SQLite on your machine.

With local mode (PRIMARY_TRANSCRIPTION_SERVICE=local), everything stays on your machine - no network requests at all. With API mode (PRIMARY_TRANSCRIPTION_SERVICE=api), audio is sent to your configured endpoint (Groq by default, but can point to a local Ollama/LM Studio instance too).

Features

  • Floating microphone button (draggable, position persists)
  • One-click voice recording with visual feedback (red idle, green recording, orange transcribing)
  • STT — Local: whisper.cpp transcription, no internet required (Tiny, Base, Small, Medium models)
  • STT — API: any OpenAI-compatible endpoint (Groq, Ollama, OpenRouter, LM Studio, Custom)
  • TTS — Local: optional text-to-speech via Piper, 6 built-in voices (US/UK, male/female)
  • One-click switching via right-click menu for both STT and TTS
  • Custom API dialog — connect to any OpenAI-compatible endpoint with Base URL, API Key, and Model
  • Global keyboard shortcuts via D-Bus (works on GNOME, KDE, Sway, etc.)
  • Transcribed text copied to clipboard automatically
  • Provider and model choice persists across restarts (saved to DB)
  • SQLite history with right-click access
  • AI Agent-Ready: full D-Bus control for provider switching, custom API setup, recording
  • No background mic access — recording only on explicit click
  • Audio stays in-memory, never saved to disk

Right-Click Menu

Right-click the button to switch STT provider/model or TTS voice:

Right-click menu

STT API Presets

ProviderBase URLDefault ModelAPI Key
Groqhttps://api.groq.com/openai/v1whisper-large-v3-turboRequired
Ollamahttp://localhost:11434/v1whisperNot needed
OpenRouterhttps://openrouter.ai/api/v1openai/whisper-1Required
LM Studiohttp://localhost:1234/v1whisper-1Not needed
Custom API...User-configuredUser-configuredOptional

Quick Install

Download the pre-built binary from the latest release and run it. No build tools or Rust toolchain needed.

Linux (x86_64 / aarch64):

gh release download --repo adolfousier/whispercrabs --pattern 'whispercrabs-*-linux-x86_64.tar.gz'
tar xzf whispercrabs-*-linux-x86_64.tar.gz
chmod +x whispercrabs
./whispercrabs

macOS (Apple Silicon):

gh release download --repo adolfousier/whispercrabs --pattern 'whispercrabs-*-macos-aarch64.tar.gz'
tar xzf whispercrabs-*-macos-aarch64.tar.gz
chmod +x whispercrabs
./whispercrabs

Windows (x86_64):

gh release download --repo adolfousier/whispercrabs --pattern 'whispercrabs-*-windows-x86_64.zip'
Expand-Archive whispercrabs-*-windows-x86_64.zip -DestinationPath .
.\whispercrabs.exe

For local mode: right-click the button and select a local model size (Tiny, Base, Small, Medium) — downloads automatically.

For API mode: right-click and select a provider (Groq, OpenRouter, etc.) — enter your API key when prompted. Or set it via .env:

echo 'API_KEY=your-api-key-here' > .env

Runtime requirements

  • GTK 4 runtime libraries (pre-installed on GNOME desktops; bundled in Windows release ZIP; brew install gtk4 on macOS)
  • Working microphone
  • Python 3 (only if using TTS — auto-creates a venv with piper-tts)

AI Agent note: WhisperCrabs is fully controllable via D-Bus from the command line. An AI agent can set up WhisperCrabs for a user by: (1) downloading the binary from a GitHub release, (2) writing a .env with the user's API key, (3) launching the binary, and (4) switching transcription mode via D-Bus — no build step required. See the D-Bus Control section for all available commands.

Build from Source

System packages

Debian/Ubuntu:

sudo apt install libgtk-4-dev libgraphene-1.0-dev libvulkan-dev libasound2-dev cmake libclang-dev

Arch Linux:

sudo pacman -S gtk4 graphene vulkan-icd-loader alsa-lib cmake clang

Build tools

  • just (optional, for convenient commands)

Build & run

  1. Clone the repository:

    git clone https://github.com/adolfousier/whispercrabs.git
    cd whispercrabs
    
  2. Build and run:

    Local mode (downloads model automatically on first run):

    just run-local
    

    With a different model:

    just run-local ggml-small.en.bin
    

    API mode (requires API_KEY in .env):

    just run-api
    

    Without just (manual setup):

    # Download a whisper model for local mode
    mkdir -p ~/.local/share/whispercrabs/models
    curl -L -o ~/.local/share/whispercrabs/models/ggml-base.en.bin \
      https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
    
    # Set backend in .env
    # PRIMARY_TRANSCRIPTION_SERVICE=local  (or api)
    
    cargo build --release
    cargo run --release
    

Available whisper models

Models are downloaded from HuggingFace (ggerganov/whisper.cpp). Run just list-models to see options.

ModelSizeSpeedNotes
ggml-tiny.en.bin~75MBFastestEnglish only
ggml-base.en.bin~142MBFastEnglish only (default)
ggml-small.en.bin~466MBMediumEnglish only, better accuracy
ggml-medium.en.bin~1.5GBSlowEnglish only, high accuracy
ggml-large-v3.bin~3.1GBSlowestMultilingual, best accuracy

Usage

ActionWhat happens
Left-clickStart recording (button turns green with pulse)
Left-click againStop recording, transcribe, copy to clipboard
Left-click while speakingStop TTS playback
Esc (when focused)Stop recording
Right-clickPopover menu: STT provider (API/Local), TTS voice, Read Clipboard, History, Quit
DragMove the button anywhere on screen

After transcription completes, the text is copied to your clipboard. Paste with Ctrl+V wherever you need it.

Sound notification

Play an audio cue when transcription completes:

SOUND_NOTIFICATION_ON_COMPLETION=true

This is especially useful with local models that may take a few seconds to transcribe. You can keep working in another window, hear the notification when it's done, and just Ctrl+V to paste.

Text-to-Speech (Optional)

WhisperCrabs includes optional text-to-speech powered by Piper. To use it:

  1. Select any text on your machine and copy it (Ctrl+C / Cmd+C)
  2. Right-click the WhisperCrabs button
  3. Click Read Clipboard to hear it spoken aloud
  • TTS is completely optional — no setup required unless you want it
  • First use automatically installs a Python venv with piper-tts and downloads the selected voice model (~63 MB)
  • Button turns yellow while synthesizing, green while speaking — click to stop playback
  • Strips terminal formatting and markdown decoration before speaking

Available Voices

VoiceLocaleGender
AmyUS EnglishFemale
LessacUS EnglishFemale
RyanUS EnglishMale
KristinUS EnglishFemale
JoeUS EnglishMale
CoriUK EnglishFemale

Switch voices from the right-click menu under TTS Voices. Use Reset TTS to re-download in case of errors, or Delete TTS to remove all TTS data.

D-Bus Control

WhisperCrabs exposes D-Bus actions for full CLI control. Bind these to keyboard shortcuts, use them from scripts, or call them from an AI agent.

Start recording (raises window and begins recording):

gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate record [] {}

Stop recording (stops recording and triggers transcription):

gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate stop [] {}

Switch to a provider (e.g. Groq, Ollama, OpenRouter, LM Studio):

# Switch to Groq
gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate transcription-mode "[<'groq'>]" {}

# Switch to Ollama (local API, no key needed)
gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate transcription-mode "[<'ollama'>]" {}

# Switch to OpenRouter
gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate transcription-mode "[<'openrouter'>]" {}

# Switch to LM Studio
gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate transcription-mode "[<'lmstudio'>]" {}

Switch to local mode (auto-downloads model if missing, choose size):

# Local Base model (~142 MB, default)
gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate transcription-mode "[<'local-base'>]" {}

# Local Tiny (~75 MB), Small (~466 MB), or Medium (~1.5 GB)
gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate transcription-mode "[<'local-tiny'>]" {}
gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate transcription-mode "[<'local-small'>]" {}
gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate transcription-mode "[<'local-medium'>]" {}

Read clipboard aloud (TTS — auto-downloads voice on first use):

gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate speak [] {}

Switch TTS voice:

# Available voices: amy, lessac, ryan, kristin, joe, cori
gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate tts-mode "[<'ryan'>]" {}

Set custom API endpoint (programmatic, no dialog):

gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app \
  --method=org.gtk.Actions.Activate set-api-config \
  "[<'{\"base_url\":\"http://localhost:11434/v1\",\"model\":\"whisper\"}'>]" {}

# With API key:
gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app \
  --method=org.gtk.Actions.Activate set-api-config \
  "[<'{\"base_url\":\"https://api.example.com/v1\",\"api_key\":\"sk-...\",\"model\":\"whisper-1\"}'>]" {}

Keyboard Shortcuts

These D-Bus commands work on GNOME, KDE, Sway, Hyprland, i3, and any DE that supports custom shortcuts.

GNOME

Settings > Keyboard > Custom Shortcuts:

NameCommandSuggested shortcut
WhisperCrabs Recordgdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate record [] {}Alt+Shift+R
WhisperCrabs Stopgdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate stop [] {}Alt+Shift+S

KDE Plasma

System Settings > Shortcuts > Custom Shortcuts > Edit > New > Global Shortcut > Command/URL. Add the same gdbus commands above.

Sway / Hyprland / i3

Add to your config:

# Sway / i3
bindsym Alt+Shift+r exec gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate record [] {}
bindsym Alt+Shift+s exec gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate stop [] {}

# Hyprland
bind = ALT SHIFT, R, exec, gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate record [] {}
bind = ALT SHIFT, S, exec, gdbus call --session --dest=dev.whispercrabs.app --object-path=/dev/whispercrabs/app --method=org.gtk.Actions.Activate stop [] {}

Compatible API Backends

Any service exposing an OpenAI-compatible /v1/audio/transcriptions endpoint works. Set API_BASE_URL, API_KEY, and API_MODEL in your .env:

Groq (default, no config needed):

PRIMARY_TRANSCRIPTION_SERVICE=api
API_KEY=gsk_...

Ollama (local, no API key needed):

PRIMARY_TRANSCRIPTION_SERVICE=api
API_BASE_URL=http://localhost:11434/v1
API_KEY=unused
API_MODEL=whisper

OpenRouter:

PRIMARY_TRANSCRIPTION_SERVICE=api
API_BASE_URL=https://openrouter.ai/api/v1
API_KEY=sk-or-...
API_MODEL=openai/whisper-1

LM Studio:

PRIMARY_TRANSCRIPTION_SERVICE=api
API_BASE_URL=http://localhost:1234/v1
API_KEY=unused
API_MODEL=whisper-1

Stack

ComponentCrate/Tool
GUIgtk4-rs (GTK 4)
Audio capturecpal + hound
Audio playbackrodio
Local STTwhisper-rs (whisper.cpp) + rubato
API STTreqwest + OpenAI-compatible API
TTSpiper-tts (Python, optional)
Databaserusqlite (bundled SQLite)
Clipboardarboard
Configdotenvy

License

MIT - see LICENSE