

Star counts and last commit dates are shown via shields.io badges and update dynamically.
For background, notes on how the repo is organized, and inclusion criteria, see notes.md. For a getting-started guide, see starting-points.md.
- Automatic speech recognition (ASR)
- Speech-to-text (STT)
- Text-to-speech (TTS)
- Linux voice typing
- Linux dictation
- Linux TTS
- Voice control
- Transcription
Projects with explicit Wayland support. These are particularly valuable on modern Linux desktops (GNOME, KDE Plasma on Wayland, Hyprland, Sway, niri, etc.), where X11 virtual input methods don't work.
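
As a rough illustration of how a dictation tool might check for a Wayland session before choosing a text-injection method, the sketch below reads the standard `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` environment variables. The function name `session_type` is hypothetical and is not taken from any project listed here; individual projects may detect the session differently.

```python
import os


def session_type(env=None):
    """Return 'wayland', 'x11', or 'unknown' for the current desktop session.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env

    # Login managers normally export XDG_SESSION_TYPE for graphical sessions.
    declared = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if declared in ("wayland", "x11"):
        return declared

    # Fall back to the display variables set by the compositor or X server.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


if __name__ == "__main__":
    print(session_type())
```

Note that `DISPLAY` is often also set under Wayland because of XWayland, which is why the sketch checks the Wayland variables first.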
Desktop applications for dictation and transcription with graphical interfaces.
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| AI-Typer-V2 |  |  | Voice dictation with multimodal AI cleanup — speak naturally, get polished text |
| aTrain |  |  | Audio transcription training tool |
| audiov |  |  | Speech-to-text, voice-typing, dictation software for Linux distributions |
| Buzz |  |  | Offline audio transcription and translation. Supports Whisper, Whisper.cpp, Faster-Whisper. Available via Flatpak/Snap. Vulkan GPU support |
| dsnote |  |  | Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation |
| LinuxWhisper |  |  | Whisper for Linux |
| maVoice-Linux |  |  | Voice control for Linux |
| mint-whisper |  |  | Whisper for Linux Mint |
| murmure |  |  | Fully local, private, cross-platform STT with LLM post-processing |
| OpenFlow |  |  | Local speech-to-text app for Linux |
| OpenWispr |  |  | Open source Whisper-based voice assistant |
| Parakeet-Type-Ubuntu |  |  | On-device voice typing for Linux using Parakeet and NeMo ASR models via sherpa-onnx. No cloud, no GPU required |
| sotto |  |  | Local speech-to-text transcription app for Linux using Whisper models |
| soundvibes |  |  | Speech-to-text for Linux that just works |
| TalkType (zyk42) |  |  | Cross-platform Electron voice writing assistant. ASR + LLM for understanding, polishing, and Q&A |
| TranscriptionSuite |  |  | Fully local, private STT app with speaker diarization, Audio Notebook mode, LM Studio integration, longform and live transcription |
| VoiceType |  |  | Fork of Deepgram's Linux starter. CLI to GUI + hotkey support, API key editing, cost tracking |
| WhisperNow |  |  | Real-time Whisper transcription |
| whisper-to-input-desktop |  |  | Desktop app using OpenAI's Whisper to transcribe audio and input it as text |
| whisper-ui |  |  | Whisper UI interface |
| whisperer |  |  | Whisper-based transcription tool |
| whisply |  |  | A simple GUI for OpenAI Whisper |
| wisper |  |  | Voice dictation app for Linux. Type directly at cursor with AI-powered transcription |
| wispr-lite |  |  | Lightweight Whisper-based transcription tool |
Command-line dictation and transcription tools.
Tools focused on capturing voice notes with AI post-processing (LLM cleanup, formatting, summarization).
Libraries and tools for low-latency, live transcription.
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| RealtimeSTT |  |  | Low-latency STT library with VAD, wake word activation. Uses WebRTCVAD + SileroVAD + Faster-Whisper |
| whisper_real_time |  |  | Real-time transcription with OpenAI Whisper |
| whisper_streaming |  |  | Real-time streaming Whisper with self-adaptive latency using local agreement policy |
| WhisperLive |  |  | Real-time Whisper transcription from Collabora. OpenVINO support, browser extensions, iOS client |
| WhisperLiveKit |  |  | 2025 SOTA streaming STT with speaker diarization. Simul-Whisper for ultra-low latency |
Docker-deployed tools and web interfaces for self-hosted STT.
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| meeting-minutes |  |  | Self-hostable meeting transcription and minutes generation |
| Scriberr |  |  | Voice transcription tool |
| Whisper-WebUI |  |  | A Gradio-based browser interface for Whisper. Easy subtitle generation |
| whisper-fastapi |  |  | Whisper FastAPI service |
Projects that use cloud STT APIs for transcription.
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| speech2keys |  |  | Speech to keystrokes using OpenAI Whisper API |
Open source voice assistants emphasizing local processing and privacy.
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| Neon AI |  |  | Privacy-first voice assistant. Offline-capable, customizable. Maintains Mycroft community forums |
| OpenVoiceOS |  |  | Community-driven voice assistant framework. Local processing, privacy-focused. Continuation of Mycroft |
| Project Alice |  |  | Modular smart assistant, fully offline. Built around Snips, guarantees privacy |
| SEPIA Framework |  |  | Self-hosted, privacy-compliant voice assistant ecosystem |
Tools that translate voice into actions — computer control, voice-to-commands, voice-to-JSON, etc.
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| ovos-buildroot |  |  | OpenVoiceOS - A minimalistic Linux OS bringing the open source voice assistant to IoT and embedded devices |
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| auto-subs |  |  | Automatic subtitle generation |
| whisper-subs |  |  | Whisper subtitle generation |
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| voiceprint |  |  | Voice biometric authentication for Linux |
Tools that aren't STT themselves, but help make the most of voice workflows.
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| easyeffects |  |  | Audio effects for PipeWire applications - noise reduction, equalization, and more |
| NoiseTorch |  |  | Real-time microphone noise suppression on Linux |
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| pyannote-audio |  |  | Neural building blocks for speaker diarization: speech activity detection, speaker embedding, clustering |
| Silero VAD |  |  | Enterprise-grade Voice Activity Detector. MIT license, <1ms per chunk on CPU |
| WebRTC VAD |  |  | Python interface to WebRTC Voice Activity Detector |
| wyoming-openwakeword |  |  | Custom wake word detection for Home Assistant |
ASR/STT toolkits and frameworks for building voice applications. Developer libraries rather than end-user applications.
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| Coqui STT |  |  | Deep learning STT toolkit (continuation of Mozilla DeepSpeech). Custom model training |
| fairseq |  |  | Meta's sequence modeling toolkit. Includes Wav2Vec 2.0 for self-supervised ASR |
| FunASR |  |  | End-to-end speech recognition toolkit from Alibaba. Industrial-grade models |
| NVIDIA NeMo |  |  | Enterprise ASR toolkit with Conformer/Parakeet models. GPU-accelerated training and inference |
| sherpa-onnx |  |  | STT, TTS, speaker diarization, VAD using next-gen Kaldi with ONNX Runtime. Offline, 12 programming languages |
| sherpa-onnx-go |  |  | Go package for sherpa-onnx speech recognition without network access |
| SpeechBrain |  |  | PyTorch-based speech toolkit for ASR, speaker recognition, speech enhancement |
| Vosk |  |  | Offline speech recognition API. Lightweight, 20+ languages, works on Raspberry Pi |
Optimized implementations and variants of OpenAI's Whisper model.
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| distil-whisper |  |  | HuggingFace's distilled Whisper. 6x faster, 49% smaller, within 1% WER |
| faster-whisper |  |  | CTranslate2 reimplementation. 4x faster, less memory, 8-bit quantization support |
| insanely-fast-whisper |  |  | CLI for fastest Whisper inference. Batching, flash attention, distil-whisper support |
| whisper.cpp |  |  | C/C++ port of Whisper. CPU inference, minimal dependencies, runs on edge devices |
| whisper-plus |  |  | Advanced Whisper pipelines with diarization, translation, and video transcription support |
| wyoming-faster-whisper |  |  | Wyoming protocol server for faster-whisper. Home Assistant integration |
| wyoming-whisper-api-client |  |  | Wyoming protocol client for Whisper APIs. Centralizes STT for Home Assistant |
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| claude-tts |  |  | TTS plugin for Claude Code — multi-provider support (ElevenLabs, OpenAI, Google, Amazon Polly, Azure, local system TTS) |
MCP (Model Context Protocol) servers that provide STT capabilities.
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| stt-mcp-server-linux |  |  | Local speech-to-text MCP server for Tmux on Linux (for use with Claude Code and other MCP clients) |
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| awesome-voice-typing |  |  | Curated list of open-source STT and voice typing tools for Linux, macOS, Windows, Android, and iOS |
| Voice-Apps-Index |  |  | Index for STT and dictation apps and WIPs |
Projects at the concept or specification stage.
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| VoiceBox |  |  | Idea for a speech tech solution — specced out by Claude |
Notable projects that are no longer actively maintained.
| Repository | Stars | Last Updated | Description |
|---|---|---|---|
| AI-Transcription-Notepad |  |  | Voice note taking utility using cloud audio multimodal models for single-pass transcription and text cleanup (archived) |