Voice Apps Index
April 25, 2026
Index of voice typing, dictation, and speech-to-text applications and utilities.
Active Projects
Three parallel tracks, each with its own use case:
- Real-time streaming at cursor — speak and see text appear as you go, for chat, IDEs, and quick input. Covered by VoiceType (hybrid local + cloud) and Parakeet Type Ubuntu (local-only proof of concept).
- Long-form note dictation — speak a full note, get back polished, formatted text in one pass. Covered by AI Typer V2.
- Android voice-to-text reformatter — hold-to-talk, single-pass transcription + reformatting into a chosen preset (email, prompt, to-do, Hebrew). Covered by Voxcast.
VoiceType
The flexible hybrid track — aiming to blend local and cloud STT so the user picks the tradeoff per session (latency, cost, privacy). Currently cloud-only via Deepgram Nova-3 streaming with keyterm prompting; local inference is planned. Python + PyQt6, single-process, no root (evdev uinput via the input group). System tray, hotkeys, push-to-talk, VAD, and an in-app cost tracking dialog. Ships as a .deb package.
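As a rough illustration of what the cost tracking dialog has to compute, here is a minimal sketch that accumulates streamed audio time and converts it to a running cost. The `CostTracker` class and its field names are hypothetical and not taken from VoiceType. The per-minute rate is deliberately left without a default, since the provider's price is not stated here and changes over time.

```python
from dataclasses import dataclass


@dataclass
class CostTracker:
    """Hypothetical running-cost tracker for a streaming STT session."""

    usd_per_minute: float          # caller supplies the provider's current rate
    seconds_streamed: float = 0.0  # total audio sent to the cloud so far

    def add_session(self, seconds: float) -> None:
        """Record one push-to-talk or streaming session."""
        if seconds < 0:
            raise ValueError("seconds must be non-negative")
        self.seconds_streamed += seconds

    @property
    def total_usd(self) -> float:
        """Cost of all audio streamed so far at the configured rate."""
        return self.seconds_streamed / 60.0 * self.usd_per_minute
```

Keeping the rate as a required argument makes it straightforward to re-run the same usage figures against a different pricing snapshot.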
Parakeet Type Ubuntu
The local-only track: a focused proof of concept for running NVIDIA Parakeet / NeMo ASR models on an AMD CPU via sherpa-onnx, with no cloud and no GPU required. Built-in punctuation, multiple model profiles, system tray, configurable hotkeys.
AI Typer V2
The long-form dictation track — single-pass multimodal audio understanding (Gemini via OpenRouter) where the model transcribes and formats in one call. Smart format detection (email / list / notes), VAD + AGC preprocessing, optional second-pass coherence review, custom dictionary with CSV import/export, streaming live-text preview, global F13–F24 hotkeys, append mode, and type-at-cursor that works in terminals as well as GUI apps.
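To make the custom dictionary idea concrete, the following sketch loads replacement pairs from CSV text and applies them to a transcript. The two-column `heard,replacement` layout and the function names are assumptions for illustration. AI Typer V2's actual CSV schema and matching rules are not described here.

```python
import csv
import io
import re


def load_dictionary(csv_text: str) -> dict[str, str]:
    """Parse 'heard,replacement' rows; this column layout is an assumption."""
    rows = csv.reader(io.StringIO(csv_text))
    return {
        row[0].strip(): row[1].strip()
        for row in rows
        if len(row) >= 2 and row[0].strip()
    }


def apply_dictionary(text: str, dictionary: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches of each dictionary key."""
    if not dictionary:
        return text
    # Longest keys first so multi-word entries win over shorter overlaps.
    keys = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in dictionary.items()}
    # A function replacement avoids backslash handling in replacement strings.
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)
```

A dictionary like this can run as a post-processing step after the model returns text, which keeps the corrections deterministic regardless of how the model spelled a term.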
Voxcast
The Android mobile track — a hold-to-talk voice-to-text app (Expo / React Native) that transcribes and reformats in a single OpenRouter call (Gemini 3.1 Flash Lite) into one of eight serious presets: business email, AI prompt, dev prompt, basic cleanup, to-do list, note to self, casual Hebrew, and Hebrew email. Email modes return separate subject + body for two-tap copy. One preset active at a time, no layering. Sibling project to Crazy-Keyboard but reframed as a productivity tool.
Earlier Iterations
Kept for reference — superseded by the active projects above.
Thought Pad
Two-stage process for creating notes from dictated speech — transcription via Whisper API followed by light text formatting. Exports to markdown. Predecessor to AI Typer V2.
Whisper Typer 0911
Early Whisper-based voice typing iteration.
Voice Keyboard
Early voice keyboard prototype.
Transcription Tools
Gemini Audio Transcriber
File-upload-based multimodal transcription tool using Gemini via OpenRouter.
Gemini Transcription Notepad
Gemini-powered transcription notepad with cleanup.
Gemini ASR Transcriber
Transcription notepad for Gemini ASR.
DVR Transcriber
Workflow workspace for importing recordings from a DVR and using AI for transcription.
Transcript Creator
Audio cleanup and transcription tool.
Local Multimodal Transcriber
Local transcription app with a multimodal audio design.
ASR Transcription Pipeline
ASR transcription pipeline.
Transcription MCPs
Gemini Transcription MCP
MCP server for Gemini multimodal audio transcription with built-in post-processing.
Cloud ASR MCP
MCP for using various cloud ASR models for speech-to-text and transcription.
Local AI Transcription MCP
MCP for local AI transcription.
Local Transcription MCP
WIP MCP for local STT with cleanup on AMD GPU machines.
OR Audio Transcription MCP
OpenRouter-based audio transcription MCP server.
Evaluations & Benchmarks
Whisper Fine Tune Accuracy Eval
Comparing Whisper fine-tunes versus stock Whisper on local inference.
Whisper WPM Background Noise Eval
Quick eval to answer: how much does speaking pace affect WER/accuracy in ASR?
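For readers unfamiliar with the two quantities being compared, this sketch computes word error rate (word-level edit distance divided by reference length) and speaking pace in words per minute. It is a generic illustration, not the eval's own scoring code. The lowercasing and whitespace tokenisation are simplifying assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Speaking pace of a clip, from its reference transcript and length."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(transcript.split()) / (duration_seconds / 60.0)
```

Plotting `word_error_rate` against `words_per_minute` per clip is one simple way to see whether accuracy degrades as pace increases.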
Transcription Cleanup Eval
Evaluating various cloud audio understanding models on the transcribe-and-cleanup workflow.
One Shot Transcription Microphone Eval
Test samples for various microphones with an STT accuracy evaluation.
Local ASR STT Benchmark
Quick evaluation to find the best STT model in Speech Note (Ubuntu) for local hardware.
Whisper WPM Test
Whisper words-per-minute testing.
Gemini 3.1 Lite Audio Understanding Eval
Evaluation of Gemini 3.1 Lite on audio understanding tasks.
Voice Cleanup Prompt Experiment
Testing system-prompt variations for raw audio transcript cleanup, and comparing multimodal ASR against the STT + LLM approach.
Whisper Fine-Tuning & Setup
Whisper Finetune V2
Whisper fine-tuning iteration.
Modal Whisper Finetune Script
Validated script for fine-tuning Whisper on Modal GPUs with a preformatted audio dataset.
Whisper Fine Tuning Data
Whisper fine-tuning dataset.
Whisper Fine Tune 171125
Whisper fine-tuning iteration.
Whisper Base FUTO
Whisper base model via FUTO.
Local STT Fine Tune Tests
Local STT fine-tuning tests.
Fine Tuned STT Formats
Fine-tuned STT data formats.
whisper-wayland-rocm
Whisper-Wayland with ROCm GPU acceleration — Docker setup for AMD GPUs.
whisper-cpp-rocm-setup
whisper.cpp ROCm setup scripts.
Whisper Local Notes
Notes on local Whisper usage.
ASR Training Data
ASR Training Data Collector
GUI for gathering ASR/STT training data into organised datasets, with audio capture, text capture, and JSONL metadata construction. Supports both LLM-generated and user-provided text.
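The JSONL metadata step can be pictured as one JSON object per recorded clip, written one per line. The sketch below is an assumption about what such a record might contain. The field names (`audio`, `text`, `source`, `duration_seconds`) and the `"llm"` / `"user"` labels are hypothetical, not the collector's actual schema.

```python
import json


def make_record(audio_path: str, text: str, source: str,
                duration_seconds: float) -> dict:
    """Build one metadata row pairing an audio file with its prompt text."""
    if source not in ("llm", "user"):
        raise ValueError("source must be 'llm' or 'user'")
    return {
        "audio": audio_path,
        "text": text,
        "source": source,  # where the prompt text came from
        "duration_seconds": round(duration_seconds, 2),
    }


def to_jsonl(records: list[dict]) -> str:
    """Serialise records as JSON Lines: one object per line."""
    # ensure_ascii=False keeps non-Latin text (e.g. Hebrew) human-readable.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
```

One object per line means a dataset can be appended to clip by clip without rewriting the whole file.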
ASR Training Data Collector GUI Template
GUI template for ASR training data collection.
ASR Training Data Chunker
Breaks up texts by approximate reading duration for ASR training.
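Chunking by approximate reading duration can be sketched as converting a target clip length into a word budget at an assumed speaking rate. The function names, the 20-second target, and the 150 words-per-minute default are illustrative assumptions, not values taken from the chunker itself. This version also splits on word boundaries only, ignoring sentence structure.

```python
def estimate_seconds(text: str, words_per_minute: float = 150.0) -> float:
    """Approximate time to read the text aloud at the assumed pace."""
    return len(text.split()) / words_per_minute * 60.0


def chunk_by_reading_duration(text: str, target_seconds: float = 20.0,
                              words_per_minute: float = 150.0) -> list[str]:
    """Split text into chunks that each take about target_seconds to read."""
    words = text.split()
    # Word budget per chunk at the assumed speaking rate (at least one word).
    words_per_chunk = max(1, round(target_seconds * words_per_minute / 60.0))
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]
```

A natural refinement would be to end each chunk at the nearest sentence boundary so that recorded clips do not cut off mid-sentence.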
Other Utilities
Voice Note Recorder Ubuntu
GUI for recording voice notes on Ubuntu/Linux.
Readiness Voice Agent
Voice agent implementation for readiness checklists.
Voice Note Classification Model
Model for classifying voice notes.
Voice Note Classifier Model
Voice note classifier model.
Voice Note Dataset
Frontend for an open-source voice note dataset, part of an annotation/classification project.
Voice Note Ragie Pipeline
Test pipeline: voice context data to Ragie.
Voice Prompt Cleanup Script
Audio processing cleanup script.
Transcription Macropad
Macropad configuration for transcription workflows.
Dictation Macropad
Plan/key allocation for a macropad optimised for heavy daily dictation workflows.
Voicepad
Planning notes for a macropad for STT users.
Voice Typer HW
Voice typer hardware notes.
Voice Headset Design
Voice headset design notes.
Dictation Microphones
Dictation microphone notes and comparisons.
speech-notes-with-text-fixes
Speech Note Linux app with text fixes — note taking, reading and translating with offline STT, TTS, and machine translation.
Hebrish Whisper Tester
Testing Whisper with Hebrew-English mixed speech.
Notes & Ideas
VoiceBox
Concept for a speech tech solution — specced out by Claude.
Linux Realtime Voice Typing
Planning and research for real-time voice typing on Linux (Deepgram, Gemini, Parakeet).
Live Typing UX Research
Claude-assisted technical research into live voice typing implementation approaches — streaming inference patterns, partial-result handling, turn detection, and UX tradeoffs for at-cursor dictation.
Linux Voice Typing App Notes
Planning notes for a Linux voice typing tool.
Speech To Text Chain Notes
Notes on STT processing chain for future voice projects.
Cloud STT Price Points
Point-in-time pricing snapshots for ASR services.
ASR And STT AI Notebook
Prompts and outputs on STT, ASR, and fine-tuning with Claude.
Linux Friendly Voice Tech
List of resources for voice technology with Linux support, encompassing STT, ASR, and dev frameworks.
Voice Control Linux
Claude-enhanced research for voice control platforms with Linux support.
voice-typing-collection
Collection of voice typing / STT GitHub repos for testing on Linux.
Awesome Whisper Apps
Useful speech-to-text tools that use Whisper under the hood (API/local).
Voiceflow Planner
Voiceflow planning notes.
STT TTS Train 1125
STT and TTS training notes.