Voice Apps Index

April 25, 2026

Index of voice typing, dictation, and speech-to-text applications and utilities.

Active Projects

Three parallel tracks, each with its own use case:

Real-time streaming at cursor — speak and see text appear as you go, for chat, IDEs, and quick input. Covered by VoiceType (hybrid local + cloud) and Parakeet Type Ubuntu (local-only proof of concept).
Long-form note dictation — speak a full note, get back polished, formatted text in one pass. Covered by AI Typer V2.
Android voice-to-text reformatter — hold-to-talk, single-pass transcription + reformatting into a chosen preset (email, prompt, to-do, Hebrew). Covered by Voxcast.

VoiceType

The flexible hybrid track — aiming to blend local and cloud STT so the user picks the tradeoff per session (latency, cost, privacy). Currently cloud-only via Deepgram Nova-3 streaming with keyterm prompting; local inference is planned. Python + PyQt6, single-process, no root (evdev uinput via the input group). System tray, hotkeys, push-to-talk, VAD, and an in-app cost tracking dialog. Ships as a .deb package.
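The cost tracking idea can be illustrated with a minimal sketch: add up the seconds of audio streamed in a session and multiply by a per-minute rate. This is not VoiceType's actual implementation; the class name, its fields, and the rate used in the usage note are hypothetical, and real pricing should come from the provider's current price list.

```python
from dataclasses import dataclass


@dataclass
class SessionCostTracker:
    """Hypothetical helper: accumulates streamed audio time and estimates spend.

    rate_per_minute is a placeholder supplied by the caller, not a real price.
    """

    rate_per_minute: float
    seconds_streamed: float = 0.0

    def add_audio(self, seconds: float) -> None:
        """Record another stretch of audio sent to the cloud STT service."""
        if seconds < 0:
            raise ValueError("seconds must be non-negative")
        self.seconds_streamed += seconds

    @property
    def estimated_cost(self) -> float:
        """Estimated spend for the session, in the same currency as the rate."""
        return self.seconds_streamed / 60.0 * self.rate_per_minute
```

With a made-up rate of 0.6 per minute, streaming 30 seconds and then 90 seconds gives two minutes of audio and an estimate of 1.2.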

Parakeet Type Ubuntu

The local-only track: a focused proof of concept for running NVIDIA Parakeet / NeMo ASR models with CPU-only inference on AMD hardware via sherpa-onnx, with no cloud and no GPU required. Built-in punctuation, multiple model profiles, system tray, configurable hotkeys.

AI Typer V2

The long-form dictation track — single-pass multimodal audio understanding (Gemini via OpenRouter) where the model transcribes and formats in one call. Smart format detection (email / list / notes), VAD + AGC preprocessing, optional second-pass coherence review, custom dictionary with CSV import/export, streaming live-text preview, global F13–F24 hotkeys, append mode, and type-at-cursor that works in terminals as well as GUI apps.
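As a rough illustration of how a custom dictionary with CSV import/export might work, the sketch below assumes a two-column file (`heard,replacement`) and applies whole-word, case-insensitive substitutions to a transcript. The column layout and the function names are assumptions made for this example, not AI Typer V2's actual format.

```python
import csv
import io
import re


def load_dictionary(csv_text: str) -> dict[str, str]:
    """Parse 'heard,replacement' rows; the two-column layout is an assumption."""
    reader = csv.reader(io.StringIO(csv_text))
    return {
        row[0].strip(): row[1].strip()
        for row in reader
        if len(row) >= 2 and row[0].strip()
    }


def dump_dictionary(entries: dict[str, str]) -> str:
    """Serialise the dictionary back to CSV text for export."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(entries.items())
    return buffer.getvalue()


def apply_dictionary(text: str, entries: dict[str, str]) -> str:
    """Replace each 'heard' phrase with its preferred spelling.

    Longer phrases are applied first so they are not broken up by shorter ones.
    """
    for heard, replacement in sorted(entries.items(), key=lambda kv: -len(kv[0])):
        pattern = re.compile(r"\b" + re.escape(heard) + r"\b", re.IGNORECASE)
        # A function replacement keeps backslashes in the replacement literal.
        text = pattern.sub(lambda _match, r=replacement: r, text)
    return text
```

For example, loading `open router,OpenRouter` and `deep gram,Deepgram` turns "I call Open Router and deep gram daily." into "I call OpenRouter and Deepgram daily."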

Voxcast

The Android mobile track — a hold-to-talk voice-to-text app (Expo / React Native) that transcribes and reformats in a single OpenRouter call (Gemini 3.1 Flash Lite) into one of eight serious presets: business email, AI prompt, dev prompt, basic cleanup, to-do list, note to self, casual Hebrew, and Hebrew email. Email modes return separate subject + body for two-tap copy. One preset active at a time, no layering. Sibling project to Crazy-Keyboard but reframed as a productivity tool.

Earlier Iterations

Kept for reference — superseded by the active projects above.

Thought Pad

Two-stage process for creating notes from dictated speech — transcription via Whisper API followed by light text formatting. Exports to markdown. Predecessor to AI Typer V2.

Whisper Typer 0911

Early Whisper-based voice typing iteration.

Voice Keyboard

Early voice keyboard prototype.

Transcription Tools

Gemini Audio Transcriber

File-upload-based multimodal transcription tool using Gemini via OpenRouter.

Gemini Transcription Notepad

Gemini-powered transcription notepad with cleanup.

Gemini ASR Transcriber

Transcription notepad for Gemini ASR.

DVR Transcriber

Workspace for importing recordings from a DVR and transcribing them with AI.

Transcript Creator

Audio cleanup and transcription tool.

Local Multimodal Transcriber

Local transcription app designed around audio-capable multimodal models.

ASR Transcription Pipeline

ASR transcription pipeline.

Transcription MCPs

Gemini Transcription MCP

MCP server for Gemini multimodal audio transcription with built-in post-processing.

Cloud ASR MCP

MCP for using various cloud ASR models for speech-to-text and transcription.

Local AI Transcription MCP

MCP for local AI transcription.

Local Transcription MCP

WIP MCP for local STT with cleanup on AMD GPU machines.

OR Audio Transcription MCP

OpenRouter-based audio transcription MCP server.

Evaluations & Benchmarks

Whisper Fine Tune Accuracy Eval

Comparing Whisper fine-tunes versus stock Whisper on local inference.

Whisper WPM Background Noise Eval

Quick eval to answer: how much does speaking pace affect WER/accuracy in ASR?
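For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a generic implementation for illustration only; it is not the scoring code used in this eval, and the lowercase-and-split normalisation is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

Comparing WER across recordings of the same script read at different paces is one way to put a number on the effect of speaking rate.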

Transcription Cleanup Eval

Evaluating various cloud audio understanding models on the transcribe-and-cleanup workflow.

One Shot Transcription Microphone Eval

Test samples for various microphones with an STT accuracy evaluation.

Local ASR STT Benchmark

Quick evaluation to find the best STT model in Speech Note (Ubuntu) for local hardware.

Whisper WPM Test

Whisper words-per-minute testing.

Gemini 3.1 Lite Audio Understanding Eval

Evaluation of Gemini 3.1 Lite on audio understanding tasks.

Voice Cleanup Prompt Experiment

Testing various permutations in system prompting for raw audio transcript cleanup and comparing multimodal ASR vs. the STT + LLM approach.

Whisper Fine-Tuning & Setup

Whisper Finetune V2

Whisper fine-tuning iteration.

Validated script for fine-tuning Whisper on Modal GPUs with a preformatted audio dataset.

Whisper Fine Tuning Data

Whisper fine-tuning dataset.

Whisper Fine Tune 171125

Whisper fine-tuning iteration.

Whisper Base FUTO

Whisper base model via FUTO.

Local STT Fine Tune Tests

Local STT fine-tuning tests.

Fine Tuned STT Formats

Fine-tuned STT data formats.

whisper-wayland-rocm

Whisper-Wayland with ROCm GPU acceleration — Docker setup for AMD GPUs.

whisper-cpp-rocm-setup

whisper.cpp ROCm setup scripts.

Whisper Local Notes

Notes on local Whisper usage.

ASR Training Data

ASR Training Data Collector

GUI for gathering training data for ASR/STT apps into organised datasets, with audio capture, text capture, and JSONL metadata construction. Supports both LLM-generated and user-provided text.
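JSONL metadata of this kind is typically one JSON object per line, pairing an audio file with its text. The sketch below shows the general shape; the field names (`audio`, `text`, `text_source`) and the function name are hypothetical and do not describe the collector's actual schema.

```python
import json


def build_metadata_line(audio_file: str, text: str, text_source: str) -> str:
    """Return one JSONL line describing a recorded sample.

    text_source records whether the prompt text was LLM-generated or
    user-provided; the field names are illustrative assumptions.
    """
    if text_source not in {"llm", "user"}:
        raise ValueError("text_source must be 'llm' or 'user'")
    record = {"audio": audio_file, "text": text, "text_source": text_source}
    # ensure_ascii=False keeps non-Latin text (for example Hebrew) readable.
    return json.dumps(record, ensure_ascii=False)
```

Each returned line can be appended to a `metadata.jsonl` file (a hypothetical filename) followed by a newline.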

ASR Training Data Collector GUI Template

GUI template for ASR training data collection.

ASR Training Data Chunker

Breaks up texts by approximate reading duration for ASR training.
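One plausible way to chunk by approximate reading duration is to convert a target duration into a word budget using an assumed speaking rate, then pack whole sentences into chunks until the budget is reached. The sketch below follows that approach; the 150 words-per-minute default, the sentence-splitting rule, and the function name are assumptions, not the tool's actual logic.

```python
import re


def chunk_by_reading_time(
    text: str,
    target_seconds: float = 20.0,
    words_per_minute: float = 150.0,
) -> list[str]:
    """Group sentences into chunks of roughly target_seconds of read speech."""
    if target_seconds <= 0 or words_per_minute <= 0:
        raise ValueError("target_seconds and words_per_minute must be positive")

    # Word budget per chunk at the assumed speaking rate.
    max_words = max(1, int(target_seconds * words_per_minute / 60.0))
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for sentence in sentences:
        word_count = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_words + word_count > max_words:
            chunks.append(" ".join(current))
            current, current_words = [], 0
        current.append(sentence)
        current_words += word_count
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A sentence longer than the word budget is kept whole as its own chunk, so chunk durations are approximate rather than capped.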

Other Utilities

Voice Note Recorder Ubuntu

GUI for recording voice notes on Ubuntu/Linux.

Readiness Voice Agent

Voice agent implementation for readiness checklists.

Voice Note Classification Model

Model for classifying voice notes.

Voice Note Classifier Model

Voice note classifier model.

Voice Note Dataset

Frontend for an open-source voice note dataset, built for an annotation/classification project.

Voice Note Ragie Pipeline

Test pipeline: voice context data to Ragie.

Voice Prompt Cleanup Script

Audio processing cleanup script.

Transcription Macropad

Macropad configuration for transcription workflows.

Dictation Macropad

Plan/key allocation for a macropad optimised for heavy daily dictation workflows.

Voicepad

Planning notes for a macropad for STT users.

Voice Typer HW

Voice typer hardware notes.

Voice Headset Design

Voice headset design notes.

Dictation Microphones

Dictation microphone notes and comparisons.

speech-notes-with-text-fixes

Speech Note Linux app with text fixes — note taking, reading and translating with offline STT, TTS, and machine translation.

Hebrish Whisper Tester

Testing Whisper with Hebrew-English mixed speech.

Notes & Ideas

VoiceBox

Concept for a speech tech solution — specced out by Claude.

Linux Realtime Voice Typing

Planning and research for real-time voice typing on Linux (Deepgram, Gemini, Parakeet).

Live Typing UX Research

Claude-assisted technical research into live voice typing implementation approaches — streaming inference patterns, partial-result handling, turn detection, and UX tradeoffs for at-cursor dictation.

Linux Voice Typing App Notes

Planning notes for a Linux voice typing tool.

Speech To Text Chain Notes

Notes on STT processing chain for future voice projects.

Cloud STT Price Points

Point-in-time pricing snapshots for ASR services.

ASR And STT AI Notebook

Prompts and outputs on STT, ASR, and fine-tuning with Claude.

Linux Friendly Voice Tech

List of resources for voice technology with Linux support, encompassing STT, ASR, and dev frameworks.

Voice Control Linux

Claude-enhanced research for voice control platforms with Linux support.

voice-typing-collection

Collection of voice typing / STT GitHub repos for testing on Linux.

Awesome Whisper Apps

Useful speech-to-text tools that use Whisper under the hood (API/local).

Voiceflow Planner

Voiceflow planning notes.

STT TTS Train 1125

STT and TTS training notes.