VoiceBox
March 26, 2026 · View on GitHub
Status: Work in Progress — concept and research phase
Portable hardware speech-to-text device that emulates a USB HID keyboard. Plug in, talk, and text appears — no drivers, no host software, any OS.

Concept
VoiceBox is a compact, self-contained device that captures speech via an integrated microphone, runs STT inference on an embedded NPU, and emits the recognized text as standard USB HID keystrokes. It works on any device that accepts a USB keyboard — Windows, macOS, Linux, ChromeOS, Android, iOS, game consoles, smart TVs, kiosks — with zero setup.
Users carry their preferred STT model, custom word lists, and configuration physically between machines.
Full Specification | Original Idea
Research
| Document | Summary |
|---|---|
| Feasibility Analysis | Technical viability of on-device STT, USB HID emulation, power, and thermal |
| Market Analysis | Existing products, competitive landscape, and VoiceBox's unique niche |
| Components and BOM | Suggested hardware, bill of materials, and cost estimates |
Key Findings
- No existing product combines on-device STT with USB HID keyboard output
- RK3588 NPU can run SenseVoiceSmall at 20x real-time; whisper.cpp tiny at ~3x real-time on CPU
- USB HID gadget mode is mature on Linux SBCs (Raspberry Pi and RK3588)
- Prototype BOM estimated at ~$120-170; production target ~$70-100
- Power draw of 8-12W requires USB-C PD (not bus power alone)
Hardware Target
- SoC: RK3588S (Orange Pi 5) — 6 TOPS NPU, quad A76 + quad A55
- Form factor: Desktop puck/wedge, ~100mm x 80mm x 30mm
- Connectivity: USB-C (HID keyboard) + Bluetooth LE
- Microphone: Gooseneck directional mic + onboard MEMS fallback
- Storage: 32-128GB (models, word lists, firmware)