README.md
May 26, 2026
My Translator is a real-time speech translation desktop app built with Tauri. It captures audio directly from your system or microphone, transcribes it, and displays translations in a minimal overlay, with no intermediary server involved.
Installation guides: macOS (EN) · macOS (VI) · Windows (EN) · Windows (VI)
How It Works
                                  ┌── Soniox (text) ──────────────────────────┐
System Audio / Mic → 16kHz PCM ───┼── OpenAI Realtime (text + voice) ─────────┼── Overlay UI
                                  ├── Qwen LiveTranslate Flash (text only) ───┤
                                  └── Local MLX (text, offline) ──────────────┘
                                                      ↓ (optional, text engines)
                                        TTS (Edge / Google / ElevenLabs) → audio
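The "16kHz PCM" step in the diagram can be illustrated with a minimal sketch. It assumes the capture layer delivers interleaved float samples in the range -1.0 to 1.0. The function name `to_pcm16_mono` and the linear-interpolation resampler are illustrative assumptions, not the app's actual (Rust) implementation. A production resampler would also low-pass filter before downsampling.

```python
import struct

TARGET_RATE = 16_000  # sample rate named in the pipeline diagram


def to_pcm16_mono(samples, source_rate, channels=1):
    """Downmix interleaved float samples to mono, resample to 16 kHz by
    linear interpolation, and pack as little-endian 16-bit PCM bytes.

    Hypothetical helper for illustration only.
    """
    # Downmix: average the channels of each frame.
    frames = len(samples) // channels
    mono = [
        sum(samples[i * channels:(i + 1) * channels]) / channels
        for i in range(frames)
    ]
    if not mono:
        return b""

    # Resample: map each output index onto the input timeline and interpolate.
    out_len = max(1, int(len(mono) * TARGET_RATE / source_rate))
    step = source_rate / TARGET_RATE
    last = len(mono) - 1
    out = []
    for n in range(out_len):
        pos = n * step
        i = int(pos)
        frac = pos - i
        a = mono[min(i, last)]
        b = mono[min(i + 1, last)]
        out.append(a + (b - a) * frac)

    # Quantize: clamp to [-1, 1] and scale to the int16 range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in out]
    return struct.pack("<%dh" % len(ints), *ints)
```

For example, 10 ms of 48 kHz stereo audio (480 frames) becomes 160 mono samples, or 320 bytes.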
Four translation engines, pick what fits your call:
| Feature | Detail |
|---|---|
| Engines | Soniox · OpenAI Realtime · Qwen LiveTranslate Flash · Local MLX |
| Latency | ~2 s (Soniox / OpenAI) · ~4 s (Qwen) · ~10 s (Local) |
| Languages | 70+ source → any target (Soniox), 13 targets (OpenAI), 60+ source+target (Qwen), JA/EN/ZH/KO → VI/EN (Local) |
| Cost | ~$0.12/hr (Soniox) · ~$4/hr (OpenAI, includes voice) · Free preview (Qwen, text-only) · Free (Local) |
| TTS | 3 providers for Soniox / Local (Edge free, Google, ElevenLabs); OpenAI streams its own voice (off by default); Qwen is text-only |
| Platform | macOS (ARM + Intel) · Windows · Local mode = Apple Silicon only |
| Signed | macOS signed & notarized |
| Auto-Update | Built-in; check and install from Settings |
Detailed head-to-head: OpenAI Realtime vs Soniox benchmark, comparing speed, quality, cost, and translation mechanism in a 5-minute real-world test.
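To get a feel for the cost row, here is a rough per-session estimate. The rates are the approximate figures from the table, Qwen is treated as free only while its preview tier lasts, and `session_cost` is a hypothetical helper, not part of the app.

```python
# Approximate hourly rates in USD, copied from the comparison table.
# Qwen is listed as a free preview, so its rate may change later.
HOURLY_RATE_USD = {"soniox": 0.12, "openai": 4.00, "qwen": 0.0, "local": 0.0}


def session_cost(engine, minutes):
    """Estimated cost in USD for a session of the given length."""
    return round(HOURLY_RATE_USD[engine] * minutes / 60, 2)
```

At these rates a 45-minute call costs roughly $0.09 with Soniox and $3.00 with OpenAI Realtime.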
Features
Dual Panel View
Two display modes:
- Single (default): Translation text only, clean and focused
- Dual: Source | Translation side-by-side, each panel scrolls independently
Toggle with the panel button (bottom-right on hover).
Smart Scroll
Auto-scroll only when you're at the bottom. Scroll up to read old content without being yanked back down.
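The smart-scroll rule boils down to one comparison. The sketch below uses parameter names that mirror the DOM properties `scrollTop`, `clientHeight`, and `scrollHeight`. The 4 px tolerance is an assumed value, and the app's real check may differ.

```python
def should_auto_scroll(scroll_top, client_height, scroll_height, threshold=4):
    """Return True when the viewport bottom is within `threshold` pixels of
    the content bottom, i.e. the reader has not scrolled up.

    `threshold` is a hypothetical tolerance for sub-pixel rounding.
    """
    return scroll_top + client_height >= scroll_height - threshold
```

Evaluating this before appending new text, and scrolling only when it returns True, leaves a reader who has scrolled up undisturbed.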
Quick Font Size
A- / A+ floating controls (bottom-right on hover). Font size is adjustable up to 140px, which is great for presentations.
Two-Way Translation
Translate conversations between two languages simultaneously, ideal for bilingual meetings.
- One-way: Source language → Target language (e.g., Japanese → Vietnamese)
- Two-way: Language A ↔ Language B (e.g., Vietnamese ↔ Japanese). The app detects who is speaking and translates to the other language automatically
Setup for video calls (Zoom, Google Meet, MS Teams):
- Audio Source: Both (System + Mic)
- Translation Type: Two-way
- Set Language A and Language B
Note: TTS narration is automatically disabled in two-way mode to prevent audio feedback loops (TTS output → mic recapture → re-translation).
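The two-way routing and the TTS rule can be sketched as follows. This assumes the engine reports a language code for each utterance. The function names and the `"one-way"` / `"two-way"` strings are illustrative, not the app's actual identifiers.

```python
def pick_target(detected, lang_a, lang_b):
    """In two-way mode, translate into whichever language was NOT spoken.

    Returns None when the detected language is neither A nor B.
    """
    if detected == lang_a:
        return lang_b
    if detected == lang_b:
        return lang_a
    return None


def tts_allowed(translation_type, tts_enabled):
    """TTS narration only runs in one-way mode, which avoids the
    TTS output -> mic recapture -> re-translation loop."""
    return tts_enabled and translation_type == "one-way"
```

With Language A set to `vi` and Language B set to `ja`, a Japanese utterance is routed to Vietnamese and a Vietnamese one to Japanese.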
TTS Narration
Read translations aloud in one-way mode, with 3 providers:
| | Edge TTS (default) | Google Chirp 3 HD | ElevenLabs |
|---|---|---|---|
| Cost | Free | Free 1M chars/mo | ~$5/mo+ |
| Quality | Neural | Near-human | Premium |
| Vietnamese | 2 voices | 6 voices | Yes |
| Setup | None | Google Cloud API key | API key |
| Speed control | | 0.5x–2.0x | |
TTS is OFF by default; toggle it with the TTS button or ⌘T.
TTS guide: English · Tiếng Việt
Custom Translation Terms
Define how domain-specific words should be translated:
Original sin = Tội nguyên tổ
Christ = Kitô
Pneumonia = Viêm phổi
Add terms in Settings → Translation → Translation terms. Great for religious, medical, or technical content.
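A term list in the `source = target` format shown above is straightforward to parse. The following is a minimal sketch of one way to do it. `parse_terms` is a hypothetical helper, and how the app actually stores the terms or passes them to an engine is not specified here.

```python
def parse_terms(text):
    """Parse 'source = target' lines into a dict, skipping blank or
    malformed lines. Only the first '=' on a line is the separator."""
    terms = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        source, target = line.split("=", 1)
        source, target = source.strip(), target.strip()
        if source and target:
            terms[source] = target
    return terms
```

Splitting on the first `=` only means a target phrase may itself contain an equals sign.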
OpenAI Realtime Mode
Single-call streaming translation via OpenAI's gpt-realtime-translate (May 2026 GA). It returns translated text and translated speech audio over one WebSocket, so there is no separate TTS step, end-to-end latency is lower, and output is more idiomatic. Trade-off: ~$4/hr, charged to your own OpenAI account. 13 target languages: en, es, pt, fr, de, it, ru, hi, id, vi, ja, ko, zh.
Two-way mode and the custom TTS toggle are unavailable while OpenAI Realtime is selected (audio is native).
Qwen LiveTranslate Flash Mode
Uses Alibaba DashScope's qwen3-livetranslate-flash-realtime, which streams translated text (no native voice) on Qwen's free preview tier, with a 60-language picker matching the mobile app. Server-side VAD handles turn detection, so it works with mic, system audio, or both. The display is translation-only (no source-transcript panel, because the model doesn't expose ASR). Get a key from Alibaba Cloud Bailian (Singapore region only; other regions hit a different endpoint and fail).
Source language must be picked explicitly: auto-detect is disabled on this engine because Live Flash stalls on real mic input when source is "auto". Two-way mode and the custom TTS toggle are also disabled while Qwen is selected.
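The per-engine restrictions described for OpenAI Realtime and Qwen can be summarized in a small sketch. The engine keys, the function name `effective_settings`, and the choice to raise an error on `"auto"` are assumptions for illustration. The app itself may simply grey out these options in the UI.

```python
# Engines for which two-way mode and the custom TTS toggle are unavailable.
NO_TWO_WAY_OR_CUSTOM_TTS = {"openai", "qwen"}


def effective_settings(engine, two_way, tts, source_lang):
    """Apply the documented restrictions to the requested settings."""
    # Qwen requires an explicit source language (no auto-detect).
    if engine == "qwen" and source_lang == "auto":
        raise ValueError("Qwen needs an explicit source language")
    if engine in NO_TWO_WAY_OR_CUSTOM_TTS:
        two_way = False
        tts = False
    # TTS narration is disabled in two-way mode to avoid feedback loops.
    if two_way:
        tts = False
    return {"two_way": two_way, "tts": tts}
```

For instance, requesting two-way plus TTS on Soniox keeps two-way and drops TTS, while the same request on OpenAI Realtime drops both.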
Local Mode (Apple Silicon only)
Experimental offline mode using MLX + Whisper + Gemma, running 100% on-device. JA/EN/ZH/KO → VI/EN.
Privacy
Your audio never touches our servers, because there are none.
- App connects directly to APIs you configure: no relay, no middleman
- You own your API keys: stored locally, never transmitted elsewhere
- No account, no telemetry, no analytics: zero tracking
- Transcripts saved as `.md` files locally, per session
Tech Stack
- Tauri 2: Rust backend + WebView frontend
- ScreenCaptureKit: macOS system audio
- WASAPI: Windows system audio
- cpal: Cross-platform microphone
- Soniox: Real-time STT + translation
- OpenAI Realtime Translate: `gpt-realtime-translate` (text + native voice)
- MLX: On-device Whisper + Gemma for offline mode
- Edge TTS: Free neural TTS (default)
- Google Cloud TTS: Chirp 3 HD (near-human quality)
- ElevenLabs: Premium TTS
Build from Source
git clone https://github.com/phuc-nt/my-translator.git
cd my-translator
npm install
npm run tauri build
Requires: Rust (stable), Node.js 18+, macOS 13+ or Windows 10+.
License
MIT