🇺🇦 Speech Recognition & Synthesis for Ukrainian

September 12, 2025

Overview

This repository collects links to models, datasets, and tools for Ukrainian Speech-to-Text and Text-to-Speech.

Speech-UK initiative

Our datasets, models, and leaderboards are on Hugging Face:

Community

Discord

🎤 Speech-to-Text

📦 Implementations

wav2vec2-bert

wav2vec2

Demos are available here: https://github.com/egorsmkv/wav2vec2-uk-demo

HuBERT

Citrinet

ContextNet

FastConformer

Squeezeformer

Conformer-CTC

Whisper

Quantized variants:

Lite Whisper:

OWSM, OWSM-CTC, and OWLS

Flashlight

MMS

data2vec

VOSK

Models: https://huggingface.co/Yehor/vosk-uk

DeepSpeech

M-CTC-T

moonshine-tiny-uk

📊 Benchmarks

This benchmark uses the Common Voice 10 test split.

  • WER: Word Error Rate
  • CER: Character Error Rate
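The two metrics above can be sketched in plain Python. This is a minimal illustration, not the scoring script behind the tables in this section: the function names (`edit_distance`, `wer`, `cer`) are hypothetical, and the sketch assumes whitespace tokenization, no text normalization, and that spaces count as characters for CER. Judging by the reported numbers, the "Accuracy (words)" column appears to be 100% minus WER.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (token missing from hyp)
                curr[j - 1] + 1,     # insertion (extra token in hyp)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # Assumption: spaces are counted as ordinary characters.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "місто в області"
    hyp = "місто області"
    print(f"WER: {wer(ref, hyp):.2%}")
    print(f"CER: {cer(ref, hyp):.2%}")
    print(f"Accuracy (words): {1 - wer(ref, hyp):.2%}")
```

For real evaluations, normalize both texts the same way (case, punctuation, apostrophes) before scoring, since the normalization rules change the resulting numbers.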

wav2vec2-bert

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Yehor/w2v-bert-uk (FP16) | 6.6% | 1.34% | 93.4% |
| Yehor/w2v-bert-uk-v2.1 (FP16) | 17.34% | 3.33% | 82.66% |

wav2vec2

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Yehor/w2v-xls-r-uk | 20.24% | 3.64% | 79.76% |
| robinhad/wav2vec2-xls-r-300m-uk | 27.36% | 5.37% | 72.64% |
| arampacha/wav2vec2-xls-r-1b-uk | 16.52% | 2.93% | 83.48% |

HuBERT

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Yehor/hubert-uk (FP16) | 37.07% | 6.87% | 62.93% |

Citrinet

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| nvidia/stt_uk_citrinet_1024_gamma_0_25 | 4.32% | 0.94% | 95.68% |
| neongeckocom/stt_uk_citrinet_512_gamma_0_25 | 7.46% | 1.6% | 92.54% |

ContextNet

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| theodotus/stt_uk_contextnet_512 | 6.69% | 1.45% | 93.31% |

FastConformer P&C

These models support punctuation and capitalization (P&C) in their output.

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| nvidia/stt_ua_fastconformer_hybrid_large_pc | 4.52% | 1% | 95.48% |
| theodotus/stt_ua_fastconformer_hybrid_large_pc | 4% | 1.02% | 96% |

Squeezeformer

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| theodotus/stt_uk_squeezeformer_ctc_xs | 10.78% | 2.29% | 89.22% |
| theodotus/stt_uk_squeezeformer_ctc_sm | 8.2% | 1.75% | 91.8% |
| theodotus/stt_uk_squeezeformer_ctc_ml | 5.91% | 1.26% | 94.09% |

Conformer-CTC

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| taras-sereda/uk-pods-conformer | 6.75% | 1.41% | 93.25% |

Whisper

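| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| tiny | 63.08% | 18.59% | 36.92% |
| base | 52.1% | 14.08% | 47.9% |
| small | 30.57% | 7.64% | 69.43% |
| medium | 18.73% | 4.4% | 81.27% |
| large (v1) | 16.42% | 3.93% | 83.58% |
| large (v2) | 13.72% | 3.18% | 86.28% |
| large (v3) | 20.53% | 5.28% | 79.47% |
| turbo | 22.83% | 7.05% | 77.17% |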

Quantized versions:

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Yehor/whisper-large-v2-quantized-uk | 14.95% | 4.23% | 85.05% |
| Yehor/whisper-large-v3-turbo-quantized-uk | 12.75% | 3.25% | 87.25% |
| efficient-speech/lite-whisper-large-v3-turbo | 42.89% | 12.59% | 57.11% |
| efficient-speech/lite-whisper-large-v3-turbo-acc | 17.79% | 4.34% | 82.21% |

To fine-tune a Whisper model on your own data, use this repository: https://github.com/egorsmkv/whisper-ukrainian

Flashlight

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Flashlight Conformer | 19.15% | 2.44% | 80.85% |

data2vec

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| robinhad/data2vec-large-uk | 31.17% | 7.31% | 68.83% |

VOSK

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| v3 | 53.25% | 38.78% | 46.75% |

M-CTC-T

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| speechbrain/m-ctc-t-large | 57% | 10.94% | 43% |

DeepSpeech

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| v0.5 | 70.25% | 20.09% | 29.75% |

moonshine-tiny-uk

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| UsefulSensors/moonshine-tiny-uk | 24.54% | 7.58% | 75.46% |

📖 Development

📚 Datasets

Compiled dataset: ~1200 hours

Voice of America: ~390 hours

FLEURS

Ukrainian broadcast: ~300 hours

YODAS2: ~400 hours

Ukrainian podcasts

Cleaned Common Voice 10 (test set)

Noised Common Voice 10

Other

Language models

Inverse Text Normalization

Text Enhancement

Aligners

Other

📢 Text-to-Speech

Test sentence with stresses:

К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.

Without stresses:

Кам'янець-Подільський - місто в Хмельницькій області України, центр Кам'янець-Подільської міської об'єднаної територіальної громади і Кам'янець-Подільського району.
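The relationship between the two test sentences can be sketched in a few lines of Python. This assumes, based only on the example above, that a `+` is placed immediately before the stressed vowel; the helper names (`strip_stress`, `stressed_vowels`) are hypothetical and do not belong to any of the TTS projects listed here.

```python
STRESS_MARK = "+"  # assumption: "+" precedes the stressed vowel, as in the test sentence


def strip_stress(text: str) -> str:
    """Remove stress marks to obtain the plain, unstressed text."""
    return text.replace(STRESS_MARK, "")


def stressed_vowels(text: str) -> list[str]:
    """Return the character that immediately follows each stress mark."""
    # Skip the last character so that text[i + 1] always exists.
    return [text[i + 1] for i, ch in enumerate(text[:-1]) if ch == STRESS_MARK]


if __name__ == "__main__":
    sample = "м+істо в Хмельн+ицькій +області Укра+їни"
    print(strip_stress(sample))
    print(stressed_vowels(sample))
```

A model that expects stressed input needs the marked form, while a model without stress support should receive the stripped form.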

📦 Implementations

StyleTTS2

P-Flow TTS

https://github.com/egorsmkv/speech-recognition-uk/assets/7875085/18cfc074-f8a1-4842-90b6-9503d0bb7250

RAD-TTS

https://user-images.githubusercontent.com/7875085/206881140-bf8c09e7-5553-43d9-8807-065c36b2904b.mp4

Coqui TTS

https://user-images.githubusercontent.com/5759207/167480982-275d8ca0-571f-4d21-b8d7-3776b3091956.mp4

Neon TTS

https://user-images.githubusercontent.com/96498856/170762023-d4b3f6d7-d756-4cb7-89de-dc50e9049b96.mp4

FastPitch

Balacoon TTS

https://github.com/clementruhm/speech-recognition-uk/assets/87281103/a13493ce-a5e5-4880-8b72-42b02feeee50

MMS

📚 Datasets

Accentors

Grapheme-to-Phoneme

ipa-uk:

Charsiu G2P:

Other:

Misc