Gruut IPA

November 10, 2021

Library for manipulating International Phonetic Alphabet (IPA) pronunciations.

Features include:

  • Getting the category and details of a phone, e.g. "open front rounded vowel" for ɶ
  • Splitting IPA pronunciations into groups of:
    • Phones (/ˈt͡ʃuːz/ to ˈt͡ʃ uː z )
    • Phonemes (/kˈaʊ/ to k ˈaʊ for U.S. English)
  • Converting pronunciations between IPA, espeak, and sampa

Supported Languages:

  • Arabic (ar)
  • Czech (cs-cz)
  • German (de-de)
  • U.S. English (en-us)
  • U.K. English (en-gb)
  • Spanish (es-es)
  • Persian/Farsi (fa)
  • Italian (it-it)
  • Luxembourgish (lb-lb)
  • Dutch (nl)
  • Portuguese (pt)
  • Russian (ru-ru)
  • Swahili (sw)

Installing

$ pip install gruut-ipa

Dependencies

  • Python 3.6 or higher

For command-line usage, you may also want espeak and jq. Install these with:

$ sudo apt-get install espeak jq

Phones and Phonemes

IPA phones

Phones in IPA are composed of different components:

  • Letters
    • Non-combining Unicode characters that represent a distinct human sound (phone)
  • Suprasegmentals
    • Non-combining Unicode characters that represent language features above individual vowels or consonants
    • Stress (ˈˌ), elongation (ː), linking/ties (t͡s), and short/long breaks (| ‖) are suprasegmentals
  • Diacritics
    • Combining Unicode characters that modify a letter, such as the tilde that marks nasalization in ã

See IPA Chart for more details.
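
The component categories above can be illustrated with Python's standard unicodedata module. This is a minimal sketch, not gruut-ipa's implementation: the classify helper and its symbol sets are hypothetical and cover only the symbols named in this section, with every other combining character treated as a diacritic.

```python
import unicodedata

# Symbol sets taken from the list above; anything else is an assumption.
STRESS = {"ˈ": "primary stress", "ˌ": "secondary stress"}
LENGTH = {"ː"}
BREAKS = {"|": "short break", "‖": "long break"}
TIE = "\u0361"  # combining tie bar, as in t͡s


def classify(text):
    """Return (character, kind) pairs for each code point in an IPA string."""
    result = []
    # NFD splits precomposed characters such as ã into a + combining tilde.
    for ch in unicodedata.normalize("NFD", text):
        if ch.isspace():
            continue
        if ch in STRESS:
            kind = STRESS[ch]
        elif ch in LENGTH:
            kind = "elongation"
        elif ch in BREAKS:
            kind = BREAKS[ch]
        elif ch == TIE:
            kind = "tie"
        elif unicodedata.combining(ch):
            kind = "diacritic"
        else:
            kind = "letter"
        result.append((ch, kind))
    return result


for ch, kind in classify("ˈãː"):
    print(f"U+{ord(ch):04X} {kind}")
```

For "ˈãː" this reports a primary stress mark, the letter a, one diacritic (the combining tilde), and an elongation mark.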

Phonemes

While phones represent individual sounds, phonemes are the phonetic units of a language that meaningfully distinguish words. A phoneme may be realized by many different phones. For example, the /r/ in Standard German can be realized as a uvular fricative (χ/ʁ), an alveolar approximant (ɹ), a uvular trill (ʀ), or an alveolar trill (r).

A phoneme may also be composed of multiple phones, such as the diphthong aʊ in U.S. English (the "ow" in "cow").
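
One way to picture how several phones collapse into a single phoneme is a greedy longest-match over a phoneme inventory. The sketch below is an illustration only and is not claimed to be the algorithm gruut-ipa uses; group_phonemes and the tiny inventory are hypothetical.

```python
def group_phonemes(text, inventory):
    """Greedily split `text` into the longest units found in `inventory`."""
    longest = max(len(p) for p in inventory)
    units = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "aʊ" wins over "a" + "ʊ".
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in inventory:
                units.append(candidate)
                i += size
                break
        else:
            # Unknown symbol: keep it as its own unit.
            units.append(text[i])
            i += 1
    return units


# Hypothetical inventory, just large enough for the "cow" example.
inventory = {"k", "a", "ʊ", "aʊ"}
print(" ".join(group_phonemes("kaʊ", inventory)))  # k aʊ
```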

Supported languages in gruut-ipa contain a phonemes.txt file in the gruut_ipa/data directory. This file has the following format:

<phoneme> <example> [<replace> ...]

where <phoneme> is a set of IPA letters, such as ɶ. The <example> is a word whose pronunciation contains the <phoneme>. After that come zero or more optional <replace> strings, each of which will be replaced with <phoneme>. The German /r/ example from above might be represented as:

r brot χ ʁ ɹ ʀ
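
A line in this format can be read with a few lines of Python. The following is a minimal sketch that assumes whitespace-separated fields exactly as described above; PhonemeEntry, parse_line, replacement_table, and apply_replacements are hypothetical names, not part of the gruut-ipa API.

```python
from typing import Dict, Iterable, List, NamedTuple


class PhonemeEntry(NamedTuple):
    phoneme: str
    example: str
    replacements: List[str]


def parse_line(line: str) -> PhonemeEntry:
    """Split '<phoneme> <example> [<replace> ...]' into its fields."""
    # Raises ValueError if the line has fewer than two fields.
    phoneme, example, *replacements = line.split()
    return PhonemeEntry(phoneme, example, replacements)


def replacement_table(lines: Iterable[str]) -> Dict[str, str]:
    """Map every <replace> string to the phoneme that replaces it."""
    table = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = parse_line(line)
        for source in entry.replacements:
            table[source] = entry.phoneme
    return table


def apply_replacements(phones: List[str], table: Dict[str, str]) -> List[str]:
    """Replace each phone that has an entry in the table."""
    return [table.get(phone, phone) for phone in phones]


table = replacement_table(["r brot χ ʁ ɹ ʀ"])
print(apply_replacements(["b", "ʁ", "oː", "t"], table))  # ['b', 'r', 'oː', 't']
```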

Phonemes for a given language come from phonological analyses and from public databases. Ultimately, they are geared towards capturing pronunciations from Wiktionary.

Usage

Print JSON information about phones:

$ python3 -m gruut_ipa describe "ˈãː" | jq .
{
  "text": "ˈãː",
  "letters": "a",
  "stress": "primary",
  "height": "open",
  "placement": "front",
  "rounded": false,
  "type": "Vowel",
  "nasalated": true,
  "elongated": true
}
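
The JSON fields line up with the phone's conventional name. As a rough sketch that assumes only the field names shown in the output above, a vowel's description can be assembled from height, placement, and rounded; describe_vowel is a hypothetical helper, not a gruut-ipa function.

```python
import json


def describe_vowel(info):
    """Build a conventional vowel name from the JSON fields of a phone."""
    if info.get("type") != "Vowel":
        raise ValueError("this sketch only handles vowels")
    rounding = "rounded" if info["rounded"] else "unrounded"
    return f"{info['height']} {info['placement']} {rounding} vowel"


phone = json.loads(
    '{"height": "open", "placement": "front", "rounded": false, "type": "Vowel"}'
)
print(describe_vowel(phone))  # open front unrounded vowel
```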

Split an IPA pronunciation into phones:

$ python3 -m gruut_ipa phones "ˈjɛs|ˈt͡ʃuːz aɪpiːeɪ‖"
ˈj ɛ s | ˈt͡ʃ uː z a ɪ p iː e ɪ ‖

Group phones into phonemes for a specific language:

$ python3 -m gruut_ipa phonemes en-us "/dʒʌst ə kaʊ/"
d͡ʒ ʌ s t ə k aʊ

Convert between IPA, espeak, and sampa:

$ python3 -m gruut_ipa convert ipa espeak "mʊmˈbaɪ"
[[mUm'baI]]

$ python3 -m gruut_ipa convert espeak ipa "[[D,Is Iz sVm f@n'EtIk t'Ekst 'InpUt]]"
ðˌɪs ɪz sʌm fɘnˈɛtɪk tˈɛkst ˈɪnpʊt
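
Conceptually, the conversion is a symbol-by-symbol lookup. The sketch below reproduces only the first example with a hand-written table; the entries are read off the examples and the phone table in this README, and ipa_to_espeak is a hypothetical helper rather than the library's converter, which covers far more symbols.

```python
# Tiny lookup table: just enough for "mʊmˈbaɪ" plus the two stress marks.
IPA_TO_ESPEAK = {
    "m": "m",
    "b": "b",
    "a": "a",
    "ʊ": "U",
    "ɪ": "I",
    "ˈ": "'",  # primary stress
    "ˌ": ",",  # secondary stress
}


def ipa_to_espeak(ipa):
    """Convert an IPA string one character at a time (KeyError if unknown)."""
    converted = "".join(IPA_TO_ESPEAK[ch] for ch in ipa)
    return f"[[{converted}]]"


print(ipa_to_espeak("mʊmˈbaɪ"))  # [[mUm'baI]]
```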

Chain commands together:

$ python3 -m gruut_ipa convert espeak ipa "[[k'aU]]" | \
    python3 -m gruut_ipa phonemes en-us --keep-stress
k ˈaʊ

Alternative Phoneme Sets

Some languages have multiple phoneme sets available:

  • U.S. English (en-us)
    • CMUDict (en-us/cmudict)
    • Zamia (en-us/zamia)
  • Swahili (sw)

Convert from IPA to alternative phoneme set:

$ python3 -m gruut_ipa convert ipa en-us/cmudict "h ɛ l ˈoʊ w ˈɚ l d"
HH EH0 L OW1 W ER1 L D

Convert from alternative phoneme set to IPA:

$ python3 -m gruut_ipa convert en-us/cmudict ipa "HH EH0 L OW1 W ER1 L D"
h ɛ l ˈoʊ w ˈɚ l d
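
The CMUDict notation differs from IPA mainly in where stress lives: it is a digit appended to each vowel instead of a mark placed before the phoneme. The following sketch shows that idea for the "hello world" example only. The mapping table is hand-written for this one example, the use of 2 for secondary stress follows the usual CMUDict convention and is an assumption here, and ipa_to_cmudict is a hypothetical helper.

```python
# Hand-written table covering only the phonemes in the example above.
IPA_TO_ARPABET = {
    "h": "HH", "ɛ": "EH", "l": "L", "oʊ": "OW",
    "w": "W", "ɚ": "ER", "d": "D",
}
VOWELS = {"EH", "OW", "ER"}
STRESS_DIGITS = {"ˈ": "1", "ˌ": "2"}  # no mark means 0


def ipa_to_cmudict(phonemes):
    """Convert space-separated IPA phonemes to CMUDict-style symbols."""
    symbols = []
    for phoneme in phonemes.split():
        stress = "0"
        if phoneme[0] in STRESS_DIGITS:
            stress = STRESS_DIGITS[phoneme[0]]
            phoneme = phoneme[1:]
        symbol = IPA_TO_ARPABET[phoneme]
        # Only vowels carry a stress digit.
        symbols.append(symbol + stress if symbol in VOWELS else symbol)
    return " ".join(symbols)


print(ipa_to_cmudict("h ɛ l ˈoʊ w ˈɚ l d"))  # HH EH0 L OW1 W ER1 L D
```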

Scripts

Use the speak-ipa script to have espeak pronounce IPA. You may need to apt-get install espeak first.

$ echo '/hɛloʊ wɝld/' | bin/speak-ipa en-us -s 60 -w 'hello world.wav'
$ aplay 'hello world.wav'

Phones

Supported IPA phones can be printed with:

$ python3 -m gruut_ipa print
{"text": "i", "letters": "i", "stress": "none", "height": "close", "placement": "front", "rounded": false, "type": "Vowel", "nasalated": false, "elongated": false, "description": "close front unrounded vowel", "espeak": "i", "sampa": "i"}
{"text": "y", "letters": "y", "stress": "none", "height": "close", "placement": "front", "rounded": true, "type": "Vowel", "nasalated": false, "elongated": false, "description": "close front rounded vowel", "espeak": "y", "sampa": "y"}
...

A nice table can be generated with jq:

$ python3 -m gruut_ipa print | \
    jq -r '. | "\(.text)\t\(.espeak)\t\(.sampa)\t\(.description)"'
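
If jq is not available, the same JSON lines can be turned into a Markdown table with a short Python script. This is a sketch that assumes one JSON object per line with the text, espeak, sampa, and description fields shown above; markdown_table is a hypothetical helper, and missing or null fields are rendered as empty cells.

```python
import json


def markdown_table(json_lines):
    """Build a Markdown table from JSON lines describing phones."""
    rows = [
        "| IPA | eSpeak | Sampa | Description |",
        "| --- | --- | --- | --- |",
    ]
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        phone = json.loads(line)
        cells = [
            str(phone.get(key) or "")
            for key in ("text", "espeak", "sampa", "description")
        ]
        rows.append("| " + " | ".join(cells) + " |")
    return "\n".join(rows)
```

Passing sys.stdin to markdown_table and printing the result lets the script sit at the end of the pipeline in place of jq. Some sampa values contain backslashes and backticks, which a Markdown renderer may interpret, so escaping them may be needed before publishing the table.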

Converted to Markdown:

| IPA | eSpeak | Sampa | Description |
| --- | --- | --- | --- |
| i | i | i | close front unrounded vowel |
| y | y | y | close front rounded vowel |
| ɨ | i" | 1 | close central unrounded vowel |
| ʉ | u" | } | close central rounded vowel |
| ɯ | u- | M | close back unrounded vowel |
| u | u | u | close back rounded vowel |
| ɪ | I | I | near-close near-front unrounded vowel |
| ʏ | I. | Y | near-close near-front rounded vowel |
| ʊ | U | U | near-close near-back rounded vowel |
| e | e | e | close-mid front unrounded vowel |
| ø | Y | 2 | close-mid front rounded vowel |
| ɘ | @ | @\ | close-mid central unrounded vowel |
| ɵ | @. | 8 | close-mid central rounded vowel |
| ɤ | o- | 7 | close-mid back unrounded vowel |
| o | o | o | close-mid back rounded vowel |
| ɛ | E | E | open-mid front unrounded vowel |
| œ | W | 9 | open-mid front rounded vowel |
| ɜ | V" | 3 | open-mid central unrounded vowel |
| ɞ | O" | 3\ | open-mid central rounded vowel |
| ʌ | V | V | open-mid back unrounded vowel |
| ɔ | O | O | open-mid back rounded vowel |
| æ | a | { | near-open front unrounded vowel |
| ɐ | V | 6 | near-open central unrounded vowel |
| a | a | a | open front unrounded vowel |
| ɶ | W | & | open front rounded vowel |
| ɑ | A | A | open back unrounded vowel |
| ɒ | A. | Q | open back rounded vowel |
| m | m | m | voiced bilabial nasal |
| ɱ | M | F | voiced labio-dental nasal |
| n | n | n | voiced alveolar nasal |
| ɳ | n. | n` | voiced retroflex nasal |
| ŋ | N | N | voiced velar nasal |
| ɴ | n" | N\ | voiced uvular nasal |
| p | p | p | voiceless bilabial plosive |
| b | b | b | voiced bilabial plosive |
| t | t | t | voiceless alveolar plosive |
| d | d | d | voiced alveolar plosive |
| ʈ | t. | t` | voiceless retroflex plosive |
| ɖ | d. | d` | voiced retroflex plosive |
| c | c | c | voiceless palatal plosive |
| ɟ | J | J\ | voiced palatal plosive |
| k | k | k | voiceless velar plosive |
| ɡ | g | g | voiced velar plosive |
| g | g | g | voiced velar plosive |
| q | q | q | voiceless uvular plosive |
| ɢ | G | G\ | voiced uvular plosive |
| ʡ |  | >\ | voiceless pharyngeal plosive |
| ʔ | ? | ? | voiceless glottal plosive |
| p͡f | pf | pf | voiceless labio-dental affricate |
| b͡v | bv | bv | voiced dental affricate |
| t̪͡s | ts | t_ds | voiceless dental affricate |
| t͡s | ts | ts | voiceless alveolar affricate |
| d͡z | dz | dz | voiced alveolar affricate |
| t͡ʃ | tS | tS | voiceless post-alveolar affricate |
| d͡ʒ | dZ | dZ | voiced post-alveolar affricate |
| ʈ͡ʂ | tS | ts` | voiceless retroflex affricate |
| ɖ͡ʐ | dz | dz` | voiced retroflex affricate |
| t͡ɕ | tS; | ts\ | voiceless palatal affricate |
| d͡ʑ | dZ; | dz\ | voiced palatal affricate |
| k͡x | k | k_x | voiceless velar affricate |
| ɸ | F | p\ | voiceless bilabial fricative |
| β | B | B | voiced bilabial fricative |
| f | f | f | voiceless labio-dental fricative |
| v | v | v | voiced labio-dental fricative |
| θ | T | T | voiceless dental fricative |
| ð | D | D | voiced dental fricative |
| s | s | s | voiceless alveolar fricative |
| z | z | z | voiced alveolar fricative |
| ʃ | S | S | voiceless post-alveolar fricative |
| ʒ | Z | Z | voiced post-alveolar fricative |
| ʂ | s. | s` | voiceless retroflex fricative |
| ʐ | z. | z` | voiced palatal fricative |
| ç | C | C | voiceless palatal fricative |
| x | x | x | voiceless velar fricative |
| ɣ | Q | G | voiced velar fricative |
| χ | X | X | voiceless uvular fricative |
| ʁ | g" | R | voiced uvular fricative |
| ħ | H | X\ | voiceless pharyngeal fricative |
| h | h | h | voiceless glottal fricative |
| ɦ | h<?> | h\ | voiced glottal fricative |
| w | w | w | voiced bilabial approximant |
| ʋ | v# | v\ | voiced labio-dental approximant |
| ɹ | r | r\ | voiced alveolar approximant |
| ɻ | r. | r\` | voiced retroflex approximant |
| j | j | j | voiced palatal approximant |
| ɰ | Q | M\ | voiced velar approximant |
|  |  |  | voiced labio-dental flap |
| ɾ | * | 4 | voiced alveolar flap |
| ɽ | *. | r` | voiced retroflex flap |
| ʙ | b | B\ | voiced bilabial trill |
| r | r | r | voiced alveolar trill |
| ʀ | r" | R\ | voiced uvular trill |
| l | l | l | voiced alveolar lateral-approximant |
| ɫ | l | 5 | voiced alveolar lateral-approximant |
| ɭ | l. | l` | voiced retroflex lateral-approximant |
| ʎ | l^ | L | voiced palatal lateral-approximant |
| ʟ | L | L\ | voiced velar lateral-approximant |
| ə | @ | @ | schwa |
| ɚ | 3 | @` | r-coloured schwa |
| ɝ | 3 | @` | r-coloured schwa |
| ɹ̩ | r- | r\̩ | voiced alveolar approximant |

If you see anything wrong or missing, please let me know.