README.md

November 13, 2016

CMUDict encoded in IPA

The file is tab-separated and can be found in cmudict.ipa.
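As a rough illustration of reading the file, here is a minimal sketch. It assumes each line holds a word, a tab, then its IPA pronunciation; the exact column layout is not documented here, and `read_entries` is a hypothetical helper, not part of this repository.

```python
def read_entries(lines):
    """Yield (word, ipa) pairs from tab-separated lines.

    Assumption: column 1 is the word and column 2 is the IPA string.
    Lines without a tab are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue
        word, ipa = line.split("\t", 1)
        yield word, ipa


if __name__ == "__main__":
    # Reads the real file if it is present in the working directory.
    with open("cmudict.ipa", encoding="utf-8") as handle:
        for word, ipa in read_entries(handle):
            print(word, ipa)
```

Any iterable of lines works, so the same function can be used on an open file or on a list of strings.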

Notes:

  • Parentheses deleted for words with multiple pronunciations
  • Stress markers (emphasis) deleted
  • Split into 80% train, 10% dev, and 10% test data sets in datasets/
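The preprocessing in the notes above could look roughly like the following sketch. It is not the script used to build this repository. It assumes CMUDict marks alternate pronunciations with a suffix such as `(1)` on the word, marks stress with digits on vowels (e.g. `IY1`), and that the 80% portion is the training set produced by a shuffled split. The names `normalize` and `split_dataset` and the `seed` parameter are hypothetical.

```python
import random
import re

VARIANT = re.compile(r"\(\d+\)$")  # trailing "(1)", "(2)", ... on a word
STRESS = re.compile(r"\d")         # stress digits attached to phonemes


def normalize(word, phones):
    """Drop the variant suffix from the word and stress digits from phonemes."""
    clean_word = VARIANT.sub("", word)
    clean_phones = [STRESS.sub("", p) for p in phones]
    return clean_word, clean_phones


def split_dataset(entries, seed=0):
    """Shuffle entries and return (train, dev, test) at roughly 80/10/10.

    Assumption: the split is random; the seed is only for repeatability.
    """
    items = list(entries)
    random.Random(seed).shuffle(items)
    tenth = len(items) // 10
    dev = items[:tenth]
    test = items[tenth:2 * tenth]
    train = items[2 * tenth:]
    return train, dev, test
```

With 100 entries this yields 80 training, 10 dev, and 10 test items; when the count is not divisible by ten, the remainder goes to the training set.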

The mappings in arpa-ipa.map can be modified. They are taken from Wikipedia: https://en.wikipedia.org/wiki/Arpabet
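To show how such a mapping might be applied, here is a small sketch. The layout of arpa-ipa.map is an assumption (one ARPABET symbol and one IPA symbol per line, separated by whitespace), so check the actual file before relying on it. `parse_map` and `to_ipa` are hypothetical helpers, and the three sample pairs follow the Wikipedia Arpabet table.

```python
def parse_map(lines):
    """Build an ARPABET -> IPA dict from lines like "AE æ".

    Assumption: two whitespace-separated fields per line; other lines are skipped.
    """
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        arpa, ipa = parts
        mapping[arpa] = ipa
    return mapping


def to_ipa(phones, mapping, sep=""):
    """Convert a list of stress-free ARPABET phonemes to an IPA string."""
    return sep.join(mapping[p] for p in phones)


if __name__ == "__main__":
    sample = ["K k", "AE æ", "T t"]  # illustrative subset, not the full map
    print(to_ipa(["K", "AE", "T"], parse_map(sample)))
```

Editing a line in the map file and re-running the conversion would change every pronunciation that uses that phoneme, which is why the mapping is kept separate from the dictionary itself.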