Hindi

January 6, 2021

Chunking

| Model | Dev accuracy | Test F1 | Paper / Source | Code |
| --- | --- | --- | --- | --- |
| Dalal et al. (2006) | 87.40 | 82.40 | Hindi Part-of-Speech Tagging and Chunking: A Maximum Entropy Approach | |

Part-of-speech tagging

| Model | Dev accuracy | Test F1 | Paper / Source | Code |
| --- | --- | --- | --- | --- |
| Jha et al. (2018) | 99.30 | 99.06 | Multi-Task Deep Morphological Analyzer: Context-Aware Joint Morphological Tagging and Lemma Prediction | mt-dma |
| Dalal et al. (2006) | 89.35 | 82.22 | Hindi Part-of-Speech Tagging and Chunking: A Maximum Entropy Approach | |

Machine Translation

The IIT Bombay English-Hindi Parallel Corpus, used by Kunchukuttan et al. (2018), is publicly available. A live leaderboard covering more translation directions involving Hindi is hosted on the evaluation website for the Workshop on Asian Translation.

Hindi -> English

| Model | BLEU | Paper / Source | Code |
| --- | --- | --- | --- |
| Philip et al. (2020) | 24.85 | Revisiting Low Resource Status of Indian Languages in MT | ilmulti |
| Siripragada et al. (2020) | 22.91 | A Multilingual Parallel Corpora Collection Effort for Indian Languages | ilmulti |
| Goyal et al. (2019) | 19.06 | LTRC-MT Simple & Effective Hindi-English Neural Machine Translation Systems at WAT 2019 | |

English -> Hindi

| Model | BLEU | Paper / Source | Code |
| --- | --- | --- | --- |
| Philip et al. (2018) | 21.57 | CVIT-MT Systems for WAT-2018 | |
| Philip et al. (2020) | 21.20 | Revisiting Low Resource Status of Indian Languages in MT | ilmulti |
| Saini et al. (2018) | 18.215 | Neural Machine Translation for English to Hindi | |

G2P Conversion

Schwa Deletion

Due to diachronic processes, the inherent vowel of Hindi (the schwa, which is implicitly attached to every consonant that has no other vowel diacritic or vowel-killer diacritic) is sometimes dropped in pronunciation despite being present in the orthography. This process is known as schwa deletion. No known set of linguistic rules consistently and accurately predicts whether a given inherent vowel is pronounced, so schwa deletion remains an open problem in the field.
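To make the orthographic side of the problem concrete, here is a minimal Python sketch that marks which consonants in a Devanagari word carry an orthographic inherent schwa, that is, consonants not followed by a vowel diacritic or by the vowel-killer diacritic (virama). The function name `mark_inherent_schwas` and the character ranges are illustrative assumptions, not taken from any of the papers below; the sketch ignores the nukta, rarer vowel signs, and other combining marks. It only finds candidate schwas and does not predict which of them are deleted in speech.

```python
# Simplified character classes from the Unicode Devanagari block (assumption:
# only the core consonants and the common dependent vowel signs are covered).
VIRAMA = "\u094D"                                        # vowel-killer diacritic
MATRAS = {chr(cp) for cp in range(0x093E, 0x094D)}       # U+093E..U+094C
CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}   # U+0915..U+0939


def mark_inherent_schwas(word: str) -> list[tuple[str, bool]]:
    """Return (consonant, has_orthographic_schwa) for each consonant in `word`.

    A consonant has an orthographic schwa when the next character is neither
    a vowel diacritic nor the virama (or when it is the last character).
    """
    result = []
    for i, ch in enumerate(word):
        if ch not in CONSONANTS:
            continue
        nxt = word[i + 1] if i + 1 < len(word) else ""
        result.append((ch, nxt not in MATRAS and nxt != VIRAMA))
    return result


if __name__ == "__main__":
    for ch, has_schwa in mark_inherent_schwas("नमक"):
        print(ch, has_schwa)
```

For नमक ("salt"), all three consonants are flagged as carrying an orthographic schwa, yet the word is pronounced namak, with the final schwa silent. Deciding which flagged schwas to drop is the prediction task the systems below address.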

Each paper below uses a different dataset, so the reported figures are not directly comparable. The dataset of Arora et al. (2020), extracted from the Oxford Hindi-English Dictionary, is the largest, and future work should ideally evaluate against it.

| Model | Schwa-level accuracy | Word-level accuracy | Paper / Source | Code |
| --- | --- | --- | --- | --- |
| Arora et al. (2020) | 98.00 | 97.78 | Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi | schwa-deletion |
| Tyson and Nagar (2009) | | 95.00 | Prosodic rules for schwa-deletion in hindi text-to-speech synthesis | |
| Narasimhan et al. (2004) | | 88.97 | Schwa-Deletion in Hindi Text-to-Speech Synthesis | |
| Choudhury et al. (2004) | | 99.89 | A Diachronic Approach for Schwa Deletion in Indo Aryan Languages | |
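The two accuracy measures above can be related with a short sketch. It assumes (this definition is not spelled out on this page) that schwa-level accuracy is the fraction of individual orthographic schwas whose retain/delete decision is predicted correctly, and that a word counts as correct only when every schwa in it is correct. The function name `schwa_and_word_accuracy` and the boolean encoding are hypothetical.

```python
def schwa_and_word_accuracy(
    gold: list[list[bool]], pred: list[list[bool]]
) -> tuple[float, float]:
    """Compute (schwa-level accuracy, word-level accuracy).

    Each inner list holds one decision per orthographic schwa in a word:
    True means the schwa is pronounced, False means it is deleted.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")

    total_schwas = correct_schwas = correct_words = 0
    for g, p in zip(gold, pred):
        if len(g) != len(p):
            raise ValueError("each word needs the same number of decisions")
        matches = sum(a == b for a, b in zip(g, p))
        total_schwas += len(g)
        correct_schwas += matches
        # A word is correct only if all of its schwa decisions match.
        correct_words += int(matches == len(g))

    if total_schwas == 0:
        raise ValueError("no schwa decisions to evaluate")
    return correct_schwas / total_schwas, correct_words / len(gold)


if __name__ == "__main__":
    gold = [[True, False], [True, True, False], [False]]
    pred = [[True, False], [True, False, False], [False]]
    print(schwa_and_word_accuracy(gold, pred))
```

Under these assumptions, a single wrong schwa makes its whole word wrong, which is why the two measures differ and why a system reporting only one of them cannot be compared directly with a system reporting only the other.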