This repository contains state-of-the-art language models
and a classifier for Marathi, which is spoken predominantly by
the Marathi people of Maharashtra, India.
The models trained here are used in the Natural Language Toolkit for Indic Languages
(iNLTK).

- Marathi Wikipedia Articles
- Marathi News Dataset
- iNLTK Headlines Corpus - Marathi: uses the Marathi News Dataset prepared above.

| Architecture/Dataset | Marathi Wikipedia Articles |
|---|---|
| ULMFiT | 18 |
| TransformerXL | 17.42 |

| Dataset | Accuracy | MCC | Notebook to Reproduce results |
|---|---|---|---|
| iNLTK Headlines Corpus - Marathi | 92.40 | 85.23 | Link |

| Dataset | Dataset size (train, valid, test) | Accuracy | MCC | Notebook to Reproduce results |
|---|---|---|---|---|
| iNLTK Headlines Corpus - Marathi | (9672, 1210, 1210) | 92.40 | 85.23 | Link |

| Dataset | Dataset size (train, valid, test) | Accuracy | MCC | Notebook to Reproduce results |
|---|---|---|---|---|
| iNLTK Headlines Corpus - Marathi | (483, 1210, 1210) | 84.13 | 68.59 | Link |

| Dataset | Dataset size (train, valid, test) | Accuracy | MCC | Notebook to Reproduce results |
|---|---|---|---|---|
| iNLTK Headlines Corpus - Marathi | (483, 1210, 1210) | 84.55 | 69.11 | Link |
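
The tables above report accuracy and MCC (Matthews correlation coefficient), apparently scaled to percentages. As a rough illustration of how such scores can be computed from true and predicted labels, here is a minimal pure-Python sketch. It uses the standard multiclass generalisation of MCC. The function names and the toy labels are hypothetical and are not taken from the linked notebooks, which may compute the metrics differently.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient, in [-1, 1]."""
    s = len(y_true)                                    # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))    # correct predictions
    true_counts = Counter(y_true)                      # t_k: occurrences of class k
    pred_counts = Counter(y_pred)                      # p_k: predictions of class k
    labels = set(true_counts) | set(pred_counts)
    sum_pt = sum(pred_counts[k] * true_counts[k] for k in labels)
    sum_pp = sum(n * n for n in pred_counts.values())
    sum_tt = sum(n * n for n in true_counts.values())
    denom = sqrt((s * s - sum_pp) * (s * s - sum_tt))
    # By convention, return 0 when the score is undefined
    # (for example, when every prediction is the same class).
    return (c * s - sum_pt) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels, purely illustrative (not from the Marathi dataset).
    y_true = ["a", "a", "b", "b", "c", "c"]
    y_pred = ["a", "a", "b", "c", "c", "c"]
    print(f"Accuracy: {100 * accuracy(y_true, y_pred):.2f}")
    print(f"MCC:      {100 * mcc(y_true, y_pred):.2f}")
```

Unlike accuracy, MCC takes the full distribution of true and predicted classes into account, which is one reason the two columns can differ noticeably on the same test set.
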

Download the pretrained language model from here.
The tokenizer was trained using Google's SentencePiece.
Download the trained model and vocabulary from here.
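
Once the tokenizer model has been downloaded, it can be loaded with the `sentencepiece` Python package. The following is a minimal sketch, assuming the package is installed. The file name `marathi_tokenizer.model` and the helper names are placeholders chosen for illustration, not the actual names used in this repository.

```python
import os

# Hypothetical path: replace with the location of the downloaded model file.
MODEL_PATH = "marathi_tokenizer.model"


def load_tokenizer(model_path):
    """Load a trained SentencePiece model from disk."""
    # Imported here so the rest of the module works without the package.
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor()
    sp.Load(model_path)
    return sp


def tokenize(sp, text):
    """Split text into subword pieces using a loaded SentencePiece processor."""
    return sp.EncodeAsPieces(text)


if __name__ == "__main__" and os.path.exists(MODEL_PATH):
    sp = load_tokenizer(MODEL_PATH)
    # "Marathi is a language" in Marathi, used as sample input.
    print(tokenize(sp, "मराठी ही एक भाषा आहे"))
```

The pieces returned depend entirely on the vocabulary learned during training, so the output will vary with the model file used.
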