NLP for Punjabi

August 7, 2020

This repository contains state-of-the-art language models and a classifier for Punjabi, a language spoken in the Indian subcontinent.

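Datasets

Punjabi Wikipedia Articles: used for language modelling.
IndicNLP News Article Classification Dataset - Punjabi: used for news article classification.

Language model perplexity (lower is better)

Architecture	Perplexity on Punjabi Wikipedia Articles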
ULMFiT	24.40
TransformerXL	14.03
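Assuming the language-model scores are perplexities in the usual sense, the sketch below shows how perplexity follows from a model's per-token loss: it is the exponential of the mean negative log-likelihood. The function name `perplexity` is illustrative, and the input is assumed to be natural-log negative log-likelihoods, one per predicted token (for example, per-token cross-entropy from a validation pass).

```python
import math


def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood) over the predicted tokens.

    token_nlls: natural-log negative log-likelihoods, one per predicted token.
    """
    if not token_nlls:
        raise ValueError("need at least one token")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


# A model that gives every correct next token probability 1/24.4
# has a perplexity of 24.4.
nlls = [-math.log(1 / 24.4)] * 100
print(round(perplexity(nlls), 2))  # 24.4
```

Under this reading, a perplexity of 14.03 means the model is, on average, about as uncertain as a uniform choice among roughly 14 tokens.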

Classification metrics (MCC: Matthews correlation coefficient)

Dataset	Accuracy	MCC	Notebook to reproduce results
IndicNLP News Article Classification Dataset - Punjabi	97.12	96.17	Link
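For reference, here is a stdlib-only sketch of computing accuracy and the multiclass Matthews correlation coefficient from true and predicted labels. It uses the K-class generalisation of MCC, the same formula documented for scikit-learn's `matthews_corrcoef`; that the reported numbers came from exactly this computation is an assumption, and the table appears to show both scores multiplied by 100. The function name and the example labels are made up.

```python
import math


def accuracy_and_mcc(y_true, y_pred):
    """Return (accuracy, MCC) for single-label, possibly multiclass, predictions.

    MCC uses the K-class form:
        (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s = number of samples, c = correctly classified samples,
    t_k = times class k truly occurs, p_k = times class k is predicted.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two equal-length, non-empty label sequences")
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    labels = set(y_true) | set(y_pred)
    t_k = {k: 0 for k in labels}
    p_k = {k: 0 for k in labels}
    for t, p in zip(y_true, y_pred):
        t_k[t] += 1
        p_k[p] += 1
    numerator = c * s - sum(p_k[k] * t_k[k] for k in labels)
    denominator = math.sqrt((s * s - sum(v * v for v in p_k.values()))
                            * (s * s - sum(v * v for v in t_k.values())))
    # When MCC is undefined (zero denominator), return 0 by convention.
    mcc = numerator / denominator if denominator else 0.0
    return c / s, mcc


# Made-up labels: 3 of 4 predictions are correct.
y_true = ["sports", "sports", "politics", "entertainment"]
y_pred = ["sports", "politics", "politics", "entertainment"]
print(accuracy_and_mcc(y_true, y_pred))  # (0.75, 0.7)
```

Unlike accuracy, MCC accounts for how predictions are spread across classes. It therefore stays informative when class sizes are unbalanced.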

Word embedding visualizations

Architecture	Visualization
ULMFiT	Embeddings projection
TransformerXL	Embeddings projection

Sentence encoding visualizations

Architecture	Visualization
ULMFiT	Encodings projection
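The "projection" links presumably point to an embedding projector such as TensorFlow's Embedding Projector (an assumption). A minimal sketch of producing the two tab-separated files that tool loads follows: one vector per line, plus a metadata file with one label per line in the same order. The helper name `to_projector_tsv` and the toy vectors are hypothetical. Real word vectors would be rows of the trained model's embedding matrix, and sentence encodings would use the sentence text as labels.

```python
def to_projector_tsv(labels, vectors, precision=6):
    """Build (vectors_tsv, metadata_tsv) strings for an embedding projector.

    vectors_tsv: one vector per line, components separated by tabs.
    metadata_tsv: one label per line, same order (single column, so no header row).
    """
    if len(labels) != len(vectors):
        raise ValueError("labels and vectors must have the same length")
    vector_lines = ["\t".join(f"{x:.{precision}f}" for x in vec) for vec in vectors]
    # Tabs or newlines inside a label would break the one-label-per-line layout.
    label_lines = [str(label).replace("\t", " ").replace("\n", " ") for label in labels]
    return "\n".join(vector_lines) + "\n", "\n".join(label_lines) + "\n"


# Toy 3-dimensional vectors for two Punjabi words (made-up numbers).
vectors_tsv, metadata_tsv = to_projector_tsv(
    ["ਪੰਜਾਬ", "ਭਾਸ਼ਾ"], [[0.1, -0.2, 0.3], [0.05, 0.4, -0.1]]
)
print(vectors_tsv, end="")
print(metadata_tsv, end="")
```

To view the result, write the two strings to `vectors.tsv` and `metadata.tsv` (UTF-8) and load both files in the projector.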

Download the pretrained language models from here.

The tokenizer was trained without supervision using Google's SentencePiece.

Download the trained tokenizer model and vocabulary from here.