Bangla Benchmark runs

January 4, 2021

Code: https://colab.research.google.com/drive/1vltPI81atzRvlALv4eCvEB0KdFoEaCOb?usp=sharing

Can these scores be improved? YES!

Options include rerunning with more training data, training for more epochs, or using other libraries to set the learning rate and other hyperparameters before training.

  • Experimenting with epochs: when I doubled the number of epochs, MuRIL improved only slightly (69.5 -> 69.7 on one task)

The point of a benchmark is to run these models through a reasonable and identical process; you can tweak hyperparameters on any model to improve results.
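As a rough illustration of what an "identical process" could look like, here is a minimal sketch in which every model is scored with the same frozen settings. `RunConfig`, `run_benchmark`, and `placeholder_evaluate` are hypothetical names, and the epoch count, learning rate, and seed are assumed values, not the ones used in the Colab notebook.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RunConfig:
    # All values here are illustrative assumptions, not the notebook's settings.
    epochs: int = 3
    learning_rate: float = 2e-5
    seed: int = 42


def run_benchmark(
    models: List[str],
    tasks: List[str],
    train_and_evaluate: Callable[[str, str, RunConfig], float],
    config: RunConfig,
) -> Dict[str, Dict[str, float]]:
    """Score every model on every task with one shared, immutable config."""
    results: Dict[str, Dict[str, float]] = {}
    for model in models:
        results[model] = {
            task: train_and_evaluate(model, task, config) for task in tasks
        }
    return results


def placeholder_evaluate(model: str, task: str, config: RunConfig) -> float:
    # Hypothetical stand-in: a real version would fine-tune `model` on `task`
    # for `config.epochs` epochs and return accuracy on the held-out split.
    return 0.0


if __name__ == "__main__":
    scores = run_benchmark(
        models=["mBERT", "MuRIL"],
        tasks=["Sentiment", "Hate Speech", "News Topic"],
        train_and_evaluate=placeholder_evaluate,
        config=RunConfig(),
    )
    print(scores)
```

Because the config is frozen and passed in once, per-model tuning has to be an explicit, separate experiment rather than something that slips into the baseline run.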

The #1 score in each column is bolded, along with any other model within 1 percentage point of the winner:

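| Model | +/- Sentiment | Hate Speech | News Topic |
|---|---|---|---|
| random | 50.0 | 20.0 | 16.7 |
| mBERT | 68.1 | 52.3 | 72.3 |
| Bangla-ELECTRA | 69.2 | 31.0 | 82.3 |
| Bangla-BERT | **70.4** | 71.8 | **89.2** |
| neuralspace-reverie | 68.6 | **73.1** | **88.9** |
| Indic-BERT | **71.2** | 42.1 | **88.4** |
| MuRIL | 69.5 | **72.1** | **88.9** |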
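The highlighting rule for this table can be sketched in a few lines of Python. The scores are copied from the table (the random baseline is left out), `highlighted_models` is a hypothetical helper name, and treating a gap of exactly 1.0 point as "within 1 percentage point" is an assumption. `Decimal` is used so that the boundary comparison is not affected by floating-point rounding.

```python
from decimal import Decimal
from typing import Dict, List

# Scores copied from the table above, kept as strings for exact Decimal parsing.
SCORES: Dict[str, Dict[str, str]] = {
    "mBERT": {"Sentiment": "68.1", "Hate Speech": "52.3", "News Topic": "72.3"},
    "Bangla-ELECTRA": {"Sentiment": "69.2", "Hate Speech": "31.0", "News Topic": "82.3"},
    "Bangla-BERT": {"Sentiment": "70.4", "Hate Speech": "71.8", "News Topic": "89.2"},
    "neuralspace-reverie": {"Sentiment": "68.6", "Hate Speech": "73.1", "News Topic": "88.9"},
    "Indic-BERT": {"Sentiment": "71.2", "Hate Speech": "42.1", "News Topic": "88.4"},
    "MuRIL": {"Sentiment": "69.5", "Hate Speech": "72.1", "News Topic": "88.9"},
}


def highlighted_models(
    scores: Dict[str, Dict[str, str]],
    margin: Decimal = Decimal("1.0"),
) -> Dict[str, List[str]]:
    """For each task, list the models whose score is within `margin` of the best."""
    tasks = next(iter(scores.values())).keys()
    result: Dict[str, List[str]] = {}
    for task in tasks:
        column = {model: Decimal(row[task]) for model, row in scores.items()}
        best = max(column.values())
        # Assumption: the boundary is inclusive (a gap of exactly `margin` counts).
        result[task] = sorted(m for m, s in column.items() if best - s <= margin)
    return result


if __name__ == "__main__":
    for task, models in highlighted_models(SCORES).items():
        print(task, models)
```

Passing a smaller `margin` tightens the rule; `Decimal("0")` keeps only the top score per task.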

Revised hate speech CSV / split

| Model | Hate Speech v2 |
|---|---|
| random | 16.7 |
| mBERT | 50.9 |
| Bangla-ELECTRA | 34.3 |
| Bangla-BERT | 69.1 |
| neuralspace-reverie | 76.3 |
| Indic-BERT | 59.1 |
| MuRIL | 62.0 |