Bangla Benchmark runs

January 4, 2021

Code: https://colab.research.google.com/drive/1vltPI81atzRvlALv4eCvEB0KdFoEaCOb?usp=sharing

Can these scores be improved? YES!

Options include rerunning with more training data, training for more epochs, or using other libraries to set the learning rate and other hyperparameters before training.

Experimenting with epochs: when I doubled the number of epochs, MuRIL improved only slightly (69.5 -> 69.7 on one task).

The point of a benchmark is to run these models through a reasonable and identical process; you can tweak hyperparameters on any model to improve results.
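One way to keep the process identical is to freeze a single configuration and pass it to every model. The sketch below is only an illustration of that idea, not the code from the Colab notebook: `RunConfig`, `run_benchmark`, and the `train_and_eval` callable are hypothetical names, and the fields in `RunConfig` are assumptions about which settings matter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical settings; the real values live in the Colab notebook.
    epochs: int
    learning_rate: float
    seed: int


def run_benchmark(
    model_names: Iterable[str],
    train_and_eval: Callable[[str, RunConfig], float],
    config: RunConfig,
) -> Dict[str, float]:
    """Train and score every model with the same frozen config.

    `train_and_eval` is supplied by the caller: it takes a model name and
    the shared config and returns that model's score on one task.
    """
    return {name: train_and_eval(name, config) for name in model_names}
```

Because the config is frozen, an experiment such as doubling the epochs has to be a new config (for example via `dataclasses.replace(config, epochs=config.epochs * 2)`) that applies to every model, rather than a quiet per-model tweak.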

Bolding the #1 score in each column, plus any other model within 1 percentage point of the winner:

Model	+/- Sentiment	Hate Speech	News Topic
random	50.0	20.0	16.7
mBERT	68.1	52.3	72.3
Bangla-ELECTRA	69.2	31.0	82.3
Bangla-BERT	70.4	71.8	89.2
neuralspace-reverie	68.6	73.1	88.9
Indic-BERT	71.2	42.1	88.4
MuRIL	69.5	72.1	88.9
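The "within 1 percentage point of the winner" rule can be checked mechanically. The sketch below copies the scores from the table above (leaving out the `random` row) and lists the models that qualify per task. The helper name `near_top` is made up for this example, and treating a gap of exactly 1.0 as "within" is an assumption; the text does not say whether the cutoff is inclusive.

```python
# Scores copied from the table above; the random baseline row is omitted.
SCORES = {
    "mBERT": {"Sentiment": 68.1, "Hate Speech": 52.3, "News Topic": 72.3},
    "Bangla-ELECTRA": {"Sentiment": 69.2, "Hate Speech": 31.0, "News Topic": 82.3},
    "Bangla-BERT": {"Sentiment": 70.4, "Hate Speech": 71.8, "News Topic": 89.2},
    "neuralspace-reverie": {"Sentiment": 68.6, "Hate Speech": 73.1, "News Topic": 88.9},
    "Indic-BERT": {"Sentiment": 71.2, "Hate Speech": 42.1, "News Topic": 88.4},
    "MuRIL": {"Sentiment": 69.5, "Hate Speech": 72.1, "News Topic": 88.9},
}


def near_top(scores, task, margin=1.0):
    """Return the models whose score on `task` is within `margin` of the best."""
    best = max(row[task] for row in scores.values())
    # round() removes floating-point noise from the subtraction before comparing.
    # Assumption: the cutoff is inclusive (gap <= margin).
    return [
        model
        for model, row in scores.items()
        if round(best - row[task], 1) <= margin
    ]


for task in ("Sentiment", "Hate Speech", "News Topic"):
    print(task, near_top(SCORES, task))
```

With the inclusive cutoff, MuRIL's 72.1 on Hate Speech counts as within range of neuralspace-reverie's 73.1; with a strict `<` it would not.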

Revised hate speech csv / split

Model	Hate Speech v2
random	16.7
mBERT	50.9
Bangla-ELECTRA	34.3
Bangla-BERT	69.1
neuralspace-reverie	76.3
Indic-BERT	59.1
MuRIL	62.0
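
The `random` rows in both tables are consistent with uniform guessing over a fixed number of classes. A quick check, assuming the metric is accuracy in percent (the tables do not name the metric) and using a helper name, `uniform_baseline`, invented for this example:

```python
def uniform_baseline(num_classes: int) -> float:
    """Expected percent accuracy of guessing uniformly among num_classes labels."""
    return round(100.0 / num_classes, 1)


# 2, 5, and 6 classes reproduce the three distinct baseline values in the tables.
for k in (2, 5, 6):
    print(k, uniform_baseline(k))
# 2 50.0
# 5 20.0
# 6 16.7
```

Read this way, the baselines would correspond to 2 classes for sentiment, 5 for the original hate speech split, and 6 for both news topic and the revised hate speech split. These class counts are inferred from the numbers, not stated in the text.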