Readme.md

February 13, 2019

Instructions

[THIS REPOSITORY IS UNDER DEVELOPMENT AND MORE DATASETS AND MODELS WILL BE ADDED]

[FEEL FREE TO MAKE A PULL REQUEST FOR A NEW DATASET OR NEW MODEL]

1. Requirements

  • CUDA 9.0
  • Python 3.6
  • bash setup.sh

Run setup.sh to download the datasets and install all the required packages.

Run the prepare_datasets.py notebook to prepare the datasets.

For instructions on running each model, go to the respective model directory.

The models directory holds the results of these experiments.

[Figure: BERT 20 Newsgroup confusion matrix]
[Figure: BERT 20 Newsgroup Sankey plot]

2. Results

2.1 BERT

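| Bert (MXNet) | F1-score | Precision | Recall | Accuracy | Error Rate |
|---|---|---|---|---|---|
| 20ng | 91.24 | 91.46 | 91.13 | 91.04 | 8.96 |
| IMDB | 88.59 | 88.61 | 88.62 | 88.6 | 11.4 |
| Reuters 21578 (R8) | 94.38 | 93.62 | 95.64 | 98.12 | 1.88 |
| Reuters 21578 (R52) | 73.80 | 73.48 | 76.01 | 96.35 | 3.65 |
| Ohsumed (all docs) | 70.45 | 73.97 | 68.84 | 79.30 | 20.70 |
| Ohsumed (first 20k docs) | 56.52 | 61.49 | 56.04 | 71.04 | 28.96 |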

2.2 ULMFit

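| ULMFit | F1-score | Precision | Recall | Accuracy | Error Rate |
|---|---|---|---|---|---|
| 20ng | 92.87 | 93.02 | 92.82 | 92.82 | 7.18 |
| IMDB | 91.92 | 91.96 | 91.96 | 91.92 | 8.08 |
| Reuters 21578 (R8) | 94.79 | 94.07 | 96.12 | 98.18 | 1.82 |
| Reuters 21578 (R52) | 73.77 | 75.47 | 75.96 | 96.43 | 3.57 |
| Ohsumed (all docs) | 74.82 | 75.01 | 75.47 | 81.96 | 18.04 |
| Ohsumed (first 20k docs) | 43.76 | 44.46 | 45.49 | 62.5 | 37.5 |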
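
The results tables report five metrics per dataset, all in percent. The sketch below shows one way such numbers could be computed from true and predicted labels. It is not taken from this repository: the helper name `classification_report_pct`, the toy labels, and the choice of macro-averaging over classes are all assumptions made for illustration. The only relationship the tables themselves confirm is that the error rate equals 100 minus the accuracy.

```python
from collections import Counter


def classification_report_pct(y_true, y_pred):
    """Return accuracy, error rate, and macro precision/recall/F1, in percent.

    Hypothetical helper for illustration only. Macro-averaging (an unweighted
    mean over classes) is an assumption, not something the README states.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)  # how often each class actually occurs
    pred_count = Counter(y_pred)  # how often each class was predicted
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        # A class that is never predicted (or never occurs) scores 0.
        precision = correct[label] / pred_count[label] if pred_count[label] else 0.0
        recall = correct[label] / true_count[label] if true_count[label] else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    accuracy = 100.0 * sum(correct.values()) / len(y_true)
    n_classes = len(labels)
    return {
        "f1": round(100.0 * sum(f1s) / n_classes, 2),
        "precision": round(100.0 * sum(precisions) / n_classes, 2),
        "recall": round(100.0 * sum(recalls) / n_classes, 2),
        "accuracy": round(accuracy, 2),
        "error_rate": round(100.0 - accuracy, 2),
    }


# Toy labels, invented for this example.
report = classification_report_pct(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(report)
```

Because the macro F1-score is the mean of per-class F1 values rather than the harmonic mean of the averaged precision and recall, it can fall below both of them, as it does for the toy labels here.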