Deep-EOS PyTorch

October 25, 2019

Introduction

This is a PyTorch implementation of "Deep-EOS: General-Purpose Neural Networks for Sentence Boundary Detection" (Schweter and Ahmed, 2019). The original Keras implementation is available at https://github.com/stefan-it/deep-eos

Currently only the LSTM and Bi-LSTM models are implemented.

The dataset download script and the dataset creation methods are largely copied from Stefan Schweter's repository.
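To illustrate the task, sentence boundary detection of this kind can be framed as binary classification of a character window around each potential end-of-sentence character. The following is a minimal sketch under that assumption, not the code used in this repository: the function name `build_samples`, the window size, and the set of candidate characters are all hypothetical choices made for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical set of potential end-of-sentence characters (an assumption,
# not taken from this repository).
EOS_CHARS: Set[str] = {".", "?", "!", ":", ";"}


def build_samples(
    sentences: Iterable[str],
    window: int = 5,
    eos_chars: Set[str] = EOS_CHARS,
) -> List[Tuple[int, str]]:
    """Return (label, window_text) pairs for every potential EOS character.

    The input holds one sentence per entry. Sentences are joined with a
    single space, so a candidate character is a true boundary (label 1)
    exactly when it is the last character of a sentence; every other
    candidate gets label 0.
    """
    sentences = [s for s in sentences if s]  # skip empty lines
    text = " ".join(sentences)

    # Record the index of the last character of each sentence in `text`.
    boundaries = set()
    offset = 0
    for sentence in sentences:
        offset += len(sentence)
        boundaries.add(offset - 1)
        offset += 1  # account for the joining space

    samples = []
    for i, ch in enumerate(text):
        if ch in eos_chars:
            left = text[max(0, i - window):i]
            right = text[i + 1:i + 1 + window]
            samples.append((1 if i in boundaries else 0, left + ch + right))
    return samples


if __name__ == "__main__":
    for label, context in build_samples(["Dr. Smith arrived.", "He sat down."]):
        print(label, repr(context))
```

A classifier such as an LSTM or Bi-LSTM would then be trained on these character windows to predict the label.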

Results

Test set

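| Language   | PyTorch LSTM | PyTorch Bi-LSTM | Original LSTM | Original Bi-LSTM | Original CNN |
|------------|--------------|-----------------|---------------|------------------|--------------|
| German     | 0.9763       | 0.9763          | 0.9750        | 0.9760           | 0.9751       |
| English    | 0.9862       | 0.9862          | 0.9861        | 0.9860           | 0.9858       |
| Bulgarian  | 0.9893       | 0.9891          | 0.9922        | 0.9923           | 0.9919       |
| Bosnian    | 0.9917       | 0.9919          | 0.9957        | 0.9959           | 0.9953       |
| Greek      | 0.9958       | 0.9957          | 0.9967        | 0.9969           | 0.9963       |
| Croatian   | 0.9921       | 0.9918          | 0.9946        | 0.9948           | 0.9943       |
| Macedonian | 0.9772       | 0.9786          | 0.9810        | 0.9811           | 0.9794       |
| Romanian   | 0.9875       | 0.9874          | 0.9907        | 0.9906           | 0.9904       |
| Albanian   | 0.9922       | 0.9920          | 0.9953        | 0.9949           | 0.9940       |
| Serbian    | 0.9843       | 0.9838          | 0.9877        | 0.9877           | 0.9870       |
| Turkish    | 0.9824       | 0.9829          | 0.9858        | 0.9854           | 0.9854       |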

Download

The trained models can be downloaded from the releases section. Each model's 7zip archive contains the best-scoring model (on the development data) out of 5 epochs, along with its corresponding vocabulary.
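Keeping the best-scoring epoch on development data can be sketched as below. This is an illustrative outline only, not the training code of this repository: `select_best_epoch`, `train_one_epoch`, and `evaluate` are hypothetical names, and the callables stand in for whatever training and development-set scoring routines are actually used.

```python
from typing import Callable, Optional, Tuple


def select_best_epoch(
    train_one_epoch: Callable[[int], None],
    evaluate: Callable[[int], float],
    num_epochs: int = 5,
) -> Tuple[Optional[int], float]:
    """Train for `num_epochs` and return (best_epoch, best_dev_score).

    `train_one_epoch(epoch)` runs one pass over the training data and
    `evaluate(epoch)` returns the development-set score after that pass.
    Ties are resolved in favour of the earlier epoch.
    """
    best_epoch: Optional[int] = None
    best_score = float("-inf")
    for epoch in range(1, num_epochs + 1):
        train_one_epoch(epoch)
        score = evaluate(epoch)
        if score > best_score:
            best_epoch, best_score = epoch, score
            # A real training loop would save the model weights and the
            # vocabulary here, overwriting the previous best checkpoint.
    return best_epoch, best_score


if __name__ == "__main__":
    # Made-up development scores, used only to exercise the function.
    dev_scores = {1: 0.970, 2: 0.985, 3: 0.981, 4: 0.985, 5: 0.980}
    print(select_best_epoch(lambda epoch: None, lambda epoch: dev_scores[epoch]))
```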

The datasets can be downloaded with the download_data.sh script from the original implementation.

To-Do

  • Implementation:
    • LSTM (done)
    • Bi-LSTM (done)
    • CNN (not yet implemented)

References

  • S. Schweter and S. Ahmed, "Deep-EOS: General-Purpose Neural Networks for Sentence Boundary Detection," in Proceedings of the 15th Conference on Natural Language Processing (KONVENS), 2019.