Neural GEC Systems with Unsupervised Pre-Training on Synthetic Data

September 7, 2019 · View on GitHub

This repository contains the models, system configurations, and outputs of our winning GEC systems from the BEA 2019 shared task, described in R. Grundkiewicz, M. Junczys-Dowmunt, K. Heafield: Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data, BEA 2019.

Citation

@inproceedings{grundkiewicz-etal-2019-neural,
    title = "Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data",
    author = "Grundkiewicz, Roman  and
        Junczys-Dowmunt, Marcin  and
        Heafield, Kenneth",
    booktitle = "Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/W19-4427",
    pages = "252--263",
}

Content

  • systems - original GEC systems developed for and submitted to the restricted and low-resource tracks
  • outputs - corrected output and evaluation scores for common GEC test sets
  • training - updated training scripts reproducing our GEC systems

See the README files in each subdirectory for more information. If you have any questions, please open an issue or send me (Roman) an email.