Evaluating German Transformer Language Models with Syntactic Agreement Tests

July 21, 2020

Code and data for the paper by Karolina Zaczynska, Nils Feldhus, Robert Schwarzenberg, Aleksandra Gabryszak and Sebastian Möller: https://arxiv.org/abs/2007.03765
It originally appeared in the proceedings of the Swiss Text Analytics Conference & Conference on Natural Language Processing (KONVENS) 2020: http://ceur-ws.org/Vol-2624/paper7.pdf
We recommend referring to the more recent arXiv version, as it includes minor adjustments.

Data

See the data folder README for more information.

Requirements

  • jsonlines==1.2.0
  • nltk==3.4.5
  • overrides==2.8.0
  • torch==1.4.0
  • tqdm==4.43.0
  • transformers==2.5.1
  • Pattern==3.6
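To check an existing environment against the pinned versions above, a small Python sketch like the following may help. It uses the standard-library `importlib.metadata` module (Python 3.8+) and assumes the installed distribution names match the names in the list. The helper `check_requirements` is hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the Requirements list.
REQUIREMENTS = {
    "jsonlines": "1.2.0",
    "nltk": "3.4.5",
    "overrides": "2.8.0",
    "torch": "1.4.0",
    "tqdm": "4.43.0",
    "transformers": "2.5.1",
    "Pattern": "3.6",
}


def check_requirements(pins):
    """Return {package: (pinned, installed_or_None)} for every package that does not match its pin."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, installed) in check_requirements(REQUIREMENTS).items():
        print(f"{name}: expected {pinned}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at exactly the pinned version.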

Experiments

Run tests with LMs

Execute python run_probing_experiment.py with the following flags:

  • --input : [Required] Path to a single input .jsonl file or to a directory of .jsonl files. Please choose from the directories in data/input, e.g. data/input/SimplSent (whole directory) or data/input/SimplSent/SimplSent_pl.jsonl (single file).
  • --output_dir : Output directory of the experiment, data/output/ by default. Inside it, a sub-folder named after the language model identifier set by --lm is created, which in turn contains a sub-folder named after the case. That folder holds the resulting .jsonl file(s), e.g. data/output/bert-base-german-dbmdz-cased/SimplSent/SimplSent_pl.jsonl.
  • --lm : [Required] Language model identifier, either bert-base-german-dbmdz-cased (gBERT_large in our paper) or distilbert-base-german-cased (gBERT_small in our paper).
  • --verbose : If set, prints the data processing steps and results in detail.
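The directory layout described for --input and --output_dir can be sketched as follows. This is a hypothetical illustration, not code from run_probing_experiment.py: `resolve_inputs` and `expected_output_path` are made-up helpers, and they assume that a directory input expands to the .jsonl files directly inside it and that the case name is the name of the input file's parent folder (as in the SimplSent example).

```python
from pathlib import Path


def resolve_inputs(input_path):
    """Assumed behaviour: a directory expands to its .jsonl files, a file is used as-is."""
    p = Path(input_path)
    if p.is_dir():
        return sorted(p.glob("*.jsonl"))
    return [p]


def expected_output_path(input_file, lm, output_dir="data/output/"):
    """Assumed layout: <output_dir>/<lm>/<case>/<input file name>."""
    input_file = Path(input_file)
    case = input_file.parent.name  # e.g. "SimplSent" (assumption)
    return Path(output_dir) / lm / case / input_file.name
```

For example, `expected_output_path("data/input/SimplSent/SimplSent_pl.jsonl", "bert-base-german-dbmdz-cased")` yields `data/output/bert-base-german-dbmdz-cased/SimplSent/SimplSent_pl.jsonl`, which is the example path given for --output_dir.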

Run evaluation on test outputs to produce accuracy scores

Execute python evaluation.py with the following flags:

  • --path : [Required] Path to the output directory with .jsonl files, e.g. data/output/bert-base-german-dbmdz-cased/SimplSent/
  • --lm : [Required] Language model identifier, either bert-base-german-dbmdz-cased (gBERT_large in our paper) or distilbert-base-german-cased (gBERT_small in our paper). It is used to load the matching tokenizer.
  • --sum_up_cases : If set, all .jsonl files in the directory given by --path are evaluated together and a single result is reported for all sub-cases, instead of one result per sub-case.
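The difference between separate per-file scores and a single --sum_up_cases score can be illustrated with a minimal sketch. It assumes that each test item has already been scored as correct or incorrect, and that the combined score pools all items rather than averaging the per-file accuracies. The function names and the example data are hypothetical and not taken from evaluation.py.

```python
def accuracy(outcomes):
    """Fraction of True values in a list of per-item outcomes."""
    if not outcomes:
        raise ValueError("no test items")
    return sum(outcomes) / len(outcomes)


def per_file_accuracy(results):
    """Default (assumed): one accuracy per .jsonl file."""
    return {name: accuracy(outcomes) for name, outcomes in results.items()}


def pooled_accuracy(results):
    """With --sum_up_cases (assumed): one accuracy over all items of all files."""
    return accuracy([o for outcomes in results.values() for o in outcomes])


if __name__ == "__main__":
    # Hypothetical outcomes for two sub-case files.
    results = {
        "case_a.jsonl": [True, True, False, True],
        "case_b.jsonl": [True, False],
    }
    print(per_file_accuracy(results))  # {'case_a.jsonl': 0.75, 'case_b.jsonl': 0.5}
    print(pooled_accuracy(results))    # 0.6666666666666666 (4 of 6 items)
```

Under this assumption the pooled score (about 0.667) differs from the mean of the per-file scores (0.625), because larger files contribute more items.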