LIT: auto-generator for contrast set

March 4, 2024

This is the code for our EMNLP 2020 paper: Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets

Processed Dataset

We released the datasets used in our paper.
- Note: original+basic means (original [SNLI/MNLI] + basic [e.g. i;i] augmentation). Nothing like +pa is included, as that denotes a compositional transformation.
We will also release the MRS parses, which people can use to transform sentences with their own defined perturbations.

Note: These are not complete parallel versions of the original SNLI and MNLI datasets. Some sentences are missing because the parser sometimes fails to parse them. You may need to run your transformation on the missing data yourself.

Environment setup

conda env create -f environment.yml
conda activate lit

Structure of Repo

transfer: the module that contains all the functions mentioned in our paper. Within it:

README.md gives detailed documentation of the current config of our parser.
transfer_example.py is an illustrative example of how to use our parser.
transfer_snli_parallel.py is the script we used to parse SNLI in parallel (some local modifications are needed before running it). Parallel processing is strongly encouraged.
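
As a rough illustration of why parallel processing helps, here is a minimal sketch that spreads sentences across worker processes. It is not the code in transfer_snli_parallel.py: transform_sentence is a hypothetical placeholder for the real parse-and-transform call, and transform_all, workers, and the chunk size are names and values assumed for this example only.

```python
from multiprocessing import Pool


def transform_sentence(sentence):
    # Hypothetical stand-in for the real parse-and-transform step.
    return sentence.upper()


def transform_all(sentences, workers=4):
    # Fall back to a serial loop when only one worker is requested,
    # which is convenient for debugging.
    if workers <= 1:
        return [transform_sentence(s) for s in sentences]
    # Each worker process handles chunks of sentences independently.
    with Pool(processes=workers) as pool:
        return pool.map(transform_sentence, sentences, chunksize=64)


if __name__ == "__main__":
    print(transform_all(["a man is walking", "a dog runs"], workers=2))
```

Because each sentence is processed independently, results come back in input order and the work scales with the number of available cores.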

post-process: after processing the dataset, the parsed output needs some cleaning to put it in the right form.

making_sense.py contains a choice of sentence selectors for scoring the different generated surface sentences.
process.py contains functions that:
select generated sentences
apply the defined rules to generate the contrast set
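
The selection step above can be pictured as scoring each generated surface sentence and keeping the best one. The sketch below only illustrates that idea and is not the code in making_sense.py or process.py: length_penalty_score and select_sentence are hypothetical names, and the length-based scorer is an assumed toy heuristic rather than one of the selectors used in the paper.

```python
def length_penalty_score(original, candidate):
    # Hypothetical scorer: prefer candidates whose token count
    # is close to that of the original sentence.
    return -abs(len(candidate.split()) - len(original.split()))


def select_sentence(original, candidates, scorer=length_penalty_score):
    # Return the highest-scoring candidate, or None when the
    # generator produced no surface sentence at all.
    if not candidates:
        return None
    return max(candidates, key=lambda c: scorer(original, c))


if __name__ == "__main__":
    generated = [
        "the cat was chased by the dog",
        "the cat was chased by the dog .",
    ]
    print(select_sentence("the dog chased the cat", generated))
```

Passing the scorer as a parameter keeps the selection logic separate from the scoring choice, so a different selector can be swapped in without touching the rest of the pipeline.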

Citation
