CircEvent

September 10, 2021

This repository contains the source code for the Findings of EMNLP 2021 paper "Incorporating Circumstance into Narrative Event Prediction". Follow the instructions below to reproduce our experiments.

Dataset

Our experiments are conducted on the New York Times (NYT) portion of the English Gigaword corpus, which is available from the official website. The data split we use is provided by Granroth-Wilding [1]. We annotate the raw documents following Lee [2] with the Stanford CoreNLP toolkit. The CoreNLP configuration is listed in the corenlp.props file.
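To inspect the CoreNLP configuration before running the annotation step, the properties file can be read from Python. The snippet below is a minimal sketch, not part of this repository: parse_props is a hypothetical helper, it assumes corenlp.props uses the usual Java properties layout (key = value lines, # or ! comments), and it does not handle backslash line continuations. The sample keys and values are illustrative only and are not the contents of the shipped file.

```python
def parse_props(text):
    """Parse simple Java-style properties text into a dict.

    Hypothetical helper: handles 'key = value' and 'key: value' lines,
    skips blank lines and comments, ignores line continuations.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        # Split at the first '=' or ':' separator, whichever comes first.
        for i, ch in enumerate(line):
            if ch in "=:":
                key, value = line[:i], line[i + 1:]
                break
        else:
            key, value = line, ""  # a key with no value
        props[key.strip()] = value.strip()
    return props


# Illustrative sample, NOT the actual contents of corenlp.props.
SAMPLE = """\
# example configuration
annotators = tokenize, ssplit, pos
example.key: 100
"""

if __name__ == "__main__":
    for key, value in parse_props(SAMPLE).items():
        print(f"{key} -> {value}")
```

To check the real file, pass its contents instead of the sample, for example parse_props(open("corenlp.props").read()).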

Environment Setup

We conducted our experiments on a workstation with an RTX 2080 Ti GPU and 64 GB of memory. Our programs are tested under PyTorch 1.8.1 + CUDA 10.2.

Set up the Python environment. We encourage using conda to create the Python virtual environment: conda create -n circ python==3.8 && conda activate circ
Install the CUDA toolkit and PyTorch: conda install cudatoolkit=10.2 && pip install torch==1.8.1+cu102 -f https://download.pytorch.org/whl/torch_stable.html
Install the pip packages: pip install -r requirements.txt
Install the circumst_event package: pip install -e .

The circ virtual environment is now ready to use.
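As a quick sanity check after installation, a short script can compare the active environment against the versions named above (Python 3.8, PyTorch 1.8.1, CUDA 10.2). This is a minimal sketch and not part of the repository; check_environment and the EXPECTED_* constants are hypothetical names introduced here. It reports False for the PyTorch fields instead of failing when torch is not installed.

```python
import importlib.util
import sys

# Versions taken from the setup instructions above.
EXPECTED_PYTHON = (3, 8)
EXPECTED_TORCH = "1.8.1"
EXPECTED_CUDA = "10.2"


def check_environment():
    """Return a dict of boolean checks for the current interpreter."""
    report = {
        "python_ok": sys.version_info[:2] == EXPECTED_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_ok": False,
        "cuda_ok": False,
        "gpu_available": False,
    }
    if report["torch_installed"]:
        import torch

        # torch.__version__ looks like "1.8.1+cu102" for the pip wheel above.
        report["torch_ok"] = torch.__version__.startswith(EXPECTED_TORCH)
        # torch.version.cuda is None for CPU-only builds.
        report["cuda_ok"] = torch.version.cuda == EXPECTED_CUDA
        report["gpu_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```

Run it inside the activated circ environment. Any False entry points at the setup step that may need to be repeated.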

Reproduction Steps

The source code is divided into two parts: data preprocessing and model training. The entry scripts are placed in the bin folder. Each step and its corresponding script are listed below.

Extract text from the Gigaword XML files: 1-extract_gigaword_nyt
Annotate the text with CoreNLP: 2-corenlp_annotate
Extract event chains from the annotated documents: 3-extract_event_chain
Convert event chain words to ids: 4-index_event_chain
Split the data into training, validation, and test sets: 5-split_dataset
Train the circ model: 6-circ_train
Evaluate the saved model: 7-circ_eval
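Because the entry scripts carry a numeric prefix, their execution order can be recovered from the file names alone. The snippet below is a minimal sketch of that idea and not part of the repository. ordered_steps is a hypothetical helper; it assumes the scripts sit directly in the bin folder and makes no assumption about their file extension or command-line arguments. It only lists the scripts and does not run them.

```python
import re
from pathlib import Path

# Matches names such as "1-extract_gigaword_nyt" (extension, if any, removed).
STEP_PATTERN = re.compile(r"^(\d+)-(.+)$")


def ordered_steps(bin_dir):
    """Return script file names in bin_dir sorted by their numeric prefix."""
    steps = []
    for path in Path(bin_dir).iterdir():
        match = STEP_PATTERN.match(path.stem)
        if path.is_file() and match:
            steps.append((int(match.group(1)), path.name))
    # Sort numerically so that a step 10 would follow step 9, not step 1.
    return [name for _, name in sorted(steps)]


if __name__ == "__main__" and Path("bin").is_dir():
    for name in ordered_steps("bin"):
        print(name)
```

Sorting on the parsed integer instead of the raw string keeps the order correct if more than nine steps are ever added.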

References

[1] Mark Granroth-Wilding and Stephen Clark. 2016. What happens next? Event Prediction Using a Compositional Neural Network Model. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 2727–2733, Phoenix, Arizona, February. AAAI Press.

[2] I-Ta Lee and Dan Goldwasser. 2019. Multi-Relational Script Learning for Discourse Relations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4214–4226, Florence, Italy, July. Association for Computational Linguistics.