BERTforGAP-coreference
April 22, 2019
This project was carried out for the INF8225 AI course. We aim to reduce gender bias in pronoun resolution by building a coreference resolver that performs well on a gender-balanced pronoun dataset, the Gendered Ambiguous Pronouns (GAP) dataset. We leverage BERT's pre-training on large unlabeled corpora and transfer its contextual representations to a fine-tuning stage. Fine-tuning is done in a SWAG-like (multiple-choice) manner on the supervised GAP dataset.
We submitted our best-performing model to the Gendered Pronoun Resolution Kaggle competition.
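To make the SWAG-like framing concrete, here is a minimal sketch of how one GAP row could be recast as a multiple-choice item (one context, several candidate answers, one label). The column names follow the public GAP TSV files; the sample sentence, the function name `to_multiple_choice`, the third "neither" option and the output dictionary layout are assumptions made for illustration and may differ from what run_GAP.py actually does.

```python
import csv
import io

# A tiny made-up row using the column layout of the public GAP TSV files.
SAMPLE_TSV = (
    "ID\tText\tPronoun\tPronoun-offset\tA\tA-offset\tA-coref\t"
    "B\tB-offset\tB-coref\tURL\n"
    "example-1\tAlice met Beth after she finished work.\tshe\t21\t"
    "Alice\t0\tTRUE\tBeth\t10\tFALSE\thttp://example.org\n"
)


def to_multiple_choice(row):
    """Recast one GAP row as a multiple-choice item (hypothetical layout)."""
    text = row["Text"]
    pronoun = row["Pronoun"]
    start = int(row["Pronoun-offset"])
    # Sanity check: the character offset must point at the pronoun.
    if text[start:start + len(pronoun)] != pronoun:
        raise ValueError("pronoun offset does not match the text")

    # Assumption: a third option covers the case where neither name is correct.
    endings = [row["A"], row["B"], "neither"]
    if row["A-coref"] == "TRUE":
        label = 0
    elif row["B-coref"] == "TRUE":
        label = 1
    else:
        label = 2
    return {"context": text, "pronoun": pronoun, "endings": endings, "label": label}


# QUOTE_NONE keeps quotation marks inside the text from confusing the parser.
reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t", quoting=csv.QUOTE_NONE)
items = [to_multiple_choice(row) for row in reader]
print(items[0]["endings"], items[0]["label"])
```

A multiple-choice model then scores each (context, ending) pair and is trained to give the highest score to the ending at index `label`.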
Setting up
git clone --recursive git@github.com:isabellebouchard/BERT_for_GAP-coreference.git
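Make sure the submodules are properly initialized. If you cloned without --recursive, run git submodule update --init --recursive from the repository root.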
First steps
To run the code, first install Docker so that you can build and run a container with all the required dependencies installed:
docker build -t IMAGE_NAME .
nvidia-docker run --rm -it -v /path/to/your/code/:/project IMAGE_NAME
If you don't have access to a GPU, replace nvidia-docker with docker. Running the training on one or more GPUs is highly recommended.
Once inside the container, you should be able to run the training script:
python run_GAP.py --data_dir gap-coreference \
--bert_model bert-base-cased \
--output_dir results
This trains the model and saves checkpoints of the best model in the output directory.
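The "keep the best model" behaviour described above can be sketched as follows. This is not the code from run_GAP.py: the class name `BestCheckpoint`, the file name `best_model.json`, the use of validation loss as the selection metric and the JSON format are all assumptions chosen to keep the example dependency-free.

```python
import json
import os
import tempfile


class BestCheckpoint:
    """Keep only the checkpoint with the lowest validation loss seen so far."""

    def __init__(self, output_dir, filename="best_model.json"):
        # Hypothetical file name; the real script may use a different one.
        os.makedirs(output_dir, exist_ok=True)
        self.path = os.path.join(output_dir, filename)
        self.best_loss = float("inf")

    def update(self, val_loss, state):
        """Save `state` if `val_loss` improves on the best so far.

        Returns True when a new checkpoint was written.
        """
        if val_loss >= self.best_loss:
            return False
        self.best_loss = val_loss
        with open(self.path, "w") as f:
            json.dump({"val_loss": val_loss, "state": state}, f)
        return True


# Usage: three made-up epochs; only the first two improve the loss.
output_dir = os.path.join(tempfile.mkdtemp(), "results")
ckpt = BestCheckpoint(output_dir)
saved = [ckpt.update(loss, {"epoch": epoch})
         for epoch, loss in enumerate([0.9, 0.7, 0.8])]

with open(ckpt.path) as f:
    best = json.load(f)
print(saved, best["state"]["epoch"])
```

If the training script is PyTorch-based, the JSON dump would typically be replaced by a call such as torch.save(model.state_dict(), path); the comparison logic stays the same.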