Answer Equivalence Dataset

October 24, 2022

This dataset is introduced and described in the paper "Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation" (Bulian et al., 2022).
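For context, the token-level matching that the paper title refers to can be sketched as follows. This is a minimal illustration and not code from this repository: it assumes SQuAD-style normalization (lowercasing, dropping punctuation and English articles), and the function names `normalize`, `exact_match` and `token_f1` are chosen here for illustration only.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumption: a simplified SQuAD-style normalization, not the exact
    procedure used by the authors.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(candidate: str, reference: str) -> bool:
    """True if both answers normalize to the same token sequence."""
    return normalize(candidate) == normalize(reference)


def token_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    cand, ref = normalize(candidate), normalize(reference)
    if not cand or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(cand == ref)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(token_f1("The Eiffel Tower", "eiffel tower."))  # 1.0
print(token_f1("NYC", "New York City"))               # 0.0
```

The second call scores 0.0 even though a human rater would likely accept "NYC" for "New York City". Cases like this are what answer equivalence ratings are meant to capture.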

Download the data

AE Split	# AE Examples	# Ratings
Train	9,090	9,090
Dev	2,734	4,446
Test	5,831	9,724
Total	17,655	23,260

Split by system	# AE Examples	# Ratings
BiDAF dev predictions	5,622	7,522
XLNet dev predictions	2,448	7,932
Luke dev predictions	2,240	4,590
Total	8,565	14,170

BERT Matching (BEM) model

The BEM model from the paper, fine-tuned on this dataset, is available on TensorFlow Hub.

This colab demonstrates how to use it.
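As a rough sketch of the post-processing step only, the snippet below turns a pair of classifier logits into an equivalence score. It assumes the model returns two logits per example with index 1 standing for "equivalent"; this layout, the helper names `equivalence_probability` and `is_equivalent`, and the 0.5 threshold are assumptions made for illustration. Refer to the colab for the actual input and output format.

```python
import math
from typing import Sequence


def equivalence_probability(logits: Sequence[float]) -> float:
    """Softmax over two logits; returns the probability of index 1.

    Assumption: index 1 corresponds to the "equivalent" class.
    """
    if len(logits) != 2:
        raise ValueError("expected exactly two logits")
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    return exps[1] / sum(exps)


def is_equivalent(logits: Sequence[float], threshold: float = 0.5) -> bool:
    """Binary decision; the default threshold is an illustrative choice."""
    return equivalence_probability(logits) >= threshold


# Hypothetical logits, not real model output.
print(is_equivalent([-2.0, 3.0]))  # True
print(is_equivalent([2.0, -1.0]))  # False
```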

How to cite AE?

@article{bulian-etal-2022-tomayto,
  author    = {Jannis Bulian and
               Christian Buck and
               Wojciech Gajewski and
               Benjamin B{\"o}rschinger and
               Tal Schuster},
  title     = {Tomayto, Tomahto. Beyond Token-level Answer Equivalence
               for Question Answering Evaluation},
  journal   = {CoRR},
  volume    = {abs/2202.07654},
  year      = {2022},
  ee        = {http://arxiv.org/abs/2202.07654},
}

Disclaimer

This is not an official Google product.

Contact information

For help or issues, please submit a GitHub issue or contact the authors by email.