Analyzing Open-domain QA

October 10, 2022

Before reading this document, we strongly recommend familiarizing yourself with some important concepts about system analyses.

In this file we describe how to analyze open-domain QA models. We will give an example using the natural_questions_comp_gen dataset, but other datasets can be analyzed in a similar way.

Data Preparation

Format of Dataset File

  • (1) datalab: if your dataset is already supported by datalab, you don't need to prepare a dataset file.

  • (2) json: a list of dictionaries, each with two keys, question and answers

[
  {"question": "who got the first nobel prize in physics", "answers": ["Wilhelm Conrad Röntgen"]},
  {"question": "when is the next deadpool movie being released", "answers": ["May 18 , 2018"]},
  ...
]
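As a minimal sketch of producing a file in this format, the snippet below writes a list of question/answers records with Python's standard json module. The helper name write_qa_dataset and the key check are our own illustrative assumptions, not part of ExplainaBoard.

```python
import json
from pathlib import Path


def write_qa_dataset(examples, path):
    """Write a list of {"question": str, "answers": [str, ...]} dicts as JSON.

    Hypothetical helper for illustration; not an ExplainaBoard API.
    """
    for ex in examples:
        # Each record is expected to carry the two keys described above.
        if set(ex) != {"question", "answers"}:
            raise ValueError(f"unexpected keys: {sorted(ex)}")
        if not isinstance(ex["answers"], list):
            raise ValueError("'answers' must be a list of strings")
    # ensure_ascii=False keeps characters such as "ö" readable in the file.
    Path(path).write_text(
        json.dumps(examples, ensure_ascii=False, indent=2), encoding="utf-8"
    )


examples = [
    {"question": "who got the first nobel prize in physics",
     "answers": ["Wilhelm Conrad Röntgen"]},
    {"question": "when is the next deadpool movie being released",
     "answers": ["May 18 , 2018"]},
]
```

Calling write_qa_dataset(examples, "my_dataset.json") would then produce a file with the layout shown above; the file name is only a placeholder.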

Format of System Output File

For this task, the system output file should look as follows:

william henry bragg
may 18, 2018
...

where each line represents one predicted answer. An example system output file is here: test.dpr.nq.txt
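A small sketch of writing predictions in this one-answer-per-line format is given below. The function names are hypothetical, and we assume (the text above does not state it) that the i-th line corresponds to the i-th example in the dataset, which is why whitespace inside a prediction is collapsed so that no answer spans more than one line.

```python
from pathlib import Path


def write_system_output(predictions, path):
    """Write one predicted answer per line (hypothetical helper)."""
    lines = []
    for pred in predictions:
        # A newline inside a prediction would shift every later line,
        # so collapse all whitespace runs into single spaces.
        lines.append(" ".join(str(pred).split()))
    Path(path).write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )


def read_system_output(path):
    """Read the predictions back as a list, one entry per line."""
    return Path(path).read_text(encoding="utf-8").splitlines()
```

Comparing len(read_system_output(path)) with the number of examples in the dataset is a cheap sanity check before running an analysis.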

Let's say we have several such files, each produced by a different system.

Performing Basic Analysis

To perform a basic analysis, run the following command:

explainaboard --task qa-open-domain --dataset natural_questions_comp_gen --system-outputs ./data/system_outputs/qa_open_domain/test.dpr.nq.txt > report.json

where

  • --task: the task name. You can find all supported task names here
  • --system-outputs: the path(s) of the system output files. Multiple paths should be separated by spaces, for example, system1 system2
  • --dataset: the dataset name
  • report.json: the generated analysis file, in JSON format. You can find the file here. Tip: use a JSON viewer like this one for easier reading.
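When analyzing several systems at once, it can be convenient to assemble the command programmatically. The sketch below only builds the argument list from the three flags described above; the function name build_explainaboard_command is our own, and nothing here calls ExplainaBoard itself.

```python
import shlex


def build_explainaboard_command(task, dataset, system_outputs):
    """Return the argument list for the command shown above.

    Hypothetical helper; it uses only the --task, --dataset and
    --system-outputs flags documented in this file.
    """
    if not system_outputs:
        raise ValueError("at least one system output path is required")
    return [
        "explainaboard",
        "--task", task,
        "--dataset", dataset,
        # Multiple system outputs are passed as space-separated paths.
        "--system-outputs", *system_outputs,
    ]


cmd = build_explainaboard_command(
    task="qa-open-domain",
    dataset="natural_questions_comp_gen",
    system_outputs=["./data/system_outputs/qa_open_domain/test.dpr.nq.txt"],
)
# Print the command as a shell-safe string for inspection.
print(shlex.join(cmd))
```

If explainaboard is installed, the list can be passed to subprocess.run with stdout redirected to a file such as report.json, which mirrors the shell redirection used above.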