Analyzing Open-domain QA

October 10, 2022 · View on GitHub

Before diving into the detail of this doc, you're strongly recommended to know some important concepts about system analyses.

In this file we describe how to analyze open-domain QA models. We will give an example using the natural_questions_comp_gen dataset, but other datasets can be analyzed in a similar way.

Data Preparation

Format of `Dataset` File

(1) datalab: if your datasets have been supported by datalab, you fortunately don't need to prepare the dataset.
(2) json (basically, it's a list of dictionaries with two keys: question and answers)

[
  {'question': 'who got the first nobel prize in physics', 'answers': ['Wilhelm Conrad Röntgen']},
  {'question': 'when is the next deadpool movie being released', 'answers': ['May 18 , 2018']},
  ...
]

Format of `System Output` File

In this task, your system outputs should be as follows:

william henry bragg
may 18, 2018
...

where each line represents one predicted answer. An example system output file is here: test.dpr.nq.txt

Let's say we have several files such as

gpt2.json

etc. from different systems.

Performing Basic Analysis

In order to perform your basic analysis, we can run the following command:

explainaboard --task qa-open-domain --dataset natural_questions_comp_gen   --system-outputs ./data/system_outputs/qa_open_domain/test.dpr.nq.txt  > report.json

where

--task: denotes the task name, you can find all supported task names here
--system-outputs: denote the path of system outputs. Multiple one should be separated by space, for example, system1 system2
--dataset:denotes the dataset name
report.json: the generated analysis file with json format. You can find the file here. Tips: use a json viewer like this one for better interpretation.

Data Preparation

Format of Dataset File

Format of System Output File

Performing Basic Analysis

Format of `Dataset` File

Format of `System Output` File