Analyzing Open-domain QA
October 10, 2022
Before diving into the details of this document, we strongly recommend that you familiarize yourself with some important concepts about system analyses.
In this document, we describe how to analyze open-domain QA models. We use the natural_questions_comp_gen dataset as an example, but other datasets can be analyzed in the same way.
Data Preparation
Format of Dataset File
(1) datalab: if your dataset is already supported by datalab, you don't need to prepare it yourself.
(2) json: a list of dictionaries, each with two keys, question and answers, for example:
[
{"question": "who got the first nobel prize in physics", "answers": ["Wilhelm Conrad Röntgen"]},
{"question": "when is the next deadpool movie being released", "answers": ["May 18 , 2018"]},
...
]
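If you prepare your own json file, the following minimal sketch writes and re-reads one in this format using Python's standard json module. The helpers write_qa_dataset and read_qa_dataset and the file name my_dataset.json are hypothetical, chosen only for illustration; the read step simply checks that each entry has a question string and an answers list, which allows more than one reference answer per question.

```python
import json
import os
import tempfile
from typing import Dict, List, Sequence, Tuple


def write_qa_dataset(pairs: Sequence[Tuple[str, List[str]]], path: str) -> None:
    """Write (question, answers) pairs as a JSON list of {"question": ..., "answers": [...]} dicts."""
    records = [{"question": question, "answers": list(answers)} for question, answers in pairs]
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps names such as "Röntgen" readable instead of \u escapes
        json.dump(records, f, ensure_ascii=False, indent=2)


def read_qa_dataset(path: str) -> List[Dict[str, object]]:
    """Load a dataset file and check that each entry has a string question and a list of answers."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for i, record in enumerate(records):
        if not isinstance(record.get("question"), str) or not isinstance(record.get("answers"), list):
            raise ValueError(f"entry {i} needs a 'question' string and an 'answers' list")
    return records


if __name__ == "__main__":
    pairs = [
        ("who got the first nobel prize in physics", ["Wilhelm Conrad Röntgen"]),
        ("when is the next deadpool movie being released", ["May 18 , 2018"]),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "my_dataset.json")  # hypothetical file name
        write_qa_dataset(pairs, path)
        print(len(read_qa_dataset(path)))  # 2
```

Note that the file must use double quotes; Python-style single-quoted strings are not valid JSON and json.load rejects them.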
Format of System Output File
For this task, the system output file is a plain-text file in the following form:
william henry bragg
may 18, 2018
...
where each line contains one predicted answer. An example system output file is test.dpr.nq.txt.
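A minimal sketch of producing such a file from a list of predictions is shown below, assuming the predictions are already in the same order as the dataset entries. The helpers write_system_output and read_system_output and the file name are hypothetical. Because the format is line-based, a newline inside a prediction would shift every later answer, so the sketch collapses all whitespace in each prediction to single spaces.

```python
import os
import tempfile
from typing import List


def write_system_output(predictions: List[str], path: str) -> None:
    """Write one predicted answer per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for prediction in predictions:
            # A newline inside a prediction would shift every later answer by one line,
            # so collapse all runs of whitespace (including newlines) into single spaces.
            f.write(" ".join(prediction.split()) + "\n")


def read_system_output(path: str) -> List[str]:
    """Read the predictions back, one per line."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.my_system.txt")  # hypothetical file name
        write_system_output(["william henry bragg", "may 18,\n2018"], path)
        print(read_system_output(path))  # ['william henry bragg', 'may 18, 2018']
```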
Suppose we have several such files from different systems.
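When several systems are analyzed, a cheap sanity check before running the analysis is that every system output file has exactly one line per question in the dataset. The sketch below assumes the json dataset format described above and that the i-th output line corresponds to the i-th question; count_lines, find_length_mismatches and all file names are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, List


def count_lines(path: str) -> int:
    """Number of lines in a text file (a trailing newline does not add an extra line)."""
    with open(path, encoding="utf-8") as f:
        return len(f.read().splitlines())


def find_length_mismatches(dataset_path: str, output_paths: List[str]) -> Dict[str, int]:
    """Map each output file whose line count differs from the number of questions to its line count."""
    with open(dataset_path, encoding="utf-8") as f:
        n_questions = len(json.load(f))
    mismatches: Dict[str, int] = {}
    for path in output_paths:
        n_lines = count_lines(path)
        if n_lines != n_questions:
            mismatches[path] = n_lines
    return mismatches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dataset = os.path.join(tmp, "my_dataset.json")  # hypothetical file names throughout
        with open(dataset, "w", encoding="utf-8") as f:
            json.dump([{"question": "q1", "answers": ["a1"]},
                       {"question": "q2", "answers": ["a2"]}], f)
        good = os.path.join(tmp, "system1.txt")
        short = os.path.join(tmp, "system2.txt")
        with open(good, "w", encoding="utf-8") as f:
            f.write("a1\na2\n")
        with open(short, "w", encoding="utf-8") as f:
            f.write("a1\n")
        # Only system2.txt is reported, because it has one line for two questions
        print(find_length_mismatches(dataset, [good, short]))
```

An empty dictionary means every file lines up with the dataset.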
Performing Basic Analysis
To perform a basic analysis, run the following command:
explainaboard --task qa-open-domain --dataset natural_questions_comp_gen --system-outputs ./data/system_outputs/qa_open_domain/test.dpr.nq.txt > report.json
where
- --task: denotes the task name; you can find all supported task names here.
- --system-outputs: denotes the path(s) of the system output file(s). Multiple paths should be separated by spaces, for example, system1 system2.
- --dataset: denotes the dataset name.
- report.json: the generated analysis file in JSON format. You can find the file here. Tip: use a JSON viewer like this one for easier interpretation.
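To run the same analysis from a Python script, for example over several systems at once, one option is to call the command-line tool through the standard subprocess module. This sketch uses only the flags shown in the command above and assumes the explainaboard executable is on your PATH; build_explainaboard_command and run_analysis are hypothetical helpers. Passing each system output path as a separate list element has the same effect as separating them with spaces in the shell, and the report is written from standard output, mirroring the > report.json redirection.

```python
import subprocess
from typing import List


def build_explainaboard_command(task: str, dataset: str, system_outputs: List[str]) -> List[str]:
    """Assemble the CLI arguments; multiple system outputs become separate arguments."""
    if not system_outputs:
        raise ValueError("at least one system output file is required")
    return ["explainaboard", "--task", task, "--dataset", dataset,
            "--system-outputs", *system_outputs]


def run_analysis(cmd: List[str], report_path: str = "report.json") -> None:
    """Run the command and write its standard output to report_path (like `> report.json`)."""
    with open(report_path, "w", encoding="utf-8") as report:
        # check=True raises CalledProcessError if explainaboard exits with a non-zero status
        subprocess.run(cmd, stdout=report, check=True)


if __name__ == "__main__":
    cmd = build_explainaboard_command(
        "qa-open-domain",
        "natural_questions_comp_gen",
        ["./data/system_outputs/qa_open_domain/test.dpr.nq.txt"],
    )
    print(" ".join(cmd))
    # run_analysis(cmd)  # uncomment once explainaboard is installed
```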