RAGChecker Benchmark

December 3, 2024 · View on GitHub

Please take the following steps to get the benchmark dataset.

Please login to BioASQ, then do the following:

In Datasets for task a, download allMeSH_2022.zip throuth the entry of Training v.2022 (txt) in the table. Unzip it to raw_data/bioasq/allMeSH_2022.json. Note that this JSON file is of 27G large, please make sure you have enough disk space.
In Datasets for task b, download files throuth the links column Test data in the table from 2014 to 2023. Unzip the files and put the JSON files into the folder raw_data/bioasq:
- {2~9}B{1~5}_golden.json
- 10B{1~6}_golden.json
- 11B{1~4}_golden.json

Get access to NovelQA dataset: https://huggingface.co/datasets/NovelQA/NovelQA . Then login your huggingface account:

pip install huggingface_hub
huggingface-cli login

Run the following script, the benchmark dataset will be processed to the folder processed_data:

sh data_process.sh