new_dataset.md
March 24, 2024
Adding New Datasets
- Introducing Official Datasets
  - First, place the official dataset files in the `RawData/{dataset_name}` directory. This is the first step of data preparation and keeps the original data intact and accessible. Note: it is recommended to use "-" instead of "_" in `dataset_name` and `task_name`.
- Writing the make_dataset.py Script
  - Create a new folder for your dataset in the `datasets/` directory: `datasets/{dataset_name}/`.
  - Refer to the make_dataset introduction tutorial and write the corresponding data creation logic in `datasets/{dataset_name}/make_dataset.py`.
  - Run the `make_dataset.py` script to convert the raw data into the format required by the UltraEval framework. After conversion, every task in your dataset appears in the `datasets/{dataset_name}/data/` directory as an UltraEval-format `.jsonl` file. Each file is named after its task, in the format `task_name.jsonl`.
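
The steps above can be sketched as a small conversion script. This is a minimal illustration, not the UltraEval reference implementation: it assumes each raw file is a JSON array whose items have `question` and `answer` keys, and that one raw file corresponds to one task. Those keys, the output record layout, and the helper names `normalize_name`, `write_task`, and `make_dataset` are hypothetical; the actual record fields are defined in the make_dataset introduction tutorial.

```python
import json
from pathlib import Path


def normalize_name(name):
    """Replace "_" with "-", following the naming recommendation above."""
    return name.replace("_", "-")


def write_task(records, out_dir, task_name):
    """Write one task's records to {out_dir}/{task_name}.jsonl, one JSON object per line."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{normalize_name(task_name)}.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out_path


def make_dataset(raw_dir, out_dir):
    """Convert every raw *.json file in raw_dir into a task .jsonl file in out_dir.

    Assumption (hypothetical raw layout): each raw file holds a JSON array of
    objects with "question" and "answer" keys, and the file stem is the task name.
    """
    written = []
    for raw_file in sorted(raw_dir.glob("*.json")):
        with raw_file.open(encoding="utf-8") as f:
            rows = json.load(f)
        # Placeholder record layout; replace with the fields the tutorial specifies.
        records = [{"question": r["question"], "answer": r["answer"]} for r in rows]
        written.append(write_task(records, out_dir, raw_file.stem))
    return written
```

Under these assumptions, calling `make_dataset(Path("RawData/my-dataset"), Path("datasets/my-dataset/data"))` would produce one `.jsonl` file per raw file, with any "_" in the task name replaced by "-".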