README.md
March 20, 2025 ยท View on GitHub
๐ BigO(Bench) Dataset ๐
๐ Overview
๐ Getting Started with the data (back to top) (back to root)
The data is available as a Huggingface dataset. You can directly download it from the HuggingFace website, or use the CLI
cd /data
huggingface-cli download facebook/BigOBench --repo-type dataset --local-dir ./temp_dir
mv ./temp_dir/data/* .
rm -r ./temp_dir
You will find 5 files, whose content is detailed below. In addition, example_input_format_1.jsonl and example_input_format_2.jsonl are toy exampled to be used for the complexity framework, as detailed in the module src/complexity.
๐ฅ problem_and_human_solutions_list.jsonl (back to top) (back to root)
This gathers the general information about the coding problems and human solutions on which BigO(Bench) is built, from the problem descriptions to the public and private tests. There is also a lot of metadata used for postprocessing the results of BigO(Bench) and further analyze where models are strong and where they struggle.
In addition, you will find added data from BigO(Bench), first in the field dataclass that shares added value about the inputs of a problem, and the code of the dataclass corresponding to this problem as generated by ourselves using a LLM. Some metadata from the complexity framework is also available in complexity_framework, such as the fail rates and inputs metadata as parsed by the framework itself (which can differ from what the dataclass parsed):
-
dataclass.input_type_listgives the list of arguments, listed as their data type, as inferred by a LLM. This comes along the dataclass code, that is also inferred by the LLM. The LLM uses the problem description and a reference solution to try to understand the data types. This field was used to create filters on the problems and solutions, to create the base dataset of BigO(Bench). -
complexity_framework.measures_set_id_to_input_properties.framework_input_typeis instead the data type of each argument as inferred by the framework. The framework uses the dataclass code generated by the LLM to split the input stream (string that represents all the inputs of the problem all together), and then each input is parsed into a data type using rules. This means that sometimes a LLM can correctly understand that there are two arguments, but mistake them for string arguments, whereas the framework will use the LLM-generated dataclass to split the input stream into the two arguments, but using rules will correctly infer that each argument is an integer. To understand fully the complexity framework outputs, use this field. The previous one was only used for filters on the base Code Contests dataset, but were not used within the complexity framework itself to generate the complexity output.
problem_and_human_solutions_list: dict list
problem_id: strproblem_name: strdescription: dicttext: stris_description_translated: booluntranslated_text: str
correct_solution_list: dict listsolution_id: strsolution_code: str
data_source: strsource_specific_limits: dicttime_limit: dictseconds: intnanos: int
memory_limit_bytes: int
codeforces_specific_metadata: dictcf_contest_id: intcf_index: strcf_points: floatcf_rating: intcf_tags: str listdifficulty: str
tests: dictpublic_tests: dict listinput: stroutput: str
private_tests: dict listinput: stroutput: str
generated_tests: dict listinput: stroutput: str
human_accuracy_rate: floatdataclass: dictdataclass_code: strinput_type_list: str listnumber_inputs: int
complexity_framework: dicttime_complexity_fail_ratespace_complexity_fail_ratetime_or_space_complexity_fail_ratemeasures_set_id_to_input_properties: dict- (measures_set_id) str: dict
input_id: strframework_input_type: strinput_dimension: int
- (measures_set_id) str: dict
๐ฅ complexity_labels_light.jsonl (back to top) (back to root)
Light outputs of the complexity framework, as detailed in the module src/complexity, when run on all problems and solutions from problem_and_human_solutions_list.jsonl.
complexity_labels_light: dict list
problem_id: strproblem_name: strsolution_id: strtime_complexity_inferred: strspace_complexity_inferred: strtime_curve_coefficient: floatspace_curve_coefficient: float
๐ฅ complexity_labels_full.jsonl (back to top) (back to root)
Full outputs of the complexity framework, as detailed in the module src/complexity, when run on all problems and solutions from problem_and_human_solutions_list.jsonl.
complexity_labels_full_n-m: dict list
problem_id: strproblem_name: strsolution_id: strtime_complexity_inferred: strspace_complexity_inferred: strtime_curve_coefficient: floatspace_curve_coefficient: floatquery_dataclass_code: strquery_code: strquery_inputs_example: strruntime_measures: dict listmeasures_set_id: strmeasures_per_expansion_multiplier: dict listexpansion_multiplier: intmeasures_per_expansion_method: dict listvalue_list: float listexpansion_method: strmeasures_set_id_list: str listmeasures_priority: int
memory_footprint_measures: dict listmeasures_set_id: strmeasures_per_expansion_multiplier: dict listexpansion_multiplier: intmeasures_per_expansion_method: dict listvalue_list: float listexpansion_method: strmeasures_set_id_list: str listmeasures_priority: int
๐ฅ time_complexity_test_set.jsonl (back to top) (back to root)
The time complexity test set is made out of 311 problems and 640 corresponding solutions covering 11 different classes (the most represented ones being O(n), O(n.log(n)), O(n2), O(1), O(n รm) and the least represented O((n + m)log(n + m))).
It was created from problem_and_human_solutions_list.jsonl and the complexity framework outputs on this dataset, complexity_labels_full.jsonl. Filtering was applied to nail down the test set of problems and solutions.
time_complexity_test_set: dict list
problem_name: strproblem_id: strsolution_id: strdescription: strsolution_code: strdataclass_code: strinputs_example: strtime_complexity_inferred: strtime_curve_coefficient: floattests: dictpublic_tests: dict listinput: stroutput: str
private_tests: dict listinput: stroutput: str
generated_tests: dict listinput: stroutput: str
problem_time_curve_coefficient_list: float list
๐ฅ space_complexity_test_set.jsonl (back to top) (back to root)
The space complexity test set consists in 308 problems and 636 solutions covering 5 different classes (by order of popularity O(n), O(1), O(n**2), O(n + m), O(nรm)).
It was created from problem_and_human_solutions_list.jsonl and the complexity framework outputs on this dataset, complexity_labels_full.jsonl. Filtering was applied to nail down the test set of problems and solutions.
space_complexity_test_set: dict list
problem_name: strproblem_id: strsolution_id: strdescription: strsolution_code: strdataclass_code: strinputs_example: strspace_complexity_inferred: strspace_curve_coefficient: floattests: dictpublic_tests: dict listinput: stroutput: str
private_tests: dict listinput: stroutput: str
generated_tests: dict listinput: stroutput: str
problem_space_curve_coefficient_list: float list
License (back to top) (back to root)
The majority of BigO(Bench) is licensed under CC-BY-NC (see LICENCE), however portions of the project are available under separate license terms: https://github.com/pberkes/big_O is licensed under the BSD-3 license.
๐ Citation (back to top) (back to root)
If you find our project useful and/or are using its data, please cite our paper:
@misc{chambon2025bigobenchllmsgenerate,
title={BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?},
author={Pierre Chambon and Baptiste Roziere and Benoit Sagot and Gabriel Synnaeve},
year={2025},
eprint={2503.15242},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.15242},
}