README.md

March 20, 2025 · View on GitHub

📊 BigO(Bench) Dataset 📊

👋 Overview

📋 Getting Started with the data
🔥 problem_and_human_solutions_list.jsonl
🔥 complexity_labels_light.jsonl
🔥 complexity_labels_full.jsonl
🔥 time_complexity_test_set.jsonl
🔥 space_complexity_test_set.jsonl
License
📝 Citation

📋 Getting Started with the data _⁽back to top) (back to root)

The data is available as a Huggingface dataset. You can directly download it from the HuggingFace website, or use the CLI

cd /data
huggingface-cli download facebook/BigOBench --repo-type dataset --local-dir ./temp_dir
mv ./temp_dir/data/* .
rm -r ./temp_dir

You will find 5 files, whose content is detailed below. In addition, example_input_format_1.jsonl and example_input_format_2.jsonl are toy exampled to be used for the complexity framework, as detailed in the module src/complexity.

🔥 `problem_and_human_solutions_list.jsonl` _⁽back to top) (back to root)

This gathers the general information about the coding problems and human solutions on which BigO(Bench) is built, from the problem descriptions to the public and private tests. There is also a lot of metadata used for postprocessing the results of BigO(Bench) and further analyze where models are strong and where they struggle.

In addition, you will find added data from BigO(Bench), first in the field dataclass that shares added value about the inputs of a problem, and the code of the dataclass corresponding to this problem as generated by ourselves using a LLM. Some metadata from the complexity framework is also available in complexity_framework, such as the fail rates and inputs metadata as parsed by the framework itself (which can differ from what the dataclass parsed):

dataclass.input_type_list gives the list of arguments, listed as their data type, as inferred by a LLM. This comes along the dataclass code, that is also inferred by the LLM. The LLM uses the problem description and a reference solution to try to understand the data types. This field was used to create filters on the problems and solutions, to create the base dataset of BigO(Bench).
complexity_framework.measures_set_id_to_input_properties.framework_input_type is instead the data type of each argument as inferred by the framework. The framework uses the dataclass code generated by the LLM to split the input stream (string that represents all the inputs of the problem all together), and then each input is parsed into a data type using rules. This means that sometimes a LLM can correctly understand that there are two arguments, but mistake them for string arguments, whereas the framework will use the LLM-generated dataclass to split the input stream into the two arguments, but using rules will correctly infer that each argument is an integer. To understand fully the complexity framework outputs, use this field. The previous one was only used for filters on the base Code Contests dataset, but were not used within the complexity framework itself to generate the complexity output.

problem_and_human_solutions_list: dict list

problem_id: str
problem_name: str
description: dict
- text: str
- is_description_translated: bool
- untranslated_text: str
correct_solution_list: dict list
- solution_id: str
- solution_code: str
data_source: str
source_specific_limits: dict
- time_limit: dict
  - seconds: int
  - nanos: int
- memory_limit_bytes: int
codeforces_specific_metadata: dict
- cf_contest_id: int
- cf_index: str
- cf_points: float
- cf_rating: int
- cf_tags: str list
- difficulty: str
tests: dict
- public_tests: dict list
  - input: str
  - output: str
- private_tests: dict list
  - input: str
  - output: str
- generated_tests: dict list
  - input: str
  - output: str
human_accuracy_rate: float
dataclass: dict
- dataclass_code: str
- input_type_list: str list
- number_inputs: int
complexity_framework: dict
- time_complexity_fail_rate
- space_complexity_fail_rate
- time_or_space_complexity_fail_rate
- measures_set_id_to_input_properties: dict
  - (measures_set_id) str: dict
    - input_id: str
    - framework_input_type: str
    - input_dimension: int

🔥 `complexity_labels_light.jsonl` _⁽back to top) (back to root)

Light outputs of the complexity framework, as detailed in the module src/complexity, when run on all problems and solutions from problem_and_human_solutions_list.jsonl.

complexity_labels_light: dict list

problem_id: str
problem_name: str
solution_id: str
time_complexity_inferred: str
space_complexity_inferred: str
time_curve_coefficient: float
space_curve_coefficient: float

🔥 `complexity_labels_full.jsonl` _⁽back to top) (back to root)

Full outputs of the complexity framework, as detailed in the module src/complexity, when run on all problems and solutions from problem_and_human_solutions_list.jsonl.

complexity_labels_full_n-m: dict list

problem_id: str
problem_name: str
solution_id: str
time_complexity_inferred: str
space_complexity_inferred: str
time_curve_coefficient: float
space_curve_coefficient: float
query_dataclass_code: str
query_code: str
query_inputs_example : str
runtime_measures: dict list
- measures_set_id: str
- measures_per_expansion_multiplier: dict list
  - expansion_multiplier: int
  - measures_per_expansion_method: dict list
    - value_list: float list
    - expansion_method: str
    - measures_set_id_list: str list
    - measures_priority: int
memory_footprint_measures: dict list
- measures_set_id: str
- measures_per_expansion_multiplier: dict list
  - expansion_multiplier: int
  - measures_per_expansion_method: dict list
    - value_list: float list
    - expansion_method: str
    - measures_set_id_list: str list
    - measures_priority: int

🔥 `time_complexity_test_set.jsonl` _⁽back to top) (back to root)

The time complexity test set is made out of 311 problems and 640 corresponding solutions covering 11 different classes (the most represented ones being O(n), O(n.log(n)), O(n2), O(1), O(n ×m) and the least represented O((n + m)log(n + m))). It was created from problem_and_human_solutions_list.jsonl and the complexity framework outputs on this dataset, complexity_labels_full.jsonl. Filtering was applied to nail down the test set of problems and solutions.

time_complexity_test_set: dict list

problem_name: str
problem_id: str
solution_id: str
description: str
solution_code: str
dataclass_code: str
inputs_example: str
time_complexity_inferred: str
time_curve_coefficient: float
tests: dict
- public_tests: dict list
  - input: str
  - output: str
- private_tests: dict list
  - input: str
  - output: str
- generated_tests: dict list
  - input: str
  - output: str
problem_time_curve_coefficient_list: float list

🔥 `space_complexity_test_set.jsonl` _⁽back to top) (back to root)

The space complexity test set consists in 308 problems and 636 solutions covering 5 different classes (by order of popularity O(n), O(1), O(n**2), O(n + m), O(n×m)). It was created from problem_and_human_solutions_list.jsonl and the complexity framework outputs on this dataset, complexity_labels_full.jsonl. Filtering was applied to nail down the test set of problems and solutions.

space_complexity_test_set: dict list

problem_name: str
problem_id: str
solution_id: str
description: str
solution_code: str
dataclass_code: str
inputs_example: str
space_complexity_inferred: str
space_curve_coefficient: float
tests: dict
- public_tests: dict list
  - input: str
  - output: str
- private_tests: dict list
  - input: str
  - output: str
- generated_tests: dict list
  - input: str
  - output: str
problem_space_curve_coefficient_list: float list

License _⁽back to top) (back to root)

The majority of BigO(Bench) is licensed under CC-BY-NC (see LICENCE), however portions of the project are available under separate license terms: https://github.com/pberkes/big_O is licensed under the BSD-3 license.

📝 Citation _⁽back to top) (back to root)

If you find our project useful and/or are using its data, please cite our paper:

@misc{chambon2025bigobenchllmsgenerate,
      title={BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?}, 
      author={Pierre Chambon and Baptiste Roziere and Benoit Sagot and Gabriel Synnaeve},
      year={2025},
      eprint={2503.15242},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.15242}, 
}

📊 BigO(Bench) Dataset 📊

👋 Overview

📋 Getting Started with the data (back to top) (back to root)

🔥 problem_and_human_solutions_list.jsonl (back to top) (back to root)

🔥 complexity_labels_light.jsonl (back to top) (back to root)

🔥 complexity_labels_full.jsonl (back to top) (back to root)

🔥 time_complexity_test_set.jsonl (back to top) (back to root)

🔥 space_complexity_test_set.jsonl (back to top) (back to root)

License (back to top) (back to root)

📝 Citation (back to top) (back to root)