README.md

March 20, 2025 ยท View on GitHub

๐Ÿ“Š BigO(Bench) Dataset ๐Ÿ“Š

๐Ÿ‘‹ Overview

๐Ÿ“‹ Getting Started with the data (back to top) (back to root)

The data is available as a Huggingface dataset. You can directly download it from the HuggingFace website, or use the CLI

cd /data
huggingface-cli download facebook/BigOBench --repo-type dataset --local-dir ./temp_dir
mv ./temp_dir/data/* .
rm -r ./temp_dir

You will find 5 files, whose content is detailed below. In addition, example_input_format_1.jsonl and example_input_format_2.jsonl are toy exampled to be used for the complexity framework, as detailed in the module src/complexity.

๐Ÿ”ฅ problem_and_human_solutions_list.jsonl (back to top) (back to root)

This gathers the general information about the coding problems and human solutions on which BigO(Bench) is built, from the problem descriptions to the public and private tests. There is also a lot of metadata used for postprocessing the results of BigO(Bench) and further analyze where models are strong and where they struggle.

In addition, you will find added data from BigO(Bench), first in the field dataclass that shares added value about the inputs of a problem, and the code of the dataclass corresponding to this problem as generated by ourselves using a LLM. Some metadata from the complexity framework is also available in complexity_framework, such as the fail rates and inputs metadata as parsed by the framework itself (which can differ from what the dataclass parsed):

  • dataclass.input_type_list gives the list of arguments, listed as their data type, as inferred by a LLM. This comes along the dataclass code, that is also inferred by the LLM. The LLM uses the problem description and a reference solution to try to understand the data types. This field was used to create filters on the problems and solutions, to create the base dataset of BigO(Bench).

  • complexity_framework.measures_set_id_to_input_properties.framework_input_type is instead the data type of each argument as inferred by the framework. The framework uses the dataclass code generated by the LLM to split the input stream (string that represents all the inputs of the problem all together), and then each input is parsed into a data type using rules. This means that sometimes a LLM can correctly understand that there are two arguments, but mistake them for string arguments, whereas the framework will use the LLM-generated dataclass to split the input stream into the two arguments, but using rules will correctly infer that each argument is an integer. To understand fully the complexity framework outputs, use this field. The previous one was only used for filters on the base Code Contests dataset, but were not used within the complexity framework itself to generate the complexity output.

problem_and_human_solutions_list: dict list

  • problem_id: str
  • problem_name: str
  • description: dict
    • text: str
    • is_description_translated: bool
    • untranslated_text: str
  • correct_solution_list: dict list
    • solution_id: str
    • solution_code: str
  • data_source: str
  • source_specific_limits: dict
    • time_limit: dict
      • seconds: int
      • nanos: int
    • memory_limit_bytes: int
  • codeforces_specific_metadata: dict
    • cf_contest_id: int
    • cf_index: str
    • cf_points: float
    • cf_rating: int
    • cf_tags: str list
    • difficulty: str
  • tests: dict
    • public_tests: dict list
      • input: str
      • output: str
    • private_tests: dict list
      • input: str
      • output: str
    • generated_tests: dict list
      • input: str
      • output: str
  • human_accuracy_rate: float
  • dataclass: dict
    • dataclass_code: str
    • input_type_list: str list
    • number_inputs: int
  • complexity_framework: dict
    • time_complexity_fail_rate
    • space_complexity_fail_rate
    • time_or_space_complexity_fail_rate
    • measures_set_id_to_input_properties: dict
      • (measures_set_id) str: dict
        • input_id: str
        • framework_input_type: str
        • input_dimension: int

๐Ÿ”ฅ complexity_labels_light.jsonl (back to top) (back to root)

Light outputs of the complexity framework, as detailed in the module src/complexity, when run on all problems and solutions from problem_and_human_solutions_list.jsonl.

complexity_labels_light: dict list

  • problem_id: str
  • problem_name: str
  • solution_id: str
  • time_complexity_inferred: str
  • space_complexity_inferred: str
  • time_curve_coefficient: float
  • space_curve_coefficient: float

๐Ÿ”ฅ complexity_labels_full.jsonl (back to top) (back to root)

Full outputs of the complexity framework, as detailed in the module src/complexity, when run on all problems and solutions from problem_and_human_solutions_list.jsonl.

complexity_labels_full_n-m: dict list

  • problem_id: str
  • problem_name: str
  • solution_id: str
  • time_complexity_inferred: str
  • space_complexity_inferred: str
  • time_curve_coefficient: float
  • space_curve_coefficient: float
  • query_dataclass_code: str
  • query_code: str
  • query_inputs_example : str
  • runtime_measures: dict list
    • measures_set_id: str
    • measures_per_expansion_multiplier: dict list
      • expansion_multiplier: int
      • measures_per_expansion_method: dict list
        • value_list: float list
        • expansion_method: str
        • measures_set_id_list: str list
        • measures_priority: int
  • memory_footprint_measures: dict list
    • measures_set_id: str
    • measures_per_expansion_multiplier: dict list
      • expansion_multiplier: int
      • measures_per_expansion_method: dict list
        • value_list: float list
        • expansion_method: str
        • measures_set_id_list: str list
        • measures_priority: int

๐Ÿ”ฅ time_complexity_test_set.jsonl (back to top) (back to root)

The time complexity test set is made out of 311 problems and 640 corresponding solutions covering 11 different classes (the most represented ones being O(n), O(n.log(n)), O(n2), O(1), O(n ร—m) and the least represented O((n + m)log(n + m))). It was created from problem_and_human_solutions_list.jsonl and the complexity framework outputs on this dataset, complexity_labels_full.jsonl. Filtering was applied to nail down the test set of problems and solutions.

time_complexity_test_set: dict list

  • problem_name: str
  • problem_id: str
  • solution_id: str
  • description: str
  • solution_code: str
  • dataclass_code: str
  • inputs_example: str
  • time_complexity_inferred: str
  • time_curve_coefficient: float
  • tests: dict
    • public_tests: dict list
      • input: str
      • output: str
    • private_tests: dict list
      • input: str
      • output: str
    • generated_tests: dict list
      • input: str
      • output: str
  • problem_time_curve_coefficient_list: float list

๐Ÿ”ฅ space_complexity_test_set.jsonl (back to top) (back to root)

The space complexity test set consists in 308 problems and 636 solutions covering 5 different classes (by order of popularity O(n), O(1), O(n**2), O(n + m), O(nร—m)). It was created from problem_and_human_solutions_list.jsonl and the complexity framework outputs on this dataset, complexity_labels_full.jsonl. Filtering was applied to nail down the test set of problems and solutions.

space_complexity_test_set: dict list

  • problem_name: str
  • problem_id: str
  • solution_id: str
  • description: str
  • solution_code: str
  • dataclass_code: str
  • inputs_example: str
  • space_complexity_inferred: str
  • space_curve_coefficient: float
  • tests: dict
    • public_tests: dict list
      • input: str
      • output: str
    • private_tests: dict list
      • input: str
      • output: str
    • generated_tests: dict list
      • input: str
      • output: str
  • problem_space_curve_coefficient_list: float list

License (back to top) (back to root)

The majority of BigO(Bench) is licensed under CC-BY-NC (see LICENCE), however portions of the project are available under separate license terms: https://github.com/pberkes/big_O is licensed under the BSD-3 license.

๐Ÿ“ Citation (back to top) (back to root)

If you find our project useful and/or are using its data, please cite our paper:

@misc{chambon2025bigobenchllmsgenerate,
      title={BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?}, 
      author={Pierre Chambon and Baptiste Roziere and Benoit Sagot and Gabriel Synnaeve},
      year={2025},
      eprint={2503.15242},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.15242}, 
}