Dataset

December 1, 2022

  • We share the dataset split we used in our experiments: data.zip and parallel_functions.zip.
    • Download them by running bash download.sh
  • data.zip contains the AVATAR dataset, which has 9515 examples in total, drawn from the following sources:
    • CodeForces - 2193
    • AtCoder - 871
    • AIZU - 1043
    • CodeJam - 120
    • GeeksforGeeks - 5019
    • LeetCode - 107
    • ProjectEuler - 162
  • We extract a collection of parallel standalone functions (parallel_functions.zip) from AVATAR.
    • It consists of 3391 examples for training.
    • For validation and testing, we use the datasets collected from GeeksforGeeks and released with TransCoder.
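
As a quick sanity check on the per-source counts listed for data.zip, the sketch below tallies them and prints each source's share of the total. The counts are copied from the list above; the names AVATAR_SOURCES and summarize are hypothetical helpers introduced only for this illustration and are not part of the released scripts.

```python
# Per-source example counts, copied from the list of AVATAR sources above.
# AVATAR_SOURCES and summarize() are illustrative names, not part of the repo.
AVATAR_SOURCES = {
    "CodeForces": 2193,
    "AtCoder": 871,
    "AIZU": 1043,
    "CodeJam": 120,
    "GeeksforGeeks": 5019,
    "LeetCode": 107,
    "ProjectEuler": 162,
}


def summarize(counts):
    """Return the total number of examples and each source's percentage share."""
    total = sum(counts.values())
    shares = {name: 100.0 * n / total for name, n in counts.items()}
    return total, shares


total, shares = summarize(AVATAR_SOURCES)

# The per-source counts should add up to the stated dataset size.
assert total == 9515

# Print sources from largest to smallest share.
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {AVATAR_SOURCES[name]:5d}  {pct:5.1f}%")
```

The assertion confirms that the seven counts sum to the stated total of 9515, and the printout shows that GeeksforGeeks contributes more than half of the examples.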