yabloco-benchmark
January 24, 2025
Directories
bench
The benchmark tables for the test and dev sets, accompanied by JSON files with links from every function to the functions it calls (next).
data_extraction
The pipeline used for collecting the benchmark.
- tables contains tables with test coverage, JSON files with function call graphs, repository statistics, and a docstring labeling judgement,
- commit_dates.py extracts commit dates for functions from repositories,
- db_stat.py collects a table of functions from a function call graph,
- merge_commits.py merges commit dates into the tables,
- pipeline.sh runs the aforementioned steps to produce a table for a repository,
- merge_test_cov.py merges test coverage hits into the tables,
- generate_benchmark.py produces the benchmark and dev tables.
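As a rough illustration of what the db_stat.py and merge_commits.py steps do conceptually, the sketch below derives a table of functions from a call graph and joins commit dates into it. The graph layout (a function name mapped to a list of callee names), the column names, and the helpers build_function_table and merge_commit_dates are assumptions made for illustration; they are not the repository's actual schema or code.

```python
from typing import Dict, List, Optional

# Hypothetical call graph: each function maps to the functions it calls.
call_graph: Dict[str, List[str]] = {
    "main": ["parse_args", "run"],
    "run": ["load", "process"],
    "load": [],
    "process": ["load"],
    "parse_args": [],
}

# Hypothetical commit dates keyed by function name (ISO 8601 strings).
commit_dates: Dict[str, str] = {
    "main": "2021-03-01",
    "run": "2021-03-05",
    "load": "2020-11-20",
}


def build_function_table(graph: Dict[str, List[str]]) -> List[dict]:
    """Return one row per function seen in the graph, as caller or callee."""
    names = set(graph)
    for callees in graph.values():
        names.update(callees)
    return [
        {"function": name, "n_callees": len(graph.get(name, []))}
        for name in sorted(names)
    ]


def merge_commit_dates(rows: List[dict], dates: Dict[str, str]) -> List[dict]:
    """Left-join commit dates onto the table; missing dates become None."""
    merged = []
    for row in rows:
        date: Optional[str] = dates.get(row["function"])
        merged.append({**row, "commit_date": date})
    return merged


table = merge_commit_dates(build_function_table(call_graph), commit_dates)
for row in table:
    print(row)
```

Functions without a known commit date are kept with an empty value rather than dropped, so later merge steps still see every function in the graph.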
finetune
Notebooks and Dockerfiles used to fine-tune models on the training set. The training script is based on the fsdp_qlora project.
streamlit_app
A Streamlit tool to evaluate models and visualize generated and original code as well as result tables. The directory contains a separate README file with instructions for running it.
train_data
- prev contains JSON files with links from every function to the functions that call it, produced from function call graphs,
- train_functions contains tables of all train functions (similar to the tables in bench) used for fine-tuning,
- db_stat_prev.py produces the prev tables,
- generate.py produces the train tables.
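The relationship between the next links (function to its callees) and the prev links (function to its callers) can be sketched as a simple graph inversion. This is a minimal example under an assumed layout, a dictionary from function name to a list of function names; the helper invert_call_graph and the sample data are hypothetical and do not reproduce db_stat_prev.py.

```python
import json
from typing import Dict, List, Set


def invert_call_graph(next_links: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn caller -> callees links into callee -> callers links."""
    prev_links: Dict[str, Set[str]] = {}
    for caller, callees in next_links.items():
        # Keep functions that nobody calls, with an empty caller list.
        prev_links.setdefault(caller, set())
        for callee in callees:
            prev_links.setdefault(callee, set()).add(caller)
    # Sort names and callers so the output is deterministic.
    return {name: sorted(prev_links[name]) for name in sorted(prev_links)}


# Hypothetical "next" links: each function maps to the functions it calls.
next_links = {
    "main": ["run", "parse_args"],
    "run": ["load"],
    "load": [],
}

prev_links = invert_call_graph(next_links)
print(json.dumps(prev_links, indent=2))
```

Every function appears in the result even when it has no callers, which keeps the prev and next files aligned on the same set of function names.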
Cite
YABLoCo: Yet Another Benchmark for Long Context Code Generation