TritonBench

March 3, 2025 · View on GitHub

TritonBench features two distinct channels: TritonBench-G and TritonBench-T, each with its own evaluation framework. For detailed information, refer to the paper TRITONBENCH: Benchmarking Large Language Model Capabilities for Generating Triton Operators.

Data

TritonBench-G offers two versions of Alpaca-format instructions:
- Simple instruction: TritonBench_G_simp_alpac_v1.json
- Complex instruction: TritonBench_G_comp_alpac_v1.json
It also includes executable folders (TritonBench_G_v1) and associated statistics (TritonBench_G_v1.json).
TritonBench-T offers two versions of Alpaca-format instructions:
- Simple instruction: TritonBench_T_simp_alpac_v1.json
- Complex instruction: TritonBench_T_comp_alpac_v1.json
It also includes executable folders (TritonBench_T_v1) and associated statistics (TritonBench_T_v1.json).
Additionally, there are two sets of filtered GitHub data:
- train_crawl.json (4024 entries) – de-duplicated using BERT score similarity.
- train_synth.json (4133 entries) – data synthesized using Jiuci.
The combined 8k dataset can be used for RAG (Retrieval-Augmented Generation).

LLM Generated

We also provide the output results from all major models used in the paper.

Python Environment

triton = 3.1.0
torch >= 2.5.1
After installation, update the py_interpreter paths in eval_G and eval_T.

Evaluation Process

TritonBench-G

Code Similarity Evaluation: First, use CodeBLEU to evaluate code similarity. For detailed instructions, refer to ../readme_4similarity.md.
Execution Accuracy:
- Run 0_call_acc.py with the following command:
```
0_call_acc.py --source source/path/or/folder --target target/path/or/folder --GPUs [0,1,2,3]
```
- Multiple GPUs can accelerate the execution.

Execution Performance:

Run 1_exe_acc.py with:

1_exe_acc.py --folder root/of/multiple/folders/or/folder --GPUs [0,1,2,3]

Efficiency:

First run the correctly executable operators and get the performance:

cd performance_metrics/perf_G
python run_bench/write_file.py --input_folder_path /folder/of/pyfiles --results_path /folder/of/output/results
python run_bench/multiprocess_gpu_run.py

Finally, run 2_efficiency.py to evaluate the performance:

cd EVAL/eval_G
python 2_efficiency.py --gen_folder /folder/of/output/results

TritonBench-T

For TritonBench-T, there is no code similarity evaluation. Only call accuracy, execution accuracy, and speedup are assessed. The process is similar:

Run 0_call_acc.py as above:

0_call_acc.py --source source/path/or/folder --target target/path/or/folder --GPUs [0,1,2,3]

Run 1_exe_acc.py with the appropriate folders and GPUs:

1_exe_acc.py --folder root/of/multiple/folders/or/folder --GPUs [0,1,2,3]

Get the performance and evaluate

First run the correctly executable operators and get the performance:

cd performance_metrics/perf_T
python run_bench/write_file.py --input_folder_path /folder/of/pyfiles --results_path /folder/of/output/results
python run_bench/multiprocess_gpu_run.py

Finally, run 2_efficiency.py to evaluate the performance:

cd EVAL/eval_T
python 2_efficiency.py --gen_folder /folder/of/output/results

Note: Ensure that accuracy and efficiency evaluations are performed sequentially.

Hugging face

We have published our dataset on Hugging Face.

📩 Contact Us

If you have any questions, feel free to reach out to us at:
✉️ Email: [qshi9510@gmail.com]