Evaluation for CodeApex

September 4, 2023

Open our benchmark website for evaluation.

1. Register

Click the register button. Enter your username (must be unique), password, password confirmation, and email address, then click the "Send Verification Code" button and wait for the emailed verification code (valid for 5 minutes). Enter the code and complete registration.

2. Login

After registering, click the login button to sign in.

3. Leaderboard

The leaderboard page contains a Programming Comprehension Leaderboard in Chinese and one in English. In both, the CU score stands for Conceptual Understanding, the CR score for Commonsense Reasoning, and the MCR score for Multi-hop Reasoning. The Code Generation Leaderboards are also available.

4. Submit

1) Answer Format

Users are responsible for the correctness and compliance of their submissions. The answers generated by the LLM must be submitted as a JSON file. The file consists of three dictionaries, in order, holding the answers for CU, CR, and MCR; within each dictionary, answers are sorted by ID. We provide an example.json for reference.
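The layout described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the top-level list, the string IDs, the letter answers, and the helper name build_submission are all hypothetical and not taken from the repository. The provided example.json remains the authoritative reference for the exact format.

```python
import json


def build_submission(cu, cr, mcr):
    """Return three dicts (CU, CR, MCR), each ordered by numeric question ID.

    Assumption: IDs are numeric strings and the file holds the three
    dictionaries as a top-level list. Check example.json for the real layout.
    """
    def sort_by_id(answers):
        # Python dicts keep insertion order, so inserting in sorted order
        # yields a dictionary that serializes sorted by ID.
        return {str(k): answers[k] for k in sorted(answers, key=int)}

    return [sort_by_id(cu), sort_by_id(cr), sort_by_id(mcr)]


# Hypothetical answers keyed by question ID (not real benchmark data).
cu = {"2": "B", "1": "A"}
cr = {"10": "C", "3": "D"}
mcr = {"5": "A"}

submission = build_submission(cu, cr, mcr)
text = json.dumps(submission, indent=2)
```

Sorting with key=int rather than plain string order keeps ID 3 ahead of ID 10, which string comparison would reverse.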

Your input should be an npy file containing your answers to the test cases. Run deal_answer.py to generate the JSON file for evaluation:

python deal_answer.py

2) How to Submit

Log in and open the submit page. First click the "choose file" button and select your file, then enter the name, author, and description of your model. Click the upload button, then click the process button, and the score of your model will appear. You are limited to 5 submissions per day.

Due to time constraints, there are some issues with the website. Please bear with us.