Evaluation for CodeApex
September 4, 2023
Open our benchmark website for evaluation.
1. Register
Click the register button. Enter your username (which must be unique), password, password confirmation, and email, then click the "Send Verification Code" button and wait for the emailed verification code (valid for 5 minutes). Enter the code and register.
2. Login
After registering, click the login button to sign in.
3. Leaderboard
The leaderboard page contains a Programming Comprehension Leaderboard in Chinese and one in English. In both, CU stands for Conceptual Understanding, CR for Commonsense Reasoning, and MCR for Multi-hop Reasoning. The Code Generation Leaderboards are also available.
4. Submit
1) Answer Format
Users are responsible for the correctness and compliance of their inputs. The answers generated by the LLM must be in a JSON file divided into three dictionaries, in order, holding the answers for CU, CR, and MCR; within each dictionary, the answers are sorted by ID. We provide an example.json.
Your input should be an npy file containing your answers to the test cases. Run deal_answer.py to generate the JSON file for evaluation:
python deal_answer.py
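For orientation, the answer layout described above (three dictionaries in the order CU, CR, MCR, each sorted by ID) can be sketched as follows. This is only an illustration under assumptions: the ID strings, the answer letters, the use of a top-level list, and the helper name sort_by_id are all hypothetical, and the provided example.json remains the authoritative reference for the exact schema.

```python
import json

# Hypothetical answers keyed by question ID. The real IDs and answer
# values come from the benchmark test set; see example.json.
cu_answers = {"2": "B", "1": "A"}
cr_answers = {"1": "C"}
mcr_answers = {"3": "D", "1": "A"}


def sort_by_id(answers):
    """Return a new dict whose keys are ordered by numeric ID."""
    return {key: answers[key] for key in sorted(answers, key=int)}


# Assumed layout: a list of three dictionaries in the order CU, CR, MCR.
submission = [
    sort_by_id(cu_answers),
    sort_by_id(cr_answers),
    sort_by_id(mcr_answers),
]

# Serialize to a JSON string; writing it to a file works the same way
# with json.dump and an open file handle.
text = json.dumps(submission, indent=2)
print(text)
```

In practice deal_answer.py produces the JSON file for you, so a sketch like this is mainly useful for checking that a generated file has the expected shape before uploading.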
2) How to submit
Log in and open the submit page. Click the "choose file" button to select your answer file, then enter the name, author, and description of your model. Click the upload button, then the process button, and your model's score will appear. Each user is limited to 5 submissions per day.
Due to time constraints, there are some issues with the website. Please bear with us.