README.md
December 10, 2024

How to Contribute
Contributions are welcome! If you have any resources, tools, papers, or insights related to Code LLMs, feel free to submit a pull request. Let's work together to make this project better!

News
- 🔥🔥🔥 [2024-11-12] The Qwen2.5-Coder series is released in six model sizes (0.5B, 1.5B, 3B, 7B, 14B, 32B), with Qwen2.5-Coder-32B-Instruct now the most powerful open-source code model.
- 🔥🔥 [2024-11-08] OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models is released.

🧵 Table of Contents
- 🧵 Table of Contents
- Top Code LLMs
- 💡 Evaluation Toolkit
- Awesome Code LLMs Leaderboard
- Awesome Code LLMs Papers
- Contributors
- Cite as
- Acknowledgement
- Star History

Top Code LLMs
Sorted by HumanEval Pass@1

💡 Evaluation Toolkit
- bigcode-evaluation-harness: A framework for the evaluation of autoregressive code generation language models.
- code-eval: A framework for the evaluation of autoregressive code generation language models on HumanEval.
- SandboxFusion: A secure sandbox for running and judging code generated by LLMs.
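
The model table is sorted by HumanEval Pass@1, and the toolkits above report pass@k scores. As a rough illustration of how that number is usually computed, here is a minimal sketch of the unbiased pass@k estimator introduced alongside HumanEval (Chen et al., 2021): draw n samples per problem, count the c samples that pass all unit tests, and estimate pass@k = 1 - C(n-c, k) / C(n, k). The function names `pass_at_k` and `mean_pass_at_k` are illustrative assumptions and are not the API of any toolkit listed here.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples
    c: number of samples that passed all unit tests
    k: sampling budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical per-problem results: (samples generated, samples passing).
    demo = [(10, 3), (10, 0), (10, 10)]
    print(mean_pass_at_k(demo, k=1))
```

With a single greedy sample per problem (n = k = 1), the estimate reduces to the plain fraction of problems solved.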

Awesome Code LLMs Leaderboard
| Leaderboard | Description |
|---|---|
| Evalperf Leaderboard | Evaluating LLMs for Efficient Code Generation. |
| Aider Code Editing Leaderboard | Measuring the LLM's coding ability, and whether it can write new code that integrates into existing code. |
| BigCodeBench Leaderboard | BigCodeBench evaluates LLMs with practical and challenging programming tasks. |
| LiveCodeBench Leaderboard | Holistic and Contamination Free Evaluation of Large Language Models for Code. |
| Big Code Models Leaderboard | Compare performance of base multilingual code generation models on HumanEval benchmark and MultiPL-E. |
| BIRD Leaderboard | BIRD contains over 12,751 unique question-SQL pairs and 95 big databases with a total size of 33.4 GB. It covers more than 37 professional domains, such as blockchain, hockey, healthcare, and education. |
| CanAiCode Leaderboard | CanAiCode Leaderboard |
| Coding LLMs Leaderboard | Coding LLMs Leaderboard |
| CRUXEval Leaderboard | CRUXEval is a benchmark complementary to HumanEval and MBPP measuring code reasoning, understanding, and execution capabilities! |
| EvalPlus Leaderboard | EvalPlus evaluates AI Coders with rigorous tests. |
| InfiBench Leaderboard | InfiBench is a comprehensive benchmark for code large language models evaluating model ability on answering freeform real-world questions in the code domain. |
| InterCode Leaderboard | InterCode is a benchmark for evaluating language models on the interactive coding task. Given a natural language request, an agent is asked to interact with a software system (e.g., database, terminal) with code to resolve the issue. |
| Program Synthesis Models Leaderboard | Helps researchers identify the best open-source model through an intuitive leadership quadrant graph. Open-source code models are evaluated and ranked by capability and market adoption. |
| Spider Leaderboard | Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases. |

Awesome Code LLMs Papers
Awesome Code Pre-Training Papers

🐳 Awesome Code Instruction-Tuning Papers
| Title | Venue | Date | Code | Resources |
|---|---|---|---|---|
| Magicoder: Source Code Is All You Need | ICML'24 | 2023.12 | Github | HF |
| OctoPack: Instruction Tuning Code Large Language Models | ICLR'24 | 2023.08 | Github | HF |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | Preprint | 2023.07 | Github | HF |
| Code Alpaca: An Instruction-following LLaMA Model trained on code generation instructions | Preprint | 2023.xx | Github | HF |

🐬 Awesome Code Alignment Papers
| Title | Venue | Date | Code | Resources |
|---|---|---|---|---|
| ProSec: Fortifying Code LLMs with Proactive Security Alignment | Preprint | 2024.11 | - | - |
| PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models | Preprint | 2024.06 | - | - |
| PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | Preprint | 2023.07 | - | - |
| RLTF: Reinforcement Learning from Unit Test Feedback | Preprint | 2023.07 | Github | - |
| Execution-based Code Generation using Deep Reinforcement Learning | TMLR'23 | 2023.01 | Github | - |
| CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning | NeurIPS'22 | 2022.07 | Github | - |

Awesome Code Prompting Papers
| Title | Venue | Date | Code | Resources |
|---|---|---|---|---|
| From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | Preprint | 2024.10 | Github | - |
| Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs | AAAI'25 | 2024.06 | Github | - |
| Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | ACL'24 | 2024.02 | Github | - |
| SelfEvolve: A Code Evolution Framework via Large Language Models | Preprint | 2023.06 | - | - |
| Demystifying GPT Self-Repair for Code Generation | ICLR'24 | 2023.06 | Github | - |
| Teaching Large Language Models to Self-Debug | ICLR'24 | 2023.06 | - | - |
| LEVER: Learning to Verify Language-to-Code Generation with Execution | ICML'23 | 2023.02 | Github | - |
| Coder Reviewer Reranking for Code Generation | ICML'23 | 2022.11 | Github | - |
| CodeT: Code Generation with Generated Tests | ICLR'23 | 2022.07 | Github | - |

Awesome Code Benchmark & Evaluation Papers

Contributors
This is an active repository and your contributions are always welcome! If you have any questions about this opinionated list, do not hesitate to contact me at huybery@gmail.com.

Cite as
@software{awesome-code-llm,
author = {Binyuan Hui and Lei Zhang},
title = {An awesome and curated list of best code-LLM for research},
howpublished = {\url{https://github.com/huybery/Awesome-Code-LLM}},
year = 2023,
}

Acknowledgement
This project is inspired by Awesome-LLM.