arxiv.md

July 24, 2025

Updated on 2025.07.24

Table of Contents
  1. Code Summarization/Understanding
  2. Code Generation/Completion
  3. Program Repair
  4. Automated Debugging/Bug Localization
  5. Bug/Vulnerability Detection
  6. Fuzzing/Testing
  7. Clone Detection
  8. Clone Search
  9. Code Review

Code Summarization/Understanding

| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization | 2507.16587 | None | 2025-07-22 |
| EVALOOP: Assessing LLM Robustness in Programming from a Self-consistency Perspective | 2505.12185 | None | 2025-07-14 |
| Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding | 2507.12482 | None | 2025-07-14 |
| Turning the Tide: Repository-based Code Reflection | 2507.09866 | None | 2025-07-14 |
| Can LLMs Replace Humans During Code Chunking? | 2506.19897 | None | 2025-06-24 |
| Re-Evaluating Code LLM Benchmarks Under Semantic Mutation | 2506.17369 | None | 2025-06-20 |
| MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization | 2406.18379 | None | 2025-06-17 |
| Evaluating Large Language Models on Non-Code Software Engineering Tasks | 2506.10833 | https://github.com/aieng-lab/senlp-benchmark | 2025-06-12 |
| Evaluating LLMs Effectiveness in Detecting and Correcting Test Smells: An Empirical Study | 2506.07594 | https://github.com/ts-group-icse26/testsmells.llms.study-replication.package-icse26 | 2025-06-09 |
| Rethinking the effects of data contamination in Code Intelligence | 2506.02791 | None | 2025-06-08 |
| LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models | 2505.14759 | None | 2025-06-08 |
| Can Large Language Models Understand Intermediate Representations in Compilers? | 2502.06854 | None | 2025-06-05 |
| Utilizing Precise and Complete Code Context to Guide LLM in Automatic False Positive Mitigation | 2411.03079 | None | 2025-05-31 |
| The Hitchhiker's Guide to Program Analysis, Part II: Deep Thoughts by LLMs | 2504.11711 | https://github.com/seclab-ucr/buglens | 2025-05-31 |
| An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks | 2505.20854 | None | 2025-05-27 |
| DocAgent: A Multi-Agent System for Automated Code Documentation Generation | 2504.08725 | https://github.com/facebookresearch/docagent | 2025-05-23 |
| CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations | 2504.14119 | None | 2025-05-23 |
| A Qualitative Investigation into LLM-Generated Multilingual Code Comments and Automatic Evaluation Metrics | 2505.15469 | None | 2025-05-21 |
| Capturing the Effects of Quantization on Trojans in Code LLMs | 2505.14200 | None | 2025-05-20 |
| Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations? | 2505.10443 | None | 2025-05-15 |
| Variational Prefix Tuning for Diverse and Accurate Code Summarization Using Pre-trained Language Models | 2505.09062 | https://github.com/jundaz/VPT | 2025-05-14 |
| BinMetric: A Comprehensive Binary Analysis Benchmark for Large Language Models | 2505.07360 | None | 2025-05-12 |
| A Systematic Literature Review of Parameter-Efficient Fine-Tuning for Large Code Models | 2504.21569 | https://github.com/alvi75/slr-peft | 2025-05-09 |
| Large Language Models as Robust Data Generators in Software Analytics: Are We There Yet? | 2411.10565 | None | 2025-05-05 |
| An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding | 2504.21803 | None | 2025-04-30 |
| CoCo-Bench: A Comprehensive Code Benchmark For Multi-task Large Language Model Evaluation | 2504.20673 | None | 2025-04-29 |
| Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks | 2504.19444 | None | 2025-04-28 |
| Do Automatic Comment Generation Techniques Fall Short? Exploring the Influence of Method Dependencies on Code Understanding | 2504.19459 | None | 2025-04-28 |
| Context-Enhanced Vulnerability Detection Based on Large Language Model | 2504.16877 | None | 2025-04-23 |
| LRASGen: LLM-based RESTful API Specification Generation | 2504.16833 | None | 2025-04-23 |
| Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering | 2502.06193 | None | 2025-04-21 |

Code Generation/Completion

| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| Adaptive Graph Pruning for Multi-Agent Communication | 2506.02951 | None | 2025-07-23 |
| On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization | 2507.16587 | None | 2025-07-22 |
| Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems | 2506.06821 | None | 2025-07-22 |
| Improving Code LLM Robustness to Prompt Perturbations via Layer-Aware Model Editing | 2507.16407 | None | 2025-07-22 |
| CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | 2507.14111 | None | 2025-07-22 |
| LOCOFY Large Design Models -- Design to code conversion solution | 2507.16208 | None | 2025-07-22 |
| 3LM: Bridging Arabic, STEM, and Code through Benchmarking | 2507.15850 | None | 2025-07-22 |
| ShadowCode: Towards (Automatic) External Prompt Injection Attack against Code LLMs | 2407.09164 | None | 2025-07-22 |
| GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities | 2507.12367 | None | 2025-07-21 |
| Compositional Coordination for Multi-Robot Teams with Large Language Models | 2507.16068 | None | 2025-07-21 |
| Autocomp: LLM-Driven Code Optimization for Tensor Accelerators | 2505.18574 | None | 2025-07-21 |
| Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR | 2507.15778 | None | 2025-07-21 |
| Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training | 2507.15640 | None | 2025-07-21 |
| DHEvo: Data-Algorithm Based Heuristic Evolution for Generalizable MILP Solving | 2507.15615 | None | 2025-07-21 |
| ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution | 2507.15501 | None | 2025-07-21 |
| Understanding the Design Decisions of Retrieval-Augmented Generation Systems | 2411.19463 | None | 2025-07-21 |
| SimdBench: Benchmarking Large Language Models for SIMD-Intrinsic Code Generation | 2507.15224 | None | 2025-07-21 |
| Survey of GenAI for Automotive Software Development: From Requirements to Executable Code | 2507.15025 | None | 2025-07-20 |
| Doc2Chart: Intent-Driven Zero-Shot Chart Generation from Documents | 2507.14819 | None | 2025-07-20 |
| VeriOpt: PPA-Aware High-Quality Verilog Generation via Multi-Role LLMs | 2507.14776 | None | 2025-07-20 |
| Mind the Gap: A Review of Arabic Post-Training Datasets and Their Limitations | 2507.14688 | None | 2025-07-19 |
| Enhancing LLM Code Generation with Ensembles: A Similarity-Based Selection Approach | 2503.15838 | None | 2025-07-18 |
| Combinatorial Optimization for All: Using LLMs to Aid Non-Experts in Improving Optimization Algorithms | 2503.10968 | None | 2025-07-18 |
| On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding | 2505.12723 | None | 2025-07-18 |
| ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle | 2507.12674 | None | 2025-07-18 |
| CoDet-M4: Detecting Machine-Generated Code in Multi-Lingual, Multi-Generator and Multi-Domain Settings | 2503.13733 | None | 2025-07-17 |
| Detecting LLM-generated Code with Subtle Modification by Adversarial Training | 2507.13123 | None | 2025-07-17 |
| ReCode: Updating Code API Knowledge with Reinforcement Learning | 2506.20495 | None | 2025-07-17 |
| CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance | 2507.10646 | None | 2025-07-17 |
| SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? | 2507.12415 | None | 2025-07-16 |
| Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization | 2507.12308 | None | 2025-07-16 |

Program Repair

| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| Do AI models help produce verified bug fixes? | 2507.15822 | None | 2025-07-21 |
| Input Reduction Enhanced LLM-based Program Repair | 2507.15251 | None | 2025-07-21 |
| LLM-Based Detection of Tangled Code Changes for Higher-Quality Method-Level Bug Datasets | 2505.08263 | None | 2025-07-19 |
| SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? | 2507.12415 | None | 2025-07-16 |
| Accelerating Automatic Program Repair with Dual Retrieval-Augmented Fine-Tuning and Patch Generation on Large Language Models | 2507.10103 | None | 2025-07-14 |
| LLMCup: Ranking-Enhanced Comment Updating with LLMs | 2507.08671 | None | 2025-07-11 |
| Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs | 2507.03659 | None | 2025-07-04 |
| CoreCodeBench: A Configurable Multi-Scenario Repository-Level Benchmark | 2507.05281 | None | 2025-07-04 |
| CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks | 2507.05269 | None | 2025-07-03 |
| APRMCTS: Improving LLM-based Automated Program Repair with Iterative Tree Search | 2507.01827 | None | 2025-07-02 |
| Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench | 2507.02976 | None | 2025-06-30 |
| A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications | 2506.23749 | None | 2025-06-30 |
| Repair Ingredients Are All You Need: Improving Large Language Model-Based Program Repair via Repair Ingredients Search | 2506.23100 | None | 2025-06-29 |
| $T^3$: Multi-level Tree-based Automatic Program Repair with Large Language Models | 2506.21211 | None | 2025-06-26 |
| Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories | 2506.18824 | None | 2025-06-23 |
| The Impact of Input Order Bias on Large Language Models for Software Fault Localization | 2412.18750 | None | 2025-06-23 |
| Tracing Errors, Constructing Fixes: Repository-Level Memory Error Repair via Typestate-Guided Context Retrieval | 2506.18394 | None | 2025-06-23 |
| Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems | 2506.17208 | None | 2025-06-20 |
| SemAgent: A Semantics Aware Program Repair Agent | 2506.16650 | None | 2025-06-19 |
| ChatDBG: Augmenting Debugging with Large Language Models | 2403.16354 | https://github.com/plasma-umass/chatdbg | 2025-06-19 |
| SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | 2505.16975 | https://github.com/dorothyduuu/swe-dev | 2025-06-19 |
| FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation | 2503.06680 | None | 2025-06-19 |
| Empirical Evaluation of Large Language Models in Automated Program Repair | 2506.13186 | None | 2025-06-16 |
| The Foundation Cracks: A Comprehensive Study on Bugs and Testing Practices in LLM Libraries | 2506.12320 | None | 2025-06-14 |
| Identifying Helpful Context for LLM-based Vulnerability Repair: A Preliminary Study | 2506.11561 | None | 2025-06-13 |
| An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications | 2404.11050 | https://github.com/mohannadcse/alloyspecrepair | 2025-06-12 |
| Towards Understanding Bugs in Distributed Training and Inference Frameworks for Large Language Models | 2506.10426 | None | 2025-06-12 |
| Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning | 2409.18395 | None | 2025-06-11 |
| Automated Repair of Ambiguous Natural Language Requirements | 2505.07270 | https://github.com/msv-lab/specfix | 2025-06-07 |
| CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics | 2411.17274 | https://github.com/yikun-li/cleanvul | 2025-06-07 |
| MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair | 2408.09568 | None | 2025-06-06 |

Automated Debugging/Bug Localization

| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| VeriRAG: A Retrieval-Augmented Framework for Automated RTL Testability Repair | 2507.15664 | None | 2025-07-21 |
| Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding | 2507.12482 | None | 2025-07-14 |
| Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs | 2507.03659 | None | 2025-07-04 |
| $T^3$: Multi-level Tree-based Automatic Program Repair with Large Language Models | 2506.21211 | None | 2025-06-26 |
| Black-Box Test Code Fault Localization Driven by Large Language Models and Execution Estimation | 2506.19045 | None | 2025-06-23 |
| The Impact of Input Order Bias on Large Language Models for Software Fault Localization | 2412.18750 | None | 2025-06-23 |
| BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning | 2407.17631 | https://zenodo.org/record/15122980 | 2025-06-22 |
| Improving Compiler Bug Isolation by Leveraging Large Language Models | 2506.17647 | None | 2025-06-21 |
| Towards Understanding Bugs in Distributed Training and Inference Frameworks for Large Language Models | 2506.10426 | None | 2025-06-12 |
| TTrace: Lightweight Error Checking and Diagnosis for Distributed Training | 2506.09280 | None | 2025-06-10 |
| Understanding Software Engineering Agents Through the Lens of Traceability: An Empirical Study | 2506.08311 | None | 2025-06-10 |
| Improving LLM-Based Fault Localization with External Memory and Project Context | 2506.03585 | None | 2025-06-04 |
| When Deep Learning Meets Information Retrieval-based Bug Localization: A Survey | 2505.00144 | None | 2025-04-30 |
| How Accurately Do Large Language Models Understand Code? | 2504.04372 | None | 2025-04-09 |
| OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs | 2504.04030 | None | 2025-04-05 |
| Improved IR-based Bug Localization with Intelligent Relevance Feedback | 2501.10542 | https://github.com/asifsamir/brain | 2025-03-27 |
| A Multi-Agent Approach to Fault Localization via Graph-Based Retrieval and Reflexion | 2409.13642 | None | 2025-03-19 |
| AgentFL: Scaling LLM-based Fault Localization to Project-Level Context | 2403.16362 | None | 2025-02-24 |
| Bridging Bug Localization and Issue Fixing: A Hierarchical Localization Framework Leveraging Large Language Models | 2502.15292 | None | 2025-02-21 |
| Aligning the Objective of LLM-based Program Repair | 2404.08877 | https://github.com/cuhk-shenzhen-se/d4c | 2025-02-21 |
| Where's the Bug? Attention Probing for Scalable Fault Localization | 2502.13966 | None | 2025-02-20 |
| FlexFL: Flexible and Effective Fault Localization with Open-Source Large Language Models | 2411.10714 | None | 2025-02-18 |
| COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis | 2408.05006 | https://github.com/neuir/coast | 2025-02-12 |
| Fault Localization via Fine-tuning Large Language Models with Mutation Generated Stack Traces | 2501.18005 | None | 2025-02-11 |
| Simulated Interactive Debugging | 2501.09694 | None | 2025-01-16 |
| Improving the Ability of Pre-trained Language Model by Imparting Large Language Model's Experience | 2408.08553 | None | 2025-01-15 |
| AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds | 2501.06706 | None | 2025-01-12 |
| Counterexample Guided Program Repair Using Zero-Shot Learning and MaxSAT-based Fault Localization | 2502.07786 | None | 2024-12-19 |
| Enhancing IR-based Fault Localization using Large Language Models | 2412.03754 | None | 2024-12-04 |
| Identifying Root Causes of Null Pointer Exceptions with Logical Inferences | 2412.01005 | None | 2024-12-01 |
| BDefects4NN: A Backdoor Defect Database for Controlled Localization Studies in Neural Networks | 2412.00746 | None | 2024-12-01 |

Bug/Vulnerability Detection

| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs | 2507.16773 | None | 2025-07-22 |
| Meta-Learning for Cold-Start Personalization in Prompt-Tuned LLMs | 2507.16672 | None | 2025-07-22 |
| LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models | 2507.16585 | None | 2025-07-22 |
| CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection | 2501.04510 | None | 2025-07-21 |
| BugScope: Learn to Find Bugs Like Human | 2507.15671 | None | 2025-07-21 |
| StaAgent: An Agentic Framework for Testing Static Analyzers | 2507.15892 | None | 2025-07-20 |
| LLM-Based Detection of Tangled Code Changes for Higher-Quality Method-Level Bug Datasets | 2505.08263 | None | 2025-07-19 |
| LLAMA: Multi-Feedback Smart Contract Fuzzing Framework with LLM-Guided Seed Generation | 2507.12084 | None | 2025-07-16 |
| Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding | 2507.12482 | None | 2025-07-14 |
| Turning the Tide: Repository-based Code Reflection | 2507.09866 | None | 2025-07-14 |
| White-Basilisk: A Hybrid Model for Code Vulnerability Detection | 2507.08540 | None | 2025-07-11 |
| ETrace:Event-Driven Vulnerability Detection in Smart Contracts via LLM-Based Trace Analysis | 2506.15790 | None | 2025-07-08 |
| Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization | 2507.03051 | None | 2025-07-03 |
| CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks | 2507.05269 | None | 2025-07-03 |
| Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench | 2507.02976 | None | 2025-06-30 |
| SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models | 2506.20415 | None | 2025-06-25 |
| VulStamp: Vulnerability Assessment using Large Language Model | 2506.11484 | None | 2025-06-25 |
| FuncVul: An Effective Function Level Vulnerability Detection Model using LLM and Code Chunk | 2506.19453 | None | 2025-06-24 |
| Smart-LLaMA-DPO: Reinforced Large Language Model for Explainable Smart Contract Vulnerability Detection | 2506.18245 | None | 2025-06-23 |
| LASA: Enhancing SoC Security Verification with LLM-Aided Property Generation | 2506.17865 | None | 2025-06-22 |
| SAVANT: Vulnerability Detection in Application Dependencies through Semantic-Guided Reachability Analysis | 2506.17798 | None | 2025-06-21 |
| Identifying Helpful Context for LLM-based Vulnerability Repair: A Preliminary Study | 2506.11561 | None | 2025-06-13 |
| Expert-in-the-Loop Systems with Cross-Domain and In-Domain Few-Shot Learning for Software Vulnerability Detection | 2506.10104 | None | 2025-06-11 |
| Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning | 2409.18395 | None | 2025-06-11 |
| A First Look at Bugs in LLM Inference Engines | 2506.09713 | https://github.com/infbug/bugs-in-llm-inference-engines | 2025-06-11 |
| Large Language Models for Multilingual Vulnerability Detection: How Far Are We? | 2506.07503 | https://github.com/spanshu96/large-language-model-for-multilingual-vulnerability-detection | 2025-06-09 |
| Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data | 2506.07390 | https://github.com/xin-cheng-wen/po4vul | 2025-06-09 |
| LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning | 2401.16185 | None | 2025-06-07 |
| ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data | 2408.16028 | None | 2025-06-01 |
| The Hitchhiker's Guide to Program Analysis, Part II: Deep Thoughts by LLMs | 2504.11711 | https://github.com/seclab-ucr/buglens | 2025-05-31 |
| LPASS: Linear Probes as Stepping Stones for vulnerability detection using compressed LLMs | 2505.24451 | None | 2025-05-30 |

Fuzzing/Testing

| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent | 2507.16799 | None | 2025-07-23 |
| WAKENLLM: Evaluating Reasoning Potential and Stability in LLMs via Fine-Grained Benchmarking | 2507.16199 | None | 2025-07-23 |
| LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs | 2507.16809 | None | 2025-07-22 |
| ChatChecker: A Framework for Dialogue System Testing and Evaluation Through Non-cooperative User Simulation | 2507.16792 | None | 2025-07-22 |
| LangBiTe: A Platform for Testing Bias in Large Language Models | 2404.18558 | None | 2025-07-22 |
| Universal Model Routing for Efficient LLM Inference | 2502.08773 | None | 2025-07-22 |
| Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems | 2506.06821 | None | 2025-07-22 |
| ACT: Bridging the Gap in Code Translation through Synthetic Data Generation & Adaptive Training | 2507.16478 | None | 2025-07-22 |
| Talking Like a Phisher: LLM-Based Attacks on Voice Phishing Classifiers | 2507.16291 | None | 2025-07-22 |
| Time to Split: Exploring Data Splitting Strategies for Offline Evaluation of Sequential Recommenders | 2507.16289 | None | 2025-07-22 |
| Towards Compute-Optimal Many-Shot In-Context Learning | 2507.16217 | None | 2025-07-22 |
| LOCOFY Large Design Models -- Design to code conversion solution | 2507.16208 | None | 2025-07-22 |
| SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting | 2507.16145 | None | 2025-07-22 |
| GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities | 2507.12367 | None | 2025-07-21 |
| Efficient Compositional Multi-tasking for On-device Large Language Models | 2507.16083 | None | 2025-07-21 |
| Deep Researcher with Test-Time Diffusion | 2507.16075 | None | 2025-07-21 |
| AutoMeet: a proof-of-concept study of genAI to automate meetings in automotive engineering | 2507.16054 | None | 2025-07-21 |
| FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs | 2507.15839 | None | 2025-07-21 |
| LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra | 2507.15815 | None | 2025-07-21 |
| True Multimodal In-Context Learning Needs Attention to the Visual Context | 2507.15807 | None | 2025-07-21 |
| Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning | 2507.15788 | None | 2025-07-21 |
| Detecting Benchmark Contamination Through Watermarking | 2502.17259 | None | 2025-07-21 |
| BugScope: Learn to Find Bugs Like Human | 2507.15671 | None | 2025-07-21 |
| CoordField: Coordination Field for Agentic UAV Task Allocation In Low-altitude Urban Scenarios | 2505.00091 | None | 2025-07-21 |
| RankMixer: Scaling Up Ranking Models in Industrial Recommenders | 2507.15551 | None | 2025-07-21 |
| LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning | 2507.15521 | None | 2025-07-21 |
| DARE: Diverse Visual Question Answering with Robustness Evaluation | 2409.18023 | None | 2025-07-21 |
| ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events | 2501.03040 | None | 2025-07-21 |
| Input Reduction Enhanced LLM-based Program Repair | 2507.15251 | None | 2025-07-21 |
| Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning | 2505.16122 | None | 2025-07-21 |
| LibLMFuzz: LLM-Augmented Fuzz Target Generation for Black-box Libraries | 2507.15058 | None | 2025-07-20 |

Clone Detection

| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| VulCoCo: A Simple Yet Effective Method for Detecting Vulnerable Code Clones | 2507.16661 | None | 2025-07-22 |
| Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification | 2506.01631 | None | 2025-07-03 |
| Understanding Software Engineering Agents Through the Lens of Traceability: An Empirical Study | 2506.08311 | None | 2025-06-10 |
| Cross-lingual Transfer in Programming Languages: An Extensive Empirical Study | 2310.16937 | None | 2025-06-10 |
| An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models | 2409.14644 | None | 2025-06-03 |
| A Systematic Literature Review of Parameter-Efficient Fine-Tuning for Large Code Models | 2504.21569 | https://github.com/alvi75/slr-peft | 2025-05-09 |
| From Bias To Improved Prompts: A Case Study of Bias Mitigation of Clone Detection Models | 2505.05679 | None | 2025-05-08 |
| The Struggles of LLMs in Cross-lingual Code Clone Detection | 2408.04430 | https://github.com/trux-dtf/clccd | 2025-05-06 |
| Large Language Models as Robust Data Generators in Software Analytics: Are We There Yet? | 2411.10565 | None | 2025-05-05 |
| Improving the Ability of Pre-trained Language Model by Imparting Large Language Model's Experience | 2408.08553 | None | 2025-01-15 |
| Unveiling Code Clone Patterns in Open Source VR Software: An Empirical Study | 2501.07165 | None | 2025-01-13 |
| Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code | 2412.06757 | None | 2024-12-10 |
| Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code | 2402.09299 | https://github.com/commissarsilver/trawic | 2024-10-30 |
| In-Context Code-Text Learning for Bimodal Software Engineering | 2410.18107 | None | 2024-10-08 |
| LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions | 2408.07321 | None | 2024-08-14 |
| Assessing the Code Clone Detection Capability of Large Language Models | 2407.02402 | None | 2024-07-02 |
| Investigating the Efficacy of Large Language Models for Code Clone Detection | 2401.13802 | https://github.com/mkhfring/largelanguagemodels | 2024-01-30 |
| Greening Large Language Models of Code | 2309.04076 | https://github.com/soarsmu/Avatar | 2024-01-12 |
| Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey | 2308.01191 | None | 2023-08-06 |
| Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review | 2307.02503 | None | 2023-07-04 |
| Understanding Programs by Exploiting (Fuzzing) Test Cases | 2305.13592 | https://github.com/rabbitjy/fuzztuning | 2023-06-12 |

Clone Search

| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| OASIS: Order-Augmented Strategy for Improved Code Search | 2503.08161 | None | 2025-07-17 |
| The Impact of LLM-Assistants on Software Developer Productivity: A Systematic Literature Review | 2507.03156 | None | 2025-07-03 |
| LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models | 2505.14759 | None | 2025-06-08 |
| DeepRTL2: A Versatile Model for RTL-Related Tasks | 2506.15697 | None | 2025-05-28 |
| Knowledge Graph Based Repository-Level Code Generation | 2505.14394 | None | 2025-05-20 |
| Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks | 2504.19444 | None | 2025-04-28 |
| Towards Leveraging Large Language Model Summaries for Topic Modeling in Source Code | 2504.17426 | None | 2025-04-24 |
| CoSQA+: Pioneering the Multi-Choice Code Search Benchmark with Test-Driven Agents | 2406.11589 | https://github.com/DeepSoftwareAnalytics/CoSQA_Plus | 2025-04-11 |
| Zero-Shot Cross-Domain Code Search without Fine-Tuning | 2504.07740 | https://github.com/zju-ctag/codebridge | 2025-04-10 |
| Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets | 2502.20246 | None | 2025-02-28 |
| OrcaLoca: An LLM Agent Framework for Software Issue Localization | 2502.00350 | None | 2025-02-01 |
| SpecRover: Code Intent Extraction via LLMs | 2408.02232 | None | 2024-12-11 |
| Fixing Security Vulnerabilities with AI in OSS-Fuzz | 2411.03346 | None | 2024-11-21 |
| Are Decoder-Only Large Language Models the Silver Bullet for Code Search? | 2410.22240 | https://github.com/georgepitt/decoderllms-codesearch | 2024-10-29 |
| In-the-loop Hyper-Parameter Optimization for LLM-Based Automated Design of Heuristics | 2410.16309 | None | 2024-10-07 |
| No Man is an Island: Towards Fully Automatic Programming by Code Search, Code Generation and Program Repair | 2409.03267 | None | 2024-09-05 |
| You Augment Me: Exploring ChatGPT-based Data Augmentation for Semantic Code Search | 2408.05542 | None | 2024-08-17 |
| ViC: Virtual Compiler Is All You Need For Assembly Code Search | 2408.06385 | https://github.com/zeyugao/virtualcompiler | 2024-08-10 |
| AutoCodeRover: Autonomous Program Improvement | 2404.05427 | https://github.com/nus-apr/auto-code-rover | 2024-07-25 |
| RepoQA: Evaluating Long Context Code Understanding | 2406.06025 | https://github.com/evalplus/repoqa | 2024-06-10 |
| Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search | 2401.04514 | https://github.com/alex-haochenli/reco | 2024-06-03 |
| ACES: Generating Diverse Programming Puzzles with with Autotelic Generative Models | 2310.10692 | None | 2024-05-29 |
| Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models | 2405.11196 | https://github.com/gksajy/slimcode | 2024-05-18 |
| REINFOREST: Reinforcing Semantic Code Similarity for Cross-Lingual Code Search Models | 2305.03843 | https://github.com/reinforest-team/reinforest | 2024-04-15 |
| GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding | 2311.09707 | https://github.com/drndr/gencodesearchnet | 2023-11-16 |
| The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation | 2305.06156 | https://github.com/fsoft-ai4code/thevault | 2023-10-30 |
| Code Representation Pre-training with Complements from Program Executions | 2309.09980 | None | 2023-09-04 |
| Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work? | 2211.12821 | None | 2023-08-28 |

Code Review

| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning | 2507.16395 | None | 2025-07-22 |
| Measuring how changes in code readability attributes affect code quality evaluation by Large Language Models | 2507.05289 | None | 2025-07-09 |
| Towards Exception Safety Code Generation with Intermediate Representation Agents Framework | 2410.06949 | https://github.com/XMZhangAI/Seeker | 2025-07-07 |
| An Insight into Security Code Review with LLMs: Capabilities, Obstacles, and Influential Factors | 2401.16310 | None | 2025-06-02 |
| CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models | 2503.16167 | None | 2025-05-31 |
| Applying Large Language Models to Issue Classification: Revisiting with Extended Data and New Models | 2506.00128 | None | 2025-05-30 |
| Evaluating Large Language Models for Code Review | 2505.20206 | None | 2025-05-26 |
| Large Language Models in Code Co-generation for Safe Autonomous Vehicles | 2505.19658 | None | 2025-05-26 |
| Knowledge-Guided Prompt Learning for Request Quality Assurance in Public Code Review | 2410.21673 | https://github.com/wut-idea/kp-pcr | 2025-05-20 |
| Advancing Software Quality: A Standards-Focused Review of LLM-Based Assurance Techniques | 2505.13766 | None | 2025-05-19 |
| Large Language Models as Robust Data Generators in Software Analytics: Are We There Yet? | 2411.10565 | None | 2025-05-05 |
| Patched RTC: evaluating LLMs for diverse software development tasks | 2407.16557 | https://github.com/codelion/optillm/blob/main/optillm/rto.py | 2025-04-29 |
| CoCo-Bench: A Comprehensive Code Benchmark For Multi-task Large Language Model Evaluation | 2504.20673 | None | 2025-04-29 |
| Improving Automated Secure Code Reviews: A Synthetic Dataset for Code Vulnerability Flaws | 2504.16310 | None | 2025-04-22 |
| DR.FIX: Automatically Fixing Data Races at Industry Scale | 2504.15637 | https://github.com/uber-research/drfix | 2025-04-22 |
| Psycholinguistic Analyses in Software Engineering Text: A Systematic Literature Review | 2503.05992 | None | 2025-04-17 |
| Bugdar: AI-Augmented Secure Code Review for GitHub Pull Requests | 2503.17302 | None | 2025-03-21 |
| Measuring Determinism in Large Language Models for Software Code Review | 2502.20747 | None | 2025-02-28 |
| MdEval: Massively Multilingual Code Debugging | 2411.02310 | None | 2025-02-24 |
| Accountability in Code Review: The Role of Intrinsic Drivers and the Impact of LLMs | 2502.15963 | None | 2025-02-21 |