arxiv.md

July 24, 2025

Updated on 2025.07.24

Table of Contents

Code Summarization/Understanding
Code Generation/Completion
Program Repair
Automated Debugging/Bug Localization
Bug/Vulnerability Detection
Fuzzing/Testing
Clone Detection
Clone Search
Code Review

Code Summarization/Understanding

Title	ArXiv Link	GitHub Link	Last Update
On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization	2507.16587	None	2025-07-22
EVALOOP: Assessing LLM Robustness in Programming from a Self-consistency Perspective	2505.12185	None	2025-07-14
Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding	2507.12482	None	2025-07-14
Turning the Tide: Repository-based Code Reflection	2507.09866	None	2025-07-14
Can LLMs Replace Humans During Code Chunking?	2506.19897	None	2025-06-24
Re-Evaluating Code LLM Benchmarks Under Semantic Mutation	2506.17369	None	2025-06-20
MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization	2406.18379	None	2025-06-17
Evaluating Large Language Models on Non-Code Software Engineering Tasks	2506.10833	https://github.com/aieng-lab/senlp-benchmark	2025-06-12
Evaluating LLMs Effectiveness in Detecting and Correcting Test Smells: An Empirical Study	2506.07594	https://github.com/ts-group-icse26/testsmells.llms.study-replication.package-icse26	2025-06-09
Rethinking the effects of data contamination in Code Intelligence	2506.02791	None	2025-06-08
LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models	2505.14759	None	2025-06-08
Can Large Language Models Understand Intermediate Representations in Compilers?	2502.06854	None	2025-06-05
Utilizing Precise and Complete Code Context to Guide LLM in Automatic False Positive Mitigation	2411.03079	None	2025-05-31
The Hitchhiker's Guide to Program Analysis, Part II: Deep Thoughts by LLMs	2504.11711	https://github.com/seclab-ucr/buglens	2025-05-31
An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks	2505.20854	None	2025-05-27
DocAgent: A Multi-Agent System for Automated Code Documentation Generation	2504.08725	https://github.com/facebookresearch/docagent	2025-05-23
CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations	2504.14119	None	2025-05-23
A Qualitative Investigation into LLM-Generated Multilingual Code Comments and Automatic Evaluation Metrics	2505.15469	None	2025-05-21
Capturing the Effects of Quantization on Trojans in Code LLMs	2505.14200	None	2025-05-20
Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?	2505.10443	None	2025-05-15
Variational Prefix Tuning for Diverse and Accurate Code Summarization Using Pre-trained Language Models	2505.09062	https://github.com/jundaz/VPT	2025-05-14
BinMetric: A Comprehensive Binary Analysis Benchmark for Large Language Models	2505.07360	None	2025-05-12
A Systematic Literature Review of Parameter-Efficient Fine-Tuning for Large Code Models	2504.21569	https://github.com/alvi75/slr-peft	2025-05-09
Large Language Models as Robust Data Generators in Software Analytics: Are We There Yet?	2411.10565	None	2025-05-05
An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding	2504.21803	None	2025-04-30
CoCo-Bench: A Comprehensive Code Benchmark For Multi-task Large Language Model Evaluation	2504.20673	None	2025-04-29
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks	2504.19444	None	2025-04-28
Do Automatic Comment Generation Techniques Fall Short? Exploring the Influence of Method Dependencies on Code Understanding	2504.19459	None	2025-04-28
Context-Enhanced Vulnerability Detection Based on Large Language Model	2504.16877	None	2025-04-23
LRASGen: LLM-based RESTful API Specification Generation	2504.16833	None	2025-04-23
Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering	2502.06193	None	2025-04-21

Code Generation/Completion

Title	ArXiv Link	GitHub Link	Last Update
Adaptive Graph Pruning for Multi-Agent Communication	2506.02951	None	2025-07-23
On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization	2507.16587	None	2025-07-22
Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems	2506.06821	None	2025-07-22
Improving Code LLM Robustness to Prompt Perturbations via Layer-Aware Model Editing	2507.16407	None	2025-07-22
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning	2507.14111	None	2025-07-22
LOCOFY Large Design Models -- Design to code conversion solution	2507.16208	None	2025-07-22
3LM: Bridging Arabic, STEM, and Code through Benchmarking	2507.15850	None	2025-07-22
ShadowCode: Towards (Automatic) External Prompt Injection Attack against Code LLMs	2407.09164	None	2025-07-22
GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities	2507.12367	None	2025-07-21
Compositional Coordination for Multi-Robot Teams with Large Language Models	2507.16068	None	2025-07-21
Autocomp: LLM-Driven Code Optimization for Tensor Accelerators	2505.18574	None	2025-07-21
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR	2507.15778	None	2025-07-21
Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training	2507.15640	None	2025-07-21
DHEvo: Data-Algorithm Based Heuristic Evolution for Generalizable MILP Solving	2507.15615	None	2025-07-21
ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution	2507.15501	None	2025-07-21
Understanding the Design Decisions of Retrieval-Augmented Generation Systems	2411.19463	None	2025-07-21
SimdBench: Benchmarking Large Language Models for SIMD-Intrinsic Code Generation	2507.15224	None	2025-07-21
Survey of GenAI for Automotive Software Development: From Requirements to Executable Code	2507.15025	None	2025-07-20
Doc2Chart: Intent-Driven Zero-Shot Chart Generation from Documents	2507.14819	None	2025-07-20
VeriOpt: PPA-Aware High-Quality Verilog Generation via Multi-Role LLMs	2507.14776	None	2025-07-20
Mind the Gap: A Review of Arabic Post-Training Datasets and Their Limitations	2507.14688	None	2025-07-19
Enhancing LLM Code Generation with Ensembles: A Similarity-Based Selection Approach	2503.15838	None	2025-07-18
Combinatorial Optimization for All: Using LLMs to Aid Non-Experts in Improving Optimization Algorithms	2503.10968	None	2025-07-18
On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding	2505.12723	None	2025-07-18
ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle	2507.12674	None	2025-07-18
CoDet-M4: Detecting Machine-Generated Code in Multi-Lingual, Multi-Generator and Multi-Domain Settings	2503.13733	None	2025-07-17
Detecting LLM-generated Code with Subtle Modification by Adversarial Training	2507.13123	None	2025-07-17
ReCode: Updating Code API Knowledge with Reinforcement Learning	2506.20495	None	2025-07-17
CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance	2507.10646	None	2025-07-17
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?	2507.12415	None	2025-07-16
Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization	2507.12308	None	2025-07-16

Program Repair

Title	ArXiv Link	GitHub Link	Last Update
Do AI models help produce verified bug fixes?	2507.15822	None	2025-07-21
Input Reduction Enhanced LLM-based Program Repair	2507.15251	None	2025-07-21
LLM-Based Detection of Tangled Code Changes for Higher-Quality Method-Level Bug Datasets	2505.08263	None	2025-07-19
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?	2507.12415	None	2025-07-16
Accelerating Automatic Program Repair with Dual Retrieval-Augmented Fine-Tuning and Patch Generation on Large Language Models	2507.10103	None	2025-07-14
LLMCup: Ranking-Enhanced Comment Updating with LLMs	2507.08671	None	2025-07-11
Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs	2507.03659	None	2025-07-04
CoreCodeBench: A Configurable Multi-Scenario Repository-Level Benchmark	2507.05281	None	2025-07-04
CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks	2507.05269	None	2025-07-03
APRMCTS: Improving LLM-based Automated Program Repair with Iterative Tree Search	2507.01827	None	2025-07-02
Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench	2507.02976	None	2025-06-30
A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications	2506.23749	None	2025-06-30
Repair Ingredients Are All You Need: Improving Large Language Model-Based Program Repair via Repair Ingredients Search	2506.23100	None	2025-06-29
$T^3$ : Multi-level Tree-based Automatic Program Repair with Large Language Models	2506.21211	None	2025-06-26
Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories	2506.18824	None	2025-06-23
The Impact of Input Order Bias on Large Language Models for Software Fault Localization	2412.18750	None	2025-06-23
Tracing Errors, Constructing Fixes: Repository-Level Memory Error Repair via Typestate-Guided Context Retrieval	2506.18394	None	2025-06-23
Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems	2506.17208	None	2025-06-20
SemAgent: A Semantics Aware Program Repair Agent	2506.16650	None	2025-06-19
ChatDBG: Augmenting Debugging with Large Language Models	2403.16354	https://github.com/plasma-umass/chatdbg	2025-06-19
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development	2505.16975	https://github.com/dorothyduuu/swe-dev	2025-06-19
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation	2503.06680	None	2025-06-19
Empirical Evaluation of Large Language Models in Automated Program Repair	2506.13186	None	2025-06-16
The Foundation Cracks: A Comprehensive Study on Bugs and Testing Practices in LLM Libraries	2506.12320	None	2025-06-14
Identifying Helpful Context for LLM-based Vulnerability Repair: A Preliminary Study	2506.11561	None	2025-06-13
An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications	2404.11050	https://github.com/mohannadcse/alloyspecrepair	2025-06-12
Towards Understanding Bugs in Distributed Training and Inference Frameworks for Large Language Models	2506.10426	None	2025-06-12
Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning	2409.18395	None	2025-06-11
Automated Repair of Ambiguous Natural Language Requirements	2505.07270	https://github.com/msv-lab/specfix	2025-06-07
CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics	2411.17274	https://github.com/yikun-li/cleanvul	2025-06-07
MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair	2408.09568	None	2025-06-06

Automated Debugging/Bug Localization

Title	ArXiv Link	GitHub Link	Last Update
VeriRAG: A Retrieval-Augmented Framework for Automated RTL Testability Repair	2507.15664	None	2025-07-21
Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding	2507.12482	None	2025-07-14
Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs	2507.03659	None	2025-07-04
$T^3$ : Multi-level Tree-based Automatic Program Repair with Large Language Models	2506.21211	None	2025-06-26
Black-Box Test Code Fault Localization Driven by Large Language Models and Execution Estimation	2506.19045	None	2025-06-23
The Impact of Input Order Bias on Large Language Models for Software Fault Localization	2412.18750	None	2025-06-23
BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning	2407.17631	https://zenodo.org/record/15122980	2025-06-22
Improving Compiler Bug Isolation by Leveraging Large Language Models	2506.17647	None	2025-06-21
Towards Understanding Bugs in Distributed Training and Inference Frameworks for Large Language Models	2506.10426	None	2025-06-12
TTrace: Lightweight Error Checking and Diagnosis for Distributed Training	2506.09280	None	2025-06-10
Understanding Software Engineering Agents Through the Lens of Traceability: An Empirical Study	2506.08311	None	2025-06-10
Improving LLM-Based Fault Localization with External Memory and Project Context	2506.03585	None	2025-06-04
When Deep Learning Meets Information Retrieval-based Bug Localization: A Survey	2505.00144	None	2025-04-30
How Accurately Do Large Language Models Understand Code?	2504.04372	None	2025-04-09
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs	2504.04030	None	2025-04-05
Improved IR-based Bug Localization with Intelligent Relevance Feedback	2501.10542	https://github.com/asifsamir/brain	2025-03-27
A Multi-Agent Approach to Fault Localization via Graph-Based Retrieval and Reflexion	2409.13642	None	2025-03-19
AgentFL: Scaling LLM-based Fault Localization to Project-Level Context	2403.16362	None	2025-02-24
Bridging Bug Localization and Issue Fixing: A Hierarchical Localization Framework Leveraging Large Language Models	2502.15292	None	2025-02-21
Aligning the Objective of LLM-based Program Repair	2404.08877	https://github.com/cuhk-shenzhen-se/d4c	2025-02-21
Where's the Bug? Attention Probing for Scalable Fault Localization	2502.13966	None	2025-02-20
FlexFL: Flexible and Effective Fault Localization with Open-Source Large Language Models	2411.10714	None	2025-02-18
COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis	2408.05006	https://github.com/neuir/coast	2025-02-12
Fault Localization via Fine-tuning Large Language Models with Mutation Generated Stack Traces	2501.18005	None	2025-02-11
Simulated Interactive Debugging	2501.09694	None	2025-01-16
Improving the Ability of Pre-trained Language Model by Imparting Large Language Model's Experience	2408.08553	None	2025-01-15
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds	2501.06706	None	2025-01-12
Counterexample Guided Program Repair Using Zero-Shot Learning and MaxSAT-based Fault Localization	2502.07786	None	2024-12-19
Enhancing IR-based Fault Localization using Large Language Models	2412.03754	None	2024-12-04
Identifying Root Causes of Null Pointer Exceptions with Logical Inferences	2412.01005	None	2024-12-01
BDefects4NN: A Backdoor Defect Database for Controlled Localization Studies in Neural Networks	2412.00746	None	2024-12-01

Bug/Vulnerability Detection

Title	ArXiv Link	GitHub Link	Last Update
When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs	2507.16773	None	2025-07-22
Meta-Learning for Cold-Start Personalization in Prompt-Tuned LLMs	2507.16672	None	2025-07-22
LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models	2507.16585	None	2025-07-22
CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection	2501.04510	None	2025-07-21
BugScope: Learn to Find Bugs Like Human	2507.15671	None	2025-07-21
StaAgent: An Agentic Framework for Testing Static Analyzers	2507.15892	None	2025-07-20
LLM-Based Detection of Tangled Code Changes for Higher-Quality Method-Level Bug Datasets	2505.08263	None	2025-07-19
LLAMA: Multi-Feedback Smart Contract Fuzzing Framework with LLM-Guided Seed Generation	2507.12084	None	2025-07-16
Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding	2507.12482	None	2025-07-14
Turning the Tide: Repository-based Code Reflection	2507.09866	None	2025-07-14
White-Basilisk: A Hybrid Model for Code Vulnerability Detection	2507.08540	None	2025-07-11
ETrace: Event-Driven Vulnerability Detection in Smart Contracts via LLM-Based Trace Analysis	2506.15790	None	2025-07-08
Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization	2507.03051	None	2025-07-03
CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks	2507.05269	None	2025-07-03
Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench	2507.02976	None	2025-06-30
SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models	2506.20415	None	2025-06-25
VulStamp: Vulnerability Assessment using Large Language Model	2506.11484	None	2025-06-25
FuncVul: An Effective Function Level Vulnerability Detection Model using LLM and Code Chunk	2506.19453	None	2025-06-24
Smart-LLaMA-DPO: Reinforced Large Language Model for Explainable Smart Contract Vulnerability Detection	2506.18245	None	2025-06-23
LASA: Enhancing SoC Security Verification with LLM-Aided Property Generation	2506.17865	None	2025-06-22
SAVANT: Vulnerability Detection in Application Dependencies through Semantic-Guided Reachability Analysis	2506.17798	None	2025-06-21
Identifying Helpful Context for LLM-based Vulnerability Repair: A Preliminary Study	2506.11561	None	2025-06-13
Expert-in-the-Loop Systems with Cross-Domain and In-Domain Few-Shot Learning for Software Vulnerability Detection	2506.10104	None	2025-06-11
Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning	2409.18395	None	2025-06-11
A First Look at Bugs in LLM Inference Engines	2506.09713	https://github.com/infbug/bugs-in-llm-inference-engines	2025-06-11
Large Language Models for Multilingual Vulnerability Detection: How Far Are We?	2506.07503	https://github.com/spanshu96/large-language-model-for-multilingual-vulnerability-detection	2025-06-09
Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data	2506.07390	https://github.com/xin-cheng-wen/po4vul	2025-06-09
LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning	2401.16185	None	2025-06-07
ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data	2408.16028	None	2025-06-01
The Hitchhiker's Guide to Program Analysis, Part II: Deep Thoughts by LLMs	2504.11711	https://github.com/seclab-ucr/buglens	2025-05-31
LPASS: Linear Probes as Stepping Stones for vulnerability detection using compressed LLMs	2505.24451	None	2025-05-30

Fuzzing/Testing

Title	ArXiv Link	GitHub Link	Last Update
Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent	2507.16799	None	2025-07-23
WAKENLLM: Evaluating Reasoning Potential and Stability in LLMs via Fine-Grained Benchmarking	2507.16199	None	2025-07-23
LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs	2507.16809	None	2025-07-22
ChatChecker: A Framework for Dialogue System Testing and Evaluation Through Non-cooperative User Simulation	2507.16792	None	2025-07-22
LangBiTe: A Platform for Testing Bias in Large Language Models	2404.18558	None	2025-07-22
Universal Model Routing for Efficient LLM Inference	2502.08773	None	2025-07-22
Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems	2506.06821	None	2025-07-22
ACT: Bridging the Gap in Code Translation through Synthetic Data Generation & Adaptive Training	2507.16478	None	2025-07-22
Talking Like a Phisher: LLM-Based Attacks on Voice Phishing Classifiers	2507.16291	None	2025-07-22
Time to Split: Exploring Data Splitting Strategies for Offline Evaluation of Sequential Recommenders	2507.16289	None	2025-07-22
Towards Compute-Optimal Many-Shot In-Context Learning	2507.16217	None	2025-07-22
LOCOFY Large Design Models -- Design to code conversion solution	2507.16208	None	2025-07-22
SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting	2507.16145	None	2025-07-22
GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities	2507.12367	None	2025-07-21
Efficient Compositional Multi-tasking for On-device Large Language Models	2507.16083	None	2025-07-21
Deep Researcher with Test-Time Diffusion	2507.16075	None	2025-07-21
AutoMeet: a proof-of-concept study of genAI to automate meetings in automotive engineering	2507.16054	None	2025-07-21
FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs	2507.15839	None	2025-07-21
LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra	2507.15815	None	2025-07-21
True Multimodal In-Context Learning Needs Attention to the Visual Context	2507.15807	None	2025-07-21
Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning	2507.15788	None	2025-07-21
Detecting Benchmark Contamination Through Watermarking	2502.17259	None	2025-07-21
BugScope: Learn to Find Bugs Like Human	2507.15671	None	2025-07-21
CoordField: Coordination Field for Agentic UAV Task Allocation In Low-altitude Urban Scenarios	2505.00091	None	2025-07-21
RankMixer: Scaling Up Ranking Models in Industrial Recommenders	2507.15551	None	2025-07-21
LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning	2507.15521	None	2025-07-21
DARE: Diverse Visual Question Answering with Robustness Evaluation	2409.18023	None	2025-07-21
ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events	2501.03040	None	2025-07-21
Input Reduction Enhanced LLM-based Program Repair	2507.15251	None	2025-07-21
Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning	2505.16122	None	2025-07-21
LibLMFuzz: LLM-Augmented Fuzz Target Generation for Black-box Libraries	2507.15058	None	2025-07-20

Clone Detection

Title	ArXiv Link	GitHub Link	Last Update
VulCoCo: A Simple Yet Effective Method for Detecting Vulnerable Code Clones	2507.16661	None	2025-07-22
Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification	2506.01631	None	2025-07-03
Understanding Software Engineering Agents Through the Lens of Traceability: An Empirical Study	2506.08311	None	2025-06-10
Cross-lingual Transfer in Programming Languages: An Extensive Empirical Study	2310.16937	None	2025-06-10
An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models	2409.14644	None	2025-06-03
A Systematic Literature Review of Parameter-Efficient Fine-Tuning for Large Code Models	2504.21569	https://github.com/alvi75/slr-peft	2025-05-09
From Bias To Improved Prompts: A Case Study of Bias Mitigation of Clone Detection Models	2505.05679	None	2025-05-08
The Struggles of LLMs in Cross-lingual Code Clone Detection	2408.04430	https://github.com/trux-dtf/clccd	2025-05-06
Large Language Models as Robust Data Generators in Software Analytics: Are We There Yet?	2411.10565	None	2025-05-05
Improving the Ability of Pre-trained Language Model by Imparting Large Language Model's Experience	2408.08553	None	2025-01-15
Unveiling Code Clone Patterns in Open Source VR Software: An Empirical Study	2501.07165	None	2025-01-13
Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code	2412.06757	None	2024-12-10
Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code	2402.09299	https://github.com/commissarsilver/trawic	2024-10-30
In-Context Code-Text Learning for Bimodal Software Engineering	2410.18107	None	2024-10-08
LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions	2408.07321	None	2024-08-14
Assessing the Code Clone Detection Capability of Large Language Models	2407.02402	None	2024-07-02
Investigating the Efficacy of Large Language Models for Code Clone Detection	2401.13802	https://github.com/mkhfring/largelanguagemodels	2024-01-30
Greening Large Language Models of Code	2309.04076	https://github.com/soarsmu/Avatar	2024-01-12
Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey	2308.01191	None	2023-08-06
Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review	2307.02503	None	2023-07-04
Understanding Programs by Exploiting (Fuzzing) Test Cases	2305.13592	https://github.com/rabbitjy/fuzztuning	2023-06-12

Clone Search

Title	ArXiv Link	GitHub Link	Last Update
OASIS: Order-Augmented Strategy for Improved Code Search	2503.08161	None	2025-07-17
The Impact of LLM-Assistants on Software Developer Productivity: A Systematic Literature Review	2507.03156	None	2025-07-03
LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models	2505.14759	None	2025-06-08
DeepRTL2: A Versatile Model for RTL-Related Tasks	2506.15697	None	2025-05-28
Knowledge Graph Based Repository-Level Code Generation	2505.14394	None	2025-05-20
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks	2504.19444	None	2025-04-28
Towards Leveraging Large Language Model Summaries for Topic Modeling in Source Code	2504.17426	None	2025-04-24
CoSQA+: Pioneering the Multi-Choice Code Search Benchmark with Test-Driven Agents	2406.11589	https://github.com/DeepSoftwareAnalytics/CoSQA_Plus	2025-04-11
Zero-Shot Cross-Domain Code Search without Fine-Tuning	2504.07740	https://github.com/zju-ctag/codebridge	2025-04-10
Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets	2502.20246	None	2025-02-28
OrcaLoca: An LLM Agent Framework for Software Issue Localization	2502.00350	None	2025-02-01
SpecRover: Code Intent Extraction via LLMs	2408.02232	None	2024-12-11
Fixing Security Vulnerabilities with AI in OSS-Fuzz	2411.03346	None	2024-11-21
Are Decoder-Only Large Language Models the Silver Bullet for Code Search?	2410.22240	https://github.com/georgepitt/decoderllms-codesearch	2024-10-29
In-the-loop Hyper-Parameter Optimization for LLM-Based Automated Design of Heuristics	2410.16309	None	2024-10-07
No Man is an Island: Towards Fully Automatic Programming by Code Search, Code Generation and Program Repair	2409.03267	None	2024-09-05
You Augment Me: Exploring ChatGPT-based Data Augmentation for Semantic Code Search	2408.05542	None	2024-08-17
ViC: Virtual Compiler Is All You Need For Assembly Code Search	2408.06385	https://github.com/zeyugao/virtualcompiler	2024-08-10
AutoCodeRover: Autonomous Program Improvement	2404.05427	https://github.com/nus-apr/auto-code-rover	2024-07-25
RepoQA: Evaluating Long Context Code Understanding	2406.06025	https://github.com/evalplus/repoqa	2024-06-10
Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search	2401.04514	https://github.com/alex-haochenli/reco	2024-06-03
ACES: Generating Diverse Programming Puzzles with Autotelic Generative Models	2310.10692	None	2024-05-29
Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models	2405.11196	https://github.com/gksajy/slimcode	2024-05-18
REINFOREST: Reinforcing Semantic Code Similarity for Cross-Lingual Code Search Models	2305.03843	https://github.com/reinforest-team/reinforest	2024-04-15
GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding	2311.09707	https://github.com/drndr/gencodesearchnet	2023-11-16
The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation	2305.06156	https://github.com/fsoft-ai4code/thevault	2023-10-30
Code Representation Pre-training with Complements from Program Executions	2309.09980	None	2023-09-04
Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work?	2211.12821	None	2023-08-28

Code Review

Title	ArXiv Link	GitHub Link	Last Update
LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning	2507.16395	None	2025-07-22
Measuring how changes in code readability attributes affect code quality evaluation by Large Language Models	2507.05289	None	2025-07-09
Towards Exception Safety Code Generation with Intermediate Representation Agents Framework	2410.06949	https://github.com/XMZhangAI/Seeker	2025-07-07
An Insight into Security Code Review with LLMs: Capabilities, Obstacles, and Influential Factors	2401.16310	None	2025-06-02
CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models	2503.16167	None	2025-05-31
Applying Large Language Models to Issue Classification: Revisiting with Extended Data and New Models	2506.00128	None	2025-05-30
Evaluating Large Language Models for Code Review	2505.20206	None	2025-05-26
Large Language Models in Code Co-generation for Safe Autonomous Vehicles	2505.19658	None	2025-05-26
Knowledge-Guided Prompt Learning for Request Quality Assurance in Public Code Review	2410.21673	https://github.com/wut-idea/kp-pcr	2025-05-20
Advancing Software Quality: A Standards-Focused Review of LLM-Based Assurance Techniques	2505.13766	None	2025-05-19
Large Language Models as Robust Data Generators in Software Analytics: Are We There Yet?	2411.10565	None	2025-05-05
Patched RTC: evaluating LLMs for diverse software development tasks	2407.16557	https://github.com/codelion/optillm/blob/main/optillm/rto.py	2025-04-29
CoCo-Bench: A Comprehensive Code Benchmark For Multi-task Large Language Model Evaluation	2504.20673	None	2025-04-29
Improving Automated Secure Code Reviews: A Synthetic Dataset for Code Vulnerability Flaws	2504.16310	None	2025-04-22
DR.FIX: Automatically Fixing Data Races at Industry Scale	2504.15637	https://github.com/uber-research/drfix	2025-04-22
Psycholinguistic Analyses in Software Engineering Text: A Systematic Literature Review	2503.05992	None	2025-04-17
Bugdar: AI-Augmented Secure Code Review for GitHub Pull Requests	2503.17302	None	2025-03-21
Measuring Determinism in Large Language Models for Software Code Review	2502.20747	None	2025-02-28
MdEval: Massively Multilingual Code Debugging	2411.02310	None	2025-02-24
Accountability in Code Review: The Role of Intrinsic Drivers and the Impact of LLMs	2502.15963	None	2025-02-21