Updated on 2025.07.24
Code Summarization/Understanding
| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization | 2507.16587 | None | 2025-07-22 |
| EVALOOP: Assessing LLM Robustness in Programming from a Self-consistency Perspective | 2505.12185 | None | 2025-07-14 |
| Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding | 2507.12482 | None | 2025-07-14 |
| Turning the Tide: Repository-based Code Reflection | 2507.09866 | None | 2025-07-14 |
| Can LLMs Replace Humans During Code Chunking? | 2506.19897 | None | 2025-06-24 |
| Re-Evaluating Code LLM Benchmarks Under Semantic Mutation | 2506.17369 | None | 2025-06-20 |
| MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization | 2406.18379 | None | 2025-06-17 |
| Evaluating Large Language Models on Non-Code Software Engineering Tasks | 2506.10833 | https://github.com/aieng-lab/senlp-benchmark | 2025-06-12 |
| Evaluating LLMs Effectiveness in Detecting and Correcting Test Smells: An Empirical Study | 2506.07594 | https://github.com/ts-group-icse26/testsmells.llms.study-replication.package-icse26 | 2025-06-09 |
| Rethinking the effects of data contamination in Code Intelligence | 2506.02791 | None | 2025-06-08 |
| LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models | 2505.14759 | None | 2025-06-08 |
| Can Large Language Models Understand Intermediate Representations in Compilers? | 2502.06854 | None | 2025-06-05 |
| Utilizing Precise and Complete Code Context to Guide LLM in Automatic False Positive Mitigation | 2411.03079 | None | 2025-05-31 |
| The Hitchhiker's Guide to Program Analysis, Part II: Deep Thoughts by LLMs | 2504.11711 | https://github.com/seclab-ucr/buglens | 2025-05-31 |
| An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks | 2505.20854 | None | 2025-05-27 |
| DocAgent: A Multi-Agent System for Automated Code Documentation Generation | 2504.08725 | https://github.com/facebookresearch/docagent | 2025-05-23 |
| CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations | 2504.14119 | None | 2025-05-23 |
| A Qualitative Investigation into LLM-Generated Multilingual Code Comments and Automatic Evaluation Metrics | 2505.15469 | None | 2025-05-21 |
| Capturing the Effects of Quantization on Trojans in Code LLMs | 2505.14200 | None | 2025-05-20 |
| Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations? | 2505.10443 | None | 2025-05-15 |
| Variational Prefix Tuning for Diverse and Accurate Code Summarization Using Pre-trained Language Models | 2505.09062 | https://github.com/jundaz/VPT | 2025-05-14 |
| BinMetric: A Comprehensive Binary Analysis Benchmark for Large Language Models | 2505.07360 | None | 2025-05-12 |
| A Systematic Literature Review of Parameter-Efficient Fine-Tuning for Large Code Models | 2504.21569 | https://github.com/alvi75/slr-peft | 2025-05-09 |
| Large Language Models as Robust Data Generators in Software Analytics: Are We There Yet? | 2411.10565 | None | 2025-05-05 |
| An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding | 2504.21803 | None | 2025-04-30 |
| CoCo-Bench: A Comprehensive Code Benchmark For Multi-task Large Language Model Evaluation | 2504.20673 | None | 2025-04-29 |
| Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks | 2504.19444 | None | 2025-04-28 |
| Do Automatic Comment Generation Techniques Fall Short? Exploring the Influence of Method Dependencies on Code Understanding | 2504.19459 | None | 2025-04-28 |
| Context-Enhanced Vulnerability Detection Based on Large Language Model | 2504.16877 | None | 2025-04-23 |
| LRASGen: LLM-based RESTful API Specification Generation | 2504.16833 | None | 2025-04-23 |
| Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering | 2502.06193 | None | 2025-04-21 |
Code Generation/Completion
| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| Adaptive Graph Pruning for Multi-Agent Communication | 2506.02951 | None | 2025-07-23 |
| On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization | 2507.16587 | None | 2025-07-22 |
| Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems | 2506.06821 | None | 2025-07-22 |
| Improving Code LLM Robustness to Prompt Perturbations via Layer-Aware Model Editing | 2507.16407 | None | 2025-07-22 |
| CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | 2507.14111 | None | 2025-07-22 |
| LOCOFY Large Design Models -- Design to code conversion solution | 2507.16208 | None | 2025-07-22 |
| 3LM: Bridging Arabic, STEM, and Code through Benchmarking | 2507.15850 | None | 2025-07-22 |
| ShadowCode: Towards (Automatic) External Prompt Injection Attack against Code LLMs | 2407.09164 | None | 2025-07-22 |
| GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities | 2507.12367 | None | 2025-07-21 |
| Compositional Coordination for Multi-Robot Teams with Large Language Models | 2507.16068 | None | 2025-07-21 |
| Autocomp: LLM-Driven Code Optimization for Tensor Accelerators | 2505.18574 | None | 2025-07-21 |
| Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR | 2507.15778 | None | 2025-07-21 |
| Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training | 2507.15640 | None | 2025-07-21 |
| DHEvo: Data-Algorithm Based Heuristic Evolution for Generalizable MILP Solving | 2507.15615 | None | 2025-07-21 |
| ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution | 2507.15501 | None | 2025-07-21 |
| Understanding the Design Decisions of Retrieval-Augmented Generation Systems | 2411.19463 | None | 2025-07-21 |
| SimdBench: Benchmarking Large Language Models for SIMD-Intrinsic Code Generation | 2507.15224 | None | 2025-07-21 |
| Survey of GenAI for Automotive Software Development: From Requirements to Executable Code | 2507.15025 | None | 2025-07-20 |
| Doc2Chart: Intent-Driven Zero-Shot Chart Generation from Documents | 2507.14819 | None | 2025-07-20 |
| VeriOpt: PPA-Aware High-Quality Verilog Generation via Multi-Role LLMs | 2507.14776 | None | 2025-07-20 |
| Mind the Gap: A Review of Arabic Post-Training Datasets and Their Limitations | 2507.14688 | None | 2025-07-19 |
| Enhancing LLM Code Generation with Ensembles: A Similarity-Based Selection Approach | 2503.15838 | None | 2025-07-18 |
| Combinatorial Optimization for All: Using LLMs to Aid Non-Experts in Improving Optimization Algorithms | 2503.10968 | None | 2025-07-18 |
| On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding | 2505.12723 | None | 2025-07-18 |
| ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle | 2507.12674 | None | 2025-07-18 |
| CoDet-M4: Detecting Machine-Generated Code in Multi-Lingual, Multi-Generator and Multi-Domain Settings | 2503.13733 | None | 2025-07-17 |
| Detecting LLM-generated Code with Subtle Modification by Adversarial Training | 2507.13123 | None | 2025-07-17 |
| ReCode: Updating Code API Knowledge with Reinforcement Learning | 2506.20495 | None | 2025-07-17 |
| CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance | 2507.10646 | None | 2025-07-17 |
| SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? | 2507.12415 | None | 2025-07-16 |
| Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization | 2507.12308 | None | 2025-07-16 |
Program Repair
| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| Do AI models help produce verified bug fixes? | 2507.15822 | None | 2025-07-21 |
| Input Reduction Enhanced LLM-based Program Repair | 2507.15251 | None | 2025-07-21 |
| LLM-Based Detection of Tangled Code Changes for Higher-Quality Method-Level Bug Datasets | 2505.08263 | None | 2025-07-19 |
| SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? | 2507.12415 | None | 2025-07-16 |
| Accelerating Automatic Program Repair with Dual Retrieval-Augmented Fine-Tuning and Patch Generation on Large Language Models | 2507.10103 | None | 2025-07-14 |
| LLMCup: Ranking-Enhanced Comment Updating with LLMs | 2507.08671 | None | 2025-07-11 |
| Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs | 2507.03659 | None | 2025-07-04 |
| CoreCodeBench: A Configurable Multi-Scenario Repository-Level Benchmark | 2507.05281 | None | 2025-07-04 |
| CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks | 2507.05269 | None | 2025-07-03 |
| APRMCTS: Improving LLM-based Automated Program Repair with Iterative Tree Search | 2507.01827 | None | 2025-07-02 |
| Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench | 2507.02976 | None | 2025-06-30 |
| A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications | 2506.23749 | None | 2025-06-30 |
| Repair Ingredients Are All You Need: Improving Large Language Model-Based Program Repair via Repair Ingredients Search | 2506.23100 | None | 2025-06-29 |
| : Multi-level Tree-based Automatic Program Repair with Large Language Models | 2506.21211 | None | 2025-06-26 |
| Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories | 2506.18824 | None | 2025-06-23 |
| The Impact of Input Order Bias on Large Language Models for Software Fault Localization | 2412.18750 | None | 2025-06-23 |
| Tracing Errors, Constructing Fixes: Repository-Level Memory Error Repair via Typestate-Guided Context Retrieval | 2506.18394 | None | 2025-06-23 |
| Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems | 2506.17208 | None | 2025-06-20 |
| SemAgent: A Semantics Aware Program Repair Agent | 2506.16650 | None | 2025-06-19 |
| ChatDBG: Augmenting Debugging with Large Language Models | 2403.16354 | https://github.com/plasma-umass/chatdbg | 2025-06-19 |
| SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | 2505.16975 | https://github.com/dorothyduuu/swe-dev | 2025-06-19 |
| FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation | 2503.06680 | None | 2025-06-19 |
| Empirical Evaluation of Large Language Models in Automated Program Repair | 2506.13186 | None | 2025-06-16 |
| The Foundation Cracks: A Comprehensive Study on Bugs and Testing Practices in LLM Libraries | 2506.12320 | None | 2025-06-14 |
| Identifying Helpful Context for LLM-based Vulnerability Repair: A Preliminary Study | 2506.11561 | None | 2025-06-13 |
| An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications | 2404.11050 | https://github.com/mohannadcse/alloyspecrepair | 2025-06-12 |
| Towards Understanding Bugs in Distributed Training and Inference Frameworks for Large Language Models | 2506.10426 | None | 2025-06-12 |
| Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning | 2409.18395 | None | 2025-06-11 |
| Automated Repair of Ambiguous Natural Language Requirements | 2505.07270 | https://github.com/msv-lab/specfix | 2025-06-07 |
| CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics | 2411.17274 | https://github.com/yikun-li/cleanvul | 2025-06-07 |
| MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair | 2408.09568 | None | 2025-06-06 |
Automated Debugging/Bug Localization
| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| VeriRAG: A Retrieval-Augmented Framework for Automated RTL Testability Repair | 2507.15664 | None | 2025-07-21 |
| Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding | 2507.12482 | None | 2025-07-14 |
| Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs | 2507.03659 | None | 2025-07-04 |
| : Multi-level Tree-based Automatic Program Repair with Large Language Models | 2506.21211 | None | 2025-06-26 |
| Black-Box Test Code Fault Localization Driven by Large Language Models and Execution Estimation | 2506.19045 | None | 2025-06-23 |
| The Impact of Input Order Bias on Large Language Models for Software Fault Localization | 2412.18750 | None | 2025-06-23 |
| BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning | 2407.17631 | https://zenodo.org/record/15122980 | 2025-06-22 |
| Improving Compiler Bug Isolation by Leveraging Large Language Models | 2506.17647 | None | 2025-06-21 |
| Towards Understanding Bugs in Distributed Training and Inference Frameworks for Large Language Models | 2506.10426 | None | 2025-06-12 |
| TTrace: Lightweight Error Checking and Diagnosis for Distributed Training | 2506.09280 | None | 2025-06-10 |
| Understanding Software Engineering Agents Through the Lens of Traceability: An Empirical Study | 2506.08311 | None | 2025-06-10 |
| Improving LLM-Based Fault Localization with External Memory and Project Context | 2506.03585 | None | 2025-06-04 |
| When Deep Learning Meets Information Retrieval-based Bug Localization: A Survey | 2505.00144 | None | 2025-04-30 |
| How Accurately Do Large Language Models Understand Code? | 2504.04372 | None | 2025-04-09 |
| OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs | 2504.04030 | None | 2025-04-05 |
| Improved IR-based Bug Localization with Intelligent Relevance Feedback | 2501.10542 | https://github.com/asifsamir/brain | 2025-03-27 |
| A Multi-Agent Approach to Fault Localization via Graph-Based Retrieval and Reflexion | 2409.13642 | None | 2025-03-19 |
| AgentFL: Scaling LLM-based Fault Localization to Project-Level Context | 2403.16362 | None | 2025-02-24 |
| Bridging Bug Localization and Issue Fixing: A Hierarchical Localization Framework Leveraging Large Language Models | 2502.15292 | None | 2025-02-21 |
| Aligning the Objective of LLM-based Program Repair | 2404.08877 | https://github.com/cuhk-shenzhen-se/d4c | 2025-02-21 |
| Where's the Bug? Attention Probing for Scalable Fault Localization | 2502.13966 | None | 2025-02-20 |
| FlexFL: Flexible and Effective Fault Localization with Open-Source Large Language Models | 2411.10714 | None | 2025-02-18 |
| COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis | 2408.05006 | https://github.com/neuir/coast | 2025-02-12 |
| Fault Localization via Fine-tuning Large Language Models with Mutation Generated Stack Traces | 2501.18005 | None | 2025-02-11 |
| Simulated Interactive Debugging | 2501.09694 | None | 2025-01-16 |
| Improving the Ability of Pre-trained Language Model by Imparting Large Language Model's Experience | 2408.08553 | None | 2025-01-15 |
| AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds | 2501.06706 | None | 2025-01-12 |
| Counterexample Guided Program Repair Using Zero-Shot Learning and MaxSAT-based Fault Localization | 2502.07786 | None | 2024-12-19 |
| Enhancing IR-based Fault Localization using Large Language Models | 2412.03754 | None | 2024-12-04 |
| Identifying Root Causes of Null Pointer Exceptions with Logical Inferences | 2412.01005 | None | 2024-12-01 |
| BDefects4NN: A Backdoor Defect Database for Controlled Localization Studies in Neural Networks | 2412.00746 | None | 2024-12-01 |
Bug/Vulnerability Detection
| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs | 2507.16773 | None | 2025-07-22 |
| Meta-Learning for Cold-Start Personalization in Prompt-Tuned LLMs | 2507.16672 | None | 2025-07-22 |
| LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models | 2507.16585 | None | 2025-07-22 |
| CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection | 2501.04510 | None | 2025-07-21 |
| BugScope: Learn to Find Bugs Like Human | 2507.15671 | None | 2025-07-21 |
| StaAgent: An Agentic Framework for Testing Static Analyzers | 2507.15892 | None | 2025-07-20 |
| LLM-Based Detection of Tangled Code Changes for Higher-Quality Method-Level Bug Datasets | 2505.08263 | None | 2025-07-19 |
| LLAMA: Multi-Feedback Smart Contract Fuzzing Framework with LLM-Guided Seed Generation | 2507.12084 | None | 2025-07-16 |
| Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding | 2507.12482 | None | 2025-07-14 |
| Turning the Tide: Repository-based Code Reflection | 2507.09866 | None | 2025-07-14 |
| White-Basilisk: A Hybrid Model for Code Vulnerability Detection | 2507.08540 | None | 2025-07-11 |
| ETrace: Event-Driven Vulnerability Detection in Smart Contracts via LLM-Based Trace Analysis | 2506.15790 | None | 2025-07-08 |
| Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization | 2507.03051 | None | 2025-07-03 |
| CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks | 2507.05269 | None | 2025-07-03 |
| Are AI-Generated Fixes Secure? Analyzing LLM and Agent Patches on SWE-bench | 2507.02976 | None | 2025-06-30 |
| SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models | 2506.20415 | None | 2025-06-25 |
| VulStamp: Vulnerability Assessment using Large Language Model | 2506.11484 | None | 2025-06-25 |
| FuncVul: An Effective Function Level Vulnerability Detection Model using LLM and Code Chunk | 2506.19453 | None | 2025-06-24 |
| Smart-LLaMA-DPO: Reinforced Large Language Model for Explainable Smart Contract Vulnerability Detection | 2506.18245 | None | 2025-06-23 |
| LASA: Enhancing SoC Security Verification with LLM-Aided Property Generation | 2506.17865 | None | 2025-06-22 |
| SAVANT: Vulnerability Detection in Application Dependencies through Semantic-Guided Reachability Analysis | 2506.17798 | None | 2025-06-21 |
| Identifying Helpful Context for LLM-based Vulnerability Repair: A Preliminary Study | 2506.11561 | None | 2025-06-13 |
| Expert-in-the-Loop Systems with Cross-Domain and In-Domain Few-Shot Learning for Software Vulnerability Detection | 2506.10104 | None | 2025-06-11 |
| Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning | 2409.18395 | None | 2025-06-11 |
| A First Look at Bugs in LLM Inference Engines | 2506.09713 | https://github.com/infbug/bugs-in-llm-inference-engines | 2025-06-11 |
| Large Language Models for Multilingual Vulnerability Detection: How Far Are We? | 2506.07503 | https://github.com/spanshu96/large-language-model-for-multilingual-vulnerability-detection | 2025-06-09 |
| Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data | 2506.07390 | https://github.com/xin-cheng-wen/po4vul | 2025-06-09 |
| LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning | 2401.16185 | None | 2025-06-07 |
| ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data | 2408.16028 | None | 2025-06-01 |
| The Hitchhiker's Guide to Program Analysis, Part II: Deep Thoughts by LLMs | 2504.11711 | https://github.com/seclab-ucr/buglens | 2025-05-31 |
| LPASS: Linear Probes as Stepping Stones for vulnerability detection using compressed LLMs | 2505.24451 | None | 2025-05-30 |
Fuzzing/Testing
| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent | 2507.16799 | None | 2025-07-23 |
| WAKENLLM: Evaluating Reasoning Potential and Stability in LLMs via Fine-Grained Benchmarking | 2507.16199 | None | 2025-07-23 |
| LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs | 2507.16809 | None | 2025-07-22 |
| ChatChecker: A Framework for Dialogue System Testing and Evaluation Through Non-cooperative User Simulation | 2507.16792 | None | 2025-07-22 |
| LangBiTe: A Platform for Testing Bias in Large Language Models | 2404.18558 | None | 2025-07-22 |
| Universal Model Routing for Efficient LLM Inference | 2502.08773 | None | 2025-07-22 |
| Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems | 2506.06821 | None | 2025-07-22 |
| ACT: Bridging the Gap in Code Translation through Synthetic Data Generation & Adaptive Training | 2507.16478 | None | 2025-07-22 |
| Talking Like a Phisher: LLM-Based Attacks on Voice Phishing Classifiers | 2507.16291 | None | 2025-07-22 |
| Time to Split: Exploring Data Splitting Strategies for Offline Evaluation of Sequential Recommenders | 2507.16289 | None | 2025-07-22 |
| Towards Compute-Optimal Many-Shot In-Context Learning | 2507.16217 | None | 2025-07-22 |
| LOCOFY Large Design Models -- Design to code conversion solution | 2507.16208 | None | 2025-07-22 |
| SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting | 2507.16145 | None | 2025-07-22 |
| GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities | 2507.12367 | None | 2025-07-21 |
| Efficient Compositional Multi-tasking for On-device Large Language Models | 2507.16083 | None | 2025-07-21 |
| Deep Researcher with Test-Time Diffusion | 2507.16075 | None | 2025-07-21 |
| AutoMeet: a proof-of-concept study of genAI to automate meetings in automotive engineering | 2507.16054 | None | 2025-07-21 |
| FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs | 2507.15839 | None | 2025-07-21 |
| LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra | 2507.15815 | None | 2025-07-21 |
| True Multimodal In-Context Learning Needs Attention to the Visual Context | 2507.15807 | None | 2025-07-21 |
| Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning | 2507.15788 | None | 2025-07-21 |
| Detecting Benchmark Contamination Through Watermarking | 2502.17259 | None | 2025-07-21 |
| BugScope: Learn to Find Bugs Like Human | 2507.15671 | None | 2025-07-21 |
| CoordField: Coordination Field for Agentic UAV Task Allocation In Low-altitude Urban Scenarios | 2505.00091 | None | 2025-07-21 |
| RankMixer: Scaling Up Ranking Models in Industrial Recommenders | 2507.15551 | None | 2025-07-21 |
| LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning | 2507.15521 | None | 2025-07-21 |
| DARE: Diverse Visual Question Answering with Robustness Evaluation | 2409.18023 | None | 2025-07-21 |
| ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events | 2501.03040 | None | 2025-07-21 |
| Input Reduction Enhanced LLM-based Program Repair | 2507.15251 | None | 2025-07-21 |
| Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning | 2505.16122 | None | 2025-07-21 |
| LibLMFuzz: LLM-Augmented Fuzz Target Generation for Black-box Libraries | 2507.15058 | None | 2025-07-20 |
Clone Detection
| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| VulCoCo: A Simple Yet Effective Method for Detecting Vulnerable Code Clones | 2507.16661 | None | 2025-07-22 |
| Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification | 2506.01631 | None | 2025-07-03 |
| Understanding Software Engineering Agents Through the Lens of Traceability: An Empirical Study | 2506.08311 | None | 2025-06-10 |
| Cross-lingual Transfer in Programming Languages: An Extensive Empirical Study | 2310.16937 | None | 2025-06-10 |
| An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models | 2409.14644 | None | 2025-06-03 |
| A Systematic Literature Review of Parameter-Efficient Fine-Tuning for Large Code Models | 2504.21569 | https://github.com/alvi75/slr-peft | 2025-05-09 |
| From Bias To Improved Prompts: A Case Study of Bias Mitigation of Clone Detection Models | 2505.05679 | None | 2025-05-08 |
| The Struggles of LLMs in Cross-lingual Code Clone Detection | 2408.04430 | https://github.com/trux-dtf/clccd | 2025-05-06 |
| Large Language Models as Robust Data Generators in Software Analytics: Are We There Yet? | 2411.10565 | None | 2025-05-05 |
| Improving the Ability of Pre-trained Language Model by Imparting Large Language Model's Experience | 2408.08553 | None | 2025-01-15 |
| Unveiling Code Clone Patterns in Open Source VR Software: An Empirical Study | 2501.07165 | None | 2025-01-13 |
| Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code | 2412.06757 | None | 2024-12-10 |
| Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code | 2402.09299 | https://github.com/commissarsilver/trawic | 2024-10-30 |
| In-Context Code-Text Learning for Bimodal Software Engineering | 2410.18107 | None | 2024-10-08 |
| LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions | 2408.07321 | None | 2024-08-14 |
| Assessing the Code Clone Detection Capability of Large Language Models | 2407.02402 | None | 2024-07-02 |
| Investigating the Efficacy of Large Language Models for Code Clone Detection | 2401.13802 | https://github.com/mkhfring/largelanguagemodels | 2024-01-30 |
| Greening Large Language Models of Code | 2309.04076 | https://github.com/soarsmu/Avatar | 2024-01-12 |
| Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey | 2308.01191 | None | 2023-08-06 |
| Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review | 2307.02503 | None | 2023-07-04 |
| Understanding Programs by Exploiting (Fuzzing) Test Cases | 2305.13592 | https://github.com/rabbitjy/fuzztuning | 2023-06-12 |
Code Search
| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| OASIS: Order-Augmented Strategy for Improved Code Search | 2503.08161 | None | 2025-07-17 |
| The Impact of LLM-Assistants on Software Developer Productivity: A Systematic Literature Review | 2507.03156 | None | 2025-07-03 |
| LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models | 2505.14759 | None | 2025-06-08 |
| DeepRTL2: A Versatile Model for RTL-Related Tasks | 2506.15697 | None | 2025-05-28 |
| Knowledge Graph Based Repository-Level Code Generation | 2505.14394 | None | 2025-05-20 |
| Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks | 2504.19444 | None | 2025-04-28 |
| Towards Leveraging Large Language Model Summaries for Topic Modeling in Source Code | 2504.17426 | None | 2025-04-24 |
| CoSQA+: Pioneering the Multi-Choice Code Search Benchmark with Test-Driven Agents | 2406.11589 | https://github.com/DeepSoftwareAnalytics/CoSQA_Plus | 2025-04-11 |
| Zero-Shot Cross-Domain Code Search without Fine-Tuning | 2504.07740 | https://github.com/zju-ctag/codebridge | 2025-04-10 |
| Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets | 2502.20246 | None | 2025-02-28 |
| OrcaLoca: An LLM Agent Framework for Software Issue Localization | 2502.00350 | None | 2025-02-01 |
| SpecRover: Code Intent Extraction via LLMs | 2408.02232 | None | 2024-12-11 |
| Fixing Security Vulnerabilities with AI in OSS-Fuzz | 2411.03346 | None | 2024-11-21 |
| Are Decoder-Only Large Language Models the Silver Bullet for Code Search? | 2410.22240 | https://github.com/georgepitt/decoderllms-codesearch | 2024-10-29 |
| In-the-loop Hyper-Parameter Optimization for LLM-Based Automated Design of Heuristics | 2410.16309 | None | 2024-10-07 |
| No Man is an Island: Towards Fully Automatic Programming by Code Search, Code Generation and Program Repair | 2409.03267 | None | 2024-09-05 |
| You Augment Me: Exploring ChatGPT-based Data Augmentation for Semantic Code Search | 2408.05542 | None | 2024-08-17 |
| ViC: Virtual Compiler Is All You Need For Assembly Code Search | 2408.06385 | https://github.com/zeyugao/virtualcompiler | 2024-08-10 |
| AutoCodeRover: Autonomous Program Improvement | 2404.05427 | https://github.com/nus-apr/auto-code-rover | 2024-07-25 |
| RepoQA: Evaluating Long Context Code Understanding | 2406.06025 | https://github.com/evalplus/repoqa | 2024-06-10 |
| Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search | 2401.04514 | https://github.com/alex-haochenli/reco | 2024-06-03 |
| ACES: Generating Diverse Programming Puzzles with Autotelic Generative Models | 2310.10692 | None | 2024-05-29 |
| Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models | 2405.11196 | https://github.com/gksajy/slimcode | 2024-05-18 |
| REINFOREST: Reinforcing Semantic Code Similarity for Cross-Lingual Code Search Models | 2305.03843 | https://github.com/reinforest-team/reinforest | 2024-04-15 |
| GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding | 2311.09707 | https://github.com/drndr/gencodesearchnet | 2023-11-16 |
| The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation | 2305.06156 | https://github.com/fsoft-ai4code/thevault | 2023-10-30 |
| Code Representation Pre-training with Complements from Program Executions | 2309.09980 | None | 2023-09-04 |
| Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work? | 2211.12821 | None | 2023-08-28 |
Code Review
| Title | ArXiv Link | GitHub Link | Last Update |
|---|---|---|---|
| LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning | 2507.16395 | None | 2025-07-22 |
| Measuring how changes in code readability attributes affect code quality evaluation by Large Language Models | 2507.05289 | None | 2025-07-09 |
| Towards Exception Safety Code Generation with Intermediate Representation Agents Framework | 2410.06949 | https://github.com/XMZhangAI/Seeker | 2025-07-07 |
| An Insight into Security Code Review with LLMs: Capabilities, Obstacles, and Influential Factors | 2401.16310 | None | 2025-06-02 |
| CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models | 2503.16167 | None | 2025-05-31 |
| Applying Large Language Models to Issue Classification: Revisiting with Extended Data and New Models | 2506.00128 | None | 2025-05-30 |
| Evaluating Large Language Models for Code Review | 2505.20206 | None | 2025-05-26 |
| Large Language Models in Code Co-generation for Safe Autonomous Vehicles | 2505.19658 | None | 2025-05-26 |
| Knowledge-Guided Prompt Learning for Request Quality Assurance in Public Code Review | 2410.21673 | https://github.com/wut-idea/kp-pcr | 2025-05-20 |
| Advancing Software Quality: A Standards-Focused Review of LLM-Based Assurance Techniques | 2505.13766 | None | 2025-05-19 |
| Large Language Models as Robust Data Generators in Software Analytics: Are We There Yet? | 2411.10565 | None | 2025-05-05 |
| Patched RTC: evaluating LLMs for diverse software development tasks | 2407.16557 | https://github.com/codelion/optillm/blob/main/optillm/rto.py | 2025-04-29 |
| CoCo-Bench: A Comprehensive Code Benchmark For Multi-task Large Language Model Evaluation | 2504.20673 | None | 2025-04-29 |
| Improving Automated Secure Code Reviews: A Synthetic Dataset for Code Vulnerability Flaws | 2504.16310 | None | 2025-04-22 |
| DR.FIX: Automatically Fixing Data Races at Industry Scale | 2504.15637 | https://github.com/uber-research/drfix | 2025-04-22 |
| Psycholinguistic Analyses in Software Engineering Text: A Systematic Literature Review | 2503.05992 | None | 2025-04-17 |
| Bugdar: AI-Augmented Secure Code Review for GitHub Pull Requests | 2503.17302 | None | 2025-03-21 |
| Measuring Determinism in Large Language Models for Software Code Review | 2502.20747 | None | 2025-02-28 |
| MdEval: Massively Multilingual Code Debugging | 2411.02310 | None | 2025-02-24 |
| Accountability in Code Review: The Role of Intrinsic Drivers and the Impact of LLMs | 2502.15963 | None | 2025-02-21 |