Are Decoder-Only Large Language Models the Silver Bullet for Code Search?
November 6, 2025 ยท View on GitHub
This repository contains the code and datasets for the paper "Are Decoder-Only Large Language Models the Silver Bullet for Code Search?". Our work is divided into three main parts: zero-shot tests with decoder-only LLMs, fine-tuning tests with decoder-only LLMs, and improvement analysis. This repository provides the necessary code and data to reproduce our results.
Each section has its own dedicated directory containing all relevant scripts. Below, we provide an overview and demonstration example for each section.
๐ค Published Models
For reproducibility, for the decoder-only large models evaluated in the paper, we have made public the corresponding Huggingface links. These links can be viewed specifically in the HuggingFace Models list and HuggingFace collection.
Experimental Environment
Hardware:
- CPU: Intel(R) Xeon(R) Platinum 8360H CPU @ 3.00GHz
- GPU: 2 ร NVIDIA A800 80GB GPUs
- RAM: 2.0 TB
Software:
- Operating System: CentOS Linux release 7.9.2009 (Core)
- Python: 3.8.19
- PyTorch Version: 2.3.0+cu121
- CUDA Version: 12.1
Dependencies
To install the necessary dependencies, run the following commands:
git clone https://github.com/ChenyxEugene/DecoderLLMs-CodeSearch.git
cd decoder-only-code-search
pip install -e .
pip install -r requirements.txt
Datasets
The datasets can be accessed via this Google Drive link. The dataset structure is as follows:
Dataset
|__CodeSearchNet
|__CoSQA_Plus
|__Train
|__CSN
|__E5
|__MNTP
|__SimCSE
Zero-Shot Test
All scripts for zero-shot code search are located in the Zero-shot directory. These scripts measure distances using cosine similarity. Below is an example of testing CodeGemma on the CodeSearchNet dataset. Additional examples can be found in the same directory.
cd decoder-only-code-search/Zero-shot
python CSN_Test_Decoder_Model.py \
--model_name_or_path google/codegemma-7b-it \
--result_path CSN-codegemma \
--test_data_path_dir ../Dataset/CodeSearchNet \
--embedding_batch_size 500
Example output:
Loading checkpoint shards: 100%|โโโโโโโโโโ| 3/3 [00:05<00:00, 1.78s/it]
Evaluating language: python
Shape of data_code: (22176,)
Each batch contains 500 data
Processing batches: 100%|โโโโโโโโโโ| 45/45 [04:01<00:00, 5.38s/it]
python MRR Score: 0.10966641818108162
Evaluating language: go
......
Fine-Tuning Test
All scripts for fine-tuning code search models are in the Fine-tuning directory. These scripts also use cosine similarity to measure distances. Below is an example of fine-tuning CodeGemma on the CodeSearchNet dataset. More examples can be found in the Fine-tuning directory. Note that before running the fine-tuning test, the model needs to be fine-tuned. Detailed instructions can be found in the Fine-tuning Method directory.
cd decoder-only-code-search/Fine-tuning
python CSN_Test_Finetuning_Decoder_Model.py \
--model_name_or_path google/codegemma-7b-it \
--peft_model_name_or_path finetuning_model \
--result_path CSN-finetuning-codegemma \
--test_data_path_dir ../Dataset/CodeSearchNet \
--embedding_batch_size 500
Improvement Analysis
All scripts for improvement analysis are provided in the Improvement Analysis directory.
๐ How to Cite
If you use this repository or our work in your research, please cite our paper:
@article{chen2024decoder,
title={Are Decoder-Only Large Language Models the Silver Bullet for Code Search?},
author={Chen, Yuxuan and Liu, Mingwei and Ou, Guangsheng and Li, Anji and Dai, Dekun and Wang, Yanlin and Zheng, Zibin},
journal={arXiv preprint arXiv:2410.22240},
year={2024}
}