Breast-CRAG

December 19, 2024

GitHub repository for the article "Breast-CRAG: A Breast Cancer Large Language Model Leveraging Retrieval-Augmented Generation"

Directory description

Code: All code used in the experiments is open-sourced in the code folder. Some scripts require downloading the corresponding models or supplying specific API keys, so check these requirements before running them.

Dataset: All datasets except Huatuo-BC are open-sourced in the dataset folder. Huatuo-BC is derived from Huatuo-26M, which provides only links to the original webpages rather than the raw data. To avoid copyright infringement, this study likewise provides only the data links, and users need to retrieve the underlying data from those links themselves.

Knowledge Base (KB): Because of copyright restrictions, the kb folder contains only a JSON file listing the names of the data and literature used in the experiments. If you can obtain the corresponding PDF files, the code in the code/retriever/kb_preprocess folder processes them through the full pipeline.

Model Dictionary: The model_dict folder contains links to the trained models, which users can download from Google Drive. The filter subfolder stores the QLoRA adapter files, the generator subfolder stores the models exported after merging with the LoRA adapters (so these files are very large), and the retriever subfolder stores the fine-tuned 137M model.

Evaluation Results

Our model has demonstrated performance on par with or exceeding gpt-4o-2024-08-06 across four breast cancer question-answer datasets and two breast cancer exam datasets. Detailed results are presented below:

Table 1. Evaluation Results on Breast Cancer Dialogue Datasets

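| Evalset | Model | Rouge-1 | Rouge-2 | Rouge-L | Bleu | Bert-score |
| --- | --- | --- | --- | --- | --- | --- |
| Huatuo-BC | huatuogpt-2 | 0.056 | 0.006 | 0.041 | 0.682 | 0.508 |
| Huatuo-BC | llama3-8b-chinese-chat | 0.154 | 0.019 | 0.100 | 1.864 | 0.600 |
| Huatuo-BC | qwen2.5-7b-instruct | 0.144 | 0.021 | 0.071 | 1.568 | 0.612 |
| Huatuo-BC | gpt-4-o | 0.196 | 0.033 | 0.123 | 3.509 | 0.632 |
| Huatuo-BC | BRAG | 0.231 | 0.059 | 0.193 | 6.355 | 0.653 |
| MedDialogue-BC | huatuogpt-2 | 0.076 | 0.015 | 0.047 | 1.232 | 0.512 |
| MedDialogue-BC | llama3-8b-chinese-chat | 0.201 | 0.035 | 0.109 | 2.582 | 0.627 |
| MedDialogue-BC | qwen2.5-7b-instruct | 0.113 | 0.024 | 0.055 | 1.657 | 0.541 |
| MedDialogue-BC | gpt-4-o | 0.261 | 0.065 | 0.149 | 5.395 | 0.667 |
| cMedQA-BC | huatuogpt-2 | 0.055 | 0.006 | 0.040 | 0.620 | 0.505 |
| cMedQA-BC | llama3-8b-chinese-chat | 0.146 | 0.017 | 0.096 | 1.600 | 0.592 |
| cMedQA-BC | qwen2.5-7b-instruct | 0.152 | 0.021 | 0.091 | 1.913 | 0.608 |
| cMedQA-BC | gpt-4-o | 0.194 | 0.032 | 0.121 | 3.385 | 0.630 |
| cMedQA-BC | BRAG | 0.214 | 0.044 | 0.179 | 4.349 | 0.647 |
| webMedQA-BC | huatuogpt-2 | 0.059 | 0.007 | 0.043 | 0.842 | 0.512 |
| webMedQA-BC | llama3-8b-chinese-chat | 0.146 | 0.017 | 0.098 | 2.026 | 0.591 |
| webMedQA-BC | qwen2.5-7b-instruct | 0.158 | 0.022 | 0.094 | 2.412 | 0.606 |
| webMedQA-BC | gpt-4-o | 0.197 | 0.033 | 0.121 | 4.037 | 0.625 |
| webMedQA-BC | BRAG | 0.174 | 0.034 | 0.144 | 2.909 | 0.626 |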
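
For readers unfamiliar with the overlap metrics in Table 1, the following is a minimal sketch of what Rouge-1 and Rouge-L measure. It is not the evaluation code from this repository: the character-level tokenization (a common choice for Chinese text), the balanced F1 variant, and the function names `rouge_1_f1`, `lcs_length`, and `rouge_l_f1` are all assumptions made for illustration.

```python
from collections import Counter


def _chars(text: str) -> list:
    # Assumption: tokenize by character and ignore whitespace.
    return [ch for ch in text if not ch.isspace()]


def _f1(matches: int, cand_len: int, ref_len: int) -> float:
    # Balanced F1 from a match count; 0.0 when nothing can be compared.
    if matches == 0 or cand_len == 0 or ref_len == 0:
        return 0.0
    precision = matches / cand_len
    recall = matches / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated answer and a reference answer."""
    cand, ref = _chars(candidate), _chars(reference)
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def lcs_length(a, b) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """F1 based on the longest common subsequence, so token order matters."""
    cand, ref = _chars(candidate), _chars(reference)
    return _f1(lcs_length(cand, ref), len(cand), len(ref))
```

Rouge-1 ignores token order while Rouge-L rewards it, which is why a model can score noticeably higher on one than the other. Reproducing the reported numbers would require the tokenizer and metric implementation actually used in the code folder.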

Table 2. Evaluation Results on Breast Cancer Exam Datasets

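| Models | Exam-BC (simple set) Single | Exam-BC (simple set) Multi. | Exam-BC (simple set) Ave. | Exam-BC (hard set) Single | Exam-BC (hard set) Multi. | Exam-BC (hard set) Ave. | USMLE-BC Single |
| --- | --- | --- | --- | --- | --- | --- | --- |
| HuatuoGPT2-7B | 0.36 | 0.36 | 0.36 | 0.26 | 0.24 | 0.25 | 0.48 |
| Llama3-8B-Chinese-Chat | 0.22 | 0.37 | 0.25 | 0.24 | 0.22 | 0.23 | 0.49 |
| Qwen2.5-7B-Instruct | 0.35 | 0.44 | 0.37 | 0.3 | 0.49 | 0.33 | 0.41 |
| GPT-4-o | 0.73 | 0.54 | 0.69 | 0.68 | 0.45 | 0.64 | 0.81 |
| Breast-CRAG | 0.65 | 0.54 | 0.63 | 0.56 | 0.37 | 0.52 | 0.81 |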

Paper

Our pre-print paper can be accessed at http://ssrn.com/abstract=5052341