DeepDR-LLM: Integrated Image-based Deep Learning and Language Models for Primary Diabetes Care
June 7, 2024
DeepDR-LLM offers a holistic approach to primary diabetes care by combining image-based deep learning with advanced language models. This repository includes code for utilizing the Vision Transformer (ViT) for image analysis, alongside fine-tuned LLaMA models to produce detailed management suggestions for patients with diabetes. Here, we employ the LLaMA-7B model as the foundational language model.
Contents
- Requirements
- Environment Setup
- Linux System
- Dataset Preparation
- Model Training and Evaluation
Requirements
This software runs on Linux and has been tested on Ubuntu 20.04 only (compatibility with other versions has not been tested). It requires Python 3.9, 64 GB of RAM, and 1 TB of disk storage. Performance benchmarks were measured on an Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz and an NVIDIA A100 GPU.
The following Python packages are required, which are also listed in requirements.txt:
numpy>=1.25.0
datasets>=2.13.1
deepspeed>=0.10.0
huggingface-hub>=0.15.1
sentencepiece>=0.1.97
tokenizers>=0.13.1
torch>=2.0.1
transformers>=4.28.1
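To confirm that an existing environment satisfies these minimums before training, a small check along the following lines may help. This is a minimal sketch using only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are illustrative and not part of this repository, and the version comparison is deliberately simple (it compares only the leading numeric components and ignores pre-release ordering).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the package list above.
MINIMUM_VERSIONS = {
    "numpy": "1.25.0",
    "datasets": "2.13.1",
    "deepspeed": "0.10.0",
    "huggingface-hub": "0.15.1",
    "sentencepiece": "0.1.97",
    "tokenizers": "0.13.1",
    "torch": "2.0.1",
    "transformers": "4.28.1",
}


def parse_version(text):
    """Turn a string such as '2.0.1+cu118' into the tuple (2, 0, 1)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first component with no leading digits
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings by their numeric components."""
    return parse_version(installed) >= parse_version(minimum)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return a list of problems; an empty list means every minimum is met."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed (need >= {minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["All listed packages meet their minimums."]:
        print(line)
```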
Linux System
Step 1: Download the Project
- Open the terminal, or press Ctrl+Alt+F1 to access the command line interface.
- Clone this repository to your home directory.
git clone https://github.com/DeepPros/DeepDR-LLM.git
- Navigate to the cloned repository's directory.
cd DeepDR-LLM
Step 2: Prepare the Environment and Execute the Code
- Install the required Python packages.
python3 -m pip install --user -r requirements.txt
Supported Image File Formats
JPEG, PNG, and TIFF file formats are supported and have been tested. Other formats compatible with OpenCV should also work. The input image must be a 3-channel color fundus image whose shorter side is greater than 448 pixels.
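The channel and resolution constraints can be checked before an image is sent to the model. The sketch below works on an array shape in (height, width, channels) order; the function name `is_valid_fundus_shape` is hypothetical and not part of this repository.

```python
MIN_SHORT_SIDE = 448  # the shorter side must be strictly greater than this


def is_valid_fundus_shape(shape):
    """Check a (height, width, channels) shape against the stated input rules."""
    if len(shape) != 3:
        return False  # a 2-D shape means a single-channel image
    height, width, channels = shape
    return channels == 3 and min(height, width) > MIN_SHORT_SIDE
```

With OpenCV, the shape could be obtained from `cv2.imread(path, cv2.IMREAD_UNCHANGED).shape`, keeping in mind that `cv2.imread` returns `None` when a file cannot be read.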
Modules
Module 1: Language Model (LLaMA) Integration
Module 1 leverages the LLaMA model to generate comprehensive diagnostic and treatment recommendations, designed for easy integration with outputs from Module 2.
- Dataset Preparation
  - For training: Ensure your dataset is formatted as shown in DeepDR-LLM/Module1/Minimum/train_set/train_set.json. Sample format: [{"instruction":"...","input":"...","output":"..."}].
  - For validation: Format your dataset according to the structure shown in DeepDR-LLM/Module1/Minimum/valid_set.json. Sample format: [{"instruction":"...","input":"...","output":"..."}].
- Training
  - Note: Make sure the llama-7b model weights are downloaded from https://huggingface.co/huggyllama/llama-7b and saved in DeepDR-LLM/Module1/llama-7b-weights.
  - Run DeepDR-LLM/Module1/scripts/run_train.sh to start training.
  - Please review the settings in run_train.sh, particularly the paths configuration.
- Inference
  See DeepDR-LLM/Module1/scripts/inference.py for guidance. Be sure to configure the necessary arguments properly. The input format should match that in DeepDR-LLM/Module1/Minimum/train_set/train_set.json.
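Because training, validation, and inference all rely on the same instruction/input/output record layout, it can be worth validating a dataset file before launching a run. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes that all three fields are strings, as the sample format suggests.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")


def validate_records(records):
    """Return a list of problems; an empty list means the layout matches."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {index}: not an object")
            continue
        for key in REQUIRED_KEYS:
            # Assumption: every field is a string, as in the sample format.
            if not isinstance(record.get(key), str):
                problems.append(f"record {index}: '{key}' missing or not a string")
    return problems


def validate_file(path):
    """Load a JSON dataset file and validate its records."""
    with open(path, encoding="utf-8") as handle:
        return validate_records(json.load(handle))
```

For example, `validate_file("DeepDR-LLM/Module1/Minimum/train_set/train_set.json")` should return an empty list if the bundled sample follows the documented format.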
Module 2: Image Prediction & Analysis
Module 2 is focused on analyzing and predicting outcomes based on fundus images.
- Dataset Preparation
  Module 2 covers classification and segmentation tasks. The dataset for each task is described by a .txt file in which each line corresponds to one image. For classification, the line format is "imagepath classindex". For segmentation, it is "imagepath maskpath". Segmentation labels have shape [C,H,W], where the channel count C includes the background category.
- Training
  - For classification models, use DeepDR-LLM/Module2/train_cla.py.
  - For segmentation models, use DeepDR-LLM/Module2/train_seg.py.
  - **Note**: Obtain the vit-base model weights pretrained on ImageNet before training (https://download.pytorch.org/models/vit_b_16-c867db91.pth).
- Inference
  Use DeepDR-LLM/Module2/test.py for evaluation, ensuring that the trained models are loaded correctly. Outputs will be stored as specified.
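The two list-file formats described under Dataset Preparation ("imagepath classindex" and "imagepath maskpath") can be read with a few lines of Python. This is an illustrative sketch, not the loader used by train_cla.py or train_seg.py. The function names are hypothetical, and splitting on the last run of whitespace is an assumption made so that image paths containing spaces still parse.

```python
def parse_list_line(line):
    """Split one line into (image_path, second_field) at the last whitespace run."""
    image_path, second = line.strip().rsplit(None, 1)
    return image_path, second


def read_classification_list(lines):
    """Parse 'imagepath classindex' lines into (path, int_label) pairs."""
    samples = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        image_path, label = parse_list_line(line)
        samples.append((image_path, int(label)))
    return samples


def read_segmentation_list(lines):
    """Parse 'imagepath maskpath' lines into (image_path, mask_path) pairs."""
    samples = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        samples.append(parse_list_line(line))
    return samples
```

Both readers accept any iterable of lines, so an open file object can be passed directly. A line with only one field raises `ValueError`, which surfaces malformed entries early.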
Integrated Workflow of DeepDR-LLM from Module 1 and Module 2
- Starting point: A fundus image is obtained from a standard or portable imaging device, along with aligned clinical metadata, following the example structure in DeepDR-LLM/Module1/Minimum/train_set/train_set.json.
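Step 1: Submit the fundus image to Module 2 (using the 'test.py' script)
- Predict the quality of the fundus image, DR grade, DME grade, and the presence of retinal lesions.
Step 2: Convert the clinical metadata into JSON format
- Example: {Sex: Female; Age: 47; BMI: 22.13 kg/m^2;....}
Step 3: Combine the clinical metadata with the results from Module 2
- Example: {Sex: Female; Age: 47; BMI: 22.13 kg/m^2;....; Fundus Image Quality: Gradable; DR Grade: 0; DME Grade: 0; Presence of Retinal Lesions: No microaneurysms, no cotton-wool spots, no hard exudates, no hemorrhages.}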
- Example: {Sex: Female; Age: 47; BMI: 22.13 kg/m^2;....; Fundus Image Quality: Gradable; DR Grade: 0; DME Grade: 0; Presence of Retinal Lesions: No microaneurysms, no cotton-wool spots, no hard exudates, no hemorrhages.}
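Steps 2 and 3 amount to merging two sets of key/value pairs and rendering them in the "Key: value; Key: value" style of the examples above. The sketch below shows one way this could be done. The function names are hypothetical, only the fields visible in the examples are used (the elided ones are omitted), and the exact string layout expected by the fine-tuned model should be confirmed against train_set.json.

```python
def format_fields(fields):
    """Render a dict as '{Key: value; Key: value}' in insertion order."""
    return "{" + "; ".join(f"{key}: {value}" for key, value in fields.items()) + "}"


def combine(clinical_metadata, image_results):
    """Append the image-analysis results from Step 1 to the clinical metadata."""
    merged = dict(clinical_metadata)  # copy so the caller's dict is not modified
    merged.update(image_results)
    return format_fields(merged)


# Values taken from the examples above; elided fields are left out.
clinical = {"Sex": "Female", "Age": 47, "BMI": "22.13 kg/m^2"}
image_results = {
    "Fundus Image Quality": "Gradable",
    "DR Grade": 0,
    "DME Grade": 0,
    "Presence of Retinal Lesions": (
        "No microaneurysms, no cotton-wool spots, "
        "no hard exudates, no hemorrhages."
    ),
}

print(combine(clinical, image_results))
```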