🦙 👀 ⛑ DeepDR-LLM: Integrated Image-based Deep Learning and Language Models for Primary Diabetes Care

June 7, 2024

DeepDR-LLM offers a holistic approach to primary diabetes care by combining image-based deep learning with advanced language models. This repository includes code for utilizing the Vision Transformer (ViT) for image analysis, alongside fine-tuned LLaMA models to produce detailed management suggestions for patients with diabetes. Here, we employ the LLaMA-7B model as the foundational language model.

Requirements
Environment Setup
- Linux System
Dataset Preparation
Model Training and Evaluation

Requirements

This software runs on Linux, specifically Ubuntu 20.04 (other versions have not been tested), and requires Python 3.9. It needs 64 GB of RAM and 1 TB of disk storage. Performance benchmarks are based on an Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz and an NVIDIA A100 GPU.

The following Python packages are required, which are also listed in requirements.txt:

numpy>=1.25.0
datasets>=2.13.1
deepspeed>=0.10.0
huggingface-hub>=0.15.1
sentencepiece>=0.1.97
tokenizers>=0.13.1
torch>=2.0.1
transformers>=4.28.1
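
As a quick sanity check before training, the minimum versions above can be compared against what is installed. The following is a minimal sketch using only the Python standard library; the helper names (`parse_version`, `meets_minimum`, `check_environment`) are illustrative and not part of this repository, and the version parsing is deliberately simple (it only looks at the leading numeric components).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list above.
MINIMUMS = {
    "numpy": "1.25.0",
    "datasets": "2.13.1",
    "deepspeed": "0.10.0",
    "huggingface-hub": "0.15.1",
    "sentencepiece": "0.1.97",
    "tokenizers": "0.13.1",
    "torch": "2.0.1",
    "transformers": "4.28.1",
}


def parse_version(text):
    """Turn a string such as '2.0.1+cu118' into (2, 0, 1)."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed, minimum):
    """Compare two version strings after zero-padding to equal length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Print one status line per required package."""
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            print(f"{name}: NOT INSTALLED (need >= {minimum})")
            continue
        status = "ok" if meets_minimum(installed, minimum) else "TOO OLD"
        print(f"{name}: {installed} (need >= {minimum}) {status}")


if __name__ == "__main__":
    check_environment()
```

Pre-release suffixes such as `rc1` are ignored by this sketch, so treat its output as a hint rather than a strict compatibility check.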

Linux System

Step 1: Download the Project

Open the terminal, or press Ctrl+Alt+F1 to access the command line interface.
Clone this repository to your home directory.

git clone https://github.com/DeepPros/DeepDR-LLM.git

Navigate to the cloned repository's directory.

cd DeepDR-LLM

Step 2: Prepare the Environment and Execute the Code

Install the required Python packages.

python3 -m pip install --user -r requirements.txt

Supported Image File Formats

JPEG, PNG, and TIFF files are supported and have been tested. Other formats compatible with OpenCV should also work. The input image must be a 3-channel color fundus image whose shorter side is greater than 448 pixels.
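
The two image constraints (3 channels, shorter side greater than 448 pixels) can be checked before an image is passed to the model. The sketch below is illustrative only: `check_fundus_shape` is a hypothetical helper, not part of this repository, and it assumes the shape is given as (height, width, channels), the layout OpenCV and NumPy use for color images.

```python
def check_fundus_shape(shape, min_short_side=448):
    """Return a list of problems; an empty list means the shape passes.

    `shape` is assumed to be (height, width, channels).
    """
    problems = []
    if len(shape) < 2:
        return ["shape does not describe an image"]
    # A grayscale image typically has no channel axis at all.
    if len(shape) != 3 or shape[2] != 3:
        problems.append("expected a 3-channel color image")
    # The requirement is strictly greater than the threshold.
    if min(shape[0], shape[1]) <= min_short_side:
        problems.append(
            f"shorter side must be greater than {min_short_side} pixels"
        )
    return problems


if __name__ == "__main__":
    print(check_fundus_shape((1024, 768, 3)))  # passes both checks
    print(check_fundus_shape((448, 600, 3)))   # shorter side too small
    print(check_fundus_shape((600, 600)))      # no channel axis
```

With OpenCV, the shape could be taken from `cv2.imread(path, cv2.IMREAD_UNCHANGED).shape`, keeping in mind that `cv2.imread` returns `None` when a file cannot be read.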

Modules

Module 1: Language Model (LLaMA) Integration

Module 1 leverages the LLaMA model to generate comprehensive diagnostic and treatment recommendations, designed for easy integration with outputs from Module 2.

Dataset Preparation
- For training: Ensure your dataset is formatted as shown in DeepDR-LLM/Module1/Minimum/train_set/train_set.json. Sample format: [{"instruction":"...","input":"...","output":"..."}].
- For validation: Format your dataset according to the structure shown in DeepDR-LLM/Module1/Minimum/valid_set.json. Sample format: [{"instruction":"...","input":"...","output":"..."}].
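
Before starting a training run, it may help to confirm that a dataset file follows the sample format above. This is a minimal sketch, assuming that every record must carry the three keys from the sample and that each value is a string; the function names are illustrative and not part of this repository.

```python
import json

# Keys taken from the sample format shown above.
REQUIRED_KEYS = ("instruction", "input", "output")


def validate_records(records):
    """Return a list of human-readable errors; empty means valid."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list"]
    errors = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            errors.append(f"record {index}: not a JSON object")
            continue
        for key in REQUIRED_KEYS:
            # Assumption: each field is stored as a string.
            if not isinstance(record.get(key), str):
                errors.append(
                    f"record {index}: '{key}' is missing or not a string"
                )
    return errors


def validate_file(path):
    """Load a JSON dataset file and validate its records."""
    with open(path, encoding="utf-8") as handle:
        return validate_records(json.load(handle))


if __name__ == "__main__":
    sample = [{"instruction": "...", "input": "...", "output": "..."}]
    print(validate_records(sample))
```

`validate_file` can be pointed at the training or validation JSON file; an empty result means every record has the expected keys.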
Training
- Note: Make sure llama-7b model weights are downloaded from https://huggingface.co/huggyllama/llama-7b and saved in DeepDR-LLM/Module1/llama-7b-weights.
- Run DeepDR-LLM/Module1/scripts/run_train.sh to start training.
- Please review the settings in run_train.sh, particularly the paths configuration.
Inference

See DeepDR-LLM/Module1/scripts/inference.py for guidance. Be sure to configure necessary arguments properly. The input format should match that in DeepDR-LLM/Module1/Minimum/train_set/train_set.json.

Module 2: Image Prediction & Analysis

Module 2 is focused on analyzing and predicting outcomes based on fundus images.

Dataset Preparation

Module 2 covers two tasks: classification and segmentation. The dataset for each task is defined by a .txt file in which each line corresponds to one image. For classification, the line format is "imagepath classindex". For segmentation, it is "imagepath maskpath", where each segmentation label has shape [C,H,W] and C includes the background category.
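
The list-file formats described above can be parsed with a few lines of Python. The sketch below is illustrative and is not the loader used by train_cla.py or train_seg.py. It assumes the two fields are separated by whitespace and splits at the last whitespace on each line; the file paths in the example are hypothetical.

```python
def parse_list_line(line):
    """Split one line into (image_path, second_field) at the last whitespace."""
    image_path, second = line.strip().rsplit(maxsplit=1)
    return image_path, second


def read_classification_list(lines):
    """Parse 'imagepath classindex' lines into (path, int) pairs."""
    samples = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        image_path, class_index = parse_list_line(line)
        samples.append((image_path, int(class_index)))
    return samples


def read_segmentation_list(lines):
    """Parse 'imagepath maskpath' lines into (path, path) pairs."""
    samples = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        samples.append(parse_list_line(line))
    return samples


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(read_classification_list(["images/a.jpg 2\n", "images/b.png 0\n"]))
    print(read_segmentation_list(["images/a.jpg masks/a.png\n"]))
```

Both functions accept any iterable of lines, so an open file object can be passed directly, for example `read_classification_list(open(list_path))`.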
Training
- For classification models, use DeepDR-LLM/Module2/train_cla.py.
- For segmentation models, use DeepDR-LLM/Module2/train_seg.py.
- Note: Obtain the ViT-Base model weights pretrained on ImageNet before training (https://download.pytorch.org/models/vit_b_16-c867db91.pth).

Inference

Use DeepDR-LLM/Module2/test.py for evaluation, making sure the trained models are loaded correctly. Outputs will be stored as specified.

Integrated Workflow of DeepDR-LLM from Module 1 and Module 2

Starting point: A fundus image is obtained from a standard or portable imaging device, along with aligned clinical metadata, following the example structure in DeepDR-LLM/Module1/Minimum/train_set/train_set.json.

Step 1: Submit the fundus image to Module 2 (using the 'test.py' script)

Predict the quality of the fundus image, DR grade, DME grade, and the presence of retinal lesions.

Step 2: Convert the clinical metadata into JSON format

Example: {Sex: Female; Age: 47; BMI: 22.13 kg/m^2;....}

Step 3: Combine the clinical metadata with the results from Module 2

Example: {Sex: Female; Age: 47; BMI: 22.13 kg/m^2;....; Fundus Image Quality: Gradable; DR Grade: 0; DME Grade: 0; Presence of Retinal Lesions: No microaneurysms, no cotton-wool spots, no hard exudates, no hemorrhages.}
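
The combination step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the repository's own code: `combine_fields` and `render_fields` are hypothetical helpers, the image-analysis results are assumed to be available as a dictionary, and only the fields named in the examples above are used (the elided fields are left out).

```python
def combine_fields(metadata, image_results):
    """Merge clinical metadata with image-analysis results, metadata first."""
    merged = dict(metadata)
    merged.update(image_results)
    return merged


def render_fields(fields):
    """Render a dict in the '{Key: value; Key: value}' style of the examples."""
    body = "; ".join(f"{key}: {value}" for key, value in fields.items())
    return "{" + body + "}"


if __name__ == "__main__":
    metadata = {"Sex": "Female", "Age": 47, "BMI": "22.13 kg/m^2"}
    image_results = {
        "Fundus Image Quality": "Gradable",
        "DR Grade": 0,
        "DME Grade": 0,
        "Presence of Retinal Lesions": (
            "No microaneurysms, no cotton-wool spots, "
            "no hard exudates, no hemorrhages."
        ),
    }
    print(render_fields(combine_fields(metadata, image_results)))
```

Python dictionaries preserve insertion order, so the metadata fields appear first and the image-derived fields follow, matching the order in the example above.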