MALADE: Multiple Agents powered by LLMs for ADE Extraction (MLHC'24)
August 13, 2024
Code for the paper:
Jihye Choi, Nils Palumbo, Prasad Chalasani, Matthew Engelhard, Somesh Jha, Anivarya Kumar, & David Page (2024). MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance. Machine Learning for Healthcare 2024.
What is MALADE?
MALADE (pronounced like the French word malade meaning 'sick' or 'ill') is a framework for the orchestration of Large Language Model (LLM)-powered agents with Retrieval Augmented Generation (RAG) for Pharmacovigilance, in particular for Adverse Drug Event (ADE) extraction.
The core function of MALADE is to answer category-outcome ADE questions of the form:
Does drug category X cause adverse event Y?,
or
Is drug category X associated with adverse event Y?
For example, "Do ACE inhibitors cause angioedema?".
The primary data source used is FDA Drug Label data as obtained via the OpenFDA API. Optionally, one can use the MIMIC-IV EHR data to identify the most representative drugs within a category (this is important since FDA label data is specific to individual drugs, not categories).
For a given drug-category and outcome, MALADE produces a variety of qualitative and quantitative outputs, for example:
Label: ACE inhibitors increase angioedema risk,
Confidence: 0.9 (i.e. confidence in the label),
Frequency: rare,
Evidence: strong,
Justification: The evidence from FDAHandler and drug labels for LISINOPRIL, CAPTOPRIL, and ENALAPRIL MALEATE consistently reports an increased risk of angioedema with the use of these ACE inhibitors. The incidence of angioedema is reported as rare, with occurrences such as one in 1000 patients for CAPTOPRIL. The evidence is considered strong due to the authoritative nature of the sources.
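The fields above could be held in a small record type when post-processing results. This is a minimal sketch: the class name `CategoryOutcomeResult` is hypothetical, and the assumption that confidence lies in [0, 1] is inferred from the example value, not taken from the MALADE code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryOutcomeResult:
    """Hypothetical container mirroring the output fields listed above."""

    label: str          # e.g. "ACE inhibitors increase angioedema risk"
    confidence: float   # confidence in the label (assumed to lie in [0, 1])
    frequency: str      # e.g. "rare"
    evidence: str       # e.g. "strong"
    justification: str  # free-text rationale produced by the agents

    def __post_init__(self) -> None:
        # Reject values outside the assumed confidence range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


if __name__ == "__main__":
    result = CategoryOutcomeResult(
        label="ACE inhibitors increase angioedema risk",
        confidence=0.9,
        frequency="rare",
        evidence="strong",
        justification="Drug labels consistently report an increased risk.",
    )
    print(result.label, result.confidence)
```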
MALADE is evaluated against the OMOP Common Data Model (CDM) ground truth table, which shows established category-outcome associations for a specific set of 10 drug categories and 10 outcomes.
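To make the idea of comparing predictions with a ground truth table concrete, here is a toy sketch that computes a plain agreement rate over category-outcome pairs. It is not the evaluation used in the paper (see scripts/generate_results.py for that); the function name, the label strings, and the placeholder pairs are all assumptions made for illustration.

```python
Pair = tuple[str, str]  # (drug category, outcome)


def agreement_rate(predicted: dict[Pair, str], ground_truth: dict[Pair, str]) -> float:
    """Fraction of ground-truth pairs whose predicted label matches.

    Pairs missing from `predicted` are skipped, not counted as errors.
    """
    shared = [pair for pair in ground_truth if pair in predicted]
    if not shared:
        raise ValueError("no overlapping category-outcome pairs to evaluate")
    correct = sum(predicted[pair] == ground_truth[pair] for pair in shared)
    return correct / len(shared)


if __name__ == "__main__":
    # Placeholder data, not taken from the OMOP table.
    truth = {("category A", "outcome 1"): "increase",
             ("category B", "outcome 1"): "no-effect"}
    preds = {("category A", "outcome 1"): "increase",
             ("category B", "outcome 1"): "increase"}
    print(agreement_rate(preds, truth))
```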
Set up environment and install dependencies
We leverage the awesome Langroid open-source Python library for multi-agent LLM applications.
IMPORTANT: Please ensure you are using Python 3.11+. If you use Poetry and Python 3.11 is available on your system,
running poetry env use 3.11 may be all you need.
# clone this repository
git clone [this-repository]
cd malade
Environment setup with conda:
# create a new environment with Python 3.11:
conda create -n malade python=3.11 -c conda-forge
conda activate malade
Setup with venv:
# create a virtual env under project root, .venv directory
python3 -m venv .venv
# activate the virtual env
. .venv/bin/activate
Install dependencies with Poetry:
# Optionally: poetry lock
poetry install
Set up environment variables (API keys, etc.)
To use the example scripts with an OpenAI LLM, you need an OpenAI API Key.
In the root of the repo, copy the .env-template file to a new file .env:
cp .env-template .env
Then fill in the following values in the .env file:
- An OpenAI API key: save it as OPENAI_API_KEY=... (no quotes).
- A Qdrant instance and API key (see the Langroid instructions): set QDRANT_API_URL and QDRANT_API_KEY as described there.
- An OpenFDA API key (get one here): save it as OPENFDA_API_KEY=...
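Before running the agents, it can help to confirm that the .env file defines every variable named above. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the parser handles only plain KEY=value lines (no quoting or variable expansion).

```python
from pathlib import Path

# Variable names taken from the setup instructions above.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "QDRANT_API_URL",
    "QDRANT_API_KEY",
    "OPENFDA_API_KEY",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # assumes the script runs from the repo root
    if env_path.exists():
        print("Missing keys:", missing_keys(parse_env_text(env_path.read_text())))
    else:
        print("No .env file found; copy .env-template to .env first.")
```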
(Optional) Setup for drug representative generation
This step is required only to run DrugFinder and the process to find
representative drugs in a category based on MIMIC-IV data.
Make sure that MIMIC-IV is installed and running on your machine as a PostgreSQL database.
MIMIC-IV can be obtained here.
Access requires completing the training described here.
Instructions and code for loading MIMIC-IV into PostgreSQL are here.
Finally, ensure that your user account has access to the mimiciv database.
Code Structure
We provide brief descriptions for each file as follows:
| Directory/File | Description |
|---|---|
| malade/ | core source directory |
| malade/omop.py | defines the OMOP Ground Truth table and the associated drug categories and conditions |
| malade/drug_categories.py | finds representative drugs |
| malade/omop_interactions.py | contains DrugAgent and CategoryAgent, which identify drug-outcome associations and label drug category-outcome associations, respectively |
| malade/critic_agent.py | contains Critic |
| malade/omop_evaluation.py | contains utilities for evaluation (for use by scripts/generate_results.py) |
| malade/doc/ | contains RAG-related code |
| malade/doc/fda_handler.py | contains FDAHandler |
| malade/utils/ | general utilities |
| malade/utils/openfda.py | OpenFDA query code |
| malade/tools/ | utilities related to tool use |
Run Experiments
TODO: add brief demo for each step below.
- STEP1: Finding Representative Drugs (Optional)
If MIMIC-IV is set up, run DrugFinder and the drug category representative identification process with
python3 malade/drug_categories.py --recompute
- STEP2: Identifying Drug-Outcome Associations
Run DrugAgent and the drug-outcome association identification process with
python3 malade/omop_interactions.py --recompute_interactions
- STEP3: Labeling Drug Category-Outcome Associations
Run CategoryAgent and the category-outcome labeling process with
python3 malade/omop_interactions.py --recompute_labels
Run python3 scripts/generate_summary_files.py to process the outputs from MALADE into a readable format.
scripts/generate_results.py contains the code to generate the final experimental results.
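The steps above can also be chained from a single Python driver. This is a sketch only: the function names are hypothetical, the commands are the ones listed above, and it assumes the script is launched from the repository root with the project environment active. The final results script is left out because its arguments are not described here.

```python
import subprocess
import sys

# (step name, script and flags) in the order described above.
STEPS = [
    ("representatives", ["malade/drug_categories.py", "--recompute"]),
    ("interactions", ["malade/omop_interactions.py", "--recompute_interactions"]),
    ("labels", ["malade/omop_interactions.py", "--recompute_labels"]),
    ("summaries", ["scripts/generate_summary_files.py"]),
]


def build_commands(include_representatives: bool = False) -> list[list[str]]:
    """Return the command lines to run, skipping the optional MIMIC-IV step by default."""
    commands = []
    for name, args in STEPS:
        if name == "representatives" and not include_representatives:
            continue
        # Use the current interpreter so the active environment is reused.
        commands.append([sys.executable, *args])
    return commands


def run_pipeline(include_representatives: bool = False) -> None:
    """Run each step in order and stop at the first failure."""
    for command in build_commands(include_representatives):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for command in build_commands():
        print(" ".join(command))
```

Calling `run_pipeline(include_representatives=True)` would additionally run STEP1, which requires the MIMIC-IV setup.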
Outputs of MALADE
The outputs from MALADE are in the outputs/ directory:
| File | Description |
|---|---|
| outputs/representative_drugs.json | outputs from DrugFinder |
| outputs/interactions.json | outputs from DrugAgent and CategoryAgent |
| outputs/representative_drugs.md | outputs from DrugAgent in a readable format |
| outputs/omop_results.md | outputs from CategoryAgent in a readable format |
The logs generated by the agents are in the logs/ directory; the path is of the form
logs/DrugFinder-{category name}.log for DrugFinder,
logs/DrugOutcomeInfoAgent-{outcome}-{drug name}.log for DrugAgent, and
logs/CategoryOutcomeRiskAgent-{outcome}-{category name}.log for CategoryAgent.
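When inspecting a particular run, the log path can be built from the patterns above. The helpers below are hypothetical, and they assume that category, outcome, and drug names are inserted into the file name verbatim; the actual code may normalize these names.

```python
from pathlib import Path


def drug_finder_log(category: str, log_dir: str = "logs") -> Path:
    """Log path for DrugFinder on one drug category."""
    return Path(log_dir) / f"DrugFinder-{category}.log"


def drug_agent_log(outcome: str, drug: str, log_dir: str = "logs") -> Path:
    """Log path for DrugAgent on one (outcome, drug) pair."""
    return Path(log_dir) / f"DrugOutcomeInfoAgent-{outcome}-{drug}.log"


def category_agent_log(outcome: str, category: str, log_dir: str = "logs") -> Path:
    """Log path for CategoryAgent on one (outcome, category) pair."""
    return Path(log_dir) / f"CategoryOutcomeRiskAgent-{outcome}-{category}.log"


if __name__ == "__main__":
    print(category_agent_log("angioedema", "ACE inhibitors"))
```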
Reference
If you find this code/work useful in your own research, please consider citing the following:
@misc{choi2024malade,
title={MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance},
author={Jihye Choi and Nils Palumbo and Prasad Chalasani and Matthew M. Engelhard and Somesh Jha and Anivarya Kumar and David Page},
year={2024},
eprint={2408.01869},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2408.01869},
}