GERNERMED++

October 20, 2023

About

GERNERMED++ is the successor to GERNERMED, an open neural named-entity-recognition (NER) model for German texts in medical natural language processing (NLP).

Key features:

  • Supported labels: Drug, Strength, Frequency, Duration, Form, Dosage
  • Improved word alignment, including improved tokenization for Pharaoh-format alignment
  • Introduction of transfer learning for NER parsing
  • Open, public access to models
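Word alignments are commonly exchanged in the Pharaoh format, where each `i-j` pair links source token `i` to target token `j`. The sketch below illustrates how entity labels could be carried from a source sentence to its translation through such an alignment. The function names, the plain per-token labels (no BIO prefixes), the whitespace tokenization and the "copy the label to every aligned target token" rule are illustrative assumptions, not the exact GERNERMED++ procedure.

```python
from typing import List, Tuple


def parse_pharaoh(alignment: str) -> List[Tuple[int, int]]:
    """Parse a Pharaoh-format alignment such as "0-0 1-2" into (src, tgt) index pairs."""
    pairs = []
    for item in alignment.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def project_labels(src_labels: List[str], n_tgt: int, alignment: str) -> List[str]:
    """Copy each source token's entity label to the target tokens aligned with it.

    Target tokens without a labelled aligned source token get "O" (outside any entity).
    If several labelled source tokens align to the same target token, the first pair
    in the alignment string wins.
    """
    tgt_labels = ["O"] * n_tgt
    for src, tgt in parse_pharaoh(alignment):
        label = src_labels[src]
        if label != "O" and tgt_labels[tgt] == "O":
            tgt_labels[tgt] = label
    return tgt_labels


if __name__ == "__main__":
    # Hypothetical example: English source sentence and a German translation.
    src = "take aspirin 100 mg daily".split()
    src_labels = ["O", "Drug", "Strength", "Strength", "Frequency"]
    tgt = "Aspirin 100 mg täglich einnehmen".split()
    alignment = "0-4 1-0 2-1 3-2 4-3"
    print(list(zip(tgt, project_labels(src_labels, len(tgt), alignment))))
```

Copying labels along alignment links keeps the projection independent of word order, which matters because German and English often order dosage information differently.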

Published paper: Our paper is available at https://doi.org/10.1016/j.jbi.2023.104513

Online Demo: A demo page is available, and the models can also be tried out on HuggingFace (see the Models section).

NER demonstration: [image: NER example demo]

Models

The pretrained models can be retrieved from the following URLs:

The models are also available on the HuggingFace platform:

Scores

Note: All scores are computed token-wise, i.e. by comparing the predicted label of each token with its gold label.
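Token-wise evaluation treats every token as one classification decision. Below is a minimal sketch of per-label precision, recall and F1 under that reading, assuming exactly one label per token and "O" for tokens outside any entity. The `token_scores` helper is hypothetical and not part of the repository; the paper's evaluation script may differ, for example in how the Total column is aggregated.

```python
from collections import Counter
from typing import Dict, List, Tuple


def token_scores(gold: List[str], pred: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Per-label (precision, recall, F1) from aligned gold and predicted token labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have one label per token")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g != "O":
                tp[g] += 1  # correct entity label
        else:
            if p != "O":
                fp[p] += 1  # predicted entity label that is wrong for this token
            if g != "O":
                fn[g] += 1  # gold entity label that the model missed
    scores = {}
    for label in sorted((set(gold) | set(pred)) - {"O"}):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f1)
    return scores


if __name__ == "__main__":
    gold = ["Drug", "Strength", "Strength", "O", "Frequency"]
    pred = ["Drug", "Strength", "O", "O", "Dosage"]
    for label, (p, r, f) in token_scores(gold, pred).items():
        print(f"{label:10s} P={p:.3f} R={r:.3f} F1={f:.3f}")
```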

Out-of-distribution (OoD) dataset (provided in OoD-dataset_GoldStandard.jsonl):

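| Model      | Metric    | Drug  | Strength | Frequency | Duration | Form  | Dosage | Total |
|------------|-----------|-------|----------|-----------|----------|-------|--------|-------|
| GermanBERT | Precision | 0.830 | 0.955    | 0.456     | 1.000    | 0.909 | 0.077  | 0.817 |
|            | Recall    | 1.000 | 0.832    | 0.667     | 0.800    | 0.526 | 0.250  | 0.797 |
|            | F1        | 0.907 | 0.889    | 0.542     | 0.889    | 0.667 | 0.118  | 0.794 |
| GottBERT   | Precision | 0.872 | 0.868    | 0.933     | 1.000    | 1.000 | 0.125  | 0.882 |
|            | Recall    | 0.932 | 0.980    | 0.718     | 0.400    | 0.684 | 0.250  | 0.868 |
|            | F1        | 0.901 | 0.921    | 0.812     | 0.571    | 0.813 | 0.167  | 0.865 |
| SpaCy Slim | Precision | 0.690 | 0.951    | 0.486     | 0.000    | 1.000 | 0.111  | 0.778 |
|            | Recall    | 0.659 | 0.772    | 0.462     | 0.000    | 0.316 | 0.250  | 0.623 |
|            | F1        | 0.674 | 0.852    | 0.474     | 0.000    | 0.480 | 0.154  | 0.679 |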
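The OoD gold standard is shipped as a JSON Lines file, i.e. one JSON value per line. A minimal loading sketch using only the standard library follows; it makes no assumption about the record fields, which should be checked against the file itself. `load_jsonl` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any, List


def load_jsonl(path: str) -> List[Any]:
    """Read a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the offending line so broken records are easy to locate.
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err.msg})") from err
    return records


if __name__ == "__main__":
    docs = load_jsonl("OoD-dataset_GoldStandard.jsonl")
    print(f"{len(docs)} records")
    if docs:
        # Inspect the first record to see which fields the gold standard uses.
        print(docs[0])
```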

Test Set:

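| Model      | Metric    | Drug  | Strength | Frequency | Duration | Form  | Dosage | Total |
|------------|-----------|-------|----------|-----------|----------|-------|--------|-------|
| GermanBERT | Precision | 0.968 | 0.944    | 0.859     | 0.791    | 0.956 | 0.963  | 0.932 |
|            | Recall    | 0.933 | 0.973    | 0.924     | 0.825    | 0.962 | 0.971  | 0.947 |
|            | F1        | 0.950 | 0.959    | 0.890     | 0.807    | 0.959 | 0.967  | 0.939 |
| GottBERT   | Precision | 0.966 | 0.969    | 0.879     | 0.813    | 0.949 | 0.966  | 0.941 |
|            | Recall    | 0.926 | 0.965    | 0.951     | 0.825    | 0.972 | 0.972  | 0.952 |
|            | F1        | 0.946 | 0.967    | 0.914     | 0.819    | 0.961 | 0.969  | 0.946 |
| SpaCy Slim | Precision | 0.929 | 0.965    | 0.855     | 0.825    | 0.965 | 0.958  | 0.926 |
|            | Recall    | 0.885 | 0.967    | 0.966     | 0.758    | 0.950 | 0.971  | 0.941 |
|            | F1        | 0.906 | 0.966    | 0.908     | 0.790    | 0.957 | 0.964  | 0.933 |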
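For the per-label columns, F1 is the harmonic mean of precision P and recall R. As a worked check with GottBERT's Drug scores on the test set (precision 0.966, recall 0.926):

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \cdot 0.966 \cdot 0.926}{0.966 + 0.926}
    = \frac{1.789}{1.892}
    \approx 0.946
```

The Total column does not follow this formula from the Total precision and recall: for GermanBERT on the OoD set, 0.817 and 0.797 would give about 0.807, whereas 0.794 is reported, so the totals are presumably aggregated differently.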

Setup and Usage

The models are based on spaCy; the sample script (GERNERMEDpp.py) is written in Python.

# Download URL of the model archive (here: the GottBERT-based model)
model_link="https://myweb.rz.uni-augsburg.de/~freijoha/GERNERMEDpp/GERNERMEDpp_GottBERT.zip"

# [Optional] Create env
python3 -m venv env
source ./env/bin/activate

# Install dependencies
python3 -m pip install -r requirements.txt

# Download & extract model
wget -O model.zip "$model_link"
unzip model.zip -d "model"

# Run script
python3 GERNERMEDpp.py
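GERNERMEDpp.py itself is not reproduced here. As a rough sketch of what inference with the extracted model could look like, assuming the archive contains a spaCy v3 pipeline (a directory with a config.cfg) somewhere below model/ and that requirements.txt installs spaCy together with any transformer dependencies; the helper names `find_pipeline_dir` and `extract_entities` and the German example sentence are illustrative only:

```python
from pathlib import Path
from typing import List, Tuple


def find_pipeline_dir(root: str = "model") -> Path:
    """Return the shallowest directory below `root` that contains a config.cfg."""
    cfg = min(Path(root).rglob("config.cfg"), key=lambda p: (len(p.parts), str(p)), default=None)
    if cfg is None:
        raise FileNotFoundError(f"no spaCy pipeline (config.cfg) found below {root!r}")
    return cfg.parent


def extract_entities(nlp, text: str) -> List[Tuple[str, str]]:
    """Run the pipeline on `text` and return (entity text, label) pairs."""
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


if __name__ == "__main__":
    import spacy  # imported here so the helpers above can be reused without spaCy

    nlp = spacy.load(find_pipeline_dir("model"))
    # Illustrative input; the predicted labels depend on the model.
    text = "Der Patient erhält zweimal täglich 500 mg Metformin als Tablette."
    for span, label in extract_entities(nlp, text):
        print(f"{label:10s} {span}")
```

Picking the shallowest config.cfg avoids accidentally loading a component subdirectory if the archive nests files more deeply than expected.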

Citation

Please cite the published paper (https://doi.org/10.1016/j.jbi.2023.104513):

@article{FREI2023104513,
 title = {GERNERMED++: Semantic annotation in German medical NLP through transfer-learning, translation and word alignment},
 journal = {Journal of Biomedical Informatics},
 volume = {147},
 pages = {104513},
 year = {2023},
 issn = {1532-0464},
 doi = {10.1016/j.jbi.2023.104513},
 url = {https://www.sciencedirect.com/science/article/pii/S1532046423002344},
 author = {Johann Frei and Ludwig Frei-Stuber and Frank Kramer},
 keywords = {Natural language processing, Medical NLP, Medical named entity recognition, Transfer learning, German NLP, Artificial intelligence},
}

Our arXiv preprint is available at https://arxiv.org/abs/2206.14504:

@misc{https://doi.org/10.48550/arxiv.2206.14504,
 doi = {10.48550/ARXIV.2206.14504},
 url = {https://arxiv.org/abs/2206.14504},
 author = {Frei, Johann and Frei-Stuber, Ludwig and Kramer, Frank},
 keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), Machine Learning (cs.LG), FOS: Computer and information sciences},
 title = {GERNERMED++: Transfer Learning in German Medical NLP},
 publisher = {arXiv},
 year = {2022},
 copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International}
}