⏳ Timexy
March 13, 2022
A spaCy custom component that extracts and normalizes dates and other temporal expressions.
Features
- :boom: Extract dates and durations for various languages. See the list of currently supported languages below
- :boom: Normalize dates to timestamps or normalize dates and durations to the TimeML TIMEX3 standard
Supported Languages
- 🇩🇪 German
- :uk: English
- 🇫🇷 French
Installation
pip install timexy
Usage
After installation, simply integrate the timexy component in any of your spaCy pipelines to extract and normalize dates and other temporal expressions:
import spacy
from timexy import Timexy
nlp = spacy.load("en_core_web_sm")
# Optionally add config if varying from default values
config = {
    "kb_id_type": "timex3",  # possible values: 'timex3' (default), 'timestamp'
    "label": "timexy",       # default: 'timexy'
    "overwrite": False,      # default: False
}
nlp.add_pipe("timexy", config=config, before="ner")
doc = nlp("Today is the 10.10.2010. I was in Paris for six years.")
for e in doc.ents:
    print(f"{e.text}\t{e.label_}\t{e.kb_id_}")
>>> 10.10.2010 timexy TIMEX3 type="DATE" value="2010-10-10T00:00:00"
>>> six years timexy TIMEX3 type="DURATION" value="P6Y"
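The `kb_id_` strings printed above pack the type and value into a single string. If you need them as separate fields, a small parser can be sketched as follows. Note that `parse_timex3` is a hypothetical helper written for this example, not part of Timexy, and it assumes the `key="value"` layout shown in the output above.

```python
import re

# Hypothetical helper (not part of Timexy): matches key="value" pairs
# such as type="DATE" inside a kb_id_ string.
_ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_timex3(kb_id: str) -> dict:
    """Return the key="value" attributes of a TIMEX3-style string as a dict."""
    return dict(_ATTR.findall(kb_id))


# Sample strings copied from the example output above.
date = parse_timex3('TIMEX3 type="DATE" value="2010-10-10T00:00:00"')
duration = parse_timex3('TIMEX3 type="DURATION" value="P6Y"')

print(date["type"], date["value"])
print(duration["type"], duration["value"])
```

In a pipeline you would call `parse_timex3(e.kb_id_)` for each entity in `doc.ents`.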
Normalization of temporal expressions
Timexy can normalize temporal expressions to one of two formats:
- the TimeML TIMEX3 standard
- a timestamp
The normalization is configured with the kb_id_type config parameter:
config = {
    "kb_id_type": "timex3",  # possible values: 'timex3' (default), 'timestamp'
    "label": "timexy",       # default: 'timexy'
    "overwrite": False,      # default: False
}
nlp.add_pipe("timexy", config=config, before="ner")
NOTE: A temporal expression that is not a concrete date (for example, a duration) cannot be meaningfully normalized to a timestamp. Therefore, all non-date temporal expressions are always normalized to timex3, regardless of the kb_id_type config.
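The reason behind this note can be illustrated with the standard library alone. The sketch below does not reproduce Timexy's internals or the exact timestamp format it emits (neither is shown in this README). It only shows that an ISO 8601 date value maps to a single point in time, while a duration such as `P6Y` does not. The function names and the UTC assumption are illustrative.

```python
from datetime import datetime, timezone


def date_value_to_timestamp(value: str) -> float:
    """Convert an ISO 8601 date-time string to a Unix timestamp.

    Assumption for this sketch: values without a timezone are treated as UTC.
    """
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


def is_duration(value: str) -> bool:
    """ISO 8601 durations start with 'P' (e.g. 'P6Y') and have no anchor date."""
    return value.startswith("P")


for value in ("2010-10-10T00:00:00", "P6Y"):
    if is_duration(value):
        # A duration is a length of time, so there is no single timestamp for it.
        print(f"{value}: duration, kept as TIMEX3")
    else:
        print(f"{value}: {date_value_to_timestamp(value)}")
```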
Contributing
Please refer to the contributing guidelines in the GitHub repository.