README.md
February 23, 2021 ยท View on GitHub
Creating Instance Data Node Metadata
Wiki Page: Dependencies
Jupyter Notebook: Data_Preparation.ipynb
Purpose: The knowledge graph can be built with or without the inclusion of node and relation metadata (i.e.
labels, descriptions or definitions, and synonyms). If you'd like to create and use node metadata, please run the
Data_Preparation.ipynb Jupyter Notebook and run the code chunks listed under the INSTANCE AND/OR SUBCLASS (NON-ONTOLOGY CLASS) METADATA section. These code chunks should be run before the knowledge graph is constructed. For more details on what these data sources are and how they are created, please see the node_data README.md.
Example structure of the metadata dictionary is shown below:
{
'nodes': {
'http://www.ncbi.nlm.nih.gov/gene/1': {
'Label': 'A1BG',
'Description': "A1BG has locus group protein-coding' and is located on chromosome 19 (19q13.43).",
'Synonym': 'HYST2477alpha-1B-glycoprotein|HEL-S-163pA|ABG|A1B|GAB'} ... },
'relations': {
'http://purl.obolibrary.org/obo/RO_0002533': {
'Label': 'sequence atomic unit',
'Description': 'Any individual unit of a collection of like units arranged in a linear order',
'Synonym': 'None'} ... }
}
๐ CONSTRAINTS ๐
The algorithm makes the following assumptions:
- If metadata is provided, only those edges with nodes that have metadata will be created; valid edges without metadata will be discarded.
- Metadata for all non-ontology nodes and all relations for edges added to the core set of ontologies will be saved as a dictionary in the
./resources/node_data/node_metadata_dict.pklrepository. - For each identifier we try to obtain the following metadata:
Label,Description, andSynonym. An example of these data types is shown below for ageneidentifier5620:
| Metadata Type | Definition | Metadata |
|---|---|---|
| ID | Node identifiers for instance data sources | 5620 |
| Label | The primary label or name for the node | LANCL2 |
| Description | A definition or other useful details about the node | Lanc Like 2 is a protein-coding gene that is located on chromosome 7 (map_location: 7p11.2) |
| Synonym | Alternative terms used for a node | GPR69B, TASP, lanC-like protein 2, G protein-coupled receptor 69B, LanC (bacterial lantibiotic synthetase component C)-like 2, LanC lantibiotic synthetase component C-like 2, testis-specific adriamycin sensitivity protein |
Metadata + PheKnowLator
The metadata will be used to create the following edges in the knowledge graph:
- Label โ node
rdfs:label - Description โ node
obo:IAO_0000115description - Synonyms โ node
oboInOwl:hasExactSynonymsynonym