README.md

July 13, 2024

Code and data for the paper "Learning from Both Structural and Textual Knowledge for Inductive Knowledge Graph Completion"

Prerequisites

  • Python 3.8
  • pytorch==1.10.0
  • TensorFlow==1.15.0 (for LSTK-NeuralLP and LSTK-DRUM)

Datasets

We use three datasets in our experiments.

| Dataset | Download Link (original) |
| --- | --- |
| HacRED | https://github.com/qiaojiim/HacRED |
| DocRED | https://github.com/thunlp/DocRED |
| BioRel | https://bit.ly/biorel_dataset |

Models

We use four models in our experiments.

| Model | Code Download Link (original) |
| --- | --- |
| NeuralLP | https://github.com/fanyangxyz/Neural-LP |
| DRUM | https://github.com/alisadeghian/DRUM |
| RNNLogic | https://github.com/DeepGraphLearning/RNNLogic |
| TELM | This work |

Use examples

The first stage

LSTK is a two-stage framework. The first stage generates a set of soft triples for reasoning.

You can generate a set of soft triples by:

Path for code: src_nli

  1. Train a textual entailment model:
python main_nli.py [dataset]
  2. Search for triples with corresponding texts:
python generate_triples_by_index.py [dataset]
     If the dataset is in Chinese, use this script instead:
python generate_triples_by_index_zh.py [dataset]
  3. Apply the trained textual entailment model to generate soft triples:
python apply_model_nli.py [dataset]
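
The steps above can be chained from a small driver. The following is only a sketch: the helper names `first_stage_commands` and `run_first_stage` are hypothetical and not part of this repository, and it assumes that the `_zh` script replaces the regular search script for Chinese datasets and that all scripts are run from `src_nli`.

```python
import subprocess


def first_stage_commands(dataset, chinese=False):
    """Return the first-stage commands for one dataset, in execution order.

    Assumption: for Chinese datasets the *_zh search script is used
    in place of generate_triples_by_index.py.
    """
    search_script = (
        "generate_triples_by_index_zh.py" if chinese
        else "generate_triples_by_index.py"
    )
    return [
        ["python", "main_nli.py", dataset],         # train the entailment model
        ["python", search_script, dataset],         # search triples with texts
        ["python", "apply_model_nli.py", dataset],  # score triples (soft triples)
    ]


def run_first_stage(dataset, chinese=False, cwd="src_nli"):
    """Run the commands one after another and stop at the first failure."""
    for command in first_stage_commands(dataset, chinese):
        subprocess.run(command, cwd=cwd, check=True)
```

Calling `run_first_stage("[dataset]")` with the dataset argument your scripts expect would run the three steps in order; `check=True` aborts the chain if any step exits with an error.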

This process produces three files that store the soft triples: train_triple_scores.txt, valid_triple_scores.txt, and test_triple_scores.txt.
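
To inspect the generated soft triples, a loader along the following lines may help. This is a sketch under a loud assumption: the file layout is not documented here, so the code assumes one triple per line with four tab-separated fields (head, relation, tail, score). Check your generated files and adjust `sep` and the field order if they differ. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# (head, relation, tail, score); the field order is an assumption.
SoftTriple = Tuple[str, str, str, float]


def parse_soft_triples(lines: Iterable[str], sep: str = "\t") -> List[SoftTriple]:
    """Parse lines of the assumed form: head<sep>relation<sep>tail<sep>score."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, relation, tail, score = line.split(sep)
        triples.append((head, relation, tail, float(score)))
    return triples


def filter_by_score(triples: List[SoftTriple], threshold: float) -> List[SoftTriple]:
    """Keep only the triples whose score is at least the threshold."""
    return [t for t in triples if t[3] >= threshold]
```

For example, `parse_soft_triples(open("train_triple_scores.txt", encoding="utf-8"))` would read one of the files, provided it actually follows the assumed layout.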

You can also directly download our processed soft triples:

| Dataset | Download Link (processed) |
| --- | --- |
| HacRED | Google Drive |
| DocRED | Google Drive |
| BioRel | Google Drive |

The second stage

In the second stage, you can use the generated soft triples to train SOTA neural approximate rule-based models.

LSTK-TELM

Path for code: src/LSTK-TELM

The script for both training and evaluation on the HacRED dataset is:

sh run_hacred.sh

The script for both training and evaluation on the DocRED dataset is:

sh run_docred.sh

The script for both training and evaluation on the BioRel dataset is:

sh run_biorel.sh

The script for rule extraction is:

sh run_rules.sh [dataset]

We also provide scripts for running the baseline methods:

LSTK-NeuralLP and LSTK-DRUM

Path for code: src/LSTK-NeuralLP or src/LSTK-DRUM

The training script is:

python -u src/main.py --datadir=[dataset]/ --exp_name=[dataset] --num_step 4 --gpu 0 --exps_dir exps --max_epoch 10 --seed 1234

The evaluation script is:

sh eval/collect_all_facts.sh [dataset]

python eval/get_truths.py [dataset]

python eval/evaluate.py --preds=exps/[dataset]/test_predictions.txt --truths=[dataset]/truths.pckl
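
The evaluation command reads its predictions from the directory that the training command writes to. The sketch below makes that relationship explicit. The function names are hypothetical, and the rule that predictions land in `<exps_dir>/<exp_name>/test_predictions.txt` is inferred from the two commands above rather than documented here.

```python
def evaluation_paths(dataset, exps_dir="exps", exp_name=None):
    """Derive the prediction and truth file paths used by eval/evaluate.py.

    Assumption: training with --exps_dir and --exp_name writes its
    predictions to <exps_dir>/<exp_name>/test_predictions.txt.
    """
    exp_name = dataset if exp_name is None else exp_name
    return {
        "preds": f"{exps_dir}/{exp_name}/test_predictions.txt",
        "truths": f"{dataset}/truths.pckl",
    }


def evaluate_command(dataset, exps_dir="exps", exp_name=None):
    """Build the final evaluation command as an argument list."""
    paths = evaluation_paths(dataset, exps_dir, exp_name)
    return [
        "python", "eval/evaluate.py",
        f"--preds={paths['preds']}",
        f"--truths={paths['truths']}",
    ]
```

If you train with a different `--exp_name` or `--exps_dir`, pass the same values here so that the evaluation looks in the matching directory.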

LSTK-RNNLogic

Path for code: src/LSTK-RNNLogic

The script for environment installation is:

cd LSTK-RNNLogic/codes/pyrnnlogiclib/
python setup.py install

The script for data preparation is:

python process_dicts.py
python get_scores.py
python process_soft.py

The script for both training and evaluation is:

python run.py --data_path [dataset] --num_generated_rules 2000 --num_rules_for_test 500 --num_important_rules 0 --prior_weight 0.01 --cuda --predictor_learning_rate 0.1 --generator_epochs 5000 --max_rule_length 2
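
When sweeping over hyperparameters, it can be easier to keep the options above in a dictionary and build the command line from it. This is a minimal sketch: `to_argv` and `RNNLOGIC_OPTIONS` are hypothetical names, and the values simply mirror the command shown above.

```python
def to_argv(script, options):
    """Turn an options dict into an argument list for `python <script>`.

    True becomes a bare flag (e.g. --cuda); False and None are omitted;
    every other value is passed as `--name value`.
    """
    argv = ["python", script]
    for name, value in options.items():
        flag = "--" + name
        if value is True:
            argv.append(flag)
        elif value is False or value is None:
            continue
        else:
            argv.extend([flag, str(value)])
    return argv


# Values copied from the command above; "[dataset]" is a placeholder.
RNNLOGIC_OPTIONS = {
    "data_path": "[dataset]",
    "num_generated_rules": 2000,
    "num_rules_for_test": 500,
    "num_important_rules": 0,
    "prior_weight": 0.01,
    "cuda": True,
    "predictor_learning_rate": 0.1,
    "generator_epochs": 5000,
    "max_rule_length": 2,
}
```

Passing `to_argv("run.py", dict(RNNLOGIC_OPTIONS, max_rule_length=3))` to `subprocess.run` would launch a variant run without editing the command string by hand.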

Citation

Please consider citing the following paper if you find our code helpful. Thank you!

@inproceedings{QiDW23,
  author       = {Kunxun Qi and
                  Jianfeng Du and
                  Hai Wan},
  title        = {Learning from Both Structural and Textual Knowledge for Inductive
                  Knowledge Graph Completion},
  booktitle    = {NeurIPS},
  year         = {2023},
  url          = {http://papers.nips.cc/paper\_files/paper/2023/hash/544242770e8333875325d013328b2079-Abstract-Conference.html},
}