README.md

July 13, 2024

Code and data for the paper "Learning from Both Structural and Textual Knowledge for Inductive Knowledge Graph Completion"

Prerequisites

Python 3.8
pytorch==1.10.0
TensorFlow==1.15.0 (for LSTK-NeuralLP and LSTK-DRUM)

Datasets

We use three datasets in our experiments.

Dataset	Download Link (original)
HacRED	https://github.com/qiaojiim/HacRED
DocRED	https://github.com/thunlp/DocRED
BioRel	https://bit.ly/biorel_dataset

Models

We use four models in our experiments.

Model	Code Download Link (original)
NeuralLP	https://github.com/fanyangxyz/Neural-LP
DRUM	https://github.com/alisadeghian/DRUM
RNNLogic	https://github.com/DeepGraphLearning/RNNLogic
TELM	This work

Use examples

The first stage

LSTK is a two-stage framework. The first stage generates a set of soft triples for reasoning.

You can generate a set of soft triples by:

Path for code: src_nli

Training a textual entailment model:

python main_nli.py [dataset]

Searching triples with corresponding texts:

python generate_triples_by_index.py [dataset]

If the dataset is in Chinese, please use:

python generate_triples_by_index_zh.py [dataset]

Applying the trained textual entailment model to generate soft triples:

python apply_model_nli.py [dataset]
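The three steps above can be chained in a small driver. This is a minimal sketch, assuming the scripts are run from `src_nli` and take only the dataset argument as shown above; `stage_one_commands`, `run_stage_one` and the `chinese` flag are hypothetical helpers for illustration, not part of the repository.

```python
import subprocess
import sys


def stage_one_commands(dataset, chinese=False):
    """Return the three first-stage commands in the order listed above."""
    # Chinese datasets use the *_zh variant of the triple-search script.
    search_script = (
        "generate_triples_by_index_zh.py" if chinese else "generate_triples_by_index.py"
    )
    return [
        [sys.executable, "main_nli.py", dataset],         # train the entailment model
        [sys.executable, search_script, dataset],         # search triples with texts
        [sys.executable, "apply_model_nli.py", dataset],  # produce soft triples
    ]


def run_stage_one(dataset, chinese=False, cwd="src_nli"):
    """Run the commands one after another and stop at the first failure."""
    # cwd="src_nli" is an assumption based on the "Path for code" line above.
    for command in stage_one_commands(dataset, chinese):
        subprocess.run(command, cwd=cwd, check=True)
```

Call `run_stage_one` with the same value you would pass as `[dataset]` on the command line, and set `chinese=True` when the dataset is in Chinese.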

After these steps, you get three files storing the soft triples: train_triple_scores.txt, valid_triple_scores.txt and test_triple_scores.txt.
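Before moving on to the second stage, it can help to confirm that all three files were actually written. The sketch below only checks for presence and non-empty size; the file names come from the sentence above, while the output directory is not stated in this README and is therefore left as a parameter. `missing_soft_triple_files` is a hypothetical helper.

```python
from pathlib import Path

# File names as given above; the directory they land in is an assumption
# supplied by the caller.
EXPECTED_FILES = [f"{split}_triple_scores.txt" for split in ("train", "valid", "test")]


def missing_soft_triple_files(output_dir):
    """Return the expected soft-triple files that are absent or empty in output_dir."""
    output_dir = Path(output_dir)
    return [
        name
        for name in EXPECTED_FILES
        if not (output_dir / name).is_file() or (output_dir / name).stat().st_size == 0
    ]
```

An empty return value means all three files exist and contain data; it says nothing about whether their contents are well formed.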

You can also directly download our processed soft triples:

Dataset	Download Link (processed)
HacRED	Google Drive
DocRED	Google Drive
BioRel	Google Drive

The second stage

In the second stage, you can use the generated soft triples to train state-of-the-art neural approximate rule-based models.

LSTK-TELM

Path for code: src/LSTK-TELM

The script for both training and evaluation on the HacRED dataset is:

sh run_hacred.sh

The script for both training and evaluation on the DocRED dataset is:

sh run_docred.sh

The script for both training and evaluation on the BioRel dataset is:

sh run_biorel.sh

The script for rule extraction is:

sh run_rules.sh [dataset]

We also provide scripts for running the baseline methods:

LSTK-NeuralLP and LSTK-DRUM

Path for code: src/LSTK-NeuralLP or src/LSTK-DRUM

The training script is:

python -u src/main.py --datadir=[dataset]/ --exp_name=[dataset] --num_step 4 --gpu 0 --exps_dir exps --max_epoch 10 --seed 1234

The evaluation script is:

sh eval/collect_all_facts.sh [dataset]

python eval/get_truths.py [dataset]

python eval/evaluate.py --preds=exps/[dataset]/test_predictions.txt --truths=[dataset]/truths.pckl
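The `[dataset]` placeholder appears in several positions across the training and evaluation commands above (data directory, experiment name, predictions path and truths path). The following sketch fills it in consistently and returns argument lists suitable for `subprocess.run`. The function name `neurallp_drum_commands` and its `gpu` and `seed` parameters are hypothetical; the flags and default values are copied from the commands above.

```python
import shlex


def neurallp_drum_commands(dataset, gpu=0, seed=1234):
    """Fill the [dataset] placeholder into the training and evaluation commands."""
    train = (
        f"python -u src/main.py --datadir={dataset}/ --exp_name={dataset} "
        f"--num_step 4 --gpu {gpu} --exps_dir exps --max_epoch 10 --seed {seed}"
    )
    evaluate = [
        f"sh eval/collect_all_facts.sh {dataset}",
        f"python eval/get_truths.py {dataset}",
        f"python eval/evaluate.py --preds=exps/{dataset}/test_predictions.txt "
        f"--truths={dataset}/truths.pckl",
    ]
    # Training comes first, followed by the three evaluation steps in order.
    return [shlex.split(command) for command in [train] + evaluate]
```

Run the returned commands in order from within `src/LSTK-NeuralLP` or `src/LSTK-DRUM`, since the paths in them are relative to those directories.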

LSTK-RNNLogic

Path for code: src/LSTK-RNNLogic

The script for environment installation is:

cd LSTK-RNNLogic/codes/pyrnnlogiclib/
python setup.py install

The script for data preparation is:

python process_dicts.py
python get_scores.py
python process_soft.py

The script for both training and evaluation is:

python run.py --data_path [dataset] --num_generated_rules 2000 --num_rules_for_test 500 --num_important_rules 0 --prior_weight 0.01 --cuda --predictor_learning_rate 0.1 --generator_epochs 5000 --max_rule_length 2

Citation

Please consider citing the following paper if you find our code helpful. Thank you!

@inproceedings{QiDW23,
  author       = {Kunxun Qi and
                  Jianfeng Du and
                  Hai Wan},
  title        = {Learning from Both Structural and Textual Knowledge for Inductive
                  Knowledge Graph Completion},
  booktitle    = {NeurIPS},
  year         = {2023},
  url          = {http://papers.nips.cc/paper\_files/paper/2023/hash/544242770e8333875325d013328b2079-Abstract-Conference.html},
}