LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings

November 21, 2024

The official implementation of the paper "LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings" (NeurIPS 2024).

1. Environment

# create an environment
conda create -n teaglm python=3.9
conda activate teaglm

# pytorch 1.13.1 cuda 11.7
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

# install related libraries
pip install -r requirements.txt

# PyG-related libraries (prebuilt wheels for torch 1.13 + CUDA 11.7, Python 3.9, Linux x86_64)
wget https://data.pyg.org/whl/torch-1.13.0%2Bcu117/torch_scatter-2.1.0%2Bpt113cu117-cp39-cp39-linux_x86_64.whl
wget https://data.pyg.org/whl/torch-1.13.0%2Bcu117/torch_sparse-0.6.16%2Bpt113cu117-cp39-cp39-linux_x86_64.whl
wget https://data.pyg.org/whl/torch-1.13.0%2Bcu117/torch_cluster-1.6.0%2Bpt113cu117-cp39-cp39-linux_x86_64.whl
wget https://data.pyg.org/whl/torch-1.13.0%2Bcu117/torch_spline_conv-1.2.1%2Bpt113cu117-cp39-cp39-linux_x86_64.whl
wget https://data.pyg.org/whl/torch-1.13.0%2Bcu117/pyg_lib-0.2.0%2Bpt113cu117-cp39-cp39-linux_x86_64.whl

pip install torch_scatter-2.1.0+pt113cu117-cp39-cp39-linux_x86_64.whl
pip install torch_sparse-0.6.16+pt113cu117-cp39-cp39-linux_x86_64.whl
pip install torch_cluster-1.6.0+pt113cu117-cp39-cp39-linux_x86_64.whl
pip install torch_spline_conv-1.2.1+pt113cu117-cp39-cp39-linux_x86_64.whl
pip install pyg_lib-0.2.0+pt113cu117-cp39-cp39-linux_x86_64.whl
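The five wget/pip pairs above follow one pattern, so they can be collapsed into a loop. This is a hypothetical convenience sketch for bash or sh: the function and variable names are ours, not the repository's, while the versions and wheel tags are copied from the commands above. Each wheel is saved with `wget -O` under the file name pip expects (with a literal `+`), so the result does not depend on how wget names files fetched from `%2B`-encoded URLs.

```shell
# Hypothetical helpers (not part of the repository). Versions and wheel tags
# are copied from the wget/pip commands above.
PYG_BASE="https://data.pyg.org/whl/torch-1.13.0%2Bcu117"
PYG_TAG="pt113cu117-cp39-cp39-linux_x86_64"
PYG_PKGS="torch_scatter-2.1.0 torch_sparse-0.6.16 torch_cluster-1.6.0 torch_spline_conv-1.2.1 pyg_lib-0.2.0"

# Remote URL: the "+" of the local version label is percent-encoded as %2B.
pyg_wheel_url() { printf '%s/%s%%2B%s.whl\n' "$PYG_BASE" "$1" "$PYG_TAG"; }

# Local file name: pip reads the version from the file name, so keep a literal "+".
pyg_wheel_file() { printf '%s+%s.whl\n' "$1" "$PYG_TAG"; }

# Download and install every wheel in order; stop at the first failure.
install_pyg_wheels() {
  local pkg file
  for pkg in $PYG_PKGS; do   # unquoted on purpose: split the list on spaces
    file="$(pyg_wheel_file "$pkg")"
    # -O pins the saved name instead of relying on wget's handling of %2B.
    wget -O "$file" "$(pyg_wheel_url "$pkg")" || return 1
    pip install "$file" || return 1
  done
}
```

Source the snippet inside the activated `teaglm` environment, then run `install_pyg_wheels`.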

2. Data

3. Train

3.1 Self-supervised pre-training of the GNN

Run self-supervised training on the arXiv dataset; the model will be saved to GNNPreTrain/saved_model. We have also uploaded the trained GNN checkpoint to Hugging Face, where you can find it under W-rudder/TEA-GLM/gnn.

cd GNNPreTrain

# example on the citation graphs
python train_valid_paper.py --gpu 0 --num_runs 1 --drop_edge_rate_1 0.3 --drop_edge_rate_2 0.4 --drop_feature_rate_1 0.0 --drop_feature_rate_2 0.1

# copy the checkpoint for stage-2 training
mkdir -p ../saved_model/gnn && cp ./saved_model/model_file.pth ../saved_model/gnn/
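Before starting alignment tuning, it can help to confirm that the stage-1 checkpoint is where the copy step put it. The helper below is a hypothetical sketch, not part of the repository. It assumes the default path `./saved_model/gnn/model_file.pth` relative to the repository root, and that `W-rudder/TEA-GLM` is a Hugging Face model repository whose `gnn/` folder can be fetched with `huggingface-cli download`; the file name on Hugging Face may differ from `model_file.pth`.

```shell
# Hypothetical helper (not part of the repository): check that the stage-1
# GNN checkpoint exists and is non-empty before alignment tuning.
check_gnn_ckpt() {
  local ckpt
  # Default assumes the repository root as the working directory.
  ckpt="${1:-./saved_model/gnn/model_file.pth}"
  # -s: the file exists and has a size greater than zero.
  if [ -s "$ckpt" ]; then
    echo "found GNN checkpoint: $ckpt"
    return 0
  fi
  echo "missing or empty: $ckpt" >&2
  echo "run section 3.1, or download the released checkpoint, e.g.:" >&2
  # Assumption: W-rudder/TEA-GLM is a model repo with a gnn/ folder; files
  # land in ./saved_model/gnn/, but their names may differ from model_file.pth.
  echo "  huggingface-cli download W-rudder/TEA-GLM --include 'gnn/*' --local-dir ./saved_model" >&2
  return 1
}
```

From inside GNNPreTrain, pass the path explicitly: `check_gnn_ckpt ../saved_model/gnn/model_file.pth`.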

3.2 Alignment tuning

Fill in the blanks in train_arxiv.sh, then run the script below. We have also uploaded the trained projector checkpoint to Hugging Face, where you can find it under W-rudder/TEA-GLM/first_model.

# If you downloaded the GNN checkpoint instead of running 3.1, you are already in the repository root; skip this step.
cd ..

# train
bash ./script/train_arxiv.sh
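To catch unfilled entries before launching a long run, a quick scan of the script can help. This sketch rests on an assumption: that the blanks in `train_arxiv.sh` are empty shell variable assignments such as `NAME=` or `NAME=""`. The `find_blanks` function is hypothetical; if the script marks its blanks differently, adjust the pattern.

```shell
# Hypothetical helper (not part of the repository): print "line:content" for
# every line that looks like an empty shell assignment, e.g. NAME=, NAME="",
# or export NAME=, optionally followed by whitespace and a comment.
# Exit status is grep's: 0 if any such line is found, 1 if none,
# 2 if the file cannot be read.
find_blanks() {
  grep -nE '^[[:space:]]*(export[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*=("")?[[:space:]]*(#.*)?$' "$1"
}
```

For example, `find_blanks ./script/train_arxiv.sh && echo "fill these in first"` lists any remaining blanks; the same check works for `test_citation.sh`.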

4. Evaluation

Fill in the blanks in test_citation.sh, then run the script. The generation results will be saved to ./results.

# generate on citation datasets
bash ./script/test_citation.sh
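After generation finishes, the newest entry in `./results` is usually the run you just launched. The helper below is a hypothetical sketch that prints its path; it relies only on modification times and makes no assumption about the file format the evaluation script writes.

```shell
# Hypothetical helper (not part of the repository): print the path of the
# most recently modified entry in the results directory.
latest_result() {
  local dir newest
  dir="${1:-./results}"
  # ls -t lists entries newest first; keep only the first one.
  newest="$(ls -t "$dir" 2>/dev/null | head -n 1)"
  if [ -z "$newest" ]; then
    echo "no results found in $dir" >&2
    return 1
  fi
  printf '%s/%s\n' "$dir" "$newest"
}
```

For example, `ls -l "$(latest_result)"` shows the latest output after running the script from the repository root.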

Acknowledgements

Our work is greatly inspired by many previous works, such as Vicuna, GraphGPT, and LLaGA. The code is built on InstructGLM. We are grateful for their wonderful contributions.

Citation

@inproceedings{wang2024llms,
title={{LLM}s as Zero-shot Graph Learners: Alignment of {GNN} Representations with {LLM} Token Embeddings},
author={Duo Wang and Yuan Zuo and Fengzhi Li and Junjie Wu},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=32g9BWTndc}
}