LLMNodeBed

May 20, 2025 · View on GitHub

This repository is the official implementation for our ICML 2025 paper: When Do LLMs Help With Node Classification? A Comprehensive Analysis. It provides a standardized framework for evaluating LLM-based node classification methods, including 14 datasets, 8 LLM-based algorithms, and 3 learning paradigms.

Please consider citing or giving a 🌟 if our repository is helpful to your work!

@inproceedings{wu2025llmnodebed,
      title={When Do LLMs Help With Node Classification? A Comprehensive Analysis}, 
      author={Xixi Wu and Yifei Shen and Fangzhou Ge and Caihua Shan and Yizhu Jiao and Xiangguo Sun and Hong Cheng},
      year={2025},
      booktitle={International Conference on Machine Learning},
      organization={PMLR},
      url={https://arxiv.org/abs/2502.00829}, 
}

🎙️ News

🎉 [2025-05-01] Our paper is accepted to ICML 2025. The camera ready paper, integration of more baseline methods, and corresponding blogs will be released soon!

📅 [2025-02-04] The code for LLMNodebed, along with the project pages and paper, has now been released! 🧨

🚀 Quick Start

0. Environment Setup

To get started, follow these steps to set up your Python environment:

conda create -n NodeBed python=3.10
conda activate NodeBed
pip install torch torch_geometric transformers peft pytz scikit-learn torch_scatter torch_sparse

Some packages might be missed for specific algorithms. Check the algorithm READMD or error logs to identify any missing dependencies and install them accordingly.

1. LLM Preparation

Close-source LLMs like GPT-4o, DeepSeek-Chat:

Add API keys to LLMZeroShot/Direct/api_keys.py

Open-source LLMs like Mistral-7B, Qwen:

Download models from HuggingFace (e.g., Mistral-7B). Then, update model paths in common/model_path.py as you actual saving paths.

Example paths:

MODEL_PATHs = {
  "MiniLM": "sentence-transformers/all-MiniLM-L6-v2",
  "Mistral-7B": "mistralai/Mistral-7B-Instruct-v0.2",
  "Llama-8B": "meta-llama/Llama-3.1-8B-Instruct",
  # See full list in common/model_path.py
}

2. Datasets

Download datasets either from Google Drive or HuggingFace and unzip into the datasets folder.

Before running LLM-based algorithms, please generate LM / LLM-encoded embeddings as follows:

cd LLMEncoder/GNN

python3 embedding.py --dataset=cora --encoder_name=roberta      # LM embeddings
python3 embedding.py --dataset=cora --encoder_name=Mistral-7B  # LLM embeddings

3. (Optional) Deploy Local LLMs

For LLM Direct Inference using open-source LLMs, we depoly them as local services based on the FastChat framework.

# Install dependencies
pip install vllm "fschat[model_worker,webui]"

# Start services
python3 -m fastchat.serve.controller --host 127.0.0.1
CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.vllm_worker --model-path mistralai/Mistral-7B-Instruct-v0.2 --host 127.0.0.1
python3 -m fastchat.serve.openai_api_server --host 127.0.0.1 --port 8008

Then, the Mistral-7B model can be invoked via the url http://127.0.0.1:8008/v1/chat/completions.

4. Run Algorithms

Refer to method-specific READMEs for execution details:

LLM-as-Encoder: LLMEncoder/README.md
LLM-as-Predictor: LLMPredictor/README.md
LLM-as-Reasoner: LLMReasoner/README.md
Zero-shot Methods: LLMZeroShot/README.md

📖 Code Structure

LLMNodeBed/
├── LLMEncoder/           # LLM-as-Encoder (GNN, ENGINE)
├── LLMPredictor/         # LLM-as-Predictor (GraphGPT, LLaGA, Instruction Tuning)
├── LLMReasoner/          # LLM-as-Reasoner (TAPE)
├── LLMZeroShot/          # Zero-shot Methods (Direct Inference, ZeroG)
├── common/               # Shared utilities
├── datasets/             # Dataset storage
├── results/              # Experiment outputs
└── requirements.txt

🔧 Supported Methods

Method	Veneue	Official Implementation	Our Implementation
TAPE	ICLR'24	link	`LLMReasoner/TAPE`
ENGINE	IJCAI'24	link	`LLMEncoder/ENGINE`
GraphGPT	SIGIR'24	link	`LLMPredictor/GraphGPT`
LLaGA	ICML'24	link	`LLMPredictor/LLaGA`
ZeroG	KDD'24	link	`LLMZeroShot/ZeroG`
$\text{GNN}_{\text{LLMEmb}}$	-	Ours Proposed	`LLMEncoder/GNN`
LLM Instruction Tuning	-	Ours Implemented	`LLMPredictor/Instruction Tuning`
Direct Inference	-	Ours Implemented	`LLMZeroShot/Direct`

📮 Contact

If you have any further questions about usage, reproducibility, or would like to discuss, please feel free to open an issue or contact the authors via email at xxwu@se.cuhk.edu.hk.

🙏 Acknowledgements

We thank the authors of TAPE, ENGINE, GraphGPT, LLaGA, and ZeroG for their open-source implementations. Part of our framework is inspired by GLBench.