ProG-V2: A Reproducible Toolkit for Graph Prompt Learning
May 31, 2026
ProG-V2 is an engineering-focused extension of the original ProG benchmark for graph prompt learning. It keeps the standard pre-train → prompt-tune → evaluate workflow, while adding a modular prompt-strategy architecture, broader prompt coverage, centralized path/device/logging utilities, benchmark scripts, tests, and public merged result reports.
What's New in ProG-V2
- 17 prompt strategies registered through a PromptStrategy registry.
- 6 GNN backbones registered through prompt_graph.model.build_gnn.
- Reproducible few-shot benchmark utilities for node- and graph-level tasks.
- Centralized filesystem paths, device resolution, logging, and CLI/YAML config.
- Tests for data loading, GNN factory construction, strategy registration, and prompt-task smoke runs.
- Fixes for several benchmark-blocking edge cases in WebKB, MultiGprompt, RELIEF, and GraphMAE.
Architecture
Benchmark Results
We publish two complementary public GCN benchmark reports: a node- and
graph-classification report, and an edge-task (link-prediction) report.
Both follow the same {pretrain}+{prompt} matrix format so they can be read and
merged with the same tooling.
Node & Graph Classification
The classification report lives under
results/benchmark-gcn/ and contains 714
independent (dataset, shot, pretrain+prompt) combinations and 2142 metric
values over Accuracy, Macro-F1, and AUROC.
Experiment parameters:
| Setting | Value |
|---|---|
| Backbone | GCN |
| GNN layers | 2 |
| Hidden dimension | 128 |
| Seed | 42 |
| Shots | 1-shot, 3-shot, 5-shot |
| Few-shot splits | 5 splits per shot setting (mean±std) |
| Downstream budget | 50 epochs with early stopping |
| Pretrain budget | 200 epochs for generated checkpoints |
| Metrics | Accuracy, Macro-F1, AUROC |
| Result format | {pretrain}+{prompt} columns |
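The mean±std entry in the table above aggregates the 5 few-shot splits of each shot setting. The sketch below shows one way to compute such a cell. The scores are made-up placeholders, and the use of the sample standard deviation (n-1) is an assumption; the report may use the population form instead.

```python
import statistics


def summarize(values):
    """Format per-split scores as 'mean±std'."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (n-1 denominator).
    std = statistics.stdev(values)
    return f"{mean:.2f}±{std:.2f}"


# Placeholder accuracies for 5 splits; not taken from the published report.
split_acc = [70.0, 72.0, 68.0, 71.0, 69.0]
summary = summarize(split_acc)
print(summary)
```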
Coverage:
| Dataset | Task | 1-shot | 3-shot | 5-shot |
|---|---|---|---|---|
| Cora | Node | 72 | 72 | 72 |
| Wisconsin | Node | 59 | 59 | 59 |
| MUTAG | Graph | 56 | 56 | 56 |
| PROTEINS | Graph | 51 | 51 | 51 |
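As a quick consistency check, the totals quoted for the classification report can be recomputed from the coverage table. The snippet only uses numbers that appear in this README.

```python
# Per-dataset combination counts, identical across the three shot settings.
coverage = {"Cora": 72, "Wisconsin": 59, "MUTAG": 56, "PROTEINS": 51}
shots = ["1-shot", "3-shot", "5-shot"]
metrics = ["Accuracy", "Macro-F1", "AUROC"]

combos = sum(coverage.values()) * len(shots)
metric_values = combos * len(metrics)

print(combos)         # number of (dataset, shot, pretrain+prompt) combinations
print(metric_values)  # one value per combination and metric
```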
Result files:
- summary.csv: flat table, one row per experiment combination.
- final_matrices.xlsx: 12 sheets, one per (dataset, shot) task view.
- README.md: detailed result documentation and metric definitions.
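The relationship between the flat table and the per-(dataset, shot) matrices can be sketched as a simple pivot. The column names (dataset, shot, pretrain, prompt, accuracy) and the values below are assumptions for illustration; check the header of the real summary.csv before adapting this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt with placeholder values; real column names may differ.
SAMPLE = """dataset,shot,pretrain,prompt,accuracy
Cora,1,DGI,GPF,0.50
Cora,1,GraphCL,GPF,0.55
Cora,3,DGI,GPF,0.60
"""


def to_matrices(text):
    """Group flat rows into {(dataset, shot): {'pretrain+prompt': value}}."""
    matrices = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        view = (row["dataset"], int(row["shot"]))
        column = f'{row["pretrain"]}+{row["prompt"]}'
        matrices[view][column] = float(row["accuracy"])
    return dict(matrices)


matrices = to_matrices(SAMPLE)
for view, columns in matrices.items():
    print(view, columns)
```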
Edge Task (Link Prediction)
The link-prediction report lives under
results/link-prediction-gcn/ and contains
2912 (dataset, shot, pretrain+prompt) combinations over Accuracy, F1,
AUROC, and AUPRC.
| Setting | Value |
|---|---|
| Backbone | GCN |
| Datasets | CiteSeer, Cora, IMDB-BINARY, MUTAG, PROTEINS, PTC_MR, PubMed, Wisconsin (8) |
| Shots | 0-shot, 1-shot, 3-shot, 5-shot |
| Pretrains | None, DGI, GraphMAE, Edgepred_GPPT, Edgepred_Gprompt, GraphCL, SimGRACE (7) |
| Prompts | 13 LinkTask-supported strategies |
| Combos per dataset | 91 (13 prompts × 7 pretrains) × 4 shots = 364 cells |
| Primary metrics | AUROC, AUPRC (Accuracy/F1 kept for matrix compatibility) |
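The grid size in the table above can be reproduced by enumerating the Cartesian product of its axes. The 13 LinkTask-supported prompt names are not listed in this README, so the sketch uses placeholder names for them.

```python
from itertools import product

datasets = ["CiteSeer", "Cora", "IMDB-BINARY", "MUTAG",
            "PROTEINS", "PTC_MR", "PubMed", "Wisconsin"]
shots = [0, 1, 3, 5]
pretrains = ["None", "DGI", "GraphMAE", "Edgepred_GPPT",
             "Edgepred_Gprompt", "GraphCL", "SimGRACE"]
# Placeholders: the actual 13 prompt names are not given in this section.
prompts = [f"prompt_{i}" for i in range(13)]

cells = list(product(datasets, shots, pretrains, prompts))
per_dataset = len(cells) // len(datasets)

print(per_dataset)  # cells per dataset
print(len(cells))   # cells in the whole report
```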
Result files:
- summary.csv: flat table, one row per experiment combination.
- final_matrices.xlsx: 32 sheets, one per (dataset, shot) view (4 shots × 8 datasets).
- README.md: detailed result documentation and metric definitions.
Both reports currently use GCN. Other backbones are available in the model registry but are not part of these public benchmark tables.
Installation
Use Python 3.9 or 3.11. Python 3.11 is recommended for local development.
```bash
conda create -n prog-v2 python=3.11 -y
conda activate prog-v2
pip install -e ".[dev]"
pre-commit install
```
If PyTorch Geometric extension wheels are not resolved automatically, install the matching wheels for your PyTorch/CUDA version from the official PyG wheel index:
```bash
python -m pip install torch_scatter torch_sparse -f https://data.pyg.org/whl/
```
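The PyG installation guide publishes one wheel index page per PyTorch/CUDA pair, following the pattern torch-{version}+{cuda}.html under the index URL. The helper below is a small sketch that builds such a URL; pyg_wheel_index is a hypothetical name, and the version strings in the example are placeholders that should be replaced with your installed versions.

```python
def pyg_wheel_index(torch_version: str, cuda_tag: str) -> str:
    """Build the wheel index URL for a PyTorch version and CUDA tag.

    cuda_tag is e.g. 'cpu' or 'cu121'. The URL pattern follows the
    PyG installation guide; verify that the page exists for your pair.
    """
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"


# Placeholder versions for illustration only.
url = pyg_wheel_index("2.3.0", "cu121")
print(f"python -m pip install torch_scatter torch_sparse -f {url}")
```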
Quick Start
Run a minimal downstream task:
```bash
python downstream_task.py \
  --downstream_task NodeTask \
  --dataset_name Cora \
  --gnn_type GCN \
  --prompt_type GPF \
  --shot_num 1 \
  --epochs 1 \
  --device cpu
```
Run a small benchmark cell and write an Excel matrix:
```bash
python scripts/bootstrap_excel_full.py --gnn_type GCN
python bench.py \
  --pretrain_task NodeTask \
  --dataset_name Cora \
  --prompt_type None \
  --gnn_type GCN \
  --shot_num 1 \
  --epochs 1 \
  --device cpu \
  --pre_train_model_path None \
  --num_iter 1
```
Run a LinkTask cell (link prediction, dot-product decoder):
```bash
python bench.py \
  --pretrain_task LinkTask \
  --dataset_name Cora \
  --prompt_type GPF \
  --gnn_type GCN \
  --shot_num 0 \
  --epochs 10 \
  --device cpu \
  --pre_train_model_path None \
  --num_iter 1
```
For a single-run LinkTask entry point, downstream_task.py also accepts
--downstream_task LinkTask:
```bash
python downstream_task.py \
  --downstream_task LinkTask \
  --dataset_name Cora \
  --prompt_type None \
  --gnn_type GCN \
  --shot_num 0 \
  --epochs 2 \
  --device cpu \
  --pre_train_model_path None
```
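To repeat a quick-start command over several shot settings, the invocation can be assembled programmatically. This is a sketch under the assumption that the flags behave as in the commands above; build_command is a hypothetical helper and is not part of the repository.

```python
import shlex
import sys


def build_command(script, **flags):
    """Turn keyword arguments into a '--flag value' argument list."""
    argv = [sys.executable, script]
    for name, value in flags.items():
        argv += [f"--{name}", str(value)]
    return argv


# One NodeTask run per shot setting, reusing the flags shown above.
commands = [
    build_command(
        "downstream_task.py",
        downstream_task="NodeTask",
        dataset_name="Cora",
        gnn_type="GCN",
        prompt_type="GPF",
        shot_num=shot,
        epochs=1,
        device="cpu",
    )
    for shot in (1, 3, 5)
]

for cmd in commands:
    print(shlex.join(cmd))
    # To launch for real: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues; shlex.join is used only to print a copy-pasteable form.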
Supported Components
Backbones
- GCN
- GAT
- GIN
- GraphSAGE
- GCov
- GraphTransformer
Pretraining Methods
- DGI
- GraphMAE
- GraphCL
- SimGRACE
- Edgepred_GPPT
- Edgepred_Gprompt
- MultiGprompt
Prompt Strategies
None, GPF, GPF-plus, Gprompt, All-in-one, GPPT, Prodigy,
GraphPrompter, EdgePrompt, EdgePromptplus, RELIEF, MultiGprompt,
UniPrompt, SelfPro, ProNoG, PSP, and DAGPrompT.
Downstream Tasks
| Task | Class | Datasets | Notes |
|---|---|---|---|
| NodeTask | prompt_graph.tasker.NodeTask | NODE_TASKS (12) | Node classification, supports all 17 prompts. |
| GraphTask | prompt_graph.tasker.GraphTask | GRAPH_TASKS (11) | Graph classification, supports all 17 prompts. |
| LinkTask | prompt_graph.tasker.LinkTask | LINK_TASKS (16 curated) | Link prediction with binary BCE + dot-product decoder for most prompts. |
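The LinkTask note mentions a dot-product decoder trained with binary cross-entropy. The following pure-Python sketch shows that general idea on toy embeddings: the score of a candidate edge is the dot product of its endpoint embeddings, and the loss is BCE on that logit. It is not the LinkTask code, and all names and values here are illustrative.

```python
import math


def dot(u, v):
    """Dot-product decoder: edge logit from two node embeddings."""
    return sum(a * b for a, b in zip(u, v))


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def link_loss(embeddings, edges, labels):
    """Mean BCE over candidate edges (label 1 = edge, 0 = non-edge)."""
    losses = [
        bce_with_logits(dot(embeddings[u], embeddings[v]), y)
        for (u, v), y in zip(edges, labels)
    ]
    return sum(losses) / len(losses)


# Toy embeddings: nodes 0 and 1 are aligned, node 2 is orthogonal to both.
embeddings = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
edges = [(0, 1), (0, 2)]
labels = [1, 0]

loss = link_loss(embeddings, edges, labels)
print(round(loss, 4))
```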
Scripts
The public sweep scripts are parameterized by --gnn_type:
```bash
bash scripts/pretrain_full_grid.sh --gnn_type GCN --fast
bash scripts/bench_full_grid.sh --gnn_type GCN --fast --datasets "Cora MUTAG"
```
Useful scripts:
| Script | Purpose |
|---|---|
| scripts/bootstrap_excel_full.py | Create empty Excel matrices for a selected backbone. |
| scripts/pretrain_full_grid.sh | Pretrain selected methods/datasets/backbone. |
| scripts/bench_full_grid.sh | Run the full project benchmark grid with filters. |
| scripts/merge_result_excels.py | Merge per-run Excel outputs into one report. |
| scripts/export_final_matrices.py | Export populated per-dataset matrices into summary.csv and final_matrices.xlsx. |
Development Checks
```bash
ruff check .
ruff format --check .
pytest tests/ -v
```
For contribution guidelines, see CONTRIBUTING.md.
Citation
If you find this project useful, please cite the original ProG/graph prompt work:
```bibtex
@article{zi2024prog,
  title={ProG: A Graph Prompt Learning Benchmark},
  author={Chenyi Zi and Haihong Zhao and Xiangguo Sun and Yiqing Lin and Hong Cheng and Jia Li},
  year={2024},
  journal={Advances in Neural Information Processing Systems}
}
@inproceedings{sun2023all,
  title={All in One: Multi-Task Prompting for Graph Neural Networks},
  author={Sun, Xiangguo and Cheng, Hong and Li, Jia and Liu, Bo and Guan, Jihong},
  booktitle={Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  year={2023}
}
```