Efficient Graph Condensation via Gaussian Process (GCGP)
June 15, 2025
Abstract
Graph condensation reduces graph sizes while maintaining performance, addressing the scalability challenges of GNNs caused by computational inefficiencies on large datasets. Existing methods often rely on bi-level optimization, which requires repeated GNN training and limits scalability.
This paper proposes Graph Condensation via Gaussian Process (GCGP), a computationally efficient method that leverages a Gaussian Process (GP) to estimate predictions from input nodes without iterative GNN training.
Key innovations:
- A covariance function aggregates local neighborhoods to capture complex node dependencies.
- Concrete random variables approximate binary adjacency matrices in a differentiable form, enabling gradient-based optimization of discrete graph structures.
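A minimal sketch of the second idea, assuming the standard binary Concrete (relaxed Bernoulli) construction: logistic noise is added to per-edge logits and the result is passed through a tempered sigmoid. The names `logits`, `temperature`, and `sample_relaxed_adjacency`, as well as the symmetrization step, are illustrative assumptions and are not taken from the GCGP code.

```python
import numpy as np

def sample_relaxed_adjacency(logits, temperature=0.5, rng=None):
    """Draw a continuous stand-in for a binary adjacency matrix, entries in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    # Logistic noise: log(u) - log(1 - u) with u ~ Uniform(0, 1).
    u = rng.uniform(low=1e-6, high=1.0 - 1e-6, size=logits.shape)
    noise = np.log(u) - np.log1p(-u)
    # Binary Concrete sample: sigmoid((logits + noise) / temperature).
    relaxed = 1.0 / (1.0 + np.exp(-(logits + noise) / temperature))
    # Assumption: undirected graph without self-loops, so keep the strict
    # upper triangle and mirror it.
    upper = np.triu(relaxed, k=1)
    return upper + upper.T

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4))  # hypothetical learnable edge parameters
adj = sample_relaxed_adjacency(logits, temperature=0.5, rng=rng)
```

Lower temperatures push the entries toward 0 or 1. This NumPy version only shows the forward sampling step; in an autodiff framework the same expression is differentiable with respect to `logits`, which is what makes gradient-based structure learning possible.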
Methodology
Figure 1: Graph condensation condenses a large graph into a smaller but informative graph that preserves performance on downstream tasks like GNN training.
Conventional graph condensation methods use a bi-level optimization framework:
- Inner loop: Train a GNN on the condensed graph.
- Outer loop: Update the condensed graph based on performance loss.
This is computationally expensive due to repeated GNN training.
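In symbols, the two loops above are commonly written as a nested optimization problem. The notation here is assumed for illustration and may differ from the paper: $\mathcal{G}$ is the original graph with labels $Y$, $\mathcal{S}$ is the condensed graph with labels $Y_{\mathcal{S}}$, $f_{\theta}$ is a GNN with parameters $\theta$, and $\mathcal{L}$ is a classification loss.

```latex
\min_{\mathcal{S}} \; \mathcal{L}\bigl(f_{\theta^{*}(\mathcal{S})}(\mathcal{G}),\, Y\bigr)
\quad \text{s.t.} \quad
\theta^{*}(\mathcal{S}) = \arg\min_{\theta} \; \mathcal{L}\bigl(f_{\theta}(\mathcal{S}),\, Y_{\mathcal{S}}\bigr)
```

Every outer update of $\mathcal{S}$ requires re-solving the inner problem for $\theta^{*}(\mathcal{S})$, which is where the repeated GNN training comes from.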
GCGP: A Simpler Alternative
GCGP replaces iterative GNN training with a Gaussian Process, treating the condensed synthetic graph as GP observations. The GP combines these observations with model priors to make predictions on the original graph.
Figure 2: The GCGP workflow includes:
- Using the condensed graph as GP observations.
- Predicting node labels in the original graph.
- Optimizing the condensed graph by minimizing the discrepancy between predictions and ground-truth labels.
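The three workflow steps can be sketched with a GP posterior mean in closed form. This is a toy illustration under stated assumptions, not the repository's implementation: the kernel is a plain dot product on features propagated over `k` hops of a normalized adjacency, `ridge` is read as a diagonal regularizer on the kernel matrix, and the discrepancy is a mean squared error. All function names are hypothetical, and the actual GCGP covariance function may differ.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalize A + I (self-loops keep every degree >= 1)."""
    a_hat = adj + np.eye(adj.shape[0])
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]

def aggregate(features, adj, k):
    """Propagate node features over k hops of the normalized adjacency."""
    a_norm = normalized_adjacency(adj)
    out = features
    for _ in range(k):
        out = a_norm @ out
    return out

def gp_predict(x_cond, adj_cond, y_cond, x_orig, adj_orig, k=2, ridge=0.5):
    """GP posterior mean on the original nodes given the condensed graph."""
    h_cond = aggregate(x_cond, adj_cond, k)
    h_orig = aggregate(x_orig, adj_orig, k)
    k_ss = h_cond @ h_cond.T  # covariance among condensed nodes
    k_ts = h_orig @ h_cond.T  # covariance between original and condensed nodes
    # Solve (K_ss + ridge * I) alpha = Y_s rather than forming an inverse.
    alpha = np.linalg.solve(k_ss + ridge * np.eye(k_ss.shape[0]), y_cond)
    return k_ts @ alpha

# Toy data: 6 original nodes, 3 condensed nodes, 4 features, 2 classes.
rng = np.random.default_rng(0)
x_orig = rng.normal(size=(6, 4))
adj_orig = np.triu((rng.uniform(size=(6, 6)) < 0.4).astype(float), k=1)
adj_orig = adj_orig + adj_orig.T
y_orig = np.eye(2)[rng.integers(0, 2, size=6)]
x_cond = rng.normal(size=(3, 4))
adj_cond = np.zeros((3, 3))
y_cond = np.eye(2)[np.arange(3) % 2]

pred = gp_predict(x_cond, adj_cond, y_cond, x_orig, adj_orig, k=2, ridge=0.5)
loss = float(np.mean((pred - y_orig) ** 2))  # discrepancy to minimize
```

Because the prediction is a single linear solve, the loss can be differentiated with respect to `x_cond` (and a relaxed `adj_cond`) directly, with no inner training loop.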
Implementation
Requirements
- python=3.8.20
- ogb=1.3.6
- pytorch=1.12.1
- pyg=2.5.2
- numpy=1.24.3
Tip: Install ogb first to avoid CUDA device recognition issues.
To set up the environment, run:
conda env create -f environment.yml
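After the environment is created, a quick check of the installed versions against the pins listed under Requirements can catch mismatches early. The sketch below uses the standard-library `importlib.metadata`. The distribution names `torch` and `torch_geometric` are assumptions about how the `pytorch` and `pyg` packages register themselves, and `check_versions` is a hypothetical helper.

```python
from importlib import metadata

# Pins copied from the Requirements list; distribution names are assumed.
PINNED = {
    "ogb": "1.3.6",
    "torch": "1.12.1",
    "torch_geometric": "2.5.2",
    "numpy": "1.24.3",
}

def check_versions(pinned):
    """Return {name: (wanted, found, matches)} for each pinned distribution."""
    report = {}
    for dist, wanted in pinned.items():
        try:
            found = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found = None  # not installed, or registered under another name
        # startswith tolerates local build tags such as "+cu113".
        matches = found is not None and found.startswith(wanted)
        report[dist] = (wanted, found, matches)
    return report

report = check_versions(PINNED)
for dist, (wanted, found, matches) in report.items():
    status = "ok" if matches else "check"
    print(f"{dist}: wanted {wanted}, found {found} [{status}]")
```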
Small Datasets (Cora, Citeseer, Pubmed, Photo, Computers)
Navigate to the gcgp folder:
cd gcgp
Run GCGP on a dataset (e.g., Cora):
python main.py --dataset Cora --cond_ratio 0.5 --ridge 0.5 --k 4 --epochs 200 --learn_A 0
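For scripted runs, the same command line can be assembled from a dictionary of flags. This is a small convenience sketch: `build_command` is a hypothetical helper, and it only reuses the flag names and values from the command above without assuming anything about what each flag does inside `main.py`.

```python
import shlex

def build_command(flags, script="main.py", python="python"):
    """Turn a {flag: value} mapping into an argv list for subprocess."""
    argv = [python, script]
    for name, value in flags.items():
        argv += [f"--{name}", str(value)]
    return argv

cora_flags = {
    "dataset": "Cora",
    "cond_ratio": 0.5,
    "ridge": 0.5,
    "k": 4,
    "epochs": 200,
    "learn_A": 0,
}

cmd = build_command(cora_flags)
print(shlex.join(cmd))
# python main.py --dataset Cora --cond_ratio 0.5 --ridge 0.5 --k 4 --epochs 200 --learn_A 0
```

To launch the run from inside the gcgp folder, pass the list to `subprocess.run(cmd, check=True)`.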
To reproduce all results:
sh run.sh
- Outputs will be saved in ./gcgp/outputs/
- Final results are collected in ./gcgp/results.csv via results.py
For generalization experiments:
sh run_generalization.sh
- Outputs: ./gcgp/outputs_generalization/
- Results: ./gcgp/results_generalization.csv
For efficiency/time evaluation:
sh run_time.sh
- Outputs: ./gcgp/outputs_time/
Large Datasets (Ogbn-arxiv and Reddit)
Ogbn-arxiv Dataset
Go to the folder:
cd gcgp_ogb
Run GCGP:
python main.py --dataset ogbn-arxiv --cond_size 90 --ridge 5 --k 2 --epochs 200 --learn_A 0
To reproduce all results:
sh run.sh
- Outputs: ./gcgp_ogb/outputs/
- Results: ./gcgp_ogb/results.csv
For time analysis:
sh run_time.sh
- Outputs: ./gcgp_ogb/outputs_time/
Reddit Dataset
Navigate to:
cd gcgp_reddit
Run GCGP:
python main.py --dataset Reddit --cond_size 77 --ridge 0.1 --k 2 --epochs 270 --learn_A 0
To reproduce all results:
sh run.sh
- Outputs: ./gcgp_reddit/outputs/
- Results: ./gcgp_reddit/results.csv
For training time evaluation:
sh run_time.sh
- Outputs: ./gcgp_reddit/outputs_time/
Cite Our Paper
If you find our paper or code useful, please cite:
@article{wang2025efficient,
title={Efficient Graph Condensation via Gaussian Process},
author={Wang, Lin and Li, Qing},
journal={arXiv preprint arXiv:2501.02565},
year={2025}
}
License
MIT License © 2025 WANG Lin