GECC: Scalable and Structure-free Graph Condensation with Evolving Capabilities

December 1, 2025

[KDD 2026] This is the PyTorch implementation of the training-free graph condensation method with evolving capabilities.

If you find this repo helpful, please consider starring this repository and citing our research on graph condensation:

@inproceedings{gong2026gecc,
  title={GECC: Scalable and Structure-free Graph Condensation with Evolving Capabilities},
  author={Shengbo Gong and Mohammad Hashemi and Juntong Ni and Carl Yang and Wei Jin},
  booktitle={Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '26)},
  year={2026}
}


@inproceedings{gong2025gcnc,
  title={{GC}4{NC}: A Benchmark Framework for Graph Condensation on Node Classification with New Insights},
  author={Shengbo Gong and Juntong Ni and Noveen Sachdeva and Carl Yang and Wei Jin},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2025}
}

@inproceedings{hashemi2024comprehensive,
  title={A comprehensive survey on graph reduction: sparsification, coarsening, and condensation},
  author={Hashemi, Mohammad and Gong, Shengbo and Ni, Juntong and Fan, Wenqi and Prakash, B Aditya and Jin, Wei},
  booktitle={Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence},
  pages={8058--8066},
  year={2024}
}

Prepare Environments

CUDA and PyTorch

Check the PyTorch previous versions page to install a build that matches your CUDA version. We tested this repo with torch 1.12.1 on CUDA 11.6 and with torch 2.4.0 on CUDA 12.1.
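To confirm that the installed build matches one of the tested combinations, a quick check along these lines can help. This is a minimal sketch and not part of the repo: the helper name `is_tested_combo` is hypothetical, and it only compares against the two combinations listed above.

```python
# (torch, CUDA) combinations listed as tested in this README.
TESTED_COMBOS = {("1.12.1", "11.6"), ("2.4.0", "12.1")}


def is_tested_combo(torch_version, cuda_version):
    """Return True if the pair matches a combination tested for this repo.

    Hypothetical helper: strips local build suffixes such as "+cu121".
    """
    base_version = torch_version.split("+")[0]
    return (base_version, cuda_version) in TESTED_COMBOS


try:
    import torch
except ImportError:
    print("torch is not installed")
else:
    # torch.version.cuda is None for CPU-only builds.
    print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
    print("tested combination:", is_tested_combo(torch.__version__, torch.version.cuda))
    print("GPU visible:", torch.cuda.is_available())
```

Other combinations may well work; the check only reports whether the environment is one of the two that were tested.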

Download Datasets

PyG downloads cora, citeseer, flickr, and reddit (Reddit2 in PyG) directly. For arxiv, we use the dataset provided by GraphSAINT. In all cases, our code downloads the datasets automatically.

The default path of datasets is ../../data.
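For readers who want to inspect a dataset outside the repo's own loader, the PyG downloads can be sketched as follows. This is an illustration under stated assumptions, not the repo's loading code: the mapping `PYG_DATASETS`, the helper `load_pyg_dataset`, and the per-dataset subfolder layout under `../../data` are hypothetical. arxiv is left out because it comes from GraphSAINT rather than PyG.

```python
import os

# Dataset name -> (PyG class name, extra keyword arguments).
# Assumption: these PyG classes correspond to the datasets named above.
PYG_DATASETS = {
    "cora": ("Planetoid", {"name": "Cora"}),
    "citeseer": ("Planetoid", {"name": "CiteSeer"}),
    "flickr": ("Flickr", {}),
    "reddit": ("Reddit2", {}),
}


def load_pyg_dataset(name, root="../../data"):
    """Download (if needed) and load one dataset with PyG."""
    import torch_geometric.datasets as pyg_datasets  # imported lazily

    class_name, kwargs = PYG_DATASETS[name.lower()]
    dataset_cls = getattr(pyg_datasets, class_name)
    # Planetoid creates its own subfolder per name; give the others one too.
    if class_name == "Planetoid":
        dataset_root = root
    else:
        dataset_root = os.path.join(root, name.lower())
    return dataset_cls(root=dataset_root, **kwargs)
```

For example, `load_pyg_dataset("cora")[0]` returns the single graph object of the Cora dataset, downloading it on first use.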

Once the datasets are downloaded, they are randomly divided into five subsets, each preserving the original class distribution. The exception is Ogbn-arxiv-real, which is split by real publication year using the checkpoints/evolve_dataset/ogbn-arxiv/node_year.csv file. To save the graph at each timestamp, run:

For real-world splits: python split.py --dataset <dataset name> --realworld
Others: python split.py --dataset <dataset name>
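The random, class-preserving split described above can be illustrated with a small stratified partition. This is a minimal sketch of the general technique, not the logic inside split.py: the function name `stratified_split`, the `seed` parameter, and the round-robin assignment are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, num_parts=5, seed=0):
    """Randomly split node indices into num_parts subsets, class by class.

    Each class is shuffled and dealt round-robin, so every subset receives
    a near-equal share of every class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for node, label in enumerate(labels):
        by_class[label].append(node)

    parts = [[] for _ in range(num_parts)]
    offset = 0
    for label in sorted(by_class):
        nodes = by_class[label]
        rng.shuffle(nodes)
        # The running offset rotates which subset receives leftover nodes,
        # which keeps the overall subset sizes balanced as well.
        for i, node in enumerate(nodes):
            parts[(i + offset) % num_parts].append(node)
        offset += len(nodes)
    return [sorted(part) for part in parts]


labels = [0] * 10 + [1] * 5 + [2] * 5
for k, part in enumerate(stratified_split(labels)):
    print(k, part)
```

With the toy labels above, each of the five subsets ends up with two nodes of class 0 and one node each of classes 1 and 2, mirroring the 2:1:1 class ratio of the full set.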

Reproduce Results

To reproduce the results reported in the paper (Table 2), use the scripts provided in graphslim/scripts.

First,

cd graphslim

Evolving and non-Evolving settings:

Run bash run_<dataset>_evolve.sh once without arguments and once with any extra argument to cover both the Evolving and non-Evolving settings. Together, these runs reproduce Table 2 of the paper, except for the Whole column (the last column).

As an example, for Cora dataset:

bash run_cora_evolve.sh && bash run_cora_evolve.sh 1

Whole GNN training:

To train the GNN on all training nodes, run:

python run_eval.py -D <dataset> -W -G <gpu_id>
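The commands in this section can be collected into a small driver when several datasets need to be run. This is a hedged sketch, not a script shipped with the repo: the helpers `table2_commands` and `run_all` are hypothetical, and the sketch assumes that all three commands are launched from the directory entered in the first step and that the dataset name matches the script file name.

```python
import subprocess


def table2_commands(dataset, gpu_id=0):
    """Build the commands shown in this section for one dataset."""
    script = f"run_{dataset}_evolve.sh"
    return [
        ["bash", script],       # evolve script without arguments
        ["bash", script, "1"],  # evolve script with an extra argument
        # Whole-training baseline (the Whole column).
        ["python", "run_eval.py", "-D", dataset, "-W", "-G", str(gpu_id)],
    ]


def run_all(dataset, gpu_id=0, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in table2_commands(dataset, gpu_id):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


run_all("cora")
```

The default `dry_run=True` only prints the commands, which makes it easy to check them before starting long runs with `run_all("cora", dry_run=False)`.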

Acknowledgement

Some of the algorithms are adapted from the original paper authors' implementations and from other packages.

SimGC

GCOND

GEOM