Reproducibility
May 10, 2022 ยท View on GitHub
1. Usage
1.1 In terminal
# Run the main file (at the root of the project)
python main_molecules_graph_regression.py --dataset ZINC --config 'configs/molecules_graph_regression_GatedGCN_ZINC_100k.json' # for CPU
python main_molecules_graph_regression.py --dataset ZINC --gpu_id 0 --config 'configs/molecules_graph_regression_GatedGCN_ZINC_100k.json' # for GPU
The training and network parameters for each dataset and network is stored in a json file in the configs/ directory.
1.2 In jupyter notebook
# Run the notebook file (at the root of the project)
conda activate benchmark_gnn
jupyter notebook
Use main_molecules_graph_regression.ipynb notebook to explore the code and do the training interactively.
2. Output, checkpoints and visualizations
Output results are located in the folder defined by the variable out_dir in the corresponding config file (eg. configs/molecules_graph_regression_GatedGCN_ZINC_100k.json file).
If out_dir = 'out/molecules_graph_regression/', then
2.1 To see checkpoints and results
- Go to
out/molecules_graph_regression/resultsto view all result text files. - Directory
out/molecules_graph_regression/checkpointscontains model checkpoints.
2.2 To see the training logs in Tensorboard on local machine
- Go to the logs directory, i.e.
out/molecules_graph_regression/logs/. - Run the commands
source activate benchmark_gnn
tensorboard --logdir='./' --port 6006
- Open
http://localhost:6006in your browser. Note that the port information (here 6006 but it may change) appears on the terminal immediately after starting tensorboard.
2.3 To see the training logs in Tensorboard on remote machine
- Move this script to the root of the repository, i.e. benchmarking-gnns/.
- Run the script
bash script_tensorboard.sh. - On your local machine, run the command
ssh -N -f -L localhost:6006:localhost:6006 user@xx.xx.xx.xx. - Open
http://localhost:6006in your browser. Note thatuser@xx.xx.xx.xxcorresponds to your user login and the IP of the remote machine.
3. Reproduce results (4 runs on all, except CSL and TUs)
# At the root of the project
bash scripts/SuperPixels/script_main_superpixels_graph_classification_MNIST_100k.sh # run MNIST dataset for 100k params
bash scripts/SuperPixels/script_main_superpixels_graph_classification_MNIST_500k.sh # run MNIST dataset for 500k params; WL-GNNs
bash scripts/SuperPixels/script_main_superpixels_graph_classification_CIFAR10_100k.sh # run CIFAR10 dataset for 100k params
bash scripts/SuperPixels/script_main_superpixels_graph_classification_CIFAR10_500k.sh # run CIFAR10 dataset for 500k params; WL-GNNs
bash scripts/ZINC/script_main_molecules_graph_regression_ZINC_100k.sh # run ZINC dataset for 100k params
bash scripts/ZINC/script_main_molecules_graph_regression_ZINC_500k.sh # run ZINC dataset for 500k params
bash scripts/ZINC/script_main_molecules_graph_regression_ZINC_PE_GatedGCN_500k.sh # run ZINC dataset with PE for GatedGCN
bash scripts/AQSOL/script_main_molecules_graph_regression_AQSOL_100k.sh # run AQSOL dataset for 100k params
bash scripts/AQSOL/script_main_molecules_graph_regression_AQSOL_500k.sh # run AQSOL dataset for 500k params
bash scripts/AQSOL/script_main_molecules_graph_regression_AQSOL_PE_GatedGCN_500k.sh # run AQSOL dataset with PE for GatedGCN
bash scripts/WikiCS/script_main_WikiCS_node_classification_100k.sh # run WikiCS dataset for 100k params
bash scripts/SBMs/script_main_SBMs_node_classification_PATTERN_100k.sh # run PATTERN dataset for 100k params
bash scripts/SBMs/script_main_SBMs_node_classification_PATTERN_500k.sh # run PATTERN dataset for 500k params
bash scripts/SBMs/script_main_SBMs_node_classification_PATTERN_PE_GatedGCN_500k.sh # run PATTERN dataset with PE for GatedGCN
bash scripts/SBMs/script_main_SBMs_node_classification_CLUSTER_100k.sh # run CLUSTER dataset for 100k params
bash scripts/SBMs/script_main_SBMs_node_classification_CLUSTER_500k.sh # run CLUSTER dataset for 500k params
bash scripts/SBMs/script_main_SBMs_node_classification_CLUSTER_PE_GatedGCN_500k.sh # run CLUSTER dataset with PE for GatedGCN
bash scripts/TSP/script_main_TSP_edge_classification_100k.sh # run TSP dataset for 100k params
bash scripts/TSP/script_main_TSP_edge_classification_edge_feature_analysis.sh # run TSP dataset for edge feature analysis
bash scripts/COLLAB/script_main_COLLAB_edge_classification_40k.sh # run OGBL-COLLAB dataset for 40k params
bash scripts/COLLAB/script_main_COLLAB_edge_classification_edge_feature_analysis.sh # run OGBL-COLLAB dataset for edge feature analysis
bash scripts/COLLAB/script_main_COLLAB_edge_classification_PE_GatedGCN_40k.sh # run OGBL-COLLAB dataset with PE for GatedGCN
bash scripts/CSL/script_main_CSL_graph_classification_20_seeds.sh # run CSL dataset without node features on 20 seeds
bash scripts/CSL/script_main_CSL_graph_classification_PE_20_seeds.sh # run CSL dataset with PE on 20 seeds
bash scripts/GraphTheoryProp/script_main_GraphTheoryProp_multitask_100k.sh # run GraphTheoryProp dataset for 100k params
bash scripts/CYCLES/script_main_CYCLES_graph_classification_all.sh # run CYCLES dataset for 100k params
bash scripts/TU/script_main_TUs_graph_classification_100k_seed1.sh # run TU datasets for 100k params on seed1
bash scripts/TU/script_main_TUs_graph_classification_100k_seed2.sh # run TU datasets for 100k params on seed2
Scripts are located at the scripts/ directory of the repository.
4. Generate statistics obtained over mulitple runs (except CSL and TUs)
After running a script, statistics (mean and standard variation) can be generated from a notebook. For example, after running the script scripts/ZINC/script_main_molecules_graph_regression_ZINC_100k.sh, go to the results folder out/molecules_graph_regression/results/, and run the notebook scripts/StatisticalResults/generate_statistics_molecules_graph_regression_ZINC_100k.ipynb to generate the statistics.