GP-DCRNN: Large scale traffic forecasting using graph-partitioning-based diffusion convolution recurrent neural network
March 14, 2025 ยท View on GitHub
Graph-partitioning-based DCRNN approach model the traffic on a large California highway network with 11,160 sensor locations. The general idea is to partition the large highway network into a number of small networks, and trained them with a simultaneously. The training process takes around 3 hours in a moderately sized GPU cluster, and the real-time inference can be run on traditional hardware such as CPUs. This is a TensorFlow implementation of Diffusion Convolutional Recurrent Neural Network.
Requirements
- scipy>=0.19.0
- numpy>=1.12.1
- pandas>=0.19.2
- tensorflow>=1.13.1
- pyaml
๐ Dataset Overview
The final dataset contains speed and flow data from 11,160 traffic stations across California from January 1, 2018, to December 31, 2018, with a granularity of 5 minutes.
The dataset covers traffic in nine districts of California: D3 - North Central, D4 - Bay Area, D5 - Central Coast, D6 - South Central, D7 - Los Angeles, D8 - San Bernardino, D10 - Central, D11 - San Diego, D12 - Orange County
The dataset includes:
โ Traffic speed measurements collected from sensors.
โ Traffic flow data representing vehicle density and movement.
โ Sensor adjacency matrices representing road network connectivity.
โ Sensor distances for spatial analysis.
These datasets are useful for:
๐ฆ Traffic flow analysis
๐ Machine learning & deep learning models for traffic prediction
๐ฃ Graph-based road network modeling
๐ Urban mobility & transportation planning
๐ Data Preparation
To get started, download the necessary traffic data files for California and store them in the scripts/ folder.
๐ฅ Download Required Files
California Traffic Data
- ๐ฆ Traffic Speed Data:
speed.h5 - ๐ Traffic Flow Data:
flow.h5 - ๐บ Adjacency Matrix:
adj_mat.pkl - ๐ Sensor Distances:
distances.csv
Los Angeles (LA) Traffic Data
- ๐ Complete LA Dataset:
LA Traffic Data - ๐ฆ Traffic Speed Data:
speed.h5 - ๐บ Adjacency Matrix:
adj_mat.pkl - ๐ Sensor Distances:
distances.csv
San Francisco (SFO) Traffic Data
- ๐ Complete SFO Dataset:
SFO Traffic Data
๐ Organizing Files
After downloading, place the files inside the scripts/ directory:
# Generate adjucency matrix for 64 partitions. It will generate 64 folder containing adj_mat.pkl for each partition
# Input: graph_sensor_locations_11k.csv, distances.csv, and tiny_11k_graph_new.txt.part.64 (graph partition from Metis)
python part_adj.py
# Generate speed.h5 and sensor_ids.txt containing station ids for each partition
# Input: graph_sensor_locations_11k.csv, and tiny_11k_graph_new.txt.part.64
python extract_part_data.py
#Provide the folder name in the following code to generate configuration files for all partitions
python copy_yaml.py
# Move the folder Ex. data_partition_64 outside the `scripts/` folder.
# move the data_partition_64 folder outside of the script folder
#To run DCRNN on the local machine with one partition
python dcrnn_train.py --config_filename=data_partitions_64/part0/dcrnn_config.yaml
Script to submit job on cooley (GPU cluster at Argonne Leadership Computing Facility) is
qsub_64.sh
The model generates prediction of DCRNN is in data_partition_64/part{0..63}/results/dcrnn_predictions_[1-12].h5.
Citation
If you find this repository, e.g., the code and the datasets, useful in your research, please cite the following paper:
@article{mallick2020graph,
title={Graph-partitioning-based diffusion convolutional recurrent neural network for large-scale traffic forecasting},
author={Mallick, Tanwi and Balaprakash, Prasanna and Rask, Eric and Macfarlane, Jane},
journal={Transportation Research Record},
volume={2674},
number={9},
pages={473--488},
year={2020},
publisher={SAGE Publications Sage CA: Los Angeles, CA}
}