Weather2K: A Multivariate Spatio-Temporal Benchmark Dataset for Meteorological Forecasting Based on Real-Time Observation Data from Ground Weather Stations

August 5, 2025



Accepted by the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023.


Download the dataset

This is the open-source version (after necessary adjustments and checks) of the Weather2K dataset: https://huggingface.co/datasets/BUPT-PRIS-727/Weather2K

The NumPy file of Weather2K-R has shape (1866, 13, 13632): 1,866 ground weather stations; 13 variables per station (3 position constants and 10 meteorological factors); and 13,632 time steps at 3-hour resolution, covering January 1, 2017 to August 31, 2021.
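The time axis can be turned into timestamps from the stated resolution and coverage. The following is a minimal sketch, assuming the first time step is 00:00 on January 1, 2017 (the source states only the date range, not the starting hour or the time zone). Under that assumption, 13,632 steps of 3 hours end at 21:00 on August 31, 2021, which matches the stated coverage.

```python
import numpy as np

N_STEPS = 13632                 # time steps stated for Weather2K-R
STEP = np.timedelta64(3, "h")   # 3-hour time resolution

# Assumption: the first step is 00:00 on January 1, 2017.
start = np.datetime64("2017-01-01T00")

# One timestamp per index along the last axis of the array.
times = start + np.arange(N_STEPS) * STEP

print(times[0], times[-1])
```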

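| NumPy Index | Long Name | Short Name | Unit |
| --- | --- | --- | --- |
| 0 | Latitude | lat | ° |
| 1 | Longitude | lon | ° |
| 2 | Altitude | alt | m |
| 3 | Air pressure | ap | hPa |
| 4 | Air temperature | t | °C |
| 5 | Maximum temperature | mxt | °C |
| 6 | Minimum temperature | mnt | °C |
| 7 | Relative humidity | rh | % |
| 8 | Precipitation in 3 h | p3 | mm |
| 9 | Wind direction | wd | ° |
| 10 | Wind speed | ws | m·s⁻¹ |
| 11 | Maximum wind direction | mwd | ° |
| 12 | Maximum wind speed | mws | m·s⁻¹ |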
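The index table maps directly onto the second axis of the array. Below is a minimal sketch of selecting a variable by its short name. The helper `select`, the dictionary `VAR_INDEX`, and the file name in the comment are hypothetical and not part of the dataset release. A small synthetic array stands in for the real file so the snippet runs on its own.

```python
import numpy as np

# Short names from the table, mapped to their index on axis 1.
VAR_INDEX = {
    "lat": 0, "lon": 1, "alt": 2, "ap": 3, "t": 4, "mxt": 5, "mnt": 6,
    "rh": 7, "p3": 8, "wd": 9, "ws": 10, "mwd": 11, "mws": 12,
}

def select(data, name):
    """Return the (stations, time) slice for one variable (hypothetical helper)."""
    return data[:, VAR_INDEX[name], :]

# Real use, with a hypothetical file name:
#   data = np.load("weather2k_r.npy", mmap_mode="r")
# Synthetic stand-in with the same axis order: (stations, variables, time).
demo = np.zeros((4, 13, 8), dtype=np.float32)
demo[:, VAR_INDEX["t"], :] = 21.5

temperature = select(demo, "t")   # shape (stations, time)

# Assumption: the position constants repeat along the time axis,
# so the first time step is enough to read them.
coords = demo[:, :3, 0]           # columns: lat, lon, alt
```

Loading with `mmap_mode="r"` avoids reading the whole array into memory at once, which may help given the size of the full file.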

If you have any questions about downloading the data, please contact wuming@bupt.edu.cn.

Archive

Cite

If you use this dataset, please cite:

Zhu X, Xiong Y, Wu M, et al. Weather2K: A Multivariate Spatio-Temporal Benchmark Dataset for Meteorological Forecasting Based on Real-Time Observation Data from Ground Weather Stations[C]//International Conference on Artificial Intelligence and Statistics. PMLR, 2023: 2704-2722.