Multivariate-Long-Term-Time-Series-Forecasting
August 11, 2025
This repository provides eight widely used, publicly available datasets for research in multivariate long-term time series forecasting. The datasets span diverse real-world domains, including weather prediction, traffic flow monitoring, energy consumption, and financial exchange rate analysis. All datasets share one format: for a time series with T timestamps and n sensors per timestamp, the data file contains T lines, each holding n real numbers separated by commas. We also include data preprocessing scripts to ensure compatibility with our proposed model, EAPformer, and with other time series forecasting frameworks.
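As an illustration of the file format described above, the following is a minimal sketch of a reader that uses only the Python standard library. The function name `load_series` and the in-memory sample are hypothetical and are not part of the repository's preprocessing scripts. The sketch assumes the file has no header row and no timestamp column, which the description implies but does not state.

```python
import csv
import io


def load_series(source):
    """Parse a T-line, comma-separated text stream into T rows of n floats.

    `source` is any iterable of text lines (for example an open file).
    Assumes no header row and no timestamp column.
    """
    rows = []
    for line_no, fields in enumerate(csv.reader(source), start=1):
        if not fields:  # skip blank lines
            continue
        row = [float(value) for value in fields]
        # Every timestamp must report the same number of sensors.
        if rows and len(row) != len(rows[0]):
            raise ValueError(
                f"line {line_no}: expected {len(rows[0])} values, got {len(row)}"
            )
        rows.append(row)
    return rows


# Tiny in-memory stand-in for a data file: T = 3 timestamps, n = 2 sensors.
sample = io.StringIO("0.1,0.2\n0.3,0.4\n0.5,0.6\n")
data = load_series(sample)
T, n = len(data), len(data[0])
print(T, n)
```

For a file on disk, the same function can be called as `load_series(open(path, newline=""))`, where `path` points to one of the dataset files.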
Paper
EAPformer: Entropy-Aware Patch Transformer for Multivariate Long-Term Time Series Forecasting
Datasets
1. Electricity Consumption
- Source: UCI Machine Learning Repository
- Description: This dataset records electricity consumption (in kWh) every 15 minutes from 2011 to 2014 for 321 clients. Because some dimensions are all zero in 2011, we exclude that year and use data from 2012 to 2014. The 15-minute readings are then converted to hourly consumption for consistency.
- Temporal Resolution: Hourly
- Dimensionality: 321 sensors (clients)
- Usage: Suitable for modeling energy consumption patterns over time.
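The conversion from 15-minute readings to hourly consumption mentioned in the description can be sketched as follows. This is an assumption-laden illustration, not the repository's actual script: the helper name `to_hourly` is hypothetical, and it sums four consecutive readings per hour, which is the natural choice for energy in kWh. The authors' preprocessing may aggregate differently, for example by averaging.

```python
def to_hourly(rows, steps_per_hour=4):
    """Sum each group of `steps_per_hour` consecutive rows, column by column.

    `rows` is a list of T rows, each a list of n per-client readings.
    A trailing partial hour (fewer than `steps_per_hour` rows) is dropped.
    """
    usable = len(rows) - len(rows) % steps_per_hour
    hourly = []
    for start in range(0, usable, steps_per_hour):
        block = rows[start:start + steps_per_hour]
        # zip(*block) regroups the readings by client (column).
        hourly.append([sum(column) for column in zip(*block)])
    return hourly


# Two clients, nine 15-minute readings: two full hours plus one leftover row.
quarter_hourly = [
    [1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0],
    [0.5, 0.0], [0.5, 0.0], [0.5, 0.0], [0.5, 0.0],
    [9.0, 9.0],
]
hourly = to_hourly(quarter_hourly)
print(hourly)  # [[4.0, 8.0], [2.0, 0.0]]
```

Dropping the incomplete final hour keeps every output row comparable. Padding or rescaling the partial hour would be an alternative design choice.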
2. Traffic Usage
- Source: PeMS - California Department of Transportation
- Description: This dataset contains 24 months (2015–2016) of hourly road occupancy rates (ranging from 0 to 1) measured by sensors on San Francisco Bay Area freeways. It captures traffic flow dynamics across multiple locations.
- Temporal Resolution: Hourly
- Dimensionality: Varies (sensor-dependent)
- Usage: Ideal for studying traffic flow patterns and congestion forecasting.
3. Weather
- Source: Referenced in Autoformer [Wu et al., 2021]
- Description: This dataset includes multivariate time series data for weather-related variables, such as temperature, humidity, and pressure, collected at high temporal resolution. It is designed to test forecasting models on periodic and seasonal patterns.
- Temporal Resolution: 10-minute intervals (as distributed with Autoformer)
- Dimensionality: Multiple weather variables
- Usage: Suitable for evaluating models on weather forecasting tasks.
4. Exchange
- Source: Referenced in Autoformer [Wu et al., 2021]
- Description: This dataset captures daily exchange rates for multiple currencies. It reflects financial time series with non-stationary and fluctuating dynamics.
- Temporal Resolution: Daily
- Dimensionality: Multiple currency pairs
- Usage: Useful for financial time series forecasting and trend analysis.
5. ETT Datasets (ETTm1, ETTm2, ETTh1, ETTh2)
- Source: Referenced in Informer [Zhou et al., 2021]
- Description: The Electricity Transformer Temperature (ETT) datasets consist of four variants: ETTm1 and ETTm2 (15-minute resolution), and ETTh1 and ETTh2 (hourly resolution). These datasets record transformer load and temperature metrics, capturing both periodic and non-stationary dynamics. They vary in temporal granularity and sequence length, making them suitable for testing long-term forecasting robustness.
- Temporal Resolution:
  - ETTm1, ETTm2: 15-minute intervals
  - ETTh1, ETTh2: Hourly
- Dimensionality: Varies (multiple transformer metrics)
- Usage: Ideal for evaluating models on both high- and low-resolution time series with complex temporal dependencies.