Bidirectional Temporal-Aware Modeling with Multi-Scale Mixture-of-Experts for Multivariate Time Series Forecasting
November 11, 2025
This code is the official PyTorch implementation of the CIKM'25 paper: Bidirectional Temporal-Aware Modeling with Multi-Scale Mixture-of-Experts for Multivariate Time Series Forecasting (BIM3).
Quickstart
1. Environment
BIM3 is developed with Python 3.10 and relies on PyTorch 2.4.1. To set up the environment, make sure Miniconda is correctly installed and configured.
# Create a new conda environment.
conda create -n bim3 python=3.10 -y
conda activate bim3
# Install required packages using pip.
pip install -r requirements.txt
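After installation, it can be useful to confirm that the active environment matches the versions stated above. The following is a minimal sketch, not part of the repository: the helper names `installed_torch_version` and `check_environment` are hypothetical, and the check only compares against the Python 3.10 and PyTorch 2.4.1 versions mentioned in this README.

```python
import sys
from importlib import metadata

# Versions stated in this README. The helpers below are illustrative only.
EXPECTED_PYTHON = (3, 10)
EXPECTED_TORCH = "2.4.1"


def installed_torch_version():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


def check_environment(python_version, torch_version):
    """Return a list of mismatch messages (empty when everything matches)."""
    problems = []
    major, minor = python_version[0], python_version[1]
    if (major, minor) != EXPECTED_PYTHON:
        problems.append(f"Python {major}.{minor} found, 3.10 expected")
    if torch_version is None:
        problems.append("torch is not installed")
    # Ignore a local build suffix such as "+cu121" when comparing versions.
    elif torch_version.split("+")[0] != EXPECTED_TORCH:
        problems.append(f"torch {torch_version} found, {EXPECTED_TORCH} expected")
    return problems


if __name__ == "__main__":
    issues = check_environment(sys.version_info[:2], installed_torch_version())
    for line in issues or ["environment matches the README"]:
        print(line)
```

A mismatch reported here does not necessarily mean the code will fail; it only flags a difference from the versions the authors developed with.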
2. Dataset
BIM3 adopts the TFB framework as its code base, and the datasets can be obtained from TFB's public archive on Google Drive.
Place the downloaded .zip file under the folder ./dataset and extract it there.
The folder structure should be:
.
├── config
├── dataset
│   └── forecasting
│       ├── ...
│       ├── Electricity.csv
│       ├── ETTh1.csv
│       ...
...
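Before launching experiments, the layout above can be checked programmatically. The sketch below is an illustration under stated assumptions: the helper `missing_datasets` is hypothetical, and only Electricity.csv and ETTh1.csv are confirmed by the folder tree; the `<name>.csv` pattern for the remaining datasets is assumed.

```python
from pathlib import Path

# The 10 dataset names used in this README. Only Electricity.csv and ETTh1.csv
# appear in the folder tree; the "<name>.csv" file pattern is an assumption.
DATASETS = [
    "Electricity", "ETTh1", "ETTh2", "ETTm1", "ETTm2",
    "Exchange", "ILI", "Solar", "Traffic", "Weather",
]


def missing_datasets(root="./dataset", names=DATASETS):
    """Return the dataset names with no CSV file under <root>/forecasting."""
    folder = Path(root) / "forecasting"
    return [name for name in names if not (folder / f"{name}.csv").is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing under ./dataset/forecasting:", ", ".join(missing))
    else:
        print("All expected dataset files were found.")
```

If the archive uses different file names for some datasets, adjust the `DATASETS` list accordingly.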
3. Run Scripts
We provide experiment scripts for all 10 datasets (Electricity, ETTh1, ETTh2, ETTm1, ETTm2, Exchange, ILI, Solar, Traffic and Weather). These scripts are placed under the folder ./scripts/multivariate_forecast. For instance, you can reproduce the results on the ETTh1 dataset by running:
sh ./scripts/multivariate_forecast/ETTh1_script/BIM3.sh
The experiment results are saved under the folder results/ETTh1 in .csv format, together with the detailed training configuration. The scripts for the other datasets work the same way.
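To run or inspect several datasets in one go, the commands and result locations can be generated from the dataset names. This is a rough sketch, not part of the repository: `script_path` and `result_files` are hypothetical helpers, and the `<dataset>_script/BIM3.sh` and `results/<dataset>` patterns are generalized from the ETTh1 example, so they are assumptions for the other datasets.

```python
from pathlib import Path

DATASETS = [
    "Electricity", "ETTh1", "ETTh2", "ETTm1", "ETTm2",
    "Exchange", "ILI", "Solar", "Traffic", "Weather",
]


def script_path(dataset, root="./scripts/multivariate_forecast"):
    """Build the script path, assuming the pattern of the ETTh1 example."""
    return f"{root}/{dataset}_script/BIM3.sh"


def result_files(dataset, root="results"):
    """List the .csv result files for one dataset (empty if none exist yet)."""
    return sorted(Path(root, dataset).glob("*.csv"))


if __name__ == "__main__":
    for name in DATASETS:
        # Print the command instead of executing it, so nothing runs by accident.
        print(f"sh {script_path(name)}")
        for csv_file in result_files(name):
            print(f"  existing result: {csv_file}")
```

Printing the commands rather than executing them keeps the sketch safe to run; verify each script path exists before piping the output into a shell.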
4. Code Implementation
The implementation of BIM3 can be found under the folder ts_benchmark/baselines/bim3.
5. Citation
If you find this repo helpful, please cite our paper.
@inproceedings{10.1145/3746252.3761273,
  author = {Gao, Yifan and Zhao, Boming and Peng, Haocheng and Bao, Hujun and Zhao, Jiashu and Cui, Zhaopeng},
  title = {Bidirectional Temporal-Aware Modeling with Multi-Scale Mixture-of-Experts for Multivariate Time Series Forecasting},
  year = {2025},
  isbn = {9798400720406},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3746252.3761273},
  doi = {10.1145/3746252.3761273},
  abstract = {Recent advances in deep learning have significantly boosted performance in multivariate time series forecasting (MTSF). While many existing approaches focus on capturing inter-variable (a.k.a. channel-wise) correlations to improve prediction accuracy, the temporal dimension, particularly its rich structural and contextual information, remains underexplored. In this paper, we propose BIM3, a novel framework that integrates BIdirectional temporal-aware modeling with Multi-Scale Mixture-of-Experts for MTSF. First, unlike existing methods that treat historical and future temporal information independently, we introduce a novel Timestamp Dual Cross-Attention Module, which employs a symmetric cross-attention mechanism to explicitly capture bidirectional temporal dependencies through timestamp interactions. Second, to address the complex and scale-varying temporal patterns commonly found in multivariate time series, we move beyond recent multi-scale forecasting models that share parameters across all channels and fail to capture channel-specific dynamics. Instead, we design a Multi-Scale Feature Extract Mixture-of-Experts module that adaptively routes time series to specialized experts based on their temporal characteristics. Extensive experiments on multiple real-world datasets show that BIM3 consistently outperforms state-of-the-art methods, highlighting its effectiveness in capturing both temporal structure and inter-variable diversity.},
  booktitle = {Proceedings of the 34th ACM International Conference on Information and Knowledge Management},
  pages = {696–706},
  numpages = {11},
  keywords = {deep learning, mixture-of-experts, multi-scale, multivariate time series forecasting},
  location = {Seoul, Republic of Korea},
  series = {CIKM '25}
}