README.md

June 8, 2026

Multi-period Learning for Financial Time Series Forecasting (MLF, KDD2025)

The paper is available here: Paper (PDF).

News

  • [2026/06] We release an enriched version of the Fund Sales Dataset with additional covariates (e.g., page exposure UVs and market yield rates) and detailed table schemas. The original dataset was split by holding period and merged into a unified format; the enriched version extends it with richer features to facilitate more comprehensive research. The enriched dataset in its original unmerged format is available at Google Drive (Enriched Fund Dataset). See Introduction of Fund Sales Dataset for full table descriptions.

Simple introduction

This repo provides the official code for Multi-period Learning for Financial Time Series Forecasting (MLF, published in KDD 2025), which incorporates multiple inputs with varying lengths (periods) to achieve better accuracy and reduce the cost of selecting input lengths during training.

In our work, multi-period inputs refer to multiple original time series windows with varying input lengths, as shown in sub-figure (c) below. This differs from the multi-scale inputs in Pyraformer and Scaleformer, which are obtained by downsampling from the same fixed input length (sub-figure (b) below).
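The distinction above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window lengths (4, 8, 16), the downsampling factors, and the use of average pooling for downsampling are all assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence


def multi_period_inputs(series: Sequence[float], lengths: Sequence[int]) -> Dict[int, List[float]]:
    """Cut one raw window per input length, each ending at the most recent point."""
    if min(lengths) <= 0 or max(lengths) > len(series):
        raise ValueError("each input length must be in 1..len(series)")
    return {n: list(series[-n:]) for n in lengths}


def multi_scale_inputs(series: Sequence[float], length: int, factors: Sequence[int]) -> Dict[int, List[float]]:
    """Downsample a single fixed-length window (average pooling is an assumption)."""
    window = list(series[-length:])
    scales = {}
    for f in factors:
        # Average each non-overlapping block of f points; drop an incomplete tail.
        usable = len(window) - len(window) % f
        scales[f] = [sum(window[i:i + f]) / f for i in range(0, usable, f)]
    return scales


# Toy series 0.0, 1.0, ..., 15.0 (illustrative values only).
series = [float(t) for t in range(16)]

# Multi-period: three raw windows of different lengths.
periods = multi_period_inputs(series, lengths=[4, 8, 16])

# Multi-scale: one window of length 16, viewed at three resolutions.
scales = multi_scale_inputs(series, length=16, factors=[1, 2, 4])
```

In this sketch every multi-period window keeps the original sampling resolution and only the amount of history differs, whereas every multi-scale view covers the same 16 points at a coarser resolution.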

[Figure: comparison of input types, including (b) multi-scale inputs and (c) multi-period inputs]

The input length has a significant impact on prediction accuracy, so selecting an appropriate input length is a key challenge in time series forecasting. We propose MLF to extract short-, medium-, and long-term semantic information separately from sequences of varying lengths. This avoids the case in which a model given only long-term inputs fails to learn these different semantics; for example, the prediction error of Pathformer and Scaleformer with long-term sequence inputs is higher than with short-term ones.

Using inputs of different lengths simultaneously for prediction is not easy because of the challenges caused by multi-period characteristics. As shown in the following figure, MLF is a benchmark that explores an architecture consisting of several components that address these challenges and incorporate multiple inputs with varying lengths to achieve better accuracy.

[Figure: challenges of multi-period inputs and the MLF components that address them]

Overall architecture

The overall architecture of MLF is shown in the following figure.

[Figure: overall architecture of MLF]

The two simple but effective components of MLF are shown in the following figure. For instance, the Patch Squeeze module significantly improves efficiency while maintaining good accuracy in the long-term TSF task.

[Figure: the two components of MLF, including the Patch Squeeze module]

Downloading Datasets

For confidentiality reasons, we can only disclose part of the fund product data. You can download the public datasets and the original Fund dataset from https://drive.google.com/drive/folders/1KKqHsdd18ZuBdpV8ZiiQxU9bbMkPR4kS. The enriched version, in its original unmerged format, can be downloaded from https://drive.google.com/drive/folders/1mx5ItVm2Nyod8n2AixotTSbbFIK5hO35. The downloaded folders, e.g., "Fund_Dataset", should be placed in the "dataset" folder. In the original Fund dataset, the average holding period increases gradually from Fund 1 to Fund 2 to Fund 3, and the overall temporal pattern distribution also changes, which allows a more comprehensive evaluation of an algorithm's effectiveness.
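As a quick sanity check after downloading, a sketch like the following can report which expected folders are missing. The helper name `missing_dataset_folders` is hypothetical and not part of this repo. Only the folder names "dataset" and "Fund_Dataset" come from the text above, so extend the list to match what you actually downloaded.

```python
from pathlib import Path
from typing import Iterable, List


def missing_dataset_folders(root: Path, expected: Iterable[str]) -> List[str]:
    """Return the expected sub-folder names that are not present under root."""
    return [name for name in expected if not (root / name).is_dir()]


# Run from the repository root; "Fund_Dataset" is the example folder named above.
missing = missing_dataset_folders(Path("dataset"), ["Fund_Dataset"])
if missing:
    print("Missing under dataset/:", ", ".join(missing))
```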

Introduction of Fund Sales Dataset

We collected fund sales datasets of different customers from Ant Fortune, an online wealth management platform on the Alipay app. For confidentiality reasons, only a subset of the fund datasets, covering January 2021 to January 2023, is currently released. The datasets consist of the three tables described below.

1. Fund Purchase/Redemption Data and Fund Feature Information Table

| # | Field | Description | Type |
|---|-------|-------------|------|
| 1 | product_pid | Product ID | string |
| 2 | transaction_date | Transaction date | string |
| 3 | apply_amt | Purchase (subscription) amount | double |
| 4 | redeem_amt | Redemption amount | double |
| 5 | net_in_amt | Net purchase amount = purchase amount - redemption amount | double |
| 6 | uv_fundown | UV of fund holdings page | double |
| 7 | uv_stableown | UV of stable holdings page | double |
| 8 | uv_fundopt | UV of fund watchlist page | double |
| 9 | uv_fundmarket | UV of fund market page | double |
| 10 | uv_termmarket | UV of fixed-term market page | double |
| 11 | during_days | Fund holding period (days), i.e., the minimum number of calendar days between redemption and purchase | bigint |
| 12 | total_net_value | Cumulative net asset value | double |
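The relation stated for net_in_amt (purchase amount minus redemption amount) can be checked with a short sketch such as the one below. The CSV layout, the date format, the sample values, and the helper name `net_inflow_mismatches` are assumptions for illustration. The released files may be stored differently.

```python
import csv
import io
import math

# Two made-up rows using a subset of the Table 1 columns (illustrative values only).
SAMPLE_CSV = """product_pid,transaction_date,apply_amt,redeem_amt,net_in_amt
p001,20210104,1200.0,300.0,900.0
p001,20210105,500.0,800.0,-300.0
"""


def net_inflow_mismatches(rows, tol=1e-6):
    """Return (product_pid, transaction_date) pairs where
    net_in_amt differs from apply_amt - redeem_amt by more than tol."""
    bad = []
    for row in rows:
        expected = float(row["apply_amt"]) - float(row["redeem_amt"])
        if not math.isclose(float(row["net_in_amt"]), expected, abs_tol=tol):
            bad.append((row["product_pid"], row["transaction_date"]))
    return bad


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
mismatches = net_inflow_mismatches(rows)
```

With the sample rows above, `mismatches` is empty because both rows satisfy the relation.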

2. Market Information Table

| # | Field | Description | Type |
|---|-------|-------------|------|
| 1 | enddate | Date | string |
| 2 | yield | Yield rate (%) | double |

3. Calendar Information Table

| # | Field | Description | Type |
|---|-------|-------------|------|
| 1 | stat_date | Date | string |
| 2 | is_trade | Whether it is a trading day | bigint |
| 3 | next_trade_date | Next trading day | string |
| 4 | last_trade_date | Previous trading day | string |
| 5 | is_week_end | Whether it is the last trading day of the week | bigint |
| 6 | is_month_end | Whether it is the last trading day of the month | bigint |
| 7 | is_quarter_endtrade | Whether it is the last trading day of the quarter | bigint |
| 8 | is_year_end | Whether it is the last trading day of the year | bigint |
| 9 | trade_day_rank | Global trading day rank | bigint |
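One way the calendar table might be used is to attach its flags to the sales records as covariates. The sketch below assumes that transaction_date in Table 1 and stat_date in Table 3 share the same string format and can serve as a join key. This is an assumption, as are the sample values and the helper name `attach_calendar_flags`.

```python
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, object]


def attach_calendar_flags(
    sales: Iterable[Row],
    calendar: Iterable[Row],
    flags: Sequence[str] = ("is_trade", "is_week_end"),
) -> List[Row]:
    """Left-join calendar flags onto sales rows by date; unknown dates get None."""
    by_date = {row["stat_date"]: row for row in calendar}
    joined = []
    for sale in sales:
        cal = by_date.get(sale["transaction_date"], {})
        joined.append({**sale, **{flag: cal.get(flag) for flag in flags}})
    return joined


# Made-up rows with a subset of the Table 3 and Table 1 columns.
calendar = [
    {"stat_date": "20210107", "is_trade": 1, "is_week_end": 0},
    {"stat_date": "20210108", "is_trade": 1, "is_week_end": 1},
    {"stat_date": "20210109", "is_trade": 0, "is_week_end": 0},
]
sales = [
    {"product_pid": "p001", "transaction_date": "20210107", "net_in_amt": 900.0},
    {"product_pid": "p001", "transaction_date": "20210108", "net_in_amt": -300.0},
    {"product_pid": "p001", "transaction_date": "20210110", "net_in_amt": 50.0},
]

joined = attach_calendar_flags(sales, calendar)
```

The join is a left join on purpose: a sales row whose date is absent from the calendar is kept, with its flags set to None, so that gaps in the calendar are visible instead of silently dropping data.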

Time series visualizations of the Fund dataset (first two rows) and the public datasets are shown below:

[Figure: time series visualization of the Fund dataset and the public datasets]

Reference

If you find our dataset and methodology useful in your work, please cite our paper:

@inproceedings{zhang2025multi,
  title={Multi-period learning for financial time series forecasting},
  author={Zhang, Xu and Huang, Zhengang and Wu, Yunzhi and Lu, Xun and Qi, Erpeng and Chen, Yunkai and Xue, Zhongya and Wang, Qitong and Wang, Peng and Wang, Wei},
  booktitle={Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1},
  pages={2848--2859},
  year={2025}
}