CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data Only

March 9, 2026

🚨 NEW: The 2-million-sample CauKer dataset is now available on Hugging Face! 🚨

CauKer is a new synthetic data generation framework for pretraining classification Time Series Foundation Models (TSFMs) without using any real data.

🚀 Oral Presentation and Paper @ ICLR 2026
📅 April 23-27, 2026 | 📍 Rio de Janeiro

🚀 Best Time Series Paper and Spotlight Presentation @ ICML 2025 FMSD Workshop
📅 July 18, 2025 | 📍 Vancouver

Overview

Recent work on TSFMs has focused on pretraining with large-scale real-world corpora, which are expensive and hard to collect. CauKer tackles this challenge by generating synthetic time series data that are both:

  • Temporally realistic (trends, seasonality, non-stationarity)
  • Causally structured (DAG-based propagation of Gaussian Process root nodes)
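To make the "temporally realistic" property concrete, here is a minimal sketch of how composing Gaussian Process kernels yields trend, seasonality, and local variation in a single draw, which is the kind of series that can serve as a GP root node. It assumes NumPy; the base kernels (`rbf`, `periodic`, `linear`), the mean functions, the hyperparameter values, and all helper names are illustrative choices, not CauKer's actual kernel bank or API.

```python
import numpy as np

# HYPOTHETICAL base-kernel bank: an illustrative subset, not CauKer's actual library.
def rbf(t, lengthscale=0.1):
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)             # smooth local variation

def periodic(t, period=0.2, lengthscale=1.0):
    d = np.abs(t[:, None] - t[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)  # seasonality

def linear(t, offset=0.0):
    return np.outer(t - offset, t - offset)                  # trend

BASE_KERNELS = [rbf, periodic, linear]

# HYPOTHETICAL mean functions adding deterministic structure on top of the GP draw.
MEAN_FUNCTIONS = [
    lambda t: np.zeros_like(t),
    lambda t: 2.0 * t - 1.0,       # linear drift
    lambda t: np.exp(t) - 1.0,     # exponential growth
]

def random_composite_kernel(t, rng, n_terms=3):
    """Fold randomly chosen base kernels together with random + or * operations."""
    K = BASE_KERNELS[rng.integers(len(BASE_KERNELS))](t)
    for _ in range(n_terms - 1):
        K_new = BASE_KERNELS[rng.integers(len(BASE_KERNELS))](t)
        K = K + K_new if rng.random() < 0.5 else K * K_new   # elementwise product
    return K

def sample_gp(t, rng, jitter=1e-6):
    """Draw one series from GP(m, K) with a random mean m and a random composite kernel K."""
    K = random_composite_kernel(t, rng)
    m = MEAN_FUNCTIONS[rng.integers(len(MEAN_FUNCTIONS))](t)
    L = np.linalg.cholesky(K + jitter * np.eye(len(t)))      # jitter guards against round-off
    return m + L @ rng.standard_normal(len(t))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 512)
series = sample_gp(t, rng)   # one synthetic series of length 512
```

Because sums and elementwise (Hadamard) products of positive semidefinite matrices remain positive semidefinite, any random fold of base kernels is still a valid covariance; the small jitter only keeps the Cholesky factorization numerically stable.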

Our method combines:

  • Gaussian Process Kernel Composition
  • Structural Causal Graphs
  • Rich activation and mean function libraries
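The structural-causal-graph and activation components can be sketched together: sample a random DAG, fill its root nodes with GP draws, and compute every other node as a randomly chosen activation of a weighted sum of its parents. This is a minimal NumPy sketch under stated assumptions; the edge probability, the activation set, the plain RBF root kernel (standing in for a composite kernel), the noise level, and exposing a random subset of nodes as channels are all hypothetical choices, not taken from CauKer's code.

```python
import numpy as np

# HYPOTHETICAL activation library; CauKer's actual set may differ.
ACTIVATIONS = [np.tanh, np.sin, lambda x: np.maximum(x, 0.0), lambda x: x]

def gp_root(length, rng, lengthscale=0.05, jitter=1e-6):
    """Root node: one draw from a zero-mean GP with an RBF kernel (stand-in for a composite kernel)."""
    t = np.linspace(0.0, 1.0, length)
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / lengthscale) ** 2)
    return np.linalg.cholesky(K + jitter * np.eye(length)) @ rng.standard_normal(length)

def random_dag(n_nodes, rng, edge_prob=0.4):
    """Strictly upper-triangular adjacency: edge i -> j only when i < j, so the graph is
    acyclic and 0..n_nodes-1 is already a topological order."""
    return np.triu(rng.random((n_nodes, n_nodes)) < edge_prob, k=1)

def propagate(n_nodes, length, rng, noise=0.01):
    """Fill nodes in topological order: roots are GP draws; every other node is a random
    activation of a weighted sum of its parents plus a bias, with a little additive noise."""
    A = random_dag(n_nodes, rng)
    X = np.zeros((n_nodes, length))
    for j in range(n_nodes):
        parents = np.flatnonzero(A[:, j])          # column j lists the parents of node j
        if parents.size == 0:
            X[j] = gp_root(length, rng)
        else:
            w = rng.normal(size=parents.size)
            act = ACTIVATIONS[rng.integers(len(ACTIVATIONS))]
            X[j] = act(w @ X[parents] + rng.normal()) + noise * rng.standard_normal(length)
    return A, X

rng = np.random.default_rng(0)
A, X = propagate(n_nodes=8, length=512, rng=rng)
# HYPOTHETICAL: expose 4 randomly chosen nodes as a 4-channel series.
channels = X[rng.choice(8, size=4, replace=False)]
```

Building the adjacency matrix as strictly upper-triangular is a simple way to guarantee acyclicity without a separate topological sort.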

With CauKer, we show that state-of-the-art classification TSFMs such as Mantis and MOMENT can be pretrained entirely on synthetic data and still match or outperform real-data baselines on zero-shot classification tasks.

A more efficient version and training code will be released soon.


Quick Start

You can try CauKer directly in your browser via Google Colab:

👉 Open Tutorial in Colab

# 1. Clone the repo
git clone https://github.com/ShifengXIE/CauKer.git
cd CauKer

# 2. Install dependencies


# 3. Generate 200,000 synthetic time series (default: 512-length, 4-dimensional)
python CauKer.py -N 200000 -L 512 -F 4 -P 6 -M 18 -O CauKer200K.arrow

Example Use Case

Once generated, the synthetic .arrow dataset can be used to pretrain your own TSFM (e.g., Mantis or MOMENT), which can then be evaluated in a zero-shot setting on the UCR and UEA benchmarks.
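The zero-shot evaluation step can be sketched as follows, assuming the common setup in which the pretrained encoder stays frozen and only a lightweight classifier is fit on its embeddings of each benchmark's training split. `embed` is a hypothetical stand-in for a real Mantis or MOMENT encoder (here it only computes three summary statistics per univariate series), and a nearest-centroid classifier is swapped in for whatever head is used in practice; neither is CauKer's evaluation code.

```python
import numpy as np

def embed(X):
    """HYPOTHETICAL stand-in for a frozen pretrained encoder (e.g., Mantis or MOMENT).
    Maps each univariate series (one row of X) to three summary statistics."""
    return np.stack([X.mean(axis=1),
                     X.std(axis=1),
                     np.abs(np.diff(X, axis=1)).mean(axis=1)], axis=1)

def zero_shot_accuracy(X_train, y_train, X_test, y_test):
    """Keep the encoder frozen, fit only a nearest-centroid head on training embeddings,
    and return test accuracy. No encoder weights are updated."""
    Z_tr, Z_te = embed(X_train), embed(X_test)
    mu, sd = Z_tr.mean(axis=0), Z_tr.std(axis=0) + 1e-8    # standardize with train stats only
    Z_tr, Z_te = (Z_tr - mu) / sd, (Z_te - mu) / sd
    classes = np.unique(y_train)
    centroids = np.stack([Z_tr[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    y_pred = classes[dists.argmin(axis=1)]
    return float((y_pred == y_test).mean())
```

With a real model, `embed` would run the pretrained encoder on each UCR/UEA split; the rest of the loop stays the same.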


Citation

@inproceedings{cauker2025,
  title={CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data Only},
  author={Shifeng Xie and Vasilii Feofanov and Marius Alonso and Ambroise Odonnat and Jianfeng Zhang and Ievgen Redko},
  booktitle={ICML Workshop on Foundation Models for Structured Data (FMSD)},
  year={2025}
}

โค๏ธ Acknowledgements

This work is the result of a great collaboration. Thanks to all my amazing co-authors, Vasilii Feofanov, Marius Alonso, Ambroise Odonnat, Jianfeng Zhang and Ievgen Redko, for their guidance and support throughout this project. Thanks also to MOAKHER Yessin for proposing ideas to make the code faster and safer.

If you have questions, collaboration ideas, or just want to discuss TSFMs, feel free to reach out:

📬 shifeng.xie@telecom-paris.fr, ievgen.redko@huawei.com