[TMLR 2024] NuTime
December 23, 2024
NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time-Series Pretraining
Chenguo Lin, Xumeng Wen, Wei Cao, Congrui Huang, Jiang Bian, Stephen Lin, Zhirong Wu
This repository contains the official implementation of the paper NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time-Series Pretraining, which was accepted to TMLR 2024. In this work, we propose the NuTime model for large-scale time-series pretraining. The model is based on the Transformer architecture and takes as input a set of tokens from non-overlapping windows. Each window is represented by its normalized shape, its mean, and its standard deviation. We develop a numerically multi-scaled embedding (NME) method for representing the scalar values of the mean and std. The model can take raw time-series values at any numerical scale as input, without any data normalization or transformation.
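The window representation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's `WindowNormEncoder`: the function name `window_decompose`, the `eps` term, and the choice to drop trailing samples that do not fill a window are all assumptions made for this example. It shows that the same waveform at very different numerical scales yields the same normalized shape, while the scale itself is carried by the per-window mean and std.

```python
import numpy as np


def window_decompose(series, window_size, eps=1e-12):
    """Split a 1-D series into non-overlapping windows and return
    (normalized_shape, mean, std) per window.

    Hypothetical helper for illustration only; not the repository's API.
    """
    series = np.asarray(series, dtype=np.float64)
    num_windows = len(series) // window_size
    # Assumption: trailing samples that do not fill a window are dropped.
    windows = series[: num_windows * window_size].reshape(num_windows, window_size)
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    # eps only guards against division by zero for constant windows.
    shape = (windows - mean) / (std + eps)
    return shape, mean.squeeze(1), std.squeeze(1)


# The same waveform at two very different numerical scales.
t = np.linspace(0.0, 4.0 * np.pi, 64)
small = 1e-3 * np.sin(t)
large = 1e3 * np.sin(t) + 5e4

shape_s, mean_s, std_s = window_decompose(small, window_size=16)
shape_l, mean_l, std_l = window_decompose(large, window_size=16)

print(shape_s.shape)                   # (4, 16)
print(np.allclose(shape_s, shape_l))   # True: the shapes match across scales
print(mean_s[0], mean_l[0])            # the means differ by orders of magnitude
```

In this sketch the normalized shape is scale-free, so only the two scalars per window (mean and std) span a wide numerical range. Those are the values the NME method is designed to embed.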
Feel free to contact me (chenguolin@stu.pku.edu.cn) or open an issue if you have any questions or suggestions.
News
- 2024-12-23: Check the latest repository under the Microsoft account: microsoft/NuTime.
- 2024-11-12: Checkpoint of the self-supervised pretrained NuTime is released.
- 2024-11-12: Code for data preprocessing, training, and evaluation is released.
- 2024-07-15: Cleaning the entire codebase for release may take some time, so we first provide the code for the window, mean, and std embeddings, which is the essential part of the proposed NuTime.
- 2024-07-10: NuTime is accepted to TMLR 2024.
TODO
- Release the training and evaluation code
- Release the self-supervised pretrained NuTime
Installation
Please install PyTorch according to your CUDA version first. There are no restrictions on the torch version; feel free to use your preferred one.
git clone https://github.com/chenguolin/NuTime.git
cd NuTime
bash settings/setup.sh
Dataset
Please refer to src/data/preprocess.py.
We provide a script to preprocess datasets including UCR, UEA, SleepEDF, and Epilepsy, among others.
The processed and split Epilepsy dataset is provided in datasets/Epilepsy as an example.
Usage
- The core part of our work is WindowNormEncoder in src/models/encoders/WindowNormEncoder.py and WinT in src/models/networks.py. You can view the code directly for implementation details. The remaining code is only for data preprocessing, training, evaluation, and ablation studies, and can largely be ignored.
- A checkpoint of the self-supervised (i.e., BYOL-style) pretrained NuTime (with 9 multi-scaled embeddings) is provided in ckpt/checkpoint_bias9.pth.
Fine-tune the pretrained NuTime on the Epilepsy dataset
python3 src/pipeline.py --config_file configs/demo_ft_epilepsy.json
Citation
If you find our work helpful, please consider citing:
@article{lin2024nutime,
title={NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time-Series Pretraining},
author={Chenguo Lin and Xumeng Wen and Wei Cao and Congrui Huang and Jiang Bian and Stephen Lin and Zhirong Wu},
journal={Transactions on Machine Learning Research (TMLR)},
year={2024}
}