Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting

August 14, 2025

Framework Architecture

[Figure: Time-VLM framework architecture]

📖 Overview

Time-VLM provides an extensible framework for integrating various Vision-Language Models (VLMs) with time series forecasting. It supports multiple VLM types (CLIP, BLIP2, ViLT) and enables flexible multimodal experiments.

🚀 Quick Start

Environment Setup

To set up the environment, install Python 3.8 and PyTorch 1.4.4. The following commands create a conda environment and install the dependencies listed in requirements.txt:

conda create -n Time-VLM python=3.8
conda activate Time-VLM
pip install -r requirements.txt
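After installation, a quick sanity check can confirm that the active interpreter matches the expected Python version and that PyTorch is visible to it. The following is a minimal sketch, not part of the repository; the function name `check_environment` is hypothetical.

```python
import importlib.util
import sys


def check_environment(expected_python=(3, 8)):
    """Return a small report on the interpreter and whether torch is installed."""
    current = tuple(sys.version_info[:2])
    return {
        "python_version": current,
        "python_matches": current == tuple(expected_python),
        # find_spec only locates the package; it does not import it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated Time-VLM environment; a `False` for `torch_installed` suggests the `pip install` step did not complete in that environment.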

Dataset Preparation

Download the pre-processed datasets from:

Place the downloaded data in the ./dataset folder.
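To confirm the data landed in the right place before launching a run, a small check along these lines may help. It is a sketch only: the subfolder names are taken from the project structure listing in this README, the actual download may contain more or differently named folders, and `missing_dataset_dirs` is a hypothetical helper.

```python
from pathlib import Path

# Subfolder names taken from the project structure listing; the actual
# download may contain additional or differently named folders.
EXPECTED_SUBDIRS = ("ETT", "Weather", "Electricity", "Traffic")


def missing_dataset_dirs(root="./dataset", expected=EXPECTED_SUBDIRS):
    """Return the expected subfolders that are absent under `root`."""
    root_path = Path(root)
    return [name for name in expected if not (root_path / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All expected dataset folders are present.")
```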

Running Experiments

Run the following scripts for different forecasting tasks:

# Long-term Forecasting (Full-shot, 100% data)
bash ./scripts/TimeVLM_long_1.0p.sh

# Long-term Forecasting (Few-shot, 10% data)
bash ./scripts/TimeVLM_long_0.1p.sh

# Short-term Forecasting
bash ./scripts/TimeVLM_short.sh

# Zero-shot Transfer Learning
bash ./scripts/TimeVLM_transfer.sh

⚠️ Important Notes:

  • Download the datasets and place them in the ./dataset folder before running any script.
  • The default parameters in the scripts are a reasonable starting point; adjust them to your dataset and requirements.
  • Script Naming Convention: TimeVLM_long_X.Xp.sh where X.Xp indicates the percentage of data used (e.g., 1.0p = 100%, 0.1p = 10%)
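The naming convention above can be decoded programmatically, for example when looping over several data fractions. This is a minimal sketch that assumes only the `TimeVLM_long_X.Xp.sh` pattern described in the note; the helper `data_fraction` is hypothetical and not part of the repository.

```python
import re

# Pattern follows the convention described above: TimeVLM_long_X.Xp.sh
SCRIPT_PATTERN = re.compile(r"^TimeVLM_long_(\d+\.\d+)p\.sh$")


def data_fraction(script_name):
    """Return the data fraction encoded in a long-term script name, or None."""
    match = SCRIPT_PATTERN.match(script_name)
    return float(match.group(1)) if match else None


for name in ("TimeVLM_long_1.0p.sh", "TimeVLM_long_0.1p.sh", "TimeVLM_short.sh"):
    fraction = data_fraction(name)
    label = "n/a" if fraction is None else f"{fraction:.0%}"
    print(f"{name}: {label}")
```

Scripts that do not follow the pattern, such as the short-term and transfer scripts, yield `None` rather than raising an error.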

📁 Project Structure

Time-VLM/
├── README.md                 # Project documentation
├── requirements.txt          # Python dependencies
├── run.py                    # Main entry point for training and testing
├── dataset/                  # Dataset directory
│   ├── ETT/                  # ETT datasets
│   ├── Weather/              # Weather dataset
│   ├── Electricity/          # Electricity dataset
│   ├── Traffic/              # Traffic dataset
│   └── ...
├── scripts/                  # Training and evaluation scripts
│   ├── TimeVLM_long_1.0p.sh  # Long-term forecasting (full-shot, 100% data)
│   ├── TimeVLM_long_0.1p.sh  # Long-term forecasting (few-shot, 10% data)
│   ├── TimeVLM_short.sh      # Short-term forecasting
│   ├── TimeVLM_transfer.sh   # Zero-shot transfer learning
│   └── ...
├── src/                      # Source code
│   ├── TimeVLM/              # Time-VLM model implementation
│   │   ├── model.py          # Main model architecture
│   │   ├── vlm_custom.py     # Custom VLM implementations
│   │   ├── vlm_manager.py    # VLM manager for different types
│   │   └── ...
│   ├── utils/                # Utility functions
│   ├── models/               # Model implementations
│   ├── layers/               # Custom layers
│   └── ...
├── exp/                      # Experiment configurations
├── logs/                     # Training logs
├── ts-images/                # Generated time series images
└── ...

⚙️ Configuration & Tuning

Core Parameters

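| Parameter     | Default | Range       | Description                            |
|---------------|---------|-------------|----------------------------------------|
| d_model       | 128     | 32-512      | Most Important: Model dimension        |
| dropout       | 0.1     | 0.1-0.5     | Dropout rate                           |
| learning_rate | 0.001   | 0.0001-0.01 | Learning rate                          |
| batch_size    | 32      | -           | Adjust based on GPU memory             |
| image_size    | 56      | 28-112      | Time series image size                 |
| periodicity   | 24      | -           | Data periodicity for image generation  |
| norm_const    | 0.4     | 0.1-1.0     | Normalization constant                 |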
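When sweeping these parameters, it can be useful to flag values that fall outside the suggested ranges before starting a long run. The sketch below copies the defaults and ranges from the table above; the names `CORE_PARAMS`, `out_of_range`, and `resolve` are hypothetical and do not correspond to anything in the repository.

```python
# Defaults and suggested ranges copied from the table above.
# None means the table gives no range for that parameter.
CORE_PARAMS = {
    "d_model": (128, (32, 512)),
    "dropout": (0.1, (0.1, 0.5)),
    "learning_rate": (0.001, (0.0001, 0.01)),
    "batch_size": (32, None),
    "image_size": (56, (28, 112)),
    "periodicity": (24, None),
    "norm_const": (0.4, (0.1, 1.0)),
}


def out_of_range(overrides):
    """Return {name: value} for overrides outside the suggested range."""
    flagged = {}
    for name, value in overrides.items():
        if name not in CORE_PARAMS:
            raise KeyError(f"unknown parameter: {name}")
        bounds = CORE_PARAMS[name][1]
        if bounds is not None and not (bounds[0] <= value <= bounds[1]):
            flagged[name] = value
    return flagged


def resolve(overrides):
    """Merge overrides onto the table defaults."""
    config = {name: default for name, (default, _) in CORE_PARAMS.items()}
    config.update(overrides)
    return config
```

For example, `out_of_range({"d_model": 1024})` flags `d_model`, while `batch_size` is never flagged because the table leaves its range open.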

Script Parameters

| Parameter           | Default | Description                            |
|---------------------|---------|----------------------------------------|
| percent             | 1.0     | Data usage ratio                       |
| vlm_type            | clip    | VLM type [clip, blip2, vilt, custom]   |
| image_size          | 56      | Time series image size (28-224)        |
| periodicity         | 24      | Data periodicity for image generation  |
| use_mem_gate        | True    | Memory fusion gate                     |
| finetune_vlm        | False   | Finetune pre-trained VLM               |
| three_channel_image | True    | Generate RGB images                    |
| learnable_image     | True    | Learnable image generation             |
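One way to keep these settings together when generating script variants is a small dataclass that carries the table defaults and rejects unsupported VLM types. This is a sketch under stated assumptions: the class `ScriptParams` is hypothetical, the `--name value` flag style is a guess at how the scripts pass arguments to run.py, and the `(0, 1]` bound on `percent` is inferred from the 1.0 = 100% convention rather than documented.

```python
from dataclasses import asdict, dataclass

# Allowed values copied from the vlm_type row of the table above.
VLM_TYPES = ("clip", "blip2", "vilt", "custom")


@dataclass
class ScriptParams:
    # Defaults copied from the table above.
    percent: float = 1.0
    vlm_type: str = "clip"
    image_size: int = 56
    periodicity: int = 24
    use_mem_gate: bool = True
    finetune_vlm: bool = False
    three_channel_image: bool = True
    learnable_image: bool = True

    def __post_init__(self):
        if self.vlm_type not in VLM_TYPES:
            raise ValueError(f"vlm_type must be one of {VLM_TYPES}")
        # Assumption: percent is a fraction of the training data in (0, 1].
        if not 0.0 < self.percent <= 1.0:
            raise ValueError("percent must be in (0, 1]")

    def to_cli_args(self):
        """Render as `--name value` pairs (hypothetical flag style)."""
        args = []
        for name, value in asdict(self).items():
            args += [f"--{name}", str(value)]
        return args
```

Check the shell scripts in ./scripts for the flags that run.py actually accepts before relying on `to_cli_args`.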

📚 Citation

If you find this repository useful, please cite our paper:

@inproceedings{zhong2025time,
  title={Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting},
  author={Zhong, Siru and Ruan, Weilin and Jin, Ming and Li, Huan and Wen, Qingsong and Liang, Yuxuan},
  booktitle={Proceedings of the 42nd International Conference on Machine Learning},
  year={2025}
}