Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting
August 14, 2025

Time-VLM Framework Architecture
Overview
Time-VLM provides an extensible framework for integrating various Vision-Language Models (VLMs) with time series forecasting. It supports multiple VLM types (CLIP, BLIP2, ViLT) and enables flexible multimodal experiments.
Quick Start
Environment Setup
To set up the environment, install Python 3.8 with PyTorch 1.4.4. The following commands create a conda environment and install the dependencies:
conda create -n Time-VLM python=3.8
conda activate Time-VLM
pip install -r requirements.txt
Dataset Preparation
Download the pre-processed datasets from:
- Google Drive: Download Link
- Baidu Drive: Download Link
Place the downloaded data in the ./dataset folder.
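Before launching a run, it can help to confirm that the data landed where the scripts expect it. The following is a minimal sketch, not part of the repository: the helper name `missing_dataset_dirs` is hypothetical, and the subfolder names are taken from the project structure listing in this README, which may not be exhaustive.

```python
from pathlib import Path

# Subfolder names taken from the project structure listing in this README.
# The real download may contain more folders (the listing ends with "...").
EXPECTED_SUBDIRS = ("ETT", "Weather", "Electricity", "Traffic")


def missing_dataset_dirs(root="./dataset", expected=EXPECTED_SUBDIRS):
    """Return the expected subfolders that are absent under `root`."""
    root_path = Path(root)
    return [name for name in expected if not (root_path / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All expected dataset folders found.")
```

An empty list means every folder named in the listing is present under `./dataset`.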
Running Experiments
Run the following scripts for different forecasting tasks:
# Long-term Forecasting (Full-shot, 100% data)
bash ./scripts/TimeVLM_long_1.0p.sh
# Long-term Forecasting (Few-shot, 10% data)
bash ./scripts/TimeVLM_long_0.1p.sh
# Short-term Forecasting
bash ./scripts/TimeVLM_short.sh
# Zero-shot Transfer Learning
bash ./scripts/TimeVLM_transfer.sh
Important Notes:
- Ensure you have downloaded the datasets and placed them in the correct directory
- The default parameters in the scripts are a reasonable starting point, but you may need to adjust them for your specific dataset and requirements
- Script Naming Convention:
TimeVLM_long_X.Xp.sh, where X.Xp indicates the fraction of data used (e.g., 1.0p = 100%, 0.1p = 10%)
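The naming convention can be read mechanically. The sketch below is illustrative only: the function `data_ratio_from_script` is a hypothetical helper, not something shipped in `scripts/`, and it assumes that only the long-term scripts carry the suffix, as in the examples above.

```python
import re

# Matches names such as "TimeVLM_long_1.0p.sh". Only the long-term scripts
# listed in this README carry a data-fraction suffix.
_SCRIPT_RE = re.compile(r"^TimeVLM_long_(\d+(?:\.\d+)?)p\.sh$")


def data_ratio_from_script(name):
    """Return the data ratio encoded in a script name, or None if absent."""
    match = _SCRIPT_RE.match(name)
    if match is None:
        return None
    return float(match.group(1))


if __name__ == "__main__":
    for script in ("TimeVLM_long_1.0p.sh", "TimeVLM_long_0.1p.sh", "TimeVLM_short.sh"):
        print(script, "->", data_ratio_from_script(script))
```

The returned ratio uses the same scale as the `percent` script parameter (1.0 means the full training set).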
Project Structure
Time-VLM/
├── README.md # Project documentation
├── requirements.txt # Python dependencies
├── run.py # Main entry point for training and testing
├── dataset/ # Dataset directory
│   ├── ETT/ # ETT datasets
│   ├── Weather/ # Weather dataset
│   ├── Electricity/ # Electricity dataset
│   ├── Traffic/ # Traffic dataset
│   └── ...
├── scripts/ # Training and evaluation scripts
│   ├── TimeVLM_long_1.0p.sh # Long-term forecasting (full-shot, 100% data)
│   ├── TimeVLM_long_0.1p.sh # Long-term forecasting (few-shot, 10% data)
│   ├── TimeVLM_short.sh # Short-term forecasting
│   ├── TimeVLM_transfer.sh # Zero-shot transfer learning
│   └── ...
├── src/ # Source code
│   ├── TimeVLM/ # Time-VLM model implementation
│   │   ├── model.py # Main model architecture
│   │   ├── vlm_custom.py # Custom VLM implementations
│   │   ├── vlm_manager.py # VLM manager for different types
│   │   └── ...
│   ├── utils/ # Utility functions
│   ├── models/ # Model implementations
│   ├── layers/ # Custom layers
│   └── ...
├── exp/ # Experiment configurations
├── logs/ # Training logs
├── ts-images/ # Generated time series images
└── ...
Configuration & Tuning
Core Parameters
| Parameter | Default | Range | Description |
|---|---|---|---|
| d_model | 128 | 32-512 | Most important: model dimension |
| dropout | 0.1 | 0.1-0.5 | Dropout rate |
| learning_rate | 0.001 | 0.0001-0.01 | Learning rate |
| batch_size | 32 | - | Adjust based on GPU memory |
| image_size | 56 | 28-112 | Time series image size |
| periodicity | 24 | - | Data periodicity for image generation |
| norm_const | 0.4 | 0.1-1.0 | Normalization constant |
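When sweeping these parameters, a quick check against the table can catch values that fall outside the listed ranges. The sketch below copies the defaults and ranges from the table above. The dictionary `CORE_PARAMS` and the function `out_of_range` are hypothetical names, and the ranges are treated as tuning guidance, on the assumption that the code itself does not enforce them.

```python
# Defaults and tuning ranges copied from the Core Parameters table.
# A range of None means the table lists no range for that parameter.
CORE_PARAMS = {
    "d_model": (128, (32, 512)),
    "dropout": (0.1, (0.1, 0.5)),
    "learning_rate": (0.001, (0.0001, 0.01)),
    "batch_size": (32, None),
    "image_size": (56, (28, 112)),
    "periodicity": (24, None),
    "norm_const": (0.4, (0.1, 1.0)),
}


def out_of_range(overrides):
    """Return {name: value} for overrides outside the documented range."""
    problems = {}
    for name, value in overrides.items():
        if name not in CORE_PARAMS:
            raise KeyError(f"unknown parameter: {name}")
        _, bounds = CORE_PARAMS[name]
        if bounds is not None and not (bounds[0] <= value <= bounds[1]):
            problems[name] = value
    return problems


if __name__ == "__main__":
    # d_model=1024 exceeds the listed 32-512 range; dropout=0.2 is within range.
    print(out_of_range({"d_model": 1024, "dropout": 0.2}))
```

Parameters without a listed range, such as `batch_size`, are never flagged.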
Script Parameters
| Parameter | Default | Description |
|---|---|---|
| percent | 1.0 | Data usage ratio |
| vlm_type | clip | VLM type [clip, blip2, vilt, custom] |
| image_size | 56 | Time series image size (28-224) |
| periodicity | 24 | Data periodicity for image generation |
| use_mem_gate | True | Memory fusion gate |
| finetune_vlm | False | Finetune pre-trained VLM |
| three_channel_image | True | Generate RGB images |
| learnable_image | True | Learnable image generation |
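For experiments driven from Python rather than from the shell scripts, the script parameters can be grouped into one validated object. This is a sketch under assumptions: `ScriptParams` is a hypothetical container, not a class from the repository, and how `run.py` actually receives these values is not shown here. The defaults and the `vlm_type` choices come from the table above. The `(0, 1]` bound on `percent` is an assumption based on its description as a ratio.

```python
from dataclasses import dataclass

# Choices listed for vlm_type in the Script Parameters table.
VLM_TYPES = ("clip", "blip2", "vilt", "custom")


@dataclass
class ScriptParams:
    """Hypothetical container mirroring the Script Parameters table."""

    percent: float = 1.0
    vlm_type: str = "clip"
    image_size: int = 56
    periodicity: int = 24
    use_mem_gate: bool = True
    finetune_vlm: bool = False
    three_channel_image: bool = True
    learnable_image: bool = True

    def __post_init__(self):
        if self.vlm_type not in VLM_TYPES:
            raise ValueError(f"vlm_type must be one of {VLM_TYPES}")
        # Assumption: percent is a ratio, so it should lie in (0, 1].
        if not 0.0 < self.percent <= 1.0:
            raise ValueError("percent should lie in (0, 1]")
        # Range taken from the image_size row of the Script Parameters table.
        if not 28 <= self.image_size <= 224:
            raise ValueError("image_size should lie in 28-224")


if __name__ == "__main__":
    few_shot = ScriptParams(percent=0.1, vlm_type="blip2")
    print(few_shot)
```

Validating at construction time surfaces a mistyped `vlm_type` before any model weights are loaded.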
Citation
If you find this repository useful, please cite our paper:
@inproceedings{zhong2025time,
title={Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting},
author={Zhong, Siru and Ruan, Weilin and Jin, Ming and Li, Huan and Wen, Qingsong and Liang, Yuxuan},
booktitle={Proceedings of the 42nd International Conference on Machine Learning},
year={2025}
}