Readme.md

March 6, 2025

The codebase supports multiple teacher models (such as SimVP with UniFormer, gSTA, and VAN) and a student model (Baseline), with options for advanced distillation techniques.

Prerequisites

Python 3.10 or higher
PyTorch (recommended: latest stable version)
Additional dependencies listed in requirements.txt

Install the dependencies:

pip install -r requirements.txt

Directory Structure

train_base_model.py: Script for training teacher and student models.
train_student_multiT_KD.py: Script for distilling the student model using multiple teacher models.
network/: Directory containing the network definitions used in this work.

Experimental Steps

1. Train the Base Models

Train the teacher models (SimVP-UniFormer, SimVP-VAN) and the student model (Baseline) using the following script:

python train_base_model.py

This script trains each model independently.
Checkpoints will be saved in the outputs/ directory.

2. Configure and Run Knowledge Distillation

Modify the train_student_multiT_KD.py script to point to the best teacher model weights:

Open train_student_multiT_KD.py.
Update the paths to the pretrained teacher model weights (e.g., simvp_uniformer/best.pth, simvp_van/best.pth).
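Before launching a long distillation run, it can help to confirm that the configured teacher checkpoints actually exist. The following is a minimal sketch, not part of the repository: the `TEACHER_CHECKPOINTS` mapping, the `find_missing_checkpoints` helper, and the `outputs/` prefix on the paths are assumptions for illustration, and the variable names used in train_student_multiT_KD.py may differ.

```python
from pathlib import Path

# Hypothetical mapping of teacher names to checkpoint paths. The real variable
# names and directory layout in train_student_multiT_KD.py may differ.
TEACHER_CHECKPOINTS = {
    "simvp_uniformer": Path("outputs/simvp_uniformer/best.pth"),
    "simvp_van": Path("outputs/simvp_van/best.pth"),
}


def find_missing_checkpoints(checkpoints):
    """Return the names of teachers whose checkpoint file does not exist."""
    return [
        name
        for name, path in checkpoints.items()
        if not Path(path).is_file()
    ]


# Report any teacher whose weights cannot be found on disk.
missing = find_missing_checkpoints(TEACHER_CHECKPOINTS)
print("Missing teacher checkpoints:", ", ".join(missing) if missing else "none")
```

If any name is reported as missing, rerun step 1 for that model or correct the path before starting distillation.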

Run the distillation process:

python train_student_multiT_KD.py

This script distills knowledge from the teacher models into the student model.
Output checkpoints and logs will be saved in outputs/.

3. Distillation Methods

The train_student_multiT_KD.py script supports multiple distillation strategies. The current default configuration is:

Multi-Teacher AEKD (Adaptive Ensemble Knowledge Distillation) combined with ABLoss.
You can modify the script to experiment with other distillation methods (see code comments for details).
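To illustrate the general shape of a multi-teacher distillation objective, here is a simplified sketch. It is not the AEKD or ABLoss implementation used in the script: it assumes regression-style outputs compared with mean squared error and uses fixed teacher weights, whereas AEKD chooses the weights adaptively. The function name `multi_teacher_distill_loss` and the `alpha` parameter are hypothetical.

```python
import torch
import torch.nn.functional as F


def multi_teacher_distill_loss(student_out, teacher_outs, target,
                               weights=None, alpha=0.5):
    """Blend the task loss with a weighted sum of per-teacher imitation losses.

    Simplified stand-in: fixed weights and MSE, not AEKD's adaptive weighting.
    """
    if weights is None:
        # Default to uniform weights over the teachers.
        weights = [1.0 / len(teacher_outs)] * len(teacher_outs)

    # Supervised loss against the ground truth.
    task_loss = F.mse_loss(student_out, target)

    # Imitation loss against each teacher; teachers are detached so no
    # gradient flows back into them.
    kd_loss = sum(
        w * F.mse_loss(student_out, t.detach())
        for w, t in zip(weights, teacher_outs)
    )
    return (1.0 - alpha) * task_loss + alpha * kd_loss


# Toy example with constant tensors standing in for model outputs.
student = torch.zeros(2, 3, requires_grad=True)
teachers = [torch.ones(2, 3), 2.0 * torch.ones(2, 3)]
target = torch.zeros(2, 3)
loss = multi_teacher_distill_loss(student, teachers, target)
loss.backward()
```

Swapping in another strategy then amounts to changing how `weights` is computed or which per-teacher loss is applied.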