Readme.md

March 6, 2025

The codebase supports multiple teacher models (such as SimVP with UniFormer, gSTA, and VAN) and a student model (Baseline), with options for advanced distillation techniques.

Prerequisites

  • Python 3.10 or higher
  • PyTorch (recommended: latest stable version)
  • Additional dependencies listed in requirements.txt

Install the dependencies:

pip install -r requirements.txt
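After installing, a quick sanity check can confirm that the interpreter meets the prerequisites listed above. The snippet below is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical and not part of this repository.

```python
import importlib.util
import sys

# Minimum Python version taken from the Prerequisites section.
MIN_PYTHON = (3, 10)


def check_environment(min_python=MIN_PYTHON):
    """Report whether Python is new enough and whether PyTorch is importable."""
    return {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # find_spec returns None when the package is not installed.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


print(check_environment())
```

If either value is False, revisit the installation step before training.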

Directory Structure

  • train_base_model.py: Script for training teacher and student models.
  • train_student_multiT_KD.py: Script for distilling the student model using multiple teacher models.
  • network/: Directory containing the network definitions used in this work.

Experimental Steps

1. Train the Base Models

Train the teacher models (SimVP-UniFormer, SimVP-VAN) and the student model (Baseline) using the following script:

python train_base_model.py
  • This script trains each model independently.
  • Checkpoints will be saved in the outputs/ directory.

2. Configure and Run Knowledge Distillation

Modify the train_student_multiT_KD.py script to point to the best teacher model weights:

  1. Open train_student_multiT_KD.py.
  2. Update the paths to the pretrained teacher model weights (e.g., simvp_uniformer/best.pth, simvp_van/best.pth).
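Before launching distillation, it can help to verify that the teacher checkpoints actually exist. The following is a minimal sketch, not code from this repository: the `TEACHER_CHECKPOINTS` mapping, the `find_missing_checkpoints` helper, and the assumption that the example paths live under outputs/ are all illustrative, and the real variable names in train_student_multiT_KD.py may differ.

```python
from pathlib import Path

# Hypothetical mapping of teacher names to checkpoint files.
# Assumes the example paths above sit under outputs/; adjust to your layout.
TEACHER_CHECKPOINTS = {
    "simvp_uniformer": Path("outputs/simvp_uniformer/best.pth"),
    "simvp_van": Path("outputs/simvp_van/best.pth"),
}


def find_missing_checkpoints(checkpoints):
    """Return the names of teachers whose checkpoint file does not exist."""
    return [name for name, path in checkpoints.items() if not Path(path).is_file()]


missing = find_missing_checkpoints(TEACHER_CHECKPOINTS)
print("Missing teacher checkpoints:", missing or "none")
```

An empty result means every listed teacher has a checkpoint file at the configured path.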

Run the distillation process:

python train_student_multiT_KD.py
  • This script distills knowledge from the teacher models into the student model.
  • Output checkpoints and logs will be saved in outputs/.

3. Distillation Methods

The train_student_multiT_KD.py script supports multiple distillation strategies. The current default configuration is:

  • Multi-Teacher AEKD (Adaptive Ensemble Knowledge Distillation) combined with ABLoss.
  • You can modify the script to experiment with other distillation methods (see code comments for details).
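To illustrate the general idea of multi-teacher distillation, the sketch below combines a supervised loss with a weighted sum of per-teacher imitation losses. It is a simplified stand-in, not the AEKD or ABLoss implementation used in the script: the teacher weights here are fixed (uniform by default) rather than computed adaptively, MSE is assumed as the loss because the models are predictive, and the function name and `alpha` parameter are hypothetical.

```python
import torch
import torch.nn.functional as F


def multi_teacher_kd_loss(student_pred, teacher_preds, target, weights=None, alpha=0.5):
    """Blend the task loss with a weighted sum of per-teacher imitation losses.

    weights: one fixed weight per teacher (uniform if None). An adaptive
    method would compute these per batch instead.
    alpha: share of the total loss given to distillation (assumed parameter).
    """
    if weights is None:
        weights = [1.0 / len(teacher_preds)] * len(teacher_preds)
    if len(weights) != len(teacher_preds):
        raise ValueError("weights and teacher_preds must have the same length")

    # Supervised loss against the ground truth.
    task_loss = F.mse_loss(student_pred, target)
    # Imitation loss; detach so no gradient flows into the teachers.
    distill_loss = sum(
        w * F.mse_loss(student_pred, t.detach())
        for w, t in zip(weights, teacher_preds)
    )
    return (1.0 - alpha) * task_loss + alpha * distill_loss


# Toy usage with random tensors shaped (batch, frames, channels, height, width).
student_out = torch.randn(2, 4, 1, 8, 8, requires_grad=True)
teacher_outs = [torch.randn(2, 4, 1, 8, 8) for _ in range(2)]
ground_truth = torch.randn(2, 4, 1, 8, 8)
loss = multi_teacher_kd_loss(student_out, teacher_outs, ground_truth)
loss.backward()
```

Swapping in a different strategy would mainly change how `weights` and the per-teacher loss term are computed.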