Readme.md
March 6, 2025
The codebase supports multiple teacher models (such as SimVP with UniFormer, gSTA, and VAN) and a student model (Baseline), with options for advanced distillation techniques.
Prerequisites
- Python 3.10 or higher
- PyTorch (recommended: latest stable version)
- Additional dependencies listed in requirements.txt
Install the dependencies:
pip install -r requirements.txt
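After installing, a quick sanity check of the environment can catch version problems before a long training run. The snippet below is a minimal sketch, not part of the repository: the helper names `python_ok` and `missing_packages` are hypothetical, and only `torch` is checked because it is the only package this README names explicitly.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= tuple(minimum)


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_ok():
        found = sys.version.split()[0]
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {found}")
    # Extend this list with other names from requirements.txt as needed.
    for name in missing_packages(["torch"]):
        print(f"Missing package: {name}")
```

Running the file prints nothing when the interpreter is recent enough and PyTorch is importable.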
Directory Structure
- train_base_model.py: Script for training the teacher and student models.
- train_student_multiT_KD.py: Script for distilling the student model using multiple teacher models.
- network/: Directory containing the networks used in this work.
Experimental Steps
1. Train the Base Models
Train the teacher models (SimVP-UniFormer, SimVP-VAN) and the student model (Baseline) using the following script:
python train_base_model.py
- This script trains each model independently.
- Checkpoints are saved in the outputs/ directory.
2. Configure and Run Knowledge Distillation
Modify the train_student_multiT_KD.py script to point to the best teacher model weights:
- Open train_student_multiT_KD.py.
- Update the paths to the pretrained teacher model weights (e.g., simvp_uniformer/best.pth, simvp_van/best.pth).
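Before starting distillation, it can help to confirm that every teacher checkpoint actually exists. The following is a minimal sketch under stated assumptions: the names `OUTPUT_DIR`, `TEACHER_CHECKPOINTS`, and `resolve_checkpoints` are hypothetical and do not come from the script, and the example paths are assumed to be relative to outputs/.

```python
from pathlib import Path

# Assumed layout: the example checkpoint paths live under outputs/.
# The variable names used in the real script may differ.
OUTPUT_DIR = Path("outputs")
TEACHER_CHECKPOINTS = {
    "simvp_uniformer": "simvp_uniformer/best.pth",
    "simvp_van": "simvp_van/best.pth",
}


def resolve_checkpoints(checkpoints, root=OUTPUT_DIR):
    """Map each teacher name to its full path, raising if any file is missing."""
    resolved = {name: Path(root) / rel for name, rel in checkpoints.items()}
    missing = [str(path) for path in resolved.values() if not path.is_file()]
    if missing:
        raise FileNotFoundError("Missing teacher checkpoints: " + ", ".join(missing))
    return resolved
```

Calling `resolve_checkpoints(TEACHER_CHECKPOINTS)` fails early with a list of the missing files, rather than partway through model loading.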
Run the distillation process:
python train_student_multiT_KD.py
- This script distills knowledge from the teacher models into the student model.
- Output checkpoints and logs are saved in outputs/.
3. Distillation Methods
The train_student_multiT_KD.py script supports multiple distillation strategies. The current default configuration is:
- Multi-Teacher AEKD (Adaptive Ensemble Knowledge Distillation) combined with ABLoss.
- You can modify the script to experiment with other distillation methods (see code comments for details).
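To show where per-teacher weights enter a multi-teacher objective, here is a deliberately simplified sketch. It does not reproduce AEKD's adaptive weighting or ABLoss; see the script for those. The weighting rule (a softmax over negative teacher errors), the function names, and the `alpha` and `temperature` parameters are all illustrative assumptions.

```python
import math


def teacher_weights(teacher_errors, temperature=1.0):
    """Softmax over negative errors: lower-error teachers get larger weights."""
    scaled = [-err / temperature for err in teacher_errors]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def combined_loss(task_loss, distill_losses, weights, alpha=0.5):
    """Blend the ground-truth loss with a weighted sum of per-teacher losses."""
    distill = sum(w * d for w, d in zip(weights, distill_losses))
    return (1.0 - alpha) * task_loss + alpha * distill
```

In a training loop, `teacher_errors` would be each teacher's error against the ground truth for the current batch, and `distill_losses` the student-to-teacher losses. Swapping in a different distillation method amounts to changing how the weights and the per-teacher losses are computed.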