[MICCAI 2025] Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
September 17, 2025 · View on GitHub
[MICCAI 2025] Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Soham Walimbe, Britty Baby Vinkle Srivastav, Nicolas Padoy, MICCAI 2025
MML-SurgAdapt framework
MML - SurgAdapt
This repository contains the codebase for MML - SurgAdapt, an adaptation of the CLIP for surgery. The project is designed for multi-task surgical computer vision and supports easy setup, training, and inference.
Table of Contents
Environment Setup
Follow these steps to set up the environment:
- Clone the Repository
git clone https://github.com/CAMMA-public/MMA-SurgAdapt.git cd MMA-SurgAdapt - Create a Python Virtual Environment
conda create -n env python=3.12 conda activate env - Install Dependencies
conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia pip install -r requirements.txt
Data Setup
Set up your data in the cholec directory as follows:
cholec/
├── data/
│ ├── cholec80/ # Phase recognition
│ ├── endoscapes/ # CVS assessment
│ ├── cholect50/ # Triplet recognition
│ ├── triplet_data/ # Optional: For model initialization with LatentGraph pseudolabels
│ └── triplet_val_data/ # Optional: For model initialization with LatentGraph pseudolabels
├── cholec_labels_index.npy
├── cholec_labels.txt
├── cholec_super_labels.txt
└── word2vec_similarity_matrix.npy
Set up the configs for training and testing in the configs/surgadapt+cholec.yaml:
Batch size, lr, epochs, dir, loss, backbone, seed, flags for SP validation, Pseudolabel initialization, Label file, init/getitem, partial positive setup.
For evaluation, specify checkpoint, dir, and loss.
Running Training
For training the model, use the config file given for each experiment to set the configuration for training and run, for example:
python train.py -c configs/surgadapt+cholec_pp_hill.yaml
Running Inference
For testing the model, use the config file given for each experiment to set the configuration (change the directory for saving results) for testing and run, for example:
python test.py -c configs/surgadapt+cholec_pp_hill.yaml
Pretrained Weights
Model weights have been saved as follows:
MMLSurgAdapt_checkpoints/
├── Baselines/ # One ckpt file each
│ ├── R50/
│ ├── CLIP-VitL/
│ ├── DualCoop/
│ ├── VLPL/
│ ├── HSPNet/
│ ├── Multi-task/
│ └── Task-specific/
│ ├── R50/ # 1 ckpt per dataset
│ └── CLIP/ # 1 ckpt per dataset
├── Loss_experiments/ # All loss functions here, one ckpt each
├── SP Hill/ # Single positive, 5 ckpts
├── SP WAN/ # 5 ckpts
├── SP SPLC/ # 5 ckpts
├── PP Hill/ # Partial positive, 5 ckpts
├── PP WAN/ # 5 ckpts
└── PP SPLC/ # 5 ckpts
Run Baselines
For DualCoOp, use the README file to set up the environment, set the data folder as given above (not in cholec/).
cd baselines/Dualcoop/
python train.py
For Task-specific baselines, use config files for the experiments after setting up the data, as above (in cholec/).
cd baselines/TS+multitask/
python train.py -c configs/r50+endo.yaml
For multi-task baseline:
cd baselines/TS+multitask/
python train_multitask.py
Citation
If you use our code or models in your research, please cite with:
@article{walimbe2025adaptation,
title={Adaptation of Multi-modal Representation Models for Multi-task Surgical Computer Vision},
author={Walimbe, Soham and Baby, Britty and Srivastav, Vinkle and Padoy, Nicolas},
booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
year={2025},
organization={Springer}
}
License
This code and models are available for non-commercial scientific research purposes as defined in the CC BY-NC-SA 4.0. By downloading and using this code you agree to the terms in the LICENSE. Third-party codes are subject to their respective licenses.