Training InstructSeg
January 13, 2025
Prepare pre-trained model weights
MLLM weights
Download the pre-trained Mipha-3B weights and set --model_name_or_path in the training scripts to their location.
CLIP Encoder weights
Download the pre-trained SigLIP-SO weights and set --vision_tower in the training scripts to their location.
Visual Encoder and Segmentation Decoder weights
Download the Mask2Former Swin-B weights and set --vision_tower_mask in the training scripts to their location.
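Before launching training, it can help to confirm that the three weight locations exist. The sketch below is a hypothetical pre-flight helper and is not part of the InstructSeg repository. The paths under `pretrained/`, the variable names, and the `count_missing` function are all placeholders chosen for illustration; only the three flag names come from the steps above.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of the InstructSeg repository.
# Placeholder locations: replace with wherever you saved each set of weights.
MLLM_PATH="${MLLM_PATH:-pretrained/Mipha-3B}"          # value for --model_name_or_path
CLIP_PATH="${CLIP_PATH:-pretrained/SigLIP-SO}"         # value for --vision_tower
MASK_PATH="${MASK_PATH:-pretrained/Mask2Former-SwinB}" # value for --vision_tower_mask

# Report each missing path on stderr and return the number of missing paths.
count_missing() {
    missing=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

if count_missing "$MLLM_PATH" "$CLIP_PATH" "$MASK_PATH"; then
    echo "all pre-trained weights found"
else
    echo "fix the paths above before running scripts/seg/train.sh" >&2
fi
```

The defaults can be overridden from the environment, for example `MLLM_PATH=/data/Mipha-3B sh check_weights.sh`, where `check_weights.sh` is whatever name you save the sketch under.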
Now Train!
sh scripts/seg/train.sh
Merge LoRA weights
After training, merge the LoRA weights from output/model/checkpoint-100000 and save the final InstructSeg model weights.
sh scripts/seg/merge_lora_weights.sh
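A small guard can make the merge step fail early when the checkpoint is missing. This is a sketch under assumptions: `CKPT_DIR` and `checkpoint_ready` are hypothetical names, the check only tests that the directory exists and is non-empty, and it does not inspect the checkpoint contents.

```shell
#!/bin/sh
# Hypothetical guard around the merge step; not part of the InstructSeg repository.
CKPT_DIR="${CKPT_DIR:-output/model/checkpoint-100000}"  # checkpoint path named in the step above

# Succeed only if the given directory exists and contains at least one entry.
checkpoint_ready() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

if checkpoint_ready "$CKPT_DIR"; then
    sh scripts/seg/merge_lora_weights.sh
else
    echo "checkpoint not found or empty: $CKPT_DIR" >&2
fi
```

The merge script is invoked unchanged. Whether it takes the checkpoint path as an argument or defines it internally is not stated here, so any path inside the script may need to be kept in sync with `CKPT_DIR` by hand.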