May 7, 2025

Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation

Zhaochong An · Guolei Sun · Yun Liu · Runjia Li · Min Wu
Ming-Ming Cheng · Ender Konukoglu · Serge Belongie

ICLR 2025 Spotlight (Paper)

Overview

🌟 Highlights

We introduce:

  • A novel cost-free multimodal few-shot 3D point cloud segmentation (FS-PCS) setup that integrates textual category names and the 2D image modality
  • MM-FSS: the first multimodal FS-PCS model that explicitly utilizes the textual modality and implicitly leverages the 2D modality
  • Superior performance on novel class generalization through effective multimodal integration
  • Valuable insights into the importance of commonly-ignored free modalities in FS-PCS

📝 Citation

If you find our code or paper useful, please cite:

@inproceedings{an2025generalized,
  title={Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model},
  author={An, Zhaochong and Sun, Guolei and Liu, Yun and Li, Runjia and Han, Junlin and Konukoglu, Ender and Belongie, Serge},
  booktitle={CVPR},
  year={2025}
}

@inproceedings{an2024multimodality,
    title={Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation},
    author={An, Zhaochong and Sun, Guolei and Liu, Yun and Li, Runjia and Wu, Min 
            and Cheng, Ming-Ming and Konukoglu, Ender and Belongie, Serge},
    booktitle={ICLR},
    year={2025}
}

@inproceedings{an2024rethinking,
  title={Rethinking Few-shot 3D Point Cloud Semantic Segmentation},
  author={An, Zhaochong and Sun, Guolei and Liu, Yun and Liu, Fayao and Wu, Zongwei and Wang, Dan and Van Gool, Luc and Belongie, Serge},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3996--4006},
  year={2024}
}

🛠️ Environment Setup

Our environment has been tested on:

  • RTX 3090 GPUs
  • GCC 6.3.0

Follow the COSeg installation guide for detailed setup.

📦 Dataset Preparation

Pretraining Stage Data

Following the OpenScene instructions, you can directly download the ScanNet 3D dataset and 2D features for pretraining:

# Download ScanNet 3D dataset
wget https://cvg-data.inf.ethz.ch/openscene/data/scannet_processed/scannet_3d.zip
unzip scannet_3d.zip

# Download 2D features
wget https://cvg-data.inf.ethz.ch/openscene/data/scannet_multiview_lseg.zip
unzip scannet_multiview_lseg.zip

Put the unpacked data into the folder ./pretraining/data/, or create a symbolic link from that location to the folder that holds the data:

ln -s /PATH/TO/DOWNLOADED/FOLDER ./pretraining/data
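If you would rather create the link from a Python setup script, the following is a minimal sketch that mirrors the `ln -s` command above. The helper name `link_pretraining_data` is hypothetical and not part of this repository, and it assumes a platform where symbolic links are permitted.

```python
from pathlib import Path


def link_pretraining_data(downloaded: str, link: str = "./pretraining/data") -> Path:
    """Create a symbolic link `link` that points at `downloaded` (like `ln -s`).

    Hypothetical helper for illustration; not part of the repository.
    """
    src = Path(downloaded).resolve()
    dst = Path(link)
    if not src.is_dir():
        raise FileNotFoundError(f"downloaded folder not found: {src}")
    # Refuse to clobber an existing folder or link at the destination.
    if dst.is_symlink() or dst.exists():
        raise FileExistsError(f"refusing to overwrite existing path: {dst}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.symlink_to(src, target_is_directory=True)
    return dst
```

Calling `link_pretraining_data("/PATH/TO/DOWNLOADED/FOLDER")` from the repository root should leave `./pretraining/data` pointing at the unpacked data.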

Few-shot Stage Data

Option 1: Download our preprocessed datasets:

| Dataset | Few-shot Stage Data |
|---|---|
| S3DIS | Download |
| ScanNet | Download |

Option 2: Manual Preprocessing

Follow the COSeg preprocessing instructions. The processed data will be written to [PATH_to_DATASET_processed_data]/blocks_bs1_s1/data. Make sure to set the data_root entry in the .yaml config file to this directory.
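The data_root update can also be scripted. The sketch below edits the config as plain text using only the standard library. It assumes the config contains a single-line `data_root: <value>` entry; the function name `set_data_root` is hypothetical, and any trailing comment on that line is dropped.

```python
import re


def set_data_root(yaml_text: str, new_root: str) -> str:
    """Return `yaml_text` with the value of every `data_root:` entry replaced.

    Hypothetical helper; assumes a single-line `data_root: <value>` entry.
    """
    # Keep the indentation and key, replace only the value after the colon.
    pattern = re.compile(r"^([ \t]*data_root[ \t]*:[ \t]*).*$", flags=re.MULTILINE)
    updated, count = pattern.subn(lambda m: m.group(1) + new_root, yaml_text)
    if count == 0:
        raise KeyError("no `data_root` entry found in the config text")
    return updated
```

To apply it, read the config file, pass its text through `set_data_root`, and write the result back to the same path.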

🔄 Training Pipeline

1. Backbone and IF Head Pretraining

Option A: Download our pretrained weights from Google Drive

Option B: Train from scratch:

cd pretraining
bash run/distill_strat.sh PATH_to_SAVE_BACKBONE config/scannet/ours_lseg_strat.yaml

2. Meta-learning Stage

Set [CONFIG_FILE] to s3dis_COSeg_fs.yaml or scannetv2_COSeg_fs.yaml to train on S3DIS or ScanNet, respectively. Adjust cvfold, n_way, and k_shot according to your few-shot task:

# For 1-way tasks
python3 main_fs.py --config config/[CONFIG_FILE] \
    save_path [PATH_to_SAVE_MODEL] \
    pretrain_backbone [PATH_to_SAVED_BACKBONE] \
    cvfold [CVFOLD] \
    n_way 1 \
    k_shot [K_SHOT] \
    num_episode_per_comb 1000

# For 2-way tasks
python3 main_fs.py --config config/[CONFIG_FILE] \
    save_path [PATH_to_SAVE_MODEL] \
    pretrain_backbone [PATH_to_SAVED_BACKBONE] \
    cvfold [CVFOLD] \
    n_way 2 \
    k_shot [K_SHOT] \
    num_episode_per_comb 100

Note: Following COSeg, num_episode_per_comb defaults to 1000 for 1-way and 100 for 2-way tasks to maintain consistency in test set size.
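When launching many runs, the two commands above can be assembled programmatically. This is a minimal sketch that uses only the arguments shown above and applies the 1000/100 rule from the note; the function name `build_train_command` is hypothetical and not part of the repository.

```python
from typing import List


def build_train_command(config_file: str, save_path: str, backbone: str,
                        cvfold: int, n_way: int, k_shot: int) -> List[str]:
    """Assemble the meta-learning command line as an argument list.

    Hypothetical helper; it only reproduces the commands documented above.
    """
    # Following the note above: 1000 episodes per combination for 1-way,
    # 100 for 2-way.
    if n_way == 1:
        episodes = 1000
    elif n_way == 2:
        episodes = 100
    else:
        raise ValueError("only 1-way and 2-way tasks are documented here")
    return [
        "python3", "main_fs.py", "--config", f"config/{config_file}",
        "save_path", save_path,
        "pretrain_backbone", backbone,
        "cvfold", str(cvfold),
        "n_way", str(n_way),
        "k_shot", str(k_shot),
        "num_episode_per_comb", str(episodes),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.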

📊 Evaluation & Visualization

Model Evaluation

Modify cvfold, n_way, k_shot and num_episode_per_comb accordingly and run:

python3 main_fs.py --config config/[CONFIG_FILE] \
    test True \
    eval_split test \
    weight [PATH_to_SAVED_MODEL] \
    [vis 1]  # Optional: Enable W&B visualization

Note: Performance may vary by 1.0% due to potential randomness in the training process. ScanNetv2 typically shows less variance than S3DIS.

Visualization

Follow the COSeg visualization guide for high-quality visualization results.

🎯 Model Zoo

| Model | Dataset | CVFOLD | N-way K-shot | Weights |
|---|---|---|---|---|
| s30_1w1s | S3DIS | 0 | 1-way 1-shot | Download |
| s30_1w5s | S3DIS | 0 | 1-way 5-shot | Download |
| s30_2w1s | S3DIS | 0 | 2-way 1-shot | Download |
| s30_2w5s | S3DIS | 0 | 2-way 5-shot | Download |
| s31_1w1s | S3DIS | 1 | 1-way 1-shot | Download |
| s31_1w5s | S3DIS | 1 | 1-way 5-shot | Download |
| s31_2w1s | S3DIS | 1 | 2-way 1-shot | Download |
| s31_2w5s | S3DIS | 1 | 2-way 5-shot | Download |
| sc0_1w1s | ScanNet | 0 | 1-way 1-shot | Download |
| sc0_1w5s | ScanNet | 0 | 1-way 5-shot | Download |
| sc0_2w1s | ScanNet | 0 | 2-way 1-shot | Download |
| sc0_2w5s | ScanNet | 0 | 2-way 5-shot | Download |
| sc1_1w1s | ScanNet | 1 | 1-way 1-shot | Download |
| sc1_1w5s | ScanNet | 1 | 1-way 5-shot | Download |
| sc1_2w1s | ScanNet | 1 | 2-way 1-shot | Download |
| sc1_2w5s | ScanNet | 1 | 2-way 5-shot | Download |
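
The model names in the table above follow a compact pattern: a dataset prefix (s3 for S3DIS, sc for ScanNet), the cvfold digit, then the N-way and K-shot counts. The sketch below decodes a name into the settings needed for evaluation. The pattern is inferred from the table rows only, and `parse_zoo_name` and `ZooEntry` are hypothetical names, not part of the repository.

```python
import re
from typing import NamedTuple


class ZooEntry(NamedTuple):
    dataset: str
    config_file: str
    cvfold: int
    n_way: int
    k_shot: int


# Dataset prefix -> (dataset name, config file named in the training section).
_DATASETS = {
    "s3": ("S3DIS", "s3dis_COSeg_fs.yaml"),
    "sc": ("ScanNet", "scannetv2_COSeg_fs.yaml"),
}
_NAME = re.compile(r"^(s3|sc)(\d)_(\d+)w(\d+)s$")


def parse_zoo_name(name: str) -> ZooEntry:
    """Decode a model zoo name such as 's30_1w5s' into its settings."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    prefix, cvfold, n_way, k_shot = match.groups()
    dataset, config_file = _DATASETS[prefix]
    return ZooEntry(dataset, config_file, int(cvfold), int(n_way), int(k_shot))
```

For example, `parse_zoo_name("sc1_2w5s")` yields ScanNet, cvfold 1, 2-way, 5-shot, which are the values to pass to main_fs.py when evaluating that checkpoint.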

Contact

For any questions or issues, feel free to reach out!