ODIN: A Single Model for 2D and 3D Segmentation

June 4, 2025

Authors: Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki.

Official implementation of "ODIN: A Single Model for 2D and 3D Segmentation", CVPR 2024 (Highlight).


Installation

Make sure you are using GCC 9.2.0 or newer.

export TORCH_CUDA_ARCH_LIST="6.0 6.1 6.2 7.0 7.2 7.5 8.0 8.6"
conda create -n odin python=3.10
conda activate odin
pip install torch==2.2.0+cu118 torchvision==0.17.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.2.0+cu118.html
pip install -r requirements.txt
sh init.sh
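
The GCC requirement above can be checked before running the install steps. The following is a minimal sketch, not part of the ODIN codebase: the helper names `parse_gcc_version` and `gcc_is_new_enough` are hypothetical, and it assumes `gcc --version` prints a dotted version number at the end of its first line, as common GCC builds do.

```python
import re
import shutil
import subprocess

MIN_GCC = (9, 2, 0)  # minimum version stated in the installation note


def parse_gcc_version(version_output: str) -> tuple[int, ...]:
    """Return the last dotted version on the first line of `gcc --version` output."""
    lines = version_output.splitlines()
    matches = re.findall(r"\d+\.\d+\.\d+", lines[0]) if lines else []
    if not matches:
        raise ValueError("no version number found in gcc output")
    # Distro builds may embed a package version in parentheses first,
    # so the compiler version is taken from the last match on the line.
    return tuple(int(part) for part in matches[-1].split("."))


def gcc_is_new_enough(version_output: str, minimum: tuple[int, ...] = MIN_GCC) -> bool:
    """Compare the parsed version against the required minimum."""
    return parse_gcc_version(version_output) >= minimum


if __name__ == "__main__":
    gcc = shutil.which("gcc")
    if gcc is None:
        print("gcc not found on PATH")
    else:
        output = subprocess.run([gcc, "--version"], capture_output=True, text=True).stdout
        status = "OK" if gcc_is_new_enough(output) else "too old, need >= 9.2.0"
        print(f"gcc {'.'.join(map(str, parse_gcc_version(output)))}: {status}")
```

If the check reports an older compiler, switch to a newer GCC before running `sh init.sh`.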

Data Preparation

Please refer to the README in the data_preparation folder for each individual dataset, for example the ScanNet data preparation README.

Usage

We provide training scripts for various datasets in the scripts folder. Please refer to these scripts for training ODIN.

  • Set DETECTRON2_DATASETS to the path where you store the posed RGB-D data. You may also need to change the 3D mesh point cloud paths (such as SCANNET_DATA_DIR) in each script. These variables are defined in odin/config.py, where you can change the paths permanently.
  • To train a model on 3D datasets, set MODEL.WEIGHTS to load pre-trained 2D weights. We use weights from Mask2Former; you can download the Mask2Former-ResNet and Mask2Former-Swin weights. MODEL.WEIGHTS also accepts a link to a checkpoint, so you can pass these links directly as the argument value.
  • ODIN pre-trained weights are provided below in the Model Zoo. To run inference, point MODEL.WEIGHTS to these weights. To run evaluation, also add the --eval-only flag.
  • SOLVER.IMS_PER_BATCH controls the batch size. This is the effective batch size: if you run on 2 GPUs with the batch size set to 6, each GPU uses bs=3.
  • SOLVER.TEST_IMS_PER_BATCH controls the (effective) test batch size. Since the number of images varies from scene to scene, we use bs=1 per GPU at test time. MAX_FRAME_NUM=-1 loads all images in a scene for inference, which is our usual strategy. In some datasets the images can simply be too large, so there we set a maximum limit on the number of images.
  • INPUT.SAMPLING_FRAME_NUM controls the number of images we sample per scene at training time; for example, on ScanNet we train on chunks of 25 images.
  • CHECKPOINT_PERIOD is the number of iterations after which a checkpoint is saved. EVAL_PERIOD is the number of steps after which evaluation is run.
  • OUTPUT_DIR stores the checkpoints and the TensorBoard logs. --resume resumes training from the last checkpoint stored in OUTPUT_DIR. If no checkpoint is present, it loads the weights from MODEL.WEIGHTS.
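
Two of the options above involve a small amount of logic that is easy to get wrong: the split of the effective batch size across GPUs, and the choice of weights under --resume. The sketch below illustrates both in plain Python. It is not ODIN's implementation: the function names are hypothetical, and it assumes checkpoints are `.pth` files written directly into OUTPUT_DIR, with the most recently modified one counting as the "last" checkpoint.

```python
from pathlib import Path


def per_gpu_batch_size(ims_per_batch: int, num_gpus: int) -> int:
    """Split the effective batch size (SOLVER.IMS_PER_BATCH) evenly across GPUs."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if ims_per_batch % num_gpus != 0:
        raise ValueError("effective batch size must be divisible by the GPU count")
    return ims_per_batch // num_gpus


def resolve_initial_weights(output_dir: str, model_weights: str, resume: bool) -> str:
    """Pick the weights to load, following the described --resume behaviour.

    Assumption: checkpoints are *.pth files in output_dir and the newest
    file (by modification time) is the last checkpoint.
    """
    if resume:
        checkpoints = sorted(Path(output_dir).glob("*.pth"), key=lambda p: p.stat().st_mtime)
        if checkpoints:
            return str(checkpoints[-1])
    # No --resume, or no checkpoint found: fall back to MODEL.WEIGHTS.
    return model_weights


if __name__ == "__main__":
    # The example from the text: batch size 6 on 2 GPUs gives bs=3 per GPU.
    print(per_gpu_batch_size(6, 2))
```

A batch size that does not divide evenly across the GPUs raises an error here; that strictness is a choice made for this sketch, not a statement about how the training scripts behave.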

Model Zoo

ScanNet Instance Segmentation

| Dataset | mAP | mAP@25 | Config | Checkpoint |
| --- | --- | --- | --- | --- |
| ScanNet val (ResNet50) | 47.8 | 83.6 | config | checkpoint |
| ScanNet val (Swin-B) | 50.0 | 83.6 | config | checkpoint |
| ScanNet test (Swin-B) | 47.7 | 86.2 | config | checkpoint |

ScanNet Semantic Segmentation

| Dataset | mIoU | Config | Checkpoint |
| --- | --- | --- | --- |
| ScanNet val (ResNet50) | 73.3 | config | checkpoint |
| ScanNet val (Swin-B) | 77.8 | config | checkpoint |
| ScanNet test (Swin-B) | 74.4 | config | checkpoint |

Joint 2D-3D on ScanNet and COCO

| Model | mAP (ScanNet) | mAP@25 (ScanNet) | mAP (COCO) | Config | Checkpoint |
| --- | --- | --- | --- | --- | --- |
| ODIN | 49.1 | 83.1 | 41.2 | config | checkpoint |

ScanNet200 Instance Segmentation

| Dataset | mAP | mAP@25 | Config | Checkpoint |
| --- | --- | --- | --- | --- |
| ScanNet200 val (ResNet50) | 25.6 | 36.9 | config | checkpoint |
| ScanNet200 val (Swin-B) | 31.5 | 45.3 | config | checkpoint |
| ScanNet200 test (Swin-B) | 27.2 | 39.4 | config | checkpoint |

ScanNet200 Semantic Segmentation

| Dataset | mIoU | Config | Checkpoint |
| --- | --- | --- | --- |
| ScanNet200 val (ResNet50) | 35.8 | config | checkpoint |
| ScanNet200 val (Swin-B) | 40.5 | config | checkpoint |
| ScanNet200 test (Swin-B) | 36.8 | config | checkpoint |

AI2THOR Semantic and Instance Segmentation

| Dataset | mAP | mAP@25 | mIoU | Config | Checkpoint |
| --- | --- | --- | --- | --- | --- |
| AI2THOR val (ResNet) | 63.8 | 80.2 | 71.5 | config | checkpoint |
| AI2THOR val (Swin) | 64.3 | 78.6 | 71.4 | config | checkpoint |

Matterport3D Instance Segmentation

| Dataset | mAP | mAP@25 | Config | Checkpoint |
| --- | --- | --- | --- | --- |
| Matterport3D val (ResNet) | 11.5 | 27.6 | config | checkpoint |
| Matterport3D val (Swin) | 14.5 | 36.8 | config | checkpoint |

Matterport3D Semantic Segmentation

| Dataset | mIoU | mAcc | Config | Checkpoint |
| --- | --- | --- | --- | --- |
| Matterport3D val (ResNet) | 22.4 | 28.5 | config | checkpoint |
| Matterport3D val (Swin) | 28.6 | 38.2 | config | checkpoint |

S3DIS Instance Segmentation

| Dataset | mAP | mAP@25 | Config | Checkpoint |
| --- | --- | --- | --- | --- |
| S3DIS Area5 (ResNet50-Scratch) | 36.3 | 61.2 | config | checkpoint |
| S3DIS Area5 (ResNet50-Fine-Tuned) | 44.7 | 67.5 | config | checkpoint |
| S3DIS Area5 (Swin-B) | 43.0 | 70.0 | config | checkpoint |

S3DIS Semantic Segmentation

| Dataset | mIoU | Config | Checkpoint |
| --- | --- | --- | --- |
| S3DIS (ResNet50) | 59.7 | config | checkpoint |
| S3DIS (Swin-B) | 68.6 | config | checkpoint |

Training Logs:

Please find training logs for all models here

Citing ODIN

If you find ODIN useful in your research, please consider citing:

@inproceedings{jain2024odin,
  title={Odin: a single model for 2D and 3D segmentation},
  author={Jain, Ayush and Katara, Pushkal and Gkanatsios, Nikolaos and Harley, Adam W and Sarch, Gabriel and Aggarwal, Kriti and Chaudhary, Vishrav and Fragkiadaki, Katerina},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3564--3574},
  year={2024}
}

License

The majority of ODIN is licensed under the MIT License.

Acknowledgement

Parts of this code were based on the codebase of Mask2Former and Mask3D.