3D-GRES: Generalized 3D Referring Expression Segmentation

December 15, 2024

🔗[arXiv]   📄[PDF]

NEWS: 🔥 3D-GRES is accepted at ACM MM 2024 (Oral)! 🔥

Changli Wu, Yihang Liu, Jiayi Ji, Yiwei Ma, Haowei Wang, Gen Luo, Henghui Ding, Xiaoshuai Sun, Rongrong Ji

Introduction

3D Referring Expression Segmentation (3D-RES) is dedicated to segmenting a specific instance within a 3D space based on a natural language description. However, current approaches are limited to segmenting a single target, restricting the versatility of the task. To overcome this limitation, we introduce Generalized 3D Referring Expression Segmentation (3D-GRES), which extends the capability to segment any number of instances based on natural language instructions. To address this broader task, we propose the Multi-Query Decoupled Interaction Network (MDIN), designed to break down multi-object segmentation tasks into simpler, individual segmentations. MDIN comprises two fundamental components: Text-driven Sparse Queries (TSQ) and Multi-object Decoupling Optimization (MDO). TSQ generates sparse point cloud features distributed over key targets as the initialization for queries. Meanwhile, MDO assigns each target in multi-object scenarios to a different query while maintaining semantic consistency. To support this new task, we build a new dataset, Multi3DRes. Our comprehensive evaluations on this dataset demonstrate substantial improvements over existing models, charting a new path for intricate multi-object 3D scene comprehension.

Installation

Requirements

  • Python 3.7 or higher
  • PyTorch 1.12
  • CUDA 11.3 or higher

The following installation steps assume python=3.8, pytorch=1.12.1, and cuda=11.3.

  • Create a conda virtual environment

    conda create -n 3d-gres python=3.8
    conda activate 3d-gres
    
  • Clone the repository

    git clone https://github.com/sosppxo/MDIN.git
    
  • Install the dependencies

    Install PyTorch 1.12.1

    pip install spconv-cu113
    conda install pytorch-scatter -c pyg
    # or: pip install https://data.pyg.org/whl/torch-1.12.0%2Bcu113/torch_scatter-2.0.9-cp38-cp38-linux_x86_64.whl
    pip install -r requirements.txt
    

    Install segmentator from this repo (We wrap the segmentator in ScanNet).

  • Set up: install mdin and pointgroup_ops.

    sudo apt-get install libsparsehash-dev
    python setup.py develop
    cd gres_model/lib/
    python setup.py develop
    
  • Compile pointnet++

    cd pointnet2
    python setup.py install --user
    cd ..
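
After the steps above, a quick import check can confirm that the environment is usable. The following is a minimal sketch and not part of the repository: the helper name `check_environment` is hypothetical, and it assumes the packages installed above import as `torch`, `spconv`, and `torch_scatter`.

```python
import importlib
import sys


def check_environment(modules=("torch", "spconv", "torch_scatter")):
    """Report the Python version and whether each module can be imported.

    The default module names are assumptions about how the packages
    installed above are imported; adjust them if your setup differs.
    """
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in modules:
        try:
            module = importlib.import_module(name)
            # Fall back to a generic marker when no version string is exposed.
            report[name] = getattr(module, "__version__", "installed")
        except Exception as exc:  # also catches missing shared libraries
            report[name] = "MISSING ({})".format(exc)
    return report
```

Calling `check_environment()` from the activated `3d-gres` environment and printing the result should show a version string for each package; any entry starting with `MISSING` points at the installation step to revisit.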

Data Preparation

ScanNet v2 dataset

Download the ScanNet v2 dataset.

Put the downloaded scans folder as follows.

MDIN
├── data
│   ├── scannetv2
│   │   ├── scans

Split and preprocess point cloud data

cd data/scannetv2
bash prepare_data.sh

The script splits the data into train/val folders and preprocesses it. After running the script, the ScanNet dataset structure should look like this:

MDIN
├── data
│   ├── scannetv2
│   │   ├── scans
│   │   ├── train
│   │   ├── val

ScanRefer dataset

Download ScanRefer annotations following the instructions.

In the original ScanRefer annotations, ann_id values within each scene were assigned separately for each object_id, so the same ann_id can appear more than once in a scene. We have modified the ScanRefer annotations so that each ann_id within a scene is unique; the revised annotation data can be accessed here.
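
To see whether an annotation file is affected by the duplicate ann_id issue described above, a check along these lines may help. This is a sketch under assumptions, not code from the repository: the function name `find_duplicate_ann_ids` is hypothetical, the records are assumed to be dicts with `scene_id` and `ann_id` keys, and the sample data is made up for illustration.

```python
from collections import Counter


def find_duplicate_ann_ids(annotations):
    """Return {(scene_id, ann_id): count} for ann_ids repeated within a scene."""
    counts = Counter((a["scene_id"], str(a["ann_id"])) for a in annotations)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up records mimicking a scheme where ann_id restarts for every object.
sample = [
    {"scene_id": "scene0000_00", "object_id": "1", "ann_id": "0"},
    {"scene_id": "scene0000_00", "object_id": "2", "ann_id": "0"},
    {"scene_id": "scene0000_00", "object_id": "2", "ann_id": "1"},
    {"scene_id": "scene0001_00", "object_id": "1", "ann_id": "0"},
]
duplicates = find_duplicate_ann_ids(sample)
print(duplicates)
```

For a real file, load the list with `json.load` and pass it to the function; an empty result means every ann_id is unique within its scene.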

Put the downloaded ScanRefer folder as follows.

MDIN
├── data
│   ├── ScanRefer
│   │   ├── ScanRefer_filtered_train_new.json
│   │   ├── ScanRefer_filtered_val_new.json

Multi3DRefer dataset

Download the Multi3DRefer annotations.

Put the downloaded Multi3DRefer folder as follows.

MDIN
├── data
│   ├── Multi3DRefer
│   │   ├── multi3drefer_train.json
│   │   ├── multi3drefer_val.json

The original annotation text contains some typos; correct them according to Issue #6 to prevent syntax parsing errors.

Alternatively, download the modified Multi3DRefer (New).

ReferIt3D dataset

Download the ReferIt3D annotations and convert the .csv files into a .json format consistent with the Multi3DRefer format.

Put the downloaded ReferIt3D folder as follows.

MDIN
├── data
│   ├── ReferIt3D
│   │   ├── sr3d_train.json
│   │   ├── sr3d_val.json
│   │   ├── nr3d_train.json
│   │   ├── nr3d_val.json

Alternatively, download the modified ReferIt3D (.json).
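
If you convert the .csv files yourself, the conversion could look roughly like the sketch below. Everything schema-related here is an assumption: the input column names (`scan_id`, `target_id`, `instance_type`, `utterance`), the output keys, and the function name `referit3d_rows_to_records` are illustrative, and the exact fields the data loader expects should be checked against the provided modified files.

```python
import csv
import io
import json


def referit3d_rows_to_records(csv_file):
    """Convert ReferIt3D-style CSV rows into a list of JSON-ready dicts.

    Column names and output keys are assumptions made for illustration.
    """
    records = []
    next_ann_id = {}  # per-scene counter so ann_id is unique within a scene
    for row in csv.DictReader(csv_file):
        scene_id = row["scan_id"]
        ann_id = next_ann_id.get(scene_id, 0)
        next_ann_id[scene_id] = ann_id + 1
        records.append({
            "scene_id": scene_id,
            "object_ids": [int(row["target_id"])],  # single target per row
            "object_name": row["instance_type"],
            "ann_id": ann_id,
            "description": row["utterance"],
        })
    return records


# Made-up rows standing in for a real sr3d/nr3d .csv file.
sample_csv = io.StringIO(
    "scan_id,target_id,instance_type,utterance\n"
    "scene0000_00,3,chair,the chair next to the desk\n"
    "scene0000_00,7,table,the table by the window\n"
)
records = referit3d_rows_to_records(sample_csv)
print(json.dumps(records, indent=2))
```

For real data, open the .csv with `open(path, newline="")`, pass the file object to the function, and write the result with `json.dump`. Storing the target as a one-element list is a design choice that keeps single-target records in the same shape as multi-target ones.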

Pretrained Backbone

Download the SPFormer pretrained model (we only use the Sparse 3D U-Net backbone for training).

Move the pretrained model to backbones.

mkdir backbones
mv ${Download_PATH}/sp_unet_backbone.pth backbones/
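
Before training, it can be useful to verify that the files and folders described in the sections above are in place. The sketch below is not part of the repository; the helper name `missing_paths` is hypothetical, and it assumes all paths are relative to the MDIN repository root, including the `backbones` folder.

```python
from pathlib import Path

# Paths taken from the directory trees and commands in this README.
EXPECTED_PATHS = [
    "data/scannetv2/scans",
    "data/scannetv2/train",
    "data/scannetv2/val",
    "data/ScanRefer/ScanRefer_filtered_train_new.json",
    "data/ScanRefer/ScanRefer_filtered_val_new.json",
    "data/Multi3DRefer/multi3drefer_train.json",
    "data/Multi3DRefer/multi3drefer_val.json",
    "data/ReferIt3D/sr3d_train.json",
    "data/ReferIt3D/sr3d_val.json",
    "data/ReferIt3D/nr3d_train.json",
    "data/ReferIt3D/nr3d_val.json",
    "backbones/sp_unet_backbone.pth",
]


def missing_paths(repo_root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in expected if not (root / p).exists()]
```

Running `print(missing_paths("."))` from the repository root lists whatever still needs to be downloaded or generated; an empty list means the layout matches the trees above. Datasets you do not plan to use can be dropped from the list.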

Models

Download the pretrained models and move them to checkpoints.

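| Benchmark  | Task    | mIoU | Acc@0.25 | Acc@0.5 | Model |
|------------|---------|------|----------|---------|-------|
| Multi3DRes | 3D-GRES | 47.5 | 66.9     | 44.7    | Model |
| ScanRefer  | 3D-RES  | 48.3 | 58.0     | 53.1    | Model |
| Nr3D       | 3D-RES  | 38.6 | 48.4     | 42.2    | Model |
| Sr3D       | 3D-RES  | 46.4 | 56.6     | 51.3    | Model |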
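
For readers unfamiliar with the metrics reported above, the sketch below shows how mIoU and Acc@k are commonly derived from per-sample IoU values. It is an illustration under assumptions rather than the repository's evaluation code: the function name `summarize_ious` is hypothetical, a sample is counted as correct when its IoU is at least k, and descriptions with no target (which a generalized benchmark may include) would need an extra convention that is left out here.

```python
def summarize_ious(ious, thresholds=(0.25, 0.5)):
    """Return mIoU and Acc@k, in percent, from a list of per-sample IoUs."""
    if not ious:
        raise ValueError("ious must be non-empty")
    n = len(ious)
    summary = {"mIoU": 100.0 * sum(ious) / n}
    for k in thresholds:
        # Assumed convention: a sample counts as correct when IoU >= k.
        summary[f"Acc@{k}"] = 100.0 * sum(iou >= k for iou in ious) / n
    return summary


# Made-up IoU values for four samples.
print(summarize_ious([0.9, 0.6, 0.3, 0.1]))
```

In this toy example three of the four samples reach IoU 0.25 and two reach 0.5, so Acc@0.25 is 75.0 and Acc@0.5 is 50.0, while mIoU is the plain average of the four values.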

Training

For 3D-GRES:

bash scripts/train_3dgres.sh

For 3D-RES:

bash scripts/train_3dres.sh

Inference

For 3D-GRES:

bash scripts/test_3dgres.sh

For 3D-RES:

bash scripts/test_3dres.sh

Citation

If you find this work useful in your research, please cite:

@misc{wu20243dgresgeneralized3dreferring,
      title={3D-GRES: Generalized 3D Referring Expression Segmentation}, 
      author={Changli Wu and Yihang Liu and Jiayi Ji and Yiwei Ma and Haowei Wang and Gen Luo and Henghui Ding and Xiaoshuai Sun and Rongrong Ji},
      year={2024},
      eprint={2407.20664},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.20664}, 
}

Acknowledgement

Sincere thanks to the ReLA, M3DRef-CLIP, EDA, SceneGraphParser, SoftGroup, SSTNet and SPFormer repos. This repo is built upon them.