3D-GRES: Generalized 3D Referring Expression Segmentation
December 15, 2024
[arXiv] [PDF]
NEWS: 3D-GRES is accepted at ACM MM 2024 (Oral)!
Changli Wu, Yihang Liu, Jiayi Ji, Yiwei Ma, Haowei Wang, Gen Luo, Henghui Ding, Xiaoshuai Sun, Rongrong Ji

Introduction
3D Referring Expression Segmentation (3D-RES) is dedicated to segmenting a specific instance within a 3D space based on a natural language description. However, current approaches are limited to segmenting a single target, restricting the versatility of the task. To overcome this limitation, we introduce Generalized 3D Referring Expression Segmentation (3D-GRES), which extends the capability to segment any number of instances based on natural language instructions. To address this broader task, we propose the Multi-Query Decoupled Interaction Network (MDIN), designed to break down multi-object segmentation tasks into simpler, individual segmentations. MDIN comprises two fundamental components: Text-driven Sparse Queries (TSQ) and Multi-object Decoupling Optimization (MDO). TSQ generates sparse point cloud features distributed over key targets as the initialization for queries. Meanwhile, MDO is tasked with assigning each target in multi-object scenarios to different queries while maintaining their semantic consistency. To adapt to this new task, we build a new dataset, namely Multi3DRes. Our comprehensive evaluations on this dataset demonstrate substantial improvements over existing models, charting a new path for intricate multi-object 3D scene comprehension.
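To make the difference between the two tasks concrete: a 3D-RES model returns exactly one instance mask per expression, whereas a 3D-GRES model may return zero, one, or several. The sketch below is only an illustration of that output format, not MDIN's implementation. It assumes each query yields a per-point binary mask and a confidence score, and it merges the confident queries by taking the union of their masks. The names `merge_query_masks` and `score_threshold` are hypothetical.

```python
from typing import List, Sequence


def merge_query_masks(
    query_masks: Sequence[Sequence[bool]],
    query_scores: Sequence[float],
    score_threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[bool]:
    """Union the per-point masks of all queries whose score passes the threshold."""
    if len(query_masks) != len(query_scores):
        raise ValueError("need exactly one score per query mask")
    if not query_masks:
        return []
    num_points = len(query_masks[0])
    merged = [False] * num_points
    for mask, score in zip(query_masks, query_scores):
        if len(mask) != num_points:
            raise ValueError("all masks must cover the same points")
        if score < score_threshold:
            continue  # low-confidence query: contributes nothing
        for i, inside in enumerate(mask):
            if inside:
                merged[i] = True
    return merged


# Three queries over a toy scene of five points; the third query is not confident.
masks = [
    [True, True, False, False, False],
    [False, False, False, True, False],
    [False, False, True, False, False],
]
print(merge_query_masks(masks, [0.9, 0.8, 0.1]))
# [True, True, False, True, False]
```

If no query passes the threshold, the merged mask is empty, which corresponds to an expression that refers to no object in the scene.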
Installation
Requirements
- Python 3.7 or higher
- PyTorch 1.12
- CUDA 11.3 or higher
The following installation steps assume python=3.8, pytorch=1.12.1, and cuda=11.3.
- Create a conda virtual environment
conda create -n 3d-gres python=3.8
conda activate 3d-gres
- Clone the repository
git clone https://github.com/sosppxo/MDIN.git
- Install the dependencies
Install PyTorch 1.12.1
pip install spconv-cu113
conda install pytorch-scatter -c pyg # or pip install https://data.pyg.org/whl/torch-1.12.0%2Bcu113/torch_scatter-2.0.9-cp38-cp38-linux_x86_64.whl
pip install -r requirements.txt
Install segmentator from this repo (We wrap the segmentator in ScanNet).
- Set up: install mdin and pointgroup_ops.
sudo apt-get install libsparsehash-dev
python setup.py develop
cd gres_model/lib/
python setup.py develop
- Compile pointnet++
cd pointnet2
python setup.py install --user
cd ..
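Once the steps above are done, it can help to confirm that the environment matches the stated requirements (Python 3.7 or higher, PyTorch 1.12, CUDA 11.3 or higher). The following is a minimal sketch, not part of the repository. The helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical; `torch.__version__` and `torch.version.cuda` are standard PyTorch attributes.

```python
import sys
from typing import Dict, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.12.1+cu113' into (1, 12, 1); stops at the first non-numeric part."""
    core = text.split("+", 1)[0]
    parts = []
    for piece in core.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(found: str, minimum: str) -> bool:
    """Compare versions numerically, so that '11.10' counts as newer than '11.3'."""
    return parse_version(found) >= parse_version(minimum)


def check_environment() -> Dict[str, bool]:
    """Report which of the README's requirements the current environment satisfies."""
    report = {"python": meets_minimum("%d.%d" % sys.version_info[:2], "3.7")}
    try:
        import torch
    except ImportError:
        report["pytorch"] = False
        report["cuda"] = False
        return report
    report["pytorch"] = parse_version(torch.__version__)[:2] == (1, 12)
    cuda = torch.version.cuda  # None for CPU-only builds
    report["cuda"] = cuda is not None and meets_minimum(cuda, "11.3")
    return report
```

Calling `check_environment()` returns a dictionary such as `{"python": True, "pytorch": True, "cuda": True}` when everything matches. A `False` entry points at the component to reinstall.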
Data Preparation
ScanNet v2 dataset
Download the ScanNet v2 dataset.
Put the downloaded scans folder as follows.
MDIN
├── data
│   ├── scannetv2
│   │   ├── scans
Split and preprocess point cloud data
cd data/scannetv2
bash prepare_data.sh
The script splits the data into train/val folders and preprocesses it. After running the script, the ScanNet dataset structure should look like this:
MDIN
├── data
│   ├── scannetv2
│   │   ├── scans
│   │   ├── train
│   │   ├── val
ScanRefer dataset
Download ScanRefer annotations following the instructions.
In the original ScanRefer annotations, ann_id values were assigned separately for each object_id, so the same ann_id can occur more than once within a scene. We have modified the ScanRefer annotations so that each ann_id within a scene is unique; the revised annotation data can be accessed here.
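To check whether a given annotation file still has the duplicate ann_id problem, a short script along these lines can be used. It is a sketch rather than a tool shipped with this repository. It assumes the JSON file is a list of records with `scene_id` and `ann_id` keys, as in the public ScanRefer release. The function names are hypothetical.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def find_duplicate_ann_ids(annotations: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Return {(scene_id, ann_id): count} for every pair that occurs more than once."""
    counts = Counter((str(a["scene_id"]), str(a["ann_id"])) for a in annotations)
    return {key: n for key, n in counts.items() if n > 1}


def load_annotations(path: str) -> List[dict]:
    """Load a ScanRefer-style JSON file (assumed to be a list of dicts)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Toy records in the original style: ann_id restarts at 0 for each object.
original_style = [
    {"scene_id": "scene0000_00", "object_id": "3", "ann_id": "0"},
    {"scene_id": "scene0000_00", "object_id": "3", "ann_id": "1"},
    {"scene_id": "scene0000_00", "object_id": "7", "ann_id": "0"},
]
print(find_duplicate_ann_ids(original_style))
# {('scene0000_00', '0'): 2}
```

An empty result from `find_duplicate_ann_ids(load_annotations(path))` indicates that every ann_id is unique within its scene.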
Put the downloaded ScanRefer folder as follows.
MDIN
├── data
│   ├── ScanRefer
│   │   ├── ScanRefer_filtered_train_new.json
│   │   ├── ScanRefer_filtered_val_new.json
Multi3DRefer dataset
Download the Multi3DRefer annotations.
Put the downloaded Multi3DRefer folder as follows.
MDIN
├── data
│   ├── Multi3DRefer
│   │   ├── multi3drefer_train.json
│   │   ├── multi3drefer_val.json
The original annotation text contains some typos. Please correct them according to Issue #6 to prevent syntax parsing errors.
Alternatively, download the modified Multi3DRefer(New).
ReferIt3D dataset
Download the ReferIt3D annotations and convert the .csv files into .json files consistent with the Multi3DRefer format.
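The conversion could look roughly like the sketch below. This is not the script used to produce the provided .json files. The input column names (`scan_id`, `target_id`, `instance_type`, `utterance`) and the output keys (`scene_id`, `object_ids`, `object_name`, `ann_id`, `description`) are assumptions that should be checked against the actual ReferIt3D .csv files and Multi3DRefer .json files. The train/val split is not handled here.

```python
import csv
import io
import json
from typing import Dict, List, TextIO


def referit3d_csv_to_records(csv_file: TextIO) -> List[Dict]:
    """Map each CSV row to one record; ann_id counts up per scene so it stays unique."""
    records: List[Dict] = []
    next_ann_id: Dict[str, int] = {}
    for row in csv.DictReader(csv_file):
        scene_id = row["scan_id"]  # assumed column name
        ann_id = next_ann_id.get(scene_id, 0)
        next_ann_id[scene_id] = ann_id + 1
        records.append(
            {
                "scene_id": scene_id,
                "object_ids": [int(row["target_id"])],  # one target, kept as a list
                "object_name": row["instance_type"],
                "ann_id": ann_id,
                "description": row["utterance"],
            }
        )
    return records


# Toy input standing in for a real .csv file.
sample = io.StringIO(
    "scan_id,target_id,instance_type,utterance\n"
    "scene0000_00,3,chair,the chair near the door\n"
    "scene0000_00,5,table,the table by the window\n"
)
print(json.dumps(referit3d_csv_to_records(sample), indent=2))
```

For a real file, open it with `open(path, newline="", encoding="utf-8")`, pass the handle to `referit3d_csv_to_records`, and write the result with `json.dump`.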
Put the downloaded ReferIt3D folder as follows.
MDIN
├── data
│   ├── ReferIt3D
│   │   ├── sr3d_train.json
│   │   ├── sr3d_val.json
│   │   ├── nr3d_train.json
│   │   ├── nr3d_val.json
Or download the modified ReferIt3D(.json)
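Before training, it may be worth verifying that every folder and file from the layouts above is in place. The sketch below lists the paths named in this section and reports the ones that are missing. The constant `EXPECTED_PATHS` and the function `missing_paths` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path
from typing import List, Sequence

# Paths relative to the repository root, copied from the layouts in this section.
EXPECTED_PATHS = [
    "data/scannetv2/scans",
    "data/scannetv2/train",
    "data/scannetv2/val",
    "data/ScanRefer/ScanRefer_filtered_train_new.json",
    "data/ScanRefer/ScanRefer_filtered_val_new.json",
    "data/Multi3DRefer/multi3drefer_train.json",
    "data/Multi3DRefer/multi3drefer_val.json",
    "data/ReferIt3D/sr3d_train.json",
    "data/ReferIt3D/sr3d_val.json",
    "data/ReferIt3D/nr3d_train.json",
    "data/ReferIt3D/nr3d_val.json",
]


def missing_paths(repo_root: str, expected: Sequence[str] = EXPECTED_PATHS) -> List[str]:
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Running `missing_paths(".")` from the repository root returns an empty list when the data is complete. Datasets you do not plan to use can be dropped by passing a shorter `expected` list.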
Pretrained Backbone
Download the SPFormer pretrained model (we only use the Sparse 3D U-Net backbone for training).
Move the pretrained model to the backbones folder.
mkdir backbones
mv ${Download_PATH}/sp_unet_backbone.pth backbones/
Models
Download the pretrained models and move them to checkpoints.
| Benchmark | Task | mIoU | Acc@0.25 | Acc@0.5 | Model |
|---|---|---|---|---|---|
| Multi3DRes | 3D-GRES | 47.5 | 66.9 | 44.7 | Model |
| ScanRefer | 3D-RES | 48.3 | 58.0 | 53.1 | Model |
| Nr3D | 3D-RES | 38.6 | 48.4 | 42.2 | Model |
| Sr3D | 3D-RES | 46.4 | 56.6 | 51.3 | Model |
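For readers unfamiliar with the metrics in the table, the sketch below shows one common way to compute them from per-sample masks: mIoU is the mean intersection-over-union across samples, and Acc@k is the share of samples whose IoU reaches the threshold k, both reported as percentages. This is an illustration, not the repository's evaluation code. Two choices in it are assumptions: a sample with an empty prediction and an empty ground truth is scored as IoU 1.0, and the threshold comparison is inclusive.

```python
from typing import Dict, Sequence


def mask_iou(pred: Sequence[bool], gt: Sequence[bool]) -> float:
    """Intersection-over-union of two per-point binary masks."""
    if len(pred) != len(gt):
        raise ValueError("masks must cover the same points")
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    if union == 0:
        return 1.0  # assumption: correctly predicting "no target" counts as a perfect match
    return inter / union


def summarize(ious: Sequence[float], thresholds: Sequence[float] = (0.25, 0.5)) -> Dict[str, float]:
    """Return mIoU and Acc@k in percent for a list of per-sample IoU values."""
    if not ious:
        raise ValueError("need at least one sample")
    n = len(ious)
    result = {"mIoU": 100.0 * sum(ious) / n}
    for t in thresholds:
        # assumption: inclusive threshold (IoU >= k)
        result[f"Acc@{t}"] = 100.0 * sum(1 for v in ious if v >= t) / n
    return result


print(summarize([0.75, 0.5, 0.25, 0.0]))
# {'mIoU': 37.5, 'Acc@0.25': 75.0, 'Acc@0.5': 50.0}
```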
Training
For 3D-GRES:
bash scripts/train_3dgres.sh
For 3D-RES:
bash scripts/train_3dres.sh
Inference
For 3D-GRES:
bash scripts/test_3dgres.sh
For 3D-RES:
bash scripts/test_3dres.sh
Citation
If you find this work useful in your research, please cite:
@misc{wu20243dgresgeneralized3dreferring,
title={3D-GRES: Generalized 3D Referring Expression Segmentation},
author={Changli Wu and Yihang Liu and Jiayi Ji and Yiwei Ma and Haowei Wang and Gen Luo and Henghui Ding and Xiaoshuai Sun and Rongrong Ji},
year={2024},
eprint={2407.20664},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.20664},
}
Acknowledgement
Sincere thanks to the ReLA, M3DRef-CLIP, EDA, SceneGraphParser, SoftGroup, SSTNet and SPFormer repos. This repo is built upon them.