Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble

January 22, 2025

Juhan Cha*, Minseok Joo*, Jihwan Park, Sanghyeok Lee, Injae Kim, Hyunwoo J. Kim†.

Installation

Follow the instructions below to set up MEFormer.

Environments

Python 3.8
CUDA 11.1
PyTorch 1.10

1. Clone Repository

git clone https://github.com/hanchaa/MEFormer.git
cd MEFormer

2. Create environment & Install libraries

conda create -n MEFormer python=3.8
conda activate MEFormer
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html

pip install openmim
mim install mmcv-full==1.6.0
pip install -r requirements.txt
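
After installation, it can help to confirm that the installed versions match the Environments list above. The following is a minimal sketch, not part of the repository: version_matches is a hypothetical helper that does a plain prefix comparison, and the commented usage lines rely only on the standard version attributes of Python, PyTorch, and MMCV.

```shell
# Hypothetical helper (not shipped with MEFormer).
# Succeeds when the version string $1 starts with the expected prefix $2.
version_matches() {
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example usage inside the activated MEFormer environment:
#   version_matches "$(python -c 'import platform; print(platform.python_version())')" "3.8." || echo "unexpected Python version"
#   version_matches "$(python -c 'import torch; print(torch.__version__)')" "1.10."          || echo "unexpected PyTorch version"
#   version_matches "$(python -c 'import torch; print(torch.version.cuda)')" "11.1"          || echo "unexpected CUDA build"
#   version_matches "$(python -c 'import mmcv; print(mmcv.__version__)')" "1.6.0"            || echo "unexpected mmcv-full version"
```

The trailing dot in prefixes such as "1.10." keeps a hypothetical version like 1.100 from matching by accident.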

3. Download pre-trained weights

Download the pre-trained weights of the image backbones from Google Drive and move them to the ckpts directory.

MEFormer
├─ ckpts
│  ├─ fcos3d_vovnet_imgbackbone-remapped.pth
│  └─ nuim_r50.pth
├─ figures
├─ projects
└─ tools
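
A quick way to confirm that both backbone files are where the tree above shows them is sketched below. check_ckpts is a hypothetical helper name, not a script from the repository; the file names are taken from the tree, and the directory defaults to ckpts relative to the current working directory.

```shell
# Hypothetical helper (not shipped with MEFormer).
# Reports which of the expected backbone checkpoints are missing.
# Usage: check_ckpts [dir]    (dir defaults to ./ckpts)
check_ckpts() {
    ckpt_dir="${1:-ckpts}"
    ckpt_missing=0
    for ckpt_name in fcos3d_vovnet_imgbackbone-remapped.pth nuim_r50.pth; do
        if [ ! -f "$ckpt_dir/$ckpt_name" ]; then
            echo "missing: $ckpt_dir/$ckpt_name" >&2
            ckpt_missing=1
        fi
    done
    # Exit status 0 when both files exist, 1 otherwise.
    return "$ckpt_missing"
}
```

Run from the repository root, `check_ckpts && echo "checkpoints in place"` prints the confirmation only when both files are present.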

4. Prepare data

Run the create_data.sh script.

bash tools/create_data.sh

Train & Inference

Train

tools/dist_train.sh $path_to_config$ 8

Inference

tools/dist_test.sh $path_to_config$ $path_to_weight$ 8 --eval bbox
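
The placeholders in the two commands above can be made explicit with a small wrapper. This is only a sketch: meformer_launch and DRY_RUN are hypothetical names, and reading the trailing 8 as the number of GPUs is an assumption based on the usual MMDetection3D launcher convention, not something this README states. The argument order passed to the scripts is copied from the commands above.

```shell
# Hypothetical wrapper (not shipped with MEFormer).
# Usage: meformer_launch train <config> <gpus>
#        meformer_launch test  <config> <gpus> <weight>
# Assumption: the numeric argument of the launchers is the GPU count.
meformer_launch() {
    mode="$1"; config="$2"; gpus="$3"; weight="$4"
    case "$mode" in
        train) set -- tools/dist_train.sh "$config" "$gpus" ;;
        test)  set -- tools/dist_test.sh "$config" "$weight" "$gpus" --eval bbox ;;
        *)     echo "usage: meformer_launch train|test <config> <gpus> [weight]" >&2
               return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be executed.
        echo "$*"
    else
        "$@"
    fi
}
```

Setting DRY_RUN=1 before calling the function prints the resulting command without starting a job, which is a cheap way to check paths before occupying the GPUs.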

Results

Results on the nuScenes validation set.

Config	NDS	mAP	Schedule	FPS	Weights
MEFormer	73.9%	71.5%	6 epochs *	3.1	Google Drive
MEFormer w/o PME	73.7%	71.3%	20 epochs	3.4	Google Drive

FPS is measured with a single NVIDIA A6000 GPU.

* MEFormer with PME must be trained after MEFormer w/o PME has been trained.