Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble

January 22, 2025


Juhan Cha*, Minseok Joo*, Jihwan Park, Sanghyeok Lee, Injae Kim, Hyunwoo J. Kim†.

Installation

Follow the instructions below to set up MEFormer.

Environments

  • Python 3.8
  • CUDA 11.1
  • PyTorch 1.10
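
Once the environment is set up, a quick check along the following lines can confirm that the interpreter sees the versions listed above. This is a minimal sketch and not part of the repository: the helper names `version_mismatches` and `detect_versions` are hypothetical, and the check treats any patch release of the listed versions as acceptable, which is an assumption.

```python
import platform

# Versions listed in the Environments section above.
EXPECTED = {"python": "3.8", "torch": "1.10", "cuda": "11.1"}


def version_mismatches(found, expected=EXPECTED):
    """Return {name: (found, expected)} for every version that does not match.

    A found version matches when it equals the expected one or extends it
    with a further ".x" or "+tag" component, so "1.10.1+cu111" matches "1.10"
    but "1.100.0" does not.
    """
    mismatches = {}
    for name, want in expected.items():
        have = found.get(name)
        ok = have is not None and (
            have == want
            or have.startswith(want + ".")
            or have.startswith(want + "+")
        )
        if not ok:
            mismatches[name] = (have, want)
    return mismatches


def detect_versions():
    """Collect the versions visible from the current interpreter."""
    found = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch  # only importable once the environment is installed

        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        pass
    return found


if __name__ == "__main__":
    for name, (have, want) in version_mismatches(detect_versions()).items():
        print(f"{name}: found {have}, expected {want}")
```

An empty result from `version_mismatches(detect_versions())` means all three versions line up with the list above.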

1. Clone Repository

git clone https://github.com/hanchaa/MEFormer.git
cd MEFormer

2. Create environment & Install libraries

conda create -n MEFormer python=3.8
conda activate MEFormer
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html

pip install openmim
mim install mmcv-full==1.6.0
pip install -r requirements.txt

3. Download pre-trained weights

Download the pretrained weights of the image backbones from Google Drive and move them to the ckpts directory.

MEFormer
├─ ckpts
│  ├─ fcos3d_vovnet_imgbackbone-remapped.pth
│  └─ nuim_r50.pth
├─ figures
├─ projects
└─ tools
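
Before moving on, it can help to confirm that both checkpoint files ended up where the tree above expects them. The snippet below is a small sketch under that layout; the function name `missing_checkpoints` is hypothetical and not provided by the repository.

```python
from pathlib import Path

# File names taken from the directory tree above.
EXPECTED_CHECKPOINTS = (
    "fcos3d_vovnet_imgbackbone-remapped.pth",
    "nuim_r50.pth",
)


def missing_checkpoints(repo_root="."):
    """Return the expected checkpoint files absent from <repo_root>/ckpts."""
    ckpt_dir = Path(repo_root) / "ckpts"
    return [
        name for name in EXPECTED_CHECKPOINTS
        if not (ckpt_dir / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing from ckpts/: " + ", ".join(missing))
    else:
        print("All backbone checkpoints found.")
```

Run it from the MEFormer root directory, or pass the path to the repository as `repo_root`.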

4. Prepare data

Run the create_data.sh script.

bash tools/create_data.sh

Train & Inference

Train

tools/dist_train.sh $path_to_config$ 8

Inference

tools/dist_test.sh $path_to_config$ $path_to_weight$ 8 --eval bbox
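
For scripted runs, the two commands above can be assembled from Python. This is only a sketch: `train_command` and `inference_command` are hypothetical helpers, and reading the trailing `8` as the number of GPUs is an assumption based on the usual MMDetection-style launch scripts, so check tools/dist_train.sh and tools/dist_test.sh before relying on it.

```python
import shlex


def train_command(config, gpus=8):
    """Argument list mirroring: tools/dist_train.sh <config> <gpus>."""
    # Assumption: the second positional argument is the GPU count.
    return ["tools/dist_train.sh", str(config), str(gpus)]


def inference_command(config, weight, gpus=8, eval_metric="bbox"):
    """Argument list mirroring: tools/dist_test.sh <config> <weight> <gpus> --eval bbox."""
    return [
        "tools/dist_test.sh",
        str(config),
        str(weight),
        str(gpus),  # assumed to be the GPU count, as above
        "--eval",
        eval_metric,
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute a real config and checkpoint.
    print(shlex.join(train_command("path/to/config.py")))
    print(shlex.join(inference_command("path/to/config.py", "path/to/weight.pth")))
```

Either list can be passed to `subprocess.run(..., check=True)` from the repository root to launch the job.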

Results

Results on the nuScenes validation set.

Config           | NDS   | mAP   | Schedule  | FPS | Weights
MEFormer         | 73.9% | 71.5% | 6 epoch * | 3.1 | Google Drive
MEFormer w/o PME | 73.7% | 71.3% | 20 epoch  | 3.4 | Google Drive

FPS is measured with a single NVIDIA A6000 GPU.

* MEFormer with PME must be trained after MEFormer w/o PME has been trained.