BEVDet

July 4, 2024

News

  • 2024.07.01 DAL is accepted to ECCV24.

  • 2023.11.08 Support DAL for 3D object detection with LiDAR-camera fusion. [Arxiv]

  • History

Main Results

nuScenes Detection

| Config | mAP | NDS | Latency (ms) | FPS | Model | Log |
| --- | --- | --- | --- | --- | --- | --- |
| BEVDet-R50 | 28.3 | 35.0 | 29.1/4.2/33.3 | 30.7 | baidu | baidu |
| BEVDet-R50-CBGS | 31.3 | 39.8 | 28.9/4.3/33.2 | 30.1 | baidu | baidu |
| BEVDet-R50-4D-CBGS | 31.4/35.4# | 44.7/44.9# | 29.1/4.3/33.4 | 30.0 | baidu | baidu |
| BEVDet-R50-4D-Depth-CBGS | 36.1/36.2# | 48.3/48.4# | 35.7/4.0/39.7 | 25.2 | baidu | baidu |
| BEVDet-R50-4D-Stereo-CBGS | 38.2/38.4# | 49.9/50.0# | - | - | baidu | baidu |
| BEVDet-R50-4DLongterm-CBGS | 34.8/35.4# | 48.2/48.7# | 30.8/4.2/35.0 | 28.6 | baidu | baidu |
| BEVDet-R50-4DLongterm-Depth-CBGS | 39.4/39.9# | 51.5/51.9# | 38.4/4.0/42.4 | 23.6 | baidu | baidu |
| BEVDet-R50-4DLongterm-Stereo-CBGS | 41.1/41.5# | 52.3/52.7# | - | - | baidu | baidu |
| BEVDet-STBase-4D-Stereo-512x1408-CBGS | 47.2# | 57.6# | - | - | baidu | baidu |
| DAL-Tiny | 67.4 | 71.3 | - | 16.6 | baidu | baidu |
| DAL-Base | 70.0 | 73.4 | - | 10.7 | baidu | baidu |
| DAL-Large | 71.5 | 74.0 | - | 6.10 | baidu | baidu |

#: align the previous frame's BEV feature during the view transformation.

Depth: depth supervision from LiDAR, as in BEVDepth.

Longterm: concatenate 8 history frames in temporal modeling (1 frame by default).

Stereo: a private implementation that concatenates the cost volume with the image feature before executing model.view_transformer.depth_net.

Latency is reported as Network/Post-Processing/Total. Training without CBGS is deprecated.
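
As a rough sanity check when reading the table, the FPS column is approximately the reciprocal of the total latency; a one-line illustration of that arithmetic (values taken from the BEVDet-R50-CBGS row):

# 28.9/4.3/33.2 ms -> about 1000 / 33.2 ≈ 30.1 FPS
python -c "print(1000 / 33.2)"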

nuScenes Occupancy

| Config | mIoU | Model | Log |
| --- | --- | --- | --- |
| BEVDet-Occ-R50-4D-Stereo-2x | 36.1 | baidu | baidu |
| BEVDet-Occ-R50-4D-Stereo-2x-384x704 | 37.3 | baidu | baidu |
| BEVDet-Occ-R50-4DLongterm-Stereo-2x-384x704 | 39.3 | baidu | baidu |
| BEVDet-Occ-STBase-4D-Stereo-2x | 42.0 | baidu | baidu |

Inference latency (ms) with different backends

| Backend | 256x704 | 384x1056 | 512x1408 | 640x1760 |
| --- | --- | --- | --- | --- |
| PyTorch | 28.9 | 49.7 | 78.7 | 113.4 |
| TensorRT | 14.0 | 22.8 | 36.5 | 53.0 |
| TensorRT-FP16 | 4.94 | 7.96 | 12.4 | 17.9 |
| TensorRT-INT8 | 2.93 | 4.41 | 6.58 | 9.19 |
| TensorRT-INT8 (Xavier) | 25.0 | - | - | - |
  • Evaluated with BEVDet-R50-CBGS on an RTX 3090 GPU by default. Post-processing is omitted; it takes up to 5 ms with the PyTorch backend.
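
For a quick sense of the relative speedups at 256x704, the ratios against the PyTorch backend can be read directly off the table; illustrative arithmetic only:

# 28.9 ms (PyTorch) vs. 14.0 / 4.94 / 2.93 ms -> roughly 2.1x / 5.9x / 9.9x faster
python -c "print(28.9/14.0, 28.9/4.94, 28.9/2.93)"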

Get Started

Installation and Data Preparation

Step 1. Prepare the environment as described in the Docker setup.

Step 2. Prepare the BEVDet repo by running:

git clone https://github.com/HuangJunJie2017/BEVDet.git
cd BEVDet
pip install -v -e .
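
An optional sanity check that the editable install succeeded, assuming the repo installs the mmdet3d package as in upstream mmdetection3d:

python -c "import mmdet3d; print(mmdet3d.__version__)"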

Step 3. Prepare the nuScenes dataset as introduced in nuscenes_det.md and create the pkl files for BEVDet by running:

python tools/create_data_bevdet.py
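
The script follows the standard mmdetection3d data layout and writes the BEVDet info .pkl files next to the dataset. The exact filenames depend on the repo version, so simply listing them is a reasonable check (data/nuscenes is the assumed dataset root):

ls data/nuscenes/*.pkl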

Step 4. For the occupancy prediction task, download (only) the 'gts' from CVPR2023-3D-Occupancy-Prediction and arrange the folder as follows:

└── nuscenes
    ├── v1.0-trainval (existing)
    ├── sweeps  (existing)
    ├── samples (existing)
    └── gts (new)
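
A small optional check that all four sub-folders are in place before training; the data/nuscenes root is an assumption, so adjust it to your setup:

for d in v1.0-trainval sweeps samples gts; do
    [ -d data/nuscenes/$d ] && echo "found: $d" || echo "missing: $d"
done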

Train model

# single gpu
python tools/train.py $config
# multiple gpu
./tools/dist_train.sh $config num_gpu
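
For example, training BEVDet-R50 on 8 GPUs would look like the following; the config path matches the FLOPs example below, while the GPU count is illustrative:

# illustrative: 8 GPUs, BEVDet-R50 config
./tools/dist_train.sh configs/bevdet/bevdet-r50.py 8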

Test model

# single gpu
python tools/test.py $config $checkpoint --eval mAP
# multiple gpu
./tools/dist_test.sh $config $checkpoint num_gpu --eval mAP
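
A concrete example with an illustrative checkpoint path (work_dirs/... is hypothetical; substitute your own checkpoint):

# illustrative paths; substitute your own config and checkpoint
./tools/dist_test.sh configs/bevdet/bevdet-r50.py work_dirs/bevdet-r50/epoch_24.pth 8 --eval mAP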

Estimate the inference speed of BEVDet

# with pre-computation acceleration
python tools/analysis_tools/benchmark.py $config $checkpoint --fuse-conv-bn
# 4D with pre-computation acceleration
python tools/analysis_tools/benchmark_sequential.py $config $checkpoint --fuse-conv-bn
# view transformer only
python tools/analysis_tools/benchmark_view_transformer.py $config $checkpoint

Estimate the flops of BEVDet

python tools/analysis_tools/get_flops.py configs/bevdet/bevdet-r50.py --shape 256 704

Visualize the predicted results

  • Private implementation (visualization can be done remotely or locally):
python tools/test.py $config $checkpoint --format-only --eval-options jsonfile_prefix=$savepath
python tools/analysis_tools/vis.py $savepath/pts_bbox/results_nusc.json
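
A concrete example with illustrative paths ($savepath replaced by a hypothetical vis_results directory, and the config/checkpoint are placeholders):

# illustrative paths; substitute your own config, checkpoint, and output directory
python tools/test.py configs/bevdet/bevdet-r50.py work_dirs/bevdet-r50/epoch_24.pth --format-only --eval-options jsonfile_prefix=vis_results
python tools/analysis_tools/vis.py vis_results/pts_bbox/results_nusc.json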

Convert to TensorRT and test inference speed.

1. Install mmdeploy from https://github.com/HuangJunJie2017/mmdeploy
2. Convert to TensorRT:
python tools/convert_bevdet_to_TRT.py $config $checkpoint $work_dir --fuse-conv-bn --fp16 --int8
3. Test the inference speed:
python tools/analysis_tools/benchmark_trt.py $config $engine

Acknowledgement

This project would not be possible without multiple great open-source codebases. We list some notable examples below.

Besides, there are other attractive works that extend the boundary of BEVDet.

Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entries.

@article{huang2023dal,
  title={Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection},
  author={Huang, Junjie and Ye, Yun and Liang, Zhujin and Shan, Yi and Du, Dalong},
  journal={arXiv preprint arXiv:2311.07152},
  year={2023}
}

@article{huang2022bevpoolv2,
  title={BEVPoolv2: A Cutting-edge Implementation of BEVDet Toward Deployment},
  author={Huang, Junjie and Huang, Guan},
  journal={arXiv preprint arXiv:2211.17111},
  year={2022}
}

@article{huang2022bevdet4d,
  title={BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection},
  author={Huang, Junjie and Huang, Guan},
  journal={arXiv preprint arXiv:2203.17054},
  year={2022}
}

@article{huang2021bevdet,
  title={BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View},
  author={Huang, Junjie and Huang, Guan and Zhu, Zheng and Yun, Ye and Du, Dalong},
  journal={arXiv preprint arXiv:2112.11790},
  year={2021}
}