


Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving

Shaoyuan Xie¹   Lingdong Kong²,³   Wenwei Zhang²,⁴   Jiawei Ren⁴   Liang Pan²   Kai Chen²   Ziwei Liu⁴
¹University of California, Irvine   ²Shanghai AI Laboratory   ³National University of Singapore   ⁴S-Lab, Nanyang Technological University

About

RoboBEV is the first robustness evaluation benchmark tailored for camera-based bird's eye view (BEV) perception under natural data corruption and domain shift, both of which are highly likely to occur in real-world deployments.

[Common Corruption] - We investigate eight data corruption types that are likely to appear in driving scenarios, grouped into four categories: 1) sensor failure, 2) motion & data processing, 3) lighting conditions, and 4) weather conditions.

[Domain Shift] - We benchmark the adaptation performance of BEV models from three aspects: 1) city-to-city, 2) day-to-night, and 3) dry-to-rain.

[Figure: Example corruptions across the six surround-view cameras (FRONT_LEFT, FRONT, FRONT_RIGHT, BACK_LEFT, BACK, BACK_RIGHT).]

Visit our project page to explore more examples. :blue_car:

:books: Citation

If you find this work helpful, please kindly consider citing the following:

@article{xie2025benchmarking,
    title     = {Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving},
    author    = {Xie, Shaoyuan and Kong, Lingdong and Zhang, Wenwei and Ren, Jiawei and Pan, Liang and Chen, Kai and Liu, Ziwei},
    journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
    volume    = {47},
    number    = {5},
    pages     = {3878--3894},
    year      = {2025}
}
@article{xie2023robobev,
    title     = {{RoboBEV}: Towards Robust Bird's Eye View Perception under Corruptions},
    author    = {Xie, Shaoyuan and Kong, Lingdong and Zhang, Wenwei and Ren, Jiawei and Pan, Liang and Chen, Kai and Liu, Ziwei},
    journal   = {arXiv preprint arXiv:2304.06719}, 
    year      = {2023}
}

Updates

  • [2024.06] - Check out our updated paper for robust BEV perception: Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving. :fuelpump:
  • [2024.05] - Check out the technical report of this competition: The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition. :blue_car:
  • [2024.05] - The slides of the 2024 RoboDrive Workshop are available here :arrow_heading_up:.
  • [2024.05] - The video recordings are available on YouTube :arrow_heading_up: and Bilibili :arrow_heading_up:.
  • [2024.05] - We are glad to announce the winning teams of the 2024 RoboDrive Challenge:
    • Track 1: Robust BEV Detection
      • :1st_place_medal: DeepVision, :2nd_place_medal: Ponyville Autonauts Ltd, :3rd_place_medal: CyberBEV
    • Track 2: Robust Map Segmentation
      • :1st_place_medal: SafeDrive-SSR, :2nd_place_medal: CrazyFriday, :3rd_place_medal: Samsung Research
    • Track 3: Robust Occupancy Prediction
      • :1st_place_medal: ViewFormer, :2nd_place_medal: APEC Blue, :3rd_place_medal: hm.unilab
    • Track 4: Robust Depth Estimation
      • :1st_place_medal: HIT-AIIA, :2nd_place_medal: BUAA-Trans, :3rd_place_medal: CUSTZS
    • Track 5: Robust Multi-Modal BEV Detection
      • :1st_place_medal: safedrive-promax, :2nd_place_medal: Ponyville Autonauts Ltd, :3rd_place_medal: HITSZrobodrive
  • [2024.01] - The toolkit tailored for the 2024 RoboDrive Challenge has been released. :hammer_and_wrench:
  • [2023.12] - We are hosting the RoboDrive Challenge at ICRA 2024. :blue_car:
  • [2023.06] - The nuScenes-C dataset is now available at OpenDataLab! 🚀
  • [2023.04] - We establish "Robust BEV Perception" leaderboards on Papers with Code. Join the challenge today! :raising_hand:
  • [2023.02] - We invite every BEV enthusiast to participate in the robust BEV perception benchmark! For more details, please read this page. :beers:
  • [2023.01] - Launch of RoboBEV! In this initial version, 11 BEV detection algorithms and 1 monocular 3D detection algorithm have been benchmarked under 8 corruption types across 3 severity levels.

Outline

  • Installation
  • Data Preparation
  • Getting Started
  • Model Zoo
  • Robustness Benchmark
  • BEV Model Calibration
  • Create Corruption Set
  • TODO List
  • License
  • Acknowledgements

Installation

Kindly refer to INSTALL.md for the installation details.

Data Preparation

Our datasets are hosted by OpenDataLab.


OpenDataLab is a pioneering open data platform for the large AI model era, making datasets accessible. By using OpenDataLab, researchers can obtain free formatted datasets in various fields.

Kindly refer to DATA_PREPARE.md for the details to prepare the nuScenes and nuScenes-C datasets.

Getting Started

Kindly refer to GET_STARTED.md to learn more about the usage of this codebase.

Model Zoo

  • Camera-Only BEV Detection
  • Camera-Only Monocular 3D Detection
  • LiDAR-Camera Fusion BEV Detection
  • Camera-Only BEV Map Segmentation
  • Multi-Camera Depth Estimation
  • Multi-Camera Semantic Occupancy Prediction

Robustness Benchmark

:triangular_ruler: Metrics: The nuScenes Detection Score (NDS) is consistently used as the main indicator for evaluating model performance in our benchmark. The following two metrics are adopted to compare the robustness of different models:

  • mCE (the lower the better): the average corruption error (in percentage) of a candidate model relative to the baseline model, computed over all corruption types and three severity levels; see the definitions after this list.
  • mRR (the higher the better): the average resilience rate (in percentage) of a candidate model relative to its "clean" performance, computed over all corruption types and three severity levels; see the definitions after this list.
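Written out in our notation (which, to our reading, follows the ImageNet-C convention; kindly refer to the paper for the formal definitions), with $\mathrm{NDS}_{i,l}$ denoting the score under corruption type $i$ at severity level $l$, and $N = 8$ corruption types:

```latex
\mathrm{CE}_i =
  \frac{\sum_{l=1}^{3} \left( 1 - \mathrm{NDS}_{i,l} \right)}
       {\sum_{l=1}^{3} \left( 1 - \mathrm{NDS}_{i,l}^{\mathrm{baseline}} \right)} \times 100\,\% ,
\qquad
\mathrm{mCE} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{CE}_i ,

\mathrm{RR}_i =
  \frac{\sum_{l=1}^{3} \mathrm{NDS}_{i,l}}{3 \times \mathrm{NDS}_{\mathrm{clean}}} \times 100\,\% ,
\qquad
\mathrm{mRR} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{RR}_i .
```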

:gear: Notation: Symbol :star: denotes the baseline model adopted in mCE calculation. For more detailed experimental results, please refer to RESULTS.md.
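As a quick sanity check of the mRR definition, the snippet below reproduces the mRR of the DETR3D baseline from its row in the BEV Detection table that follows (the per-corruption NDS values in the table are already averaged over the three severity levels):

```python
# Reproduce mRR for the DETR3D baseline from the benchmark table below.
clean_nds = 0.4224
corrupted_nds = [  # per-corruption NDS, each averaged over 3 severity levels
    0.2859,  # Cam Crash
    0.2604,  # Frame Lost
    0.3177,  # Color Quant
    0.2661,  # Motion Blur
    0.4002,  # Bright
    0.2786,  # Low Light
    0.3912,  # Fog
    0.1913,  # Snow
]

mrr = 100 * sum(corrupted_nds) / (len(corrupted_nds) * clean_nds)
print(f"mRR = {mrr:.2f}%")  # -> mRR = 70.77%, matching the table
```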

BEV Detection

| Model | mCE (%) $\downarrow$ | mRR (%) $\uparrow$ | Clean | Cam Crash | Frame Lost | Color Quant | Motion Blur | Bright | Low Light | Fog | Snow |
|:-|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| DETR3D :star: | 100.00 | 70.77 | 0.4224 | 0.2859 | 0.2604 | 0.3177 | 0.2661 | 0.4002 | 0.2786 | 0.3912 | 0.1913 |
| DETR3D<sub>CBGS</sub> | 99.21 | 70.02 | 0.4341 | 0.2991 | 0.2685 | 0.3235 | 0.2542 | 0.4154 | 0.2766 | 0.4020 | 0.1925 |
| BEVFormer<sub>Small</sub> | 101.23 | 59.07 | 0.4787 | 0.2771 | 0.2459 | 0.3275 | 0.2570 | 0.3741 | 0.2413 | 0.3583 | 0.1809 |
| BEVFormer<sub>Base</sub> | 97.97 | 60.40 | 0.5174 | 0.3154 | 0.3017 | 0.3509 | 0.2695 | 0.4184 | 0.2515 | 0.4069 | 0.1857 |
| PETR<sub>R50-p4</sub> | 111.01 | 61.26 | 0.3665 | 0.2320 | 0.2166 | 0.2472 | 0.2299 | 0.2841 | 0.1571 | 0.2876 | 0.1417 |
| PETR<sub>VoV-p4</sub> | 100.69 | 65.03 | 0.4550 | 0.2924 | 0.2792 | 0.2968 | 0.2490 | 0.3858 | 0.2305 | 0.3703 | 0.2632 |
| ORA3D | 99.17 | 68.63 | 0.4436 | 0.3055 | 0.2750 | 0.3360 | 0.2647 | 0.4075 | 0.2613 | 0.3959 | 0.1898 |
| BEVDet<sub>R50</sub> | 115.12 | 51.83 | 0.3770 | 0.2486 | 0.1924 | 0.2408 | 0.2061 | 0.2565 | 0.1102 | 0.2461 | 0.0625 |
| BEVDet<sub>R101</sub> | 113.68 | 53.12 | 0.3877 | 0.2622 | 0.2065 | 0.2546 | 0.2265 | 0.2554 | 0.1118 | 0.2495 | 0.0810 |
| BEVDet<sub>R101-pt</sub> | 112.80 | 56.35 | 0.3780 | 0.2442 | 0.1962 | 0.3041 | 0.2590 | 0.2599 | 0.1398 | 0.2073 | 0.0939 |
| BEVDet<sub>SwinT</sub> | 116.48 | 46.26 | 0.4037 | 0.2609 | 0.2115 | 0.2278 | 0.2128 | 0.2191 | 0.0490 | 0.2450 | 0.0680 |
| BEVDepth<sub>R50</sub> | 110.02 | 56.82 | 0.4058 | 0.2638 | 0.2141 | 0.2751 | 0.2513 | 0.2879 | 0.1757 | 0.2903 | 0.0863 |
| BEVerse<sub>SwinT</sub> | 110.67 | 48.60 | 0.4665 | 0.3181 | 0.3037 | 0.2600 | 0.2647 | 0.2656 | 0.0593 | 0.2781 | 0.0644 |
| BEVerse<sub>SwinS</sub> | 117.82 | 49.57 | 0.4951 | 0.3364 | 0.2485 | 0.2807 | 0.2632 | 0.3394 | 0.1118 | 0.2849 | 0.0985 |
| PolarFormer<sub>R101</sub> | 96.06 | 70.88 | 0.4602 | 0.3133 | 0.2808 | 0.3509 | 0.3221 | 0.4304 | 0.2554 | 0.4262 | 0.2304 |
| PolarFormer<sub>VoV</sub> | 98.75 | 67.51 | 0.4558 | 0.3135 | 0.2811 | 0.3076 | 0.2344 | 0.4280 | 0.2441 | 0.4061 | 0.2468 |
| SRCN3D<sub>R101</sub> | 99.67 | 70.23 | 0.4286 | 0.2947 | 0.2681 | 0.3318 | 0.2609 | 0.4074 | 0.2590 | 0.3940 | 0.1920 |
| SRCN3D<sub>VoV</sub> | 102.04 | 67.95 | 0.4205 | 0.2875 | 0.2579 | 0.2827 | 0.2143 | 0.3886 | 0.2274 | 0.3774 | 0.2499 |
| Sparse4D<sub>R101</sub> | 100.01 | 55.04 | 0.5438 | 0.2873 | 0.2611 | 0.3310 | 0.2514 | 0.3984 | 0.2510 | 0.3884 | 0.2259 |
| SOLOFusion<sub>short</sub> | 108.68 | 61.45 | 0.3907 | 0.2541 | 0.2195 | 0.2804 | 0.2603 | 0.2966 | 0.2033 | 0.2998 | 0.1066 |
| SOLOFusion<sub>long</sub> | 97.99 | 64.42 | 0.4850 | 0.3159 | 0.2490 | 0.3598 | 0.3460 | 0.4002 | 0.2814 | 0.3991 | 0.1480 |
| SOLOFusion<sub>fusion</sub> | 92.86 | 64.53 | 0.5381 | 0.3806 | 0.3464 | 0.4058 | 0.3642 | 0.4329 | 0.2626 | 0.4480 | 0.1376 |
| FCOS3D<sub>finetune</sub> | 107.82 | 62.09 | 0.3949 | 0.2849 | 0.2479 | 0.2574 | 0.2570 | 0.3218 | 0.1468 | 0.3321 | 0.1136 |
| BEVFusion<sub>Cam</sub> | 109.02 | 57.81 | 0.4121 | 0.2777 | 0.2255 | 0.2763 | 0.2788 | 0.2902 | 0.1076 | 0.3041 | 0.1461 |
| BEVFusion<sub>LiDAR</sub> | - | - | 0.6928 | - | - | - | - | - | - | - | - |
| BEVFusion<sub>C+L</sub> | 43.80 | 97.41 | 0.7138 | 0.6963 | 0.6931 | 0.7044 | 0.6977 | 0.7018 | 0.6787 | - | - |
| TransFusion | - | - | 0.6887 | 0.6843 | 0.6447 | 0.6819 | 0.6749 | 0.6843 | 0.6663 | - | - |
| AutoAlignV2 | - | - | 0.6139 | 0.5849 | 0.5832 | 0.6006 | 0.5901 | 0.6076 | 0.5770 | - | - |

Multi-Camera Depth Estimation

| Model | Metric | Clean | Cam Crash | Frame Lost | Color Quant | Motion Blur | Bright | Low Light | Fog | Snow |
|:-|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| SurroundDepth | Abs Rel | 0.280 | 0.485 | 0.497 | 0.334 | 0.338 | 0.339 | 0.354 | 0.320 | 0.423 |
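For reference, Abs Rel (the lower the better) is the standard absolute relative depth error, averaged over all valid pixels $p$ with ground-truth depth $d_p$ and predicted depth $\hat{d}_p$:

```latex
\mathrm{Abs\,Rel} = \frac{1}{|P|} \sum_{p \in P} \frac{\lvert \hat{d}_p - d_p \rvert}{d_p}
```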

Multi-Camera Semantic Occupancy Prediction

| Model | Metric | Clean | Cam Crash | Frame Lost | Color Quant | Motion Blur | Bright | Low Light | Fog | Snow |
|:-|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| TPVFormer | mIoU vox | 52.06 | 27.39 | 22.85 | 38.16 | 38.64 | 49.00 | 37.38 | 46.69 | 19.39 |
| SurroundOcc | SC mIoU | 20.30 | 11.60 | 10.00 | 14.03 | 12.41 | 19.18 | 12.15 | 18.42 | 7.39 |
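As we read them, both metrics are class-averaged intersection-over-union scores on the voxelized scene: "mIoU vox" is TPVFormer's voxel-level mIoU and "SC mIoU" is SurroundOcc's semantic scene-completion mIoU (see the respective papers for the exact protocols). For a class $c$:

```latex
\mathrm{IoU}_c = \frac{\mathrm{TP}_c}{\mathrm{TP}_c + \mathrm{FP}_c + \mathrm{FN}_c},
\qquad
\mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{IoU}_c
```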

BEV Model Calibration

| Model | Pretrain | Temporal | Depth | CBGS | Backbone | Encoder<sub>BEV</sub> | Input Size | mCE (%) | mRR (%) | NDS |
|:-|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| DETR3D | | | | | ResNet | Attention | 1600×900 | 100.00 | 70.77 | 0.4224 |
| DETR3D<sub>CBGS</sub> | | | | | ResNet | Attention | 1600×900 | 99.21 | 70.02 | 0.4341 |
| BEVFormer<sub>Small</sub> | | | | | ResNet | Attention | 1280×720 | 101.23 | 59.07 | 0.4787 |
| BEVFormer<sub>Base</sub> | | | | | ResNet | Attention | 1600×900 | 97.97 | 60.40 | 0.5174 |
| PETR<sub>R50-p4</sub> | | | | | ResNet | Attention | 1408×512 | 111.01 | 61.26 | 0.3665 |
| PETR<sub>VoV-p4</sub> | | | | | VoVNetV2 | Attention | 1600×900 | 100.69 | 65.03 | 0.4550 |
| ORA3D | | | | | ResNet | Attention | 1600×900 | 99.17 | 68.63 | 0.4436 |
| PolarFormer<sub>R101</sub> | | | | | ResNet | Attention | 1600×900 | 96.06 | 70.88 | 0.4602 |
| PolarFormer<sub>VoV</sub> | | | | | VoVNetV2 | Attention | 1600×900 | 98.75 | 67.51 | 0.4558 |
| SRCN3D<sub>R101</sub> | | | | | ResNet | CNN+Attn. | 1600×900 | 99.67 | 70.23 | 0.4286 |
| SRCN3D<sub>VoV</sub> | | | | | VoVNetV2 | CNN+Attn. | 1600×900 | 102.04 | 67.95 | 0.4205 |
| Sparse4D<sub>R101</sub> | | | | | ResNet | CNN+Attn. | 1600×900 | 100.01 | 55.04 | 0.5438 |
| BEVDet<sub>R50</sub> | | | | | ResNet | CNN | 704×256 | 115.12 | 51.83 | 0.3770 |
| BEVDet<sub>R101</sub> | | | | | ResNet | CNN | 704×256 | 113.68 | 53.12 | 0.3877 |
| BEVDet<sub>R101-pt</sub> | | | | | ResNet | CNN | 704×256 | 112.80 | 56.35 | 0.3780 |
| BEVDet<sub>SwinT</sub> | | | | | Swin | CNN | 704×256 | 116.48 | 46.26 | 0.4037 |
| BEVDepth<sub>R50</sub> | | | | | ResNet | CNN | 704×256 | 110.02 | 56.82 | 0.4058 |
| BEVerse<sub>SwinT</sub> | | | | | Swin | CNN | 704×256 | 137.25 | 28.24 | 0.1603 |
| BEVerse<sub>SwinT</sub> | | | | | Swin | CNN | 704×256 | 110.67 | 48.60 | 0.4665 |
| BEVerse<sub>SwinS</sub> | | | | | Swin | CNN | 1408×512 | 132.13 | 29.54 | 0.2682 |
| BEVerse<sub>SwinS</sub> | | | | | Swin | CNN | 1408×512 | 117.82 | 49.57 | 0.4951 |
| SOLOFusion<sub>short</sub> | | | | | ResNet | CNN | 704×256 | 108.68 | 61.45 | 0.3907 |
| SOLOFusion<sub>long</sub> | | | | | ResNet | CNN | 704×256 | 97.99 | 64.42 | 0.4850 |
| SOLOFusion<sub>fusion</sub> | | | | | ResNet | CNN | 704×256 | 92.86 | 64.53 | 0.5381 |

Note: Pretrain denotes models initialized from the FCOS3D checkpoint. Temporal indicates whether temporal information is used. Depth denotes models with an explicit depth estimation branch. CBGS denotes models that use the class-balanced group-sampling strategy.

Create Corruption Set

You can create your own "RoboBEV" corruption sets! Follow the instructions listed in CREATE.md.
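Before diving into CREATE.md, the sketch below illustrates the general idea using the third-party imagecorruptions package; this is an illustration only, not the official pipeline. The package covers common corruptions such as brightness, motion blur, fog, and snow, but not the sensor-failure cases (camera crash, frame lost), and the paths and severity mapping here are placeholder assumptions.

```python
# Illustrative only; not the official RoboBEV pipeline (see CREATE.md).
# Requires: pip install imagecorruptions pillow numpy
from pathlib import Path

import numpy as np
from imagecorruptions import corrupt
from PIL import Image

CORRUPTIONS = ["brightness", "motion_blur", "fog", "snow"]  # illustrative subset
SEVERITIES = [1, 3, 5]  # assumed mapping to the three benchmark severity levels


def corrupt_image(src: Path, dst_root: Path) -> None:
    """Save one corrupted copy of `src` per (corruption, severity) pair."""
    image = np.asarray(Image.open(src).convert("RGB"))
    for name in CORRUPTIONS:
        for severity in SEVERITIES:
            corrupted = corrupt(image, corruption_name=name, severity=severity)
            out = dst_root / name / f"severity_{severity}" / src.name
            out.parent.mkdir(parents=True, exist_ok=True)
            Image.fromarray(corrupted).save(out)


if __name__ == "__main__":
    # Hypothetical nuScenes layout; adjust to your local data root.
    for path in Path("data/nuscenes/samples/CAM_FRONT").glob("*.jpg"):
        corrupt_image(path, Path("data/nuscenes-c"))
```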

TODO List

  • Initial release. 🚀
  • Add scripts for creating common corruptions.
  • Add download link of nuScenes-C.
  • Add evaluation scripts on corruption sets.
  • Establish benchmark for BEV map segmentation.
  • Establish benchmark for multi-camera depth estimation.
  • Establish benchmark for multi-camera semantic occupancy prediction.
  • ...

License

Creative Commons License
This work is under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, while some specific components of this codebase might be under other licenses. Kindly refer to LICENSE.md for a more careful check if you are using our code for commercial purposes.

Acknowledgements

This work is developed based on the MMDetection3D codebase.


MMDetection3D is an open source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D detection. It is a part of the OpenMMLab project developed by MMLab.

:heart: We thank Jiangmiao Pang and Tai Wang for their insightful discussions and feedback. We thank the OpenDataLab platform for hosting our datasets.