RoboBEV Benchmark

April 6, 2023

The official nuScenes metrics are considered in our benchmark:

Average Precision (AP)

The average precision (AP) defines a match by thresholding the 2D center distance $d$ on the ground plane instead of the intersection over union (IoU). This is done in order to decouple detection from object size and orientation, but also because objects with small footprints, like pedestrians and bikes, give $0$ IoU if detected with even a small translation error. We then calculate AP as the normalized area under the precision-recall curve for recall and precision over $10\%$. Operating points where recall or precision is less than $10\%$ are removed in order to minimize the impact of noise commonly seen in low precision and recall regions. If no operating point in this region is achieved, the AP for that class is set to zero. We then average over matching thresholds of $\mathbb{D}=\{0.5, 1, 2, 4\}$ meters and the set of classes $\mathbb{C}$:

$$\text{mAP} = \frac{1}{|\mathbb{C}||\mathbb{D}|}\sum_{c\in\mathbb{C}}\sum_{d\in\mathbb{D}}\text{AP}_{c,d} .$$
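
The AP and mAP definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's evaluation code: the function names are hypothetical, and the sampling of the precision curve at 101 evenly spaced recall values (0.00, 0.01, ..., 1.00) is an assumption made for the sketch.

```python
import numpy as np


def average_precision(precision, min_recall=0.1, min_precision=0.1):
    """Normalized area under the precision-recall curve above the 10% cut-offs.

    Assumption: `precision` holds 101 values sampled at recall 0.00, 0.01, ..., 1.00.
    """
    prec = np.asarray(precision, dtype=float)
    assert prec.shape == (101,), "expected precision sampled at 101 recall points"
    # Drop operating points whose recall is not above min_recall.
    start = int(round(100 * min_recall)) + 1
    # Subtract the precision floor and clip, so points below it contribute nothing.
    prec = np.clip(prec[start:] - min_precision, 0.0, None)
    # Normalize so that a perfect detector scores 1.
    return float(prec.mean()) / (1.0 - min_precision)


def mean_ap(ap_per_class):
    """Average AP over all classes c and all matching thresholds d.

    `ap_per_class` maps class name -> {threshold in meters: AP}; every class
    is assumed to provide the same set of thresholds.
    """
    values = [ap for per_thr in ap_per_class.values() for ap in per_thr.values()]
    return sum(values) / len(values)


# Illustrative values only, not results from the benchmark.
example = {
    "car": {0.5: 0.2, 1.0: 0.4, 2.0: 0.6, 4.0: 0.8},
    "pedestrian": {0.5: 0.0, 1.0: 0.0, 2.0: 0.0, 4.0: 0.0},
}
example_map = mean_ap(example)
```

A class that never reaches an operating point above both cut-offs yields an all-zero clipped curve, so its AP is zero, which matches the rule stated above.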

True Positive (TP)

All TP metrics are calculated using a $d=2$ m center distance during matching, and they are all designed to be positive scalars. Matching and scoring happen independently per class, and each metric is the average of the cumulative mean at each achieved recall level above $10\%$. If $10\%$ recall is not achieved for a particular class, all TP errors for that class are set to $1$.

  • Average Translation Error (ATE) is the Euclidean center distance in 2D (units in meters).
  • Average Scale Error (ASE) is $1 - \text{IoU}$, where IoU is the 3D intersection over union computed after aligning orientation and translation.
  • Average Orientation Error (AOE) is the smallest yaw angle difference between prediction and ground truth (radians). All angles are measured on a full $360$-degree period except for barriers, where they are measured on a $180$-degree period.
  • Average Velocity Error (AVE) is the absolute velocity error as the L2 norm of the velocity differences in 2D (m/s).
  • Average Attribute Error (AAE) is defined as $1$ minus attribute classification accuracy ($1 - \text{acc}$).
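
The per-match errors behind the five TP metrics can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names and the box representation (a 2D center, a `(width, length, height)` size tuple, and a yaw angle in radians) are hypothetical, and the averaging over recall levels described above is omitted.

```python
import math


def translation_error(pred_xy, gt_xy):
    """Euclidean center distance in 2D (meters)."""
    return math.hypot(pred_xy[0] - gt_xy[0], pred_xy[1] - gt_xy[1])


def scale_error(pred_wlh, gt_wlh):
    """1 - IoU of two boxes that share the same center and orientation."""
    # With aligned boxes, the intersection is the box of per-axis minima.
    inter = math.prod(min(p, g) for p, g in zip(pred_wlh, gt_wlh))
    union = math.prod(pred_wlh) + math.prod(gt_wlh) - inter
    return 1.0 - inter / union


def orientation_error(pred_yaw, gt_yaw, period=2 * math.pi):
    """Smallest absolute yaw difference (radians) for the given period.

    Use period=math.pi for classes measured on a 180-degree period (barriers).
    """
    diff = (pred_yaw - gt_yaw + period / 2) % period - period / 2
    return abs(diff)


def velocity_error(pred_vxy, gt_vxy):
    """L2 norm of the 2D velocity difference (m/s)."""
    return math.hypot(pred_vxy[0] - gt_vxy[0], pred_vxy[1] - gt_vxy[1])


def attribute_error(pred_attr, gt_attr):
    """1 - accuracy for a single match: 0 if the attribute is right, else 1."""
    return 0.0 if pred_attr == gt_attr else 1.0
```

For example, a barrier predicted with a yaw of $0$ against a ground truth of $\pi$ has zero orientation error on the $180$-degree period, while the same pair on the full period would be off by $\pi$.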

nuScenes Detection Score (NDS)

mAP with a threshold on IoU is perhaps the most popular metric for object detection. However, this metric cannot capture all aspects of the nuScenes detection tasks, like velocity and attribute estimation. Further, it couples location, size, and orientation estimates. nuScenes instead proposed consolidating the different error types into a single scalar score:

$$\text{NDS} = \frac{1}{10} \left[5\,\text{mAP}+\sum_{\text{mTP}\in\mathbb{TP}} \left(1-\min(1, \text{mTP})\right)\right] .$$
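
As a quick check of the formula, the sketch below recomputes NDS from the mAP and the five mean TP errors. The function name `nds` is hypothetical; the input numbers are the Clean row of the SOLOFusion-Short-Only table in this document.

```python
def nds(mean_ap, mean_tp_errors):
    """nuScenes Detection Score from mAP and the five mean TP errors.

    Each TP error is clipped to 1 before being turned into a score, so an
    error above 1 (possible for mAVE, for instance) contributes 0, not a
    negative value.
    """
    tp_scores = [1.0 - min(1.0, err) for err in mean_tp_errors]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Clean row: mAP, then (mATE, mASE, mAOE, mAVE, mAAE).
clean_nds = nds(0.3438, [0.6691, 0.2809, 0.6638, 0.8803, 0.3180])
print(f"{clean_nds:.4f}")
```

The result agrees with the reported Clean NDS of 0.3907 to four decimal places. Because of the clipping, NDS is not linear in the TP errors, so applying the formula to a row of averaged metrics does not in general reproduce the average of the per-severity NDS values.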

SOLOFusion-Short-Only

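| Corruption | NDS | mAP | mATE | mASE | mAOE | mAVE | mAAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Clean | 0.3907 | 0.3438 | 0.6691 | 0.2809 | 0.6638 | 0.8803 | 0.3180 |
| Cam Crash | 0.2541 | 0.1132 | 0.7542 | 0.2848 | 0.7337 | 0.9248 | 0.3273 |
| Frame Lost | 0.2195 | 0.0848 | 0.8066 | 0.3285 | 0.7407 | 1.0092 | 0.3785 |
| Color Quant | 0.2804 | 0.2013 | 0.7790 | 0.3214 | 0.7702 | 0.9825 | 0.3706 |
| Motion Blur | 0.2603 | 0.1717 | 0.8145 | 0.2968 | 0.8353 | 0.9831 | 0.3414 |
| Brightness | 0.2966 | 0.2339 | 0.7497 | 0.3258 | 0.8038 | 1.0663 | 0.3433 |
| Low Light | 0.2033 | 0.1138 | 0.7744 | 0.3716 | 0.9146 | 1.1518 | 0.4757 |
| Fog | 0.2998 | 0.2260 | 0.7556 | 0.2908 | 0.7761 | 1.0074 | 0.3238 |
| Snow | 0.1066 | 0.0427 | 0.9399 | 0.5888 | 0.9026 | 1.1212 | 0.7160 |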

Experiment Log

Time: Thu Apr 6 15:58:22 2023

Camera Crash

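| Severity | NDS | mAP | mATE | mASE | mAOE | mAVE | mAAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Easy | 0.2946 | 0.1805 | 0.7143 | 0.2844 | 0.7247 | 0.9125 | 0.3202 |
| Moderate | 0.2296 | 0.0794 | 0.7853 | 0.2846 | 0.7400 | 0.9578 | 0.3334 |
| Hard | 0.2382 | 0.0797 | 0.7630 | 0.2853 | 0.7363 | 0.9042 | 0.3282 |
| Average | 0.2541 | 0.1132 | 0.7542 | 0.2848 | 0.7337 | 0.9248 | 0.3273 |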
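
The Average row appears to be the plain arithmetic mean of the Easy, Moderate, and Hard rows, computed per column. The source does not state this explicitly, so treat it as an inference from the numbers. The sketch below checks it on the Camera Crash results; the variable names are hypothetical.

```python
# Columns: NDS, mAP, mATE, mASE, mAOE, mAVE, mAAE (Camera Crash results).
severity_rows = {
    "Easy":     [0.2946, 0.1805, 0.7143, 0.2844, 0.7247, 0.9125, 0.3202],
    "Moderate": [0.2296, 0.0794, 0.7853, 0.2846, 0.7400, 0.9578, 0.3334],
    "Hard":     [0.2382, 0.0797, 0.7630, 0.2853, 0.7363, 0.9042, 0.3282],
}
reported_average = [0.2541, 0.1132, 0.7542, 0.2848, 0.7337, 0.9248, 0.3273]

# Column-wise mean over the three severity levels.
computed_average = [
    sum(column) / len(column) for column in zip(*severity_rows.values())
]

# Largest gap between the recomputed and the reported averages.
max_gap = max(abs(c - r) for c, r in zip(computed_average, reported_average))
print(f"max gap: {max_gap:.5f}")
```

Every column matches the reported row to within rounding at the fourth decimal place.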

Frame Lost

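| Severity | NDS | mAP | mATE | mASE | mAOE | mAVE | mAAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Easy | 0.3019 | 0.1932 | 0.7177 | 0.2832 | 0.7040 | 0.9245 | 0.3178 |
| Moderate | 0.2082 | 0.0535 | 0.8102 | 0.2889 | 0.7615 | 1.0239 | 0.3254 |
| Hard | 0.1484 | 0.0076 | 0.8918 | 0.4135 | 0.7566 | 1.0792 | 0.4924 |
| Average | 0.2195 | 0.0848 | 0.8066 | 0.3285 | 0.7407 | 1.0092 | 0.3785 |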

Color Quant

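| Severity | NDS | mAP | mATE | mASE | mAOE | mAVE | mAAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Easy | 0.3686 | 0.3081 | 0.6873 | 0.2832 | 0.6657 | 0.9008 | 0.3174 |
| Moderate | 0.2945 | 0.2136 | 0.7636 | 0.2928 | 0.7506 | 0.9831 | 0.3327 |
| Hard | 0.1781 | 0.0822 | 0.8860 | 0.3883 | 0.8942 | 1.0636 | 0.4618 |
| Average | 0.2804 | 0.2013 | 0.7790 | 0.3214 | 0.7702 | 0.9825 | 0.3706 |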

Motion Blur

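| Severity | NDS | mAP | mATE | mASE | mAOE | mAVE | mAAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Easy | 0.3518 | 0.2891 | 0.7071 | 0.2851 | 0.7170 | 0.9035 | 0.3147 |
| Moderate | 0.2321 | 0.1369 | 0.8425 | 0.2990 | 0.8748 | 1.0024 | 0.3471 |
| Hard | 0.1969 | 0.0892 | 0.8939 | 0.3064 | 0.9141 | 1.0433 | 0.3624 |
| Average | 0.2603 | 0.1717 | 0.8145 | 0.2968 | 0.8353 | 0.9831 | 0.3414 |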

Brightness

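| Severity | NDS | mAP | mATE | mASE | mAOE | mAVE | mAAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Easy | 0.3537 | 0.2970 | 0.7028 | 0.2908 | 0.7037 | 0.9436 | 0.3070 |
| Moderate | 0.2955 | 0.2234 | 0.7582 | 0.3054 | 0.7921 | 1.1070 | 0.3068 |
| Hard | 0.2406 | 0.1814 | 0.7881 | 0.3812 | 0.9157 | 1.1483 | 0.4162 |
| Average | 0.2966 | 0.2339 | 0.7497 | 0.3258 | 0.8038 | 1.0663 | 0.3433 |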

Low Light

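| Severity | NDS | mAP | mATE | mASE | mAOE | mAVE | mAAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Easy | 0.2554 | 0.1644 | 0.7548 | 0.3055 | 0.8403 | 1.0493 | 0.3674 |
| Moderate | 0.2030 | 0.1197 | 0.7744 | 0.3802 | 0.9359 | 1.1085 | 0.4777 |
| Hard | 0.1514 | 0.0573 | 0.7939 | 0.4291 | 0.9676 | 1.2976 | 0.5819 |
| Average | 0.2033 | 0.1138 | 0.7744 | 0.3716 | 0.9146 | 1.1518 | 0.4757 |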

Fog

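| Severity | NDS | mAP | mATE | mASE | mAOE | mAVE | mAAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Easy | 0.3236 | 0.2555 | 0.7314 | 0.2873 | 0.7604 | 0.9569 | 0.3051 |
| Moderate | 0.2992 | 0.2278 | 0.7558 | 0.2908 | 0.7749 | 1.0120 | 0.3258 |
| Hard | 0.2767 | 0.1948 | 0.7796 | 0.2942 | 0.7931 | 1.0534 | 0.3406 |
| Average | 0.2998 | 0.2260 | 0.7556 | 0.2908 | 0.7761 | 1.0074 | 0.3238 |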

Snow

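| Severity | NDS | mAP | mATE | mASE | mAOE | mAVE | mAAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Easy | 0.1782 | 0.0836 | 0.8725 | 0.3855 | 0.8876 | 1.0505 | 0.4901 |
| Moderate | 0.0702 | 0.0236 | 0.9631 | 0.7231 | 0.8706 | 1.1304 | 0.8590 |
| Hard | 0.0714 | 0.0210 | 0.9842 | 0.6578 | 0.9497 | 1.1827 | 0.7990 |
| Average | 0.1066 | 0.0427 | 0.9399 | 0.5888 | 0.9026 | 1.1212 | 0.7160 |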

References

@inproceedings{Park2022TimeWT,
  title={Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection},
  author={Park, Jinhyung and Xu, Chenfeng and Yang, Shijia and Keutzer, Kurt and Kitani, Kris and Tomizuka, Masayoshi and Zhan, Wei},
  booktitle={International Conference on Learning Representations},
  year={2023}
}