Getting Started & Problem Definition

March 19, 2023

The goal of an Unsupervised Domain Adaptation (UDA) task is to learn a generalized model or backbone $F$ from a labeled source domain $s$ and an unlabeled target domain $t$, such that $F$ can be adapted to the new target domain $t$. Unlabeled training data (such as point clouds or images) from the target domain $t$ are assumed to be available during the adaptation process.
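
One common way to write this setting down is sketched below. The dataset symbols $\mathcal{D}_s$ and $\mathcal{D}_t$, the sample counts $N_s$ and $N_t$, the parameters $\theta$, and the detection loss $\mathcal{L}$ are notation introduced here for illustration only; they are not taken from the codebase.

```latex
\begin{align}
  \mathcal{D}_s &= \{(x_i^s, y_i^s)\}_{i=1}^{N_s}
    && \text{labeled source samples} \\
  \mathcal{D}_t &= \{x_j^t\}_{j=1}^{N_t}
    && \text{unlabeled target samples} \\
  \theta_s &= \arg\min_{\theta} \frac{1}{N_s} \sum_{i=1}^{N_s}
    \mathcal{L}\big(F_\theta(x_i^s),\, y_i^s\big)
    && \text{source-only pretraining}
\end{align}
```

The adaptation stage then starts from $\theta_s$ and updates the model using only the unlabeled samples in $\mathcal{D}_t$.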

Getting Started & Task Challenges

  • Different domains have inconsistent object-size distributions, as shown in the object-size statistics file. Statistical Normalization (SN) is therefore used to rescale object sizes during source-domain training: both the bounding box and the points inside it are rescaled. For Waymo-to-KITTI adaptation, we found that object-size variation is a major cause of the drop in cross-domain detection accuracy.

  • The number of LiDAR beams also varies across autonomous-driving (AD) manufacturers. For Waymo-to-nuScenes adaptation, we argue that LiDAR-beam variation is a major challenge, and we use the range map provided in the Waymo tfrecords to produce low-beam point clouds (such as 32-beam or 16-beam). Please refer to Results for more details.

    • For datasets that do not provide a range map, such as the ONCE dataset, one can apply a clustering algorithm to the height angles to obtain pseudo-labeled low-beam point clouds, which has also been verified to be effective in our codebase.
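
The height-angle clustering mentioned in the last item can be sketched as follows. This is a minimal illustration that assumes plain 1-D k-means on the elevation angle of each point; the function names, the choice of k-means, and the synthetic scan are assumptions made here and are not the codebase's implementation.

```python
import math


def elevation_angles(points):
    """Elevation (height) angle of each (x, y, z) point, in radians."""
    return [math.atan2(z, math.hypot(x, y)) for x, y, z in points]


def _assign(angles, centres):
    """Index of the nearest cluster centre for every angle."""
    return [min(range(len(centres)), key=lambda k: abs(a - centres[k]))
            for a in angles]


def cluster_beams(points, num_beams, iters=20):
    """1-D k-means on elevation angle; returns a pseudo beam index per point.

    Index 0 is the lowest beam. num_beams is assumed to be known for the sensor.
    """
    if not points:
        return []
    angles = elevation_angles(points)
    lo, hi = min(angles), max(angles)
    # Start with centres spread evenly over the observed angle range.
    centres = [lo + (hi - lo) * (k + 0.5) / num_beams for k in range(num_beams)]
    for _ in range(iters):
        labels = _assign(angles, centres)
        for k in range(num_beams):
            members = [a for a, lab in zip(angles, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centre
                centres[k] = sum(members) / len(members)
    centres.sort()  # make the beam index increase with elevation
    return _assign(angles, centres)


def keep_every_nth_beam(points, beam_ids, step):
    """Simulate a lower-beam sensor by keeping beams 0, step, 2*step, ..."""
    return [p for p, b in zip(points, beam_ids) if b % step == 0]


if __name__ == "__main__":
    # Synthetic scan: 4 beams at distinct elevation angles, 8 azimuths each.
    scan = [
        (10 * math.cos(el) * math.cos(az),
         10 * math.cos(el) * math.sin(az),
         10 * math.sin(el))
        for el in (-0.3, -0.1, 0.1, 0.3)
        for az in [i * math.pi / 4 for i in range(8)]
    ]
    ids = cluster_beams(scan, num_beams=4)
    low_beam = keep_every_nth_beam(scan, ids, step=2)
    print(len(scan), len(low_beam))
```

On real scans the beams are not evenly spaced and returns are noisy, so the initial centres and the number of iterations would likely need tuning.
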

   

Getting Started & Training-Testing for UDA Setting

Here, we take Waymo-to-KITTI adaptation as an example.

Pretraining stage: train the source-only model on the labeled source domain:

  • Train FEAT=3 (X,Y,Z) with SN (statistical normalization) using multiple GPUs
sh scripts/dist_train.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_old_anchor_sn_kitti.yaml
  • Train FEAT=3 (X,Y,Z) with SN (statistical normalization) using multiple machines
sh scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} ${NUM_NODES} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_old_anchor_sn_kitti.yaml
  • Train FEAT=3 (X,Y,Z) without SN (statistical normalization) using multiple GPUs
sh scripts/dist_train.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_feat_3_vehi.yaml
  • Train FEAT=3 (X,Y,Z) without SN (statistical normalization) using multiple machines
sh scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} ${NUM_NODES} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_feat_3_vehi.yaml
  • Train other baseline detectors such as PV-RCNN++ using multiple GPUs
sh scripts/dist_train.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pv_rcnn_plus_feat_3_vehi_full_train.yaml
  • Train other baseline detectors such as Voxel-RCNN using multiple GPUs
sh scripts/dist_train.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/voxel_rcnn_feat_3_vehi.yaml

Evaluate the source-pretrained model:

  • Note that for the cross-domain setting where the KITTI dataset is the target domain, try adding --set DATA_CONFIG_TAR.FOV_POINTS_ONLY True so that only the front-view point cloud is used. We report the best model over all epochs on the validation set.

  • Test the source-only models using multiple GPUs

sh scripts/dist_test.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_feat_3_vehi.yaml \
--ckpt ${CKPT} 
  • Test the source-only models using multiple machines
sh scripts/slurm_test_mgpu.sh ${PARTITION} ${NUM_NODES} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_feat_3_vehi.yaml \
--ckpt ${CKPT}
  • Test the source-only models of all ckpts using multiple GPUs
sh scripts/dist_test.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_feat_3_vehi.yaml \
--eval_all
  • Test the source-only models of all ckpts using multiple machines
sh scripts/slurm_test_mgpu.sh ${PARTITION} ${NUM_NODES} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_feat_3_vehi.yaml \
--eval_all

Adaptation stage: self-train the source model on the unlabeled target domain:

  • After finishing the pretraining stage above, set --pretrained_model ${PRETRAINED_MODEL} to the resulting checkpoint.

  • If you trained the source-only model with SN (statistical normalization), for example with pvrcnn_old_anchor_sn_kitti.yaml, run the pre-SN script as follows. Pre-SN means that the SN operation is performed before the adaptation stage.

  • Train FEAT=3 (X,Y,Z) with pre-SN (statistical normalization) using multiple machines

sh scripts/UDA/slurm_train_uda.sh ${PARTITION} ${JOB_NAME} ${NUM_NODES} ${QUOTATYPE} \
--cfg_file ./cfgs/DA/waymo_kitti/pvrcnn_pre_SN_feat_3.yaml \
--pretrained_model ${PRETRAINED_MODEL}
  • Train FEAT=3 (X,Y,Z) with pre-SN (statistical normalization) using multiple GPUs
sh scripts/UDA/dist_train_uda.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/pvrcnn_pre_SN_feat_3.yaml \
--pretrained_model ${PRETRAINED_MODEL}
  • If you trained the source-only model without SN (statistical normalization), run the post-SN script as follows. Post-SN means that the SN operation is performed during the adaptation stage.

  • Train FEAT=3 (X,Y,Z) with post-SN (statistical normalization) using multiple machines

sh scripts/UDA/slurm_train_uda.sh ${PARTITION} ${JOB_NAME} ${NUM_NODES} ${QUOTATYPE} \
--cfg_file ./cfgs/DA/waymo_kitti/pvrcnn_post_SN_feat_3.yaml \
--pretrained_model ${PRETRAINED_MODEL}
  • Train FEAT=3 (X,Y,Z) with post-SN (statistical normalization) using multiple GPUs
sh scripts/UDA/dist_train_uda.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/pvrcnn_post_SN_feat_3.yaml \
--pretrained_model ${PRETRAINED_MODEL}
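
The pairing between the pretraining configuration and the adaptation configuration described above can be summarised in a small helper. The sketch below only assembles the multi-GPU command lines already shown in this section and does not run them; the helper name build_commands and the checkpoint path in the usage example are hypothetical.

```python
# Config paths are copied from the commands in this section.
SOURCE_ONLY_CFG = {
    True: "./cfgs/DA/waymo_kitti/source_only/pvrcnn_old_anchor_sn_kitti.yaml",
    False: "./cfgs/DA/waymo_kitti/source_only/pvrcnn_feat_3_vehi.yaml",
}
ADAPTATION_CFG = {
    # SN already applied during pretraining -> pre-SN adaptation config
    True: "./cfgs/DA/waymo_kitti/pvrcnn_pre_SN_feat_3.yaml",
    # no SN during pretraining -> post-SN adaptation config
    False: "./cfgs/DA/waymo_kitti/pvrcnn_post_SN_feat_3.yaml",
}


def build_commands(num_gpus, pretrained_model, sn_in_pretraining):
    """Return (pretrain_cmd, adapt_cmd) as argument lists; nothing is executed."""
    pretrain = [
        "sh", "scripts/dist_train.sh", str(num_gpus),
        "--cfg_file", SOURCE_ONLY_CFG[sn_in_pretraining],
    ]
    adapt = [
        "sh", "scripts/UDA/dist_train_uda.sh", str(num_gpus),
        "--cfg_file", ADAPTATION_CFG[sn_in_pretraining],
        "--pretrained_model", pretrained_model,
    ]
    return pretrain, adapt


if __name__ == "__main__":
    # "path/to/source_only.pth" is a placeholder, not a file shipped with the repo.
    for cmd in build_commands(4, "path/to/source_only.pth", sn_in_pretraining=False):
        print(" ".join(cmd))
```
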

Evaluating the model on the target validation set:

  • Note that for the cross-domain setting where the KITTI dataset is the target domain, try adding --set DATA_CONFIG_TAR.FOV_POINTS_ONLY True so that only the front-view point cloud is used. We report the best model over all epochs on the validation set.

  • Test with a ckpt file:

python test.py \
--cfg_file ${CONFIG_FILE} \
--batch_size ${BATCH_SIZE} \
--ckpt ${CKPT}
  • To test all saved checkpoints of a specific training setting and plot the performance curve in TensorBoard, add the --eval_all argument:
python test.py \
--cfg_file ${CONFIG_FILE} \
--batch_size ${BATCH_SIZE} \
--eval_all
  • To test with multiple GPUs:
sh scripts/dist_test.sh ${NUM_GPUs} \
--cfg_file ${CONFIG_FILE} \
--batch_size ${BATCH_SIZE} \
--ckpt ${CKPT}
  • To test all ckpts with multiple GPUs
sh scripts/dist_test.sh ${NUM_GPUs} \
--cfg_file ${CONFIG_FILE} \
--batch_size ${BATCH_SIZE} \
--eval_all
  • To test with multiple machines:
sh scripts/slurm_test_mgpu.sh ${PARTITION} ${NUM_NODES} \
    --cfg_file ${CONFIG_FILE} --batch_size ${BATCH_SIZE} --ckpt ${CKPT}
  • To test all ckpts with multiple machines:
sh scripts/slurm_test_mgpu.sh ${PARTITION} ${NUM_NODES} \
    --cfg_file ${CONFIG_FILE} --batch_size ${BATCH_SIZE} --eval_all

   

All UDA Results:

We report the cross-dataset adaptation results including Waymo-to-KITTI, nuScenes-to-KITTI, and Waymo-to-nuScenes.

  • All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
  • The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
  • All results are reported using BEV / 3D AP as the evaluation metric. For the KITTI dataset, we report the moderate difficulty level.
  • Pre-SN means that the SN (statistical normalization) operation is performed during the pre-training stage (SN for the source domain).
  • Post-SN means that the SN (statistical normalization) operation is performed during the adaptation stage (SN for the target domain).
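
The SN operation referred to by Pre-SN and Post-SN rescales a bounding box together with the points inside it, as described under Task Challenges. A minimal sketch of that rescaling for a single box is given below. It assumes boxes are stored as (cx, cy, cz, l, w, h, yaw) with the centre at the middle of the box, that scaling is done about the box centre, and that the size offset is supplied by the caller (for example, the difference between the mean object sizes of the two domains). The function name, box layout, and numbers are illustrative assumptions, not the codebase's implementation.

```python
import math


def statistical_normalization(points, box, size_delta):
    """Rescale one box and the points inside it.

    points:     list of (x, y, z) tuples
    box:        (cx, cy, cz, l, w, h, yaw), centre at the middle of the box
    size_delta: (dl, dw, dh) added to the box size
    Returns (new_points, new_box).
    """
    cx, cy, cz, l, w, h, yaw = box
    new_l, new_w, new_h = l + size_delta[0], w + size_delta[1], h + size_delta[2]
    if min(new_l, new_w, new_h) <= 0:
        raise ValueError("size_delta would produce a non-positive box size")
    sx, sy, sz = new_l / l, new_w / w, new_h / h
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)

    new_points = []
    for x, y, z in points:
        # Express the point in the box frame (rotate by -yaw about the centre).
        dx, dy, dz = x - cx, y - cy, z - cz
        lx = cos_y * dx + sin_y * dy
        ly = -sin_y * dx + cos_y * dy
        inside = abs(lx) <= l / 2 and abs(ly) <= w / 2 and abs(dz) <= h / 2
        if inside:
            # Stretch along each box axis, then rotate back to the world frame.
            lx, ly, dz = lx * sx, ly * sy, dz * sz
            x = cx + cos_y * lx - sin_y * ly
            y = cy + sin_y * lx + cos_y * ly
            z = cz + dz
        new_points.append((x, y, z))
    return new_points, (cx, cy, cz, new_l, new_w, new_h, yaw)


if __name__ == "__main__":
    # Made-up box and offset, chosen only to show the effect.
    box = (10.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0)
    pts = [(11.0, 0.5, 0.0), (20.0, 0.0, 0.0)]  # first inside, second outside
    new_pts, new_box = statistical_normalization(pts, box, (0.4, 0.2, 0.0))
    print(new_pts)
    print(new_box)
```

Points outside the box are returned unchanged, so the operation only alters the object itself and leaves the rest of the scene as it was.
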

UDA Results for Waymo-to-KITTI:

| Model | Training time | Adaptation | Car@R40 (BEV / 3D AP) | Download |
| --- | --- | --- | --- | --- |
| PointPillar | ~7.1 hours | Source-only with SN | 74.98 / 49.31 | - |
| PointPillar | ~0.6 hours | Pre-SN | 81.71 / 57.11 | model-57M |
| PV-RCNN | ~23 hours | Source-only with SN | 69.92 / 60.17 | - |
| PV-RCNN | ~23 hours | Source-only | 74.42 / 40.35 | - |
| PV-RCNN | ~3.5 hours | Pre-SN | 84.00 / 74.57 | model-156M |
| PV-RCNN | ~1 hour | Post-SN | 84.94 / 75.20 | model-156M |
| Voxel R-CNN | ~16 hours | Source-only with SN | 75.83 / 55.50 | - |
| Voxel R-CNN | ~16 hours | Source-only | 64.88 / 19.90 | - |
| Voxel R-CNN | ~2.5 hours | Pre-SN | 82.56 / 67.32 | model-201M |
| Voxel R-CNN | ~2.2 hours | Post-SN | 85.44 / 76.78 | model-201M |
| PV-RCNN++ | ~20 hours | Source-only with SN | 67.22 / 56.50 | - |
| PV-RCNN++ | ~20 hours | Source-only | 67.68 / 20.82 | - |
| PV-RCNN++ | ~2.2 hours | Post-SN | 86.86 / 79.86 | model-193M |

UDA Results for nuScenes-to-KITTI:

| Model | Training time | Adaptation | Car@R40 (BEV / 3D AP) | Download |
| --- | --- | --- | --- | --- |
| PV-RCNN | ~15.7 hours | Source-only with SN | 60.16 / 49.63 | model-156M |
| PV-RCNN | ~15.7 hours | Source-only | 64.58 / 27.12 | model-156M |
| PV-RCNN | ~1.5 hours | Pre-SN | 86.07 / 74.72 | model-156M |
| PV-RCNN | ~1 hour | Post-SN | 88.79 / 72.50 | model-156M |
| Voxel R-CNN | ~8.5 hours | Source-only | 66.94 / 30.33 | model-201M |
| Voxel R-CNN | ~2.2 hours | Post-SN | 87.11 / 66.02 | model-201M |
| PV-RCNN++ | ~18 hours | Source-only with SN | 54.47 / 36.05 | model-193M |
| PV-RCNN++ | ~18 hours | Source-only | 67.68 / 20.82 | model-193M |
| PV-RCNN++ | ~1 hour | Post-SN | 85.50 / 67.85 | model-193M |

UDA Results for Waymo-to-nuScenes:

  • [16-beam Waymo Train] denotes that we down-sample the Waymo point clouds from 64-beam to 16-beam, according to the range map provided with each point cloud, and then train the source-only model on the 16-beam Waymo data.
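
A minimal sketch of this kind of beam down-sampling is shown below. It assumes each point carries the row index of the range-map pixel it came from, with one row per beam, and it keeps every fourth row to go from 64 to 16 beams. The function name and the every-n-th-row rule are assumptions made for illustration; the codebase may select rows differently.

```python
def downsample_beams(points, beam_ids, src_beams=64, dst_beams=16):
    """Keep only the points whose beam (range-map row) survives down-sampling.

    points:   list of (x, y, z) tuples
    beam_ids: range-map row index of each point, in [0, src_beams)
    """
    if len(points) != len(beam_ids):
        raise ValueError("points and beam_ids must have the same length")
    if dst_beams <= 0 or src_beams % dst_beams != 0:
        raise ValueError("src_beams must be a positive multiple of dst_beams")
    step = src_beams // dst_beams  # 4 for 64 -> 16, 2 for 64 -> 32
    return [p for p, b in zip(points, beam_ids) if b % step == 0]


if __name__ == "__main__":
    # Toy input: one point per beam, with the beam index stored in x.
    pts = [(float(b), 0.0, 0.0) for b in range(64)]
    ids = list(range(64))
    print(len(downsample_beams(pts, ids, 64, 16)))
    print(len(downsample_beams(pts, ids, 64, 32)))
```
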

| Model | Training time | Adaptation | Car@R40 (BEV / 3D AP) | Download |
| --- | --- | --- | --- | --- |
| PV-RCNN | ~23 hours | Source-only | 31.02 / 21.21 | - |
| PV-RCNN | ~8 hours | Self-training | 33.29 / 22.15 | model-156M |
| PV-RCNN | ~19 hours | 32-beam Waymo Train | 34.19 / 21.37 | model-156M |
| PV-RCNN | ~15 hours | 16-beam Waymo Train | 40.23 / 23.33 | model-156M |
| PV-RCNN | ~8 hours | 16-beam Waymo + Self-training | - | - |
| Voxel R-CNN | ~16 hours | Source-only | 29.08 / 19.42 | - |
| Voxel R-CNN | ~2.2 hours | Self-training | 32.48 / 20.87 | model-201M |
| Voxel R-CNN | ~11 hours | 16-beam Waymo Train | 38.63 / 22.64 | model-201M |
| PV-RCNN++ | ~20 hours | Source-only | 31.96 / 19.80 | - |
| PV-RCNN++ | ~2.2 hours | Self-training | - | - |
| PV-RCNN++ | ~15.5 hours | 16-beam Waymo Train | 42.62 / 25.02 | model-193M |
| PV-RCNN++ | ~2.2 hours | 16-beam Waymo + Self-training | - | - |