Getting Started & Problem Definition

March 19, 2023

The purpose of an Unsupervised Domain Adaptation (UDA) task is to learn a generalized model or backbone $F$ from a labeled source domain $s$ and an unlabeled target domain $t$, such that $F$ can be adapted to the new target domain $t$. Unlabeled training data (such as point clouds or images) from the target domain $t$ are assumed to be available during the adaptation process.

Getting Started & Task Challenges

Different domains exhibit inconsistent object-size distributions, as illustrated in the object-size statistics file. Statistical Normalization (SN) is therefore used to rescale object sizes during source-domain training, where both the bounding box size and the point cloud within the bounding box are rescaled. For Waymo-to-KITTI adaptation, we found that object-size variation is a major cause of the drop in cross-domain detection accuracy.
The LiDAR beam configuration also varies across autonomous driving (AD) manufacturers. For Waymo-to-nuScenes adaptation, we argue that LiDAR-beam variation is a major challenge, and we leverage the range map provided by the Waymo tfrecords to produce low-beam point clouds (such as 32-beam or 16-beam). Please refer to Results for more details.
- For datasets where the range map is not provided, such as the ONCE dataset, one can apply a clustering algorithm to the height angles to obtain pseudo-labeled low-beam point clouds, which has also been verified to be effective in our codebase.
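
The object-size rescaling described above can be illustrated with a minimal NumPy sketch. This is not the implementation used in the codebase: the function name `statistical_normalization`, the box layout `[x, y, z, dx, dy, dz, heading]`, and the `size_delta` argument (assumed here to be the target-domain mean object size minus the source-domain mean) are all assumptions made for illustration.

```python
import numpy as np

def statistical_normalization(points, boxes, size_delta):
    """Rescale each box and the points inside it (illustrative sketch).

    points:     (N, 3+) array, first three columns are x, y, z.
    boxes:      (M, 7) array, assumed layout [x, y, z, dx, dy, dz, heading].
    size_delta: (3,) offset added to (dx, dy, dz); assumed to be the
                target mean size minus the source mean size.
    """
    points = np.array(points, dtype=float)  # np.array copies by default
    boxes = np.array(boxes, dtype=float)
    size_delta = np.asarray(size_delta, dtype=float)

    for box in boxes:  # each row is a view, so edits update `boxes`
        center, dims, yaw = box[:3], box[3:6], box[6]
        c, s = np.cos(yaw), np.sin(yaw)
        # Rotation taking world-frame offsets into the box frame.
        world_to_box = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

        local = (points[:, :3] - center) @ world_to_box.T
        inside = np.all(np.abs(local) <= dims / 2.0, axis=1)

        new_dims = dims + size_delta
        scale = new_dims / dims
        # Scale the in-box points in the box frame, then map back to world.
        points[inside, :3] = (local[inside] * scale) @ world_to_box + center
        box[3:6] = new_dims

    return points, boxes
```

Points outside every box are left untouched, so only the labeled objects change size. Boxes are processed one after another, which is a simplification: overlapping boxes are not handled specially.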

Getting Started & Training-Testing for UDA Setting

Here, we take Waymo-to-KITTI adaptation as an example.

Pretraining stage: train the source-only model on the labeled source domain:

Train FEAT=3 (X,Y,Z) with SN (statistical normalization) using multiple GPUs

sh scripts/dist_train.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_old_anchor_sn_kitti.yaml

Train FEAT=3 (X,Y,Z) with SN (statistical normalization) using multiple machines

sh scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} ${NUM_NODES} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_old_anchor_sn_kitti.yaml

Train FEAT=3 (X,Y,Z) without SN (statistical normalization) using multiple GPUs

sh scripts/dist_train.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_feat_3_vehi.yaml

Train FEAT=3 (X,Y,Z) without SN (statistical normalization) using multiple machines

sh scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} ${NUM_NODES} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_feat_3_vehi.yaml

Train other baseline detectors such as PV-RCNN++ using multiple GPUs

sh scripts/dist_train.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pv_rcnn_plus_feat_3_vehi_full_train.yaml

Train other baseline detectors such as Voxel-RCNN using multiple GPUs

sh scripts/dist_train.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/voxel_rcnn_feat_3_vehi.yaml

Evaluate the source-pretrained model:

Note that for the cross-domain setting where the KITTI dataset is the target domain, please pass --set DATA_CONFIG_TAR.FOV_POINTS_ONLY True to keep only the front-view point cloud. We report the best-performing checkpoint across all epochs on the validation set.
Test the source-only models using multiple GPUs

sh scripts/dist_test.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_feat_3_vehi.yaml \
--ckpt ${CKPT}

Test the source-only models using multiple machines

sh scripts/slurm_test_mgpu.sh ${PARTITION} ${NUM_NODES} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_feat_3_vehi.yaml \
--ckpt ${CKPT}

Test the source-only models of all ckpts using multiple GPUs

sh scripts/dist_test.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_feat_3_vehi.yaml \
--eval_all

Test the source-only models of all ckpts using multiple machines

sh scripts/slurm_test_mgpu.sh ${PARTITION} ${NUM_NODES} \
--cfg_file ./cfgs/DA/waymo_kitti/source_only/pvrcnn_feat_3_vehi.yaml \
--eval_all

Adaptation stage: self-train the source model on the unlabeled target domain:

Once the pretraining stage above has finished, set --pretrained_model ${PRETRAINED_MODEL} to the resulting checkpoint.
If you trained the source-only model with SN (statistical normalization), for example with pvrcnn_old_anchor_sn_kitti.yaml, run the pre-SN script as follows. Here, pre-SN means that the SN operation is performed before the adaptation stage.
Train FEAT=3 (X,Y,Z) with pre-SN (statistical normalization) using multiple machines

sh scripts/UDA/slurm_train_uda.sh ${PARTITION} ${JOB_NAME} ${NUM_NODES} ${QUOTATYPE} \
--cfg_file ./cfgs/DA/waymo_kitti/pvrcnn_pre_SN_feat_3.yaml \
--pretrained_model ${PRETRAINED_MODEL}

Train FEAT=3 (X,Y,Z) with pre-SN (statistical normalization) using multiple GPUs

sh scripts/UDA/dist_train_uda.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/pvrcnn_pre_SN_feat_3.yaml \
--pretrained_model ${PRETRAINED_MODEL}

If you trained the source-only model without SN (statistical normalization), run the post-SN script as follows. Here, post-SN means that the SN operation is performed during the adaptation stage.
Train FEAT=3 (X,Y,Z) with post-SN (statistical normalization) using multiple machines

sh scripts/UDA/slurm_train_uda.sh ${PARTITION} ${JOB_NAME} ${NUM_NODES} ${QUOTATYPE} \
--cfg_file ./cfgs/DA/waymo_kitti/pvrcnn_post_SN_feat_3.yaml \
--pretrained_model ${PRETRAINED_MODEL}

Train FEAT=3 (X,Y,Z) with post-SN (statistical normalization) using multiple GPUs

sh scripts/UDA/dist_train_uda.sh ${NUM_GPUs} \
--cfg_file ./cfgs/DA/waymo_kitti/pvrcnn_post_SN_feat_3.yaml \
--pretrained_model ${PRETRAINED_MODEL}

Evaluating the model on the target validation set:

Note that for the cross-domain setting where the KITTI dataset is the target domain, please pass --set DATA_CONFIG_TAR.FOV_POINTS_ONLY True to keep only the front-view point cloud. We report the best-performing checkpoint across all epochs on the validation set.
Test with a ckpt file:

python test.py \
--cfg_file ${CONFIG_FILE} \
--batch_size ${BATCH_SIZE} \
--ckpt ${CKPT}

To test all saved checkpoints of a specific training setting and plot the performance curve in TensorBoard, add the --eval_all argument:

python test.py \
--cfg_file ${CONFIG_FILE} \
--batch_size ${BATCH_SIZE} \
--eval_all

To test with multiple GPUs:

sh scripts/dist_test.sh ${NUM_GPUs} \
--cfg_file ${CONFIG_FILE} \
--batch_size ${BATCH_SIZE} \
--ckpt ${CKPT}

To test all ckpts with multiple GPUs:

sh scripts/dist_test.sh ${NUM_GPUs} \
--cfg_file ${CONFIG_FILE} \
--batch_size ${BATCH_SIZE} \
--eval_all

To test with multiple machines:

sh scripts/slurm_test_mgpu.sh ${PARTITION} ${NUM_NODES} \
    --cfg_file ${CONFIG_FILE} --batch_size ${BATCH_SIZE} --ckpt ${CKPT}

To test all ckpts with multiple machines:

sh scripts/slurm_test_mgpu.sh ${PARTITION} ${NUM_NODES} \
    --cfg_file ${CONFIG_FILE} --batch_size ${BATCH_SIZE} --eval_all

All UDA Results:

We report the cross-dataset adaptation results including Waymo-to-KITTI, nuScenes-to-KITTI, and Waymo-to-nuScenes.

All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
All results are reported using BEV/3D AP as the evaluation metric. For the KITTI dataset, we report the moderate difficulty level.
Pre-SN means that the SN (statistical normalization) operation is performed during the pre-training stage (SN for the source domain).
Post-SN means that the SN (statistical normalization) operation is performed during the adaptation stage (SN for the target domain).

UDA Results for Waymo-to-KITTI:

Model	Training time	Adaptation	Car@R40 (BEV / 3D AP)	Download
PointPillar	~7.1 hours	Source-only with SN	74.98 / 49.31	-
PointPillar	~0.6 hours	Pre-SN	81.71 / 57.11	model-57M
PV-RCNN	~23 hours	Source-only with SN	69.92 / 60.17	-
PV-RCNN	~23 hours	Source-only	74.42 / 40.35	-
PV-RCNN	~3.5 hours	Pre-SN	84.00 / 74.57	model-156M
PV-RCNN	~1 hour	Post-SN	84.94 / 75.20	model-156M
Voxel R-CNN	~16 hours	Source-only with SN	75.83 / 55.50	-
Voxel R-CNN	~16 hours	Source-only	64.88 / 19.90	-
Voxel R-CNN	~2.5 hours	Pre-SN	82.56 / 67.32	model-201M
Voxel R-CNN	~2.2 hours	Post-SN	85.44 / 76.78	model-201M
PV-RCNN++	~20 hours	Source-only with SN	67.22 / 56.50	-
PV-RCNN++	~20 hours	Source-only	67.68 / 20.82	-
PV-RCNN++	~2.2 hours	Post-SN	86.86 / 79.86	model-193M

UDA Results for nuScenes-to-KITTI:

Model	Training time	Adaptation	Car@R40 (BEV / 3D AP)	Download
PV-RCNN	~15.7 hours	Source-only with SN	60.16 / 49.63	model-156M
PV-RCNN	~15.7 hours	Source-only	64.58 / 27.12	model-156M
PV-RCNN	~1.5 hours	Pre-SN	86.07 / 74.72	model-156M
PV-RCNN	~1 hour	Post-SN	88.79 / 72.50	model-156M
Voxel R-CNN	~8.5 hours	Source-only	66.94 / 30.33	model-201M
Voxel R-CNN	~2.2 hours	Post-SN	87.11 / 66.02	model-201M
PV-RCNN++	~18 hours	Source-only with SN	54.47 / 36.05	model-193M
PV-RCNN++	~18 hours	Source-only	67.68 / 20.82	model-193M
PV-RCNN++	~1 hour	Post-SN	85.50 / 67.85	model-193M

UDA Results for Waymo-to-nuScenes:

[16-beam Waymo Train] denotes that we down-sample the point clouds of the Waymo dataset from 64-beam to 16-beam according to the range map of the corresponding point clouds, and then train the source-only model on the 16-beam Waymo data.
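
As a rough illustration of the range-map-based down-sampling described above, the sketch below keeps every fourth row of a range image and converts the remaining pixels to points. It is written under several assumptions and is not the codebase's implementation: the function name `downsample_beams` is hypothetical, each row of the range image is assumed to correspond to one beam, non-positive range values are assumed to mark missing returns, and sensor extrinsics are ignored.

```python
import numpy as np

def downsample_beams(range_image, inclinations, azimuths, keep_every=4):
    """Keep every `keep_every`-th beam and return the remaining xyz points.

    range_image:  (H, W) ranges in metres, one row per beam (assumption).
    inclinations: (H,) beam inclination angles in radians.
    azimuths:     (W,) column azimuth angles in radians.
    keep_every:   4 turns a 64-row image into 16 beams, 2 into 32 beams.
    """
    rows = np.arange(0, range_image.shape[0], keep_every)
    rng = range_image[rows]                 # (H / keep_every, W)
    incl = inclinations[rows][:, None]      # broadcast along columns
    azim = azimuths[None, :]                # broadcast along rows

    # Spherical to Cartesian conversion in the sensor frame.
    x = rng * np.cos(incl) * np.cos(azim)
    y = rng * np.cos(incl) * np.sin(azim)
    z = rng * np.sin(incl)

    valid = rng > 0                         # drop pixels without a return
    return np.stack([x[valid], y[valid], z[valid]], axis=1)
```

With `keep_every=4` a 64-beam scan is reduced to 16 beams, and with `keep_every=2` to 32 beams, matching the two low-beam settings listed in the table below.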

	training time	Adaptation	Car@R40	download
PV-RCNN	~23 hours	Source-only	31.02 / 21.21	-
PV-RCNN	~8 hours	Self-training	33.29 / 22.15	model-156M
PV-RCNN	~19 hours	32-beam Waymo Train	34.19 / 21.37	model-156M
PV-RCNN	~15 hours	16-beam Waymo Train	40.23 / 23.33	model-156M
PV-RCNN	~8 hours	16-beam Waymo + Self-training	-	-
Voxel R-CNN	~16 hours	Source-only	29.08 / 19.42	-
Voxel R-CNN	~2.2 hours	Self-training	32.48 / 20.87	model-201M
Voxel R-CNN	~11 hours	16-beam Waymo Train	38.63 / 22.64	model-201M
PV-RCNN++	~20 hours	Source-only	31.96 / 19.80	-
PV-RCNN++	~2.2 hours	Self-training	-	-
PV-RCNN++	~15.5 hours	16-beam Waymo Train	42.62 / 25.02	model-193M
PV-RCNN++	~2.2 hours	16-beam Waymo + Self-training	-	-