April 25, 2025

HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving

R.D. Lin, Pengcheng Weng, Yinqiao Wang, Han Ding, Jinsong Han, Fei Wang

Introduction

While driving, we observe that objects close to the vehicle, such as roads and cars, tend to keep stable categories and shapes as the vehicle moves, whereas distant objects, such as pedestrians, guardrails, plants, and buildings, vary significantly in category and shape.

Figure 1. Motivation

Methods

Our segmentation model has three stages. First, cylindrical voxelization transforms the unordered points into volumetric grids, and a backbone extracts spatial features from them. Next, HiLoTs processes the labeled and unlabeled cylindrical features through a student-teacher framework and integrates the attention map from the HiLoTs Embedding Unit (HEU) to produce voxel-level segmentation maps. Finally, a point-wise refinement network produces point-level segmentation results.
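As a rough illustration of the voxelization stage, the sketch below maps Cartesian points to cylindrical (rho, phi, z) voxel indices and groups points by voxel. It is not taken from the HiLoTs code: the function names, the grid size, and the value ranges are illustrative assumptions.

```python
import math


def cylindrical_voxel_index(point, grid_size=(480, 360, 32),
                            rho_range=(0.0, 50.0), z_range=(-4.0, 2.0)):
    """Map one (x, y, z) point to a (rho, phi, z) voxel index.

    grid_size and the ranges are assumed values for illustration,
    not settings read from the HiLoTs configs.
    """
    x, y, z = point
    rho = math.hypot(x, y)   # radial distance from the sensor
    phi = math.atan2(y, x)   # azimuth angle in [-pi, pi]

    def to_bin(value, low, high, bins):
        # Clamp to the valid range, then scale to an integer bin.
        value = min(max(value, low), high)
        idx = int((value - low) / (high - low) * bins)
        return min(idx, bins - 1)  # the upper edge falls into the last bin

    return (
        to_bin(rho, rho_range[0], rho_range[1], grid_size[0]),
        to_bin(phi, -math.pi, math.pi, grid_size[1]),
        to_bin(z, z_range[0], z_range[1], grid_size[2]),
    )


def voxelize(points, **kwargs):
    """Group point indices by the cylindrical voxel they fall into."""
    voxels = {}
    for i, p in enumerate(points):
        voxels.setdefault(cylindrical_voxel_index(p, **kwargs), []).append(i)
    return voxels
```

Because every azimuth bin spans the same angle, voxels grow wider with distance from the sensor, which partly compensates for LiDAR returns becoming sparser at range.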

HEU consists of High Temporal Sensitive Flow (HTSF) and Low Temporal Sensitive Flow (LTSF). The HTSF focuses on regions where distant objects experience significant changes in category and shape, while the LTSF focuses on nearby regions where object categories and shapes remain relatively stable. Furthermore, the features from HTSF and LTSF are fused and interact through a cross-attention mechanism.
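To make the cross-attention interaction concrete, here is a minimal sketch of scaled dot-product cross-attention between two small feature sets in plain Python. It omits the learned query, key, and value projections that a real attention layer would have, and the `fuse` function with its residual addition is an assumption for illustration, not the HEU implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output row per query.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse(htsf_feats, ltsf_feats):
    """Hypothetical fusion: each flow queries the other, then adds the
    attended features back to its own (a residual connection)."""
    h_from_l = cross_attention(htsf_feats, ltsf_feats, ltsf_feats)
    l_from_h = cross_attention(ltsf_feats, htsf_feats, htsf_feats)
    fused_h = [[a + b for a, b in zip(f, g)]
               for f, g in zip(htsf_feats, h_from_l)]
    fused_l = [[a + b for a, b in zip(f, g)]
               for f, g in zip(ltsf_feats, l_from_h)]
    return fused_h, fused_l
```

Each flow keeps its own number of tokens after fusion. Only the content of each token is updated with information from the other flow.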

Figure 2. Overall architecture

Main Results

mIoU Comparisons with Other Methods

Method	SemanticKITTI 10%	SemanticKITTI 20%	SemanticKITTI 50%	nuScenes 10%	nuScenes 20%	nuScenes 50%
Cylinder3D	56.1	57.8	58.7	63.4	67.0	71.9
RangeViT	53.4	56.6	58.8	64.6	67.8	73.1
PolarMix	60.9	62.0	63.8	69.6	71.0	73.8
DDSemi	65.1	66.3	67.0	70.2	74.0	76.5
HiLoTs (ours)	65.7	66.5	67.6	72.2	75.2	76.9
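For readers unfamiliar with the metric, the sketch below computes mean Intersection-over-Union (mIoU) from flat lists of predicted and ground-truth class labels. It is a simplified illustration, not the evaluation code behind the table: the function name and the `ignore_index` parameter are assumptions, and benchmark evaluators typically accumulate counts over the whole validation set before averaging.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean over classes of IoU = TP / (TP + FP + FN).

    Classes that appear in neither pred nor target are skipped.
    """
    ious = []
    for c in range(num_classes):
        tp = fp = fn = 0
        for p, t in zip(pred, target):
            if t == ignore_index:
                continue  # unlabeled points do not count
            if p == c and t == c:
                tp += 1
            elif p == c:
                fp += 1
            elif t == c:
                fn += 1
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(f"{100 * score:.1f}")
```

Multiplying by 100 gives the percentage scale used in the table above.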

Visualization Examples

The left three columns show segmentation results on the SemanticKITTI dataset, and the right three columns show results on nuScenes. HiLoTs noticeably improves the segmentation of distant objects.

Implementation

Package Requirements

The code is tested with python==3.10.0, torch==1.10.0, mmcv==2.0.0rc4, mmdet3d==1.2.0, and mmengine==0.8.4. Later versions of these packages should also work.
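One way to compare a local environment against the tested versions is the standard-library `importlib.metadata` module. The helper below is a sketch: `report_versions` is a hypothetical name, and it assumes each package is installed under the distribution name shown, which may differ for source or editable installs.

```python
from importlib import metadata

# Versions listed above as tested.
TESTED = {
    "torch": "1.10.0",
    "mmcv": "2.0.0rc4",
    "mmdet3d": "1.2.0",
    "mmengine": "0.8.4",
}


def report_versions(tested=TESTED, lookup=metadata.version):
    """Return {package: (tested_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in tested.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed under this name
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in report_versions().items():
        print(f"{name}: tested {wanted}, installed {installed or 'not found'}")
```

The script only reports versions and does not fail on a mismatch, since later versions are expected to work as well.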

Code Structure

The following files contain the configuration of HiLoTs, the implementation of the HiLoTs Embedding Unit (HEU), and the overall architecture of the model.

configs
| hilots

mmdet3d
| models
| | backbones
| | | minkunet_backbone.py
| | segmentors
| | | hilots.py
| | voxel_encoders
| | | voxel_encoder.py

Training the Model

The model is trained with the standard mmengine training workflow, using the following command:

python tools/train.py {config_file}

For multi-GPU training, use the command:

tools/dist_train.sh {config_file} {gpu_num}

For example, to train HiLoTs on the 10% SemanticKITTI split with 4 GPUs:

tools/dist_train.sh configs/hilots/hilots_semantickitti_10.py 4

Testing the Model

As with training, a trained model can be evaluated with the following command:

python tools/test.py {config_file} {model_weight_file}

For multi-GPU testing, use the command:

tools/dist_test.sh {config_file} {model_weight_file} {gpu_num}

Visualization of Point Cloud Segmentation

To visualize the segmentation results of a trained model, use the following command:

python tools/test.py {config_file} {model_weight_file} --show --show-dir ./visualize_result --task lidar_seg

Check out mmdet3d/visualization/local_visualizer.py if you want to customize your visualization results.
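When evaluation and visualization are driven from a script, the commands above can be assembled programmatically. The sketch below only builds and prints the argument list, using the flags shown in this README. `build_test_command` is a hypothetical helper, and the checkpoint path is a placeholder, not a file shipped with the repository.

```python
import shlex


def build_test_command(config_file, weight_file, show_dir=None):
    """Assemble the tools/test.py invocation as an argument list."""
    cmd = ["python", "tools/test.py", config_file, weight_file]
    if show_dir is not None:
        # Flags copied from the visualization command above.
        cmd += ["--show", "--show-dir", show_dir, "--task", "lidar_seg"]
    return cmd


cmd = build_test_command(
    "configs/hilots/hilots_semantickitti_10.py",
    "work_dirs/hilots/latest.pth",  # hypothetical checkpoint path
    show_dir="./visualize_result",
)
print(shlex.join(cmd))
```

To run the command instead of printing it, pass the list to `subprocess.run(cmd, check=True)` from the repository root.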

Acknowledgement

This project builds on MMDetection3D and LaserMix. MMDetection3D is an open-source object detection toolbox based on PyTorch that aims to be the next-generation platform for general 3D detection. It is part of the OpenMMLab project. LaserMix is a semi-supervised learning framework designed for LiDAR semantic segmentation. It leverages the strong spatial prior of driving scenes to construct low-variation areas via laser beam mixing, and encourages segmentation models to make confident and consistent predictions before and after mixing.