April 25, 2025
HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving
R.D. Lin, Pengcheng Weng, Yinqiao Wang, Han Ding, Jinsong Han, Fei Wang
Introduction
While driving, we observe that objects closer to the vehicle, such as roads and cars, tend to keep stable categories and shapes as the vehicle moves, whereas distant objects, such as pedestrians, guardrails, plants, and buildings, vary significantly in category and shape.

Figure 1. Motivation
Methods
Our segmentation model involves three stages. First, cylindrical voxelization transforms the unordered points into volumetric grids, which are passed to a spatial feature extraction backbone. Then, HiLoTs processes the labeled and unlabeled cylindrical features through a student-teacher framework and integrates the attention map from the HiLoTs Embedding Unit (HEU) to produce voxel-level segmentation maps. Finally, a point-wise refinement network produces point-level segmentation results.
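To make the first stage more concrete, here is a minimal sketch of cylindrical voxelization in plain Python. It converts each Cartesian point to cylindrical coordinates (radius, azimuth, height) and bins it into a grid. The grid size and coordinate ranges are illustrative assumptions, not values taken from the HiLoTs configs, and the helper names `cylindrical_voxel_index` and `voxelize` are hypothetical.

```python
import math


def cylindrical_voxel_index(point, grid_size=(480, 360, 32),
                            rho_range=(0.0, 50.0), z_range=(-4.0, 2.0)):
    """Map a Cartesian point (x, y, z) to a (rho, phi, z) voxel index.

    grid_size, rho_range and z_range are assumed values for illustration.
    """
    x, y, z = point
    rho = math.hypot(x, y)   # radial distance from the sensor
    phi = math.atan2(y, x)   # azimuth angle in [-pi, pi]

    def to_bin(value, low, high, bins):
        # Clamp into [low, high], then scale to an integer bin in [0, bins - 1].
        value = min(max(value, low), high)
        idx = int((value - low) / (high - low) * bins)
        return min(idx, bins - 1)

    return (
        to_bin(rho, rho_range[0], rho_range[1], grid_size[0]),
        to_bin(phi, -math.pi, math.pi, grid_size[1]),
        to_bin(z, z_range[0], z_range[1], grid_size[2]),
    )


def voxelize(points, **kwargs):
    """Group point indices by the cylindrical voxel they fall into."""
    voxels = {}
    for i, p in enumerate(points):
        voxels.setdefault(cylindrical_voxel_index(p, **kwargs), []).append(i)
    return voxels
```

Because the radial bins have a fixed width while the angular bins fan out with distance, voxels far from the sensor cover a larger area, which roughly matches the decreasing point density of LiDAR scans at range.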
The HEU consists of a High Temporal Sensitive Flow (HTSF) and a Low Temporal Sensitive Flow (LTSF). The HTSF focuses on distant regions, where objects change significantly in category and shape, while the LTSF focuses on nearby regions, where object categories and shapes remain relatively stable. The features from the two flows are then fused and interact through a cross-attention mechanism.
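The cross-attention interaction between the two flows can be illustrated with generic scaled dot-product attention. The sketch below is not the HEU implementation in `hilots.py`: it omits learned projections and multiple heads, and the function names (`cross_attention`, `fuse_flows`) and the residual fusion are assumptions made only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where each query attends over all keys.

    queries come from one flow; keys and values come from the other flow.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output vector per query.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse_flows(high_feats, low_feats):
    """Hypothetical fusion: each flow attends to the other, plus a residual."""
    high_ctx = cross_attention(high_feats, low_feats, low_feats)
    low_ctx = cross_attention(low_feats, high_feats, high_feats)
    fused_high = [[a + b for a, b in zip(f, c)]
                  for f, c in zip(high_feats, high_ctx)]
    fused_low = [[a + b for a, b in zip(f, c)]
                 for f, c in zip(low_feats, low_ctx)]
    return fused_high, fused_low
```

In this sketch, each HTSF feature gathers context from every LTSF feature and vice versa, so information about stable nearby regions can inform predictions in the more variable distant regions.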

Figure 2. Overall architecture
Main Results
mIoU Comparisons with Other Methods
| Methods | SemanticKITTI 10% | SemanticKITTI 20% | SemanticKITTI 50% | nuScenes 10% | nuScenes 20% | nuScenes 50% |
|---|---|---|---|---|---|---|
| Cylinder3D | 56.1 | 57.8 | 58.7 | 63.4 | 67.0 | 71.9 |
| RangeViT | 53.4 | 56.6 | 58.8 | 64.6 | 67.8 | 73.1 |
| PolarMix | 60.9 | 62.0 | 63.8 | 69.6 | 71.0 | 73.8 |
| DDSemi | 65.1 | 66.3 | 67.0 | 70.2 | 74.0 | 76.5 |
| HiLoTs (ours) | 65.7 | 66.5 | 67.6 | 72.2 | 75.2 | 76.9 |
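For readers unfamiliar with the metric, mean Intersection over Union (mIoU) averages the per-class IoU, where the IoU of a class is TP / (TP + FP + FN). The sketch below computes it from flat lists of predicted and ground-truth labels. It is a generic illustration rather than the evaluation code used in this repository, and the `mean_iou` name and `ignore_index` parameter are assumptions. Multiply the result by 100 to compare with percentage-style scores such as those in the table.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Compute mIoU over classes that appear in pred or target.

    pred and target are equal-length sequences of integer class labels;
    predictions are assumed to lie in [0, num_classes).
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip unlabeled points
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        if denom > 0:  # classes absent from both pred and target are skipped
            ious.append(tp[c] / denom)
    return sum(ious) / len(ious) if ious else 0.0
```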
Visualization Examples
The left three columns show segmentation results on the SemanticKITTI dataset, while the right three columns show results on nuScenes. HiLoTs shows a significant improvement in regions containing distant objects.

Implementation
Package Requirements
The code is tested under python==3.10.0, torch==1.10.0, mmcv==2.0.0rc4, mmdet3d==1.2.0, mmengine==0.8.4. Later versions of these packages should also work.
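If you want to see how your environment differs from the tested one, a small script along the following lines may help. It is a convenience sketch and not part of this repository. The `compare_environment` helper is hypothetical, and it assumes the packages are installed under the distribution names `torch`, `mmcv`, `mmdet3d`, and `mmengine`.

```python
import sys
from importlib import metadata

# Versions listed in this README as tested.
TESTED_VERSIONS = {
    "torch": "1.10.0",
    "mmcv": "2.0.0rc4",
    "mmdet3d": "1.2.0",
    "mmengine": "0.8.4",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare_environment(tested=TESTED_VERSIONS, lookup=installed_version):
    """Return {package: (tested, installed)} for every mismatch."""
    report = {}
    for name, want in tested.items():
        have = lookup(name)
        # Drop local build tags such as "+cu113" before comparing.
        base = have.split("+")[0] if have else None
        if base != want:
            report[name] = (want, have)
    return report


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(tested: 3.10.0)")
    for name, (want, have) in compare_environment().items():
        print(f"{name}: tested {want}, installed {have or 'not installed'}")
```

A mismatch reported here is informational only, since later versions are expected to work as well.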
Code Structure
The following files contain the HiLoTs configs, the implementation of the HiLoTs Embedding Unit (HEU), and the overall model architecture.
configs
| hilots
mmdet3d
| models
| | backbones
| | | minkunet_backbone.py
| | segmentors
| | | hilots.py
| | voxel_encoders
| | | voxel_encoder.py
Training the Model
Our model is trained with the standard mmengine training workflow, using the following command:
python tools/train.py {config_file}
For multi-GPU training, use the command:
tools/dist_train.sh {config_file} {gpu_num}
For example, to train HiLoTs with 10% SemanticKITTI on 4 GPUs:
tools/dist_train.sh configs/hilots/hilots_semantickitti_10.py 4
Testing the Model
As with training, model performance can be evaluated with the following command:
python tools/test.py {config_file} {model_weight_file}
For multi-GPU testing, use the command:
tools/dist_test.sh {config_file} {model_weight_file} {gpu_num}
Visualization of Point Cloud Segmentation
To visualize the segmentation results of a trained model, use the following command:
python tools/test.py {config_file} {model_weight_file} --show --show-dir ./visualize_result --task lidar_seg
Check out mmdet3d/visualization/local_visualizer.py if you want to customize your visualization results.
Acknowledgement
This project is developed based on MMDetection3D and LaserMix. MMDetection3D is an open source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D detection. It is a part of the OpenMMLab project. LaserMix is a semi-supervised learning framework designed for LiDAR semantic segmentation. It leverages the strong spatial prior of driving scenes to construct low-variation areas via laser beam mixing, and encourages segmentation models to make confident and consistent predictions before and after mixing.