
June 27, 2025


Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations

Xiang Xu¹, Lingdong Kong², Song Wang³, Chuanwei Zhou⁴, Qingshan Liu⁴

¹NUAA, ²NUS, ³ZJU, ⁴NJUPT

About

LiMA is a novel long-term image-to-LiDAR Memory Aggregation framework that explicitly captures longer-range temporal correlations to enhance LiDAR representation learning. It comprises three key components:

  1) A Cross-View Aggregation module that aligns and fuses overlapping regions across neighboring camera views, constructing a more unified, redundancy-free memory bank.
  2) A Long-Term Feature Propagation mechanism that efficiently aligns and integrates multi-frame image features, reinforcing temporal coherence during LiDAR representation learning.
  3) A Cross-Sequence Memory Alignment strategy that enforces consistency across driving sequences, improving generalization to unseen environments.
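To make the cross-view idea more concrete, here is a toy Python sketch of how overlapping camera views could be fused into a single per-point target instead of duplicate entries. It is not LiMA's implementation: the pinhole 3x4 projection matrices, nearest-pixel feature lookup, plain averaging as the fusion rule, and the helper names (`project_point`, `sample_feature`, `cross_view_targets`) are all assumptions made for illustration. Temporal propagation and cross-sequence alignment are not covered.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical helpers for illustration only; not taken from the LiMA codebase.
Feature = List[float]
FeatureMap = List[List[Feature]]          # H x W x C as nested lists
Projection = Sequence[Sequence[float]]    # assumed 3 x 4 pinhole projection matrix
Point = Tuple[float, float, float]


def project_point(point: Point, proj: Projection) -> Optional[Tuple[float, float]]:
    """Project a 3D point to pixel coordinates (u, v); None if it is behind the camera."""
    hom = (point[0], point[1], point[2], 1.0)
    u_h, v_h, w = [sum(r * h for r, h in zip(row, hom)) for row in proj]
    if w <= 1e-6:
        return None
    return u_h / w, v_h / w


def sample_feature(feat_map: FeatureMap, uv: Tuple[float, float]) -> Optional[Feature]:
    """Nearest-pixel lookup; None if the projection falls outside the image."""
    col, row = math.floor(uv[0]), math.floor(uv[1])
    if 0 <= row < len(feat_map) and 0 <= col < len(feat_map[0]):
        return feat_map[row][col]
    return None


def cross_view_targets(points: Sequence[Point],
                       cameras: Sequence[Tuple[Projection, FeatureMap]]) -> List[Optional[Feature]]:
    """Fuse the image features of every camera that sees each point.

    A point inside the overlap of several views gets one averaged target
    (assumed fusion rule) rather than one duplicate target per view; a point
    seen by no camera gets None.
    """
    targets: List[Optional[Feature]] = []
    for p in points:
        hits: List[Feature] = []
        for proj, feat_map in cameras:
            uv = project_point(p, proj)
            feat = sample_feature(feat_map, uv) if uv is not None else None
            if feat is not None:
                hits.append(feat)
        if hits:
            dim = len(hits[0])
            targets.append([sum(f[k] for f in hits) / len(hits) for k in range(dim)])
        else:
            targets.append(None)
    return targets
```

For instance, with two cameras whose projections differ only by a one-unit horizontal shift, a point that falls inside both images receives the mean of the two sampled features, while a point visible to only one camera keeps that camera's feature. Averaging is simply the most basic fusion rule; a learned or confidence-weighted fusion would be a natural alternative.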

:memo: Updates

  • [2025.06] - Our paper LiMA has been accepted to ICCV 2025! :tada:


:gear: Installation

For installation and environment setup details, kindly refer to INSTALL.md.

:hotsprings: Data Preparation

Kindly refer to DATA_PREPAER.md for details on preparing the datasets.

:rocket: Getting Started

To learn more about using this codebase, kindly refer to GET_STARTED.md.

:bar_chart: Main Results

Comparisons of State-of-the-Art Pretraining Methods

| Method | Venue | Backbone (2D) | Backbone (3D) | Frames | nuScenes LP | nuScenes 1% | nuScenes 5% | nuScenes 10% | nuScenes 25% | nuScenes Full | KITTI 1% | Waymo 1% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random | - | - | - | - | 8.10 | 30.30 | 47.84 | 56.15 | 65.48 | 74.66 | 39.50 | 39.41 |
| SLidR | CVPR'22 | ViT-S | MinkUNet-34 | 1 | 44.70 | 41.16 | 53.65 | 61.47 | 66.71 | 74.20 | 44.67 | 47.57 |
| Seal | NeurIPS'23 | ViT-S | MinkUNet-34 | 2 | 45.16 | 44.27 | 55.13 | 62.46 | 67.64 | 75.58 | 46.51 | 48.67 |
| SuperFlow | ECCV'24 | ViT-S | MinkUNet-34 | 3 | 46.44 | 47.81 | 59.44 | 64.47 | 69.20 | 76.54 | 47.97 | 49.94 |
| ScaLR | CVPR'24 | ViT-S | MinkUNet-34 | 1 | 49.66 | 45.89 | 56.52 | 61.07 | 65.79 | 73.39 | 46.06 | 47.67 |
| LiMA | Ours | ViT-S | MinkUNet-34 | 6 | 54.76 | 48.75 | 60.83 | 65.41 | 69.31 | 76.94 | 49.28 | 50.23 |
| SLidR | CVPR'22 | ViT-B | MinkUNet-34 | 1 | 45.35 | 41.64 | 55.83 | 62.68 | 67.61 | 74.98 | 45.50 | 48.32 |
| Seal | NeurIPS'23 | ViT-B | MinkUNet-34 | 2 | 46.59 | 45.98 | 57.15 | 62.79 | 68.18 | 75.41 | 47.24 | 48.91 |
| SuperFlow | ECCV'24 | ViT-B | MinkUNet-34 | 3 | 47.66 | 48.09 | 59.66 | 64.52 | 69.79 | 76.57 | 48.40 | 50.20 |
| ScaLR | CVPR'24 | ViT-B | MinkUNet-34 | 1 | 51.90 | 48.90 | 57.69 | 62.88 | 66.85 | 74.15 | 47.77 | 49.38 |
| LiMA | Ours | ViT-B | MinkUNet-34 | 6 | 56.65 | 51.29 | 61.11 | 65.62 | 70.43 | 76.91 | 50.44 | 51.35 |
| SLidR | CVPR'22 | ViT-L | MinkUNet-34 | 1 | 45.70 | 42.77 | 57.45 | 63.20 | 68.13 | 75.51 | 47.01 | 48.60 |
| Seal | NeurIPS'23 | ViT-L | MinkUNet-34 | 2 | 46.81 | 46.27 | 58.14 | 63.27 | 68.67 | 75.66 | 47.55 | 50.02 |
| SuperFlow | ECCV'24 | ViT-L | MinkUNet-34 | 3 | 48.01 | 49.95 | 60.72 | 65.09 | 70.01 | 77.19 | 49.07 | 50.67 |
| ScaLR | CVPR'24 | ViT-L | MinkUNet-34 | 1 | 51.77 | 49.13 | 58.36 | 62.75 | 66.80 | 74.16 | 48.64 | 49.72 |
| LiMA | Ours | ViT-L | MinkUNet-34 | 6 | 56.67 | 53.22 | 62.46 | 66.00 | 70.59 | 77.23 | 52.29 | 51.19 |

Domain Generalization Study

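| Method | Venue | ScriKITTI 1% | ScriKITTI 10% | Rellis-3D 1% | Rellis-3D 10% | SemPOSS Half | SemPOSS Full | SemSTF Half | SemSTF Full | SynLiDAR 1% | SynLiDAR 10% | DAPS-3D Half | DAPS-3D Full | Synth4D 1% | Synth4D 10% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random | - | 23.81 | 47.60 | 38.46 | 53.60 | 46.26 | 54.12 | 48.03 | 48.15 | 19.89 | 44.74 | 74.32 | 79.38 | 20.22 | 66.87 |
| SLidR | CVPR'22 | 39.60 | 50.45 | 49.75 | 54.57 | 51.56 | 55.36 | 52.01 | 54.35 | 42.05 | 47.84 | 81.00 | 85.40 | 63.10 | 62.67 |
| Seal | NeurIPS'23 | 40.64 | 52.77 | 51.09 | 55.03 | 53.26 | 56.89 | 53.46 | 55.36 | 43.58 | 49.26 | 81.88 | 85.90 | 64.50 | 66.96 |
| SuperFlow | ECCV'24 | 42.70 | 54.00 | 52.83 | 55.71 | 54.41 | 57.33 | 54.72 | 56.57 | 44.85 | 51.38 | 82.43 | 86.21 | 65.31 | 69.43 |
| ScaLR | CVPR'24 | 40.64 | 52.39 | 52.53 | 55.57 | 53.65 | 56.86 | 54.06 | 55.96 | 44.42 | 51.96 | 81.92 | 85.58 | 64.36 | 67.44 |
| LiMA | Ours | 45.90 | 55.13 | 55.62 | 57.15 | 55.05 | 57.81 | 55.45 | 56.70 | 46.66 | 52.32 | 83.11 | 86.63 | 66.04 | 70.19 |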

3D Object Detection

nuScenes, Backbone: VoxelNet + CenterPoint

| Method | Venue | 5% mAP | 5% NDS | 10% mAP | 10% NDS | 20% mAP | 20% NDS |
|---|---|---|---|---|---|---|---|
| Random | - | 38.0 | 44.3 | 46.9 | 55.5 | 50.2 | 59.7 |
| PointContrast | ECCV'20 | 39.8 | 45.1 | 47.7 | 56.0 | - | - |
| GCC-3D | ICCV'21 | 41.1 | 46.8 | 48.4 | 56.7 | - | - |
| SLidR | CVPR'22 | 43.3 | 52.4 | 47.5 | 56.8 | 50.4 | 59.9 |
| TriCC | CVPR'23 | 44.6 | 54.4 | 48.9 | 58.1 | 50.9 | 60.3 |
| CSC | CVPR'24 | 45.3 | 54.2 | 49.3 | 58.3 | 51.9 | 61.3 |
| ScaLR | CVPR'24 | 44.3 | 53.3 | 48.2 | 57.1 | 50.7 | 60.8 |
| LiMA | Ours | 46.5 | 56.4 | 50.1 | 59.6 | 52.3 | 62.3 |

Cosine Similarity

[Figure: cosine-similarity heatmaps]
Cosine similarity between a query point (marked by the red dot) and (1) image features and (2) LiDAR point features projected onto the image. Colors range from red (high similarity) to blue (low similarity). Best viewed in color.
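A similarity map of this kind takes only a few lines to compute. The Python sketch below is an illustration under stated assumptions, not the authors' visualization script: features are plain lists, the query is the feature at the red-dot location, and for panel (2) the LiDAR point features are assumed to have already been scattered onto a pixel grid. The min-max rescaling applied before a red-to-blue colormap is likewise an assumption about how such heatmaps are commonly rendered, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical helpers for illustration; not the authors' visualization code.


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity, guarded against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def similarity_map(query: Sequence[float],
                   feat_map: List[List[Sequence[float]]]) -> List[List[float]]:
    """Similarity between one query feature and every pixel of an H x W x C map."""
    return [[cosine_similarity(query, f) for f in row] for row in feat_map]


def normalize_for_display(sim: List[List[float]]) -> List[List[float]]:
    """Min-max rescale to [0, 1] (assumed display step): 1 -> red, 0 -> blue."""
    flat = [s for row in sim for s in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(s - lo) / span for s in row] for row in sim]
```

Because cosine similarity ignores feature magnitude, the map reflects only how well each location's feature direction agrees with the query, which is what the red-to-blue coloring is meant to convey.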

Qualitative Assessment

[Figure: qualitative results on nuScenes]
Qualitative comparison of state-of-the-art methods pretrained on nuScenes and fine-tuned on nuScenes with 1% of the annotations. The error maps show correct and incorrect predictions in gray and red, respectively. Best viewed in color.
[Figure: qualitative 3D object detection results on nuScenes]
Qualitative results for 3D object detection, pretrained on nuScenes and fine-tuned on nuScenes with 5% of the annotations. Ground-truth and predicted boxes are shown in blue and red, respectively. Best viewed in color.

License

This work is released under the Apache License 2.0, although some implementations in this codebase may be subject to other licenses.

Kindly check LICENSE.md carefully if you plan to use our code for commercial purposes.

Citation

If you find this work helpful for your research, please consider citing our paper:

@inproceedings{xu2025lima,
    title = {Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations},
    author = {Xu, Xiang and Kong, Lingdong and Wang, Song and Zhou, Chuanwei and Liu, Qingshan},
    booktitle = {IEEE/CVF International Conference on Computer Vision},
    year = {2025}
}

Acknowledgments

This work is built upon the MMDetection3D codebase.

MMDetection3D is an open-source object detection toolbox based on PyTorch, aiming to be the next-generation platform for general 3D perception. It is part of the OpenMMLab project developed by MMLab.

We acknowledge the use of the following public resources during the course of this work: nuScenes, nuScenes-devkit, SemanticKITTI, SemanticKITTI-API, Waymo Open Dataset, Synth4D, ScribbleKITTI, RELLIS-3D, SemanticPOSS, SemanticSTF, SynLiDAR, DAPS-3D, Robo3D, SLidR, DINOv2, FRNet, SuperFlow, torchsparse, and ScaLR. :heart_decoration: