June 27, 2025
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations
About
LiMA is a novel long-term image-to-LiDAR Memory Aggregation framework that explicitly captures longer-range temporal correlations to enhance LiDAR representation learning. It comprises three key components: 1) a Cross-View Aggregation module that aligns and fuses overlapping regions across neighboring camera views, constructing a more unified and redundancy-free memory bank; 2) a Long-Term Feature Propagation mechanism that efficiently aligns and integrates multi-frame image features, reinforcing temporal coherence during LiDAR representation learning; and 3) a Cross-Sequence Memory Alignment strategy that enforces consistency across driving sequences, improving generalization to unseen environments.
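For rough intuition about what a multi-frame feature memory does, here is a minimal, dependency-free sketch. It is not LiMA's implementation: the `FeatureMemory` class, its `max_frames` and `decay` parameters, and the decay-weighted averaging are all hypothetical choices made for illustration, and the spatial alignment between frames that the description mentions is omitted entirely.

```python
from collections import deque


class FeatureMemory:
    """Hypothetical fixed-length memory of per-frame feature vectors."""

    def __init__(self, max_frames=6, decay=0.5):
        # deque(maxlen=...) drops the oldest frame automatically when full.
        self.frames = deque(maxlen=max_frames)
        # Weight multiplier applied per step back in time (assumed scheme).
        self.decay = decay

    def push(self, feature):
        """Store a copy of the newest frame's feature vector."""
        self.frames.append(list(feature))

    def aggregate(self):
        """Return the decay-weighted average; the newest frame weighs most."""
        if not self.frames:
            raise ValueError("memory is empty")
        dim = len(self.frames[0])
        total = [0.0] * dim
        weight_sum = 0.0
        # reversed(): age 0 is the newest frame.
        for age, feat in enumerate(reversed(self.frames)):
            w = self.decay ** age
            weight_sum += w
            for i in range(dim):
                total[i] += w * feat[i]
        return [v / weight_sum for v in total]


memory = FeatureMemory(max_frames=3, decay=0.5)
for f in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]):
    memory.push(f)
fused = memory.aggregate()  # only the last three frames contribute
```

The fixed capacity keeps memory use bounded as a sequence grows, which is the main reason a queue-like structure is a natural fit for this kind of aggregation.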
:memo: Updates
- [2025.06] - Our paper LiMA has been accepted to ICCV 2025! :tada:
:gear: Installation
For details on installation and environment setup, kindly refer to INSTALL.md.
:hotsprings: Data Preparation
For details on preparing the datasets, kindly refer to DATA_PREPAER.md.
:rocket: Getting Started
To learn more about using this codebase, kindly refer to GET_STARTED.md.
:bar_chart: Main Results
Comparisons of State-of-the-Art Pretraining Methods
| Method | Venue | Backbone (2D) | Backbone (3D) | Frames | nuScenes LP | nuScenes 1% | nuScenes 5% | nuScenes 10% | nuScenes 25% | nuScenes Full | KITTI 1% | Waymo 1% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random | - | - | - | - | 8.10 | 30.30 | 47.84 | 56.15 | 65.48 | 74.66 | 39.50 | 39.41 |
| SLidR | CVPR'22 | ViT-S | MinkUNet-34 | 1 | 44.70 | 41.16 | 53.65 | 61.47 | 66.71 | 74.20 | 44.67 | 47.57 |
| Seal | NeurIPS'23 | ViT-S | MinkUNet-34 | 2 | 45.16 | 44.27 | 55.13 | 62.46 | 67.64 | 75.58 | 46.51 | 48.67 |
| SuperFlow | ECCV'24 | ViT-S | MinkUNet-34 | 3 | 46.44 | 47.81 | 59.44 | 64.47 | 69.20 | 76.54 | 47.97 | 49.94 |
| ScaLR | CVPR'24 | ViT-S | MinkUNet-34 | 1 | 49.66 | 45.89 | 56.52 | 61.07 | 65.79 | 73.39 | 46.06 | 47.67 |
| LiMA | Ours | ViT-S | MinkUNet-34 | 6 | 54.76 | 48.75 | 60.83 | 65.41 | 69.31 | 76.94 | 49.28 | 50.23 |
| SLidR | CVPR'22 | ViT-B | MinkUNet-34 | 1 | 45.35 | 41.64 | 55.83 | 62.68 | 67.61 | 74.98 | 45.50 | 48.32 |
| Seal | NeurIPS'23 | ViT-B | MinkUNet-34 | 2 | 46.59 | 45.98 | 57.15 | 62.79 | 68.18 | 75.41 | 47.24 | 48.91 |
| SuperFlow | ECCV'24 | ViT-B | MinkUNet-34 | 3 | 47.66 | 48.09 | 59.66 | 64.52 | 69.79 | 76.57 | 48.40 | 50.20 |
| ScaLR | CVPR'24 | ViT-B | MinkUNet-34 | 1 | 51.90 | 48.90 | 57.69 | 62.88 | 66.85 | 74.15 | 47.77 | 49.38 |
| LiMA | Ours | ViT-B | MinkUNet-34 | 6 | 56.65 | 51.29 | 61.11 | 65.62 | 70.43 | 76.91 | 50.44 | 51.35 |
| SLidR | CVPR'22 | ViT-L | MinkUNet-34 | 1 | 45.70 | 42.77 | 57.45 | 63.20 | 68.13 | 75.51 | 47.01 | 48.60 |
| Seal | NeurIPS'23 | ViT-L | MinkUNet-34 | 2 | 46.81 | 46.27 | 58.14 | 63.27 | 68.67 | 75.66 | 47.55 | 50.02 |
| SuperFlow | ECCV'24 | ViT-L | MinkUNet-34 | 3 | 48.01 | 49.95 | 60.72 | 65.09 | 70.01 | 77.19 | 49.07 | 50.67 |
| ScaLR | CVPR'24 | ViT-L | MinkUNet-34 | 1 | 51.77 | 49.13 | 58.36 | 62.75 | 66.80 | 74.16 | 48.64 | 49.72 |
| LiMA | Ours | ViT-L | MinkUNet-34 | 6 | 56.67 | 53.22 | 62.46 | 66.00 | 70.59 | 77.23 | 52.29 | 51.19 |
Domain Generalization Study
| Method | Venue | ScriKITTI 1% | ScriKITTI 10% | Rellis-3D 1% | Rellis-3D 10% | SemPOSS Half | SemPOSS Full | SemSTF Half | SemSTF Full | SynLiDAR 1% | SynLiDAR 10% | DAPS-3D Half | DAPS-3D Full | Synth4D 1% | Synth4D 10% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random | - | 23.81 | 47.60 | 38.46 | 53.60 | 46.26 | 54.12 | 48.03 | 48.15 | 19.89 | 44.74 | 74.32 | 79.38 | 20.22 | 66.87 |
| SLidR | CVPR'22 | 39.60 | 50.45 | 49.75 | 54.57 | 51.56 | 55.36 | 52.01 | 54.35 | 42.05 | 47.84 | 81.00 | 85.40 | 63.10 | 62.67 |
| Seal | NeurIPS'23 | 40.64 | 52.77 | 51.09 | 55.03 | 53.26 | 56.89 | 53.46 | 55.36 | 43.58 | 49.26 | 81.88 | 85.90 | 64.50 | 66.96 |
| SuperFlow | ECCV'24 | 42.70 | 54.00 | 52.83 | 55.71 | 54.41 | 57.33 | 54.72 | 56.57 | 44.85 | 51.38 | 82.43 | 86.21 | 65.31 | 69.43 |
| ScaLR | CVPR'24 | 40.64 | 52.39 | 52.53 | 55.57 | 53.65 | 56.86 | 54.06 | 55.96 | 44.42 | 51.96 | 81.92 | 85.58 | 64.36 | 67.44 |
| LiMA | Ours | 45.90 | 55.13 | 55.62 | 57.15 | 55.05 | 57.81 | 55.45 | 56.70 | 46.66 | 52.32 | 83.11 | 86.63 | 66.04 | 70.19 |
3D Object Detection
| Method | Venue | nuScenes 5% mAP | nuScenes 5% NDS | nuScenes 10% mAP | nuScenes 10% NDS | nuScenes 20% mAP | nuScenes 20% NDS |
|---|---|---|---|---|---|---|---|
| Backbone: VoxelNet + CenterPoint | | | | | | | |
| Random | - | 38.0 | 44.3 | 46.9 | 55.5 | 50.2 | 59.7 |
| PointContrast | ECCV'20 | 39.8 | 45.1 | 47.7 | 56.0 | - | - |
| GCC-3D | ICCV'21 | 41.1 | 46.8 | 48.4 | 56.7 | - | - |
| SLidR | CVPR'22 | 43.3 | 52.4 | 47.5 | 56.8 | 50.4 | 59.9 |
| TriCC | CVPR'23 | 44.6 | 54.4 | 48.9 | 58.1 | 50.9 | 60.3 |
| CSC | CVPR'24 | 45.3 | 54.2 | 49.3 | 58.3 | 51.9 | 61.3 |
| ScaLR | CVPR'24 | 44.3 | 53.3 | 48.2 | 57.1 | 50.7 | 60.8 |
| LiMA | Ours | 46.5 | 56.4 | 50.1 | 59.6 | 52.3 | 62.3 |
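For readers unfamiliar with the NDS columns, the sketch below shows how the nuScenes Detection Score combines mAP with the five mean true-positive error metrics, following the benchmark's published definition. The function name and the example inputs are illustrative assumptions and are not taken from this codebase or from the table above. Inputs are fractions in [0, 1], whereas the table reports percentages.

```python
def nuscenes_detection_score(mean_ap, tp_errors):
    """NDS = (5 * mAP + sum(1 - min(1, err))) / 10 over the five TP errors."""
    if len(tp_errors) != 5:
        raise ValueError("expected five true-positive error metrics")
    # Each error is clipped to 1 and converted into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Illustrative inputs only; these are not results from the paper.
example_errors = {
    "mATE": 0.30,  # translation error
    "mASE": 0.25,  # scale error
    "mAOE": 0.40,  # orientation error
    "mAVE": 0.30,  # velocity error
    "mAAE": 0.20,  # attribute error
}
nds = nuscenes_detection_score(0.5, example_errors)
```

Because mAP carries half of the total weight, a method can improve NDS either by detecting more objects or by localizing the detected ones more accurately.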
Cosine Similarity
Figure: Cosine similarity between a query point (marked as the red dot) and: (1) image features, and (2) LiDAR point features projected onto the image. Colors range from red (indicating high similarity) to blue (indicating low similarity). Best viewed in color.
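A similarity map of the kind described in this caption can be computed by comparing one query feature against every feature in a grid. The following is a minimal, dependency-free sketch; the function names and the toy 2 x 2 grid of 2-D features are assumptions for illustration, and a real pipeline would operate on learned feature tensors instead.

```python
import math


def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero vectors.
    return dot / max(norm_a * norm_b, eps)


def similarity_map(query, feature_grid):
    """Similarity between one query feature and each cell of an H x W grid."""
    return [[cosine_similarity(query, feat) for feat in row] for row in feature_grid]


query = [1.0, 0.0]
grid = [
    [[2.0, 0.0], [0.0, 3.0]],   # same direction, orthogonal
    [[-1.0, 0.0], [1.0, 1.0]],  # opposite direction, 45 degrees apart
]
sim = similarity_map(query, grid)
```

Values close to 1 correspond to the red (high-similarity) end of the color scale in the caption, and low values to the blue end.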
Qualitative Assessment
Figure: Qualitative assessments of state-of-the-art methods, pretrained on nuScenes and fine-tuned on nuScenes with 1% annotations. The error maps depict correct and incorrect predictions in gray and red, respectively. Best viewed in color.
Figure: Qualitative assessments of object detection, pretrained on nuScenes and fine-tuned on nuScenes with 5% annotations. The ground-truth and predicted results are highlighted with blue and red boxes, respectively. Best viewed in color.
License
This work is released under the Apache License, Version 2.0, while some specific implementations in this codebase may be under other licenses.
Kindly refer to LICENSE.md for details if you are using our code for commercial purposes.
Citation
If you find this work helpful for your research, please consider citing our paper:
@inproceedings{xu2025lima,
title = {Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations},
author = {Xu, Xiang and Kong, Lingdong and Wang, Song and Zhou, Chuanwei and Liu, Qingshan},
booktitle = {IEEE/CVF International Conference on Computer Vision},
year = {2025}
}
Acknowledgments
This work is developed based on the MMDetection3D codebase.
MMDetection3D is an open-source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D perception. It is a part of the OpenMMLab project developed by MMLab.
We acknowledge the use of the following public resources during the course of this work: nuScenes, nuScenes-devkit, SemanticKITTI, SemanticKITTI-API, WaymoOpenDataset, Synth4D, ScribbleKITTI, RELLIS-3D, SemanticPOSS, SemanticSTF, SynLiDAR, DAPS-3D, Robo3D, SLidR, DINOv2, FRNet, SuperFlow, torchsparse, ScaLR. :heart_decoration:



