4D Contrastive Superflows are Dense 3D Representation Learners
Liang Pan2, Kai Chen2, Ziwei Liu5, Qingshan Liu4
1Nanjing University of Aeronautics and Astronautics 2Shanghai AI Laboratory 3National University of Singapore 4Nanjing University of Posts and Telecommunications 5S-Lab, Nanyang Technological University
About
SuperFlow is introduced to harness consecutive LiDAR-camera pairs for establishing spatiotemporal pretraining objectives. It stands out by integrating two key designs: 1) a dense-to-sparse consistency regularization, which promotes insensitivity to point cloud density variations during feature learning, and 2) a flow-based contrastive learning module, carefully crafted to extract meaningful temporal cues from readily available sensor calibrations.
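As a rough illustration of the idea behind the second design, the sketch below computes a generic InfoNCE-style contrastive loss between features of the same regions observed at two timestamps. It is not the SuperFlow implementation. The function names `l2_normalize` and `info_nce`, the temperature of 0.07, and the toy feature vectors are all assumptions made for this example, and the features are assumed to be non-zero.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(feats_t, feats_t1, temperature=0.07):
    """Mean InfoNCE loss between two equally sized feature lists.

    Row i of feats_t and row i of feats_t1 are treated as the positive pair
    (the same region seen at two timestamps); every other row of feats_t1
    serves as a negative for row i. The temperature is an assumed value.
    """
    anchors = [l2_normalize(v) for v in feats_t]
    targets = [l2_normalize(v) for v in feats_t1]
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Cosine similarity of the anchor to every target, scaled by temperature.
        logits = [
            sum(a * b for a, b in zip(anchor, target)) / temperature
            for target in targets
        ]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching row as the correct class.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy features: three regions with orthogonal embeddings (hypothetical data).
frame_t = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = info_nce(frame_t, frame_t)
misaligned = info_nce(frame_t, [frame_t[1], frame_t[2], frame_t[0]])
```

With temporally consistent features (`aligned`) the loss is close to zero, while mismatched pairs (`misaligned`) are penalized heavily, which is the behavior a temporal contrastive objective relies on.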
Updates
- [2025.12] - SuperFlow++ is accepted by TPAMI.
- [2025.03] - Our improved framework, SuperFlow++ :rocket:, is available on arXiv.
- [2024.07] - Our paper is accepted by ECCV.
:gear: Installation
For details related to installation and environment setups, kindly refer to INSTALL.md.
:hotsprings: Data Preparation
Kindly refer to DATA_PREPAER.md for the details to prepare the datasets.
:rocket: Getting Started
To learn more about using this codebase, kindly refer to GET_STARTED.md.
:bar_chart: Main Results
Comparisons of state-of-the-art pretraining methods
| Method | Distill | nuScenes LP | nuScenes 1% | nuScenes 5% | nuScenes 10% | nuScenes 25% | nuScenes Full | KITTI 1% | Waymo 1% |
|---|---|---|---|---|---|---|---|---|---|
| Random | - | 8.10 | 30.30 | 47.84 | 56.15 | 65.48 | 74.66 | 39.50 | 39.41 |
| PPKT | ViT-S | 38.60 | 40.60 | 52.06 | 59.99 | 65.76 | 73.97 | 43.25 | 47.44 |
| SLiDR | ViT-S | 44.70 | 41.16 | 53.65 | 61.47 | 66.71 | 74.20 | 44.67 | 47.57 |
| Seal | ViT-S | 45.16 | 44.27 | 55.13 | 62.46 | 67.64 | 75.58 | 46.51 | 48.67 |
| SuperFlow | ViT-S | 46.44 | 47.81 | 59.44 | 64.47 | 69.20 | 76.54 | 47.97 | 49.94 |
| PPKT | ViT-B | 39.95 | 40.91 | 53.21 | 60.87 | 66.22 | 74.07 | 44.09 | 47.57 |
| SLiDR | ViT-B | 45.35 | 41.64 | 55.83 | 62.68 | 67.61 | 74.98 | 45.50 | 48.32 |
| Seal | ViT-B | 46.59 | 45.98 | 57.15 | 62.79 | 68.18 | 75.41 | 47.24 | 48.91 |
| SuperFlow | ViT-B | 47.66 | 48.09 | 59.66 | 64.52 | 69.79 | 76.57 | 48.40 | 50.20 |
| PPKT | ViT-L | 41.57 | 42.05 | 55.75 | 61.26 | 66.88 | 74.33 | 45.87 | 47.82 |
| SLiDR | ViT-L | 45.70 | 42.77 | 57.45 | 63.20 | 68.13 | 75.51 | 47.01 | 48.60 |
| Seal | ViT-L | 46.81 | 46.27 | 58.14 | 63.27 | 68.67 | 75.66 | 47.55 | 50.02 |
| SuperFlow | ViT-L | 48.01 | 49.95 | 60.72 | 65.09 | 70.01 | 77.19 | 49.07 | 50.67 |
Domain generalization study
| Method | ScriKITTI 1% | ScriKITTI 10% | Rellis-3D 1% | Rellis-3D 10% | SemPOSS Half | SemPOSS Full | SemSTF Half | SemSTF Full | SynLiDAR 1% | SynLiDAR 10% | DAPS-3D Half | DAPS-3D Full | Synth4D 1% | Synth4D 10% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random | 23.81 | 47.60 | 38.46 | 53.60 | 46.26 | 54.12 | 48.03 | 48.15 | 19.89 | 44.74 | 74.32 | 79.38 | 20.22 | 66.87 |
| PPKT | 36.50 | 51.67 | 49.71 | 54.33 | 50.18 | 56.00 | 50.92 | 54.69 | 37.57 | 46.48 | 78.90 | 84.00 | 61.10 | 62.41 |
| SLiDR | 39.60 | 50.45 | 49.75 | 54.57 | 51.56 | 55.36 | 52.01 | 54.35 | 42.05 | 47.84 | 81.00 | 85.40 | 63.10 | 62.67 |
| Seal | 40.64 | 52.77 | 51.09 | 55.03 | 53.26 | 56.89 | 53.46 | 55.36 | 43.58 | 49.26 | 81.88 | 85.90 | 64.50 | 66.96 |
| SuperFlow | 42.70 | 54.00 | 52.83 | 55.71 | 54.41 | 57.33 | 54.72 | 56.57 | 44.85 | 51.38 | 82.43 | 86.21 | 65.31 | 69.43 |
Out-of-distribution 3D robustness study
| # | Initial | Backbone | mCE | mRR | Fog | Rain | Snow | Blur | Beam | Cross | Echo | Sensor | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Full | Random | MinkU-18 | 115.61 | 70.85 | 53.90 | 71.10 | 48.22 | 51.85 | 62.21 | 37.73 | 57.47 | 38.97 | 52.68 |
| Full | SuperFlow | MinkU-18 | 109.00 | 75.66 | 54.95 | 72.79 | 49.56 | 57.68 | 62.82 | 42.45 | 59.61 | 41.77 | 55.21 |
| Full | Random | MinkU-34 | 112.20 | 72.57 | 62.96 | 70.65 | 55.48 | 51.71 | 62.01 | 31.56 | 59.64 | 39.41 | 54.18 |
| Full | SuperFlow | MinkU-34 | 91.67 | 83.17 | 70.32 | 75.77 | 65.41 | 61.05 | 68.09 | 60.02 | 58.36 | 50.41 | 63.68 |
| Full | Random | MinkU-50 | 113.76 | 72.81 | 49.95 | 71.16 | 45.36 | 55.55 | 62.84 | 36.94 | 59.12 | 43.15 | 53.01 |
| Full | SuperFlow | MinkU-50 | 107.35 | 74.02 | 54.36 | 73.08 | 50.07 | 56.92 | 64.05 | 38.10 | 62.02 | 47.02 | 55.70 |
| Full | Random | MinkU-101 | 109.10 | 74.07 | 50.45 | 73.02 | 48.85 | 58.48 | 64.18 | 43.86 | 59.82 | 41.47 | 55.02 |
| Full | SuperFlow | MinkU-101 | 96.44 | 78.57 | 56.92 | 76.29 | 54.70 | 59.35 | 71.89 | 55.13 | 60.27 | 51.60 | 60.77 |
| LP | PPKT | MinkU-34 | 183.44 | 78.15 | 30.65 | 35.42 | 28.12 | 29.21 | 32.82 | 19.52 | 28.01 | 20.71 | 28.06 |
| LP | SLidR | MinkU-34 | 179.38 | 77.18 | 34.88 | 38.09 | 32.64 | 26.44 | 33.73 | 20.81 | 31.54 | 21.44 | 29.95 |
| LP | Seal | MinkU-34 | 166.18 | 75.38 | 37.33 | 42.77 | 29.93 | 37.73 | 40.32 | 20.31 | 37.73 | 24.94 | 33.88 |
| LP | SuperFlow | MinkU-34 | 161.78 | 75.52 | 37.59 | 43.42 | 37.60 | 39.57 | 41.40 | 23.64 | 38.03 | 26.69 | 35.99 |
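To make the summary columns of the robustness table easier to read, the sketch below recomputes the Avg column for one row as the plain mean of the eight per-corruption scores, and shows one way a mean resilience rate could be computed. The resilience formula (corrupted score divided by clean score, averaged over corruption types) follows our understanding of the Robo3D benchmark and should be treated as an assumption. The helper names and the clean score of 75.0 are hypothetical and do not come from the table.

```python
def mean_score(scores):
    """Arithmetic mean of per-corruption scores (the Avg column)."""
    return sum(scores) / len(scores)


def mean_resilience_rate(corrupted_scores, clean_score):
    """Average ratio of corrupted to clean score, in percent.

    Assumed definition: each corruption's score is divided by the score on
    clean data, and the ratios are averaged over all corruption types.
    """
    ratios = [score / clean_score for score in corrupted_scores]
    return 100.0 * sum(ratios) / len(ratios)


# Per-corruption scores of the first table row (Full, Random, MinkU-18):
# Fog, Rain, Snow, Blur, Beam, Cross, Echo, Sensor.
row = [53.90, 71.10, 48.22, 51.85, 62.21, 37.73, 57.47, 38.97]
average = round(mean_score(row), 2)  # matches the Avg column of that row

# Hypothetical clean score, used only to demonstrate the call.
resilience = mean_resilience_rate(row, clean_score=75.0)
```

Higher values are better for the per-corruption scores, Avg, and the resilience rate, whereas a lower mCE indicates better robustness.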
License
This work is under the Apache 2.0 license.
Citation
If you find this work helpful for your research, please kindly consider citing our paper:
@inproceedings{xu2024superflow,
    title = {4D Contrastive Superflows are Dense 3D Representation Learners},
    author = {Xu, Xiang and Kong, Lingdong and Shuai, Hui and Zhang, Wenwei and Pan, Liang and Chen, Kai and Liu, Ziwei and Liu, Qingshan},
    booktitle = {European Conference on Computer Vision},
    pages = {58--80},
    year = {2024}
}
@article{xu2025superflow++,
    title = {Enhanced Spatiotemporal Consistency for Image-to-LiDAR Data Pretraining},
    author = {Xu, Xiang and Kong, Lingdong and Shuai, Hui and Zhang, Wenwei and Pan, Liang and Chen, Kai and Liu, Ziwei and Liu, Qingshan},
    journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
    year = {2025}
}
Acknowledgements
This work is developed based on the MMDetection3D codebase.
MMDetection3D is an open-source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D perception. It is a part of the OpenMMLab project developed by MMLab.
We acknowledge the use of the following public resources during the course of this work: nuScenes, nuScenes-devkit, SemanticKITTI, SemanticKITTI-API, WaymoOpenDataset, Synth4D, ScribbleKITTI, RELLIS-3D, SemanticPOSS, SemanticSTF, SynLiDAR, DAPS-3D, Robo3D, SLidR, DINOv2, Segment-Any-Point-Cloud, OpenSeeD, torchsparse. :heart_decoration:
