December 4, 2025

4D Contrastive Superflows are Dense 3D Representation Learners

Xiang Xu¹,*,    Lingdong Kong²,³,*,    Hui Shuai⁴,    Wenwei Zhang²,
Liang Pan²,    Kai Chen²,    Ziwei Liu⁵,    Qingshan Liu⁴
¹Nanjing University of Aeronautics and Astronautics    ²Shanghai AI Laboratory    ³National University of Singapore    ⁴Nanjing University of Posts and Telecommunications    ⁵S-Lab, Nanyang Technological University

About

SuperFlow harnesses consecutive LiDAR-camera pairs to establish spatiotemporal pretraining objectives. It integrates two key designs: 1) a dense-to-sparse consistency regularization, which makes the learned features less sensitive to variations in point cloud density, and 2) a flow-based contrastive learning module, which extracts temporal cues across frames using readily available sensor calibrations.
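To make the idea of a contrastive objective across consecutive frames concrete, here is a minimal sketch of a generic InfoNCE loss in plain Python. It is not the SuperFlow implementation: the function names, the temperature value, and the toy features are all assumptions made for illustration. Rows with the same index are assumed to be features of the same region observed in two consecutive frames, and all other rows act as negatives.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (no-op for the zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(anchors, positives, temperature=0.07):
    """Generic InfoNCE loss: anchors[i] should match positives[i].

    Hypothetical helper for illustration; the temperature is an assumed value.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    losses = []
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i against every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy features for three regions at frame t and frame t+1 (made-up numbers).
feats_t = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
feats_t1_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
# Rotating the rows breaks the region correspondence across frames.
feats_t1_shuffled = feats_t1_aligned[1:] + feats_t1_aligned[:1]

loss_aligned = info_nce(feats_t, feats_t1_aligned)
loss_shuffled = info_nce(feats_t, feats_t1_shuffled)
print(f"aligned: {loss_aligned:.4f}, shuffled: {loss_shuffled:.4f}")
```

In this toy setup the loss is low when corresponding regions have similar features across frames and high when the correspondence is broken, which is the behavior a temporal contrastive objective relies on.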

Updates

[2025.12] - SuperFlow++ is accepted by TPAMI.
[2025.03] - Our improved framework, SuperFlow++ :rocket:, is available on arXiv.
[2024.07] - Our paper is accepted by ECCV.

Comparisons of state-of-the-art pretraining methods

Method	Distill	nuScenes LP	nuScenes 1%	nuScenes 5%	nuScenes 10%	nuScenes 25%	nuScenes Full	KITTI 1%	Waymo 1%
Random	-	8.10	30.30	47.84	56.15	65.48	74.66	39.50	39.41
PPKT	ViT-S	38.60	40.60	52.06	59.99	65.76	73.97	43.25	47.44
SLidR	ViT-S	44.70	41.16	53.65	61.47	66.71	74.20	44.67	47.57
Seal	ViT-S	45.16	44.27	55.13	62.46	67.64	75.58	46.51	48.67
SuperFlow	ViT-S	46.44	47.81	59.44	64.47	69.20	76.54	47.97	49.94
PPKT	ViT-B	39.95	40.91	53.21	60.87	66.22	74.07	44.09	47.57
SLidR	ViT-B	45.35	41.64	55.83	62.68	67.61	74.98	45.50	48.32
Seal	ViT-B	46.59	45.98	57.15	62.79	68.18	75.41	47.24	48.91
SuperFlow	ViT-B	47.66	48.09	59.66	64.52	69.79	76.57	48.40	50.20
PPKT	ViT-L	41.57	42.05	55.75	61.26	66.88	74.33	45.87	47.82
SLidR	ViT-L	45.70	42.77	57.45	63.20	68.13	75.51	47.01	48.60
Seal	ViT-L	46.81	46.27	58.14	63.27	68.67	75.66	47.55	50.02
SuperFlow	ViT-L	48.01	49.95	60.72	65.09	70.01	77.19	49.07	50.67

Domain generalization study

Method	ScribbleKITTI 1%	ScribbleKITTI 10%	RELLIS-3D 1%	RELLIS-3D 10%	SemanticPOSS Half	SemanticPOSS Full	SemanticSTF Half	SemanticSTF Full	SynLiDAR 1%	SynLiDAR 10%	DAPS-3D Half	DAPS-3D Full	Synth4D 1%	Synth4D 10%
Random	23.81	47.60	38.46	53.60	46.26	54.12	48.03	48.15	19.89	44.74	74.32	79.38	20.22	66.87
PPKT	36.50	51.67	49.71	54.33	50.18	56.00	50.92	54.69	37.57	46.48	78.90	84.00	61.10	62.41
SLidR	39.60	50.45	49.75	54.57	51.56	55.36	52.01	54.35	42.05	47.84	81.00	85.40	63.10	62.67
Seal	40.64	52.77	51.09	55.03	53.26	56.89	53.46	55.36	43.58	49.26	81.88	85.90	64.50	66.96
SuperFlow	42.70	54.00	52.83	55.71	54.41	57.33	54.72	56.57	44.85	51.38	82.43	86.21	65.31	69.43

Out-of-distribution 3D robustness study

Setting	Initial	Backbone	mCE	mRR	Fog	Rain	Snow	Blur	Beam	Cross	Echo	Sensor	Avg
Full	Random	MinkU-18	115.61	70.85	53.90	71.10	48.22	51.85	62.21	37.73	57.47	38.97	52.68
	SuperFlow	MinkU-18	109.00	75.66	54.95	72.79	49.56	57.68	62.82	42.45	59.61	41.77	55.21
	Random	MinkU-34	112.20	72.57	62.96	70.65	55.48	51.71	62.01	31.56	59.64	39.41	54.18
	SuperFlow	MinkU-34	91.67	83.17	70.32	75.77	65.41	61.05	68.09	60.02	58.36	50.41	63.68
	Random	MinkU-50	113.76	72.81	49.95	71.16	45.36	55.55	62.84	36.94	59.12	43.15	53.01
	SuperFlow	MinkU-50	107.35	74.02	54.36	73.08	50.07	56.92	64.05	38.10	62.02	47.02	55.70
	Random	MinkU-101	109.10	74.07	50.45	73.02	48.85	58.48	64.18	43.86	59.82	41.47	55.02
	SuperFlow	MinkU-101	96.44	78.57	56.92	76.29	54.70	59.35	71.89	55.13	60.27	51.60	60.77
LP	PPKT	MinkU-34	183.44	78.15	30.65	35.42	28.12	29.21	32.82	19.52	28.01	20.71	28.06
	SLidR	MinkU-34	179.38	77.18	34.88	38.09	32.64	26.44	33.73	20.81	31.54	21.44	29.95
	Seal	MinkU-34	166.18	75.38	37.33	42.77	29.93	37.73	40.32	20.31	37.73	24.94	33.88
	SuperFlow	MinkU-34	161.78	75.52	37.59	43.42	37.60	39.57	41.40	23.64	38.03	26.69	35.99
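The Avg column in the table above appears to be the arithmetic mean of the eight per-corruption scores (Fog through Sensor). The short sketch below checks that reading against three rows copied from the table. The helper name and the rounding tolerance are assumptions, and the check says nothing about how mCE or mRR are computed.

```python
from statistics import fmean

CORRUPTIONS = ("Fog", "Rain", "Snow", "Blur", "Beam", "Cross", "Echo", "Sensor")

# (setting, initialization, backbone) -> (per-corruption scores, reported Avg),
# copied from three rows of the robustness table.
ROWS = {
    ("Full", "Random", "MinkU-18"): (
        [53.90, 71.10, 48.22, 51.85, 62.21, 37.73, 57.47, 38.97], 52.68),
    ("Full", "SuperFlow", "MinkU-34"): (
        [70.32, 75.77, 65.41, 61.05, 68.09, 60.02, 58.36, 50.41], 63.68),
    ("LP", "SuperFlow", "MinkU-34"): (
        [37.59, 43.42, 37.60, 39.57, 41.40, 23.64, 38.03, 26.69], 35.99),
}


def average_score(scores):
    """Mean over the eight corruption types (hypothetical helper)."""
    assert len(scores) == len(CORRUPTIONS)
    return fmean(scores)


# Allow a small tolerance because the table entries are already rounded.
matches = {
    key: abs(average_score(scores) - reported) < 0.02
    for key, (scores, reported) in ROWS.items()
}

for key, ok in matches.items():
    print(key, "matches reported Avg" if ok else "differs from reported Avg")
```

Other rows can be checked the same way; small differences in the last digit are expected because the published per-corruption scores are rounded to two decimals.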

License

This work is released under the Apache 2.0 license.

Citation

If you find this work helpful for your research, please consider citing our papers:

@inproceedings{xu2024superflow,
    title = {4D Contrastive Superflows are Dense 3D Representation Learners},
    author = {Xu, Xiang and Kong, Lingdong and Shuai, Hui and Zhang, Wenwei and Pan, Liang and Chen, Kai and Liu, Ziwei and Liu, Qingshan},
    booktitle = {European Conference on Computer Vision},
    pages = {58--80},
    year = {2024}
}

@article{xu2025superflow++,
    title = {Enhanced Spatiotemporal Consistency for Image-to-LiDAR Data Pretraining},
    author = {Xu, Xiang and Kong, Lingdong and Shuai, Hui and Zhang, Wenwei and Pan, Liang and Chen, Kai and Liu, Ziwei and Liu, Qingshan},
    journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
    year = {2025}
}

Acknowledgements

This work is developed based on the MMDetection3D codebase.

MMDetection3D is an open-source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D perception. It is a part of the OpenMMLab project developed by MMLab.

We acknowledge the use of the following public resources during the course of this work: ¹nuScenes, ²nuScenes-devkit, ³SemanticKITTI, ⁴SemanticKITTI-API, ⁵WaymoOpenDataset, ⁶Synth4D, ⁷ScribbleKITTI, ⁸RELLIS-3D, ⁹SemanticPOSS, ¹⁰SemanticSTF, ¹¹SynLiDAR, ¹²DAPS-3D, ¹³Robo3D, ¹⁴SLidR, ¹⁵DINOv2, ¹⁶Segment-Any-Point-Cloud, ¹⁷OpenSeeD, ¹⁸torchsparse. :heart_decoration:
