🧩 Model Zoo

December 15, 2023

We are continually updating the models for ViT-Lens. Please stay tuned!

3D Point Cloud

| Model | Training Data | MN40 (Top1/Top3/Top5) | Objaverse-LVIS (Top1/Top3/Top5) | ScanObjectNN (Top1/Top3/Top5) |
| --- | --- | --- | --- | --- |
| vitlensB | ULIP-ShapeNet Triplets | 65.4/-/92.7 | - | - |
| vitlensB | ULIP2-Objaverse Triplets | 74.8/-/93.8 | - | - |
| vitlensL | ULIP-ShapeNet Triplets | 70.6/-/94.4 | - | - |
| vitlensL | ULIP2-Objaverse Triplets | 80.6/-/95.8 | - | - |
| vitlensG | OpenShape-Triplets | 87.6/96.6/98.4 | 52.0/73.3/79.9 | 60.1/81.0/90.3 |
| vitlensG | OpenShape-Triplets (No LVIS) | 86.8/96.8/97.8 | 50.1/71.3/78.1 | 59.8/79.3/87.7 |
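The Top1/Top3/Top5 columns report top-k classification accuracy in percent. As a rough illustration of how such numbers are typically computed (a minimal sketch, not the ViT-Lens evaluation code; the function name `topk_accuracy` and the toy scores are hypothetical), a sample counts as a hit at k if its true class is among the k highest-scoring classes:

```python
def topk_accuracy(scores, labels, ks=(1, 3, 5)):
    """Return {k: accuracy in percent} given per-sample class scores.

    Hypothetical helper for illustration only; it is not part of ViT-Lens.
    scores: list of per-sample score lists (one score per class).
    labels: list of ground-truth class indices.
    """
    hits = {k: 0 for k in ks}
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        for k in ks:
            if label in ranked[:k]:
                hits[k] += 1
    n = len(labels)
    return {k: 100.0 * hits[k] / n for k in ks}


# Toy example: 4 samples, 6 classes (made-up scores).
toy_scores = [
    [0.10, 0.70, 0.20, 0.00, 0.00, 0.00],  # true class 1 is ranked 1st
    [0.50, 0.10, 0.30, 0.05, 0.05, 0.00],  # true class 2 is ranked 2nd
    [0.30, 0.25, 0.20, 0.15, 0.10, 0.00],  # true class 4 is ranked 5th
    [0.40, 0.30, 0.20, 0.06, 0.04, 0.00],  # true class 5 is ranked 6th
]
toy_labels = [1, 2, 4, 5]
print(topk_accuracy(toy_scores, toy_labels))
```

In a zero-shot setting, the per-class scores would usually be similarities between a sample embedding and the text embeddings of the class names. That detail is an assumption here and is not spelled out on this page.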

Depth

| Model | Training Data | SUN.D (Top1) | NYU.D (Top1) |
| --- | --- | --- | --- |
| vitlensB | SUN RGBD (I+T) | 51.4 | 65.0 |
| vitlensL | SUN RGBD (I+T) | 52.2 | 68.5 |
| vitlensG | SUN RGBD (I+T) | 54.6 | 69.0 |

Audio

| Model | Training Data | Audioset (mAP) | VGGSound (Top1) | ESC50 (Top1) | Clotho (R@1/R@10) | AudioCaps (R@1/R@10) |
| --- | --- | --- | --- | --- | --- | --- |
| vitlensB | Audioset train, 5-sec clips (V+T) | 26.3 | 29.9 | 72.9 | 7.5/29.5 | 13.5/54.1 |
| vitlensL | Audioset train, 5-sec clips (V+T) | 26.7 | 31.7 | 75.9 | 8.1/31.2 | 14.4/54.9 |
| vitlensL | Audioset train, 2-sec clips (V+T) | 29.0 | 32.5 | 75.1 | 7.9/31.6 | 14.8/53.3 |
| vitlensL | Audioset train and VGGSound train, 5-sec clips (V+T) | 27.2 | 51.7 | 80.9 | 7.9/31.5 | 14.9/55.2 |

Tactile

| Model | Training Data | Material (Top1) | Hard/Soft (Top1) | Rough/Smooth (Top1) |
| --- | --- | --- | --- | --- |
| vitlensB (aligned to Image) - LinearProbe | Touch-and-Go | 63.0 | 92.0 | 85.1 |
| vitlensL | Touch-and-Go | 65.8 | 74.7 | 63.8 |

EEG

| Model | Training Data | INEEG-Val (Top1) | INEEG-Test (Top1) |
| --- | --- | --- | --- |
| vitlensB | ImageNet EEG | 37.3 | 35.9 |
| vitlensL | ImageNet EEG | 41.8 | 42.7 |