Place Recognition Meet Multiple Modalities: A Comprehensive Review, Current Challenges and Future Directions

January 16, 2026

Zhenyu Li, Tianyi Shang, Pengjie Xu, Zhaojun Deng

Our Survey

Abstract

Place recognition, the ability of a system to determine whether a location has been previously visited, is a cornerstone of vehicle navigation and mapping. This capability is critical for tasks such as loop closure in Simultaneous Localization and Mapping (SLAM) and long-term navigation under varying environmental conditions. This survey comprehensively reviews recent advancements in place recognition, emphasizing three representative methodological paradigms: Convolutional Neural Network (CNN)-based approaches, Transformer-based frameworks, and cross-modal strategies. We begin by explaining the significance of place recognition within the broader context of autonomous systems. We then trace the evolution of CNN-based methods, highlighting their contributions to robust visual descriptor learning and scalability in large-scale environments. Next, we examine the emerging class of Transformer-based models, which leverage self-attention mechanisms to capture global dependencies and offer improved generalization across diverse scenes. Furthermore, we discuss cross-modal approaches that integrate heterogeneous data sources such as LiDAR, vision, and text descriptions, thereby enhancing resilience to viewpoint, illumination, and seasonal variations. We also summarize standard datasets and evaluation metrics widely adopted in the literature. Finally, we identify current research challenges and outline prospective directions, including domain adaptation, real-time performance, and lifelong learning, to inspire future advancements in this domain.

Survey Overview

This paper provides a comprehensive review of recent advancements in place recognition, focusing on three key methodological paradigms:

  1. CNN-based Approaches
  2. Transformer-based Frameworks
  3. Cross-modal Strategies

1. Introduction

Significance in Autonomous Systems

Place recognition plays a pivotal role in:

  • Autonomous vehicle navigation
  • Large-scale environment mapping
  • Robust localization under changing conditions
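
In practice, deciding whether a location has been visited before is commonly cast as nearest-neighbour retrieval over global descriptors. The following is a minimal sketch under that assumption, not a method taken from the survey; the function names, the toy 3-D descriptors, and the 0.8 acceptance threshold are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recognize_place(query, database, threshold=0.8):
    """Return (index, similarity) of the best database match.

    The index is None when the best similarity falls below the
    (hypothetical) acceptance threshold, i.e. the place looks new.
    """
    best_index, best_score = None, -1.0
    for index, descriptor in enumerate(database):
        score = cosine_similarity(query, descriptor)
        if score > best_score:
            best_index, best_score = index, score
    if best_score < threshold:
        return None, best_score
    return best_index, best_score


# Toy map of three previously visited places (illustrative values only).
database = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

revisit = [0.1, 0.9, 0.1]  # close to stored place 1
unseen = [0.6, 0.6, 0.6]   # equally far from every stored place

match_index, match_score = recognize_place(revisit, database)
novel_index, novel_score = recognize_place(unseen, database)
```

Here the first query is accepted as a revisit of place 1, while the second is rejected because no stored descriptor is similar enough. In a SLAM pipeline, an accepted match would typically be passed on as a loop-closure candidate for geometric verification.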

2. Methodological Evolution

2.1 CNN-based Approaches

Key Contributions:

  • Robust visual descriptor learning
  • Scalability in large-scale environments
  • Evolution from traditional features to deep learning
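
One common way to turn a CNN feature map into a global descriptor is generalized-mean (GeM) pooling, introduced in "Fine-tuning CNN image retrieval with no human annotation" (listed in Section 4). The sketch below illustrates the pooling step only, in plain Python on a toy feature map; the helper names and the input values are assumptions for illustration, and the survey does not prescribe this particular aggregation.

```python
import math


def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling.

    feature_map: list of channels, each a flat list of non-negative
    activations (for example, post-ReLU values over spatial positions).
    p = 1 gives average pooling; large p approaches max pooling.
    """
    descriptor = []
    for channel in feature_map:
        # Clamp to eps so that the power is well defined for zeros.
        clamped = [max(v, eps) for v in channel]
        mean_pow = sum(v ** p for v in clamped) / len(clamped)
        descriptor.append(mean_pow ** (1.0 / p))
    return descriptor


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


# Two channels over four spatial positions (toy values).
feature_map = [
    [1.0, 1.0, 1.0, 1.0],  # uniform response
    [0.0, 0.0, 0.0, 4.0],  # one strong, localized response
]

avg_like = gem_pool(feature_map, p=1.0)
gem = gem_pool(feature_map, p=3.0)
global_descriptor = l2_normalize(gem)
```

With p = 3, the channel with a single strong activation pools to a value between its mean and its maximum, which is why GeM is often described as interpolating between average and max pooling.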

2.2 Transformer-based Models

Advancements:

  • Self-attention mechanisms capturing global dependencies
  • Improved generalization across diverse scenes
  • Handling of long-range spatial relationships
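
The self-attention mechanism behind these properties can be sketched in a few lines. This is a minimal single-head scaled dot-product attention in plain Python, assuming identity query, key, and value projections for brevity (real models learn separate projection matrices); the token values are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), which keeps the sketch short.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in tokens
        ]
        attn = softmax(scores)
        weights.append(attn)
        # Each output mixes every token, weighted by its attention score.
        outputs.append([
            sum(w * value[d] for w, value in zip(attn, tokens))
            for d in range(dim)
        ])
    return outputs, weights


# Three toy patch tokens; tokens 0 and 2 are similar but not adjacent.
tokens = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.9, 0.1],
]
outputs, weights = self_attention(tokens)
```

Token 0 attends more strongly to token 2 than to its immediate neighbour, token 1, because attention weights depend on feature similarity rather than position. This is the sense in which self-attention captures global dependencies.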

2.3 Cross-modal Strategies

Innovations:

  • Integration of heterogeneous data sources:

    • LiDAR point clouds
    • Visual information
    • Text descriptions
  • Enhanced resilience to:

    • Viewpoint variations
    • Illumination changes
    • Seasonal transitions
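
A frequent ingredient of cross-modal retrieval is a shared embedding space trained with a contrastive objective, in the style of "Learning transferable visual models from natural language supervision" (listed in Section 4). The sketch below computes a symmetric InfoNCE loss between paired text and point-cloud embeddings. It is an illustrative assumption rather than the objective of any specific method in the survey; the embeddings, function names, and the temperature of 0.1 are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_entropy_row(logits, target):
    """Cross-entropy of one row of logits against the target index."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target]


def symmetric_contrastive_loss(text_emb, cloud_emb, temperature=0.1):
    """Symmetric InfoNCE loss over paired, L2-normalized embeddings.

    text_emb[i] and cloud_emb[i] are assumed to describe the same place.
    """
    n = len(text_emb)
    logits = [[dot(t, c) / temperature for c in cloud_emb] for t in text_emb]
    # Text-to-cloud direction: each text should pick its own point cloud.
    text_to_cloud = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    # Cloud-to-text direction: the same check on the transposed logits.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    cloud_to_text = sum(cross_entropy_row(columns[j], j) for j in range(n)) / n
    return 0.5 * (text_to_cloud + cloud_to_text)


# Toy unit-norm embeddings for two places.
text_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned_clouds = [[1.0, 0.0], [0.0, 1.0]]
swapped_clouds = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = symmetric_contrastive_loss(text_emb, aligned_clouds)
swapped_loss = symmetric_contrastive_loss(text_emb, swapped_clouds)
```

The loss is near zero when each description is closest to its own point cloud and large when the pairs are swapped, so minimizing it pulls matching modalities together in the shared space.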

3. Challenges & Future Directions

Current Research Challenges

  • Domain adaptation across environments
  • Real-time performance requirements
  • Lifelong learning capabilities

Prospective Research Directions

  1. Adaptive Systems

    • Cross-domain generalization
    • Continuous learning frameworks
  2. Efficiency Optimization

    • Computational efficiency improvements
    • Memory-constrained implementations
  3. Advanced Fusion Techniques

    • Multi-modal integration
    • Temporal consistency methods

4. Methods Reviewed in This Survey

| Title | First Author | Venue | GitHub | BibTeX |
|---|---|---|---|---|
| Gsv-cities: Toward appropriate supervised visual place recognition | Amar Ali-bey | Neurocomputing 2022 | GitHub | BibTeX |
| Mixvpr: Feature mixing for visual place recognition | Amar Ali-bey | WACV 2023 | GitHub | BibTeX |
| BoQ: A place is worth a bag of learnable queries | Amar Ali-bey | CVPR 2024 | GitHub | BibTeX |
| NetVLAD: CNN Architecture for Weakly Supervised Place Recognition | Relja Arandjelovic | CVPR 2016 | GitHub | BibTeX |
| AttDLNet: Attention-based Deep Network for 3D LiDAR Place Recognition | Tiago Barros | Robot 2022 | GitHub | BibTeX |
| Place recognition survey: An update on deep learning approaches | Tiago Barros | arXiv |  | BibTeX |
| Rethinking visual geo-localization for large-scale applications | Gabriele Berton | CVPR 2022 | GitHub | BibTeX |
| Eigenplaces: Training viewpoint robust models for visual place recognition | Gabriele Berton | ICCV 2023 | GitHub | BibTeX |
| Unifying deep local and global features for image search | Bingyi Cao | ECCV 2020 | GitHub | BibTeX |
| Emerging properties in self-supervised vision transformers | Mathilde Caron | ICCV 2021 |  | BibTeX |
| Lcdnet: Deep loop closure detection and point cloud registration for lidar slam | Daniele Cattaneo | TRO 2022 | GitHub | BibTeX |
| SpoxelNet: Spherical voxel-based deep place recognition for 3D point clouds of crowded indoor spaces | Min Young Chang | IROS 2020 |  | BibTeX |
| Convolutional neural network-based place recognition | Zetao Chen | arXiv |  | BibTeX |
| FAB-MAP: Probabilistic localization and mapping in the space of appearance | Mark Cummins | IJRR 2008 | GitHub | BibTeX |
| A solution to the simultaneous localization and map building (SLAM) problem | MWM Gamini Dissanayake | TRO 2001 |  | BibTeX |
| Dh3d: Deep hierarchical 3d descriptors for robust large-scale 6dof relocalization | Juan Du | ECCV 2020 | GitHub | BibTeX |
| Direct sparse odometry | Jakob Engel | TPAMI 2017 | GitHub | BibTeX |
| Svt-net: Super light-weight sparse voxel transformer for large scale place recognition | Zhaoxin Fan | AAAI 2022 | GitHub | BibTeX |
| Adaptive mobile robot navigation and mapping | HJS Feder | IJRR 1999 |  | BibTeX |
| Toward object-based place recognition in dense rgb-d maps | Dorian Gálvez-López | TRO 2012 |  | BibTeX |
| Bags of binary words for fast place recognition in image sequences | Dorian Gálvez-López | TRO 2012 |  | BibTeX |
| Revisit Anything: Visual Place Recognition via Image Segment Retrieval | Kartik Garg | ECCV 2024 | GitHub | BibTeX |
| Self-supervising fine-grained region similarities for large-scale image localization | Yixiao Ge | ECCV 2020 | GitHub | BibTeX |
| FAB-MAP+ RatSLAM: Appearance-based SLAM for multiple times of day | AJ Glover | ICRA 2010 |  | BibTeX |
| The perfect match: 3d point cloud matching with smoothed densities | Zan Gojcic | CVPR 2019 | GitHub | BibTeX |
| Salsa: Swift adaptive lightweight self-attention for enhanced lidar place recognition | Raktim Gautam Goswami | RAL 2024 | GitHub | BibTeX |
| Indoor localization improved by spatial context—A survey | Fuqiang Gu | ACM Computing Surveys |  | BibTeX |
| Recent trends in task and motion planning for robotics: A survey | Huihui Guo | Computing Surveys |  | BibTeX |
| Visual place recognition using HMM sequence matching | Peter Hansen | IROS 2014 |  | BibTeX |
| Patch-netvlad: Multi-scale fusion of locally-global descriptors for place recognition | Stephen Hausler | CVPR 2021 | GitHub | BibTeX |
| Pair-vpr: Place-aware pre-training and contrastive pair classification for visual place recognition with vision transformers | Stephen Hausler | RAL 2025 | GitHub | BibTeX |
| Hitpr: Hierarchical transformer for place recognition in point cloud | Zhixing Hou | ICRA 2022 |  | BibTeX |
| Progeo: Generating prompts through image-text contrastive learning for visual geo-localization | Jingqi Hu | ICANN 2024 | GitHub | BibTeX |
| 360loc: A dataset and benchmark for omnidirectional visual localization with cross-device queries | Huajian Huang | CVPR 2024 | GitHub | BibTeX |
| Cross-modal and uni-modal soft-label alignment for image-text retrieval | Hailang Huang | AAAI 2024 | GitHub | BibTeX |
| Optimal transport aggregation for visual place recognition | Sergio Izquierdo | CVPR 2024 | GitHub | BibTeX |
| Learned contextual feature reweighting for image geo-localization | Hyo Jin Kim | CVPR 2017 | GitHub | BibTeX |
| HeLiPR: Heterogeneous LiDAR dataset for inter-LiDAR place recognition under spatiotemporal variations | Minwoo Jung | IJRR 2024 | GitHub | BibTeX |
| Anyloc: Towards universal visual place recognition | Nikhil Keetha | RAL 2023 | GitHub | BibTeX |
| A holistic visual place recognition approach using lightweight cnns for significant viewpoint and appearance changes | Ahmad Khaliq | TRO 2019 |  | BibTeX |
| Level-5 autonomous driving—Are we there yet? A review of research literature | Manzoor Ahmed Khan | ACM Computing Surveys |  | BibTeX |
| Narrowing your fov with solid: Spatially organized and lightweight global descriptor for fov-constrained lidar place recognition | Hogyun Kim | RAL 2024 | GitHub | BibTeX |
| Text2pos: Text-to-point-cloud cross-modal localization | Manuel Kolmet | CVPR 2022 | GitHub | BibTeX |
| Minkloc3d: Point cloud based large-scale place recognition | Jacek Komorowski | WACV 2021 | GitHub | BibTeX |
| Improving point cloud based place recognition with ranking-based loss and large batch training | Jacek Komorowski | ICPR 2022 | GitHub | BibTeX |
| Generalized contrastive optimization of siamese networks for place recognition | María Leyva-Vallina | arXiv | GitHub | BibTeX |
| Toward Robust Visual Place Recognition for Mobile Robots With an End-to-End Dark-Enhanced Net | Zhenyu Li | TII 2025 | GitHub | BibTeX |
| CSPFormer: A cross-spatial pyramid transformer for visual place recognition | Zhenyu Li | Neurocomputing 2024 |  | BibTeX |
| Feature-Level Knowledge Distillation for Place Recognition Based on Soft-Hard Labels Teaching Paradigm | Zhenyu Li | TIIS 2025 | GitHub | BibTeX |
| CWPFormer: Towards High-performance Visual Place Recognition for Robot with Cross-weight Attention Learning | Zhenyu Li | TAI 2025 | GitHub | BibTeX |
| Translo: A window-based masked point transformer framework for large-scale lidar odometry | Jiuming Liu | AAAI 2023 | GitHub | BibTeX |
| Stochastic attraction-repulsion embedding for large scale image localization | Liu Liu | ICCV 2019 | GitHub | BibTeX |
| Visual place recognition: A survey | Stephanie Lowry | TRO 2015 |  | BibTeX |
| Unsupervised online learning of condition-invariant images for place recognition | Stephanie Lowry | ACRA 2014 |  | BibTeX |
| Deep homography estimation for visual place recognition | Feng Lu | AAAI 2024 | GitHub | BibTeX |
| SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition | Feng Lu | arXiv | GitHub | BibTeX |
| Cricavpr: Cross-image correlation-aware representation learning for visual place recognition | Feng Lu | CVPR 2024 | GitHub | BibTeX |
| Towards seamless adaptation of pre-trained models for visual place recognition | Feng Lu | arXiv | GitHub | BibTeX |
| 3D point cloud-based place recognition: a survey | Kan Luo | Artificial Intelligence Review |  | BibTeX |
| BEVPlace: Learning LiDAR-based place recognition using bird’s eye view images | Lun Luo | ICCV 2023 | GitHub | BibTeX |
| Seqot: A spatial–temporal transformer network for place recognition using sequential lidar data | Junyi Ma | TIE 2022 | GitHub | BibTeX |
| OverlapTransformer: An efficient and yaw-angle-invariant transformer network for LiDAR-based place recognition | Junyi Ma | RAL 2022 | GitHub | BibTeX |
| 1 year, 1000 km: The oxford robotcar dataset | Will Maddern | IJRR 2017 | Dataset | BibTeX |
| SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights | Michael J Milford | ICRA 2012 | GitHub | BibTeX |
| Environment selection and hierarchical place recognition | Mahesh Mohan | ICRA 2015 |  | BibTeX |
| FastSLAM: A factored solution to the simultaneous localization and mapping problem | Michael Montemerlo | AAAI 2002 | GitHub | BibTeX |
| ORB-SLAM: A versatile and accurate monocular SLAM system | Raul Mur-Artal | TRO 2015 |  | BibTeX |
| A comprehensive review on autonomous navigation | Saeid Nahavandi | Computing Surveys |  | BibTeX |
| The mapillary vistas dataset for semantic understanding of street scenes | Gerhard Neuhold | ICCV 2017 | Dataset | BibTeX |
| Single-view place recognition under seasonal changes | Daniel Olid | arXiv | GitHub | BibTeX |
| Dinov2: Learning robust visual features without supervision | Maxime Oquab | arXiv | GitHub | BibTeX |
| Visual place recognition using landmark distribution descriptors | Pilailuck Panphattarasap | ACCV 2016 |  | BibTeX |
| PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation | Charles R Qi | CVPR 2017 | GitHub | BibTeX |
| Pointnet++: Deep hierarchical feature learning on point sets in a metric space | Charles Ruizhongtai Qi | NeurIPS 2017 | GitHub | BibTeX |
| Fine-tuning CNN image retrieval with no human annotation | Filip Radenović | TPAMI 2018 | GitHub | BibTeX |
| Learning transferable visual models from natural language supervision | Alec Radford | ICML 2021 | GitHub | BibTeX |
| Vlocnet++: Deep multitask learning for semantic visual localization and odometry | Noha Radwan | RAL 2018 |  | BibTeX |
| Learning with average precision: Training image retrieval with a listwise loss | Jerome Revaud | ICCV 2019 | GitHub | BibTeX |
| Superglue: Learning feature matching with graph neural networks | Paul-Edouard Sarlin | CVPR 2020 | GitHub | BibTeX |
| MambaPlace: Text-to-Point-Cloud Cross-Modal Place Recognition with Attention Mamba Mechanisms | Tianyi Shang | arXiv | GitHub | BibTeX |
| Text-Driven 3D Lidar Place Recognition for Autonomous Driving | Tianyi Shang | arXiv | GitHub | BibTeX |
| Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition | Tianyi Shang | arXiv | GitHub | BibTeX |
| Voxel-based representation learning for place recognition based on 3d point clouds | Sriram Siva | IROS 2020 |  | BibTeX |
| A dataset for benchmarking image-based localization | Xun Sun | CVPR 2017 |  | BibTeX |
| On the performance of convnet features for place recognition | Niko Sünderhauf | IROS 2015 |  | BibTeX |
| OpenSeqSLAM2.0: An open source toolbox for visual place recognition under changing conditions | Ben Talbot | IROS 2018 | GitHub | BibTeX |
| The graph SLAM algorithm with applications to large-scale mapping of urban structures | Sebastian Thrun | IJRR 2006 |  | BibTeX |
| 24/7 place recognition by view synthesis | Akihiko Torii | CVPR 2015 |  | BibTeX |
| Visual place recognition with repetitive structures | Akihiko Torii | CVPR 2013 |  | BibTeX |
| Effovpr: Effective foundation model utilization for visual place recognition | Issar Tzachor | ICLR 2025 |  | BibTeX |
| Pointnetvlad: Deep point cloud based retrieval for large-scale place recognition | Mikaela Angelina Uy | CVPR 2018 | GitHub | BibTeX |
| LoGG3D-Net: Locally guided global descriptor learning for 3D place recognition | Kavisha Vidanapathirana | ICRA 2022 | GitHub | BibTeX |
| Text to point cloud localization with relation-enhanced transformer | Guangzhi Wang | AAAI 2023 |  | BibTeX |
| Transvpr: Transformer-based place recognition with multi-level attention aggregation | Ruotong Wang | CVPR 2022 | GitHub | BibTeX |
| Deepvo: Towards end-to-end visual odometry with deep recurrent convolutional neural networks | Sen Wang | ICRA 2017 | GitHub | BibTeX |
| Ranking-aware Continual Learning for LiDAR Place Recognition | Xufei Wang | arXiv |  | BibTeX |
| Text2loc: 3d point cloud localization from natural language | Yan Xia | CVPR 2024 | GitHub | BibTeX |
| TransLoc3D: Point cloud based large-scale place recognition using adaptive receptive fields | Tian-Xing Xu | arXiv | GitHub | BibTeX |
| TransVLAD: Multi-scale attention-based global descriptors for visual geo-localization | Yifan Xu | ECCV 2023 | GitHub | BibTeX |
| Hierarchical attention fusion for geo-localization | Liqi Yan | ICASSP 2021 | GitHub | BibTeX |
| Autonomous visual navigation for mobile robots: A systematic literature review | Yuri DV Yasuda | Computing Surveys |  | BibTeX |
| Mrs-vpr: a multi-resolution sampling based global visual place recognition method | Peng Yin | ICRA 2019 |  | BibTeX |
| Spatial pyramid-enhanced NetVLAD with weighted triplet loss for place recognition | Jun Yu | TNNLS 2019 |  | BibTeX |
| 3dmatch: Learning local geometric descriptors from rgb-d reconstructions | Andy Zeng | CVPR 2017 | GitHub | BibTeX |
| PCAN: 3D attention map learning using contextual information for point cloud based retrieval | Wenxiao Zhang | CVPR 2019 | GitHub | BibTeX |
| Lidar-based place recognition for autonomous driving: A survey | Yongjun Zhang | Computing Surveys |  | BibTeX |
| Learning deep features for scene recognition using places database | Bolei Zhou | NeurIPS 2014 |  | BibTeX |
| Loop closure detection using local 3D deep descriptors | Youjie Zhou | RAL 2022 | GitHub | BibTeX |
| Ndt-transformer: Large-scale 3d point cloud localisation using the normal distribution transform representation | Zhicheng Zhou | ICRA 2021 | GitHub | BibTeX |
| R2former: Unified retrieval and reranking transformer for place recognition | Sijie Zhu | CVPR 2023 | GitHub | BibTeX |
| PRGS: Patch-to-Region Graph Search for Visual Place Recognition | Weiliang Zuo | Pattern Recognition 2025 | GitHub | BibTeX |
| A2GC: Asymmetric Aggregation with Geometric Constraints for Locally Aggregated Descriptors | Zhenyu Li | arXiv 2025 | GitHub | BibTeX |
| FourierPlace: A Vision-Language Localization Framework Based on Frequency Domain Representations | Tianyi Shang | IEEE RAL 2025 | GitHub | BibTeX |

Cite this article:

Li, Z., Shang, T., Xu, P. et al. Place recognition meet multiple modalities: a comprehensive review, current challenges and future development. Artif Intell Rev 58, 363 (2025). https://doi.org/10.1007/s10462-025-11367-8