We research 3D Occupancy Perception for Autonomous Driving

June 4, 2026


Huaiyuan Xu · Junliang Chen · Shiyu Meng · Yi Wang · Lap-Pui Chau*

arXiv PDF


This work focuses on 3D dense perception in autonomous driving, encompassing LiDAR-Centric Occupancy Perception, Vision-Centric Occupancy Perception, and Multi-Modal Occupancy Perception. Information fusion techniques for this field are discussed. We believe this will be the most comprehensive survey to date on 3D Occupancy Perception. Please stay tuned!😉😉😉

This is an active repository; watch it to follow the latest advances. If you find it useful, please star this repo.

If you discover any missing work or have any suggestions, please feel free to submit a pull request or contact us. We will promptly add the missing papers to this repository.

✨Highlight

[1] A systematic survey of the latest research on 3D occupancy perception in the field of autonomous driving.

[2] The survey provides a taxonomy of 3D occupancy perception and elaborates on core methodological issues, including network pipelines, multi-source information fusion, and effective network training.

[3] The survey presents evaluations for 3D occupancy perception, and offers detailed performance comparisons. Furthermore, current limitations and future research directions are discussed.

🔥 News

  • [2024-09-03] This survey was accepted by Information Fusion (Impact factor: 14.7).
  • [2024-07-21] More representative works and benchmarking comparisons have been incorporated, bringing the total to 192 literature references.
  • [2024-05-18] More figures have been added to the survey, and the occupancy-based applications have been reorganized.
  • [2024-05-08] The first version of the survey is available on arXiv. We curate this repository.

Introduction

3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles. Owing to its comprehensive perception capability, this technology is emerging as a trend in autonomous driving perception systems and is attracting significant attention from both industry and academia. Like traditional bird's-eye view (BEV) perception, 3D occupancy perception takes multi-source input and therefore requires information fusion. Unlike 2D BEV, however, it also captures the vertical structures that BEV ignores. In this survey, we review the most recent works on 3D occupancy perception and provide in-depth analyses of methodologies with various input modalities. Specifically, we summarize general network pipelines, highlight information fusion techniques, and discuss effective network training. We evaluate and analyze the occupancy perception performance of state-of-the-art methods on the most popular datasets. Furthermore, challenges and future research directions are discussed. We hope this paper will inspire the community and encourage more research work on 3D occupancy perception.
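To make the contrast with 2D BEV concrete, here is a minimal toy sketch in plain Python. It is not taken from the survey: the grid layout `occupancy[x][y][z]` (with `z` as height) and the helper names `make_grid` and `to_bev` are assumptions chosen only for illustration. It shows that collapsing the height axis makes a low obstacle and an overhanging structure indistinguishable, while the 3D grid keeps them apart.

```python
# Toy illustration (not from the survey): a 3D occupancy grid versus its 2D BEV projection.
# Assumed layout: occupancy[x][y][z], where z is the height axis.

def make_grid(nx, ny, nz):
    """Create an empty nx-by-ny-by-nz occupancy grid (0 = free, 1 = occupied)."""
    return [[[0 for _ in range(nz)] for _ in range(ny)] for _ in range(nx)]

def to_bev(occupancy):
    """Collapse the height axis: a BEV cell is occupied if any voxel in its column is."""
    return [[int(any(column)) for column in row] for row in occupancy]

grid = make_grid(2, 2, 4)

# Column (0, 0): a low obstacle occupying only the bottom voxel.
grid[0][0][0] = 1

# Column (1, 1): an overhanging structure occupying only the top voxel,
# with free space underneath.
grid[1][1][3] = 1

bev = to_bev(grid)

# Both columns look identical in the BEV map...
same_in_bev = bev[0][0] == bev[1][1]
# ...but the 3D grid still tells them apart.
differ_in_3d = grid[0][0] != grid[1][1]

print(bev, same_in_bev, differ_in_3d)
```

Real occupancy methods predict semantic labels per voxel at much higher resolution; the binary grid here only illustrates why the vertical dimension carries information that a BEV map discards.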

Summary of Contents

Methods: A Survey

LiDAR-Centric Occupancy Perception

| Year | Venue | Paper Title | Link |
| --- | --- | --- | --- |
| 2026 | arXiv | TFusionOcc: Student's t-Distribution Based Object-Centric Multi-Sensor Fusion Framework for 3D Occupancy Prediction | Code |
| 2026 | arXiv | LiFlow: Flow Matching for 3D LiDAR Scene Completion | Code |
| 2025 | arXiv | Octree Latent Diffusion for Semantic 3D Scene Generation and Completion | - |
| 2025 | arXiv | Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion | Code |
| 2024 | NeurIPS | TALoS: Enhancing Semantic Scene Completion via Test-time Adaptation on the Line of Sight | Code |
| 2024 | CVPR | PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness (Best paper award candidate) | Project Page |
| 2024 | IROS | LiDAR-based 4D Occupancy Completion and Forecasting | Project Page |
| 2024 | arXiv | Towards 3D Semantic Scene Completion for Autonomous Driving: A Meta-Learning Framework Empowered by Deformable Large-Kernel Attention and Mamba Model | - |
| 2024 | arXiv | OccRWKV: Rethinking Efficient 3D Semantic Occupancy Prediction with Linear Complexity | Project Page |
| 2024 | arXiv | DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models | - |
| 2024 | arXiv | MergeOcc: Bridge the Domain Gap between Different LiDARs for Robust Occupancy Prediction | - |
| 2023 | T-IV | Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders | Code |
| 2023 | arXiv | PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction | Code |
| 2021 | T-PAMI | Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data | - |
| 2021 | AAAI | Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion | Code |
| 2020 | CoRL | S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds | - |
| 2020 | 3DV | LMSCNet: Lightweight Multiscale 3D Semantic Completion | Code |

Vision-Centric Occupancy Perception

| Year | Venue | Paper Title | Link |
| --- | --- | --- | --- |
| 2026 | CVPR | Deformable Gaussian Occupancy: Decoupling Rigid and Nonrigid Motion with Factorized Distillation | [Code](https://github.com/vita-epfl/DeGO) |
| 2026 | CVPR | Sparsity-Aware Voxel Attention and Foreground Modulation for 3D Semantic Scene Completion | [Code](https://github.com/xyandtyh/VoxSAMNet) |
| 2026 | CVPR | Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving | - |
| 2026 | T-IP | Multi-Resolution Alignment for Voxel Sparsity in Camera-Based 3D Semantic Scene Completion | Code |
| 2026 | AAAI | Towards 3D Object-Centric Feature Learning for Semantic Scene Completion | - |
| 2026 | AAAI | Towards Temporal Fusion Beyond the Field of View for Camera-based Semantic Scene Completion | - |
| 2026 | arXiv | O3N: Omnidirectional Open-Vocabulary Occupancy Prediction | Code |
| 2026 | arXiv | M2-Occ: Resilient 3D Semantic Occupancy Prediction for Autonomous Driving with Incomplete Camera Inputs | Code |
| 2026 | arXiv | VG3S: Visual Geometry Grounded Gaussian Splatting for Semantic Occupancy Prediction | - |
| 2026 | arXiv | Rebenchmarking Unsupervised Monocular 3D Occupancy Prediction | - |
| 2026 | arXiv | SPOT-Occ: Sparse Prototype-guided Transformer for Camera-based 3D Occupancy Prediction | Code |
| 2025 | T-PAMI | SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations | Code |
| 2025 | ICCV | ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction | Code |
| 2025 | ICCV | MAESTRO: Task-Relevant Optimization via Adaptive Feature Enhancement and Suppression for Multi-task 3D Perception | Code |
| 2025 | ICCV | GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting | Project Page |
| 2025 | ICCV | Semantic Causality-Aware Vision-Based 3D Occupancy Prediction | Code |
| 2025 | ICCV | Occupancy Learning with Spatiotemporal Memory | Project Page |
| 2025 | ICCV | GS-Occ3D: Scaling Vision-only Occupancy Reconstruction for Autonomous Driving with Gaussian Splatting | Project Page |
| 2025 | ICCV | Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion | Code |
| 2025 | ICCV | Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion | Project Page |
| 2025 | CVPR | VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction | Project Page |
| 2025 | CVPR | Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction | Project Page |
| 2025 | CVPR | GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding | Code |
| 2025 | CVPR | 3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation | Project Page |
| 2025 | T-RO | Particle-based Instance-aware Semantic Occupancy Mapping in Dynamic Environments | - |
| 2025 | T-ITS | GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision | - |
| 2025 | AAAI | VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion (Oral) | Code |
| 2025 | AAAI | Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion | Code |
| 2025 | AAAI | ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction | Project Page |
| 2025 | AAAI | ProtoOcc: Accurate, Efficient 3D Occupancy Prediction Using Dual Branch Encoder-Prototype Query Decoder | Code |
| 2025 | AAAI | LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba | - |
| 2025 | AAAI | Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance | - |
| 2025 | ICRA | OCCUQ: Exploring Efficient Uncertainty Quantification for 3D Occupancy Prediction | Code |
| 2025 | ICRA | Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving | - |
| 2025 | AAAIW | A Spatiotemporal Approach to Tri-Perspective Representation for 3D Semantic Occupancy Prediction | Project Page |
| 2025 | arXiv | HyGE-Occ: Hybrid View-Transformation with 3D Gaussian and Edge Priors for 3D Panoptic Occupancy Prediction | - |
| 2025 | arXiv | VOIC: Visible-Occluded Decoupling for Monocular 3D Semantic Scene Completion | Code |
| 2025 | arXiv | Enhancing 3D Semantic Scene Completion with a Refinement Module | Project Page |
| 2025 | arXiv | VG3T: Visual Geometry Grounded Gaussian Transformer | - |
| 2025 | arXiv | SuperQuadricOcc: Multi-Layer Gaussian Approximation of Superquadrics for Real-Time Self-Supervised Occupancy Estimation | - |
| 2025 | arXiv | QueryOcc: Query-based Self-Supervision for 3D Semantic Occupancy | - |
| 2025 | arXiv | ShelfOcc: Native 3D Supervision beyond LiDAR for Vision-Based Occupancy Estimation | - |
| 2025 | arXiv | EasyOcc: 3D Pseudo-Label Supervision for Fully Self-Supervised Semantic Occupancy Prediction Models | - |
| 2025 | arXiv | ST-GS: Vision-Based 3D Semantic Occupancy Prediction with Spatial-Temporal Gaussian Splatting | - |
| 2025 | arXiv | SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion | Code |
| 2025 | arXiv | DA-Occ: Efficient 3D Voxel Occupancy Prediction via Directional 2D for Geometric Structure Preservation | - |
| 2025 | arXiv | Unleashing Semantic and Geometric Priors for 3D Scene Completion | - |
| 2025 | arXiv | GTAD: Global Temporal Aggregation Denoising Learning for 3D Semantic Occupancy Prediction | - |
| 2025 | arXiv | VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions | Code |
| 2025 | arXiv | FMOcc: TPV-Driven Flow Matching for 3D Occupancy Prediction with Selective State Space Model | - |
| 2025 | arXiv | Out-of-Distribution Semantic Occupancy Prediction | Code |
| 2025 | arXiv | GraphGSOcc: Semantic and Geometric Graph Transformer for 3D Gaussian Splatting-based Occupancy Prediction | - |
| 2025 | arXiv | QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction | Project Page |
| 2025 | arXiv | ODG: Occupancy Prediction Using Dual Gaussians | - |
| 2025 | arXiv | S2GO: Streaming Sparse Gaussian Occupancy Prediction | - |
| 2025 | arXiv | VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection | Project Page |
| 2025 | arXiv | SHTOcc: Effective 3D Occupancy Prediction with Sparse Head and Tail Voxels | Code |
| 2025 | arXiv | See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction | Code |
| 2025 | arXiv | STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction | Code |
| 2025 | arXiv | LMPOcc: 3D Semantic Occupancy Prediction Utilizing Long-Term Memory Prior from Historical Traversals | - |
| 2025 | arXiv | Inverse++: Vision-Centric 3D Semantic Occupancy Prediction Assisted with 3D Object Detection | Code |
| 2025 | arXiv | Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction | Code |
| 2025 | arXiv | SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion | Code |
| 2025 | arXiv | L2COcc: Lightweight Camera-Centric Semantic Scene Completion via Distillation of LiDAR Model | Project Page |
| 2025 | arXiv | SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World | Code |
| 2025 | arXiv | OccLinker: Deflickering Occupancy Networks through Lightweight Spatio-Temporal Correlation | - |
| 2025 | arXiv | Learning A Zero-shot Occupancy Network from Vision Foundation Models via Self-supervised Adaptation | - |
| 2025 | arXiv | Vision-based 3D Semantic Scene Completion via Capture Dynamic Representations | - |
| 2025 | arXiv | TT-Occ: Test-Time Compute for Self-Supervised Occupancy via Spatio-Temporal Gaussian Splatting | Code |
| 2025 | arXiv | AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting | - |
| 2025 | arXiv | GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow | - |
| 2025 | arXiv | Learning Temporal 3D Semantic Scene Completion via Optical Flow Guidance | - |
| 2025 | arXiv | GaussRender: Learning 3D Occupancy with Gaussian Rendering | Code |
| 2025 | arXiv | Event-aided Semantic Scene Completion | Code |
| 2024 | NeurIPS | OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries | Code |
| 2024 | NeurIPS | Context and Geometry Aware Voxel Transformer for Semantic Scene Completion (Spotlight paper) | Code |
| 2024 | NeurIPS | OPUS: Occupancy Prediction Using a Sparse Set | Code |
| 2024 | ECCV | ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers | Code |
| 2024 | ECCV | CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction | Code |
| 2024 | ECCV | VEON: Vocabulary-Enhanced Occupancy Prediction | Code |
| 2024 | ECCV | Fully Sparse 3D Occupancy Prediction | Code |
| 2024 | ECCV | GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction | Project Page |
| 2024 | ECCV | Occupancy as Set of Points | Code |
| 2024 | ECCV | Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion | Code |
| 2024 | CVPR | LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-based 3D Semantic Occupancy Prediction | - |
| 2024 | CVPR | Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion | - |
| 2024 | CVPR | Symphonize 3D Semantic Scene Completion with Contextual Instance Queries | Code |
| 2024 | CVPR | SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction | Project Page |
| 2024 | CVPR | SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction | Project Page |
| 2024 | CVPR | PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation | Code |
| 2024 | CVPR | Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation | Code |
| 2024 | CVPR | COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction | Code |
| 2024 | CVPR | Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles | Project Page |
| 2024 | CVPR | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | Code |
| 2024 | CVPR | Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation | Project Page |
| 2024 | CVPR | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | - |
| 2024 | T-IP | Camera-based 3D Semantic Scene Completion with Sparse Guidance Network | Code |
| 2024 | CoRL | Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction | Project Page |
| 2024 | IJCAI | Label-efficient Semantic Scene Completion with Scribble Annotations | Code |
| 2024 | IJCAI | Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion | Code |
| 2024 | ICRA | The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition | Project Page |
| 2024 | ICRA | RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision | Code |
| 2024 | ICRA | MonoOcc: Digging into Monocular Semantic Occupancy Prediction | Code |
| 2024 | ICRA | FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird’s-Eye View and Perspective View | - |
| 2024 | AAAI | Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving | Code |
| 2024 | AAAI | One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception | - |
| 2024 | RA-L | HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction | - |
| 2024 | RA-L | UniScene: Multi-Camera Unified Pre-Training via 3D Scene Reconstruction | Code |
| 2024 | AAIML | SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints | Project Page |
| 2024 | 3DV | PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving | - |
| 2024 | IROS | SSCBench: Monocular 3D Semantic Scene Completion Benchmark in Street Views | Code |
| 2024 | arXiv | GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting | - |
| 2024 | arXiv | GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction | Code |
| 2024 | arXiv | GaussianAD: Gaussian-Centric End-to-End Autonomous Driving | Project Page |
| 2024 | arXiv | Hierarchical Context Alignment with Disentangled Geometric and Temporal Modeling for Semantic Occupancy Prediction | - |
| 2024 | arXiv | Fast Occupancy Network | - |
| 2024 | arXiv | Lightweight Spatial Embedding for Vision-based 3D Occupancy Prediction | - |
| 2024 | arXiv | GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction | Code |
| 2024 | arXiv | Language Driven Occupancy Prediction | Code |
| 2024 | arXiv | GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Code |
| 2024 | arXiv | ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera | - |
| 2024 | arXiv | ReliOcc: Towards Reliable Semantic Occupancy Prediction via Uncertainty Learning | - |
| 2024 | arXiv | Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction | Code |
| 2024 | arXiv | AdaOcc: Adaptive-Resolution Occupancy Prediction | - |
| 2024 | arXiv | MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering | Code |
| 2024 | arXiv | VPOcc: Exploiting Vanishing Point for Monocular 3D Semantic Occupancy Prediction | - |
| 2024 | arXiv | UniVision: A Unified Framework for Vision-Centric 3D Perception | Code |
| 2024 | arXiv | LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering | - |
| 2024 | arXiv | Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement | - |
| 2024 | arXiv | α-SSC: Uncertainty-Aware Camera-based 3D Semantic Scene Completion | - |
| 2024 | arXiv | Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center | Code |
| 2024 | arXiv | BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network | Code |
| 2024 | arXiv | OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow | - |
| 2024 | arXiv | OccFiner: Offboard Occupancy Refinement with Hybrid Propagation | - |
| 2024 | arXiv | InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction | Code |
| 2023 | CVPR | VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion | Code |
| 2023 | CVPR | Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction | Project Page |
| 2023 | NeurIPS | POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images | Project Page |
| 2023 | NeurIPS | Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving | Project Page |
| 2023 | ICCV | SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving | Project Page |
| 2023 | ICCV | Scene as Occupancy | Code |
| 2023 | ICCV | OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction | Code |
| 2023 | ICCV | NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space | Code |
| 2023 | T-IV | 3DOPFormer: 3D Occupancy Perception from Multi-Camera Images with Directional and Distance Enhancement | Code |
| 2023 | arXiv | OccupancyDETR: Using DETR for Mixed Dense-sparse 3D Occupancy Prediction | - |
| 2023 | arXiv | OVO: Open-Vocabulary Occupancy | Code |
| 2023 | arXiv | OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments | Project Page |
| 2023 | arXiv | OccDepth: A Depth-Aware Method for 3D Semantic Scene Completion | Code |
| 2023 | arXiv | FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin | Code |
| 2023 | arXiv | FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation | Code |
| 2023 | arXiv | DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion | - |
| 2023 | arXiv | A Simple Framework for 3D Occupancy Estimation in Autonomous Driving | Code |
| 2023 | arXiv | UniWorld: Autonomous Driving Pre-training via World Models | Code |
| 2022 | CVPR | MonoScene: Monocular 3D Semantic Scene Completion | Project Page |

Radar-Centric Occupancy Perception

| Year | Venue | Paper Title | Link |
| --- | --- | --- | --- |
| 2025 | arXiv | 4D-ROLLS: 4D Radar Occupancy Learning via LiDAR Supervision | Code |
| 2024 | NeurIPS | RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar | - |

Multi-Modal Occupancy Perception

| Year | Venue | Paper Title | Link |
| --- | --- | --- | --- |
| 2025 | CVPR | OccMamba: Semantic Occupancy Prediction with State Space Models | Code |
| 2025 | IROS | A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding | Code |
| 2025 | IROS | REOcc: Camera-Radar Fusion with Radar Feature Enrichment for 3D Occupancy Prediction | - |
| 2025 | arXiv | DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning | Code |
| 2025 | arXiv | OccLE: Label-Efficient 3D Semantic Occupancy Prediction | - |
| 2025 | arXiv | GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention | Project Page |
| 2025 | arXiv | OccCylindrical: Multi-Modal Fusion with Cylindrical Representation for 3D Semantic Occupancy Prediction | Code |
| 2025 | arXiv | MinkOcc: Towards real-time label-efficient semantic occupancy prediction | - |
| 2025 | arXiv | OccGS: Zero-shot 3D Occupancy Reconstruction with Semantic and Geometric-Aware Gaussian Splatting | - |
| 2025 | arXiv | MetaOcc: Surround-View 4D Radar and Camera Fusion Framework for 3D Occupancy Prediction with Dual Training Strategies | Code |
| 2025 | arXiv | DORACAMOM: Joint 3D Detection and Occupancy Prediction with Multi-view 4D Radars and Cameras for Omnidirectional Perception | - |
| 2024 | ECCV | OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving | Project Page |
| 2024 | RA-L | Co-Occ: Coupling Explicit Feature Fusion with Volume Rendering Regularization for Multi-Modal 3D Semantic Occupancy Prediction | Project Page |
| 2024 | arXiv | MR-Occ: Efficient Camera-LiDAR 3D Semantic Occupancy Prediction Using Hierarchical Multi-Resolution Voxel Representation | - |
| 2024 | arXiv | PVP: Polar Representation Boost for 3D Semantic Occupancy Prediction | - |
| 2024 | arXiv | Robust 3D Semantic Occupancy Prediction with Calibration-free Spatial Transformation | Code |
| 2024 | arXiv | OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction | - |
| 2024 | arXiv | DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction | Code |
| 2024 | arXiv | LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera | Project Page |
| 2024 | arXiv | OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction | - |
| 2024 | arXiv | EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network | Code |
| 2024 | arXiv | Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution | - |
| 2024 | arXiv | OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction | - |
| 2024 | arXiv | Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D Perception | - |
| 2023 | ICCV | OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception | Code |

3D Occupancy Datasets

| Dataset | Year | Venue | Modality | # of Classes | Flow | Link |
| --- | --- | --- | --- | --- | --- | --- |
| UniOcc | 2025 | ICCV | Camera | 10, 15, 17 | ✔️ | Intro. |
| OpenScene | 2024 | CVPR 2024 Challenge | Camera | - | ✔️ | Intro. |
| Cam4DOcc | 2024 | CVPR | Camera+LiDAR | 2 | ✔️ | Intro. |
| Occ3D | 2024 | NeurIPS | Camera | 14 (Occ3D-Waymo), 16 (Occ3D-nuScenes) | | Intro. |
| OpenOcc | 2023 | ICCV | Camera | 16 | | Intro. |
| OpenOccupancy | 2023 | ICCV | Camera+LiDAR | 16 | | Intro. |
| SurroundOcc | 2023 | ICCV | Camera | 16 | | Intro. |
| OCFBench | 2023 | arXiv | LiDAR | - (OCFBench-Lyft), 17 (OCFBench-Argoverse), 25 (OCFBench-ApolloScape), 16 (OCFBench-nuScenes) | | Intro. |
| SSCBench | 2023 | arXiv | Camera | 19 (SSCBench-KITTI-360), 16 (SSCBench-nuScenes), 14 (SSCBench-Waymo) | | Intro. |
| SemanticKITTI | 2019 | ICCV | Camera+LiDAR | 19 (Semantic Scene Completion task) | | Intro. |

Occupancy-based Applications

Indoor Ego-Centric

| Specific Task | Year | Venue | Paper Title | Link |
| --- | --- | --- | --- | --- |
| Indoor Occupancy Prediction | 2026 | CVPR | Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes | Code |
| Indoor Occupancy Prediction | 2026 | CVPR | Generalizing Visual Geometry Priors to Sparse Gaussian Occupancy Prediction | Code |
| Indoor Occupancy Prediction | 2026 | arXiv | VEOcc: Voxel-Centric Online Semantic Occupancy Prediction For Embodied Scene Understanding | Project Page |
| Indoor Occupancy Prediction | 2026 | arXiv | Parameter-Free Adaptive Multi-Scale Channel-Spatial Attention Aggregation framework for 3D Indoor Semantic Scene Completion Toward Assisting Visually Impaired | - |
| Indoor Occupancy Prediction | 2025 | RA-L | Enhancing Indoor Occupancy Prediction via Sparse Query-Based Multi-Level Consistent Knowledge Distillation | Code |
| Indoor Semantic Scene Completion | 2025 | arXiv | TGSFormer: Scalable Temporal Gaussian Splatting for Embodied Semantic Scene Completion | - |
| Indoor Occupancy Prediction | 2025 | arXiv | SplatSSC: Decoupled Depth-Guided Gaussian Splatting for Semantic Scene Completion | - |
| Indoor Occupancy Prediction | 2025 | arXiv | YouTube-Occ: Learning Indoor 3D Semantic Occupancy Prediction from YouTube Videos | - |

Robotics

| Specific Task | Year | Venue | Paper Title | Link |
| --- | --- | --- | --- | --- |
| Occupancy for UAV | 2026 | arXiv | SkyShield: Occupancy as a Safety Interface for Low-Altitude UAV Autonomy | - |
| Occupancy for Robotic Manipulation | 2026 | arXiv | Trans2Occ: Voxel Occupancy Estimation and Grasp for Transparent Objects from Simulation to Reality | - |
| Occupancy for Mobile Robots | 2025 | arXiv | MobileOcc: A Human-Aware Semantic Occupancy Dataset for Mobile Robots | - |
| Humanoid Occupancy | 2025 | arXiv | Humanoid Occupancy: Enabling A Generalized Multimodal Occupancy Perception System on Humanoid Robots | Project Page |
| Video Generation | 2025 | arXiv | ORV: 4D Occupancy-centric Robot Video Generation | Project Page |
| World Model | 2025 | arXiv | Occupancy World Model for Robots | - |
| Perception | 2025 | arXiv | RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots | - |

Segmentation

| Specific Task | Year | Venue | Paper Title | Link |
| --- | --- | --- | --- | --- |
| 3D Panoptic Segmentation | 2024 | CVPR | PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation | Code |
| BEV Segmentation | 2024 | CVPRW | OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks | Code |

Detection

| Specific Task | Year | Venue | Paper Title | Link |
| --- | --- | --- | --- | --- |
| 3D Object Detection | 2025 | ICONIP | Collaborative Perceiver: Elevating Vision-based 3D Object Detection via Local Density-Aware Spatial Occupancy | Code |
| 3D Object Detection | 2024 | NeurIPS | Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection | Code |
| 3D Object Detection | 2024 | CVPR | Learning Occupancy for Monocular 3D Object Detection | Code |
| 3D Object Detection | 2024 | AAAI | SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection | Code |
| 3D Object Detection | 2024 | arXiv | UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height | - |

Tracking

| Specific Task | Year | Venue | Paper Title | Link |
| --- | --- | --- | --- | --- |
| Object Tracking | 2025 | ICRA | TrackOcc: Camera-based 4D Panoptic Occupancy Tracking | Code |

Dynamic Perception

| Specific Task | Year | Venue | Paper Title | Link |
| --- | --- | --- | --- | --- |
| 3D Flow Prediction | 2026 | RA-L | SelfOccFlow: Towards end-to-end self-supervised 3D Occupancy Flow prediction | - |
| 3D Flow Prediction | 2024 | CVPR | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | Code |
| 3D Flow Prediction | 2024 | arXiv | Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction | Project Page |

Generation

| Specific Task | Year | Venue | Paper Title | Link |
| --- | --- | --- | --- | --- |
| Scene Generation | 2026 | arXiv | AnyScene: Towards Highly Controllable Driving Scene Generation at Anywhere and Beyond | Project Page |
| Scene Generation | 2025 | T-PAMI | OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation | - |
| Multimodal Scene Generation | 2025 | CVPR | UniScene: Unified Occupancy-centric Driving Scene Generation | Project Page |
| Scene Generation | 2025 | arXiv | GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation | Project Page |
| Multimodal Scene Generation | 2025 | arXiv | Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method | Code |
| Scene Generation | 2024 | ECCV | Pyramid Diffusion for Fine 3D Large Scene Generation (Oral paper) | Code |
| Scene Generation | 2024 | CVPR | SemCity: Semantic Scene Generation with Triplane Diffusion | Code |
| Scene Generation | 2024 | arXiv | InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models | Project Page |
| Scene Generation | 2024 | arXiv | SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs | Project Page |

Navigation

| Specific Task | Year | Venue | Paper Title | Link |
| --- | --- | --- | --- | --- |
| Navigation | 2026 | arXiv | SPAN-Nav: Generalized Spatial Awareness for Versatile Vision-Language Navigation | Project Page |
| Navigation | 2025 | arXiv | OmniNWM: Omniscient Driving Navigation World Models | Project Page |
| Navigation for Air-Ground Robots | 2024 | RA-L | HE-Nav: A High-Performance and Efficient Navigation System for Aerial-Ground Robots in Cluttered Environments | Project Page |
| Navigation for Air-Ground Robots | 2024 | ICRA | AGRNav: Efficient and Energy-Saving Autonomous Navigation for Air-Ground Robots in Occlusion-Prone Environments | Code |
| Navigation for Air-Ground Robots | 2024 | arXiv | OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model | Project Page |

World Models

| Specific Task | Year | Venue | Paper Title | Link |
| --- | --- | --- | --- | --- |
| Interactive 4D Occupancy Generation | 2026 | arXiv | OccDirector: Language-Guided Behavior and Interaction Generation in 4D Occupancy Space | - |
| 4D Occupancy Forecasting | 2026 | arXiv | ForecastOcc: Vision-based Semantic Occupancy Forecasting | Project Page |
| 4D Occupancy Forecasting and Generation | 2025 | ICCV | I2-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting | Code |
| 4D Occupancy Forecasting | 2025 | CVPR | Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting | Code |
| 4D Occupancy Forecasting | 2025 | ICLR | OccProphet: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework | Code |
| 4D Occupancy Generation | 2025 | ICLR | DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes (Spotlight) | Project Page |
| 4D Occupancy Forecasting and Motion Planning | 2025 | ICLR | Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving | Code |
| 4D Occupancy Forecasting and Generation | 2025 | AAAI | Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving | Project Page |
| 4D Occupancy Forecasting and Motion Planning | 2025 | ICRA | RenderWorld: World Model with Self-Supervised 3D Label | - |
| 4D Occupancy Forecasting, Motion Planning, and Scene Understanding | 2025 | ICRA | Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models | - |
| 4D Occupancy Forecasting | 2025 | arXiv | OccSTeP: Benchmarking 4D Occupancy Spatio-Temporal Persistence | Code |
| 4D Occupancy Forecasting | 2025 | arXiv | SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model | Code |
| 4D Occupancy Forecasting and Motion Planning | 2025 | arXiv | SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries | Code |
| 4D Occupancy Forecasting | 2025 | arXiv | COME: Adding Scene-Centric Forecasting Control to Occupancy World Model | Code |
| 4D Occupancy Forecasting and Motion Planning | 2025 | arXiv | Temporal Triplane Transformers as Occupancy World Models | - |
| 4D Occupancy Forecasting | 2025 | arXiv | LEAP: Enhancing Vision-Based Occupancy Networks with Lightweight Spatio-Temporal Correlation | - |
| 4D Occupancy Forecasting and Motion Planning | 2024 | ECCV | OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving | Project Page |
| 4D Occupancy Forecasting | 2024 | CVPR | UnO: Unsupervised Occupancy Fields for Perception and Forecasting (Oral paper) | Project Page |
| 4D Representation Learning Framework | 2024 | CVPR | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | - |
| 4D Occupancy Forecasting | 2024 | CVPR | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | Code |
| 4D Occupancy Forecasting | 2024 | AAAI | Semantic Complete Scene Forecasting from a 4D Dynamic Point Cloud Sequence | Project Page |
| 4D Occupancy Forecasting and Motion Planning | 2024 | arXiv | An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training | - |
| 4D Occupancy Forecasting and Generation | 2024 | arXiv | DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model | Project Page |
| 4D Occupancy Forecasting | 2024 | arXiv | FSF-Net: Enhance 4D Occupancy Forecasting with Coarse BEV Scene Flow for Autonomous Driving | - |
| 4D Occupancy Forecasting, Motion Planning, and Reasoning | 2024 | arXiv | OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving | - |
| 4D Occupancy Generation | 2024 | arXiv | OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving | Project Page |
| 4D Occupancy Forecasting | 2023 | CVPR | Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting | Project Page |

Unified Autonomous Driving Algorithm Framework

| Specific Tasks | Year | Venue | Paper Title | Link |
| --- | --- | --- | --- | --- |
| Perception and Understanding | 2026 | arXiv | XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments | - |
| Occupancy Forecasting, Reasoning | 2026 | arXiv | SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning | Project Page |
| Occupancy Prediction, 3D Object Detection, Segmentation | 2025 | AAAI | M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving | Code |
| Occupancy Prediction, Occupancy Forecasting, Planning, and Understanding | 2025 | arXiv | DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning | Code |
| Occupancy Prediction and Planning | 2025 | arXiv | OccVLA: Vision-Language-Action Model with Implicit 3D Occupancy Supervision | - |
| Occupancy Prediction, 3D Object Detection, Online Mapping, Multi-object Tracking, Motion Prediction, Motion Planning | 2024 | CVPR | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | - |
| Occupancy Prediction, 3D Object Detection | 2024 | RA-L | UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction for Autonomous Driving | Code |
| Occupancy Prediction, 3D Object Detection, HD map reconstruction | 2024 | arXiv | GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Code |
| Occupancy Forecasting, Motion Planning | 2024 | arXiv | Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving | - |
| Occupancy Prediction, 3D Object Detection, BEV segmentation, Motion Planning | 2023 | ICCV | Scene as Occupancy | Code |

Cite The Survey

If you find our survey and repository useful for your research project, please consider citing our paper:

@misc{xu2024survey,
      title={A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective}, 
      author={Huaiyuan Xu and Junliang Chen and Shiyu Meng and Yi Wang and Lap-Pui Chau},
      year={2024},
      eprint={2405.05173},
      archivePrefix={arXiv}
}

Contact

If you have any questions, please feel free to get in touch:

lap-pui.chau@polyu.edu.hk
huaiyuan.xu@polyu.edu.hk

If you are interested in joining us as a Ph.D. student to research computer vision and machine learning, please feel free to contact Professor Chau:

lap-pui.chau@polyu.edu.hk