MonoMRN: Monocular Semantic Scene Completion via Masked Recurrent Networks
July 28, 2025
Xuzhi Wang¹, Xinran Wu¹, Song Wang², Lingdong Kong³, Ziping Zhao¹
¹TJNU, ²ZJU, ³NUS
News
- 2025.06: Our paper has been accepted to ICCV 2025!
- 2025.07: The arXiv preprint is now available: arxiv.org/abs/2507.17661
- Coming soon: We are preparing the code release. Stay tuned on GitHub!
Overview
Monocular Semantic Scene Completion (MSSC) aims to infer voxel-wise occupancy and semantic labels from a single RGB image. Existing methods typically rely on single-stage pipelines that handle visible-region segmentation and occluded-region hallucination jointly. Such methods are sensitive to depth estimation errors and often generalize poorly to complex scenes.
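To make the task's output concrete, here is a minimal Python sketch of a semantic voxel grid. It assumes a common convention that is not stated above: class index 0 denotes empty space, so the occupancy grid follows directly from the per-voxel semantic labels. The function names and toy scores are hypothetical.

```python
from typing import List

Grid3D = List[List[List[int]]]

EMPTY = 0  # assumed convention: class 0 marks empty (unoccupied) space


def labels_from_scores(scores: List[List[List[List[float]]]]) -> Grid3D:
    """Pick the highest-scoring class for every voxel (scores[x][y][z] is a list over classes)."""
    return [[[max(range(len(s)), key=s.__getitem__) for s in col] for col in plane]
            for plane in scores]


def occupancy_from_labels(labels: Grid3D) -> Grid3D:
    """Binary occupancy: 1 wherever the predicted class is not EMPTY."""
    return [[[int(c != EMPTY) for c in col] for col in plane] for plane in labels]


# Toy 1 x 2 x 2 grid with 3 classes (0 = empty; 1 and 2 are hypothetical semantic classes).
scores = [[[[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]],
           [[0.2, 0.1, 0.7], [0.6, 0.3, 0.1]]]]
labels = labels_from_scores(scores)
occupancy = occupancy_from_labels(labels)
print(labels)     # [[[0, 1], [2, 0]]]
print(occupancy)  # [[[0, 1], [1, 0]]]
```

Deriving occupancy from the labels keeps the two outputs consistent by construction: a voxel cannot be labelled with a semantic class while also being marked empty.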
MonoMRN is a novel two-stage framework designed to address these challenges:
- Stage 1: Coarse MSSC
- Stage 2: Masked Recurrent Network (MRN)
  - Refines the occluded regions of the coarse prediction
  - Introduces a Masked Sparse Gated Recurrent Unit (MS-GRU) that concentrates updates on occupied regions
  - Introduces a Distance Attention Projection that reduces feature projection errors
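The two-stage idea and the MS-GRU's "update only occupied voxels" behaviour can be sketched in plain Python as below. This is an illustrative approximation, not the MonoMRN implementation: the cell is a standard bias-free GRU applied independently per voxel (a real sparse recurrent unit would presumably use sparse 3D convolutions so that neighbouring voxels exchange information), the coarse mask is hard-coded rather than produced by a stage-1 network, and `predict_mask`, which re-derives the mask after each step, is a hypothetical stand-in for whatever mask-update rule the paper uses.

```python
import math
from typing import Callable, Dict, List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Bias-free GRU cell using the convention h' = (1 - z) * h + z * h_tilde."""

    def __init__(self, weights: Dict[str, Mat]):
        self.w = weights  # hypothetical layout: input weights W*, recurrent weights U*

    def __call__(self, x: Vec, h: Vec) -> Vec:
        w = self.w
        z = [sigmoid(a + b) for a, b in zip(matvec(w["Wz"], x), matvec(w["Uz"], h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(w["Wr"], x), matvec(w["Ur"], h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(a + b) for a, b in zip(matvec(w["Wh"], x), matvec(w["Uh"], rh))]
        return [(1 - zi) * hi + zi * ht for zi, hi, ht in zip(z, h, h_tilde)]


def masked_sparse_step(cell: GRUCell, feats: List[Vec], hidden: List[Vec],
                       mask: List[int]) -> Tuple[List[Vec], int]:
    """Run the GRU only on voxels whose mask is 1; all other states are copied unchanged."""
    active = [i for i, m in enumerate(mask) if m]  # gather the masked (active) voxels
    new_hidden = list(hidden)                      # inactive voxels keep their state
    for i in active:
        new_hidden[i] = cell(feats[i], hidden[i])  # scatter updated states back
    return new_hidden, len(active)


def refine(cell: GRUCell, feats: List[Vec], hidden: List[Vec], mask: List[int],
           predict_mask: Callable[[List[Vec]], List[int]], steps: int = 3):
    """Stage-2 style loop: several masked updates, re-deriving the mask after each one."""
    total_updates = 0
    for _ in range(steps):
        hidden, n = masked_sparse_step(cell, feats, hidden, mask)
        total_updates += n
        mask = predict_mask(hidden)  # assumption: mask recomputed from the refined states
    return hidden, mask, total_updates


half_identity = [[0.5, 0.0], [0.0, 0.5]]
cell = GRUCell({k: half_identity for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")})
feats = [[1.0, 0.5], [0.2, 0.1], [0.8, -0.3], [0.0, 0.0]]  # toy per-voxel features
hidden = [[0.0, 0.0] for _ in feats]
coarse_mask = [1, 0, 1, 0]  # pretend stage 1 marked voxels 0 and 2 as occupied

hidden1, n = masked_sparse_step(cell, feats, hidden, coarse_mask)
print(n)                        # 2 GRU evaluations instead of 4
print(hidden1[1] == hidden[1])  # True: the unmasked voxel is untouched

final, mask, total = refine(cell, feats, hidden, coarse_mask,
                            predict_mask=lambda hs: [int(sum(h) > 0.1) for h in hs])
```

Skipping unmasked voxels is where the savings come from in this sketch: the cost of each recurrent step scales with the number of active voxels rather than with the full grid.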
Highlights
- Masked Sparse GRU (MS-GRU): an efficient recurrent unit that updates only occupied voxels
- Distance Attention Projection: improves feature projection accuracy
- Indoor & Outdoor Scenes: a single framework that supports both indoor (NYUv2) and outdoor (SemanticKITTI) scenes
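One way to picture a distance-based projection: when image features are lifted along a camera ray, every voxel on that ray would otherwise receive the same pixel feature, so an inaccurate depth estimate smears features into the wrong voxels. The sketch below scales each lifted feature by a score that decays with the voxel's distance to the estimated surface. The pinhole projection is standard, but the exponential score, the temperature `tau`, and the function names are assumptions made for this sketch; the paper's Distance Attention Projection may compute and normalize its scores differently.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = List[float]


def project(point: Sequence[float], fx: float, fy: float, cx: float, cy: float
            ) -> Optional[Tuple[int, int, float]]:
    """Pinhole projection of a camera-frame point (x, y, z) to pixel (u, v) plus its depth z."""
    x, y, z = point
    if z <= 0:
        return None  # behind the camera
    return int(round(fx * x / z + cx)), int(round(fy * y / z + cy)), z


def distance_weighted_lift(voxels, feat_map, depth_map, intrinsics, tau=0.5):
    """Lift per-pixel features to voxel centres, scaled by distance to the estimated surface."""
    fx, fy, cx, cy = intrinsics
    height, width = len(depth_map), len(depth_map[0])
    lifted: List[Optional[Vec]] = []
    for p in voxels:
        proj = project(p, fx, fy, cx, cy)
        if proj is None or not (0 <= proj[0] < width and 0 <= proj[1] < height):
            lifted.append(None)  # outside the camera frustum: no image feature to lift
            continue
        u, v, z = proj
        dist = abs(z - depth_map[v][u])  # distance along the ray to the estimated surface
        score = math.exp(-dist / tau)    # hypothetical score: 1 on the surface, decaying away
        lifted.append([score * f for f in feat_map[v][u]])
    return lifted


# Toy 2x2 image: every pixel has feature [1.0, 2.0] and estimated depth 2.0.
feat_map = [[[1.0, 2.0], [1.0, 2.0]], [[1.0, 2.0], [1.0, 2.0]]]
depth_map = [[2.0, 2.0], [2.0, 2.0]]
ray = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (0.0, 0.0, 3.0), (0.0, 0.0, -1.0)]
lifted = distance_weighted_lift(ray, feat_map, depth_map, intrinsics=(1.0, 1.0, 0.0, 0.0))
print(lifted[1])  # [1.0, 2.0]: the voxel on the estimated surface keeps the full feature
```

Here the voxels one unit in front of and behind the estimated surface receive the feature scaled by exp(-2), and the point behind the camera receives `None`. A plain projection would instead copy the full feature to all three voxels on the ray.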
@inproceedings{wang2025MonoMRN,
title={Monocular Semantic Scene Completion via Masked Recurrent Networks},
author={Wang, Xuzhi and Wu, Xinran and Wang, Song and Kong, Lingdong and Zhao, Ziping},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2025}
}