MonoMRN: Monocular Semantic Scene Completion via Masked Recurrent Networks

July 28, 2025

Xuzhi Wang1  Xinran Wu1  Song Wang2  Lingdong Kong3  Ziping Zhao1

1TJNU 2ZJU 3NUS

📰 News

  • 🏆 2025.06 🎉 Our paper has been accepted to ICCV 2025!
  • 📄 2025.07 📝 The arXiv preprint is now available: arxiv.org/abs/2507.17661
  • 🚧 Coming Soon 🛠️ We are preparing the code release. Stay tuned on GitHub!

🧠 Overview

Monocular Semantic Scene Completion (MSSC) aims to infer voxel-wise occupancy and semantic labels from a single RGB image. Existing methods typically rely on single-stage pipelines that jointly handle segmentation of visible regions and hallucination of occluded regions. However, these methods often suffer from depth estimation errors and generalize poorly to complex scenes.

MonoMRN is a novel two-stage framework designed to address these challenges:

  1. Stage 1: Coarse MSSC
  2. Stage 2: Masked Recurrent Network (MRN)
    ‣ Focuses on refining occluded regions
    ‣ Designs a Masked Sparse Gated Recurrent Unit (MS-GRU) to focus on occupied regions
    ‣ Proposes a Distance Attention Projection to reduce projection errors
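
The official code is not yet released, so the following is only a toy sketch of the masking idea behind MS-GRU, not the authors' implementation. It assumes a flat list of voxels with scalar features and hypothetical scalar weights (`w_z`, `w_r`, `w_h`); the function name `masked_gru_step` is also an illustrative choice. The point it shows is that a GRU-style gated update is applied only where the occupancy mask is true, while empty voxels keep their state unchanged.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def masked_gru_step(hidden, inputs, mask, w_z=1.0, w_r=1.0, w_h=1.0):
    """One GRU-style update over a flat list of voxels (scalar features).

    Hypothetical simplification: the weights are scalars rather than
    learned layers. Voxels whose mask entry is False are skipped and
    keep their previous hidden state.
    """
    updated = []
    for h, x, occupied in zip(hidden, inputs, mask):
        if not occupied:
            updated.append(h)  # empty voxel: no computation, state kept
            continue
        z = sigmoid(w_z * (h + x))              # update gate
        r = sigmoid(w_r * (h + x))              # reset gate
        h_tilde = math.tanh(w_h * (r * h + x))  # candidate state
        updated.append((1.0 - z) * h + z * h_tilde)
    return updated


# Four voxels; only voxels 0 and 2 are marked as occupied.
hidden = [0.0, 0.5, -0.2, 0.1]
inputs = [1.0, 1.0, 1.0, 1.0]
mask = [True, False, True, False]

# Apply the recurrent update for a few iterations.
for _ in range(3):
    hidden = masked_gru_step(hidden, inputs, mask)
```

After the loop, the hidden states of voxels 1 and 3 are exactly their initial values, while voxels 0 and 2 have been refined. In a sparse setting, skipping unoccupied voxels is also what saves computation.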

MonoMRN Framework

✨ Highlights

  • 🔁 Masked Sparse GRU (MS-GRU): Efficient recurrent unit that updates only occupied voxels
  • 🎯 Distance Attention Projection: Improves feature projection accuracy
  • 🏠 + 🚗 Indoor & Outdoor Scenes: Works on both NYUv2 (indoor) and SemanticKITTI (outdoor)

Citation

@inproceedings{wang2025MonoMRN,
  title={Monocular Semantic Scene Completion via Masked Recurrent Networks},
  author={Wang, Xuzhi and Wu, Xinran and Wang, Song and Kong, Lingdong and Zhao, Ziping},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}