MonoMRN: Monocular Semantic Scene Completion via Masked Recurrent Networks

July 28, 2025

Xuzhi Wang¹ Xinran Wu¹ Song Wang² Lingdong Kong³ Ziping Zhao¹

¹TJNU ²ZJU ³NUS

📰 News

🏆 2025.06 🎉 Our paper has been accepted to ICCV 2025!
📄 2025.07 📝 The arXiv preprint is now available: arxiv.org/abs/2507.17661
🚧 Coming Soon 🛠️ We are preparing the code release. Stay tuned on GitHub!

🧠 Overview

Monocular Semantic Scene Completion (MSSC) aims to infer voxel-wise occupancy and semantic labels from a single RGB image. Existing methods typically rely on single-stage pipelines that jointly handle visible segmentation and occluded region hallucination. However, these methods often suffer from depth estimation errors and limited generalizability to complex scenes.

MonoMRN is a novel two-stage framework designed to address these challenges:

Stage 1: Coarse MSSC
Stage 2: Masked Recurrent Network (MRN)
‣ Focuses on refining occluded regions
‣ Designs a Masked Sparse Gated Recurrent Unit (MS-GRU) to focus on occupied regions
‣ Proposes a Distance Attention Projection to reduce projection errors
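
To make the idea of a masked recurrent update concrete, here is a minimal NumPy sketch of a GRU step that only touches voxels flagged as occupied. It is not the released MonoMRN code (which is not yet available): the function name `masked_gru_step`, the per-voxel linear gates (used here in place of sparse 3D convolutions), the flattened `(N, C)` voxel layout, and all shapes are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def masked_gru_step(h, x, mask, params):
    """Apply one standard GRU update, but only at voxels where mask is True.

    h:    (N, C)    hidden state, one row per voxel (hypothetical layout)
    x:    (N, C_in) input features per voxel
    mask: (N,)      boolean occupancy mask
    """
    Wz, Uz, Wr, Ur, Wh, Uh = params
    idx = np.flatnonzero(mask)              # gather occupied voxels only
    h_occ, x_occ = h[idx], x[idx]

    z = sigmoid(x_occ @ Wz + h_occ @ Uz)    # update gate
    r = sigmoid(x_occ @ Wr + h_occ @ Ur)    # reset gate
    h_tilde = np.tanh(x_occ @ Wh + (r * h_occ) @ Uh)  # candidate state

    h_new = h.copy()                        # unoccupied voxels keep their state
    h_new[idx] = (1.0 - z) * h_occ + z * h_tilde
    return h_new


# Toy example: 6 voxels, 3 input channels, 4 hidden channels.
rng = np.random.default_rng(0)
N, C_in, C = 6, 3, 4
params = (
    rng.normal(size=(C_in, C)), rng.normal(size=(C, C)),  # Wz, Uz
    rng.normal(size=(C_in, C)), rng.normal(size=(C, C)),  # Wr, Ur
    rng.normal(size=(C_in, C)), rng.normal(size=(C, C)),  # Wh, Uh
)
x = rng.normal(size=(N, C_in))
h0 = np.zeros((N, C))
mask = np.array([True, False, True, True, False, False])

# A few recurrent refinement iterations over the same coarse features.
h_out = h0
for _ in range(3):
    h_out = masked_gru_step(h_out, x, mask, params)
```

Because the gates are computed only on the gathered rows, the cost of each step scales with the number of occupied voxels rather than with the full grid, which is the general motivation for masking a recurrent update in sparse 3D scenes.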

MonoMRN Framework

✨ Highlights

🔁 Masked Sparse GRU (MS-GRU): Efficient recurrent unit that updates only occupied voxels
🎯 Distance Attention Projection: Improves feature projection accuracy
🏠 + 🚗 Indoor & Outdoor Scenes: Applies to both indoor (NYUv2) and outdoor (SemanticKITTI) scenes

@inproceedings{wang2025MonoMRN,
  title={Monocular Semantic Scene Completion via Masked Recurrent Networks},
  author={Wang, Xuzhi and Wu, Xinran and Wang, Song and Kong, Lingdong and Zhao, Ziping},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}
