Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion (AAAI 2025)
January 14, 2026 ยท View on GitHub
Li Liang1, Naveed Akhtar2, Jordan Vice1, Xiangrui Kong1, Ajmal Mian1,
1The University of Western Australia
2The University of Melbourne
Figure 1: Schematics of the approach. Our method comprises a 3D scene completion and a 3D semantic segmentation network.
The former is encapsulated in a VAE framework that employs two sub-networks for conditioning its latent space, a Muti-Scale
Convolutonal Block (MSCB) and a Skimba denoising network. The 3D semantic segmentation network employs a variant of
Skimba. L, W, and H denote the length, width, and height of the original scene, and D is feature map dimension.
Figure 2: Architectural details of the Skimba denoising network. Refer to the text for details.
Citation
If you use this codebase, or otherwise find our work valuable, please cite Skimba:
@article{skimba_2025,
title={Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion},
volume={39},
url={https://ojs.aaai.org/index.php/AAAI/article/view/32547},
DOI={10.1609/aaai.v39i5.32547},
number={5},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
author={Liang, Li and Akhtar, Naveed and Vice, Jordan and Kong, Xiangrui and Mian, Ajmal Saeed},
year={2025},
month={Apr.},
pages={5155-5163}
}