DAOcc

March 26, 2026

DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction [arxiv]
Zhen Yang, Yanpeng Dong, Jiayu Wang
Beijing Mechanical Equipment Institute, Beijing, China

This is the official implementation of DAOcc, a multi-modal occupancy prediction framework that uses 3D object detection to assist occupancy prediction. It achieves superior performance while using a deployment-friendly image encoder and a practical input image resolution (for example, ResNet-50 at 256×704 on Occ3D-nuScenes).

News

2025-09-09: DAOcc has been accepted to TCSVT. Cue the confetti! 🎉
2025-07-20: We have open-sourced the TensorRT inference code for DAOcc, achieving 54.25 mIoU at 104.9 FPS. Check it out here.
2025-07-11: DAOcc achieved 54.33 mIoU on Occ3D-nuScenes without EMA.
2025-04-24: Following SparseBEV, we optimized the 2D-to-3D image feature transformation process, achieving substantial reductions in GPU memory consumption while slightly reducing training time. Check the config file.
2025-01-31: Released the model weights and the first version of the code.
2024-10-01: Our preprint is available on arXiv.

Experimental results

3D Semantic Occupancy Prediction on Occ3D-nuScenes

Method	Camera Mask	Image Backbone	Image Resolution	mIoU	Config	Model	Log
DAOcc	√	R50	256×704	54.33	config	model	log
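As a rough illustration of what the Camera Mask column and the mIoU column mean, the sketch below computes a masked mean IoU over flattened voxel labels in plain Python. It is not the evaluation code of this repository: the function name `masked_miou`, the `ignore_classes` argument (used here to leave out a "free" class), and the choice to skip classes that never occur are all assumptions made for the example.

```python
from typing import Sequence


def masked_miou(
    pred: Sequence[int],
    gt: Sequence[int],
    mask: Sequence[bool],
    num_classes: int,
    ignore_classes: Sequence[int] = (),
) -> float:
    """Mean IoU over the voxels where mask is True (hypothetical helper)."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g, m in zip(pred, gt, mask):
        if not m:
            continue  # voxel not visible to any camera: excluded from the metric
        if p == g:
            inter[p] += 1
            union[p] += 1
        else:
            # a mismatch enlarges the union of both the predicted and the true class
            union[p] += 1
            union[g] += 1
    ious = [
        inter[c] / union[c]
        for c in range(num_classes)
        if c not in ignore_classes and union[c] > 0  # assumption: skip absent classes
    ]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: 6 voxels, 3 classes, class 2 treated as "free" and ignored.
pred = [0, 0, 1, 1, 2, 2]
gt = [0, 1, 1, 1, 2, 0]
mask = [True, True, True, True, True, False]
score = masked_miou(pred, gt, mask, num_classes=3, ignore_classes=(2,))
```

For a dataset-level score, the intersections and unions would normally be accumulated over all samples before dividing; concatenating the per-sample label lists before calling the function has the same effect.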

Method	Camera Mask	Image Backbone	Image Resolution	RayIoU	Config	Model	Log
DAOcc	×	R50	256×704	48.4	config	model	log

Deprecated results (archived)

Method	Camera Mask	Image Backbone	Image Resolution	mIoU	Config	Model	Log
DAOcc	√	R50	256×704	53.82	config	model	log
DAOcc*	√	R50	256×704	54.19	-	model	-

Method	Camera Mask	Image Backbone	Image Resolution	RayIoU	Config	Model	Log
DAOcc	×	R50	256×704	48.2	config	model	log

3D Semantic Occupancy Prediction on SurroundOcc

Method	Image Backbone	Image Resolution	IoU	mIoU	Config	Model	Log
DAOcc	R50	256×704	45.0	30.5	config	model	log

3D Semantic Occupancy Prediction on OpenOccupancy

Method	Image Backbone	Image Resolution	IoU	mIoU	Config	Model	Log
DAOcc	R18	256×704	32.2	24.1	config	model	log

3D Semantic Occupancy Prediction on Occ3D-Waymo

Method	Camera Mask	Infov Mask	Image Backbone	Image Resolution	mIoU	Config	Model	Log
DAOcc	√	√	R50	256×704	44.69	config	-	log
DAOcc*	√	√	R50	256×704	45.13	-	-	-

The * denotes a model trained with the exponential moving average (EMA) hook.
For Occ3D-Waymo, we use only 20% of the training data.
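For context on the EMA note above, here is a minimal sketch of an exponential moving average over model weights, written in plain Python with scalar weights standing in for tensors. The function name `ema_update` and the decay value are hypothetical; the hook used in this repository may follow a different convention (for example, a warm-up schedule or a different momentum definition).

```python
from typing import Dict


def ema_update(
    ema_weights: Dict[str, float],
    model_weights: Dict[str, float],
    decay: float = 0.999,  # hypothetical value, not taken from the DAOcc configs
) -> Dict[str, float]:
    """Update ema_weights in place: ema = decay * ema + (1 - decay) * model."""
    for name, w in model_weights.items():
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * w
    return ema_weights


# Two training steps in which the live weight has dropped to 0.0.
ema = {"w": 1.0}
ema_update(ema, {"w": 0.0}, decay=0.9)
ema_update(ema, {"w": 0.0}, decay=0.9)
```

The averaged copy changes more slowly than the live weights, and it is this smoothed copy that is evaluated when an EMA result is reported.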

Getting Started

TensorRT Deployment

We provide deployment details for DAOcc, covering conversion of the Torch model to ONNX format and building a TensorRT (TRT) engine from the ONNX model. For details, please refer to CUDA_DAOcc.

Model	Precision	Hardware	mIoU	FPS
DAOcc	FP16+INT8	AGX Orin (64GB)	53.70	20.0

Citation

@article{yang2025daocc,
  title={{DAOcc}: {3D} Object Detection Assisted Multi-Sensor Fusion for {3D} Occupancy Prediction},
  author={Yang, Zhen and Dong, Yanpeng and Wang, Jiayu and Wang, Heng and Ma, Lichao and Cui, Zijian and Liu, Qi and Pei, Haoran and Zhang, Kexin and Zhang, Chao},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2025},
  publisher={IEEE}
}

Acknowledgements

Many thanks to these excellent open-source projects: