README.md
January 25, 2026 · View on GitHub
DuOcc: Stream and Query-guided Feature Aggregation for Efficient and Effective 3D Occupancy Prediction
Seokha Moon1† · Janghyun Baek1 · Giseop Kim2‡ · Jinkyu Kim1 · Sunwook Choi3
1Korea University · 2DGIST · 3NAVER LABS
†Work done during internship at NAVER LABS, ‡Work done at NAVER LABS
🚀 News
- 2025.11.29 — Code released.
- 2025.11.27 — DuOcc paper has been updated on arXiv.
✨ Highlights
- DuOcc introduces a dual aggregation strategy combining Stream-based Voxel Feature Aggregation (StreamAgg) and Query-guided Feature Aggregation (QueryAgg) for efficient and high-fidelity 3D occupancy prediction.
- Achieves state-of-the-art performance:
- Occ3D-nuScenes: 41.9 mIoU (+2.3 over prior SOTA / in real-time setting)
- SurroundOcc dataset: 23.0 mIoU (+1.1 over prior SOTA)
- Runs within real-time constraints (83 ms) and requires only 2.8 GB of GPU memory — over 40% less memory than competing approaches.
💡 Method

DuOcc is composed of two complementary components:
Stream-based Voxel Feature Aggregation (StreamAgg)
- Aligns and aggregates voxel features over time using motion-aware warping.
- Reduces warping artifacts via a lightweight refinement module (RefineNet).
- Preserves spatially coherent geometry and is particularly effective for static structures, whose positions remain stable after ego-motion compensation—making them inherently suitable for stream-based accumulation.
Query-guided Feature Aggregation (QueryAgg)
- Extracts semantics of dynamic objects from image features and encodes them into propagated instance queries.
- Injects these instance-level query features into the corresponding voxel regions.
- Complements fine-grained dynamic object details that are difficult to capture through voxel accumulation alone due to motion-induced misalignment, occlusion, or sparse projection.
StreamAgg and QueryAgg jointly produce a fast, memory-efficient, and high-fidelity 3D occupancy representation.
🎨 Qualitative Results
DuOcc provides clearer and more consistent 3D occupancy predictions, significantly improving reconstruction of both dynamic objects and fine-grained static structures compared to prior methods.
📊 Quantitative Results
DuOcc achieves state-of-the-art performance on Occ3D-nuScenes (41.9 mIoU) and SurroundOcc Dataset(23.0 mIoU), while running at 83 ms and using only 2.8 GB of memory, making it one of the most efficient high-performing occupancy prediction models available. These results highlight DuOcc’s strong balance of accuracy, speed, and memory efficiency, making it highly suitable for real-world autonomous driving.
🔧 Getting Started
Step 1. Set up the environment:
➡️ Install
Step 2. Prepare datasets and PKL files:
➡️ Prepare Data
🏋️ Training & Inference
# Train
bash local_train.sh DuOcc
# Test
bash local_test.sh DuOcc path/to/checkpoint
🙏 Acknowledgement
This project is not possible without multiple great open-sourced code bases. We list some notable examples below.
📃 Bibtex
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{moon2025duocc,
title={Stream and Query-guided Feature Aggregation for Efficient and Effective 3D Occupancy Prediction},
author={Moon, Seokha and Baek, Janghyun and Kim, Giseop and Kim, Jinkyu and Choi, Sunwook},
journal={arXiv preprint arXiv:2503.22087},
year={2025}
}