January 15, 2026

Welcome to OccStudio, a unified framework for 3D Occupancy Prediction. This project brings our previous works, including ALOcc, CausalOcc, and GDFusion, together with multiple classic methods in a single, standardized codebase to support research in autonomous driving, embodied AI, and other intelligent systems.

The framework is designed to handle both Semantic Occupancy and Occupancy Flow prediction, supporting a wide variety of input modalities, feature encoding methods, temporal fusion strategies, image backbones, etc. Our goal is to provide a flexible foundation to accelerate research in Spatial Intelligence across academia and industry.

🌟 Highlights

🏆 A Unified Framework: Provides a common codebase for multiple occupancy prediction methods, including ALOcc, CausalOcc, GDFusion, BEVDetOcc, FB-Occ, etc.
🔧 Flexible and Configurable Architecture: Supports multiple input modalities (e.g., images, depth), various types of 3D feature encoding (e.g., Volume-based, BEV-based), different temporal fusion methods (e.g., SoloFusion, GDFusion), and different image backbones (e.g., ResNet, InternImage, Swin Transformer), all of which are switchable via configuration.
📚 Dataset Support: Provides full support for large-scale datasets such as nuScenes and Waymo, and allows seamless switching between different occupancy annotation formats (e.g., Occ3D, SurroundOcc, OpenOccupancy) for robust experimentation.
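Switching components "via configuration" typically works through mmcv-style config inheritance, where a child config overrides selected fields of its `_base_` config. The sketch below imitates that recursive merge in plain Python, including mmcv's `_delete_=True` convention for replacing a sub-dict outright instead of merging into it. The keys (`occ_gt_format`, `backbone`, and so on) are hypothetical placeholders, not OccStudio's actual config fields.

```python
# Illustration only: a plain-Python imitation of mmcv-style config overrides.
# Keys such as "occ_gt_format" are hypothetical, not OccStudio config fields.
import copy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `override` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and value.get("_delete_", False):
            # `_delete_=True`: replace the whole sub-dict instead of merging.
            merged[key] = {k: copy.deepcopy(v) for k, v in value.items()
                           if k != "_delete_"}
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base_cfg = {
    "dataset": {"name": "nuscenes", "occ_gt_format": "occ3d"},
    "model": {"backbone": {"type": "ResNet", "depth": 50}},
}

# Switch the annotation format (merged) and the backbone (replaced outright,
# so the ResNet-specific "depth" key does not leak into the new backbone).
variant = merge_config(base_cfg, {
    "dataset": {"occ_gt_format": "surroundocc"},
    "model": {"backbone": {"_delete_": True, "type": "SwinTransformer"}},
})
print(variant["dataset"], variant["model"]["backbone"])
```

Without `_delete_`, the backbone override would be merged and would keep `depth: 50`, which is the usual reason mmcv configs use that flag when swapping one component type for another.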

🛠 Model Zoo

OccStudio currently supports the following models:

| Method | Task | Publication |
|---|---|---|
| ALOcc | Semantic Occupancy & Flow | ICCV 2025 |
| GDFusion | Semantic Occupancy | CVPR 2025 |
| BEVDetOcc | Semantic Occupancy | - |
| FB-Occ | Semantic Occupancy | ICCV 2023 |
| SparseOcc | Semantic Occupancy | ECCV 2024 |

🚀 Get Started

1. Installation

We recommend using Conda for environment management.

```bash
# Clone this repository (replace OccStudio with your actual repo name)
git clone https://github.com/cdb342/OccStudio
cd OccStudio

# Create and activate the conda environment
conda create -n OccStudio python=3.8 -y
conda activate OccStudio

# Install PyTorch dependencies (for CUDA 11.8)
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 -f https://download.pytorch.org/whl/torch_stable.html

# Install MMCV dependencies
git clone https://github.com/open-mmlab/mmcv
cd mmcv
git checkout 1.x # Use the stable 1.x branch
MMCV_WITH_OPS=1 pip install -e . -v
cd ..

# Install MMDetection and MMSegmentation
pip install mmdet==2.28.2 mmsegmentation==0.30.0

# Install the OccStudio framework itself
pip install -v -e .

# Install other dependencies
pip install torchmetrics timm dcnv4 ninja spconv transformers IPython einops numba
pip install numpy==1.23.4 # Pin numpy version for compatibility

# (Optional for SparseOcc)
cd mmdet3d/models/sparseocc/csrc
pip install -v -e .
```
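After installation, a quick way to catch version drift is to compare the installed packages against the pins above. The helper below is a hypothetical sketch (not shipped with OccStudio) built on the standard-library `importlib.metadata`; it ignores local version tags such as `+cu118`.

```python
# Hypothetical helper: compare installed package versions against the pins
# used in the installation steps. Not part of OccStudio.
from importlib import metadata
from typing import Callable, Dict, List, Optional

# Pins copied from the installation commands. Local tags ("+cu118") are
# stripped before comparing, so "2.0.1+cu118" matches "2.0.1".
PINS: Dict[str, str] = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "mmdet": "2.28.2",
    "mmsegmentation": "0.30.0",
    "numpy": "1.23.4",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of `name`, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, str],
               get_version: Callable[[str], Optional[str]] = installed_version
               ) -> List[str]:
    """Return human-readable problems; an empty list means all pins match."""
    problems = []
    for name, wanted in pins.items():
        found = get_version(name)
        if found is None:
            problems.append(f"{name}: not installed (expected {wanted})")
        elif found.split("+")[0] != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINS) or ["All pinned packages match."]:
        print(line)
```

Run it inside the activated `OccStudio` environment; an empty result from `check_pins(PINS)` means every pinned package matches.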

2. Data Preparation

nuScenes

Download the full nuScenes dataset from the official website.
Download the Occ3D nuScenes annotations from the project page.
(Optional) Download other community annotations for extended experiments:
- OpenOcc_v2.1 Annotations
- OpenOcc_v2.1 Ray Mask
- SurroundOcc Annotations (rename to gts_surroundocc)
- OpenOccupancy-v0.1 Annotations

Please organize the data into the following directory structure:

```
├── data
│   ├── nuscenes
│   │   ├── maps, samples, sweeps, v1.0-test, v1.0-trainval
│   │   ├── gts                 # Occ3D annotations
│   │   ├── gts_surroundocc     # (Optional) SurroundOcc annotations
│   │   ├── openocc_v2          # (Optional) OpenOcc annotations
│   │   ├── openocc_v2_ray_mask # (Optional) OpenOcc ray mask
│   │   └── nuScenes-Occupancy-v0.1 # (Optional) OpenOccupancy annotations
```
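Before running the preprocessing scripts, it can help to confirm that the folders sit where the tree above expects them. The check below is a hypothetical sketch (not part of OccStudio) that splits the entries into required and optional ones according to the comments in the tree.

```python
# Hypothetical sanity check for the nuScenes layout (not shipped with
# OccStudio). Required vs. optional entries follow the tree's comments.
from pathlib import Path
from typing import List, Tuple

REQUIRED = ["maps", "samples", "sweeps", "v1.0-test", "v1.0-trainval", "gts"]
OPTIONAL = ["gts_surroundocc", "openocc_v2", "openocc_v2_ray_mask",
            "nuScenes-Occupancy-v0.1"]


def check_layout(root: str = "data/nuscenes") -> Tuple[List[str], List[str]]:
    """Return (missing required directories, missing optional directories)."""
    base = Path(root)
    missing_required = [d for d in REQUIRED if not (base / d).is_dir()]
    missing_optional = [d for d in OPTIONAL if not (base / d).is_dir()]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_layout()
    print("Missing required:", required or "none")
    print("Missing optional:", optional or "none")
```

Run it from the repository root; missing optional entries only matter if you train or evaluate with the corresponding annotation format.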

Finally, run the preprocessing scripts:

```bash
# 1. Extract semantic segmentation labels from LiDAR
python tools/nusc_process/extract_sem_point.py

# 2. Create formatted info files for the dataloader
PYTHONPATH=$(pwd):$PYTHONPATH python tools/create_data_bevdet.py
```

Alternatively, you can download the preprocessed segmentation labels and the train.pkl and val.pkl info files from our Hugging Face Hub, and organize them as follows:

```
OccStudio/
├── data/
│   ├── lidar_seg
│   ├── nuscenes/
│   │   ├── train.pkl
│   │   ├── val.pkl
│   │   ...
...
```
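To confirm that a downloaded info file unpickles correctly, a small inspection helper such as the one below can be used. It is a hypothetical sketch: it assumes only that the `.pkl` holds a dict or a list, because the exact keys are produced by `tools/create_data_bevdet.py` and are not documented here.

```python
# Hypothetical inspection helper for the info files (not part of OccStudio).
# Only the top-level container type is assumed; keys are not hard-coded.
import pickle
from pathlib import Path
from typing import Any


def summarize_infos(path: str) -> str:
    """Describe the top-level structure of an info .pkl file."""
    with Path(path).open("rb") as f:
        obj: Any = pickle.load(f)  # only unpickle files from trusted sources
    if isinstance(obj, dict):
        parts = []
        for key, value in obj.items():
            size = f" (len={len(value)})" if hasattr(value, "__len__") else ""
            parts.append(f"{key}: {type(value).__name__}{size}")
        return "dict with keys -> " + ", ".join(parts)
    if isinstance(obj, list):
        return f"list of {len(obj)} entries"
    return type(obj).__name__
```

For example, `print(summarize_infos("data/nuscenes/train.pkl"))` reports the top-level keys and their sizes, which is enough to tell a complete download from a truncated one.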

Waymo

Download the Waymo Open Dataset from the official website.
Download the Occ3D Waymo annotations and pkl files from here.
Follow the official instructions to organize the files.

3. Pre-trained Models

For training, download the pre-trained image backbones from the BEVDet GitHub repository, the GeoMIM GitHub repository, or Hugging Face Hub, and place them in the ckpts/pretrain/ directory.

🎮 Usage

Training

Use the following script for distributed training.

```bash
# Syntax: bash tools/dist_train.sh [CONFIG_FILE] [WORK_DIR] [NUM_GPUS]
# Example: Train the ALOcc-3D model
bash tools/dist_train.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py work_dir/alocc_3d 8
```

Testing

Download our pre-trained models from Hugging Face and run the testing script.

```bash
# Evaluate semantic occupancy (mIoU) or occupancy flow
# Syntax: bash tools/dist_test.sh [CONFIG_FILE] [CHECKPOINT_PATH] [NUM_GPUS]
# Example: Evaluate the ALOcc-3D model
bash tools/dist_test.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py ckpts/alocc_3d_256x704_bevdet_preatrain.pth 8

# Evaluate semantic occupancy (RayIoU)
# Syntax: bash tools/dist_test_ray.sh [CONFIG_FILE] [CHECKPOINT_PATH] [NUM_GPUS]
# Example: Evaluate the ALOcc-3D model trained without the camera visible mask
bash tools/dist_test_ray.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain_wo_mask.py ckpts/alocc_3d_256x704_bevdet_preatrain_wo_mask.pth 8
```

Note: When running inference with temporal fusion, use either 1 or 8 GPUs. With other GPU counts, a known sampler bug may count some samples more than once during evaluation.

Benchmarking

We provide convenient tools to benchmark model FPS (Frames Per Second) and FLOPs.

```bash
# Benchmark FPS
# Syntax: python tools/analysis_tools/benchmark.py [CONFIG_FILE]
# Example: Benchmark the ALOcc-3D model
python tools/analysis_tools/benchmark.py configs/alocc/alocc_3d_256x704_bevdet_preatrain.py
```
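For intuition about what an FPS number measures, the sketch below shows a common benchmarking pattern: run a few warm-up iterations, then divide the number of timed iterations by the elapsed wall-clock time. It is not the implementation of `tools/analysis_tools/benchmark.py`; `infer` is a hypothetical stand-in for one forward pass.

```python
# Minimal sketch of a common FPS benchmarking pattern (not the code of
# tools/analysis_tools/benchmark.py). `infer` stands in for one forward pass.
import time
from typing import Callable


def measure_fps(infer: Callable[[], None], num_warmup: int = 5,
                num_iters: int = 50) -> float:
    """Return frames per second for `infer`, excluding warm-up iterations."""
    for _ in range(num_warmup):
        infer()  # warm-up: caches, cuDNN autotuning, lazy initialization
    start = time.perf_counter()
    for _ in range(num_iters):
        infer()
    elapsed = time.perf_counter() - start
    return num_iters / elapsed
```

With a GPU model, `infer` should call `torch.cuda.synchronize()` before returning; otherwise asynchronous CUDA kernels may still be running when the timer stops, and the FPS will be overestimated.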

```bash
# Calculate FLOPs
# Syntax: python tools/analysis_tools/get_flops.py [CONFIG_FILE] --modality image --shape 256 704
# Example: Calculate FLOPs for the ALOcc-3D model
python tools/analysis_tools/get_flops.py configs/alocc/alocc_3d_256x704_bevdet_preatrain.py --modality image --shape 256 704
```
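The `--shape 256 704` argument sets the input resolution, and compute grows with the spatial size of every feature map. As a worked example with a hypothetical layer (not taken from any OccStudio config), a 3x3 convolution with 256 input and 256 output channels on the stride-16 feature map of a 256x704 image costs 16 × 44 × 256 × 256 × 9 multiply-accumulates. FLOP counters differ in whether they report multiply-accumulates (MACs) or FLOPs (roughly 2 × MACs), so check the tool's convention before comparing numbers across papers.

```python
# Worked example with a hypothetical layer (not from an OccStudio config):
# multiply-accumulate count of one 3x3 convolution on a stride-16 feature
# map of a 256x704 input image.


def conv2d_macs(h_out: int, w_out: int, c_in: int, c_out: int,
                k: int, groups: int = 1) -> int:
    """MACs of a dense 2D convolution (bias ignored)."""
    return h_out * w_out * c_out * (c_in // groups) * k * k


h, w = 256 // 16, 704 // 16  # 16 x 44 feature map
macs = conv2d_macs(h, w, c_in=256, c_out=256, k=3)
print(f"{macs:,} MACs ~= {macs / 1e9:.3f} G")  # 415,236,096 MACs ~= 0.415 G
```

Doubling the input to 512x1408 quadruples the spatial size of this feature map and therefore this layer's cost, which is consistent with the much lower FPS of the high-resolution models in the results tables.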

Visualization

First, ensure you have Mayavi installed. You can install it using pip:

```bash
pip install mayavi
```

Before you can visualize the output, you need to run the model on the test set and save the prediction results.

Use the dist_test.sh script with the --save flag. This will store the model's output in a directory.

```bash
# Example: Evaluate the ALOcc-3D model and save the predictions
bash tools/dist_test.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py ckpts/alocc_3d_256x704_bevdet_preatrain.pth 8 --save
```

The prediction results will be saved in the test/ directory, following a path structure like: test/[CONFIG_NAME]/[TIMESTAMP]/.
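Because each test run creates a new timestamped folder, it can be convenient to locate the newest one programmatically. The helper below is a hypothetical sketch (not part of OccStudio); it assumes timestamp folder names such as `20250101_120000`, which sort chronologically as plain strings.

```python
# Hypothetical helper (not part of OccStudio): pick the most recent run under
# test/[CONFIG_NAME]/, assuming timestamp folders like 20250101_120000 whose
# names sort chronologically as strings.
from pathlib import Path
from typing import Optional


def latest_prediction_dir(config_name: str, root: str = "test") -> Optional[Path]:
    """Return the newest timestamp directory for `config_name`, or None."""
    base = Path(root) / config_name
    if not base.is_dir():
        return None
    runs = sorted(p for p in base.iterdir() if p.is_dir())
    return runs[-1] if runs else None
```

For example, `latest_prediction_dir("alocc_3d_256x704_bevdet_preatrain")` returns the newest run folder, which can be passed to `tools/visual.py` as the prediction path; pass a different `root` if your results were saved somewhere else.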

Once the predictions are saved, you can run the visualization script. This script requires the path to the prediction results and the path to the ground truth data.

```bash
# Syntax: python tools/visual.py [PREDICTION_PATH] [GROUND_TRUTH_PATH]
# Example:
python tools/visual.py work_dirs/alocc_3d_256x704_bevdet_preatrain/xxxxxxxx_xxxxxx/ your/path/to/ground_truth
```

Replace work_dirs/alocc_3d_256x704_bevdet_preatrain/xxxxxxxx_xxxxxx/ with the actual path to the prediction results saved in the previous step.
Replace your/path/to/ground_truth with the path to the corresponding ground truth dataset.

This will launch an interactive Mayavi window where you can inspect and compare the 3D occupancy predictions.

📊 Main Results

Here are the performance benchmarks of models implemented in OccStudio.

🏆 Performance on nuScenes (Models on Occ3D Are Trained with Camera Visible Mask)

| Model | Annotation | Backbone | Input | Input Size | mIoU | mIoU_D | IoU | FPS | Memory | Checkpoint | Config |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BEVDetOcc-SF | Occ3D | R-50 | C | `256x704` | 41.9 | 34.4 | 75.1 | 6.5 | 10717 | 🤗 HF | config |
| BEVDetOcc-GF | Occ3D | R-50 | C | `256x704` | 43.6 | 36.1 | 77.8 | 7.0 | 3017 | 🤗 HF | config |
| FB-Occ | Occ3D | R-50 | C | `256x704` | 39.8 | 34.2 | 69.9 | 10.3 | 4099 | 🤗 HF | config |
| FB-Occ-GF | Occ3D | R-50 | C | `256x704` | 42.1 | 36.4 | 73.3 | 10.3 | 2879 | 🤗 HF | config |
| ALOcc-2D-mini | Occ3D | R-50 | C | `256x704` | 41.4 | 35.4 | 70.0 | 30.5 | 1605 | 🤗 HF | config |
| ALOcc-2D | Occ3D | R-50 | C | `256x704` | 44.8 | 38.7 | 74.3 | 8.2 | 5553 | 🤗 HF | config |
| ALOcc-3D | Occ3D | R-50 | C | `256x704` | 45.5 | 39.3 | 75.3 | 6.0 | 10793 | 🤗 HF | config |
| ALOcc-3D | Occ3D | R-50 | C+D | `256x704` | 54.5 | 50.6 | 85.2 | 6.0 | 13003 | 🤗 HF | config |
| ALOcc-3D | Occ3D | Intern-T | C+D | `256x704` | 55.6 | 52.4 | 85.1 | 5.8 | 13015 | 🤗 HF | config |
| ALOcc-3D | Occ3D | Swin-Base | C+D | `512x1408` | 60.0 | 57.8 | 87.8 | 1.5 | 26867 | 🤗 HF | config |
| ALOcc-3D-GF | Occ3D | R-50 | C | `256x704` | 46.5 | 40.2 | 77.4 | 6.2 | 4347 | 🤗 HF | config |
| ALOcc-3D-GF | Occ3D | R-50 | C+D | `256x704` | 54.9 | 51.4 | 85.9 | 6.2 | 6561 | 🤗 HF | config |
| ALOcc-2D-GF | OpenOccupancy | R-50 | C | `900x1600` | 17.9 | 13.7 | 28.6 | 0.8 | 13857 | 🤗 HF | config |
| ALOcc-2D-GF | OpenOccupancy | R-50 | C+D | `900x1600` | 24.5 | 21.6 | 34.5 | 0.8 | 13891 | 🤗 HF | config |
| ALOcc-2D-mini* | SurroundOcc | R-50 | C | `900x1600` | 21.5 | 19.5 | 31.5 | 5.8 | 2869 | 🤗 HF | config |
| ALOcc-3D* | SurroundOcc | R-50 | C | `900x1600` | 24.0 | 21.7 | 34.7 | 1.7 | 11117 | 🤗 HF | config |
| ALOcc-3D-GF | SurroundOcc | R-50 | C | `900x1600` | 25.5 | 22.5 | 38.2 | 0.9 | 11857 | 🤗 HF | config |
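As a rough illustration of how mIoU-style numbers are computed, the sketch below accumulates a confusion matrix over voxels, optionally restricted to a visibility mask (in the spirit of the camera-visible-mask setting above), and averages per-class IoU = TP / (TP + FP + FN). Classes with an empty union are skipped, and one class (for example, a "free" label) can be ignored. It is plain Python for clarity and is not OccStudio's evaluation code; the class ids and the ignored class are illustrative assumptions.

```python
# Minimal sketch of voxel mIoU with an optional visibility mask. Not
# OccStudio's evaluation code; class ids and `ignore_class` are illustrative.
from typing import List, Optional, Sequence


def miou(pred: Sequence[int], gt: Sequence[int], num_classes: int,
         mask: Optional[Sequence[bool]] = None,
         ignore_class: Optional[int] = None) -> float:
    """Mean IoU over classes with a non-empty union (inside the mask)."""
    # confusion[g][p] counts voxels with ground truth g predicted as p
    confusion: List[List[int]] = [[0] * num_classes for _ in range(num_classes)]
    for i, (p, g) in enumerate(zip(pred, gt)):
        if mask is not None and not mask[i]:
            continue  # voxel outside the mask (e.g. not visible): skip it
        confusion[g][p] += 1
    ious = []
    for c in range(num_classes):
        if c == ignore_class:
            continue
        tp = confusion[c][c]
        fp = sum(confusion[g][c] for g in range(num_classes)) - tp
        fn = sum(confusion[c]) - tp
        if tp + fp + fn > 0:  # skip classes absent from both gt and pred
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious) if ious else 0.0
```

In practice the confusion matrix is accumulated over the whole validation set before dividing, rather than averaging per-sample scores; the same function structure applies.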

🏆 Performance on nuScenes (Trained w/o Camera Visible Mask)

| Model | Annotation | Backbone | Input | Input Size | mIoU | RayIoU | RayIoU_{1m, 2m, 4m} | FPS | Memory | Checkpoint | Config |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BEVDetOcc-SF | Occ3D | R-50 | C | `256x704` | 24.3 | 35.2 | 31.2, 35.9, 38.4 | 6.5 | 10717 | 🤗 HF | config |
| FB-Occ | Occ3D | R-50 | C | `256x704` | 31.1 | 39.0 | 33.0, 39.9, 44.0 | 10.3 | 4099 | 🤗 HF | config |
| SparseOcc | Occ3D | R-50 | C | `256x704` | 26.6 | 32.5 | 26.2, 33.2, 38.1 | - | 5967 | 🤗 HF | config |
| ALOcc-2D-mini | Occ3D | R-50 | C | `256x704` | 33.4 | 39.3 | 32.9, 40.1, 44.8 | 30.5 | 1605 | 🤗 HF | config |
| ALOcc-2D | Occ3D | R-50 | C | `256x704` | 37.4 | 43.0 | 37.1, 43.8, 48.2 | 8.2 | 5553 | 🤗 HF | config |
| ALOcc-3D | Occ3D | R-50 | C | `256x704` | 38.0 | 43.7 | 37.8, 44.7, 48.8 | 6.0 | 10793 | 🤗 HF | config |
| ALOcc-3D-GF | Occ3D | R-50 | C | `256x704` | 38.4 | 44.1 | 38.1, 45.1, 49.3 | 6.2 | 4347 | 🤗 HF | config |
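RayIoU is commonly reported as the average over the 1 m, 2 m, and 4 m distance thresholds. The short check below, using values copied from the table above, confirms that each reported RayIoU agrees with that mean to within 0.1; the residual comes from rounding of the per-threshold values.

```python
# Worked check with numbers copied from the table: reported RayIoU vs. the
# mean of RayIoU at the 1 m, 2 m and 4 m thresholds (tolerance 0.1 because
# the per-threshold values are themselves rounded).
from statistics import mean

TABLE = {
    # model: (RayIoU, (RayIoU_1m, RayIoU_2m, RayIoU_4m))
    "BEVDetOcc-SF": (35.2, (31.2, 35.9, 38.4)),
    "FB-Occ": (39.0, (33.0, 39.9, 44.0)),
    "SparseOcc": (32.5, (26.2, 33.2, 38.1)),
    "ALOcc-2D-mini": (39.3, (32.9, 40.1, 44.8)),
    "ALOcc-2D": (43.0, (37.1, 43.8, 48.2)),
    "ALOcc-3D": (43.7, (37.8, 44.7, 48.8)),
    "ALOcc-3D-GF": (44.1, (38.1, 45.1, 49.3)),
}


def ray_iou(per_threshold):
    """RayIoU as the average over the distance thresholds."""
    return mean(per_threshold)


for model, (reported, per_threshold) in TABLE.items():
    recomputed = ray_iou(per_threshold)
    assert abs(recomputed - reported) < 0.1, model
    print(f"{model:14s} reported {reported:.1f}  mean {recomputed:.2f}")
```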

🏆 Performance on OpenOcc (Semantic Occupancy and Flow)

| Model | Annotation | Backbone | Input | Input Size | Occ Score | mAVE | mAVE_TP | RayIoU | RayIoU_{1m, 2m, 4m} | FPS | Checkpoint | Config |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALOcc-Flow-2D | Occ3D | R-50 | C | `256x704` | 41.9 | 0.530 | 0.431 | 40.3 | 34.3, 41.0, 45.5 | 7.0 | 🤗 HF | config |
| ALOcc-Flow-3D | Occ3D | R-50 | C | `256x704` | 43.1 | 0.549 | 0.458 | 41.9 | 35.6, 42.9, 47.2 | 5.5 | 🤗 HF | config |
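The Occ Score combines occupancy quality and flow accuracy. The sketch below assumes the OpenOcc challenge definition, Occ Score = 0.9 × RayIoU + 0.1 × max(1 − mAVE, 0), written here with RayIoU in percent; treat that formula as an assumption rather than OccStudio's evaluation code. The reported scores are consistent with plugging in the mAVE_TP column: the ALOcc-Flow-3D row is reproduced to one decimal, and the ALOcc-Flow-2D row to within 0.1.

```python
# Sketch assuming the OpenOcc challenge definition of Occ Score, with
# RayIoU given in percent:
#   Occ Score = 0.9 * RayIoU + 0.1 * max(1 - mAVE, 0) * 100
# Not OccStudio code; the formula is an assumption.


def occ_score(ray_iou: float, mave: float) -> float:
    """Combine RayIoU (percent) and mean absolute velocity error (m/s)."""
    return 0.9 * ray_iou + 0.1 * max(1.0 - mave, 0.0) * 100.0


print(round(occ_score(41.9, 0.458), 1))  # ALOcc-Flow-3D row -> 43.1
```

The `max(..., 0)` clamp means that a velocity error above 1 m/s contributes nothing, so the score never drops below 0.9 × RayIoU.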

🤝 Contribution

We welcome contributions from the community! If you find a bug, have a feature request, or want to contribute new models/datasets to OccStudio, please feel free to open an issue or submit a pull request. You can also contact Dubing Chen via email (dobbin.chen@gmail.com).

🙏 Acknowledgement

We gratefully acknowledge the foundational work of many excellent open-source projects, and we would like to extend our special thanks to:

📜 Citation

If you find OccStudio useful in your research, please consider citing our relevant papers:

```bibtex
@InProceedings{chen2025rethinking,
    author    = {Chen, Dubing and Zheng, Huan and Fang, Jin and Dong, Xingping and Li, Xianfei and Liao, Wenlong and He, Tao and Peng, Pai and Shen, Jianbing},
    title     = {Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {1505--1515}
}

@InProceedings{chen2025alocc,
    author    = {Chen, Dubing and Fang, Jin and Han, Wencheng and Cheng, Xinjing and Yin, Junbo and Xu, Chenzhong and Khan, Fahad Shahbaz and Shen, Jianbing},
    title     = {ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
}

@InProceedings{chen2025semantic,
    author    = {Chen, Dubing and Zheng, Huan and Zhou, Yucheng and Li, Xianfei and Liao, Wenlong and He, Tao and Peng, Pai and Shen, Jianbing},
    title     = {Semantic Causality-Aware Vision-Based 3D Occupancy Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
}
```