📢 Announcement: ALOcc is now integrated into OccStudio!

December 1, 2025

ALOcc has been merged into OccStudio.

This repository serves as the official archive for the original ICCV 2025 paper implementation. For the latest updates, bug fixes, and a more unified framework supporting multiple models, we highly recommend using OccStudio.

👉 Check out the new framework: https://github.com/cdb342/OccStudio

ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction

ALOcc is a state-of-the-art, vision-only framework for dense 3D scene understanding. It transforms multi-camera 2D images into rich, spatiotemporal 3D representations, jointly predicting semantic occupancy grids and per-voxel motion flow. Our purely convolutional design achieves top-tier performance while offering a spectrum of models that balance accuracy and real-time efficiency, making it ideal for autonomous systems.

🚀 Get Started

1. Installation

We recommend managing the environment with Conda.

# Clone this repository
git clone https://github.com/cdb342/ALOcc.git
cd ALOcc

# Create and activate the conda environment
conda create -n alocc python=3.8 -y
conda activate alocc

# Install PyTorch (example for CUDA 11.8, adjust if needed)
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

# Install MMCV (requires building C++ ops)
# Note: Using the stable 1.x branch for compatibility
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout 1.x
MMCV_WITH_OPS=1 pip install -e . -v
cd ..

# Install MMDetection and MMSegmentation
pip install mmdet==2.28.2 mmsegmentation==0.30.0

# Install the ALOcc framework in editable mode
pip install -v -e .

# Install remaining dependencies
pip install torchmetrics timm dcnv4 ninja spconv transformers IPython einops
pip install numpy==1.23.4 # Pin numpy version to avoid potential issues
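
After installation, it can be useful to confirm that the pinned packages actually resolved to the versions listed above. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and the pin table only covers the versions stated explicitly in the install commands.

```python
from importlib import metadata

# Pins copied from the install commands above; extend as needed.
EXPECTED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "mmdet": "2.28.2",
    "mmsegmentation": "0.30.0",
    "numpy": "1.23.4",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {package: (expected, found)} for every package that is off-pin."""
    mismatches = {}
    for package, want in expected.items():
        found = lookup(package)
        # Ignore local build tags such as "+cu118" when comparing.
        base = found.split("+")[0] if found else None
        if base != want:
            mismatches[package] = (want, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    for package, (want, found) in problems.items():
        print(f"{package}: expected {want}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

Run it inside the activated `alocc` environment. A mismatch is not necessarily fatal, but it is a good first thing to check when a build or import fails.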

2. Data Preparation

nuScenes Dataset

Download the full nuScenes dataset from the official website.
Download the primary Occ3D-nuScenes annotations from the project page.
(Optional) For extended experiments, download other community annotations:
- OpenOcc_v2.1 Annotations & Ray Mask
- SurroundOcc Annotations (unzip and rename folder to gts_surroundocc)
- OpenOccupancy-v0.1 Annotations

Please organize your data following this directory structure:

ALOcc/
├── data/
│   ├── nuscenes/
│   │   ├── maps/
│   │   ├── samples/
│   │   ├── sweeps/
│   │   ├── v1.0-test/
│   │   ├── v1.0-trainval/
│   │   ├── gts/                 # Main Occ3D annotations
│   │   ├── gts_surroundocc/     # (Optional) SurroundOcc annotations
│   │   ├── openocc_v2/          # (Optional) OpenOcc annotations
│   │   ├── openocc_v2_ray_mask/ # (Optional) OpenOcc ray mask
│   │   └── nuScenes-Occupancy-v0.1/ # (Optional) OpenOccupancy annotations
...
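
Before running the preprocessing scripts, a quick check of the layout above can catch misplaced folders early. This is a minimal sketch under stated assumptions: `check_layout` is a hypothetical helper, not a tool shipped with ALOcc, and the required/optional split simply mirrors the comments in the directory tree.

```python
from pathlib import Path

# Folder names taken from the directory tree above.
REQUIRED = ["maps", "samples", "sweeps", "v1.0-test", "v1.0-trainval", "gts"]
OPTIONAL = [
    "gts_surroundocc",
    "openocc_v2",
    "openocc_v2_ray_mask",
    "nuScenes-Occupancy-v0.1",
]


def check_layout(root="data/nuscenes"):
    """Return (missing required folders, optional folders that are present)."""
    root = Path(root)
    missing = [name for name in REQUIRED if not (root / name).is_dir()]
    present_optional = [name for name in OPTIONAL if (root / name).is_dir()]
    return missing, present_optional


if __name__ == "__main__":
    missing, optional = check_layout()
    print("Missing required folders:", missing or "none")
    print("Optional annotations found:", optional or "none")
```

Run it from the ALOcc repository root so that the default `data/nuscenes` path resolves correctly.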

Finally, run the preprocessing scripts to prepare the data for training:

# 1. Extract semantic segmentation labels from LiDAR
python tools/nusc_process/extract_sem_point.py

# 2. Create formatted info files for the dataloader
PYTHONPATH=$(pwd):$PYTHONPATH python tools/create_data_bevdet.py

Alternatively, download the pre-processed segmentation labels and the train.pkl and val.pkl files from our Hugging Face Hub, and arrange them as follows:

ALOcc/
├── data/
│   ├── lidar_seg
│   ├── nuscenes/
│   │   ├── train.pkl
│   │   ├── val.pkl
│   │   ...
...
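
To confirm that a downloaded info file is intact, you can load it and look at its top-level structure. The sketch below is illustrative only: `summarize_pickle` is a hypothetical helper, and it makes no assumption about what train.pkl or val.pkl contain, since their internal format is not described here. Only unpickle files from a source you trust.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and describe its top-level structure."""
    with Path(path).open("rb") as handle:
        obj = pickle.load(handle)
    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        # Report keys only; values may be large.
        summary["keys"] = sorted(str(key) for key in obj)
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    return summary
```

For example, `summarize_pickle("data/nuscenes/val.pkl")` returns a small dictionary with the object's type and, where applicable, its keys and length.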

3. Pre-trained Models

For training, please download pre-trained image backbones from BEVDet, GeoMIM, or our Hugging Face Hub. Place the checkpoint files in the ckpts/pretrain/ directory.

🎮 Train & Evaluate

Training

Use the provided script for distributed training on multiple GPUs.

# Syntax: bash tools/dist_train.sh [CONFIG_FILE] [WORK_DIR] [NUM_GPUS]

# Example: Train the ALOcc-3D model with 8 GPUs
bash tools/dist_train.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py work_dirs/alocc_3d 8

Testing

Download our official pre-trained models from the ALOcc Hugging Face Hub and place them in the ckpts/ directory.

# Evaluate semantic occupancy (mIoU) or occupancy flow
# Syntax: bash tools/dist_test.sh [CONFIG_FILE] [CHECKPOINT_PATH] [NUM_GPUS]

# Example: Evaluate the pre-trained ALOcc-3D model
bash tools/dist_test.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py ckpts/alocc_3d.pth 8

# Evaluate semantic occupancy (RayIoU metric)
# Syntax: bash tools/dist_test_ray.sh [CONFIG_FILE] [CHECKPOINT_PATH] [NUM_GPUS]

# Example: Evaluate ALOcc-3D with the RayIoU script
bash tools/dist_test_ray.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain_wo_mask.py ckpts/alocc_3d_wo_mask.pth 8

⚠️ Important Note: When running inference with temporal fusion enabled, use exactly 1 or 8 GPUs. With any other GPU count, a sampler bug causes some samples to be processed more than once, which may lead to incorrect results.
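
If you launch evaluation from your own scripts, one way to avoid the GPU-count pitfall is to validate the count before building the command. This is a minimal sketch, assuming the `dist_test.sh` argument order shown above. `build_test_command`, `SAFE_GPU_COUNTS`, and the `temporal_fusion` flag are hypothetical names introduced for illustration and are not part of ALOcc.

```python
import shlex

# GPU counts stated as safe for temporal-fusion inference in the note above.
SAFE_GPU_COUNTS = (1, 8)


def build_test_command(config, checkpoint, num_gpus,
                       script="tools/dist_test.sh", temporal_fusion=True):
    """Return the evaluation command as an argument list, or raise on an unsafe GPU count."""
    if temporal_fusion and num_gpus not in SAFE_GPU_COUNTS:
        raise ValueError(
            f"Temporal fusion inference needs 1 or 8 GPUs, got {num_gpus}."
        )
    return ["bash", script, config, checkpoint, str(num_gpus)]


if __name__ == "__main__":
    command = build_test_command(
        "configs/alocc/alocc_3d_256x704_bevdet_preatrain.py",
        "ckpts/alocc_3d.pth",
        8,
    )
    # Print a shell-ready string; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```

The function only builds the command and leaves execution to the caller, so it can be reused for `tools/dist_test_ray.sh` by passing a different `script` value.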

Benchmarking

We provide tools to benchmark model speed (FPS) and computational cost (FLOPs).

# Benchmark FPS (Frames Per Second)
# Syntax: python tools/analysis_tools/benchmark.py [CONFIG_FILE]
python tools/analysis_tools/benchmark.py configs/alocc/alocc_3d_256x704_bevdet_preatrain.py

# Calculate FLOPs
# Syntax: python tools/analysis_tools/get_flops.py [CONFIG_FILE] --shape [HEIGHT] [WIDTH]
python tools/analysis_tools/get_flops.py configs/alocc/alocc_3d_256x704_bevdet_preatrain.py --shape 256 704
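
FPS is a throughput figure. When a latency budget matters, the average per-frame time is its reciprocal. The short sketch below converts the FPS values reported in the Occ3D-nuScenes results table of this README into milliseconds per frame. `latency_ms` is a hypothetical helper, and the result is an average, not a measured worst-case latency.

```python
# FPS values copied from the Occ3D-nuScenes results table in this README.
REPORTED_FPS = {
    "ALOcc-2D-mini": 30.5,
    "ALOcc-2D": 8.2,
    "ALOcc-3D": 6.0,
}


def latency_ms(fps):
    """Convert frames per second to average milliseconds per frame."""
    if fps <= 0:
        raise ValueError("FPS must be positive.")
    return 1000.0 / fps


if __name__ == "__main__":
    for model, fps in REPORTED_FPS.items():
        print(f"{model}: {fps} FPS is about {latency_ms(fps):.1f} ms per frame")
```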

Visualization

First, ensure you have Mayavi installed. You can install it using pip:

pip install mayavi

Before you can visualize the output, you need to run the model on the test set and save the prediction results.

Use the dist_test.sh script with the --save flag. This will store the model's output in a directory.

# Example: Evaluate the ALOcc-3D model and save the predictions
bash tools/dist_test.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py ckpts/alocc_3d_256x704_bevdet_preatrain.pth 8 --save

The prediction results will be saved in the test/ directory, following a path structure like: test/[CONFIG_NAME]/[TIMESTAMP]/.
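
Since each run writes to a new timestamped folder, a small helper can locate the most recent one for you. This is a minimal sketch under stated assumptions: `latest_run_dir` is a hypothetical helper, and it assumes the timestamp folder names sort chronologically as plain strings (as a YYYYMMDD_HHMMSS-style name would), which is not confirmed here.

```python
from pathlib import Path


def latest_run_dir(config_name, root="test"):
    """Return the last run directory under root/config_name by name order, or None."""
    base = Path(root) / config_name
    if not base.is_dir():
        return None
    # Assumption: timestamp folder names sort chronologically as strings.
    runs = sorted(path for path in base.iterdir() if path.is_dir())
    return runs[-1] if runs else None


if __name__ == "__main__":
    run = latest_run_dir("alocc_3d_256x704_bevdet_preatrain")
    print(run if run is not None else "No saved predictions found.")
```

The returned path can be passed as the prediction path to the visualization script.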

Once the predictions are saved, you can run the visualization script. This script requires the path to the prediction results and the path to the ground truth data.

# Syntax: python tools/visual.py [PREDICTION_PATH] [GROUND_TRUTH_PATH]
# Example:
python tools/visual.py work_dirs/alocc_3d_256x704_bevdet_preatrain/xxxxxxxx_xxxxxx/ your/path/to/ground_truth

Replace the first argument with the actual path to the prediction results saved by the --save run above.
Replace your/path/to/ground_truth with the path to the corresponding ground truth data.

This will launch an interactive Mayavi window where you can inspect and compare the 3D occupancy predictions.

📊 Results & Model Zoo

🏆 Performance on Occ3D-nuScenes (trained with camera visible mask)

Model	Backbone	Input Size	mIoU_D^m	mIoU^m	FPS	Config	Weights
ALOcc-2D-mini	R-50	256 × 704	35.4	41.4	30.5	config	HF Hub
ALOcc-2D	R-50	256 × 704	38.7	44.8	8.2	config	HF Hub
ALOcc-3D	R-50	256 × 704	39.3	45.5	6.0	config	HF Hub

🏆 Performance on Occ3D-nuScenes (trained w/o camera visible mask)

Model	Backbone	Input Size	mIoU	RayIoU	RayIoU_{1m, 2m, 4m}	FPS	Config	Weights
ALOcc-2D-mini	R-50	256 × 704	33.4	39.3	32.9, 40.1, 44.8	30.5	config	HF Hub
ALOcc-2D	R-50	256 × 704	37.4	43.0	37.1, 43.8, 48.2	8.2	config	HF Hub
ALOcc-3D	R-50	256 × 704	38.0	43.7	37.8, 44.7, 48.8	6.0	config	HF Hub

🏆 Performance on OpenOcc (Semantic Occupancy and Flow)

Method	Backbone	Input Size	Occ Score	mAVE	mAVE_TP	RayIoU	RayIoU_{1m, 2m, 4m}	FPS	Config	Weights
ALOcc-Flow-2D	R-50	256 × 704	41.9	0.530	0.431	40.3	34.3, 41.0, 45.5	7.0	config	HF Hub
ALOcc-Flow-3D	R-50	256 × 704	43.1	0.549	0.458	41.9	35.6, 42.9, 47.2	5.5	config	HF Hub

For more detailed results and ablations, please refer to our paper.

🙏 Acknowledgement

This project is built upon the excellent foundation of several open-source projects. We extend our sincere gratitude to their authors and contributors.

📜 Citation

If you find ALOcc useful for your research or applications, please consider citing our paper:

@InProceedings{chen2025alocc,
    author    = {Chen, Dubing and Fang, Jin and Han, Wencheng and Cheng, Xinjing and Yin, Junbo and Xu, Chenzhong and Khan, Fahad Shahbaz and Shen, Jianbing},
    title     = {ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
}

@article{chen2024adaocc,
  title={AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction},
  author={Chen, Dubing and Han, Wencheng and Fang, Jin and Shen, Jianbing},
  journal={arXiv preprint arXiv:2407.01436},
  year={2024}
}
