HENet Series

May 23, 2026

This is the official implementation of HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras (ECCV 2024, Paper) and HENet++: Hybrid Encoding and Multi-task Learning for 3D Perception and End-to-end Autonomous Driving (Paper).

HENet Introduction

HENet is an end-to-end multi-task 3D perception framework. It reduces training costs through hybrid image encoding and mitigates multi-task conflicts through independent BEV feature encoding.

Demo

Visualization results of HENet and baselines on end-to-end multi-tasking. By using long-term temporal information, HENet estimates occluded objects better, and by using high-resolution information, it makes more accurate predictions.

HENet++ Introduction

HENet++ extends HENet to end-to-end planning. It simultaneously extracts both dense and sparse features, providing more suitable representations for different tasks, reducing cumulative errors, and delivering more comprehensive information to the planning module.

Demo

Through improvements in model architecture and pre-training based on model merging, HENet++ achieves superior multi-task and single-task performance.

Main Results

This repository provides a sample model for hybrid encoding and multi-task decoding:

Method   mAP    NDS    mIoU   Config   Model
HENet    49.8   59.8   58.0   HENet    Google Drive
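For reference, NDS is the nuScenes detection score, which combines mAP with five true-positive error metrics (translation, scale, orientation, velocity and attribute errors). The table reports values in percent (for example, 49.8 means 0.498). A minimal sketch of the standard formula follows; the inputs are illustrative only and are not HENet's actual error metrics:

```python
from typing import Sequence


def nuscenes_detection_score(mean_ap: float, tp_errors: Sequence[float]) -> float:
    """NDS = (5 * mAP + sum over TP errors of (1 - min(1, error))) / 10.

    tp_errors holds the five mean true-positive errors:
    mATE, mASE, mAOE, mAVE and mAAE.
    """
    if len(tp_errors) != 5:
        raise ValueError("expected exactly five TP error metrics")
    # Each error is clipped at 1, so a very poor metric contributes 0, never a negative value.
    tp_score = sum(1.0 - min(1.0, err) for err in tp_errors)
    return (5.0 * mean_ap + tp_score) / 10.0


# Illustrative inputs only: with perfect TP errors, mAP = 0.5 gives NDS = 0.75.
print(nuscenes_detection_score(0.5, [0.0, 0.0, 0.0, 0.0, 0.0]))
```

Because mAP carries half of the total weight, two models with equal mAP can still differ noticeably in NDS through their box-quality errors.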

Additionally, this repository provides a distilled student model for HENet++ end-to-end autonomous driving. It was distilled from a high-precision HENet++ teacher model and achieves a comparable end-to-end collision rate. It serves as a baseline and has been used in the KnowVal and DrivingAgent frameworks.

Method    UniAD L2 (m)   UniAD Col.   VAD L2 (m)   VAD Col.   Config    Model
HENet++   1.29           0.12%        0.55         0.04%      HENet++   Google Drive
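The table reports planning metrics under two evaluation protocols. This README does not define them. As they are commonly described in the end-to-end driving literature, UniAD-style evaluation takes the L2 error of the single waypoint at each horizon (1 s, 2 s, 3 s), while VAD-style evaluation averages the error over all waypoints up to each horizon. Both then average across horizons. The sketch below illustrates only these two averaging conventions. It assumes 6 planned waypoints at 2 Hz and uses made-up per-waypoint errors; the function names are hypothetical:

```python
from typing import List, Sequence

HORIZONS_S = (1, 2, 3)  # evaluation horizons in seconds


def horizon_steps(steps_per_second: int) -> List[int]:
    """Number of waypoints covered by each horizon, e.g. [2, 4, 6] at 2 Hz."""
    return [steps_per_second * s for s in HORIZONS_S]


def l2_at_horizon(step_errors: Sequence[float], steps_per_second: int = 2) -> float:
    """UniAD-style (as commonly described): error at each horizon's last waypoint, averaged."""
    steps = horizon_steps(steps_per_second)
    return sum(step_errors[n - 1] for n in steps) / len(steps)


def l2_up_to_horizon(step_errors: Sequence[float], steps_per_second: int = 2) -> float:
    """VAD-style (as commonly described): mean error over waypoints up to each horizon, averaged."""
    steps = horizon_steps(steps_per_second)
    per_horizon = [sum(step_errors[:n]) / n for n in steps]
    return sum(per_horizon) / len(per_horizon)


# Illustrative per-waypoint L2 errors (m) for 6 waypoints spaced 0.5 s apart.
errors = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
print(round(l2_at_horizon(errors), 3), round(l2_up_to_horizon(errors), 3))
```

Because errors usually grow with time, the cumulative (VAD-style) average is pulled down by the early, accurate waypoints. This would be consistent with the VAD-protocol L2 being lower than the UniAD-protocol L2 in the table.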

Getting Started

KnowVal, HENet, R4Det, RCBEVDet, and TEOcc were developed under the same framework. You can easily merge these repositories into one. If you have prepared the environment for any of them, you do not need to create a new environment.

Environment

The code has been tested in the following two environments:

cuda     12.1
pytorch  2.0.1+cu118 
GPU      A800, A40
(PyTorch's CUDA version check must be commented out manually.)
(For a detailed package list, please refer to envs_list_cu121.txt)

cuda     11.3
pytorch  1.12.1+cu113
GPU      RTX8000, RTX3090, V100, P40
(For a detailed package list, please refer to envs_list_cu113.txt)
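In the first environment, the installed CUDA toolkit (12.1) and the CUDA version PyTorch was built against (11.8, from the `+cu118` tag) differ in their major version. PyTorch's C++/CUDA extension builder rejects such a mismatch, which is presumably why the check has to be commented out. The hypothetical helper below is not part of this repository. It parses the two version strings and reports which kind of mismatch, if any, you have:

```python
import re
from typing import Optional, Tuple


def torch_cuda_tag(torch_version: str) -> Optional[Tuple[int, int]]:
    """Parse the CUDA version from a PyTorch local tag, e.g. '2.0.1+cu118' -> (11, 8)."""
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None  # CPU-only build or an unrecognized tag
    return int(match.group(1)), int(match.group(2))


def toolkit_version(version: str) -> Tuple[int, int]:
    """Parse a toolkit version string such as '12.1' or '11.3' -> (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def cuda_mismatch(torch_version: str, toolkit: str) -> str:
    """Return 'none', 'minor', 'major', or 'unknown' for a PyTorch build vs. a local toolkit."""
    built = torch_cuda_tag(torch_version)
    if built is None:
        return "unknown"
    local = toolkit_version(toolkit)
    if built[0] != local[0]:
        return "major"  # the situation that requires disabling the check
    if built != local:
        return "minor"
    return "none"


if __name__ == "__main__":
    # The two tested environments listed above.
    print(cuda_mismatch("2.0.1+cu118", "12.1"))   # major
    print(cuda_mismatch("1.12.1+cu113", "11.3"))  # none
```

On a real machine, the first argument would come from `torch.__version__` and the second from the output of `nvcc --version`.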

The recommended installation steps are:

  1. Create a Python environment and install the PyTorch build that matches your machine's CUDA version;

  2. Install the mmcv version that matches your PyTorch and CUDA versions;

  3. Install the dependencies of mmdet, then install mmdet;

  4. Install the remaining dependencies of this project (change the spconv version in requirements.txt to match the CUDA version you are using), then set up this project:

python setup.py develop

  5. Compile some operators manually:

cd mmdet3d/ops/csrc
python setup.py build_ext --inplace
cd ../deformattn
python setup.py build install

  6. Install the dependencies of detectron2, then install detectron2:

cd detr2
python setup.py develop
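After these steps, you can confirm what was actually installed by querying the package metadata. The sketch below uses the standard library's `importlib.metadata`. The distribution names listed are assumptions that depend on the versions you chose (for example, mmcv 1.x is often installed as `mmcv-full`, and spconv wheels are published under CUDA-specific names such as `spconv-cu113`), so adjust the list to your setup:

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumed distribution names; add your spconv package (e.g. "spconv-cu113") as needed.
PACKAGES = ["torch", "mmcv", "mmcv-full", "mmdet", "mmdet3d", "detectron2"]


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:12s} {version or 'not installed'}")
```

Comparing this output with envs_list_cu121.txt or envs_list_cu113.txt is a quick way to spot a version that differs from the tested environments.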

Data Preparation

Please download nuScenes-v1.0-trainval and nuScenes-map-expansion-v1.3 from nuScenes.org, and CVPR23-Occupancy/gts.tar.gz from CVPR2023-3D-Occupancy-Prediction.

If your folder structure is different from the following, you may need to change the corresponding paths in config files.

├── mmdet3d
├── tools
├── configs
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   │   ├── basemap
│   │   │   ├── expansion
│   │   │   ├── prediction
│   │   │   ├── *.png
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-test
│   │   ├── v1.0-trainval
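Before you run the data-preparation script, it can help to confirm that the directories in the tree above exist. The hypothetical check below is not part of this repository. It lists the expected sub-directories of data/nuscenes that are missing. Note that v1.0-test is present only if you also downloaded the nuScenes test split, and the maps/*.png files are not checked:

```python
from pathlib import Path
from typing import List

# Sub-directories expected under data/nuscenes, taken from the tree above.
EXPECTED_DIRS = [
    "maps/basemap",
    "maps/expansion",
    "maps/prediction",
    "samples",
    "sweeps",
    "v1.0-test",
    "v1.0-trainval",
]


def missing_nuscenes_dirs(root: str = "data/nuscenes") -> List[str]:
    """Return the expected sub-directories that do not exist under root."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_nuscenes_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("data/nuscenes layout looks complete")
```

If your data lives elsewhere, pass that path as `root`, and remember to update the corresponding paths in the config files as well.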

We recommend that you download the processed data index file directly via this Google Drive link.

Prepare nuScenes data by running:

python tools/create_data_nuscenes_C.py

Training

./tools/dist_train.sh $config_path $gpus

Testing

Testing on the validation set:

./tools/dist_test.sh $config_path $checkpoint_path $gpus --eval bbox

Testing on the test set:

./tools/dist_test.sh $config_path $checkpoint_path $gpus --format-only --eval-options 'jsonfile_prefix=work_dirs'
mv work_dirs/pts_bbox/results_nusc.json work_dirs/pts_bbox/${name}.json
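Before you submit the renamed file, you may want to sanity-check its structure. The sketch below checks the top-level `meta`/`results` layout and the per-box fields described in the nuScenes detection devkit. It is a standalone helper, not part of this repository, and it does not verify that every sample of the test split is present:

```python
import json
from typing import List

META_KEYS = {"use_camera", "use_lidar", "use_radar", "use_map", "use_external"}
BOX_KEYS = {"sample_token", "translation", "size", "rotation",
            "velocity", "detection_name", "detection_score", "attribute_name"}
VECTOR_LENGTHS = {"translation": 3, "size": 3, "rotation": 4, "velocity": 2}


def validate_submission(sub: dict) -> List[str]:
    """Return human-readable problems; an empty list means no format issues were found."""
    problems: List[str] = []
    meta = sub.get("meta")
    if not isinstance(meta, dict):
        problems.append("missing 'meta'")
    elif META_KEYS - set(meta):
        problems.append(f"meta: missing {sorted(META_KEYS - set(meta))}")
    results = sub.get("results")
    if not isinstance(results, dict):
        problems.append("missing 'results'")
        return problems
    for token, boxes in results.items():
        for i, box in enumerate(boxes):
            where = f"{token}[{i}]"
            missing = BOX_KEYS - set(box)
            if missing:
                problems.append(f"{where}: missing {sorted(missing)}")
                continue
            if box["sample_token"] != token:
                problems.append(f"{where}: sample_token does not match its key")
            for key, length in VECTOR_LENGTHS.items():
                if len(box[key]) != length:
                    problems.append(f"{where}: '{key}' should have {length} values")
    return problems


def validate_file(path: str) -> List[str]:
    """Load a results JSON file and validate it."""
    with open(path) as f:
        return validate_submission(json.load(f))
```

For example, `validate_file("work_dirs/pts_bbox/results_nusc.json")` can be run right after testing and before the file is renamed.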

If you have any other questions, please refer to the mmdet3d documentation.

Acknowledgements

We sincerely thank these excellent open-source projects:

Citation

If this work is helpful for your research, please consider citing our paper HENet++ and HENet.

@article{xia2025henet++,
  title={HENet++: Hybrid Encoding and Multi-task Learning for 3D Perception and End-to-end Autonomous Driving},
  author={Xia, Zhongyu and Lin, Zhiwei and Wang, Yongtao and Yang, Ming-Hsuan},
  journal={arXiv preprint arXiv:2511.07106},
  year={2025}
}

@inproceedings{xia2024henet,
  title={HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras},
  author={Xia, Zhongyu and Lin, Zhiwei and Wang, Xinhao and Wang, Yongtao and Xing, Yun and Qi, Shengxiang and Dong, Nan and Yang, Ming-Hsuan},
  booktitle={Proceedings of the European Conference on Computer Vision},
  year={2024}
}

License

The project is free for academic research use only; commercial use requires authorization. For commercial permission, please contact wyt@pku.edu.cn.