Data and Weights Preparation
April 30, 2026
This guide walks you through preparing the datasets and pretrained weights required for HERMES.
1. Prepare nuScenes Dataset
- First, create a `data` directory: `mkdir -p data`
- Place or symlink your nuScenes dataset to the `data` directory: `ln -s /path/to/your/nuscenes data/nuscenes`
- Download the following `.pkl` files and `mask_cam_img.jpg` to `data/nuscenes/`:
  - nuscenes_advanced_12Hz_infos_train.pkl
  - nuscenes_masked_only_infos_temporal_train.pkl
  - nuscenes_infos_temporal_train.pkl
  - nuscenes_infos_temporal_val.pkl
  - mask_cam_img.jpg (required for Stage2-1 data augmentation)
- We also provide a Baidu Netdisk download link for your convenience.
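The first two steps can be wrapped in a small function that refuses to link a path that does not exist and never writes into an existing real directory. This is only a sketch: `link_nuscenes` and its optional second argument (the project root) are hypothetical names introduced here, not part of HERMES.

```shell
# Hypothetical helper: link an existing nuScenes directory to <root>/data/nuscenes.
link_nuscenes() {
    local src="$1"          # path to your nuScenes dataset
    local root="${2:-.}"    # project root (assumed to be the current directory)
    local dest="$root/data/nuscenes"

    if [ ! -d "$src" ]; then
        echo "error: '$src' is not a directory" >&2
        return 1
    fi
    # Refuse to touch a real directory; only a symlink may be replaced.
    if [ -e "$dest" ] && [ ! -L "$dest" ]; then
        echo "error: '$dest' exists and is not a symlink" >&2
        return 1
    fi

    mkdir -p "$root/data"
    # Resolve src to an absolute path so the link does not depend on the
    # directory it was created from. -s: symbolic, -f: replace an existing
    # link, -n: treat an existing link to a directory as a plain file.
    ln -sfn "$(cd "$src" && pwd)" "$dest"
}
```

Calling `link_nuscenes /path/to/your/nuscenes` from the project root has the same effect as the two commands above, and it can be re-run safely if the dataset moves.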
2. Prepare Text Annotations
- Download the annotation archives `omnidrive_nusc.zip` and `NuInteractCaption.zip`, then unzip them into the `data` directory.
- Example:
  - `unzip omnidrive_nusc.zip -d data/`
  - `unzip NuInteractCaption.zip -d data/`
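If the archives were downloaded somewhere other than the project root, the extraction can be scripted so that a missing archive is reported rather than silently skipped. The following is a minimal sketch; `unpack_annotations` and its two arguments are hypothetical names, and it assumes the Info-ZIP `unzip` tool is installed.

```shell
# Hypothetical helper: unzip both annotation archives into a destination directory.
unpack_annotations() {
    local src="${1:-.}"      # directory that holds the downloaded .zip files
    local dest="${2:-data}"  # extraction target (data/ in this guide)
    local status=0
    local name

    for name in omnidrive_nusc.zip NuInteractCaption.zip; do
        if [ ! -f "$src/$name" ]; then
            echo "missing archive: $src/$name" >&2
            status=1
            continue
        fi
        # -q: quiet, -o: overwrite existing files without prompting
        unzip -q -o "$src/$name" -d "$dest" || status=1
    done
    return "$status"
}
```

For example, `unpack_annotations ~/Downloads data` extracts both archives into `data/` and returns a non-zero status if either one is missing or fails to extract.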
3. Prepare Pretrained Weights
a) Download InternVL-2 Pretrained Weights
cd projects/mmdet3d_plugin/models/internvl_chat
mkdir pretrained
cd pretrained
huggingface-cli download --resume-download --local-dir-use-symlinks False OpenGVLab/InternVL2-2B --local-dir InternVL2-2B
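Before re-running the download, it can help to check whether the weights are already in place. The sketch below assumes the downloaded directory contains a non-empty `config.json`, which is typical for Hugging Face model repositories but is not stated in this guide; `internvl_ready` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the InternVL2-2B directory looks populated.
internvl_ready() {
    local dir="${1:-projects/mmdet3d_plugin/models/internvl_chat/pretrained/InternVL2-2B}"
    # Assumption: a finished download contains a non-empty config.json.
    [ -s "$dir/config.json" ]
}
```

Run from the project root, `internvl_ready || echo "InternVL2-2B not found, run the download commands above"` prints a reminder only when the check fails. This is a presence check only; it does not verify that the weight files themselves are complete.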
b) Download Project Checkpoints
- Create a `ckpt` directory in your project root and download into it the model weights listed under `ckpt` in the directory structure below.
Directory Structure
Your project directory should look like this after setup:
HERMES
├── data
│ ├── nuscenes
│ │ ├── mask_cam_img.jpg
│ │ ├── nuscenes_advanced_12Hz_infos_train.pkl
│ │ ├── nuscenes_masked_only_infos_temporal_train.pkl
│ │ ├── nuscenes_infos_temporal_train.pkl
│ │ └── nuscenes_infos_temporal_val.pkl
│ ├── omnidrive_nusc
│ └── NuInteractCaption
├── ckpt
│ ├── open_clip_convnext_base_w-320_laion_aesthetic-s13B-b82k.bin
│ ├── hermes_stage1.pth
│ ├── hermes_stage2_1.pth
│ ├── hermes_stage2_2.pth
│ └── hermes_final.pth
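One way to confirm that the layout above is complete before training is to test each listed path and print the ones that are absent. The sketch below takes its path list directly from the tree; the function name `check_hermes_layout` and its optional project-root argument are hypothetical.

```shell
# Hypothetical helper: report every path from the tree above that is missing.
check_hermes_layout() {
    local root="${1:-.}"   # project root (assumed to be the current directory)
    local missing=0
    local p

    for p in \
        data/nuscenes/mask_cam_img.jpg \
        data/nuscenes/nuscenes_advanced_12Hz_infos_train.pkl \
        data/nuscenes/nuscenes_masked_only_infos_temporal_train.pkl \
        data/nuscenes/nuscenes_infos_temporal_train.pkl \
        data/nuscenes/nuscenes_infos_temporal_val.pkl \
        data/omnidrive_nusc \
        data/NuInteractCaption \
        ckpt/open_clip_convnext_base_w-320_laion_aesthetic-s13B-b82k.bin \
        ckpt/hermes_stage1.pth \
        ckpt/hermes_stage2_1.pth \
        ckpt/hermes_stage2_2.pth \
        ckpt/hermes_final.pth
    do
        # -e follows symlinks, so a symlinked data/nuscenes is accepted,
        # while a dangling link is reported as missing.
        if [ ! -e "$root/$p" ]; then
            echo "missing: $p"
            missing=$((missing + 1))
        fi
    done

    # Exit status is 0 only when nothing is missing.
    [ "$missing" -eq 0 ]
}
```

Running `check_hermes_layout && echo "layout OK"` from the project root prints one `missing:` line per absent path, or `layout OK` when everything is in place. It checks existence only, not file integrity.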
Please refer to Usage.md for instructions on HERMES training and evaluation.