Data and Weights Preparation

April 30, 2026

This guide walks you through preparing the datasets and pretrained weights that HERMES requires.


1. Prepare nuScenes Dataset


2. Prepare Text Annotations


3. Prepare Pretrained Weights

a) Download InternVL2 Pretrained Weights

cd projects/mmdet3d_plugin/models/internvl_chat
mkdir -p pretrained
cd pretrained
huggingface-cli download --resume-download --local-dir-use-symlinks False OpenGVLab/InternVL2-2B --local-dir InternVL2-2B
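An interrupted download can leave the target directory missing or empty. The following is a minimal sketch of a sanity check to run from the `pretrained` directory after the download command above. `check_model_dir` is a hypothetical helper name, not part of HERMES or `huggingface-cli`, and it only confirms that the directory exists and is non-empty; it does not verify that every weight file arrived.

```shell
# check_model_dir: hypothetical helper (not part of HERMES or huggingface-cli).
# The body runs in a subshell "( ... )" so its variable stays local
# and "exit" only ends the function, not the calling shell.
check_model_dir() (
    dir="${1:-InternVL2-2B}"

    # The directory should have been created by --local-dir.
    if [ ! -d "$dir" ]; then
        echo "not found: $dir"
        exit 1
    fi

    # An empty directory usually means no files were fetched.
    if [ -z "$(ls -A "$dir")" ]; then
        echo "empty: $dir"
        exit 1
    fi

    echo "ok: $dir"
)
```

For example, `check_model_dir InternVL2-2B` prints `ok: InternVL2-2B` and returns 0 when the directory holds at least one file. If the check fails, rerunning the download command is the simplest recovery.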

b) Download Project Checkpoints


Directory Structure

Your project directory should look like this after setup:

HERMES
├── data
│   ├── nuscenes
│   │   ├── mask_cam_img.jpg
│   │   ├── nuscenes_advanced_12Hz_infos_train.pkl
│   │   ├── nuscenes_masked_only_infos_temporal_train.pkl
│   │   ├── nuscenes_infos_temporal_train.pkl
│   │   └── nuscenes_infos_temporal_val.pkl
│   ├── omnidrive_nusc
│   └── NuInteractCaption
└── ckpt
    ├── open_clip_convnext_base_w-320_laion_aesthetic-s13B-b82k.bin
    ├── hermes_stage1.pth
    ├── hermes_stage2_1.pth
    ├── hermes_stage2_2.pth
    └── hermes_final.pth
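To compare a local checkout against the layout above, a small script can list whatever is missing. This is a sketch only: `check_hermes_layout` is a hypothetical helper name, and its path list is copied from the tree above, so it checks for presence and does not validate file contents.

```shell
# check_hermes_layout: hypothetical helper (not part of HERMES).
# Prints one "missing: <path>" line per absent entry and returns
# non-zero if anything is missing. The subshell body "( ... )" keeps
# its variables out of the calling shell.
check_hermes_layout() (
    root="${1:-.}"
    missing=0

    # Paths taken from the directory tree in this guide.
    for rel in \
        data/nuscenes/mask_cam_img.jpg \
        data/nuscenes/nuscenes_advanced_12Hz_infos_train.pkl \
        data/nuscenes/nuscenes_masked_only_infos_temporal_train.pkl \
        data/nuscenes/nuscenes_infos_temporal_train.pkl \
        data/nuscenes/nuscenes_infos_temporal_val.pkl \
        data/omnidrive_nusc \
        data/NuInteractCaption \
        ckpt/open_clip_convnext_base_w-320_laion_aesthetic-s13B-b82k.bin \
        ckpt/hermes_stage1.pth \
        ckpt/hermes_stage2_1.pth \
        ckpt/hermes_stage2_2.pth \
        ckpt/hermes_final.pth
    do
        # -e accepts both files and directories.
        if [ ! -e "$root/$rel" ]; then
            echo "missing: $rel"
            missing=$((missing + 1))
        fi
    done

    # Succeed only when every entry was found.
    [ "$missing" -eq 0 ]
)
```

Running `check_hermes_layout .` from the HERMES root prints nothing and returns 0 when every listed entry is present.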

Please refer to Usage.md for instructions on HERMES training and evaluation.