Training and Evaluation
November 19, 2024
This section gives an overview of the steps involved in training the policy.
Data Generation
For data generation, please follow the instructions given here.
Cost-Map Building
Cost-Map building is an essential step in guiding optimization and representing the environment. Cost-Maps can be built from either depth and semantic images (i.e., data generated in simulation) or (semantically annotated) point clouds (i.e., real-world data).
If depth and semantic images from the simulation are available, a 3D reconstruction has to be performed first, following the steps described in Point 1. If (semantically annotated) point clouds are already available, the cost-map can be built directly from the point cloud, following the steps described in Point 2.
1. Simulation: Depth Reconstruction
The reconstruction is executed in two steps, controlled by the config parameters defined in the ReconstructionCfg class:
- Generate a colored point cloud by warping each semantic image onto the depth image (accounting for cameras in different frames)
- Project into 3D space and voxelize
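The second step can be illustrated with a minimal NumPy sketch. It is not the ViPlanner implementation: the function names `backproject_depth` and `voxelize` are hypothetical, the camera is assumed to be a pinhole model with z pointing forward, and the semantic warping of the first step is left out.

```python
import numpy as np


def backproject_depth(depth, K):
    """Lift a depth image (H, W), in meters, to (N, 3) points in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    valid = np.isfinite(depth) & (depth > 0)        # drop missing depth readings
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]          # (u - cx) * z / fx
    y = (v[valid] - K[1, 2]) * z / K[1, 1]          # (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def voxelize(points, voxel_size):
    """Return the center of every voxel that contains at least one point."""
    idx = np.floor(points / voxel_size).astype(np.int64)
    occupied = np.unique(idx, axis=0)
    return (occupied + 0.5) * voxel_size


# Synthetic example (assumed values): a flat wall 2 m in front of the camera.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 2.0)
points = backproject_depth(depth, K)
voxels = voxelize(points, voxel_size=0.1)
```

To place the points in the world frame, each frame's points would additionally be transformed with the corresponding camera pose before voxelization.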
The process expects the following data structure:
env_name
├── camera_extrinsic.txt   # format: x y z qx qy qz qw
├── intrinsics.txt         # expects ROS CameraInfo format --> P-Matrix
├── depth                  # either png and/or npy, if both npy is used
|   ├── xxxx.png           # images saved with 4 digits, e.g. 0000.png
|   ├── xxxx.npy           # arrays saved with 4 digits, e.g. 0000.npy
├── semantics              # optional
    ├── xxxx.png           # images saved with 4 digits, e.g. 0000.png
In the case that the semantic and depth images have an offset in their position (as is typical on some robotic platforms), define a sem_suffix and depth_suffix in ReconstructionCfg to differentiate between the two, with the following structure:
env_name
├── camera_extrinsic{depth_suffix}.txt   # format: x y z qx qy qz qw
├── camera_extrinsic{sem_suffix}.txt     # format: x y z qx qy qz qw
├── intrinsics.txt                       # P-Matrix for intrinsics of depth and semantic images (depth first)
├── depth                                # either png and/or npy, if both npy is used
|   ├── xxxx{depth_suffix}.png           # images saved with 4 digits, e.g. 0000.png
|   ├── xxxx{depth_suffix}.npy           # arrays saved with 4 digits, e.g. 0000.npy
├── semantics                            # optional
    ├── xxxx{sem_suffix}.png             # images saved with 4 digits, e.g. 0000.png
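As a rough illustration of how such a folder could be read, the sketch below parses the extrinsics file and picks the depth file for a frame. The helpers `read_extrinsics` and `depth_file` are hypothetical and not part of ViPlanner. The sketch assumes one pose per line and accepts either spaces or commas as separators, since the exact delimiter is not stated here.

```python
from pathlib import Path


def read_extrinsics(path):
    """Parse one pose per line (x y z qx qy qz qw) into (position, quaternion) tuples."""
    poses = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip empty lines
        values = [float(v) for v in line.replace(",", " ").split()]
        if len(values) != 7:
            raise ValueError(f"expected 7 values per pose, got {len(values)}: {line!r}")
        poses.append((tuple(values[:3]), tuple(values[3:])))
    return poses


def depth_file(env_dir, index, depth_suffix=""):
    """Return the depth file of a frame, preferring .npy over .png if both exist."""
    stem = f"{index:04d}{depth_suffix}"  # files are saved with 4 digits
    for ext in (".npy", ".png"):
        candidate = Path(env_dir) / "depth" / f"{stem}{ext}"
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"no depth image for frame {stem} in {env_dir}")
```

For example, `depth_file("env_name", 0)` would return `env_name/depth/0000.npy` when both the array and the image are present.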
2. Real-World: Open3D-Slam
To create an annotated 3D Point-Cloud from real-world data, i.e., LiDAR scans and semantics generated from the RGB camera stream, use tools such as Open3D Slam.
3. Cost-Building
Either a geometric or a semantic cost map can be generated by running the following command:
python viplanner/cost_builder.py
The configs are set in CostMapConfig. We provide some standard values; however, before running the script, please adjust the config to your needs and local environment paths.
Cost-Maps will be saved within the environment folder, with the following structure:
maps
├── cloud
│   ├── cost_{map_name}.txt           # 3d visualization of cost map
├── data
│   ├── cost_{map_name}_map.txt       # cost map
│   ├── cost_{map_name}_ground.txt    # ground height estimated from pointcloud
└── params
    ├── config_cost_{map_name}.yaml   # CostMapConfig used to generate cost map
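A quick way to confirm that a cost-map run produced everything listed above is to check for the expected files. The helpers below are a hypothetical sketch that only mirrors the file names of this layout; they do not parse the files, since their internal format is not described here.

```python
from pathlib import Path


def cost_map_files(env_dir, map_name):
    """Return the files listed in the layout above for one cost map."""
    maps = Path(env_dir) / "maps"
    return {
        "cloud": maps / "cloud" / f"cost_{map_name}.txt",
        "map": maps / "data" / f"cost_{map_name}_map.txt",
        "ground": maps / "data" / f"cost_{map_name}_ground.txt",
        "config": maps / "params" / f"config_cost_{map_name}.yaml",
    }


def missing_cost_map_files(env_dir, map_name):
    """List the expected files that do not exist (empty list if the run is complete)."""
    return [key for key, path in cost_map_files(env_dir, map_name).items() if not path.exists()]
```

Calling `missing_cost_map_files(env_dir, map_name)` after running the cost builder should return an empty list if all outputs were written.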
Training
The training configuration is defined in TrainCfg. Training can be started using the example training script train.py.
python viplanner/train.py
For training, the following directory structure is expected or will be created:
file_path # TrainCfg.file_path or env variable EXPERIMENT_DIRECTORY
├── data
│ ├── env_name # structure as defined in Cost-Map Building
├── models
│ ├── model_name
│ | ├── model.pth # trained model
│ | ├── model.yaml # TrainCfg used to train model
├── logs
│ ├── model_name
The model name must be unique; otherwise, the previous training run will be overwritten.
Also, always copy model.pt together with model.yaml, because the config is required to reload the model.
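Both points can be guarded with a small helper script. The sketch below is hypothetical and not part of ViPlanner. It assumes that an explicit `file_path` takes precedence over the EXPERIMENT_DIRECTORY environment variable, and it takes the file names to copy as a parameter so they can be matched to what your run actually produced.

```python
import os
import shutil
from pathlib import Path


def model_dir(model_name, file_path=None):
    """Resolve <file_path>/models/<model_name>, falling back to EXPERIMENT_DIRECTORY."""
    root = file_path or os.environ.get("EXPERIMENT_DIRECTORY")
    if root is None:
        raise ValueError("set file_path or the EXPERIMENT_DIRECTORY environment variable")
    return Path(root) / "models" / model_name


def assert_unique(model_name, file_path=None):
    """Refuse to start a run whose model directory already exists."""
    target = model_dir(model_name, file_path)
    if target.exists():
        raise FileExistsError(f"{target} already exists; pick another model name")
    return target


def backup_model(model_name, destination, file_path=None, files=("model.pt", "model.yaml")):
    """Copy the weights and the config together so the model can be reloaded later."""
    source = model_dir(model_name, file_path)
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    for name in files:
        shutil.copy2(source / name, destination / name)
    return destination
```

Calling `assert_unique(model_name)` before launching train.py turns an accidental overwrite into an explicit error.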