User Training
March 25, 2024 · View on GitHub
We provide training illustrations on Unreal4kStereo dataset in this document. Users can adopt to custome datasets based on this Unreal4kStereo version. We provide all configs for Depth-Anything and ZoeDepth training.
Dataset Preparation:
Download the dataset from https://github.com/fabiotosi92/SMD-Nets.
Preprocess the dataset following the instruction (convert images to raw format).
Copy the split files in ./splits/u4k and organize the folder structure as:
monocular-depth-estimation-toolbox
├── estimator
├── docs
├── ...
├── data (it's included in `.gitignore`)
│ ├── u4k (recommand ln -s)
│ │ ├── 00000
│ │ │ ├── Disp0
│ │ │ │ ├── 00000.npy
│ │ │ │ ├── 00001.npy
│ │ │ │ ├── ...
│ │ │ ├── Extrinsics0
│ │ │ ├── Extrinsics1
│ │ │ ├── Image0
│ │ │ │ ├── 00000.raw (Note it's important to convert png to raw to speed up training)
│ │ │ │ ├── 00001.raw
│ │ │ │ ├── ...
│ │ ├── 00001
│ │ │ ├── Disp0
│ │ │ ├── Extrinsics0
│ │ │ ├── Extrinsics1
│ │ │ ├── Image0
| | ├── ...
| | ├── 00008
| | ├── splits
│ │ │ ├── train.txt
│ │ │ ├── val.txt
│ │ │ ├── test.txt
│ │ │ ├── test_out.txt
Pre-trained Model Preparation:
Before trainig, please download pre-trained metric depth estimators from https://huggingface.co/zhyever/PatchFusion/tree/main. We provide pre-trained checkpoints for Depth-Anything and ZoeDepth.
| Model Name | Config Path |
|---|---|
| Depth-Anything-vitl | https://huggingface.co/zhyever/PatchFusion/blob/main/DepthAnything_vitl.pt |
| Depth-Anything-vitb | https://huggingface.co/zhyever/PatchFusion/blob/main/DepthAnything_vitb.pt |
| Depth-Anything-vits | https://huggingface.co/zhyever/PatchFusion/blob/main/DepthAnything_vits.pt |
| ZoeDepth-N | https://huggingface.co/zhyever/PatchFusion/blob/main/patchfusion_u4k.pt |
Note that these checkpoints are pre-trained using the offical implementation, and as a result, their file names end with .pt.
Put them to one specific folder, for example, ./work_dir/DepthAnything_vitl.pt. (Note: ./work_dir is included in .gitignore)
Model Training:
This repo follows the design of Monocular-Depth-Estimation-Toolbox, but it's more flexible in training and inference. The overall training script follows this line of command:
bash tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
Arguments Explanation:
${CONFIG_FILE}: Select the configuration file for training${GPU_NUM}: Specify the number of GPU used for training (We use 4 as default)[optional arguments]: You can specify more arguments. We present some important arguments here--log-name: experiment name shown in wandb website--work-dir:work-dir + log-nameindicates the path to save logs and checkpoints--tag: tags shown in wandb website--debug: if set, omit wandb log
You will see examples in the following sections.
Training PatchFusion includes three steps. Here, we use take DepthAnything vitl as an example.
Coarse Model Training
First, check the config file: ./configs/patchfusion_depthanything/depthanything_vitl_coarse_pretrain_u4k. Modify the config item zoe_depth_config.pretrained_resource to the checkpoint path (the default path is local::./work_dir/DepthAnything_vitl.pt). The prefix local:: is necessary because we based on the offical implementation.
Then, run:
bash ./tools/dist_train.sh configs/patchfusion_depthanything/depthanything_vitl_coarse_pretrain_u4k.py 4 --work-dir ./work_dir/depthanything_vitl_u4k --log-name coarse_pretrain --tag coarse,da,vitl
As for this command, we will use the config depthanything_vitl_coarse_pretrain_u4k.py, 4 gpus to train the model, and save the logs and checkpoints to ./work_dir/depthanything_vitl_u4k/coarse_pretrain. During training, you can check logs with experiment name coarse_pretrain on wandb.
Fine Model Training
Again, check the config file: ./configs/patchfusion_depthanything/depthanything_vitl_fine_pretrain_u4k.py. Modify the config item zoe_depth_config.pretrained_resource to the checkpoint path (the default path is local::./work_dir/DepthAnything_vitl.pt).
Then, run:
bash ./tools/dist_train.sh configs/patchfusion_depthanything/depthanything_vitl_fine_pretrain_u4k.py 4 --work-dir ./work_dir/depthanything_vitl_u4k --log-name fine_pretrain --tag fine,da,vitl
Fusion Model Training
Finally, we can train the fusion model. Check the config file: ./configs/patchfusion_depthanything/depthanything_vitl_patchfusion_u4k.py. Now, you need to modify the config item model.config.pretrain_model to the checkpoint paths of both the pre-trained coarse and fine models. (the default path is ['./work_dir/depthanything_vitl_u4k/coarse_pretrain/checkpoint_24.pth', './work_dir/depthanything_vitl_u4k/fine_pretrain/checkpoint_24.pth'])
Finally, run:
bash ./tools/dist_train.sh configs/patchfusion_depthanything/depthanything_vitl_patchfusion_u4k.py 4 --work-dir ./work_dir/depthanything_vitl_u4k --log-name patchfusion --tag patchfusion,da,vitl
Models & Configs
| Model Name | Coarse Config | Fine Config | PatchFusion Config |
|---|---|---|---|
| Depth-Anything-vitl | ./configs/patchfusion_depthanything/depthanything_vitl_coarse_pretrain_u4k.py | ./configs/patchfusion_depthanything/depthanything_vitl_fine_pretrain_u4k.py | ./configs/patchfusion_depthanything/depthanything_vitl_patchfusion_u4k.py |
| Depth-Anything-vitb | ./configs/patchfusion_depthanything/depthanything_vitb_coarse_pretrain_u4k.py | ./configs/patchfusion_depthanything/depthanything_vitb_fine_pretrain_u4k.py | ./configs/patchfusion_depthanything/depthanything_vitb_patchfusion_u4k.py |
| Depth-Anything-vits | ./configs/patchfusion_depthanything/depthanything_vits_coarse_pretrain_u4k.py | ./configs/patchfusion_depthanything/depthanything_vits_fine_pretrain_u4k.py | ./configs/patchfusion_depthanything/depthanything_vits_patchfusion_u4k.py |
| ZoeDepth-N | ./configs/patchfusion_zoedepth/zoedepth_coarse_pretrain_u4k.py | ./configs/patchfusion_zoedepth/zoedepth_fine_pretrain_u4k.py | ./configs/patchfusion_zoedepth/zoedepth_patchfusion_u4k.py |
Model Validation:
Running Time Validation
During training, the validation is processed intermittently. In the config file, you can change the related settings. For example,
train_cfg=dict(max_epochs=16, val_interval=2, save_checkpoint_interval=16, log_interval=100, train_log_img_interval=500, val_log_img_interval=50, val_type='epoch_base', eval_start=0)
By changing val_interval=4, you can validate every 4 epochs.
Offline Validation
Run:
bash ./tools/dist_test.sh configs/patchfusion_depthanything/depthanything_vitl_patchfusion_u4k.py 4 --ckp-path ./work_dir/depthanything_vitl_u4k/patchfusion/checkpoint_16.pth --cai-mode m1
Check more details of arguments at Inference with Multiple GPUs and Running.