RSA: Resolving Scale Ambiguities in Monocular Depth Estimators through Language Descriptions
February 4, 2025
Official implementation of the paper "RSA: Resolving Scale Ambiguities in Monocular Depth Estimators through Language Descriptions"
Accepted by NeurIPS 2024
Paper Link: https://arxiv.org/abs/2410.02924
Authors: Ziyao Zeng, Yangchao Wu, Hyoungseob Park, Daniel Wang, Fengyu Yang, Stefano Soatto, Dong Lao, Byung-Woo Hong, Alex Wong
Overview

Poster

Setup Environment
Create Virtual Environment:
cd RSA
virtualenv -p /usr/bin/python3.8 ~/venvs/rsa
vim ~/.bash_profile
Add the following line to ~/.bash_profile:
alias rsa="export CUDA_HOME=/usr/local/cuda-11.1 && source ~/venvs/rsa/bin/activate"
Then reload the profile, activate the environment, and install the required packages:
source ~/.bash_profile
rsa
pip install -r requirements.txt
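As a quick sanity check after the steps above, the following minimal sketch verifies the two things the setup relies on: a Python 3.8 interpreter and a CUDA_HOME that points to an existing directory. The function name check_environment is hypothetical and not part of this repository.

```python
import os
import sys


def check_environment(env=None, version=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper, not part of the RSA code base.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    # The virtualenv in the setup steps is created with python3.8.
    if version != (3, 8):
        problems.append("expected Python 3.8, found %d.%d" % version)
    # The rsa alias exports CUDA_HOME before activating the virtualenv.
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
    elif not os.path.isdir(cuda_home):
        problems.append("CUDA_HOME does not exist: %s" % cuda_home)
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the setup steps"]:
        print(line)
```

Run it inside the activated environment; any printed problem points to the step that needs to be repeated.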
Run training for Depth Anything on NYU-Depth-V2 / KITTI / VOID
Specify the GPU number in sh/train_3d_da.sh, then run:
sh train_3d_da.sh
Before running a new experiment, set model_name to the same value in train_3d_da.sh and config/arguments_train_3d_da.txt.
Run training for MiDaS on NYU-Depth-V2 / KITTI / VOID
Specify the GPU number in sh/train_3d_midas.sh, then run:
sh train_3d_midas.sh
Before running a new experiment, set model_name to the same value in train_3d_midas.sh and config/arguments_train_3d_midas.txt.
Run training for DPT on NYU-Depth-V2 / KITTI / VOID
Specify the GPU number in sh/train_3d_da.sh, then run:
sh train_3d_da.sh
Before running a new experiment, set model_name to the same value in train_3d_da.sh and config/arguments_train_3d_da.txt.
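The model_name reminder in the training sections is easy to forget, so a small check can help. The sketch below is only an illustration: it assumes that both the shell script and the argument file write the value as "--model_name NAME", "model_name=NAME", or "model_name NAME", which may not match the actual file contents. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the value follows "model_name" after spaces, tabs, or "=".
# Values built from shell variables (e.g. ${NAME}) are not resolved.
_MODEL_NAME = re.compile(r"model_name[ \t=]+[\"']?([\w.\-]+)")


def find_model_names(text):
    """Return every model_name value found in text, in order of appearance."""
    return _MODEL_NAME.findall(text)


def model_names_match(script_text, config_text):
    """True when both texts contain one distinct model_name and it is the same."""
    script_names = set(find_model_names(script_text))
    config_names = set(find_model_names(config_text))
    return len(script_names) == 1 and script_names == config_names


def check_files(script_path, config_path):
    """Read both files from disk and compare their model_name values."""
    return model_names_match(
        Path(script_path).read_text(), Path(config_path).read_text()
    )
```

For example, check_files("sh/train_3d_da.sh", "config/arguments_train_3d_da.txt") returns False when the two names differ or when either file contains no recognizable model_name.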
Setup Datasets
Prepare Datasets
Download NYU-Depth-v2, KITTI, and VOID, or use the download scripts provided by KBNet to prepare the datasets.
For VOID, use VOID-1500. The number is the count of points provided for depth completion and does not affect this task. Then update data_path and gt_path in train_3d.py and DATA_PATH_VOID in eval_void/data_utils.py.
The dataset directory structure should look like this:
├── nyu_depth_v2
│ ├── official_splits # path to nyu-depth-v2 data_path_eval and gt_path_eval
│ │ ├── test
│ │ │ ├── bathroom
│ │ │ │ ├── rgb_00045.jpg
│ │ │ │ ├── rgb_00046.jpg
│ │ │ │ ├── ...
│ │ ├── train # We don't use this part
│ │ │ ├── ...
│ ├── sync # path to nyu-depth-v2 data_path and gt_path
│ │ ├── basement_0001a
│ │ │ ├── rgb_00000.jpg
│ │ │ ├── rgb_00001.jpg
│ │ │ ├── ...
└── ...
├── kitti_raw_data # path to kitti data_path and data_path_eval
│ ├── 2011_09_26 # name of dataset
│ │ ├── 2011_09_26_drive_0001_sync
│ │ │ ├── ...
└── ...
├── kitti_ground_truth # path to kitti gt_path and gt_path_eval
│ ├── 2011_09_26_drive_0001_sync
│ │ ├── ...
└── ...
├── XXX # DATA_PATH_VOID in eval_void/data_utils.py
│ ├── void_1500 # path to void dataset
│ │ ├── data
│ │ │ ├── birthplace_of_internet
│ │ │ │ ├── ...
│ └── ...
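Before launching training, it can be useful to confirm that the directories in the tree above exist. The following sketch assumes that nyu_depth_v2, kitti_raw_data, and kitti_ground_truth sit under one common root and that the VOID directories are checked relative to whatever DATA_PATH_VOID points to. The helper missing_dirs and the two lists are hypothetical and not part of the repository.

```python
from pathlib import Path

# Directories taken from the tree above, relative to a common dataset root
# (assumption: the three datasets are stored side by side).
NYU_KITTI_DIRS = [
    "nyu_depth_v2/official_splits/test",
    "nyu_depth_v2/sync",
    "kitti_raw_data",
    "kitti_ground_truth",
]

# Relative to the directory used as DATA_PATH_VOID.
VOID_DIRS = ["void_1500/data"]


def missing_dirs(root, relative_dirs):
    """Return the entries of relative_dirs that are not directories under root."""
    root = Path(root)
    return [d for d in relative_dirs if not (root / d).is_dir()]
```

An empty list from missing_dirs(dataset_root, NYU_KITTI_DIRS) and from missing_dirs(void_root, VOID_DIRS) suggests the top-level layout matches; it does not check individual scenes or image files.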
Evaluate Checkpoints
Checkpoints are stored under ./checkpoints. To run evaluation only, set args.load_ckpt_path and args.eval_before_train.
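To illustrate how these two options could interact, here is a minimal argparse sketch. It is not the repository's argument parser: it assumes that load_ckpt_path is a string option and eval_before_train is a boolean flag, and the checkpoint filename in the usage example is a placeholder.

```python
import argparse


def build_parser():
    """Hypothetical subset of the options; the real parser defines many more."""
    parser = argparse.ArgumentParser(description="evaluation options (sketch)")
    # Assumption: an empty string means that no checkpoint is loaded.
    parser.add_argument("--load_ckpt_path", type=str, default="")
    # Assumption: a boolean switch that triggers evaluation before any training.
    parser.add_argument("--eval_before_train", action="store_true")
    return parser


def wants_evaluation(args):
    """True when a checkpoint is given and evaluation before training is requested."""
    return bool(args.load_ckpt_path) and args.eval_before_train


if __name__ == "__main__":
    example = build_parser().parse_args(
        ["--load_ckpt_path", "./checkpoints/example.pth", "--eval_before_train"]
    )
    print(wants_evaluation(example))
```

In the actual scripts these values are presumably supplied through the argument files under config/, so the sketch only shows the intended combination of the two settings.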
Acknowledgements
We would like to acknowledge the use of code snippets from various open-source libraries and contributions from the online coding community, which have been invaluable in the development of this project. Specifically, we would like to thank the authors and maintainers of the following resources: