Installation instructions
September 3, 2025
a. Create a conda virtual environment and activate it.
conda create -n discene python=3.8.19
conda activate discene
b. Install PyTorch and torchvision following the official instructions.
pip install torch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 --index-url https://download.pytorch.org/whl/cu113
c. Install mmcv-full.
pip install openmim
mim install mmcv-full==1.6.0
d. Install other dependencies.
mim install mmdet==2.28.2
mim install mmsegmentation==0.30.0
mim install mmdet3d==1.0.0rc6
pip install -r requirements.txt
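After steps b through d, it can help to confirm that the pinned versions actually ended up in the environment. The following is a minimal sketch using only the standard library. The `EXPECTED` table repeats the versions listed above, and `check_versions` is a hypothetical helper name, not part of DiScene.

```python
from importlib import metadata

# Versions copied from the install steps above (distribution names as used by pip).
EXPECTED = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "torchaudio": "0.12.1",
    "mmcv-full": "1.6.0",
    "mmdet": "2.28.2",
    "mmsegmentation": "0.30.0",
    "mmdet3d": "1.0.0rc6",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None, ok)} for each pinned package."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        # CUDA wheels may report a local suffix such as "+cu113"; compare the public part only.
        ok = have is not None and have.split("+")[0] == want
        report[name] = (want, have, ok)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:16s} expected {want:10s} installed {have}  [{status}]")
```

A package reported as `None` is not installed in the active environment, which usually means the `discene` environment was not activated before running `pip` or `mim`.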
e. Compile CUDA extensions.
cd models/csrc
python setup.py build_ext --inplace
# For InternImage (continuing from models/csrc)
cd ../backbones/ops_dcnv3
python setup.py build install
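A quick way to see whether the in-place build produced anything is to look for compiled shared objects. This is a sketch under assumptions: it expects to be run from the repository root, it assumes `build_ext --inplace` leaves `.so` files (or `.pyd` on Windows) somewhere under `models/csrc`, and `find_built_extensions` is a hypothetical helper. The InternImage build uses `install`, so its output goes to site-packages and is not covered here.

```python
from pathlib import Path


def find_built_extensions(src_dir):
    """List compiled extension files (.so / .pyd) found anywhere under src_dir."""
    root = Path(src_dir)
    if not root.is_dir():
        return []
    found = [p for p in root.rglob("*") if p.suffix in {".so", ".pyd"}]
    return sorted(found)


if __name__ == "__main__":
    built = find_built_extensions("models/csrc")
    if built:
        for path in built:
            print("built:", path)
    else:
        print("no compiled extensions found under models/csrc; re-run step e")
```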
f. Install DepthAnythingV2. Clone it into the DiScene root directory so that it matches the paths in step h.
git clone https://github.com/DepthAnything/Depth-Anything-V2.git
cd Depth-Anything-V2
pip install -r requirements.txt
Go to Depth-Anything-V2/metric_depth/depth_anything_v2/dpt.py and change the function infer_image in the class DepthAnythingV2 as follows:
    def infer_image(self, image, h_, w_, input_size=518):
        depth = self.forward(image)
        depth = F.interpolate(depth[:, None], (h_, w_), mode="bilinear", align_corners=True)[0, 0]
        return depth
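The modified `infer_image` calls `forward` directly on the tensor it receives and resizes the result to `(h_, w_)`. The sketch below reproduces only the resize step on a dummy tensor so the shapes are easy to follow. `resize_depth` is a hypothetical stand-in, and the `(B, H, W)` input shape is an assumption about what `forward` returns.

```python
import torch
import torch.nn.functional as F


def resize_depth(depth, h_, w_):
    """Resize a (B, H, W) depth map to (h_, w_) and return the first sample."""
    # depth[:, None] adds a channel axis -> (B, 1, H, W), as F.interpolate expects.
    resized = F.interpolate(depth[:, None], (h_, w_), mode="bilinear", align_corners=True)
    # [0, 0] selects batch item 0, channel 0 -> (h_, w_).
    return resized[0, 0]


if __name__ == "__main__":
    dummy = torch.rand(1, 518, 518)  # stand-in for the network output
    out = resize_depth(dummy, 480, 640)
    print(out.shape)
```

Note that the `[0, 0]` index keeps only the first item of the batch, so the function as written handles one image per call.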
Then download the fine-tuned model from HERE.
g. Install Metric3Dv2 for training. Clone it into the DiScene root directory so that it matches the paths in step h.
git clone https://github.com/YvanYin/Metric3D.git
cd Metric3D
pip install -r requirements_v2.txt
h. Modify paths.
Finally, update the following lines in the student head, teacher head, and distill head so that they point to your own paths. Only the depth models you actually use need to be changed.
# NOTE: modify below to /your/path/to/DiScene
sys.path.append('/path/to/DiScene')
sys.path.append('/path/to/DiScene/depth_anything/metric_depth')
sys.path.append('/path/to/DiScene/Depth-Anything-V2/metric_depth')
sys.path.append('/path/to/DiScene/Metric3D')
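If you would rather not hard-code the same prefix four times, one option is to build the entries from a single root. This is only a sketch: the `DISCENE_ROOT` environment variable and the `add_discene_paths` helper are hypothetical and not part of the repository. The sub-directories are the ones listed above.

```python
import os
import sys

# Sub-directories taken from the sys.path lines above ("" is the repo root itself).
SUBDIRS = [
    "",
    "depth_anything/metric_depth",
    "Depth-Anything-V2/metric_depth",
    "Metric3D",
]


def add_discene_paths(root=None):
    """Append the DiScene-related directories under `root` to sys.path once each."""
    # Hypothetical convention: fall back to $DISCENE_ROOT, then to the placeholder path.
    root = root or os.environ.get("DISCENE_ROOT", "/path/to/DiScene")
    added = []
    for sub in SUBDIRS:
        path = os.path.normpath(os.path.join(root, sub))
        if path not in sys.path:
            sys.path.append(path)
            added.append(path)
    return added
```

Calling `add_discene_paths("/your/path/to/DiScene")` once at the top of each head file would then have the same effect as the four `sys.path.append` lines.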
DepthAnythingV1:
if pretrained_depth_model == 'DepthAnythingV1':
    # NOTE: modify below to /your/path/to/DiScene
    overrite = {"pretrained_resource": "local::/path/to/DiScene/checkpoints/depth_anything_metric_depth_indoor.pt"}
DepthAnythingV2:
self.depth_model = DepthAnythingV2(**{**model_configs['vitb'], 'max_depth':20})
# NOTE: modify below to /your/path/to/DiScene
checkpoint = torch.load('/path/to/DiScene/checkpoints/finetune_scannet_depthanythingv2.pth', map_location='cpu')['model']
Metric3Dv2-Small:
# NOTE: modify below to /your/path/to/DiScene
cfg = Config.fromfile('/path/to/DiScene/Metric3D/mono/configs/HourglassDecoder/vit.raft5.small.py')
cfg.load_from = '/path/to/.cache/torch/hub/checkpoints/metric_depth_vit_small_800k.pth'
Metric3Dv2-Giant:
# NOTE: modify below to /your/path/to/DiScene
cfg = Config.fromfile('/path/to/DiScene/Metric3D/mono/configs/HourglassDecoder/vit.raft5.giant2.py')
cfg.load_from = '/path/to/.cache/torch/hub/checkpoints/metric_depth_vit_giant2_800k.pth'
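A wrong path here may only surface once the corresponding model is constructed, so it can be worth checking the files up front. A minimal sketch, assuming the checkpoint and config locations shown above with your own prefixes substituted. `missing_files` is a hypothetical helper, and the list should contain only the files for the depth models you actually use.

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of `paths` that does not exist as a regular file."""
    return [str(p) for p in map(Path, paths) if not p.expanduser().is_file()]


if __name__ == "__main__":
    # Placeholder paths copied from the snippets above; replace with your own.
    required = [
        "/path/to/DiScene/checkpoints/depth_anything_metric_depth_indoor.pt",
        "/path/to/DiScene/checkpoints/finetune_scannet_depthanythingv2.pth",
        "/path/to/DiScene/Metric3D/mono/configs/HourglassDecoder/vit.raft5.small.py",
        "/path/to/.cache/torch/hub/checkpoints/metric_depth_vit_small_800k.pth",
    ]
    for path in missing_files(required):
        print("missing:", path)
```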