Visual Geometry Benchmark
December 9, 2025
This document provides comprehensive instructions for running benchmark evaluation on Depth Anything 3.
Highlights
- Diverse and Challenging Datasets: 5 datasets (ETH3D, 7Scenes, ScanNet++, HiRoom, DTU) spanning object-level, indoor, and outdoor scenes. Some datasets have been recalibrated for higher pose accuracy (see the ScanNet++ details). All preprocessed datasets are available at depth-anything/DA3-BENCH.
- Robust Evaluation Pipeline: A standardized pipeline featuring RANSAC-based pose alignment, which robustly aligns the predicted and ground-truth coordinate systems, and TSDF fusion, which directly reflects the 3D consistency of the predicted depth.
- Standardized Metrics: Performance is measured with established metrics: AUC for pose accuracy, and F-score and Chamfer Distance (reported as Overall) for reconstruction.
Table of Contents
- Quick Start
- Dataset Download
- Evaluation Pipeline
- Configuration
- Metrics
- Dataset Details
- Command Reference
- Troubleshooting
Quick Start
1. Download Benchmark Data
Note: Install the Hugging Face CLI first (quote the package spec so shells such as zsh do not expand the brackets):

```bash
pip install -U "huggingface_hub[cli]"
```

Mirror: If the download is slow, try:

```bash
export HF_ENDPOINT=https://hf-mirror.com
```

```bash
cd da3_release
# Create the directory and download from Hugging Face
mkdir -p workspace/benchmark_dataset
hf download depth-anything/DA3-BENCH \
    --local-dir workspace/benchmark_dataset \
    --repo-type dataset
# Extract all datasets
cd workspace/benchmark_dataset
for f in *.zip; do unzip -q "$f"; done
```
2. Run Evaluation
```bash
# Set model (default: depth-anything/DA3-GIANT)
MODEL=depth-anything/DA3-GIANT
# Full evaluation (all datasets, all modes)
python -m depth_anything_3.bench.evaluator model.path=$MODEL
# View results
python -m depth_anything_3.bench.evaluator eval.print_only=true
```
Dataset Download
All benchmark datasets are hosted on Hugging Face: depth-anything/DA3-BENCH
| Dataset | File | Size | Description |
|---|---|---|---|
| ETH3D | eth3d.zip | ~14.1 GB | High-resolution multi-view stereo (indoor/outdoor) |
| ScanNet++ | scannetpp.zip | ~10.1 GB | High-quality RGB-D indoor scenes |
| DTU-49 | dtu.zip | ~8.3 GB | Multi-view stereo benchmark (22 scenes × 49 views) |
| 7Scenes | 7scenes.zip | ~3.3 GB | RGB-D indoor localization |
| DTU-64 | dtu64.zip | ~1.7 GB | DTU subset for pose evaluation (13 scenes × 64 views) |
| HiRoom | hiroom.zip | ~0.7 GB | High-resolution indoor rooms |
Download Options
Option 1: Download All (Recommended)

```bash
hf download depth-anything/DA3-BENCH \
    --local-dir workspace/benchmark_dataset \
    --repo-type dataset
```

Option 2: Download a Specific Dataset

```bash
# Download only HiRoom
hf download depth-anything/DA3-BENCH hiroom.zip \
    --local-dir workspace/benchmark_dataset \
    --repo-type dataset
```
Option 3: Manual Download
Visit https://huggingface.co/datasets/depth-anything/DA3-BENCH and download the zip files manually.
Extract Datasets
```bash
cd workspace/benchmark_dataset
# Extract all
for f in *.zip; do unzip -q "$f"; done
# Or extract a specific dataset
unzip hiroom.zip
```
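If `unzip` is not available (for example in a minimal container or on Windows), the same extraction can be done with Python's standard-library `zipfile` module. This is a minimal sketch, assuming the archives sit directly in `workspace/benchmark_dataset`; `extract_all` is a hypothetical helper, not part of the DA3 codebase.

```python
import zipfile
from pathlib import Path


def extract_all(dataset_dir, names=None):
    """Extract every *.zip in dataset_dir, or only the archives listed in `names`.

    Hypothetical helper mirroring `for f in *.zip; do unzip -q "$f"; done`.
    Returns the names of the archives that were extracted, in sorted order.
    """
    root = Path(dataset_dir)
    extracted = []
    for archive in sorted(root.glob("*.zip")):
        if names is not None and archive.name not in names:
            continue  # skip archives the caller did not ask for
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)  # extract next to the archive, like running unzip in this directory
        extracted.append(archive.name)
    return extracted
```

For example, `extract_all("workspace/benchmark_dataset", names={"hiroom.zip"})` extracts only HiRoom.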
Expected Directory Structure
After extraction, your directory should look like:
```text
workspace/benchmark_dataset/
├── eth3d/
│   ├── courtyard/
│   ├── electro/
│   └── ...
├── 7scenes/
│   ├── 7Scenes/
│   ├── chess/
│   └── ...
├── scannetpp/
│   ├── 09c1414f1b/
│   └── ...
├── hiroom/
│   ├── data/
│   ├── fused_pcd/
│   └── selected_scene_list_val.txt
├── dtu/
│   ├── Rectified/
│   ├── Cameras/
│   ├── Points/
│   ├── SampleSet/
│   └── depth_raw/
└── dtu64/
    ├── Cameras/
    ├── scan105/
    └── ...
```
Evaluation Pipeline
Evaluation Modes
| Mode | Description | Metrics |
|---|---|---|
| `pose` | Camera pose estimation | AUC@3°, AUC@30° |
| `recon_unposed` | 3D reconstruction with predicted poses | F-score, Overall |
| `recon_posed` | 3D reconstruction with ground-truth (GT) poses | F-score, Overall |
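The `pose` mode scores predicted camera rotations against ground truth by angular error. The exact error definition lives in the evaluator; the sketch below shows one standard way to measure the geodesic angle between two rotation matrices, θ = arccos((tr(R₁ᵀR₂) − 1) / 2). `rotation_angle_deg` and `rot_z` are hypothetical helpers, and combining this with a translation-direction angle, as many relative-pose benchmarks do, is an assumption not confirmed by this README.

```python
import math


def rotation_angle_deg(r1, r2):
    """Geodesic angle in degrees between two 3x3 rotation matrices (lists of rows)."""
    # trace(R1^T R2) equals the element-wise sum of R1[i][j] * R2[i][j]
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    # Clamp so floating-point noise cannot push acos outside its domain.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def rot_z(deg):
    """Rotation about the z axis by `deg` degrees."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]
```

For instance, `rotation_angle_deg(rot_z(30), rot_z(33))` is 3° up to floating-point error, which sits right at the AUC@3° threshold.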
Basic Usage
```bash
cd da3_release
MODEL=depth-anything/DA3-GIANT
# Full evaluation (inference + evaluation + print results)
python -m depth_anything_3.bench.evaluator model.path=$MODEL
# Skip inference, only evaluate existing predictions
python -m depth_anything_3.bench.evaluator eval.eval_only=true
# Only print saved metrics
python -m depth_anything_3.bench.evaluator eval.print_only=true
```
Selective Evaluation
```bash
# Quote bracketed lists so shells such as zsh do not treat them as glob patterns
# Evaluate specific datasets
python -m depth_anything_3.bench.evaluator model.path=$MODEL 'eval.datasets=[hiroom]'
# Evaluate specific modes
python -m depth_anything_3.bench.evaluator model.path=$MODEL 'eval.modes=[pose,recon_unposed]'
# Combine dataset and mode selection
python -m depth_anything_3.bench.evaluator model.path=$MODEL \
    'eval.datasets=[hiroom]' \
    'eval.modes=[pose]'
```
Multi-GPU Inference
The evaluator automatically distributes inference across available GPUs:
```bash
# Use 4 GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m depth_anything_3.bench.evaluator model.path=$MODEL
# Use all available GPUs (default)
python -m depth_anything_3.bench.evaluator model.path=$MODEL
# Single GPU
CUDA_VISIBLE_DEVICES=0 python -m depth_anything_3.bench.evaluator model.path=$MODEL
```
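This README does not describe how work is split across devices. As a mental model only, a round-robin assignment of scenes to the GPUs listed in `CUDA_VISIBLE_DEVICES` looks like the sketch below; `visible_gpus` and `assign_round_robin` are hypothetical helpers, and the evaluator's real scheduler may distribute work differently.

```python
import os


def visible_gpus(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of device ids, or None if it is unset."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None  # unset: every GPU is visible; count them with your framework
    return [d.strip() for d in value.split(",") if d.strip()]


def assign_round_robin(scenes, gpus):
    """Map each GPU id to the scenes it would process, dealing scenes out in turn."""
    if not gpus:
        raise ValueError("need at least one GPU")
    plan = {gpu: [] for gpu in gpus}
    for i, scene in enumerate(scenes):
        plan[gpus[i % len(gpus)]].append(scene)
    return plan
```

Round-robin keeps the per-GPU scene counts within one of each other; it does not balance scenes of very different sizes, which a real scheduler might.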
Configuration
Config File
Default config: `src/depth_anything_3/bench/configs/eval_bench.yaml`

```yaml
# Model path
model:
  path: depth-anything/DA3-GIANT
# Workspace directory
workspace:
  work_dir: ./workspace/evaluation
# Evaluation settings
eval:
  datasets: [eth3d, 7scenes, scannetpp, hiroom, dtu, dtu64]
  modes: [pose, recon_unposed, recon_posed]
  max_frames: 100  # Max frames per scene (-1 = no limit)
  scenes: null     # Specific scenes (null = all)
# Inference settings
inference:
  num_fusion_workers: 4
  debug: false
```
Output Structure
```text
workspace/evaluation/
├── model_results/            # Inference outputs
│   ├── eth3d/
│   │   └── {scene}/
│   │       ├── unposed/      # Predictions for recon_unposed
│   │       └── posed/        # Predictions for recon_posed
│   ├── 7scenes/
│   ├── scannetpp/
│   ├── hiroom/
│   ├── dtu/
│   └── dtu64/
└── metric_results/           # Evaluation metrics (JSON)
    ├── eth3d_pose.json
    ├── eth3d_recon_unposed.json
    ├── eth3d_recon_posed.json
    └── ...
```
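Metric files follow the `{dataset}_{mode}.json` naming shown in the output tree. Since mode names such as `recon_unposed` contain underscores, matching known mode suffixes is more reliable than splitting on `_`. This sketch gathers the files for further scripting; the JSON schema is not documented here, so contents are returned as parsed, and `parse_metric_filename` and `collect_metric_files` are hypothetical helpers.

```python
import json
from pathlib import Path

MODES = ("recon_unposed", "recon_posed", "pose")  # evaluation modes listed in this README


def parse_metric_filename(name):
    """Split e.g. 'eth3d_recon_unposed.json' into ('eth3d', 'recon_unposed'); None if no match."""
    stem = name[: -len(".json")] if name.endswith(".json") else name
    for mode in MODES:
        suffix = "_" + mode
        if stem.endswith(suffix):
            return stem[: -len(suffix)], mode
    return None


def collect_metric_files(metric_dir):
    """Load every {dataset}_{mode}.json under metric_dir, keyed by (dataset, mode)."""
    results = {}
    for path in sorted(Path(metric_dir).glob("*.json")):
        key = parse_metric_filename(path.name)
        if key is not None:
            results[key] = json.loads(path.read_text())
    return results
```

Usage: `collect_metric_files("workspace/evaluation/metric_results")`.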
Metrics
Pose Estimation
| Metric | Description |
|---|---|
| Auc3 | Area under the pose-accuracy curve up to a 3° angular error threshold (strict) |
| Auc30 | Area under the pose-accuracy curve up to a 30° angular error threshold (lenient) |
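The table names the thresholds but not the integration scheme. A common convention in multi-view pose benchmarks, assumed here and not confirmed by this README, is to take one angular error per image pair, compute the fraction of pairs whose error is below each integer threshold 1°, 2°, …, T°, and average those fractions. `pose_auc` is a hypothetical sketch of that convention.

```python
def pose_auc(errors_deg, max_threshold):
    """Mean accuracy over integer thresholds 1..max_threshold (in degrees).

    errors_deg: per-pair angular errors in degrees (for example the maximum of
    rotation and translation-direction error, under the convention assumed above).
    """
    if not errors_deg:
        raise ValueError("no errors given")
    n = len(errors_deg)
    accuracies = [
        sum(1 for e in errors_deg if e < t) / n  # fraction of pairs below threshold t
        for t in range(1, max_threshold + 1)
    ]
    return sum(accuracies) / len(accuracies)
```

Under this convention AUC@30° is forgiving: a pair with a 10° error still counts for 20 of the 30 thresholds, whereas it contributes nothing to AUC@3°.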
3D Reconstruction
| Metric | Description | Note |
|---|---|---|
| F-score | Harmonic mean of precision and recall at a distance threshold | Higher is better |
| Overall | (Accuracy + Completeness) / 2, i.e. a Chamfer distance | Lower is better (meters; millimeters for DTU) |
Note: DTU reports Overall in millimeters; other datasets report in meters.
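A brute-force sketch of both reconstruction metrics, assuming the usual definitions: accuracy is the mean distance from each predicted point to its nearest ground-truth point, completeness is the reverse, and precision and recall are the fractions of points within a distance threshold `tau`. The per-dataset thresholds are not given in this README, so `tau` is a free parameter here; `recon_metrics` is hypothetical, and practical evaluators use a KD-tree rather than this O(N·M) loop.

```python
import math


def _nearest_dists(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def recon_metrics(pred, gt, tau):
    """Return accuracy, completeness, overall, precision, recall, and F-score."""
    d_pred_to_gt = _nearest_dists(pred, gt)
    d_gt_to_pred = _nearest_dists(gt, pred)
    accuracy = sum(d_pred_to_gt) / len(d_pred_to_gt)
    completeness = sum(d_gt_to_pred) / len(d_gt_to_pred)
    precision = sum(d < tau for d in d_pred_to_gt) / len(d_pred_to_gt)
    recall = sum(d < tau for d in d_gt_to_pred) / len(d_gt_to_pred)
    f_score = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "completeness": completeness,
        "overall": (accuracy + completeness) / 2,  # the "Overall" column
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }
```

A prediction covering only part of the ground truth can have perfect precision but poor recall: `recon_metrics([(0, 0, 0)], [(0, 0, 0), (1, 0, 0)], tau=0.5)` yields an Overall of 0.25 and an F-score of about 0.667.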
Expected Results for DA3-GIANT
If your setup is correct, you should get the following results when evaluating the DA3-GIANT model:
```text
===========================================================
SUMMARY
===========================================================
POSE ESTIMATION
-----------------------------------------------------------
Metric   Avg      HiRoom   ETH3D    DTU-64   7Scenes  ScanNet++
-----------------------------------------------------------
Auc3     0.6705   0.8030   0.4872   0.9408   0.2744   0.8470
Auc30    0.9436   0.9592   0.9153   0.9939   0.8668   0.9827

RECON_UNPOSED (Pred Pose)
-----------------------------------------------------------
Metric   Avg*     HiRoom   ETH3D    DTU      7Scenes  ScanNet++
-----------------------------------------------------------
F-score  0.7345   0.8629   0.7876   N/A      0.5043   0.7831
Overall  0.1682   0.0457   0.4366   1.7927   0.1230   0.0676

RECON_POSED (GT Pose)
-----------------------------------------------------------
Metric   Avg*     HiRoom   ETH3D    DTU      7Scenes  ScanNet++
-----------------------------------------------------------
F-score  0.7978   0.9546   0.8685   N/A      0.5635   0.8045
Overall  0.1408   0.0213   0.3679   1.7488   0.1092   0.0649

* Avg F-score / Overall = average over HiRoom, ETH3D, 7Scenes, ScanNet++ (4 datasets)
```
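The Avg columns can be reproduced from the per-dataset numbers: pose averages cover all five pose datasets, while the reconstruction averages skip DTU, which has no F-score and reports Overall in millimeters. A quick sanity check using values copied from the summary above (`average` is a hypothetical helper):

```python
RECON_UNPOSED = {  # values copied from the RECON_UNPOSED summary
    "F-score": {"HiRoom": 0.8629, "ETH3D": 0.7876, "DTU": None, "7Scenes": 0.5043, "ScanNet++": 0.7831},
    "Overall": {"HiRoom": 0.0457, "ETH3D": 0.4366, "DTU": 1.7927, "7Scenes": 0.1230, "ScanNet++": 0.0676},
}
POSE_AUC3 = {"HiRoom": 0.8030, "ETH3D": 0.4872, "DTU-64": 0.9408, "7Scenes": 0.2744, "ScanNet++": 0.8470}


def average(row, exclude=()):
    """Mean over datasets, skipping excluded names and missing (None) values."""
    vals = [v for k, v in row.items() if k not in exclude and v is not None]
    return sum(vals) / len(vals)


print(f"Auc3 Avg:     {average(POSE_AUC3):.4f}")
print(f"F-score Avg*: {average(RECON_UNPOSED['F-score'], exclude={'DTU'}):.4f}")
print(f"Overall Avg*: {average(RECON_UNPOSED['Overall'], exclude={'DTU'}):.4f}")
```

Including DTU's millimeter-scale Overall would inflate the average severalfold, which is why it is reported separately.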
Dataset Details
ETH3D
High-resolution multi-view stereo benchmark with laser-scanned ground truth.
- Scenes: 11 (courtyard, electro, kicker, pipes, relief, delivery_area, facade, office, playground, relief_2, terrains)
- Resolution: Variable (high-res DSLR images)
- GT: Laser-scanned meshes + depth maps
Image Filtering: Some images with unusual camera rotations are filtered out for stable evaluation. See `ETH3D_FILTER_KEYS` in `constants.py`.
7Scenes
RGB-D dataset for camera relocalization.
- Scenes: 7 (chess, fire, heads, office, pumpkin, redkitchen, stairs)
- Resolution: 640×480
- GT: Poses from KinectFusion, meshes from TSDF fusion
ScanNet++
High-quality indoor RGB-D dataset with dense annotations.
- Scenes: 20 validation scenes
- Resolution: 768×1024 (after undistortion)
- GT: High-quality meshes from FARO scanner
Camera Pose Re-calibration: The default ScanNet++ poses are often inaccurate because the iPhone captures contain motion blur and textureless frames. We re-ran COLMAP with the following improvements:
- Frame filtering: Removed blurry images during frame extraction
- Fisheye calibration: Jointly calibrated the fisheye camera for a wider FOV and better accuracy
- Exhaustive matching: Used COLMAP's exhaustive matcher and mapper for reliable poses (this takes several days per scene but is necessary for pose quality)
- All processed scenes are available at haotongl/scannetpp_zipnerf
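This README does not say which blur criterion was used when filtering ScanNet++ frames. One widely used heuristic, shown purely as an illustration and not as the method actually applied, is the variance of the Laplacian: sharp frames produce strong second-derivative responses, blurry ones do not. `laplacian_variance` and `is_blurry` are hypothetical pure-Python helpers operating on a grayscale image given as a list of rows, and the cutoff would need tuning per capture setup.

```python
def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian over the interior pixels of a grayscale image."""
    h, w = len(img), len(img[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_blurry(img, threshold):
    """Flag a frame as blurry when its Laplacian variance falls below `threshold`."""
    return laplacian_variance(img) < threshold
```

A flat gray patch scores 0, while a high-contrast checkerboard scores very high, so frames below the chosen threshold would be dropped.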
HiRoom
Indoor room scenes with high-resolution RGB-D data.
- Scenes: 24 validation scenes
- GT: Fused point clouds
DTU-49 (Reconstruction Only)
Multi-view stereo benchmark following MVSNet evaluation protocol.
- Scenes: 22 evaluation scenes
- Views: 49 images per scene
- GT: Laser-scanned point clouds with observation masks
- Metrics: Overall only (mean of accuracy and completeness, in mm)
DTU-64 (Pose Only)
DTU subset for pose estimation evaluation.
- Scenes: 13 scenes
- Views: 64 images per scene
- Metrics: AUC@3°, AUC@30°
Why two DTU settings?
- DTU-64 (pose): More views = more challenging pose estimation
- DTU-49 (recon): Standard MVSNet protocol for fair comparison with MVS methods
Command Reference
```text
python -m depth_anything_3.bench.evaluator [OPTIONS] [KEY=VALUE ...]

Configuration:
  --config PATH                         Config YAML file (default: bench/configs/eval_bench.yaml)

Config Overrides (using dotlist notation):
  model.path=VALUE                      Model path or HuggingFace ID
  workspace.work_dir=VALUE              Working directory for outputs
  eval.datasets=[dataset1,dataset2]     Datasets to evaluate (eth3d,7scenes,scannetpp,hiroom,dtu,dtu64)
  eval.modes=[mode1,mode2]              Evaluation modes (pose,recon_unposed,recon_posed)
  eval.scenes=[scene1,scene2]           Specific scenes to evaluate (null=all)
  eval.max_frames=VALUE                 Max frames per scene (-1=no limit, default: 100)
  eval.ref_view_strategy=VALUE          Reference view strategy (default: first)
  eval.eval_only=VALUE                  Only run evaluation (skip inference) (true/false)
  eval.print_only=VALUE                 Only print saved metrics (true/false)
  inference.num_fusion_workers=VALUE    Number of parallel workers (default: 4)
  inference.debug=VALUE                 Enable debug mode (true/false)

Special Flags:
  --help, -h                            Show this help message

Multi-GPU:
  Use CUDA_VISIBLE_DEVICES to specify GPUs (auto-detected and distributed)
```
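Each `key.subkey=value` override replaces one nested entry of the YAML config. The parser the evaluator actually uses is not shown in this README; the dependency-free sketch below only illustrates the override semantics. `apply_dotlist` and its value parsing (lists, booleans, integers, `null`) are hypothetical.

```python
import copy


def _parse_value(text):
    """Convert a command-line string into a bool, None, int, list, or plain string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("null", "none"):
        return None
    if text.startswith("[") and text.endswith("]"):
        inner = text[1:-1].strip()
        return [_parse_value(item.strip()) for item in inner.split(",")] if inner else []
    try:
        return int(text)
    except ValueError:
        return text  # e.g. "7scenes" or a model ID stays a string


def apply_dotlist(config, overrides):
    """Return a copy of a nested dict with `a.b=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections on the fly
        node[leaf] = _parse_value(raw)
    return result
```

Overrides are applied to a copy, so the base config loaded from YAML stays unchanged.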
Examples
```bash
MODEL=depth-anything/DA3-GIANT
# Full evaluation
python -m depth_anything_3.bench.evaluator model.path=$MODEL
# Quick test on HiRoom only
python -m depth_anything_3.bench.evaluator \
    model.path=$MODEL \
    'eval.datasets=[hiroom]' \
    'eval.modes=[pose]'
# Pose-only evaluation (all 5 pose datasets)
python -m depth_anything_3.bench.evaluator \
    model.path=$MODEL \
    'eval.datasets=[eth3d,7scenes,scannetpp,hiroom,dtu64]' \
    'eval.modes=[pose]'
# Recon-only evaluation (all 5 recon datasets)
python -m depth_anything_3.bench.evaluator \
    model.path=$MODEL \
    'eval.datasets=[eth3d,7scenes,scannetpp,hiroom,dtu]' \
    'eval.modes=[recon_unposed,recon_posed]'
# Debug specific scenes
python -m depth_anything_3.bench.evaluator \
    model.path=$MODEL \
    'eval.datasets=[eth3d]' \
    'eval.scenes=[courtyard]' \
    inference.debug=true
# Re-evaluate without re-running inference
python -m depth_anything_3.bench.evaluator eval.eval_only=true
# Just view results
python -m depth_anything_3.bench.evaluator eval.print_only=true
```
Troubleshooting
Data Path Issues
Ensure the dataset paths in `src/depth_anything_3/utils/constants.py` are correct:

```python
# Default paths (relative to project root)
ETH3D_EVAL_DATA_ROOT = "workspace/benchmark_dataset/eth3d"
SEVENSCENES_EVAL_DATA_ROOT = "workspace/benchmark_dataset/7scenes"
SCANNETPP_EVAL_DATA_ROOT = "workspace/benchmark_dataset/scannetpp"
HIROOM_EVAL_DATA_ROOT = "workspace/benchmark_dataset/hiroom/data"
DTU_EVAL_DATA_ROOT = "workspace/benchmark_dataset/dtu"
DTU64_EVAL_DATA_ROOT = "workspace/benchmark_dataset/dtu64"
```
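To confirm these paths resolve before launching a long run, the sketch below checks each default root relative to the project root. `check_data_roots` is a hypothetical helper; its dictionary mirrors the default constants listed above, so update it if you edit `constants.py`.

```python
from pathlib import Path

DATA_ROOTS = {  # mirrors the defaults in src/depth_anything_3/utils/constants.py
    "ETH3D_EVAL_DATA_ROOT": "workspace/benchmark_dataset/eth3d",
    "SEVENSCENES_EVAL_DATA_ROOT": "workspace/benchmark_dataset/7scenes",
    "SCANNETPP_EVAL_DATA_ROOT": "workspace/benchmark_dataset/scannetpp",
    "HIROOM_EVAL_DATA_ROOT": "workspace/benchmark_dataset/hiroom/data",
    "DTU_EVAL_DATA_ROOT": "workspace/benchmark_dataset/dtu",
    "DTU64_EVAL_DATA_ROOT": "workspace/benchmark_dataset/dtu64",
}


def check_data_roots(project_root=".", roots=DATA_ROOTS):
    """Return {constant_name: path} for every data root that is not an existing directory."""
    base = Path(project_root)
    return {name: str(base / rel) for name, rel in roots.items() if not (base / rel).is_dir()}


if __name__ == "__main__":
    missing = check_data_roots()
    for name, path in missing.items():
        print(f"missing {name}: {path}")
    if not missing:
        print("all benchmark data roots found")
```

Run it from the project root (`da3_release`); an empty result means every dataset was extracted where the evaluator expects it.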
Citation
If you find this benchmark useful, please cite:
```bibtex
@article{depthanything3,
  title={Depth Anything 3: Recovering the visual space from any views},
  author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang},
  journal={arXiv preprint arXiv:2511.10647},
  year={2025}
}
```
Please also cite the original dataset papers for each benchmark you use.
License
The benchmark datasets are provided for research purposes only. Users must follow the original licenses of each dataset:
- ETH3D: https://www.eth3d.net/
- 7Scenes: Microsoft Research
- ScanNet++: http://www.scan-net.org/
- DTU: https://roboimagedata.compute.dtu.dk/
- HiRoom: SVLightVerse