๐Ÿ“ Visual Geometry Benchmark

December 9, 2025 ยท View on GitHub

This document provides comprehensive instructions for running benchmark evaluation on Depth Anything 3.

โœจ Highlights

  • ๐Ÿ—‚๏ธ Diverse and Challenging Datasets: 5 datasets (ETH3D, 7Scenes, ScanNet++, HiRoom, DTU) covering from objects to indoor and outdoor scenes. Part of datasets are recalibrated for high accuracy (see ScanNet++ details). All preprocessed datasets are uploaded to depth-anything/DA3-BENCH.
  • ๐Ÿ”ง Robust Evaluation Pipeline: Standardized pipeline featuring RANSAC-based pose alignment for better coordinate system alignment, TSDF fusion for directly reflecting depth 3D consistency.
  • ๐Ÿ“Š Standardized Metrics: Performance measured using established metrics: AUC for pose accuracy, F1-score and Chamfer Distance for reconstruction.

๐Ÿ“‘ Table of Contents


๐Ÿš€ Quick Start

1. Download Benchmark Data

๐Ÿ’ก Note: Install HuggingFace CLI first: pip install -U huggingface_hub[cli]

๐ŸŒ Mirror: If download is slow, try: export HF_ENDPOINT=https://hf-mirror.com

cd da3_release

# Create directory and download from HuggingFace
mkdir -p workspace/benchmark_dataset
hf download depth-anything/DA3-BENCH \
    --local-dir workspace/benchmark_dataset \
    --repo-type dataset

# Extract all datasets
cd workspace/benchmark_dataset
for f in *.zip; do unzip -q "$f"; done

2. Run Evaluation

# Set model (default: depth-anything/DA3-GIANT)
MODEL=depth-anything/DA3-GIANT

# Full evaluation (all datasets, all modes)
python -m depth_anything_3.bench.evaluator model.path=$MODEL

# View results
python -m depth_anything_3.bench.evaluator eval.print_only=true

๐Ÿ“ฅ Dataset Download

All benchmark datasets are hosted on HuggingFace: depth-anything/DA3-BENCH

DatasetFileSizeDescription
ETH3Deth3d.zip~14.1 GBHigh-resolution multi-view stereo (indoor/outdoor)
ScanNet++scannetpp.zip~10.1 GBHigh-quality RGB-D indoor scenes
DTU-49dtu.zip~8.3 GBMulti-view stereo benchmark (22 scenes ร— 49 views)
7Scenes7scenes.zip~3.3 GBRGB-D indoor localization
DTU-64dtu64.zip~1.7 GBDTU subset for pose evaluation (13 scenes ร— 64 views)
HiRoomhiroom.zip~0.7 GBHigh-resolution indoor rooms

Download Options

Option 1: Download All (Recommended)

hf download depth-anything/DA3-BENCH \
    --local-dir workspace/benchmark_dataset \
    --repo-type dataset

Option 2: Download Specific Dataset

# Download only HiRoom
hf download depth-anything/DA3-BENCH hiroom.zip \
    --local-dir workspace/benchmark_dataset \
    --repo-type dataset

Option 3: Manual Download

Visit https://huggingface.co/datasets/depth-anything/DA3-BENCH and download the zip files manually.

Extract Datasets

cd workspace/benchmark_dataset

# Extract all
for f in *.zip; do unzip -q "$f"; done

# Or extract specific dataset
unzip hiroom.zip

Expected Directory Structure

After extraction, your directory should look like:

workspace/benchmark_dataset/
โ”œโ”€โ”€ eth3d/
โ”‚   โ”œโ”€โ”€ courtyard/
โ”‚   โ”œโ”€โ”€ electro/
โ”‚   โ””โ”€โ”€ ...
โ”œโ”€โ”€ 7scenes/
โ”‚   โ””โ”€โ”€ 7Scenes/
โ”‚       โ”œโ”€โ”€ chess/
โ”‚       โ””โ”€โ”€ ...
โ”œโ”€โ”€ scannetpp/
โ”‚   โ”œโ”€โ”€ 09c1414f1b/
โ”‚   โ””โ”€โ”€ ...
โ”œโ”€โ”€ hiroom/
โ”‚   โ”œโ”€โ”€ data/
โ”‚   โ”œโ”€โ”€ fused_pcd/
โ”‚   โ””โ”€โ”€ selected_scene_list_val.txt
โ”œโ”€โ”€ dtu/
โ”‚   โ”œโ”€โ”€ Rectified/
โ”‚   โ”œโ”€โ”€ Cameras/
โ”‚   โ”œโ”€โ”€ Points/
โ”‚   โ”œโ”€โ”€ SampleSet/
โ”‚   โ””โ”€โ”€ depth_raw/
โ””โ”€โ”€ dtu64/
    โ”œโ”€โ”€ Cameras/
    โ”œโ”€โ”€ scan105/
    โ””โ”€โ”€ ...

โš™๏ธ Evaluation Pipeline

Evaluation Modes

ModeDescriptionMetrics
poseCamera pose estimationAUC@3ยฐ, AUC@30ยฐ
recon_unposed3D reconstruction with predicted posesF-score, Overall
recon_posed3D reconstruction with GT posesF-score, Overall

Basic Usage

cd da3_release
MODEL=depth-anything/DA3-GIANT

# Full evaluation (inference + evaluation + print results)
python -m depth_anything_3.bench.evaluator model.path=$MODEL

# Skip inference, only evaluate existing predictions
python -m depth_anything_3.bench.evaluator eval.eval_only=true

# Only print saved metrics
python -m depth_anything_3.bench.evaluator eval.print_only=true

Selective Evaluation

# Evaluate specific datasets
python -m depth_anything_3.bench.evaluator model.path=$MODEL eval.datasets=[hiroom]

# Evaluate specific modes
python -m depth_anything_3.bench.evaluator model.path=$MODEL eval.modes=[pose,recon_unposed]

# Combine dataset and mode selection
python -m depth_anything_3.bench.evaluator model.path=$MODEL \
    eval.datasets=[hiroom] \
    eval.modes=[pose]

๐Ÿ–ฅ๏ธ Multi-GPU Inference

The evaluator automatically distributes inference across available GPUs:

# Use 4 GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m depth_anything_3.bench.evaluator model.path=$MODEL

# Use all available GPUs (default)
python -m depth_anything_3.bench.evaluator model.path=$MODEL

# Single GPU
CUDA_VISIBLE_DEVICES=0 python -m depth_anything_3.bench.evaluator model.path=$MODEL

๐Ÿ”ง Configuration

Config File

Default config: src/depth_anything_3/bench/configs/eval_bench.yaml

# Model path
model:
  path: depth-anything/DA3-GIANT

# Workspace directory
workspace:
  work_dir: ./workspace/evaluation

# Evaluation settings
eval:
  datasets: [eth3d, 7scenes, scannetpp, hiroom, dtu, dtu64]
  modes: [pose, recon_unposed, recon_posed]
  max_frames: 100      # Max frames per scene (-1 = no limit)
  scenes: null         # Specific scenes (null = all)

# Inference settings
inference:
  num_fusion_workers: 4
  debug: false

Output Structure

workspace/evaluation/
โ”œโ”€โ”€ model_results/              # Inference outputs
โ”‚   โ”œโ”€โ”€ eth3d/
โ”‚   โ”‚   โ””โ”€โ”€ {scene}/
โ”‚   โ”‚       โ”œโ”€โ”€ unposed/       # Predictions for recon_unposed
โ”‚   โ”‚       โ””โ”€โ”€ posed/         # Predictions for recon_posed
โ”‚   โ”œโ”€โ”€ 7scenes/
โ”‚   โ”œโ”€โ”€ scannetpp/
โ”‚   โ”œโ”€โ”€ hiroom/
โ”‚   โ”œโ”€โ”€ dtu/
โ”‚   โ””โ”€โ”€ dtu64/
โ””โ”€โ”€ metric_results/             # Evaluation metrics (JSON)
    โ”œโ”€โ”€ eth3d_pose.json
    โ”œโ”€โ”€ eth3d_recon_unposed.json
    โ”œโ”€โ”€ eth3d_recon_posed.json
    โ””โ”€โ”€ ...

๐Ÿ“Š Metrics

๐ŸŽฏ Pose Estimation

MetricDescription
Auc3Area Under Curve at 3ยฐ angular error threshold
Auc30Area Under Curve at 30ยฐ angular error threshold

๐Ÿ—๏ธ 3D Reconstruction

MetricDescriptionNote
F-scoreHarmonic mean of Precision and RecallHigher is better
Overall(Accuracy + Completeness) / 2Lower is better (error in meters/mm)

Note: DTU reports Overall in millimeters; other datasets report in meters.

Expected Results for DA3-GIANT

If your setup is correct, you should get the following results when evaluating the DA3-GIANT model:

========================================================
๐Ÿ“Š SUMMARY
========================================================

๐ŸŽฏ POSE ESTIMATION
---------------------------------------------------------------------------------------
Metric         Avg         HiRoom      ETH3D       DTU-64      7Scenes     ScanNet++
---------------------------------------------------------------------------------------
Auc3           0.6705      0.8030      0.4872      0.9408      0.2744      0.8470
Auc30          0.9436      0.9592      0.9153      0.9939      0.8668      0.9827

๐Ÿ—๏ธ  RECON_UNPOSED (Pred Pose)
---------------------------------------------------------------------------------------
Metric         Avg*        HiRoom      ETH3D       DTU         7Scenes     ScanNet++
---------------------------------------------------------------------------------------
F-score        0.7345      0.8629      0.7876      N/A         0.5043      0.7831
Overall        0.1682      0.0457      0.4366      1.7927      0.1230      0.0676

๐Ÿ—๏ธ  RECON_POSED (GT Pose)
---------------------------------------------------------------------------------------
Metric         Avg*        HiRoom      ETH3D       DTU         7Scenes     ScanNet++
---------------------------------------------------------------------------------------
F-score        0.7978      0.9546      0.8685      N/A         0.5635      0.8045
Overall        0.1408      0.0213      0.3679      1.7488      0.1092      0.0649

* Avg F-score / Overall = average over HiRoom, ETH3D, 7Scenes, ScanNet++ (4 datasets)

๐Ÿ—‚๏ธ Dataset Details

ETH3D

High-resolution multi-view stereo benchmark with laser-scanned ground truth.

  • Scenes: 11 (courtyard, electro, kicker, pipes, relief, delivery_area, facade, office, playground, relief_2, terrains)
  • Resolution: Variable (high-res DSLR images)
  • GT: Laser-scanned meshes + depth maps

โš ๏ธ Image Filtering: Some images with unusual camera rotations are filtered out for stable evaluation. See ETH3D_FILTER_KEYS in constants.py.

7Scenes

RGB-D dataset for camera relocalization.

  • Scenes: 7 (chess, fire, heads, office, pumpkin, redkitchen, stairs)
  • Resolution: 640ร—480
  • GT: Poses from KinectFusion, meshes from TSDF fusion

ScanNet++

High-quality indoor RGB-D dataset with dense annotations.

  • Scenes: 20 validation scenes
  • Resolution: 768ร—1024 (after undistortion)
  • GT: High-quality meshes from FARO scanner

โš ๏ธ Camera Pose Re-calibration: The default ScanNet++ poses are often inaccurate due to motion blur and textureless frames from iPhone captures. We re-ran COLMAP with the following improvements:

  • Frame filtering: Removed blurry images during frame extraction
  • Fisheye calibration: Jointly calibrated fisheye camera for wider FOV and better accuracy
  • Exhaustive matching: Used COLMAP's exhaustive matcher and mapper for reliable poses (takes several days per scene but necessary for quality)
  • All processed scenes are available at haotongl/scannetpp_zipnerf

HiRoom

Indoor room scenes with high-resolution RGB-D data.

  • Scenes: 24 validation scenes
  • GT: Fused point clouds

DTU-49 (Reconstruction Only)

Multi-view stereo benchmark following MVSNet evaluation protocol.

  • Scenes: 22 evaluation scenes
  • Views: 49 images per scene
  • GT: Laser-scanned point clouds with observation masks
  • Metrics: Overall only (accuracy + completeness in mm)

DTU-64 (Pose Only)

DTU subset for pose estimation evaluation.

  • Scenes: 13 scenes
  • Views: 64 images per scene
  • Metrics: AUC@3ยฐ, AUC@30ยฐ

Why two DTU settings?

  • DTU-64 (pose): More views = more challenging pose estimation
  • DTU-49 (recon): Standard MVSNet protocol for fair comparison with MVS methods

๐Ÿ’ป Command Reference

python -m depth_anything_3.bench.evaluator [OPTIONS] [KEY=VALUE ...]

Configuration:
  --config PATH                      Config YAML file (default: bench/configs/eval_bench.yaml)

Config Overrides (using dotlist notation):
  model.path=VALUE                   Model path or HuggingFace ID
  workspace.work_dir=VALUE           Working directory for outputs
  eval.datasets=[dataset1,dataset2]  Datasets to evaluate (eth3d,7scenes,scannetpp,hiroom,dtu,dtu64)
  eval.modes=[mode1,mode2]           Evaluation modes (pose,recon_unposed,recon_posed)
  eval.scenes=[scene1,scene2]        Specific scenes to evaluate (null=all)
  eval.max_frames=VALUE              Max frames per scene (-1=no limit, default: 100)
  eval.ref_view_strategy=VALUE       Reference view strategy (default: first)
  eval.eval_only=VALUE               Only run evaluation (skip inference) (true/false)
  eval.print_only=VALUE              Only print saved metrics (true/false)
  inference.num_fusion_workers=VALUE Number of parallel workers (default: 4)
  inference.debug=VALUE              Enable debug mode (true/false)

Special Flags:
  --help, -h                         Show this help message

Multi-GPU:
  Use CUDA_VISIBLE_DEVICES to specify GPUs (auto-detected and distributed)

Examples

MODEL=depth-anything/DA3-GIANT

# Full evaluation
python -m depth_anything_3.bench.evaluator model.path=$MODEL

# Quick test on HiRoom only
python -m depth_anything_3.bench.evaluator \
    model.path=$MODEL \
    eval.datasets=[hiroom] \
    eval.modes=[pose]

# Pose-only evaluation (all 5 pose datasets)
python -m depth_anything_3.bench.evaluator \
    model.path=$MODEL \
    eval.datasets=[eth3d,7scenes,scannetpp,hiroom,dtu64] \
    eval.modes=[pose]

# Recon-only evaluation (all 5 recon datasets)
python -m depth_anything_3.bench.evaluator \
    model.path=$MODEL \
    eval.datasets=[eth3d,7scenes,scannetpp,hiroom,dtu] \
    eval.modes=[recon_unposed,recon_posed]

# Debug specific scenes
python -m depth_anything_3.bench.evaluator \
    model.path=$MODEL \
    eval.datasets=[eth3d] \
    eval.scenes=[courtyard] \
    inference.debug=true

# Re-evaluate without re-running inference
python -m depth_anything_3.bench.evaluator eval.eval_only=true

# Just view results
python -m depth_anything_3.bench.evaluator eval.print_only=true

๐Ÿ” Troubleshooting

Data Path Issues

Ensure dataset paths in src/depth_anything_3/utils/constants.py are correct:

# Default paths (relative to project root)
ETH3D_EVAL_DATA_ROOT = "workspace/benchmark_dataset/eth3d"
SEVENSCENES_EVAL_DATA_ROOT = "workspace/benchmark_dataset/7scenes"
SCANNETPP_EVAL_DATA_ROOT = "workspace/benchmark_dataset/scannetpp"
HIROOM_EVAL_DATA_ROOT = "workspace/benchmark_dataset/hiroom/data"
DTU_EVAL_DATA_ROOT = "workspace/benchmark_dataset/dtu"
DTU64_EVAL_DATA_ROOT = "workspace/benchmark_dataset/dtu64"

๐Ÿ“ Citation

If you find this benchmark useful, please cite:

@article{depthanything3,
  title={Depth Anything 3: Recovering the visual space from any views},
  author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang},
  journal={arXiv preprint arXiv:2511.10647},
  year={2025}
}

Please also cite the original dataset papers for each benchmark you use.


๐Ÿ“„ License

The benchmark datasets are provided for research purposes only. Users must follow the original licenses of each dataset: