Runtime and accuracy metrics for all release models

March 5, 2026

Setup

The runtime and accuracy metrics reported on this page were generated on n2-standard-96 GCP instances, which have the following configuration:

GCP instance type: n2-standard-96
CPUs: 96-core (vCPU)
Memory: 384GiB
GPUs: 0

Details of metrics can be found here:

The sample sheet contains details of the input files used to generate this report.

Note: Each model type uses different coverages.

Accuracy

Below we report the full-genome accuracy of each model, as measured by hap.py.

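| Model type | sample | Type | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | Recall | Precision | F1_Score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| wgs | HG003 | INDEL | 504501 | 501594 | 2907 | 937937 | 1190 | 0.994238 | 0.997729 | 0.99598 |
| wgs | HG003 | SNP | 3327496 | 3306720 | 20776 | 3817962 | 4880 | 0.993756 | 0.998527 | 0.996136 |
| wes | HG003 | INDEL | 1051 | 1024 | 27 | 1485 | 8 | 0.97431 | 0.992417 | 0.98328 |
| wes | HG003 | SNP | 25279 | 24983 | 296 | 27709 | 60 | 0.988291 | 0.997604 | 0.992926 |
| pacbio | HG003 | INDEL | 504501 | 501567 | 2934 | 989958 | 3057 | 0.994184 | 0.994162 | 0.994173 |
| pacbio | HG003 | SNP | 3327495 | 3321765 | 5730 | 4329942 | 4125 | 0.998278 | 0.998761 | 0.99852 |
| ont-r104 | HG003 | INDEL | 504501 | 460355 | 44146 | 830072 | 25676 | 0.912496 | 0.948695 | 0.930243 |
| ont-r104 | HG003 | SNP | 3327495 | 3321799 | 5696 | 4400475 | 4611 | 0.998288 | 0.998615 | 0.998451 |
| rnaseq | HG005 | INDEL | 188 | 151 | 37 | 285 | 36 | 0.803191 | 0.810526 | 0.806842 |
| rnaseq | HG005 | SNP | 11349 | 10656 | 693 | 12336 | 391 | 0.938937 | 0.964599 | 0.951595 |
| hybrid-pacbio-illumina | HG003 | INDEL | 504501 | 503264 | 1237 | 998274 | 2052 | 0.997548 | 0.996129 | 0.996838 |
| hybrid-pacbio-illumina | HG003 | SNP | 3327495 | 3324021 | 3474 | 4068058 | 1856 | 0.998956 | 0.999442 | 0.999199 |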
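As a sanity check on how the columns relate, the sketch below recomputes Recall and F1_Score for the wgs HG003 INDEL row. It assumes the usual definitions, Recall = TRUTH.TP / (TRUTH.TP + TRUTH.FN) and F1 = 2PR / (P + R). Precision is taken from the table as given, because the table does not list the query-side true-positive count needed to recompute it. The function names `recall` and `f1_score` are illustrative and are not part of hap.py or DeepVariant.

```shell
# Recall from truth-side counts.
# $1 = TRUTH.TP, $2 = TRUTH.FN
recall() {
  awk -v tp="$1" -v fneg="$2" 'BEGIN { printf "%.6f\n", tp / (tp + fneg) }'
}

# F1 as the harmonic mean of recall and precision.
# $1 = recall, $2 = precision
f1_score() {
  awk -v r="$1" -v p="$2" 'BEGIN { printf "%.6f\n", 2 * p * r / (p + r) }'
}

# Values from the wgs / HG003 / INDEL row.
r=$(recall 501594 2907)
echo "recall: ${r}"                               # table reports 0.994238
echo "f1:     $(f1_score "${r}" 0.997729)"        # table reports 0.99598
```

The same two functions can be applied to any other row by substituting its TRUTH.TP, TRUTH.FN, and Precision values.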

Runtime

Each case study was run 5 times and the runtimes were averaged. Here we report the mean runtime in seconds, the standard deviation of runtimes, and a duration format (mean_runtime; hours, minutes, seconds).

Total runtime only

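| Model type | sample | stage | mean runtime | total_runs |
| --- | --- | --- | --- | --- |
| wgs | HG003 | total | 1h 8m 58s | 5 |
| exome | HG003 | total | 4m 11s | 5 |
| pacbio | HG003 | total | 1h 2m 17s | 5 |
| ont-r104 | HG003 | total | 1h 43m 18s | 5 |
| rnaseq | HG005 | total | 9m 1s | 5 |
| hybrid-pacbio-illumina | HG003 | total | 2h 14m 54s | 5 |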

Detailed runtime

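| Model type | sample | stage | mean runtime | total_runs |
| --- | --- | --- | --- | --- |
| wgs | HG003 | make_examples | 46m 15s | 5 |
| wgs | HG003 | call_variants | 15m 58s | 5 |
| wgs | HG003 | postprocess_variants | 6m 45s | 5 |
| wgs | HG003 | vcf_stats | 5m 17s | 5 |
| wgs | HG003 | total | 1h 8m 58s | 5 |
| exome | HG003 | make_examples | 3m 6s | 5 |
| exome | HG003 | call_variants | 34s | 5 |
| exome | HG003 | postprocess_variants | 30s | 5 |
| exome | HG003 | vcf_stats | 6s | 5 |
| exome | HG003 | total | 4m 11s | 5 |
| pacbio | HG003 | make_examples | 37m 4s | 5 |
| pacbio | HG003 | call_variants | 18m 28s | 5 |
| pacbio | HG003 | postprocess_variants | 6m 45s | 5 |
| pacbio | HG003 | vcf_stats | 5m 46s | 5 |
| pacbio | HG003 | total | 1h 2m 17s | 5 |
| ont-r104 | HG003 | make_examples | 56m 4s | 5 |
| ont-r104 | HG003 | call_variants | 32m 52s | 5 |
| ont-r104 | HG003 | postprocess_variants | 14m 21s | 5 |
| ont-r104 | HG003 | vcf_stats | 7m 23s | 5 |
| ont-r104 | HG003 | total | 1h 43m 18s | 5 |
| rnaseq | HG005 | make_examples | 7m 31s | 5 |
| rnaseq | HG005 | call_variants | 25s | 5 |
| rnaseq | HG005 | postprocess_variants | 1m 4s | 5 |
| rnaseq | HG005 | vcf_stats | 5s | 5 |
| rnaseq | HG005 | total | 9m 1s | 5 |
| hybrid-pacbio-illumina | HG003 | make_examples | 1h 54s | 5 |
| hybrid-pacbio-illumina | HG003 | call_variants | 1h 10m 4s | 5 |
| hybrid-pacbio-illumina | HG003 | postprocess_variants | 3m 55s | 5 |
| hybrid-pacbio-illumina | HG003 | vcf_stats | 5m 3s | 5 |
| hybrid-pacbio-illumina | HG003 | total | 2h 14m 54s | 5 |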
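Judging from the numbers alone, each "total" appears to be the sum of make_examples, call_variants, and postprocess_variants, with vcf_stats left out. The page does not state this, so treat it as an observation rather than a documented rule. The sketch below checks it for the wgs rows. The helper `to_seconds` is a hypothetical name introduced here for illustration.

```shell
# Convert a duration such as "1h 8m 58s" to whole seconds.
to_seconds() {
  total=0
  for part in $1; do   # intentional word splitting on spaces
    case "${part}" in
      *h) total=$(( total + ${part%h} * 3600 )) ;;
      *m) total=$(( total + ${part%m} * 60 )) ;;
      *s) total=$(( total + ${part%s} )) ;;
    esac
  done
  echo "${total}"
}

# wgs / HG003 stage runtimes from the detailed table (vcf_stats omitted).
stages=$(( $(to_seconds "46m 15s") + $(to_seconds "15m 58s") + $(to_seconds "6m 45s") ))
echo "make_examples + call_variants + postprocess_variants: ${stages}s"
echo "reported total (1h 8m 58s): $(to_seconds "1h 8m 58s")s"
```

Both lines print 4138s for wgs. For the other models the three-stage sum matches the reported total to within one second, which is consistent with each mean being rounded separately.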

Inspect outputs that produced the metrics above

The DeepVariant VCFs, gVCFs, and hap.py evaluation outputs are available at:

gs://deepvariant/case-study-outputs

You can also inspect them in a web browser here: https://42basepairs.com/browse/gs/deepvariant/case-study-outputs

How to reproduce the metrics on this page

For simplicity and consistency, we report runtime on a CPU instance with 96 CPUs. This is NOT the fastest or cheapest configuration.

Use gcloud compute ssh to log in to the newly created instance.
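The instance creation step is not shown on this page. A minimal sketch of creating a matching instance and logging in might look like the following. Only the machine type (n2-standard-96) comes from this page. The instance name, zone, image, and boot disk size are placeholder assumptions to adjust for your own project. The commands are wrapped in functions so that nothing is created until you call them.

```shell
# Placeholder values (assumptions, not taken from this page), except MACHINE_TYPE.
INSTANCE_NAME="dv-metrics-cpu"
ZONE="us-central1-a"
MACHINE_TYPE="n2-standard-96"
BOOT_DISK_SIZE="300GB"

# Create the CPU-only instance.
create_instance() {
  gcloud compute instances create "${INSTANCE_NAME}" \
    --zone "${ZONE}" \
    --machine-type "${MACHINE_TYPE}" \
    --image-family "ubuntu-2204-lts" \
    --image-project "ubuntu-os-cloud" \
    --boot-disk-size "${BOOT_DISK_SIZE}"
}

# Log in to the instance once it is running.
login_instance() {
  gcloud compute ssh "${INSTANCE_NAME}" --zone "${ZONE}"
}
```

Run `create_instance` first, then `login_instance`. Both require an installed and authenticated Google Cloud SDK.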

Download and run any of the following case study scripts:

# Get the script.
curl -O https://raw.githubusercontent.com/google/deepvariant/r1.10/scripts/inference_deepvariant.sh

# WGS
bash inference_deepvariant.sh --model_preset WGS

# WES
bash inference_deepvariant.sh --model_preset WES

# PacBio
bash inference_deepvariant.sh --model_preset PACBIO

# ONT_R104
bash inference_deepvariant.sh --model_preset ONT_R104

# Hybrid
bash inference_deepvariant.sh --model_preset HYBRID_PACBIO_ILLUMINA

Runtime metrics are taken from the resulting log after each stage of DeepVariant. The runtime numbers reported above are the average of 5 runs each. The accuracy metrics come from the hap.py summary.csv output file. The runs are deterministic, so all 5 runs produced the same output.
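To pull the accuracy columns out of a hap.py summary.csv, something like the sketch below could be used. It assumes the file has a header row with columns named Type, Filter, METRIC.Recall, METRIC.Precision, and METRIC.F1_Score, and that the rows of interest are the ones with Filter equal to PASS. Check both assumptions against your own file. The function name `extract_metrics` is hypothetical, and the sample file is a small stand-in built from the wgs rows of the accuracy table, not real hap.py output.

```shell
# Print recall, precision, and F1 for PASS rows of a hap.py-style summary.csv.
# Columns are looked up by header name, so column order does not matter.
extract_metrics() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["Filter"]) == "PASS" {
      printf "%s recall=%s precision=%s f1=%s\n",
        $(col["Type"]), $(col["METRIC.Recall"]),
        $(col["METRIC.Precision"]), $(col["METRIC.F1_Score"])
    }
  ' "$1"
}

# Stand-in input using the wgs / HG003 values reported above.
sample="$(mktemp)"
cat > "${sample}" <<'EOF'
Type,Filter,TRUTH.TOTAL,TRUTH.TP,TRUTH.FN,QUERY.TOTAL,QUERY.FP,METRIC.Recall,METRIC.Precision,METRIC.F1_Score
INDEL,PASS,504501,501594,2907,937937,1190,0.994238,0.997729,0.99598
SNP,PASS,3327496,3306720,20776,3817962,4880,0.993756,0.998527,0.996136
EOF
extract_metrics "${sample}"
rm -f "${sample}"
```

Because the runs are deterministic, running this on the summary.csv from any of the 5 runs should give identical lines.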