fastVEP

June 10, 2026

A high-performance Variant Effect Predictor written in Rust. fastVEP predicts the functional consequences of genomic variants (SNPs, insertions, deletions, structural variants) on genes, transcripts, and protein sequences, with direct integration of clinical and population databases.

fastVEP is inspired by and aims to be compatible with Ensembl VEP and Illumina Nirvana, while running substantially faster (2.6x to 130x faster than Ensembl VEP v115.1 in the chr22 benchmarks below) thanks to Rust's zero-cost abstractions and native parallelism.

Try it now: A hosted web server is available at fastVEP.org — paste VCF data and get annotated results instantly, no installation required.

Features

  • Variant Consequence Prediction — Classifies variants using 49 Sequence Ontology terms (missense, frameshift, splice donor, copy_number_change, transcript_ablation, etc.)
  • Structural Variant Support — Full SV pipeline: <DEL>, <DUP>, <INV>, <CNV>, <BND>, <INS>, <STR> with SV-specific consequence prediction
  • Supplementary Annotations — Direct integration with ClinVar, gnomAD, dbSNP, COSMIC, 1000 Genomes, TOPMed, MitoMap via the native fastSA format (v1: zstd block compression with byte-budgeted block cache; v2: echtvar-inspired chunked ZIP with Var32 encoding, parallel u32 value arrays, delta encoding, and LRU caching)
  • Prediction Scores — PhyloP, GERP, REVEL, SpliceAI, PrimateAI, DANN conservation and pathogenicity scores; SIFT/PolyPhen via dbNSFP
  • Gene-Level Annotations — OMIM phenotypes, gnomAD gene constraint (pLI, LOEUF), ClinGen gene-disease validity
  • Filter Engine — Expression-based filtering compatible with VEP's filter_vep syntax
  • HGVS Nomenclature — Generates HGVSg, HGVSc, and HGVSp notations with 3' normalization
  • Multiple Output Formats — VCF (with 49-field CSQ), tab-delimited, JSON (including Nirvana-style structured output)
  • Multi-Sample Support — Parse FORMAT/GT/DP/GQ/AD fields per sample with genotype classification
  • Regulatory Region Detection — Promoters, enhancers, CTCF binding sites, TF binding sites from Ensembl regulatory build
  • Mitochondrial Support — Circular coordinate handling, vertebrate mitochondrial codon table (NCBI table 2)
  • Custom Annotations — User-provided VCF (--source custom_vcf) and BED (--source custom_bed) files; .osi interval databases load alongside .osa via --sa-dir
  • ACMG-AMP Classification: --acmg runs the full Richards 2015 + ClinGen SVI rule set (28 criteria, configurable thresholds, trio / compound-het support via --proband/--mother/--father)
  • VEP --merged Cache: --gff3 is repeatable on annotate and cache; combine Ensembl + RefSeq in a single run with per-transcript SOURCE labels
  • --sa-only Mode — Skip the default CSQ pipeline and emit only supplementary annotations, useful for re-annotating already-annotated VCFs
  • Gzipped VCF Input: annotate auto-detects .vcf.gz / .vcf.bgz (no upstream decompression needed)
  • Web Interface — Built-in web GUI for interactive variant annotation
  • GFF3 Annotation Support — Load gene models from standard GFF3 files (any organism)
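
To make the genotype classification mentioned under Multi-Sample Support concrete, here is a minimal sketch of how a VCF GT string can be bucketed. The function name and the class labels (hom_ref, het, hom_alt, missing) are assumptions for illustration; fastVEP's own labels may differ.

```shell
# Hypothetical helper: classify a VCF GT string into a coarse genotype class.
# The class names are illustrative, not fastVEP's actual output values.
classify_gt() {
  gt=$1
  # Treat phased (|) and unphased (/) separators the same way.
  first=${gt%%[/|]*}   # allele before the first separator
  last=${gt##*[/|]}    # allele after the last separator
  if [ "$first" = "." ] || [ "$last" = "." ]; then
    echo "missing"
  elif [ "$first" = "0" ] && [ "$last" = "0" ]; then
    echo "hom_ref"
  elif [ "$first" = "$last" ]; then
    echo "hom_alt"
  else
    echo "het"
  fi
}
```

Only the first and last alleles are compared, which is enough for diploid calls; polyploid genotypes would need a fuller comparison.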

Quick Start

1. Install Rust (if you don't have it)

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"

2. Build and install fastVEP

git clone https://github.com/Huang-lab/fastVEP.git
cd fastVEP

# Build and install both binaries to ~/.cargo/bin/
cargo install --path crates/fastvep-cli   # fastvep (CLI annotator)
cargo install --path crates/fastvep-web   # fastvep-web (production web server)

# Verify it works
fastvep --version

Note: cargo install places the binary in ~/.cargo/bin/. If fastvep is not found after install, run source "$HOME/.cargo/env" or add this line to your ~/.zshrc (or ~/.bashrc):

source "$HOME/.cargo/env"

Alternative: build a conda package

Prefer conda? The repo ships a recipe under conda/recipe/ that builds both fastvep and fastvep-web into a local conda package (Linux and macOS):

# One-time: tools for building conda packages
conda install -n base -c conda-forge conda-build

# Build the package from the repo root
conda build conda/recipe

# Install into a fresh environment
conda create -n fastvep -c local fastvep
conda activate fastvep
fastvep --version

3. Try it — annotate the included test data

fastVEP ships with a small test VCF and GFF3 so you can try it immediately:

# Annotate 12 test variants covering SNVs, indels, splice sites, UTRs, and intergenic regions
fastvep annotate -i tests/test.vcf --gff3 tests/test.gff3 --hgvs --output-format tab

4. Build supplementary annotation databases

# Build ClinVar annotation database
fastvep sa-build --source clinvar --input clinvar.vcf.gz --output clinvar

# Build gnomAD population frequency database
fastvep sa-build --source gnomad --input gnomad.genomes.v4.vcf.bgz --output gnomad

# Build PhyloP conservation scores
fastvep sa-build --source phylop --input hg38.phyloP100way.wigFix.gz --output phylop

# Build SpliceAI predictions
fastvep sa-build --source spliceai --input spliceai_scores.vcf.gz --output spliceai

5. Annotate with supplementary databases

# Annotate with all databases in a directory
fastvep annotate \
  -i your_variants.vcf \
  -o annotated.vcf \
  --gff3 Homo_sapiens.GRCh38.115.gff3 \
  --fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa \
  --sa-dir /path/to/annotation_databases/ \
  --hgvs

6. Filter annotated variants

# Filter for high-impact or rare missense variants
fastvep filter \
  -i annotated.vcf \
  --filter "IMPACT is HIGH or (Consequence in missense_variant and AF < 0.001)"

7. Launch the web interface

# Quick start — uses a built-in example gene model (OR4F5, chr1)
fastvep-web

# With your own data
fastvep-web --gff3 Homo_sapiens.GRCh38.115.gff3 --fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa

# With supplementary annotations (ClinVar, gnomAD, etc.)
fastvep-web --gff3 genes.gff3 --fasta ref.fa --sa-dir /path/to/sa_databases/

Open http://localhost:8080 in your browser. The web interface lets you paste VCF data, switch gene models, and view results in an interactive table.

Note: fastvep-web is a separate production-quality binary (axum/tokio, async, multi-connection). The legacy fastvep web command still works but is single-threaded.

Local Setup Guide

This section walks through setting up fastVEP with full annotation capabilities (gene models, reference sequence, and supplementary databases like ClinVar and gnomAD).

Step 1: Download reference data

mkdir -p data && cd data

# Gene models (GFF3) — pick your organism
# Human GRCh38
wget https://ftp.ensembl.org/pub/release-115/gff3/homo_sapiens/Homo_sapiens.GRCh38.115.gff3.gz
gunzip Homo_sapiens.GRCh38.115.gff3.gz

# Reference FASTA (needed for HGVS and sequence context)
wget https://ftp.ensembl.org/pub/release-115/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

# Create FASTA index (enables memory-mapped access — important for large genomes)
samtools faidx Homo_sapiens.GRCh38.dna.primary_assembly.fa

Step 2: Build supplementary annotation databases

Each supplementary database (ClinVar, gnomAD, etc.) is built in two steps: download the source file, then run fastvep sa-build to convert it into the fastSA .osa + .osa.idx pair. sa-build is a converter, not a downloader; if you skip the download, the resulting .osa will be empty and your annotations will silently come back blank. After each build, check that the .osa size matches the expected magnitude (given in the comments below); a .osa of only a few KB is the tell that the source file wasn't real.

mkdir -p sa_databases

# ── ClinVar — clinical variant significance ──
# Download (~50 MB)
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
# Build (expect ~80–120 MB .osa)
fastvep sa-build --source clinvar -i clinvar.vcf.gz -o sa_databases/clinvar --assembly GRCh38

# ── gnomAD v4 — population allele frequencies ──
# Download per-chromosome from https://gnomad.broadinstitute.org/downloads
# (~30–60 GB total for genomes v4.0)
fastvep sa-build --source gnomad -i gnomad.genomes.v4.0.sites.vcf.bgz -o sa_databases/gnomad --assembly GRCh38

# ── dbSNP — variant identifiers ──
wget https://ftp.ncbi.nih.gov/snp/latest_release/VCF/GCF_000001405.40.gz
fastvep sa-build --source dbsnp -i GCF_000001405.40.gz -o sa_databases/dbsnp --assembly GRCh38

# ── COSMIC — somatic mutations (requires license) ──
# https://cancer.sanger.ac.uk/cosmic/download
fastvep sa-build --source cosmic -i CosmicCodingMuts.vcf.gz -o sa_databases/cosmic --assembly GRCh38

Verify before moving on:

ls -la sa_databases/*.osa
# Expected: clinvar ~100 MB; gnomad several GB; dbsnp ~5 GB.
# Anything < 1 MB usually means an empty build — re-check the source file.
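
The size check above can be scripted so it runs after every build. The sketch below is not part of fastVEP; find_small_osa is a hypothetical helper name and the 1 MiB default threshold is an assumption taken from the rule of thumb in the comment above.

```shell
# Hypothetical helper: print every .osa in a directory that is smaller than
# a threshold (default 1 MiB), which usually indicates an empty build.
find_small_osa() {
  dir=$1
  min_bytes=${2:-1048576}
  for f in "$dir"/*.osa; do
    [ -e "$f" ] || continue                  # no matches: the glob stays literal
    size=$(wc -c < "$f" | tr -d ' ')         # strip padding some wc builds add
    if [ "$size" -lt "$min_bytes" ]; then
      echo "$f"
    fi
  done
}
```

Running find_small_osa sa_databases/ prints nothing when every database clears the threshold; any path it does print is worth rebuilding from a freshly downloaded source file.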

For ACMG-AMP classification specifically (REVEL, SpliceAI, PhyloP, dbNSFP, OMIM, ClinVar protein index, etc.), see the dedicated ACMG Setup Guide — it walks through every source the classifier needs with download URLs, build commands, expected disk sizes, and a verification recipe.

Step 3: Run the CLI annotator

fastvep annotate \
  -i your_variants.vcf \
  -o annotated.vcf \
  --gff3 data/Homo_sapiens.GRCh38.115.gff3 \
  --fasta data/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
  --sa-dir sa_databases/ \
  --hgvs

Step 4: Run the web server

# Install the web server binary
cargo install --path crates/fastvep-web

# Run with all annotation sources
fastvep-web \
  --gff3 data/Homo_sapiens.GRCh38.115.gff3 \
  --fasta data/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
  --sa-dir sa_databases/ \
  --port 8080

All flags also accept environment variables (FASTVEP_GFF3, FASTVEP_FASTA, FASTVEP_SA_DIR, FASTVEP_PORT) for container deployments.
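
For container deployments it can help to validate those variables before the server starts. The following is a sketch of a pre-flight check for an entrypoint script; fastvep_env_check is a hypothetical helper, and it reads only the environment variable names listed above. A missing SA directory is reported as a warning rather than an error, on the assumption that running without supplementary annotations is acceptable.

```shell
# Hypothetical pre-flight check for a container entrypoint. It only reads the
# environment variables named above; it is not part of fastVEP.
fastvep_env_check() {
  rc=0
  for var in FASTVEP_GFF3 FASTVEP_FASTA; do
    eval "path=\${$var:-}"                   # indirect lookup of the variable
    if [ -n "$path" ] && [ ! -f "$path" ]; then
      echo "error: $var points to a missing file: $path" >&2
      rc=1
    fi
  done
  if [ -n "${FASTVEP_SA_DIR:-}" ] && [ ! -d "$FASTVEP_SA_DIR" ]; then
    echo "warning: FASTVEP_SA_DIR is not a directory: $FASTVEP_SA_DIR" >&2
  fi
  return "$rc"
}

# Typical use in an entrypoint script:
#   fastvep_env_check && exec fastvep-web
```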

Multi-organism setup

To serve multiple genomes from the web interface, organize data into subdirectories and use --data-dir. Each subdirectory is one genome — the server auto-detects GFF3, FASTA, and SA files inside.

Directory layout:

genomes/
  human_grch38/
    Homo_sapiens.GRCh38.115.gff3       # gene models (required)
    Homo_sapiens.GRCh38.dna.primary_assembly.fa   # reference (optional, for HGVS)
    Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai
    sa/                                 # supplementary annotations (optional)
      clinvar.osa2
      gnomad.osa2
      dbsnp.osa2
  mouse_grcm39/
    Mus_musculus.GRCm39.115.gff3
    mouse.fa
    mouse.fa.fai
  zebrafish/
    Danio_rerio.GRCz11.115.gff3

Setup:

mkdir -p genomes/human_grch38/sa genomes/mouse_grcm39 genomes/zebrafish

# Human: GFF3 + FASTA + SA databases
cp data/Homo_sapiens.GRCh38.115.gff3 genomes/human_grch38/
cp data/Homo_sapiens.GRCh38.dna.primary_assembly.fa* genomes/human_grch38/
cp sa_databases/*.osa2 genomes/human_grch38/sa/   # ClinVar, gnomAD, etc.

# Mouse
wget -O- https://ftp.ensembl.org/pub/release-115/gff3/mus_musculus/Mus_musculus.GRCm39.115.gff3.gz | gunzip > genomes/mouse_grcm39/Mus_musculus.GRCm39.115.gff3

# Zebrafish
wget -O- https://ftp.ensembl.org/pub/release-115/gff3/danio_rerio/Danio_rerio.GRCz11.115.gff3.gz | gunzip > genomes/zebrafish/Danio_rerio.GRCz11.115.gff3

Run:

fastvep-web --data-dir genomes/

Users can switch between organisms from the dropdown in the web UI. When a genome has an sa/ subdirectory, its SA databases are loaded automatically. The dropdown shows "(FASTA + SA)" labels for genomes that have these resources.

--sa-dir is optional — if provided, it serves as a fallback for genomes that don't have their own sa/ folder. If the directory doesn't exist, the server starts without SA (no error).
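
Before starting the server, it can be useful to preview what each genome directory provides. The sketch below is a hypothetical helper, not part of fastVEP. It assumes gene models end in .gff3, references end in .fa, and supplementary annotations live in sa/; the server's real detection rules may be broader.

```shell
# Hypothetical helper: summarise what each genome subdirectory provides.
# Assumed conventions: *.gff3 gene models, *.fa references, sa/ for SA files.
list_genomes() {
  data_dir=$1
  for g in "$data_dir"/*/; do
    [ -d "$g" ] || continue
    label=$(basename "$g")
    # Gene models are required, so flag a directory that has none.
    if ! ls "$g"*.gff3 >/dev/null 2>&1; then
      echo "$label: no GFF3 found"
      continue
    fi
    extras=""
    ls "$g"*.fa >/dev/null 2>&1 && extras="FASTA"
    [ -d "${g}sa" ] && extras="${extras:+$extras + }SA"
    echo "$label${extras:+ ($extras)}"
  done
}
```

With the layout shown earlier, list_genomes genomes/ would report human_grch38 with both FASTA and SA, mouse_grcm39 with FASTA only, and zebrafish with gene models alone.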

fastVEP works with any organism — just provide the matching GFF3 (and optionally FASTA for HGVS).

Merged annotation (Ensembl + RefSeq, à la VEP --merged)

--gff3 accepts multiple values, so a single run can annotate against Ensembl and RefSeq side-by-side — fastVEP's analog of VEP's --merged cache. The SOURCE column of each CSQ entry records which file produced that transcript.

# Auto-detected SOURCE labels (filename contains "ensembl"/"gencode" → Ensembl;
# "refseq" or GCF_ prefix → RefSeq; otherwise the basename).
fastvep annotate -i variants.vcf \
  --gff3 Homo_sapiens.GRCh38.115.gff3 \
  --gff3 GCF_000001405.40.gff.gz \
  --fasta GRCh38.fa --hgvs

# Or set the labels explicitly with `LABEL=path` (also accepts comma-separated):
fastvep annotate -i variants.vcf \
  --gff3 Ensembl=ensembl.gff3,RefSeq=refseq.gff3 \
  --fasta GRCh38.fa

Each transcript carries its source through to all output formats: SOURCE field in the VCF CSQ string, the source key in JSON, and the SOURCE column in tab output (with --canonical / extended fields). Transcripts from both sources are queried independently — overlap detection, HGVS, and supplementary annotation all run per-transcript, so RefSeq NM_… and Ensembl ENST… records appear as separate CSQ entries for the same variant.
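
The labelling rule described in the comments above can be restated as a small shell function. This is an illustrative sketch, not fastVEP's code: source_label is a hypothetical name, matching is assumed to be case-insensitive, and the fallback is assumed to be the file's basename exactly as given.

```shell
# Illustrative re-statement of the documented labelling rule.
# Assumptions: case-insensitive matching; fallback is the unmodified basename.
source_label() {
  spec=$1
  case $spec in
    *=*) echo "${spec%%=*}"; return ;;       # explicit LABEL=path wins
  esac
  base=$(basename "$spec")
  lower=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')
  case $lower in
    *ensembl*|*gencode*) echo "Ensembl" ;;
    *refseq*|gcf_*)      echo "RefSeq" ;;
    *)                   echo "$base" ;;
  esac
}
```

Under this reading, a file named Homo_sapiens.GRCh38.115.gff3 matches neither keyword and would be labelled by its basename, which is one reason to prefer the explicit LABEL=path form when the SOURCE value matters downstream.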

The auto-managed sidecar cache (<gff3>.fastvep.cache) only kicks in for single-GFF3 runs. For merged workflows, pre-build a combined binary cache once with fastvep cache, repeating --gff3 once per source (for example fastvep cache --gff3 ensembl.gff3 --gff3 refseq.gff3 -o combined.cache), and pass --transcript-cache combined.cache on subsequent runs. GFF3 parsing is fast enough that this is rarely needed.

Supplementary Annotation Sources

fastVEP supports direct integration with clinical and population databases through its native fastSA binary format. Build once with fastvep sa-build, then use --sa-dir to annotate:

| Source | Type | Description | Build Command |
| --- | --- | --- | --- |
| ClinVar | Allele-specific | Clinical significance, review status, phenotypes | --source clinvar |
| gnomAD | Allele-specific | Population frequencies (8 populations), allele counts | --source gnomad |
| dbSNP | Allele-specific | RS IDs, global minor allele frequency | --source dbsnp |
| COSMIC | Allele-specific | Somatic mutations, gene, sample counts | --source cosmic |
| 1000 Genomes | Allele-specific | Population frequencies (AFR, AMR, EAS, EUR, SAS) | --source onekg |
| TOPMed | Allele-specific | Population frequencies, allele counts | --source topmed |
| MitoMap | Allele-specific | Mitochondrial disease associations | --source mitomap |
| PhyloP | Positional | Phylogenetic conservation scores | --source phylop |
| GERP | Positional | Evolutionary rate profiling | --source gerp |
| DANN | Positional | Deleterious annotation scores | --source dann |
| REVEL | Allele-specific | Missense pathogenicity predictions | --source revel |
| SpliceAI | Allele-specific | Splice site effect predictions (delta scores) | --source spliceai |
| PrimateAI | Allele-specific | Primate-based pathogenicity | --source primateai |
| dbNSFP | Allele-specific | SIFT/PolyPhen predictions | --source dbnsfp |
| OMIM / ClinGen GDV | Gene-level (.oga) | Disease-gene annotations driving PVS1, BS2, PM3, BP2 in ACMG | --source omim |
| gnomAD constraint | Gene-level (.oga) | gnomAD constraint metrics (pLI, LOEUF) for PVS1, PP2, BP1 | --source gnomad_genes |
| ClinVar protein index | Gene-level (.oga) | Pathogenic missense by protein position (PS1, PM1, PM5) | --source clinvar_protein |
| Custom VCF | Allele-specific (.osa) | Any user-supplied VCF, INFO fields become the JSON object | --source custom_vcf |
| Custom BED | Interval (.osi) | Any user-supplied BED, name/score columns become the JSON object | --source custom_bed |
| Custom (auto-detect) | VCF or BED | Auto-detects format from .vcf[.gz] / .bed[.gz] extension | --source custom |

For the per-source VCF FV_* / tab column / JSON-key schema (pipe formats, escaping rules, identifiers), see docs/SUPPLEMENTARY_ANNOTATIONS.md.

Custom annotation sources

You don't have to wait for a built-in parser to plug in your own data — sa-build accepts arbitrary VCFs and BEDs via --source custom_vcf, --source custom_bed, or --source custom (auto-detects from the input extension). The --name flag becomes the JSON / column key for the resulting database, so it shows up in output exactly like a first-class source.

# Custom allele-level VCF — select which INFO fields to keep
fastvep sa-build --source custom_vcf \
  --name clinical --info-fields CLIN_LABEL,CLIN_SCORE \
  -i my_clinical.vcf.gz -o sa_databases/clinical

# Custom interval-level BED — score/name columns flow through automatically
fastvep sa-build --source custom_bed \
  --name myregions \
  -i my_regions.bed -o sa_databases/myregions

# Annotate as usual — both .osa and .osi in --sa-dir are picked up
fastvep annotate -i variants.vcf --gff3 genes.gff3 \
  --sa-dir sa_databases/ --output-format json

Allele-level custom VCFs produce a .osa and attach to records whose (pos, ref, alt) matches. Interval-level custom BEDs produce a .osi and attach via positional overlap (returning every interval that contains the variant). Omit --info-fields to capture every INFO key on every record — convenient for exploration, but the resulting JSON objects will be heterogeneous.
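
The positional-overlap rule for interval sources can be illustrated with a few lines of awk. This is a sketch of the general technique rather than fastVEP's implementation; bed_hits is a hypothetical helper, and it assumes the standard BED convention of 0-based, half-open intervals matched against a 1-based VCF position.

```shell
# Hypothetical helper: list the name (column 4) of every BED interval that
# contains a 1-based position. BED is 0-based and half-open, so a VCF
# position `pos` falls inside an interval when start < pos <= end.
bed_hits() {
  bed=$1; chrom=$2; pos=$3
  awk -F '\t' -v c="$chrom" -v p="$pos" \
    '$1 == c && $2 < p && p <= $3 { print $4 }' "$bed"
}
```

This handles single-base positions only; a multi-base variant would need its full span compared against each interval.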

Command Reference

fastvep annotate

| Flag | Description | Default |
| --- | --- | --- |
| -i, --input | Input VCF file (- for stdin; .vcf.gz auto-detected) | required |
| -o, --output | Output file (- for stdout) | - |
| --gff3 | GFF3 gene annotation file. May be repeated to replicate VEP's --merged cache (Ensembl + RefSeq in a single run); each value may be LABEL=path to control the SOURCE column. | -- |
| --fasta | Reference FASTA file | -- |
| --output-format | vcf, tab, or json | vcf |
| --hgvs | Include HGVS notations | off |
| --pick | Report only the most severe consequence per variant | off |
| --symbol | Include gene symbol in output | off |
| --canonical | Include canonical-transcript flag in output | off |
| --everything | Turn on all common annotation flags | off |
| --distance | Upstream/downstream distance in bp | 5000 |
| --buffer-size | Variants buffered per parallel batch | 5000 |
| --sa-dir | Directory containing .osa / .osa2 / .osi / .oga supplementary annotations | -- |
| --sa-only | Skip the default CSQ annotation and emit only supplementary annotations from --sa-dir (requires --sa-dir) | off |
| --cache-dir | Path to VEP cache directory for known-variant lookup | -- |
| --transcript-cache | Path to binary transcript cache file (overrides the auto-managed <gff3>.fastvep.cache sidecar) | -- |
| --acmg | Run ACMG-AMP classification (Richards 2015 + ClinGen SVI); adds ACMG + ACMG_CRITERIA to CSQ | off |
| --acmg-config | TOML file with custom ACMG thresholds | built-in defaults |
| --proband / --mother / --father | Sample names for trio analysis; enables PS2 (de novo), PM6, BP2 | -- |
| --gene-list | Path to a gene-panel file (one symbol or Ensembl gene ID per line). Tab output drops rows whose transcript isn't on the panel. | -- |
| --explicit-alleles | Add an explicit REF column to tab output after Allele | off |
| --qc-rules | TOML file of QC class rules; populates a QC_CLASS column in tab output | -- |

fastvep sa-build

| Flag | Description | Default |
| --- | --- | --- |
| --source | Source type (clinvar, gnomad, dbsnp, cosmic, onekg, topmed, mitomap, phylop, gerp, dann, revel, spliceai, primateai, dbnsfp, omim, gnomad_genes, clinvar_protein, custom_vcf, custom_bed, custom) | required |
| -i, --input | Input file (VCF/TSV/wigFix/BED, supports .gz) | required |
| -o, --output | Output base path (creates .osa + .osa.idx, or .osi for BED) | required |
| --assembly | Genome assembly | GRCh38 |
| --name | Display + JSON-key name for custom_* sources | derived from input filename |
| --info-fields | Comma-separated INFO keys to extract for custom_vcf | all INFO keys |

fastvep filter

| Flag | Description | Default |
| --- | --- | --- |
| -i, --input | Input VEP-annotated VCF | required |
| -o, --output | Output file | - |
| --filter | Filter expression (filter_vep-compatible syntax) | required |

Filter syntax examples:

IMPACT is HIGH
Consequence in missense_variant,stop_gained,frameshift_variant
AF < 0.001
IMPACT is HIGH and AF < 0.01
(IMPACT is HIGH or IMPACT is MODERATE) and not Consequence is synonymous_variant

fastvep-web (production web server)

| Flag | Description | Default |
| --- | --- | --- |
| --gff3 | GFF3 gene annotation file | -- |
| --fasta | Reference FASTA file | -- |
| --sa-dir | Directory containing .osa / .osa2 / .osi / .oga supplementary annotations | -- |
| --data-dir | Directory of genome subdirectories (for multi-organism switching) | -- |
| --port | HTTP port (also FASTVEP_PORT env) | 8080 |
| --bind | Bind address (also FASTVEP_BIND env) | 0.0.0.0 |
| --distance | Upstream/downstream distance in bp | 5000 |
| --max-body-size | Max request body in bytes | 10485760 |
| --max-concurrent | Max concurrent annotation requests | 64 |
| --stats-file | JSON file to write per-request stats to (also FASTVEP_STATS_FILE env) | -- |

fastvep cache

Pre-builds a binary transcript cache for fast startup. Accepts the same repeatable --gff3 / LABEL=path syntax as annotate, so a merged Ensembl + RefSeq cache can be built once and reused via --transcript-cache.

| Flag | Description | Default |
| --- | --- | --- |
| --gff3 | GFF3 annotation file(s). Repeatable; each value may be LABEL=path. | required |
| --fasta | Reference FASTA (for pre-building spliced sequences) | -- |
| -o, --output | Output cache file path | required |

Output Formats

VCF Output

Consequence annotations are added as a CSQ field in the INFO column with 49 pipe-delimited fields matching Ensembl VEP's extended format, plus fastVEP-specific ACMG and ACMG_CRITERIA slots when --acmg is set. When supplementary annotation databases are loaded with --sa-dir, fastVEP also emits VCF-compatible INFO projections for supported fastSA sources: standard SpliceAI for SpliceAI databases, and fastVEP-specific FV_* fields such as FV_CLINVAR, FV_GNOMAD, FV_DBSNP, FV_REVEL, and gene-level FV_OMIM.

The VCF output never embeds raw JSON in INFO values. Use --output-format json for the richest structured representation of all supplementary annotation objects.
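
For quick inspection of VCF output without a dedicated parser, the CSQ string can be unpacked with awk. The sketch below assumes the header describes CSQ with a VEP-style "Format: A|B|C" list of field names and that commas inside values are escaped, as Ensembl VEP does; both are assumptions about fastVEP's output. csq_field is a hypothetical helper name.

```shell
# Hypothetical helper: print one named CSQ field for every CSQ entry.
# Assumption: the CSQ header line carries a VEP-style 'Format: A|B|C' list.
csq_field() {
  field=$1
  awk -F '\t' -v want="$field" '
    /^##INFO=<ID=CSQ,/ {
      # Pull the pipe-delimited field names out of the header description.
      fmt = $0
      sub(/.*Format: */, "", fmt)
      sub(/".*/, "", fmt)
      n = split(fmt, names, "|")
      for (i = 1; i <= n; i++) if (names[i] == want) col = i
      next
    }
    /^#/ { next }
    col {
      m = split($8, info, ";")
      for (j = 1; j <= m; j++) {
        if (info[j] !~ /^CSQ=/) continue
        k = split(substr(info[j], 5), entries, ",")   # one entry per transcript/allele
        for (e = 1; e <= k; e++) {
          split(entries[e], vals, "|")
          print $1 ":" $2, vals[col]
        }
      }
    }
  ' "${2:--}"
}
```

Called as csq_field IMPACT annotated.vcf, it prints one line per CSQ entry with the variant's chromosome and position followed by the requested field. The second argument is optional; without it the function reads from stdin.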

Tab Output

One line per variant-transcript-allele combination with 17 columns.

JSON Output

Structured JSON with transcript_consequences array per variant, including supplementary annotations from SA providers (ClinVar, gnomAD, etc.) and gene-level annotations.

Consequence Types

fastVEP predicts 49 consequence types organized by impact:

| Impact | Consequences |
| --- | --- |
| HIGH | transcript_ablation, splice_acceptor_variant, splice_donor_variant, stop_gained, frameshift_variant, stop_lost, start_lost, transcript_amplification, TFBS_ablation, regulatory_region_ablation |
| MODERATE | inframe_insertion, inframe_deletion, missense_variant, protein_altering_variant, regulatory_region_amplification, TFBS_amplification |
| LOW | splice_region_variant, splice_donor_5th_base_variant, splice_donor_region_variant, splice_polypyrimidine_tract_variant, synonymous_variant, start_retained_variant, stop_retained_variant, incomplete_terminal_codon_variant |
| MODIFIER | coding_sequence_variant, 5_prime_UTR_variant, 3_prime_UTR_variant, non_coding_transcript_exon_variant, intron_variant, upstream_gene_variant, downstream_gene_variant, intergenic_variant, copy_number_change, copy_number_increase, copy_number_decrease, short_tandem_repeat_change, transcript_variant, and others |

Architecture

crates/
  fastvep-core/         # Core types: Consequence (49 SO terms), VariantType, Allele, Impact
  fastvep-genome/       # Transcript, Exon, Gene, CodonTable, mitochondrial codon table
  fastvep-cache/        # GFF3 parser, FASTA reader, annotation providers, regulatory regions
  fastvep-consequence/  # Consequence prediction: small variants + SV predictor
  fastvep-hgvs/         # HGVS nomenclature generation (c., p., g.)
  fastvep-io/           # VCF parser (incl. SVs), output formatters, multi-sample parsing
  fastvep-filter/       # Filter engine: lexer, parser, evaluator (filter_vep-compatible)
  fastvep-sa/           # Supplementary annotation format (fastSA):
                       #   v1 (.osa): zstd block compression, binary search,
                       #     byte-budgeted LRU cache of decompressed blocks
                       #     (default 32 MiB/reader; override via
                       #     FASTVEP_SA_CACHE_BYTES_PER_READER)
                       #   v2 (.osa2): echtvar-inspired chunked ZIP with Var32 encoding,
                       #     parallel u32 value arrays, delta encoding, LRU chunk cache,
                       #     Bloom filters for negative lookups
                       #   .osi: interval-level annotations (BED-derived), positional overlap
                       #   .oga: gene-level annotations (OMIM, gnomAD constraint, ClinVar
                       #     protein index)
                       # Source parsers: ClinVar, gnomAD (incl. v4.1 joint), dbSNP, COSMIC,
                       # 1000G, TOPMed, MitoMap, PhyloP, GERP, DANN, REVEL, SpliceAI,
                       # PrimateAI, dbNSFP, plus user-supplied custom_vcf / custom_bed.
  fastvep-annotate/     # Shared annotation pipeline (used by CLI batch and web server):
                       #   variant overlap, consequence prediction, HGVS, SA/gene SA
                       #   provider loading
  fastvep-classification/ # ACMG-AMP variant classification engine (Richards 2015 +
                       #   ClinGen SVI). 28 criteria, trio/compound-het support,
                       #   configurable thresholds via TOML
  fastvep-cli/          # CLI binary: annotation pipeline, sa-build, filter, cache,
                       #   legacy web server
  fastvep-web/          # Production web server (axum/tokio): async, multi-connection,
                       #   genome switching, SA integration, rate limiting
web/                   # Web GUI (HTML/CSS/JS, embedded in both server binaries)
tests/                 # Test data: chr1 (OR4F5) and chr17 (BRCA1) VCF + GFF3

Running Tests

cargo test --workspace          # 515 tests
cargo test -p fastvep-consequence  # Consequence prediction tests (incl. SV)
cargo test -p fastvep-filter       # Filter engine tests
cargo test -p fastvep-sa           # Supplementary annotation format tests

Performance Benchmarks

Benchmarked on Apple M-series (ARM64), release build with LTO. Median of 3 runs, full Ensembl annotations with FASTA and HGVS.

Multi-Organism Throughput (Gold-Standard Datasets)

| Organism | Transcripts | Variants | Source | Time | Throughput |
| --- | --- | --- | --- | --- | --- |
| Yeast (R64, full genome) | 7,036 | 260,526 | Ensembl/SGD | 3.0s | 85,934 v/s |
| Drosophila (BDGP6, full) | 35,442 | 4,438,427 | DGRP2 | 57.3s | 77,486 v/s |
| Arabidopsis (TAIR10, full) | 54,013 | 12,883,854 | 1001 Genomes | 168.7s | 76,378 v/s |
| Mouse (GRCm39, full genome) | 142,626 | 26,062,054 | MGP CAST/EiJ | 338.0s | 77,113 v/s |
| Human full WGS (GRCh38) | 508,530 | 4,048,342 | GIAB HG002 | 86.3s | 46,917 v/s |

vs. Ensembl VEP v115.1 (head-to-head, GIAB HG002 chr22)

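| Variants | fastVEP | VEP | Speedup |
| --- | --- | --- | --- |
| 1,000 | 0.40s | 1.06s | 2.6x |
| 5,000 | 0.47s | 13.9s | 29x |
| 10,000 | 0.67s | 30.3s | 45x |
| 50,000 | 1.59s | 206.1s | 130x |
| 4,048,342 (full WGS) | 86.3s | cannot complete | -- |

| Metric | Ensembl VEP | fastVEP |
| --- | --- | --- |
| Peak memory (100K variants) | ~500 MB | 2.8 MB |
| Binary size | ~200 MB installed | 3.3 MB |
| Dependencies | Perl 5.22+, DBI, 10+ CPAN modules | None |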
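
The throughput and speedup figures follow from simple ratios: variants divided by wall-clock seconds, and the slower run time divided by the faster one. A minimal sketch for recomputing them from your own timings is shown here; the function names are hypothetical.

```shell
# Throughput is variants divided by wall-clock seconds.
throughput() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.0f\n", n / t }'
}

# Speedup is the slower run time divided by the faster one.
# Arguments: fast_seconds slow_seconds
speedup() {
  awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}
```

For example, throughput 4048342 86.3 prints 46910, slightly below the 46,917 v/s in the table because the published time is rounded to one decimal place.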

Citation

If you use fastVEP in your research, please cite:

fastVEP: A Fast, Comprehensive Variant Effect Predictor Written in Rust
Kuan-lin Huang
bioRxiv (2026)
doi: https://doi.org/10.64898/2026.04.14.718452
URL: https://www.biorxiv.org/content/10.64898/2026.04.14.718452v1

License

Apache License 2.0

Acknowledgements

fastVEP is inspired by Ensembl VEP by EMBL-EBI and Illumina Nirvana. The consequence prediction logic follows the Sequence Ontology term definitions and the Ensembl variant annotation framework. The supplementary annotation system (fastSA v2) incorporates algorithms and encoding strategies from echtvar.