graphREML (reml)

June 6, 2026

Use graphld reml for heritability partitioning and annotation enrichment estimation.

Basic Usage

uv run graphld reml \
    /path/to/sumstats.sumstats \
    output_prefix \
    --annot-dir /path/to/annotations/

You must provide one annotation source:

  • --annot-dir for variant or region annotations
  • --gene-annot-dir for GMT gene sets
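The annotation-source requirement can be checked before launching a run. The following is a minimal sketch, not part of graphld: `build_reml_command` is a hypothetical helper that assembles the argument list from the flags shown above and rejects the case where neither source is given. It does not take a position on whether both sources may be combined.

```python
from typing import List, Optional


def build_reml_command(
    sumstats: str,
    output_prefix: str,
    annot_dir: Optional[str] = None,
    gene_annot_dir: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `uv run graphld reml` (hypothetical helper)."""
    # At least one annotation source is required.
    if annot_dir is None and gene_annot_dir is None:
        raise ValueError("provide --annot-dir or --gene-annot-dir")
    cmd = ["uv", "run", "graphld", "reml", sumstats, output_prefix]
    if annot_dir is not None:
        cmd += ["--annot-dir", annot_dir]
    if gene_annot_dir is not None:
        cmd += ["--gene-annot-dir", gene_annot_dir]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the standard library.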

Input Types

  • Summary statistics: LDSC-style .sumstats, GWAS-VCF .vcf/.vcf.gz, or kodama-style .parquet; see Summary Statistics.
  • Variant annotations: per-chromosome .annot files, optionally alongside .bed files; see Annotations.
  • Gene annotations: .gmt files converted to variant-level annotations with nearest-gene weighting; see GMT Format.
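As a rough illustration of the summary-statistics formats listed above, the sketch below maps a file name to a format label by extension. The labels (`"ldsc"`, `"gwas-vcf"`, `"parquet"`) and the function name are assumptions made for this example; how graphld itself detects the format is not stated here.

```python
from pathlib import Path

# Extension-to-format map taken from the list above; the labels are hypothetical.
SUFFIX_TO_FORMAT = {
    ".sumstats": "ldsc",
    ".vcf": "gwas-vcf",
    ".vcf.gz": "gwas-vcf",
    ".parquet": "parquet",
}


def guess_sumstats_format(path: str) -> str:
    """Guess the summary-statistics format from the file name (illustrative only)."""
    name = Path(path).name.lower()
    for suffix, fmt in SUFFIX_TO_FORMAT.items():
        if name.endswith(suffix):
            return fmt
    raise ValueError(f"unrecognized summary-statistics extension: {path}")
```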

Output Files

Default output:

  • output_prefix.tall.csv: heritability, enrichment, and coefficient estimates
  • output_prefix.convergence.csv: optimization diagnostics

With --alt-output:

  • output_prefix.heritability.csv
  • output_prefix.enrichment.csv
  • output_prefix.parameters.csv

Use --name to label runs when appending to alternate output files or score-test HDF5 outputs.
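For scripting around a run, the file names above can be derived from the output prefix. This is a small sketch with a hypothetical helper name; it lists only the files named in this section and makes no claim about any other files graphld may write.

```python
from typing import List


def expected_outputs(prefix: str, alt_output: bool = False) -> List[str]:
    """Return the output file names documented above for a given prefix."""
    if alt_output:
        # Files listed under --alt-output.
        suffixes = ["heritability", "enrichment", "parameters"]
    else:
        # Default output files.
        suffixes = ["tall", "convergence"]
    return [f"{prefix}.{suffix}.csv" for suffix in suffixes]
```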

Common Options

Option                  Default                   Description
--intercept             1.0                       LD score regression intercept
--name                  None                      Run label for outputs and score-test artifacts
--metadata              data/ldgms/metadata.csv   LDGM metadata CSV
--score-test-filename   None                      HDF5 file for score-test precomputation
--surrogates            None                      Precomputed surrogate-marker HDF5
--no-save               False                     Skip result-file writing or write logs only

Optimization Options

Option                   Default   Description
--num-iterations         50        Maximum optimization iterations
--convergence-tol        0.01      Convergence tolerance
--convergence-window     3         Iterations used for convergence checks
--num-jackknife-blocks   100       Jackknife blocks for standard errors
--xtrace-num-samples     100       Samples for stochastic gradient estimation
--reset-trust-region     False     Reset trust-region size each iteration
--initial-params         None      Comma-separated initial coefficient values
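To make the role of --num-jackknife-blocks concrete, here is the textbook delete-one-block jackknife standard error, computed from one estimate per left-out block. This is a general-purpose sketch; whether graphREML computes its standard errors with exactly this formula or with an approximation is not stated here, and the function name is hypothetical.

```python
import math
from typing import Sequence


def jackknife_se(leave_one_out: Sequence[float]) -> float:
    """Standard error from delete-one-block jackknife estimates.

    leave_one_out[i] is the estimate obtained with block i removed.
    """
    n = len(leave_one_out)
    if n < 2:
        raise ValueError("need at least two jackknife blocks")
    mean = sum(leave_one_out) / n
    sum_sq = sum((x - mean) ** 2 for x in leave_one_out)
    # Jackknife variance: (n - 1) / n times the sum of squared deviations.
    return math.sqrt((n - 1) / n * sum_sq)
```

With the default of 100 blocks, `leave_one_out` would hold 100 values per estimated quantity.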

Variant Matching And Filtering

Option                      Default   Description
--match-by-position         False     Match variants by genomic position instead of RSID
--maximum-missingness       0.1       Maximum missing-variant fraction allowed
--max-chisq-threshold       None      Drop LD blocks above a chi-squared threshold
--annotation-columns        all       Restrict to specific annotation columns
--binary-annotations-only   False     Keep only 0/1-valued annotations
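The effect of --binary-annotations-only can be pictured as a column filter. The sketch below is an illustration under the assumption that a column counts as binary when every value is exactly 0 or 1; the function name and the in-memory dictionary layout are hypothetical and do not reflect graphld's internals.

```python
from typing import Dict, List, Sequence


def binary_columns(annotations: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of annotation columns whose values are all 0 or 1."""
    return [
        name
        for name, values in annotations.items()
        if all(value in (0, 1) for value in values)
    ]
```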

Gene Set Annotations

Use --gene-annot-dir to supply GMT files:

uv run graphld reml \
    /path/to/sumstats.sumstats \
    output_prefix \
    --gene-annot-dir /path/to/gmt/files/ \
    --gene-table data/genes.tsv

Related options:

Option              Default                         Description
--gene-table        data/genes.tsv                  Gene coordinate table
--nearest-weights   0.4,0.2,0.1,0.1,0.1,0.05,0.05   Weights for nearest-gene mapping
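One plausible reading of nearest-gene weighting is sketched below: the k-th weight applies to a variant's k-th nearest gene, and the variant's value for a gene set is the sum of weights of those genes that belong to the set. This interpretation, the function names, and the gene symbols in the usage note are assumptions for illustration, not a description of graphld's implementation.

```python
from typing import List, Sequence, Set

# Default value of --nearest-weights from the table above.
DEFAULT_NEAREST_WEIGHTS = "0.4,0.2,0.1,0.1,0.1,0.05,0.05"


def parse_weights(text: str) -> List[float]:
    """Parse a comma-separated weight string into floats."""
    return [float(w) for w in text.split(",")]


def variant_score(
    nearest_genes: Sequence[str],
    gene_set: Set[str],
    weights: Sequence[float],
) -> float:
    """Sum the weights of the nearest genes that are in the gene set.

    nearest_genes is ordered from nearest to farthest (assumed scheme).
    """
    return sum(w for gene, w in zip(nearest_genes, weights) if gene in gene_set)
```

For example, `variant_score(["A", "B", "C"], {"A", "C"}, parse_weights(DEFAULT_NEAREST_WEIGHTS))` adds the first and third weights. The seven default weights sum to 1 up to floating-point rounding.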

GMT rows are:

gene_set_name<TAB>description<TAB>gene1<TAB>gene2<TAB>...
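The row layout above can be read with a few lines of standard-library Python. This is a minimal sketch with a hypothetical function name, useful for checking a GMT file before a run; it is not graphld's own parser.

```python
from typing import Dict, List


def parse_gmt(text: str) -> Dict[str, List[str]]:
    """Parse GMT text into a mapping from gene-set name to gene list."""
    gene_sets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected name, description, and genes: {line!r}")
        # The description field is read but not used.
        name, _description, *genes = fields
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets
```

To parse a file, pass `Path(path).read_text()` from `pathlib`.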

Surrogate Markers

When GWAS variants are missing from the LDGM reference, graphREML can use surrogate markers in high LD. To avoid recomputing them for repeated analyses:

uv run graphld surrogates /path/to/sumstats.sumstats out.h5 --population EUR
uv run graphld reml /path/to/sumstats.sumstats output_prefix --annot-dir /path/to/annot --surrogates out.h5

Parquet Files

For a multi-trait parquet input, use --name to select the trait to analyze when writing the default output files:

uv run graphld reml sumstats.parquet output --name height

Default tall-output runs write one file pair per trait, such as output.height.tall.csv and output.height.convergence.csv.
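The per-trait naming pattern can be expressed as a small helper for collecting results afterwards. This is a sketch with a hypothetical function name; the trait `bmi` in the usage note is an invented example, and only the `prefix.trait.tall.csv` / `prefix.trait.convergence.csv` pattern comes from the text above.

```python
from typing import Dict, Iterable, Tuple


def trait_outputs(prefix: str, traits: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each trait to its (tall, convergence) output file names."""
    return {
        trait: (
            f"{prefix}.{trait}.tall.csv",
            f"{prefix}.{trait}.convergence.csv",
        )
        for trait in traits
    }
```

For example, `trait_outputs("output", ["height", "bmi"])` yields one file pair per trait.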