Installation

June 8, 2026

Enrichment Score Test Only

If you only need the enrichment score test, you do not need to install graphld or its dependencies. With uv installed, run:

uv run src/score_test/score_test.py --help

Or make the script executable:

chmod +x src/score_test/score_test.py
./src/score_test/score_test.py --help

Full Installation

The full graphld package (for graphREML, simulation, clumping, BLUP) requires SuiteSparse for sparse matrix operations.

Prerequisites

SuiteSparse

On Mac:

brew install suitesparse

GraphLD supports Python 3.11-3.13. On macOS, use a system Python from Homebrew or another local Python install in that range. The project config tells uv to prefer system Python so source builds such as scikit-sparse use the current Command Line Tools SDK instead of a stale SDK path embedded in an older uv-managed Python.
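As a quick sanity check before installing, the sketch below tests whether an interpreter falls in the supported 3.11-3.13 range. The helper name python_version_supported is hypothetical, and the check assumes python3 on your PATH is the interpreter you intend to use; uv may select a different one.

```shell
# Succeeds when a "major.minor[.patch]" version string is within 3.11-3.13.
# (Hypothetical helper, not part of graphld.)
python_version_supported() {
    major=${1%%.*}      # text before the first dot
    minor=${1#*.}       # text after the first dot
    minor=${minor%%.*}  # drop any patch component
    [ "$major" -eq 3 ] && [ "$minor" -ge 11 ] && [ "$minor" -le 13 ]
}

# Report on whichever python3 comes first on PATH.
if command -v python3 >/dev/null 2>&1; then
    found=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if python_version_supported "$found"; then
        echo "python3 $found: supported"
    else
        echo "python3 $found: outside 3.11-3.13"
    fi
else
    echo "python3 not found on PATH"
fi
```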

On Ubuntu/Debian:

sudo apt-get install libsuitesparse-dev

Intel MKL (Recommended)

For users with Intel chips, Intel MKL can produce a 100x speedup with SuiteSparse vs. OpenBLAS (your likely default BLAS library). See Giulio Genovese's documentation.

Install uv if needed. In the repo directory:

uv sync

For development installation:

uv sync --extra dev  # editable with pytest dependencies
uv run pytest  # tests will fail if you haven't run `make download`

Using conda and pip

Conda has the advantage that you can conda install SuiteSparse directly.

Create conda environment:

module load miniconda3/4.10.3  # if on a cluster
conda create -n suitesparse conda-forge::suitesparse python=3.11.0
conda activate suitesparse

You may need to downgrade or reinstall some Python packages, such as NumPy:

pip install numpy==1.26.4

Install scikit-sparse:

conda config --add channels conda-forge
conda config --set channel_priority strict
conda install 'scikit-sparse<0.5'

Install graphld:

cd graphld && pip install .

Test installation:

graphld -h
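If graphld -h fails, it can help to check which Python module is missing. The following is a minimal sketch, assuming the active environment's interpreter is python3 and that the package is importable as graphld (an assumption based on the package name); sksparse.cholmod is the CHOLMOD wrapper provided by scikit-sparse. The helper check_modules is hypothetical.

```shell
# Print, for each module name, whether the given interpreter can import it.
# Usage: check_modules <interpreter> <module>...
check_modules() {
    py=$1
    shift
    for mod in "$@"; do
        if "$py" -c "import $mod" >/dev/null 2>&1; then
            echo "$mod: ok"
        else
            echo "$mod: MISSING"
        fi
    done
}

# sksparse.cholmod only imports if scikit-sparse was built against SuiteSparse.
check_modules python3 sksparse.cholmod graphld
```

A MISSING result for sksparse.cholmod usually points at the SuiteSparse or scikit-sparse step rather than at graphld itself.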

Downloading Data

Pre-computed LDGMs and data files are available from Zenodo. Download using the provided Makefile:

cd data && make download_all

The full download takes 30-60 minutes depending on connection speed.

| Use Case | Command | Size |
| --- | --- | --- |
| Score test for gene sets only | make download_gene_scores | ~10 MB |
| Score test for variant annotations | make download_scores | ~6.5 GB |
| graphREML on European-ancestry data | make download_reml | ~2 GB |
| All populations / all features | make download_all | ~25 GB |

To try out graphREML or the score test with example summary statistics, also run make download_sumstats (~7 GB).
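Because the larger targets need tens of gigabytes, it may be worth checking free disk space first. The sketch below uses a hypothetical helper, has_free_gb, that compares the space df reports against a size taken from the table above. The sizes are approximate, and whether extra room is needed during extraction is not stated, so treat the threshold as a lower bound.

```shell
# Succeeds when the filesystem holding directory $1 has at least $2 GB free.
# (Hypothetical helper; df -Pk reports available space in 1024-byte blocks.)
has_free_gb() {
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# 25 GB is the approximate size listed for make download_all.
if has_free_gb . 25; then
    echo "enough space for make download_all"
else
    echo "less than 25 GB free; consider a smaller target such as make download_reml"
fi
```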

All Download Options

| Command | Description | Size |
| --- | --- | --- |
| make download_all | All data files | ~25 GB |
| make download_reml | UKBB precision + annotations + surrogates | ~2 GB |
| make download_ukbb_precision | UK Biobank LDGM precision matrices | ~1.5 GB |
| make download_precision | All LDGM precision matrices (all populations) | ~10 GB |
| make download_annotations | BaselineLD annotation files | ~400 MB |
| make download_scores | Score statistics (variant + gene level) | ~6.5 GB |
| make download_gene_scores | Gene-level score statistics only | ~10 MB |
| make download_surrogates | Surrogate markers + gene table | ~60 MB |
| make download_sumstats | GWAS summary statistics (Li et al. 2025) | ~7 GB |

Data Sources

  • Precision matrices: Zenodo 8157131 - LDGM precision matrices for 1000 Genomes populations
  • Annotations & sumstats: Zenodo 15085817 - BaselineLD annotations and UK Biobank summary statistics
  • Score statistics: Zenodo 20597740 - Pre-computed score statistics for enrichment testing

Directory Structure

After downloading, the data/ directory will contain:

data/
├── ldgms/              # LDGM precision matrices
├── baselineld/         # BaselineLD annotation files
├── scores/             # Score statistics (.h5 files)
├── surrogates/         # Surrogate marker files
├── genes.tsv           # Gene table (GRCh38)
└── rsid_position.csv   # SNP position mapping
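
To confirm that a download produced the layout above, a small check along these lines may help. It is a sketch only: missing_data_entries is a hypothetical helper, it tests only that each top-level entry exists, and after a partial download (for example make download_reml) some entries are expected to be absent.

```shell
# List entries from the expected layout that are missing under directory $1.
# (Hypothetical helper; checks existence only, not file contents.)
missing_data_entries() {
    for entry in ldgms baselineld scores surrogates genes.tsv rsid_position.csv; do
        [ -e "$1/$entry" ] || echo "$entry"
    done
}

# Run from the repository root, where data/ lives.
missing=$(missing_data_entries data)
if [ -z "$missing" ]; then
    echo "data/ looks complete"
else
    echo "missing from data/:"
    echo "$missing"
fi
```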