data.md
October 10, 2024
The data produced for this manuscript were deposited on Zenodo at https://doi.org/10.5281/zenodo.13904205
BGZipped GAF files (.gaf.gz) and their index (.gaf.gz.tbi) are available.
They relate to three applications presented in the manuscript.
The same HPRC draft pangenome v1 (GRCh38-based Minigraph-Cactus) was used for all analyses.
It is available in GBZ format at https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38/hprc-v1.1-mc-grch38.gbz
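A minimal Python sketch for reading one of the BGZipped GAF files sequentially. It assumes the usual GAF layout of 12 mandatory tab-separated columns (mirroring PAF) followed by optional tags, and that a missing numeric value is written as `*`. The record class and function names are illustrative and not part of any library. Because BGZF is gzip-compatible, the standard library is enough for a sequential read; random access through the .tbi index would need a tabix-aware tool instead.

```python
import gzip
import re
from typing import Iterator, List, NamedTuple, Optional, Tuple


class GafRecord(NamedTuple):
    """One GAF line; the field names are illustrative."""
    name: str
    query_length: Optional[int]
    query_start: Optional[int]
    query_end: Optional[int]
    strand: str
    path: str
    path_length: Optional[int]
    path_start: Optional[int]
    path_end: Optional[int]
    matches: Optional[int]
    block_length: Optional[int]
    mapq: Optional[int]
    tags: Tuple[str, ...]


def _int_or_none(value: str) -> Optional[int]:
    # Assumption: a missing numeric value is written as "*".
    return None if value == "*" else int(value)


def parse_gaf_line(line: str) -> GafRecord:
    """Parse the 12 mandatory GAF columns; remaining columns are kept as raw tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(fields)}")
    return GafRecord(
        name=fields[0],
        query_length=_int_or_none(fields[1]),
        query_start=_int_or_none(fields[2]),
        query_end=_int_or_none(fields[3]),
        strand=fields[4],
        path=fields[5],
        path_length=_int_or_none(fields[6]),
        path_start=_int_or_none(fields[7]),
        path_end=_int_or_none(fields[8]),
        matches=_int_or_none(fields[9]),
        block_length=_int_or_none(fields[10]),
        mapq=_int_or_none(fields[11]),
        tags=tuple(fields[12:]),
    )


def path_steps(path: str) -> List[Tuple[str, str]]:
    """Split an oriented path such as '>1>2<3' into (orientation, node) pairs.

    Returns an empty list when the path column holds no '>' or '<' steps.
    """
    return re.findall(r"([<>])([^<>]+)", path)


def read_gaf(filename: str) -> Iterator[GafRecord]:
    """Yield records from a gzip- or BGZF-compressed GAF file, skipping blank lines."""
    with gzip.open(filename, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_gaf_line(line)
```

For example, `next(read_gaf("gene_exon.gaf.gz"))` would return the first record, assuming that file has been downloaded from the Zenodo deposit into the working directory.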
HPRC annotations projected to the pangenome
The annotations for each haplotype used for the HPRC v1 pangenome were projected into graph space. More information about the assemblies and annotations is available at https://github.com/human-pangenomics/HPP_Year1_Assemblies
- CAT gene annotations:
  - gene_CDS.gaf.gz: CDS regions only, for all haplotypes
  - gene_exon.gaf.gz: exon regions only, for all haplotypes
- RepeatMasker annotation:
  - rm.gaf.gz
- Tandem repeats from trf:
  - trf.gaf.gz
ATAC-seq coverage tracks for 7 ENCODE tissues
For each tissue, there is an indexed GAF file (<tissue>.cov.gaf.gz, <tissue>.cov.gaf.gz.tbi), a short table summary of the paths (<tissue>.cov.sum.tsv), and the number of paths in the GAF (<tissue>.gaf.wc.txt).
The seven tissues are:
- breast_epithelium
- gastrocnemius_medialis
- gastroesophageal_sphincter
- PeyersPatch
- sigmoid_colon
- spleen
- thyroid_gland
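The per-tissue naming pattern above can be expanded programmatically, for instance to check that a download is complete. This is only a sketch: the helper names and the idea of a local download directory are assumptions, while the file name patterns are copied from the description above.

```python
from pathlib import Path
from typing import Dict, List

# The seven ENCODE tissues, spelled as in the deposited file names.
TISSUES = [
    "breast_epithelium",
    "gastrocnemius_medialis",
    "gastroesophageal_sphincter",
    "PeyersPatch",
    "sigmoid_colon",
    "spleen",
    "thyroid_gland",
]


def tissue_files(tissue: str) -> Dict[str, str]:
    """Return the four expected file names for one tissue (hypothetical helper)."""
    return {
        "gaf": f"{tissue}.cov.gaf.gz",
        "index": f"{tissue}.cov.gaf.gz.tbi",
        "summary": f"{tissue}.cov.sum.tsv",
        "path_count": f"{tissue}.gaf.wc.txt",
    }


def missing_files(directory: str) -> List[str]:
    """List expected file names that are absent from a local directory."""
    root = Path(directory)
    return [
        name
        for tissue in TISSUES
        for name in tissue_files(tissue).values()
        if not (root / name).exists()
    ]
```

Calling `missing_files(".")` after downloading would return an empty list if all 28 files are present in the working directory.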
Variants
Three types of genomic variants were projected to the pangenome:
- eQTLs.gaf.gz: eQTLs from GTEx v8
- gwasCatalog.hprc-v1.1-mc-grch38.sorted.gaf.gz: the GWAS Catalog
- HG002.gt.min30bp.sorted.gaf.gz: structural variant calls from vg call (projected from the HG002.gt.min30bp.vcf.gz VCF)