
December 11, 2018

List of gene lists

Often in bioinformatics we want a list of genes so that we can ask, "are genes in this list more X than other genes?" or "are genes in this list enriched in this other list?" and so on. There are many useful lists out there, but many of them are in an Excel file supplement to a paper, or an XML format with loads of other info you don't need, or use outdated gene symbols. For one reason or another, it often takes a lot of work to wrestle them into a format you can use. This repository is the MacArthur Lab's effort to collect all the lists we find useful into one place, with each formatted as just a single-column text file listing the current gene symbols.
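
As a rough illustration of the "is this list enriched in that list?" question, here is a minimal Python sketch. It assumes only what is stated above: each list is a single-column text file with one gene symbol per line. The function names and the placeholder symbols are hypothetical, and the one-sided hypergeometric test is just one common way to score an overlap, not necessarily the approach used by the lab.

```python
from math import comb
from pathlib import Path


def read_gene_list(path):
    """Read a single-column text file of gene symbols into a set."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def enrichment_p_value(universe, list_a, list_b):
    """Return (overlap size, one-sided hypergeometric P value).

    The P value is the chance of an overlap at least this large if
    list_b were a random draw of the same size from the universe.
    """
    # Restrict both lists to the universe so the counts are comparable.
    a = list_a & universe
    b = list_b & universe
    n_total, n_a, n_b = len(universe), len(a), len(b)
    observed = len(a & b)
    # Sum the upper tail of the hypergeometric distribution exactly.
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(observed, min(n_a, n_b) + 1)
    )
    return observed, tail / comb(n_total, n_b)


# Toy example with placeholder symbols (not real data).
toy_universe = {f"GENE{i}" for i in range(1, 21)}
toy_a = {"GENE1", "GENE2", "GENE3", "GENE4"}
toy_b = {"GENE3", "GENE4", "GENE5"}
overlap, p_value = enrichment_p_value(toy_universe, toy_a, toy_b)
print(f"overlap={overlap}, P={p_value:.4f}")
```

With real data, you would pass the sets returned by `read_gene_list` for the universe file and the two lists of interest. Check the repository for the actual file paths, which are not assumed here.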

Here is a guide to the lists we currently have in this repo:

| List | Count | Description | Please cite |
| --- | --- | --- | --- |
| Universe | 19,194 | Approved symbols for 18,991 protein-coding genes according to HGNC as of Feb 9, 2015. For details see src/create_universe.bash. This list is the "universe" of which all subsequent lists are subsets. | See genenames.org/about/overview. Users are asked to web reference "HUGO Gene Nomenclature Committee at the European Bioinformatics Institute" (http://www.genenames.org/) if possible. |
| FDA-approved drug targets | 385 | Genes whose protein products are known to be the mechanistic targets of FDA-approved drugs (updated 2018-09-13). For details on the exact criteria we used for inclusion in this list, see src/drug_targets.py. | See drugbank.ca/about. Please cite [Law 2014, Knox 2011, Wishart 2008, Wishart 2006, and/or Wishart 2018]. |
| Drug targets by Nelson et al 2012 | 201 | Drug targets according to Nelson et al 2012, with reference to Russ & Lampel 2005. | [Nelson 2012, Russ & Lampel 2005] |
| Autosomal dominant genes by Blekhman et al 2008 | 307 | OMIM disease genes deemed to follow autosomal dominant inheritance according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
| Autosomal dominant genes by Berg et al 2013 | 631 | OMIM disease genes (as of June 2011) deemed to follow autosomal dominant inheritance according to Berg et al, 2013. | [Berg 2013] |
| Autosomal recessive genes by Blekhman et al 2008 | 527 | OMIM disease genes deemed to follow autosomal recessive inheritance according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
| Autosomal recessive genes by Berg et al 2013 | 1073 | OMIM disease genes (as of June 2011) deemed to follow autosomal recessive inheritance according to Berg et al, 2013. | [Berg 2013] |
| X-linked genes by Blekhman et al 2008 | 66 | OMIM disease genes deemed to follow X-linked inheritance (dominant/recessive not specified) according to extensive manual curation by Molly Przeworski's group. | [Blekhman 2008] |
| X-linked recessive genes by Berg et al 2013 | 102 | OMIM disease genes (as of June 2011) deemed to follow X-linked recessive inheritance according to Berg et al, 2013. | [Berg 2013] |
| X-linked dominant genes by Berg et al 2013 | 34 | OMIM disease genes (as of June 2011) deemed to follow X-linked dominant inheritance according to Berg et al, 2013. | [Berg 2013] |
| X-linked ClinVar genes | 61 | X chromosome genes in the August 6, 2015 ClinVar release that have at least 3 reportedly pathogenic, non-conflicted variants in ClinVar with at least one submitter other than OMIM or GeneReviews. Code here. | Cite the ClinVar paper [Landrum 2014] |
| All dominant genes | 709 | Currently the union of the Berg and Blekhman dominant lists; we may add more lists later. | [Blekhman 2008, Berg 2013] |
| All recessive genes | 1183 | Currently the union of the Berg and Blekhman recessive lists; we may add more lists later. | [Blekhman 2008, Berg 2013] |
| Homozygous LoF tolerant | 330 | Genes with at least two different high-confidence LoF variants found in a homozygous state in at least one individual in ExAC. By Konrad Karczewski. | Just cite the ExAC paper [Lek 2016] |
| Essential in culture | 283 | Genes deemed essential in multiple cultured cell lines based on shRNA screen data. | [Hart 2014] |
| Essential in culture (CRISPR screening) | 683 | Genes deemed essential in multiple cultured cell lines based on CRISPR/Cas screen data. | [Hart 2017] |
| Non-essential in culture (CRISPR screening) | 913 | Genes deemed non-essential in multiple cultured cell lines based on CRISPR/Cas screen data. | [Hart 2017] |
| Essential in mice | 2,454 | Genes where homozygous knockout in mice results in pre-, peri- or post-natal lethality. The mouse phenotypes were reported by Jackson Labs [Blake 2011], then the essential gene list was extracted via manual review of phenotypes by [Georgi 2013], and the essential/non-essential flag was put into dbNSFP [Liu 2013]. We extracted the genes from dbNSFP. | [Blake 2011, Georgi 2013, and Liu 2013] |
| Genes nearest to GWAS peaks | 6,336 | Closest gene to GWAS hits with P < 5e-8 in the NHGRI GWAS catalog (MAPPED_GENE column) as of Sep 13, 2018. | [MacArthur 2017] |
| DNA Repair Genes, WoodRD | 178 | An updated inventory of human DNA repair genes (last modified on Tuesday 15th April 2014). For details see src/DRG_WoodRD.R. | Cite [Wood 2005] and include a web reference to this URL. |
| DNA Repair Genes, KangJ | 151 | Supplementary Table 1. 151 DNA repair genes. DNA repair genes from DNA repair pathways: ATM, BER, FA/HR, MMR, NHEJ, NER, TLS, XLR, RECQ, and other. | Cite [Kang 2012] |
| ClinGen haploinsufficient genes | 294 | Genes with sufficient evidence for dosage pathogenicity (level 3) as determined by the ClinGen Dosage Sensitivity Map as of Sep 13, 2018. | Cite [Rehm 2015]. See also ClinGen's TOU |
| Olfactory receptors | 371 | Olfactory receptors from the Mainland 2015 data release. | [Mainland 2015] |
| Genes with any disease association reported in ClinVar | 3078 | Using this simple script, we downloaded the ClinVar tab-delimited summary as of May 12, 2015, and took all gene symbols for which there is at least one variant with an assertion of pathogenic or likely pathogenic in ClinVar. | Cite the ClinVar paper [Landrum 2014] |
| Kinases | 347 | From UniProt's pkinfam list. | [UniProt Consortium 2018], and also according to UniProt this list is based on 3 publications: [Hunter 2000, Manning 2002, Miranda-Saavedra & Barton 2007] |
| GPCRs from guidetopharmacology | 391 | GPCR list from guidetopharmacology.org. | Citing instructions here; for GPCRs, cite [Alexander 2017 & Harding 2018]. |
| GPCRs from UniProt | 756 | This query of the UniProt database. | [UniProt Consortium 2018] |
| GPCRs all | 759 | Union of the above two lists. | See previous two entries |
| Natural product targets | 37 | List of hand-curated targets of natural products from the supplement of [Dancik 2010]. | [Dancik 2010] |
| BROCA - Cancer Risk Panel | 66 | BROCA is useful for the evaluation of patients with a suspected hereditary cancer predisposition, with a focus on syndromes that include breast or ovarian cancer as one of the cancer types. Depending on the causative gene involved, these cancers may co-occur with other cancer types (such as colorectal, endometrial, pancreatic, endocrine, or melanoma). | University of Washington |
| ACMG V2.0 | 59 | The minimum list of genes to be reported as incidental or secondary findings as published by the American College of Medical Genetics and Genomics (ACMG). | [Kalia 2017] |
| GPI-anchored proteins | 135 | Gene symbols encoding proteins annotated by UniProt as being GPI-anchored. | Cite the latest UniProt paper: [UniProt Consortium 2017] |

We welcome pull requests that add new lists, provided they are licensed for redistribution. If possible, please provide the source code used to extract the list from its original source, and an appropriate description for this README.
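
Several entries in the guide are derived from other lists (the "All dominant", "All recessive", and "GPCRs all" lists are unions), and every list is meant to be a subset of the universe. The sketch below shows one way such a derived list could be built and checked before it is contributed. All function names are hypothetical and are not taken from the scripts in src/.

```python
def union_of_lists(*gene_lists):
    """Return the sorted union of several collections of gene symbols."""
    merged = set()
    for genes in gene_lists:
        merged.update(genes)
    return sorted(merged)


def symbols_outside_universe(genes, universe):
    """Return the symbols that are missing from the universe, sorted."""
    return sorted(set(genes) - set(universe))


def format_single_column(genes):
    """Render symbols one per line, matching the single-column layout."""
    return "\n".join(genes) + "\n"
```

An empty result from `symbols_outside_universe` suggests the new list uses only symbols present in the universe. Any symbols it returns may be outdated or non-protein-coding and are worth reviewing by hand.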