HarmonizationSCANVI

May 27, 2019

  • Reproducing results in the "Harmonization and Annotation of Single-cell Transcriptomics data with Deep Generative Models" paper
  • Demonstration of how to use scVI and scANVI for the harmonization and annotation problem

Contact

chenlingantelope [at] berkeley [dot] edu

Datasets

| Analysis | Associated Script | Datasets | Technology | Number of Cells | Ref. |
| --- | --- | --- | --- | --- | --- |
| Figure 2: Benchmark | PBMC8KCITE.py | PBMC-8K; PBMC-CITE | 10x | 8,381; 7,667 | 10x Datasets; Stoeckius, Marlon, et al. 2017 |
| Supplementary Figure 2: UMAP Visualization | PBMC8KCITE.py | PBMC-8K; PBMC-CITE | 10x | 8,381; 7,667 | 10x Datasets; Stoeckius, Marlon, et al. 2017 |
| Figure 2: Benchmark | MarrowTM.py; Tech1.pretty.ipynb | MarrowTM-10x; MarrowTM-ss2 | 10x; SmartSeq2 | 4,112; 5,351 | Quake, Stephen R., et al. 2018 |
| Supplementary Figure 1: Robustness Analysis for Hyperparameter Choice | Robustness_study.ipynb | MarrowTM-10x; MarrowTM-ss2 | 10x; SmartSeq2 | 4,112; 5,351 | Quake, Stephen R., et al. 2018 |
| Supplementary Figure 3: UMAP Visualization | MarrowTM.py | MarrowTM-10x; MarrowTM-ss2 | 10x; SmartSeq2 | 4,112; 5,351 | |
| Figure 2: Benchmark | Pancreas.py | Pancreas-InDrop; Pancreas-CEL-Seq2 | inDrop; CEL-Seq2 | 8,569; 2,449 | Baron, Maayan, et al. 2016; Muraro, Mauro J., et al. 2016 |
| Supplementary Figure 4: UMAP Visualization | Pancreas.py | Pancreas-InDrop; Pancreas-CEL-Seq2 | inDrop; CEL-Seq2 | 8,569; 2,449 | Baron, Maayan, et al. 2016; Muraro, Mauro J., et al. 2016 |
| Figure 2: Benchmark | DentateGyrus.py | DentateGyrus-10x; DentateGyrus-C1 | 10x; Fluidigm C1 | 5,454; 2,303 | Hochgerner, Hannah, et al. 2018 |
| Supplementary Figure 5: UMAP Visualization | DentateGyrus.py | DentateGyrus-10x; DentateGyrus-C1 | 10x; Fluidigm C1 | 5,454; 2,303 | Hochgerner, Hannah, et al. 2018 |
| Figure 3: Robustness Analysis by subsampling cells, Supplementary Figure 10 | NoOverlapSCANVI.py; PopRemoveSCANVI.py; SCANVI_posterior-NoOverlap.ipynb; SCANVI_posterior_poprm.ipynb | PBMC-8K; PBMC-CITE | 10x | 8,381; 7,667 | 10x Datasets; Stoeckius, Marlon, et al. 2017 |
| Figure 4: Continuous Trajectory, Supplementary Figure 6: UMAP | continuous.ipynb | HEMATO-Tusi; HEMATO-Paul | inDrop; MARS-seq | 4,016; 2,730 | Tusi, Betsabeh Khoramian, et al. 2018; Paul, Franziska, et al. 2015 |
| Figure 5: External Validation by Experimentally Derived Labels, Supplementary Figure 11 | harmonization-CitePure-SCANVI.ipynb | PBMC-68K; PBMC-Sorted; PBMC-CITE | 10x | 68,579; 94,655; 7,667 | Zheng, Grace XY, et al. 2017; Stoeckius, Marlon, et al. 2017 |
| Figure 6: Semi-Supervised Annotation of T Cell Subtypes, Supplementary Figure 12 | SCANVI-mild-annot-Clustering.ipynb | PBMC-Sorted T cell Subtypes | 10x | 42,919 | Zheng, Grace XY, et al. 2017; Stoeckius, Marlon, et al. 2017 |
| Hierarchical Semi-Supervised Annotation | Hierarchical.ipynb | CORTEX | 10x | 160,796 | Zeisel, Amit, et al. "Molecular architecture of the mouse nervous system." bioRxiv (2018): 294918. |
| Supplementary Figure 7: Scalability Analysis | scanorama.ipynb | SCANORAMA | Mixed | 105,476 | Hie, Brian L., Bryan Bryson, and Bonnie Berger. "Panoramic stitching of heterogeneous single-cell transcriptomic data." bioRxiv (2018): 371179. |
| Supplementary Figure 13: Differential Expression | DE-final.ipynb | PBMC-8K; PBMC-68K | 10x | 8,381; 68,579 | 10x Datasets; Zheng, Grace XY, et al. 2017 |
  • Supplementary Figures 2, 3, 4, 5, 8, and 9 are generated by scripts in Additional_Scripts/ (including scanvi_acc.R, KNNcurves.py, and BE_curves.py), which take the output of the analysis Python scripts as input.
  • Boxplots for Figure 3 are generated using poprm_boxplot.R in Additional_Scripts/.
  • Additional_Scripts/ also contains code for running Seurat directly from the command line (runSeurat.R and SeuratPCA.R).
  • All .gmt files in Additional_Scripts/ are gene signatures.
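
For readers who want to inspect the gene signatures outside the provided scripts, the following is a minimal sketch of a reader for the standard GMT layout (one gene set per line, tab-separated: set name, description, then the genes). It assumes the .gmt files in Additional_Scripts/ follow that standard layout; the function name `read_gmt` and the file name in the usage note are hypothetical and not part of this repository.

```python
from typing import Dict, List


def read_gmt(path: str) -> Dict[str, List[str]]:
    """Parse a GMT file into {gene_set_name: [gene, ...]}.

    Assumes the standard GMT layout: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets: Dict[str, List[str]] = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 3:
                # Blank line, or a set with a name and description but no genes.
                continue
            name = fields[0]
            genes = fields[2:]  # fields[1] is the free-text description
            # Drop empty entries left behind by trailing tabs.
            gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets
```

A call such as `read_gmt("Additional_Scripts/example.gmt")` (hypothetical file name) would return a dictionary keyed by signature name. If a file repeats a set name, the later entry overwrites the earlier one in this sketch.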

Installation

  • Clone the GitHub repository, install the dependencies, and call functions from the scVI modules.
  • Installation takes less than 10 minutes.

Requirements

  • PyTorch v0.4.1
  • Python 3
  • scikit-learn v0.19.1
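
Because the pinned versions are old, it can help to confirm what is actually installed before running anything. The sketch below compares installed package versions against the two pins listed above. It assumes the packages are installed under the distribution names `torch` and `scikit-learn`; the helper names `find_mismatches` and `installed_versions` are hypothetical. Post-releases such as `0.4.1.post2` are treated as matching, which is a choice made here rather than something the repository specifies.

```python
# Versions pinned in the Requirements list above.
REQUIRED = {"torch": "0.4.1", "scikit-learn": "0.19.1"}


def find_mismatches(required, installed):
    """Return {package: (wanted, found)} for every package that does not match.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)
        matches = found is not None and (
            found == wanted or found.startswith(wanted + ".")
        )
        if not matches:
            mismatches[name] = (wanted, found)
    return mismatches


def installed_versions(names):
    """Look up installed versions; packages that are missing map to None."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Older interpreters lack importlib.metadata; fall back to setuptools.
        import pkg_resources

        PackageNotFoundError = pkg_resources.DistributionNotFound

        def version(name):
            return pkg_resources.get_distribution(name).version

    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    problems = find_mismatches(REQUIRED, installed_versions(REQUIRED))
    for name, (wanted, found) in problems.items():
        print("{}: wanted {}, found {}".format(name, wanted, found))
```

An empty result from `find_mismatches` means both pins are satisfied.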

Instructions

  • To reproduce a result from the paper, find the corresponding row in the Datasets table and use the listed Python notebook (located in notebooks/) or Python script (located in the root directory).
  • Download the relevant datasets, except those already wrapped in the scVI package: PBMC-8K, PBMC-CITE, PBMC-68K, PBMC-Sorted, MarrowTM-10x, and MarrowTM-ss2 can be loaded directly with the dataloader functions.
  • Annotation files that we generated when the original study did not provide annotations (cite.seurat.labels) can be found in the scvi-data repository.
  • Run the analysis; the results should match those in the paper.
  • This repository contains functions written specifically to produce some of the analyses in this paper. For a more up-to-date package, refer to the main scVI repository.
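
The file layout described above (notebooks in notebooks/, scripts in the root directory) can be turned into a small helper that builds the command for a given entry in the Datasets table. This is only a sketch under two assumptions that the README does not confirm: that the scripts run without command-line arguments, and that executing notebooks non-interactively with `jupyter nbconvert --execute` is acceptable. The helper names `locate` and `build_command` are hypothetical.

```python
import os
import sys


def locate(name, repo_root="."):
    """Return the expected path: notebooks/ for .ipynb files, repo root otherwise."""
    if name.endswith(".ipynb"):
        return os.path.join(repo_root, "notebooks", name)
    return os.path.join(repo_root, name)


def build_command(name, repo_root="."):
    """Build (but do not run) the command line for one script or notebook."""
    path = locate(name, repo_root)
    if name.endswith(".ipynb"):
        # Assumption: batch execution through nbconvert instead of the Jupyter UI.
        return ["jupyter", "nbconvert", "--to", "notebook", "--execute", path]
    if name.endswith(".py"):
        # Assumption: the script takes no command-line arguments.
        return [sys.executable, path]
    raise ValueError("expected a .py or .ipynb file, got {!r}".format(name))


if __name__ == "__main__":
    # Print the commands for two entries from the Datasets table.
    for entry in ("Pancreas.py", "continuous.ipynb"):
        print(" ".join(build_command(entry)))
```

The returned list can be passed to `subprocess.run(..., check=True)` once the datasets are in place. Keeping command construction separate from execution makes it easy to review what would run first.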