August 19, 2022

Tutorial Handbook

Input data format

Bulk2Space requires five formatted data files as input:

  1. Bulk-seq Normalized Data: a .csv file with genes as rows and a single sample as the column
|       | Sample |
| ----- | ------ |
| Gene1 | 5.22   |
| Gene2 | 3.67   |
| ...   | ...    |
| GeneN | 15.76  |

  2. Single Cell RNA-seq Normalized Data: a .csv file with genes as rows and cells as columns
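|       | Cell1 | Cell2 | Cell3 | ... | CellN |
| ----- | ----- | ----- | ----- | --- | ----- |
| Gene1 | 1.05  | 2.31  | 1.72  | ... | 0     |
| Gene2 | 4.71  | 1.07  | 0     | ... | 4.22  |
| ...   | ...   | ...   | ...   | ... | ...   |
| GeneN | 0.55  | 0     | 1.48  | ... | 0     |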

  3. Single Cell RNA-seq Annotation Data: a .csv file with cell ID and cell type annotation columns.
    • The column containing cell IDs should be named Cell
    • The column containing the cell type labels should be named Cell_type
|       | Cell  | Cell_type |
| ----- | ----- | --------- |
| Cell1 | Cell1 | T cell    |
| Cell2 | Cell2 | B cell    |
| ...   | ...   | ...       |
| CellN | CellN | Monocyte  |

  4. Spatial Transcriptomics Normalized Data: a .csv file with genes as rows and cells (or spots) as columns
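|       | Cell1 / Spot1 | Cell2 / Spot2 | ... | CellN / SpotN |
| ----- | ------------- | ------------- | --- | ------------- |
| Gene1 | 3.22          | 4.71          | ... | 1.01          |
| Gene2 | 0             | 2.17          | ... | 2.20          |
| ...   | ...           | ...           | ... | ...           |
| GeneN | 0             | 0.11          | ... | 1.61          |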

  5. Spatial Transcriptomics Coordinates Data: a .csv file with cell/spot ID and coordinate columns.
    • The columns containing the coordinates should be named xcoord and ycoord
    • For spot-based data, the column containing spot IDs should be named Spot
    • For image-based data, the column containing cell IDs should be named Cell
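|                 | Spot (or Cell)  | xcoord | ycoord |
| --------------- | --------------- | ------ | ------ |
| Cell_1 / Spot_1 | Cell_1 / Spot_1 | 1.2    | 5.2    |
| Cell_2 / Spot_2 | Cell_2 / Spot_2 | 5.4    | 4.3    |
| ...             | ...             | ...    | ...    |
| Cell_n / Spot_n | Cell_n / Spot_n | 11.3   | 6.3    |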
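Before running Bulk2Space, it can help to check that the five files agree with one another. The sketch below uses only the Python standard library and assumes that each file's first column holds the row labels under an empty header cell (the layout pandas `DataFrame.to_csv` writes for an unnamed index). The helpers `read_matrix`, `read_table` and `check_inputs` are hypothetical and not part of Bulk2Space. They check the column conventions listed above, confirm that every scRNA-seq cell has an annotation and every ST cell/spot has coordinates, and return the genes shared by the bulk, scRNA-seq and ST matrices.

```python
import csv
import io
from collections import Counter


def read_matrix(text):
    """Parse a genes-as-rows CSV into (column IDs, {gene: [values]})."""
    rows = list(csv.reader(io.StringIO(text)))
    # Assumption: the first column holds gene names under an empty header cell.
    columns = rows[0][1:]
    genes = {row[0]: [float(v) for v in row[1:]] for row in rows[1:] if row}
    return columns, genes


def read_table(text, required):
    """Parse a CSV with a header row; fail if a required column is missing."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(required) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def check_inputs(bulk, sc_data, sc_meta, st_data, st_meta):
    """Basic consistency checks on the five input files, given as CSV text.

    Returns the genes shared by all three matrices and the cell-type counts.
    """
    bulk_cols, bulk_genes = read_matrix(bulk)
    if len(bulk_cols) != 1:
        raise ValueError(f"bulk data needs 1 sample column, found {len(bulk_cols)}")

    sc_cells, sc_genes = read_matrix(sc_data)
    meta = read_table(sc_meta, ["Cell", "Cell_type"])
    unannotated = set(sc_cells) - {row["Cell"] for row in meta}
    if unannotated:
        raise ValueError(f"cells without a Cell_type: {sorted(unannotated)}")

    st_ids, st_genes = read_matrix(st_data)
    # Spot-based data names its ID column Spot; image-based data uses Cell.
    header = next(csv.reader(io.StringIO(st_meta)))
    id_col = "Spot" if "Spot" in header else "Cell"
    coords = read_table(st_meta, [id_col, "xcoord", "ycoord"])
    unplaced = set(st_ids) - {row[id_col] for row in coords}
    if unplaced:
        raise ValueError(f"ST columns without coordinates: {sorted(unplaced)}")

    shared = set(bulk_genes) & set(sc_genes) & set(st_genes)
    cell_types = Counter(row["Cell_type"] for row in meta)
    return sorted(shared), cell_types
```

To use it on real files, pass each file's contents (for example, `pathlib.Path(p).read_text()`). The handbook does not say how Bulk2Space handles genes missing from some inputs, so a small shared-gene list is worth investigating before training.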

Parameter description

  • Decompose bulk transcriptomics data into single-cell transcriptomics data:
from bulk2space import Bulk2Space
model = Bulk2Space()

# Decompose bulk transcriptomics data into single-cell transcriptomics data.
# The input_* arguments are paths to the five .csv files described in "Input data format".
generate_sc_meta, generate_sc_data = model.train_vae_and_generate(
    input_bulk_path,
    input_sc_data_path,
    input_sc_meta_path,
    input_st_data_path,
    input_st_meta_path,
    ratio_num=1,
    top_marker_num=500,
    gpu=0,
    batch_size=512,
    learning_rate=1e-4,
    hidden_size=256,
    epoch_num=5000,
    vae_save_dir='save_model',
    vae_save_name='vae',
    generate_save_dir='output',
    generate_save_name='output')
| Parameter | Description | Default value |
| --- | --- | --- |
| input_bulk_path | Path to the bulk-seq data file (.csv) | None |
| input_sc_data_path | Path to the scRNA-seq data file (.csv) | None |
| input_sc_meta_path | Path to the scRNA-seq annotation file (.csv) | None |
| input_st_data_path | Path to the ST data file (.csv) | None |
| input_st_meta_path | Path to the ST metadata (coordinates) file (.csv) | None |
| ratio_num | Multiplier for the number of cells in the generated scRNA-seq data | (int) 1 |
| top_marker_num | Number of marker genes used for each cell type | (int) 500 |
| gpu | GPU ID; the CPU is used if gpu < 0 | (int) 0 |
| batch_size | Batch size for β-VAE model training | (int) 512 |
| learning_rate | Learning rate for β-VAE model training | (float) 0.0001 |
| hidden_size | Hidden size of the β-VAE model | (int) 256 |
| epoch_num | Number of epochs for β-VAE model training | (int) 5000 |
| vae_save_dir | Directory in which to save the trained β-VAE model | (str) save_model |
| vae_save_name | File name of the trained β-VAE model | (str) vae |
| generate_save_dir | Directory in which to save the generated scRNA-seq data | (str) output |
| generate_save_name | File name of the generated scRNA-seq data | (str) output |
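The table describes top_marker_num only as the number of marker genes used for each cell type; it does not say how markers are ranked. As a hypothetical illustration of what such a cutoff does, the sketch below scores every gene per cell type by a simple one-vs-rest difference in mean expression and keeps the top N. Bulk2Space may rank markers with a different statistic, and `top_markers` is not a Bulk2Space function.

```python
from statistics import mean


def top_markers(expr, labels, top_n):
    """Hypothetical marker ranking: top_n genes per cell type by mean difference.

    expr:   {gene: [expression per cell]}, cells in the same order as labels
    labels: cell-type label of each cell
    """
    markers = {}
    for cell_type in sorted(set(labels)):
        inside = [j for j, lab in enumerate(labels) if lab == cell_type]
        outside = [j for j, lab in enumerate(labels) if lab != cell_type]
        scores = {}
        for gene, values in expr.items():
            # One-vs-rest score: mean in this type minus mean in all other cells.
            rest = mean(values[j] for j in outside) if outside else 0.0
            scores[gene] = mean(values[j] for j in inside) - rest
        # Highest score first; ties broken by gene name so results are reproducible.
        ranked = sorted(scores, key=lambda g: (-scores[g], g))
        markers[cell_type] = ranked[:top_n]
    return markers
```

With the default top_marker_num=500, each cell type would keep its 500 highest-scoring genes under this scheme.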

  • Decompose spatial barcoding-based spatial transcriptomics data (10x Genomics, ST, or Slide-seq, etc.) into spatially resolved single-cell transcriptomics data:
from bulk2space import Bulk2Space
model = Bulk2Space()

# Decompose spatial barcoding-based spatial transcriptomics data
# (10x Genomics, ST, or Slide-seq, etc.) into spatially resolved
# single-cell transcriptomics data.
# generate_sc_meta and generate_sc_data are the outputs of train_vae_and_generate.
df_meta, df_data = model.train_df_and_spatial_deconvolution(
    generate_sc_meta,
    generate_sc_data,
    input_st_data_path,
    input_st_meta_path,
    spot_num=500,
    cell_num=10,
    df_save_dir='save_model',
    df_save_name='df',
    map_save_dir='output', 
    map_save_name='deconvolution',
    top_marker_num=500,
    marker_used=True,
    k=10)
| Parameter | Description | Default value |
| --- | --- | --- |
| generate_sc_meta | Generated scRNA-seq metadata | None |
| generate_sc_data | Generated scRNA-seq data | None |
| input_st_data_path | Path to the ST data file (.csv) | None |
| input_st_meta_path | Path to the ST metadata (coordinates) file (.csv) | None |
| spot_num | Number of pseudo-spots used to train the deep forest model | (int) 500 |
| cell_num | Number of cells per pseudo-spot used to train the deep forest model | (int) 10 |
| df_save_dir | Directory in which to save the trained deep forest model | (str) save_model |
| df_save_name | File name of the trained deep forest model | (str) df |
| map_save_dir | Directory in which to save the deconvolved ST data | (str) output |
| map_save_name | File name of the deconvolved ST data | (str) deconvolution |
| top_marker_num | Number of marker genes used for each cell type | (int) 500 |
| marker_used | Whether to use only the marker genes of each cell type | (bool) True |
| k | Number of cells per spot | (int) 10 |
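spot_num and cell_num describe the pseudo-spot data used to train the deep forest model, but the handbook does not spell out how those pseudo-spots are built. A common approach in spot-deconvolution tools is to sum the expression profiles of randomly drawn single cells and record each spot's cell-type fractions as its training label; the sketch below shows that idea under this assumption. `make_pseudo_spots` is a hypothetical helper, not Bulk2Space's implementation.

```python
import random


def make_pseudo_spots(expr_by_cell, labels, spot_num, cell_num, seed=0):
    """Hypothetical pseudo-spot builder: sum randomly drawn single cells.

    expr_by_cell: list of per-cell expression vectors (same gene order).
    labels:       cell-type label of each cell.
    Returns a list of (spot_expression, cell_type_fractions) pairs.
    """
    rng = random.Random(seed)  # seeded so the training set is reproducible
    n_genes = len(expr_by_cell[0])
    spots = []
    for _ in range(spot_num):
        # Draw cell_num cells with replacement to form one pseudo-spot.
        picks = [rng.randrange(len(expr_by_cell)) for _ in range(cell_num)]
        total = [0.0] * n_genes
        fractions = {}
        for j in picks:
            for g, value in enumerate(expr_by_cell[j]):
                total[g] += value
            # Each drawn cell contributes 1/cell_num to its type's fraction.
            fractions[labels[j]] = fractions.get(labels[j], 0.0) + 1.0 / cell_num
        spots.append((total, fractions))
    return spots
```

With the defaults above (spot_num=500, cell_num=10), this scheme would yield 500 training spots of 10 cells each, with the generated scRNA-seq data as the cell pool.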

  • Map image-based spatial transcriptomics data (MERFISH, seqFISH, or STARmap, etc.) into spatially resolved single-cell transcriptomics data:
from bulk2space import Bulk2Space
model = Bulk2Space()

# Map image-based spatial transcriptomics data (MERFISH, seqFISH, or STARmap, etc.)
# into spatially resolved single-cell transcriptomics data
df_meta, df_data = model.spatial_mapping(
    generate_sc_meta,
    generate_sc_data,
    input_st_data_path,
    input_st_meta_path)
| Parameter | Description | Default value |
| --- | --- | --- |
| generate_sc_meta | Generated scRNA-seq metadata | None |
| generate_sc_data | Generated scRNA-seq data | None |
| input_st_data_path | Path to the ST data file (.csv) | None |
| input_st_meta_path | Path to the ST metadata (coordinates) file (.csv) | None |
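The handbook does not describe how spatial_mapping pairs generated single cells with the cells of an image-based ST dataset. Purely to illustrate the general idea, the sketch below assigns each ST cell the generated cell whose expression profile over the shared genes has the highest cosine similarity. This nearest-neighbour rule is an assumption, Bulk2Space's actual method may differ, and `cosine` and `map_cells` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def map_cells(sc_cells, st_cells):
    """Assign each ST cell the ID of its most similar generated single cell.

    sc_cells, st_cells: {ID: expression vector}, all over the same shared genes.
    """
    return {
        st_id: max(sc_cells, key=lambda sc_id: cosine(sc_cells[sc_id], st_vec))
        for st_id, st_vec in st_cells.items()
    }
```

Because each ST cell is matched independently, several ST cells can map to the same generated cell under this rule.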