handbook.md

August 19, 2022 · View on GitHub

Tutorial Handbook

Input data format

Bulk2Space requires five formatted data as input:

Bulk-seq Normalized Data: a .csv file with genes as rows and one sample as column

	Sample
Gene1	5.22
Gene2	3.67
...	...
GeneN	15.76

Single Cell RNA-seq Normalized Data: a .csv file with genes as rows and cells as columns

	Cell1	Cell2	Cell3	...	CellN
Gene1	1.05	2.31	1.72	...	0
Gene2	4.71	1.07	0	...	4.22
...	...	...	...	...	...
GeneN	0.55	0	1.48	...	0

Single Cell RNA-seq Annotation Data: a .csv file with cell ID and celltype annotation columns.
- The column containing cell ID should be named Cell
- the column containing the labels should be named Cell_type

	Cell	Cell_type
Cell1	Cell1	T cell
Cell2	Cell2	B cell
...	...	...
CellN	CellN	Monocyte

Spatial Transcriptomics Normalized Data: a .csv file with genes as rows and cells (or spots) as columns

	Cell1 / Spot1	Cell2 / Spot2	...	CellN / SpotN
Gene1	3.22	4.71	...	1.01
Gene2	0	2.17	...	2.20
...	...	...	...	...
GeneN	0	0.11	...	1.61

Spatial Transcriptomics Coordinates Data: a .csv with cell/spot ID and coordinates columns.
- The column containing the coordinates should be named xcoord and ycoord
- For spot-based data, the column containing spot ID should be named Spot
- For image-based data, the column containing cell ID should be named Cell

	Spot (or Cell)	xcoord	ycoord
Cell_1 / Spot_1	Cell_1 / Spot_1	1.2	5.2
Cell_2 / Spot_2	Cell_1 / Spot_1	5.4	4.3
...	...	...	...
Cell_n / Spot_n	Cell_1 / Spot_1	11.3	6.3

Parameter description

Decompose bulk transcriptomics data into single-cell transcriptomics data:

from bulk2space import Bulk2Space
model = Bulk2Space()

# Decompose bulk transcriptomics data into single-cell transcriptomics data
generate_sc_meta, generate_sc_data = model.train_vae_and_generate(
    input_bulk_path,
    input_sc_data_path,
    input_sc_meta_path,
    input_st_data_path,
    input_st_meta_path,
    ratio_num=1,
    top_marker_num=500,
    gpu=0,
    batch_size=512,
    learning_rate=1e-4,
    hidden_size=256,
    epoch_num=5000,
    vae_save_dir='save_model',
    vae_save_name='vae',
    generate_save_dir='output',
    generate_save_name='output')

Parameter	Description	Default Value
input_bulk_path	Path to bulk-seq data files (.csv)	None
input_sc_data_path	Path to scRNA-seq data files (.csv)	None
input_sc_meta_path	Path to scRNA-seq annotation files (.csv)	None
input_st_data_path	Path to ST data files (.csv)	None
input_st_meta_path	Path to ST metadata files (.csv)	None
ratio_num	The multiples of the number of cells of generated scRNA-seq data	(int) `1`
top_marker_num	The number of marker genes of each celltype used	(int) `500`
gpu	The GPU ID. Use cpu if `--gpu < 0`	(int) `0`
batch_size	The batch size for β-VAE model training	(int) `512`
learning_rate	The learning rate for β-VAE model training	(float) `0.0001`
hidden_size	The hidden size of β-VAE model	(int) `256`
epoch_num	The epoch number for β-VAE model training	(int) `5000`
vae_save_dir	Path to save the trained β-VAE model	(str) `save_model`
vae_save_name	File name of the trained β-VAE model	(str) `vae`
generate_save_dir	Path to save the generated scRNA-seq data	(str) `output`
generate_save_name	File name of the generated scRNA-seq data	(str) `output`

Decompose spatial barcoding-based spatial transcriptomics data (10x Genomics, ST, or Slide-seq, etc) into spatially resolved single-cell transcriptomics data:

from bulk2space import Bulk2Space
model = Bulk2Space()

# Decompose spatial barcoding-based spatial transcriptomics data 
# (10x Genomics, ST, or Slide-seq, etc) into spatially resolved 
# single-cell transcriptomics data
df_meta, df_data = model.train_df_and_spatial_deconvolution(
    generate_sc_meta,
    generate_sc_data,
    input_st_data_path,
    input_st_meta_path,
    spot_num=500,
    cell_num=10,
    df_save_dir='save_model',
    df_save_name='df',
    map_save_dir='output', 
    map_save_name='deconvolution',
    top_marker_num=500,
    marker_used=True,
    k=10)

Parameter	Description	Default Value
generate_sc_meta	Generated scRNA-seq metadata	None
generate_sc_data	Generated scRNA-seq data	None
input_st_data_path	Path to ST data files (.csv)	None
input_st_meta_path	Path to ST metadata files (.csv)	None
spot_num	The spot number of pseudo-spot data which used to train the deep forest model	(int) `500`
cell_num	The cell number per spot of pseudo-spot data which used to train the deep forest model	(int) `10`
df_save_dir	Path to save the trained deep forest model	(str) `save_model`
df_save_name	File name of the trained deep forest model	(str) `df`
map_save_dir	Path to save the deconvoluted ST data	(str) `output`
map_save_name	File name of the deconvoluted ST data	(str) `deconvolution`
top_marker_num	The number of marker genes of each celltype used	(int) `500`
marker_used	Whether to only use marker genes of each cell type	(bool) `True`
k	The number of cells per spot set	(int) `10`

Map image-based spatial transcriptomics data (MERFISH, SeqFISH, or STARmap, etc) into spatially resolved single-cell transcriptomics data:

from bulk2space import Bulk2Space
model = Bulk2Space()

# Map image-based spatial transcriptomics data (MERFISH, SeqFISH, or STARmap, etc) 
# into spatially resolved single-cell transcriptomics data
df_meta, df_data = model.spatial_mapping(
    generate_sc_meta,
    generate_sc_data,
    input_st_data_path,
    input_st_meta_path)

Parameter	Description	Default Value
generate_sc_meta	Generated scRNA-seq metadata	None
generate_sc_data	Generated scRNA-seq data	None
input_st_data_path	Path to ST data files (.csv)	None
input_st_meta_path	Path to ST metadata files (.csv)	None