SpotGF
April 8, 2025
SpotGF: Denoising Spatially Resolved Transcriptomics Data Using an Optimal Transport-Based Gene Filtering Algorithm https://doi.org/10.1016/j.cels.2024.09.005
Protocol paper: Protocol to denoise spatially resolved transcriptomics data utilizing optimal transport-based gene filtering algorithm https://doi.org/10.1016/j.xpro.2025.103625
Installation
We recommend using conda to manage the installation of all dependencies. To do this, simply run:
conda create --name SpotGF
conda activate SpotGF
# conda config --add channels conda-forge  # run this first if the conda-forge channel has not been added yet
conda install python=3.7 pandas pot=0.8.2 numpy scipy matplotlib descartes scanpy=1.9.2 seaborn
Then, download this repo and install it.
git clone [repo_path]
cd [path/to/SpotGF]
pip install .
The total installation time is around 5 minutes. If an error occurs, please upgrade pip and try again.
Usage
Once the input data have been processed into the supported format, the full SpotGF workflow can be run by calling the SpotGF.py script. Input files may be in gem, txt, csv, gem.gz, or similar formats and contain the raw SRT data. They must include the columns geneID, x, y, and MIDCount.
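Before running the workflow, it can be useful to confirm that a file has the required columns. The following is a minimal sketch using only the Python standard library; the helper name `check_gem_columns` is hypothetical and not part of SpotGF, and it assumes the file begins directly with a header line.

```python
import csv
import io

# Columns the SpotGF input must contain, as listed above.
REQUIRED_COLUMNS = {"geneID", "x", "y", "MIDCount"}


def check_gem_columns(handle, delimiter="\t"):
    """Return the set of required columns missing from the header line.

    Hypothetical helper for illustration; assumes the first line of
    `handle` is the column header.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])
    return REQUIRED_COLUMNS - set(header)


# Tiny in-memory stand-in for a real input file.
demo = io.StringIO("geneID\tx\ty\tMIDCount\ngene1\t23\t36\t1\n")
missing = check_gem_columns(demo)
print("missing columns:", sorted(missing))
```

For a file on disk, the same helper can be called on `open(path, newline="")`, or on `gzip.open(path, "rt")` for a gem.gz file.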
The denoising resolution can be adjusted using the binsize parameter. A smaller binsize gives finer-grained denoising but increases the processing time. Note that binsize is used only when calculating SpotGF scores; it does not affect the final denoised data.
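To illustrate what a bin size means for spatial coordinates, here is a generic sketch of coordinate binning. It is not SpotGF's internal implementation; the function `bin_counts` and the aggregation by summing MIDCount are assumptions made for illustration.

```python
from collections import Counter


def bin_counts(records, binsize):
    """Sum MIDCount per (geneID, binned x, binned y).

    Illustrative only: each coordinate is snapped to the lower-left
    corner of the binsize x binsize square that contains it.
    """
    binned = Counter()
    for gene, x, y, count in records:
        key = (gene, x // binsize * binsize, y // binsize * binsize)
        binned[key] += count
    return binned


# (geneID, x, y, MIDCount) placeholder records
records = [
    ("gene1", 23, 36, 1),
    ("gene1", 24, 35, 1),
    ("gene1", 23, 30, 1),
    ("gene2", 20, 31, 1),
    ("gene2", 21, 22, 1),
]

coarse = bin_counts(records, binsize=10)  # nearby spots merge into one bin
fine = bin_counts(records, binsize=1)     # every spot stays separate
print(len(coarse), len(fine))
```

With a larger bin size, nearby spots collapse into fewer bins, which is why larger values are cheaper to process but coarser.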
If the input file contains columns named cen_x and cen_y, denoising can be performed on the cell bin data, achieving single-cell-level denoising, by setting binsize = 1 (or -b 1 on the command line).
The proportion parameter determines the proportion of genes to be retained in the final denoised data. For example, setting proportion=0.9 will retain only 90% of the effective genes, resulting in a new denoised SRT dataset.
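The effect of proportion can be pictured as a ranking step. The sketch below is an assumption-laden illustration, not SpotGF's actual code: it assumes genes with higher scores are the ones retained (larger scores indicate more clustered genes), rounds the number of retained genes down, and uses made-up scores and a hypothetical helper `retain_top_genes`.

```python
def retain_top_genes(scores, proportion):
    """Return the set of genes in the highest-scoring fraction.

    Assumptions for illustration: higher score = keep, and the number
    of retained genes is rounded down.
    """
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must be in [0, 1]")
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = int(len(ranked) * proportion)
    return set(ranked[:n_keep])


# Made-up scores for four placeholder genes.
scores = {"geneA": 0.91, "geneB": 0.15, "geneC": 0.72, "geneD": 0.40}
kept = retain_top_genes(scores, proportion=0.5)
print(sorted(kept))
```

Records for genes outside the retained set would then be dropped from the expression table to form the denoised dataset.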
#python [path/to/SpotGF.py] -i [path/to/input.gem] -o [path/to/output] -b [binsize] -p [proportion] -s [spotsize]
cd /path/test
python ../SpotGF.py -i ./demo.gem -o ./result -b 70 -p 0.5 -s 5
If you use SpotGF in a Jupyter environment, you can use the following instead.
import SpotGF

gem_path = './demo.gem'  # change this path to match your working directory
output = './result'      # output directory, e.g. the same ./result used in the command-line example
binsize = 70
lower = 0
upper = 100000
proportion = 0.1
max_iterations = 10000
auto_threshold = True
visualize = True
spot_size = 5
alpha = 0

# Initialize the SpotGF object
spotgf = SpotGF.SpotGF(gem_path, binsize, proportion, auto_threshold, lower, upper, max_iterations, output, visualize, spot_size, alpha)

# Compute SpotGF scores; the score distribution figure helps in choosing proportion
GF_df = spotgf.calculate_GFscore(gem_path, binsize, alpha)

# Generate the denoised SRT data
new_gem = spotgf.generate_GFgem(gem_path, GF_df, proportion, auto_threshold, visualize, spot_size)
Data preparation
The SpotGF workflow requires one mandatory file: an input.gem (with "\t" or "," as the delimiter) containing the gene expression data, for example:
| geneID | x | y | MIDCount |
|---|---|---|---|
| #gene1 | 23 | 36 | 1 |
| #gene1 | 24 | 35 | 1 |
| #gene1 | 23 | 30 | 1 |
| #gene2 | 20 | 31 | 1 |
| #gene2 | 21 | 22 | 1 |
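A file in this layout can be produced with the standard library. The sketch below writes the example rows to a tab-delimited file in a temporary directory; the helper `write_gem`, the output location, and the plain gene names are placeholders chosen for illustration.

```python
import csv
import os
import tempfile

# Placeholder rows mirroring the example table: (geneID, x, y, MIDCount)
rows = [
    ("gene1", 23, 36, 1),
    ("gene1", 24, 35, 1),
    ("gene1", 23, 30, 1),
    ("gene2", 20, 31, 1),
    ("gene2", 21, 22, 1),
]


def write_gem(path, rows, delimiter="\t"):
    """Write a header line plus one line per record."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        writer.writerow(["geneID", "x", "y", "MIDCount"])
        writer.writerows(rows)


# Temporary location used only for this demonstration.
out_path = os.path.join(tempfile.mkdtemp(), "input.gem")
write_gem(out_path, rows)
print("wrote", out_path)
```

Passing `delimiter=","` instead produces the comma-delimited variant.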
Output files
"SpotGF_scores.txt": This file contains the SpotGF scores for each gene. The SpotGF score indicates the degree of clustering or diffusion of a gene. A smaller SpotGF score suggests that the gene is more diffuse, while a larger SpotGF score indicates that the gene is more clustered.
"SpotGF_proportion_0.5.gem": A gene expression matrix file with noise reduction that retains only 50% of the genes.
"SpotGF_auto_threshold.gem": Gene expression matrix file after denoising based on an automatic filtering threshold method.
"alpha_shape.png": It shows the tissue contour used by SpotGF during the denoising process, with special attention to the fact that a highly detailed outline is not required; only a rough outline is needed.
"Spatial_automatic.png": Spatial expression levels of data after denoising based on an automatic filtering threshold method.
"Spatial_proportion.png": Spatial expression levels of data after denoising based on a manual threshold method.
"Violinplot_n_genes_by_counts.png": Violin plot distribution of n_genes_by_counts across several datasets.
"Violinplot_total_counts.png": Violin plot distribution of total_counts across several datasets.
List of Parameters
arguments:
-i Input SRT data files path.
-o Outpath for saving results.
-b Denoising resolution binsize, must be an integer, default=10.
-lower Lower limit for tissue structures capturing optimization, default=0.
-upper Upper limit for tissue structures capturing optimization, default=sys.float_info.max.
-max_iterations Maximum number of iterations when capturing tissue structures, default=10000.
-p Proportion of genes to retain, must be a float in [0,1], default=0.5.
-auto_threshold If True, return a denoised gem file using automatic threshold, default=True.
-v Visualize SpotGF-denoised data, default=True.
-s Spot size used for spatial expression figure, default=5.
-a Alpha for tissue boundary detection; the default of 0 selects alpha automatically via optimizealpha, default=0.
Contact Us
If you have any suggestions/ideas for SpotGF or are having issues trying to use it, please don't hesitate to reach out to us.
Lin Du, dulin@genomics.cn, 3051095449@qq.com
Cite
Du, L., Kang, J., Hou, Y., Sun, H. X., & Zhang, B. (2024). SpotGF: Denoising spatially resolved transcriptomics data using an optimal transport-based gene filtering algorithm. Cell Systems, 15(10), 969–981.e6. https://doi.org/10.1016/j.cels.2024.09.005
Du, L., Kang, J., Li, J., Qin, H., Hou, Y., & Sun, H. X. (2025). Protocol to denoise spatially resolved transcriptomics data utilizing optimal transport-based gene filtering algorithm. STAR Protocols, 6(1), 103625. https://doi.org/10.1016/j.xpro.2025.103625