FlashDeconv

June 30, 2026 · View on GitHub

Spatial deconvolution with linear scalability for atlas-scale data.

FlashDeconv estimates cell type proportions from spatial transcriptomics data (Visium, Visium HD, Stereo-seq). It is designed for large-scale analyses where computational efficiency is essential, while maintaining attention to low-abundance cell populations through leverage-score-based feature weighting.

Paper: Yang, C., Zhang, X. & Chen, J. FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution via structure-preserving sketching. bioRxiv (2025). DOI: 10.64898/2025.12.22.696108

Installation

pip install flashdeconv

For development or additional I/O support, see Installation Options.

Quick Start

import scanpy as sc
import flashdeconv as fd

# Load data
adata_st = sc.read_h5ad("spatial.h5ad")
adata_ref = sc.read_h5ad("reference.h5ad")

# Deconvolve
fd.tl.deconvolve(adata_st, adata_ref, cell_type_key="cell_type")

# Results stored in adata_st.obsm["flashdeconv"]
sc.pl.spatial(adata_st, color="flashdeconv_dominant")

FlashDeconv is also available as a tool in ChatSpatial, an MCP server for spatial transcriptomics — run deconvolution through natural language from any compatible client.

Overview

Spatial deconvolution methods offer different trade-offs. Probabilistic approaches like Cell2Location and RCTD provide rigorous uncertainty quantification; methods like CARD incorporate spatial structure through dense kernel matrices. FlashDeconv takes a complementary approach, prioritizing computational efficiency for million-scale datasets.

Design Principles

Linear complexity — O(N) time and memory through randomized sketching and sparse graph regularization.
Leverage-based feature weighting — Variance-based selection (PCA, HVG) can underweight markers of low-abundance populations. We use leverage scores from the reference SVD to identify genes that define distinct transcriptomic directions, regardless of expression magnitude.
Sparse spatial regularization — Graph Laplacian smoothing with O(N) complexity, avoiding the O(N²) cost of dense kernel methods.

Performance

Scalability

Spots	Time	Memory
10,000	< 1 sec	< 1 GB
100,000	~4 sec	~2 GB
1,000,000	~3 min	~21 GB

Benchmarked on MacBook Pro M2 Max (32GB unified memory), CPU-only.

Accuracy

On the Spotless benchmark:

Metric	FlashDeconv	RCTD	Cell2Location
Pearson (56 datasets)	0.944	0.905	0.895

Performance varies by tissue type and experimental conditions. We recommend evaluating on data similar to your use case.

Algorithm

FlashDeconv solves a graph-regularized non-negative least squares problem:

minimize  ½‖Y - βX‖²_F + ½λ·Tr(βᵀLβ) + ρ‖β‖₁,  subject to β ≥ 0

where Y is spatial expression, X is reference signatures, L is the graph Laplacian, and β represents cell type abundances.

FlashDeconv Framework

Pipeline:

Select informative genes (HVG ∪ markers) and compute leverage scores
Compress gene space via CountSketch with uniform hashing + leverage-weighted amplitudes (G → 512 dimensions)
Construct sparse k-NN spatial graph
Solve via block coordinate descent with spatial smoothing

API

Scanpy-style

fd.tl.deconvolve(
    adata_st,                    # Spatial AnnData
    adata_ref,                   # Reference AnnData
    cell_type_key="cell_type",   # Column in adata_ref.obs
    key_added="flashdeconv",     # Key for results
)

NumPy

from flashdeconv import FlashDeconv

model = FlashDeconv(
    sketch_dim=512,
    lambda_spatial="auto",
    n_hvg=2000,
    k_neighbors=6,
    random_state=0,
)
proportions = model.fit_transform(Y, X, coords)

Parameters

Parameter	Default	Description
`sketch_dim`	512	Sketch dimension
`lambda_spatial`	"auto"	Spatial regularization (auto-tuned)
`rho_sparsity`	0.01	L1 sparsity penalty (dimensionless fraction)
`n_hvg`	2000	Highly variable genes
`n_markers_per_type`	50	Marker genes per cell type
`spatial_method`	"knn"	Graph method: "knn", "radius", or "grid"
`k_neighbors`	6	Spatial graph neighbors (for "knn")
`radius`	None	Neighbor radius (required for "radius")
`preprocess`	"log_cpm"	Normalization: "log_cpm", "pearson", or "raw"
`random_state`	0	Random seed for reproducibility

Output

Attribute	Description
`proportions_$	\text{Cell} \text{type} \text{proportions} (\text{N} \times \text{K}), \text{sum} \text{to} 1
$beta_`	Raw abundances (N × K)
`info_`	Convergence statistics

API Reference

`flashdeconv.FlashDeconv`

Main class for spatial deconvolution.

from flashdeconv import FlashDeconv

model = FlashDeconv(sketch_dim=512, lambda_spatial="auto", ...)

Constructor parameters

Parameter	Type	Default	Description
`sketch_dim`	`int`	512	Dimension of the randomized sketch space.
`lambda_spatial`	`float` or `"auto"`	`"auto"`	Spatial regularization strength. `"auto"` tunes based on data scale.
`rho_sparsity`	`float`	0.01	L1 sparsity penalty (dimensionless fraction, internally scaled).
`n_hvg`	`int`	2000	Number of highly variable genes to select.
`n_markers_per_type`	`int`	50	Number of marker genes per cell type.
`spatial_method`	`str`	`"knn"`	Graph construction: `"knn"`, `"radius"`, or `"grid"`.
`k_neighbors`	`int`	6	Number of neighbors for KNN graph.
`radius`	`float` or `None`	`None`	Radius for radius-based graph (required when `spatial_method="radius"`).
`max_iter`	`int`	100	Maximum BCD solver iterations.
`tol`	`float`	1e-4	Convergence tolerance (relative change in beta).
`preprocess`	`str`	`"log_cpm"`	Preprocessing: `"log_cpm"`, `"pearson"`, or `"raw"`.
`random_state`	`int` or `None`	0	Random seed for reproducibility.
`verbose`	`bool`	`False`	Whether to print progress.

Methods

fit(Y, X, coords, cell_type_names=None)

Fit the deconvolution model.

Parameter	Type	Description
`Y`	ndarray or sparse (N, G)	Spatial transcriptomics count matrix.
`X`	ndarray (K, G)	Reference cell type signature matrix.
`coords`	ndarray (N, 2) or (N, 3)	Spatial coordinates.
`cell_type_names`	ndarray (K,), optional	Cell type names.

Returns self.

fit_transform(Y, X, coords, **kwargs)

Fit and return cell type proportions. Same parameters as fit(). Returns ndarray of shape (N, K).

get_cell_type_proportions() — Return normalized proportions (N, K).

get_abundances() — Return raw (unnormalized) abundances (N, K).

get_dominant_cell_type() — Return index of dominant cell type per spot (N,).

summary() — Return dict with model parameters and fit statistics.

compute_uncertainty(alpha=0.05)

Analytical uncertainty via Hessian-diagonal Laplace approximation. Returns dict with keys: entropy, residual_ss, residual_norm, var_prop, ci_lower, ci_upper, ci_half_width, cv, detection_confident, mean_ci_width.

bootstrap_uncertainty(n_bootstrap=100, max_iter_boot=20, seed=42, verbose=False)

Poisson parametric bootstrap for empirical confidence intervals. Returns dict with keys: boot_mean, boot_std, boot_ci_lower, boot_ci_upper, boot_cv, n_bootstrap.

Attributes (after fitting)

Attribute	Type	Description
`proportions_`	ndarray (N, K)	Cell type proportions (sum to 1 per spot).
`beta_`	ndarray (N, K)	Raw (unnormalized) cell type abundances.
`gene_idx_`	ndarray	Indices of genes used for deconvolution.
`lambda_used_`	float	Actual lambda value used (relevant when `lambda_spatial="auto"`).
`info_`	dict	Optimization info: `converged`, `n_iterations`, `final_objective`.

`flashdeconv.tl.deconvolve`

Scanpy-style entry point. Runs deconvolution and stores results in adata_st.

fd.tl.deconvolve(
    adata_st, adata_ref,
    cell_type_key="cell_type",
    *,
    sketch_dim=512, lambda_spatial="auto", rho_sparsity=0.01,
    n_hvg=2000, n_markers_per_type=50,
    spatial_method="knn", k_neighbors=6, radius=None,
    preprocess="log_cpm",
    layer_st=None, layer_ref=None,
    spatial_key="spatial", key_added="flashdeconv",
    random_state=0, copy=False,
)

Parameter	Type	Default	Description
`adata_st`	AnnData	—	Spatial transcriptomics data with coordinates in `.obsm[spatial_key]`.
`adata_ref`	AnnData	—	Single-cell reference with cell type labels in `.obs[cell_type_key]`.
`cell_type_key`	`str`	`"cell_type"`	Column in `adata_ref.obs` for cell type annotations.
`layer_st`	`str` or `None`	`None`	Layer in `adata_st` to use. Uses `.X` if `None`.
`layer_ref`	`str` or `None`	`None`	Layer in `adata_ref` to use. Uses `.X` if `None`.
`spatial_key`	`str`	`"spatial"`	Key in `adata_st.obsm` for spatial coordinates.
`key_added`	`str`	`"flashdeconv"`	Key for storing results.
`copy`	`bool`	`False`	If `True`, return a copy instead of modifying in-place.

All other parameters (sketch_dim, lambda_spatial, etc.) are forwarded to FlashDeconv — see constructor parameters.

Stores in adata_st:

.obsm[key_added] — DataFrame of cell type proportions (N x K)
.obs[f"{key_added}_dominant"] — Dominant cell type per spot (Categorical)
.uns[f"{key_added}_params"] — Parameters used for deconvolution

`flashdeconv.io`

I/O utilities for loading data from AnnData objects.

load_spatial_data(adata, layer=None, coord_key="spatial")

Extract count matrix, coordinates, and gene names from a spatial AnnData object. Looks for coordinates in adata.obsm[coord_key], then adata.obsm["X_spatial"], then adata.obs[["x", "y"]].

Parameter	Type	Default	Description
`adata`	AnnData	—	Spatial transcriptomics AnnData.
`layer`	`str` or `None`	`None`	Layer to use for counts. Uses `.X` if `None`.
`coord_key`	`str`	`"spatial"`	Key in `adata.obsm` for coordinates.

Returns (Y, coords, gene_names).

load_reference(adata_ref, cell_type_key="cell_type", layer=None, method="mean")

Aggregate single-cell reference into cell type signatures.

Parameter	Type	Default	Description
`adata_ref`	AnnData	—	Single-cell reference AnnData.
`cell_type_key`	`str`	`"cell_type"`	Column in `adata_ref.obs` for cell type labels.
`layer`	`str` or `None`	`None`	Layer to use. Uses `.X` if `None`.
`method`	`str`	`"mean"`	Aggregation method: `"mean"` or `"sum"`.

Returns (X, cell_type_names, gene_names).

align_genes(Y, X, genes_spatial, genes_ref)

Intersect and align genes between spatial and reference data. Returns (Y_aligned, X_aligned, common_genes).

prepare_data(adata_st, adata_ref, cell_type_key="cell_type", spatial_coord_key="spatial", layer_st=None, layer_ref=None)

Convenience wrapper combining load_spatial_data, load_reference, and align_genes. Returns (Y, X, coords, cell_type_names, gene_names).

result_to_anndata(beta, adata, cell_type_names=None, key_added="flashdeconv")

Store deconvolution results in AnnData. Adds .obsm[key_added] (DataFrame) and .obs[f"{key_added}_dominant"] (Categorical).

`flashdeconv.utils`

Graph construction and evaluation metrics.

Graph construction

build_knn_graph(coords, k=6, include_self=False)

Build k-nearest neighbor spatial graph from coordinates.

Parameter	Type	Default	Description
`coords`	ndarray (N, 2) or (N, 3)	—	Spatial coordinates.
`k`	`int`	6	Number of nearest neighbors.
`include_self`	`bool`	`False`	Whether to include self-loops.

Returns scipy.sparse.csr_matrix (N, N) binary adjacency matrix.

build_radius_graph(coords, radius, include_self=False)

Build radius-based neighbor graph. Parameters same as build_knn_graph except radius: float replaces k.

coords_to_adjacency(coords, method="knn", k=6, radius=None)

Convert coordinates to adjacency matrix. Dispatches to build_knn_graph, build_radius_graph, or grid-based construction depending on method.

Evaluation metrics

All evaluation functions take pred and true as ndarray of shape (N, K).

compute_rmse(pred, true, per_cell_type=False) — Root mean squared error. Returns float or ndarray (K,) if per_cell_type=True.

compute_mae(pred, true, per_cell_type=False) — Mean absolute error. Returns float or ndarray (K,).

compute_correlation(pred, true, method="pearson", per_cell_type=False) — Pearson or Spearman correlation. Returns float or ndarray (K,).

compute_jsd(pred, true, epsilon=1e-10) — Jensen-Shannon divergence per spot. Returns ndarray (N,).

evaluate_deconvolution(pred, true, cell_type_names=None) — Comprehensive evaluation returning a dict with overall metrics (RMSE, MAE, Pearson, Spearman, mean JSD) and per_cell_type breakdown.

Input Formats

Spatial data: AnnData, NumPy array (N × G), or SciPy sparse matrix
Reference: AnnData (aggregated by cell type) or NumPy array (K × G)
Coordinates: Extracted from adata.obsm["spatial"] or NumPy array (N × 2)

Reference Quality

Deconvolution accuracy depends on reference quality:

Requirement	Guideline
Cells per type	≥ 500 recommended
Marker fold-change	≥ 5× for distinguishability
Signature correlation	< 0.95 between types
No Unknown cells	Filter before deconvolution

Critical: Always remove cells labeled "Unknown", "Unassigned", or similar. These cells act as universal signatures that absorb proportions from specific types—a fundamental property of regression-based deconvolution, not a FlashDeconv limitation.

See Reference Data Guide for details.

Installation Options

# Standard
pip install flashdeconv

# With AnnData support
pip install flashdeconv[io]

# Development
git clone https://github.com/cafferychen777/flashdeconv.git
cd flashdeconv && pip install -e ".[dev]"

Requirements: Python ≥ 3.9, numpy, scipy, numba. Optional: scanpy, anndata.

Citation

If you use FlashDeconv in your research, please cite:

Yang, C., Zhang, X. & Chen, J. FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution via structure-preserving sketching. bioRxiv (2025). DOI: 10.64898/2025.12.22.696108

@article{yang2025flashdeconv,
  title={FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution
         via structure-preserving sketching},
  author={Yang, Chen and Zhang, Xianyang and Chen, Jun},
  journal={bioRxiv},
  year={2025},
  doi={10.64898/2025.12.22.696108}
}

Resources

Paper reproducibility code
Reference data guide — Building quality reference signatures
Stereo-seq guide — Platform-specific considerations
GitHub Issues
BSD-3-Clause License

Acknowledgments

We thank the developers of Spotless, Cell2Location, RCTD, CARD, and other deconvolution methods whose work contributed to this field.