FlashDeconv

March 16, 2026

Spatial deconvolution with linear scalability for atlas-scale data.

FlashDeconv estimates cell type proportions from spatial transcriptomics data (Visium, Visium HD, Stereo-seq). It targets large-scale analyses where computational efficiency is essential, and it uses leverage-score-based feature weighting so that low-abundance cell populations are not overlooked.

Paper: Yang, C., Zhang, X. & Chen, J. FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution via structure-preserving sketching. bioRxiv (2025). DOI: 10.64898/2025.12.22.696108


Installation

pip install flashdeconv

For development or additional I/O support, see Installation Options.


Quick Start

import scanpy as sc
import flashdeconv as fd

# Load data
adata_st = sc.read_h5ad("spatial.h5ad")
adata_ref = sc.read_h5ad("reference.h5ad")

# Deconvolve
fd.tl.deconvolve(adata_st, adata_ref, cell_type_key="cell_type")

# Results stored in adata_st.obsm["flashdeconv"]
sc.pl.spatial(adata_st, color="flashdeconv_Hepatocyte")

FlashDeconv is also available as a tool in ChatSpatial, an MCP server for spatial transcriptomics — run deconvolution through natural language from any compatible client.


Overview

Spatial deconvolution methods offer different trade-offs. Probabilistic approaches like Cell2Location and RCTD provide rigorous uncertainty quantification; methods like CARD incorporate spatial structure through dense kernel matrices. FlashDeconv takes a complementary approach, prioritizing computational efficiency for million-scale datasets.

Design Principles

  1. Linear complexity — O(N) time and memory through randomized sketching and sparse graph regularization.

  2. Leverage-based feature weighting — Variance-based selection (PCA, HVG) can underweight markers of low-abundance populations. We use leverage scores from the reference SVD to identify genes that define distinct transcriptomic directions, regardless of expression magnitude.

  3. Sparse spatial regularization — Graph Laplacian smoothing with O(N) complexity, avoiding the O(N²) cost of dense kernel methods.
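
To make the second principle concrete, here is a minimal NumPy sketch of gene leverage scores computed from the SVD of a toy reference matrix. The function name `leverage_scores`, the rank tolerance, and the toy values are illustrative assumptions, not FlashDeconv's internal code.

```python
import numpy as np

def leverage_scores(X, rank=None):
    """Leverage score of each gene (column) of a K x G signature matrix.

    Hypothetical helper for illustration only.
    """
    # Thin SVD: X = U @ diag(s) @ Vt, where Vt has shape (min(K, G), G).
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    if rank is None:
        # Numerical rank: keep singular values above a relative tolerance.
        rank = int(np.sum(s > s.max() * 1e-10))
    # Leverage of gene g = squared norm of column g of the top right singular vectors.
    return np.sum(Vt[:rank] ** 2, axis=0)

# Toy reference: 3 cell types x 5 genes.
# Type C matches type A except for a weakly expressed marker (last gene).
X = np.array([
    [100.0,  90.0, 80.0, 10.0, 0.0],   # type A
    [ 90.0, 100.0, 70.0, 20.0, 0.0],   # type B
    [100.0,  90.0, 80.0, 10.0, 2.0],   # type C (A plus a faint marker)
])

lev = leverage_scores(X)
variance = X.var(axis=0)

print("variance per gene:", np.round(variance, 2))
print("leverage per gene:", np.round(lev, 3))
```

In this toy case the marker gene has the smallest variance across cell types, so a variance-based ranking would place it last, yet its leverage score is the maximum possible value because it alone distinguishes type C from type A.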


Performance

Scalability

| Spots | Time | Memory |
|---|---|---|
| 10,000 | < 1 sec | < 1 GB |
| 100,000 | ~4 sec | ~2 GB |
| 1,000,000 | ~3 min | ~21 GB |

Benchmarked on MacBook Pro M2 Max (32GB unified memory), CPU-only.

Accuracy

On the Spotless benchmark:

| Metric | FlashDeconv | RCTD | Cell2Location |
|---|---|---|---|
| Pearson (56 datasets) | 0.944 | 0.905 | 0.895 |

Performance varies by tissue type and experimental conditions. We recommend evaluating on data similar to your use case.


Algorithm

FlashDeconv solves a graph-regularized non-negative least squares problem:

minimize  ½‖Y - βX‖²_F + ½λ·Tr(βᵀLβ) + ρ‖β‖₁,  subject to β ≥ 0

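where Y (N × G) is the spatial expression matrix for N spots and G genes, X (K × G) holds the reference signatures for K cell types, L (N × N) is the Laplacian of the spatial graph, and β (N × K) contains the non-negative cell type abundances. λ sets the strength of spatial smoothing and ρ the strength of the L1 sparsity penalty.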
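
As a rough illustration of the three terms, the following sketch evaluates the objective on a tiny example. The function `objective`, the three-spot chain graph, and the values of `lam` and `rho` are assumptions chosen for readability. This is not the solver used by FlashDeconv.

```python
import numpy as np
import scipy.sparse as sp

def objective(Y, X, beta, L, lam, rho):
    """0.5*||Y - beta X||_F^2 + 0.5*lam*Tr(beta^T L beta) + rho*||beta||_1."""
    residual = Y - beta @ X                          # N x G reconstruction error
    fit = 0.5 * np.sum(residual ** 2)
    smooth = 0.5 * lam * np.sum(beta * (L @ beta))   # equals 0.5*lam*Tr(beta^T L beta)
    sparsity = rho * np.sum(np.abs(beta))
    return fit + smooth + sparsity

# Three spots on a chain (0 - 1 - 2), two cell types, three genes.
A = sp.csr_matrix(np.array([[0, 1, 0],
                            [1, 0, 1],
                            [0, 1, 0]], dtype=float))
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A   # graph Laplacian D - A

X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])
beta = np.array([[1.0, 0.0],
                 [1.0, 0.0],
                 [0.0, 1.0]])
Y = beta @ X   # noise-free data, so the fit term is zero

value = objective(Y, X, beta, L, lam=0.5, rho=0.1)
print(value)
```

Here the fit term vanishes, the smoothness term counts the squared difference in β across the one edge whose endpoints differ, and the L1 term sums all abundances.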

Figure: FlashDeconv framework.

Pipeline:

  1. Select informative genes (HVG ∪ markers) and compute leverage scores
  2. Compress gene space via CountSketch with uniform hashing + leverage-weighted amplitudes (G → 512 dimensions)
  3. Construct sparse k-NN spatial graph
  4. Solve via block coordinate descent with spatial smoothing
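
Step 2 can be illustrated with a generic CountSketch, in which each gene is hashed to one of 512 buckets with a random sign. How FlashDeconv combines leverage scores with the sketch is not specified here, so multiplying the sign by a per-gene weight is an assumption, and `countsketch_matrix`, `apply_sketch`, and `placeholder_weights` are hypothetical names.

```python
import numpy as np
import scipy.sparse as sp

def countsketch_matrix(n_genes, sketch_dim, weights=None, seed=0):
    """Sparse G x d CountSketch matrix with one nonzero per gene."""
    rng = np.random.default_rng(seed)
    buckets = rng.integers(0, sketch_dim, size=n_genes)   # uniform hashing
    signs = rng.choice([-1.0, 1.0], size=n_genes)         # random signs
    # Assumption: per-gene weights simply scale the +/-1 amplitudes.
    amplitudes = signs if weights is None else signs * np.asarray(weights, dtype=float)
    rows = np.arange(n_genes)
    return sp.csr_matrix((amplitudes, (rows, buckets)), shape=(n_genes, sketch_dim))

def apply_sketch(M, S):
    """Compress the gene axis of a dense (rows x G) matrix to (rows x d)."""
    return (S.T @ M.T).T

rng = np.random.default_rng(1)
N, K, G, d = 50, 4, 2000, 512
Y = rng.poisson(2.0, size=(N, G)).astype(float)   # toy spatial counts
X = rng.gamma(2.0, 1.0, size=(K, G))              # toy reference signatures
beta = rng.dirichlet(np.ones(K), size=N)          # toy abundances

placeholder_weights = rng.uniform(0.5, 1.5, size=G)   # stand-in for leverage weights
S = countsketch_matrix(G, d, weights=placeholder_weights)
Y_s, X_s = apply_sketch(Y, S), apply_sketch(X, S)

print(Y_s.shape, X_s.shape)
```

Because the sketch is linear, the compressed residual `Y_s - beta @ X_s` equals the sketch of the full residual `Y - beta @ X`, so the least-squares fit can be carried out in the 512-dimensional space.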

API

Scanpy-style

fd.tl.deconvolve(
    adata_st,                    # Spatial AnnData
    adata_ref,                   # Reference AnnData
    cell_type_key="cell_type",   # Column in adata_ref.obs
    key_added="flashdeconv",     # Key for results
)

NumPy

from flashdeconv import FlashDeconv

model = FlashDeconv(
    sketch_dim=512,
    lambda_spatial="auto",
    n_hvg=2000,
    k_neighbors=6,
    random_state=0,
)
proportions = model.fit_transform(Y, X, coords)
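
For trying the NumPy interface without real data, a sketch like the following builds synthetic inputs in the shapes listed under Input Formats. The sizes, distributions, and grid layout are arbitrary assumptions, and the FlashDeconv call is left as a comment so the snippet runs without the package installed.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, G = 200, 4, 300   # spots, cell types, genes (toy sizes)

# Reference signatures: one expression profile per cell type (K x G).
X = rng.gamma(shape=2.0, scale=5.0, size=(K, G))

# Ground-truth proportions for each spot (N x K); each row sums to 1.
true_props = rng.dirichlet(np.ones(K), size=N)

# Spatial counts (N x G): Poisson noise around the mixed signatures.
Y = rng.poisson(true_props @ X).astype(float)

# Spot coordinates (N x 2) on a 20 x 10 grid.
grid_x, grid_y = np.meshgrid(np.arange(20), np.arange(10))
coords = np.stack([grid_x, grid_y], axis=-1).reshape(-1, 2).astype(float)

print(Y.shape, X.shape, coords.shape)

# With flashdeconv installed, these arrays could be passed as in the example above:
# proportions = model.fit_transform(Y, X, coords)
```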

Parameters

| Parameter | Default | Description |
|---|---|---|
| sketch_dim | 512 | Sketch dimension |
| lambda_spatial | "auto" | Spatial regularization (auto-tuned) |
| n_hvg | 2000 | Highly variable genes |
| spatial_method | "knn" | Graph method: "knn", "radius", or "grid" |
| k_neighbors | 6 | Spatial graph neighbors (for "knn") |
| radius | None | Neighbor radius (required for "radius") |
| preprocess | "log_cpm" | Normalization: "log_cpm", "pearson", or "raw" |
| random_state | 0 | Random seed for reproducibility |
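
To show what a "knn" spatial graph with 6 neighbors looks like in practice, the sketch below builds a sparse k-NN graph Laplacian with SciPy. The helper `knn_laplacian`, the binary edge weights, and the symmetrization rule are assumptions. FlashDeconv's own graph construction may differ.

```python
import numpy as np
import scipy.sparse as sp
from scipy.spatial import cKDTree

def knn_laplacian(coords, k=6):
    """Sparse Laplacian L = D - A of a symmetrized k-NN graph.

    Hypothetical helper; assumes all coordinates are distinct.
    """
    n = coords.shape[0]
    tree = cKDTree(coords)
    # Ask for k + 1 neighbors because each point is its own nearest neighbor.
    _, idx = tree.query(coords, k=k + 1)
    rows = np.repeat(np.arange(n), k)
    cols = idx[:, 1:].ravel()                 # drop the self-match in column 0
    A = sp.csr_matrix((np.ones(n * k), (rows, cols)), shape=(n, n))
    A = A.maximum(A.T)                        # keep an edge if either end selected it
    degree = np.asarray(A.sum(axis=1)).ravel()
    return sp.diags(degree) - A

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(300, 2))
L = knn_laplacian(coords, k=6)

print(L.shape, L.nnz)
```

Each spot contributes at most 2k off-diagonal entries plus one diagonal entry, so the number of stored values grows linearly with the number of spots rather than quadratically.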

Output

| Attribute | Description |
|---|---|
| proportions_ | Cell type proportions (N × K), sum to 1 |
| beta_ | Raw abundances (N × K) |
| info_ | Convergence statistics |
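
One plausible relationship between the two arrays is row normalization of the raw abundances. The source states only that the proportions sum to 1, so the sketch below is an assumption about how such a conversion could be done, and `to_proportions` is a hypothetical helper.

```python
import numpy as np

def to_proportions(beta, eps=1e-12):
    """Row-normalize non-negative abundances so each spot sums to 1."""
    beta = np.clip(beta, 0.0, None)                  # guard against tiny negatives
    totals = beta.sum(axis=1, keepdims=True)
    return beta / np.maximum(totals, eps)            # eps avoids division by zero

beta = np.array([[2.0, 1.0, 1.0],
                 [0.0, 3.0, 1.0]])
props = to_proportions(beta)
print(props)
# [[0.5  0.25 0.25]
#  [0.   0.75 0.25]]
```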

Input Formats

  • Spatial data: AnnData, NumPy array (N × G), or SciPy sparse matrix
  • Reference: AnnData (aggregated by cell type) or NumPy array (K × G)
  • Coordinates: Extracted from adata.obsm["spatial"] or NumPy array (N × 2)

Reference Quality

Deconvolution accuracy depends on reference quality:

| Requirement | Guideline |
|---|---|
| Cells per type | ≥ 500 recommended |
| Marker fold-change | ≥ 5× for distinguishability |
| Signature correlation | < 0.95 between types |
| No Unknown cells | Filter before deconvolution |

Critical: Always remove cells labeled "Unknown", "Unassigned", or similar. These cells act as universal signatures that absorb proportions from specific types—a fundamental property of regression-based deconvolution, not a FlashDeconv limitation.
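
Some of these guidelines can be checked before deconvolution. The sketch below, using NumPy only, drops unlabeled cells, flags cell types with too few cells, and flags pairs of types whose mean signatures are highly correlated. The function `check_reference`, its arguments, and the synthetic data are hypothetical, and the marker fold-change guideline is not covered.

```python
import numpy as np

def check_reference(expr, labels, min_cells=500, max_corr=0.95,
                    drop=("Unknown", "Unassigned")):
    """Filter unlabeled cells, then flag small or hard-to-distinguish types."""
    labels = np.asarray(labels)
    keep = ~np.isin(labels, list(drop))           # remove catch-all labels
    expr, labels = expr[keep], labels[keep]

    types, counts = np.unique(labels, return_counts=True)
    # Mean expression profile per cell type (K x G).
    signatures = np.vstack([expr[labels == t].mean(axis=0) for t in types])
    corr = np.corrcoef(signatures)                # K x K Pearson correlations

    small = [str(t) for t, c in zip(types, counts) if c < min_cells]
    similar = [(str(types[i]), str(types[j]))
               for i in range(len(types)) for j in range(i + 1, len(types))
               if corr[i, j] >= max_corr]
    return signatures, types, small, similar

# Synthetic reference: type C reuses the profile of A and has only 50 cells.
rng = np.random.default_rng(0)
G = 200
mean_a = rng.gamma(2.0, 5.0, size=G)
mean_b = rng.gamma(2.0, 5.0, size=G)
expr = np.vstack([
    rng.poisson(mean_a, size=(600, G)),
    rng.poisson(mean_b, size=(600, G)),
    rng.poisson(mean_a, size=(50, G)),
    rng.poisson((mean_a + mean_b) / 2, size=(100, G)),
]).astype(float)
labels = np.array(["A"] * 600 + ["B"] * 600 + ["C"] * 50 + ["Unknown"] * 100)

signatures, types, small, similar = check_reference(expr, labels)
print("types kept:", list(types))
print("below min_cells:", small)
print("highly correlated pairs:", similar)
```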

See Reference Data Guide for details.


Installation Options

# Standard
pip install flashdeconv

# With AnnData support
pip install "flashdeconv[io]"

# Development
git clone https://github.com/cafferychen777/flashdeconv.git
cd flashdeconv && pip install -e ".[dev]"

Requirements: Python ≥ 3.9, numpy, scipy, numba. Optional: scanpy, anndata.


Citation

If you use FlashDeconv in your research, please cite:

Yang, C., Zhang, X. & Chen, J. FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution via structure-preserving sketching. bioRxiv (2025). DOI: 10.64898/2025.12.22.696108

@article{yang2025flashdeconv,
  title={FlashDeconv enables atlas-scale, multi-resolution spatial deconvolution
         via structure-preserving sketching},
  author={Yang, Chen and Zhang, Xianyang and Chen, Jun},
  journal={bioRxiv},
  year={2025},
  doi={10.64898/2025.12.22.696108}
}


Acknowledgments

We thank the developers of Spotless, Cell2Location, RCTD, CARD, and other deconvolution methods whose work contributed to this field.