README.md

August 15, 2025

DECLUST [1] is a Python package developed to identify spatially coherent clusters of spots by integrating gene expression profiles with spatial coordinates in spatial transcriptomics data. It also enables accurate estimation of cell-type compositions within each cluster.


🌟 Features

Spatially-aware clustering: Combines gene expression and spatial coordinates.

Robust deconvolution: Aggregates signals over clusters to enhance cell type detection.

Easy to install: Available via pip.

Visualization: Includes modules for visualizing clustering and marker gene expression.

โฌ Installation

We recommend using a separate Conda environment. Information about Conda and how to install it can be found on the Anaconda website.

  • Create a conda environment and install the DECLUST package
   conda create -n declust_env python=3.9
   conda activate declust_env

   pip install declust
  • The following dependencies must be installed in advance: scanpy, rpy2, and R (version >= 4.3) with the dplyr package. They can be installed with the install_dependencies.sh script:
   sh install_dependencies.sh

DECLUST has been installed successfully on the following operating systems:

  • macOS Sequoia 15.3.2
  • Ubuntu 22.04
  • SUSE Linux Enterprise Server 15 SP5 (Dardel HPC system)

📊 Data Input

DECLUST takes .h5ad files as input. These files store AnnData objects, a format commonly used for annotated data matrices in single-cell and spatial transcriptomics analysis.

Each .h5ad file includes:

sc_adata.h5ad (Single-cell RNA-seq data)

  • .X: Gene expression matrix (cells × genes)
  • .obs: Cell type annotation of single cells

st_adata.h5ad (Spatial transcriptomics data)

  • .X: Spatial gene expression matrix (spots × genes)
  • .obs: Spot coordinates

💡 Both datasets should originate from the same tissue and have overlapping gene sets to ensure proper implementation of DECLUST.
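A quick pre-check of the two input files can catch a missing annotation column or an empty gene overlap before the pipeline runs. The sketch below is not part of DECLUST; the function names `shared_genes` and `check_inputs` are hypothetical, and it assumes the anndata package is installed and that gene names are stored in `var_names` of both files.

```python
def shared_genes(sc_genes, st_genes):
    """Return genes present in both datasets, in single-cell order."""
    st_set = set(st_genes)
    return [gene for gene in sc_genes if gene in st_set]


def check_inputs(sc_path, st_path, celltype_col):
    """Load both .h5ad files and verify the basics described above.

    Hypothetical helper, not a DECLUST function.
    """
    # Imported lazily so that shared_genes() works without anndata installed.
    import anndata as ad

    sc = ad.read_h5ad(sc_path)
    st = ad.read_h5ad(st_path)

    # The cell type annotation is expected as a column of .obs.
    if celltype_col not in sc.obs.columns:
        raise KeyError(f"{celltype_col!r} not found in single-cell .obs")

    common = shared_genes(sc.var_names, st.var_names)
    if not common:
        raise ValueError("no overlapping genes between the two datasets")
    return common


# Small in-memory illustration of the overlap check.
print(shared_genes(["DCN", "LUM", "AGR2"], ["AGR2", "DCN", "PPDPF"]))
```

For real data, `check_inputs("data/sc_adata.h5ad", "data/st_adata.h5ad", "celltype_major")` would return the list of shared genes; the paths and column name here are placeholders.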

🔗 Example Data Download

โš™๏ธ Usage

DECLUST can be embedded in Python scripts or used independently as a command-line tool. A guide to using it in Python scripts is provided in this tutorial. This section describes how to use it as a bioinformatics pipeline.

Run the pipeline using the following command:

python declust.py --module <module_name> [other options]
  • Available Modules

| Module | Description |
| --- | --- |
| marker | Construction of Reference Matrix from Annotated Single-Cell Transcriptomic Data |
| cluster | Identification of spatial clusters of spots from ST data |
| pseudo_bulk | Generate pseudo-bulk ST profiles per cluster |
| deconv | Run deconvolution by Ordinary Least Squares |
| visualize | Visualize markers or deconvolution results |

Type python declust.py --help in the terminal to see a list of available commands.
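When the pipeline is driven from another Python script, the command line can be assembled programmatically. The following is a minimal sketch under stated assumptions: `declust_command` is a hypothetical helper, not part of DECLUST, and it simply forwards keyword arguments as `--name value` pairs without checking that the option exists.

```python
import sys

# Module names as listed in the table above.
MODULES = ("marker", "cluster", "pseudo_bulk", "deconv", "visualize")


def declust_command(module, **options):
    """Build an argument list for `python declust.py --module <module>`."""
    if module not in MODULES:
        raise ValueError(f"unknown module: {module!r}")
    cmd = [sys.executable, "declust.py", "--module", module]
    for name, value in options.items():
        # Each keyword becomes "--name value"; names are passed through as-is.
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = declust_command("deconv", data_dir="data", results_dir="results")
print(" ".join(cmd[1:]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains declust.py.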

🧬 DECLUST pipeline

  1. Download DECLUST:
   wget https://github.com/Qingyueee/DECLUST/archive/refs/tags/0.1.1.tar.gz
   tar -xvf 0.1.1.tar.gz
  2. Unpack data:
   cd DECLUST-0.1.1
   unzip data.zip
  3. Marker gene selection:
   python declust.py --module marker \
   --celltype_col <celltype_col> \
   --sample_col <sample_col>

Outputs:

  • sc_data_overlapped.csv and sc_label.csv in the data/ folder

  • marker_genes.csv in the results/ folder

  4. Clustering:
   python declust.py --module cluster

Performs Hierarchical Clustering → DBSCAN → Seeded Region Growing (SRG). Saves:

  • srg_df.csv and clustering plots in results/
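To give an intuition for the last stage of the clustering step, here is a generic, simplified seeded region growing sketch. It is not DECLUST's implementation: the function name, the use of a single scalar value per spot instead of a full expression profile, and the explicit neighbour dictionary are all assumptions made for illustration. Starting from labelled seed spots, each unlabelled neighbour is claimed by the adjacent cluster whose mean it is closest to.

```python
import heapq


def seeded_region_growing(values, neighbors, seeds):
    """Grow labelled regions from seed spots over a neighbour graph.

    values:    dict spot -> float (stand-in for an expression profile)
    neighbors: dict spot -> list of adjacent spots
    seeds:     dict spot -> initial cluster label
    """
    labels = dict(seeds)
    # Running sum and size per cluster, used to track each cluster mean.
    total, count = {}, {}
    for spot, lab in seeds.items():
        total[lab] = total.get(lab, 0.0) + values[spot]
        count[lab] = count.get(lab, 0) + 1

    heap = []  # entries: (distance to cluster mean at push time, spot, label)

    def push_neighbors(spot, lab):
        mean = total[lab] / count[lab]
        for nb in neighbors[spot]:
            if nb not in labels:
                heapq.heappush(heap, (abs(values[nb] - mean), nb, lab))

    for spot, lab in seeds.items():
        push_neighbors(spot, lab)

    while heap:
        _, spot, lab = heapq.heappop(heap)
        if spot in labels:
            continue  # already claimed through a smaller distance
        labels[spot] = lab
        total[lab] += values[spot]
        count[lab] += 1
        push_neighbors(spot, lab)
    return labels


# Six spots on a line: three low-expression, three high-expression.
values = {0: 1.0, 1: 1.1, 2: 0.9, 3: 5.0, 4: 5.2, 5: 4.9}
neighbors = {i: [j for j in (i - 1, i + 1) if j in values] for i in values}
print(seeded_region_growing(values, neighbors, {0: "A", 5: "B"}))
```

In this toy example spots 0 to 2 end up in cluster A and spots 3 to 5 in cluster B, because the boundary spot 3 is much closer to the mean of B than to the mean of A.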
  5. Deconvolution:
   python declust.py --module deconv

Performs OLS-based deconvolution and outputs:

  • DECLUST_result.csv in results/
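As a rough illustration of what OLS-based deconvolution means, the sketch below fits a cluster's pseudo-bulk marker expression as a linear combination of per-cell-type reference profiles. It is dependency-free and is not DECLUST's code: the name `ols_proportions`, the argument layout, and the final step of clipping negative coefficients and rescaling them to sum to one are assumptions for illustration.

```python
def ols_proportions(signature, bulk):
    """Estimate cell-type proportions by ordinary least squares.

    signature: list of rows, one per marker gene; each row holds the
               reference expression of that gene in every cell type
    bulk:      list of pseudo-bulk expression values, one per marker gene
    """
    n_types = len(signature[0])
    # Normal equations: (S^T S) b = S^T y
    a = [[sum(row[i] * row[j] for row in signature) for j in range(n_types)]
         for i in range(n_types)]
    rhs = [sum(row[i] * y for row, y in zip(signature, bulk))
           for i in range(n_types)]

    # Gaussian elimination with partial pivoting.
    for col in range(n_types):
        pivot = max(range(col, n_types), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("signature matrix is rank deficient")
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n_types):
            factor = a[r][col] / a[col][col]
            for c in range(col, n_types):
                a[r][c] -= factor * a[col][c]
            rhs[r] -= factor * rhs[col]

    # Back substitution.
    coef = [0.0] * n_types
    for i in reversed(range(n_types)):
        known = sum(a[i][j] * coef[j] for j in range(i + 1, n_types))
        coef[i] = (rhs[i] - known) / a[i][i]

    # Assumption: clip negatives and rescale so the result reads as proportions.
    clipped = [max(c, 0.0) for c in coef]
    total = sum(clipped)
    return [c / total for c in clipped] if total > 0 else clipped


# Three marker genes, two cell types, mixed at 25% / 75%.
signature = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
bulk = [2.5, 7.5, 5.0]
print(ols_proportions(signature, bulk))
```

With real data, a numerical library routine such as `numpy.linalg.lstsq` would normally replace the hand-written solver.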

You can run each step individually, or execute the entire pipeline by running the deconv module.

To export pseudo-bulk profiles for external methods:

   python declust.py --module pseudo_bulk
  • Generates pseudo_bulk.csv in the results/ folder.
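Conceptually, a pseudo-bulk profile aggregates the expression of all spots that belong to the same cluster. The sketch below shows one way to do that in plain Python. It assumes aggregation by summation (averaging would be an equally plausible choice), and the function name and input layout are hypothetical rather than taken from DECLUST.

```python
def pseudo_bulk(expression, cluster_of):
    """Sum per-spot expression vectors within each cluster.

    expression: dict spot_id -> list of expression values (one per gene)
    cluster_of: dict spot_id -> cluster label
    """
    sums = {}
    for spot, counts in expression.items():
        label = cluster_of[spot]
        if label not in sums:
            sums[label] = [0.0] * len(counts)
        # Element-wise addition of this spot's profile to its cluster total.
        sums[label] = [s + c for s, c in zip(sums[label], counts)]
    return sums


expression = {"s1": [1, 2], "s2": [3, 4], "s3": [10, 0]}
cluster_of = {"s1": 0, "s2": 0, "s3": 1}
print(pseudo_bulk(expression, cluster_of))
```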

💡 Custom Marker Genes

Users can provide their own marker gene list in one of two formats:

  • CSV file containing two columns:
    • Gene: gene names
    • maxgroup: corresponding cell type annotations
   --custom_marker_genes file_path
  • Comma-separated gene list, along with a corresponding comma-separated list of cell types:
   --custom_marker_genes "DCN, LUM, C1S, AGR2, PPDPF, ..."
   --custom_marker_celltype "CAFs, CAFs, CAFs, Cancer Epithelial, Cancer Epithelial, ..."

⚠️ The provided marker genes and cell type annotations must exist in the single-cell dataset.
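Before passing the two comma-separated lists on the command line, it can help to confirm that they have the same length and that every entry is present in the single-cell data. The following sketch is a hypothetical pre-check, not DECLUST's own parsing logic; `pair_markers` and its arguments are illustrative names.

```python
def pair_markers(genes_arg, celltypes_arg, known_genes, known_celltypes):
    """Pair each marker gene with its cell type and validate both lists.

    genes_arg, celltypes_arg: comma-separated strings as given on the CLI
    known_genes, known_celltypes: names available in the single-cell dataset
    """
    genes = [g.strip() for g in genes_arg.split(",") if g.strip()]
    celltypes = [c.strip() for c in celltypes_arg.split(",") if c.strip()]

    if len(genes) != len(celltypes):
        raise ValueError(
            f"{len(genes)} genes but {len(celltypes)} cell types were given"
        )

    missing_genes = sorted(set(genes) - set(known_genes))
    missing_types = sorted(set(celltypes) - set(known_celltypes))
    if missing_genes or missing_types:
        raise ValueError(
            f"not in single-cell data: genes={missing_genes}, "
            f"cell types={missing_types}"
        )
    return list(zip(genes, celltypes))


pairs = pair_markers(
    "DCN, LUM, AGR2",
    "CAFs, CAFs, Cancer Epithelial",
    known_genes={"DCN", "LUM", "AGR2", "C1S"},
    known_celltypes={"CAFs", "Cancer Epithelial"},
)
print(pairs)
```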

📬 Quick example: running DECLUST on simulated data

# 1. Download DECLUST
   wget https://github.com/Qingyueee/DECLUST/archive/refs/tags/0.1.1.tar.gz
   tar -xvf 0.1.1.tar.gz
   cd DECLUST-0.1.1

# 2. Configure the environment and install dependencies
   conda create -n declust_env python=3.9
   conda activate declust_env
   pip install declust
   sh install_dependencies.sh

# 3. Download and unpack simulated data
   wget "https://drive.usercontent.google.com/download?id=1xDx_Wny4NQxWiv0JmPheQIL9oD9XDI6A&export=download&authuser=0&confirm=t&uuid=532376ff-6e95-41f1-8357-31b333fa093f&at=APcmpowvVMOvWT63KvGDXfYSA9ZJ:1746541698306" -O simulation_data.zip
   unzip simulation_data.zip

# 4. Run pipeline - it may take about 2 minutes to complete on a personal computer
   python declust.py --module deconv \
      --data_dir simulation_data \
      --results_dir simulation_results \
      --sc_file sc_adata_200_per_celltype.h5ad \
      --st_file st_simu_adata.h5ad \
      --celltype_col celltype_major \
      --sample_col Patient

# 5. Visualize the results
   python declust.py --module visualize \
      --data_dir simulation_data \
      --results_dir simulation_results \
      --sc_file sc_adata_200_per_celltype.h5ad \
      --st_file st_simu_adata.h5ad \
      --celltype_col celltype_major 

๐Ÿ“ Output Structure

   project/
   │
   ├── data/
   │   ├── sc_adata_overlapped.h5ad
   │   ├── sc_labels.csv
   │   └── ...
   │
   ├── results/
   │   ├── marker_genes.csv
   │   ├── srg_df.csv
   │   ├── pseudo_bulk.csv
   │   ├── DECLUST_result.csv
   │   └── [visualization plots]

License

GNU General Public License v3.0

References

  1. Wang Q, Khatri P, Dinh HQ, Huang J, Pawitan Y, Vu TN. A cluster-based cell-type deconvolution of spatial transcriptomic data. Nucleic Acids Research. 2025;53(14):gkaf714. https://academic.oup.com/nar/article/53/14/gkaf714/8211932