README.md

August 15, 2025

DECLUST [1] is a Python package developed to identify spatially coherent clusters of spots by integrating gene expression profiles with spatial coordinates in spatial transcriptomics data. It also enables accurate estimation of cell-type compositions within each cluster.

🌟 Features

Spatially-aware clustering: Combines gene expression and spatial coordinates.

Robust deconvolution: Aggregates signals over clusters to enhance cell type detection.

Easy to install: Available via pip.

Visualization: Includes modules for visualizing clustering and marker gene expression.

⏬ Installation

We recommend using a separate Conda environment. Information about Conda and how to install it can be found on the Anaconda website.

Create a Conda environment and install the DECLUST package:

   conda create -n declust_env python=3.9
   conda activate declust_env

   pip install declust

The following dependencies must be installed in advance: scanpy, rpy2, and R (version >= 4.3) with the dplyr R package. They can be installed using the install_dependencies.sh script:

   sh install_dependencies.sh

DECLUST has been installed successfully on the following operating systems:

macOS Sequoia 15.3.2
Ubuntu 22.04
SUSE Linux Enterprise Server 15 SP5 (Dardel HPC system)

📊 Data Input

DECLUST uses .h5ad files, which are AnnData objects commonly used for storing annotated data matrices in single-cell and spatial transcriptomics analysis.

The two input files contain:

sc_adata.h5ad (Single-cell RNA-seq data)

.X: Gene expression matrix (cells × genes)
.obs: Cell type annotation of single cells

st_adata.h5ad (Spatial transcriptomics data)

.X: Spatial gene expression matrix (spots × genes)
.obs: Spot coordinates

💡 Both datasets should originate from the same tissue and have overlapping gene sets to ensure proper implementation of DECLUST.
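
Since both datasets need overlapping gene sets, it can be useful to check the overlap before running the pipeline. The snippet below is a minimal sketch and not part of DECLUST: `shared_genes` is a hypothetical helper that works on plain lists of gene names, and the toy gene lists are made up for illustration. With real data, the names could be taken from the `var_names` attribute of each AnnData object.

```python
def shared_genes(sc_genes, st_genes):
    """Return the genes present in both lists, keeping the order of sc_genes."""
    st_set = set(st_genes)
    return [gene for gene in sc_genes if gene in st_set]


# Toy gene lists (illustrative only). With real data these could come from
# anndata.read_h5ad("sc_adata.h5ad").var_names and the ST equivalent.
sc_genes = ["DCN", "LUM", "C1S", "AGR2"]
st_genes = ["AGR2", "DCN", "PPDPF"]

common = shared_genes(sc_genes, st_genes)
print(f"{len(common)} shared genes: {common}")
```

A very small overlap is a hint that the gene identifiers of the two files follow different naming conventions.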

🔗 Example Data Download

Download the Real Data Example.
Download the Simulation Data Example.

⚙️ Usage

DECLUST can be embedded in Python scripts or used independently as a command-line tool. A guide to using it in Python scripts is provided in this tutorial. This section describes how to use it as a bioinformatics pipeline.

Run the pipeline using the following command:

   python declust.py --module <module_name> [other options]

Available Modules

Module	Description
`marker`	Construction of Reference Matrix from Annotated Single-Cell Transcriptomic Data
`cluster`	Identification of spatial clusters of spots from ST data
`pseudo_bulk`	Generate pseudo-bulk ST profiles per cluster
`deconv`	Run deconvolution by Ordinary Least Squares
`visualize`	Visualize markers or deconvolution results

Type python declust.py --help in the terminal to see a list of available commands.
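
When the pipeline is driven from another Python script, the command line can be assembled programmatically. The following is a small sketch under stated assumptions: `build_command` is a hypothetical helper, not a DECLUST function, and it only uses option names that appear in this README (`--module`, `--data_dir`, `--results_dir`).

```python
import shlex


def build_command(module, **options):
    """Assemble a declust.py command line as a list of arguments."""
    command = ["python", "declust.py", "--module", module]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


cmd = build_command(
    "deconv",
    data_dir="simulation_data",
    results_dir="simulation_results",
)
# Print the command in a shell-safe form for inspection.
print(shlex.join(cmd))
# The list could then be passed to subprocess.run(cmd, check=True)
# from inside the DECLUST folder.
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.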

🧬 DECLUST pipeline

Download DECLUST:

   wget https://github.com/Qingyueee/DECLUST/archive/refs/tags/0.1.1.tar.gz
   tar -xvf 0.1.1.tar.gz

Unpack data:

   cd DECLUST-0.1.1
   unzip data.zip

Marker gene selection:

   python declust.py --module marker \
   --celltype_col <celltype_column> \
   --sample_col <sample_column>

Outputs:

sc_data_overlapped.csv and sc_label.csv in the data/ folder
marker_genes.csv in the results/ folder

Clustering:

   python declust.py --module cluster

Performs Hierarchical Clustering → DBSCAN → Seeded Region Growing (SRG). Saves:

srg_df.csv and clustering plots in results/
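
To give an intuition for the last step, the sketch below shows a generic seeded region growing procedure on a toy grid. It is not DECLUST's implementation: each spot carries a single scalar value instead of an expression vector, neighbours are the four adjacent grid positions, and a candidate's priority uses the region mean at the time it is queued. All names (`seeded_region_growing`, `values`, `seeds`) are hypothetical.

```python
import heapq


def seeded_region_growing(values, seeds):
    """Grow labelled regions over a grid of per-spot values.

    values: dict mapping (row, col) -> float, one entry per spot
    seeds:  dict mapping (row, col) -> initial cluster label
    """
    labels = dict(seeds)
    totals, counts = {}, {}
    for spot, label in seeds.items():
        totals[label] = totals.get(label, 0.0) + values[spot]
        counts[label] = counts.get(label, 0) + 1

    def neighbours(spot):
        row, col = spot
        for cand in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
            if cand in values:
                yield cand

    def push_candidates(heap, spot, label):
        # Queue unlabelled neighbours, ranked by distance to the region mean.
        mean = totals[label] / counts[label]
        for cand in neighbours(spot):
            if cand not in labels:
                heapq.heappush(heap, (abs(values[cand] - mean), cand, label))

    heap = []
    for spot, label in seeds.items():
        push_candidates(heap, spot, label)

    while heap:
        _, spot, label = heapq.heappop(heap)
        if spot in labels:
            continue  # already claimed by a better-matching region
        labels[spot] = label
        totals[label] += values[spot]
        counts[label] += 1
        push_candidates(heap, spot, label)
    return labels


# Toy 1 x 6 strip with a low-expression and a high-expression half.
values = {(0, c): v for c, v in enumerate([1.0, 1.1, 1.2, 5.0, 5.1, 5.2])}
seeds = {(0, 0): "A", (0, 5): "B"}
result = seeded_region_growing(values, seeds)
print([result[(0, c)] for c in range(6)])
```

In this toy example each seed absorbs the spots whose values resemble its own, so the strip splits into two spatially contiguous regions.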

Deconvolution:

   python declust.py --module deconv

Performs OLS-based deconvolution and outputs:

DECLUST_result.csv in results/
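
As an illustration of what OLS-based deconvolution means, the sketch below fits cell-type coefficients to one pseudo-bulk profile by solving the normal equations in plain Python. It is a simplified example, not DECLUST's code: the reference matrix and profile are toy numbers, the reference is assumed to have full column rank, and rescaling the coefficients to proportions is an assumption made here for readability. In practice a library routine such as `numpy.linalg.lstsq` would usually be preferred.

```python
def ols(reference, bulk):
    """Least-squares fit of bulk ~ reference @ beta via the normal equations.

    reference: list of rows, one per marker gene, one column per cell type
    bulk:      list of pseudo-bulk expression values, one per marker gene
    """
    n_types = len(reference[0])
    # Build the augmented system [A^T A | A^T b].
    system = [
        [sum(row[i] * row[j] for row in reference) for j in range(n_types)]
        + [sum(row[i] * y for row, y in zip(reference, bulk))]
        for i in range(n_types)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n_types):
        pivot = max(range(col, n_types), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(n_types):
            if r != col:
                factor = system[r][col] / system[col][col]
                system[r] = [a - factor * b for a, b in zip(system[r], system[col])]
    return [system[i][n_types] / system[i][i] for i in range(n_types)]


# Toy reference: 3 marker genes x 2 cell types (illustrative numbers).
reference = [[10.0, 0.0], [0.0, 8.0], [2.0, 2.0]]
# Pseudo-bulk profile generated from a 25% / 75% mixture of the two types.
bulk = [2.5, 6.0, 2.0]

beta = ols(reference, bulk)
total = sum(beta)
proportions = [b / total for b in beta]  # assumption: rescale to sum to 1
print([round(p, 4) for p in proportions])
```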

You can run each step individually, or execute the entire pipeline in one go by running the deconv module.

To export pseudo-bulk profiles for external methods:

   python declust.py --module pseudo_bulk

Generates pseudo_bulk.csv in the results/ folder.
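
Conceptually, a pseudo-bulk profile aggregates the expression of all spots that belong to the same cluster. The sketch below illustrates the idea on toy data; it is not DECLUST's code, and summing the counts (rather than, say, averaging them) is an assumption made for this example. The names `pseudo_bulk`, `expression`, and `clusters` are hypothetical.

```python
def pseudo_bulk(expression, clusters):
    """Sum per-spot expression vectors within each cluster.

    expression: dict mapping spot id -> list of gene counts
    clusters:   dict mapping spot id -> cluster label
    """
    totals = {}
    for spot, counts in expression.items():
        label = clusters[spot]
        if label not in totals:
            totals[label] = list(counts)
        else:
            totals[label] = [a + b for a, b in zip(totals[label], counts)]
    return totals


# Three spots, three genes, two clusters (illustrative numbers).
expression = {"s1": [1, 0, 3], "s2": [2, 1, 0], "s3": [0, 4, 1]}
clusters = {"s1": 0, "s2": 0, "s3": 1}

profiles = pseudo_bulk(expression, clusters)
print(profiles)
```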

💡 Custom Marker Genes

Users can provide their own marker gene list in one of two formats:

CSV file containing two columns:
- Gene: gene names
- maxgroup: corresponding cell type annotations

   --custom_marker_genes file_path

Comma-separated gene list, along with a corresponding comma-separated list of cell types:

   --custom_marker_genes "DCN, LUM, C1S, AGR2, PPDPF, ..."
   --custom_marker_celltype "CAFs, CAFs, CAFs, Cancer Epithelial, Cancer Epithelial, ..."

⚠️ The provided marker genes and cell type annotations must exist in the single-cell dataset.
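
A marker file in the CSV format described above can be produced with a few lines of Python. This is a minimal sketch: the gene and cell-type pairs are the ones shown in the comma-separated example, while the file name `custom_markers.csv` and the helper `write_marker_csv` are hypothetical.

```python
import csv

# (gene, cell type) pairs taken from the comma-separated example above.
markers = [
    ("DCN", "CAFs"),
    ("LUM", "CAFs"),
    ("C1S", "CAFs"),
    ("AGR2", "Cancer Epithelial"),
    ("PPDPF", "Cancer Epithelial"),
]


def write_marker_csv(path, markers):
    """Write markers to a CSV file with the columns Gene and maxgroup."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Gene", "maxgroup"])
        writer.writerows(markers)


write_marker_csv("custom_markers.csv", markers)
```

The resulting file path can then be passed to `--custom_marker_genes`.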

📬 Quick example: running DECLUST on simulated data

# 1. Download DECLUST
   wget https://github.com/Qingyueee/DECLUST/archive/refs/tags/0.1.1.tar.gz
   tar -xvf 0.1.1.tar.gz
   cd DECLUST-0.1.1

# 2. Configure the environment and install dependencies
   conda create -n declust_env python=3.9
   conda activate declust_env
   pip install declust
   sh install_dependencies.sh

# 3. Download and unpack simulated data
   wget "https://drive.usercontent.google.com/download?id=1xDx_Wny4NQxWiv0JmPheQIL9oD9XDI6A&export=download&authuser=0&confirm=t&uuid=532376ff-6e95-41f1-8357-31b333fa093f&at=APcmpowvVMOvWT63KvGDXfYSA9ZJ:1746541698306" -O simulation_data.zip
   unzip simulation_data.zip

# 4. Run pipeline - it may take about 2 minutes to complete on a personal computer
   python declust.py --module deconv \
      --data_dir simulation_data \
      --results_dir simulation_results \
      --sc_file sc_adata_200_per_celltype.h5ad \
      --st_file st_simu_adata.h5ad \
      --celltype_col celltype_major \
      --sample_col Patient

# 5. Visualize the results
   python declust.py --module visualize \
      --data_dir simulation_data \
      --results_dir simulation_results \
      --sc_file sc_adata_200_per_celltype.h5ad \
      --st_file st_simu_adata.h5ad \
      --celltype_col celltype_major

📁 Output Structure

   project/
   │
   ├── data/
   │   ├── sc_adata_overlapped.h5ad
   │   ├── sc_labels.csv
   │   └── ...
   │
   ├── results/
   │   ├── marker_genes.csv
   │   ├── srg_df.csv
   │   ├── pseudo_bulk.csv
   │   ├── DECLUST_result.csv
   │   └── [visualization plots]
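
After a run, a quick check can confirm that the result files listed in the tree above are present. The snippet is a small sketch, not part of DECLUST; `missing_results` is a hypothetical helper, and the default folder name `results` is taken from the tree.

```python
from pathlib import Path

# Result files named in the output structure above.
EXPECTED_RESULTS = [
    "marker_genes.csv",
    "srg_df.csv",
    "pseudo_bulk.csv",
    "DECLUST_result.csv",
]


def missing_results(results_dir="results"):
    """Return the expected result files that are absent from results_dir."""
    folder = Path(results_dir)
    return [name for name in EXPECTED_RESULTS if not (folder / name).is_file()]


missing = missing_results("results")
print("Missing files:", missing if missing else "none")
```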

License

GNU General Public License v3.0

References

[1] Wang Q, Khatri P, Dinh HQ, Huang J, Pawitan Y, Vu TN. A cluster-based cell-type deconvolution of spatial transcriptomic data. Nucleic Acids Research. 2025;53(14):gkaf714. https://academic.oup.com/nar/article/53/14/gkaf714/8211932