August 15, 2025
DECLUST [1] is a Python package for identifying spatially coherent clusters of spots in spatial transcriptomics (ST) data by integrating gene expression profiles with spatial coordinates. It also enables accurate estimation of the cell-type composition within each cluster.
Features
- Spatially aware clustering: combines gene expression and spatial coordinates.
- Robust deconvolution: aggregates signals within each cluster to improve cell-type detection.
- Easy to install: available via pip.
- Visualization: includes modules for visualizing clustering results and marker gene expression.
Installation
We recommend installing DECLUST in a dedicated Conda environment. Information about Conda and how to install it is available on the Anaconda website.
- Create a Conda environment and install the DECLUST package:

```bash
conda create -n declust_env python=3.9
conda activate declust_env
pip install declust
```
- The following dependencies must also be installed in advance: scanpy, rpy2, and R (version ≥ 4.3) with the dplyr R package. They can be installed using the `install_dependencies.sh` script:

```bash
sh install_dependencies.sh
```
DECLUST has been successfully installed on the following operating systems:
- macOS Sequoia 15.3.2
- Ubuntu 22.04
- SUSE Linux Enterprise Server 15 SP5 (Dardel HPC system)
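Before running the pipeline, it can help to confirm that the dependencies listed above are visible from the active environment. The script below is a hypothetical helper, not part of DECLUST: it uses only the Python standard library to locate the scanpy and rpy2 modules, check that `Rscript` can load dplyr, and report the installed R version. It assumes the `R` and `Rscript` executables are on your `PATH`.

```python
import importlib.util
import shutil
import subprocess


def python_deps_available(modules=("scanpy", "rpy2")):
    """Map each module name to True if Python can locate it (without importing it)."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}


def r_package_available(package="dplyr"):
    """Return True if Rscript is on PATH and can load the given R package."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return False
    # library() raises an error (non-zero exit status) if the package is missing
    result = subprocess.run(
        [rscript, "-e", f"suppressMessages(library({package}))"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def r_version_line():
    """Return the first line of `R --version` (the version string), or None."""
    r = shutil.which("R")
    if r is None:
        return None
    out = subprocess.run([r, "--version"], capture_output=True, text=True).stdout
    lines = out.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(python_deps_available())
    print("dplyr available:", r_package_available())
    print("R:", r_version_line())
```

Any `False` or `None` in the output points to a dependency that `install_dependencies.sh` has not yet provided in the current environment.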
Data Input
DECLUST takes `.h5ad` files as input. These files store AnnData objects, which are commonly used for annotated data matrices in single-cell and spatial transcriptomics analysis.
The two input files contain:
- `sc_adata.h5ad` (single-cell RNA-seq data)
  - `.X`: gene expression matrix (cells × genes)
  - `.obs`: cell-type annotations of the single cells
- `st_adata.h5ad` (spatial transcriptomics data)
  - `.X`: spatial gene expression matrix (spots × genes)
  - `.obs`: spot coordinates
💡 Both datasets should originate from the same tissue and share overlapping gene sets for DECLUST to work properly.
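A quick way to check the overlap requirement is to compare the gene names of the two datasets. The helper below is a hypothetical sketch, not a DECLUST function; it works on any two lists of gene names. With real files, the gene names can be taken from `AnnData.var_names` after loading each file with `anndata.read_h5ad`, as shown in the commented lines.

```python
def overlapping_genes(sc_genes, st_genes):
    """Return genes present in both datasets, in their single-cell order, without duplicates."""
    st_set = set(st_genes)
    seen = set()
    shared = []
    for gene in sc_genes:
        if gene in st_set and gene not in seen:
            shared.append(gene)
            seen.add(gene)
    return shared


def overlap_report(sc_genes, st_genes):
    """Summarize how many unique genes each dataset has and how many they share."""
    sc_genes, st_genes = list(sc_genes), list(st_genes)
    return {
        "n_sc": len(set(sc_genes)),
        "n_st": len(set(st_genes)),
        "n_shared": len(overlapping_genes(sc_genes, st_genes)),
    }


# With real files (requires the anndata package):
# import anndata as ad
# sc = ad.read_h5ad("data/sc_adata.h5ad")
# st = ad.read_h5ad("data/st_adata.h5ad")
# print(overlap_report(sc.var_names, st.var_names))

if __name__ == "__main__":
    print(overlap_report(["DCN", "LUM", "AGR2", "CD3E"], ["LUM", "DCN", "EPCAM"]))
```

A small `n_shared` relative to `n_st` usually indicates mismatched gene identifiers (for example, gene symbols in one file and Ensembl IDs in the other).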
Example Data Download
- Download the Real Data Example.
- Download the Simulation Data Example.
Usage
DECLUST can be imported into Python scripts or used as a standalone tool. A guide to using it in Python scripts is provided in this tutorial. This section describes how to run it as a bioinformatics pipeline.
Run the pipeline with the following command:

```bash
python declust.py --module <module_name> [other options]
```
- Available modules:

| Module | Description |
|---|---|
| `marker` | Construct a reference matrix from annotated single-cell transcriptomic data |
| `cluster` | Identify spatial clusters of spots from ST data |
| `pseudo_bulk` | Generate pseudo-bulk ST profiles per cluster |
| `deconv` | Run deconvolution by ordinary least squares (OLS) |
| `visualize` | Visualize marker genes or deconvolution results |

Run `python declust.py --help` in the terminal to see all available options.
DECLUST Pipeline
- Download DECLUST:

```bash
wget https://github.com/Qingyueee/DECLUST/archive/refs/tags/0.1.1.tar.gz
tar -xvf 0.1.1.tar.gz
```

- Unpack the data:

```bash
cd DECLUST-0.1.1
unzip data.zip
```
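- Marker gene selection:

```bash
python declust.py --module marker \
    --celltype_col <celltype_column> \
    --sample_col <sample_column>
```

Here `<celltype_column>` and `<sample_column>` are the `.obs` columns of the single-cell data that hold the cell-type annotations and the sample identifiers (for example, `celltype_major` and `Patient` in the simulated-data example below).

Outputs:
- `sc_data_overlapped.csv` and `sc_label.csv` in the `data/` folder
- `marker_genes.csv` in the `results/` folder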
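The `marker` module builds a reference matrix from annotated single-cell data. As a rough illustration only, the sketch below averages single-cell expression per cell type and keeps genes whose highest cell-type mean clearly exceeds the second highest, labeling each gene with that top cell type. The labeling echoes the `maxgroup` column name used for custom marker files, but whether DECLUST uses this exact rule is an assumption; the function names and the `min_fold` threshold are hypothetical.

```python
import numpy as np


def reference_matrix(expr, labels):
    """Average single-cell expression per cell type.

    expr:   cells x genes array (convert a sparse AnnData .X with .toarray() first)
    labels: length-n_cells sequence of cell-type labels
    Returns (cell_types, ref) where ref is genes x cell_types.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    cell_types = sorted(set(labels.tolist()))
    # Column k holds the mean profile of the cells labeled cell_types[k]
    ref = np.column_stack([expr[labels == ct].mean(axis=0) for ct in cell_types])
    return cell_types, ref


def max_group_markers(genes, cell_types, ref, min_fold=2.0):
    """Hypothetical marker rule (needs at least two cell types): keep a gene if its
    highest cell-type mean is at least `min_fold` times the second highest, and
    label it with the cell type of that highest mean."""
    markers = []
    for i, gene in enumerate(genes):
        order = np.argsort(ref[i])[::-1]  # cell types by decreasing mean expression
        top, second = ref[i, order[0]], ref[i, order[1]]
        if top > 0 and top >= min_fold * second:
            markers.append((gene, cell_types[order[0]]))
    return markers


if __name__ == "__main__":
    expr = [[5, 0, 1], [4, 0, 1], [0, 3, 1], [0, 5, 1]]
    labels = ["CAFs", "CAFs", "T cells", "T cells"]
    cell_types, ref = reference_matrix(expr, labels)
    print(max_group_markers(["DCN", "CD3E", "ACTB"], cell_types, ref))
```

In this toy example the ubiquitously expressed `ACTB` is dropped because its mean is equal across cell types, while `DCN` and `CD3E` are kept as markers of CAFs and T cells.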
- Clustering:

```bash
python declust.py --module cluster
```

This step performs hierarchical clustering → DBSCAN → seeded region growing (SRG) and saves `srg_df.csv` and the clustering plots in `results/`.
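Seeded region growing is the last of the three clustering stages. The sketch below shows the generic technique, not DECLUST's implementation: starting from labeled seed spots (for example, those assigned by the DBSCAN stage), it repeatedly assigns the unlabeled spot that is spatially adjacent to a region and closest to that region's mean expression. The neighborhood rule (spots within a fixed `radius`), the Euclidean expression distance, and the use of `-1` for unlabeled spots are all assumptions made for illustration.

```python
import heapq

import numpy as np


def seeded_region_growing(coords, expr, seeds, radius):
    """Grow labeled regions over spots.

    coords: n_spots x 2 spatial coordinates
    expr:   n_spots x n_features expression (e.g. PCA scores; an assumption)
    seeds:  length-n_spots integer labels, -1 for unlabeled spots
    radius: spots at most this far apart are neighbors (an assumption)
    Spots that cannot be reached from any seed keep the label -1.
    """
    coords = np.asarray(coords, dtype=float)
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(seeds, dtype=int).copy()
    n = len(labels)

    # Pairwise spatial distances -> neighbor lists (fine for small examples)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    neighbors = [np.flatnonzero((d[i] <= radius) & (np.arange(n) != i)) for i in range(n)]

    # Running sums and counts give each region's current mean expression
    sums = {k: expr[labels == k].sum(axis=0) for k in set(labels.tolist()) if k >= 0}
    counts = {k: int((labels == k).sum()) for k in sums}

    heap = []

    def push_neighbors(i):
        k = int(labels[i])
        for j in neighbors[i]:
            if labels[j] < 0:
                dist = float(np.linalg.norm(expr[j] - sums[k] / counts[k]))
                heapq.heappush(heap, (dist, int(j), k))

    for i in np.flatnonzero(labels >= 0):
        push_neighbors(i)

    # Always grow the most similar candidate first
    while heap:
        _, j, k = heapq.heappop(heap)
        if labels[j] >= 0:  # already claimed by an earlier, more similar entry
            continue
        labels[j] = k
        sums[k] += expr[j]
        counts[k] += 1
        push_neighbors(j)
    return labels


if __name__ == "__main__":
    coords = [[x, 0] for x in range(6)]
    expr = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
    print(seeded_region_growing(coords, expr, [0, -1, -1, -1, -1, 1], radius=1.0))
```

Queue priorities are computed when a spot is queued and are not refreshed as region means change, a common simplification of the classic algorithm.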
- Deconvolution:

```bash
python declust.py --module deconv
```

This step performs OLS-based deconvolution and writes `DECLUST_result.csv` to `results/`.
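The idea behind OLS deconvolution is to model each cluster's pseudo-bulk profile $y_c$ as a linear combination of the reference profiles, $\hat{\beta}_c = \arg\min_{\beta} \lVert y_c - R\beta \rVert_2^2$, where $R$ is the genes × cell-types reference matrix. The sketch below solves this with NumPy, then clips negative coefficients to zero and rescales each cluster's coefficients to sum to one. The clipping and rescaling are common post-processing choices and an assumption here; DECLUST's exact handling may differ.

```python
import numpy as np


def ols_deconvolve(ref, bulk):
    """Estimate cell-type proportions per cluster by ordinary least squares.

    ref:  genes x cell_types reference matrix
    bulk: genes x clusters pseudo-bulk expression, rows aligned with ref
    Returns a cell_types x clusters array whose nonzero columns sum to 1.
    """
    ref = np.asarray(ref, dtype=float)
    bulk = np.asarray(bulk, dtype=float)
    # lstsq solves the problem for every cluster (column of bulk) at once
    coef, *_ = np.linalg.lstsq(ref, bulk, rcond=None)
    coef = np.clip(coef, 0.0, None)  # negative abundances are not meaningful
    totals = coef.sum(axis=0, keepdims=True)
    # Leave all-zero columns at zero instead of dividing by zero
    return np.divide(coef, totals, out=np.zeros_like(coef), where=totals > 0)


if __name__ == "__main__":
    ref = np.array([[10, 0], [0, 10], [5, 5]])       # 3 genes x 2 cell types
    bulk = ref @ np.array([[30, 80], [70, 20]])       # 2 clusters with known mixtures
    print(ols_deconvolve(ref, bulk))                   # columns ~ [0.3, 0.7] and [0.8, 0.2]
```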
You can run each step individually, or execute the entire pipeline by running the `deconv` module.
To export pseudo-bulk profiles for use with external methods:

```bash
python declust.py --module pseudo_bulk
```

This generates `pseudo_bulk.csv` in the `results/` folder.
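A pseudo-bulk profile summarizes all spots of a cluster as a single expression vector. The minimal sketch below assumes the profile is the sum of spot-level expression within each cluster; whether DECLUST sums or averages, and how it treats unassigned spots, are assumptions here, and `pseudo_bulk` is a hypothetical name rather than the package's API.

```python
import numpy as np


def pseudo_bulk(expr, clusters):
    """Sum spot-level expression within each cluster.

    expr:     spots x genes expression matrix
    clusters: length-n_spots cluster labels (e.g. from the clustering step)
    Returns (cluster_ids, profiles) with profiles shaped genes x clusters.
    """
    expr = np.asarray(expr, dtype=float)
    clusters = np.asarray(clusters)
    cluster_ids = sorted(set(clusters.tolist()))
    # One column per cluster: the element-wise sum over its spots
    profiles = np.column_stack([expr[clusters == c].sum(axis=0) for c in cluster_ids])
    return cluster_ids, profiles


if __name__ == "__main__":
    ids, profiles = pseudo_bulk([[1, 2], [3, 4], [5, 6]], [0, 0, 1])
    print(ids)       # [0, 1]
    print(profiles)  # genes x clusters: [[4, 5], [6, 6]]
```

Summing over spots boosts the signal of lowly expressed genes, which is the motivation for aggregating before deconvolution; spots that should not contribute (for example, unassigned ones) need to be filtered out beforehand.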
Custom Marker Genes
Users can provide their own marker gene list in one of two formats:
- A CSV file with two columns, `Gene` (gene names) and `maxgroup` (the corresponding cell-type annotations):

```bash
--custom_marker_genes file_path
```

- A comma-separated gene list, together with a corresponding comma-separated list of cell types:

```bash
--custom_marker_genes "DCN, LUM, C1S, AGR2, PPDPF, ..."
--custom_marker_celltype "CAFs, CAFs, CAFs, Cancer Epithelial, Cancer Epithelial, ..."
```

⚠️ The provided marker genes and cell-type annotations must exist in the single-cell dataset.
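Both formats describe the same (gene, cell type) pairs. The hypothetical helpers below (not part of DECLUST) pair the two comma-separated lists, refuse lists of different lengths, render the pairs as a CSV with the `Gene` and `maxgroup` columns, and report markers that are absent from the single-cell data. They assume DECLUST reads a plain two-column CSV with a header row and no index column.

```python
import csv
import io


def parse_marker_args(genes_arg, celltypes_arg):
    """Pair a comma-separated gene list with a comma-separated cell-type list."""
    genes = [g.strip() for g in genes_arg.split(",") if g.strip()]
    celltypes = [c.strip() for c in celltypes_arg.split(",") if c.strip()]
    if len(genes) != len(celltypes):
        raise ValueError(f"got {len(genes)} genes but {len(celltypes)} cell types")
    return list(zip(genes, celltypes))


def markers_to_csv_text(pairs):
    """Render (gene, cell type) pairs as CSV text with a Gene,maxgroup header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Gene", "maxgroup"])
    writer.writerows(pairs)
    return buf.getvalue()


def write_marker_csv(pairs, path):
    """Write the marker table to `path`."""
    with open(path, "w", newline="") as fh:
        fh.write(markers_to_csv_text(pairs))


def missing_markers(pairs, sc_genes, sc_celltypes):
    """Return (genes, cell types) from `pairs` that are absent from the single-cell data."""
    genes, types = set(sc_genes), set(sc_celltypes)
    missing_genes = [g for g, _ in pairs if g not in genes]
    missing_types = sorted({c for _, c in pairs if c not in types})
    return missing_genes, missing_types


if __name__ == "__main__":
    pairs = parse_marker_args(
        "DCN, LUM, C1S, AGR2, PPDPF",
        "CAFs, CAFs, CAFs, Cancer Epithelial, Cancer Epithelial",
    )
    print(markers_to_csv_text(pairs))
```

With an AnnData object `sc` loaded from the single-cell file, `missing_markers(pairs, sc.var_names, sc.obs["celltype_major"])` should return two empty lists before the markers are passed to DECLUST (`celltype_major` is the cell-type column of the simulated data).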
Quick Example: Running DECLUST on Simulated Data

```bash
# 1. Download DECLUST
wget https://github.com/Qingyueee/DECLUST/archive/refs/tags/0.1.1.tar.gz
tar -xvf 0.1.1.tar.gz
cd DECLUST-0.1.1

# 2. Configure the environment and install dependencies
conda create -n declust_env python=3.9
conda activate declust_env
pip install declust
sh install_dependencies.sh

# 3. Download and unpack the simulated data
wget "https://drive.usercontent.google.com/download?id=1xDx_Wny4NQxWiv0JmPheQIL9oD9XDI6A&export=download&authuser=0&confirm=t&uuid=532376ff-6e95-41f1-8357-31b333fa093f&at=APcmpowvVMOvWT63KvGDXfYSA9ZJ:1746541698306" -O simulation_data.zip
unzip simulation_data.zip

# 4. Run the pipeline (may take about 2 minutes on a personal computer)
python declust.py --module deconv \
    --data_dir simulation_data \
    --results_dir simulation_results \
    --sc_file sc_adata_200_per_celltype.h5ad \
    --st_file st_simu_adata.h5ad \
    --celltype_col celltype_major \
    --sample_col Patient

# 5. Visualize the results
python declust.py --module visualize \
    --data_dir simulation_data \
    --results_dir simulation_results \
    --sc_file sc_adata_200_per_celltype.h5ad \
    --st_file st_simu_adata.h5ad \
    --celltype_col celltype_major
```
Output Structure

```
project/
│
├── data/
│   ├── sc_adata_overlapped.h5ad
│   ├── sc_labels.csv
│   └── ...
│
└── results/
    ├── marker_genes.csv
    ├── srg_df.csv
    ├── pseudo_bulk.csv
    ├── DECLUST_result.csv
    └── [visualization plots]
```
License
GNU General Public License v3.0
References
[1] Wang Q, Khatri P, Dinh HQ, Huang J, Pawitan Y, Vu TN. A cluster-based cell-type deconvolution of spatial transcriptomic data. Nucleic Acids Research. 2025;53(14):gkaf714. https://academic.oup.com/nar/article/53/14/gkaf714/8211932