README.md

October 11, 2025



Overview of DANCE 1.0 and 2.0

DANCE 1.0 is a Python toolkit designed to support deep learning models for large-scale analysis of single-cell gene expression data. Its goal is to foster a deep learning community and establish a benchmark platform for computational methods in single-cell analysis.

DANCE 2.0 extends this effort by introducing an automated preprocessing recommendation platform. It addresses the pressing need to move beyond trial-and-error approaches by transforming single-cell preprocessing into a systematic, data-driven, and interpretable workflow.

Both include three modules at present:

  1. Single-modality analysis: cell type annotation, clustering, gene imputation
  2. Single-cell multimodal omics: modality prediction (only DANCE 1.0), modality matching (only DANCE 1.0), joint embedding
  3. Spatially resolved transcriptomics: spatial domain identification, cell type deconvolution

DANCE 2.0 Release Schedule

  • Open-source release of the DANCE 2.0 codebase
  • Launch of the DANCE 2.0 web platform for users to upload datasets and receive optimal preprocessing recommendations
  • Release of the DANCE 2.0 API for programmatic access to preprocessing recommendations

DANCE Open Source: https://github.com/OmicsML/dance
DANCE Documentation: https://pydance.readthedocs.io/en/latest/
DANCE 1.0 Paper (published in Genome Biology): DANCE: a deep learning library and benchmark platform for single-cell analysis
DANCE 2.0 Paper: DANCE 2.0: Transforming single-cell analysis from black box to transparent workflow
Survey Paper (published in ACM TIST): Deep Learning in Single-cell Analysis

Join the Community

Slack: https://join.slack.com/t/omicsml/shared_invite/zt-1hxdz7op3-E5K~EwWF1xDvhGZFrB9AbA
Twitter: https://twitter.com/OmicsML
WeChat Group Assistant: 736180290
Email: danceteamgnn@gmail.com

Contributing

Community-wide contribution is the key to sustainable development and continual growth of the DANCE package. We deeply appreciate any contribution made to improve the DANCE code base. If you would like to get started, please refer to our brief guidelines about our automated quality controls, as well as setting up the dev environments.

Citation

If you find our work useful in your research, please consider citing our DANCE package or survey paper:

@article{ding2025dance,
  title={DANCE 2.0: Transforming single-cell analysis from black box to transparent workflow},
  author={Ding, Jiayuan and Xing, Zhongyu and Wang, Yixin and Liu, Renming and Liu, Sheng and Huang, Zhi and Tang, Wenzhuo and Xie, Yuying and Zou, James and Qiu, Xiaojie and others},
  journal={bioRxiv},
  pages={2025--07},
  year={2025},
  publisher={Cold Spring Harbor Laboratory}
}
@article{ding2024dance,
  title={DANCE: A deep learning library and benchmark platform for single-cell analysis},
  author={Ding, Jiayuan and Liu, Renming and Wen, Hongzhi and Tang, Wenzhuo and Li, Zhaoheng and Venegas, Julian and Su, Runze and Molho, Dylan and Jin, Wei and Wang, Yixin and others},
  journal={Genome Biology},
  volume={25},
  number={1},
  pages={1--28},
  year={2024},
  publisher={BioMed Central}
}
@article{molho2024deep,
  title={Deep learning in single-cell analysis},
  author={Molho, Dylan and Ding, Jiayuan and Tang, Wenzhuo and Li, Zhaoheng and Wen, Hongzhi and Wang, Yixin and Venegas, Julian and Jin, Wei and Liu, Renming and Su, Runze and others},
  journal={ACM Transactions on Intelligent Systems and Technology},
  volume={15},
  number={3},
  pages={1--62},
  year={2024},
  publisher={ACM New York, NY}
}

Usage (DANCE 1.0)

Overview

In release 1.0, the main use of DANCE is to provide readily available experiment reproduction (see the reproduced performance reported under Implemented Algorithms below). Users can easily reproduce selected experiments presented in the original papers for the computational single-cell methods implemented in DANCE. The corresponding scripts can be found under examples/.

Motivation

Computational methods for single-cell analysis are quickly emerging, and the field is revolutionizing the usage of single-cell data to gain biological insights. A key challenge to continually developing computational single-cell methods that achieve new state-of-the-art performance is reproducing previous benchmarks. More specifically, different studies prepare their datasets and perform evaluation differently. Methods are also not always compatible with one another, as they may be written in different languages or depend on incompatible library versions.

DANCE addresses these challenges by providing a unified Python package implementing many popular computational single-cell methods (see Implemented Algorithms), as well as easily reproducible experiments by providing unified tools for

  • Data downloading
  • Data (pre-)processing and transformation (e.g. graph construction)
  • Model training and evaluation

Example: run cell-type annotation benchmark using scDeepSort

  • Step0. Install DANCE (see Installation)
  • Step1. Navigate to the folder containing the corresponding example script. In this case, it is examples/single_modality/cell_type_annotation.
  • Step2. Find the command line interface (CLI) options for the experiment you want to reproduce at the end of the script. For example, the CLI options for reproducing the Mouse Brain experiment are
    python scdeepsort.py --species mouse --tissue Brain --train_dataset 753 3285 --test_dataset 2695
    
  • Step3. Wait for the experiment to finish and check results.
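
When several experiments need to be reproduced in a row, the command from Step2 can be assembled programmatically. The following is a minimal sketch, not part of DANCE: `build_scdeepsort_command` is a hypothetical helper, and it uses only the flags that appear in the Mouse Brain example above.

```python
import shlex


def build_scdeepsort_command(species, tissue, train_ids, test_ids):
    """Assemble the scdeepsort.py command line as an argument list.

    Hypothetical helper: it only uses the flags shown in the example above.
    """
    return [
        "python", "scdeepsort.py",
        "--species", species,
        "--tissue", tissue,
        "--train_dataset", *map(str, train_ids),
        "--test_dataset", *map(str, test_ids),
    ]


# Mouse Brain experiment from the example above.
mouse_brain = build_scdeepsort_command("mouse", "Brain", [753, 3285], [2695])
print(shlex.join(mouse_brain))

# To actually launch it, run from examples/single_modality/cell_type_annotation:
#   import subprocess
#   subprocess.run(mouse_brain, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting problems when it is passed to `subprocess.run`.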

Usage (DANCE 2.0)

Overview

In release 2.0, DANCE evolves from an experiment reproduction library into an automated and interpretable preprocessing platform. It provides two tools to optimize your single-cell analysis workflows:

  • Method-Aware Preprocessing (MAP): discovers the best preprocessing pipeline for a specific method. For practical examples of how to run this locally, please see examples/tuning/custom-methods/.
  • Dataset-Aware Preprocessing (DAP): a web service that gives an instant, high-quality pipeline recommendation for a new dataset, available at http://omicsml.ai:81/dance/.

Together, these features transform single-cell preprocessing from a manual, trial-and-error process into a systematic, data-driven, and reproducible workflow.

Motivation

While DANCE 1.0 addressed benchmark reproduction, a more fundamental challenge in single-cell analysis is the preprocessing itself. The optimal combination of normalization, gene selection, and dimensionality reduction varies across tasks, models, and datasets, yet the selection process is often guided by legacy defaults or time-consuming trial-and-error. This inconsistency hinders reproducibility and can lead to suboptimal or even misleading results. DANCE 2.0 tackles this challenge by transforming preprocessing from a black-box art into a systematic, data-driven science. It provides tools to automatically construct pipelines tailored to a specific analytical method and dataset, ensuring more robust and transparent downstream analysis.

Example: run cell-type annotation benchmark using SVM

  • Step0. Install DANCE (see Installation)
  • Step1. Navigate to the folder containing the corresponding example script. In this case, it is examples/tuning/cta_svm.
  • Step2. Find the command line interface (CLI) options for the experiment you want to reproduce at the end of the script. For example, the CLI options for reproducing the Human Brain experiment are shown below, where --tune_mode takes one of pipeline, params, or pipeline_params:
    python main.py --tune_mode (pipeline/params/pipeline_params) --species human --tissue Brain --train_dataset 328 --test_dataset 138 --valid_dataset 328
    
  • Step3. Wait for the experiment to finish and check results.
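
The parenthesized `--tune_mode` value in Step2 lists alternatives rather than a literal argument. As a sketch under that reading, the snippet below expands the Human Brain example into one concrete command per tuning mode. `build_tuning_command` is a hypothetical helper and uses only the flags shown above.

```python
import shlex

# The three values listed in parentheses for --tune_mode in the example above.
TUNE_MODES = ("pipeline", "params", "pipeline_params")


def build_tuning_command(tune_mode, species, tissue, train_ids, test_ids, valid_ids):
    """Assemble the main.py tuning command as an argument list (hypothetical helper)."""
    if tune_mode not in TUNE_MODES:
        raise ValueError(f"tune_mode must be one of {TUNE_MODES}, got {tune_mode!r}")
    return [
        "python", "main.py",
        "--tune_mode", tune_mode,
        "--species", species,
        "--tissue", tissue,
        "--train_dataset", *map(str, train_ids),
        "--test_dataset", *map(str, test_ids),
        "--valid_dataset", *map(str, valid_ids),
    ]


# One concrete command per tuning mode for the Human Brain experiment.
commands = [
    shlex.join(build_tuning_command(mode, "human", "Brain", [328], [138], [328]))
    for mode in TUNE_MODES
]
for command in commands:
    print(command)
```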

Installation

Quick install

The full installation process might be a bit tedious and could involve some debugging when using CUDA-enabled packages. Thus, we provide an install.sh script that simplifies the installation process, assuming the user has conda set up on their machine. The installation script creates a conda environment (named dance by default) and installs the DANCE package along with all its dependencies for a specific CUDA version. Currently, two options are accepted: cpu and cu118. For example, to install the DANCE package using CUDA 11.8 in a dance-env conda environment, simply run:

# Clone the repository via SSH
git clone git@github.com:OmicsML/dance.git && cd dance
# Alternatively, use HTTPS if you have not set up SSH
# git clone https://github.com/OmicsML/dance.git  && cd dance

# Run the auto installation script to install DANCE and its dependencies in a conda environment
source install.sh cu118 dance-env

Note: the first argument (CUDA version) is mandatory, while the second argument (conda environment name) is optional (default: dance).

Custom install


Step1. Set up environment

First, create a conda environment for DANCE (optional):

conda create -n dance python=3.11 -y && conda activate dance

Then, install CUDA enabled packages (PyTorch, PyG, DGL):

pip install torch==2.1.1 torchvision==0.16.1 --index-url https://download.pytorch.org/whl/cu118
pip install torch_geometric==2.4.0
pip install dgl==1.1.3 -f https://data.dgl.ai/wheels/cu118/repo.html

Alternatively, install these dependencies for CPU only:

pip install torch==2.1.1 torchvision==0.16.1 --index-url https://download.pytorch.org/whl/cpu
pip install torch_geometric==2.4.0
pip install dgl==1.1.3 -f https://data.dgl.ai/wheels/repo.html

For more information about installation or other CUDA version options, check out the installation pages for the corresponding packages.

Step2. Install DANCE

Install from PyPI

pip install pydance

Or, install the latest dev version from source

git clone https://github.com/OmicsML/dance.git && cd dance
pip install -e .
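
After installing, it can help to confirm that the pinned dependencies from Step1 are the ones actually present in the environment. The sketch below uses only the Python standard library. The distribution names are assumed to match the `pip install` names used above, and the helper names are illustrative rather than part of DANCE.

```python
from importlib import metadata

# Versions pinned in the custom install steps above.
PINNED = {
    "torch": "2.1.1",
    "torchvision": "0.16.1",
    "torch_geometric": "2.4.0",
    "dgl": "1.1.3",
}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned):
    """Compare versions, ignoring a local build suffix such as '+cu118'."""
    return installed is not None and installed.split("+")[0] == pinned


def check_environment():
    """Map each distribution to (installed version, whether it matches the pin)."""
    report = {}
    for name, pinned in PINNED.items():
        found = installed_version(name)
        report[name] = (found, matches_pin(found, pinned))
    # No version pin is stated for pydance, so only its presence is reported.
    report["pydance"] = (installed_version("pydance"), None)
    return report


for name, (found, ok) in check_environment().items():
    print(f"{name}: installed={found} matches_pin={ok}")
```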

Implemented Algorithms

Algorithms marked P1 in the CheckIn column are not covered in the first release.

Single Modality Module

1) Imputation

| BackBone | Model | Algorithm | Year | CheckIn |
| --- | --- | --- | --- | --- |
| GNN | GraphSCI | Imputing Single-cell RNA-seq data by combining Graph Convolution and Autoencoder Neural Networks | 2021 | |
| GNN | scGNN (2020) | SCGNN: scRNA-seq Dropout Imputation via Induced Hierarchical Cell Similarity Graph | 2020 | P1 |
| GNN | scGNN (2021) | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses | 2021 | |
| GNN | GNNImpute | An efficient scRNA-seq dropout imputation method using graph attention network | 2021 | P1 |
| Graph Diffusion | MAGIC | MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data | 2018 | P1 |
| Probabilistic Model | scImpute | An accurate and robust imputation method scImpute for single-cell RNA-seq data | 2018 | P1 |
| GAN | scGAIN | scGAIN: Single Cell RNA-seq Data Imputation using Generative Adversarial Networks | 2019 | P1 |
| NN | DeepImpute | DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data | 2019 | |
| NN + TF | Saver-X | Transfer learning in single-cell transcriptomics improves data denoising and pattern discovery | 2019 | P1 |

| Model | Mouse Brain (DANCE 2.0/DANCE 1.0/Original) | Mouse Embryo (DANCE 2.0/DANCE 1.0/Original) | PBMC (DANCE 2.0/DANCE 1.0/Original) | Evaluation Metric |
| --- | --- | --- | --- | --- |
| DeepImpute | 0.229/0.244/NA | 0.252/0.255/NA | 0.220/0.230/NA | Test MRE |
| GraphSCI | 0.453/0.654/NA | 0.459/0.497/NA | 0.458/0.704/NA | Test MRE |
| scGNN2 | 0.323/0.629/NA | 0.299/0.620/NA | 0.441/0.684/NA | Test MRE |

Note: DANCE 2.0 indicates Stage 1, 2, and 3 (valid mask as metric for selection) for all methods.

Note: scGNN2.0 is evaluated on the 2,000 genes with the highest variance, following the original paper.

2) Cell Type Annotation

| BackBone | Model | Algorithm | Year | CheckIn |
| --- | --- | --- | --- | --- |
| GNN | scDeepSort | Single-cell transcriptomics with weighted GNN | 2021 | |
| Logistic Regression | Celltypist | Cross-tissue immune cell analysis reveals tissue-specific features in humans. | 2021 | |
| Random Forest | singleCellNet | SingleCellNet: a computational tool to classify single cell RNA-Seq data across platforms and across species. | 2019 | |
| Neural Network | ACTINN | ACTINN: automated identification of cell types in single cell RNA sequencing. | 2020 | |
| Hierarchical Clustering | SingleR | Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. | 2019 | P1 |
| SVM | SVM | A comparison of automatic cell identification methods for single-cell RNA sequencing data. | 2018 | |
| GNN | scHeteroNet | scHeteroNet: A Heterophily-Aware Graph Neural Network for Accurate Cell Type Annotation and Novel Cell Detection | 2025 | |

| Model | GSE67835 Brain (DANCE 2.0/DANCE 1.0/Original) | CD8+ TIL atlas (DANCE 2.0/DANCE 1.0/Original) | GSE123813 Immune (DANCE 2.0/DANCE 1.0/Original) | CD4+ TIL atlas (DANCE 2.0/DANCE 1.0/Original) | GSE182320 (Tissue - Spleen) (DANCE 2.0/DANCE 1.0/Original) | Evaluation Metric |
| --- | --- | --- | --- | --- | --- | --- |
| SVM | 0.82/0.07/NA | 0.81/0.39/NA | 0.86/0.83/NA | 0.92/0.48/NA | 0.47/0.30/NA | ACC |
| ACTINN | 0.80/0.80/NA | 0.84/0.78/NA | 0.83/0.81/NA | 0.92/0.89/NA | 0.47/0.44/NA | ACC |
| singleCellNet | 0.78/0.77/NA | 0.76/0.75/NA | 0.85/0.84/NA | 0.87/0.85/NA | 0.45/0.44/NA | ACC |
| Celltypist | 0.84/0.90/NA | 0.81/0.72/NA | 0.83/0.80/NA | 0.92/0.87/NA | 0.45/0.43/NA | ACC |
| scDeepSort | 0.84/0.07/NA | 0.83/0.65/NA | 0.83/0.82/NA | 0.92/0.78/NA | 0.45/0.43/NA | ACC |
| scHeteroNet | 0.87/0.83/NA | 0.80/0.78/NA | 0.82/0.81/NA | 0.91/0.89/NA | 0.47/0.45/NA | ACC |

Note: DANCE 2.0 indicates Stage 1, 2, and 3 (valid dataset as metric for selection) for all methods.
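
Each three-score results cell in these tables packs its values as DANCE 2.0/DANCE 1.0/Original, with NA marking a missing value. The sketch below shows one way to split such a cell and compute the change from DANCE 1.0 to DANCE 2.0. Both helpers are illustrative and not part of DANCE. The `higher_is_better` flag is needed because ACC and ARI should increase while MRE, MSE, and RMSE should decrease.

```python
def parse_result_cell(cell):
    """Split 'a/b/c' into (dance2, dance1, original) floats; 'NA' becomes None."""
    parts = [part.strip() for part in cell.split("/")]
    if len(parts) != 3:
        raise ValueError(f"expected three '/'-separated scores, got {cell!r}")
    return tuple(None if part.upper() == "NA" else float(part) for part in parts)


def gain_over_dance1(cell, higher_is_better=True):
    """Improvement of DANCE 2.0 over DANCE 1.0; positive always means better."""
    dance2, dance1, _original = parse_result_cell(cell)
    if dance2 is None or dance1 is None:
        return None
    delta = dance2 - dance1
    return delta if higher_is_better else -delta


# SVM on GSE67835 Brain (accuracy, higher is better).
print(parse_result_cell("0.82/0.07/NA"))
print(gain_over_dance1("0.82/0.07/NA"))

# DeepImpute on Mouse Brain (Test MRE, lower is better).
print(gain_over_dance1("0.229/0.244/NA", higher_is_better=False))
```

Cells that do not contain exactly three scores raise a `ValueError` instead of being silently misread.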

3) Clustering

| BackBone | Model | Algorithm | Year | CheckIn |
| --- | --- | --- | --- | --- |
| GNN | graph-sc | GNN-based embedding for clustering scRNA-seq data | 2022 | |
| GNN | scTAG | ZINB-based Graph Embedding Autoencoder for Single-cell RNA-seq Interpretations | 2022 | |
| GNN | scDSC | Deep structural clustering for single-cell RNA-seq data jointly through autoencoder and graph neural network | 2022 | |
| GNN | scGAC | scGAC: a graph attentional architecture for clustering single-cell RNA-seq data | 2022 | P1 |
| AutoEncoder | scDeepCluster | Clustering single-cell RNA-seq data with a model-based deep learning approach | 2019 | |
| AutoEncoder | scDCC | Model-based deep embedding for constrained clustering analysis of single cell RNA-seq data | 2021 | |
| AutoEncoder | scziDesk | Deep soft K-means clustering with self-training for single-cell RNA sequence data | 2020 | P1 |

| Model | Worm Neuron (DANCE 2.0/DANCE 1.0/Original) | Mouse Bladder (DANCE 2.0/DANCE 1.0/Original) | 10X PBMC (DANCE 2.0/DANCE 1.0/Original) | Mouse ES (DANCE 2.0/DANCE 1.0/Original) | Evaluation Metric |
| --- | --- | --- | --- | --- | --- |
| graph-sc | 0.71/0.53/0.46 | 0.76/0.59/0.63 | 0.79/0.68/0.70 | 0.95/0.81/0.78 | ARI |
| scDCC | 0.69/0.41/0.58 | 0.78/0.60/0.66 | 0.84/0.82/0.81 | 0.9987/0.98/NA | ARI |
| scDeepCluster | 0.70/0.51/0.52 | 0.80/0.56/0.58 | 0.83/0.81/0.78 | 0.9951/0.98/0.97 | ARI |
| scDSC | 0.66/0.46/0.65 | 0.68/0.65/0.72 | 0.72/0.72/0.78 | 0.98/0.98/0.84/NA | ARI |
| scTAG | 0.72/0.49/NA | 0.76/0.69/NA | 0.81/0.77/NA | 0.93/0.96/NA | ARI |

Note: DANCE 2.0 indicates Stage 1, 2, and 3 (test dataset as metric for selection) for all methods.
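
The clustering results are reported as ARI (adjusted Rand index), which measures agreement between predicted clusters and reference labels while correcting for chance. For readers unfamiliar with the metric, here is a minimal pure-Python sketch of the standard definition. It is illustrative only and is not DANCE's own evaluation code, which is not shown in this document.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree

    # Pairs of items placed together in both labelings, in the reference
    # labeling only, and in the predicted labeling only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single cluster
    return (together_both - expected) / (maximum - expected)


# Cluster ids are arbitrary, so a pure relabeling still scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Splitting one true cluster in two lowers the score to 4/7.
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```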

Multimodality Module

1) Modality Prediction

| BackBone | Model | Algorithm | Year | CheckIn |
| --- | --- | --- | --- | --- |
| GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | |
| GNN | ScMoLP | Link Prediction Variant of ScMoGCN | 2022 | P1 |
| GNN | GRAPE | Handling Missing Data with Graph Representation Learning | 2020 | P1 |
| Generative Model | SCMM | SCMM: MIXTURE-OF-EXPERTS MULTIMODAL DEEP GENERATIVE MODEL FOR SINGLE-CELL MULTIOMICS DATA ANALYSIS | 2021 | |
| Auto-encoder | Cross-modal autoencoders | Multi-domain translation between single-cell imaging and sequencing data using autoencoders | 2021 | |
| Auto-encoder | BABEL | BABEL enables cross-modality translation between multiomic profiles at single-cell resolution | 2021 | |

| Model | Evaluation Metric | GEX2ADT (DANCE 1.0/Original) | ADT2GEX (DANCE 1.0/Original) | GEX2ATAC (DANCE 1.0/Original) | ATAC2GEX (DANCE 1.0/Original) |
| --- | --- | --- | --- | --- | --- |
| ScMoGCN | RMSE | 0.3885 / 0.3885 | 0.3242 / 0.3242 | 0.1778 / 0.1778 | 0.2315 / 0.2315 |
| SCMM | RMSE | 0.6264 / N/A | 0.4458 / N/A | 0.2163 / N/A | 0.3730 / N/A |
| Cross-modal autoencoders | RMSE | 0.5725 / N/A | 0.3585 / N/A | 0.1917 / N/A | 0.2551 / N/A |
| BABEL | RMSE | 0.4335 / N/A | 0.3673 / N/A | 0.1816 / N/A | 0.2394 / N/A |

2) Modality Matching

| BackBone | Model | Algorithm | Year | CheckIn |
| --- | --- | --- | --- | --- |
| GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | |
| GNN/Auto-encoder | GLUE | Multi-omics single-cell data integration and regulatory inference with graph-linked embedding | 2021 | P1 |
| Generative Model | SCMM | SCMM: MIXTURE-OF-EXPERTS MULTIMODAL DEEP GENERATIVE MODEL FOR SINGLE-CELL MULTIOMICS DATA ANALYSIS | 2021 | |
| Auto-encoder | Cross-modal autoencoders | Multi-domain translation between single-cell imaging and sequencing data using autoencoders | 2021 | |

| Model | Evaluation Metric | GEX2ADT (DANCE 1.0/Original) | GEX2ATAC (DANCE 1.0/Original) |
| --- | --- | --- | --- |
| ScMoGCN | Accuracy | 0.0827 / 0.0810 | 0.0600 / 0.0630 |
| SCMM | Accuracy | 0.005 / N/A | 5e-5 / N/A |
| Cross-modal autoencoders | Accuracy | 0.0002 / N/A | 0.0002 / N/A |

3) Joint Embedding

| BackBone | Model | Algorithm | Year | CheckIn |
| --- | --- | --- | --- | --- |
| GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | |
| Auto-encoder | scMVAE | Deep-joint-learning analysis model of single cell transcriptome and open chromatin accessibility data | 2020 | |
| Auto-encoder | scDEC | Simultaneous deep generative modelling and clustering of single-cell genomic data | 2021 | |
| GNN/Auto-encoder | GLUE | Multi-omics single-cell data integration and regulatory inference with graph-linked embedding | 2021 | P1 |
| Auto-encoder | DCCA | Deep cross-omics cycle attention model for joint analysis of single-cell multi-omics data | 2021 | |

| Model | BRAIN ATAC2GEX (DANCE 2.0/DANCE 1.0/Original) | SKIN ATAC2GEX (DANCE 2.0/DANCE 1.0/Original) | OP 2022 Multi ATAC2GEX (DANCE 2.0/DANCE 1.0/Original) | Evaluation Metric |
| --- | --- | --- | --- | --- |
| DCCA | 0.399/0.112/NA | 0.597/0.335/NA | 0.549/0.438/NA | ARI |
| scDEC | 0.853/0.475/NA | 0.889/0.34/NA | 0.827/0.428/NA | ARI |
| ScMoGCN | 0.704/0.478/NA | 0.634/0.32/NA | 0.85/0.433/NA | ARI |
| scMVAE | 0.342/0.218/NA | 0.399/0.341/NA | 0.437/0.362/NA | ARI |

Note: DANCE 2.0 indicates Stage 1, 2, and 3 (test dataset as metric for selection) for all methods.

4) Multimodal Imputation

| BackBone | Model | Algorithm | Year | CheckIn |
| --- | --- | --- | --- | --- |
| GNN | ScMoLP | Link Prediction Variant of ScMoGCN | 2022 | P1 |
| GNN | scGNN | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses | 2021 | P1 |
| GNN | GRAPE | Handling Missing Data with Graph Representation Learning | 2020 | P1 |

5) Multimodal Integration

| BackBone | Model | Algorithm | Year | CheckIn |
| --- | --- | --- | --- | --- |
| GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | P1 |
| GNN | scGNN | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses (GCN on Nearest Neighbor graph) | 2021 | P1 |
| Nearest Neighbor | WNN | Integrated analysis of multimodal single-cell data | 2021 | P1 |
| GAN | MAGAN | MAGAN: Aligning Biological Manifolds | 2018 | P1 |
| Auto-encoder | SCIM | SCIM: universal single-cell matching with unpaired feature sets | 2020 | P1 |
| Auto-encoder | MultiMAP | MultiMAP: Dimensionality Reduction and Integration of Multimodal Data | 2021 | P1 |
| Generative Model | SCMM | SCMM: MIXTURE-OF-EXPERTS MULTIMODAL DEEP GENERATIVE MODEL FOR SINGLE-CELL MULTIOMICS DATA ANALYSIS | 2021 | P1 |

Spatial Module

1) Spatial Domain

| BackBone | Model | Algorithm | Year | CheckIn |
| --- | --- | --- | --- | --- |
| GNN | SpaGCN | SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network | 2021 | |
| GNN | STAGATE | Deciphering spatial domains from spatially resolved transcriptomics with adaptive graph attention auto-encoder | 2021 | |
| Bayesian | BayesSpace | Spatial transcriptomics at subspot resolution with BayesSpace | 2021 | P1 |
| Pseudo-space-time (PST) Distance | stLearn | stLearn: integrating spatial location, tissue morphology and gene expression to find cell types, cell-cell interactions and spatial trajectories within undissociated tissues | 2020 | |
| Heuristic | Louvain | Fast unfolding of community hierarchies in large networks | 2008 | |
| GNN | EfNST | A composite scaling network of EfficientNet for improving spatial domain identification performance | 2024 | |

| Model | 151676 (DANCE 2.0/DANCE 1.0/Original) | Sub MBA (DANCE 2.0/DANCE 1.0/Original) | Evaluation Metric |
| --- | --- | --- | --- |
| Louvain | 0.27/0.25/NA | 0.43/0.42/NA | ARI |
| SpaGCN | 0.47/0.27/0.35 | 0.32/0.32/NA | ARI |
| STAGATE | 0.60/0.60/0.55 | 0.29/0.27/NA | ARI |
| stLearn | 0.30/0.29/NA | 0.45/0.36/NA | ARI |
| EfNST | 0.52/0.33/0.51 | 0.30/0.19/NA | ARI |

Note: DANCE 2.0 indicates Stage 1, 2, and 3 (test dataset as metric for selection) for all methods.

2) Cell Type Deconvolution

| BackBone | Model | Algorithm | Year | CheckIn |
| --- | --- | --- | --- | --- |
| GNN | DSTG | DSTG: deconvoluting spatial transcriptomics data through graph-based artificial intelligence | 2021 | |
| logNormReg | SpatialDecon | Advances in mixed cell deconvolution enable quantification of cell types in spatial transcriptomic data | 2022 | |
| NNMFreg | SPOTlight | SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes | 2021 | |
| NN Linear + CAR assumption | CARD | Spatially informed cell-type deconvolution for spatial transcriptomics | 2022 | |
| GNN | STdGCN | STdGCN: spatial transcriptomic cell-type deconvolution using graph convolutional networks | 2024 | |

| Model | CARD Synthetic (DANCE 2.0/DANCE 1.0/Original) | SPOTlight Synthetic (DANCE 2.0/DANCE 1.0/Original) | Evaluation Metric |
| --- | --- | --- | --- |
| CARD | 0.00553/0.00627/NA | 0.00653/0.00772/NA | Test MSE |
| DSTG | 0.0105/0.0239/NA | 0.0314/0.0315/NA | Test MSE |
| SpatialDecon | 0.00754/0.00821/NA | 0.00528/0.00528/NA | Test MSE |
| SPOTlight | 0.0113/0.0250/NA | 0.00614/0.0106/NA | Test MSE |
| STdGCN | 0.0058/0.0202/NA | 0.0145/0.0261/NA | Test MSE |

Note: DANCE 2.0 indicates Stage 1, 2, and 3 (valid pseudo or dataset as metric for selection) for all methods.

A Note on Function Naming

The function ScaleFeature has been renamed to ColumnSumNormalize in the code to resolve a naming ambiguity. However, historical WandB logs have not been modified and will still reference the old name (ScaleFeature). This is a naming change only and does not affect the program's execution.
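
When comparing historical WandB logs against current code, the old name can be mapped to the new one before matching. The following is a small sketch of such an alias table. `RENAMED_FUNCTIONS`, `canonical_name`, and `canonicalize_pipeline` are hypothetical helpers rather than part of DANCE, and `SomeOtherStep` is a placeholder step name used only for illustration.

```python
# Old name -> current name, as stated in the note above.
RENAMED_FUNCTIONS = {"ScaleFeature": "ColumnSumNormalize"}


def canonical_name(logged_name):
    """Translate a function name from an old log into its current name."""
    return RENAMED_FUNCTIONS.get(logged_name, logged_name)


def canonicalize_pipeline(logged_steps):
    """Apply the rename to every step name recorded for a pipeline."""
    return [canonical_name(step) for step in logged_steps]


# "SomeOtherStep" is a placeholder, not a real DANCE transform.
old_log_steps = ["SomeOtherStep", "ScaleFeature"]
print(canonicalize_pipeline(old_log_steps))
```

Names that were never renamed pass through unchanged, so the helper is safe to apply to both old and new logs.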