RareQ

April 13, 2026

RareQ is an R package for identifying rare cell populations in single-cell and cell-segmented spatial datasets. It uses Q-value-guided network propagation and is designed to be accurate, scalable, and robust.

Dependencies

R >= 4.0.0
Seurat >= 4.0.2
Signac >= 1.9.0  # required only for scATAC-seq preprocessing
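
A quick way to check an existing installation against these minimums is sketched below. It uses only base R. The variable names required, r_ok, and pkg_ok are illustrative and not part of RareQ, and a package that is not installed is reported as NA rather than raising an error.

```r
# Minimum versions copied from the dependency list above
required <- c(Seurat = "4.0.2", Signac = "1.9.0")

# TRUE when the running R session meets the stated minimum
r_ok <- getRversion() >= "4.0.0"

# For each package: NA if it is not installed,
# otherwise TRUE/FALSE for the version comparison
pkg_ok <- vapply(names(required), function(pkg) {
  if (!nzchar(system.file(package = pkg))) return(NA)
  packageVersion(pkg) >= required[[pkg]]
}, logical(1))

print(r_ok)
print(pkg_ok)
```

Signac is only needed for scATAC-seq preprocessing, so an NA for Signac can be ignored for RNA-only workflows.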

Installation

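# Install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("fabotao/RareQ")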

Quick start (scRNA-seq)

The Jurkat example dataset is included for demonstration. The code below reads Jurkat.RDS from the current working directory.

library(RareQ)
library(Seurat)

# Load example data
obj <- readRDS("Jurkat.RDS")
counts <- obj@assays$RNA@counts

# Preprocess scRNA-seq data
sc_object <- CreateSeuratObject(counts = counts, project = "sc_object", min.cells = 3)
sc_object <- NormalizeData(sc_object)
sc_object <- FindVariableFeatures(sc_object, nfeatures = 2000)
sc_object <- ScaleData(sc_object)
sc_object <- RunPCA(sc_object, npcs = 50)
sc_object <- RunUMAP(sc_object, dims = 1:50)
# Compute the 20 nearest neighbors on the first 50 PCs and store them as a Neighbor object
sc_object <- FindNeighbors(
  object = sc_object,
  k.param = 20,
  compute.SNN = FALSE,
  prune.SNN = 0,
  reduction = "pca",
  dims = 1:50,
  force.recalc = FALSE,
  return.neighbor = TRUE
)

# Identify major and rare clusters
cluster <- FindRare(sc_object)
table(cluster)

sc_object$cluster <- cluster
DimPlot(sc_object, group.by = "cluster")
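
To see at a glance which clusters are small, the label vector can be tabulated as fractions of all cells. The sketch below uses base R only. The helper flag_small_clusters and its 1% cutoff are hypothetical illustrations, not part of RareQ, and the toy labels vector stands in for the output of FindRare.

```r
# Hypothetical helper (not part of RareQ): summarise cluster sizes and
# flag clusters whose share of cells falls below a chosen cutoff.
flag_small_clusters <- function(labels, max_fraction = 0.01) {
  sizes <- table(labels)
  fraction <- as.numeric(sizes) / length(labels)
  data.frame(
    cluster  = names(sizes),
    n_cells  = as.integer(sizes),
    fraction = fraction,
    is_small = fraction < max_fraction,
    stringsAsFactors = FALSE
  )
}

# Toy labels standing in for the vector returned by FindRare
labels <- c(rep("1", 990), rep("2", 5), rep("3", 5))
summary_df <- flag_small_clusters(labels)
print(summary_df)
```

In the workflow above, the same helper could be called as flag_small_clusters(sc_object$cluster). The cutoff is arbitrary and should be chosen for the dataset at hand.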

Optional: consensus clustering with ConsensusRare

FindRare is deterministic and generally robust. If you want additional stability checks, ConsensusRare runs FindRare repeatedly on shuffled cell orders and aggregates the results through consensus clustering.

Note: ConsensusRare is slower and more memory-intensive than FindRare, especially for large datasets.

library(RareQ)
library(Seurat)

# Load example data
obj <- readRDS("Jurkat.RDS")
counts <- obj@assays$RNA@counts

# Preprocess scRNA-seq data
sc_object <- CreateSeuratObject(counts = counts, project = "sc_object", min.cells = 3)
sc_object <- NormalizeData(sc_object)
sc_object <- FindVariableFeatures(sc_object, nfeatures = 2000)
sc_object <- ScaleData(sc_object)
sc_object <- RunPCA(sc_object, npcs = 50)
sc_object <- RunUMAP(sc_object, dims = 1:50)

# Run consensus clustering
cluster <- ConsensusRare(
  sc_object,
  assay = "RNA",
  reduction = "pca",
  dims = 1:50,
  k.param = 20,
  k = 6,
  Q_cut = 0.6,
  ratio = 0.2,
  reps = 30
)

table(cluster)
sc_object$cluster <- cluster
DimPlot(sc_object, group.by = "cluster")
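
The idea of aggregating repeated runs can be pictured with a generic co-association approach: count how often each pair of cells shares a label across runs, then cluster on those frequencies. The base R sketch below illustrates that general technique only. It is not RareQ's implementation, and the function consensus_labels, its arguments, and the toy runs matrix are all assumptions made for illustration.

```r
# Generic consensus-clustering sketch (NOT RareQ's implementation).
# Each column of `runs` holds the cluster labels from one repeat.
consensus_labels <- function(runs, k) {
  n <- nrow(runs)
  co <- matrix(0, n, n)
  for (r in seq_len(ncol(runs))) {
    # TRUE (counted as 1) where two cells share a label in this run
    co <- co + outer(runs[, r], runs[, r], "==")
  }
  co <- co / ncol(runs)
  # Treat 1 - co-clustering frequency as a distance and cut the tree
  tree <- hclust(as.dist(1 - co), method = "average")
  cutree(tree, k = k)
}

# Three toy runs over six cells; label names differ between runs,
# but the grouping is mostly the same
runs <- cbind(
  c(1, 1, 1, 2, 2, 3),
  c(2, 2, 2, 1, 1, 3),
  c(1, 1, 2, 2, 2, 3)
)
consensus <- consensus_labels(runs, k = 3)
print(consensus)
```

A dense cell-by-cell matrix like the one in this sketch grows quadratically with the number of cells, which is one reason approaches of this kind tend to cost more time and memory than a single clustering run.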

Tutorials

Tutorial HTML notebooks are available in the Tutorials/ directory:

  1. scRNA_analysis: scRNA-seq analysis using Jurkat data.
  2. scRNA_scATAC_analysis: joint scRNA-seq/scATAC-seq multiome analysis.
  3. scRNA_ADT_analysis: CITE-seq analysis with RNA and ADT modalities.
  4. Xenium_spatial_analysis: cell-segmented Xenium spatial analysis.

Optional tutorial for consensus workflow:

Related tutorial datasets:

Simulation resources

Simulation scripts are in Simulation/.

Citation

If you use RareQ in your work, please cite: Fa, B., Huang, C., Ma, Y. et al. Cell neighborhood topology directs rare cell population identification. Nat Commun (2026). https://doi.org/10.1038/s41467-026-71180-x

License

Copyright © 2026 XiaoLab@XJTU. This project is licensed under the MIT License; see the LICENSE file for details.