squab

February 2, 2026 · View on GitHub

squab performs gene expression quantification by counting the number of aligned records that intersects a set of features.

The original goal of this project is to provide a faster alternative to htseq-count. It uses similar counting rules and outputs a compatible data table.

Installation

Install Rust and use cargo to install squab.

$ cargo install --locked --git https://github.com/zaeleus/squab.git

Usage

squab performs gene expression quantification by counting the number of times aligned records intersect known gene annotations.

Usage: squab [OPTIONS] --annotations <ANNOTATIONS> <SRC>

Arguments:
  <SRC>  Input alignment file

Options:
      --with-secondary-records
          Count secondary records (BAM flag 0x100)
      --with-supplementary-records
          Count supplementary records (BAM flag 0x800)
      --with-nonunique-records
          Count nonunique records (BAM data tag NH > 1)
      --strand-specification <STRAND_SPECIFICATION>
          Strand specification [default: auto] [possible values: none, forward, reverse, auto]
  -t, --feature-type <FEATURE_TYPE>
          Feature type to count [default: exon]
  -i, --feature-id <FEATURE_ID>
          Feature attribute to use as the feature identity [default: gene_id]
      --min-mapping-quality <MIN_MAPPING_QUALITY>
          [default: 10]
  -o, --output <OUTPUT>
          Output destination
  -a, --annotations <ANNOTATIONS>
          Input annotations file (GFF3)
      --worker-count <WORKER_COUNT>
          The number of workers to spawn
  -h, --help
          Print help (see more with '--help')
  -V, --version
          Print version

The default output is a tab-delimited text file with two columns: the feature identifier (string) and the number of reads (integer) from the input alignment that overlap it. This file is compatible as output from htseq-count, meaning it includes statistics in the trailer.

Examples

Count features (exons by gene ID)

$ squab \
    --annotations annotations.gff3.gz \
    --output sample.counts.tsv \
    sample.bam

Count featues (genes by gene name)

$ squab \
    --annotations annotations.gff3.gz \
    --feature-type gene \
    --id gene_name \
    --output sample.counts.tsv \
    sample.bam

Limitations

Counts are taken only as the union of matched feature sets, i.e., reads that overlap any part of the feature is considered once.
For paired end alignments, a read that matches itself before a mate is found replaces the previously known record.

References

S Anders, T P Pyl, W Huber. HTSeq – A Python framework to work with high-throughput sequencing data. bioRxiv 2014. https://doi.org/10.1101/002824