RastQC

June 3, 2026

A fast quality control tool for high-throughput sequencing data, written in Rust. Drop-in replacement for FastQC with identical QC modules, matching algorithms, and compatible output formats.

Features

  • 15 QC modules: all 12 FastQC modules + 3 long-read QC modules
  • Fast: streaming parallel pipeline, 1.7-3.2x faster than FastQC on short reads and 4.7-6.5x on long reads (see Performance)
  • Portable: single 2.1 MB static binary, no Java runtime needed
  • Compatible output: HTML reports, tab-separated data files, ZIP archives, native MultiQC JSON
  • Multi-file summary: overview dashboard when processing many files
  • Web GUI: built-in report browser (--serve)
  • Input formats: FASTQ, gzip, bzip2, BAM, SAM, SOLiD colorspace, Fast5/POD5 (optional), stdin
  • Pipeline integration: QC-aware exit codes (--exit-code) for Nextflow/Snakemake gates

Installation

Via conda (Bioconda)

# Core build (short-read QC)
conda install -c bioconda rastqc

# With Fast5/POD5 support (Oxford Nanopore)
conda install -c bioconda rastqc-nanopore

The Bioconda recipe lives in recipes/rastqc/.

From source

# Requires Rust 1.70+
cargo install --path .

Build manually

git clone https://github.com/Huang-lab/RastQC.git
cd RastQC
cargo build --release
# Binary at ./target/release/rastqc

With Nanopore format support

cargo build --release --features nanopore

Quick start

# Single file
rastqc sample.fastq.gz

# Multiple files (processed in parallel)
rastqc *.fastq.gz

# Specify output directory
rastqc -o results/ sample_R1.fastq.gz sample_R2.fastq.gz

# HTML only (no ZIP)
rastqc --nozip -o results/ sample.fastq.gz

# Stream from stdin (gzip/bzip2 auto-detected)
samtools fastq aligned.bam | rastqc --stdin -o results/
zcat sample.fastq.gz | rastqc --stdin -o results/

# Use 8 threads
rastqc -t 8 -o results/ *.fastq.gz

# Pipeline QC gate (exit 1 on any warning, exit 2 if any module fails)
rastqc --exit-code sample.fastq.gz || echo "QC warned or failed"

# Browse reports in browser
rastqc -o results/ *.fastq.gz --serve

# Native MultiQC JSON output
rastqc --multiqc-json -o results/ sample.fastq.gz

Usage

rastqc [OPTIONS] [FILES]...

Arguments:
  [FILES]...  Input files (FASTQ, FASTA, BAM, SAM, Fast5, POD5). Use "-" for stdin (gzip/bzip2 auto-detected).

Options:
  -o, --outdir <DIR>            Output directory [default: current directory]
  -t, --threads <N>             Number of threads [default: all CPUs]
  -c, --contaminants <FILE>     Custom contaminant list (tab-separated: name\tsequence)
  -a, --adapters <FILE>         Custom adapter list (tab-separated: name\tsequence)
  -l, --limits <FILE>           Custom pass/warn/fail thresholds
  -k, --kmer-size <N>           Kmer size for enrichment analysis [default: 7]
      --stdin                   Read FASTQ from standard input (gzip/bzip2 auto-detected)
      --nofilter                Include all reads (don't skip QC-failed reads)
      --extract                 Extract ZIP contents after creation
      --nozip                   Write HTML report only, skip ZIP archive
      --summary                 Write multi-file summary report
      --multiqc-json            Output native MultiQC JSON alongside standard reports
      --exit-code               Return QC-aware exit codes: 0=pass, 1=warn, 2=fail
      --serve                   Start web server to browse reports
      --port <N>                Web server port [default: 8080]
      --long-read               Enable long-read QC modules (auto-enabled for Fast5/POD5 inputs)
      --time                    Show per-file and per-step timing breakdown
      --no-parallel             Disable streaming intra-file parallelism (on by default for >50MB files)
  -q, --quiet                   Suppress progress output
      --dup-length <N>          Truncation length for duplication detection [default: 50]
  -h, --help                    Print help
  -V, --version                 Print version
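
The --exit-code values listed above (0=pass, 1=warn, 2=fail) can be turned into a pipeline gate from a wrapper script. The following is a minimal sketch, not part of RastQC: classify_exit, gate_passes, and run_gate are hypothetical helper names, and treating any other return code as "error" is an assumption.

```python
import subprocess

# Exit-code meanings as documented for --exit-code.
STATUS_BY_CODE = {0: "pass", 1: "warn", 2: "fail"}


def classify_exit(code: int) -> str:
    """Map a return code to a status label; unknown codes become 'error' (assumption)."""
    return STATUS_BY_CODE.get(code, "error")


def gate_passes(code: int, allow_warn: bool = True) -> bool:
    """Decide whether a pipeline step should continue for this return code."""
    status = classify_exit(code)
    return status == "pass" or (allow_warn and status == "warn")


def run_gate(fastq_path: str, outdir: str = "results/") -> str:
    """Run rastqc on one file (it must be on PATH) and return the status label."""
    proc = subprocess.run(["rastqc", "--exit-code", "-o", outdir, fastq_path])
    return classify_exit(proc.returncode)
```

Calling gate_passes(code, allow_warn=False) gives a strict gate that also stops on warnings, which is what a plain shell `||` does.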

Architecture

rastqc/
├── src/
│   ├── main.rs              # CLI entry point, file dispatch, exit codes
│   ├── config.rs            # Adapters, contaminants, limits, thresholds
│   ├── gui.rs               # Built-in HTTP server for report browsing
│   ├── parallel.rs          # Streaming parallel pipeline (reader → channel → workers → merge)
│   ├── io/
│   │   ├── mod.rs           # SequenceReader enum (unified format dispatch)
│   │   ├── fastq.rs         # FASTQ/gz/bz2 streaming reader + stdin
│   │   ├── bam.rs           # BAM/SAM reader via noodles
│   │   ├── colorspace.rs    # SOLiD di-base → basespace decoder
│   │   ├── fast5.rs         # Oxford Nanopore Fast5 (HDF5) reader
│   │   └── pod5.rs          # Oxford Nanopore POD5 (Arrow IPC) reader
│   ├── modules/
│   │   ├── mod.rs           # QCModule trait, merge support, factory
│   │   ├── basic_stats.rs   # Sequence count, length, %GC, encoding
│   │   ├── per_base_quality.rs
│   │   ├── per_tile_quality.rs
│   │   ├── per_sequence_quality.rs
│   │   ├── per_base_content.rs
│   │   ├── per_sequence_gc.rs
│   │   ├── n_content.rs
│   │   ├── sequence_length.rs
│   │   ├── duplication.rs
│   │   ├── overrepresented.rs
│   │   ├── adapter_content.rs
│   │   ├── kmer_content.rs
│   │   └── long_read_quality.rs  # N50, quality-stratified length, homopolymer
│   └── report/
│       └── mod.rs           # HTML, text, JSON, ZIP, summary generation
├── tests/
│   └── integration_test.rs  # 11 integration tests
├── paper/                   # Manuscript, benchmarks, figures
└── FastQC/                  # Reference FastQC for concordance testing

Data flow: Files → SequenceReader → streaming Sequence records → each record passed to all QCModule instances → calculate_results() → report generation (HTML/text/JSON/ZIP).

Streaming parallel pipeline (default for files >50MB): A dedicated reader thread streams batches of sequences through a bounded crossbeam channel to N worker threads, each with independent module instances. After the file is fully read, worker states are merged via merge_from(). This avoids buffering the entire file in memory while achieving near-linear speedup with thread count.
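
The reader → channel → workers → merge shape described above can be illustrated in a few lines. This is a Python sketch of the pattern only, not RastQC's Rust implementation: queue.Queue stands in for the bounded crossbeam channel, per-worker dictionaries stand in for independent module instances, and all function names are hypothetical. Python threads show the structure here, not the speedup.

```python
import queue
import threading
from collections import Counter

SENTINEL = None  # tells one worker that the input is exhausted


def reader(records, batches, batch_size, n_workers):
    """Stream fixed-size batches into the bounded queue, then signal end of input."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) == batch_size:
            batches.put(batch)  # blocks while the queue is full (backpressure)
            batch = []
    if batch:
        batches.put(batch)
    for _ in range(n_workers):
        batches.put(SENTINEL)


def worker(batches, state):
    """Consume batches and update this worker's private state only."""
    while True:
        batch = batches.get()
        if batch is SENTINEL:
            break
        for seq in batch:
            state["reads"] += 1
            state["bases"].update(seq)  # counts each character of the read


def count_bases(records, n_workers=4, batch_size=2, queue_depth=8):
    """Run the pipeline and merge the per-worker states after the input is read."""
    batches = queue.Queue(maxsize=queue_depth)
    states = [{"reads": 0, "bases": Counter()} for _ in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(batches, s)) for s in states]
    threads.append(
        threading.Thread(target=reader, args=(records, batches, batch_size, n_workers))
    )
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    merged = {"reads": 0, "bases": Counter()}
    for s in states:  # the merge step, analogous to merge_from()
        merged["reads"] += s["reads"]
        merged["bases"].update(s["bases"])
    return merged
```

Because the queue is bounded, the reader can never get more than queue_depth batches ahead of the workers, which is what keeps memory use independent of file size.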

All 15 modules implement the QCModule trait with process_sequence(), calculate_results(), merge_from() (for parallel chunk merging), and output methods. Modules are created by ModuleFactory based on the limits configuration.
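
To make the module contract concrete, here is a hedged Python analogue of the interface described above. The method names process_sequence, calculate_results, and merge_from come from this README; the two example modules, the REGISTRY mapping, the create_modules function, and the limits keys are hypothetical simplifications, not the Rust trait or ModuleFactory themselves.

```python
from abc import ABC, abstractmethod
from collections import Counter


class QCModule(ABC):
    """Accumulate per-read state, merge with a peer, then report."""

    @abstractmethod
    def process_sequence(self, seq: str, qual: str) -> None: ...

    @abstractmethod
    def merge_from(self, other: "QCModule") -> None: ...

    @abstractmethod
    def calculate_results(self) -> dict: ...


class BasicStats(QCModule):
    """Read count and %GC; informational only, so no status is produced."""

    def __init__(self):
        self.reads = 0
        self.bases = 0
        self.gc = 0

    def process_sequence(self, seq, qual):
        self.reads += 1
        self.bases += len(seq)
        self.gc += sum(base in "GCgc" for base in seq)

    def merge_from(self, other):
        self.reads += other.reads
        self.bases += other.bases
        self.gc += other.gc

    def calculate_results(self):
        percent_gc = 100.0 * self.gc / self.bases if self.bases else 0.0
        return {"reads": self.reads, "percent_gc": percent_gc}


class SequenceLength(QCModule):
    """Length histogram; warns when read lengths vary."""

    def __init__(self):
        self.lengths = Counter()

    def process_sequence(self, seq, qual):
        self.lengths[len(seq)] += 1

    def merge_from(self, other):
        self.lengths.update(other.lengths)  # histograms add element-wise

    def calculate_results(self):
        status = "PASS" if len(self.lengths) <= 1 else "WARN"
        return {
            "status": status,
            "min": min(self.lengths, default=0),
            "max": max(self.lengths, default=0),
        }


# Hypothetical registry keyed by limits-file names.
REGISTRY = {"basic_stats": BasicStats, "sequence_length": SequenceLength}


def create_modules(limits):
    """Instantiate every registered module not marked 'ignore' in the limits."""
    return [
        cls()
        for key, cls in REGISTRY.items()
        if limits.get(key, {}).get("ignore", 0) == 0
    ]
```

The key property is that merge_from is additive: processing reads in two instances and merging gives the same state as processing all reads in one instance, which is what allows workers to run independently.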

Output files

For each input file sample.fastq.gz, RastQC produces:

| File | Description |
|---|---|
| sample_fastqc.zip | ZIP archive containing all outputs below |
| sample_fastqc/fastqc_report.html | Self-contained HTML report with SVG charts |
| sample_fastqc/fastqc_data.txt | Tab-separated data for each module |
| sample_fastqc/summary.txt | One-line PASS/WARN/FAIL per module |
| sample_multiqc.json | Native MultiQC JSON (with --multiqc-json) |
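
For scripting against fastqc_data.txt, a small parser is often enough. The sketch below assumes the file follows FastQC's layout (each module opens with `>>Name<TAB>status`, has a `#`-prefixed header line, and closes with `>>END_MODULE`); parse_fastqc_data is a hypothetical helper, not something RastQC ships.

```python
def parse_fastqc_data(text):
    """Split FastQC-style module blocks into {name: {status, header, rows}}."""
    modules = {}
    current = None
    for line in text.splitlines():
        if line.startswith(">>END_MODULE"):
            current = None
        elif line.startswith(">>"):
            name, _, status = line[2:].partition("\t")
            current = {"status": status.strip(), "header": [], "rows": []}
            modules[name] = current
        elif current is not None and line.startswith("#"):
            # If a module has several '#' lines, the last one is kept as header.
            current["header"] = line[1:].split("\t")
        elif current is not None and line:
            current["rows"].append(line.split("\t"))
    return modules
```

Typical use is to read the file with open(...).read(), pass the text in, and look up a module by the name shown in the report, for example modules["Adapter Content"]["status"].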

When processing multiple files with --summary:

| File | Description |
|---|---|
| summary.tsv | Tab-separated matrix: rows = files, columns = modules |
| summary.html | Overview dashboard linking to all individual reports |

QC modules

| # | Module | What it checks | Pass/Warn/Fail criteria |
|---|---|---|---|
| 1 | Basic Statistics | Sequence count, length, %GC, encoding | Informational only |
| 2 | Per Base Sequence Quality | Quality score distribution at each position | Median < 25 (warn) / < 20 (fail) |
| 3 | Per Tile Sequence Quality | Quality variation between flowcell tiles | Max deviation > 5 (warn) / > 10 (fail) |
| 4 | Per Sequence Quality Scores | Distribution of mean quality per read | Mode <= 27 (warn) / <= 20 (fail) |
| 5 | Per Base Sequence Content | A/T/G/C proportions at each position | |
| 6 | Per Sequence GC Content | GC% distribution vs theoretical normal | Deviation > 15% (warn) / > 30% (fail) |
| 7 | Per Base N Content | Unknown base (N) frequency per position | N% > 5 (warn) / > 20 (fail) |
| 8 | Sequence Length Distribution | Read length variability | Variable lengths (warn) |
| 9 | Sequence Duplication Levels | Library complexity estimate | < 70% unique (warn) / < 50% unique (fail) |
| 10 | Overrepresented Sequences | Frequently occurring sequences + contaminant matching | Any seq > 0.1% (warn) / > 1% (fail) |
| 11 | Adapter Content | Known adapter sequence contamination | > 5% (warn) / > 10% (fail) |
| 12 | Kmer Content | Positionally biased k-mers | -log10(p) > 2 (warn) / > 5 (fail) |
| 13 | Read Length N50 (Long Read) | N50, N90, mean, median, min, max lengths | Informational only |
| 14 | Quality Stratified Length (Long Read) | Length distribution by quality tier (Q<10 to Q40+) | > 50% below Q20 (warn) |
| 15 | Homopolymer Content (Long Read) | Homopolymer run frequency by base and length | > 5% bases in runs (warn) / > 10% (fail) |

Modules 13-15 are RastQC-exclusive, designed for long-read sequencing data (PacBio HiFi, Oxford Nanopore). These modules are disabled by default and enabled with --long-read or automatically when processing Fast5/POD5 files. Their thresholds are calibrated for long-read error profiles and would produce false positives on short-read Illumina data.
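
For readers unfamiliar with the N50 and N90 statistics reported by module 13, the standard definition is: sort read lengths from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of all bases. The sketch below implements that textbook definition; nx is a hypothetical helper name, and RastQC's handling of ties or edge cases may differ.

```python
def nx(lengths, percent=50):
    """Return the Nx read length, e.g. percent=50 for N50, percent=90 for N90."""
    if not lengths:
        raise ValueError("at least one read length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= total * percent:
            return length
    return min(lengths)  # reached only if percent > 100
```

For read lengths 2, 3, 4, 5, and 6 (20 bases in total), the two longest reads already cover 11 bases, so N50 is 5; covering 18 bases takes the four longest reads, so N90 is 3.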

Working with many files

Batch processing

# Process all FASTQ files in a directory
rastqc -o qc_results/ data/*.fastq.gz

# Process with summary dashboard
rastqc -o qc_results/ --summary data/*.fastq.gz

# Use find for recursive discovery (null-delimited so paths with spaces are safe)
find data/ -name "*.fastq.gz" -print0 | xargs -0 rastqc -o qc_results/ --summary

Summary report

The --summary flag generates two files for multi-file review:

summary.tsv is a machine-readable matrix for scripting:

Sample	Basic Statistics	Per Base Quality	...	Adapter Content
sample_A	PASS	PASS	...	WARN
sample_B	PASS	FAIL	...	PASS

summary.html is a browser-friendly dashboard with a color-coded PASS/WARN/FAIL table.

Filtering results

# Find all failing samples
grep "FAIL" qc_results/summary.tsv

# Count warnings per sample (NR>1 skips the header row)
awk -F'\t' 'NR>1 {n=0; for(i=2;i<=NF;i++) if($i=="WARN") n++; print $1, n}' qc_results/summary.tsv
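
The same filtering can be done in Python when the results feed a larger script. This is a small sketch that assumes summary.tsv has the layout shown under "Summary report" (a header row, then one row per sample); load_summary, failing_samples, and warn_counts are hypothetical helper names.

```python
import csv
import io


def load_summary(handle):
    """Read the matrix into {sample: {module: status}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    modules = header[1:]  # first column is the sample name
    return {row[0]: dict(zip(modules, row[1:])) for row in reader if row}


def failing_samples(summary):
    """Samples with at least one FAIL call, sorted by name."""
    return sorted(s for s, calls in summary.items() if "FAIL" in calls.values())


def warn_counts(summary):
    """Number of WARN calls per sample."""
    return {s: sum(v == "WARN" for v in calls.values()) for s, calls in summary.items()}


# Inline demo; for a real run use open("qc_results/summary.tsv", newline="").
demo = io.StringIO(
    "Sample\tBasic Statistics\tPer Base Quality\tAdapter Content\n"
    "sample_A\tPASS\tPASS\tWARN\n"
    "sample_B\tPASS\tFAIL\tPASS\n"
)
summary = load_summary(demo)
print(failing_samples(summary))
print(warn_counts(summary))
```

Unlike grep "FAIL", this matches the status cells only, so a sample whose name happens to contain "FAIL" is not reported by mistake.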

Custom configuration

Adapter list

Tab-separated file with adapter name and 12bp sequence:

My Custom Adapter	AGATCGGAAGAG
Another Adapter		CTGTCTCTTATA

Contaminant list

Tab-separated file with contaminant name and full sequence:

PhiX Control	GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACT
Custom Primer	AATGATACGGCGACCACCGA
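
Before passing a custom adapter or contaminant list to -a or -c, it can help to validate it. The checker below is a sketch under stated assumptions: runs of tabs are treated as a single separator and lines starting with # are comments (both are FastQC conventions; whether RastQC accepts them is not confirmed here), and load_sequence_list is a hypothetical helper.

```python
import re


def load_sequence_list(text):
    """Parse name<TAB>sequence lines into a dict, rejecting malformed entries."""
    entries = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed convention)
        parts = re.split(r"\t+", line)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected name<TAB>sequence")
        name, seq = parts[0].strip(), parts[1].strip().upper()
        if not re.fullmatch(r"[ACGTN]+", seq):
            raise ValueError(f"line {lineno}: unexpected characters in sequence")
        entries[name] = seq
    return entries
```

A frequent mistake this catches is a file saved with spaces instead of tabs, which would otherwise leave the name and sequence fused into one field.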

Limits file

Controls pass/warn/fail thresholds and which modules run:

# Disable a module
kmer    ignore  1

# Adjust thresholds
quality_base_lower  warn    10
quality_base_lower  error   5
adapter             warn    5
adapter             error   10
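
To show how a limits file of this shape maps to PASS/WARN/FAIL calls, here is a minimal sketch that parses the three-column format above and applies the "greater than" rule from the QC modules table (for example Adapter Content: > 5% warn, > 10% fail). The function names are hypothetical and the parsing rules (whitespace-separated columns, # comments) are assumptions based on the example.

```python
def parse_limits(text):
    """Parse 'key level value' lines into {key: {level: value}}."""
    limits = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, level, value = line.split()
        limits.setdefault(key, {})[level] = float(value)
    return limits


def is_enabled(limits, key):
    """A module runs unless its 'ignore' value is non-zero."""
    return limits.get(key, {}).get("ignore", 0) == 0


def classify_above(value, limits, key):
    """Status for metrics where larger values are worse, such as adapter %."""
    thresholds = limits[key]
    if value > thresholds["error"]:
        return "FAIL"
    if value > thresholds["warn"]:
        return "WARN"
    return "PASS"
```

With the example file above, the kmer module is disabled, and an adapter content of 7.5% would be classified as WARN because it exceeds 5 but not 10.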

Compatibility with FastQC

RastQC produces output compatible with tools that consume FastQC results:

  • MultiQC: fastqc_data.txt files are compatible with MultiQC's FastQC module
  • Native JSON: --multiqc-json provides structured output without parsing
  • summary.txt: same PASS/WARN/FAIL format per module
  • Identical module names and data headers in text output
  • 100% concordance: 55/55 module calls identical across 5 model organisms

Performance

Benchmarked on real sequencing data (ENA/SRA), 4 threads, macOS ARM64:

Short-read (Illumina)

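| File | Size | Reads | FastQC 0.12.1 | RastQC | Speedup |
|---|---|---|---|---|---|
| DRR609229 R1 | 22 MB | 720K | 3.5s | 2.0s | 1.8x |
| DRR609229 R2 | 23 MB | 720K | 3.5s | 2.0s | 1.7x |
| ERR5897746 R1 | 320 MB | 4.3M | 15.6s | 4.8s | 3.2x |
| ERR5897746 R2 | 327 MB | 4.3M | 15.6s | 4.8s | 3.2x |
| DRR013000 R1 | 1.4 GB | 24.8M | 51.8s | 19.6s | 2.6x |
| All 5 files | 2.1 GB | 34.7M | 55.7s | 22.3s | 2.5x |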

Long-read (ONT / PacBio)

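| File | Platform | Size | Reads | Mean Length | FastQC | RastQC | Speedup |
|---|---|---|---|---|---|---|---|
| DRR242198 | ONT MinION | 406 MB | 76K | 5.3 kb | 14.6s | 3.1s | 4.7x |
| DRR723651 | PacBio Revio | 281 MB | 42K | 18.8 kb | 17.6s | 2.7s | 6.5x |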

The --long-read flag enables 3 additional QC modules with negligible overhead.

Resource comparison

| Metric | RastQC | FastQC (Java) |
|---|---|---|
| Binary size | 2.1 MB | ~215 MB (with JRE) |
| Startup time | <5 ms | ~2.5 s JVM warmup |
| Peak memory (small files) | 49-50 MB | 424-425 MB |
| Peak memory (1.4 GB file) | 315 MB | 434 MB |
| Peak memory (long reads) | 670-1257 MB | 702-854 MB |
| Threading | streaming intra-file + multi-file parallel | per-file parallel |
| Modules | 12 core + 3 long-read | 11 |

RastQC's streaming parallel pipeline automatically activates for files >50MB, using a bounded reader-worker architecture with adaptive batch sizing that scales with thread count without buffering the entire file in memory.


Citation

If you use RastQC in your research, please cite:

Huang KL. RastQC: A fast, Rust-based quality control tool for high-throughput sequencing data. bioRxiv (2026). https://www.biorxiv.org/content/10.64898/2026.03.31.715630v2

Acknowledgments

RastQC is a reimplementation inspired by FastQC by Simon Andrews at the Babraham Institute. FastQC has served as the gold standard for sequencing quality control for over a decade, and its elegant module design, diagnostic algorithms, and output formats are the foundation upon which RastQC is built. We are grateful to the FastQC team for creating and maintaining such an essential tool for the genomics community.

License

MIT License. See LICENSE for details.

Contributions are welcome! Please open an issue or pull request on GitHub.

Author

Written by Kuan-Lin Huang at PrecisionOmics.org