RastQC
June 3, 2026 · View on GitHub
A fast quality control tool for high-throughput sequencing data, written in Rust. Drop-in replacement for FastQC with identical QC modules, matching algorithms, and compatible output formats.
Features
- 15 QC modules: all 12 FastQC modules + 3 long-read QC modules
- Fast: streaming parallel pipeline — 2-3x faster than FastQC on real sequencing data
- Portable: single 2.1 MB static binary, no Java runtime needed
- Compatible output: HTML reports, tab-separated data files, ZIP archives, native MultiQC JSON
- Multi-file summary: overview dashboard when processing many files
- Web GUI: built-in report browser (
--serve) - Input formats: FASTQ, gzip, bzip2, BAM, SAM, SOLiD colorspace, Fast5/POD5 (optional), stdin
- Pipeline integration: QC-aware exit codes (
--exit-code) for Nextflow/Snakemake gates
Installation
Via conda (Bioconda)
# Core build (short-read QC)
conda install -c bioconda rastqc
# With Fast5/POD5 support (Oxford Nanopore)
conda install -c bioconda rastqc-nanopore
The Bioconda recipe lives in recipes/rastqc/.
From source
# Requires Rust 1.70+
cargo install --path .
Build manually
git clone https://github.com/Huang-lab/RastQC.git
cd RastQC
cargo build --release
# Binary at ./target/release/rastqc
With Nanopore format support
cargo build --release --features nanopore
Quick start
# Single file
rastqc sample.fastq.gz
# Multiple files (processed in parallel)
rastqc *.fastq.gz
# Specify output directory
rastqc -o results/ sample_R1.fastq.gz sample_R2.fastq.gz
# HTML only (no ZIP)
rastqc --nozip -o results/ sample.fastq.gz
# Stream from stdin (gzip/bzip2 auto-detected)
samtools fastq aligned.bam | rastqc --stdin -o results/
zcat sample.fastq.gz | rastqc --stdin -o results/
# Use 8 threads
rastqc -t 8 -o results/ *.fastq.gz
# Pipeline QC gate (exit 2 if any module fails)
rastqc --exit-code sample.fastq.gz || echo "QC failed"
# Browse reports in browser
rastqc -o results/ *.fastq.gz --serve
# Native MultiQC JSON output
rastqc --multiqc-json -o results/ sample.fastq.gz
Usage
rastqc [OPTIONS] [FILES]...
Arguments:
[FILES]... Input files (FASTQ, FASTA, BAM, SAM, Fast5, POD5). Use "-" for stdin (gzip/bzip2 auto-detected).
Options:
-o, --outdir <DIR> Output directory [default: current directory]
-t, --threads <N> Number of threads [default: all CPUs]
-c, --contaminants <FILE> Custom contaminant list (tab-separated: name\tsequence)
-a, --adapters <FILE> Custom adapter list (tab-separated: name\tsequence)
-l, --limits <FILE> Custom pass/warn/fail thresholds
-k, --kmer-size <N> Kmer size for enrichment analysis [default: 7]
--stdin Read FASTQ from standard input (gzip/bzip2 auto-detected)
--nofilter Include all reads (don't skip QC-failed reads)
--extract Extract ZIP contents after creation
--nozip Write HTML report only, skip ZIP archive
--summary Write multi-file summary report
--multiqc-json Output native MultiQC JSON alongside standard reports
--exit-code Return QC-aware exit codes: 0=pass, 1=warn, 2=fail
--serve Start web server to browse reports
--port <N> Web server port [default: 8080]
--long-read Enable long-read QC modules (auto-enabled for Fast5/POD5 inputs)
--time Show per-file and per-step timing breakdown
--no-parallel Disable streaming intra-file parallelism (on by default for >50MB files)
-q, --quiet Suppress progress output
--dup-length <N> Truncation length for duplication detection [default: 50]
-h, --help Print help
-V, --version Print version
Architecture
rastqc/
├── src/
│ ├── main.rs # CLI entry point, file dispatch, exit codes
│ ├── config.rs # Adapters, contaminants, limits, thresholds
│ ├── gui.rs # Built-in HTTP server for report browsing
│ ├── parallel.rs # Streaming parallel pipeline (reader → channel → workers → merge)
│ ├── io/
│ │ ├── mod.rs # SequenceReader enum (unified format dispatch)
│ │ ├── fastq.rs # FASTQ/gz/bz2 streaming reader + stdin
│ │ ├── bam.rs # BAM/SAM reader via noodles
│ │ ├── colorspace.rs # SOLiD di-base → basespace decoder
│ │ ├── fast5.rs # Oxford Nanopore Fast5 (HDF5) reader
│ │ └── pod5.rs # Oxford Nanopore POD5 (Arrow IPC) reader
│ ├── modules/
│ │ ├── mod.rs # QCModule trait, merge support, factory
│ │ ├── basic_stats.rs # Sequence count, length, %GC, encoding
│ │ ├── per_base_quality.rs
│ │ ├── per_tile_quality.rs
│ │ ├── per_sequence_quality.rs
│ │ ├── per_base_content.rs
│ │ ├── per_sequence_gc.rs
│ │ ├── n_content.rs
│ │ ├── sequence_length.rs
│ │ ├── duplication.rs
│ │ ├── overrepresented.rs
│ │ ├── adapter_content.rs
│ │ ├── kmer_content.rs
│ │ └── long_read_quality.rs # N50, quality-stratified length, homopolymer
│ └── report/
│ └── mod.rs # HTML, text, JSON, ZIP, summary generation
├── tests/
│ └── integration_test.rs # 11 integration tests
├── paper/ # Manuscript, benchmarks, figures
└── FastQC/ # Reference FastQC for concordance testing
Data flow: Files → SequenceReader → streaming Sequence records → each record passed to all QCModule instances → calculate_results() → report generation (HTML/text/JSON/ZIP).
Streaming parallel pipeline (default for files >50MB): A dedicated reader thread streams batches of sequences through a bounded crossbeam channel to N worker threads, each with independent module instances. After the file is fully read, worker states are merged via merge_from(). This avoids buffering the entire file in memory while achieving near-linear speedup with thread count.
All 15 modules implement the QCModule trait with process_sequence(), calculate_results(), merge_from() (for parallel chunk merging), and output methods. Modules are created by ModuleFactory based on the limits configuration.
Output files
For each input file sample.fastq.gz, RastQC produces:
| File | Description |
|---|---|
sample_fastqc.zip | ZIP archive containing all outputs below |
sample_fastqc/fastqc_report.html | Self-contained HTML report with SVG charts |
sample_fastqc/fastqc_data.txt | Tab-separated data for each module |
sample_fastqc/summary.txt | One-line PASS/WARN/FAIL per module |
sample_multiqc.json | Native MultiQC JSON (with --multiqc-json) |
When processing multiple files with --summary:
| File | Description |
|---|---|
summary.tsv | Tab-separated matrix: rows = files, columns = modules |
summary.html | Overview dashboard linking to all individual reports |
QC modules
| # | Module | What it checks | Pass/Warn/Fail criteria |
|---|---|---|---|
| 1 | Basic Statistics | Sequence count, length, %GC, encoding | Informational only |
| 2 | Per Base Sequence Quality | Quality score distribution at each position | Median < 25 (warn) / < 20 (fail) |
| 3 | Per Tile Sequence Quality | Quality variation between flowcell tiles | Max deviation > 5 (warn) / > 10 (fail) |
| 4 | Per Sequence Quality Scores | Distribution of mean quality per read | Mode <= 27 (warn) / <= 20 (fail) |
| 5 | Per Base Sequence Content | A/T/G/C proportions at each position | |
| 6 | Per Sequence GC Content | GC% distribution vs theoretical normal | Deviation > 15% (warn) / > 30% (fail) |
| 7 | Per Base N Content | Unknown base (N) frequency per position | N% > 5 (warn) / > 20 (fail) |
| 8 | Sequence Length Distribution | Read length variability | Variable lengths (warn) |
| 9 | Sequence Duplication Levels | Library complexity estimate | < 70% unique (warn) / < 50% unique (fail) |
| 10 | Overrepresented Sequences | Frequently occurring sequences + contaminant matching | Any seq > 0.1% (warn) / > 1% (fail) |
| 11 | Adapter Content | Known adapter sequence contamination | > 5% (warn) / > 10% (fail) |
| 12 | Kmer Content | Positionally biased k-mers | -log10(p) > 2 (warn) / > 5 (fail) |
| 13 | Read Length N50 (Long Read) | N50, N90, mean, median, min, max lengths | Informational only |
| 14 | Quality Stratified Length (Long Read) | Length distribution by quality tier (Q<10 to Q40+) | >50% below Q20 (warn) |
| 15 | Homopolymer Content (Long Read) | Homopolymer run frequency by base and length | >5% bases in runs (warn) / >10% (fail) |
Modules 13--15 are RastQC-exclusive, designed for long-read sequencing data (PacBio HiFi, Oxford Nanopore). These modules are disabled by default and enabled with --long-read or automatically when processing Fast5/POD5 files. Their thresholds are calibrated for long-read error profiles and would produce false positives on short-read Illumina data.
Working with many files
Batch processing
# Process all FASTQ files in a directory
rastqc -o qc_results/ data/*.fastq.gz
# Process with summary dashboard
rastqc -o qc_results/ --summary data/*.fastq.gz
# Use find for recursive discovery
find data/ -name "*.fastq.gz" | xargs rastqc -o qc_results/ --summary
Summary report
The --summary flag generates two files for multi-file review:
summary.tsv -- machine-readable matrix for scripting:
Sample Basic Statistics Per Base Quality ... Adapter Content
sample_A PASS PASS ... WARN
sample_B PASS FAIL ... PASS
summary.html -- browser-friendly dashboard with color-coded PASS/WARN/FAIL table.
Filtering results
# Find all failing samples
grep "FAIL" qc_results/summary.tsv
# Count warnings per sample
awk -F'\t' '{n=0; for(i=2;i<=NF;i++) if($i=="WARN") n++; print \$1, n}' qc_results/summary.tsv
Custom configuration
Adapter list
Tab-separated file with adapter name and 12bp sequence:
My Custom Adapter AGATCGGAAGAG
Another Adapter CTGTCTCTTATA
Contaminant list
Tab-separated file with contaminant name and full sequence:
PhiX Control GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACT
Custom Primer AATGATACGGCGACCACCGA
Limits file
Controls pass/warn/fail thresholds and which modules run:
# Disable a module
kmer ignore 1
# Adjust thresholds
quality_base_lower warn 10
quality_base_lower error 5
adapter warn 5
adapter error 10
Compatibility with FastQC
RastQC produces output compatible with tools that consume FastQC results:
- MultiQC:
fastqc_data.txtfiles are compatible with MultiQC's FastQC module - Native JSON:
--multiqc-jsonprovides structured output without parsing - summary.txt: same PASS/WARN/FAIL format per module
- Identical module names and data headers in text output
- 100% concordance: 55/55 module calls identical across 5 model organisms
Performance
Benchmarked on real sequencing data (ENA/SRA), 4 threads, macOS ARM64:
Short-read (Illumina)
| File | Size | Reads | FastQC 0.12.1 | RastQC | Speedup |
|---|---|---|---|---|---|
| DRR609229 R1 | 22 MB | 720K | 3.5s | 2.0s | 1.8x |
| DRR609229 R2 | 23 MB | 720K | 3.5s | 2.0s | 1.7x |
| ERR5897746 R1 | 320 MB | 4.3M | 15.6s | 4.8s | 3.2x |
| ERR5897746 R2 | 327 MB | 4.3M | 15.6s | 4.8s | 3.2x |
| DRR013000 R1 | 1.4 GB | 24.8M | 51.8s | 19.6s | 2.6x |
| All 5 files | 2.1 GB | 34.7M | 55.7s | 22.3s | 2.5x |
Long-read (ONT / PacBio)
| File | Platform | Size | Reads | Mean Length | FastQC | RastQC | Speedup |
|---|---|---|---|---|---|---|---|
| DRR242198 | ONT MinION | 406 MB | 76K | 5.3 kb | 14.6s | 3.1s | 4.7x |
| DRR723651 | PacBio Revio | 281 MB | 42K | 18.8 kb | 17.6s | 2.7s | 6.5x |
The --long-read flag enables 3 additional QC modules with negligible overhead.
Resource comparison
| Metric | RastQC | FastQC (Java) |
|---|---|---|
| Binary size | 2.1 MB | ~215 MB (with JRE) |
| Startup time | <5 ms | ~2.5 s JVM warmup |
| Peak memory (small files) | 49-50 MB | 424-425 MB |
| Peak memory (1.4 GB file) | 315 MB | 434 MB |
| Peak memory (long reads) | 670-1257 MB | 702-854 MB |
| Threading | streaming intra-file + multi-file parallel | per-file parallel |
| Modules | 12 core + 3 long-read | 11 |
RastQC's streaming parallel pipeline automatically activates for files >50MB, using a bounded reader-worker architecture with adaptive batch sizing that scales with thread count without buffering the entire file in memory.
Citation
If you use RastQC in your research, please cite:
Huang KL. RastQC: A fast, Rust-based quality control tool for high-throughput sequencing data. bioRxiv (2026). https://www.biorxiv.org/content/10.64898/2026.03.31.715630v2
Acknowledgments
RastQC is a reimplementation inspired by FastQC by Simon Andrews at the Babraham Institute. FastQC has served as the gold standard for sequencing quality control for over a decade, and its elegant module design, diagnostic algorithms, and output formats are the foundation upon which RastQC is built. We are grateful to the FastQC team for creating and maintaining such an essential tool for the genomics community.
License
MIT License. See LICENSE for details.
Contributions are welcome! Please open an issue or pull request on GitHub.
Author
Written by Kuan-Lin Huang at PrecisionOmics.org