README

June 12, 2013 ยท View on GitHub

The original fastqc (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) is a java program that reads genomic fastq files and does various quality control (QC) checks and graphs. This fastqc project is an R package that does much the same (the results are near identical).

At base it uses the seqan library (http://www.seqan.de) - "an open source C++ library of efficient algorithms and data structures for the analysis of sequences". This is interfaced with R using the Rcpp package plus I use RcppArmadillo too for matrix calculations. Then for graphics I use ggplot2.

The original fastqc (java) reads ~200k reads only and this program does by default - but will work on all reads if asked. So whilst fairly efficient in memory and speed it maybe make your 4GB laptop chunter if you push past the ~4-5 million read mark.

the package now features multithreading (via the parallel package) and multifile processing.