RDConnect_RNASeq

October 4, 2018

Step 1: Alignment

Starting from paired-end FASTQ files, this step runs:

  • STAR 2.5.3a to generate alignment to genome and transcriptome
  • RSEM 1.3.0 to quantify gene and isoform usage

This step requires STAR indexes and RSEM indexes that have been generated beforehand and are suitable for the task. Both index folders are passed as inputs to the call. The reference FASTA file must also be prepared as GATK requires (indexed with samtools faidx and given a sequence dictionary with Picard).
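As a minimal sketch of the reference requirement above, the helper below checks that a FASTA file has the two companion files GATK looks for: the samtools index (`reference.fa.fai`) and the sequence dictionary (`reference.dict`). The function name `check_reference` and the `picard.jar` path in the comments are assumptions for illustration and are not part of this repository.

```sh
#!/bin/sh
# Sketch: verify that a reference FASTA has the companion files GATK expects.
# They are typically created with (shown for orientation, not run here):
#   samtools faidx reference.fa
#   java -jar picard.jar CreateSequenceDictionary R=reference.fa O=reference.dict
# The picard.jar location is an assumption; adjust it to your installation.

# check_reference is a hypothetical helper, not part of this repository.
check_reference() {
    ref=$1
    dict="${ref%.*}.dict"        # reference.fa -> reference.dict
    status=0
    for f in "$ref" "$ref.fai" "$dict"; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return $status
}
```

Calling `check_reference reference.fa` before step 1 returns a non-zero status and names each missing file on standard error. It only checks that the files exist and are non-empty, not that their contents are valid.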

sh rd_connect_align.sh \
         -A f1.fq.gz \
         -B f2.fq.gz \
         -n CPUS \
         -m memory_in_gigabytes \
         -j RSEM_indexes_folder \
         -i sampleID \
         -s STAR_indexes_folder \
         -r reference_fasta_file \
         -t tmp_folder \
         -o results_folder
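When several samples need aligning, the call above can be generated once per FASTQ pair. The sketch below only prints the commands so they can be reviewed first. It assumes a hypothetical naming scheme (`<sampleID>_1.fq.gz` and `<sampleID>_2.fq.gz`), hypothetical shell variables (`CPUS`, `MEM_GB`, `RSEM_INDEX`, `STAR_INDEX`, `REFERENCE`, `TMP_DIR`, `RESULTS_DIR`), and one results folder per sample. None of these come from the repository.

```sh
#!/bin/sh
# Sketch: print one rd_connect_align.sh call per FASTQ pair found in a folder.
# Assumed (hypothetical) naming: <sampleID>_1.fq.gz / <sampleID>_2.fq.gz.
# Commands are only printed, never executed by this function.

# print_align_calls is a hypothetical helper, not part of this repository.
print_align_calls() {
    fastq_dir=$1
    for r1 in "$fastq_dir"/*_1.fq.gz; do
        [ -e "$r1" ] || continue               # the glob matched nothing
        r2="${r1%_1.fq.gz}_2.fq.gz"            # expected mate file
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no mate file $r2" >&2
            continue
        fi
        sample=$(basename "$r1" _1.fq.gz)      # sample ID taken from the file name
        # Per-sample output folder is an assumption of this sketch.
        echo "sh rd_connect_align.sh -A $r1 -B $r2 -n $CPUS -m $MEM_GB" \
             "-j $RSEM_INDEX -i $sample -s $STAR_INDEX -r $REFERENCE" \
             "-t $TMP_DIR -o $RESULTS_DIR/$sample"
    done
}
```

After setting the variables, `print_align_calls fastq_folder` writes one command line per complete pair and reports unpaired files on standard error. Paths containing spaces are not quoted in the printed commands, so this sketch is only suitable for simple file names.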

Step 2: Variant Calling

This step starts from the BAM file aligned to the genome.

  • Requires:
    • GATK 3.6.0
    • Picard 2.18.2
    • Tabix 0.2.5
  • Produces:
    • One processed BAM file (with adjusted scores, duplicates removed, etc.)
    • One gVCF file with all variant and non-variant sites
    • One VCF file with all filtered variants (these are the variants for the RD-Connect site)
    • An allele-specific report for each variant

Example:

sh rd_connect_rna_call.sh \
    -b my_bam.bam \
    -r reference_fasta_file \
    -s sampleID \
    -n CPUS \
    -g memory_in_gigabytes

Requirements

  • GATK 3.6.0
  • STAR 2.5.3a
  • RSEM 1.3.0
  • Picard 2.18.2
  • Tabix 0.2.5
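A quick pre-flight check can catch a missing tool before a long run. The sketch below reports which executables are not on the PATH. The helper name `missing_tools` is hypothetical, and the executable names in the comment (`STAR`, `rsem-calculate-expression`, `tabix`, `samtools`, `java`) are assumptions about a typical installation; GATK 3.6.0 and Picard 2.18.2 are distributed as jar files, so only `java` can be checked this way.

```sh
#!/bin/sh
# Sketch: report which command-line tools are not available on PATH.

# missing_tools is a hypothetical helper, not part of this repository.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example call (executable names are assumptions, see the text above):
#   missing_tools STAR rsem-calculate-expression tabix samtools java
```

The function prints one missing tool per line and prints nothing when everything is found. It does not verify that the installed versions match the ones listed above.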

Nextflow execution / Docker

Both steps come with a Nextflow implementation (in the Nextflow folder), which allows for full reproducibility and relies on Docker with publicly available images.