RNA-seq analysis pipeline, STAR input, edgeR, functional enrichment, visualization

October 31, 2025

Main scripts

  • Analysis_NF.Rmd - RNA-seq analysis pipeline for the Nextflow rnaseq results.

    • Input: star_rsem/rsem.merged.gene_counts.tsv and star_rsem/rsem.merged.gene_tpm.tsv
    • Output: PDF with exploratory analysis, quality control, diagnostics, visualizations.
      • Sums of counts by sample, from largest to smallest
      • Correlation matrix and PCA, colored by groups
      • Top 15 highest/lowest expressed genes
      • Differential expression analysis
        • Total number of DEGs and proportions of their gene types
        • Table of top differential protein-coding DEGs
        • Boxplots of top differential genes, to check logFC direction
        • Heatmap of top 50 protein-coding DEGs, and volcano plot
      • DEGs.xlsx - complete differential analysis results
      • TPM.xlsx - log2-transformed TPM values, to look up expression of individual genes.
  • GSEA_UP.Rmd - EnrichR (non-directional) and GSEA (directional) analysis using KEGG, GO, MSigDB.

    • Input: DEGs.xlsx, analyzing each sheet separately
    • Output: GSEA_<analysis_name>.xlsx - full pathway analysis results
  • Figure_GSEA_figures.Rmd - Visualizing GSEA results

    • Input: GSEA_<analysis_name>.xlsx
    • Output: PDF with visualizations, renamed to Figure_GSEA_<analysis_name>.pdf
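The first two outputs of Analysis_NF.Rmd (per-sample count sums, sorted from largest to smallest, and log2-transformed TPM) can be illustrated with a minimal sketch. This is not the repository's R code. It assumes the merged RSEM tables are tab-separated with `gene_id` and `transcript_id(s)` annotation columns followed by one column per sample, and it assumes a pseudocount of 1 before the log2 transform; the gene IDs, sample names, and values below are invented.

```python
import csv
import io
import math

# Hypothetical miniatures of rsem.merged.gene_counts.tsv and
# rsem.merged.gene_tpm.tsv; all IDs and values are invented.
COUNTS_TSV = (
    "gene_id\ttranscript_id(s)\tS1\tS2\tS3\n"
    "GENE_A\tTX_A1,TX_A2\t100\t250.5\t0\n"
    "GENE_B\tTX_B1\t900\t50\t10\n"
)
TPM_TSV = (
    "gene_id\ttranscript_id(s)\tS1\tS2\tS3\n"
    "GENE_A\tTX_A1,TX_A2\t3\t7\t0\n"
    "GENE_B\tTX_B1\t15\t1\t0.5\n"
)

# Assumed non-sample columns in the merged tables.
ANNOTATION_COLUMNS = ("gene_id", "transcript_id(s)")


def read_matrix(handle):
    """Return (sample_names, {gene_id: [value per sample]})."""
    reader = csv.DictReader(handle, delimiter="\t")
    samples = [c for c in reader.fieldnames if c not in ANNOTATION_COLUMNS]
    matrix = {row["gene_id"]: [float(row[s]) for s in samples] for row in reader}
    return samples, matrix


def library_sizes(samples, matrix):
    """Sum counts per sample and sort from largest to smallest."""
    totals = {s: 0.0 for s in samples}
    for values in matrix.values():
        for sample, value in zip(samples, values):
            totals[sample] += value
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def log2_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount); the pseudocount of 1 is an assumption."""
    return {
        gene: [math.log2(v + pseudocount) for v in values]
        for gene, values in matrix.items()
    }


samples, counts = read_matrix(io.StringIO(COUNTS_TSV))
sizes = library_sizes(samples, counts)

_, tpm = read_matrix(io.StringIO(TPM_TSV))
log_tpm = log2_transform(tpm)

for sample, total in sizes:
    print(f"{sample}\t{total:.1f}")
```

With real data, `io.StringIO(...)` would be replaced by an opened file handle for each of the two input files.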

Additional scripts

data

misc - Old scripts

  • Analysis_featurecounts.Rmd - RNA-seq analysis pipeline for featureCounts counts. Prerequisites:
    • A path to the data folder. This folder should have 3 subfolders:
      • 03_featureCount - gzipped count files output by featureCounts
      • results - folder where the results will be stored
      • data - must contain a sample_annotation.csv file. The annotation file should have a "Sample" column with sample names, plus any other annotation columns. Include a "Group" column containing the covariate of interest. Example:
# Sample,Group
VLI10_AA_S61_L006_R1_001.txt.gz,AA
VLI10_AA_S61_L007_R1_001.txt.gz,AA
VLI10_AA_S61_L008_R1_001.txt.gz,AA
VLI11_C_S62_L006_R1_001.txt.gz,C
VLI11_C_S62_L007_R1_001.txt.gz,C
VLI11_C_S62_L008_R1_001.txt.gz,C
  • NGS_Pipelines - Bash pipelines for processing of NGS data, https://github.com/ATpoint/NGS_Pipelines

  • snRNA-seq-workflow - snRNA-seq analysis of 24 human post-mortem brain samples (temporal cortex). 10x, R code. https://github.com/jgamache014/snRNA-seq-workflow

  • Figure_clusterProfiler_nes.Rmd - Takes the results of edgeR analysis from an Excel file, performs GO and KEGG GSEA and plots the results as horizontal barplots, sorted by normalized enrichment score (NES). Example

  • Figure_clusterProfiler_asis.Rmd - Takes the results of edgeR analysis from an Excel file, performs GO and KEGG GSEA and plots the results as horizontal barplots, sorted by p-value, as they come out of the enrichment analysis.

  • enrichR_analysis.Rmd - Analyze gene lists using enrichR. Analyze all genes, and up- and downregulated genes separately. Uses DEGs.xlsx produced by Analysis*.Rmd.

  • enrichR_plot.Rmd - Barplot of selected enrichment results, similar to Example. Work in progress.
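The sample_annotation.csv layout required by Analysis_featurecounts.Rmd (a "Sample" column, a "Group" column, and any other annotation columns) can be checked before running the pipeline. The following is a minimal sketch, not part of the repository. It assumes the header line is plain `Sample,Group`, and the helper names `load_annotation` and `group_sizes` are hypothetical. The sample rows are a subset of the example given for that script.

```python
import csv
import io

# Subset of the example annotation; the plain "Sample,Group" header is an assumption.
ANNOTATION_CSV = """Sample,Group
VLI10_AA_S61_L006_R1_001.txt.gz,AA
VLI10_AA_S61_L007_R1_001.txt.gz,AA
VLI11_C_S62_L006_R1_001.txt.gz,C
VLI11_C_S62_L007_R1_001.txt.gz,C
"""


def load_annotation(handle, required=("Sample", "Group")):
    """Read the annotation and fail early on missing columns or duplicate samples."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [column for column in required if column not in header]
    if missing:
        raise ValueError("missing required column(s): " + ", ".join(missing))
    rows = list(reader)
    names = [row["Sample"] for row in rows]
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError("duplicate sample name(s): " + ", ".join(duplicates))
    return rows


def group_sizes(rows):
    """Count samples per level of the Group covariate."""
    sizes = {}
    for row in rows:
        sizes[row["Group"]] = sizes.get(row["Group"], 0) + 1
    return sizes


rows = load_annotation(io.StringIO(ANNOTATION_CSV))
sizes = group_sizes(rows)
print(sizes)
```

Checking group sizes up front is a design choice of this sketch: a group with a single sample leaves no replicates for estimating within-group variability in a differential analysis.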

scripts

Scripts for running RNA-seq preprocessing steps on a cluster using the PBS job submission system. The subread-featurecounts scripts are in the dcaf/ngs.rna-seq repository.

multiqc --filename multiqc_01_trimmed.html --outdir multiqc_01_trimmed 01_trimmed/
multiqc --filename multiqc_02_STAR-align.html --outdir multiqc_02_STAR-align 02_STAR-align/

CaSpER pipeline detecting CNVs from RNA-seq data

Dedicated repository with detailed instructions: mdozmorov/CaSpER_pipeline

Misc

Disclaimer

The code in this repository is provided "as is" without any warranty of any kind, express or implied. The author(s) of this code make no guarantees and assume no liability for the accuracy, completeness, or usefulness of this code, or for any damages that may arise from its use. By using this code, you agree to do so at your own risk.