RNA-seq analysis pipeline, STAR input, edgeR, functional enrichment, visualization
October 31, 2025
Main scripts
- Analysis_NF.Rmd - RNA-seq analysis pipeline for the Nextflow rnaseq results.
  - Input: star_rsem/rsem.merged.gene_counts.tsv and star_rsem/rsem.merged.gene_tpm.tsv
  - Output: PDF with exploratory analysis, quality control, diagnostics, visualizations.
    - Sums of counts by sample, from largest to smallest
    - Correlation matrix and PCA, colored by groups
    - Top 15 highest/lowest expressed genes
    - Differential expression analysis
    - Total number of DEGs and proportions of their gene types
    - Table of top differential protein-coding DEGs
    - Boxplots of top differential genes, to check logFC direction
    - Heatmap of top 50 protein-coding DEGs, and volcano plot
  - DEGs.xlsx - complete differential analysis results
  - TPM.xlsx - log2-transformed TPM values, to look up expression of individual genes.
- GSEA_UP.Rmd - EnrichR (non-directional) and GSEA (directional) analysis using KEGG, GO, MSigDB.
  - Input: DEGs.xlsx, analyzing each sheet separately
  - Output: GSEA_<analysis_name>.xlsx - full pathway analysis results
- Figure_GSEA_figures.Rmd - Visualizing GSEA results
  - Input: GSEA_<analysis_name>.xlsx
  - Output: PDF with visualizations, renamed to Figure_GSEA_<analysis_name>.pdf
Additional scripts
- Analysis_STAR.Rmd - RNA-seq analysis pipeline for STAR counts. Prerequisites:
  - A path to data folder. This folder should have 3 subfolders:
    - 02_STAR-align - gzipped count files with .tab extension output by the STAR aligner
    - results - folder where the results will be stored
    - data - Must have sample_annotation.csv file, example below
- GSEA_custom.Rmd - GSEA analysis using custom gene signature.
- GSEA_figures.Rmd - Visualization of GSEA enrichment results as horizontal barplots.
- GSEA_ATAC_RNA.Rmd - GSEA analysis of integrative results.
- Figure_Barplot_DEGs.Rmd - barplot of selected genes. Example
- Figure_GSEA_comparison.Rmd - GSEA results plotting for two analyses.
- Figure_heatmap.Rmd - make heatmap of top 50 differentially expressed genes. Uses TPM.xlsx produced by Analysis*.Rmd. May use a custom signature of genes. Includes EnhancedVolcano and boxplots of selected genes.
- Figure_Linear_DEGs.Rmd - plotting DEGs ranked by fold changes. Example
- oncoEnrichR.Rmd - Cancer-dedicated gene set interpretation using the oncoEnrichR R package. Example
- Pathview.Rmd - visualization of top KEGG pathways. Uses DEGs.xlsx produced by Analysis*.Rmd. Example
- VennDiagram.qmd - Venn diagram plotting
- calcTPM.R - a function to calculate TPMs from gene counts
- utils.R - helper functions. utils_NF.R - adjusted for the Nextflow pipeline.
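
For reference, the standard TPM calculation that a helper such as calcTPM.R is expected to perform can be sketched as follows. This is not the repository's R code. It is a minimal awk illustration that assumes a hypothetical three-column table (gene, gene length in bp, raw count) with made-up values. Each count is divided by its gene length, and the resulting rates are rescaled so that they sum to one million.

```sh
# Hypothetical input: gene name, gene length (bp), raw count, tab-separated.
counts=$(printf 'geneA\t1000\t100\ngeneB\t2000\t100\ngeneC\t500\t0\n')

# TPM_i = (count_i / length_i) / sum_j(count_j / length_j) * 1e6
tpm=$(printf '%s\n' "$counts" | awk -F'\t' '
  { gene[NR] = $1; rate[NR] = $3 / $2; total += rate[NR] }
  END {
    for (i = 1; i <= NR; i++)
      printf "%s\t%.1f\n", gene[i], rate[i] / total * 1e6
  }')

# Print gene name and TPM, one gene per line.
printf '%s\n' "$tpm"
```

In this toy table geneA and geneB have the same count, but geneA is half as long, so it receives twice the TPM of geneB.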
data
- Human.MitoCarta3.0.xls - Human MitoCarta3.0: 1136 mitochondrial genes, https://personal.broadinstitute.org/scalvo/MitoCarta3.0/human.mitocarta3.0.html, downloaded 2023-11-02.
- Mouse.MitoCarta3.0.xls - Mouse MitoCarta3.0: 1140 mitochondrial genes, https://personal.broadinstitute.org/scalvo/MitoCarta3.0/human.mitocarta3.0.html, downloaded 2023-11-02.
misc - Old scripts
- Analysis_featurecounts.Rmd - RNA-seq analysis pipeline for featureCounts counts. Prerequisites:
  - A path to data folder. This folder should have 3 subfolders:
    - 03_featureCount - gzipped count files output by featureCounts
    - results - folder where the results will be stored
    - data - Must have sample_annotation.csv file. The annotation file should have a "Sample" column with sample names, and any other annotation columns. Include a "Group" column containing the covariate of interest. Example:
# Sample,Group
VLI10_AA_S61_L006_R1_001.txt.gz,AA
VLI10_AA_S61_L007_R1_001.txt.gz,AA
VLI10_AA_S61_L008_R1_001.txt.gz,AA
VLI11_C_S62_L006_R1_001.txt.gz,C
VLI11_C_S62_L007_R1_001.txt.gz,C
VLI11_C_S62_L008_R1_001.txt.gz,C
- NGS_Pipelines - Bash pipelines for processing of NGS data, https://github.com/ATpoint/NGS_Pipelines
- snRNA-seq-workflow - snRNA-seq analysis of 24 human post-mortem brain samples, temporal cortex. 10x, R code. https://github.com/jgamache014/snRNA-seq-workflow
- Figure_clusterProfiler_nes.Rmd - Takes the results of edgeR analysis from an Excel file, performs GO and KEGG GSEA and plots the results as horizontal barplots, sorted by normalized enrichment score (NES). Example
- Figure_clusterProfiler_asis.Rmd - Takes the results of edgeR analysis from an Excel file, performs GO and KEGG GSEA and plots the results as horizontal barplots, sorted by p-value, as they come out of the enrichment analysis.
- enrichR_analysis.Rmd - Analyze gene lists using enrichR. Analyze all genes, and up- and downregulated genes separately. Uses DEGs.xlsx produced by Analysis*.Rmd.
- enrichR_plot.Rmd - barplot of selected enrichment results, similar to Example. WIP
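
The sample_annotation.csv layout described above (a "Sample" column plus a "Group" column) can be sanity-checked before running an analysis script. The sketch below is not part of the repository. The file contents are hypothetical, the `has_col` helper is made up for illustration, and stripping an optional leading "# " from the header is an assumption based on the example header shown above.

```sh
# Hypothetical annotation file, written to a temporary path for the demo.
annot=$(mktemp)
cat > "$annot" <<'EOF'
Sample,Group
VLI10_AA_S61_L006_R1_001.txt.gz,AA
VLI10_AA_S61_L007_R1_001.txt.gz,AA
VLI11_C_S62_L006_R1_001.txt.gz,C
EOF

# Read the header; tolerate an optional leading "# " (an assumption).
header=$(head -n 1 "$annot" | sed 's/^# *//')

# Hypothetical helper: succeeds if the header has a column with exactly this name.
has_col() { printf '%s\n' "$header" | tr ',' '\n' | grep -qx "$1"; }

for col in Sample Group; do
  has_col "$col" || { echo "Missing required column: $col" >&2; exit 1; }
done

# Count samples per group; assumes Group is the second column, as in the demo file.
group_counts=$(tail -n +2 "$annot" | cut -d, -f2 | sort | uniq -c |
  awk '{ print $2 "=" $1 }' | paste -sd' ' -)
echo "Samples per group: $group_counts"
rm -f "$annot"
```

A quick per-group count like this makes it easy to spot a mistyped group label before it silently becomes an extra level in the differential analysis.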
scripts
Scripts for running RNA-seq preprocessing steps on a cluster using the PBS job submission system. subread-featurecounts scripts are in the dcaf/ngs.rna-seq repository.
- submit00_fastqc.sh - FastQC on raw FASTQ files
- MultiQC commands to summarize QC reports generated by TrimGalore and STAR
multiqc --filename multiqc_01_trimmed.html --outdir multiqc_01_trimmed 01_trimmed/
multiqc --filename multiqc_02_STAR-align.html --outdir multiqc_02_STAR-align 02_STAR-align/
- submit01_trimgalore.sh - Adapter trimming using TrimGalore
- submit02_STAR-index.sh - Index the genome for the STAR aligner
- submit02_STAR.sh - Align samples using STAR. Requires the input01_toStarAlign.list text file with the list of input files. Each line contains comma-separated file name(s), and a space separates the first and second read pairs.
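
As an illustration of the input01_toStarAlign.list line format, the sketch below splits one line into its read 1 and read 2 parts. The file names are hypothetical, and how submit02_STAR.sh actually consumes the file is not shown here. The two fields have the same shape that STAR's --readFilesIn option accepts (comma-separated lists, with a space between mates), so passing them through unchanged is a plausible use, stated here as an assumption.

```sh
# Hypothetical line: two lanes per mate; a space separates read 1 from read 2.
line="S1_L001_R1.fastq.gz,S1_L002_R1.fastq.gz S1_L001_R2.fastq.gz,S1_L002_R2.fastq.gz"

mate1=${line%% *}                 # everything before the first space
case "$line" in
  *" "*) mate2=${line#* } ;;      # everything after the first space
  *)     mate2="" ;;              # no space: single-end data
esac

# Number of comma-separated files listed for read 1.
n_mate1=$(printf '%s\n' "$mate1" | tr ',' '\n' | grep -c .)

echo "Read 1 ($n_mate1 files): $mate1"
echo "Read 2: ${mate2:-none (single-end)}"
```

For single-end data the line has no space, so the second field is empty and only the read 1 list is reported.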
CaSpER pipeline detecting CNVs from RNA-seq data
Dedicated repository with detailed instructions: mdozmorov/CaSpER_pipeline
- submit05_BAFExtract-index.sh - indexing the genome for BAFExtract
- submit05_BAFExtract.sh - BAFExtract run
Misc
- RNAseq-workflow - A repository for setting up an RNAseq workflow. Detailed instructions and code for each analysis and visualization step.
- DESeq results to pathways in 60 Seconds with the fgsea package, https://stephenturner.github.io/deseq-to-fgsea/
- A Shiny app for visualizing DESeq2 results by Zuguang Gu. Tweet
Disclaimer
The code in this repository is provided "as is" without any warranty of any kind, express or implied. The author(s) of this code make no guarantees and assume no liability for the accuracy, completeness, or usefulness of this code, or for any damages that may arise from its use. By using this code, you agree to do so at your own risk.