IARC bioinformatics pipelines, tools and other resources (updated on 1st April 2026)


This page lists all the pipelines and tools developed or used at IARC (mostly Nextflow pipelines, which are suffixed with -nf). It also includes useful resources such as courses, data notes, manuscript code/datasets and tips/tricks. At the bottom of the page you will find explanations on how to use Nextflow pipelines.

Table of Contents:

1. IARC pipelines/tools list

2. Courses, data notes and manuscripts code/datasets

3. Tips & Tricks

4. Nextflow, Docker and Singularity installation and use

5. Past work - deprecated and unmaintained pipelines and tools

1. IARC pipelines/tools list

1a. Raw NGS data processing

| Name | Latest version | Code | Maintained | Description | Tools used |
|---|---|---|---|---|---|
| alignment-nf | v1.3 - March 2021 | DSL1 | Yes | Performs BAM realignment or fastq alignment, with/without local indel realignment and base quality score recalibration | bwa, samblaster, sambamba, samtools, AdapterRemoval, GATK, k8 javascript execution shell, bwa-postalt.js |
| abra-nf | Apr 2026 | DSL1 (v3.0a) & DSL2 | Yes | Runs ABRA (Assembly Based ReAligner) | ABRA, bedtools, bwa, sambamba, samtools |
| BQSR-nf | Apr 2026 | DSL1 (v1.1) & DSL2 | Yes | Performs base quality score recalibration of bam files using GATK | samtools, samblaster, sambamba, GATK |
| metagenomics-nf | May 2024 | DSL2 | Yes | Runs centrifuge to detect reads mapping to microbial or viral references, and optionally VIRUSBreakend to detect viral integration | samtools, centrifuge, VIRUSBreakend |

1b. RNA Seq

| Name | Latest version | Code | Maintained | Description | Tools used |
|---|---|---|---|---|---|
| RNAseq-nf | April 2026 | DSL1 (v2.4a) & DSL2 (dev) | Yes | Performs RNAseq mapping, quality control, and reads counting - See also RNAseq_analysis_scripts for post-processing | fastqc, RSeQC, MultiQC, STAR, htseq, cutadapt, Python version > 2.7, trim_galore, hisat2, GATK, samtools |
| RNAseq-transcript-nf | April 2026 | DSL1 (v2.2) & DSL2 | Yes | Performs transcript identification and quantification from a series of BAM files | StringTie |
| RNAseq-fusion-nf | April 2026 | DSL1 (v1.1) & DSL2 | Yes | Performs fusion-gene discovery from RNAseq data using STAR-Fusion | STAR-Fusion |
| gene-fusions-nf | v1.1 - May 2024 | DSL1 (v1.0) & DSL2 (v1.1) | Yes | Performs fusion-gene discovery from RNAseq data using Arriba | Arriba |
| quantiseq-nf | April 2026 | DSL1 (v1.1) & DSL2 | Yes | Quantifies immune cell content from RNA-seq data | quanTIseq |
| RNAsplicing-nf | v1.0 - April 2025 | DSL2 | Yes | 🔴 NEW: Performs RNA splicing analyses using SUPPA2 | trimgalore, salmon and SUPPA2 |
| TCR-BCR-nf | v1.0 - 2024 | DSL2 | Yes | Genotypes T-cell and B-cell receptors from bulk or single-cell RNA-seq data using TRUST4 | TRUST4 |


1c. Single-cell RNA seq

| Name | Latest version | Code | Maintained | Description | Tools used |
|---|---|---|---|---|---|
| singlecell_preprocessing | nf-core pipeline | DSL2 | not developed by IARC | 🔴 NEW: best-practice analysis pipeline for processing 10x Genomics single-cell RNA-seq data | Salmon-Alevin, Kallisto, STARsolo, Cellranger, CellBender, MultiQC |
| singlecell_scripts | January 2025 | NA | Yes | 🔴 NEW: Python notebook for single-cell analyses following 'Single-cell best practices guide' | Python |
| SComatic-nf | April 2024 | DSL2 | Yes | Performs variant calling from single-cell RNAseq data | SComatic, annovar |
| numbat-nf | April 2024 | DSL2 | Yes | Performs variant calling from single-cell RNAseq data | numbat, SigProfilerExtractor |

1d. QC

| Name | Latest version | Code | Maintained | Description | Tools used |
|---|---|---|---|---|---|
| NGSCheckMate | v1.1a - July 2021 | DSL1 | Yes | Runs NGSCheckMate on BAM files to identify data files from the same individual (i.e. check N/T pairs) | NGSCheckMate |
| fastqc-nf | v1.1 - July 2020 | DSL1 | Yes | Runs fastqc and multiqc on DNA seq data (fastq data) | FastQC, MultiQC |
| qualimap-nf | v1.1 - Nov 2019 | DSL1 | Yes | Performs quality control on bam files (WES, WGS and target alignment data) | samtools, Qualimap, MultiQC |

1e. Variant calling

| Name | Latest version | Code | Maintained | Description | Tools used |
|---|---|---|---|---|---|
| needlestack | v1.1 - May 2019 | DSL1 | Yes | Performs multi-sample somatic variant calling | perl, bedtools, samtools and R software |
| strelka2-nf | Feb 2024 | DSL1 & DSL2 (dsl2 branch) | Yes | Runs Strelka 2 (germline and somatic variant caller) | Strelka2 |
| mutect-nf | v2.3 - July 2021 | DSL1 & DSL2 (dsl2 branch) | Yes | Runs Mutect on tumor-matched normal bam pairs | Mutect and its dependencies (Java 1.7 and Maven 3.0+), bedtools |
| vcf_normalization-nf | v1.1 - May 2020 | DSL1 | Yes | Decomposes and normalizes variant calls (vcf files) | bcftools, samtools/htslib |
| gama_annot-nf | Aug 2020 | DSL2 | Yes | Filter and annotate batch of vcf files (annovar + strand + context) | annovar, R |
| table_annovar-nf | v1.1.1 - Feb 2021 | DSL1 | Yes | Annotate variants with annovar (vcf files) | annovar |
| RF-mut-f | Nov 2021 | NA | Yes | Random forest implementation to filter germline mutations from tumor-only samples | annovar |
| snpeff_annotation-nf | 2023 | DSL2 | Yes | Annotate variants VCF files with SnpEff and dbSnp | dbSNP database, dbNSFP database |
| *** | | | | | |
| MutSig | Oct 2021 | NA | Yes | Pipeline to perform mutational signatures analysis of WGS data using SigProfilerExtractor | SigProfilerExtractor |
| MutSpec | v2.0 - May 2017 | NA | | Suite of tools for analyzing and interpreting mutational signatures | annovar |
| *** | | | | | |
| purple-nf | v1.1 - Nov 2021 | DSL1 | Yes | Pipeline to perform copy number calling from tumor/normal or tumor-only sequencing data using PURPLE | PURPLE |
| facets-nf | v3.0 - April 2026 | DSL1 (v2.0) & DSL2 (v3.0) | Yes | Performs fraction and copy number estimate from tumor/normal sequencing data using facets | facets, R |
| svaba-nf | v1.0 - August 2020 | DSL1 | Yes | Performs structural variant calling using SvABA | SvABA, R |
| sv_somatic_cns-nf | v1.0 - Nov 2021 | DSL1 | Yes | Pipeline using multiple SV callers for consensus structural variant calling from tumor/normal sequencing data | Delly, SvABA, Manta, SURVIVOR, bcftools, Samtools |
| ssvht | v1 - Oct 2022 | | Yes | 🔴 NEW: set of scripts to assist the calling of somatic structural variants from short reads using a random forest classifier | |

1f. Deep learning pipelines and tools for digital pathology

| Name | Latest version | Code | Maintained | Description | Tools used |
|---|---|---|---|---|---|
| 1fa. Whole slide images (WSI) pre-processing | | | | | |
| WSIPreprocessing | December 2023 | | Yes | Preprocessing pipeline for WSIs (tiling, color normalization) | Python, openslide |
| 1fb. Tumor segmentation with CFlow AD | | | | | |
| TumorSegmentationCFlowAD | December 2023 | | Yes | Tumour segmentation with an anomaly detection model | Python, PyTorch |
| 1fc. Supervised learning on immunohistochemistry slides | | | | | |
| PathonetLNEN | December 2023 | | Yes | Detection and classification of cells as positive or negative for an immunomarker, developed for PHH3 and Ki-67 in lung carcinoma | Python, TensorFlow |
| 1fd. Self-supervised feature extractor for WSIs | | | | | |
| LNENBarlowTwins | December 2023 | | Yes | Extraction of HE tile features with Barlow Twins, a self-supervised deep learning model | Python, PyTorch |
| 1fe. Additional tools | | | | | |
| SpatialPCAForWSIs | December 2023 | | Yes | Spatially aware principal component analysis to obtain a low-dimensional representation of the tiles' encoding vectors | R |
| LeidenForTilesCommunity_accGPU | December 2024 | | Yes | Tools for GPU-accelerated Leiden community detection using the RAPIDS package (focus on clustering encoded vectors from high-dimensional data) | |

1g. Other tools/pipelines

| Name | Latest version | Code | Maintained | Description | Tools used |
|---|---|---|---|---|---|
| template-nf | May 2020 | | Yes | Empty template for nextflow pipelines | NA |
| data_test | Aug 2020 | | Yes | Small data files to test IARC nextflow pipelines | NA |
| bam/cram2fastq-nf | nf-core pipeline | NA | not developed by IARC | 🔴 NEW: Pipeline to convert bam files or cram files to fastq files | samtools |
| bam2cram-nf | v1.0 - Nov 2020 | DSL1 & DSL2 | Yes | Pipeline to convert bam files to cram files | samtools |
| DPclust-nf | Feb 2024 | DSL2 | Yes | Method for subclonal reconstruction using SNVs and/or CNAs from whole genome or whole exome sequencing data | dpclust, R |
| ITH_pipeline | | | Yes | Study intra-tumoral heterogeneity (ITH) through subclonality reconstruction | HATCHet, DeCiFer, ClonEvol |
| hla-neo-nf | April 2024 | DSL2 | Yes | Pipeline to predict neoantigens from WGS of T/N pairs | xHLA, VEP, pVACtools |
| PRSice | Nov 2020 | | | Pipeline to compute polygenic risk scores | PRSice-2 |
| methylkey | Nov 2024 | | Yes | Pipeline for 450k and 850k array analysis (bisulfite data analysis using Minfi, Methylumi, Comet, Bumphunter and DMRcate packages) | R software |
| bam2peaks | Oct 2024 | DSL2 | Yes | 🔴 NEW: Pipeline designed for peak calling using MACS and IDR, coupled with QC generation using deeptools | MACS, IDR, Deeptools |
| wsearch-nf | July 2022 | | Yes | Microbiome analysis with usearch, vsearch and phyloseq | |
| AmpliconArchitect-nf | v1.0 - Oct 2021 | | Yes | Discovers ecDNA in cancer genomes using AmpliconArchitect | AmpliconArchitect |
| addreplacerg-nf | Jan 2017 | | ? | Adds and replaces read group tags in BAM files | samtools |
| bametrics-nf | Mar 2017 | | ? | Computes average metrics from reads that overlap a given set of positions | NA |
| Gviz_multiAlignments | Aug 2017 | | ? | Generates multiple BAM alignment views using the Gviz bioconductor package | Gviz |
| nf_coverage_demo | v2.3 - July 2020 | | Yes | Plots mean coverage over a series of BAM files | bedtools, R software |
| LiftOver-nf | Nov 2017 | | ? | Converts BED/VCF between hg19 and hg38 | picard |
| MinION_pipes | Jan 2020 | | ? | Analyze MinION sequencing data for the reconstruction of viral genomes | Guppy V3.1.5+, Porechop V0.2.4, Nanofilt V2.2.0, Filtlong V0.2.0, SPAdes V3.10.1, CAP3 02/10/15, BLAST V2.9.0+, MUSCLE V3.8.1551, Nanopolish V0.11.0, Minimap2 V2.15, Samtools version 1.9 |
| DraftPolisher | Jan 2020 | | ? | Fast polishing of draft sequences (draft genome assembly) | MUSCLE, Python3 |
| Imputation-nf | v1.1 - July 2021 | | Yes | Pipeline to perform dataset genotyping imputation | LiftOver, Plink, Admixture, Perl, Term::ReadKey, Bcftools, Eagle, Minimac4 and samtools |
| PVAmpliconFinder | Aug 2020 | | Yes | Identify and classify known and potentially new Papillomaviridae sequences from amplicon deep-sequencing with degenerated papillomavirus primers | Python and Perl (+ FastQC, MultiQC, Trim Galore, VSEARCH, Blast, RaxML-EPA, PaPaRa, CAP3, KRONA) |
| integration_analysis_scripts | Mar 2020 | | Yes | Performs unsupervised analyses (clustering) from transformed expression data (e.g., log fpkm) and methylation beta values | R software with iClusterPlus, gplots and lattice R packages |
| mpileup2readcounts | Apr 2018 | | ? | Get the readcounts at a locus by piping samtools mpileup output - forked from gatoravi | samtools |
| Methylation_analysis_scripts | v1.0 - June 2020 - updated Nov 2021 | | Yes | Perform Illumina EPIC 850K array pre-processing and QC from idat files | R software |
| DRMetrics | Oct 2020 | | Yes | Evaluate the quality of projections obtained after using dimensionality reduction techniques | R software |
| acnviewer-singularity | Jul 2019 | | ? | Build a singularity image of aCNViewer (tool for visualization of absolute copy number and copy neutral variations) | Singularity |
| polysolver-singularity | Dec 2019 | | ? | Build a singularity image of Polysolver (tool for HLA typing based on whole exome seq) | Singularity |
| scanMyWorkDir | May 2018 | | ? | Non-destructive and informative scan of a nextflow work folder | NA |

2. Courses, data notes and manuscripts code/datasets

| Name | Description |
|---|---|
| nextflow-course-2018 | Nextflow course |
| SBG-CGC_course2018 | Analyzing TCGA data in SBG-CGC |
| Medical Genomics Course | Medical Genomics course held at the INSA Lyon - updated Fall 2024 |
| intro-cancer-genomics | Introduction to cancer genomics |
| *** | |
| mesomics_data_note | Repository with code and datasets used in the mesomics data note manuscript: Di Genova et al. |
| MESOMICS_data | Repository with data and processing scripts associated with the MESOMICS project and main analysis paper: Mangiante et al. |
| MS_panNEN_organoids | Repository with data and scripts used to produce the genomic figures in the panNEN organoids manuscript (Dayton et al.) and associated data note (Alcala et al.) |
| ESMOOpen_LungNENomicsCohort | Repository presenting the analyses performed in the manuscript from Mathian et al. regarding histopathological classification of LNEN tumors |
| MS-EPIC-RareCancers | Repository with scripts used in the EPIC rare cancer project manuscript from Fernandez-Cuesta, Voegele et al. |
| MS_SVA | Repository with code used to produce the figures from Morrison et al. |
| MS_lungNENomics | Repository with scripts from the lungNENomics manuscript Sexton-Oates et al. 2025 |

3. Tips & Tricks

| Name | Latest version | Code | Maintained | Description | Tools used |
|---|---|---|---|---|---|
| BAM-tricks | | | | Tips and tricks for BAM files | samtools, freebayes, bedtools, biobambam2, Picard, rbamtools |
| VCF-tricks | | | | Tips and tricks for VCF files | samtools, bcftools, vcflib, vcftools, R scripts |
| R-tricks | | | | Tips and tricks for R | NA |
| EGA-tricks | | | | Tips and tricks to use the European Genome-Phenome Archive from the European Bioinformatics Institute | EGA client |
| GDC-tricks | | | | Tips and tricks to use the GDC data portal | NA |
| awesomeTCGA | | | | Curated list of resources to access TCGA data | NA |
| LSF-Tricks | | | | Tips and tricks for LSF HPC scheduler | NA |

4. Nextflow, Docker and Singularity installation and use

4a. Nextflow

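  1. Install a Java runtime if you don't already have one. The required version depends on the Nextflow release (recent releases require Java 17 or later; check the Nextflow documentation).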

  2. Install nextflow.

    curl -fsSL get.nextflow.io | bash
    

    Then move it to a location in your $PATH (/usr/local/bin in this example):

    sudo mv nextflow /usr/local/bin
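
Before launching a pipeline, it can help to confirm that both commands are reachable from your $PATH. The following is only a minimal sketch and not part of the IARC instructions; the helper name `check_tool` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not a Nextflow command): succeeds only if "$1" is found on $PATH.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of each prerequisite without aborting the script.
for tool in java nextflow; do
    if check_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```

If `nextflow` is reported as missing after the `mv` step, the target directory is probably not listed in `$PATH`.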
    

4b. Docker

To avoid installing all dependencies each time you use a pipeline, you can instead install Docker and let Nextflow handle them. Installing Docker is system-specific (but quite easy in most cases); follow the Docker documentation (Docker CE is sufficient). Also follow the post-installation steps to manage Docker as a non-root user (documented for Linux); otherwise you will need to change the sudo option in the Nextflow docker config scope, as described in the Nextflow documentation.

To run a Nextflow pipeline with Docker, simply add the -with-docker option to the nextflow run command.
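
As an alternative to typing -with-docker on every run, Docker execution can be switched on in the Nextflow docker config scope mentioned above. The sketch below writes such a fragment to a separate file; the file name `docker-example.config` is an assumption chosen for illustration, and the fragment should be adapted to your own configuration.

```shell
#!/bin/sh
# Sketch: write a small Nextflow config fragment that enables Docker execution.
# The file name is a placeholder chosen for this example.
config_file="docker-example.config"

cat > "$config_file" <<'EOF'
// Enables Docker execution without passing -with-docker each time
docker {
    enabled = true
}
EOF

echo "Wrote $config_file"
```

The fragment can then be supplied with the `-c docker-example.config` option of `nextflow run`, or its contents merged into an existing `nextflow.config`.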

4c. Singularity

To avoid installing all dependencies each time you use a pipeline, you can also install Singularity and let Nextflow handle them.

See the Singularity documentation for installation instructions.

If you want to use the same Singularity container, with exactly the same versions of the pipeline and tools, on several datasets over time, you may want to pull the container and archive it somewhere:

singularity pull shub://IARCbioinfo/pipeline-nf:v2.2

where "pipeline-nf" should be replaced by the name of the pipeline you want to use (example: RNAseq-nf) and 2.2 by the version of the pipeline you want to use (example: 2.4). This creates a Singularity container file, pipeline-nf_v2.2.sif (example: RNAseq-nf_v2.4.sif), that you can then use by specifying it in the nextflow command (see usage).

Example:

singularity pull shub://IARCbioinfo/RNAseq-nf:v2.4
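
The naming convention described above (pipeline name plus version giving both the pull address and the resulting .sif file name) can be sketched as two small shell helpers. The function names `shub_uri` and `sif_file` are hypothetical, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the naming convention described above.
# $1 = pipeline name (e.g. RNAseq-nf), $2 = version without the leading "v" (e.g. 2.4)
shub_uri() {
    printf 'shub://IARCbioinfo/%s:v%s\n' "$1" "$2"
}

sif_file() {
    printf '%s_v%s.sif\n' "$1" "$2"
}

pipeline="RNAseq-nf"
version="2.4"

# Print the pull command and the container file it is expected to create.
echo "singularity pull $(shub_uri "$pipeline" "$version")"
echo "expected container file: $(sif_file "$pipeline" "$version")"
```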

4d. Usage

nextflow run iarcbioinfo/pipeline_name -r X --input_folder xxx --output_folder xxx -params-file xxx.yml -w /scratch/work

Or, using Singularity:

nextflow run iarcbioinfo/pipeline_name -r X -profile singularity --input_folder xxx --output_folder xxx -params-file xxx.yml -w /scratch/work

Or, using Singularity with a specific container:

nextflow run iarcbioinfo/pipeline_name -r X -with-singularity XXX.sif --input_folder xxx --output_folder xxx -params-file xxx.yml -w /scratch/work
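
The -params-file option reads pipeline parameters from a YAML file, so values otherwise given as --input_folder and --output_folder can be stored there instead. The sketch below is illustrative only: the file name `params-example.yml` and the folder paths are placeholders, and the script prints the matching command rather than running it.

```shell
#!/bin/sh
# Sketch: keep pipeline parameters in a YAML file instead of typing them as --flags.
# File name and folder paths are placeholders; the parameter names follow the usage lines above.
params_file="params-example.yml"

cat > "$params_file" <<'EOF'
input_folder: "/path/to/input"
output_folder: "/path/to/output"
EOF

# Print (rather than run) the matching command; pipeline_name and X remain placeholders.
echo "nextflow run iarcbioinfo/pipeline_name -r X -params-file $params_file -w /scratch/work"
```

Keeping the parameters in a file makes it easier to record exactly which settings were used for a given run.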

4e. Updates

You can update the Nextflow software and the pipeline itself using:

nextflow -self-update
nextflow pull iarcbioinfo/pipeline_name

You can also update the pipeline automatically when you run it by adding the -latest option to the nextflow run command. Doing so, you will always run the latest version from GitHub.

4f. Help

nextflow run iarcbioinfo/pipeline_name --help

5. Past work (Deprecated and unmaintained pipelines and tools)

| Name | Latest version | Maintained | Description | Tools used |
|---|---|---|---|---|
| GATK-Alignment-nf | June 2017 | No | Performs bwa alignment and pre-processing (realignment and recalibration) following first version of GATK best practices (less performant than alignment-nf) | bwa, picard, GATK |
| gatk4-DataPreProcessing-nf | Nov 2018 | No | Performs bwa alignment and pre-processing (mark duplicates and recalibration) following GATK4 best practices - compatible with hg38 | bwa, picard, GATK4, sambamba, qualimap |
| PostAlignment-nf | Aug 2018 | No | Performs post-alignment processing on bam files | samtools, sambamba, bwa-postalt.js |
| QC3 | May 2016 | No | Runs QC on DNA seq data (raw data, aligned data and variant calls) - forked from slzhao | samtools |
| mpileup-nf | Jan 2018 | No | Computes bam coverage with samtools mpileup (bed parallelization) | samtools, annovar |
| GVCF_pipeline-nf | Nov 2016 | No | Performs bam realignment and recalibration + variant calling in GVCF mode following GATK best practices | bwa, samblaster, sambamba, GATK |
| platypus-nf | v1.0 - Apr 2018 | No | Runs Platypus (germline variant caller) | Platypus |
| TCGA_platypus-nf | Aug 2018 | No | Converts TCGA Platypus vcf in format for annotation with annovar | vt, VCFTools |
| TCGA_germline-nf | May 2017 (DSL1) | ? | Extract germline variants from TCGA data for annotation with annovar (vcf files) | |
| marathon-wgs | June 2018 | No | Studies intratumor heterogeneity with Canopy | bwa, platypus, strelka2, vt, annovar, R, Falcon, Canopy |
| ITH-nf | Sept 2018 | No | Performs intra-tumoral heterogeneity (ITH) analysis | Strelka2, Platypus, Bcftools, Tabix, Falcon, Canopy |
| conpair-nf | June 2018 | No | Runs conpair (concordance and contamination estimator) | conpair, Python 2.7, numpy 1.7.0 or higher, scipy 0.14.0 or higher, GATK 2.3 or higher |
| damage-estimator-nf | June 2017 | No | Runs "Damage Estimator" | Damage Estimator, samtools, R with GGPLOT2 package |
| bamsurgeon-nf | Mar 2019 | No | Runs bamsurgeon (tool to add mutations to bam files) with step of variant simulation | Python 2.7, bamsurgeon, R software (tested with R version 3.2.3) |
| target-seq | Aug 2019 | No | Whole pipeline to perform multi-sample somatic variant calling using Needlestack on targeted sequencing data | abra2, QC3, needlestack, annovar and R software |
| strelka-nf | Jun 2017 | No | Runs Strelka (germline and somatic variant caller) | Strelka |
| gatk4-HaplotypeCaller-nf | Dec 2019 | No | Runs variant calling in GVCF mode on bam files following GATK best practices | GATK |
| gatk4-GenotypeGVCFs-nf | Apr 2019 | No | Runs joint genotyping on gvcf files following GATK best practices | GATK |
| CODEX-nf | Mar 2017 | No | Performs copy number variant calling from whole exome sequencing data using CODEX | R with package Codex, Rscript |