snpeff_annotation-nf
December 1, 2023
Nextflow DSL2 pipeline to annotate VCF files with SnpEff and dbSNP
This repository contains a Nextflow DSL2 pipeline for annotating genetic variants in VCF files using SnpEff and the dbSNP database. The pipeline filters the input VCF files, annotates them in several steps, and writes the combined annotations to a single tab-separated file.
Prerequisites
Make sure you have the following dependencies installed before running the pipeline:
Pipeline Overview
- FilterInputFiles: Filters input VCF files using PLINK 2 to retain PASS variants with a maximum of 2 alleles.
- AnnotateWithRSID: Annotates variants with RSIDs using SnpSift and the dbSNP database.
- AnnotateWithImpact: Annotates variants with functional impact using snpEff and a specified reference genome.
- FullyAnnotateWithDbSNP: Performs comprehensive annotation using SnpSift and the dbNSFP database, including information on gene impact, gnomAD data, REVEL scores, ClinVar information, and more.
- ExtractFields: Extracts relevant fields from the annotated VCF files and creates a tab-separated text file with a header for downstream analysis.
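To make the FilterInputFiles criterion concrete, here is a minimal pure-Python sketch of the same rule (FILTER equal to PASS, at most 2 alleles). It is an illustration only: the pipeline itself uses PLINK 2 for this step, the function names are hypothetical, and treating a missing FILTER value (".") as a failure is an assumption that may differ from PLINK's behaviour.

```python
def is_pass_biallelic(vcf_line: str) -> bool:
    """Return True for a VCF data line with FILTER == PASS and at most 2 alleles."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt, filt = fields[4], fields[6]  # VCF columns: ALT is 5th, FILTER is 7th
    # REF counts as one allele; each comma-separated ALT adds one more.
    n_alleles = 1 + (0 if alt == "." else len(alt.split(",")))
    # Assumption: only the literal value "PASS" is accepted.
    return filt == "PASS" and n_alleles <= 2


def filter_vcf_lines(lines):
    """Yield header lines unchanged and only the data lines that pass the rule."""
    for line in lines:
        if line.startswith("#") or is_pass_biallelic(line):
            yield line
```

Passing the lines of an uncompressed VCF through `filter_vcf_lines` keeps the header plus the biallelic PASS records; a record with ALT `G,T` or FILTER `q10` would be dropped.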
Usage
- Clone the repository:
  git clone https://github.com/IARCbioinfo/snpeff_annotation-nf
  cd snpeff_annotation-nf
- Adjust the nextflow.config file if necessary. The package versions are specified in the environment.yml file.
- Run the pipeline with:
  nextflow run main.nf -profile conda
Input
| Name | Default value | Description |
|---|---|---|
| --input_folder_with_VCF_files | ${baseDir}/VCFs/ | Folder containing *vcf.gz files |
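
Before launching a run, it can help to check which files the input folder would contribute. The sketch below lists files matching the `*vcf.gz` pattern from the table; the helper name is hypothetical, and a non-recursive match is an assumption about how the pipeline collects its inputs.

```python
from pathlib import Path


def list_input_vcfs(folder):
    """Return sorted paths of files in `folder` whose names match *vcf.gz.

    Assumption: only the top level of the folder is searched.
    """
    return sorted(Path(folder).glob("*vcf.gz"))
```

For example, `list_input_vcfs("VCFs")` would return the compressed VCFs in the default input folder and skip index files such as `sample.vcf.gz.tbi`, since their names do not end in `vcf.gz`.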
Parameters
| Name | Default value | Description |
|---|---|---|
| --reference_genome | GRCh37.75 | Reference genome |
| --dbNSF_path | ${baseDir}/dbNSFP4.1a.txt.gz | dbNSFP database |
| --dbSNP_path | ${baseDir}/dbsnp150.vcf.gz | dbSNP database |
| --output_path | ${baseDir}/output | Output folder |
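
Any of these defaults can be overridden on the command line, since Nextflow accepts pipeline parameters as `--name value`. As a rough sketch, the helper below assembles such a command from a dictionary of overrides; the function name and the example paths are hypothetical, and the parameter names are copied from the tables above.

```python
import shlex

# Parameter names as listed in the Input and Parameters tables.
KNOWN_PARAMS = {
    "input_folder_with_VCF_files",
    "reference_genome",
    "dbNSF_path",
    "dbSNP_path",
    "output_path",
}


def build_command(overrides=None, profile="conda"):
    """Return the nextflow command as a list of arguments."""
    cmd = ["nextflow", "run", "main.nf", "-profile", profile]
    for name, value in (overrides or {}).items():
        if name not in KNOWN_PARAMS:
            raise ValueError(f"unknown pipeline parameter: {name}")
        # Pipeline parameters take a double dash, Nextflow options a single one.
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(shlex.join(build_command({
        "input_folder_with_VCF_files": "/data/VCFs/",
        "output_path": "/data/annotation_output",
    })))
```

The resulting list can be passed to `subprocess.run` from the repository root, or printed and pasted into a shell.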
Output
The final annotated and extracted information will be available in the output directory as full_annotation.txt.
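
Because the output is a tab-separated text file with a header, it can be loaded with standard tooling. The following is a minimal sketch using Python's `csv` module; the helper name is hypothetical, and the exact column names depend on the fields selected in the ExtractFields step, so none are assumed here.

```python
import csv


def read_annotation_table(handle):
    """Parse a tab-separated table with a header row into a list of dicts.

    Each dict maps a column name from the header to that row's value.
    """
    return list(csv.DictReader(handle, delimiter="\t"))
```

With the default output folder, the file could be read via `open("output/full_annotation.txt", newline="")` and passed to `read_annotation_table`; every value is returned as a string, so numeric columns need explicit conversion.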
Customization
- Adjust the memory requirements and other resource settings in the nextflow.config file.
- Customize the annotation processes in the main.nf script based on your specific requirements.
Acknowledgments
- This pipeline utilizes various bioinformatics tools and databases, including PLINK, bcftools, SnpSift, snpEff, dbNSFP, and dbSNP.