vcf_normalization-nf
May 25, 2020
Nextflow pipeline for VCF normalization

Description
Applies bcftools norm to decompose and normalize variants from a set of VCF files (compressed with gzip/bgzip).
The pipeline takes as input a folder containing compressed VCF files (*.vcf.gz).
It consists of four piped steps:
- (optional) filtering of variants (bcftools view -f)
- splitting of multiallelic sites into biallelic records (bcftools norm -m -), with left-alignment and normalization (-f ref)
- sorting (bcftools sort)
- duplicate removal (bcftools norm -d exact) and compression (-Oz)
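As an illustration, the four steps above can be chained for a single file roughly as follows. This is a minimal Python sketch, not the pipeline's actual Nextflow process: the helper build_norm_pipeline and the file names are hypothetical, and it assumes the steps are joined with plain shell pipes in the order listed. It only builds the command string; it does not run bcftools.

```python
import shlex


def build_norm_pipeline(vcf_in: str, vcf_out: str, ref: str,
                        filter_opt: str = "-f PASS") -> str:
    """Chain the four steps for one VCF into a single shell pipeline string.

    Hypothetical helper for illustration; the pipeline's own process may differ.
    """
    steps = [
        # 1. (optional) filtering; a blank filter_opt leaves view as a pass-through
        ["bcftools", "view", *shlex.split(filter_opt), vcf_in],
        # 2. split multiallelic sites (-m -) and left-align/normalize against ref (-f)
        ["bcftools", "norm", "-m", "-", "-f", ref],
        # 3. sort the records read from the previous step on stdin
        ["bcftools", "sort"],
        # 4. drop exact duplicates and write a bgzip-compressed VCF (-Oz)
        ["bcftools", "norm", "-d", "exact", "-Oz", "-o", vcf_out],
    ]
    # quote each argument so that paths containing spaces survive the shell
    return " | ".join(shlex.join(step) for step in steps)


if __name__ == "__main__":
    print(build_norm_pipeline("sample.vcf.gz", "sample.norm.vcf.gz", "ref.fasta"))
```

Building each step as an argument list and quoting it with shlex.join keeps unusual paths intact. The .tbi index listed under Output is not produced by this sketch.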
Dependencies
- This pipeline is based on nextflow. As we have several nextflow pipelines, we have centralized the common information in the IARC-nf repository. Please read it carefully, as it contains essential information for the installation, basic usage and configuration of nextflow and our pipelines.
- External software: bcftools and bgzip. Caution: both have to be in your $PATH. Try running each of the commands bcftools and bgzip; if each prints its options, this is OK.
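The $PATH check can also be scripted. The Python sketch below is a hypothetical helper, not part of the pipeline: shutil.which looks each name up in $PATH the way the shell does. It only confirms that the executables can be found; running them by hand, as suggested above, remains the best test that they actually work.

```python
from __future__ import annotations

import shutil


def missing_tools(tools: tuple[str, ...] = ("bcftools", "bgzip")) -> list[str]:
    """Return the names from `tools` that cannot be found on $PATH."""
    # shutil.which returns None when no executable with that name
    # exists in any directory listed in $PATH
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found in $PATH:", ", ".join(missing))
    else:
        print("bcftools and bgzip found in $PATH")
```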
Input
| Name | Description |
|---|---|
| --vcf_folder | Folder containing the compressed tumor VCF files (*.vcf.gz) |
Parameters
Mandatory
| Name | Example value | Description |
|---|---|---|
| --ref | /path/to/ref.fasta | Indexed reference FASTA file |
Optional
| Name | Default value | Description |
|---|---|---|
| --output_folder | normalized_VCF/ | Folder where the resulting compressed VCF files are written |
| --filter_opt | -f PASS | Options for bcftools view |
| --cpu | 2 | Number of CPUs to use |
| --mem | 8 | Size of memory to use (in GB) |
Note that, by default, only variants whose FILTER field is PASS are kept (-f PASS). To deactivate this filtering, use --filter_opt " ".
Flags are special parameters without a value.
| Name | Description |
|---|---|
| --help | Display help |
Usage
Simple use case example:
nextflow run iarcbioinfo/vcf_normalization-nf -r v1.1 -profile singularity --vcf_folder VCF/ --ref ref.fasta
To run the pipeline without Singularity, just remove "-profile singularity". Alternatively, one can run the pipeline using a Docker container (-profile docker) or the conda recipe containing all required dependencies (-profile conda).
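When launching the pipeline from a script, the command above can be assembled as an argument list. The Python sketch below is hypothetical (nextflow_command is not part of the pipeline) and uses only the options documented in this README. Passing the list to subprocess.run instead of a shell string means a value such as --filter_opt " " reaches Nextflow unchanged, without extra quoting.

```python
from __future__ import annotations

import shlex


def nextflow_command(vcf_folder: str, ref: str,
                     profile: str | None = "singularity",
                     revision: str = "v1.1",
                     extra: dict[str, str] | None = None) -> list[str]:
    """Return the `nextflow run` argument list for this pipeline (hypothetical helper)."""
    cmd = ["nextflow", "run", "iarcbioinfo/vcf_normalization-nf", "-r", revision]
    # profile=None corresponds to dropping "-profile ..." (tools must then be in $PATH)
    if profile is not None:
        cmd += ["-profile", profile]  # "singularity", "docker" or "conda"
    cmd += ["--vcf_folder", vcf_folder, "--ref", ref]
    # optional parameters from the table above, e.g. {"filter_opt": " ", "cpu": "4"}
    for name, value in (extra or {}).items():
        cmd += [f"--{name}", value]
    return cmd


if __name__ == "__main__":
    cmd = nextflow_command("VCF/", "ref.fasta", profile="docker",
                           extra={"filter_opt": " "})
    # print a copy-pasteable shell version of the command
    print(shlex.join(cmd))
    # to launch it instead: subprocess.run(cmd, check=True)
```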
Output
| Type | Description |
|---|---|
| VCF.gz, VCF.gz.tbi | Compressed normalized VCF files with indexes |
Contributions
| Name | Email | Description |
|---|---|---|
| Nicolas Alcala* | alcalan@iarc.fr | Developer to contact for support |
| Tiffany Delhomme | delhommet@students.iarc.fr | Developer |