README.md

July 19, 2017

Structural Variant Prediction Viewer

SVPV enables visualisation of predicted structural variant regions in paired-end whole genome sequencing alignments, and allows comparison of calls from different structural variant prediction algorithms. Statistics related to structural variants are presented in a form that allows users to visually identify false positive calls. Input is a set of alignment files (SAM/BAM/CRAM format) along with a set of structural variant predictions on these alignments (VCF files). Output is a set of PDF files of structural variant regions. Please see the wiki for examples of SVPV plots of different structural variant categories.

SVPV supports the VCF structural variant types deletion (DEL), duplication (DUP), copy number variation (CNV), inversion (INV), insertion (INS) and breakend (BND). Delly2-style translocations (TRA) are also supported.

Requirements

Command Line Mode

  • Python 2.7.+ and NumPy
  • R v3.+
  • SAMtools and BCFtools (version 1.3)
  • Linux environment, or access to Linux via SSH

Note: SAMtools and BCFtools must be executable by typing 'samtools' and 'bcftools' into the terminal.
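
Before running SVPV it can help to confirm that both tools resolve from the shell's search path. The following is a minimal sketch of such a check and is not part of SVPV; the helper names `find_on_path` and `missing_tools` are hypothetical, and only the Python standard library is used.

```python
import os


def find_on_path(program):
    """Return the full path of `program` if it is an executable file on PATH, else None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(tools=("samtools", "bcftools")):
    """List the required tools that cannot be found on PATH."""
    return [tool for tool in tools if find_on_path(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("samtools and bcftools are both on PATH")
```

This check only confirms that the executables can be found; it does not verify their versions.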

GUI Mode

  • All command line mode requirements, and:
  • X11 if running over SSH
  • Python 2.7 tkinter
  • Recommended: GraphicsMagick or ImageMagick ('display')
    • GraphicsMagick must be built with --enable-magick-compat
    • Note: any other X11-capable PDF viewer specified by '-disp' will work
    • Alternatively, users can navigate to the plot directory and open the file with a local PDF viewer

Installation

  • Navigate to desired install directory and clone this repository.

    git clone https://github.com/VCCRI/SVPV.git SVPV

  • Ensure that requirements are met. For convenience shell scripts are provided for Ubuntu and CentOS.

    sudo sh ./SVPV/set_up/Ubuntu_set_up.sh

  • Test that SVPV is working. If you get any error messages at this point it is likely that some requirements aren't met.

    python ./SVPV/SVPV -example

  • Test that the GUI is working.

    python ./SVPV/SVPV -example -gui

  • All done!

Non-Linux users

  • The easiest way to get SVPV running on Windows or Mac is to run a virtual machine in software such as Oracle VM VirtualBox.
  • You can download an Ubuntu 16.04 image at osboxes.org.
  • Once your Ubuntu image is running, follow the installation instructions above.

Usage

Running in GUI mode allows users to select and view individual structural variant calls on some subset of the supplied samples. Running in batch mode (i.e. not GUI mode) generates plots for each call that matches the supplied filter arguments, using the supplied set of samples.

| Run args | Description | Notes |
| --- | --- | --- |
| -vcf¹ | Comma separated list of structural variant prediction VCF/BCF files | required |
| -o | Output directory | required |
| -aln | Comma separated list of alignment files (indexed BAM/CRAM) | required² |
| -samples | Comma separated list of samples to view, names must be the same as in VCF | required² |
| -gui | Run in GUI mode | optional |
| -ref_vcf¹ | Reference structural variant VCF/BCF file for annotation | optional |
| -ref_gene | RefSeq genes refGene table file for annotation³ | optional |
| -manifest | Whitespace delimited file, first column sample names, second column alignment file path. Overrides '-samples' and '-aln' if also given. | optional |
| -ped | Tab separated pedigree file (GUI only) | optional |
| -fam | Restrict to this family id only. Requires '-ped'. | optional |
| -separate_plots | Plot each sample separately | optional |
| -l_svs | Show SVs extending beyond the current plot area | optional |
| -disp | PDF viewer command. GUI mode only. Default: "display" | optional |
| -rd_len | Sequencing read length, optimises window size. Default: 100 | optional |
| -exp | Window expansion, proportion of SV length added to each side. Default: 1 | optional |
| -bkpt_win | Breakpoint window, number of read lengths to set windows around breakpoints. Default: 5 | optional |
| -n_bins | Target number of bins for plot window. Default: 100 | optional |

¹ VCFs may be specified by a file (e.g. '-vcf /path/to/file.vcf') or by a name and a file (e.g. '-vcf delly:/path/to/file'). If not specified, names will be 'vcf 1', 'vcf 2', etc., and 'reference' by default.

² '-samples' and '-aln' are not required if '-manifest' is supplied.

³ Available for a variety of reference genomes at the UCSC Table Browser.
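
The `name:file` convention for '-vcf' can be illustrated with a short parser. This is only a sketch of the naming rule described above, not SVPV's own parsing code: the function name `parse_vcf_arg` is hypothetical, and numbering the default names by position in the list is an assumption.

```python
def parse_vcf_arg(arg, default_prefix="vcf"):
    """Split a comma separated '-vcf' value into (name, path) pairs.

    Items written as 'name:path' keep their name; bare paths receive a
    default name such as 'vcf 1' (numbering by position is an assumption).
    """
    pairs = []
    for index, item in enumerate(arg.split(","), start=1):
        if ":" in item:
            # Split on the first colon only, so the rest is kept as the path.
            name, path = item.split(":", 1)
        else:
            name, path = "%s %d" % (default_prefix, index), item
        pairs.append((name, path))
    return pairs


if __name__ == "__main__":
    print(parse_vcf_arg("delly:./example/delly.vcf,cnvnator:./example/cnvnator.vcf"))
    print(parse_vcf_arg("caller1_svs.vcf,caller2_svs.vcf"))
```

A path that itself contains a colon would be misread as `name:path` by this sketch.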

| Filter args | Description | Example |
| --- | --- | --- |
| -max_len | Maximum length of structural variants (bp) | |
| -min_len | Minimum length of structural variants (bp) | |
| -af | Allele frequency threshold | -af <0.1 |
| -gts | Specify genotypes of given samples | sample1:0/1,1/1;sample2:1/1 |
| -chrom | Restrict to comma separated list of chromosomes | |
| -svtype | Restrict to given SV type (DEL/DUP/CNV/INV) | |
| -rgi | Restrict to SVs that intersect refGenes, '-ref_gene' must be supplied | |
| -exonic | Restrict to SVs that intersect exons of refGenes, '-ref_gene' must be supplied | |

| Plot args | Default | Description |
| --- | --- | --- |
| -d | 1 | Force sequencing depth plot on or off |
| -or | 1 | Force orphaned reads plot on or off |
| -v | 1 | Force inverted pairs plot on or off |
| -ss | 1 | Force same strand pairs plot on or off |
| -cl | 1 | Force clipped reads plot on or off |
| -se | 0 | Force SAM 'secondary alignment' plot on or off |
| -su | 0 | Force SAM 'supplementary alignment' plot on or off |
| -dm | 0 | Force mate different molecule alignment plot on or off |
| -i | 1 | Force inferred insert size plot on or off |
| -r | 1 | Force refgenes plot on or off |
| -af | 1 | Force allele frequency plot on or off |
| -l | 1 | Force plot legend on or off |
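
The '-af' and '-gts' filter values are compact strings, and one way to read them is sketched below. This is an illustration of the example syntax shown in the filter arguments, not SVPV's implementation: the function names are hypothetical, support for '>' in '-af' is an assumption (only '<' appears in the example), and treating a comma separated genotype list as "any of these" is also an assumption.

```python
import operator


def parse_af(expr):
    """Turn an '-af' value such as '<0.1' into a predicate on allele frequency.

    Only '<' appears in the documented example; accepting '>' is an assumption.
    """
    expr = expr.strip()
    compare = {"<": operator.lt, ">": operator.gt}.get(expr[:1])
    if compare is None:
        raise ValueError("expected '<' or '>' followed by a number: %r" % expr)
    threshold = float(expr[1:])
    return lambda af: compare(af, threshold)


def parse_gts(expr):
    """Turn 'sample1:0/1,1/1;sample2:1/1' into {sample: set of accepted genotypes}."""
    accepted = {}
    for part in expr.split(";"):
        sample, genotypes = part.split(":", 1)
        accepted[sample] = set(genotypes.split(","))
    return accepted


def passes(call_af, call_gts, af_ok, gt_accepted):
    """True if the allele frequency passes and every listed sample has an accepted genotype."""
    return af_ok(call_af) and all(
        call_gts.get(sample) in genotypes for sample, genotypes in gt_accepted.items()
    )


if __name__ == "__main__":
    af_ok = parse_af("<0.1")
    gt_accepted = parse_gts("sample1:0/1,1/1;sample2:1/1")
    print(passes(0.05, {"sample1": "0/1", "sample2": "1/1"}, af_ok, gt_accepted))
```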

Usage example:

    python SVPV -gui -o ./example/output/ \
        -vcf delly:./example/delly.vcf,cnvnator:./example/cnvnator.vcf \
        -manifest /example/example.manifest \
        -ref_gene ./example/hg38.refgene.partial.txt \
        -ref_vcf 1000G:./example/1000G.vcf

Advanced usage example:

    python SVPV -vcf caller1_svs.vcf,caller2_svs.vcf \
        -samples sample1,sample2,sample3 -aln s1.bam,s2.bam,s3.bam \
        -o /out/directory/ -ref_vcf 1000_genomes_svs.vcf -ref_gene hg38.refgene.txt \
        -max_len 100000 -af '<0.25' -gts 'sample1:1/1,0/1;sample3:0/0' \
        -svtype DEL -exonic -ss 0 -se 1

The '-af' and '-gts' values are quoted so that the shell does not interpret '<' as a redirection or ';' as a command separator.
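
When SVPV is launched from a script rather than typed into a terminal, the command can be assembled as an argument list, which avoids shell interpretation of characters such as '<' and ';'. The sketch below also writes a manifest in the two-column layout described for '-manifest'. The helper names `write_manifest` and `build_command` and all file names are hypothetical, and the command is only printed, not executed.

```python
import os
import tempfile


def write_manifest(path, samples):
    """Write one 'sample_name<TAB>alignment_path' line per sample (whitespace delimited)."""
    with open(path, "w") as handle:
        for name, alignment in samples:
            handle.write("%s\t%s\n" % (name, alignment))


def build_command(vcfs, out_dir, manifest, extra_args=()):
    """Assemble an SVPV batch-mode argument list suitable for subprocess."""
    command = ["python", "SVPV", "-vcf", ",".join(vcfs), "-o", out_dir, "-manifest", manifest]
    command.extend(extra_args)
    return command


if __name__ == "__main__":
    manifest = os.path.join(tempfile.mkdtemp(), "samples.manifest")
    write_manifest(manifest, [("sample1", "s1.bam"), ("sample2", "s2.bam")])
    command = build_command(
        ["caller1_svs.vcf", "caller2_svs.vcf"],
        "/out/directory/",
        manifest,
        extra_args=["-af", "<0.25", "-svtype", "DEL"],
    )
    print(command)
    # To actually launch SVPV: import subprocess; subprocess.check_call(command)
```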

VCF Field Requirements:

| SV Type | Required VCF Fields |
| --- | --- |
| All Types | CHROM, POS, SVTYPE¹, GT² |
| DEL/DUP/CNV/INV | SVLEN or END |
| INS | SVLEN or INSLEN* |
| BND | ALT, ID, MATEID/PAIRID/EVENTID |
| TRA* | CHR2, END |

¹ If SVTYPE is not found then ALT is parsed for symbolic alternate alleles. These should match one of DEL, DUP, CNV, INS, INV, BND or TRA or the call will be ignored.

² For reference VCF only, GT is not required if AF is present.

\* Included for compatibility with Delly2

Please see the VCF specifications for further details.
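
The SVTYPE fallback rule can be sketched as follows: use the INFO SVTYPE value when present, otherwise read the type from a symbolic ALT allele such as `<DEL>`, and ignore anything outside the supported set. This is an illustration of the stated rule rather than SVPV's code. The function names are hypothetical, and reading a subtyped allele such as `<DUP:TANDEM>` as DUP is an assumption.

```python
import re

SUPPORTED_TYPES = {"DEL", "DUP", "CNV", "INS", "INV", "BND", "TRA"}


def parse_info(info):
    """Parse a VCF INFO column ('KEY=VALUE;FLAG;...') into a dict; flags map to ''."""
    fields = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        fields[key] = value
    return fields


def resolve_svtype(info, alt):
    """Return the SV type from INFO SVTYPE, else from a symbolic ALT; None if unsupported."""
    svtype = parse_info(info).get("SVTYPE")
    if svtype is None:
        # Symbolic alleles look like '<DEL>'; take the leading token inside the brackets.
        match = re.match(r"<([A-Za-z0-9]+)", alt)
        svtype = match.group(1) if match else None
    return svtype if svtype in SUPPORTED_TYPES else None


if __name__ == "__main__":
    print(resolve_svtype("SVTYPE=DEL;END=2000", "<DEL>"))
    print(resolve_svtype("IMPRECISE;END=2000", "<DUP>"))
    print(resolve_svtype("END=2000", "A"))
```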

SVPV Plot Window Sizing and Types

  • Please see the wiki
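
As a rough illustration of the '-exp' and '-n_bins' arguments only, the arithmetic they describe can be sketched as below. This is not SVPV's actual window sizing logic, which is documented on the wiki. The function name `plot_window` is hypothetical, and applying the expansion symmetrically with integer bin sizes is an assumption based solely on the argument descriptions.

```python
def plot_window(start, end, exp=1.0, n_bins=100):
    """Expand an SV interval by exp * length on each side and derive a bin size.

    Returns (window_start, window_end, bin_size). Defaults mirror the documented
    defaults of '-exp' (1) and '-n_bins' (100).
    """
    length = end - start
    pad = int(exp * length)
    window_start = max(0, start - pad)  # do not run off the start of the chromosome
    window_end = end + pad
    bin_size = max(1, (window_end - window_start) // n_bins)
    return window_start, window_end, bin_size


if __name__ == "__main__":
    # A 1 kb SV at 10,000-11,000 with default settings.
    print(plot_window(10000, 11000))
```

With the defaults, a 1 kb variant yields a 3 kb window: the variant itself plus one variant length on either side.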

Citation

Jacob E. Munro, Sally L. Dunwoodie, Eleni Giannoulatou; SVPV: a structural variant prediction viewer for paired-end sequencing datasets. Bioinformatics 2017; 33 (13): 2032-2033. doi: 10.1093/bioinformatics/btx117