CopyDetective: Detection threshold–aware copy number variant calling in whole-exome sequencing data

December 19, 2024

CopyDetective is a novel algorithm for detection threshold–aware CNV calling in matched whole-exome sequencing data.

Sarah Sandmann, Marius Wöste, Aniek O de Graaf, Birgit Burkhardt, Joop H Jansen, Martin Dugas, CopyDetective: Detection threshold–aware copy number variant calling in whole-exome sequencing data, GigaScience, Volume 9, Issue 11, November 2020, giaa118, https://doi.org/10.1093/gigascience/giaa118.

Requirements

To run CopyDetective, you need R (Version 3.5.2 or higher) and shiny.

Installation

To install CopyDetective, download the R scripts. All required packages are installed automatically.

Running CopyDetective

CopyDetective is available as a shiny GUI. You can run it either from RStudio or by calling shiny::runApp() in an R console. On the left, two tabs are available for analysis:

  • Perform analysis: performs the actual CNV calling
  • Display results: displays the list of CNV calls and the output plots for a selected sample

On the right, four output tabs are available:

  • Log: progress of the analysis
  • Detection thresholds: individual detection thresholds for every sample, including minimum CNV size and minimum cell fraction for deletions and duplications
  • CNV calls: merged CNV calls; if quality-based filtration is selected, the calls are filtered using the selected quality threshold
  • Plots: depending on the selected options, up to four plots are displayed: 1) Raw CNV calls (all), 2) Raw CNV calls (sig), 3) Merged CNV calls, and 4) Filtered CNV calls
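
A minimal sketch of launching the GUI from an R console. The helper name launch_copydetective and the directory path are assumptions for illustration; the path should point to wherever the CopyDetective R scripts were downloaded.

```r
# Hypothetical helper (not part of CopyDetective): start the shiny GUI
# from the directory that holds the downloaded R scripts.
launch_copydetective <- function(app_dir) {
  # Bail out early if the directory does not exist.
  if (!dir.exists(app_dir)) {
    message("Directory not found: ", app_dir)
    return(invisible(FALSE))
  }
  # shiny is listed under Requirements; report if it is missing.
  if (!requireNamespace("shiny", quietly = TRUE)) {
    message("Package 'shiny' is missing; install it with install.packages(\"shiny\").")
    return(invisible(FALSE))
  }
  # Blocks the R session until the app is closed.
  shiny::runApp(app_dir)
}
```

Calling launch_copydetective("path/to/CopyDetective") with the real directory should open the app; with a wrong path the helper only prints a message and returns FALSE.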

For the actual analysis with CopyDetective, several parameters are available.

Parameters

| Category | Parameter | Details |
|---|---|---|
| Input | Input folder | Path to input files; input files must contain the heterozygous polymorphisms called for every sample, including VAF and coverage of these polymorphisms in the tumor and the matching germline sample |
| | Sample names file | A 2-column txt file is required; the germline sample names must be defined in the first column, the matching tumor sample names in the second column (no header) |
| Output | Output folder | Path to which output files will be written |
| | Output files | The following can be selected: Raw CNV calls, Merged CNV calls, Filtered CNV calls |
| | Output plots | The following can be selected: Raw CNV calls (all), Raw CNV calls (sig), Merged CNV calls, Filtered CNV calls |
| 1. Quality Analysis | Detection thresholds available? | Select either yes or no |
| | yes: Upload detection thresholds file | A txt file containing the columns: Sample, Window_del, CF_del, SNPs_del, Window_dup, CF_dup and SNPs_dup; an example is available in the example folder |
| | no: How shall Quality Analysis be performed? | Select either Exact or Simulation |
| | no and Exact: VAF for polymorphisms in the control sample | Select either Exact (variant calling results) or Simulated (expected value 0.5) |
| | no and Simulation: Number of simulations | 10-99999; default: 500 |
| | no: Evaluate Cell Fractions: Fractions to consider | 1-100; default: 5-100 |
| | no: Evaluate Cell Fractions: In steps of... | 1-100; default: 5 |
| | no: Evaluate Window sizes: Percentile to consider for window selection | 0-100; default: 95 |
| | no: Strategy for detection threshold optimization | Select one of the following: Compromize (cell fraction and window size), Force (cell fraction), Force (window size) |
| | no: Force (cell fraction) | Minimum (possible) cell fraction to consider |
| | no: Force (window size) | Minimum (possible) window size to consider |
| 2. CNV calling | Desired sensitivity | 0-1; default: 0.95 |
| 3. Merging | Maximum allowed distance between raw calls | 100,000-59,128,983; default: 20,000,000 |
| 4. Filtration | Perform final filtration? | Select either yes or no |
| | yes: Quality threshold | 0-600; default: 10.76 |
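
Before uploading a detection thresholds file, it can help to confirm that it contains the seven columns listed above. The following is a rough sketch, not part of CopyDetective: the function name missing_threshold_cols is hypothetical, and it assumes the file is tab-separated with a header line (the delimiter is not stated here, so compare with the file in the example folder).

```r
# Column names as listed in the parameter table.
required_cols <- c("Sample", "Window_del", "CF_del", "SNPs_del",
                   "Window_dup", "CF_dup", "SNPs_dup")

# Hypothetical helper: return the required columns that are missing from
# the header of a thresholds file. Assumes a tab-separated header line.
missing_threshold_cols <- function(path, sep = "\t") {
  header <- strsplit(readLines(path, n = 1), sep, fixed = TRUE)[[1]]
  setdiff(required_cols, header)
}

# Demonstration on a temporary file that holds only a complete header.
tmp <- tempfile(fileext = ".txt")
writeLines(paste(required_cols, collapse = "\t"), tmp)
missing_threshold_cols(tmp)  # empty result when every column is present
```

An empty result means all expected columns were found; otherwise the returned names show which columns need to be added.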

Example

An example containing all necessary input for two patients can be found in the Example folder. Corresponding output files are provided in the subfolder output.
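
For your own data, a sample names file in the format described under Parameters (germline name in the first column, matching tumor name in the second, no header) could be written from R as sketched here. The sample names are placeholders, and the tab delimiter is an assumption; check the file in the Example folder for the exact layout.

```r
# Placeholder sample names for two patients (hypothetical).
samples <- data.frame(
  germline = c("patient1_germline", "patient2_germline"),
  tumor    = c("patient1_tumor", "patient2_tumor")
)

# Write a 2-column text file without header, row names, or quotes.
# Assumption: columns are separated by a tab.
sample_file <- file.path(tempdir(), "sample_names.txt")
write.table(samples, sample_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The resulting file has one line per patient and can be selected as the Sample names file in the GUI.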

Reporting errors / Requesting features

If you experience any errors using CopyDetective or would like additional features, feel free to open an issue or contact Sarah Sandmann (sarah.sandmann@uni-muenster.de).

Citation

Sarah Sandmann, Marius Wöste, Aniek O de Graaf, Birgit Burkhardt, Joop H Jansen, Martin Dugas, CopyDetective: Detection threshold–aware copy number variant calling in whole-exome sequencing data, GigaScience, Volume 9, Issue 11, November 2020, giaa118, https://doi.org/10.1093/gigascience/giaa118.