SomaticWrapper v3.0.0 (Compute1)

October 15, 2025

Automated Somatic Variant Calling Pipeline (HG38)

SomaticWrapper is a fully automated and modular pipeline for detecting somatic variants from paired tumor–normal WGS/WXS data on the LSF compute1 cluster (WashU).
It integrates multiple industry-standard variant callers — Strelka2, VarScan2, Mutect1, and Pindel — and produces comprehensive, annotated mutation calls in MAF format.


🔬 Overview

  • SNV calls: variants reported by at least 2 of 3 callers (Strelka2, Mutect1, VarScan2)
  • Indel calls: variants reported by at least 2 of 3 callers (Strelka2, VarScan2, Pindel)
  • Reference genome: Human GRCh38 (HG38)
  • Scheduler: LSF (supports job dependencies and groups)
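
The 2-of-3 rule above can be illustrated with a small voting sketch. It assumes each caller's results have already been reduced to one variant key per line (for example chrom:pos:ref:alt); the function name and input files are hypothetical and are not part of SomaticWrapper, which may implement the merge differently.

```shell
# Sketch of "at least 2 of 3 callers" voting.
# Each argument is a hypothetical file with one variant key per line.
consensus_2of3() (
    # Deduplicate within each caller first so a caller cannot vote twice,
    # then count how many callers reported each key.
    for calls in "$@"; do
        sort -u "$calls"
    done | sort | uniq -c | awk '$1 >= 2 { print $2 }'
)
```

For SNVs this would be called with the Strelka2, Mutect1, and VarScan2 key files; for indels, with the Strelka2, VarScan2, and Pindel key files.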

Final output files:

  • dnp.annotated.maf → all variants
  • dnp.annotated.coding.maf → coding variants only

🚀 Improvements (v3.0.0)

  1. Added Step 0: automatically submits the full pipeline (Steps 1 → 11) with job dependencies (j2 waits for j1, and so on).

  2. Added Step 22: resumes the pipeline from Step 2 (submits Steps 2 → 11) with job dependencies (j3 waits for j2, and so on).

  3. Added Step 23: resumes from Step 3 (submits Steps 3 → 11) with job dependencies.

  4. Added Step 24: resumes from Step 4 (submits Steps 4 → 11) with job dependencies.

  5. Added Step 25: resumes from Step 5 (submits Steps 5 → 11) with job dependencies.

  6. Added Step 26: resumes from Step 6 (submits Steps 6 → 11) with job dependencies.

  7. Added Step 27: resumes from Step 7 (submits Steps 7 → 11) with job dependencies.

  8. Added Step 28: resumes from Step 8 (submits Steps 8 → 11) with job dependencies.

  9. Added Step 29: resumes from Step 9 (submits Steps 9 → 11) with job dependencies.

  10. Added Step 30: resumes from Step 10 (submits Steps 10 → 11) with job dependencies.
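
To make the dependency chaining concrete, here is a dry-run sketch that prints (but does not submit) one bsub command per step, using the standard LSF options -J (job name), -w (dependency expression), -g (job group), and -q (queue). The script names run_step_N.sh and the job-group path are hypothetical placeholders; the commands SomaticWrapper actually generates may look different.

```shell
# Dry-run sketch: print a chain of bsub commands for steps FIRST..LAST.
# run_step_N.sh and the job-group path are hypothetical placeholders.
print_chain() (
    first=$1; last=$2; queue=$3; group=$4
    step=$first
    prev=""
    while [ "$step" -le "$last" ]; do
        name="j${step}"
        dep=""
        # Every job after the first waits for its predecessor to finish successfully.
        if [ -n "$prev" ]; then
            dep=" -w 'done(${prev})'"
        fi
        printf "bsub -q %s -g %s -J %s%s 'bash run_step_%s.sh'\n" \
            "$queue" "$group" "$name" "$dep" "$step"
        prev=$name
        step=$((step + 1))
    done
)
```

For example, print_chain 5 11 long /scao/example_run_somatic_2025 prints seven commands, the first without a -w clause and each later one waiting on the job before it.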


⚙️ Environment Setup (Compute1)

Before running, update your ~/.bashrc to include the necessary environment variables:

export PATH=/storage1/fs1/songcao/Active/Software/anaconda3/bin:$PATH
export STORAGE1=/storage1/fs1/songcao/Active
export STORAGE2=/storage1/fs1/dinglab/Active
export STORAGE3=/storage1/fs1/m.wyczalkowski/Active
export LSF_DOCKER_VOLUMES="$STORAGE1:$STORAGE1 $STORAGE2:$STORAGE2 $STORAGE3:$STORAGE3"

Then reload your shell configuration so the changes take effect:

source ~/.bashrc
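
A quick sanity check before submitting jobs can catch a missing export. The helper below is a sketch only (check_env is a hypothetical name, not part of the repository); it reports any of the variables listed above that are unset or empty.

```shell
# Sketch: report any variable from the setup above that is unset or empty.
check_env() (
    missing=0
    for var in STORAGE1 STORAGE2 STORAGE3 LSF_DOCKER_VOLUMES; do
        # eval reads the variable whose name is stored in $var (portable indirection).
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing: $var"
            missing=1
        fi
    done
    # Exit status is 0 only when every variable was set.
    [ "$missing" -eq 0 ]
)
```

Running check_env && echo "environment looks OK" prints the confirmation only when all four variables are set.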

🧩 Usage

Step 1. Download or clone this repository

git clone https://github.com/YourGitRepo/somaticwrapper.git
cd somaticwrapper

Step 2. Prepare your run and log directories

Example:

mkdir -p /storage1/fs1/songcao/Active/Projects/somatic/example_run_somatic_2025
mkdir -p /storage1/fs1/songcao/Active/Projects/somatic/example_run_somatic_2025/log

Step 3. Run the pipeline

Option A – Run the full pipeline automatically

Use --step 0 to submit Steps 1–11 sequentially with built-in job dependencies:

perl somaticwrapper.pl \
  --step 0 \
  --rdir /storage1/fs1/songcao/Active/Projects/somatic/example_run_somatic_2025 \
  --log /storage1/fs1/songcao/Active/Projects/somatic/example_run_somatic_2025/log \
  --ref /storage1/fs1/songcao/Active/Database/hg38_database/GRCh38.d1.vd1/GRCh38.d1.vd1.fa \
  --smg /storage1/fs1/songcao/Active/Database/SMG/smg_list.txt \
  --groupname example_run_somatic_2025 \
  --users scao \
  --wgs 0 \
  --srg 1 \
  --sre 0 \
  --exonic 1 \
  --q long \
  --mincovt 14 --mincovn 8 --minvaf 0.05 --maxindsize 100

Option B – Run a specific step manually

perl somaticwrapper.pl --step 5 --rdir <run_dir> --log <log_dir> ...

🔢 Step Reference

Step   Description
0      Submit steps (1–11) automatically with dependencies
1      Run Strelka2
2      Run VarScan2
3      Run Pindel
4      Run Mutect1
5      Parse Mutect results
6      Parse Strelka2 results
7      Parse VarScan2 results
8      Parse Pindel results
9      QC VCF files
10     Merge VCF files
11     Generate MAF files
12     Merge run-level MAF
13     DNP annotation
14     Clean unnecessary intermediate files
22     Submit steps (2–11) automatically with dependencies
23     Submit steps (3–11) automatically with dependencies
24     Submit steps (4–11) automatically with dependencies
25     Submit steps (5–11) automatically with dependencies
26     Submit steps (6–11) automatically with dependencies
27     Submit steps (7–11) automatically with dependencies
28     Submit steps (8–11) automatically with dependencies
29     Submit steps (9–11) automatically with dependencies
30     Submit steps (10–11) automatically with dependencies
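
The numbering in the table follows a simple pattern: codes 22 through 30 start at the step equal to the code minus 20 and run through Step 11. The sketch below encodes that reading of the table; step_range is a hypothetical helper written for illustration and is not how somaticwrapper.pl parses --step.

```shell
# Sketch: translate a --step value into the "first last" range of steps it covers,
# following the Step Reference table (0 -> 1..11, 22..30 -> 2..11 through 10..11).
step_range() (
    code=$1
    case $code in
        0) echo "1 11" ;;
        2[2-9]|30) echo "$((code - 20)) 11" ;;
        [1-9]|1[0-4]) echo "$code $code" ;;   # a single step run on its own
        # exit leaves only this function's subshell, not the calling shell
        *) echo "unknown step: $code" >&2; exit 1 ;;
    esac
)
```

For instance, step_range 25 prints "5 11", while step_range 13 prints "13 13" because Step 13 runs only when requested directly.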

⚙️ Key Parameters

Parameter      Description
--rdir         Full path to run directory containing per-sample folders
--log          Path for log output (usually parent of rdir)
--srg          BAM has read groups (1 = yes, 0 = no)
--sre          Rerun and overwrite results (1 = yes, 0 = no)
--wgs          1 = WGS, 0 = WXS
--groupname    Job group name
--users        LSF user account (used in job group path)
--ref          HG38 reference FASTA
--smg          SMG gene list file
--q            LSF queue (research-hpc, ding-lab, or long)
--mincovt      Minimum tumor coverage (≥ 14)
--mincovn      Minimum normal coverage (≥ 8)
--minvaf       Minimum variant allele frequency (≥ 0.05)
--maxindsize   Maximum indel size (≤ 100)
--exonic       Output exonic region (1 = yes, 0 = no)
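
As a rough illustration of what the coverage and VAF cutoffs mean, the sketch below applies them to a hypothetical tab-separated table (variant key, tumor depth, normal depth, tumor VAF). The table layout and the filter_calls name are assumptions made for this example; the pipeline applies its thresholds internally and may do so at different stages.

```shell
# Sketch: keep variants that pass the coverage and VAF cutoffs.
# Input is a hypothetical TSV: variant key, tumor depth, normal depth, tumor VAF.
# Defaults mirror the values used in the example command (14, 8, 0.05).
filter_calls() (
    table=$1; mincovt=${2:-14}; mincovn=${3:-8}; minvaf=${4:-0.05}
    awk -F '\t' -v t="$mincovt" -v n="$mincovn" -v v="$minvaf" \
        '$2 >= t && $3 >= n && $4 >= v { print $1 }' "$table"
)
```

A variant with tumor depth 14, normal depth 8, and VAF 0.05 passes, since all three comparisons are inclusive.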

🧾 Example Output Files

run_dir/
├── <sample_name>/
│   ├── strelka/
│   ├── varscan/
│   ├── pindel/
│   ├── mutect1/
│   ├── merged.withmutect.vcf
│   ├── <sample_name>.withmutect.maf
│   └── <sample_name>.dnp.annotated.maf
└── log/
    ├── LSF_DIR_SOMATIC/
    └── tmpsomatic/
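
Given the layout above, a short check can list samples whose final annotated MAF has not been produced yet. This is a sketch under the assumption that every sample folder sits directly under the run directory and that log/ is the only non-sample folder; missing_maf is a hypothetical helper name.

```shell
# Sketch: list sample folders under a run directory that lack a final annotated MAF.
missing_maf() (
    run_dir=$1
    for sample_dir in "$run_dir"/*/; do
        [ -d "$sample_dir" ] || continue
        sample=$(basename "$sample_dir")
        # Skip the log directory, which sits beside the sample folders in the layout above.
        if [ "$sample" = "log" ]; then
            continue
        fi
        # $sample_dir already ends with "/", so this glob looks inside the sample folder.
        if ! ls "$sample_dir"*.dnp.annotated.maf >/dev/null 2>&1; then
            echo "$sample"
        fi
    done
)
```

An empty result from missing_maf <run_dir> suggests every sample reached the DNP annotation stage.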

👤 Contact

Author: Song Cao
Email: scao@wustl.edu
Washington University in St. Louis