LevioSAM2 workflow

July 16, 2024 ยท View on GitHub

Dependencies

All dependent software is included in our Docker/Singularity container. Users of other installation approaches may install the dependencies using conda, pre-built binary, or building from source.

Paired-end workflows

Bowtie 2 example:

python leviosam2.py \
    -t 16 \
    -i ilmn-pe.bam \
    -o ilmn-pe-lifted \
    -C chm13v2-grch38.clft \
    -f grch38.fna \
    -fi bt2/grch38.fna \
    --sequence_type ilmn_pe \
    -a bowtie2 \
    --use_preset \
    --lift_bed_commit_source suppress_annotations.bed \  # optional
    --lift_bed_defer_target defer_annotations.bed   # optional

With the bash worklow (older workflow to be deprecated):

bash leviosam2.sh \
    -a bowtie2 -A -10 -q 10 -H 5 \
    -i ilmn-pe.bam \
    -o ilmn-pe-lifted \
    -f grch38.fna \
    -b bt2/grch38 \
    -C chm13v2-grch38.clft \
    -t 16 \
    -D defer_annotations.bed \ # optional
    -R suppress_annotations.bed # optional

BWA MEM example:

python leviosam2.py \
    -t 16 \
    -i ilmn-pe.bam \
    -o ilmn-pe-lifted \
    -C chm13v2-grch38.clft \
    -f grch38.fna \
    -fi bwa/grch38.fna \
    --sequence_type ilmn_pe \
    -a bwamem \
    --use_preset \
    --lift_bed_commit_source suppress_annotations.bed \  # optional
    --lift_bed_defer_target defer_annotations.bed   # optional

With the bash worklow (older workflow to be deprecated):

bash leviosam2.sh \
    -a bwamem -A 100 -q 30 -H 5 \
    -i ilmn-pe.bam \
    -o ilmn-pe-lifted \
    -f grch38.fna \
    -b bwa/grch38.fna \
    -C chm13v2-grch38.clft \
    -t 16 \
    -D defer_annotations.bed \  # optional
    -R suppress_annotations.bed  # optional

Single-end workflows

For both short and long reads. Different parameters are recommended for each sequence type.

  • Supported aligners: minimap2, winnowmap2, bwa mem and Bowtie 2

  • -g sets the max allowed size of overlapping chain gaps of an alignment, -H is the max edit distance of an alignment (post-realignment) to be committed. Please adjust these parameters according to your long read platform and library preparation

Minimap2 example with Pacbio:

python leviosam2.py \
    -t 16 \
    -i pacbio.bam \
    -o pacbio-lifted \
    -C chm13v2-grch38.clft \
    -f grch38.fna \
    --sequence_type pb_hifi \
    -a minimap2 \
    --use_preset \
    --lift_realign_config ../configs/pacbio_all.yaml \
    --lift_bed_commit_source suppress_annotations.bed  # optional

With the bash worklow (older workflow to be deprecated):

bash leviosam2.sh \
    -a minimap2 -g 1000 -H 100 -S -x ../configs/pacbio_all.yaml \
    -l map-hifi \
    -i pacbio.bam \
    -o pacbio-lifted \
    -f grch38.fna \
    -C chm13v2-grch38.clft \
    -t 16 \
    -R suppress_annotations.bed # optional

Minimap2 example with Nanopore (ultra-long reads are not fully supported):

python leviosam2.py \
    -t 16 \
    -i ont.bam \
    -o ont-lifted \
    -C chm13v2-grch38.clft \
    -f grch38.fna \
    --sequence_type ont \
    -a minimap2 \
    --use_preset \
    --lift_realign_config ../configs/ont_all.yaml \
    --lift_bed_commit_source suppress_annotations.bed  # optional

With the bash worklow (older workflow to be deprecated):

bash leviosam2.sh \
    -a minimap2 -g 1500 -H 6000 -S -x ../configs/ont_all.yaml \
    -l map-ont \
    -i ont.bam \
    -o ont-lifted \
    -f grch38.fna \
    -C chm13v2-grch38.clft \
    -t 16 \
    -R suppress_annotations.bed # optional

Output files

  • <out_prefix>-final.bam: The full lifted and realigned reads using the target coordinate system
  • <out_prefix>-unliftable.bam: The deferred and unliftable reads using the source coordinate system. See #20 for discussions.

The default output format is the BAM format. We also support writing the final alignments with the CRAM format using the Python workflow with the option (-O cram).

By default, temporary files generated by the workflow are removed. These files can be kept by toggling the --keep_tmp_files option in the Python script or the -K option in the Bash script. Temp files include deferred alignments (target coord.), realigned alignments (target coord.), and deferred reads (gzipped FASTQ).

Case Study

Please make sure leviosam2 and leviosam2.sh is in your $PATH with command leviosam2 -h and sh leviosam2.sh -h. If it fails, please use the following command to set $PATH (paths need to be adjusted locally) and make sure you see the help message.

PATH=<path/to/leviosam2/build>:<path/to/leviosam2/workflow>:${PATH}
mkdir case_study
cd case_study

# Download FASTQs
wget https://genome-idx.s3.amazonaws.com/lev/HG002.novaseq.pcr-free.0_3x-R1.fq.gz
wget https://genome-idx.s3.amazonaws.com/lev/HG002.novaseq.pcr-free.0_3x-R2.fq.gz

# Download FASTAs and pre-built indexes
## GRCh38
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
gunzip GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
samtools faidx GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
## Pre-build GRCh38 indexes
wget https://genome-idx.s3.amazonaws.com/bt/GRCh38_noalt_as.zip
unzip GRCh38_noalt_as.zip

## CHM13v2
wget https://genome-idx.s3.amazonaws.com/bt/chm13v2.0.zip
unzip chm13v2.0.zip

# Download levioSAM2 resources
wget https://genome-idx.s3.amazonaws.com/lev/chm13v2-grch38.tar.gz
tar xfzv chm13v2-grch38.tar.gz

# Align to CHM13
bowtie2 -p 16 -x chm13v2.0/chm13v2.0 -1 HG002.novaseq.pcr-free.0_3x-R1.fq.gz -2 HG002.novaseq.pcr-free.0_3x-R2.fq.gz | samtools sort -@ 4 -o ilmn-pe-chm13v2.bam

# Run the levioSAM2 workflow
leviosam2 index -c chm13v2-grch38/chm13v2-grch38.chain -p chm13v2-grch38/chm13v2-grch38 -F GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai
bash ../leviosam2.sh \
    -a bowtie2 -A -10 -q 10 -H 5 \
    -i ilmn-pe-chm13v2.bam \
    -o ilmn-pe-chm13v2_grch38 \
    -f GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
    -b GRCh38_noalt_as/GRCh38_noalt_as \
    -C chm13v2-grch38/chm13v2-grch38.clft \
    -D chm13v2-grch38/chm13v2-grch38-lt0.98-map_reduction_k100_e0.01.bed \
    -R chm13v2-grch38/chm13v2-grch38-source-unliftable-s_5000-winnowmap2-exc_1.bed \
    -t 16

The output is ilmn-pe-chm13v2_grch38-final.bam.