view
May 28, 2024
View the content of an input file (pat/beta) as plain text. Beta files are displayed in bed format. The output can optionally be filtered by a genomic region or a range of CpG sites, and is written to stdout by default.
Flags:
usage: view [-h] [-s SITES | -r REGION | -L BED_FILE] [--genome GENOME] [--strict] [--strip] [--min_len MIN_LEN] [--shuffle]
[--no_sort] [--sub_sample SUB_SAMPLE] [-o OUT_PATH] [-np]
input_file
View the content of input file (pat/beta) as plain text. Possible filter by genomic region or sites range Output to stdout as default
positional arguments:
input_file
optional arguments:
-h, --help show this help message and exit
-s SITES, --sites SITES
a CpG index range, of the form: "450000-450050"
-r REGION, --region REGION
genomic region of the form "chr1:10,000-10,500"
-L BED_FILE, --bed_file BED_FILE
Bed file. Columns <chr, start, end>. For some features columns 4-5 should be <startCpG, endCpG> (run
wgbstools convert -L BED_PATH)
--genome GENOME Genome reference name. Default is "default".
--strict pat: Truncate reads that start/end outside the given region. Only relevant if "region", "sites" or "bed_file"
flags are given.
--strip pat: Remove trailing dots (from beginning/end of reads).
--min_len MIN_LEN pat: Display only reads covering at least MIN_LEN CpG sites [1]
--shuffle pat: Shuffle reads order, while keeping the startCpG order (sort -k2,2n -k3,3R)
--no_sort pat: Keep read order, as in the original pat file
--sub_sample SUB_SAMPLE
pat: subsample from reads. Only supported for pat
-o OUT_PATH, --out_path OUT_PATH
Output path. [stdout]
-np, --nanopore BETA VERSION: pull very long reads starting before the requested region
random access (-r, -s flags)
View only reads (or values, in case of *.beta files) overlapping the specified genomic region. If no genomic region is provided, view outputs the whole file. The genomic region may be specified in one of two ways:
- CHROM:START-END, e.g. -r chr1:10,747-10,758
- SITE1-SITE2, e.g. -s 45-50. The end of the range is non-inclusive, i.e. only sites 45, 46, 47, 48 and 49 will be considered.
For pat files, this feature uses tabix and the *.csi index to achieve efficient random access (without reading the whole file). Beta files have a fixed size, so random access to them takes O(1) time.
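To illustrate the half-open site range and the constant-time lookup, here is a minimal Python sketch. It assumes that a beta file stores two unsigned bytes per CpG site (methylated count, total coverage) and that sites are 1-based; neither detail is stated above, so treat both as assumptions. The names parse_sites and read_beta_slice are hypothetical helpers and not part of wgbstools.

```python
import os
import tempfile


def parse_sites(spec):
    """Parse "SITE1-SITE2" into a half-open (start, end) pair of ints."""
    start, end = (int(x) for x in spec.split("-"))
    if start >= end:
        raise ValueError(f"empty or reversed site range: {spec}")
    return start, end


def read_beta_slice(path, start, end):
    """Return [(meth, total), ...] for sites start..end-1 (1-based).

    Assumed layout: 2 bytes per site, so site i sits at offset 2 * (i - 1).
    A single seek reaches the slice, whatever the file size.
    """
    with open(path, "rb") as f:
        f.seek(2 * (start - 1))
        raw = f.read(2 * (end - start))
    # Pair up the bytes; an incomplete trailing pair is dropped.
    return [(raw[i], raw[i + 1]) for i in range(0, len(raw) - 1, 2)]


if __name__ == "__main__":
    # Toy file with 100 sites: site i has meth=i and total=100.
    with tempfile.NamedTemporaryFile(suffix=".beta", delete=False) as tmp:
        tmp.write(bytes(b for i in range(1, 101) for b in (i, 100)))
        path = tmp.name
    print(read_beta_slice(path, *parse_sites("45-50")))
    # prints [(45, 100), (46, 100), (47, 100), (48, 100), (49, 100)]
    os.remove(path)
```

The slice for -s 45-50 contains five entries, matching the non-inclusive end described above.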
strict (pat)
When specified together with one or more regions, view trims reads that cross the borders of the region. For example:
% wgbstools view FILE.pat.gz -s 167-168
chr1 166 TTTTT 1
chr1 167 TTTC 1
% wgbstools view FILE.pat.gz -s 167-168 --strict
chr1 167 T 1
chr1 167 T 1
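The trimming in this example can be sketched in a few lines of Python. This is an illustration of the behaviour shown above, not the tool's actual implementation; trim_read is a hypothetical helper, and it assumes that each pattern character corresponds to one consecutive CpG site starting at the read's start site.

```python
def trim_read(start, pattern, s1, s2):
    """Clip a pat read to the half-open site range [s1, s2).

    Returns (new_start, new_pattern), or None if the read does not
    overlap the range at all.
    """
    end = start + len(pattern)          # first site after the read
    lo, hi = max(start, s1), min(end, s2)
    if lo >= hi:
        return None
    return lo, pattern[lo - start:hi - start]


if __name__ == "__main__":
    # The two reads from the example, clipped to -s 167-168.
    print(trim_read(166, "TTTTT", 167, 168))  # prints (167, 'T')
    print(trim_read(167, "TTTC", 167, 168))   # prints (167, 'T')
```

Both reads collapse to the single site 167, which matches the --strict output. Any additional clean-up the tool may apply after trimming is not modelled here.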
sub_sample (pat)
Subsample the reads in the pat file, keeping a fraction sub_sample of them. The count field is taken into consideration: for example, a read with count>1 may be output with a smaller count.
% wgbstools view FILE.pat.gz -s 167-168
chr1 166 TTTTT 5
% wgbstools view FILE.pat.gz -s 167-168 --sub_sample .4
chr1 166 TTTTT 2
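One plausible way to read "the count field is taken into consideration" is that each of the count copies of a read is kept independently with probability sub_sample. The Python sketch below implements that interpretation. It is an assumption about the behaviour, not the tool's actual algorithm, and subsample_count and subsample_reads are hypothetical names.

```python
import random


def subsample_count(count, rate, rng):
    """Keep each of `count` identical reads independently with prob. `rate`."""
    return sum(1 for _ in range(count) if rng.random() < rate)


def subsample_reads(reads, rate, seed=None):
    """Subsample (chrom, start, pattern, count) tuples.

    Reads whose sampled count drops to zero are omitted from the output.
    """
    rng = random.Random(seed)
    out = []
    for chrom, start, pattern, count in reads:
        kept = subsample_count(count, rate, rng)
        if kept > 0:
            out.append((chrom, start, pattern, kept))
    return out


if __name__ == "__main__":
    reads = [("chr1", 166, "TTTTT", 5)]
    # The surviving count varies from run to run unless a seed is fixed.
    print(subsample_reads(reads, 0.4, seed=0))
```

Under this model a read with count 5 and a rate of 0.4 is output with an expected count of 2, but any value from 0 to 5 is possible in a single run.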