PipeVal
October 18, 2024
Overview
PipeVal is an easy-to-use CLI tool for validating inputs and parameters in various settings, including Nextflow scripts and pipelines. It can be run standalone or from a Docker container.
Its primary functions are to generate and/or compare checksum files and validate input files.
Validation Flowchart

Docker
The tool is available as the Docker image ghcr.io/uclahs-cds/pipeval:<tag>
Installation
The tool can be installed as a standalone command line tool. The following dependencies must be installed for this option:
| Tool | Version |
|---|---|
| Python | 3.10 |
| VCFtools | 0.1.16 |
The libmagic C library must also be installed on the system.
Installing libmagic
On Debian/Ubuntu, install through:
sudo apt-get install libmagic-dev
On macOS, install through Homebrew (https://brew.sh/):
brew install libmagic
libmagic can also be installed through the conda package manager:
conda install -c conda-forge libmagic
With the dependencies installed at the versions listed above, install PipeVal through one of the options below:
Install directly from GitHub through SSH
pip install git+ssh://git@github.com/uclahs-cds/package-PipeVal.git
Install directly from GitHub through HTTPS
pip install git+https://github.com/uclahs-cds/package-PipeVal.git
Install from cloned repository
<clone the PipeVal GitHub repository>
cd </path/to/cloned/repository>
pip install .
Usage
pipeval validate
usage: pipeval validate [-h] [-v] [-r CRAM_REFERENCE] [-p PROCESSES] [-t] path [path ...]
positional arguments:
path one or more paths of files to validate
options:
-h, --help show this help message and exit
-r CRAM_REFERENCE, --cram-reference CRAM_REFERENCE
Path to reference file for CRAM
-p PROCESSES, --processes PROCESSES
Number of processes to run in parallel when validating multiple files
-t, --test-integrity Whether to perform a full integrity test on compressed files
The tool attempts to detect the file type automatically from the file extension and runs the matching validations. Regardless of file type, it also checks that the file exists and, if an MD5 or SHA512 checksum is available for the file, verifies the file against it.
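The checksum step can be pictured with a short Python sketch. This is not PipeVal's implementation. The sidecar naming (`<file>.md5`, `<file>.sha512`) and the helper names `file_digest` and `checksum_matches` are assumptions made for illustration. The sketch hashes a file in chunks and compares the result with the first whitespace-separated token of the sidecar file.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large inputs never sit fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, algorithm="md5"):
    """Return None if no sidecar exists, otherwise whether the digests agree.

    Assumption: the sidecar sits next to the file as <path>.<algorithm> and
    its first whitespace-separated token is the hex digest.
    """
    sidecar = Path(f"{path}.{algorithm}")
    if not sidecar.exists():
        return None
    tokens = sidecar.read_text().split()
    if not tokens:
        return False  # an empty sidecar cannot match anything
    return file_digest(path, algorithm) == tokens[0].lower()
```

Returning `None` for a missing sidecar keeps "no checksum available" distinct from "checksum mismatch", which mirrors the conditional wording above.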
Supported Types
| File Type | Validation |
|---|---|
| BAM | Validate BAM file using pysam. Check for an index file in the same directory as the BAM. Note: If a BAM input is missing its index file in that directory, validate will not throw an exception but will print a warning. |
| SAM | Validate SAM file using pysam. |
| CRAM | Validate CRAM file using pysam. Check for existence of an index file in the same directory as the CRAM. Accept an optional reference genome parameter for use with CRAM. In the absence of the parameter, the reference URL from the CRAM header will be used. Note: If a CRAM input is missing an accompanying CRAM index file in the same directory, validate will not throw an exception but will print a warning. |
| VCF | Validate VCF file using VCFtools. |
Note: If the input is invalid in any way, validate will exit with a non-zero status code.
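Because failure is signalled through the exit status, a calling script can gate later steps on it. The following is a minimal sketch of that pattern. The helper names `build_validate_command` and `passes` are hypothetical, and only the long options listed in the usage text above are used.

```python
import subprocess


def build_validate_command(paths, cram_reference=None, processes=None,
                           test_integrity=False):
    """Assemble a `pipeval validate` argument list from the documented options."""
    command = ["pipeval", "validate"]
    if cram_reference is not None:
        command += ["--cram-reference", str(cram_reference)]
    if processes is not None:
        command += ["--processes", str(processes)]
    if test_integrity:
        command.append("--test-integrity")
    return command + [str(p) for p in paths]


def passes(command):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(command, check=False).returncode == 0
```

For example, `passes(build_validate_command(["sample.bam"], processes=2))` would be `False` when validation fails. The call assumes `pipeval` is on the `PATH`; if it is not, `subprocess.run` raises `FileNotFoundError`.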
Expected Output
- Valid input:
Input: path/to/input is valid <file-type>
- Invalid input or failed validation:
Error: path/to/input <error message>
Validation Skipping
Certain validations can be skipped through environment variables.
| ENV VAR | Notes |
|---|---|
| PIPEVAL_SKIP_CHECKSUM | Flag to disable checksum validation. Set to true to disable checksum validation within PipeVal. |
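When PipeVal is launched from another Python process, the variable can be set for that one invocation without touching the parent environment. The helper name `env_without_checksum` below is hypothetical, and the sketch only builds the environment mapping.

```python
import os


def env_without_checksum(base=None):
    """Return a copy of the environment with PIPEVAL_SKIP_CHECKSUM set to "true".

    The input mapping (or os.environ when none is given) is left unmodified.
    """
    env = dict(os.environ if base is None else base)
    env["PIPEVAL_SKIP_CHECKSUM"] = "true"
    return env
```

The result can be passed as the `env` argument of `subprocess.run`, for example `subprocess.run(["pipeval", "validate", "sample.bam"], env=env_without_checksum())`, which again assumes `pipeval` is installed and on the `PATH`.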
pipeval generate-checksum
usage: pipeval generate-checksum [-h] [-t {md5,sha512}] [-v] path [path ...]
positional arguments:
path one or more paths of files to validate
options:
-h, --help show this help message and exit
-t {md5,sha512}, --type {md5,sha512}
Checksum type
Development
Testing for PipeVal itself can be done through pytest by running the following:
pytest
References
Pysam
- Repository: pysam-developers/pysam
VCFtools
- Repository: vcftools/vcftools
Discussions
- Issue tracker to report errors and enhancement ideas.
- Discussions can take place in package-PipeVal Discussions
- package-PipeVal pull requests are also open for discussion
Contributors
Please see the list of Contributors on GitHub.
License
Authors: Yash Patel (YashPatel@mednet.ucla.edu), Arpi Beshlikyan (abeshlikyan@mednet.ucla.edu), Madison Jordan (MBJordan@mednet.ucla.edu), Gina Kim (ginakim@mednet.ucla.edu)
PipeVal is licensed under the GNU General Public License version 2. See the file LICENSE for the terms of the GNU GPL license.
PipeVal is a tool which can be used to validate the inputs and outputs of various bioinformatic pipelines.
Copyright (C) 2020-2023 University of California Los Angeles ("Boutros Lab") All rights reserved.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.