simpleaf workflow run#
The simpleaf workflow run
command is designed to run potentially complex single-cell data processing workflows using an instantiated simpleaf workflow template. Please check our tutorial on running a workflow from a published template and developing a custom template from scratch.
simpleaf workflow run
exposes one required parameter group, whose two options are mutually exclusive:
- --template takes the path to a simpleaf workflow template (i.e. an un-evaluated Jsonnet file). One can develop their own templates, or grab published templates from the protocol estuary GitHub repository using the simpleaf workflow get command and fill in the required information.
- --manifest takes the path to a simpleaf workflow manifest (i.e. a fully-instantiated JSON file that describes and enumerates all of the commands to be executed, with all relevant parameters fully specified). This manifest could be, for example, the result of a prior execution, or the result of applying the simpleaf workflow patch command to a template to produce one or more manifests with the desired parameters replaced.
Additionally, the user may pass an --output parameter to the run invocation, but only if a template is being instantiated and run, as the --output flag does not make sense in the context of a fully-instantiated manifest.
- --output takes the path to the output directory for writing the log files and the results generated by invoking workflow commands. Note that this parameter will only have an effect if the corresponding template allows passing output as an external variable (all of the templates in the protocol estuary do). Further, if the output directory has already been manually overridden in the template, then --output will have no effect and will not be used; in this case a warning to this effect will be printed.
When calling simpleaf workflow run
using a workflow template, simpleaf
will first instantiate the template, which is a Jsonnet program, into a workflow manifest in JSON format. Whereas the workflow template provides a “template” for the workflow and functions to handle features like basic logic, the resulting workflow manifest is a simple imperative description of the commands to be executed. To provide the greatest flexibility, the only requirement we set for a simpleaf
workflow template is that, in the workflow manifest it produces, the fields representing a command record, either a simpleaf
command or an external shell command, follow the format described in section Valid simpleaf workflow manifest format.
Then, simpleaf workflow run
will traverse the workflow manifest to collect the simpleaf
and external shell command records and place them into an execution queue, ordered by their step
number.
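The traversal described above can be sketched in a few lines of Python. This is only an illustrative sketch, not simpleaf's actual implementation: it assumes that a command record is any JSON object carrying both a step and a program_name field, that records may be nested at any depth, and that inactive records are left out of the queue (as described in the manifest format section). The manifest layout and the names collect_command_records and build_execution_queue are hypothetical.

```python
def collect_command_records(node):
    """Recursively gather dicts that look like command records.

    Assumption: a record is any JSON object with both "step" and
    "program_name"; we do not descend into a record once found.
    """
    records = []
    if isinstance(node, dict):
        if "step" in node and "program_name" in node:
            records.append(node)
        else:
            for value in node.values():
                records.extend(collect_command_records(value))
    elif isinstance(node, list):
        for item in node:
            records.extend(collect_command_records(item))
    return records


def build_execution_queue(manifest):
    """Return the active command records ordered by their step number."""
    active = [r for r in collect_command_records(manifest) if r.get("active", True)]
    return sorted(active, key=lambda r: r["step"])


# Hypothetical manifest; the section and record names are made up.
manifest = {
    "meta_info": {"note": "not a command record"},
    "workflow": {
        "index": {"step": 1, "program_name": "simpleaf index", "--fasta": "ref.fa"},
        "list_files": {"step": 3, "program_name": "ls", "arguments": ["-l"]},
        "quant": {"step": 2, "program_name": "simpleaf quant", "active": False},
    },
}

queue = build_execution_queue(manifest)
for record in queue:
    print(record["step"], record["program_name"])
```

In this sketch the quant record is skipped because it is marked inactive, so the queue holds the records with steps 1 and 3.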
simpleaf workflow run
also exposes multiple flags for controlling the execution flow when invoking the commands. If none of the flags is set, simpleaf
will invoke all commands in the execution queue.
- If the --no-execution flag is set, simpleaf will parse the file passed to the --template option, write the manifest and log files, and return without invoking any command.
- If the --start-at flag is set with a step number, simpleaf will ignore all previous steps (commands) and begin the invocation from the command in the execution queue whose step is equal to that starting step or, if there is none, the next step after it.
- If the --resume flag is set, simpleaf will try to find the log file from a previous run in the provided output folder to decide which step to begin with.
- If the --skip-step flag is set with a set of comma-separated step numbers, simpleaf will ignore the commands whose step is among those numbers.
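As a rough illustration of how --start-at and --skip-step narrow down the execution queue, the following Python sketch filters a list of step numbers. It is a model of the behavior described above under stated assumptions, not simpleaf's code; the function names parse_skip_steps and select_steps are hypothetical.

```python
def parse_skip_steps(text):
    """Parse a comma-separated string such as "2,4" into a list of ints."""
    return [int(part) for part in text.split(",") if part.strip()]


def select_steps(steps, start_at=1, skip_steps=()):
    """Return the steps that would be invoked, in execution order.

    Steps before start_at are ignored; if no command has exactly that
    step, execution begins at the next larger one. Steps listed in
    skip_steps are dropped.
    """
    skipped = set(skip_steps)
    return [s for s in sorted(steps) if s >= start_at and s not in skipped]


steps_in_queue = [1, 2, 3, 5, 7]
print(select_steps(steps_in_queue))
print(select_steps(steps_in_queue, start_at=4, skip_steps=parse_skip_steps("7")))
```

With no flags, every step is kept. With a starting step of 4 and step 7 skipped, only step 5 remains, because no command has step 4 and the next step after it is 5.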
Workflow Output#
simpleaf workflow run
writes two log files to the output directory passed to --output:
- simpleaf_workflow_log.json: This file records the meta and logging information of the workflow execution, for example the runtime of each executed command and the step of the starting and terminating commands. If --resume is set, simpleaf will try to find this file in the provided output directory to decide which step (command) to start from.
- workflow_execution_log.json: This file is a modified version of the workflow manifest JSON discussed above. The only modification is that, in this file, the active field of the successfully invoked commands (return code 0) becomes false.
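The relationship between the manifest and workflow_execution_log.json can be pictured with a small Python sketch. It assumes, purely for illustration, that command records are stored in a flat dictionary keyed by a made-up name and that return codes are available under the same keys; the real file layout and the function name mark_completed are assumptions, not taken from simpleaf.

```python
import copy


def mark_completed(records, return_codes):
    """Return a copy of the records with successful commands set inactive.

    A command counts as successful when its return code is 0; every
    other record is left exactly as it was in the manifest.
    """
    updated = copy.deepcopy(records)
    for name, record in updated.items():
        if return_codes.get(name) == 0:
            record["active"] = False
    return updated


# Hypothetical records and return codes.
records = {
    "index": {"step": 1, "program_name": "simpleaf index", "active": True},
    "quant": {"step": 2, "program_name": "simpleaf quant", "active": True},
}
return_codes = {"index": 0, "quant": 1}

execution_log = mark_completed(records, return_codes)
print(execution_log["index"]["active"], execution_log["quant"]["active"])
```

Only the record whose command returned 0 is switched to inactive; the failed command keeps its original active value, and the input records are not modified.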
workflow run: Full Usage#
The relevant options (which you can obtain by running simpleaf workflow run -h
) are:
Parse and instantiate a workflow template and invoke the workflow commands, or run an instantiated manifest directly

Usage: simpleaf workflow run [OPTIONS] <--manifest <MANIFEST>|--template <TEMPLATE>>

Options:
  -t, --template <TEMPLATE>  path to an instantiated simpleaf workflow template
  -o, --output <OUTPUT>      output directory for log files and the workflow outputs that have no explicit output directory
  -m, --manifest <MANIFEST>  path to an instantiated simpleaf workflow template
  -h, --help                 Print help
  -V, --version              Print version

Control Flow:
  -n, --no-execution         return after instantiating the template (JSONNET file) into a manifest (JSON foramt) without actually executing the resulting manifest
  -s, --start-at <START_AT>  Start the execution from a specific Step. All previous steps will be ignored [default: 1]
  -r, --resume               resume execution from the termination step of a previous run. To use this flag, the output directory must contains the JSON file generated from a previous run
      --skip-step <SKIP_STEP>  comma separated integers indicating which steps (commands) will be skipped during the execution

Jsonnet:
  -j, --jpaths <JPATHS>      comma separated library search paths passing to internal Jsonnet engine as --jpath flags
The procedure of parsing a simpleaf workflow template#
In simpleaf workflow
, we use the Jrsonnet library, a Rust implementation of Jsonnet, to parse and instantiate the workflow template.
Any valid Jsonnet program or JSON file is a valid simpleaf workflow template, as long as it can produce a valid workflow manifest.
When calling Jrsonnet, simpleaf workflow
automatically passes the following built-in arguments in addition to the provided template:
- The output directory passed to --output as the external variable __output.
- The workflow utility library from the protocol estuary as the external variable __utils.
- The path to the utils folder in the protocol estuary in ALEVIN_FRY_HOME as an additional library search directory.
- The paths passed to the --jpaths flag, if any, as additional library search directories.
This also means that any custom configuration program can access the __output
and __utils
variables in the Jsonnet program using std.extVar("__output")
and std.extVar("__utils")
. Note that the path to the parent directory of the file passed to --template
is an additional library search directory in Jrsonnet by default.
Valid simpleaf workflow manifest format#
Although any Jsonnet program or JSON file is a valid input for simpleaf workflow
, it doesn’t mean that all such files can be converted to a valid simpleaf
workflow manifest. To provide the greatest flexibility, we set only the below requirements for the fields representing a command record — either a simpleaf
command or an external command, in the simpleaf workflow manifest JSON file (not necessarily the template).
- To ease the parsing process, all fields that represent arguments in an external command's argument list must be provided as strings, i.e., wrapped in quotes ("value"), even for integers like the number of threads (for example, [..., "-t", "16", ...] for some external command that takes a number of threads via the -t parameter).
- A command record field must contain a step and a program_name sub-field, where the step field represents which step, using an unsigned integer, this command constitutes in the workflow. The program_name field represents a valid program in the user’s execution environment as a string.
  - For a simpleaf command, the correct program_name is the name of the simpleaf command as a string. For example, for simpleaf index, it is "simpleaf index" and for simpleaf quant, it is "simpleaf quant".
  - For an external command such as awk, if the binary is invokable given the user’s PATH environment variable, it can just be "awk"; if not, it must contain a valid full path to the binary, for example, "/usr/bin/awk".
- A command record can also have an "active" boolean field, indicating whether this command is active. simpleaf will ignore (neither parse nor invoke) all commands that are inactive ({"active": false}). Command records missing this field are regarded as active.
- If a field records a simpleaf command, the names of its sub-fields, except step and program_name, must be valid simpleaf flags (for example, options like --fasta, or -f for short, for simpleaf index and --unfiltered-pl, or -u for short, for simpleaf quant). Those option names (sub-field names), together with their values, if any, will be used to call the corresponding simpleaf program. Sub-fields not named by a valid simpleaf flag will trigger an error.
- If a field records an external command, it must contain valid step and program_name sub-fields as described above. In contrast to simpleaf command records, all arguments of an external shell command must be provided in an array, in order, under the name "arguments". simpleaf workflow will parse the entries in the array, in order, to build the actual command. For example, to tell simpleaf workflow to invoke the command ls -l -h . at step 7, one needs to use the following JSON record:

  { "step": 7, "program_name": "ls", "active": true, "arguments": ["-l", "-h", "."] }
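To make the rules above concrete, here is a minimal Python sketch that checks a command record and turns it into an argument vector. It is not simpleaf's parser. Several details are assumptions made for illustration: the function name build_argv is hypothetical, a sub-field is treated as a "flag" simply because it starts with a dash (simpleaf checks against its real flag list), the optional active field is tolerated in both kinds of records, and a flag without a value is represented here by None or an empty string.

```python
RESERVED = ("step", "program_name", "active")


def build_argv(record):
    """Validate a command record and return the command as a list of strings."""
    for key in ("step", "program_name"):
        if key not in record:
            raise ValueError(f"command record is missing '{key}'")
    step = record["step"]
    if isinstance(step, bool) or not isinstance(step, int) or step < 0:
        raise ValueError("'step' must be an unsigned integer")

    if "arguments" in record:
        # External command: arguments are given in order, all as strings.
        args = record["arguments"]
        if not all(isinstance(a, str) for a in args):
            raise ValueError("all external command arguments must be strings")
        return [record["program_name"]] + list(args)

    # simpleaf command: every remaining sub-field name must be a flag.
    argv = record["program_name"].split()
    for name, value in record.items():
        if name in RESERVED:
            continue
        if not name.startswith("-"):
            raise ValueError(f"'{name}' is not a valid flag")
        argv.append(name)
        if value not in (None, ""):  # assumed encoding of a value-less flag
            argv.append(str(value))
    return argv


ls_record = {"step": 7, "program_name": "ls", "active": True, "arguments": ["-l", "-h", "."]}
print(build_argv(ls_record))

index_record = {"step": 1, "program_name": "simpleaf index", "--fasta": "ref.fa"}
print(build_argv(index_record))
```

The first call rebuilds the ls -l -h . command from the record shown above. Passing an integer such as 16 instead of "16" inside "arguments" raises an error in this sketch, mirroring the requirement that external command arguments be strings.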