simpleaf workflow run
The simpleaf workflow run command is designed to run potentially complex single-cell data processing workflows using an instantiated simpleaf workflow template. Please see our tutorials on running a workflow from a published template and on developing a custom template from scratch.
simpleaf workflow run exposes one required parameter group whose two options are mutually exclusive, so exactly one of them must be provided:
- --template takes the path to a simpleaf workflow template (i.e., an un-evaluated Jsonnet file). One can develop their own templates, or fetch published templates from the protocol estuary GitHub repository using the simpleaf workflow get command and fill in the required information.
- --manifest takes the path to a simpleaf workflow manifest (i.e., a fully-instantiated JSON file that describes and enumerates all of the commands to be executed, with all relevant parameters fully specified). This manifest could, e.g., be the result of a prior execution, or the result of applying the simpleaf workflow patch command to a template to produce one or more manifests with the desired parameters replaced.
Additionally, the user may pass an --output parameter to the run invocation, but only when a template is being instantiated and run, as the --output flag does not make sense in the context of a fully-instantiated manifest.
--output takes the path to the output directory to which the log files and the results generated by invoking the workflow commands are written. Note that this parameter only has an effect if the corresponding template allows the output directory to be passed in as an external variable (all of the templates in the protocol estuary do). Further, if the output directory has already been manually overridden in the template, then --output has no effect and is not used; in this case, a warning to this effect is printed.
When simpleaf workflow run is called with a workflow template, simpleaf first instantiates the template, which is a Jsonnet program, into a workflow manifest in JSON format. Whereas the workflow template provides a “template” for the workflow, along with functions to handle features like basic logic, the resulting workflow manifest is a simple imperative description of the commands to be executed. To provide the greatest flexibility, the only requirement we impose on a simpleaf workflow template is that, in the workflow manifest it produces, the fields representing command records (either simpleaf commands or external shell commands) follow the format described in section Valid simpleaf workflow manifest format.
Then, simpleaf workflow run will traverse the workflow manifest to collect the simpleaf and external shell command records and place them into an execution queue, ordered by their step number.
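The collection step can be pictured with the short Python sketch below. It is not simpleaf's code (simpleaf is written in Rust); it assumes that any JSON object carrying both a step and a program_name key is a command record, skips records marked "active": false as described in the manifest-format section, and orders the remaining records by step. The function name collect_commands and the example manifest keys are hypothetical.

```python
from typing import Any, Dict, List


def collect_commands(manifest: Any) -> List[Dict[str, Any]]:
    """Gather active command records from a parsed manifest, ordered by step.

    Hypothetical rule: any JSON object with both "step" and "program_name"
    keys is a command record. Records with "active": false are skipped;
    a missing "active" field counts as active.
    """
    found: List[Dict[str, Any]] = []

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            if "step" in node and "program_name" in node:
                if node.get("active", True):
                    found.append(node)
                return  # do not search inside a command record itself
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(manifest)
    # sorted() is stable, so records sharing a step keep their discovery order.
    return sorted(found, key=lambda rec: rec["step"])


manifest = {
    "index": {"step": 1, "program_name": "simpleaf index", "--fasta": "genome.fa"},
    "extras": {
        "list_dir": {"step": 3, "program_name": "ls", "arguments": ["-l"]},
        "disabled": {"step": 2, "program_name": "ls", "active": False, "arguments": []},
    },
}
print([rec["step"] for rec in collect_commands(manifest)])  # [1, 3]
```

Because the manifest is just nested JSON, a recursive walk is enough to find records wherever the template happened to place them.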
simpleaf workflow run also exposes multiple flags for controlling the execution flow when invoking the commands. If none of the flags is set, simpleaf will invoke all commands in the execution queue.
- If the --no-execution flag is set, simpleaf parses the file passed to the --template option, writes the manifest and log files, and returns without invoking any command.
- If the --start-at flag is set with a step number, simpleaf ignores all previous steps (commands) and begins invocation from the commands in the execution queue whose step is greater than or equal to that starting step.
- If the --resume flag is set, simpleaf tries to find the log file from a previous run in the provided output directory to decide which step to begin with.
- If the --skip-step flag is set with a set of comma-separated step numbers, simpleaf skips the commands whose step is among those numbers.
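To make the interaction of these flags concrete, here is a hypothetical Python sketch that filters an already-ordered execution queue. It models only --no-execution, --start-at (defaulting to 1, as in the usage text below) and --skip-step; --resume is left out because it depends on the contents of a previous run's log file, whose layout is not described here. The function names are illustrative, not part of simpleaf.

```python
from typing import Any, Dict, List, Optional, Set


def parse_skip_steps(arg: Optional[str]) -> Set[int]:
    """Turn a comma-separated value such as "2,5" into {2, 5}."""
    if not arg:
        return set()
    return {int(tok) for tok in arg.split(",") if tok.strip()}


def select_commands(queue: List[Dict[str, Any]], start_at: int = 1,
                    skip_step: Optional[str] = None,
                    no_execution: bool = False) -> List[Dict[str, Any]]:
    """Return the commands that would be invoked, keeping queue order."""
    if no_execution:
        # --no-execution: manifest and logs are written, nothing is invoked.
        return []
    skipped = parse_skip_steps(skip_step)
    # --start-at drops earlier steps; --skip-step drops the listed ones.
    return [rec for rec in queue
            if rec["step"] >= start_at and rec["step"] not in skipped]


queue = [{"step": s, "program_name": "ls", "arguments": []} for s in range(1, 6)]
print([rec["step"] for rec in select_commands(queue, start_at=2, skip_step="4")])
# [2, 3, 5]
```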
Workflow Output
simpleaf workflow run writes two log files to the output directory passed to --output:
- simpleaf_workflow_log.json: This file records the metadata and logging information of the workflow execution, for example, the runtime of each executed command and the step of the starting and terminating commands. If --resume is set, simpleaf tries to find this file in the provided output directory to decide which step (command) to start from.
- workflow_execution_log.json: This file is a modified version of the workflow manifest JSON discussed above. The only modification is that, in this file, the active field of each successfully invoked command (return code 0) is set to false.
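As an illustration of how workflow_execution_log.json relates to the manifest, the hypothetical Python sketch below copies a manifest and marks the commands that exited with return code 0 as inactive. The return_codes mapping (step to exit status) is an assumption made for the example; simpleaf's internal bookkeeping may differ.

```python
import copy
from typing import Any, Dict


def mark_completed(manifest: Dict[str, Any], return_codes: Dict[int, int]) -> Dict[str, Any]:
    """Copy a manifest, setting "active": False on records whose step exited with 0.

    `return_codes` (step -> exit status) is a hypothetical input for this sketch.
    """
    log = copy.deepcopy(manifest)  # leave the original manifest untouched

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            if "step" in node and "program_name" in node:
                if return_codes.get(node["step"]) == 0:
                    node["active"] = False
                return
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(log)
    return log


manifest = {
    "index": {"step": 1, "program_name": "simpleaf index"},
    "quant": {"step": 2, "program_name": "simpleaf quant"},
}
log = mark_completed(manifest, {1: 0, 2: 1})
print(log["index"].get("active"), log["quant"].get("active"))  # False None
```

Since inactive commands are neither parsed nor invoked, a file of this shape could, in principle, be passed back via --manifest so that only the commands that did not succeed are run again.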
workflow run: Full Usage
The relevant options (which you can obtain by running simpleaf workflow run -h) are:
```
Parse and instantiate a workflow template and invoke the workflow commands, or run an instantiated manifest directly

Usage: simpleaf workflow run [OPTIONS] <--manifest <MANIFEST>|--template <TEMPLATE>>

Options:
  -t, --template <TEMPLATE>    path to an instantiated simpleaf workflow template
  -o, --output <OUTPUT>        output directory for log files and the workflow outputs that have no explicit output directory
  -m, --manifest <MANIFEST>    path to an instantiated simpleaf workflow template
  -h, --help                   Print help
  -V, --version                Print version

Control Flow:
  -n, --no-execution           return after instantiating the template (JSONNET file) into a manifest (JSON foramt) without actually executing the resulting manifest
  -s, --start-at <START_AT>    Start the execution from a specific Step. All previous steps will be ignored [default: 1]
  -r, --resume                 resume execution from the termination step of a previous run. To use this flag, the output directory must contains the JSON file generated from a previous run
      --skip-step <SKIP_STEP>  comma separated integers indicating which steps (commands) will be skipped during the execution

Jsonnet:
  -j, --jpaths <JPATHS>        comma separated library search paths passing to internal Jsonnet engine as --jpath flags
```
The procedure of parsing a simpleaf workflow template
In simpleaf workflow, we use the Jrsonnet library, a Rust implementation of Jsonnet, to parse and instantiate the workflow template.
Any valid Jsonnet program or JSON file is a valid simpleaf workflow template, as long as it produces a valid workflow manifest.
When calling Jrsonnet, simpleaf workflow automatically passes the following built-in arguments in addition to the provided template:
- The output directory passed to --output, as the external variable __output.
- The workflow utility library from the protocol estuary, as the external variable __utils.
- The path to the utils folder of the protocol estuary in ALEVIN_FRY_HOME, as an additional library search directory.
- The paths passed to the --jpaths flag, if any, as additional library search directories.
This also means that any custom configuration program can access the __output and __utils variables in the Jsonnet program using std.extVar("__output") and std.extVar("__utils"). Note that the path to the parent directory of the file passed to --template is an additional library search directory in Jrsonnet by default.
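The inputs listed above can be summarized in a small hypothetical Python helper that assembles the external variables and library search directories before evaluation. The caller supplies the utility library and the estuary utils directory because their exact locations under ALEVIN_FRY_HOME are not specified here; the example paths are illustrative only, and the order of the search directories is an assumption.

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def jsonnet_inputs(template: str, output: str, utils_lib: str,
                   estuary_utils_dir: str,
                   jpaths: Optional[str] = None) -> Tuple[Dict[str, str], List[Path]]:
    """Assemble the external variables and library search directories.

    Hypothetical helper: whether __utils is handed over as a string or as
    Jsonnet code, and the order of the search directories, are assumptions.
    """
    ext_vars = {"__output": output, "__utils": utils_lib}
    search_dirs = [
        Path(template).resolve().parent,  # template's parent directory (Jrsonnet default)
        Path(estuary_utils_dir),          # utils folder of the protocol estuary in ALEVIN_FRY_HOME
    ]
    if jpaths:
        # --jpaths takes comma-separated directories.
        search_dirs += [Path(p) for p in jpaths.split(",") if p]
    return ext_vars, search_dirs


ext_vars, dirs = jsonnet_inputs("workflow.jsonnet", "/data/run1", "{}",
                                "/home/user/.af/utils", "libs,more_libs")
print(sorted(ext_vars))  # ['__output', '__utils']
print(len(dirs))         # 4
```

Inside the template, the two variables are then read with std.extVar("__output") and std.extVar("__utils").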
Valid simpleaf workflow manifest format
Although any Jsonnet program or JSON file is a valid input for simpleaf workflow, that does not mean that every such file can be converted into a valid simpleaf workflow manifest. To provide the greatest flexibility, we impose only the following requirements on the fields representing a command record (either a simpleaf command or an external command) in the simpleaf workflow manifest JSON file (not necessarily in the template):
- To ease parsing, all fields that represent arguments in an external command's argument list must be provided as strings, i.e., wrapped in quotes ("value"), even for integers like the number of threads (for example, [..., "-t", "16", ...] for an external command that takes a number of threads via the -t parameter).
- A command record field must contain a step and a program_name sub-field. The step field, an unsigned integer, represents which step this command constitutes in the workflow. The program_name field is a string representing a valid program in the user's execution environment.
  - For a simpleaf command, the correct program_name is the name of the simpleaf command as a string. For example, for simpleaf index it is "simpleaf index", and for simpleaf quant it is "simpleaf quant".
  - For an external command such as awk, if the binary is invokable given the user's PATH environment variable, program_name can simply be "awk"; if not, it must contain a valid full path to the binary, for example, "/usr/bin/awk".
- A command record can also have an "active" boolean field, indicating whether the command is active. simpleaf ignores (neither parses nor invokes) all inactive commands ({"active": false}). Command records missing this field are regarded as active.
- If a field records a simpleaf command, the names of its sub-fields, except step and program_name, must be valid simpleaf flags (for example, options like --fasta, or -f for short, for simpleaf index, and --unfiltered-pl (or -u) for simpleaf quant). These option names (sub-field names), together with their values, if any, are used to call the corresponding simpleaf program. Sub-fields not named by a valid simpleaf flag will trigger an error.
- If a field records an external command, it must contain valid step and program_name sub-fields as described above. In contrast to simpleaf command records, all arguments of an external shell command must be provided, in order, in an array named "arguments". simpleaf workflow parses the entries in this array to build the actual command in order. For example, to tell simpleaf workflow to invoke the command ls -l -h . at step 7, one needs to use the following JSON record:

```json
{ "step": 7, "program_name": "ls", "active": true, "arguments": ["-l", "-h", "."] }
```
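The rules above can be checked mechanically. The following Python sketch (hypothetical, not simpleaf's validator) verifies step and program_name, then turns a record into an argument vector: simpleaf records become the command name plus flag/value pairs, and external records become program_name followed by the string-only "arguments" array. It only checks that simpleaf sub-fields look like flags (start with "-"), since it cannot know simpleaf's actual flag set, and treating a null value as a switch without an argument is an assumption.

```python
from typing import Any, Dict, List

# Sub-fields that are record metadata rather than simpleaf flags.
METADATA = {"step", "program_name", "active"}


def build_argv(record: Dict[str, Any]) -> List[str]:
    """Validate one command record and return its argument vector."""
    step = record.get("step")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(step, bool) or not isinstance(step, int) or step < 0:
        raise ValueError("'step' must be an unsigned integer")
    program = record.get("program_name")
    if not isinstance(program, str) or not program:
        raise ValueError("'program_name' must be a non-empty string")

    if program.startswith("simpleaf "):
        # "simpleaf index" -> ["simpleaf", "index"]; remaining sub-fields are flags.
        argv = program.split()
        for flag, value in record.items():
            if flag in METADATA:
                continue
            if not flag.startswith("-"):
                raise ValueError(f"{flag!r} is not a simpleaf flag")
            argv.append(flag)
            if value is not None:  # hypothetical: null marks a switch with no value
                argv.append(str(value))
        return argv

    # External command: arguments must already be strings, kept in order.
    args = record.get("arguments", [])
    if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
        raise ValueError('external "arguments" must be a list of strings, e.g. "16" not 16')
    return [program, *args]


print(build_argv({"step": 7, "program_name": "ls", "active": True,
                  "arguments": ["-l", "-h", "."]}))
# ['ls', '-l', '-h', '.']
print(build_argv({"step": 1, "program_name": "simpleaf index", "--fasta": "genome.fa"}))
# ['simpleaf', 'index', '--fasta', 'genome.fa']
```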