simpleaf workflow run

The simpleaf workflow run command is designed to run potentially complex single-cell data processing workflows using an instantiated simpleaf workflow template. For more information, please see our tutorials on running a workflow from a published template and on developing a custom template from scratch.

simpleaf workflow run requires exactly one of the following two mutually exclusive options:

--template takes the path to a simpleaf workflow template (i.e., an un-evaluated Jsonnet file). Users can develop their own templates, or fetch published templates from the protocol estuary GitHub repository with the simpleaf workflow get command and then fill in the required information.
--manifest takes the path to a simpleaf workflow manifest (i.e., a fully instantiated JSON file that describes and enumerates all of the commands to be executed, with all relevant parameters fully specified). This manifest could, for example, be the result of a prior execution, or the result of applying the simpleaf workflow patch command to a template to produce one or more manifests with the desired parameters replaced.

Additionally, the user may pass an --output parameter to the run invocation, but only when a template is being instantiated and run; the --output flag does not make sense for a fully instantiated manifest.

--output takes the path to the output directory where the log files and the results generated by the workflow commands are written. Note that this parameter only has an effect if the corresponding template allows the output directory to be passed as an external variable (all of the templates in the protocol estuary do). Further, if the output directory has already been manually overridden in the template, --output will have no effect and will not be used; in this case, a warning to this effect will be printed.

When simpleaf workflow run is called with a workflow template, simpleaf first instantiates the template, which is a Jsonnet program, into a workflow manifest in JSON format. Whereas the workflow template provides a “template” for the workflow, along with functions that handle features such as basic logic, the resulting workflow manifest is a simple imperative description of the commands to be executed. To provide the greatest flexibility, the only requirement we impose on a simpleaf workflow template is that, in the workflow manifest it produces, the fields representing command records (either simpleaf commands or external shell commands) follow the format described in the section Valid simpleaf workflow manifest format.

Then, simpleaf workflow run will traverse the workflow manifest to collect the simpleaf and external shell command records and place them into an execution queue, ordered by their step number.
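The traversal and queueing step can be modeled with a short Python sketch. This is a minimal illustration under stated assumptions, not simpleaf's own (Rust) implementation: it treats any JSON object that has both a "step" and a "program_name" key as a command record, drops records marked {"active": false} (which simpleaf ignores), and sorts the remainder by step. The manifest keys "index", "quant", and "cleanup" are hypothetical.

```python
from __future__ import annotations

from typing import Any


def collect_command_records(node: Any) -> list[dict]:
    """Recursively gather every JSON object that looks like a command record.

    Assumption: an object counts as a command record if it has both a
    "step" and a "program_name" key; such records are not searched further.
    """
    records: list[dict] = []
    if isinstance(node, dict):
        if "step" in node and "program_name" in node:
            records.append(node)
        else:
            for value in node.values():
                records.extend(collect_command_records(value))
    elif isinstance(node, list):
        for item in node:
            records.extend(collect_command_records(item))
    return records


def build_execution_queue(manifest: Any) -> list[dict]:
    """Drop inactive records (a missing "active" field means active) and
    order the rest by step. sorted() is stable, so records that share a
    step keep their manifest order."""
    records = collect_command_records(manifest)
    active = [r for r in records if r.get("active", True)]
    return sorted(active, key=lambda r: r["step"])


# Hypothetical manifest; note that the records are not listed in step order.
manifest = {
    "quant": {"step": 2, "program_name": "simpleaf quant"},
    "index": {"step": 1, "program_name": "simpleaf index"},
    "cleanup": {"step": 3, "program_name": "rm", "active": False,
                "arguments": ["-r", "tmp"]},
}
queue = build_execution_queue(manifest)
print([r["program_name"] for r in queue])  # ['simpleaf index', 'simpleaf quant']
```

Sorting by step, rather than by position in the JSON, lets a template emit its records in whatever order is most convenient for its own logic.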

simpleaf workflow run also exposes multiple flags for controlling the execution flow when invoking the commands. If none of the flags is set, simpleaf will invoke all commands in the execution queue.

If the --no-execution flag is set, simpleaf will instantiate the template passed to the --template option, write the manifest and log files, and return without invoking any command.
If the --start-at flag is set to a step number, simpleaf will ignore all earlier steps (commands) and begin invoking from the first command in the execution queue whose step is greater than or equal to the given starting step.
If the --resume flag is set, simpleaf will look for the log file from a previous run in the provided output directory to decide which step to start from.
If the --skip-step flag is set to a comma-separated list of step numbers, simpleaf will skip the commands whose step is in that list.
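The effect of --start-at and --skip-step on the execution queue can be sketched in Python as follows. This is an illustrative model, not simpleaf's code. It assumes the queue is already sorted by step and that each command record is a dict with a "step" key. --resume is not modeled, because it derives the starting step from a previous run's log, whose exact format is not described here; --no-execution simply returns before this point.

```python
from __future__ import annotations


def parse_skip_steps(skip_step: str | None) -> set[int]:
    """Turn a --skip-step value such as "2,4" into the set {2, 4}."""
    if not skip_step:
        return set()
    # int() tolerates surrounding whitespace, so "1, 3" also parses.
    return {int(token) for token in skip_step.split(",") if token.strip()}


def select_commands(queue: list[dict], start_at: int = 1,
                    skip_step: str | None = None) -> list[dict]:
    """Keep the commands whose step is >= start_at and not in skip_step."""
    skipped = parse_skip_steps(skip_step)
    return [cmd for cmd in queue
            if cmd["step"] >= start_at and cmd["step"] not in skipped]


# Hypothetical queue with steps 1..5.
queue = [{"step": s, "program_name": "echo"} for s in range(1, 6)]
selected = select_commands(queue, start_at=2, skip_step="4")
print([cmd["step"] for cmd in selected])  # [2, 3, 5]
```

With the defaults (start_at=1 and no skipped steps), every command in the queue is kept, which matches the behavior when none of the control flags is set.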

Workflow Output

simpleaf workflow run writes two log files to the output directory passed to --output:

simpleaf_workflow_log.json: This file records the metadata and logging information of the workflow execution, for example, the runtime of each executed command and the steps of the starting and terminating commands. If --resume is set, simpleaf will look for this file in the provided output directory to decide which step (command) to start from.
workflow_execution_log.json: This file is a modified version of the workflow manifest JSON discussed above. The only modification is that, in this file, the active field of each successfully invoked command (return code 0) is set to false.
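The transformation that turns a manifest into workflow_execution_log.json can be sketched in Python. This is a minimal model, not simpleaf's implementation. It assumes return codes are keyed by step (one command per step) and, like the queueing sketch, treats objects with both "step" and "program_name" as command records. The function name make_execution_log is hypothetical.

```python
from __future__ import annotations

import copy
from typing import Any


def make_execution_log(manifest: Any, return_codes: dict[int, int]) -> Any:
    """Copy the manifest and set "active" to False on every command record
    whose step finished with return code 0.

    Assumption: return codes are keyed by step, i.e. one command per step.
    """
    log = copy.deepcopy(manifest)  # leave the original manifest untouched

    def visit(node: Any) -> None:
        if isinstance(node, dict):
            if "step" in node and "program_name" in node:
                if return_codes.get(node["step"]) == 0:
                    node["active"] = False
            else:
                for value in node.values():
                    visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(log)
    return log


manifest = {
    "index": {"step": 1, "program_name": "simpleaf index"},
    "quant": {"step": 2, "program_name": "simpleaf quant"},
}
# Step 1 succeeded; step 2 failed with a non-zero return code.
log = make_execution_log(manifest, {1: 0, 2: 1})
print(log["index"].get("active"), log["quant"].get("active"))  # False None
```

Because inactive commands are ignored, passing such a log back to simpleaf workflow run via --manifest should, under this reading, re-run only the commands that did not succeed.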

workflow run: Full Usage

The relevant options (which you can obtain by running simpleaf workflow run -h) are:

```
Parse and instantiate a workflow template and invoke the workflow commands, or run an instantiated manifest directly

Usage: simpleaf workflow run [OPTIONS] <--manifest <MANIFEST>|--template <TEMPLATE>>

Options:
  -t, --template <TEMPLATE>  path to an instantiated simpleaf workflow template
  -o, --output <OUTPUT>      output directory for log files and the workflow outputs that have no explicit output directory
  -m, --manifest <MANIFEST>  path to an instantiated simpleaf workflow template
  -h, --help                 Print help
  -V, --version              Print version

Control Flow:
  -n, --no-execution           return after instantiating the template (JSONNET file) into a manifest (JSON foramt) without actually executing the resulting manifest
  -s, --start-at <START_AT>    Start the execution from a specific Step. All previous steps will be ignored [default: 1]
  -r, --resume                 resume execution from the termination step of a previous run. To use this flag, the output directory must contains the JSON file generated from a previous run
      --skip-step <SKIP_STEP>  comma separated integers indicating which steps (commands) will be skipped during the execution

Jsonnet:
  -j, --jpaths <JPATHS>  comma separated library search paths passing to internal Jsonnet engine as --jpath flags
```
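When driving simpleaf from a script, the flags listed above can be assembled into an argument list. The Python sketch below uses only the documented options; the helper name build_run_args and the example file names are hypothetical. Refusing --output together with --manifest is this sketch's own choice, following the rule that --output only applies to templates; simpleaf itself may handle that case differently.

```python
from __future__ import annotations


def build_run_args(template: str | None = None, manifest: str | None = None,
                   output: str | None = None, no_execution: bool = False,
                   start_at: int | None = None, resume: bool = False,
                   skip_steps: list[int] | None = None,
                   jpaths: list[str] | None = None) -> list[str]:
    """Assemble an argv list for `simpleaf workflow run` (hypothetical helper)."""
    # --template and --manifest are mutually exclusive, and one is required.
    if (template is None) == (manifest is None):
        raise ValueError("pass exactly one of template or manifest")
    if output is not None and template is None:
        raise ValueError("--output only applies when running a template")
    args = ["simpleaf", "workflow", "run"]
    if template is not None:
        args += ["--template", template]
    else:
        args += ["--manifest", manifest]
    if output is not None:
        args += ["--output", output]
    if no_execution:
        args.append("--no-execution")
    if start_at is not None:
        args += ["--start-at", str(start_at)]
    if resume:
        args.append("--resume")
    if skip_steps:
        # --skip-step and --jpaths both take comma-separated values.
        args += ["--skip-step", ",".join(str(s) for s in skip_steps)]
    if jpaths:
        args += ["--jpaths", ",".join(jpaths)]
    return args


args = build_run_args(template="my_workflow.jsonnet", output="run_out",
                      skip_steps=[3, 5])
print(" ".join(args))
# simpleaf workflow run --template my_workflow.jsonnet --output run_out --skip-step 3,5
```

If simpleaf is on your PATH, the resulting list can be passed directly to subprocess.run(args, check=True).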

The procedure of parsing a simpleaf workflow template

In simpleaf workflow, we use the Jrsonnet library, a Rust implementation of Jsonnet, to parse and instantiate the workflow template. Any valid Jsonnet program or JSON file is a valid simpleaf workflow template, as long as it produces a valid workflow manifest. When calling Jrsonnet, simpleaf workflow automatically passes the following built-in arguments in addition to the provided template:

The output directory passed to --output, as the external variable __output.
The workflow utility library from the protocol estuary, as the external variable __utils.
The path to the utils folder of the protocol estuary in ALEVIN_FRY_HOME, as an additional library search directory.
The paths passed to the --jpaths flag, if any, as additional library search directories.

This also means that any custom configuration program can access the __output and __utils variables in the Jsonnet program using std.extVar("__output") and std.extVar("__utils"). Note that the path to the parent directory of the file passed to --template is an additional library search directory in Jrsonnet by default.
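For readers familiar with the stand-alone jsonnet command-line tool, the built-in arguments can be approximated by the command line that the following Python sketch builds. This is an analogy only: simpleaf calls Jrsonnet in-process rather than through a CLI. The sketch assumes that __utils is bound as Jsonnet code loaded from a library file (modeled with --ext-code-file); utils_lib_file and utils_dir are hypothetical parameters, because their exact locations under ALEVIN_FRY_HOME are not specified here. The template's parent directory needs no flag, since Jsonnet resolves relative imports against the importing file's directory.

```python
from __future__ import annotations


def jsonnet_cli_equivalent(template: str, output: str, utils_lib_file: str,
                           utils_dir: str,
                           jpaths: list[str] | None = None) -> list[str]:
    """Build a `jsonnet` command line approximating simpleaf's built-in
    arguments (hypothetical analogy, not simpleaf's actual mechanism)."""
    args = [
        "jsonnet",
        "--ext-str", f"__output={output}",               # std.extVar("__output")
        "--ext-code-file", f"__utils={utils_lib_file}",  # std.extVar("__utils"), assumed code
        "--jpath", utils_dir,                            # built-in library search directory
    ]
    for path in jpaths or []:                            # one --jpath per --jpaths entry
        args += ["--jpath", path]
    args.append(template)
    return args


cmd = jsonnet_cli_equivalent("my_workflow.jsonnet", "run_out",
                             "/path/to/utils/utils.libsonnet", "/path/to/utils",
                             ["./libs"])
print(" ".join(cmd))
```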

Valid simpleaf workflow manifest format

Although any Jsonnet program or JSON file is a valid input to simpleaf workflow, not every such file can be converted into a valid simpleaf workflow manifest. To provide the greatest flexibility, we impose only the requirements below on the fields of the simpleaf workflow manifest JSON file (not necessarily the template) that represent a command record, whether a simpleaf command or an external command.

To ease parsing, all fields that represent arguments in an external command's argument list must be provided as strings, i.e., wrapped in quotes ("value"), even for integers such as the number of threads (for example, [..., "-t", "16", ...] for an external command that takes the number of threads via the -t parameter).
A command record field must contain step and program_name sub-fields. The step field is an unsigned integer indicating which step of the workflow this command constitutes, and the program_name field is a string naming a valid program in the user’s execution environment.
- For a simpleaf command, the correct program_name is the name of the simpleaf command as a string. For example, for simpleaf index, it is "simpleaf index" and for simpleaf quant, it is "simpleaf quant".
- For an external command such as awk, if the binary is invokable given the user’s PATH environment variable, it can just be "awk"; if not, it must contain a valid full path to the binary, for example, "/usr/bin/awk".
A command record can also have an "active" boolean field indicating whether the command is active. Simpleaf will ignore (neither parse nor invoke) all inactive commands ({"active": false}). Command records missing this field are regarded as active.
If a field records a simpleaf command, the names of its sub-fields, except step, program_name, and the optional active field, must be valid simpleaf flags (for example, options like --fasta, or -f for short, for simpleaf index, and --unfiltered-pl (or -u) for simpleaf quant). Those option names (sub-field names), together with their values, if any, will be used to call the corresponding simpleaf program. Sub-fields not named by a valid simpleaf flag will trigger an error.
If a field records an external command, it must contain valid step and program_name sub-fields as described above. In contrast to simpleaf command records, all arguments of an external shell command must be provided in an array, in order, with the name "arguments". simpleaf workflow will parse the entries in the array to build the actual command in order. For example, to tell simpleaf workflow to invoke the command ls -l -h . at step 7, one needs to use the following JSON record:
```
{
    "step": 7,
    "program_name": "ls",
    "active": true,
    "arguments": ["-l", "-h", "."]
}
```