quant command
- The quant command takes as input either the index, the reads, and relevant information about the experiment (e.g. the chemistry), OR the directory containing the result of a previous mapping run and relevant information about the experiment (e.g. the chemistry), and runs all the relevant steps of the alevin-fry pipeline. When processing a new dataset from scratch, the first option is the one you are likely interested in (you will provide the --index, --reads1 and --reads2 arguments). If multiple read files are provided to the --reads1 and --reads2 arguments, those files must be comma (,) separated.
On the other hand, if you have already performed quantification or have, for some other reason, already mapped the reads to produce a RAD file, you can start the process from the mapped read directory directly using the --map-dir argument instead. This latter approach makes it easy to test out different quantification approaches (e.g. different filtering options or UMI resolution strategies).
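As a minimal sketch of the two entry points described above, the Python helper below assembles an argument list for simpleaf quant. The function name build_quant_args and its parameters are hypothetical and not part of simpleaf; only the flags themselves come from the help text on this page. The knee filtering mode is chosen arbitrarily to satisfy the required permit-list option, and requiring the same number of read 1 and read 2 files is an assumption of this sketch.

```python
from typing import List, Optional, Sequence


def build_quant_args(
    chemistry: str,
    output: str,
    resolution: str,
    index: Optional[str] = None,
    reads1: Sequence[str] = (),
    reads2: Sequence[str] = (),
    map_dir: Optional[str] = None,
    t2g_map: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `simpleaf quant` (illustrative helper)."""
    # The usage line requires exactly one of --index or --map-dir.
    if (index is None) == (map_dir is None):
        raise ValueError("provide exactly one of index or map_dir")

    args = [
        "simpleaf", "quant",
        "--chemistry", chemistry,
        "--output", output,
        "--resolution", resolution,
        "--knee",  # permit-list mode picked arbitrarily for this sketch
    ]
    if t2g_map is not None:
        args += ["--t2g-map", t2g_map]

    if map_dir is not None:
        # Start from a previous mapping run and skip mapping.
        args += ["--map-dir", map_dir]
    else:
        # Assumption of this sketch: read files come in matched pairs.
        if not reads1 or len(reads1) != len(reads2):
            raise ValueError("reads1 and reads2 must be non-empty and paired")
        # Multiple files are passed as one comma-separated value per flag.
        args += [
            "--index", index,
            "--reads1", ",".join(reads1),
            "--reads2", ",".join(reads2),
        ]
    return args
```

The returned list could be handed to subprocess.run; re-running with map_dir set and a different resolution is one way to compare UMI resolution strategies without mapping again.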
Note: If you use the unfiltered permit list (-u) mode for permit-list generation, and you are using either 10xv2 or 10xv3 chemistry, you can provide the flag by itself, and simpleaf will automatically fetch and apply the appropriate unfiltered permit list. However, if you are using -u with any other chemistry, you must explicitly provide a path to the unfiltered permit list to be used.
The -d/--expected-ori flag controls the like-named option that is passed to the generate-permit-list command of alevin-fry. This option is optional. If it is not provided explicitly, it is set to “both” (allowing reads aligning in both orientations to pass through), unless the chemistry is set as 10xv2 or 10xv3, in which case it is set as “fw”. Regardless of the chemistry, if the user sets this option explicitly, this choice is respected.
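The defaulting rule for the expected orientation can be sketched in a few lines of Python. The function resolve_expected_ori is a hypothetical illustration of the rule as described above, not simpleaf's actual implementation; the allowed values are taken from the help text on this page.

```python
from typing import Optional

# Possible values listed in the help text for -d/--expected-ori.
VALID_ORIENTATIONS = ("fw", "rc", "both")


def resolve_expected_ori(chemistry: str, expected_ori: Optional[str] = None) -> str:
    """Return the orientation that would be used for a given chemistry."""
    if expected_ori is not None:
        # An explicit user choice is respected regardless of chemistry.
        if expected_ori not in VALID_ORIENTATIONS:
            raise ValueError(f"unknown orientation: {expected_ori!r}")
        return expected_ori
    # No explicit choice: 10xv2/10xv3 default to "fw", everything else to "both".
    return "fw" if chemistry in ("10xv2", "10xv3") else "both"
```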
A note on the --chemistry flag
Note
The geometry specification language has changed in simpleaf v0.9.0 and above. This change unifies the geometry description language between simpleaf and the backend tools that actually perform the fragment mapping. Further, the new language is more general, capable and extensible, so it will be easier to add more features in the future in a backward-compatible manner. However, this means that if you have a custom_chemistries.json file from before simpleaf v0.9.0, you will have to re-create that file, replacing each existing entry with its custom geometry description in the new format.
The --chemistry option can take either a string naming a specific chemistry, or a string describing the geometry of the barcode, UMI and mappable read. For example, the strings 10xv2 and 10xv3 will apply the appropriate settings for the 10x Chromium v2 and v3 protocols respectively. However, general geometries can be provided as well, in case the chemistry you are trying to use has not been added as a pre-registered option. For example, instead of providing the --chemistry flag with the string 10xv2, you could provide it with the string "1{b[16]u[10]x:}2{r:}", or, instead of providing 10xv3, you could provide "1{b[16]u[12]x:}2{r:}".
The custom format is as follows; you must specify the content of read 1 and read 2 in terms of the barcode, UMI, and mappable read sequence. A specification looks like this:
1{b[16]u[12]x:}2{r:}
In particular, this is how one would specify the 10x Chromium v3 geometry using the custom syntax. The format string says that the read pair should be interpreted as read 1, 1{...}, followed by read 2, 2{...}. The syntax inside the {} says how the read should be interpreted. Here b[16]u[12]x: means that the first 16 bases constitute the barcode, the next 12 constitute the UMI, and anything that comes after that (if it exists) until the end of read 1 should be discarded (x). For read 2, we have 2{r:}, meaning that we should interpret read 2, in its full length, as biological sequence.
It is possible to have pieces of geometry repeated, in which case they will be extracted and concatenated together. For example, 1{b[16]u[12]b[4]x:} would mean that we should obtain the barcode by extracting bases 1-16 (1-based indexing) and 29-32 and concatenating them together to obtain the full barcode.
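To make the interpretation of a geometry string concrete, here is a minimal Python sketch that parses a specification and applies it to a read pair. It is an illustration only, not the parser used by simpleaf or its backend tools: the names parse_geometry and extract are hypothetical, and the sketch handles only the forms that appear in the text above (the tags b, u, r and x, each followed by a fixed length such as [16] or by a colon meaning "to the end of the read").

```python
import re
from typing import Dict, List, Optional, Tuple

# One piece of geometry: a tag followed by a fixed length, e.g. b[16],
# or by a colon meaning "everything up to the end of the read", e.g. x:
PIECE = re.compile(r"([burx])(?:\[(\d+)\]|:)")
# One read description: the read number followed by its pieces in braces.
READ_SPEC = re.compile(r"([12])\{([^{}]*)\}")

Piece = Tuple[str, Optional[int]]  # (tag, length), length None means "to end"


def parse_geometry(spec: str) -> Dict[int, List[Piece]]:
    """Parse e.g. '1{b[16]u[12]x:}2{r:}' into per-read lists of pieces."""
    reads = list(READ_SPEC.finditer(spec))
    # The matches must cover the whole string with no leftover characters.
    if not reads or "".join(m.group(0) for m in reads) != spec:
        raise ValueError(f"unrecognized geometry: {spec!r}")
    layout: Dict[int, List[Piece]] = {}
    for m in reads:
        body = m.group(2)
        pieces = list(PIECE.finditer(body))
        if "".join(p.group(0) for p in pieces) != body:
            raise ValueError(f"unrecognized read description: {body!r}")
        layout[int(m.group(1))] = [
            (p.group(1), int(p.group(2)) if p.group(2) else None) for p in pieces
        ]
    return layout


def extract(layout: Dict[int, List[Piece]], read1: str, read2: str) -> Dict[str, str]:
    """Return the barcode (b), UMI (u) and biological read (r) of a read pair."""
    parts = {"b": "", "u": "", "r": ""}
    for read_no, seq in ((1, read1), (2, read2)):
        offset = 0
        for tag, length in layout.get(read_no, []):
            end = len(seq) if length is None else offset + length
            if tag != "x":  # x pieces are discarded
                parts[tag] += seq[offset:end]  # repeated tags are concatenated
            offset = end
    return parts
```

With the geometry 1{b[16]u[12]b[4]x:}2{r:}, calling extract on a read pair returns a 20-base barcode made of bases 1-16 and 29-32 of read 1, a 12-base UMI from bases 17-28, and all of read 2 as the biological sequence.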
Note
If you use a custom geometry frequently, you can add it to a JSON file custom_chemistries.json in the ALEVIN_FRY_HOME directory. This file simply acts as a key-value store mapping the name you wish to use to each custom geometry. For example, putting the contents below into this file would allow you to pass --chemistry flarb to the simpleaf quant command, and it would interpret the reads as having the specified geometry (in this case, the same as the 10xv3 geometry). Multiple custom chemistries can be added by simply adding more entries to this JSON file.
{
"flarb" : "1{b[16]u[12]x:}2{r:}"
}
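The key-value lookup described above can be sketched as follows. This is a hypothetical illustration rather than simpleaf's own logic: the function resolve_chemistry, its af_home parameter (standing in for the ALEVIN_FRY_HOME directory), and the lookup order are assumptions. The two built-in geometries are the equivalents given earlier on this page.

```python
import json
from pathlib import Path

# Geometry strings this page gives as equivalents of the pre-registered names.
KNOWN_GEOMETRIES = {
    "10xv2": "1{b[16]u[10]x:}2{r:}",
    "10xv3": "1{b[16]u[12]x:}2{r:}",
}


def resolve_chemistry(name: str, af_home: str) -> str:
    """Map a --chemistry value to a geometry string (assumed lookup order)."""
    if name in KNOWN_GEOMETRIES:
        return KNOWN_GEOMETRIES[name]
    custom_path = Path(af_home) / "custom_chemistries.json"
    if custom_path.is_file():
        with custom_path.open() as fh:
            custom = json.load(fh)  # maps chemistry name -> geometry string
        if name in custom:
            return custom[name]
    # Otherwise assume the value is already a literal geometry string.
    return name
```

With the file contents shown above, resolve_chemistry("flarb", af_home) would return the same geometry string as the 10xv3 entry.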
The relevant options (which you can obtain by running simpleaf quant -h) are below:
quantify a sample
Usage: simpleaf quant [OPTIONS] --chemistry <CHEMISTRY> --output <OUTPUT> --resolution <RESOLUTION> <--knee|--unfiltered-pl [<UNFILTERED_PL>]|--forced-cells <FORCED_CELLS>|--expect-cells <EXPECT_CELLS>> <--index <INDEX>|--map-dir <MAP_DIR>>
Options:
-c, --chemistry <CHEMISTRY> chemistry
-o, --output <OUTPUT> output directory
-t, --threads <THREADS> number of threads to use when running [default: 16]
-h, --help Print help information
-V, --version Print version information
Mapping Options:
-i, --index <INDEX> path to index
-1, --reads1 <READS1> comma-separated list of paths to read 1 files
-2, --reads2 <READS2> comma-separated list of paths to read 2 files
-s, --use-selective-alignment use selective-alignment for mapping (instead of pseudoalignment with structural constraints)
--use-piscem use piscem for mapping (requires that index points to the piscem index)
--map-dir <MAP_DIR> path to a mapped output directory containing a RAD file to skip mapping
Permit List Generation Options:
-k, --knee use knee filtering mode
-u, --unfiltered-pl [<UNFILTERED_PL>] use unfiltered permit list
-f, --forced-cells <FORCED_CELLS> use forced number of cells
-x, --explicit-pl <EXPLICIT_PL> use a filtered, explicit permit list
-e, --expect-cells <EXPECT_CELLS> use expected number of cells
-d, --expected-ori <EXPECTED_ORI> The expected direction/orientation of alignments in the chemistry being processed. If not provided, will default to `fw` for 10xv2/10xv3, otherwise `both` [possible values: fw, rc, both]
--min-reads <MIN_READS> minimum read count threshold for a cell to be retained/processed; only used with --unfiltered-pl [default: 10]
UMI Resolution Options:
-m, --t2g-map <T2G_MAP> transcript to gene map
-r, --resolution <RESOLUTION> resolution mode [possible values: cr-like, cr-like-em, parsimony, parsimony-em, parsimony-gene, parsimony-gene-em]