multiplex-quant command¶
The multiplex-quant command runs the end-to-end simpleaf pipeline for 10x Flex Gene Expression data and related multi-barcode assays. Unlike the quant command, which is designed around standard single-cell RNA-seq chemistries and a single cell-barcode whitelist, multiplex-quant handles the extra resources and steps required for Flex assays:
Flex chemistry lookup from the chemistry registry
probe set selection by organism
probe-set CSV to FASTA conversion and probe_t2g.tsv generation
probe index construction with piscem build when needed
cell barcode whitelist resolution
sample barcode list resolution
piscem map-scrna
multi-barcode permit-list generation with alevin-fry generate-permit-list
alevin-fry collate and alevin-fry quant
At present, multiplex-quant expects a registered Flex chemistry such as 10x-flexv1-gex-3p or 10x-flexv2-gex-3p and requires both piscem and alevin-fry to be configured with the set-paths command.
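As a quick preflight before configuring the tools, the sketch below reports which required executables cannot be found. It assumes both tools are installed on your PATH; if your setup registers them some other way, this check does not apply. The helper name missing_tools is hypothetical and not part of simpleaf.

```shell
# Hypothetical helper: print the name of every tool that is not on PATH.
# Prints nothing when all requested tools are found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}

# Example use (not executed here):
#   missing=$(missing_tools piscem alevin-fry)
#   [ -z "$missing" ] || echo "install or configure: $missing" >&2
```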
Overview¶
The command needs:
a Flex chemistry name via --chemistry
an organism via --organism for automatic probe-set selection
paired-end reads via --reads1 and --reads2
an output directory via --output
If the chemistry registry contains the needed metadata, simpleaf can automatically download and cache the probe set, the cell barcode whitelist, and the sample barcode list. If you already have local resources, you can override these defaults with --index, --probe-set, or --sample-bc-list.
The default output is the standard Matrix Market directory under af_quant/alevin. If you pass --anndata-out, simpleaf will additionally write an AnnData .h5ad file at af_quant/alevin/quants.h5ad.
For multiplex output, the resulting AnnData object is intended to preserve the extra sample-level structure of the experiment:
obs_names are sample-qualified cell identifiers
obs["cell_barcode"] stores the corrected cell barcode without the sample prefix
obs["sample_name"] stores the sample / probe-barcode assignment
var["gene_id"] remains the matrix feature identifier
var["gene_symbol"] is added when a gene_id_to_name.tsv mapping is available from the probe set or index
uns stores the standard gpl_info, collate_info, quant_info, and simpleaf_map_info records, and for multiplex runs it also stores sample_info plus simpleaf_multiplex_quant_info
The relevant options (which you can obtain by running simpleaf multiplex-quant -h) are below:
quantify a multiplexed sample (e.g. 10x Flex, or any custom multi-barcode protocol)
Usage: simpleaf multiplex-quant [OPTIONS] --output <OUTPUT>
Options:
-c, --chemistry <CHEMISTRY> Chemistry name (e.g. 10x-flexv1-gex-3p). Provides defaults for geometry, cell BC whitelist, sample BC list, and probe set. All can be overridden individually. If omitted, --geometry and --cell-bc-list are required
--organism <ORGANISM> Target organism for automatic probe set selection [possible values: human, mouse]
--cell-bc-list <CELL_BC_LIST>
Path to cell barcode whitelist (one barcode per line, overrides chemistry default)
--expected-ori <EXPECTED_ORI>
Expected read orientation: fw, rc, or both [default: both]
-o, --output <OUTPUT> Path to output directory
-t, --threads <THREADS> Number of threads to use [default: 16]
-r, --resolution <RESOLUTION> UMI resolution mode [default: cr-like] [possible values: cr-like, cr-like-em, parsimony, parsimony-em, parsimony-gene, parsimony-gene-em]
-h, --help Print help
-V, --version Print version
Mapping Options:
-i, --index <INDEX> Path to pre-built probe index (overrides auto-build)
-1, --reads1 <READS1> Comma-separated list of R1 FASTQ files
-2, --reads2 <READS2> Comma-separated list of R2 FASTQ files
Probe Set Options:
--probe-set <PROBE_SET> Path to probe set CSV or FASTA (overrides auto-download)
--sample-bc-list <SAMPLE_BC_LIST> Path to sample/probe barcode file with rotation mapping
--kmer-length <KMER_LENGTH> k-mer length for probe index building [default: 23]
Reference Options:
-m, --t2g-map <T2G_MAP> Path to a transcript-to-gene map file
--usa Resolve expression into separate spliced and unspliced counts. This requires splicing-aware probe annotations: either a probe CSV with a ``region`` column containing ``spliced`` / ``unspliced`` values, or a pre-built index with an adjacent 3-column t2g file
Piscem Mapping Options:
--skipping-strategy <SKIPPING_STRATEGY> The skipping strategy to use for k-mer collection [default: permissive] [possible values: permissive, strict]
--struct-constraints If piscem >= 0.7.0, enable structural constraints
--max-ec-card <MAX_EC_CARD> Maximum cardinality equivalence class to examine [default: 4096]
Permit List Options:
--min-reads <MIN_READS> Minimum read count threshold for unfiltered permit list [default: 10]
Output Options:
--anndata-out Generate an anndata (h5ad format) count matrix from the standard (matrix-market format) output
Resource resolution¶
multiplex-quant resolves resources in the following order:
Probe index: If --index is provided, simpleaf accepts a simpleaf index output directory, its index/ subdirectory, the piscem_idx prefix within that directory, or a multiplex probe-index directory/prefix. It will reuse adjacent metadata and t2g files when present.
Probe set: If --probe-set is provided, it overrides the registry entry. A CSV probe set is converted into a FASTA plus a gene-level probe_t2g.tsv automatically, and if probe region annotations are present it also produces a USA-mode t2g for --usa. A FASTA input is accepted as-is, and simpleaf generates an identity-style t2g mapping from the FASTA headers.
Automatic probe-set selection: If neither --index nor --probe-set is provided, simpleaf looks up the requested --organism in the selected chemistry’s registered probe sets, downloads the matching probe CSV if needed, and builds a cached probe index.
Cell barcode whitelist: This is resolved from --cell-bc-list if provided, otherwise from the selected chemistry’s permit-list metadata in the registry.
Sample barcode list: This is resolved from --sample-bc-list if provided, otherwise from the selected chemistry’s registry metadata.
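The override behavior above can be mirrored in a wrapper script that passes a flag only when the corresponding local resource exists, leaving everything else to the registry. This is a minimal sketch: resolve_overrides is a hypothetical helper, and preferring a local index over a local probe set is an assumption made here for illustration, not documented simpleaf behavior.

```shell
# Hypothetical helper: resolve_overrides INDEX_DIR PROBE_SET SAMPLE_BC_LIST
# Prints one flag or value per line for each local resource that exists.
# Empty or missing paths are skipped, so simpleaf falls back to the registry.
resolve_overrides() {
  if [ -n "$1" ] && [ -d "$1" ]; then
    # Assumption: a pre-built index makes the probe set unnecessary.
    printf '%s\n' --index "$1"
  elif [ -n "$2" ] && [ -f "$2" ]; then
    printf '%s\n' --probe-set "$2"
  fi
  if [ -n "$3" ] && [ -f "$3" ]; then
    printf '%s\n' --sample-bc-list "$3"
  fi
  return 0
}

# Example use (not executed here; assumes paths without spaces or newlines):
#   simpleaf multiplex-quant --chemistry 10x-flexv2-gex-3p --organism human \
#     $(resolve_overrides "$INDEX_DIR" "$PROBE_SET" "$SAMPLE_BC") \
#     --reads1 sample_R1.fastq.gz --reads2 sample_R2.fastq.gz --output flex_out
```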
USA-mode requirements¶
--usa is optional. If it is not provided, multiplex-quant collapses probe expression to the gene level even when splicing annotations are available.
If --usa is provided, the reference must carry splicing-aware annotations:
For probe CSV input, the CSV must contain a region column and each included probe must have value spliced or unspliced.
For pre-built indices, simpleaf must be able to find an adjacent 3-column t2g such as t2g_3col.tsv or probe_t2g_usa.tsv.
FASTA probe sets do not encode splicing status, so they are not compatible with --usa unless you also provide an explicit splicing-aware --t2g-map.
If the required splicing annotations are not available, simpleaf will stop with an error that explains which input is missing the needed information and suggests rerunning without --usa.
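To catch a missing annotation before starting a long run, a probe CSV can be checked up front. The sketch below assumes a plain comma-separated file with no quoted commas, optional metadata lines starting with #, and the header on the first remaining line. It checks every data row, which may be stricter than simpleaf's own check, since the requirement above applies only to included probes. The helper name probe_csv_supports_usa is hypothetical.

```shell
# Hypothetical helper: probe_csv_supports_usa FILE
# Exits 0 when FILE has a "region" column whose value is "spliced" or
# "unspliced" on every data row, and non-zero otherwise.
probe_csv_supports_usa() {
  awk -F',' '
    { sub(/\r$/, "") }              # tolerate Windows line endings
    /^#/ { next }                   # skip metadata comment lines
    NF == 0 { next }                # skip blank lines
    !seen {                         # first remaining line is the header
      seen = 1
      for (i = 1; i <= NF; i++) if ($i == "region") col = i
      if (!col) status = 1          # no region column at all
      next
    }
    col && $col != "spliced" && $col != "unspliced" { status = 1 }
    END { if (!seen) status = 1; exit status }
  ' "$1"
}

# Example use (not executed here):
#   probe_csv_supports_usa /path/to/probe_set.csv && USA_FLAG=--usa
```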
Examples¶
Use a registry-backed Flex chemistry with automatic resource resolution:
$ export ALEVIN_FRY_HOME=/path/to/af_home
$ simpleaf multiplex-quant \
--chemistry 10x-flexv2-gex-3p \
--organism human \
--reads1 sample_R1.fastq.gz \
--reads2 sample_R2.fastq.gz \
--output flex_out
Use local probe-set and sample-barcode files instead of downloading them:
$ simpleaf multiplex-quant \
--chemistry 10x-flexv1-gex-3p \
--organism mouse \
--probe-set /path/to/probe_set.csv \
--sample-bc-list /path/to/sample_bc.tsv \
--reads1 lane1_R1.fastq.gz,lane2_R1.fastq.gz \
--reads2 lane1_R2.fastq.gz,lane2_R2.fastq.gz \
--output flex_out
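With many lanes, the comma-separated lists for --reads1 and --reads2 are easier to build with a small helper than by hand. This is a sketch only: join_by_comma is a hypothetical function, not part of simpleaf, and it assumes file names that contain no commas.

```shell
# Hypothetical helper: join all arguments into one comma-separated string.
join_by_comma() {
  joined=""
  for item in "$@"; do
    # Prefix a comma only when something has already been collected.
    joined="${joined:+$joined,}$item"
  done
  printf '%s\n' "$joined"
}

# Example use (not executed here). The shell expands each glob in sorted
# order, so R1 and R2 files pair up as long as their names differ only in
# the R1/R2 part:
#   simpleaf multiplex-quant ... \
#     --reads1 "$(join_by_comma lane*_R1.fastq.gz)" \
#     --reads2 "$(join_by_comma lane*_R2.fastq.gz)" \
#     --output flex_out
```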
Use a pre-built probe index:
$ simpleaf multiplex-quant \
--chemistry 10x-flexv2-gex-3p \
--organism human \
--index /path/to/simpleaf_index_output \
--reads1 sample_R1.fastq.gz \
--reads2 sample_R2.fastq.gz \
--output flex_out
Request AnnData output in addition to the Matrix Market output:
$ simpleaf multiplex-quant \
--chemistry 10x-flexv1-gex-3p \
--organism human \
--reads1 sample_R1.fastq.gz \
--reads2 sample_R2.fastq.gz \
--output flex_out \
--anndata-out
Request USA-mode probe quantification:
$ simpleaf multiplex-quant \
--chemistry 10x-flexv2-gex-3p \
--organism human \
--probe-set /path/to/probe_set.csv \
--usa \
--reads1 sample_R1.fastq.gz \
--reads2 sample_R2.fastq.gz \
--output flex_out
Output¶
The command creates the requested output directory and writes:
af_map/: the piscem mapping output
af_quant/: the alevin-fry permit-list, collate, and quantification output
af_quant/simpleaf_map_info.json: parsed mapping metadata copied into the quantification directory for downstream consumers such as AnnData conversion
af_quant/simpleaf_multiplex_quant_info.json: multiplex pipeline metadata copied into the quantification directory so it can be embedded into AnnData uns
af_quant/gene_id_to_name.tsv: optional gene ID to gene symbol/name mapping copied when available from the probe set or index
af_quant/alevin/quants.h5ad: optional AnnData output written when --anndata-out is requested
simpleaf_multiplex_quant_info.json: a metadata record describing the resolved inputs, executed commands, and step timings
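The layout above can be verified after a run with a short check. This is a sketch under the assumption that the paths are exactly as listed. The optional gene_id_to_name.tsv is not checked, and quants.h5ad is checked only on request. The helper name and its --anndata argument are hypothetical.

```shell
# Hypothetical helper: check_multiplex_outputs OUTDIR [--anndata]
# Reports each missing path on stderr and returns non-zero if any is missing.
check_multiplex_outputs() {
  outdir=$1
  missing=0
  for rel in \
    af_map \
    af_quant \
    af_quant/simpleaf_map_info.json \
    af_quant/simpleaf_multiplex_quant_info.json \
    simpleaf_multiplex_quant_info.json
  do
    if [ ! -e "$outdir/$rel" ]; then
      echo "missing: $outdir/$rel" >&2
      missing=1
    fi
  done
  # The AnnData file exists only when the run used --anndata-out.
  if [ "${2:-}" = "--anndata" ] && [ ! -f "$outdir/af_quant/alevin/quants.h5ad" ]; then
    echo "missing: $outdir/af_quant/alevin/quants.h5ad" >&2
    missing=1
  fi
  return "$missing"
}

# Example use (not executed here):
#   check_multiplex_outputs flex_out --anndata || exit 1
```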
Notes¶
multiplex-quant is specific to registered Flex GEX chemistries and related multi-barcode protocols. For standard scRNA-seq chemistries and general custom geometries, use the quant command.
The Flex pipeline currently uses piscem for mapping.
By default, probe expression is grouped at the gene level. Pass --usa only when the input probe set or pre-built index carries explicit splicing annotations.