index
command#
The index
command has two forms of input; either it will take a reference genome FASTA and GTF as input, from which it can build a spliced+intronic (splici) reference or a spliced+unspliced (spliceu) reference using roers (which is used as a library directly from simpleaf
, and so need not be installed independently), or it will take a single reference sequence file (i.e. FASTA file) as input (direct-ref mode).
In expanded reference mode, after the expanded reference is constructed, the resulting reference will be indexed with piscem build
or salmon index
command (depending on the mapper you choose to use), and a copy of the 3-column transcript-to-gene file will be placed in the index directory for subsequent use. The output directory will contain both a ref
and index
subdirectoy, with the first containing the splici reference that was extracted from the provided genome and GTF, and the latter containing the index built on this reference.
In direct-ref mode, the provided fasta file (passed in with --refseq
) will be provided to piscem build
or salmon index
directly. The output diretory will contain an index
subdirectory that contains the index built on this reference.
The relevant options (which you can obtain by running simpleaf index -h
) are:
build the (expanded) reference index
Usage: simpleaf index [OPTIONS] --output <OUTPUT> <--fasta <FASTA>|--ref-seq <REF_SEQ>>
Options:
-o, --output <OUTPUT> path to output directory (will be created if it doesn't exist)
-t, --threads <THREADS> number of threads to use when running [default: 16]
-k, --kmer-length <KMER_LENGTH> the value of k to be used to construct the index [default: 31]
--keep-duplicates keep duplicated identical sequences when constructing the index
-p, --sparse if this flag is passed, build the sparse rather than dense index for mapping
-h, --help Print help information
-V, --version Print version information
Expanded Reference Options:
--ref-type <REF_TYPE> specify whether an expanded reference, spliced+intronic (or splici) or spliced+unspliced (or spliceu), should be built [default: spliced+intronic]
-f, --fasta <FASTA> reference genome to be used for the expanded reference construction
-g, --gtf <GTF> reference GTF file to be used for the expanded reference construction
-r, --rlen <RLEN> the target read length the splici index will be built for
--dedup deduplicate identical sequences in roers when building an expanded reference reference
--spliced <SPLICED> path to FASTA file with extra spliced sequence to add to the index
--unspliced <UNSPLICED> path to FASTA file with extra unspliced sequence to add to the index
Direct Reference Options:
--ref-seq <REF_SEQ> target sequences (provide target sequences directly; avoid expanded reference construction)
Piscem Index Options:
--use-piscem use piscem instead of salmon for indexing and mapping
-m, --minimizer-length <MINIMIZER_LENGTH> the value of m to be used to construct the piscem index (must be < k) [default: 19]