SYNOPSIS
minimap [-lSOV] [-k kmer] [-w winSize] [-I batchSize] [-d dumpFile] [-f occThres] [-r bandWidth] [-m minShared] [-c minCount] [-L minMatch] [-g maxGap] [-T dustThres] [-t nThreads] [-x preset] target.fa query.fa > output.paf
DESCRIPTION
Minimap is a tool for efficiently finding multiple approximate mapping positions between two sets of long sequences, for example between reads and reference genomes, between genomes, and between long noisy reads. Minimap works in two phases. In the indexing phase, it collects all minimizers of a large batch of target sequences in a hash table; in the mapping phase, it identifies good clusters of colinear minimizer hits. Minimap does not generate detailed alignments between the target and query sequences; it only outputs the approximate start and end coordinates of these clusters.
OPTIONS
Indexing options
- -k INT
  Minimizer k-mer length [15].
- -w INT
  Minimizer window size [2/3 of the k-mer length]. A minimizer is the smallest k-mer in a window of w consecutive k-mers.
- -I NUM
  Load at most NUM target bases into RAM for indexing [4G]. If target.fa contains more than NUM bases, minimap needs to read query.fa multiple times, once for each batch of target sequences. NUM may end with k/K/m/M/g/G.
- -d FILE
  Dump the minimizer index to FILE [no dump].
- -l
  Indicate that target.fa is in fact a minimizer index generated by option -d, not a FASTA or FASTQ file.
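The two indexing concepts above, (w, k)-minimizers and the size suffixes accepted by -I, can be illustrated with a short Python sketch. It is not minimap's implementation: it orders k-mers lexicographically and ignores the reverse strand, whereas minimap orders k-mers by a hash. It also assumes decimal, case-insensitive suffixes (k = 10^3, m = 10^6, g = 10^9), which this page does not specify. Both function names are hypothetical.

```python
def minimizers(seq: str, k: int, w: int) -> list:
    """Return (position, k-mer) pairs selected as (w, k)-minimizers of seq.

    Assumption: k-mers are compared lexicographically on the forward strand
    only; minimap compares hashed k-mers and considers both strands.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = []
    # Slide a window of w consecutive k-mers and keep its smallest k-mer.
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        best = min(range(w), key=lambda j: window[j])  # leftmost on ties
        hit = (start + best, window[best])
        # Neighbouring windows often share a minimizer; record it only once.
        if not picked or picked[-1] != hit:
            picked.append(hit)
    return picked


def parse_num(text: str) -> int:
    """Parse a base count such as '4G', '500m' or '1.5k' (as accepted by -I).

    Assumption: suffixes are decimal and case-insensitive.
    """
    multipliers = {"k": 10**3, "m": 10**6, "g": 10**9}
    suffix = text[-1:].lower()
    if suffix in multipliers:
        return round(float(text[:-1]) * multipliers[suffix])
    return int(text)
```

With the defaults k = 15 and w = 10 (2/3 of 15), each window of w consecutive k-mers spans w + k - 1 = 24 bases, so at least one minimizer is sampled from every 24-base stretch of a sequence.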
Mapping options
- -f FLOAT
  Ignore the top FLOAT fraction of the most frequently occurring minimizers [0.001].
- -r INT
  Approximate bandwidth for the initial clustering of minimizer hits [500]. A minimizer hit is a minimizer present in both the target and the query sequence. A minimizer hit cluster is a group of potentially colinear minimizer hits between a target and a query sequence.
- -m FLOAT
  Merge two initial minimizer hit clusters if a fraction of FLOAT or higher of their minimizers is shared between them [0.5].
- -c INT
  Retain a minimizer hit cluster if it contains INT or more minimizer hits [4].
- -L INT
  Discard a minimizer hit cluster if, after colinearization, the number of matching bases is below INT [40]. This option mainly reduces the size of the output; it has little effect on speed and peak memory.
- -g INT
  Split a minimizer hit cluster at a gap of INT bp or longer that contains no minimizer hits [10000].
- -T INT
  Mask regions on query sequences with SDUST score threshold INT; 0 disables masking [0]. SDUST is an algorithm for identifying low-complexity subsequences; it is disabled by default. If masking is desired, a threshold between 20 and 25 is recommended. A higher threshold masks less sequence.
- -S
  Perform all-vs-all mapping. In this mode, if the query sequence name is lexicographically larger than the target sequence name, the hits between them are suppressed; if the query sequence name is the same as the target name, diagonal minimizer hits are also suppressed.
- -O
  Drop a minimizer hit if it is far away from other hits (EXPERIMENTAL). This option is useful for mapping long chromosomes from two diverged species.
- -x STR
  Change multiple settings at once according to the preset STR [not set]. Apply this option before other options so that later options can override the settings it changes.
  - ava10k: for PacBio or Oxford Nanopore all-vs-all read mapping (-Sw5 -L100 -m0).
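The mapping options -f, -r, -g and -c can be read as successive steps: drop the most frequent minimizers, pair the remaining ones into hits, group hits that lie near the same diagonal, split groups at long hit-free gaps, and keep groups with enough hits. The Python sketch below strings these steps together under simplifying assumptions (forward-strand hits only, a cluster's band measured from its lowest diagonal, a gap measured as the larger of the target and query distances, and the -f cutoff taken as a simple rank over distinct minimizers). It only illustrates what the parameters mean and is not minimap's algorithm; -m, -L, -T and -O are left out, and all function names are hypothetical.

```python
from collections import Counter


def frequent_keys(target_keys, frac):
    """Minimizer keys in the top `frac` fraction by occurrence count (-f).

    Assumption: distinct keys are ranked by count and the first
    int(frac * n_distinct) are ignored; minimap's exact cutoff may differ.
    """
    counts = Counter(target_keys)
    n_drop = int(frac * len(counts))
    ranked = sorted(counts, key=counts.__getitem__, reverse=True)
    return set(ranked[:n_drop])


def collect_hits(target_mins, query_mins, frac):
    """Pair equal minimizers given as (position, key) lists.

    Returns (target_pos, query_pos) hits, forward strand only.
    """
    ignored = frequent_keys([key for _, key in target_mins], frac)
    index = {}
    for tpos, key in target_mins:
        if key not in ignored:
            index.setdefault(key, []).append(tpos)
    return [(tpos, qpos) for qpos, key in query_mins for tpos in index.get(key, [])]


def cluster_hits(hits, bandwidth=500, max_gap=10000, min_count=4):
    """Group hits by diagonal (-r), split at hit-free gaps (-g), keep >= min_count hits (-c)."""
    clusters = []
    # Step 1: sort by diagonal (target_pos - query_pos) and start a new
    # cluster once the diagonal drifts more than `bandwidth` from the first hit.
    for hit in sorted(hits, key=lambda h: h[0] - h[1]):
        diag = hit[0] - hit[1]
        if clusters and diag - (clusters[-1][0][0] - clusters[-1][0][1]) <= bandwidth:
            clusters[-1].append(hit)
        else:
            clusters.append([hit])
    # Step 2: within each cluster, split wherever consecutive hits are too far apart.
    pieces = []
    for cluster in clusters:
        ordered = sorted(cluster)
        current = [ordered[0]]
        for prev, cur in zip(ordered, ordered[1:]):
            if max(cur[0] - prev[0], abs(cur[1] - prev[1])) >= max_gap:
                pieces.append(current)
                current = [cur]
            else:
                current.append(cur)
        pieces.append(current)
    # Step 3: discard clusters with too few hits.
    return [p for p in pieces if len(p) >= min_count]
```

The default keyword values mirror the documented defaults of -r, -g and -c, so `cluster_hits(collect_hits(target_mins, query_mins, 0.001))` approximates a run with default settings under the assumptions above.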
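The -S suppression rule and the left-to-right override behaviour of -x can be sketched in Python as follows. Options are modeled as (flag, value) pairs keyed by the single-letter flags documented above, with the documented defaults for k, w, L, m and S; treating a "diagonal" hit as one where the query and target positions coincide is an assumption, and both helpers are hypothetical.

```python
def keep_ava_hit(query_name, target_name, query_pos, target_pos):
    """Return False for hits that all-vs-all mode (-S) suppresses.

    Assumption: a diagonal hit is a self-hit with query_pos == target_pos.
    """
    if query_name > target_name:  # lexicographically larger query name
        return False
    if query_name == target_name and query_pos == target_pos:
        return False
    return True


# The ava10k preset as documented: -S -w5 -L100 -m0.
PRESETS = {"ava10k": {"S": True, "w": 5, "L": 100, "m": 0.0}}


def resolve_options(args):
    """Apply (flag, value) pairs left to right; later settings win."""
    opts = {"k": 15, "w": 10, "L": 40, "m": 0.5, "S": False}
    for flag, value in args:
        if flag == "x":
            opts.update(PRESETS[value])  # a preset overwrites several flags at once
        else:
            opts[flag] = value
    return opts
```

When the same file is given as both target and query, the name rule means each pair of distinct reads is reported once rather than twice. The second helper shows why -x should come first: `-x ava10k -w10` keeps w = 10, while `-w10 -x ava10k` silently resets it to 5.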
Input/output options
- -t INT
  Number of threads [3]. Minimap uses at most three threads when collecting minimizers on target sequences, and up to INT+1 threads when mapping (the extra thread is for I/O, which is frequently idle and takes little CPU time).
- -V
  Print the version number to stdout.
OUTPUT FORMAT
Minimap outputs mapping positions in the Pairwise mApping Format (PAF). PAF is a TAB-delimited text format in which each line consists of at least 12 fields, described in the following table:
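| Col | Type   | Description                                                        |
|-----|--------|--------------------------------------------------------------------|
| 1   | string | Query sequence name                                                |
| 2   | int    | Query sequence length                                              |
| 3   | int    | Query start coordinate (0-based; closed)                           |
| 4   | int    | Query end coordinate (0-based; open)                               |
| 5   | char   | '+' if query and target are on the same strand; '-' if opposite    |
| 6   | string | Target sequence name                                               |
| 7   | int    | Target sequence length                                             |
| 8   | int    | Target start coordinate on the original strand (0-based; closed)   |
| 9   | int    | Target end coordinate on the original strand (0-based; open)       |
| 10  | int    | Number of matching bases in the mapping                            |
| 11  | int    | Number of bases, including gaps, in the mapping                    |
| 12  | int    | Mapping quality (0-255, with 255 for missing)                      |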
When an alignment is available, column 11 gives the total number of sequence matches, mismatches and gaps in the alignment, and column 10 divided by column 11 gives the alignment identity. Because minimap does not generate detailed alignments, these two columns are only approximate. PAF may optionally have additional fields in the SAM-like typed key-value format (TAG:TYPE:VALUE). Minimap writes the number of minimizer hits in a cluster to the cm tag.
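A minimal Python reader for the format above, assuming tab-separated lines with the 12 mandatory columns followed by optional TAG:TYPE:VALUE fields. It converts i-typed and f-typed tags to numbers, leaves other tag types as strings, and reports column 10 divided by column 11 as the approximate identity. The function name and dictionary keys are hypothetical.

```python
def parse_paf_line(line: str) -> dict:
    """Parse one PAF record: 12 mandatory columns plus optional typed tags."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError("a PAF line needs at least 12 fields")
    rec = {
        "qname": cols[0], "qlen": int(cols[1]),
        "qstart": int(cols[2]), "qend": int(cols[3]),
        "strand": cols[4],
        "tname": cols[5], "tlen": int(cols[6]),
        "tstart": int(cols[7]), "tend": int(cols[8]),
        "matches": int(cols[9]), "block_len": int(cols[10]),
        "mapq": int(cols[11]),
        "tags": {},
    }
    # Optional fields such as cm:i:120 (number of minimizer hits in the cluster).
    for field in cols[12:]:
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            rec["tags"][tag] = int(value)
        elif typ == "f":
            rec["tags"][tag] = float(value)
        else:
            rec["tags"][tag] = value
    # Approximate identity: column 10 divided by column 11.
    rec["identity"] = rec["matches"] / rec["block_len"] if rec["block_len"] else 0.0
    return rec
```

For minimap output the identity computed this way inherits the approximation of columns 10 and 11, so it is best used for ranking or filtering mappings rather than as a true alignment identity.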