ssake(1) assembling millions of very short DNA sequences

SYNOPSIS

Progressive assembly of millions of short DNA sequences by k-mer search through a prefix tree and 3' extension.

OPTIONS

-f
Fasta file containing all the [paired (-p 1) / unpaired (-p 0)] reads (required). Paired reads must now be separated by ":".
-s
Fasta file containing sequences to use as seeds exclusively (specify only if different from read set, optional)
-m
Minimum number of overlapping bases with the seed/contig during overhang consensus build up (default -m 16)
-o
Minimum number of reads needed to call a base during an extension (default -o 3)
-r
Minimum base ratio used to accept an overhang consensus base (default -r 0.7)
-t
Trim up to -t base(s) on the contig end when all possibilities have been exhausted for an extension (default -t 0)
-p
Paired-end reads used? (-p 1=yes, -p 0=no, default -p 0)
-v
Runs in verbose mode (-v 1=yes, -v 0=no, default -v 0, optional)
-b
Base name for your output files (optional)
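
The interaction between -o and -r can be illustrated with a small Python sketch. This is only one plausible reading of the option descriptions above, not SSAKE's actual code: it assumes that -o counts all reads covering the overhang position and that -r is the fraction of those reads agreeing on the most frequent base. The function name call_base and its parameters are hypothetical.

```python
from collections import Counter


def call_base(bases, min_reads=3, min_ratio=0.7):
    """Return the consensus base for one overhang position, or None.

    bases     -- the bases proposed by the overlapping reads at this position
    min_reads -- assumed meaning of -o: minimum coverage needed to call a base
    min_ratio -- assumed meaning of -r: minimum fraction agreeing on the base
    """
    # Not enough reads cover this position: no call, the extension stops here.
    if len(bases) < min_reads:
        return None
    # Pick the most frequent base and check how dominant it is.
    base, count = Counter(bases).most_common(1)[0]
    if count / len(bases) >= min_ratio:
        return base
    return None


if __name__ == "__main__":
    print(call_base("AAAT"))  # 3 of 4 reads agree
    print(call_base("AATT"))  # only half of the reads agree
    print(call_base("AA"))    # coverage below the -o default
```

Under these assumptions, raising -o or -r makes extension more conservative, which would tend to give shorter but more reliable contigs.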

============ Options below only considered with -p 1 ============

-d
Mean distance expected/observed between paired-end reads (default -d 200, optional)
-e
Error allowed on the mean distance, given as a fraction, e.g. -e 0.75 == distance +/- 75% (default -e 0.75, optional)
-k
Minimum number of links (read pairs) to compute scaffold (default -k 2, optional)
-a
Maximum link ratio between the two best contig pairs *higher values lead to less accurate scaffolding* (default -a 0.70, optional)
-z
Minimum contig size to track paired-end reads (default -z 50, optional)
-g
Fasta file containing unpaired sequence reads (optional)
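
As a worked example of how -d and -e combine, the following Python sketch computes the accepted distance window for a read pair. It follows the "distance +/- 75%" wording above; the function names and the choice of inclusive bounds are assumptions made for illustration and are not taken from the SSAKE source.

```python
def distance_window(mean_distance=200, error=0.75):
    """Return the (low, high) distance range implied by -d and -e."""
    # -e is a fraction of the mean distance, e.g. 0.75 means +/- 75%.
    delta = mean_distance * error
    return mean_distance - delta, mean_distance + delta


def pair_is_consistent(observed, mean_distance=200, error=0.75):
    """Check whether an observed pair distance falls inside the window.

    Inclusive bounds are an assumption of this sketch.
    """
    low, high = distance_window(mean_distance, error)
    return low <= observed <= high


if __name__ == "__main__":
    print(distance_window())        # window for the defaults -d 200 -e 0.75
    print(pair_is_consistent(300))  # inside the window
    print(pair_is_consistent(400))  # outside the window
```

With the defaults, the window spans 50 to 350 bases, so a smaller -e narrows the range of pair distances that can support a scaffold link.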

AUTHORS

This manual page was written by Andreas Tille <[email protected]> for the Debian system (but may be used by others). Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation.

On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL.