fsm-lite(1) Frequency-based String Mining


fsm-lite -l <file> -t <file> [options]


A single-core implementation of frequency-based substring mining, used in bioinformatics to extract substrings that discriminate between two or more datasets in high-throughput sequencing data.



-l,--list <file>
Text file that lists all input files as whitespace-separated pairs
<data-name> <data-filename>
where <data-name> is a unique identifier (without whitespace) and <data-filename> is the full path to the input file. The default data file format is uncompressed FASTA.
-t,--tmp <file>
Store temporary index data in <file>
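
The list file expected by -l can be generated rather than written by hand. The following is a minimal Python sketch, not part of fsm-lite: the helper name write_list_file, the "*.fasta" pattern, and the choice of the file stem as <data-name> are all assumptions made for illustration. It writes one pair per line, which is one way to satisfy the whitespace-separated format described above.

```python
from pathlib import Path


def write_list_file(fasta_dir, list_path, pattern="*.fasta"):
    """Write '<data-name> <data-filename>' pairs, one per line; return the pairs.

    Hypothetical helper: the identifier is taken from the file stem, which is
    an assumption, not something fsm-lite requires.
    """
    pairs = []
    for fasta in sorted(Path(fasta_dir).glob(pattern)):
        name = fasta.stem
        full_path = str(fasta.resolve())
        # Pairs are whitespace-separated, so neither field may contain whitespace.
        if any(ch.isspace() for ch in name + full_path):
            raise ValueError(f"whitespace in identifier or path: {full_path!r}")
        pairs.append((name, full_path))

    names = [name for name, _ in pairs]
    if len(set(names)) != len(names):
        raise ValueError("identifiers are not unique")

    Path(list_path).write_text("".join(f"{n} {p}\n" for n, p in pairs))
    return pairs
```

With the list written to, say, list.txt, the tool would then be invoked following the synopsis, for example as fsm-lite -l list.txt -t tmp, where both file names are placeholders.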


-m,--min <int>
Minimum length to report (default 9)
-M,--max <int>
Maximum length to report (default 100)
-f,--freq <int>
Minimum frequency per input file to report (default 1)
-s,--minsupp <int>
Minimum number of input files with support to report (default 2)
-S,--maxsupp <int>
Maximum number of input files with support to report (default inf)
-v,--verbose
Verbose output
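
How the reporting thresholds combine can be sketched in a few lines of Python. This is an illustrative reading of the option descriptions above, not fsm-lite's actual code: the function is_reported and its argument layout are hypothetical, and it assumes that an input file "supports" a substring when the substring occurs in it at least --freq times.

```python
import math


def is_reported(substring, counts, min_len=9, max_len=100,
                freq=1, minsupp=2, maxsupp=math.inf):
    """Return True if a substring passes the length and support filters.

    counts maps each <data-name> to the number of occurrences of the
    substring in that input file. Defaults mirror the documented defaults.
    """
    # -m / -M: length must fall within [min_len, max_len].
    if not (min_len <= len(substring) <= max_len):
        return False
    # -f: a file counts towards support only if it has at least freq occurrences.
    support = sum(1 for c in counts.values() if c >= freq)
    # -s / -S: the number of supporting files must fall within [minsupp, maxsupp].
    return minsupp <= support <= maxsupp
```

Under this reading, a 9-character substring found once in each of two files is reported with the defaults, while raising -s above 2 or lowering -S to 1 would suppress it.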


