cdbfasta(1)
Creates an index file for records from a multi-fasta file.
DESCRIPTION
Usage:
cdbfasta <fastafile> [-o <index_file>] [-r <record_delimiter>]
    [-z <compressed_db>] [-i] [-m|-n <numkeys>|-f<LIST>|-c|-C]
    [-w <stopwords_list>] [-s <stripendchars>] [-v]

Creates an index file for records from a multi-fasta file.
By default (without any of the -m, -n, -c or -C options), only the first
space-delimited token from the defline is used as a key.

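The default keying rule can be illustrated with a short Python sketch. This is not cdbfasta's own code; the function name `default_key` and the sample defline are made up for the example, and tokens are split on any whitespace.

```python
def default_key(defline, delimiter=">"):
    """Return the first whitespace-delimited token of a defline (sketch only)."""
    # Drop the leading record delimiter, if present.
    text = defline[len(delimiter):] if defline.startswith(delimiter) else defline
    tokens = text.split()
    # An empty defline yields an empty key in this sketch.
    return tokens[0] if tokens else ""


print(default_key(">gi|12345|gb|AB000001.1 Homo sapiens mRNA"))
```

Here the whole 'gi|12345|gb|AB000001.1' token would be the key; the description after the first space is ignored.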
<fastafile> is the multi-fasta file to index;
-o <index_file>  the index file will be named <index_file>; if not given,
    the index filename is the database name plus the suffix '.cidx'

-r <record_delimiter>  a string of characters at the beginning of a line
    marking the start of a record (default: '>')

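To show how a record delimiter can drive indexing, here is a minimal Python sketch. It assumes an index maps each key to the byte offset of its record, which this page does not spell out; `index_records` is a hypothetical helper, not part of cdbfasta.

```python
import io


def index_records(stream, delimiter=b">"):
    """Map the first defline token of each record to its byte offset (sketch)."""
    offsets = {}
    position = 0
    for line in stream:
        # A line starting with the delimiter opens a new record.
        if line.startswith(delimiter):
            tokens = line[len(delimiter):].split()
            if tokens:
                offsets[tokens[0].decode()] = position
        position += len(line)
    return offsets


fasta = io.BytesIO(b">seq1 first\nACGT\n>seq2 second\nGGCC\nTTAA\n")
print(index_records(fasta))
```

Passing `delimiter=b"@"` would treat '@' lines as record starts in the same way, although real fastq parsing also has to handle quality lines that begin with '@'.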
-Q  treat input as fastq format, i.e. with '@' as record delimiter
    and with records expected to have at least 4 lines

-z <compressed_db>  the database is compressed into the file <compressed_db>
    before indexing (<fastafile> can be "-" or "stdin"
    in order to get the input records from stdin)

-s <stripendchars>  strip extraneous characters from *around* the
    space-delimited tokens, for the multikey options below (-m, -n, -f);
    the default <stripendchars> set is: '",`.(){}/[]!:;~|><+-

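The stripping behaviour can be approximated in Python with `str.strip`. The sketch assumes the default set is exactly the characters printed above, including the leading apostrophe; `strip_token` is a name chosen for illustration.

```python
# Assumed default set, copied from the option description above.
STRIP_CHARS = "'\",`.(){}/[]!:;~|><+-"


def strip_token(token, strip_chars=STRIP_CHARS):
    """Remove the given characters from both ends of a token (sketch only)."""
    # Characters inside the token are left untouched.
    return token.strip(strip_chars)


print(strip_token("(kinase),"))
print(strip_token("[ATP-binding]"))
```

Only the ends are trimmed, so the hyphen inside 'ATP-binding' survives while the surrounding brackets do not.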
-m  ("multi-key" option) create hash entries pointing to
    the same record for all tokens found in the defline

-n <numkeys>  same as -m, but only takes the first <numkeys>
    tokens from the defline

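A rough Python sketch of the multi-key idea follows. `multi_keys` is a hypothetical helper; dropping repeated tokens is an assumption made here, since duplicate keys would point to the same record anyway.

```python
def multi_keys(defline, numkeys=None, delimiter=">"):
    """Return the defline tokens used as keys, as with -m or -n (sketch only)."""
    text = defline[len(delimiter):] if defline.startswith(delimiter) else defline
    tokens = text.split()
    # numkeys=None mimics -m (all tokens); an integer mimics -n <numkeys>.
    if numkeys is not None:
        tokens = tokens[:numkeys]
    # Keep the original order while dropping repeated tokens.
    return list(dict.fromkeys(tokens))


print(multi_keys(">seq1 kinase family protein"))
print(multi_keys(">seq1 kinase family protein", numkeys=2))
```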
-f<LIST>  index the *space*-delimited tokens (fields) in the defline given
    by LIST, a list of fields or field ranges (the same syntax as UNIX 'cut')

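The cut-style LIST syntax can be sketched in Python as below. Field numbers are 1-based as in 'cut'; open-ended ranges such as '2-' or '-2' follow cut's convention, and whether cdbfasta accepts them is not stated here. Both function names are illustrative.

```python
def parse_field_list(spec):
    """Parse a cut-style LIST such as '1,3-5' into (start, end) pairs."""
    ranges = []
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            start = int(low) if low else 1       # '-3' means fields 1..3
            end = int(high) if high else None    # '3-' means field 3 onward
        else:
            start = end = int(part)
        ranges.append((start, end))
    return ranges


def select_fields(defline, spec, delimiter=">"):
    """Return the defline tokens whose 1-based position is listed in spec."""
    text = defline[len(delimiter):] if defline.startswith(delimiter) else defline
    ranges = parse_field_list(spec)
    picked = []
    for position, token in enumerate(text.split(), start=1):
        if any(start <= position and (end is None or position <= end)
               for start, end in ranges):
            picked.append(token)
    return picked


print(select_fields(">seq1 kinase family protein 42", "1,3-4"))
```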
-w <stopwordslist>  exclude from indexing all the words found
    in the file <stopwordslist> (for options -m, -n and -k)

-i  do case-insensitive indexing (i.e. create additional keys for
    all-lowercase tokens used for indexing from the defline)

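One way to picture the extra lowercase keys is the Python sketch below. `with_lowercase_keys` is a made-up helper, and how cdbfasta stores these keys internally is not described here.

```python
def with_lowercase_keys(keys):
    """Add an all-lowercase variant for every key that is not already lowercase."""
    expanded = []
    for key in keys:
        expanded.append(key)
        lowered = key.lower()
        if lowered != key:
            expanded.append(lowered)
    # Keep the original order while dropping repeated keys.
    return list(dict.fromkeys(expanded))


print(with_lowercase_keys(["AB000001.1", "Kinase"]))
```

A lookup can then lowercase the query and still find the record.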
-c  for deflines in the format: db1|accession1|db2|accession2|...,
    only the first db-accession pair ('db1|accession1') is taken as key

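The first-pair rule can be sketched in Python as follows. `first_db_accession` is a hypothetical name, and falling back to the whole token when it contains no '|' is an assumption of this sketch.

```python
def first_db_accession(defline, delimiter=">"):
    """Return 'db1|accession1' from a db1|accession1|db2|... defline (sketch)."""
    text = defline[len(delimiter):] if defline.startswith(delimiter) else defline
    tokens = text.split()
    if not tokens:
        return ""
    parts = tokens[0].split("|")
    # With fewer than two '|'-separated parts, keep the token unchanged.
    return "|".join(parts[:2]) if len(parts) >= 2 else tokens[0]


print(first_db_accession(">gi|12345|gb|AB000001.1 Homo sapiens mRNA"))
```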
-C  like -c, but also subsequent db|accession constructs are indexed,
    along with the full (default) token; additionally,
    all nrdb concatenated accessions found in the defline
    are parsed and stored (assuming 0x01 or '^|^' as separators)

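Splitting a concatenated nrdb defline on the two separators named above might look like this in Python. It is only a sketch of the splitting step; `split_nrdb_defline` is an invented name, and the sample deflines are fabricated for illustration.

```python
import re

# Either the 0x01 control character or the literal string '^|^'.
NRDB_SEPARATOR = re.compile(r"\x01|\^\|\^")


def split_nrdb_defline(defline):
    """Split a concatenated defline into its individual entries (sketch only)."""
    pieces = NRDB_SEPARATOR.split(defline)
    return [piece.strip() for piece in pieces if piece.strip()]


print(split_nrdb_defline("gi|1|gb|A first^|^gi|2|gb|B second"))
```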
-a  accession mode: like the -C option, but indexes the 'accession'
    part for all 'db|accession' constructs found

-A  like -a and -C together (both accessions and 'db|accession'
    constructs are used as keys)

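The difference between the accession modes can be pictured with the Python sketch below. It only derives keys from the 'db|accession' pairs of one token; whether the full token is also kept in these modes is not covered. The function names and the `include_pairs` switch are illustrative.

```python
def db_accession_pairs(token):
    """Split 'db1|acc1|db2|acc2|...' into [(db1, acc1), (db2, acc2), ...]."""
    parts = token.split("|")
    # Walk the fields two at a time; a dangling last field is ignored.
    return [(parts[i], parts[i + 1]) for i in range(0, len(parts) - 1, 2)]


def accession_keys(token, include_pairs=False):
    """Keys in the spirit of -a (accessions) or -A (pairs and accessions)."""
    keys = []
    for db, accession in db_accession_pairs(token):
        if include_pairs:
            keys.append(f"{db}|{accession}")
        if accession:
            keys.append(accession)
    return keys


print(accession_keys("gi|12345|gb|AB000001.1"))
print(accession_keys("gi|12345|gb|AB000001.1", include_pairs=True))
```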
-v  show program version and exit