formatrpsdb(1) Build databases for RPS Blast


formatrpsdb [-] [-E N] [-G N] [-S X] [-U str] [-b] [-f X] -i filename [-l filename] [-n str] [-o] [-t str] [-v N]


Formatrpsdb is a utility that converts a collection of input sequences into a database suitable for use with Reverse Position Specific (RPS) Blast. Each input sequence, together with its position-specific scoring matrix (PSSM), is ASN.1 encoded into a PssmWithParameters (or `scoremat') object and resides in a separate file. Scoremat objects can be created using blastpgp. Formatrpsdb is given a list of these files and produces the corresponding database.
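As an illustration of preparing the input that formatrpsdb expects, the following Python sketch collects the scoremat files in a directory and writes their names to a list file. The function name `write_scoremat_list`, the `.asn` suffix, and the one-filename-per-line layout are assumptions made for this example; the text above says only that formatrpsdb is given a list of these files.

```python
from pathlib import Path
from typing import List


def write_scoremat_list(scoremat_dir: str, list_path: str,
                        suffix: str = ".asn") -> List[str]:
    """Write the paths of all scoremat files in scoremat_dir to list_path.

    Assumptions (not stated in the man page): scoremat files share a
    common suffix, and the list file holds one filename per line.
    """
    # Sort so that the order of entries is reproducible between runs.
    names = sorted(
        str(p) for p in Path(scoremat_dir).iterdir()
        if p.is_file() and p.suffix == suffix
    )
    # One filename per line, each terminated by a newline.
    Path(list_path).write_text("".join(name + "\n" for name in names))
    return names
```

The resulting list file is what would be passed to formatrpsdb with the -i option.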

Formatrpsdb is designed to perform the work of formatdb, makemat and copymat simultaneously, without generating the large number of intermediate files these utilities would need to create an RPS Blast database. Further, scoremat objects are in more general use than the binary format makemat requires. It is hoped that direct manipulation of scoremat objects will encourage conversion of more diverse sequence collections into RPS Blast databases.

Databases generated by formatrpsdb are binary compatible with databases generated by formatdb/makemat/copymat, although the database files will in general not be byte-for-byte identical.


A summary of options is included below.
-
Print usage message
-E N
The gap extension penalty (if not specified in the scoremat; default = 1)
-G N
The gap opening penalty (if not specified in the scoremat; default = 11)
-S X
For scoremats that contain only residue frequencies, the scaling factor to apply when creating PSSMs (default = 100)
-U str
Underlying score matrix (if not specified in the scoremat; default = BLOSUM62)
-b
Scoremat files are binary (vs. text) ASN.1
-f X
Threshold for extending hits for RPS database (default = 11)
-i filename
Input file containing list of ASN.1 Scoremat filenames
-l filename
Log file name (default = formatrpsdb.log)
-n str
Base name of output database (same as input file if not specified)
-o
Create index files for database
-t str
Title for database file
-v N
Database volume size in millions of letters (default = 0, which means no limit)
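The options above can be combined into a single command line. The Python sketch below only assembles the argument list and does not run anything. The helper name `build_formatrpsdb_argv` and its parameters are hypothetical. Passing -b and -o as bare flags follows the synopsis as printed, which is an assumption about how an installed binary parses them. Options left unset are omitted so that the defaults listed above apply.

```python
from typing import List, Optional


def build_formatrpsdb_argv(
    list_file: str,
    db_name: Optional[str] = None,
    title: Optional[str] = None,
    binary_scoremats: bool = False,
    create_index: bool = False,
    gap_open: Optional[int] = None,
    gap_extend: Optional[int] = None,
) -> List[str]:
    """Assemble a formatrpsdb argument list from the documented options."""
    # -i is the only option the synopsis shows as required.
    argv = ["formatrpsdb", "-i", list_file]
    if db_name is not None:
        argv += ["-n", db_name]          # base name of the output database
    if title is not None:
        argv += ["-t", title]            # title for the database file
    if binary_scoremats:
        argv.append("-b")                # scoremat files are binary ASN.1
    if create_index:
        argv.append("-o")                # create index files
    if gap_open is not None:
        argv += ["-G", str(gap_open)]    # documented default is 11
    if gap_extend is not None:
        argv += ["-E", str(gap_extend)]  # documented default is 1
    return argv
```

On a machine where formatrpsdb is installed, the returned list could be handed to `subprocess.run(argv, check=True)`.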


The National Center for Biotechnology Information.