man hhsearch (1): search a database of HMMs with a query alignment or query HMM

SYNOPSIS

hhsearch -i query -d database [options]

DESCRIPTION

HHsearch version 2.0.16 (January 2013)
Search a database of HMMs with a query alignment or query HMM.
(C) Johannes Soeding, Michael Remmert, Andreas Biegert, Andreas Hauser

Soding, J. Protein homology detection by HMM-HMM comparison. Bioinformatics 21:951-960 (2005).

-i <file>: input/query multiple sequence alignment (a2m, a3m, FASTA) or HMM
-d <file>: HMM database of concatenated HMMs in hhm, HMMER, or a3m format, OR, if file has extension pal, list of HMM file names, one per line. Multiple dbs, HMMs, or pal files with -d '<db1> <db2>...'

<file> may be 'stdin' or 'stdout' throughout.

Output options:

-o <file>: write results in standard format to file (default=<infile.hhr>)
-Ofas <file>: write pairwise alignments of significant matches in FASTA format. Analogous for output in a3m, a2m, and psi format (e.g. -Oa3m).
-oa3m <file>: write MSA of significant matches in a3m format. Analogous for output in a2m, psi, and hhm format (e.g. -ohhm).
-e [0,1]: E-value cutoff for inclusion in multiple alignment (def=0.001)
-seq <int>: max. number of query/template sequences displayed (def=1) Beware of overflows! All these sequences are stored in memory.
-cons: show consensus sequence as master sequence of query MSA
-nocons: don't show consensus sequence in alignments (default=show)
-nopred: don't show predicted secondary structure in alignments (default=show)
-nodssp: don't show DSSP secondary structure in alignments (default=show)
-ssconf: show confidences for predicted secondary structure in alignments
-p <float>: minimum probability in summary and alignment list (def=20)
-E <float>: maximum E-value in summary and alignment list (def=1E+06)
-Z <int>: maximum number of lines in summary hit list (def=500)
-z <int>: minimum number of lines in summary hit list (def=10)
-B <int>: maximum number of alignments in alignment list (def=500)
-b <int>: minimum number of alignments in alignment list (def=10)
-aliw [40,..[: number of columns per line in alignment list (def=80)
-dbstrlen: max length of database string to be printed in hhr file

Filter query multiple sequence alignment

-id [0,100]: maximum pairwise sequence identity (%) (def=90)
-diff [0,inf[: filter MSA by selecting most diverse set of sequences, keeping at least this many seqs in each MSA block of length 50 (def=100)
-cov [0,100]: minimum coverage with query (%) (def=0)
-qid [0,100]: minimum sequence identity with query (%) (def=0)
-qsc [0,100]: minimum score per column with query (def=-20.0)
-neff [1,inf]: target diversity of alignment (default=off)

Input alignment format:

-M a2m: use A2M/A3M (default): upper case = Match; lower case = Insert; '-' = Delete; '.' = gaps aligned to inserts (may be omitted)
-M first: use FASTA: columns with residue in 1st sequence are match states
-M [0,100]: use FASTA: columns with fewer than X% gaps are match states
-tags: do NOT neutralize His-, C-myc-, FLAG-tags, and trypsin recognition sequence to background distribution
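
The two FASTA rules above (-M first and -M [0,100]) can be illustrated with a minimal Python sketch. It is not the HHsearch implementation: the function name match_columns is hypothetical, and gaps are counted without sequence weighting, which is an assumption.

```python
def match_columns(seqs, max_gap_percent=None):
    """Return the indices of alignment columns treated as match states.

    max_gap_percent=None follows the '-M first' rule: a column is a match
    state when the first sequence has a residue there.
    Otherwise a column is a match state when fewer than max_gap_percent
    percent of its characters are gaps (the '-M [0,100]' rule).
    Assumption: every sequence counts equally (no sequence weights).
    """
    ncols = len(seqs[0])
    if any(len(s) != ncols for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    if max_gap_percent is None:
        return [j for j in range(ncols) if seqs[0][j] != "-"]
    cols = []
    for j in range(ncols):
        gaps = sum(1 for s in seqs if s[j] == "-")
        if 100.0 * gaps / len(seqs) < max_gap_percent:
            cols.append(j)
    return cols


# Toy alignment: column 2 is 75% gaps, column 3 has none, the rest 25%.
msa = ["MK-LV", "MKALV", "M--LV", "-K-L-"]
print(match_columns(msa))        # '-M first' rule
print(match_columns(msa, 50))    # '-M 50' rule
```

All other columns would be treated as insert states under these rules.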

HMM-HMM alignment options:

-norealign: do NOT realign displayed hits with MAC algorithm (def=realign)
-mact [0,1[: posterior probability threshold for MAC re-alignment (def=0.350) Parameter controls alignment greediness: 0:global >0.1:local
-glob/-loc: use global/local alignment mode for searching/ranking (def=local)
-alt <int>: show up to this many significant alternative alignments (def=2)
-vit: use Viterbi algorithm for searching/ranking (default)
-mac: use Maximum Accuracy (MAC) algorithm for searching/ranking
-forward: use Forward probability for searching
-excl <range>: exclude query positions from the alignment, e.g. '1-33,97-168'
-shift [-1,1]: score offset (def=-0.03)
-corr [0,1]: weight of term for pair correlations (def=0.10)
-sc <int>: amino acid score (tja: template HMM at column j) (def=1)
0: = log2 Sum(tja*qia/pa) (pa: aa background frequencies)
1: = log2 Sum(tja*qia/pqa) (pqa = 1/2*(pa+ta) )
2: = log2 Sum(tja*qia/ta) (ta: av. aa freqs in template)
3: = log2 Sum(tja*qia/qa) (qa: av. aa freqs in query)
5: local amino acid composition correction
-ssm {0,..,4}: ss scoring mode
0: no ss scoring
1,2: ss scoring after or during alignment [default=2]
3,4: ss scoring after or during alignment, predicted vs. predicted
-ssw [0,1]: weight of ss score compared to column score (def=0.11)
-ssa [0,1]: SS substitution matrix = (1-ssa)*I + ssa*full-SS-substitution-matrix (def=1.00)
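
The column score formulas listed under -sc can be written out as a short Python sketch. It only restates the formulas for -sc 0 and -sc 1 as given above; the function names are hypothetical, the 4-letter alphabet is a toy stand-in for the 20 amino acids, and nothing here is taken from the HHsearch source code.

```python
import math


def column_score(q, t, background):
    """log2 Sum_a(t[a] * q[a] / background[a]), the '-sc 0' form.

    q and t are the amino acid distributions of one query column and one
    template column; background holds the frequencies pa.
    A zero sum (no overlap at all) would raise ValueError in math.log2.
    """
    return math.log2(sum(ta * qa / pa for qa, ta, pa in zip(q, t, background)))


def mixed_background(p, t_avg):
    """Background for the '-sc 1' form: pqa = 1/2 * (pa + ta)."""
    return [0.5 * (pa + ta) for pa, ta in zip(p, t_avg)]


# Toy 4-letter alphabet with uniform background (assumption for illustration).
p = [0.25, 0.25, 0.25, 0.25]
conserved = [1.0, 0.0, 0.0, 0.0]

print(column_score(conserved, conserved, p))  # two identical, fully conserved columns
print(column_score(p, p, p))                  # two columns that equal the background
```

Two columns that both match the background score 0 bits, while two identical conserved columns score positively. Passing mixed_background(p, t_avg) as the background gives the -sc 1 variant.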

Gap cost options:

-gapb [0,inf[: Transition pseudocount admixture (def=1.00)
-gapd [0,inf[: Transition pseudocount admixture for open gap (default=0.15)
-gape [0,1.5]: Transition pseudocount admixture for extend gap (def=1.00)
-gapf ]0,inf]: factor to increase/reduce the gap open penalty for deletes (def=0.60)
-gapg ]0,inf]: factor to increase/reduce the gap open penalty for inserts (def=0.60)
-gaph ]0,inf]: factor to increase/reduce the gap extend penalty for deletes (def=0.60)
-gapi ]0,inf]: factor to increase/reduce the gap extend penalty for inserts (def=0.60)
-egq [0,inf[: penalty (bits) for end gaps aligned to query residues (def=0.00)
-egt [0,inf[: penalty (bits) for end gaps aligned to template residues (def=0.00)

Pseudocount (pc) options:

-pcm {0,..,3}: position dependence of pc admixture 'tau' (pc mode, default=2)
0: no pseudo counts: tau = 0
1: constant tau = a
2: diversity-dependent: tau = a/(1 + ((Neff[i]-1)/b)^c) (Neff[i]: number of effective seqs in local MSA around column i)
3: constant diversity pseudocounts
-pca [0,1]: overall pseudocount admixture (def=1.0)
-pcb [1,inf[: Neff threshold value for -pcm 2 (def=1.5)
-pcc [0,3]: extinction exponent c for -pcm 2 (def=1.0)
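
The admixture formula for -pcm can be evaluated with a small Python sketch. The function name pc_admixture is hypothetical; the defaults for a, b and c are the documented defaults of -pca, -pcb and -pcc. Mode 3 is left out because no formula is given for it here.

```python
def pc_admixture(neff, a=1.0, b=1.5, c=1.0, mode=2):
    """Pseudocount admixture tau for one column with Neff[i] = neff.

    mode 0: tau = 0
    mode 1: tau = a
    mode 2: tau = a / (1 + ((neff - 1) / b) ** c)
    """
    if mode == 0:
        return 0.0
    if mode == 1:
        return a
    if mode == 2:
        return a / (1.0 + ((neff - 1.0) / b) ** c)
    raise NotImplementedError("no formula is documented for this mode")


# tau shrinks as the local alignment becomes more diverse.
for neff in (1.0, 2.5, 4.0, 10.0):
    print(neff, round(pc_admixture(neff), 3))
```

With the defaults, a single-sequence column (Neff = 1) gets the full admixture a, and tau falls to a/2 at Neff = 1 + b.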

Context-specific pseudo-counts:

-nocontxt: use substitution-matrix instead of context-specific pseudocounts
-contxt <file>: context file for computing context-specific pseudocounts (default=./data/context_data.lib)
-cslib <file>: column state file for fast database prefiltering (default=./data/cs219.lib)
-csw [0,inf]: weight of central position in cs pseudocount mode (def=1.6)
-csb [0,1]: weight decay parameter for positions in cs pc mode (def=0.9)

Other options:

-cpu <int>: number of CPUs to use (for shared memory SMPs) (default=1)
-v <int>: verbose mode: 0:no screen output 1:only warnings 2: verbose
-maxres <int>: max number of HMM columns (def=15002)
-maxmem [1,inf[: max available memory in GB (def=3.0)
-scores <file>: write scores for all pairwise comparisons to file
-calm {0,..,3}: empirical score calibration of 0:query 1:template 2:both 3: neural network-based estimation of EVD params (default=3)

Example: hhsearch -i a.1.1.1.a3m -d scop70_1.71.hhm
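
When hhsearch is driven from a script, the command line can be assembled as an argument list. The following Python sketch uses only options documented above (-i, -d, -o, -e, -cpu). The helper name build_hhsearch_command and the output file name a.1.1.1.hhr are hypothetical, and the sketch only builds and prints the command without running it.

```python
import shlex


def build_hhsearch_command(query, databases, output=None, evalue=None, cpu=None):
    """Return an argv list for hhsearch.

    Several databases are joined with spaces into a single -d argument,
    following the documented form -d '<db1> <db2>...'.
    """
    argv = ["hhsearch", "-i", query, "-d", " ".join(databases)]
    if output is not None:
        argv += ["-o", output]
    if evalue is not None:
        argv += ["-e", str(evalue)]
    if cpu is not None:
        argv += ["-cpu", str(cpu)]
    return argv


cmd = build_hhsearch_command(
    "a.1.1.1.a3m",
    ["scop70_1.71.hhm"],
    output="a.1.1.1.hhr",  # hypothetical output file name
    cpu=4,
)
print(shlex.join(cmd))  # shell-quoted form, suitable for logging
```

The list can be passed to subprocess.run(cmd, check=True) on a system where hhsearch is installed. Passing a list avoids shell quoting problems with the space-separated -d value.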