hhblits(1) fast homology detection method to iteratively search an HMM database


hhblits -i query [options]


HHblits version 2.0.16 (January 2013): HMM-HMM-based lightning-fast iterative sequence search

HHblits is a sensitive, general-purpose, iterative sequence search tool that represents both query and database sequences by HMMs. You can search HHblits databases starting with a single query sequence, a multiple sequence alignment (MSA), or an HMM. HHblits prints out a ranked list of database HMMs/MSAs and can also generate an MSA by merging the significant database HMMs/MSAs onto the query MSA.

Remmert M., Biegert A., Hauser A., and Soeding J. HHblits: Lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods 9:173-175 (2011)

(C) Johannes Soeding, Michael Remmert, Andreas Biegert, Andreas Hauser

-i <file>
input/query: single sequence or multiple sequence alignment (MSA) in a3m, a2m, or FASTA format, or HMM in hhm format

<file> may be 'stdin' or 'stdout' throughout.


-d <name>
database name (e.g. uniprot20_29Feb2012) (default=)
-n [1,8]
number of iterations (default=2)
[0,1] E-value cutoff for inclusion in result alignment (def=0.001)

Input alignment format:

-M a2m
use A2M/A3M (default): upper case = Match; lower case = Insert;
'-' = Delete; '.' = gaps aligned to inserts (may be omitted)
-M first
use FASTA: columns with residue in 1st sequence are match states
-M [0,100]
use FASTA: columns with fewer than X% gaps are match states
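The two FASTA rules above can be illustrated with a short Python sketch. This is not HHblits code: the function names are hypothetical, '-' is assumed to be the only gap character, and every sequence is counted equally (HHblits itself may weight sequences).

```python
def match_columns_first(msa):
    """-M first: a column is a match state if the first sequence has a residue there."""
    return [ch != "-" for ch in msa[0]]


def match_columns_gap_percent(msa, max_gap_percent):
    """-M X: a column is a match state if fewer than X% of its rows are gaps."""
    n_rows = len(msa)
    flags = []
    for column in zip(*msa):  # iterate over alignment columns
        gap_percent = 100.0 * column.count("-") / n_rows
        flags.append(gap_percent < max_gap_percent)
    return flags


# Toy alignment (made-up sequences, all of equal length).
msa = [
    "AC-GT",
    "A--GT",
    "ACWG-",
    "A--G-",
]

print(match_columns_first(msa))
print(match_columns_gap_percent(msa, 50))
```

In this toy alignment the second and fifth columns are exactly 50% gaps, so with a threshold of 50 they are not match states under the "fewer than X%" rule, although the first sequence has residues in both.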

Output options:

-o <file>
write results in standard format to file (default=<infile.hhr>)
-oa3m <file>
write result MSA with significant matches in a3m format
-opsi <file>
write result MSA of significant matches in PSI-BLAST format
-ohhm <file>
write HHM file for result MSA of significant matches
-oalis <name>
write MSAs in A3M format after each iteration
-Ofas <file>
write pairwise alignments of significant matches in FASTA format; analogous options exist for output in a3m and a2m format (e.g. -Oa3m)
-qhhm <file>
write query input HHM file of last iteration (default=off)
-seq <int>
max. number of query/template sequences displayed (default=1)
-aliw <int>
number of columns per line in alignment list (default=80)
-p [0,100]
minimum probability in summary and alignment list (default=20)
-E [0,inf[
maximum E-value in summary and alignment list (default=1E+06)
-Z <int>
maximum number of lines in summary hit list (default=500)
-z <int>
minimum number of lines in summary hit list (default=10)
-B <int>
maximum number of alignments in alignment list (default=500)
-b <int>
minimum number of alignments in alignment list (default=10)
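How the thresholds (-p, -E) interact with the minimum and maximum line counts (-z, -Z) is easiest to see in code. The Python sketch below shows one plausible reading of the option descriptions: hits are ranked best-first, at most Z are listed, and hits failing the thresholds are still listed until z lines have been printed. The function name and this exact precedence are assumptions, not taken from the HHblits source. The same logic would apply to -b and -B for the alignment list.

```python
def select_hits(hits, p=20.0, E=1e6, z=10, Z=500):
    """Pick hits for the summary list.

    hits: list of (probability, e_value) tuples, already ranked best-first.
    p, E: minimum probability and maximum E-value (defaults as documented).
    z, Z: minimum and maximum number of lines (defaults as documented).
    """
    shown = []
    for prob, evalue in hits:
        if len(shown) >= Z:
            break  # never exceed the maximum
        passes = prob >= p and evalue <= E
        if passes or len(shown) < z:
            # Either the hit meets the thresholds, or the list is
            # still shorter than the guaranteed minimum.
            shown.append((prob, evalue))
    return shown


# Made-up hit list: two good hits followed by two weak ones.
hits = [(99.0, 1e-10), (50.0, 0.1), (10.0, 5.0), (5.0, 20.0)]
print(len(select_hits(hits, z=3)))
print(len(select_hits(hits, z=1, Z=1)))
```

With z=3 the third hit is listed even though its probability is below 20, because the list has not yet reached the minimum length.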

Prefilter options

disable all filter steps
disable all filter steps (except for fast prefiltering)
disable additional filtering of prefiltered HMMs
-noblockfilter
search complete matrix in Viterbi
max number of hits allowed to pass 2nd prefilter (default=20000)

Filter options applied to query MSA, database MSAs, and result MSA

show all sequences in result MSA; do not filter result MSA
[0,100] maximum pairwise sequence identity (def=90)
-diff [0,inf[
filter MSAs by selecting most diverse set of sequences, keeping at least this many seqs in each MSA block of length 50 (def=1000)
[0,100] minimum coverage with master sequence (%) (def=0)
[0,100] minimum sequence identity with master sequence (%) (def=0)
[0,100] minimum score per column with master sequence (default=-20.0)
-neff [1,inf]
target diversity of multiple sequence alignment (default=off)

HMM-HMM alignment options:

do NOT realign displayed hits with MAC algorithm (def=realign)
-mact [0,1[
posterior probability threshold for MAC re-alignment (def=0.350). Parameter controls alignment greediness: 0: global, >0.1: local
use global/local alignment mode for searching/ranking (def=local)
-realign_max <int>
realign max. <int> hits (default=1000)
-alt <int>
show up to this many significant alternative alignments (def=2)
-premerge <int>
merge <int> hits to query MSA before aligning remaining hits (def=3)
-shift [-1,1]
profile-profile score offset (def=-0.03)
-ssm {0,..,4}
0: no ss scoring
1,2: ss scoring after or during alignment [default=2]
3,4: ss scoring after or during alignment, predicted vs. predicted
-ssw [0,1]
weight of ss score (def=0.11)

Gap cost options:

-gapb [0,inf[
Transition pseudocount admixture (def=1.00)
-gapd [0,inf[
Transition pseudocount admixture for open gap (default=0.15)
-gape [0,1.5]
Transition pseudocount admixture for extend gap (def=1.00)
-gapf ]0,inf]
factor to increase/reduce gap open penalty for deletes (def=0.60)
-gapg ]0,inf]
factor to increase/reduce gap open penalty for inserts (def=0.60)
-gaph ]0,inf]
factor to increase/reduce gap extend penalty for deletes (def=0.60)
-gapi ]0,inf]
factor to increase/reduce gap extend penalty for inserts (def=0.60)
[0,inf[ penalty (bits) for end gaps aligned to query residues (def=0.00)
[0,inf[ penalty (bits) for end gaps aligned to template residues (def=0.00)

Pseudocount (pc) options:

-pcm {0,..,2}
position dependence of pc admixture 'tau' (pc mode, default=2)
0: no pseudo counts: tau = 0
1: constant: tau = a
2: diversity-dependent: tau = a/(1 + ((Neff[i]-1)/b)^c)
(Neff[i]: number of effective seqs in local MSA around column i)
[0,1] overall pseudocount admixture (def=1.0)
[1,inf[ Neff threshold value for -pcm 2 (def=1.5)
[0,3] extinction exponent c for -pcm 2 (def=1.0)
-pre_pca [0,1]
PREFILTER pseudocount admixture (def=0.8)
-pre_pcb [1,inf[
PREFILTER threshold for Neff (def=1.8)
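The three -pcm modes can be written out directly from the formula given above. The following Python sketch is an illustration only: the function name is hypothetical, the defaults for a, b and c are the ones listed in this section (1.0, 1.5, 1.0), and clamping Neff below 1 to 1 is an added assumption to keep the power well defined.

```python
def pc_admixture(neff, mode=2, a=1.0, b=1.5, c=1.0):
    """Pseudocount admixture tau for one column.

    neff: number of effective sequences around the column (Neff[i]).
    mode: 0 = no pseudocounts, 1 = constant, 2 = diversity-dependent.
    a, b, c: overall admixture, Neff threshold, extinction exponent.
    """
    if mode == 0:
        return 0.0
    if mode == 1:
        return a
    if mode == 2:
        excess = max(neff - 1.0, 0.0)  # assumption: treat Neff < 1 as 1
        return a / (1.0 + (excess / b) ** c)
    raise ValueError("mode must be 0, 1 or 2")


for neff in (1.0, 2.5, 4.0, 10.0):
    print(neff, round(pc_admixture(neff), 3))
```

With the defaults, tau is 1.0 for a single-sequence column (Neff = 1), 0.5 at Neff = 2.5, and falls further as the alignment becomes more diverse, so pseudocounts contribute less when more real observations are available.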

Context-specific pseudo-counts:

use substitution-matrix instead of context-specific pseudocounts
-contxt <file>
context file for computing context-specific pseudocounts (default=./data/context_data.lib)
<file> column state file for fast database prefiltering (default=./data/cs219.lib)

Predict secondary structure:

add 2ndary structure predicted with PSIPRED to result MSA
-psipred <dir>
directory with PSIPRED executables (default=)
-psipred_data <dir>
directory with PSIPRED data (default=)

Other options:

-v <int>
verbose mode: 0: no screen output 1: only warnings 2: verbose (def=2)
-neffmax ]1,20]
skip further search iterations when diversity Neff of query MSA becomes larger than neffmax (default=10.0)
-cpu <int>
number of CPUs to use (for shared memory SMPs) (default=2)
-scores <file>
write scores for all pairwise comparisons to file
<file> write all alignments in tabular layout to file
-maxres <int>
max number of HMM columns (def=15002)
-maxmem [1,inf[
max available memory in GB (def=3.0)


hhblits -i query.fas -o query.hhr -d ./uniprot20

hhblits -i query.fas -o query.hhr -oa3m query.a3m -n 1 -d ./uniprot20
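When hhblits is driven from a script, the invocations above can be assembled programmatically. The Python sketch below only builds and prints the argument list; it does not run anything. The helper name hhblits_command is hypothetical, and it uses only the -i, -d, -o, -oa3m and -n options as they appear on this page.

```python
import shlex


def hhblits_command(query, database, out_hhr=None, out_a3m=None, iterations=None):
    """Build an hhblits argument list from the options documented above."""
    cmd = ["hhblits", "-i", query, "-d", database]
    if out_hhr is not None:
        cmd += ["-o", out_hhr]
    if out_a3m is not None:
        cmd += ["-oa3m", out_a3m]
    if iterations is not None:
        # The documented range for the number of iterations is [1,8].
        if not 1 <= iterations <= 8:
            raise ValueError("iterations must be between 1 and 8")
        cmd += ["-n", str(iterations)]
    return cmd


cmd = hhblits_command(
    "query.fas", "./uniprot20",
    out_hhr="query.hhr", out_a3m="query.a3m", iterations=1,
)
print(shlex.join(cmd))
```

To execute the search, the list could be passed to subprocess.run(cmd, check=True), assuming hhblits is installed and on the PATH and the database exists at the given location.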

Download databases from <ftp://toolkit.genzentrum.lmu.de/pub/HH-suite/databases/>.