megablast(1) Basic Local Alignment Search Tool

Other Alias

bl2seq, blast2, blastall, blastall_old, blastcl3, blastpgp, impala, rpsblast, seedtop

SYNOPSIS

bl2seq [-] [-A] [-D N] [-E N] [-F str] [-G N] [-I "start stop"] [-J "start stop"] [-M str] [-S N] [-T] [-U] [-V] [-W N] [-X N] [-Y X] [-a filename] [-d N] [-e X] [-g F] -i filename -j filename [-m] [-o filename] -p str [-q N] [-r N] [-t N]

blast2 [-] [-B N] [-D N] [-C x] [-E N] [-F str] [-G N] [-H] [-I "start stop"] [-J "start stop"] [-K N] [-L] [-M str] [-N] [-P X] [-Q N] [-R] [-S N] [-T N] [-W N] [-X N] [-Y X] [-Z N] [-a N] [-b N] [-c] [-d str] [-e X] [-f X] [-g] [-h N] [-i filename] [-j filename] [-k str] [-m N] [-n] [-o filename] -p str [-q N] [-r N] [-s] [-t N] [-u] [-v N] [-w N] [-y N] [-z N]

blastall [-] [-A N] [-B N] [-C x] [-D N] [-E N] [-F str] [-G N] [-I] [-J] [-K N] [-L start,stop] [-M str] [-O filename] [-P N] [-Q N] [-R filename] [-S] [-T] [-U] [-V] [-W N] [-X N] [-Y X] [-Z N] [-a N] [-b N] [-d str] [-e X] [-f X] [-g F] [-i filename] [-l str] [-m N] [-n] [-o filename] -p str [-q N] [-r N] [-s] [-t N] [-v N] [-w N] [-y X] [-z X]

blastall_old [-] [-A N] [-B N] [-C x] [-D N] [-E N] [-F str] [-G N] [-I] [-J] [-K N] [-L start,stop] [-M str] [-O filename] [-P N] [-Q N] [-R filename] [-S] [-T] [-U] [-W N] [-X N] [-Y X] [-Z N] [-a N] [-b N] [-d str] [-e X] [-f X] [-g F] [-i filename] [-l str] [-m N] [-n] [-o filename] -p str [-q N] [-r N] [-s] [-t N] [-v N] [-w N] [-y X] [-z X]

blastcl3 [-] [-A N] [-C x] [-D N] [-E N] [-F str] [-G N] [-I] [-J] [-K N] [-L start,stop] [-M str] [-O filename] [-Q N] [-R] [-S] [-T] [-U] [-W N] [-X N] [-Y X] [-Z N] [-a N] [-b N] [-d str] [-e X] [-f X] [-g F] [-i filename] [-m N] [-n] [-o filename] -p str [-q N] [-r N] [-s] [-t N] [-u str] [-v N] [-w N] [-y X] [-z X]

blastpgp [-] [-A N] [-B filename] [-C filename] [-E N] [-F T] [-G N] [-H N] [-I] [-J] [-K N] [-L N] [-M str] [-N X] [-O filename] [-P N] [-Q filename] [-R filename] [-S N] [-T] [-U] [-W N] [-X N] [-Y X] [-Z N] [-a N] [-b N] [-c N] [-d str] [-e X] [-f N] [-h X] [-i filename] [-j N] [-k filename] [-l str] [-m N] [-o filename] [-p str] [-q N] [-s] [-t N[u]] [-u N] [-v N] [-y X] [-z N]

impala [-] [-E N] [-F str] [-G N] [-H] [-I] [-J] [-M str] [-O filename] [-P filename] [-a N] [-b N] [-c N] [-d str] [-e X] [-h X] [-i filename] [-j N] [-m N] [-o filename] [-v N] [-y X] [-z N]

megablast [-] [-A N] [-D N] [-E N] [-F str] [-G N] [-H N] [-I] [-J] [-L start,stop] [-M N] [-N N] [-O filename] [-P N] [-Q filename] [-R] [-S N] [-T] [-U] [-V] [-W N] [-X N] [-Y X] [-Z N] [-a N] [-b N] [-d str] [-e X] [-f] [-g F] [-i filename] [-l str] [-m N] [-n] [-o filename] [-p X] [-q N] [-r N] [-t N] [-s N] [-v N] [-y N] [-z X]

rpsblast [-] [-F str] [-I] [-J] [-L start,stop] [-N X] [-O filename] [-P N] [-T] [-U] [-X N] [-Y X] [-Z N] [-a N] [-b N] -d filename [-e X] [-i filename] [-l filename] [-m N] [-o filename] [-p F] [-v N] [-y X] [-z N]

seedtop [-] [-C N] [-D N] [-E N] [-F] [-G N] [-I] [-J] [-K N] [-M str] [-O filename] [-S N] [-X N] [-d str] [-e X] [-f] [-i filename] [-k filename] [-o filename] [-p str] [-q N] [-r N]

DESCRIPTION

This manual page briefly documents the commands bl2seq, blast2, blastall, blastall_old, blastcl3, blastpgp, impala, megablast, rpsblast, and seedtop. These commands are documented together because they share many options.

bl2seq performs a comparison between two sequences using either the blastn or blastp algorithm. Both sequences must be either nucleotides or proteins.

blast2 compares a sequence against either a local database or a second sequence; it incorporates most of the functionality of both bl2seq and blastall, but uses a semi-experimental new internal engine.

blastall and blastall_old find the best matches in a local database for a sequence. blastall uses a newer engine than blastall_old by default, but supports using the older engine as well (when invoked with the option -V).

blastcl3 accesses the newest NCBI BLAST search engine (version 2.0). The software behind BLAST version 2.0 was written from scratch to allow BLAST to handle the new challenges posed by the sequence databases in the coming years. Updates to this software will continue in the coming years.

blastpgp performs gapped blastp searches and can be used to perform iterative searches in psi-blast and phi-blast mode.

impala searches a database of score matrices, prepared by copymat(1), producing BLAST-like output.

megablast uses the greedy algorithm of Webb Miller et al. for nucleotide sequence alignment search and concatenates many queries to save time spent scanning the database. This program is optimized for aligning sequences that differ slightly as a result of sequencing or other similar "errors". It is up to 10 times faster than more common sequence similarity programs and therefore can be used to swiftly compare two large sets of sequences against each other.

rpsblast (Reverse PSI-BLAST) searches a query sequence against a database of profiles. This is the opposite of PSI-BLAST, which searches a profile against a database of sequences, hence the `Reverse'. rpsblast uses a BLAST-like algorithm, finding single- or double-word hits and then performing an ungapped extension on these candidate matches. If a sufficiently high-scoring ungapped alignment is produced, a gapped extension is performed and those (gapped) alignments with sufficiently low expect value are reported. This procedure contrasts with IMPALA, which performs a Smith-Waterman calculation between the query and each profile rather than using a word-hit approach to identify matches that should be extended.

seedtop answers two relatively simple questions:

1.
Given a sequence and a database of patterns, which patterns occur in the sequence and where?
2.
Given a pattern and a sequence database, which sequences contain the pattern and where?
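The two questions can be pictured with a small Python sketch. It is an illustration only: Python regular expressions stand in for seedtop's own pattern syntax, and the function names, pattern names, and sequences are hypothetical.

```python
import re

def patterns_in_sequence(sequence, patterns):
    """Question 1: which patterns occur in the sequence, and where?

    `patterns` maps a pattern name to a regular expression (an assumed
    stand-in for a real pattern database). Positions are 1-based.
    """
    found = {}
    for name, regex in patterns.items():
        hits = [(m.start() + 1, m.group()) for m in re.finditer(regex, sequence)]
        if hits:
            found[name] = hits
    return found

def sequences_with_pattern(regex, database):
    """Question 2: which sequences contain the pattern, and where?

    `database` maps a sequence identifier to its residues.
    """
    found = {}
    for seq_id, sequence in database.items():
        starts = [m.start() + 1 for m in re.finditer(regex, sequence)]
        if starts:
            found[seq_id] = starts
    return found
```

Only non-overlapping matches are reported here, which is a simplification of this sketch and not a statement about seedtop's behavior.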

Some of these commands support multiple types of comparison, governed by the -p ("program") flag:

blastp
compares an amino acid query sequence against a protein sequence database.
blastn
compares a nucleotide query sequence against a nucleotide sequence database.
blastx
compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database. For bl2seq, the nucleotide should be the first sequence given.
psitblastn
compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands) using a position specific matrix created by PSI-BLAST.
tblastn
compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands). For bl2seq, the nucleotide should be the second sequence given.
tblastx
compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
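As a rough illustration of what "six-frame conceptual translation" means for blastx, tblastn, and tblastx, the following Python sketch translates a nucleotide string in three forward and three reverse-complement frames. It assumes the standard genetic code (code 1) and is not the implementation used by these programs; all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frames(seq):
    """Map frame number (+1..+3 forward, -1..-3 reverse) to its translation."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames
```

For the toy input ATGGCC, frame +1 reads ATG GCC and frame -1 reads GGC CAT from the opposite strand.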

OPTIONS

A summary of options is included below.
-
Print usage message
-A (bl2seq)
Input sequences in the form of accession.version
-A N (blastall, blastall_old, blastcl3, blastpgp, megablast)
Multiple Hits window size; generally defaults to 0 (for single-hit extensions), but defaults to 40 when using discontiguous templates.
-B N (blast2)
Produce on-the-fly output:
0
none (default)
1
table of offsets and quality values
2
add sequence data
3
text ASN.1
4
binary ASN.1
-B N (blastall, blastall_old)
Number of concatenated queries, in blastn or tblastn mode
-B filename (blastpgp)
Input Alignment File for PSI-BLAST Restart
-C X (blast2, blastall, blastall_old, blastcl3)
Use composition-based statistics for blastp or tblastn:
T, t, D, or d
Default (equivalent to 1 for blast2 and blastall_old and to 2 for blastall and blastcl3)
0, F, or f
No composition-based statistics
1
Composition-based statistics as in NAR 29:2994-3005, 2001
2
Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties
3
Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally
When enabling statistics in blastall, blastall_old, or blastcl3 (i.e., not blast2), appending u (case-insensitive) to the mode enables use of unified p-values combining alignment and compositional p-values in round 1 only.
-C filename (blastpgp)
Output File for PSI-BLAST Checkpointing
-C N (seedtop)
Score only or not (default = 1)
-D N (bl2seq)
Output format:
0
traditional (default)
1
tabular
-D N (blast2, blastall, blastall_old, blastcl3)
Translate sequences in the database according to genetic code N in /usr/share/ncbi/data/gc.prt (default is 1; only applies to tblast*)
-D N (megablast)
Type of output:
0
alignment endpoints and score
1
all ungapped segments endpoints
2
traditional BLAST output (default)
3
tab-delimited one line format
4
incremental text ASN.1
5
incremental binary ASN.1
-D N (seedtop)
Cost decline to align (default = 99999)
-E N (bl2seq, blastcl3, megablast)
Extending a gap costs N (-1 invokes default behavior)
-E N (blast2, blastall, blastall_old)
Extending a gap costs N (-1 invokes default behavior: non-affine if greedy, 2 otherwise)
-E N (blastpgp, impala, seedtop)
Extending a gap costs N (default is 1)
-F str (bl2seq, blast2, blastall, blastall_old, blastpgp, blastcl3, impala, megablast, rpsblast)
Filter options for DUST or SEG; defaults to T for bl2seq, blast2, blastall, blastall_old, blastcl3, and megablast, and to F for blastpgp, impala, and rpsblast.
-F (seedtop)
Filter sequence with SEG.
-G N (bl2seq, blastcl3, megablast)
Opening a gap costs N (-1 invokes default behavior)
-G N (blast2, blastall, blastall_old)
Opening a gap costs N (-1 invokes default behavior: non-affine if greedy, 5 if using dynamic programming)
-G N (blastpgp, impala, seedtop)
Opening a gap costs N (default is 11)
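The -G and -E values combine into an affine gap cost. A minimal Python sketch, assuming the usual BLAST convention that a gap of k residues is charged the opening cost plus k times the extension cost; the function name is hypothetical and the defaults are the 11 and 1 listed above for blastpgp, impala, and seedtop.

```python
def gap_cost(length, gap_open=11, gap_extend=1):
    """Total cost of one gap spanning `length` residues under affine costs.

    Assumes cost = gap_open + gap_extend * length; a zero-length gap is free.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * length
```

With these defaults a single-residue gap costs 12 and a three-residue gap costs 14, so one long gap is cheaper than several short ones of the same total length.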
-H (blast2)
Produce HTML output
-H N (blastpgp)
End of required region in query (-1 indicates end of query)
-H (impala)
Print help (different from usage message)
-H N (megablast)
Maximal number of HSPs to save per database sequence (default is 0, unlimited)
-I "start stop" (bl2seq, blast2)
Location on first (query) sequence (applies only if file specified with -i contains a single sequence)
-I (blastall, blastall_old, blastcl3, blastpgp, impala, megablast, rpsblast, seedtop)
Show GIs in deflines
-J "start stop" (bl2seq, blast2)
Location on second (subject) sequence (applies only if file specified with -j contains a single sequence)
-J (blastall, blastall_old, blastcl3, blastpgp, impala, megablast, rpsblast, seedtop)
Believe the query defline
-K N (blast2, blastall, blastall_old, blastcl3, blastpgp)
Number of best hits from a region to keep. Off by default. If used, a value of 100 is recommended. Very high values of -v or -b are also suggested.
-K N (seedtop)
Internal hit buffer size multiplier (wrt query length; default = 2)
-L (blast2)
Use (classical Mega BLAST) lookup table with width 12
-L start,stop (blastall, blastall_old, blastcl3, megablast, rpsblast)
Location on query sequence (for rpsblast, only valid in blastp mode)
-M str (bl2seq, blast2, blastall, blastall_old, blastcl3, blastpgp, impala, seedtop)
Use matrix str (default = BLOSUM62)
-M N (megablast)
Maximal total length of queries for a single search (default = 5000000)
-N (blast2)
Show only accessions for sequence IDs in tabular output
-N X (blastpgp, rpsblast)
Number of bits to trigger gapping (default = 22.0)
-N N (megablast)
Type of a discontiguous word template:
0
coding (default)
1
optimal
2
two simultaneous
-O filename (blastall, blastall_old, blastcl3, blastpgp, impala, megablast, rpsblast, seedtop)
Write (ASN.1) sequence alignments to filename; only valid for blastpgp, impala, rpsblast, and seedtop with -J, and only valid for megablast with -D 2.
-P X (blast2)
Identity percentage cut-off
-P N (blastall, blastall_old, blastcl3, blastpgp, rpsblast)
Set to 1 for single-hit mode or 0 for multiple-hit mode (default). Does not apply to blastn.
-P filename (impala)
Read matrix profiles from database filename
-P N (megablast)
Maximal number of positions for a hash value (set to 0 [default] to ignore)
-Q N (blast2, blastall, blastall_old, blastcl3)
Translate query according to genetic code N in /usr/share/ncbi/data/gc.prt (default is 1)
-Q filename (blastpgp)
Output File for PSI-BLAST Matrix in ASCII
-Q filename (megablast)
Masked query output; requires -D 2
-R (blast2)
Compute locally optimal Smith-Waterman alignments. (This option is only available for gapped tblastn.)
-R filename (blastall, blastall_old)
Read PSI-TBLASTN checkpoint file filename
-R (blastcl3)
RPS Blast search
-R filename (blastpgp)
Input File for PSI-BLAST Restart
-R (megablast)
Report the log information at the end of output
-S N (bl2seq, blast2, blastall, blastall_old, blastcl3, megablast)
Query strands to search against database for blastn, blastx, tblastx:
1
top
2
bottom
3
both (default)
-S N (blastpgp)
Start of required region in query (default = 1)
-S N (seedtop)
Cutoff cost (default = 30)
-T (bl2seq, blastall, blastall_old, blastcl3, blastpgp, megablast, rpsblast)
Produce HTML output
-T N (blast2)
Type of a discontiguous word template:
0
coding (default)
1
optimal
2
two simultaneous
-U (bl2seq, blastall, blastall_old, blastcl3, blastpgp, megablast, rpsblast)
Use lower case filtering for the query sequence
-V (bl2seq, blastall, megablast)
Force use of legacy engine
-V (blast2)
Use variable word size approach to database scanning
-W N (bl2seq, blast2, blastall, blastall_old, blastcl3, blastpgp, megablast, rpsblast)
Use words of size N (length of best perfect match). Zero invokes default behavior, except with megablast, which defaults to 28, and blastpgp, which defaults to 3. The default values for the other commands vary with "program": 11 for blastn, 28 for megablast, and 3 for everything else.
-X N (bl2seq, blast2, blastall, blastall_old, blastcl3, blastpgp, megablast, rpsblast, seedtop)
X dropoff value for gapped alignment (in bits). Zero invokes default behavior, except with megablast, which defaults to 20, and rpsblast and seedtop, which default to 15. The default values for the other commands vary with "program": 30 for blastn, 20 for megablast, 0 for tblastx, and 15 for everything else.
-Y X (bl2seq, blast2, blastall, blastall_old, blastcl3, blastpgp, megablast, rpsblast)
Effective length of the search space (use zero for the real size)
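For orientation, the expect value of a hit scales linearly with the effective search space, which is why overriding that size with -Y changes the reported E values. The Python sketch below uses the standard Karlin-Altschul relation between a bit score and an expect value; the function name and the numbers are illustrative and do not reproduce any program's internal length corrections.

```python
def expect_value(bit_score, search_space):
    """Expected number of chance hits scoring at least `bit_score`.

    Uses E = N * 2**(-S'), where N is the effective search space and
    S' is the normalized (bit) score.
    """
    return search_space * 2.0 ** (-bit_score)
```

Doubling the search space doubles E, while each additional bit of score halves it.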
-Z N (blast2, blastall, blastall_old, blastcl3, blastpgp, megablast, rpsblast)
X dropoff value for final [dynamic programming?] gapped alignment in bits (default is 100 for blastn and megablast, 0 for tblastx, 25 for others)
-a filename (bl2seq)
Write text ASN.1 output to filename
-a N (blast2, blastall, blastall_old, blastcl3, blastpgp, impala, megablast, rpsblast)
Number of threads to use (default is one)
-b N (blast2, blastall, blastall_old, blastcl3, blastpgp, impala, megablast, rpsblast)
Number of database sequences to show alignments for (B) (default is 250)
-c (blast2)
Mask lower case
-c N (blastpgp)
Constant in pseudocounts for multipass version; 0 (default) uses entropy method; otherwise a value near 30 is recommended
-c N (impala)
Constant in pseudocounts for multipass version (default is 10)
-d N (bl2seq)
Use theoretical DB size of N (zero stands for the real size)
-d str (blast2, blastall, blastall_old, blastcl3, blastpgp, impala, megablast, seedtop)
Database to use (default is nr for all executables except blast2, which requires a second FASTA sequence if this is not set)
-d filename (rpsblast)
RPS BLAST Database
-e X
Expectation value (E) (default = 10.0)
-f X (blast2, blastall, blastall_old, blastcl3)
Threshold for extending hits; default if zero: 0 for blastn and megablast, 11 for blastp, 12 for blastx, and 13 for tblastn and tblastx.
-f N (blastpgp)
Threshold for extending hits (default 11)
-f (megablast)
Show full IDs in the output (default: only GIs or accessions)
-f (seedtop)
Force searching for patterns even if they are too likely
-g F (bl2seq, blastall, blastall_old, blastcl3)
Do not perform gapped alignment (N/A for tblastx)
-g (blast2)
Use greedy algorithm for gapped extensions
-g F (megablast)
Make discontiguous megablast generate words for every base of the database (mandatory with the current BLAST engine)
-h N (blast2)
Frame shift penalty for out-of-frame gapping (blastx, tblastn only; default is zero)
-h X (blastpgp, impala)
e-value threshold for inclusion in multipass model (default = 0.002 for blastpgp, 0.005 for impala)
-i filename
Read (first, query) sequence or set from filename (default is stdin; not needed for blastpgp if restarting from scoremat)
-j filename (bl2seq, blast2)
Read second (subject) sequence or set from filename
-j N (blastpgp)
Maximum number of passes to use in multipass version (default = 1)
-k str (blast2)
Pattern for PHI-BLAST
-k filename (blastpgp, seedtop)
Input hit file for PHI-BLAST (default = hit_file)
-l str (blastall, blastall_old, blastpgp, megablast)
Restrict search of database to a list of GIs
-l filename (rpsblast)
Log messages to filename rather than standard error.
-m (bl2seq)
Use Mega Blast for search
-m N (blast2, blastall, blastall_old, blastcl3, blastpgp, impala, megablast, rpsblast)
Alignment view options:
0
pairwise (default)
1
query-anchored showing identities
2
query-anchored, no identities
3
flat query-anchored, show identities
4
flat query-anchored, no identities
5
query-anchored, no identities and blunt ends
6
flat query-anchored, no identities and blunt ends
7
XML Blast output (not available for impala)
8
tabular (not available for impala)
9
tabular with comment lines (not available for impala)
10
ASN.1 text (not available for impala or rpsblast)
11
ASN.1 binary (not available for impala or rpsblast)
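Tabular output (-m 8 or -m 9) is convenient for scripting. The Python sketch below parses such output under the assumption, not stated on this page, that each hit line carries the twelve tab-separated columns commonly produced by legacy BLAST: query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, and bit score. The class and function names are hypothetical.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    length: int
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float

def parse_tabular(lines):
    """Parse tabular hit lines, skipping blank lines and '#' comments."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # -m 9 adds comment lines starting with '#'
        f = line.split("\t")
        if len(f) != 12:
            raise ValueError(f"expected 12 columns, got {len(f)}")
        hits.append(Hit(f[0], f[1], float(f[2]), *map(int, f[3:10]),
                        float(f[10]), float(f[11])))
    return hits
```

A typical use would be `parse_tabular(open("hits.tsv"))`, where hits.tsv is a hypothetical file written with -o.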
-n (blast2)
Show GIs in sequence IDs
-n (blastall, blastall_old, blastcl3)
MegaBlast search
-n (megablast)
Use non-greedy (dynamic programming) extension for affine gap scores
-o filename
Write final alignment report to filename rather than stdout
-p str (bl2seq, blast2, blastall, blastall_old, blastcl3)
Use the "program" (comparison type) str. The DESCRIPTION section covers this option in more detail.
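The choice of "program" follows from the types of the query and the database. A small Python sketch of that mapping, based on the comparison types listed in the DESCRIPTION section; the function and argument names are hypothetical, and psitblastn is left out because it additionally needs a PSI-BLAST position-specific matrix.

```python
def choose_program(query_type, database_type, translate_both=False):
    """Pick a -p value from sequence types ('protein' or 'nucleotide').

    For nucleotide against nucleotide, `translate_both` selects tblastx
    (six-frame translation of both sides) instead of plain blastn.
    """
    if (query_type, database_type) == ("nucleotide", "nucleotide"):
        return "tblastx" if translate_both else "blastn"
    table = {
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",
        ("protein", "nucleotide"): "tblastn",
    }
    try:
        return table[(query_type, database_type)]
    except KeyError:
        raise ValueError(f"unsupported combination: {query_type}, {database_type}")
```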
-p str (blastpgp)
program option for PHI-BLAST (default = blastpgp)
-p X (megablast)
Identity percentage cut-off (default = 0)
-p F (rpsblast)
Query sequence is nucleotide, not protein
-p str (seedtop)
program name:
patmatchp
indicates which patterns occur in a sequence
patternp
indicates which sequences contain a pattern
-q N (bl2seq, blast2, blastall, blastall_old, blastcl3, megablast, seedtop)
Penalty for a nucleotide mismatch (blastn only) (default = -10 for seedtop, -3 for everything else)
-q N (blastpgp)
ASN.1 Scoremat input of checkpoint data:
0
no scoremat input (default)
1
restart from ASCII scoremat checkpoint file
2
restart from binary scoremat checkpoint file
-r N (bl2seq, blast2, blastall, blastall_old, blastcl3, megablast, seedtop)
Reward for a nucleotide match (blastn only) (default = 10 for seedtop, 1 for everything else)
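To show how the match reward and mismatch penalty interact, here is a minimal Python sketch that scores an ungapped nucleotide alignment. It assumes a reward of 1 and a penalty of -3 and ignores gaps, filtering, and statistics entirely; the function name is hypothetical.

```python
def ungapped_score(a, b, reward=1, penalty=-3):
    """Raw score of two equal-length nucleotide strings aligned without gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(reward if x == y else penalty
               for x, y in zip(a.upper(), b.upper()))
```

With these values one mismatch cancels three matches, so eight aligned bases with a single mismatch score 7 - 3 = 4.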
-s (blast2)
No-op (formerly requested generating words for every base of the database)
-s (blastall, blastall_old, blastcl3, blastpgp)
Compute locally optimal Smith-Waterman alignments. For blastall, blastall_old, and blastcl3, this is only available in gapped tblastn mode.
-s N (megablast)
Minimal hit score to report (0 for default behavior)
-t N (bl2seq, blast2, blastall, blastall_old, blastcl3)
Length of a discontiguous word template (the largest intron allowed in a translated nucleotide sequence when linking multiple distinct alignments; default = 0; negative values disable linking for blastall, blastall_old, and blastcl3.)
-t N[u] (blastpgp)
Composition-based score adjustment. The first character is interpreted as follows:
0, F, or f
no composition-based statistics
1
composition-based statistics as in NAR 29:2994-3005, 2001
2, T, or t
composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties in round 1 (default)
3
composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally in round 1

When composition-based statistics are in use, appending u (case-insensitive) to the argument requests unified p-value combining alignment p-value and compositional p-value in round 1 only.

-t N (megablast)
Length of a discontiguous word template (contiguous word if 0 [default])
-u (blast2)
Do only ungapped alignment (always TRUE for tblastx)
-u str (blastcl3)
Restrict search of database to results of Entrez2 lookup
-u N (blastpgp)
ASN.1 Scoremat output of checkpoint data:
0
no scoremat output (default)
1
output ASCII scoremat checkpoint file (requires -J)
2
output binary scoremat checkpoint file (requires -J)
-v N (blast2, blastall, blastall_old, blastcl3, blastpgp, impala, megablast, rpsblast)
Number of one-line descriptions to show (V) (default = 500)
-w N (blast2)
Window size (max. allowed distance between a pair of initial hits; 0 invokes default behavior, -1 turns off multiple hits)
-w N (blastall, blastall_old, blastcl3)
Frame shift penalty (OOF algorithm for blastx)
-y X (blast2, blastall, blastall_old, blastcl3, blastpgp, impala, rpsblast)
X dropoff for ungapped extensions in bits (0.0 invokes default behavior: 20 for blastn, 10 for megablast, and 7 for all others.)
-y N (megablast)
X dropoff value for ungapped extension (default is 10)
-z N (blast2)
Longest intron length for uneven gap HSP linking (tblastn only; default is 0)
-z N (blastall, blastall_old, blastcl3, blastpgp, impala, megablast, rpsblast)
Effective length of the database (use zero for the real size)
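As a usage illustration, the Python sketch below assembles a megablast command line from options described above (-i, -d, -o, -W, -p, -a, -D). It only builds the argument list and does not run anything; the function name, file names, and database name are hypothetical.

```python
import shlex

def megablast_argv(query, database, output, identity=0,
                   word_size=28, threads=1, tabular=True):
    """Build an argument list for megablast without executing it."""
    argv = ["megablast",
            "-i", query,            # query sequences
            "-d", database,         # database to search
            "-o", output,           # report file
            "-W", str(word_size),   # word size
            "-p", str(identity),    # identity percentage cut-off
            "-a", str(threads)]     # number of threads
    if tabular:
        argv += ["-D", "3"]         # tab-delimited one-line output
    return argv

command = shlex.join(megablast_argv("reads.fa", "est_db", "hits.tsv", identity=95))
```

Passing the list to `subprocess.run(argv, check=True)` would execute it, provided megablast is installed and the named database exists.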

BUGS

This manual page is long and confusing; individual pages might be better.

AUTHOR

The National Center for Biotechnology Information.