RG::Blast::Parser(3) fast NCBI BLAST parser

SYNOPSIS


  use Carp qw(confess);
  use Data::Dumper;
  use RG::Blast::Parser;

  # Either read from STDIN (the default) ...
  my $parser = RG::Blast::Parser->new();

  # ... or read from the filehandle EXAMPLE, using the name "converged.ali"
  # in error messages.
  open( EXAMPLE, '<', '/usr/share/doc/librg-blast-parser-perl/examples/converged.ali' ) || confess($!);
  $parser = RG::Blast::Parser->new( \*EXAMPLE, "converged.ali" );

  # Parse and dump one result at a time until the input is exhausted.
  while( my $res = $parser->parse() )
  {
      print Dumper( $res );
  }

  # parse() die()s on a parser error, so wrap it in eval to catch the exception.
  eval {
      my $res = $parser->parse();
      # ...
  };
  if( $@ && $@ =~ /^parser error/ ) { warn("failed to parse blast result - exception caught"); }

DESCRIPTION

This package provides Perl bindings for a very fast C/C++ library that parses NCBI BLAST output in the default (-m 0) format. BLAST results are returned in a convenient hash structure.

The input may contain several concatenated results. Each call to parse() parses and returns one of them.

CONSTRUCTOR

new( [FILEHANDLE, [NAME]] )
Creates an ``RG::Blast::Parser'' object. BLAST results are read from FILEHANDLE (default: STDIN). NAME is the name used for the input stream in error messages (default: ``STDIN'').

METHODS

parse( [TRACE_PARSING, [TRACE_SCANNING]] )
Parses one complete BLAST result and returns it. Returns ``undef'' if there are no results left on the input stream. On a parser error it die()s with an (at present not very useful) error message.

The following structure is returned in a hash reference:

  {
    blast_version =>    STRING,
    references =>       [ STRING, ... ],
    rounds => [
        {
            oneline_idx =>      NUM,    # index of first one-line description of
                                        # this round in "onelines" array
            oneline_cnt =>      NUM,    # number of one-line descriptions of
                                        # this round
            hit_idx =>          NUM,    # index of first hit of this round in
                                        # "hits" array
            hit_cnt =>          NUM,    # number of hits of this round
            oneline_new_idx =>  NUM|undef, # index of first new (not-seen-before)
                                        # one-line description of this round
                                        # in "onelines" array
            oneline_new_cnt =>  NUM     # number of new one-line descriptions of
                                        # this round
        }, ...
    ],
    q_name =>       STRING,
    q_desc =>       STRING|undef,
    q_length =>     NUM,
    db_name =>      STRING,
    db_nseq =>      NUM,
    db_nletter =>   NUM,
    onelines =>     [                   # one-line descriptions from all rounds
        {
            name =>         STRING,
            desc =>         STRING|undef,
            bit_score =>    NUM,
            e_value =>      NUM
        }, ...
    ],
    converged =>    BOOLEAN,
    hits =>         [                   # hits from all rounds
        {
            name =>         STRING,
            desc =>         STRING|undef,
            length =>       NUM,
            hsps =>         [
                {
                    bit_score =>    NUM,
                    raw_score =>    NUM,
                    e_value =>      NUM,
                    method =>       STRING,
                    identities =>   NUM,
                    positives =>    NUM,
                    gaps =>         NUM,
                    q_strand =>     STRING|undef,
                    s_strand =>     STRING|undef,
                    q_frame =>      NUM|undef,
                    s_frame =>      NUM|undef,
                    q_start =>      NUM,
                    q_ali =>        STRING,
                    q_end =>        NUM,
                    match_line =>   STRING,
                    s_start =>      NUM,
                    s_ali =>        STRING,
                    s_end =>        NUM
                }, ...
            ]
        }, ...
    ],
    tail =>         STRING              # bulk text after the last hit / one-line
                                        # description
  }
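
The index/count pairs in each ``rounds'' entry can be read as slices into the flat ``hits'' and ``onelines'' arrays. The following is a minimal sketch of that reading, not part of the module: the data is an invented stand-in for a parse() result, and slice_of() and round_summary() are hypothetical helper names. It assumes that an undefined index or a zero count means "nothing for this round".

```perl
use strict;
use warnings;

# Invented stand-in for a parse() result; only the fields used below are set.
my $res = {
    onelines => [ map { { name => $_ } } qw(seqA seqB seqA seqB seqC) ],
    hits     => [ map { { name => $_, hsps => [] } } qw(seqA seqB seqA seqB seqC) ],
    rounds   => [
        {   oneline_idx => 0, oneline_cnt => 2, hit_idx => 0, hit_cnt => 2,
            oneline_new_idx => undef, oneline_new_cnt => 0 },
        {   oneline_idx => 2, oneline_cnt => 3, hit_idx => 2, hit_cnt => 3,
            oneline_new_idx => 4, oneline_new_cnt => 1 },
    ],
};

# Return $cnt elements of @$array starting at $idx.
# Assumption: an undefined index or a zero count yields an empty list.
sub slice_of {
    my ( $array, $idx, $cnt ) = @_;
    return () unless defined $idx && $cnt;
    return @{$array}[ $idx .. $idx + $cnt - 1 ];
}

# For every round, list the names of its hits and of its new one-line descriptions.
sub round_summary {
    my ($result) = @_;
    my @summary;
    for my $round ( @{ $result->{rounds} } ) {
        my @hits = slice_of( $result->{hits}, $round->{hit_idx}, $round->{hit_cnt} );
        my @new  = slice_of( $result->{onelines}, $round->{oneline_new_idx}, $round->{oneline_new_cnt} );
        my %entry = (
            hits => [ map { $_->{name} } @hits ],
            new  => [ map { $_->{name} } @new ],
        );
        push @summary, \%entry;
    }
    return \@summary;
}

my $summary = round_summary($res);
for my $i ( 0 .. $#{$summary} ) {
    printf "round %d: hits=%s new=%s\n", $i + 1,
        join( ',', @{ $summary->[$i]{hits} } ),
        join( ',', @{ $summary->[$i]{new} } );
}
```

With a real parser object, the hash reference returned by parse() would take the place of the hand-made $res.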

To trace parsing and scanning, pass true values for TRACE_PARSING and TRACE_SCANNING.
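
Since each hit carries its own ``hsps'' array, a common task is to walk the hits and HSPs of the returned structure and keep only the significant alignments. The sketch below shows one way this might be done. The data is an invented stand-in for a parse() result, and significant_hsps() and the 1e-3 threshold are illustrative choices, not part of the module.

```perl
use strict;
use warnings;

# Invented stand-in for a parse() result; values are illustrative only.
my $res = {
    q_name => 'query1',
    hits   => [
        {   name => 'seqA', length => 300,
            hsps => [
                { e_value => 1e-40, bit_score => 150, q_start => 1,   q_end => 120, s_start => 5,  s_end => 124 },
                { e_value => 0.5,   bit_score => 22,  q_start => 200, q_end => 215, s_start => 10, s_end => 25 },
            ],
        },
        {   name => 'seqB', length => 80,
            hsps => [
                { e_value => 2.0, bit_score => 18, q_start => 3, q_end => 20, s_start => 1, s_end => 18 },
            ],
        },
    ],
};

# Return one array reference per HSP whose E-value is at or below $max_e:
# [ query name, hit name, q_start, q_end, s_start, s_end, bit_score, e_value ]
sub significant_hsps {
    my ( $result, $max_e ) = @_;
    my @rows;
    for my $hit ( @{ $result->{hits} } ) {
        for my $hsp ( @{ $hit->{hsps} } ) {
            next if $hsp->{e_value} > $max_e;
            push @rows, [
                $result->{q_name}, $hit->{name},
                @{$hsp}{qw(q_start q_end s_start s_end bit_score e_value)},
            ];
        }
    }
    return @rows;
}

# Print the surviving HSPs as tab-separated lines.
my @rows = significant_hsps( $res, 1e-3 );
print join( "\t", @{$_} ), "\n" for @rows;
```

Flattening to one row per HSP keeps the filter independent of how many rounds produced the hits, because ``hits'' already holds the hits from all rounds.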

result()
Returns the most recently parsed BLAST result, or ``undef'' if there is none.

get_trace_scanning()
Returns the scan trace state as a Boolean value.

set_trace_scanning( BOOLEAN )
Sets the scan trace state. This is a debugging aid.

AUTHOR

Laszlo Kajan

COPYRIGHT AND LICENSE

Copyright (C) 2012 by Laszlo Kajan

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.