Lucy::Simple(3) Basic search engine.

SYNOPSIS

First, build an index of your documents.


    my $index = Lucy::Simple->new(
        path     => '/path/to/index/',
        language => 'en',
    );
    while ( my ( $title, $content ) = each %source_docs ) {
        $index->add_doc({
            title   => $title,
            content => $content,
        });
    }

Later, search the index.

    my $total_hits = $index->search( 
        query      => $query_string,
        offset     => 0,
        num_wanted => 10,
    );
    print "Total hits: $total_hits\n";
    while ( my $hit = $index->next ) {
        print "$hit->{title}\n";
    }

DESCRIPTION

Lucy::Simple is a stripped-down interface for the Apache Lucy search engine library.

METHODS

new

    my $lucy = Lucy::Simple->new(
        path     => '/path/to/index/',
        language => 'en',
    );

Create a Lucy::Simple object, which can be used for both indexing and searching. Two hash-style parameters are required.

  • path - Where the index directory should be located. If no index is found at the specified location, one will be created.
  • language - The language of the documents in your collection, indicated by a two-letter ISO 639-1 code. Twelve languages are supported:

        |-----------------------|
        | Language   | ISO code |
        |-----------------------|
        | Danish     | da       |
        | Dutch      | nl       |
        | English    | en       |
        | Finnish    | fi       |
        | French     | fr       |
        | German     | de       |
        | Italian    | it       |
        | Norwegian  | no       |
        | Portuguese | pt       |
        | Spanish    | es       |
        | Swedish    | sv       |
        | Russian    | ru       |
        |-----------------------|
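
Because "language" is required, it can help to check a code against the table above before constructing the object. The following is a minimal sketch; is_supported_language is a hypothetical helper written for this example, not part of Lucy::Simple, and it assumes codes are given exactly as listed (lowercase).

```perl
use strict;
use warnings;

# The twelve ISO codes listed in the table above.
my %SUPPORTED_LANGUAGE = map { $_ => 1 }
    qw( da nl en fi fr de it no pt es sv ru );

# Hypothetical helper (not part of Lucy::Simple):
# returns 1 if $code appears in the table, 0 otherwise.
sub is_supported_language {
    my ($code) = @_;
    return 0 unless defined $code;
    return exists $SUPPORTED_LANGUAGE{$code} ? 1 : 0;
}

for my $code (qw( en ru ja )) {
    printf "%s: %s\n", $code,
        is_supported_language($code) ? 'supported' : 'not supported';
}
```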
    

add_doc

    $lucy->add_doc({
        location => $url,
        title    => $title,
        content  => $content,
    });

Add a document to the index. The document must be supplied as a hashref, with field names as keys and content as values.
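
As an illustration of the hashref form, the sketch below feeds a list of records to any object that provides add_doc. The helper name index_docs, the record layout, and the choice to skip records with empty content are all assumptions made for this example, not behavior of Lucy::Simple itself.

```perl
use strict;
use warnings;

# Hypothetical helper: pass each record to $index->add_doc as a hashref
# with field names as keys. $index is assumed to be a Lucy::Simple object
# (or anything else with an add_doc method).
sub index_docs {
    my ( $index, @records ) = @_;
    my $added = 0;
    for my $rec (@records) {
        # Skip records that carry no content (a choice made for this sketch).
        next unless defined( $rec->{content} ) && length( $rec->{content} );
        $index->add_doc({
            title   => defined( $rec->{title} ) ? $rec->{title} : '',
            content => $rec->{content},
        });
        $added++;
    }
    return $added;    # number of documents handed to add_doc
}
```

With a real index, usage might look like this:

    my $lucy  = Lucy::Simple->new( path => '/path/to/index/', language => 'en' );
    my $count = index_docs( $lucy, @records );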

search

    my $total_hits = $lucy->search( 
        query      => $query_string,    # required
        offset     => 40,               # default 0
        num_wanted => 20,               # default 10
    );

Search the index. Returns the total number of documents which match the query. This is a count of all matches in the index, not of the hits retrieved, so it is unlikely to equal "num_wanted".

  • query - A search query string.
  • offset - The number of most-relevant hits to discard, typically used when "paging" through hits N at a time. Setting offset to 20 and num_wanted to 10 retrieves hits 21-30, assuming that 30 hits can be found.
  • num_wanted - The number of hits you would like to see after "offset" is taken into account.
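
The relationship between "offset" and "num_wanted" when paging can be sketched as follows. page_params is a hypothetical helper introduced for this example; it assumes 1-based page numbers and falls back to the documented num_wanted default of 10.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 1-based page number into the
# offset/num_wanted pair expected by search().
sub page_params {
    my ( $page, $per_page ) = @_;
    $per_page = 10 unless defined $per_page;    # documented num_wanted default
    die "page must be >= 1\n" if $page < 1;
    return (
        offset     => ( $page - 1 ) * $per_page,
        num_wanted => $per_page,
    );
}

my %params = page_params( 3, 10 );
print "offset=$params{offset} num_wanted=$params{num_wanted}\n";
# prints "offset=20 num_wanted=10", which corresponds to hits 21-30
```

The returned list can be passed straight through to search:

    my $total_hits = $lucy->search( query => $query_string, page_params( 3, 10 ) );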

BUGS

Not thread-safe.