DESCRIPTION
Goal
In this tutorial chapter, we'll refactor the apps we built in Lucy::Docs::Tutorial::Simple so that they look exactly the same from the end user's point of view, but offer the developer greater possibilites for expansion.To achieve this, we'll ditch Lucy::Simple and replace it with the classes that it uses internally:
- Lucy::Plan::Schema - Plan out your index.
- Lucy::Plan::FullTextType - Field type for full text search.
- Lucy::Analysis::PolyAnalyzer - A one-size-fits-all parser/tokenizer.
- Lucy::Index::Indexer - Manipulate index content.
- Lucy::Search::IndexSearcher - Search an index.
- Lucy::Search::Hits - Iterate over hits returned by a Searcher.
Adaptations to indexer.pl
After we load our modules...
use Lucy::Plan::Schema; use Lucy::Plan::FullTextType; use Lucy::Analysis::PolyAnalyzer; use Lucy::Index::Indexer;
... the first item we're going need is a Schema.
The primary job of a Schema is to specify what fields are available and how they're defined. We'll start off with three fields: title, content and url.
# Create Schema. my $schema = Lucy::Plan::Schema->new; my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( language => 'en', ); my $type = Lucy::Plan::FullTextType->new( analyzer => $polyanalyzer, ); $schema->spec_field( name => 'title', type => $type ); $schema->spec_field( name => 'content', type => $type ); $schema->spec_field( name => 'url', type => $type );
All of the fields are spec'd out using the ``FullTextType'' FieldType, indicating that they will be searchable as ``full text'' --- which means that they can be searched for individual words. The ``analyzer'', which is unique to FullTextType fields, is what breaks up the text into searchable tokens.
Next, we'll swap our Lucy::Simple object out for a Lucy::Index::Indexer. The substitution will be straightforward because Simple has merely been serving as a thin wrapper around an inner Indexer, and we'll just be peeling away the wrapper.
First, replace the constructor:
# Create Indexer. my $indexer = Lucy::Index::Indexer->new( index => $path_to_index, schema => $schema, create => 1, truncate => 1, );
Next, have the $indexer object "add_doc" where we were having the $lucy object "add_doc" before:
foreach my $filename (@filenames) { my $doc = parse_file($filename); $indexer->add_doc($doc); }
There's only one extra step required: at the end of the app, you must call commit() explicitly to close the indexing session and commit your changes. (Lucy::Simple hides this detail, calling commit() implicitly when it needs to).
$indexer->commit;
Adaptations to search.cgi
In our search app as in our indexing app, Lucy::Simple has served as a thin wrapper --- this time around Lucy::Search::IndexSearcher and Lucy::Search::Hits. Swapping out Simple for these two classes is also straightforward:
use Lucy::Search::IndexSearcher; my $searcher = Lucy::Search::IndexSearcher->new( index => $path_to_index, ); my $hits = $searcher->hits( # returns a Hits object, not a hit count query => $q, offset => $offset, num_wanted => $page_size, ); my $hit_count = $hits->total_hits; # get the hit count here ... while ( my $hit = $hits->next ) { ... }
Hooray!
Congratulations! Your apps do the same thing as before... but now they'll be easier to customize.In our next chapter, Lucy::Docs::Tutorial::FieldType, we'll explore how to assign different behaviors to different fields.