Lucy::Analysis::SnowballStopFilter(3) Suppress a stoplist of common words.

SYNOPSIS


    my $stopfilter = Lucy::Analysis::SnowballStopFilter->new(
        language => 'fr',
    );
    my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
        analyzers => [ $case_folder, $tokenizer, $stopfilter, $stemmer ],
    );

DESCRIPTION

A ``stoplist'' is a collection of ``stopwords'': words that are common enough to be of little value when determining search results. For example, so many English documents contain ``the'', ``if'', and ``maybe'' that blocking them may improve both performance and relevance.

Before filtering stopwords:

    ("i", "am", "the", "walrus")

After filtering stopwords:

    ("walrus")
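The effect can be illustrated without Lucy at all. The following is only a minimal sketch of the idea in plain Perl, not how SnowballStopFilter is implemented; the three-word stoplist and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical stoplist for illustration: the stopwords are the hash keys.
my %stoplist = map { $_ => 1 } qw( i am the );

# Tokens as they might arrive from earlier analysis steps.
my @tokens = ( "i", "am", "the", "walrus" );

# Keep only the tokens that do not appear in the stoplist.
my @filtered = grep { !exists $stoplist{$_} } @tokens;

print join( ", ", @filtered ), "\n";    # prints "walrus"
```

A hash keeps each membership check cheap, which is presumably why the constructor accepts a stoplist in the same shape.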

SnowballStopFilter provides default stoplists for several languages, courtesy of the Snowball project (<http://snowball.tartarus.org>), or you may supply your own.

    |-----------------------|
    | ISO CODE | LANGUAGE   |
    |-----------------------|
    | da       | Danish     |
    | de       | German     |
    | en       | English    |
    | es       | Spanish    |
    | fi       | Finnish    |
    | fr       | French     |
    | hu       | Hungarian  |
    | it       | Italian    |
    | nl       | Dutch      |
    | no       | Norwegian  |
    | pt       | Portuguese |
    | sv       | Swedish    |
    | ru       | Russian    |
    |-----------------------|

CONSTRUCTORS

new( [labeled params] )

    my $stopfilter = Lucy::Analysis::SnowballStopFilter->new(
        language => 'de',
    );
    
    # or...
    my $stopfilter = Lucy::Analysis::SnowballStopFilter->new(
        stoplist => \%stoplist,
    );

  • stoplist - A reference to a hash with stopwords as the keys.
  • language - The ISO code for a supported language (see the table above).
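As a sketch of how a custom stoplist might be prepared, the example below builds the hash from a word list and passes a reference to it to the constructor. The word list, the lowercasing step, and the value 1 for each key are assumptions for illustration; only the stoplist parameter itself comes from the documentation above. The constructor call is guarded so the script still runs where Lucy is not installed.

```perl
use strict;
use warnings;

# Hypothetical word list; substitute the terms that suit your corpus.
my @custom_words = qw( Le la les de );

# Stopwords become the hash keys. Lowercasing and the value 1 are
# assumptions made for this sketch.
my %stoplist;
$stoplist{ lc $_ } = 1 for @custom_words;

# Constructor call as documented above; skipped if Lucy is unavailable.
if ( eval { require Lucy::Analysis::SnowballStopFilter; 1 } ) {
    my $stopfilter = Lucy::Analysis::SnowballStopFilter->new(
        stoplist => \%stoplist,
    );
}
```

Lowercasing the keys is a reasonable choice when the filter runs after a case folder, as in the SYNOPSIS, because the tokens it sees will already be lowercase.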

INHERITANCE

Lucy::Analysis::SnowballStopFilter isa Lucy::Analysis::Analyzer isa Lucy::Object::Obj.