Lucy::Analysis::Normalizer(3) - Unicode normalization, case folding and accent stripping


my $normalizer = Lucy::Analysis::Normalizer->new;

my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
    analyzers => [ $tokenizer, $normalizer, $stemmer ],
);

Normalizer is an Analyzer which normalizes tokens to one of the Unicode normalization forms. Optionally, it also performs Unicode case folding and converts accented characters to their base characters.

If you use highlighting, run Normalizer after tokenization: normalization can add or remove characters, so the normalized text may no longer line up with the original text that the highlighter marks up.
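To see why normalization can change the number of characters, here is a minimal sketch using the core Unicode::Normalize module. It does not call Lucy at all, and Lucy's internal implementation may differ; it only illustrates that a normalized string can be longer than its input.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKC NFD);

# U+FB01 LATIN SMALL LIGATURE FI is a single character.
my $ligature = "\x{FB01}";
my $nfkc     = NFKC($ligature);      # compatibility mapping expands it to "fi"

# U+00E9 is a precomposed "e" with acute accent.
my $precomposed = "\x{E9}";
my $nfd         = NFD($precomposed); # "e" followed by U+0301 COMBINING ACUTE ACCENT

printf "ligature: %d -> %d characters\n", length $ligature,    length $nfkc;
printf "e-acute:  %d -> %d characters\n", length $precomposed, length $nfd;
```

Both strings grow from one character to two, so any offsets computed on the normalized text would not match the original document.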


new( [labeled params] )

    my $normalizer = Lucy::Analysis::Normalizer->new(
        normalization_form => 'NFKC',
        case_fold          => 1,
        strip_accents      => 0,
    );

  • normalization_form - Unicode normalization form. One of 'NFC', 'NFKC', 'NFD', 'NFKD'. Defaults to 'NFKC'.
  • case_fold - Perform Unicode case folding. Defaults to true.
  • strip_accents - Strip accents, converting accented characters to their base character. Defaults to false.
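The effect of the three parameters can be approximated on a plain string with core Perl. The sketch below is an illustration only: normalize_text is a hypothetical helper, not part of Lucy, and both the order of the steps and the accent-stripping rule (decompose, then drop nonspacing marks) are assumptions rather than a description of Lucy's internals.

```perl
use v5.16;       # enables strict, fc() and unicode_strings
use warnings;
use Unicode::Normalize qw(NFC NFD NFKC NFKD);

my %FORMS = ( NFC => \&NFC, NFD => \&NFD, NFKC => \&NFKC, NFKD => \&NFKD );

# Hypothetical helper (not a Lucy API) mirroring the constructor's defaults.
sub normalize_text {
    my ( $text, %opt ) = @_;
    my $form  = $opt{normalization_form} // 'NFKC';
    my $fold  = $opt{case_fold}          // 1;
    my $strip = $opt{strip_accents}      // 0;
    my $normalize = $FORMS{$form} or die "Unknown normalization form: $form";

    $text = fc($text) if $fold;          # Unicode case folding
    if ($strip) {
        $text = NFD($text);              # split base characters from accents
        $text =~ s/\p{Mn}+//g;           # drop the nonspacing marks (assumed rule)
    }
    return $normalize->($text);
}

# "Café ﬁ" written with escapes so the script itself stays ASCII.
say normalize_text( "Caf\x{E9} \x{FB01}", strip_accents => 1 );   # prints "cafe fi"
```

With the defaults alone (no strip_accents), the same input keeps its accent and comes back as lowercase "café fi" in NFKC form.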


Lucy::Analysis::Normalizer isa Lucy::Analysis::Analyzer isa Lucy::Object::Obj.