Bloom::Filter(3) Sample Perl Bloom filter implementation

DESCRIPTION

A Bloom filter is a probabilistic algorithm for doing existence tests in less memory than a full list of keys would require. The tradeoff to using Bloom filters is a certain configurable risk of false positives. This module implements a simple Bloom filter with configurable capacity and false positive rate. Bloom filters were first described in a 1970 paper by Burton Bloom, see <http://portal.acm.org/citation.cfm?id=362692&dl=ACM&coll=portal>.

SYNOPSIS


use Bloom::Filter
my $bf = Bloom::Filter->new( capacity => 10, error_rate => .001 );
$bf->add( @keys );
while ( <> ) {
chomp;
print "Found $_\n" if $bf->check( $_ );
}

CONSTRUCTORS

new %PARAMS
Create a brand new instance. Allowable params are "error_rate", "capacity".
init
Calculates the best number of hash functions and optimum filter length, creates some random salts, and generates a blank bit vector. Called automatically by constructor.

ACCESSORS

capacity
Returns the total capacity of the Bloom filter
error_rate
Returns the configured maximum error rate
length
Returns the length of the Bloom filter in bits
key_count
Returns the number of items currently stored in the filter
on_bits
Returns the number of 'on' bits in the filter
salts
Returns the list of salts used to create the hash functions

PUBLIC METHODS

add @KEYS
Adds the list of keys to the filter. Will fail, return "undef" and complain if the number of keys in the filter exceeds the configured capacity.
check @KEYS
Checks the provided key list against the Bloom filter, and returns a list of equivalent length, with true or false values depending on whether there was a match.

INTERNAL METHODS

_calculate_shortest_filter_length CAPACITY ERR_RATE
Given a desired error rate and maximum capacity, returns the optimum combination of vector length (in bits) and number of hash functions to use in building the filter, where ``optimum'' means shortest vector length.
_get_cells KEY
Given a key, hashes it using the list of salts and returns an array of cell indexes corresponding to the key.

AUTHOR

Originally written by Maciej Ceglowski <[email protected]>. Currently maintained by Grzegorz RoXniecki <[email protected]>.

CONTRIBUTORS

Dmitriy Ryaboy <[email protected]> (big speedup in February 2007, thanks!)

COPYRIGHT AND LICENSE

(c) 2004 Maciej Ceglowski

This is free software, distributed under version 2 of the GNU Public License (GPL).