DESCRIPTION
A Bloom filter is a probabilistic algorithm for doing existence tests in less memory than a full list of keys would require. The tradeoff to using Bloom filters is a certain configurable risk of false positives. This module implements a simple Bloom filter with configurable capacity and false positive rate. Bloom filters were first described in a 1970 paper by Burton Bloom, see <http://portal.acm.org/citation.cfm?id=362692&dl=ACM&coll=portal>.SYNOPSIS
use Bloom::Filter
my $bf = Bloom::Filter->new( capacity => 10, error_rate => .001 );
$bf->add( @keys );
while ( <> ) {
chomp;
print "Found $_\n" if $bf->check( $_ );
}
CONSTRUCTORS
- new %PARAMS
- Create a brand new instance. Allowable params are "error_rate", "capacity".
- init
- Calculates the best number of hash functions and optimum filter length, creates some random salts, and generates a blank bit vector. Called automatically by constructor.
ACCESSORS
- capacity
- Returns the total capacity of the Bloom filter
- error_rate
- Returns the configured maximum error rate
- length
- Returns the length of the Bloom filter in bits
- key_count
- Returns the number of items currently stored in the filter
- on_bits
- Returns the number of 'on' bits in the filter
- salts
- Returns the list of salts used to create the hash functions
PUBLIC METHODS
- add @KEYS
- Adds the list of keys to the filter. Will fail, return "undef" and complain if the number of keys in the filter exceeds the configured capacity.
- check @KEYS
- Checks the provided key list against the Bloom filter, and returns a list of equivalent length, with true or false values depending on whether there was a match.
INTERNAL METHODS
- _calculate_shortest_filter_length CAPACITY ERR_RATE
- Given a desired error rate and maximum capacity, returns the optimum combination of vector length (in bits) and number of hash functions to use in building the filter, where ``optimum'' means shortest vector length.
- _get_cells KEY
- Given a key, hashes it using the list of salts and returns an array of cell indexes corresponding to the key.
AUTHOR
Originally written by Maciej Ceglowski <[email protected]>. Currently maintained by Grzegorz RoXniecki <[email protected]>.CONTRIBUTORS
Dmitriy Ryaboy <[email protected]> (big speedup in February 2007, thanks!)COPYRIGHT AND LICENSE
(c) 2004 Maciej CeglowskiThis is free software, distributed under version 2 of the GNU Public License (GPL).