Thursday, March 22, 2012

Google Guava BloomFIlter

Google Guava BloomFIlter: "BloomFilter Crash Course

BloomFilters are essentially bit vectors. At a high level BloomFilters work in the following manner:
Add the element to the filter.
Hash it a few times, than set the bits to 1 where the index matches the results of the hash.
When testing if an element is in the set, you follow the same hashing procedure and check if the bits are set to 1 or 0. This process is how a BloomFilter can guarantee an element does not exist. If the bits aren’t set, it’s simply impossible for the element to be in the set. However, a positive answer means the element is in the set or a hashing collision occurred. A more detaild description of a BloomFilter can be found here and a good tutorial on BloomFilters here. According to wikipedia, Google uses BloomFilters in BigTable to avoid disk lookups for non-existent items. Another interesting usage is using a BloomFilter to optimize a sql querry."

'via Blog this'