Thursday, April 05, 2012

High Scalability - High Scalability - Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory

High Scalability - High Scalability - Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory: "Linear Probabilistic Counter
The Linear Probabilistic Counter is space efficient and allows the implementer to specify the desired level of accuracy. This algorithm is useful when space efficiency is important but you need to be able to control the error in your results. This algorithm works in a two-step process. The first step assigns a bitmap in memory initialized to all zeros. A hash function is then applied to the each entry in the input data. The result of the hash function maps the entry to a bit in the bitmap, and that bit is set to 1. The second step the algorithm counts the number of empty bits and uses that number as input to the following equation to get the estimate."

'via Blog this'