Cheap and Secure Web Hosting Provider : See Now

Heuristic analysis of Bloom filters

, , No Comments
Problem Detail: 

I am currently watching a lecture on Bloom filters, and the professor is doing a heuristic analysis of Bloom filters.

It's all based on the following assumption:

All $h_{i}(x)$'s are uniformly random and independent (across different $i$'s and $x$'s)

Setup:

  • Bloom filter of length $n$ bits.
  • Data set $S$ is inserted into the Bloom filters.

The professors claims that for each bit of array $A$, the probability that it has been set to 1 is (under above assumption, and after data set has been inserted): $1 - (1 - 1/n)^{k|S|}$, where $k$ is the number of hash functions.

I am trying to understand why that's the probability.

He explains that as each of the $|S|$ objects gets inserted (0 or 1), there are $k$ "darts" thrown uniformly at random and independent from each other thrown into the array, so that gives me the intuition of why $k$ is being multiplied by $|S|$. He then says the probability that a given "dart" is one of the $n$ bits is $1/n$. Since there are a total of $k|S|$ "darts" thrown at the end, it makes sense that after the whole set $S$ has been inserted, the probability that a given bit has been hit is $(1/n)^{k|S|}$, and the probability that it hasn't been hit is $(1-1/n)^{k|S|}$. But I don't understand why all that is being subtracted from $1$ at the end to end up being $1 - (1 - 1/n)^{k|S|}$, so clearly I'm missing something...

Can somebody explain this to me?

Asked By : FrostyStraw

Answered By : Yuval Filmus

A bit is set to 1 if it has been hit. It has $k|S|$ chances of being hit, and each time it is hit with probability $1/n$. Using a union bound, the probability that a bit is hit is at most $k|S|/n$.

We can improve on this bound by calculating instead the probability that a bit is 0, that is, that it is not hit. For a bit not to be hit, it has to be missed each time. Since it has $k|S|$ opportunities to be hit, and each time it is missed with probability $1-1/n$, in total a bit is missed with probability $(1-1/n)^{k|S|}$. Thus a bit is not missed with probability $1-(1-1/n)^{k|S|}$.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/66184

3200 people like this

 Download Related Notes/Documents

0 comments:

Post a Comment

Let us know your responses and feedback