Cheap and Secure Web Hosting Provider : See Now

# Heuristic analysis of Bloom filters

, ,
Problem Detail:

I am currently watching a lecture on Bloom filters, and the professor is doing a heuristic analysis of Bloom filters.

It's all based on the following assumption:

All \$h_{i}(x)\$'s are uniformly random and independent (across different \$i\$'s and \$x\$'s)

Setup:

• Bloom filter of length \$n\$ bits.
• Data set \$S\$ is inserted into the Bloom filters.

The professors claims that for each bit of array \$A\$, the probability that it has been set to 1 is (under above assumption, and after data set has been inserted): \$1 - (1 - 1/n)^{k|S|}\$, where \$k\$ is the number of hash functions.

I am trying to understand why that's the probability.

He explains that as each of the \$|S|\$ objects gets inserted (0 or 1), there are \$k\$ "darts" thrown uniformly at random and independent from each other thrown into the array, so that gives me the intuition of why \$k\$ is being multiplied by \$|S|\$. He then says the probability that a given "dart" is one of the \$n\$ bits is \$1/n\$. Since there are a total of \$k|S|\$ "darts" thrown at the end, it makes sense that after the whole set \$S\$ has been inserted, the probability that a given bit has been hit is \$(1/n)^{k|S|}\$, and the probability that it hasn't been hit is \$(1-1/n)^{k|S|}\$. But I don't understand why all that is being subtracted from \$1\$ at the end to end up being \$1 - (1 - 1/n)^{k|S|}\$, so clearly I'm missing something...

Can somebody explain this to me?

#### Answered By : Yuval Filmus

A bit is set to 1 if it has been hit. It has \$k|S|\$ chances of being hit, and each time it is hit with probability \$1/n\$. Using a union bound, the probability that a bit is hit is at most \$k|S|/n\$.

We can improve on this bound by calculating instead the probability that a bit is 0, that is, that it is not hit. For a bit not to be hit, it has to be missed each time. Since it has \$k|S|\$ opportunities to be hit, and each time it is missed with probability \$1-1/n\$, in total a bit is missed with probability \$(1-1/n)^{k|S|}\$. Thus a bit is not missed with probability \$1-(1-1/n)^{k|S|}\$.