## Effectively Explain On a Mind Map

First Draft on the algorithm bloom filter

### Read More on Bloom Filter

- Reference Wikipedia

### False Positive

Bloom filters are a probabilistic data structure that can efficiently test whether an element is a member of a set. They are widely used in applications such as web caching, spell checking, and network routing. However, bloom filters have a drawback: they can produce false positives. In other words they can sometimes indicate that an element is in the set when it is not. We will analyze further the factors that affect the false positive rate of bloom filters and how to minimize it.

The false positive rate of a bloom filter depends on three parameters: the size of the filter (m), the number of hash functions (k), and the number of elements inserted into the filter (n). The optimal value of k that minimizes the false positive rate is given by

$\mathrm{k}\mathrm{=}\mathrm{(}\frac{\mathrm{m}}{\mathrm{n}}\mathrm{)}ln2$The false positive rate can then be approximated by

${\mathrm{p}\mathrm{=}\left(1\u2013{e}^{\u2013\frac{\mathrm{kn}}{m}}\right)}^{k}$To illustrate this formula, let’s consider an example. Suppose we want to use a bloom filter to store a dictionary of 100,000 words, and we want to achieve a false positive rate of 1%. How large should the filter be and how many hash functions should we use? Using the formula above, we can solve for m and k:

$\mathrm{m}\mathrm{=}\u2013\frac{n\mathrm{ln}p}{{\left(\mathrm{ln}2\right)}^{2}}$$\mathrm{k}\mathrm{=}\u2013\frac{2\mathrm{ln}p}{\mathrm{ln2}}$

Plugging in n = 100,000 and p = 0.01, we get m = 958,505 bits and k = 6.88 hash functions. Rounding up k to the nearest integer, we get k = 7 hash functions. Therefore, we need a bloom filter of size 958,505 bits and 7 hash functions to achieve a false positive rate of 1% for 100,000 words.

Of course, this is only an approximation and the actual false positive rate may vary depending on the distribution of the elements and the hash functions used. Hash collisions may occur. To measure the actual false positive rate, we can perform an empirical test by inserting the words into the filter and then querying it with random strings that are not in the dictionary. The ratio of false positives against the total queries gives us an estimate of the false positive rate.

In conclusion, bloom filters are a useful data structure that can quickly test the presence of an element into a set with a small space overhead. However, they also have a trade-off between space and accuracy: the smaller the filter, the higher the false positive rate. By choosing appropriate parameters for the filter size and the number of hash functions, we can optimize the false positive rate for our application. Notice that if the rate could be improved it could never be zero(0).