Bloom Filter - osvaldoandrade/ova-lib GitHub Wiki
Bloom Filter
Read this page when the real need is a cheap negative membership test before an exact lookup somewhere else.
Constructor and API
The constructor is:
bloom_filter *create_bloom_filter(int expected_elements, double false_positive_rate);
The public API is:
| Field | Meaning |
|---|---|
add |
inserts one key represented as bytes plus length |
might_contain |
returns true for “possibly present” and false for “definitely absent” |
clear |
clears the filter state |
current_fpp |
estimates the current false-positive probability |
free |
frees filter storage |
Construction Rules
expected_elements must be greater than 0.
false_positive_rate must satisfy 0 < p < 1.
If either rule is violated, construction returns NULL.
The implementation computes the bit count m and hash count k from the standard Bloom-filter formulas and allocates the bit array from those values.
Key Model
The API takes an opaque byte sequence plus length. It does not assume strings.
That means binary keys are valid input.
The shipped tests also verify that empty keys are supported:
bf->add(bf, NULL, 0) is valid
bf->might_contain(bf, NULL, 0) is valid
All empty keys are treated as the same logical key because the byte sequence length is 0
What the Result Means
false from bf->might_contain(bf, ...) means the key is definitely absent.
true means the key may be present. A later exact lookup in a map, set, tree, or trie may still say “not found.”
That is why the Bloom filter is a prefilter, not a replacement for exact storage.
current_fpp and Repeated Inserts
bf->current_fpp(bf) estimates the current false-positive probability from the current bit-array configuration and the count of add calls performed.
The current implementation increments that count on every add call. Repeated insertion of the same key therefore still increases the internal insert counter and can raise the reported estimate.
After bf->clear(bf), the bit array resets and current_fpp returns 0.0.
Ownership and Cleanup
The filter stores bits only. It never stores payload pointers and never frees caller payloads.
Destroy the filter with bf->free(bf).
Read Next
If a positive Bloom-filter result should lead to an exact structure, move to Maps, Sets, Trees, or Trie.