Bloom Filter - osvaldoandrade/ova-lib GitHub Wiki

Bloom Filter

Read this page when the real need is a cheap negative membership test before an exact lookup somewhere else.

Constructor and API

The constructor is:

bloom_filter *create_bloom_filter(int expected_elements, double false_positive_rate);

The public API is:

Field Meaning
add inserts one key represented as bytes plus length
might_contain returns true for “possibly present” and false for “definitely absent”
clear clears the filter state
current_fpp estimates the current false-positive probability
free frees filter storage

Construction Rules

expected_elements must be greater than 0.

false_positive_rate must satisfy 0 < p < 1.

If either rule is violated, construction returns NULL.

The implementation computes the bit count m and hash count k from the standard Bloom-filter formulas and allocates the bit array from those values.

Key Model

The API takes an opaque byte sequence plus length. It does not assume strings.

That means binary keys are valid input.

The shipped tests also verify that empty keys are supported:

bf->add(bf, NULL, 0) is valid

bf->might_contain(bf, NULL, 0) is valid

All empty keys are treated as the same logical key because the byte sequence length is 0

What the Result Means

false from bf->might_contain(bf, ...) means the key is definitely absent.

true means the key may be present. A later exact lookup in a map, set, tree, or trie may still say “not found.”

That is why the Bloom filter is a prefilter, not a replacement for exact storage.

current_fpp and Repeated Inserts

bf->current_fpp(bf) estimates the current false-positive probability from the current bit-array configuration and the count of add calls performed.

The current implementation increments that count on every add call. Repeated insertion of the same key therefore still increases the internal insert counter and can raise the reported estimate.

After bf->clear(bf), the bit array resets and current_fpp returns 0.0.

Ownership and Cleanup

The filter stores bits only. It never stores payload pointers and never frees caller payloads.

Destroy the filter with bf->free(bf).

Read Next

If a positive Bloom-filter result should lead to an exact structure, move to Maps, Sets, Trees, or Trie.