Chained Hash Table

A chained hash table data structure uses hashing with chaining to store data as an array, t, of lists.

An integer, n, keeps track of the total number of items in all lists (see Fig. 1).

Fig. 1: An example of a chained hash table with n = 14 and t.length = 16.

The hash value of a data item x, denoted hash(x), is a value in the range {0,...,t.length-1}.

All items with hash value i are stored in the list at t[i].

To ensure that lists don't get too long, we maintain the invariant n≤t.length so that the average number of elements stored in one of these lists is n/t.length≤1.
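The layout described above can be sketched in Java as follows. This is a minimal illustration, not the repository's implementation: the class name ChainedTableSketch, the doubling resize() helper, and the floorMod placeholder hash are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical. */
class ChainedTableSketch {
    private List<Integer>[] t;   // array of lists (the chains)
    private int n;               // total number of items in all lists

    @SuppressWarnings("unchecked")
    ChainedTableSketch() {
        t = new List[2];
        for (int i = 0; i < t.length; i++) t[i] = new ArrayList<>();
    }

    // Placeholder hash (an assumption): any function into {0,...,t.length-1} fits here.
    private int hash(int x) {
        return Math.floorMod(x, t.length);
    }

    boolean find(int x) {
        return t[hash(x)].contains(x);
    }

    boolean add(int x) {
        if (find(x)) return false;          // no duplicates in this sketch
        if (n + 1 > t.length) resize();     // maintain the invariant n <= t.length
        t[hash(x)].add(x);
        n++;
        return true;
    }

    boolean remove(int x) {
        // Integer.valueOf selects remove(Object), not remove(int index)
        boolean removed = t[hash(x)].remove(Integer.valueOf(x));
        if (removed) n--;
        return removed;
    }

    int size() { return n; }

    int capacity() { return t.length; }

    // Assumed growth policy: double the array and rehash every item.
    @SuppressWarnings("unchecked")
    private void resize() {
        List<Integer>[] old = t;
        t = new List[2 * old.length];
        for (int i = 0; i < t.length; i++) t[i] = new ArrayList<>();
        for (List<Integer> chain : old)
            for (int y : chain) t[hash(y)].add(y);
    }
}
```

Because add(x) resizes before inserting whenever n + 1 would exceed t.length, the invariant n ≤ t.length holds after every operation in this sketch.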

Multiplicative Hashing

Multiplicative hashing is an efficient method of generating hash values based on modular arithmetic and integer division. It uses the div operator, which calculates the integral part of the quotient, while discarding the remainder.

Formally, for any integers a ≥ 0 and b ≥ 1, a div b = ⌊a/b⌋.

In multiplicative hashing, we use a hash table of size 2^d for some integer d (called the dimension). The formula for hashing an integer x∈{0,...,2^w-1} is:

hash(x) = ((z·x) mod 2^w) div 2^{w-d}

Here, z is a randomly chosen odd integer in {1,...,2^w-1}. This hash function can be realized very efficiently by observing that, by default, operations on integers are already done modulo 2^w, where w is the number of bits in an integer (see Fig. 2). Furthermore, integer division by 2^{w-d} is equivalent to dropping the rightmost w-d bits in a binary representation.
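For w = 32, the formula maps onto Java's int arithmetic quite directly. The sketch below is illustrative only: the class and method names are hypothetical, and hashExplicit exists purely to spell out the mod and div steps so the shortcut can be compared against them.

```java
/** Sketch of multiplicative hashing for w = 32; names are hypothetical. */
final class MultHashSketch {
    static final int W = 32;

    /** Assumes z is odd and 1 <= d <= 31. */
    static int hash(int x, int z, int d) {
        // int multiplication wraps around, so z * x is already taken mod 2^32;
        // the unsigned shift drops the rightmost w-d bits, i.e. div 2^(w-d).
        return (z * x) >>> (W - d);
    }

    /** The same formula with explicit mod and div on unsigned values held in longs. */
    static int hashExplicit(int x, int z, int d) {
        long ux = x & 0xFFFFFFFFL;            // x as an unsigned 32-bit value
        long uz = z & 0xFFFFFFFFL;            // z as an unsigned 32-bit value
        long prod = (ux * uz) & 0xFFFFFFFFL;  // (z*x) mod 2^32
        return (int) (prod / (1L << (W - d))); // div 2^(w-d)
    }

    public static void main(String[] args) {
        int z = (int) 4102541685L;            // an arbitrary odd multiplier
        System.out.println(hash(42, z, 8));
        System.out.println(hashExplicit(42, z, 8));
    }
}
```

In a real table z would be drawn at random once per table (and again after each resize, if the design calls for it); the fixed z in main is only there to make the example repeatable.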

Fig. 2: The operation of the multiplicative hash function with w = 32 and d = 8.

The following lemma (CHT.1) shows that multiplicative hashing does a good job of avoiding collisions:

Lemma CHT.1:

Let x, y be any two values in {0,...,2^w-1} with x ≠ y. Then Pr{hash(x) = hash(y)} ≤ 2/2^d.
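For a small word size the bound in Lemma CHT.1 can be checked exhaustively. The sketch below is a sanity check rather than a proof; the class name, the method names, and the choice of w = 8, d = 3 are assumptions made for the example. For every pair x ≠ y it counts the odd multipliers z that cause a collision. Since there are 2^{w-1} odd values of z, the lemma predicts at most 2^{w-1}·2/2^d = 2^{w-d} colliding multipliers per pair.

```java
/** Exhaustive check of the collision bound for small w; names are hypothetical. */
final class CollisionBoundCheck {

    /** Multiplicative hash on w-bit words (intended for w <= 15 so the loops stay small). */
    static int hash(int x, int z, int w, int d) {
        int mask = (1 << w) - 1;              // keeps the low w bits, i.e. mod 2^w
        return ((z * x) & mask) >>> (w - d);  // div 2^(w-d)
    }

    /** Largest number of odd z in {1,...,2^w-1} that collide, over all pairs x != y. */
    static int worstCollisionCount(int w, int d) {
        int size = 1 << w;
        int worst = 0;
        for (int x = 0; x < size; x++) {
            for (int y = x + 1; y < size; y++) {
                int count = 0;
                for (int z = 1; z < size; z += 2) {
                    if (hash(x, z, w, d) == hash(y, z, w, d)) count++;
                }
                worst = Math.max(worst, count);
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        int w = 8, d = 3;
        System.out.println("worst collision count: " + worstCollisionCount(w, d));
        System.out.println("bound 2^(w-d):         " + (1 << (w - d)));
    }
}
```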

With Lemma CHT.1, the performance of remove(x) and find(x) is easy to analyse:

Lemma CHT.2:

For any data value x, the expected length of the list t[hash(x)] is at most n_x+2, where n_x is the number of occurrences of x in the hash table.

Proofs

Lemma CHT.2:

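Let 𝑆 be the (multi-)set of elements stored in the hash table that are not equal to x. For an element y∈𝑆, define the indicator variable

I_y = 1 if hash(x) = hash(y), and I_y = 0 otherwise,

and notice that, by Lemma CHT.1, E[I_y] ≤ 2/2^d = 2/t.length. The expected length of the list t[hash(x)] is given by

E[t[hash(x)].size()] = E[n_x + Σ_{y∈𝑆} I_y]
= n_x + Σ_{y∈𝑆} E[I_y]
≤ n_x + Σ_{y∈𝑆} 2/t.length
≤ n_x + Σ_{y∈𝑆} 2/n (since n ≤ t.length)
= n_x + (n - n_x)·2/n
≤ n_x + 2,

as required.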

In order to prove Lemma CHT.1 we need a result from number theory. In the following proof we use the notation (b_r,...,b₀)₂ to denote Σ_{i=0}^{r} b_i·2^i, where each b_i is a bit, either 0 or 1. In other words, (b_r,...,b₀)₂ is the integer whose binary representation is given by b_r,...,b₀. We use ⋆ to denote a bit of unknown value.

Lemma CHT.3:

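Let 𝑆 be the set of odd integers in {1,...,2^w-1}; let q and i be any two elements in 𝑆. Then there is exactly one value z∈𝑆 such that zq mod 2^w = i.

Since the number of choices for z and i is the same, it is sufficient to prove that there is at most one value z∈𝑆 that satisfies zq mod 2^w = i. Suppose, for the sake of contradiction, that there are two such values z and z', with z > z'. Then

zq mod 2^w = z'q mod 2^w = i

so

(z-z')q mod 2^w = 0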

But this means that

(z-z')q = k·2^w (CHT.1.1)

for some integer k. Thinking in terms of binary numbers, we have

(z-z')q = k·(1,0,...,0)₂, where the 1 is followed by w 0's,

so that the w trailing bits in the binary representation of (z-z')q are all 0's. Furthermore, k≠0, since q≠0 and z-z'≠0. Since q is odd, it has no trailing 0's in its binary representation:

q = (⋆,...,⋆,1)₂.

Since |z-z'| < 2^w, z-z' has fewer than w trailing 0's in its binary representation:

z-z' = (⋆,...,⋆,1,0,...,0)₂, with fewer than w trailing 0's.

Therefore, the product (z-z')q has fewer than w trailing 0's in its binary representation:

(z-z')q = (⋆,...,⋆,1,0,...,0)₂, with fewer than w trailing 0's.

Therefore (z-z')q cannot satisfy (z-z')q = k2^w, yielding a contradiction and completing the proof.
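Lemma CHT.3 can also be confirmed by brute force for a small word size. The sketch below is a check under assumed names (OddBijectionCheck, isBijectionOnOdds), not part of the proof. For a given q it verifies that z ↦ zq mod 2^w sends the odd integers to distinct odd integers. Because the domain and the target set both contain 2^{w-1} values, distinctness implies that every odd i is hit exactly once.

```java
/** Brute-force check that z -> z*q mod 2^w permutes the odd w-bit integers; names are hypothetical. */
final class OddBijectionCheck {

    /** Returns true if the map z -> (z*q) mod 2^w is one-to-one from odd z onto odd values. Intended for w <= 15. */
    static boolean isBijectionOnOdds(int q, int w) {
        int size = 1 << w;
        int mask = size - 1;
        boolean[] seen = new boolean[size];
        for (int z = 1; z < size; z += 2) {
            int i = (z * q) & mask;           // zq mod 2^w
            if ((i & 1) == 0) return false;   // image must be odd
            if (seen[i]) return false;        // two values of z produced the same i
            seen[i] = true;
        }
        return true;
    }

    public static void main(String[] args) {
        int w = 10;
        boolean allOk = true;
        for (int q = 1; q < (1 << w); q += 2) {
            allOk &= isBijectionOnOdds(q, w);
        }
        System.out.println("holds for every odd q with w = " + w + ": " + allOk);
    }
}
```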

The utility of Lemma CHT.3 comes from the following observation: if z is chosen uniformly at random from 𝑆, then for any fixed q∈𝑆, zq mod 2^w is uniformly distributed over 𝑆. In the following proof, it helps to think of the binary representation of z, which consists of w-1 random bits followed by a 1.

Lemma CHT.1:

First we note that the condition hash(x)=hash(y) is equivalent to the statement:

"the highest-order d bits of zx mod 2^w and the highest-order d bits of zy mod 2^w are the same."

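A necessary condition of that statement is that the highest-order d bits in the binary representation of z(x-y) mod 2^w are either all 0's or all 1's. That is,

z(x-y) mod 2^w = (0,...,0,⋆,...,⋆)₂, with d leading 0's followed by w-d unknown bits, (CHT.1.2)

when zx mod 2^w > zy mod 2^w or

z(x-y) mod 2^w = (1,...,1,⋆,...,⋆)₂, with d leading 1's followed by w-d unknown bits, (CHT.1.3)

when zx mod 2^w < zy mod 2^w. Therefore, we only have to bound the probability that z(x-y) mod 2^w looks like CHT.1.2 or CHT.1.3.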

Let q be the unique odd integer such that (x-y) mod 2^w = q·2^r for some integer r ≥ 0. By Lemma CHT.3, the binary representation of zq mod 2^w has w-1 random bits, followed by a 1:

zq mod 2^w = (b_{w-1},...,b_1,1)₂

Therefore, the binary representation of z(x-y) mod 2^w = zq·2^r mod 2^w has w-r-1 random bits, followed by a 1, followed by r 0's:

z(x-y) mod 2^w = zq·2^r mod 2^w = (b_{w-r-1},...,b_1,1,0,...,0)₂, with r trailing 0's.

We can now finish the proof: If r > w-d, then the d higher-order bits of z(x-y) mod 2^w contain both 0's and 1's, so the probability that z(x-y) mod 2^w looks like CHT.1.2 or CHT.1.3 is 0. If r = w-d, then the probability of looking like CHT.1.2 is 0, but the probability of looking like CHT.1.3 is 1/2^{d-1} = 2/2^d (since we must have b_1,...,b_{d-1} = 1,...,1). If r < w-d, then we must have b_{w-r-1},...,b_{w-r-d} = 0,...,0 or b_{w-r-1},...,b_{w-r-d} = 1,...,1. The probability of each of these cases is 1/2^d and they are mutually exclusive, so the probability of either of these cases is 2/2^d. This completes the proof.

Summary

The following theorem summarizes the performance of a ChainedHashTable data structure:

Theorem CHTT.1:

A ChainedHashTable implements the USet interface. Ignoring the cost of calls to grow(), a ChainedHashTable supports the operations add(x), remove(x), and find(x) in O(1) expected time per operation.

Furthermore, beginning with an empty ChainedHashTable, any sequence of m add(x) and remove(x) operations results in a total of O(m) time spent during all calls to grow().
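The O(m) bound on time spent growing can be illustrated with a small simulation. The sketch below rests on an assumption about the growth policy, namely that the array doubles whenever an add would push n above the current length and that each doubling rehashes all n stored items. It considers add operations only, and the names are hypothetical. It counts how many items are moved in total.

```java
/** Counts rehash work under an assumed doubling policy; names are hypothetical. */
final class GrowCostSketch {

    /** Simulates m add operations starting from an empty table of length 1 and returns the total number of items moved while growing. */
    static long itemsMovedByGrow(int m) {
        long moved = 0;
        int capacity = 1;   // stands in for t.length
        int n = 0;          // number of stored items
        for (int i = 0; i < m; i++) {
            if (n + 1 > capacity) {
                moved += n;      // every stored item is rehashed into the new array
                capacity *= 2;   // assumed policy: double the array
            }
            n++;
        }
        return moved;
    }

    public static void main(String[] args) {
        for (int m : new int[] { 10, 1000, 1_000_000 }) {
            System.out.println(m + " adds -> " + itemsMovedByGrow(m) + " items moved");
        }
    }
}
```

Under this policy the moves form the geometric series 1 + 2 + 4 + ..., which stays below 2m, so the total growth cost is linear in the number of operations.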

Chained Hash Table - WilfullMurder/DataStructures-Java GitHub Wiki
