Hash Table - rFronteddu/general_wiki GitHub Wiki

A data structure that organizes data using hash functions to support quick insertion and search.

The key idea is to use an hash function to map keys to buckets:

when we insert, the hash function will decide in which bucket the key should be assigned and sorted in
similarly. when we search, the hash function will direct us to the bucket to search into.

Multiple keys can be in the same bucket, that's why once we fight the bucket of interest we still have to search over the values in it.

The hash function choice will depend on the range of keys values and on the number of buckets.

Hash Function

An optimal hash is an open problem, the objective is to assign keys to buckets uniformly. Ideally you would want 1 bucket per key but generally it is a trade-off between the amount of buckets and the capacity of a bucket.

Collision Resolution

A collision resolution algorithm should solve:

how to organize values in the same bucket
what if too many values are assigned tot he same bucket
how to search for a target value in a bucket

If n is constant and small we can use an array, otherwise, we could use a height-balanced binary search tree.

Complexity

If there are m keys we can achieve space complexity O(m). Time complexity has a strong relationship with key design. Ideally, both insertion and search should be O(1) (assuming array in the bucket is small enough to be considered constant).

Worst case, the maximum bucket size can be n so complexity will be O(1) for insertion and O(n) for search.

Typical Design

Hashable key
Each Each bucket contains an array to store all the values in the same bucket initially
If too many values are in 1 bucket, these values will be maintained in a height-balanced binary search tree:

Avg O(1) for insertion and search
Worst O(1) for insertion, O(log n) for search

Typical uses: finding duplicates, aggregating results

Designing the key

Sometimes it is useful to design keys, for example to group anagrams you cannot just use the string as is

To design a key is to build a mapping relationship between the original information and the actual key used by the hash map. When you design a key you must guarantee that:

All values belonging to the same group will be mapped in the same group
Values which need to be mapped in different groups, will not be matched in the same group.

The primary difference between this and designing an hash function is that an hash function may not satisfy 2.

Tricks

When the order doesn't matter, you can use the sorted concatenation of elements as key
If you only care about the offset from the first value you can use it as a key
In a tree you may want to use the tree node as a key or more generally the serialization of the subtree
In a matrix, you can use the row and column index as key
In sudoku, you can combine r and c to indicate a block (ex 3x3 block of a 9x9 sudoku is identified by: c/3+r(r/3))
Some time you may want to aggregate diagonally i+j or anti diagonally (i-j)