Hash Table - rFronteddu/general_wiki GitHub Wiki
A data structure that organizes data using hash functions to support quick insertion and search.
The key idea is to use an hash function to map keys to buckets:
- when we insert, the hash function will decide in which bucket the key should be assigned and sorted in
- similarly. when we search, the hash function will direct us to the bucket to search into.
Multiple keys can be in the same bucket, that's why once we fight the bucket of interest we still have to search over the values in it.
The hash function choice will depend on the range of keys values and on the number of buckets.
Hash Function
An optimal hash is an open problem, the objective is to assign keys to buckets uniformly. Ideally you would want 1 bucket per key but generally it is a trade-off between the amount of buckets and the capacity of a bucket.
Collision Resolution
A collision resolution algorithm should solve:
- how to organize values in the same bucket
- what if too many values are assigned tot he same bucket
- how to search for a target value in a bucket
If n is constant and small we can use an array, otherwise, we could use a height-balanced binary search tree.
Complexity
If there are m keys we can achieve space complexity O(m). Time complexity has a strong relationship with key design. Ideally, both insertion and search should be O(1) (assuming array in the bucket is small enough to be considered constant).
Worst case, the maximum bucket size can be n so complexity will be O(1) for insertion and O(n) for search.
Typical Design
- Hashable key
- Each Each bucket contains an array to store all the values in the same bucket initially
- If too many values are in 1 bucket, these values will be maintained in a height-balanced binary search tree:
=>
- Avg O(1) for insertion and search
- Worst O(1) for insertion, O(log n) for search
Typical uses: finding duplicates, aggregating results
Designing the key
Sometimes it is useful to design keys, for example to group anagrams you cannot just use the string as is
To design a key is to build a mapping relationship between the original information and the actual key used by the hash map. When you design a key you must guarantee that:
- All values belonging to the same group will be mapped in the same group
- Values which need to be mapped in different groups, will not be matched in the same group.
The primary difference between this and designing an hash function is that an hash function may not satisfy 2.
Tricks
- When the order doesn't matter, you can use the sorted concatenation of elements as key
- If you only care about the offset from the first value you can use it as a key
- In a tree you may want to use the tree node as a key or more generally the serialization of the subtree
- In a matrix, you can use the row and column index as key
- In sudoku, you can combine r and c to indicate a block (ex 3x3 block of a 9x9 sudoku is identified by: c/3+r(r/3))
- Some time you may want to aggregate diagonally i+j or anti diagonally (i-j)