Consistent Hashing - rFronteddu/general_wiki GitHub Wiki
Distributed Hash Table (DHT) is one of the fundamental components used in distributed scalable systems. Hash Tables need a key, a value, and a hash function which maps the key to a location where the value is stored.
Suppose we are designing a distributed caching system. Given n cache servers, an intuitive hash function would be key % n. While simple and commonly used, it has two major drawbacks;
- It is NOT horizontally scalable: Whenever we add a new cache, all existing mappings are broken.
- It may NOT be load balanced: Hard to balance with non-uniformly distributed data.
Consistent Hashing minimize reorganization (only k/n keys will need to be remapped where k is the total number of keys and n is the total number of servers). In Consistent Hashing objects are mapped to the same host if possible. When a host is removed from the system, the objects on that host are shared by other hosts; when a host is added, it takes its share from a few hosts without touching others'.
Imagine that the values of the hashing functions are placed on a ring such that the values are wrapped around.
- Given a list of cache servers, hash them to integers in the range
- To map a key to a server:
- Hash it to a single integer
- Move clockwise on the ring until finding the first cache it encounters.
- That cache is the one that contains the key
To handle unbalanced data we add virtual replicas for caches. Instead of mapping each cache to a single point on the ring, we map it to multiple points on the ring, i.e. replicas. This way, each cache is associated with multiple portions of the ring. If the hash function mixes well, as the number of replicas increases, the keys will be more balanced.