Hash Maps(Codecademy) - marinakosova/master-the-coding-interview GitHub Wiki
Maps
Hash Tables and Hash Functions
Shortly about Hash Maps(from Codecademy)
Hash map: A key-value store that uses an array and a hashing function to save and retrieve values. Key: The identifier given to a value for later retrieval. Hash function: A function that takes some input and returns a number. Compression function: A function that transforms its inputs into some smaller range of possible outputs.
Recipe for saving to a hash table:
- Take the key and plug it into the hash function, getting the hash code.
- Modulo that hash code by the length of the underlying array, getting an array index.
- Check if the array at that index is empty, if so, save the value (and the key) there.
- If the array is full at that index continue to the next possible position depending on your collision strategy.
Recipe for retrieving from a hash table:
- Take the key and plug it into the hash function, getting the hash code.
- Modulo that hash code by the length of the underlying array, getting an array index.
- Check if the array at that index has contents, if so, check the key saved there.
- If the key matches the one you're looking for, return the value.
- If the keys don't match, continue to the next position depending on your collision strategy.
Intro to Hash Maps
Hash maps are data structures that serve as efficient key-value stores. They are capable of assigning and retrieving data in the fastest way possible. This is because the underlying data structure that hash maps use is an array.
A value is stored at an array index determined by plugging the key into a hash function. Because we always know exactly where to find values in a hash map, we have constant access to any of the values it contains.
This quick access to values makes a hash map a good choice of data structure whenever we need to store a lot of values but need fast look-up time.
const LinkedList = require('./LinkedList');
const Node = require('./Node');
class HashMap {
constructor(size = 0) {
this.hashmap = new Array(size)
.fill(null)
.map(() => new LinkedList());
}
hash(key) {
let hashCode = 0;
for (let i = 0; i < key.length; i++) {
hashCode += hashCode + key.charCodeAt(i);
}
return hashCode % this.hashmap.length;
}
assign(key, value) {
const arrayIndex = this.hash(key);
const linkedList = this.hashmap[arrayIndex];
if (linkedList.head === null) {
linkedList.addToHead({ key, value });
return;
}
let current = linkedList.head;
while (current) {
if (current.data.key === key) {
current.data = { key, value };
}
if (!current.next) {
current.next = new Node({ key, value });
break;
}
current = current.next;
}
}
retrieve(key) {
const arrayIndex = this.hash(key);
let current = this.hashmap[arrayIndex].head;
while(current) {
if(current.data.key === key) {
return current.data.value;
}
current = current.next;
}
return null;
}
}
module.exports = HashMap;const LinkedList = require('./LinkedList');
const Node = require('./Node');
class HashMap {
constructor(size = 0) {
this.hashmap = new Array(size)
.fill(null)
.map(() => new LinkedList());
}
hash(key) {
let hashCode = 0;
for (let i = 0; i < key.length; i++) {
hashCode += hashCode + key.charCodeAt(i);
}
return hashCode % this.hashmap.length;
}
assign(key, value) {
const arrayIndex = this.hash(key);
const linkedList = this.hashmap[arrayIndex];
console.log(`Storing ${value} at index ${arrayIndex}`);
if (linkedList.head === null) {
linkedList.addToHead({ key, value });
return;
}
let current = linkedList.head;
while (current) {
if (current.data.key === key) {
current.data = { key, value };
}
if (!current.next) {
current.next = new Node({ key, value });
break;
}
current = current.next;
}
}
retrieve(key) {
const arrayIndex = this.hash(key);
let current = this.hashmap[arrayIndex].head;
while (current) {
if (current.data.key === key) {
console.log(`\nRetrieving ${current.data.value} from index ${arrayIndex}`);
return current.data.value;
}
current = current.next;
}
return null;
}
}
module.exports = HashMap;
const birdCensus = new HashMap(16);
birdCensus.assign('mandarin duck', 'Central Park Pond');
birdCensus.assign('monk parakeet', 'Brooklyn College');
birdCensus.assign('horned owl', 'Pelham Bay Park');
console.log(birdCensus.retrieve('mandarin duck'));
console.log(birdCensus.retrieve('monk parakeet'));
console.log(birdCensus.retrieve('horned owl'));
Long read (from Codecademy)
Being a map means relating two pieces of information, but a map also has one further requirement.
In order for a relationship to be a map, every key that is used can only be the key to a single value. There doesn’t need to be a value for every possible key, there just can’t be more than one value for a given key.
Hash Map Methodology
In the case of a map between two things, we don’t really care about the exact sequence of the data. We only care that a given input, when fed into the map, gives the accurate output. Developing a data structure that performs this is tricky because computers care much more about values than relationships. A computer doesn’t really care to memorize the astrological signs of all of our friends, so we need to trick the computer into caring.
We perform this trick using a structure that our computer is already familiar with, an array. An array uses indices to keep track of values in memory, so we’ll need a way of turning each key in our map to an index in our array.
Imagine we want our computer to remember that our good friend Joan McNeil is a Libra. We take her name, and we turn that name into a number. Let’s say that the number we correspond with the name “Joan McNeil” is 17. We find the 17th index of the array we’re using to store our map and save the value (Libra) there.
How did we get 17, though? We use a special function that turns data like the string “Joan McNeil” into a number. This function is called a hashing function, or a hash function. Hashing functions are useful in many domains, but for our data structure the most important aspect is that a hashing function returns an array index as output.
A hash function takes a string (or some other type of data) as input and returns an array index as output. In order for it to return an array index, our hash map implementation needs to know the size of our array. If the array we are saving values into only has 4 slots, our hash map’s hashing method should not return an index bigger than that.
It is actually a defining feature of all hash functions that they greatly reduce any possible inputs (any string you can imagine) into a much smaller range of potential outputs (an integer smaller than the size of our array). For this reason, hash functions are also known as compression functions.
Much like an image that has been shrunk to a lower resolution, the output of a hash function contains less data than the input. Because of this, hashing is not a reversible process. With just a hash value it is impossible to know for sure the key that was plugged into the hashing function.
How to Write a Hash Function
A very common hash function for integers, for example, is to perform the modular operation on it to make sure it’s less than the size of the underlying array. If the integer is already small enough to be an index into the array, there’s nothing to be done.
Many hash functions implementations for strings take advantage of the fact that strings are represented internally as numerical data. Frequently a hash function will perform a shift of the data bitwise, which is computationally simple for a computer to do but also can predictably assign numbers to strings.
Basic Hash Maps
Now that we have all of the main ingredients for a hash map, let’s put them all together. First, we need some sort of associated data that we’re hoping to preserve. Second, we need an array of a fixed size to insert our data into. Lastly, we need a hash function that translates the keys of our array into indexes into the array. The storage location at the index given by a hash is called the hash bucket.
Collisions
Remember hash functions are designed to compress data from a large number of possible keys to a much smaller range. Because of this compression, it’s likely that our hash function might produce the same hash for two different keys. This is known as a hash collision.
Separate Chaining
A hash map with a linked list separate chaining strategy follows a similar flow to the hash maps that have been described so far. The user wants to assign a value to a key in the map. The hash map takes the key and transforms it into a hash code. The hash code is then converted into an index to an array using the modulus operation. If the value of the array at the hash function’s returned index is empty, a new linked list is created with the value as the first element of the linked list. If a linked list already exists at the address, append the value to the linked list given.
Open Addressing: Linear Probing
Another popular hash collision strategy is called open addressing. In open addressing we stick to the array as our underlying data structure, but we continue looking for a new index to save our data if the first result of our hash function has a different key’s data.
A common open method of open addressing is called probing. Probing means continuing to find new array indices in a fixed sequence until an empty index is found.