Taking a Look at Non Binary Trees - Eishaaya/GMRDataStructuresTest GitHub Wiki

So why Non-Binary Trees?

In this chapter we will explain B-Trees and 2-3-4 trees so that you will have a basis of understanding when moving to the next chapter. The next chapter introduces red-black trees, which are self balancing trees that are complex in design, but very efficient when it comes to insertion and deletion. A red-black tree is actually a binary representation of a 2-3-4 tree. So by understanding a 2-3-4 tree, you can understand what is happening in a red-black tree. In addition, this chapter will expose you to other tree concepts that aren't used in binary trees. We do not expect you to implement a B-Tree or a 2-3-4 Tree directly, but we do want you to understand them because we will be tying these concepts together in the next chapter.

B-Trees & 2-3-4 Trees

If you generalize a binary search tree node to allow for multiple keys and keep the tree in perfect balance, you end up with a 2-3-4 tree. A perfect balance means that every path from root to leaf has the same length. Lets see how 2-3-4 trees get their name.

A 2-3-4 tree is also restricted to allow only 1, 2, or 3 keys per node:
- 2-node: one key, two children
- 3-node: two keys, three children
- 4-node: three keys, four children

A B-Tree is a self-balancing data structure that optimizes for reading and writing large blocks of data. If built correctly, a B-Tree will contain perfect balance because every path from root to leaf will have the same length. In a B-Tree, a node can contain more than one value and have more than two children. The number of values within a node determines the amount of children that node can have. A B-Tree is a 2-3-4 tree of order 4. In a B-tree, the order is defined as the maximum number of children a node can have, in a 2-3-4 tree the maximum is 4.

To explain visually, here are some examples of different nodes you can encounter in a B-Tree (the # of the node represents how many children that node can have):

2-Node

A 2-node is the standard type of node you encounter in a regular binary tree. The node has 1 value and 2 children.

3-Node

A 3-node contains 2 values and 3 children. The left child will represent values that are smaller than both values in the 3-node, the center child represents values that are between the two values in the 3-node, and the right child represents values that are larger than both values in the 3-node.

4-Node

A 4-node contains 3 values and 4 children. The first child is smaller than all 3 values in the 4-node. The second child is larger than the first value but smaller then the second and third value in the 4-node. The third child is larger than the first and second value but smaller than the third value in the 4-node. Finally, the fourth child is larger than all 3 values in the 4-node.

In B-trees, the concept continues as nodes have 'n' values and 'n+1' children as the tree requires. The B-Tree is not restricted to having a specific type of node either. It can contain 2-nodes, 3-nodes, 4-nodes, and more in the same tree. Though larger nodes start to become less efficient. In that case, restrictions can be made on the tree in order to create a B-tree of order 4; or a tree who's largest node possible is a 4-node. This is also known as a 2-3-4 Tree since only 2-nodes, 3-nodes, and 4-nodes can exist within the tree. From now on, we will refer to 2-3-4 trees when explaining functionality.

Search

For searching a 2-3-4 tree we can still use a standard binary search to find a specific value within the tree. We start at the top and compare the search value with values in the node then travel down the correct child path if the value is not found.

Ex: Search for L

Inserting

Upon insert, new values are added to the bottom of the tree by adding the new value to an existing leaf node. Thus transforming the type of node it becomes.

Adding to a 2-node would convert it into a 3-node:

Adding to a 3-node would convert it into a 4-node:

But adding to a 4-node is not possible due to our 2-3-4 limitation. To solve this, we split the 4-node by taking the center value and giving it to the parent and splitting the other 2 values into 2-nodes. This adds room at the leaf level for our new value to be stored. You can think of a B-Tree/2-3-4 tree as growing upwards rather than a binary tree which grows downwards.

Problem: This doesn't work if the parent is also a 4-node. Giving the center value to the parent would create a 5-node.

Solutions:

Bottom-up
- Use the same function to split the parent
- Continue up the tree while necessary
- Insert value on the leaf node
Top-down (recommended)
- Split 4-nodes on the way down the tree
- Insert value on the leaf node
- Ensures that there are no upper 4-nodes should the leaf node need to split.

Removing

When deleting from a 2-3-4 tree we must be careful as to not remove a node from the tree. Removing a node from the tree would cause the tree to become invalid. This is because 2-3-4 tree must always be a perfect balance tree (all paths from root to leaf have the same length) and removing any node completely would result in a height change for that sub-tree. Deleting from a leaf 3-node or leaf 4-node is simple, we just remove the value from the node. The situation to look out for is when deleting a value that exists in a 2-node.

One method to remove safely from a 2-3-4 tree is to transform the tree on the way down the search path to make sure that the current node is not a 2-node. So that when we find the value to delete, it is part of a 3-node or a 4-node. To do this we can either combine siblings together (if the sibling is also a 2-node) or borrow from a sibling (if the sibling is a 3-node or a 4-node). Since it is a B-Tree, we always have a sibling to work with.

If removing from a 3-node or a 4-node that is a leaf node, we can just remove the value from the node and the story is done. If removing from an internal 3-node or 4-node (nodes with children), we consider a similar method to the binary tree delete in which the current node will obtain a value from one of our sub-tree's greatest node or smallest node (depending which is available). This is done to keep the current node at its current size in order to keep the structure of the tree. We can then recursively delete for the leaf node we borrowed from. If we cannot obtain a value from one of our sub tree's (in the case that our children are 2-nodes) we can combine siblings, effectively bringing the value we want to delete lower down the tree. In the end, only leaf nodes should be affected.

If you would like to toy around with B-Tree's, we suggest visiting this visualizer: https://www.cs.usfca.edu/~galles/visualization/BTree.html . One good example is to add the alphabet in-order, then delete 'p' (which should be in the root node).

Time and Space Complexities

Searching in 2-3-4 tree take O(log(n)) as the tree is always balanced.

Insertion takes O(log(n)) node splits, and each node split takes constant time, therefore it is also O(log(n)).

Deletion is similar to insertion and is also O(log(n)).

Space complexity is O(n) due to n elements in the tree.

2-3-4 trees are great regarding memory and time complexities, so why aren't they widely used?

Well, as you read this section, understanding it seems easy compared to implementing it. The implementation is complex due to different node types and the switching of the node types during insertion and deletion.

There are certain trees such as Red-Black trees, which are an isometry (equal in measurement) of 2-3-4 trees, but are structured like binary search trees that allow for an easier implementation, as you will see in the next lesson.

That is why 2-3-4 trees are studied to understand the theory behind Red-Black trees, since they are more widely used.

<-- Previous | Next -->