Space Efficient Linked List - WilfullMurder/DataStructures-Java GitHub Wiki

Space Efficient Linked List

Introduction:

A downside of linked lists (other than access times on deep elements) is their space usage. Every node in a DoublyLinkedList requires two ancillary references to the next and previous nodes in the list. Two fields in a node are dedicated specifically to list maintenance, and only one field is used for data.

A simple idea for reducing the wasted space is implemented in the SpaceEfficientLinkedList. Instead of storing individual elements in a DoublyLinkedList, we store a block (array) that encompasses several items. More accurately, a SpaceEfficientLinkedList is parameterised by a block size, b. Every node in a SpaceEfficientLinkedList stores a block that can hold up to b+1 elements.

For reasons that will become clear later, it is useful if we can use Deque operations on each block. The data structure chosen for this is a boundedDeque, developed from the ArrayDeque structure. The boundedDeque deviates from the ArrayDeque in one small way: when a new boundedDeque is instantiated, the size of the backing array is fixed at b+1 and never grows or shrinks. The important property of a boundedDeque is that it permits the addition or removal of elements at either the front or the back in constant time, which is beneficial when elements are shifted from one block to another. So, a SpaceEfficientLinkedList is a doubly-linked list of blocks.
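As a concrete illustration, here is a minimal sketch of such a fixed-capacity deque, using a circular backing array of b+1 slots; the class and method names are illustrative and may differ from the repository's boundedDeque:

```java
// A minimal sketch of a bounded deque over a fixed circular array.
// Hypothetical names; the repository's boundedDeque may differ in detail.
class BoundedDeque {
    private final Object[] a; // backing array, fixed at b+1 slots forever
    private int j;            // index of the first element
    private int size;         // number of stored elements

    BoundedDeque(int b) { a = new Object[b + 1]; }

    boolean isFull() { return size == a.length; }
    int size() { return size; }

    void addFirst(Object x) {               // O(1): step j back modulo a.length
        j = (j - 1 + a.length) % a.length;
        a[j] = x;
        size++;
    }

    void addLast(Object x) {                // O(1): write just past the last element
        a[(j + size) % a.length] = x;
        size++;
    }

    Object removeFirst() {                  // O(1): advance j
        Object x = a[j];
        j = (j + 1) % a.length;
        size--;
        return x;
    }

    Object removeLast() {                   // O(1)
        Object x = a[(j + size - 1) % a.length];
        size--;
        return x;
    }

    Object get(int i) { return a[(j + i) % a.length]; }
}
```

Because only the index of the first element and the element count change, every operation runs in constant time regardless of b.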

Space requirements

A SpaceEfficientLinkedList places tight constraints on the number of elements in a block: unless a block is the last block, it contains at least b-1 and at most b+1 elements. This means that, if a SpaceEfficientLinkedList contains n elements, then it has at most n/(b-1)+1 = O(n/b) blocks. The boundedDeque for each block contains an array of length b+1 but, for every block except the last, at most a constant amount of space is wasted in this array. The remaining memory used by a block is also constant. This means the wasted space in a SpaceEfficientLinkedList is only O(b+n/b). Therefore, by choosing a value of b within a constant factor of √n, we can make the space-overhead of a SpaceEfficientLinkedList approach the √n lower bound given in RootishArrayStack.

Finding Elements:

The initial challenge with a SpaceEfficientLinkedList is finding the item with a given index, i. This is because the location of an element consists of two parts:

  1. The node, u, that contains the block that contains the element with index i
  2. The index, j, of the element within its block.
So, to find the block that contains a particular element, we begin much in the same manner as we do in a DoublyLinkedList: either we begin at the front of the list and iterate forward, or at the back of the list and iterate backward, until we reach the desired node. The major difference is that, each time we move from one node to the next, we skip over a whole block of elements.

Recall that, with the exception of at most one block, each block contains at least b-1 elements, so each step in our search takes us b-1 elements closer to the desired element. So, searching forward, we reach the correct node after O(1+i/b) steps. Searching backward, we reach the node after O(1+(n-i)/b) steps. The algorithm takes the smaller of these two quantities based on the value of i, and so locates the item with index i in O(1+min{i,n-i}/b).

Once we can locate the item with index i, the get(i) and set(i,x) operations simply get or set a particular index within the correct block. As the running times of these operations are dominated by the find operation, they also run in O(1+min{i,n-i}/b) time.
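The two-part search and the get/set operations built on it can be sketched as follows. For simplicity, this illustration uses an ArrayList in place of each block's boundedDeque, and the names (getLocation, Location, appendBlock) are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// A simplified sketch of the two-part search described above.
// An ArrayList stands in for each block's boundedDeque here.
class SEListSketch {
    static class Node {
        List<Integer> block = new ArrayList<>();
        Node prev, next;
    }
    static class Location {
        Node u; int j;                  // node, and index within its block
        Location(Node u, int j) { this.u = u; this.j = j; }
    }
    Node head, tail;                    // first and last block nodes
    int n;                              // total number of elements

    Location getLocation(int i) {
        if (i < n / 2) {                // forward walk: O(1 + i/b) steps
            Node u = head;
            while (i >= u.block.size()) {
                i -= u.block.size();    // skip a whole block at a time
                u = u.next;
            }
            return new Location(u, i);
        } else {                        // backward walk: O(1 + (n-i)/b) steps
            Node u = tail;
            int idx = n;                // index one past u's block
            while (i < idx - u.block.size()) {
                idx -= u.block.size();
                u = u.prev;
            }
            return new Location(u, i - (idx - u.block.size()));
        }
    }

    int get(int i) {                    // O(1 + min{i, n-i}/b)
        Location l = getLocation(i);
        return l.u.block.get(l.j);
    }

    int set(int i, int x) {             // same cost; returns the old value
        Location l = getLocation(i);
        return l.u.block.set(l.j, x);
    }

    void appendBlock(int... xs) {       // helper to build example lists
        Node u = new Node();
        for (int x : xs) u.block.add(x);
        if (tail == null) head = tail = u;
        else { tail.next = u; u.prev = tail; tail = u; }
        n += xs.length;
    }
}
```

Note how the search direction is chosen by comparing i to n/2, which is what yields the min{i, n-i} term in the running time.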

Adding an Element:

Adding an element to a SpaceEfficientLinkedList is a little more complicated. Consider the easiest case first: add(x), where x is added to the end of the list. If the last block is full, or does not exist, then we first allocate a new block and append it to the list of blocks. Now that we are sure the last block exists and is not full, we append x to the last block.
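A sketch of this append logic, under the assumption that each node holds a deque of at most b+1 elements (the Node and helper names are illustrative):

```java
import java.util.ArrayDeque;

// Sketch of appending to the end of the list of blocks.
// Hypothetical structure; the repository's classes may differ.
class AppendSketch {
    static class Node {
        ArrayDeque<Integer> block = new ArrayDeque<>(); // holds up to b+1 elements
        Node prev, next;
    }
    final int b;          // block-size parameter
    Node head, tail;      // first and last block nodes
    int n;                // total element count

    AppendSketch(int b) { this.b = b; }

    void add(int x) {
        if (tail == null || tail.block.size() == b + 1) {
            Node u = new Node();                   // last block full or missing:
            if (tail == null) head = tail = u;     // allocate and link a new one
            else { tail.next = u; u.prev = tail; tail = u; }
        }
        tail.block.addLast(x);                     // last block now has room
        n++;
    }

    int blockCount() {
        int c = 0;
        for (Node u = head; u != null; u = u.next) c++;
        return c;
    }
}
```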

The complexity comes from our desire to add to the list interior via the add(i,x) method. First, we locate index i to get the node, u, whose block contains the ith list item. Now, the problem is that we want to insert x into u's block, but we must be prepared for the case where u's block already contains b+1 elements, so that it is full and has no room for x.

So, let u0,u1,u2,... denote u, u.next, u.next.next, and so on. We explore u0,u1,u2,... looking for a node that can provide space for x. Three cases occur during our exploration (see Fig.1):

[Fig.1: SEListExplore]

  1. We quickly, in r+1≤b steps, find a node, ur, whose block is not full. We then perform r shifts of an element from one block to the next, so that free space in ur becomes free space in u0, and insert x into u0's block.
  2. We quickly, in r+1≤b steps, run off the end of the list of blocks. In this case, we add a new empty block to the end of the list of blocks and proceed as in the first case.
  3. After b steps we do not find any non-full block. In this case, u0,...,ub-1 is a sequence of b blocks that each contain b+1 elements. We insert a new block ub at the end of this sequence and spread the original b(b+1) elements so that each block of u0,...,ub contains exactly b elements. Now u0's block contains only b elements so it has room for us to insert x.
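Cases 1 and 2 can be sketched as follows, operating on a plain list of bounded blocks rather than the full linked structure; Case 3 (spread) is omitted here, and all names are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Sketch of Cases 1 and 2 of add(i,x): find a non-full block (or append
// a fresh one) and shift elements toward it to free a slot in block k.
class AddSketch {
    final int b;                                  // block-size parameter
    final List<ArrayDeque<Integer>> blocks = new ArrayList<>();

    AddSketch(int b) { this.b = b; }

    // Insert x at offset j of block k.
    void addAt(int k, int j, int x) {
        int r = k;                                // scan for a non-full block
        while (r < blocks.size() && blocks.get(r).size() == b + 1) r++;
        if (r == blocks.size())                   // Case 2: ran off the end,
            blocks.add(new ArrayDeque<>());       // append a new empty block
        while (r > k) {                           // Case 1: r shifts, so free
            blocks.get(r).addFirst(blocks.get(r - 1).removeLast());
            r--;                                  // space migrates back to k
        }
        // Block k now has room; rebuild it with x inserted at offset j.
        ArrayDeque<Integer> blk = blocks.get(k);
        ArrayDeque<Integer> tmp = new ArrayDeque<>();
        for (int t = 0; t < j; t++) tmp.addLast(blk.removeFirst());
        tmp.addLast(x);
        tmp.addAll(blk);
        blk.clear();
        blk.addAll(tmp);
    }
}
```

Each shift is a constant-time removeLast/addFirst pair on a deque, which is exactly why the blocks are deques rather than plain arrays.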
The running time of the add(i,x) operation depends on which of the three cases occurs. Cases 1 and 2 involve examining and shifting elements through at most b blocks and therefore take O(b) time. However, Case 3 involves calling the spread(u) method, which moves b(b+1) elements and takes O(b²) time. If we ignore the cost of Case 3 (which we will account for later through amortization), this means that the total running time to locate i and insert x is O(b+min{i,n-i}/b).

Removing an element:

This operation is not dissimilar to adding an element. First, we locate the node, u, that contains the element with index i. However, we must anticipate the case where we cannot remove an element from u without causing u's block to shrink below b-1 elements.

As before, let u0,u1,u2,... denote u, u.next, u.next.next, and so on. We explore u0,u1,u2,... looking for a node that can lend an element to make the size of u0's block at least b-1. Again, there are three cases to consider (see Fig.2):

[Fig.2: SEListRemoving]

  1. We quickly, in r+1≤b steps, find a node, ur, whose block contains more than b-1 elements. In this case, we perform r shifts of an element from one block into the previous, so that the extra element in ur becomes an extra element in u0. Then we remove the appropriate element from u0's block.
  2. We quickly, in r+1≤b steps, run off the end of the list of blocks. In this case, ur is the last block, and there is no need for ur's block to contain at least b-1 elements. Consequently, we proceed as above, by borrowing an element from ur to create an extra element in u0. If this produces an empty block at ur then we remove it.
  3. After b steps we do not find any block containing more than b-1 elements. In this case, u0,...,ub-1 is a sequence of b blocks each containing b-1 elements. We collect these b(b-1) elements into u0,...,ub-2 so that each of these b-1 blocks contains exactly b elements and then remove the now empty ub-1. Now u0's block contains b elements and we can remove the appropriate element.
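Cases 1 and 2 of the removal can be sketched similarly, again on a plain list of blocks with Case 3 (gather) omitted and all names hypothetical:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Sketch of Cases 1 and 2 of remove(i): after removing, restore the
// invariant that block k holds at least b-1 elements by borrowing
// elements forward from a block that can spare one.
class RemoveSketch {
    final int b;                                   // block-size parameter
    final List<ArrayDeque<Integer>> blocks = new ArrayList<>();

    RemoveSketch(int b) { this.b = b; }

    // Remove and return the element at offset j of block k.
    int removeAt(int k, int j) {
        ArrayDeque<Integer> blk = blocks.get(k);
        ArrayDeque<Integer> tmp = new ArrayDeque<>();
        for (int t = 0; t < j; t++) tmp.addLast(blk.removeFirst());
        int removed = blk.removeFirst();           // the element at offset j
        tmp.addAll(blk);
        blk.clear();
        blk.addAll(tmp);
        if (blk.size() < b - 1) {                  // invariant violated
            int r = k + 1;                         // scan for a lender
            while (r < blocks.size() && blocks.get(r).size() <= b - 1) r++;
            if (r == blocks.size()) r--;           // Case 2: borrow from the last block
            for (int t = r; t > k; t--)            // Case 1: shift one element
                blocks.get(t - 1).addLast(blocks.get(t).removeFirst());
            if (blocks.get(r).isEmpty())           // drop an emptied last block
                blocks.remove(r);
        }
        return removed;
    }
}
```

Each intermediate block both gains and loses one element during the shifts, so only the lender ur actually shrinks.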
So, much like the add(i,x) operation, the running time of the remove(i) operation is O(b+min{i,n-i}/b) if we ignore the cost of the gather(u) method that occurs in Case 3.

Amortized Analysis of Spreading and Gathering:

Having analysed the previous operations, we still need to analyse the spread(u) and gather(u) methods that may be called by the add(i,x) and remove(i) methods. The running time of each method is dominated by two nested loops. Both the inner and outer loops execute at most b+1 times, so the total running time of each method is O((b+1)²) = O(b²). Nonetheless, the following lemma shows that these methods execute on at most one out of every b calls to add(i,x) or remove(i).
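The nested-loop structure of both methods can be sketched as follows, operating on a plain list of blocks (indices and names are illustrative, not the repository's API):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Sketches of spread and gather on a plain list of blocks.
class SpreadGatherSketch {
    // spread: redistribute b blocks of b+1 elements (starting at index u0)
    // over b+1 blocks of exactly b elements each.
    static void spread(List<ArrayDeque<Integer>> blocks, int u0, int b) {
        blocks.add(u0 + b, new ArrayDeque<>());    // insert the new block ub
        for (int j = b; j > 0; j--)                // outer loop: b iterations
            while (blocks.get(u0 + j).size() < b)  // inner: at most b+1 moves
                blocks.get(u0 + j).addFirst(blocks.get(u0 + j - 1).removeLast());
    }

    // gather: collect b blocks of b-1 elements into b-1 blocks of exactly
    // b elements each, then discard the emptied final block.
    static void gather(List<ArrayDeque<Integer>> blocks, int u0, int b) {
        for (int j = 0; j < b - 1; j++)            // outer loop: b-1 iterations
            while (blocks.get(u0 + j).size() < b)  // inner: at most b+1 moves
                blocks.get(u0 + j).addLast(blocks.get(u0 + j + 1).removeFirst());
        blocks.remove(u0 + b - 1);                 // now empty
    }
}
```

In both methods every element move is a constant-time deque operation, and at most b(b+1) moves occur in total, matching the O(b²) bound above.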

Lemma SELL.1

If an empty SpaceEfficientLinkedList is created and any sequence of m≥1 calls to add(i,x) and remove(i) is performed, then the total time spent during all calls to spread(u) and gather(u) is O(bm).

Proof:

Using amortized analysis, we describe a node, u, as fragile if u's block does not contain b elements. So, u is either the last node, or contains b-1 or b+1 elements. Any node whose block contains b elements is rugged. We define the potential of a SpaceEfficientLinkedList as the number of fragile nodes it contains. As the analysis of remove(i) and gather(u) is identical, we can examine only the add(i,x) operation and its relation to the number of calls to spread(u).

Observe that, if Case 1 occurs during the add(i,x) method, then only one node, ur, has the size of its block adjusted, so at most one node goes from being rugged to fragile. If Case 2 occurs, then a new node is created; by the definition above, this node is fragile, but no other node changes size, so the number of fragile nodes increases by one. Hence, in either case the potential of the SpaceEfficientLinkedList increases by at most one.

Lastly, if Case 3 occurs, it is because u0,...,ub-1 are all fragile nodes. So, spread(u0) is called and the b fragile nodes are replaced with b+1 rugged nodes. Finally, x is added to u0's block, and u0 becomes fragile. The potential decreases by b-1.

In summary, because the list starts empty, the potential starts at 0. Each time Case 1 or 2 occurs, the potential increases by at most 1. Each time Case 3 occurs, the potential decreases by b-1. The potential is never less than 0. So, we can conclude that, for every occurrence of Case 3, there are at least b-1 occurrences of Case 1 or 2. Therefore, for every call to spread(u) there are at least b calls to add(i,x), completing the proof.

Summary:

The following theorem summarizes the performance of the SpaceEfficientLinkedList data structure:

Theorem SELL.1.1

A SpaceEfficientLinkedList implements the List interface. Ignoring the cost of calls to spread(u) and gather(u), a SpaceEfficientLinkedList with block size b supports the operations

  • get(i) and set(i,x) in O(1+min{i,n-i}/b) time per operation
  • add(i,x) and remove(i) in O(b+min{i,n-i}/b) time per operation.
Additionally, starting with an empty SpaceEfficientLinkedList, any sequence of m add(i,x) and remove(i) operations results in a total of O(bm) time spent during all calls to spread(u) and gather(u).

The space, measured in words, used by a SpaceEfficientLinkedList that stores n elements is n+O(b+n/b).

Note:

The SpaceEfficientLinkedList is a trade-off between an ArrayList and a DoublyLinkedList, wherein the relative mix of the two structures depends on the block size, b. At one extreme, b=2, each SpaceEfficientLinkedList node stores at most 3 values, which is not dissimilar to a DoublyLinkedList. At the other extreme, b>n, all the elements are stored in a single array, as in an ArrayList. Between these extremes lies a trade-off between the time it takes to add or remove a list item and the time it takes to locate a particular list item.
