Hashed Array Trees - sgml/signature GitHub Wiki

File: HashedArrayTree.java

  • Author: Keith Schwarz ([email protected])
  • An implementation of the List abstraction backed by a hashed array tree
  • (HAT), a data structure supporting amortized O(1) lookup, append, and
  • last-element removal. In this sense it is akin to a standard dynamic
  • array implementation. However, a hashed array tree also has the advantage
  • that its memory overhead is only O(sqrt(n)) rather than the typical O(n)
  • found in dynamic arrays and their variants.
  • Internally, the hashed array tree is implemented as an array of pointers
  • that optionally point to an array of elements. The topmost array and each
  • element always have the same size, which is always a power of two. In
  • this sense, the hashed array tree is essentially a two-dimensional array
  • of elements. However, the advantage of the hashed array tree is that the
  • topmost array pointers are all initially null and only filled in when space
  • is needed. This means that the maximum overhead of the structure is the
  • size of the topmost array, plus the number of unused elements in the current
  • block. Here is one sample HAT:
  •       [ ] [ ] [ ] [ ]
    
  •        |   |   |
    
  •        v   v   v
    
  •       [0] [4] [8]
    
  •       [1] [5] [9]
    
  •       [2] [6] [ ]
    
  •       [3] [7] [ ]
    
  • Here, the topmost array of pointers has three pointers in use, each of which
  • point to an array of the corresponding number of elements.
  • Whenever an element is added to a hashed array tree, one of three cases
  • must hold:
    1. There is extra space at the end of the final subarray (for example, in
  • the top picture). In that case, the element is added to that position.
    1. The final subarray is full, but space for another subarray exists in the
  • topmost array. In that case, a new array is allocated and the element
  • is added to that array.
    1. The final subarray is full and no new arrays remain open. In that case,
  • since the topmost array has size 2^n and each array has size 2^n, there
  • must be a total of 2^(2n) elements in the hashed array tree. We
  • next double the size of the topmost array to 2^(n + 1), then allocate
  • 2^(n - 1) subarrays of size 2^(n + 1) elements for a total of 2^(2n)
  • elements' worth of space. The elements from the old HAT are then copied
  • over, a new array is allocated for the new element, and it is added as
  • the first element of that array.
  • Let's now talk about the performance and memory usage of this structure.
  • First, we note that we can perform lookups in O(1), assuming that the
  • machine is transdichotomous (meaning that a single machine word can hold
  • the size of any array). This can be done by breaking the input index in
  • half, then using the first half to choose which array to look into (the
  • "hashed" part of HAT) and the second half to choose which index to select.
  • This trick is similar to the trick used to represent a two-dimensional
  • array using a linear structure.
  • Next, let's think about how much time it takes to do an append operation.
  • Each append when space remains takes O(1) to look up the proper position
  • in the hashed array tree for writing, but appends can be much more expensive
  • when the HAT needs to be resized. Fortunately, this does not happen very
  • often. Whenever the HAT doubles in size, its capacity grows from 2^(2n) to
  • 2^(2n + 2), meaning that four times as many elements must be inserted before
  • the next copy operation. If we define a potential function as twice the
  • number of filled-in elements in the HAT above half capacity (i.e. the
  • number of elements in the arrays in the second half), then we can prove an
  • amortized O(1) time for append. Consider any series of appends. If the
  • append does not expand the HAT, then it takes O(1) time and increases the
  • potential by 1/2. If the append does expand the HAT, and the HAT's topmost
  • array has size 2^n, then there must be 2^(n-1) elements in the latter half
  • of the HAT, so the potential is 2^(2n). The time required to move each
  • of the 2^(2n) elements is 2^(2n), and in the new HAT there are no elements
  • in the latter half of the array. Consequently, the new potential is zero.
  • The actual time required to perform the append is thus 2^2n + O(1), and
  • the decrease in potential is -2^2n, so the amortized cost is O(1) as
  • expected.
  • Last, let's talk about the cost to do a remove. This is similar to the
  • append case - we delete the last element of the last array, removing the
  • array from the topmost array if it becomes empty. We also compact the
  • HAT if it becomes too sparse by shrinking from a HAT of size 2^(2n + 2) to
  • a HAT of size 2^(2n) if the HAT becomes one-eighth full. A similar
  • potential method can be used to show that this operation runs in amortized
  • O(1).
  • Finally, let's consider the memory overhead of the HAT. For any HAT of
  • topmost array size 8 or more, since the HAT is always at least one-eighth
  • full, there must be at least one full array. This array has size equal to
  • the size of the topmost array (call this k), and so if every array were to
  • be filled in to capacity, there would be a total of k arrays of size k,
  • for a total of k^2 elements. Of this capacity, we know that at least an
  • eighth of them are filled in, so k^2 must be at most 8n, and so
  • k = O(sqrt(n)). To finish the analysis, the overhead of the structure is
  • at most the overhead of this top-level array, plus potentially k - 1
  • unused elements in some array. This is a total of O(k) = O(sqrt(n))
  • overhead, which is what we originally desired. */

https://en.wikipedia.org/wiki/Dynamic_array

https://blog.domenic.me/es6-iterators-generators-and-iterables/

An iterator is an object with a next method that returns { done, value } tuples.

An iterable is an object which has an internal method, written in the current ES6 draft specs as obj@@iterator, that returns an iterator.

A generator is a specific type of iterator whose next results are determined by the behavior of its corresponding generator function. Generators also have a throw method, and their next method takes a parameter.

A generator function is a special type of function that acts as a constructor for generators. Generator function bodies can use the contextual keyword yield, and you can send values or exceptions into the body, at the points where yield appears, via the constructed generator’s next and throw methods. Generator functions are written with function* syntax.

A generator comprehension is a shorthand expression for creating generators, e.g. (for (x of a) for (y of b) x * y).

The new for-of loop works on iterables, i.e. you do for (let x of iterable) { /* ... */ }