Focus Optimization in an RRB Tree (or tail optimization in a PersistentVector) - GlenKPeterson/Paguro GitHub Wiki
Clojure's mutable TransientVector is about 5x faster than its immutable PersistentVector (see: Performance Test). The most significant difference between the Mutable and Persistent vector is that the Mutable one allocates a 32-element tail and adds items to it without copying anything. The Persistent one allocates a new tail and copies the old into it every time the tail is appended to.
The shallow tree structure that most of the data is stored in has to be identical between Mutable and Persistent so that you can quickly yield a Persistent collection from a Mutable one. The Mutable/builder must not provide a way to accidentally modify the internals of the resulting persistent collection. So the opportunity for Mutable optimization lies almost exclusively in the tail.
In an RRB Tree, the tail is called the "focus" because it optimizes inserts near whatever index the last insert/append was made to - it can move around the data structure. Random inserts defeat this optimization when the vector becomes large (the focus stores a bigger percentage of the data in a smaller RRB-Tree). So allocating a 32-element focus every time would be be unacceptable for random inserts.
But we can reduce allocations and copies in the focus of the RRB-tree only as that focus grows using the following algorithm:
- Track the number of focus elements separately from the length of the focus array: numFocusElements
- If there is no focus, the new item becomes the single element of a new focus.
- If an add operation finds the focus array has 32 elements, push the focus into the tree, same as you would for a persistent collection, and the new item becomes the single element of a new focus.
- If an add operation finds the focus array full (with less than 32 elements), allocate a new array of double the length and copy the elements of the old array into the new one.
- If there is an item at numFocusElements, throw a ConcurrentModificationException (see Synchronization below)
- Add the item to the existing focus array and increment numFocusElements
Here we compare theoretical number of operations assuming all operations take the same amount of time:
- Hickey's PersistentVector tail/focus which allocates a new focus and copies the old one for each append
- A Hybrid approach which doubles the length of the focus (as explained above)
- Hickey's MutableVector tail/focus which allocates a 32-length focus once and never copies.
Focus Size | Allocations | Copies | Allocations | Copies | Allocations | Copies |
---|---|---|---|---|---|---|
Persistent | Hybrid | Mutable | ||||
1 | 1 | 0 | 1 | 0 | 32 | 0 |
2 | 2 | 1 | 2 | 1 | 0 | 0 |
3 | 3 | 2 | 4 | 2 | 0 | 0 |
4 | 4 | 3 | 0 | 0 | 0 | 0 |
Cumulative | 10 | 6 | 7 | 3 | 32 | 0 |
5 | 5 | 4 | 8 | 4 | 0 | 0 |
6 | 6 | 5 | 0 | 0 | 0 | 0 |
7 | 7 | 6 | 0 | 0 | 0 | 0 |
8 | 8 | 7 | 0 | 0 | 0 | 0 |
Cumulative | 36 | 28 | 15 | 7 | 32 | 0 |
9 | 9 | 8 | 16 | 8 | 0 | 0 |
10 | 10 | 9 | 0 | 0 | 0 | 0 |
11 | 11 | 10 | 0 | 0 | 0 | 0 |
12 | 12 | 11 | 0 | 0 | 0 | 0 |
13 | 13 | 12 | 0 | 0 | 0 | 0 |
14 | 14 | 13 | 0 | 0 | 0 | 0 |
15 | 15 | 14 | 0 | 0 | 0 | 0 |
16 | 16 | 15 | 0 | 0 | 0 | 0 |
Cumulative | 136 | 120 | 31 | 15 | 32 | 0 |
17 | 17 | 16 | 32 | 16 | 0 | 0 |
18 | 18 | 17 | 0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... |
Cumulative | 392 | 376 | 63 | 31 | 32 | 0 |
Notes
Mutable starts out far behind due to the one-time allocation. Around 4 to 12 items, Hybrid gains an increasing lead on the PersistentVector tail. Then Mutable beats Hybrid as we near 32 items, but both do close to an order of magnitude less cumulative work than the Persistent tail.
These numbers ignore the number of internal objects copied/modified for each add - usually there will be a tree pointer and a size. Some approaches have a separate tailSize or possibly a lock object (explained in the next section). These numbers also also ignore the cost of pushing the full tail into the shallow tree when it has 32 items in it.
We could use a regular array (like the Persistent) for the first ~4 elements and switch to the Hybrid approach later. This requires an extra if statement for both building and accessing. This Hybrid approach would provide near the same gains as the Mutable if inserts are usually localized to fall within the focus. And it would work nearly as well as the Persistent if inserts are totally random and the data structure is large.
TODO: Implement and time.
Synchronization
I think we can modify the above "Hybrid" algorithm to provide thread-safe apparent immutability in a way that approaches the speed of the Mutable one by making the following changes:
- Synchronize all access to the focus array (maybe by wrapping it in a Focus object that holds the array and numFocusElements - then synchronize on the Focus object). Note that only access to the focus needs to be synchronized because the tree is immutable.
- Share this synchronized Focus object between copies as much as possible.
- If there is an item at numFocusElements, it means that another thread has already appended into to the underlying array. Make a copy of the focus (up to numFocusElements - 1), exit the synchronized block, and add the element to the new focus. Then return a new RRB-Tree with the new, modified Focus. This is the worst case scenario, which hopefully won't come into play very often.
- Removing an item from the focus or inserting an item at a random place in the focus would have to copy the focus as well.
Conclusion
While this won't help the PersistentVector (because it only does appends and doesn't have to worry about insert locality), it can potentially speed up the RRB-Tree to the point that it eliminates the performance gap between building mutable and immutable/persistent data structures such that a Mutable version is no longer necessary.
This needs an implementation and some timings to be complete, but that will take a little time.