UPDATE request implementation (ropes) - tsafin/tarantool GitHub Wiki
Whatβs wrong with the old UPDATE?
The current UPDATE request implementation has a few logical defects. The old algorithm recalculates tuple indexes at the last moment, and thus it canβt operate with inserted fields in the same request which inserts them, and when a request containing several operations, e.g. inserts, is split in two or more requests, the end result may be different from the case when all these operations are put together. Some operation sequences, such as push_back(field)
(it equals to insert(field, -1)
) pop_back(field)
(it equals to delete(-1)
) have non-obvious behavior.
That is why we should develop a new algorithm which doesnβt have the described defect. The algorithm should satisfy the following criteria:
- Any sequence of operations, executed either in a single request or as two or more smaller requests (with the same ordering) should have the same effect;
- The algorithm should have computational complexity O(n*log(n)) or lower, where n is the number of operations in the request packet (we assume that n << m, where m is the number of fields in the tuple);
- The algorithm should be optimized to minimize memory allocation.
An alternative
Back to the future
A simple implementation which satisfies the criteria is:
- move all tuple fields to a vector,
- apply all operations to the fields of the vector,
- write the vector to the new tuple.
It has computational complexity O(mm), where m -- the number of fields in the tuple. However m >> n, therefore O(mm) > O(n*log(n)).
A list can help
The algorithm from the previous section can be modified in the following way:
- create a list which initially contains only one element -- a blob with all tuple fields?
- when there is an operation on field i, iterate through the list until an element which "contains" the required field is found. If the element spans more than one field, spit it in 3 parts: a head (all fields before the required field), the required field, and a tail (all fields after the required field).
Pseudo code:
struct list {
struct list *next;
struct field *first_field;
int field_count;
};
field *
get_field_from_list(list *head, int n)
{
for (list *l = head; l != NULL; ++l) {
if (l->field_count < n)
if (l->field_count == 1)
/* this node contains only one field, and we can return it right away */
return l->first_field;
else
/* We should cut out the requested field from a blob with more than one field */
return split_list(list *l, n);
n -= l->field_count;
}
return NULL;
}
struct field *
split_list(list *head, int n)
{
/* create a list node with the required field only (field_count == 1) */
list *cut_node = create_list(get_field(l->first_field, n), 1);
/* create list node with tail after required field */
list *cut_ tail = create_list(get_field(l->first_field, n + 1), l->field_count - n - 1);
/* decrease fields count in head, now it have only fields before required field */
head->field_count = n;
/* save pointer to tail */
list *tail = head->next;
/* linking cut_node and cut_tail to list */
head->next = cut_node;
cut_node->next = cut_tail;
cut_tail->next = tail;
return cut_node->first_field;
}
Since the algorithm uses lazy node allocation, it is easy to show that the list will have fewer nodes than n + 2, therefore computational complexity of the algorithm is O(n*n).
Lowering algorithm complexity
Rope: introduction
Linear data structures like vector, list or queue have linear computational complexity of search or insertion. We can try to use a tree-like data structure instead, such as 'rope'.
Rope is a data structure which allows to access sequential data (like elements of a tree or a list) using a tree representation. Computational complexity of a search in a rope is O(log(n)) (it's faster than list) and computational complexity of an insert into a rope is O(log(n)) (it's faster than vector). A re-balancing procedure helps to keep rope depth under O(log(n)).
Operations Rope List Vector
index O(log(n)) O(n) O(1)
split O(log(n)) O(1) O(n)
concatenate O(log(n)) O(1) O(n)
insert O(log(n)) O(1) O(n)
delete O(log(n)) O(1) O(n)
Rope has two main operations: split and concatenate. Other operations like search, insertion, deletion are implemented by means of split and concatenate.
Tuple representation in the rope
A tuple can be represented as a string, where tuple fields are characters of the string. Leaf nodes of the rope will therefore contain pointers either to fields of the old tuple, or to the new fields.
struct rope {
/* number of fields in the rope */
int size;
/* the depth of the rope */
int depth;
/* pointer to left child (NULL for leaf) */
struct rope *left;
/* pointer to right child (NULL for leaf) */
struct rope *right;
/* pointer on fields data (NULL for internal nodes) */
void *fields;
};
In the beginning, the rope has only one leaf node which contains all tuple fields:
t = {f0, f1, ... , fn-1}
^----------------+
r |
βββ {size = n, data = &t[0]}
When we'd like to apply an operation to a field, we should first extract it to a separate node which contains only this field. The new field value should be stored in a temporary buffer. When all operations are applied, we should do rope traversal and calculate the resulting tuple length. In the next rope traversal values of new fields are stored in a new tuple.
Extracting a field
When we'd like to modify i-th field of a tuple, we should first extract it to a rope node which contains only this field. The extract algorithm is:
- Find leaf node which has i-th field. If the node has only i-th field
(node->size == 1)
then return the node; - Split the rope in 3 parts: the first one
r1
which has fields from 0 to i-1, the second oner2
which has only i-th field and the third oner3
which has fields from i+1 to n-1; - Concatenate
r1
withr2
, and then concatenate the result withr3
; - Return the node which has i-th field.
Pseudocode:
rope
extract_field_from_rope(rope root, int i)
{
rope *r = find_rope_node(root, i);
if (r->size == 1)
return r;
/* The node constans mode than one field. Split itβs on tree parts and concat it. */
rope cut_node = rope_split(root, i - 1);
rope cut_tail = rope_split(cut_node, 1);
rope_concat(root, cut_node);
rope_concat(root, cut_tail);
return cut_node;
}
For example we have a rope:
t = {0, 1, 'a', 'b'}
^
+----------------+
r |
βββ {size = 4, data = &t[0]}
-
Split rope
r
in three:t = {0, 1, 'a', 'b'} ^ ^ ^ | | +------------------+ | +--------------------+ | +----------------+ | | r1 | | | | | | | βββ {size = 1, data = &t[0]} | | r2 +------+ | | | | βββ {size = 1, data = &t[1]} | r3 +--------+ | | βββ {size = 2, data = &t[2]}
-
Concatenate the ropes:
r βββ {size = 4} βββ {size = 1, data = &t[0]} βββ {size = 3} βββ {size = 1, data = &t[1]} βββ {size = 2, data = &t[2]}
Inserting a new field
Not all operations work under existed fields. The INSERT operation creates new field. We should create new leaf node which contain one inserted field:
op = {opcode = INSERT, field_no = i, value = new_field}
ri = {size = 1, data = &op.value}
Then we should insert it to the rope by following algorithm:
- Split the rope on two part. The first one
r1
contains all fields from 0 to j - 1 and another oner2
all fields from j to n - 1; - Concatenate
r1
withri
then concatenate result withr2
.
Pseudocode:
rope
insert_field_to_rope(rope root, int i, void *field)
{
ri = rope_new(1, field);
r1 = root;
r2 = rope_split(root, i);
rope_concat(r1, ri);
rope_concat(r1, r2);
return ri;
}
For example, we have rope r
and inserting field f
which we should insert before the first element:
r
βββ {size = 4}
βββ {size = 1, data = &t[0]}
βββ {size = 3}
βββ {size = 1, data = &t[1]}
βββ {size = 2, data = &t[2]}
ri
βββ {size = 1, data = &f}
-
Split
r
onr1
andr2
:r1 βββ {size = 1, data = &t[0]} r2 βββ {size = 3} βββ {size = 1, data = &t[1]} βββ {size = 2, data = &t[2]}
-
Concatenate
r1
andri
r1 βββ {size = 1, data = &t[0]} βββ {size = 1, data = &f}
-
Concatenate result with
r2
:r βββ {size = 5} βββ {size = 2} β βββ {size = 1, data = &t[0]} β βββ {size = 1, data = f} βββ {size = 3} βββ {size = 1, data = &t[1]} βββ {size = 2, data = &t[2]}
Delete a field
Also the UPDATE command can delete existed fields, that's why we should support delete operations from ropes. The algorithm is pretty much like extract algorithm:
- Split the rope in 3 part: the first one
r1
which has fields from 0 to i-1, the second oner2
which has only i-th field and the third oner3
which has fields from i+1 to n-1; - Concatenate
r1
withr3
; - Delete
r2
.
Pseudocode:
void
delete_field_from_rope(rope root, int i)
{
rope cut_node = rope_split(root, i - 1);
rope cut_tail = rope_split(cut_node, 1);
rope_concat(root, cut_tail);
rope_delete(cut_node);
}
For example, we have rope r
and we'd like to delete field 2:
r
βββ {size = 5}
βββ {size = 2}
β βββ {size = 1, data = &t[0]}
β βββ {size = 1, data = f}
βββ {size = 3}
βββ {size = 1, data = &t[1]}
βββ {size = 2, data = &t[2]}
-
Split
r
onr1
,r2
andr3
:r1 βββ {size = 2} βββ {size = 1, data = &t[0]} βββ {size = 1, data = f} r2 βββ {size = 1, data = &t[1]} r3 βββ {size = 2} βββ {size = 2, data = &t[2]}
-
Concatenate
r1
andr3
:r βββ {size = 4} βββ {size = 2} β βββ {size = 1, data = &t[0]} β βββ {size = 1, data = f} βββ {size = 2} βββ {size = 2, data = &t[2]}
-
Delete
r2
.
Optimization issues
If we'd like to perform update operations in place, we should allocate additional buffers for intermediate results, since we can't modify the original tuple. However, if we allocate new tuple space in advance and then we do operations right in the new tuple, we can reduce additional memory allocation to few cases where it is really needed. For example, if we set field to a huge value and then, with the next operation, we reduce it to a small value, the intermediate result (after SET) can't overflow the new tuple buffer.
The gist of the new algorithm
For memory optimization we make some changes in the algorithms. We donβt do operations in place in the rope; instead, we save the operation to the rope node. Output of the algorithm is a sorted sequence of operations which produces the same result as the input sequence.
struct rope {
/* number of fields in the rope */
int size;
/* the depth of the rope */
int depth;
/* pointer to parent (NULL for root) */
struct rope *parent;
/* pointer to left child (NULL for leaf) */
struct rope *left;
/* pointer to right child (NULL for leaf) */
struct rope *right;
/* pointer on fields data (NULL for interior nodes) */
void *fields;
/* fields length */
int fields_length;
/* The list of operations which should be applied to this field */
list *ops;
};
On the first stage we apply only INSERT and DELETE operations, another operations we just push back to node operations list and update fields_length (from leaf to parent). On the next stage, when all operations were grouped by field number and we result tuple length was calculated, we can apply operations to fields.
Pseudocode:
void
do_update(tuple, op_list)
{
/* put all operations to the rope */
rope root = rope_new(&tuple.value[0], tuple.cardinality, tuple.length);
foreach (op = op_list)
switch (op.opcode) {
case INSERT:
rope ri = insert_field_to_rope(root, op.field_no, op.value);
break;
case DELETE:
delete_field_from_rope(root, op.field_no);
break;
default:
rope r = extract_field_from_rope(root, op.field_no);
update_fields_length(r, op);
list_push_back(r.ops, op);
break;
}
tuple = tuple_new(root.field_length);
/* Apply ops to each field and write result to new tuple */
foreach_leaf (l = root)
apply_ops_to_field(tuple, l->field, l->ops);
}