UPDATE request implementation (ropes) - tsafin/tarantool GitHub Wiki

What’s wrong with the old UPDATE?

The current UPDATE request implementation has a few logical defects. The old algorithm recalculates tuple indexes at the last moment, and thus it can’t operate with inserted fields in the same request which inserts them, and when a request containing several operations, e.g. inserts, is split in two or more requests, the end result may be different from the case when all these operations are put together. Some operation sequences, such as push_back(field) (it equals to insert(field, -1)) pop_back(field) (it equals to delete(-1)) have non-obvious behavior.

That is why we should develop a new algorithm which doesn’t have the described defect. The algorithm should satisfy the following criteria:

Any sequence of operations, executed either in a single request or as two or more smaller requests (with the same ordering) should have the same effect;
The algorithm should have computational complexity O(n*log(n)) or lower, where n is the number of operations in the request packet (we assume that n << m, where m is the number of fields in the tuple);
The algorithm should be optimized to minimize memory allocation.

An alternative

Back to the future

A simple implementation which satisfies the criteria is:

move all tuple fields to a vector,
apply all operations to the fields of the vector,
write the vector to the new tuple.

It has computational complexity O(mm), where m -- the number of fields in the tuple. However m >> n, therefore O(mm) > O(n*log(n)).

A list can help

The algorithm from the previous section can be modified in the following way:

create a list which initially contains only one element -- a blob with all tuple fields?
when there is an operation on field i, iterate through the list until an element which "contains" the required field is found. If the element spans more than one field, spit it in 3 parts: a head (all fields before the required field), the required field, and a tail (all fields after the required field).

Pseudo code:

	struct list {
		struct list *next;
		struct field *first_field;
		int field_count;
	};

	field *
	get_field_from_list(list *head, int n)
	{
		for (list *l = head; l != NULL; ++l) {
			if (l->field_count < n)
				if (l->field_count == 1)
					/* this node contains only one field, and we can return it right away */
					return l->first_field;
				else
					/* We should cut out the requested field from a blob with more than one field */
					return split_list(list *l, n);
			n -= l->field_count;
		}
		return NULL;
	}

	struct field *
	split_list(list *head, int n)
	{
		/* create a list node with the required field only (field_count == 1) */
		list *cut_node = create_list(get_field(l->first_field, n), 1);
		/* create list node with tail after required field */
		list *cut_ tail = create_list(get_field(l->first_field, n + 1), l->field_count - n - 1);
		/* decrease fields count in head, now it have only fields before required field */
		head->field_count = n;
		/* save pointer to tail */
		list *tail = head->next;
		/* linking cut_node and cut_tail to list */
		head->next = cut_node;
		cut_node->next = cut_tail;
		cut_tail->next = tail;

		return cut_node->first_field;
	}

Since the algorithm uses lazy node allocation, it is easy to show that the list will have fewer nodes than n + 2, therefore computational complexity of the algorithm is O(n*n).

Lowering algorithm complexity

Rope: introduction

Linear data structures like vector, list or queue have linear computational complexity of search or insertion. We can try to use a tree-like data structure instead, such as 'rope'.

Rope is a data structure which allows to access sequential data (like elements of a tree or a list) using a tree representation. Computational complexity of a search in a rope is O(log(n)) (it's faster than list) and computational complexity of an insert into a rope is O(log(n)) (it's faster than vector). A re-balancing procedure helps to keep rope depth under O(log(n)).

Operations	Rope		List	Vector
index		O(log(n))	O(n)	O(1)
split		O(log(n))	O(1)	O(n)
concatenate	O(log(n))	O(1)	O(n)
insert		O(log(n))	O(1)	O(n)
delete		O(log(n))	O(1)	O(n)

Rope has two main operations: split and concatenate. Other operations like search, insertion, deletion are implemented by means of split and concatenate.

Tuple representation in the rope

A tuple can be represented as a string, where tuple fields are characters of the string. Leaf nodes of the rope will therefore contain pointers either to fields of the old tuple, or to the new fields.

	struct rope {
		/* number of fields in the rope */
		int size;
		/* the depth of the rope */
		int depth;
		/* pointer to left child (NULL for leaf) */
		struct rope *left;
		/* pointer to right child (NULL for leaf) */
		struct rope *right;
		/* pointer on fields data (NULL for internal nodes) */
		void *fields;
	};

In the beginning, the rope has only one leaf node which contains all tuple fields:

    t = {f0, f1, ... , fn-1}
         ^----------------+
    r                     |                
    └── {size = n, data = &t[0]}

When we'd like to apply an operation to a field, we should first extract it to a separate node which contains only this field. The new field value should be stored in a temporary buffer. When all operations are applied, we should do rope traversal and calculate the resulting tuple length. In the next rope traversal values of new fields are stored in a new tuple.

Extracting a field

When we'd like to modify i-th field of a tuple, we should first extract it to a rope node which contains only this field. The extract algorithm is:

Find leaf node which has i-th field. If the node has only i-th field (node->size == 1) then return the node;
Split the rope in 3 parts: the first one r1 which has fields from 0 to i-1, the second one r2 which has only i-th field and the third one r3 which has fields from i+1 to n-1;
Concatenate r1 with r2, and then concatenate the result with r3;
Return the node which has i-th field.

Pseudocode:

	rope
	extract_field_from_rope(rope root, int i)
	{
		rope *r = find_rope_node(root, i);
		if (r->size == 1)
			return r;
	
		/* The node constans mode than one field. Split it’s on tree parts and concat it. */
		rope cut_node = rope_split(root, i - 1);
		rope cut_tail = rope_split(cut_node, 1);
		rope_concat(root, cut_node);
		rope_concat(root, cut_tail);
	
		return cut_node;
	}

For example we have a rope:

    t = {0, 1, 'a', 'b'}
         ^
         +----------------+
    r                     |
    └── {size = 4, data = &t[0]}

Split rope r in three:

 t = {0, 1, 'a', 'b'}
      ^  ^   ^
      |  |   +------------------+
      |  +--------------------+ |
      +----------------+      | |
 r1                    |      | |
 |                     |      | |
 └── {size = 1, data = &t[0]} | |
 r2                    +------+ |
 |                     |        |
 └── {size = 1, data = &t[1]}   |
 r3                    +--------+
 |                     |
 └── {size = 2, data = &t[2]}

Concatenate the ropes:

 r
 └── {size = 4}
     ├── {size = 1, data = &t[0]}
     └── {size = 3}
         ├── {size = 1, data = &t[1]}
         └── {size = 2, data = &t[2]}

Inserting a new field

Not all operations work under existed fields. The INSERT operation creates new field. We should create new leaf node which contain one inserted field:

op = {opcode = INSERT, field_no = i, value = new_field}

ri = {size = 1, data = &op.value}

Then we should insert it to the rope by following algorithm:

Split the rope on two part. The first one r1 contains all fields from 0 to j - 1 and another one r2 all fields from j to n - 1;
Concatenate r1 with ri then concatenate result with r2.

Pseudocode:


	rope
	insert_field_to_rope(rope root, int i, void *field)
	{
		ri = rope_new(1, field);
		r1 = root;
		r2 = rope_split(root, i);
		rope_concat(r1, ri);
		rope_concat(r1, r2);
		return ri;
	}

For example, we have rope r and inserting field f which we should insert before the first element:

    r
    └── {size = 4}
        ├── {size = 1, data = &t[0]}
        └── {size = 3}
            ├── {size = 1, data = &t[1]}
            └── {size = 2, data = &t[2]}
    ri
    └── {size = 1, data = &f}

Split r on r1 and r2:

 r1
 └── {size = 1, data = &t[0]}

 r2
 └── {size = 3}
     ├── {size = 1, data = &t[1]}
     └── {size = 2, data = &t[2]}

Concatenate r1 and ri

 r1
 ├── {size = 1, data = &t[0]}
 └── {size = 1, data = &f}

Concatenate result with r2:

 r
 └── {size = 5}
     ├── {size = 2}
     │   ├── {size = 1, data = &t[0]}
     │   └── {size = 1, data = f}
     └── {size = 3}
         ├── {size = 1, data = &t[1]}
         └── {size = 2, data = &t[2]}

Delete a field

Also the UPDATE command can delete existed fields, that's why we should support delete operations from ropes. The algorithm is pretty much like extract algorithm:

Split the rope in 3 part: the first one r1 which has fields from 0 to i-1, the second one r2 which has only i-th field and the third one r3 which has fields from i+1 to n-1;
Concatenate r1 with r3;
Delete r2.

Pseudocode:


	void
	delete_field_from_rope(rope root, int i)
	{
		rope cut_node = rope_split(root, i - 1);
		rope cut_tail = rope_split(cut_node, 1);
		rope_concat(root, cut_tail);
		rope_delete(cut_node);
	}

For example, we have rope r and we'd like to delete field 2:

    r
    └── {size = 5}
        ├── {size = 2}
        │   ├── {size = 1, data = &t[0]}
        │   └── {size = 1, data = f}
        └── {size = 3}
            ├── {size = 1, data = &t[1]}
            └── {size = 2, data = &t[2]}

Split r on r1, r2 and r3:

 r1
 └── {size = 2}
     ├── {size = 1, data = &t[0]}
     └── {size = 1, data = f}
 r2
 └── {size = 1, data = &t[1]}

 r3
 └── {size = 2}
     └── {size = 2, data = &t[2]}

Concatenate r1 and r3:

 r
 └── {size = 4}
     ├── {size = 2}
     │   ├── {size = 1, data = &t[0]}
     │   └── {size = 1, data = f}
     └── {size = 2}
         └── {size = 2, data = &t[2]}

Delete r2.

Optimization issues

If we'd like to perform update operations in place, we should allocate additional buffers for intermediate results, since we can't modify the original tuple. However, if we allocate new tuple space in advance and then we do operations right in the new tuple, we can reduce additional memory allocation to few cases where it is really needed. For example, if we set field to a huge value and then, with the next operation, we reduce it to a small value, the intermediate result (after SET) can't overflow the new tuple buffer.

The gist of the new algorithm

For memory optimization we make some changes in the algorithms. We don’t do operations in place in the rope; instead, we save the operation to the rope node. Output of the algorithm is a sorted sequence of operations which produces the same result as the input sequence.


	struct rope {
		/* number of fields in the rope */
		int size;
		/* the depth of the rope */
		int depth;
		/* pointer to parent (NULL for root) */
		struct rope *parent;
		/* pointer to left child (NULL for leaf) */
		struct rope *left;
		/* pointer to right child (NULL for leaf) */
		struct rope *right;
		/* pointer on fields data (NULL for interior nodes) */
		void *fields;
		/* fields length */
		int fields_length;
		/* The list of operations which should be applied to this field */
		list *ops;
	};

On the first stage we apply only INSERT and DELETE operations, another operations we just push back to node operations list and update fields_length (from leaf to parent). On the next stage, when all operations were grouped by field number and we result tuple length was calculated, we can apply operations to fields.

Pseudocode:

	void
	do_update(tuple, op_list)
	{
		/* put all operations to the rope */
		rope root = rope_new(&tuple.value[0], tuple.cardinality, tuple.length);
		foreach (op = op_list)
			switch (op.opcode) {
			case INSERT:
				rope ri = insert_field_to_rope(root, op.field_no, op.value);
				break;
			case DELETE:
				delete_field_from_rope(root, op.field_no);
				break;
			default:
				rope r = extract_field_from_rope(root, op.field_no);
				update_fields_length(r, op);
				list_push_back(r.ops, op);
				break;
			}

		tuple = tuple_new(root.field_length);
		/* Apply ops to each field and write result to new tuple */
		foreach_leaf (l = root)
			apply_ops_to_field(tuple, l->field, l->ops);
	}