Proposal for unified handling of variable length objects in schema (part of #2127)
Summary
Handling variable-length objects is anything but trivial. Now that support for long names is being introduced, we can hardly afford a fixed-length name buffer, since it wastes a significant amount of space.
We would like to combine all variable length parts of an object into a single allocation.
We would like to avoid allocating a MAX_PATH (4Kb) buffer for the vinyl path in key_def, which is seldom used but inflates a key_def.
Finally, key_def already has a VLA portion, namely the parts array. We would like to see all variable length fields handled in a consistent fashion.
Proposal
All variable-length parts must be pointers, e.g.:
struct key_def {
	const char *name; /* !!! */
	struct key_opts {
		const char *path; /* !!! */
	} opts;
	struct key_part {
		uint32_t field_no;
		enum field_type field_type;
	} *parts; /* !!! */
};
Schema objects are initialised either explicitly (for pre-defined objects) or by a parser in the on_replace trigger in system spaces.
We suggest the following routine (using key_def for illustration):
- The parser initialises a "prototype" key_def. The prototype may reference other memory chunks via pointers.
- To "finalize" the prototype, the parser calls key_def_dup() on it. key_def_dup() allocates a sufficiently sized memory chunk and copies the passed prototype key_def into it. The function then copies the variable length parts after the duplicate key_def and fixes pointers in the duplicate.
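The routine above can be sketched roughly as follows. This is a minimal illustration, not the actual Tarantool code: the part_count field, the enum values, and the chosen layout (struct, then parts, then the strings) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative values only; the real enum is larger. */
enum field_type { FIELD_TYPE_UNSIGNED, FIELD_TYPE_STRING };

struct key_part {
	uint32_t field_no;
	enum field_type field_type;
};

struct key_opts {
	const char *path;
};

struct key_def {
	const char *name;
	struct key_opts opts;
	uint32_t part_count; /* assumed: not shown in the proposal */
	struct key_part *parts;
};

/*
 * Copy a prototype into a single malloc()ed chunk laid out as
 * [struct key_def][parts][name\0][path\0] and point the duplicate's
 * pointers into that chunk. Returns NULL if allocation fails.
 * The result is released with one free().
 */
struct key_def *
key_def_dup(const struct key_def *proto)
{
	assert(proto != NULL && proto->name != NULL);
	size_t parts_size = proto->part_count * sizeof(struct key_part);
	size_t name_size = strlen(proto->name) + 1;
	size_t path_size = proto->opts.path != NULL ?
			   strlen(proto->opts.path) + 1 : 0;

	char *chunk = malloc(sizeof(struct key_def) + parts_size +
			     name_size + path_size);
	if (chunk == NULL)
		return NULL;
	struct key_def *def = (struct key_def *)chunk;
	*def = *proto; /* fixed-size fields, pointers fixed below */

	/* parts go first so that they stay properly aligned */
	char *pos = chunk + sizeof(*def);
	def->parts = NULL;
	if (parts_size > 0) {
		memcpy(pos, proto->parts, parts_size);
		def->parts = (struct key_part *)pos;
	}
	pos += parts_size;

	memcpy(pos, proto->name, name_size);
	def->name = pos;
	pos += name_size;

	def->opts.path = NULL;
	if (path_size > 0) {
		memcpy(pos, proto->opts.path, path_size);
		def->opts.path = pos;
	}
	return def;
}
```

Because every pointer in the duplicate refers into the same chunk, the prototype and its transient buffers can be discarded as soon as key_def_dup() returns.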
It's convenient that key_def_dup() receives nothing but a prototype key_def. Compare it with the key_def_new() we currently have: it has 6 parameters for various attributes of a key_def.
A prototype key_def is fully compatible with a finalised one.
Special Cases
For special cases like the parts array in a key_def, the added pointer may actually impede performance.
We anticipate special cases like this one. The parser prepares parts in a separate buffer, as it does with other variable-length fields, and key_def_dup(prototype, parts) renders the final object. The basic principle remains the same: the parser passes a fully initialised object to key_def_dup() and the latter finalises it.
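One way to read this special case is to keep parts inline as a C99 flexible array member, so that accessing a part needs no extra pointer dereference. The sketch below assumes that layout; the part_count field and the enum values are again assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative values only. */
enum field_type { FIELD_TYPE_UNSIGNED, FIELD_TYPE_STRING };

struct key_part {
	uint32_t field_no;
	enum field_type field_type;
};

struct key_def {
	const char *name;
	uint32_t part_count;     /* assumed: not shown in the proposal */
	struct key_part parts[]; /* stored inline, no pointer hop */
};

/*
 * Build the final object from a prototype and a separately prepared
 * parts buffer. The chunk holds the struct, the inline parts and then
 * the name string. Returns NULL if allocation fails.
 */
struct key_def *
key_def_dup(const struct key_def *proto, const struct key_part *parts)
{
	assert(proto != NULL && proto->name != NULL);
	assert(parts != NULL || proto->part_count == 0);
	size_t parts_size = proto->part_count * sizeof(struct key_part);
	size_t name_size = strlen(proto->name) + 1;

	struct key_def *def = malloc(sizeof(*def) + parts_size + name_size);
	if (def == NULL)
		return NULL;
	/* sizeof(*def) excludes the flexible array member */
	memcpy(def, proto, sizeof(*def));
	if (parts_size > 0)
		memcpy(def->parts, parts, parts_size);

	/* the name lives right after the last part */
	char *name = (char *)def->parts + parts_size;
	memcpy(name, proto->name, name_size);
	def->name = name;
	return def;
}
```

The parser still fills in an ordinary prototype; only the parts buffer travels as a separate argument.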
Managing memory for prototype key_def
The parser must allocate and manage memory for variable length fields somehow.
While having large fixed-size buffers in long lived objects doesn't sound like a good idea, we believe such buffers are OK for short lived objects such as the prototype key_def. For example:
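struct key_def prototype;
char name[BOX_NAME_MAX];
/* Set name */
prototype.name = name;
/* strncpy() takes (dest, src, n); terminate explicitly in case of truncation */
strncpy(name, "bandersnatch", BOX_NAME_MAX - 1);
name[BOX_NAME_MAX - 1] = '\0';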
It may be convenient to pack these transient buffers in a structure to make them easy to pass around when the existing code structure forces us to (e.g. space_def in the alter_space object).
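Such a wrapper structure might look like the sketch below. The names key_def_buf and key_def_buf_set are hypothetical, and the buffer sizes are placeholder values chosen for the example rather than the real BOX_NAME_MAX or MAX_PATH.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Placeholder sizes, assumed for this example only. */
enum { BOX_NAME_MAX = 64, PATH_BUF_MAX = 4096 };

struct key_opts {
	const char *path;
};

struct key_def {
	const char *name;
	struct key_opts opts;
};

/* Hypothetical: a prototype bundled with its transient buffers. */
struct key_def_buf {
	struct key_def def;
	char name[BOX_NAME_MAX + 1];
	char path[PATH_BUF_MAX];
};

/*
 * Fill the prototype so that its pointers refer to the buffers in the
 * same structure. Returns 0 on success, -1 if a string does not fit.
 */
int
key_def_buf_set(struct key_def_buf *buf, const char *name, const char *path)
{
	assert(buf != NULL && name != NULL);
	memset(&buf->def, 0, sizeof(buf->def));
	int n = snprintf(buf->name, sizeof(buf->name), "%s", name);
	if (n < 0 || (size_t)n >= sizeof(buf->name))
		return -1;
	buf->def.name = buf->name;
	if (path != NULL) {
		n = snprintf(buf->path, sizeof(buf->path), "%s", path);
		if (n < 0 || (size_t)n >= sizeof(buf->path))
			return -1;
		buf->def.opts.path = buf->path;
	}
	return 0;
}
```

Note that copying such a structure by value leaves def.name and def.opts.path pointing into the original, so it should be passed by pointer until key_def_dup() has produced the final object.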
Killing opts_create/opts_encode
Variable-length fields don't agree with the options parser (without over-complicating the interface).
A data-driven parser and serialiser sounds like a good idea. However, its serialising capacity is only ever used in key_def_tuple_update_lsn(). The latter could be implemented differently, e.g. by locating the lsn field, moving the tail to adjust for a changed LSN size and patching in the new value.
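The in-place patching idea can be sketched as below. The example assumes the offset of the encoded LSN is already known (locating the field is left out) and that the value is a MsgPack unsigned integer. The function name patch_uint_field is hypothetical, and the encoding helpers are written out by hand only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encoded size of a MsgPack unsigned integer holding v. */
static size_t
uint_sizeof(uint64_t v)
{
	if (v <= 0x7f) return 1;        /* positive fixint */
	if (v <= 0xff) return 2;        /* 0xcc + 1 byte   */
	if (v <= 0xffff) return 3;      /* 0xcd + 2 bytes  */
	if (v <= 0xffffffff) return 5;  /* 0xce + 4 bytes  */
	return 9;                       /* 0xcf + 8 bytes  */
}

/* Size of the encoded uint that starts with byte c, 0 if not a uint. */
static size_t
uint_encoded_size(unsigned char c)
{
	if (c <= 0x7f) return 1;
	switch (c) {
	case 0xcc: return 2;
	case 0xcd: return 3;
	case 0xce: return 5;
	case 0xcf: return 9;
	default: return 0;
	}
}

/* Write v at pos as a MsgPack uint (big-endian payload). */
static void
uint_encode(unsigned char *pos, uint64_t v)
{
	size_t payload = uint_sizeof(v) - 1; /* 0, 1, 2, 4 or 8 */
	if (payload == 0) {
		*pos = (unsigned char)v;
		return;
	}
	*pos++ = payload == 1 ? 0xcc : payload == 2 ? 0xcd :
		 payload == 4 ? 0xce : 0xcf;
	for (size_t i = 0; i < payload; i++)
		pos[i] = (unsigned char)(v >> (8 * (payload - 1 - i)));
}

/*
 * Replace the uint encoded at data[off] with value: move the tail to
 * make room (or close the gap), then write the new encoding.
 * Returns the new data length, or 0 if the field is not a uint or the
 * result does not fit into cap bytes.
 */
size_t
patch_uint_field(unsigned char *data, size_t len, size_t cap,
		 size_t off, uint64_t value)
{
	assert(data != NULL && off < len && len <= cap);
	size_t old_size = uint_encoded_size(data[off]);
	size_t new_size = uint_sizeof(value);
	if (old_size == 0 || off + old_size > len)
		return 0;
	size_t new_len = len - old_size + new_size;
	if (new_len > cap)
		return 0;
	memmove(data + off + new_size, data + off + old_size,
		len - off - old_size);
	uint_encode(data + off, value);
	return new_len;
}
```

Only the bytes after the LSN move, so everything before the field, including the map header, stays untouched.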
We suggest writing parser routines explicitly; this requires some convenience functions to iterate over fields in MsgPack.
Added bonus: we may actually make key_opts a union, since most attributes only make sense for a single index type.
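As a rough illustration of the union idea, the options could be stored as a tagged union. The tag enum, the member names and the accessor key_opts_path() below are all hypothetical and chosen only to show the shape; which attributes belong to which variant is not specified by this proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag saying which union member is valid. */
enum opts_kind { OPTS_NONE, OPTS_RTREE, OPTS_VINYL };

struct key_opts {
	enum opts_kind kind;
	union {
		struct {
			uint32_t dimension; /* illustrative attribute */
		} rtree;
		struct {
			const char *path;   /* illustrative attribute */
		} vinyl;
	} u;
};

/* Return the path only when the vinyl member is the active one. */
const char *
key_opts_path(const struct key_opts *opts)
{
	assert(opts != NULL);
	return opts->kind == OPTS_VINYL ? opts->u.vinyl.path : NULL;
}
```

The union occupies only as much space as its largest member, so attributes that never apply together no longer each cost space in every key_def.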