Serialization format for template references - intel/device-modeling-language GitHub Wiki
In the context of saved variables with template types: given an object a[1].b.c[2][3], we need to decide how to store that reference in a checkpoint.
Our discussion included three alternatives for storing a reference to the object a[1].b.c[2][3]:
1. Simple string
2. printf-style format string plus indices
3. Full structure of name components and indices
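As a rough illustration, the three alternatives might look as follows when written as Python literals. The exact spellings are assumptions inferred from the translation expressions on this page, and the names `ref_1`, `ref_2` and `ref_3` are hypothetical.

```python
# Hypothetical encodings of a reference to a[1].b.c[2][3].
# These shapes are inferred from the conversion expressions on this page,
# not taken from an authoritative specification.

# Alternative 1: simple string
ref_1 = "a[1].b.c[2][3]"

# Alternative 2: printf-style format string plus a flat list of indices
ref_2 = ["a[%u].b.c[%u][%u]", [1, 2, 3]]

# Alternative 3: one [name, indices] pair per name component
ref_3 = [["a", [1]], ["b", []], ["c", [2, 3]]]

# Alternative 2 expands back to alternative 1 with plain %-formatting.
expanded = ref_2[0] % tuple(ref_2[1])
```

Under these assumptions, `expanded` equals `ref_1`, which is part of what keeps alternative 2 readable when inspecting a checkpoint.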
We want a format that is easy to understand for a human reader; in particular, when you inspect a checkpoint, you should be able to guess that it's a reference. We also want a format that is easy to manipulate by a script, such as a checkpoint updater.
Alternative 1 is the most human-readable form and 3 is the least. Conversely, 3 is the most script-friendly format and 1 is the least.
We wrote Python expressions to translate between the three formats; here 3 was a clear winner. For instance:
- 1->3:
[[part.split('[', 1)[0], [int(i) for i in re.findall(r'\[(\d+)\]', part)]] for part in x.split('.')]
- 2->3:
list(zip((part.split('[',1)[0] for part in x[0].split('.')), (x[1][a:b] for (a,b) in itertools.pairwise(itertools.accumulate([0] + [part.count('[%u]') for part in x[0].split('.')])))))
- 3->2:
['.'.join(s+'[%u]'*len(indices) for (s, indices) in x), [i for (s, indices) in x for i in indices]]
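The one-liners above can also be written as small named functions, which makes them easier to test. This is a sketch only: the function names are hypothetical, and the data shapes are the assumed ones (a string for 1, `[format, indices]` for 2, a list of `[name, indices]` pairs for 3).

```python
import itertools
import re


def string_to_structure(s):
    """1 -> 3: parse 'a[1].b.c[2][3]' into [name, indices] pairs."""
    return [[part.split('[', 1)[0],
             [int(i) for i in re.findall(r'\[(\d+)\]', part)]]
            for part in s.split('.')]


def format_to_structure(x):
    """2 -> 3: distribute the flat index list over the name components."""
    fmt, indices = x
    parts = fmt.split('.')
    # Running offsets into the flat index list, one boundary per component.
    offsets = list(itertools.accumulate(
        [0] + [part.count('[%u]') for part in parts]))
    return [[part.split('[', 1)[0], indices[a:b]]
            for part, a, b in zip(parts, offsets, offsets[1:])]


def structure_to_format(ref):
    """3 -> 2: rebuild the format string and flatten the indices."""
    return ['.'.join(name + '[%u]' * len(idx) for name, idx in ref),
            [i for _, idx in ref for i in idx]]


structure = string_to_structure("a[1].b.c[2][3]")
```

Here `structure` is the alternative 3 form, and passing it through `structure_to_format` and `format_to_structure` should return the same value.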
However, when we look closer at the kind of operations you want to do, it's typically to identify references to a particular object, and maybe rename that object, and perhaps split a one-dimensional array into two dimensions. For instance, transform a[i<4].b.* into a[i<2][j<2].d.*. With each alternative, this looks as follows:
- 1: resort to regexps
- 2:
refs = [["a[%u][%u].d" + name[7:], [indices[0] // 2, indices[0] % 2] + indices[1:]] if name[:7].rstrip('.') == 'a[%u].b' else [name, indices] for [name, indices] in refs]
- 3:
refs = [[[ref[0][0], [ref[0][1][0] // 2, ref[0][1][0] % 2]], ["d", []], *ref[2:]] if len(ref) > 1 and ref[0][0] == 'a' and ref[1][0] == 'b' else ref for ref in refs]
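For comparison, a regexp-based version of the same transformation on alternative 1 could look like the sketch below. The function name and the pattern are illustrative assumptions. Note how the index has to be parsed out of the string and formatted back in.

```python
import re

# Match 'a[<i>].b' at the start of the reference, but only when 'b' is a
# complete name component (followed by '.' or the end of the string).
_OLD_PREFIX = re.compile(r'^a\[(\d+)\]\.b(?=\.|$)')


def split_array_in_string(ref):
    """Rewrite a[i].b.* as a[i // 2][i % 2].d.* in a plain string reference."""
    def replacement(match):
        i = int(match.group(1))
        return f"a[{i // 2}][{i % 2}].d"
    return _OLD_PREFIX.sub(replacement, ref)


example = split_array_in_string("a[3].b.c[2]")
```

References that do not start with the `a[...].b` component, including ones to a sibling such as `a[3].bb`, are returned unchanged.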
Here, 1 is much worse, but there is no clear winner between 2 and 3. The added structure of indices in 3 doesn't really help, since the script author typically knows the index layout statically for a given transformation, and direct slicing of strings in 2 is rather convenient and readable. But 2 also has a pitfall: with a plain prefix comparison, it's easy to accidentally include an unrelated object such as one named a[%u].bb.
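One way to avoid that pitfall with alternative 2 is to require that the matched prefix ends at a component boundary. The following is a minimal sketch under the assumed `[format, indices]` shape. The function name and its `old` and `new` parameters are hypothetical.

```python
def split_array_in_format(ref, old='a[%u].b', new='a[%u][%u].d'):
    """Rewrite a[i].b.* as a[i // 2][i % 2].d.* in a [format, indices] reference."""
    name, indices = ref
    # Accept only the object itself or something below it, so that a sibling
    # such as 'a[%u].bb' does not match the prefix.
    if name == old or name.startswith(old + '.'):
        return [new + name[len(old):],
                [indices[0] // 2, indices[0] % 2] + indices[1:]]
    return ref


converted = split_array_in_format(["a[%u].b.c[%u]", [3, 5]])
```

The comparison stays a plain string operation, so the readability advantage of alternative 2 is kept. The only extra care needed is the explicit `'.'` boundary.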
The conclusion is that we go for alternative 2: compared to 3 it's significantly easier to read for a human, without any big practical disadvantages in scripting; compared to 1 it is much friendlier for scripting and still acceptably human readable.