Roadmap for literals and initializers - intel/device-modeling-language GitHub Wiki

On this page, we sketch how the syntax for literals and initializers can evolve over time. The page originates from a design discussion in December 2021.

Parse method args as initializers

Initializers are a compact generalization of expressions. This is possible because the type of the target being initialized is known in advance. Method parameter types are also known in advance, so it should be OK to write:

method m(bytes_t data) { ... }
...
m({data, len});

The source of an assignment is already an initializer. Together with this change, there would be practically no need to add a syntax for struct literals.
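The point of the proposal is that the declared parameter type determines how the braces are read. Below is a minimal Python sketch of that binding step. It assumes a hypothetical table of struct field names; the (data, len) order matches the Simics bytes_t struct. It is an illustration, not how DMLC represents types.

```python
from typing import Dict, List

# Hypothetical model of struct types: ordered field names per type name.
# The order (data, len) follows the Simics bytes_t struct.
STRUCT_FIELDS: Dict[str, List[str]] = {"bytes_t": ["data", "len"]}


def bind_argument(param_type: str, initializer: List[str]) -> Dict[str, str]:
    """Interpret a positional brace initializer {e0, e1, ...} passed as an
    argument. This only works because the parameter type is known in
    advance: element i initializes field i of that type."""
    fields = STRUCT_FIELDS[param_type]
    if len(initializer) != len(fields):
        raise TypeError(
            f"{param_type} has {len(fields)} fields, "
            f"but the initializer has {len(initializer)} elements")
    return dict(zip(fields, initializer))


# Call m({buf, n}) where m is declared as method m(bytes_t data)
print(bind_argument("bytes_t", ["buf", "n"]))  # {'data': 'buf', 'len': 'n'}
```

A wrong number of elements is rejected. A mistaken field order, however, is only caught if the element types do not match.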

Param values as initializers

Perhaps we can permit the syntax param list = {1, 2, 3}; for typed parameters.

Cast operand as initializer (compound literals)

cast({.x=3, .y=5}, some_struct_t)

C compound literals are the intuitive way to implement this. However, they are dangerous. Such a literal can end up in a DMLC-generated C block that does not correspond to the DML block in which the cast is used, which gives the literal an unexpectedly short lifetime. This is especially dangerous with DML expressions that are translated to statement expressions (GCC's ({ ... }) extension), since the body of a statement expression is a C block of its own.

The best solution is probably to add support for two things:

1. expressions that correspond to multiple C statements;
2. inserting those statements at points that do not correspond to the expression itself.

This would allow us to generate code for a DML compound literal by declaring a (non-user-visible) variable for it at the beginning of the C block that corresponds to the DML block. That variable would then represent the compound literal.
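Below is a minimal Python sketch of the proposed code generation. It assumes that codegen can be modeled as string emission. The class CBlock, its methods, the hidden names _lit0, _lit1, ... and the function use are all hypothetical. The compound literal expression turns into C statements placed elsewhere (a declaration at the top of the block and an assignment just before the enclosing statement), plus a plain variable reference where the expression was.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CBlock:
    """C code for one DML block (hypothetical codegen model).

    Hidden variables are declared at the top of the C block, so they live
    as long as the DML block does, however deeply the generated C for an
    expression is nested (e.g. inside a statement expression).
    """
    hidden_decls: List[str] = field(default_factory=list)
    body: List[str] = field(default_factory=list)
    counter: int = 0

    def compound_literal(self, ctype: str, init: str, pre: List[str]) -> str:
        """Lower cast({...}, T): declare a hidden variable at block start,
        initialize it just before the current statement, and return its
        name as the C expression."""
        name = f"_lit{self.counter}"
        self.counter += 1
        self.hidden_decls.append(f"{ctype} {name};")
        pre.append(f"{name} = ({ctype}){init};")
        return name

    def statement(self, build: Callable[[List[str]], str]) -> None:
        """Add one statement; build(pre) returns the C statement and may
        append statements that must run right before it."""
        pre: List[str] = []
        stmt = build(pre)
        self.body.extend(pre + [stmt])

    def render(self) -> str:
        lines = self.hidden_decls + self.body
        return "{\n" + "".join(f"    {line}\n" for line in lines) + "}"


block = CBlock()
# DML: use(&cast({.x=3, .y=5}, some_struct_t));  (use is hypothetical)
block.statement(lambda pre: "use(&" + block.compound_literal(
    "some_struct_t", "{.x=3, .y=5}", pre) + ");")
print(block.render())
```

The rendered block proceeds in three steps:

- it declares some_struct_t _lit0; first;
- it assigns (some_struct_t){.x=3, .y=5} to _lit0 right before the call;
- it passes &_lit0, which stays valid until the end of the block.

Declaration and initialization are kept separate so that the initializer is still evaluated where the cast appears in DML.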

Using alloca to allocate compound literals, as a cheap way to side-step this problem, is not an option. Memory allocated by alloca is only released when the function returns, so it would blow the stack if, e.g., a compound literal is used within a loop.

One option is to control the lifespan of compound literals using malloc/free when needed. However, this is expensive, and inserting the free calls correctly probably requires new compiler machinery.

Another option is to initially forbid uses of compound literals that require a lifespan, such as taking the address of a compound literal. It may be sufficient to restrict compound literals to rvalues and to forbid array types (TArray) as compound literals. This restriction can be lifted once the compiler infrastructure improves (perhaps if we go for an LLVM back-end).
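The interim restriction can be expressed as a simple check on each use of a compound literal. Below is a minimal Python sketch. It assumes hypothetical string labels for the literal's type kind (TArray as named on this page, any other string for a non-array type) and for its use context.

```python
def check_compound_literal(type_kind: str, context: str) -> None:
    """Interim restriction (hypothetical checker): compound literals may
    only be used as rvalues, and never with an array type.

    type_kind: "TArray" for array types, any other string otherwise.
    context: "rvalue", "address-of" or "lvalue".
    """
    if type_kind == "TArray":
        # One likely reason: in C an array rvalue decays to a pointer to
        # its first element, so even a plain read implicitly takes the
        # literal's address.
        raise TypeError("compound literal of array type is not allowed")
    if context != "rvalue":
        raise TypeError(
            f"compound literal used as {context}; only rvalue use is allowed")


check_compound_literal("TStruct", "rvalue")  # accepted silently
for bad in [("TArray", "rvalue"), ("TStruct", "address-of")]:
    try:
        check_compound_literal(*bad)
    except TypeError as err:
        print("rejected:", err)
```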

Throwing method calls

With the addition of tuples as run-time values, a method with multiple return values could return them as a single tuple. The only remaining obstacle to using arbitrary method calls as expressions would then be throwing methods. This could be resolved through statement expressions, or through support for expressions that correspond to multiple C statements.
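Below is a minimal Python sketch of the multiple-C-statements route for a throwing call inside an expression. The calling convention is an assumption for illustration: the C function returns true on exception and writes its result through an out-pointer. All names, including the handler label _throw0, are also assumptions.

```python
from typing import List, Tuple


def lower_throwing_call(method: str, args: List[str], ret_ctype: str,
                        tmp: str, handler: str) -> Tuple[List[str], str]:
    """Lower a throwing method call that appears inside an expression.

    Returns (statements to emit before the enclosing statement, C expression
    to use in place of the call). Hypothetical calling convention: the C
    function returns true if the method threw, and writes its result
    through a trailing out-pointer.
    """
    call = f"{method}({', '.join(args + ['&' + tmp])})"
    pre = [f"{ret_ctype} {tmp};",
           f"if ({call}) goto {handler};"]
    return pre, tmp


# DML: y = 1 + m(x);  where m is a throwing method returning uint64
pre, expr = lower_throwing_call("m", ["x"], "uint64", "_ret0", "_throw0")
print("\n".join(pre + [f"y = 1 + {expr};"]))
```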

Common type

The type of the expression a ? b : c is the common type of b and c. C has an extremely complex definition of this; DML today has a simpler but somewhat vague definition. We want to make this definition clearer.

In particular, we need a definition that guarantees that "common type" is an associative binary operator on types. "Common type" would be a partial function whose value behaves as the least upper bound (join) in a semilattice of types. Associativity means that the common type of several types does not depend on how they are grouped.

This allows us to infer a base type from a list literal. For example, given uint64 x and int64 y, the common type of 1, x and y is uint64, so the list literal [1,x,y] would evaluate to a list of uint64.
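Below is a minimal Python sketch of "common type" as a partial, associative binary operator. It assumes a deliberately tiny type universe: int64, uint64 and double are ordered as a chain (so the join is the maximum), and any other type joins only with itself. The ranking is hypothetical and only reproduces the examples on this page.

```python
from functools import reduce
from itertools import product
from typing import Iterable, Optional

# Hypothetical ranking chosen only to reproduce this page's examples; it is
# not DML's definition. A chain is the simplest join-semilattice: its join
# is the maximum.
NUMERIC_RANK = {"int64": 0, "uint64": 1, "double": 2}


def common_type(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Partial binary operator on types; None means 'no common type'."""
    if a is None or b is None:
        return None  # once undefined, always undefined
    if a == b:
        return a  # every type is its own common type
    if a in NUMERIC_RANK and b in NUMERIC_RANK:
        return a if NUMERIC_RANK[a] >= NUMERIC_RANK[b] else b
    return None  # e.g. a struct type and an integer type


def common_type_of(types: Iterable[str]) -> Optional[str]:
    """Fold over element types, as when inferring a list literal's type."""
    return reduce(common_type, types)


# Associativity holds for every triple, including undefined results, so the
# grouping of a fold never changes the answer.
TYPES = ["int64", "uint64", "double", "bytes_t"]
for a, b, c in product(TYPES, repeat=3):
    assert common_type(common_type(a, b), c) == common_type(a, common_type(b, c))

# Example from the text: 1 (assumed int64), uint64 x, int64 y
print(common_type_of(["int64", "uint64", "int64"]))  # uint64
```

Because of associativity, a left fold with reduce gives the same result as any other grouping. That is what makes the fold a sound way to infer the element type of a list literal.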

List initializers

It would make sense to permit the [...] syntax for array initializers as well, and we can eventually deprecate the {...} syntax for array initializers. List initializers could also be used for other list-like types, such as the planned vector types.

Together with eventually permitting initializers in new places, this would allow two new forms:

- methods with array parameters can be called like m([3,4]);
- irregular register offsets can be expressed as register r[i < 3] @ cast([20,10,30], int[3])[i].

List literals

The list expression syntax [x,y,z] is similar to #? in that its subexpressions can be discarded early, at compile time. One can argue that changing the syntax to #[x,y,z] would improve readability by emphasizing this compile-time property. This would allow the syntax [x,y,z] as a shorthand for cast([x,y,z], <common type of x, y and z>[3]).


We propose the new syntax [x,y,z] for list literals. A list literal is an expression that supports indexing; [x,y,z][i] is allowed even for a non-constant i. The most important use case is register r[i<4] @ [1, 2, 3, 5][i].

The type of [x,y,z][i] would be the common type of x, y and z. This has surprising effects in some corner cases:

- [1, 2, 3.0][1] evaluates to 2.0;
- [-1, x][0] < 0 evaluates to false iff typeof(x) is uint64, since -1 is then converted to uint64.

A list literal will not be a value by itself. It is only accessible through direct indexing or as the iterable of a foreach/select statement.
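Below is a minimal Python sketch of the proposed indexing semantics. It makes three assumptions:

- elements are given as (value, type) pairs;
- integer literals are int64;
- conversions follow C (two's complement wraparound for uint64).

The ranking of types is a hypothetical stand-in for the common-type rules. The sketch reproduces the register-offset use case and both corner cases above.

```python
from typing import List, Tuple, Union

Number = Union[int, float]

# Hypothetical ranking that reproduces this page's examples; not DML's rules.
RANK = {"int64": 0, "uint64": 1, "double": 2}


def convert(value: Number, ty: str) -> Number:
    """C-style conversion of a number to one of the modeled types."""
    if ty == "double":
        return float(value)
    if ty == "uint64":
        return int(value) % 2**64  # -1 wraps to 2**64 - 1
    if ty == "int64":
        v = int(value) % 2**64
        return v - 2**64 if v >= 2**63 else v
    raise ValueError(f"unmodeled type {ty}")


def index_list_literal(elements: List[Tuple[Number, str]], i: int) -> Number:
    """Evaluate [e0, e1, ...][i] for a possibly non-constant index i.

    The result has the common type of all elements, whichever element is
    selected.
    """
    common = max((ty for _, ty in elements), key=RANK.__getitem__)
    if not 0 <= i < len(elements):
        raise IndexError("list literal index out of range")
    return convert(elements[i][0], common)


# register r[i < 4] @ [1, 2, 3, 5][i]
print([index_list_literal([(v, "int64") for v in [1, 2, 3, 5]], i)
       for i in range(4)])                                         # [1, 2, 3, 5]
# [1, 2, 3.0][1] evaluates to 2.0
print(index_list_literal([(1, "int64"), (2, "int64"), (3.0, "double")], 1))  # 2.0
# [-1, x][0] < 0: false when x is uint64, true when x is int64
print(index_list_literal([(-1, "int64"), (7, "uint64")], 0) < 0)   # False
print(index_list_literal([(-1, "int64"), (7, "int64")], 0) < 0)    # True
```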

A list literal is constant if all values are constant. Indexing a constant list literal with a constant evaluates to a constant value.

The reason why [1,2,3] is not considered a value is that it has the same lifespan problems as compound literals. If we overcome that problem, we can make [1,2,3] a value. However, it is unclear how useful that would be once [1,2,3] is supported as syntax for array initializers.

Migration considerations

Changing [] to #[] would be problematic for compatibility. Today there are [] lists that are heterogeneous or that contain non-values, and neither is valid in a list literal.

Initially, we can fill this gap with irregular semantics. For example, [a,b] is interpreted as a list literal if a and b are values that have a common type, and is re-interpreted as #[a,b] otherwise. We can then slowly deprecate the latter case. Compile-time lists with non-constant values can be permitted only with the #[] syntax.
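Below is a minimal Python sketch of the interim rule. It assumes that each element of [a, b, ...] is already known to be a value or not, together with its type. It uses a hypothetical chain-ordered common-type function, and treats an object reference as a non-value.

```python
from typing import List, Optional, Tuple

# Hypothetical ranking of arithmetic types; not DML's actual rules.
RANK = {"int64": 0, "uint64": 1, "double": 2}


def join(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Partial common type; None means that no common type exists."""
    if a is None or b is None:
        return None
    if a == b:
        return a
    if a in RANK and b in RANK:
        return max(a, b, key=RANK.__getitem__)
    return None


def interpret_brackets(elements: List[Tuple[bool, Optional[str]]]) -> str:
    """Decide how legacy [e0, e1, ...] syntax is read during migration.

    Each element is (is_value, type); a non-value (e.g. an object
    reference, in this sketch) has type None. An empty [] also falls back
    to the compile-time reading here.
    """
    if elements and all(is_value for is_value, _ in elements):
        common = elements[0][1]
        for _, ty in elements[1:]:
            common = join(common, ty)
        if common is not None:
            return f"list literal of {common}"
    return "compile-time list #[...]"


print(interpret_brackets([(True, "int64"), (True, "uint64")]))   # list literal of uint64
print(interpret_brackets([(True, "int64"), (False, None)]))      # compile-time list #[...]
print(interpret_brackets([(True, "int64"), (True, "bytes_t")]))  # compile-time list #[...]
```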

Non-constant elements in list expressions

We should add support for non-constant elements, such as [1,2,x]. If we split the syntax into list literals [] and compile-time lists #[], then non-constant elements should be allowed in both. In that case, the legacy interpretation of [] as a compile-time list would not need to support them.

Adding support for non-constant elements in list expressions is easy. Before we do so, however, we should decide whether the list syntax should change to #[].

Tuple types, literals and initializers

  • The tuple literal (1, 2) is a value of type (int64, int64), which is a tuple type.
  • An initializer can have the form (initializer1, initializer2), which is a tuple initializer. This requires that we either have a single target of matching tuple type, or a tuple of targets matching initializer1 and initializer2.
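  • Tuple values are structurally typed: two tuple types are the same type whenever they have the same element types, without any declaration naming them.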
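Below is a minimal Python sketch of structural tuple typing and of the two ways a tuple initializer can match its targets. It assumes that integer literals are int64, as in the (1, 2) example, and represents types as strings and Python tuples.

```python
from typing import List, Union


def literal_type(value: object) -> Union[str, tuple]:
    """Type of a literal. Integer literals are assumed to be int64, as in the
    example (1, 2) : (int64, int64). A tuple type is represented as a plain
    Python tuple of element types, so type equality is structural."""
    if isinstance(value, tuple):
        return tuple(literal_type(v) for v in value)
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    raise TypeError(f"no literal type for {value!r}")


def match_tuple_initializer(arity: int, targets: List[Union[str, tuple]]) -> str:
    """(init0, init1, ...) needs either a single target of a tuple type with
    the same number of elements, or one target per element."""
    if len(targets) == 1 and isinstance(targets[0], tuple):
        if len(targets[0]) == arity:
            return "single tuple-typed target"
    elif len(targets) == arity:
        return "one target per element"
    raise TypeError("tuple initializer does not match its targets")


# Structural typing: a type written out by hand equals the literal's type.
print(literal_type((1, 2)) == ("int64", "int64"))         # True
print(match_tuple_initializer(2, [("int64", "int64")]))  # single tuple-typed target
print(match_tuple_initializer(2, ["int64", "double"]))   # one target per element
```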

Tuple deconstruction

The syntaxes local (int a, int b) = some_tuple; and (a, b, c) = some_tuple; are both permitted to deconstruct tuple values. The latter syntax does not permit assignment chains. some_tuple may either be an expression of tuple type, or a tuple initializer.
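One plausible lowering of (a, b, c) = some_tuple; evaluates the source once into a hidden temporary and then assigns each target. Below is a minimal Python sketch. It assumes a hypothetical C representation of tuples as structs with fields e0, e1, ..., and a hypothetical struct name tuple3_t.

```python
from typing import List


def lower_deconstruction(targets: List[str], source: str,
                         tuple_ctype: str, tmp: str = "_tup0") -> List[str]:
    """Lower `(t0, t1, ...) = source;` to C statements.

    Hypothetical representation: a tuple value is a C struct with fields
    e0, e1, ... The source is evaluated exactly once, into a hidden
    temporary, before any target is written.
    """
    lines = [f"{tuple_ctype} {tmp} = {source};"]
    lines += [f"{t} = {tmp}.e{i};" for i, t in enumerate(targets)]
    return lines


print("\n".join(lower_deconstruction(["a", "b", "c"], "some_tuple", "tuple3_t")))
```

Consider a tuple initializer as the source, e.g. (a, b) = (b, a);. This lowering would emit tuple2_t _tup0 = {b, a}; (tuple2_t is equally hypothetical) before assigning a and b. In this sketch the swap therefore works, because both values are read before either target is written.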

Dictionary literals and initializers

A dictionary literal would look like [1: 2, 3: 4] or ["foo": 5, "bar": 6].

The key type may be restricted to strings and uint64. The dictionary's value type is the common type of the types of its values.
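Below is a minimal Python sketch of type inference for dictionary literals. It assumes that:

- the key restriction to strings and uint64 is adopted;
- non-negative integer keys are uint64;
- only int64 and double values occur, with double as their common type.

```python
from typing import Any, List, Tuple


def key_type(key: Any) -> str:
    """Keys restricted to strings and uint64 (one option the page considers)."""
    if isinstance(key, str):
        return "string"
    if isinstance(key, int) and not isinstance(key, bool) and 0 <= key < 2**64:
        return "uint64"
    raise TypeError(f"unsupported dictionary key {key!r}")


def value_type(value: Any) -> str:
    """Only int64 and double values are modeled in this sketch."""
    if isinstance(value, float):
        return "double"
    if isinstance(value, int) and not isinstance(value, bool):
        return "int64"
    raise TypeError(f"unmodeled dictionary value {value!r}")


def dict_literal_type(entries: List[Tuple[Any, Any]]) -> Tuple[str, str]:
    """Infer (key type, value type) for [k0: v0, k1: v1, ...]."""
    if not entries:
        raise TypeError("an empty literal has no inferable key or value type")
    key_types = {key_type(k) for k, _ in entries}
    if len(key_types) != 1:
        raise TypeError(f"keys of mixed types: {sorted(key_types)}")
    value_types = {value_type(v) for _, v in entries}
    # Common type of int64 and double is assumed to be double here.
    common = "double" if "double" in value_types else "int64"
    return key_types.pop(), common


print(dict_literal_type([(1, 2), (3, 4)]))          # ('uint64', 'int64')
print(dict_literal_type([("foo", 5), ("bar", 6)]))  # ('string', 'int64')
```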

Empty list literal

[] as an initializer works fine both for arrays and dictionaries.

[] as an expression is not allowed initially. We could permit it, and let it represent a special value that supports the operations of dictionaries and lists, producing the same result as an empty dictionary or list would produce (i.e., indexing disallowed, and iteration ends immediately).
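Below is a minimal Python sketch of the special value, with the hypothetical name EmptyCollection. It is modeled as a run-time object; in DML, indexing it would more likely be rejected at compile time.

```python
from typing import Iterator, NoReturn


class EmptyCollection:
    """Hypothetical run-time value for the expression []: it supports the
    operations of both lists and dictionaries and behaves as an empty one
    of either would."""

    def __iter__(self) -> Iterator[object]:
        return iter(())  # iteration ends immediately

    def __len__(self) -> int:
        return 0

    def __getitem__(self, key: object) -> NoReturn:
        # No index or key can ever be valid, so indexing is always an error.
        raise IndexError(f"cannot index [] with {key!r}")


empty = EmptyCollection()
for item in empty:
    print("never reached", item)
print(len(empty))  # 0
```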

Grammar

The syntax [] is overloaded for literals and initializers, and () additionally for deconstruction patterns. The grammar must therefore be tailored accordingly. The following pseudo-rules demonstrate how the grammar can be written to accommodate this.

// All normal expressions
expression_except_collection_literal <- ...

expression_except_tuple_literal <- expression_except_collection_literal | list_literal | dict_literal

expression <- expression_except_tuple_literal | tuple_literal

// Rules for assignment chains
assign_chain <- expression_except_tuple_literal EQUALS assign_chain
assign_chain <- expression_except_tuple_literal EQUALS initializer

// Assignment not using tuple deconstruction
assign_stmt <- assign_chain

// Assignment using tuple deconstruction
assign_stmt <- tuple_literal EQUALS initializer

// scalar initializer
initializer <- expression_except_collection_literal

// other forms of initializers
initializer <- tuple_initializer | list_initializer | dict_initializer | struct_initializer
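Below is a minimal Python sketch of how the pseudo-rules partition assignment statements. It works on a hypothetical pre-built AST of (kind, payload) pairs rather than with a real parser. It shows two things: a tuple literal on the left side only appears in a deconstruction, never in a chain, and a bracketed source is always read as an initializer.

```python
from typing import List, Tuple

# Hypothetical AST node: (kind, payload). Kinds used here: "name", "int",
# "tuple", "list", "dict", "struct".
Node = Tuple[str, object]

COLLECTION_KINDS = {"tuple", "list", "dict", "struct"}


def classify_initializer(source: Node) -> str:
    """initializer <- expression_except_collection_literal (scalar), or a
    tuple/list/dict/struct initializer: bracketed sources are never parsed
    as collection literals."""
    kind = source[0]
    return f"{kind} initializer" if kind in COLLECTION_KINDS else "scalar initializer"


def classify_assign_stmt(targets: List[Node], source: Node) -> Tuple[str, str]:
    """Classify `t0 = t1 = ... = source;`; targets are the expressions to
    the left of each EQUALS."""
    if any(t[0] == "tuple" for t in targets):
        # assign_stmt <- tuple_literal EQUALS initializer: no chaining.
        if len(targets) != 1:
            raise SyntaxError("tuple deconstruction cannot be chained")
        return "tuple deconstruction", classify_initializer(source)
    # assign_chain: each target is an expression_except_tuple_literal.
    kind = "assignment chain" if len(targets) > 1 else "plain assignment"
    return kind, classify_initializer(source)


# (a, b) = some_tuple;
print(classify_assign_stmt([("tuple", [("name", "a"), ("name", "b")])],
                           ("name", "some_tuple")))
# a = b = [1, 2];
print(classify_assign_stmt([("name", "a"), ("name", "b")],
                           ("list", [("int", 1), ("int", 2)])))
```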