Design decisions - IS4Code/Sona GitHub Wiki

There are several decisions made during the shaping of the language that merit further elaboration. This page serves as an overview of those decisions, together with their explanations.

Keywords or special symbols?

Problem: In which locations should keywords be used, as opposed to special characters or operator-like constructs?

Decision: The Lua-like division between keywords and operators shall be retained (and extended), maintaining the same underlying principles: keywords are used in control structures, to delimit different parts of the program, or to indicate special behaviour, whereas symbols are used for operations and are limited to expressions.

Rationale: In a lightweight scripting language, code needs to be self-explanatory to be well understood by beginners, who often learn by example and discover the language as they go. When a piece of code serves as a starting point for discovering a feature not found in languages people are familiar with, the feature should be clearly identifiable in the code by something that can be looked up, and in that case a keyword is the better option. Compare this with F#, which has over 10 kinds of "parentheses", mostly without any hint of their meaning ((|…|), [|…|], {|…|}...), and in some situations with a different meaning depending on context (e.g. both computation expressions and record expressions use {…}).

Comments

Problem: What style of comments should be used? Lua uses a style of comments not found in common programming languages, with a somewhat unwieldy but consistent syntax of --[[ (or --[=[, --[==[ and so on) for block comments. This combines line comments with the long brackets used for strings, which has the benefit of supporting nested comments. This is something F# also supports, but in a different way ‒ block comments in F# ((*…*)) recognize a limited number of normal tokens, including strings and other block comments.

Decision: The // and /*…*/ syntax found in C# and other C-like languages is borrowed.

Rationale: While both features sound useful in theory, they are not that commonly utilized, and also have some disadvantages ‒ block comments in Lua require special parsing rules not expressible in some grammar languages, while F# comments in a sense must contain valid F# code, making them cumbersome for use in other situations.
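The parsing difficulty with Lua's block comments presumably comes from the closing bracket having to repeat exactly as many = characters as the opening one, so the scanner has to count them. A minimal Python sketch of such a scanner follows; the function name is hypothetical and it is not taken from any Lua or Sona implementation.

```python
import re

# Matches the opening of a Lua block comment: --[ then N '=' then [
_OPENER = re.compile(r"--\[(=*)\[")

def skip_lua_block_comment(src: str, pos: int = 0) -> int:
    """Return the index just past the Lua block comment that starts at pos."""
    opener = _OPENER.match(src, pos)
    if opener is None:
        raise ValueError("no block comment at this position")
    # The closing bracket must carry the same number of '=' as the opener.
    level = len(opener.group(1))
    closer = "]" + "=" * level + "]"
    end = src.find(closer, opener.end())
    if end < 0:
        raise ValueError("unterminated block comment")
    return end + len(closer)
```

A closing bracket of a different level, such as ]] or ]=] inside a --[==[ comment, is skipped as ordinary comment text, which is what allows one comment to enclose another.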

Newlines in strings

Problem: What does a newline in a normal string mean? In F#, both normal and verbatim strings may freely contain any newline characters with no special handling, while Lua prohibits an end-of-line sequence in normal strings, and normalizes them to \n in bracketed strings.

Decision: Verbatim strings shall interpret newline characters (\r and \n) literally, as any normal string character, like in F# and C#. In normal strings, however, a newline sequence shall be interpreted as a line terminator in the compiler's environment, using whichever character sequence Environment.NewLine reports.

Rationale: In the traditional view, text files consist of a sequence of lines, each terminated by a newline character, which is arguably one of the reasons multi-line strings are commonly rejected, as parsers would need to split a single token between multiple lines. Modern parsers, however, do not read input files by lines, but by characters. Moreover, when there are two styles of strings already, as is common in .NET languages, it is beneficial to ascribe a different meaning to each kind of string in a way that is consistent with its purpose. Due to the interpreted nature of the language, the compiler's environment is usually the same as the target environment, and thus normalizing the line endings improves consistency of behaviour of code coming from sources with improper normalization.
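The difference between the two kinds of strings can be modelled in a few lines of Python. This is only a sketch under assumptions: the function names are hypothetical, os.linesep stands in for Environment.NewLine, and a "newline sequence" is assumed to be any of CR LF, CR, or LF.

```python
import os
import re

# Assumption: any of CR LF, lone CR, or lone LF counts as one newline sequence.
_NEWLINE = re.compile(r"\r\n|\r|\n")

def normal_string_value(literal_body: str, newline: str = os.linesep) -> str:
    # Normal string: every newline sequence in the source becomes `newline`.
    # A function is used as the replacement so the text is inserted verbatim.
    return _NEWLINE.sub(lambda _match: newline, literal_body)

def verbatim_string_value(literal_body: str) -> str:
    # Verbatim string: CR and LF characters are kept exactly as written.
    return literal_body
```

With this model, a source file saved with mixed line endings yields the same normal string on a given platform, while a verbatim string preserves whatever bytes the file contained.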

Statements as expressions and expressions as statements

Problem: To what degree should expressions and statements be interchangeable?

Decision: Arbitrary expressions shall be usable as statements only when enclosed in parentheses, with the exception of member expressions (those that are a part of a function call or assignment). Statements shall be usable in place of expressions only when the parent expression establishes a new "execution context", and the statement encloses one or many blocks, i.e. control statements after inline or inside sequence expressions. Simple assignment is allowed as an expression only in parentheses.

Rationale: In dynamically-typed languages like Lua, sole expressions are generally an error when used in a statement position, as they may have no side effects to be observed, and thus could likely be a mistake. In F#, however, the type system detects when an expression has an unobserved result, and thus prevents these mistakes. Nevertheless, there should not be an overlap in the syntax of expressions and statements, as it may lead to unexpected look-aheads (e.g. a function declaration could also be used as an expression); thus, in situations when F#-style operators are used, they should be wrapped in parentheses. A simple name must be usable as a statement to allow custom operations in computations. Arbitrary statements in unexpected places are generally of little use and result in overly complicated code; moreover, statements like return, break, and continue would require deconstructing the whole expression.

Ignoring expression results

Problem: How should an expression with an unobserved result be handled?

Decision: In contexts where this applies, an unobserved expression is treated as an error. To suppress the error and explicitly ignore the result, the expression must be followed by !.

Rationale: Currently, this situation may happen only in a member statement, i.e. a statement formed from a simple name or (…)-enclosed expression, optionally followed by member access, function calls, or indexing, where the expression has an observable value (non-unit). There is no need to syntactically restrict which expressions may appear as a sole statement (unlike in Lua, see above), and so, to further simplify the syntax, all of them can be treated as a single case. Nevertheless, guarding against unobserved results is still important, especially in the context of F# (where statements like printf("%s = %d"; "A") can cause hard-to-detect bugs), and so the warning that F# normally gives is promoted to an error. This is balanced by an extremely easy and intuitive way of indicating that discarding the result is intended.

Currying or tuples for calls

Problem: Should function calls correspond to the currying (f 1 2 3) or tuple (f(1, 2, 3)) form? In F#, the curried form is more commonly used for functions to improve composition, while the tuple form is used for methods.

Decision: Function calls, declarations, expressions, and types shall all use the tuple form for ,-separated arguments, and currying for ;-separated arguments. The echo command shall use the curried form for ,-separated arguments, and shall not allow ;.

Rationale: It is not possible to distinguish a function call from a method call by syntax alone, hence the adopted argument form must be consistent for both. Since .NET methods are exposed using the tuple form, adopting it as the default is better. The curried form is already expressible this way, as f()()(), but, since ; is commonly used in programming languages in locations where empty elements are allowed, using it as syntactic sugar like f(;;) is desirable for calling idiomatic F# functions (and is far less surprising than f(,,)). The echo command is usually implemented using printf and similar functions, where tuple arguments are not used.
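The relationship between the two forms can be illustrated with a small Python model, in which each ;-separated group is treated as one tuple-style call applied to the result of the previous one. All names here are hypothetical and the sketch says nothing about how the compiler actually emits the calls.

```python
from functools import reduce

def add3_tupled(a, b, c):
    # Tuple form: all arguments arrive together, as in f(1, 2, 3).
    return a + b + c

def add3_curried(a):
    # Curried form: one argument per call, as in f(1)(2)(3).
    return lambda b: lambda c: a + b + c

def call_groups(f, *groups):
    # Models f(g1; g2; g3): each group is applied as a separate tuple call.
    return reduce(lambda g, args: g(*args), groups, f)

# f(1, 2, 3) is a single group; f(1; 2; 3) is three groups of one argument.
tupled_result = call_groups(add3_tupled, (1, 2, 3))
curried_result = call_groups(add3_curried, (1,), (2,), (3,))
```

Under this reading, f(;;) is simply three empty groups, which matches the f()()() spelling mentioned above.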

Collection construction

Problem: What syntax should be reserved for the construction of lists, arrays, or arbitrary sequences?

Decision: Distinct syntax shall be used only for the two basic .NET types ‒ arrays ([…]) and enumerables ({…}, resembling that of records). {} shall denote an empty sequence.

Rationale: The list is the go-to collection type in F#, used for aggregating values of arbitrary quantity. However, it does not interact well with the .NET ecosystem as a whole, and is less performant than arrays. For the sake of simplicity, list should be relegated to a second-class citizen, freeing up the […] syntax to be used for arrays, which benefit from optimized construction more than list does. Thanks to the elision of parentheses in simple function calls, List.ofSeq{…} can be used to construct a list explicitly, maintaining a readable syntax even for custom collection types (the implementation of List.ofSeq uses optimized construction, so the only incurred cost is that of enumeration).

Short function call syntax

Problem: In Lua, a function can be called without parentheses if the argument is a literal string or a table constructor. To what extent should this syntax be retained?

Decision: Parentheses shall be omittable only when the argument is a sequence, record, or a string/character literal, and only when following an identifier.

Rationale: The syntax is beneficial for calling "constructor functions" like utf8"xyz" or list{1, 2, 3} without parentheses; however, it could also be confusing ‒ Lua allows constructs like f{}{}{} as a chain of calls, but there are situations where this syntax could be ambiguous, such as f(){X = y} = z. Because curried calls can already be specified in another way (f({}; {}; {})), it is better to limit this syntax only to places where a function call can be clearly identified. For consistency, this omission should be enabled only in situations when the argument is delimited by special characters; parentheses should be used in all other situations.

Syntax of types

Problem: What syntax should be used to express compound types? In F#, the types of expressions are usually written in a very different way from the expressions that produce them. For example, fun (x, y) z -> () creates a value of type _ * _ -> _ -> unit.

Decision: To keep the syntax clearer, the types of functions, tuples, and similar shall be denoted in a way that matches the corresponding expressions, for example the type of (0, 1) is (int, int), the type of function(x as int) as int is function(int) as int, and so on.

Computation expressions

Problem: How should F# computation expressions be exposed, in a way that is convenient yet general enough to be usable for async, task, query, option, and other commonly used workflows?

Bind

Problem: How to expose the monadic "bind" operation, which is denoted using ! in F# and represents a transition from a wrapped type to the unwrapped type?

Decision: The operation shall be denoted using the follow keyword, acting as a pseudo-operator on the expression used in statements that allow it. For example, let a = follow b translates to let! a = b in F#. A single follow statement translates to do!.

Rationale: In line with the principles of the language, this operation needs to be indicated by a keyword, to be easily recognizable and searchable. While such a keyword could modify the statement itself, using it alongside the expression makes it clear that the expression has a different type than the corresponding storage location. This placement is also easily explainable to people coming from languages with await (as a generalized version thereof), and is potentially usable even in sub-expressions, should F# decide to support that.
The keyword should be chosen in a way that best depicts the operation, and is reasonably accurate. While other languages have operations like await or unwrap, these put too much focus on only one aspect of the underlying operation. follow has the benefit of being semantically close to await but in a spatial dimension rather than temporal, while hinting at the continuation passing that happens under the hood.
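The continuation passing mentioned here can be made concrete. In F#, let! a = b followed by the rest of the block is desugared into a call of the builder's Bind method with b and a function that takes a and runs the rest. The Python sketch below models this for an option-like workflow; the Some class, the MaybeBuilder class, and its method names are hypothetical stand-ins, not part of Sona or F#.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass(frozen=True)
class Some:
    # The "wrapped" type: a value that is present. None means no value.
    value: Any

class MaybeBuilder:
    def bind(self, wrapped: Optional[Some],
             rest: Callable[[Any], Optional[Some]]) -> Optional[Some]:
        # Unwrap the value and follow the continuation, or stop on None.
        if wrapped is None:
            return None
        return rest(wrapped.value)

    def return_(self, value: Any) -> Some:
        return Some(value)

maybe = MaybeBuilder()

def add_wrapped(x: Optional[Some], y: Optional[Some]) -> Optional[Some]:
    # Models:  let a = follow x
    #          let b = follow y
    #          return a + b
    return maybe.bind(x, lambda a:
           maybe.bind(y, lambda b:
           maybe.return_(a + b)))
```

Each follow marks the point where a wrapped value (x) becomes an unwrapped one (a), and everything after it becomes the continuation passed to bind.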

Blocks and expressions

Problem: F# only has computation expressions because F# does not have statements. In a language with statements and blocks, how should a computation be expressed in a way that is convenient and applicable in all situations?

Decision: A computation shall be entered using the with statement, taking an expression that evaluates to the computation builder. All statements that follow the with statement in the same block are executed as a part of the computation. The statement itself (including its trailing statements) is returning (meaning it produces a definite value that is used as a result of the current function), but with measures that prevent mistakes: either the trailing statements must be terminating or returning (thus there is no open path), or the with statement itself must not be a part of a conditionally returning statement (thus an open path ends immediately with the end of the function). These restrictions can be lifted by using follow with instead, which has the same syntactic category as its trailing statements (thus it allows even break and continue), however, the code ensures that the computation block's effects are observed (by translating to do! in an outer computation, and by ensuring it returns the unwrapped type outside a computation).

Rationale: The syntax needs to be sufficiently versatile, yet not too verbose, to be usable in the two primary locations: functions and inline do. Unlike in C#, async is not a keyword in F#, and the syntax has to reflect this fact by supporting any expression. This can be achieved easily for inline do (as the two keywords can serve as delimiters), yet functions do not offer that many opportunities for additional expressions. For this reason, the placement of the syntax is better to closely follow the order in F# (e.g. let f(): Async<_> = async {), correctly placing the workflow last in the function "definition" as it is merely an implementation detail (and it could also depend on the parameters). With the placement of the syntax identical to the position of the first statement, making it an actual statement solves all the previous issues, and works at the start of all blocks (including inline do).
The actual keyword is chosen to express a change in context akin to the with statement found in other languages which brings an object's members into scope. This is aimed at custom operations that could be used for a similar purpose, and despite colliding with with used in expressions (to modify records), such an issue is unlikely to come up in practice ‒ the expression version of with has priority over the statement when applicable, but that cannot occur at the beginning of a block, and even in other locations, expression with needs {…} to specify the members, which is unlikely to occur in the statement. Lastly, with … is preferable over with … do (unlike the F# syntax) to avoid the awkward situation of always having to write end end in the majority of functions using it (and the same intent can still be expressed using do with …).
follow with targets a particular use case of computation expressions that are effectively synchronous with outer code. For example, one may use a computation expression to modify a specific object using custom operations, and wish to continue normal code afterwards. This would not be possible with plain with (since it is returning), but follow inline do with or simple inline do could achieve a similar result. However, if the inner block is supposed to interrupt or return, such a signal could not be delivered to the outer code, thus a separate statement type needs to be introduced to understand this. If the workflow supports it, this statement can perform do! on the unwrapped value (thus supporting seamless nested computations), and even if the outer code is not in a computation, the soundness of this operation can be enforced by ensuring the wrapped type is identical to the unwrapped type (such as by returning a special marker type from the inner code and requiring it on the outside).

return and yield

Problem: The return and yield keywords can be used in F# to output values from computations. These two operations behave equivalently, but their usage differs by convention. How should these keywords be exposed?

Decision: The return statement is always regarded as returning ("early return"). After with, any point of implicit return (where a unit value would normally be produced) causes return() to be used instead; return is translated to return; return follow is translated to return!. yield is translated to yield; yield.. is translated to yield!; yield follow is translated to let! = … and subsequent yield. yield return is translated to return but is not a returning statement; yield return follow is translated to return!; yield break is treated as a returning statement but without producing return.

Rationale: People coming from imperative languages to F# do not realize that return does not perform early returning, and that it is perfectly possible to have multiple consecutive return statements (if the workflow supports it). Therefore, to minimize confusion, return should behave consistently in all locations, including computation expressions. However, sequence-like workflows might not support the return operation, or they might assign special meaning to it, in which case there should be a way to skip generating return at the end of such blocks. These requirements are similar to yielding, thus yield return works well to indicate a yield-style return (i.e. return but with more code after it), while yield break should already be understood by people coming from C# and is useful on its own.

Sequences

Problem: How much should computation expressions interact with sequences?

Decision: A sequence-like computation can be created using { with …, which is followed by collection elements as usual (including do). The elements are translated into yield/yield! in the same manner as in normal sequences, including do … end blocks which can be used for explicit yielding. follow is allowed on individual elements and inside blocks, but return is disallowed (only yield return can be used).

Rationale: While this syntax is not strictly necessary due to the general nature of computation expressions allowing inline do with … to achieve a similar result, using the sequence syntax for a sequence-like object better communicates the intent and avoids the problems with implicit return, in addition to offering simplified syntax for yielding the elements. Interestingly, { with … do is essentially the same as F#'s … {, despite being formed by 3 distinct pieces of syntax.

Modules

Problem: In what manner should F# modules be defined in the language?

Decision: The language shall use package as the nomenclature for what are modules in F#.

Rationale: Languages use the term "module" to denote a grouping of related code elements, however, there are differences in concrete aspects of such a grouping. In object-oriented environments like .NET, Java, JavaScript, or C++, "modules" share certain properties: a module tightly groups arbitrary code, isolates members that are internal, only exposes members that are explicitly exported, and declares dependencies on other modules. F# modules do not match these aspects ‒ they do not maintain explicit dependencies, are not isolated from other code in the project, and allow arbitrary extensions to place other members in the module (like in namespaces). Hence, they are neither beneficial nor required for writing modular code any more than namespaces, classes, or individual files are. However, unlike namespaces, they offer deterministic sequential (at the file level) initialization of values within them. This is similar to packages in Lua, which are also initialized in a single file and returned as a table. For this reason, and for the similarity with Java's packages (which are also more like namespaces), package aligns more with the meaning and the intended usage of the code element (despite also being a polysemous term, e.g. NuGet packages).

Type keywords

Problem: Which types, if any, should be identifiable by keywords?

Decision: The types for which keywords are available shall be all types that have literals, warrant specialized conversions, or can use pre-existing keywords for that purpose (unit, function, delegate, exception).

Rationale: While F# does not have any keywords that identify types, using keywords has a few advantages: Having special-cased names of core types improves clarity of code, but also allows for flexibility ‒ int and float could have configurable bit size while maintaining the correspondence to literals (so that 0 is always int and 0.0 is always float regardless of the underlying type; likewise, char and string could have configurable encoding), but this also needs to extend to conversions: float("0") should convert the string to whichever type float corresponds to. At that point, all types that could be treated as operators make sense to be included, such as bool, byte, or decimal, with the range of types that are supported as literals being the natural extent of this support (this also has the advantage of allowing the syntax byte 0 to serve as a clearer alternative to 0uy). In addition, shorter syntax like float "0" could be employed, and float? "0" could also be given a meaning (a value of type float? that indicates whether the conversion succeeded). However, there are a few outliers: object does not have any literals (null is of a different type), but it is a reasonably fundamental type, and object x can be ascribed the meaning box(x). unit already needs to be a keyword to define units of measure. void is necessary for interop, pointers (if they are supported at all), and typeof void, but it can also be utilized to mean ignore when used as a conversion, as is used in some languages. Lastly, function, delegate, and exception are primarily used for defining members of the corresponding kinds, but they alone could be utilized for identifying the corresponding types, as well as conversions.
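The two conversion forms mentioned above, float("0") and float? "0", differ in how failure is reported. A rough Python model follows, with hypothetical function names and with None standing in for the "conversion did not succeed" case of float?.

```python
from typing import Optional

def to_float(text: str) -> float:
    # Models float("0"): the conversion raises an error on invalid input.
    return float(text)

def try_to_float(text: str) -> Optional[float]:
    # Models float? "0": the result itself says whether conversion succeeded.
    try:
        return float(text)
    except ValueError:
        return None
```

The keyword doubles as a type name and as a conversion in both cases; only the shape of the result changes.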

Built-in text outputting statement

Problem: Should there be a built-in statement for outputting text from the program, and if so, in what form?

Decision: echo shall be the standard statement to use for outputting any value from the application.

Rationale: While this is a strictly additive feature that many languages, including F# and C#, can do without, having a built-in statement for this purpose might be beneficial for a few reasons:

It offers syntax shortcuts that wouldn't be possible otherwise; for example echo a instead of echo(a).
It supports transforming the arguments in a non-trivial fashion: echo a, b, c does not have to be printfn(a, b, c).
It improves decoupling. Code that internally uses printfn can switch to Console.WriteLine or a custom logger without impacting the bulk of the code.

Keyword

Problem: What keyword should be chosen for the statement? Both F# and Lua use "print" for this operation.

Decision: The keyword shall be strictly echo.

Rationale: Realistically, print and echo are the only two choices for the keyword, since anything else (such as output) is too risky to turn into a keyword. Even though many languages use print for this purpose, it is generally a function and not a keyword, and F# already uses that name as a component in the names of several functions. For this reason, echo is preferable, since it tends to be a keyword in languages that use it (PHP and shell languages), and the name is also more general in nature.

Newlines

Problem: Should echo append a newline or not?

Decision: By default, echo shall be implemented through printfn, appending a newline at the end. However, it shall be configurable so that this behaviour can be changed.

Rationale: Languages generally disagree on how "clever" the default outputting function should be: print in Lua appends a newline while io.write does not; echo in PHP does not but it does in Batch, Shell, PowerShell, etc. Given echo is only a single statement and not two, it should also be taken into account how easy it is to make a mistake when a newline is actually intended: echo "X\n" is incorrect, because it doesn't take the platform into account; echo "X", Environment.NewLine is better, but the output writer may actually be configured to use a different line ending sequence. To respect the writer's preference and to enforce good habits, echo should append a newline by default, since the opposite is trivial to achieve manually.

Formatting

Problem: Should echo perform any formatting on its own?

Decision: echo shall not do formatting by default. Regardless of which function echo is routed to, it attempts to treat the input as a single string.

Rationale: Even though the target function is configurable, the API should remain consistent ‒ there should not be any difference in the interpretation of echo a, b, c across different implementations, which is exactly the reason such a statement could be beneficial. Even the possibility to use C-style format specifiers has limited uses ‒ they are generally eliminated in other places (such as interpolated strings) and for them to occur implicitly here would be unusual. The user is still free to call a formatting function prior to using echo, or a new syntax like %"%d + %d = %d"(2, 3, 5) could be adopted, but in either case such an operation should be explicit.

A possible concern is performance ‒ being able to perform formatting whilst writing to an output is beneficial, however F# has got us covered there too: an interpolated string used for a printf-style function is translated directly to its argument (as happens for FormattableString), without creating an intermediate string. This makes translating echo a, b, c to echo $"{a}{b}{c}" a viable transformation, since it works for all string-accepting functions as well as the default printf-style functions.
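The effect of this transformation can be sketched in Python: the arguments are joined into one string with no separators and no interpretation of format specifiers. The function name is hypothetical, and Python's default string conversion is only an approximation of how F# renders interpolated values.

```python
def echo_argument(*values) -> str:
    # Models echo a, b, c becoming a single interpolated string "{a}{b}{c}".
    # Each value is converted on its own; nothing is treated as a format.
    return "".join(f"{value}" for value in values)
```

For example, echo_argument("%d", 5) yields the literal text %d5, since a C-style specifier in an argument is just ordinary text under this rule.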

Configurability

Problem: Should the user be able to change the function used for echo?

Decision: echo shall be configurable through #pragma echo and #echo to set the outputting function, which, by default, shall be printfn.

Rationale: For the aforementioned reasons, echo should append a newline by default, which however takes away from the convenience when newlines are not desirable. In addition, the user may wish to choose a different output function. There are two ways to enable this choice: #pragma echo works everywhere but can pick the function only from ExtraTopLevelOperators (printf, printfn, eprintf, eprintfn), while #echo is usable only as a statement, but supports any arbitrary function value.
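The two configuration mechanisms can be modelled as follows. This is a sketch under assumptions: the EchoConfig class and its method names are hypothetical, the four named sinks are plain Python stand-ins for the F# functions of the same name, and nothing here reflects how the compiler actually stores the setting.

```python
import sys

# Stand-ins for the four functions that #pragma echo may select.
PRAGMA_SINKS = {
    "printf": lambda text: sys.stdout.write(text),
    "printfn": lambda text: sys.stdout.write(text + "\n"),
    "eprintf": lambda text: sys.stderr.write(text),
    "eprintfn": lambda text: sys.stderr.write(text + "\n"),
}

class EchoConfig:
    def __init__(self):
        # Default stated in the decision: printfn.
        self.sink = PRAGMA_SINKS["printfn"]

    def pragma_echo(self, name):
        # Models #pragma echo: only a fixed set of names is accepted.
        if name not in PRAGMA_SINKS:
            raise ValueError(f"unsupported echo function: {name}")
        self.sink = PRAGMA_SINKS[name]

    def set_echo(self, function):
        # Models #echo: any function value taking one string is accepted.
        self.sink = function

    def echo(self, *values):
        # The arguments are joined into a single string before output.
        self.sink("".join(str(value) for value in values))
```

A custom logger can then be plugged in with set_echo without touching the code that calls echo, which is the decoupling benefit described earlier.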