Syntax - IS4Code/Sona GitHub Wiki

The syntax is governed by several basic principles, mostly inspired by Lua:

Small vocabulary and keyword preference

The collection of keywords and operators is kept small, to make all programs recognizable and not burdened with niche special syntax. Instead, the recognized tokens may be placed together to form compounds, like inline function, or may have different meanings depending on context. Like in Lua, control statements prefer keywords over parentheses, and all blocks are terminated by end; there is no short syntax to use a single statement instead of a block.

The lack of explicitness in operators sometimes needs to be balanced by explicit type declarations. For example, the operator .. is used to concatenate both strings and sequences, but also to form ranges of numbers, so in contexts where multiple options are applicable, the type must be given explicitly to force the correct interpretation.

Whitespace is irrelevant

Aside from a few exceptions, the syntax does not distinguish between kinds of whitespace, nor does it depend on whitespace for purposes other than separating tokens. It is entirely possible to write a program, or parts of it, on a single line without affecting any of the semantics.

The exceptions where whitespace is significant are:

Newlines in strings and verbatim strings differ. Verbatim strings use the exact sequence used in the file, usually one used by the platform where it was produced, while normal strings interpret it as the newline sequence used by the compiler's environment. If this behaviour is not desirable, the newline must be expressed as \r\n, \n, or Environment.NewLine. Note that all of these options yield different results ‒ \r\n/\n force a particular line ending sequence (Windows/network or Linux), while Environment.NewLine uses the target environment's sequence (which may be different from the compiler and source).
Directives operate in a mode where all whitespace is significant ‒ newline is used to end the directive, while other pieces of whitespace (spaces, tabs, or form feed) are used to separate arguments within a directive. This may be avoided by using parentheses, which switch to the normal parsing mode ‒ for example, #item attr 1 +2 is interpreted as having 2 parameters, while #item attr (1 +2) does not treat the space as a separator.
Line comments (//) terminate at the end of the line. The newline itself is not a part of the comment.
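The directive rule above can be illustrated with a small Python sketch. This is only a simplified model and not the compiler's actual implementation: the function name `split_directive_args` is hypothetical, it receives only the argument text that follows the directive (how the real parser delimits the directive name is not modeled), and it ignores strings, comments, and escapes.

```python
def split_directive_args(arg_text: str) -> list[str]:
    """Split directive argument text on spaces, tabs, and form feeds,
    except where the whitespace is enclosed in parentheses.

    Assumption: a simplified model of the rule described in the text.
    """
    args = []
    current = []
    depth = 0  # current nesting level of parentheses
    for ch in arg_text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch in " \t\f" and depth == 0:
            # Whitespace outside parentheses ends the current argument.
            if current:
                args.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        args.append("".join(current))
    return args


print(split_directive_args("1 +2"))    # two arguments
print(split_directive_args("(1 +2)"))  # one argument
```

Under these assumptions, the unparenthesized text splits into two arguments, while the parenthesized text stays whole.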

Semicolons are optional

Each statement may be terminated by exactly one semicolon, but this is generally unnecessary except in a few specific scenarios:

Like in Lua, there is a situation where parentheses could be interpreted as a function call:
```
f()
(x).g()
```
Here, f()(x) is the preferred interpretation per the language rules, treating the result of calling f as a function. This situation does not pose as big a risk as in Lua, since F# prohibits a type-unsafe call, but to prevent the syntax from being interpreted this way, the first statement must be terminated with ;.
The semicolon is also meaningful when multiple functions are declared in one place:
```
function f()
  g()
end
function g()
  h()
end
function h()
  f()
end
```
This group is treated as a single statement, declaring all functions at once and thus allowing one to refer to functions below it. Placing ; between the functions separates them from each other, making the later-defined functions no longer callable from earlier-defined ones.
Some other locations also allow an optional semicolon, despite not separating two statements. For example, in function() as int; return 0 end or (as class; 1, 2, 3), it is used to separate the contents of the tuple or function from its type.

Reading syntax in this documentation

Syntax is expressed as pieces of grammar written in the ANTLR language. These illustrations can be used to understand the general structure of the syntax. However, they are merely a simplified form of the real syntax and do not accurately reflect the behaviour of the parser (see below for the list of differences).

Here is an example grammar to show the syntax conventions:

```
statement:
  (
    'echo' expression (',' expression)* |
    'exit'
  ) ';'?;

expression:
  identifier |
  expression [-+*/] expression |
  [-+] expression |
  '(' expression ')';
```

This grammar declares two rules, statement and expression, each defined by a sequence of subrules. Parentheses are used to group subrules together, the | character specifies alternatives within a grouping, the * character makes the preceding subrule repeating and optional (zero or more occurrences), and the ? character makes the preceding subrule optional. In this case, the intended interpretation is that statement is formed either by echo followed by one or more expressions (separated by ,), or by exit alone, and both may optionally be followed by a semicolon. The expression rule is recursive, defined either as a plain identifier (an external rule not shown here), two expressions separated by an operator (one of +, -, *, /), an expression preceded by + or -, or an expression wrapped in parentheses.
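To make the reading of the example grammar concrete, here is a minimal Python sketch of a recognizer for it. It is only an illustration and not how the real parser works. Several details are assumptions: identifier is taken to be a letter or underscore followed by word characters, echo and exit are assumed not to be valid identifiers, and the left-recursive expression rule is rewritten as a loop that accepts the same strings. The sketch only answers whether a text matches the grammar; it builds no tree and assigns no operator priority.

```python
import re

# Assumed token shapes: identifiers, or one of the punctuation characters.
TOKEN = re.compile(r"\s*(?:([A-Za-z_]\w*)|([-+*/(),;]))")
IDENT = re.compile(r"[A-Za-z_]\w*")
KEYWORDS = {"echo", "exit"}


def tokenize(text):
    """Split text into tokens, raising ValueError on unknown characters."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


class Recognizer:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def accept(self, *options):
        """Consume the next token if it is one of the options."""
        if self.peek() in options:
            self.pos += 1
            return True
        return False

    def statement(self):
        # ('echo' expression (',' expression)* | 'exit') ';'?
        if self.accept("echo"):
            self.expression()
            while self.accept(","):
                self.expression()
        elif not self.accept("exit"):
            raise SyntaxError("expected 'echo' or 'exit'")
        self.accept(";")  # the optional semicolon
        if self.peek() is not None:
            raise SyntaxError("unexpected trailing tokens")

    def expression(self):
        # expression [-+*/] expression, rewritten as a loop over operands.
        self.operand()
        while self.accept("+", "-", "*", "/"):
            self.operand()

    def operand(self):
        # [-+] expression: any number of leading signs.
        while self.accept("+", "-"):
            pass
        if self.accept("("):
            self.expression()
            if not self.accept(")"):
                raise SyntaxError("expected ')'")
            return
        tok = self.peek()
        if tok is None or tok in KEYWORDS or not IDENT.fullmatch(tok):
            raise SyntaxError("expected identifier")
        self.pos += 1


def is_statement(text):
    """Return True if text matches the example 'statement' rule."""
    try:
        Recognizer(tokenize(text)).statement()
        return True
    except (SyntaxError, ValueError):
        return False


print(is_statement("echo a, -b * (c + d);"))
print(is_statement("echo a b"))
```

The first call matches the grammar, while the second does not, because two expressions after echo must be separated by a comma.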

There are several key differences from real ANTLR:

There is no distinction between parser rules (matching tokens) and lexer rules (matching characters).
Comments, whitespace, and other pieces of lexer-specific syntax are generally ignored, unless specified in text.
There is no notion of a parse tree; the grammar only serves to illustrate valid structures, but not how the parser interprets them. This means the following:
- The order of rules and alternatives is not important. In case of ambiguity, the most "specific" rule or alternative applies.
- There is no prescribed order in which recursion is applied. If necessary, operator priority is specified in text.
- Non-greedy matches (??, *?, +?) are not indicated. The most sensible matching mode is usually used.
Error and other auxiliary rules are not included.