Syntax Design - Spicery/Nutmeg GitHub Wiki

Outline of Concepts

  • We can parse a standalone Nutmeg source file without reference to any other files.
    • Every construct can be unambiguously identified without any type information.
  • We can resolve a standalone Nutmeg source file without reference to any other files.
    • Every identifier in the file can be classified as local|captured|global.
    • Every non-global identifier in the file can be labelled with an id that is unique within the largest enclosing local scope.
  • As a basic guide we tend to borrow syntax from our key influences, which are Pop-11, Python, and Java.
    • We aim to only invent syntax when there's no good precursor syntax in our influencing sources.
  • There are two main syntactic categories: Expressions and Statements.
  • Statements and expressions can be mutually nested, so there is no deep division between the two categories.
  • We prefer the 'outfix' syntax of Pop-11 for 'bigger' syntax such as conditionals, loops etc.
    • Because these are expressions in Nutmeg, outfix design helps avoid unpleasant syntactic ambiguity.
    • It also helps establish good layout for learners.
  • We do not need compound-statements ('braces').
    • Because all contexts that support statements allow for a series of statements.
    • If you need to convert a statement into an expression, you can use a let form.
  • We do not require a statement terminator (semi-colon) but allow newlines to play that role.
    • A newline implies termination when a semi-colon would be legal and
    • the next token cannot be a continuation.
  • Terse alternatives
    • With interactive scripting in mind we allow the verbose outfix syntax to be shortened
    • We allow end as an all-purpose closer e.g. end instead of endswitch
    • We allow : as an all-purpose alternative to then or do
    • And we also allow : to appear after any outfix keyword e.g. else as well as then and do
  • Function and method calling syntax unified
    • The following are identical expressions: f(x, y), x.f(y), (x,y).f, x .f y
    • The . syntax makes the next identifier infix or postfix
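
The newline rule in the outline above can be illustrated with a small Python sketch. It is not Nutmeg's actual tokenizer or parser: the two token sets and the function name `split_statements` are assumptions made purely for illustration, and "a semi-colon would be legal" is approximated by "brackets are balanced and the last token does not demand another operand".

```python
# Assumption: tokens that can only continue the previous expression.
CONTINUATION_STARTS = {".", ",", ")", "+", "*", "/"}
# Assumption: tokens after which a semi-colon would not yet be legal.
NEEDS_MORE = {".", ",", "(", "+", "*", "/", ":="}


def split_statements(lines):
    """Group per-line token lists into statements using the newline rule.

    `lines` holds one list of tokens per physical source line.
    """
    statements, current, depth = [], [], 0
    for tokens in lines:
        if not tokens:
            continue  # a blank line carries no tokens to inspect
        # Condition 1: a semi-colon would be legal at the end of `current`.
        semicolon_legal = bool(current) and depth == 0 and current[-1] not in NEEDS_MORE
        # Condition 2: the next token cannot be a continuation.
        continues = tokens[0] in CONTINUATION_STARTS
        if semicolon_legal and not continues:
            statements.append(current)
            current = []
        current = current + tokens
        depth += tokens.count("(") - tokens.count(")")
    if current:
        statements.append(current)
    return statements


lines = [
    ["y", ":=", "x"],
    [".", "f"],            # starts with '.', so it continues the line above
    [".", "g"],
    ["z", ":=", "f", "("], # open bracket: the newline cannot terminate here
    ["y", ")"],
]
for statement in split_statements(lines):
    print(" ".join(statement))
```

Under these assumptions the five physical lines group into two statements: the fluent chain on `x`, and the call that spans the open bracket.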

About 'Outfix' Syntax

The phrase 'outfix' syntax is used here to describe syntactic forms that start and end with distinctive keywords. In the example below, if starts the expression and endif finishes it. Internal expressions and statements are never adjacent; they are always separated by reserved words. In this case the two parts are separated by the word then.

if EXPRESSION then STATEMENTS endif

Outfix syntax makes expressions easy to read and lay out, and provides syntactic redundancy that makes error checking and reporting easier. The main downside is that it requires a bit more typing.
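
As a rough illustration of how the redundancy of distinctive closers helps error reporting, here is a minimal Python sketch of a closer checker. It is not taken from the Nutmeg implementation: the function name `check_outfix`, the opener table, and the message wording are all hypothetical. The only facts borrowed from this page are the keywords if/endif and endswitch (with switch assumed as its opener) and the all-purpose closer end.

```python
# Assumption: only these two openers are modelled; 'switch' is inferred from 'endswitch'.
CLOSER_FOR = {"if": "endif", "switch": "endswitch"}
CLOSERS = set(CLOSER_FOR.values()) | {"end"}


def check_outfix(tokens):
    """Return a list of error messages about mismatched outfix keywords."""
    errors, stack = [], []
    for index, tok in enumerate(tokens):
        if tok in CLOSER_FOR:
            stack.append((tok, index))  # remember which opener awaits a closer
        elif tok in CLOSERS:
            if not stack:
                errors.append(f"token {index}: {tok!r} has no matching opener")
                continue
            opener, opened_at = stack.pop()
            expected = CLOSER_FOR[opener]
            # 'end' closes anything; a specific closer must match its opener.
            if tok != "end" and tok != expected:
                errors.append(
                    f"token {index}: expected {expected!r} to close "
                    f"{opener!r} opened at token {opened_at}, found {tok!r}"
                )
    for opener, opened_at in stack:
        errors.append(f"token {opened_at}: {opener!r} is never closed")
    return errors


print(check_outfix("if a then b endif".split()))
print(check_outfix("if a then switch b endif".split()))
```

Because a specific closer names the construct it closes, the second call can say which opener was left unclosed. With only the generic end, the same mistake would surface later and less precisely, which is the trade-off the terse form accepts.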

About Function/Method Call Syntax

In languages such as C# or Python, calling a function f(x, y) is syntactically distinct from calling a method x.f( y ). This arises from the fact that methods are, confusingly, second-class citizens that have no independent identity away from an object to which they might be applied. Methods have their own name space and cannot be abstracted over in the ordinary way (double abstraction is required).

In Nutmeg, this confusion is eliminated. Methods are first class values that can be passed as parameters, embedded into structures, and abstracted over in the ordinary way - exactly the same way as functions/procedures. This reification of methods is part and parcel of Nutmeg's core mission, to maximise cross-over of functional and procedural programming techniques. (Prior examples of this can be found in the Common Lisp Object System and Pop-11's ObjectClass library.) Consequently there is no value in retaining the rigid distinction between function and method calling.

However, the techniques of Object-Oriented programming are as relevant to Nutmeg as they are to any other programming language. And since programmers are very used to the infix method call to signal the subject of a method, the syntax is retained in the most straightforward way. The . keyword indicates that the next token is to be treated as a tightly binding infix or postfix operator. Why postfix as well? Because we additionally wish to dissolve the superfluous distinction between methods and properties, which ultimately arises from the non-reified method semantics, and at the same time support fluent (i.e. postfix) programming.

The parser must decide between the infix interpretation versus postfix interpretation by looking at the next symbol. If the next symbol is compatible with an infix interpretation then it is obliged to use the infix interpretation, otherwise it must use the postfix interpretation. (By looking only at the next symbol we make it easy for programmers to figure out what is going on.)

For example, the expression y := x.f.g.h; resolves .f, .g and .h as postfix. By contrast, y := x.f().g().h() resolves the same symbols as infix, because each is followed by an opening parenthesis. Nevertheless, the two expressions parse to exactly the same code-tree.
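
The one-symbol lookahead rule can be sketched in Python as a tiny recursive-descent parser. This is an illustration under stated assumptions, not Nutmeg's parser: the class `DotParser`, the tuple shape `("call", name, args)`, and the choice of which tokens may start an operand (an opening parenthesis, an identifier, or a number) are all hypothetical. It handles only the right-hand side of an expression, not `:=` or `;`.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[().,])")


def tokenize(src):
    """Split source text into identifiers, numbers and punctuation."""
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class DotParser:
    def __init__(self, src):
        self.tokens, self.pos = tokenize(src), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    @staticmethod
    def can_start_operand(tok):
        # Assumption: operands start with '(', an identifier or a number.
        return tok is not None and (tok == "(" or tok[0].isalnum() or tok[0] == "_")

    def args_until_close(self):
        """Parse comma-separated expressions up to and including ')'."""
        values = []
        if self.peek() != ")":
            values += self.expr()
            while self.peek() == ",":
                self.advance()
                values += self.expr()
        if self.advance() != ")":
            raise SyntaxError("expected ')'")
        return values

    def operand(self):
        """Parse one operand; a parenthesised group may yield several values."""
        tok = self.advance()
        if tok == "(":
            return self.args_until_close()
        if not self.can_start_operand(tok):
            raise SyntaxError(f"unexpected {tok!r}")
        if not tok[0].isdigit() and self.peek() == "(":
            self.advance()  # ordinary call syntax: f(x, y)
            return [("call", tok, tuple(self.args_until_close()))]
        return [tok]

    def expr(self):
        left = self.operand()
        while self.peek() == ".":
            self.advance()
            name = self.advance()
            if not (name[0].isalpha() or name[0] == "_"):
                raise SyntaxError(f"expected an identifier after '.', found {name!r}")
            # The rule: look only at the next symbol to choose infix or postfix.
            right = self.operand() if self.can_start_operand(self.peek()) else []
            left = [("call", name, tuple(left + right))]
        return left


def parse(src):
    parser = DotParser(src)
    values = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r}")
    return values[0] if len(values) == 1 else tuple(values)


print(parse("x.f.g.h") == parse("x.f().g().h()"))
print(parse("f(x, y)") == parse("x.f(y)") == parse("(x,y).f") == parse("x .f y"))
```

In this sketch both interpretations build the same call node; the infix case simply appends the operand that follows to the argument list, which is why the postfix chain and the chain with empty parentheses produce identical trees.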