Syntax extensions

Nemerle has builtin syntax extension capabilities. They are limited to some fixed elements of language grammar, but their ability to defer parsing of entire fragments of token stream makes them quite powerful and usable.

Note: To compile anything that contains a macro, you must reference the compiler assembly, for example with ncc -r Nemerle.Compiler.dll foo.n or similar.

Expression level extensions

We have focused mainly on extending the basic syntactic entity in Nemerle: the expression.

New parsing rules are triggered by a set of user-definable distinguished keywords and operators. When the parser encounters one of them at a position where an expression may begin, it runs the special parsing procedure of the syntax extension associated with that token.

All syntax extensions are specified by macro definitions. Each macro can optionally have a syntax definition, which describes how the macro is invoked when that syntax occurs in a program. For example,

macro while_macro (cond, body) 
syntax ("while", "(", cond, ")", body) {
  <[ 
    def loop () {
      when ($cond) { 
        $body; 
        loop () 
      }
    }
    loop ()
  ]>
}

defines a macro that introduces the while loop construct, together with its syntax, into the language.
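To make the effect of the syntax definition concrete, here is a small usage sketch. It assumes that while_macro has been compiled into a macro library, that the library is referenced when compiling this file, and that the macro is in scope. The module name Program and the counter i are made up for illustration. Nemerle's standard macro library already provides a while loop, so treat this as a picture of the mechanism rather than something you would need to define yourself.

```nemerle
module Program {
  Main () : void {
    mutable i = 0;

    // The parser sees the distinguished keyword "while", then matches
    // "(", cond, ")" and body as given in the syntax definition.
    while (i < 3) {
      System.Console.WriteLine (i);
      i = i + 1;
    }

    // Conceptually, the macro call above is replaced by the quoted code
    // with cond and body spliced in:
    //
    //   def loop () {
    //     when (i < 3) {
    //       { System.Console.WriteLine (i); i = i + 1; };
    //       loop ()
    //     }
    //   }
    //   loop ()
    //
    // The name loop is introduced by the macro, so it should not be
    // relied upon from the calling code.
  }
}
```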

Raw token extensions

Nemerle has a very powerful method for introducing virtually arbitrary syntax into the language. It lets you specify that a given part of the program input will not be interpreted by the main Nemerle parser, but will instead be passed to a macro in an easy-to-use representation.

The parentheses tree concept

The parser and all syntax extensions operate on token streams created by the lexer and the so-called pre-parse stage. The lexing phase simply transforms program text into a stream of tokens (like identifiers, numbers, operators, distinguished keywords, etc.).

The next phase groups this stream into a tree of parentheses. Four kinds of parentheses are distinguished: {}, (), [] and <[]>. Tokens inside those parentheses are further divided into groups separated by special separator tokens. For example, the following program fragment

fun f (x : string) {
  def y = System.Int32.Parse (x);
  y + 1
}

is represented after the pre-parse stage as the token tree

   [ fun , f , ( [ x , ':' , string ] ) , {
     [ def , y , '=' , System , '.' , Int32 , '.' , Parse , ( [ x ] ) ] ,
     [ y , '+' , 1 ]
   } ]

where matched parenthesis groups are delimited by (), {} and [], and their elements are separated by commas. Note that groups such as () and {} contain tokens enclosed in [], which represent loose token groups: runs of tokens split by separators (, for () and [], and ; for {} and <[]>).

Parentheses tokens

So, according to the description above, we have the following kinds of special tokens, which represent whole fragments of unparsed code:

  • Token.BracesGroup - for { }
  • Token.RoundGroup - for ( )
  • Token.SquareGroup - for []
  • Token.QuoteGroup - for <[]>, used in macro code quotation
  • Token.LooseGroup - a list of tokens grouped inside one of the brackets above and separated by the separator token specific to that kind of bracket

All the tokens produced by the lexer can be viewed here

Passing token groups to the macro

These raw grouping tokens can be passed as a parameter to a macro. We simply have to give the parameter the type Token and name it in the syntax definition:

macro BuildXml (group : Token) 
syntax ("xml", group)
{
  ...
}

In code where such a macro has been imported, we can use the new syntax:

foo () : void {
  def doc = xml (<node name="foo">My name is foo</node>);
  // macro produced some XmlNode for us, we can use it
  print (doc.InnerXml); 
}

Inside such a macro we can use our own specialized parser. For example, a small domain-specific language can easily be embedded in a Nemerle program, given a simple syntax extension.
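As a starting point for such a parser, it can help to see what the macro actually receives. The following is a minimal sketch, not a real parser: the macro name DumpTokens and the keyword dump_tokens are hypothetical, and it assumes that Token is the type from the Nemerle.Compiler namespace, as in the BuildXml signature above. It only prints the group it was handed and expands to the unit value.

```nemerle
using Nemerle.Compiler;

// Hypothetical macro, for illustration only.
macro DumpTokens (group : Token)
syntax ("dump_tokens", group)
{
  // This code runs inside the compiler, while the file that uses
  // dump_tokens is being compiled. ToString () exists on every object;
  // the exact text produced for a token group is not specified here.
  System.Console.WriteLine (group.ToString ());

  // A macro must return an expression; this one expands to ().
  <[ () ]>
}
```

With the macro imported, a call such as dump_tokens (1 + 2, foo); would hand the unparsed group to the macro at compile time. A specialized parser would walk the tokens in group at the point where this sketch prints them, and would build the resulting expression with quotations.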
