Nemerle language (part 3) - vilinski/nemerle GitHub Wiki

Nemerle language – part 3

Author: Chistyakov Vladislav Yuryevich
Original: http://www.rsdn.ru/article/Nemerle/TheNemerleLanuage-prt-3.xml
Translator: Peter Nikouline

The console output of lexical parsing result

Nemerle’s Installer (for. Net 3.5) + integration with MS VS 2008
A code to the article

In the previous chapter we described the function “lexer”. The function lexer makes a lexical analysis of text entered into the console. This isn’t a full calculator but it makes sense to test the results of the function’s work. To do that change the readInput function (described in an earlier section):

def readInput()
{
  def inputString = ReadLine();
  
  unless (inputString == "")
  {
    def lexemes = lexer(inputString);
      
    WriteLine("You have entered:");
    
    foreach (lexeme in lexemes)
    {
      | Token.Number(value)  => WriteLine(value);
      | Token.Operator(name) => WriteLine(name);
      | Token.Error          => WriteLine("Input error!");
    }
      
    readInput();
  }
}

The new version of the readInput function transmits a prompt (inputString variable) from the console into lexer function. As described earlier, lexer function parses a string and converts it to the list of tokens. This list is stored in the lexemes variable. Further the foreach cycle are moving through elements of the list.
You already know foreach macro. But this time we use an interesting feature of this macro which I haven’t mentioned yet. The foreach macro is quite not trivial. In particular it allows you to use the short form of pattern matching. Let’s consider the following code from readInput function:

foreach (lexeme in lexemes)
{
      | Token.Number(value)  => WriteLine(value);
      | Token.Operator(name) => WriteLine(name);
      | Token.Error          => WriteLine("Input error!");
}

This is an abbreviated form for:

   foreach (lexeme in lexemes)
    match (lexeme)
    {
        | Token.Number(value)  => WriteLine(value);
        | Token.Operator(name) => WriteLine(name);
        | Token.Error          => WriteLine("Input error!");
    }

Matching with the pattern of variant data types – the “Constructor” pattern

In the above example a body of the match operator contains a pattern matching. Here patterns of the pattern matching are occurrences of variants. In this example the lexeme is a variable of Token type which is the variant data type. The Number, the Operator and the Error are occurrences of the Token variant type and hence they are subtypes. This type of pattern matching can recognize what specific subtype is an object of Token type to which a reference is placed in the lexeme variable. If the lexeme variable indicates the occurrence of Token.Number type then the value will be matched with “Token.Number (value)” pattern. Accordingly we will execute “WriteLine (value);” expression. If the lexeme variable indicates the occurrence of Token.Operator type then the value will be matched with “Token.Operator (name)” pattern. In this case we will execute “WriteLine (name);” expression and so on.
I think that the only confusing point here is an expression like “Number (value)”. This is so-called “Constructor” pattern. This type of pattern describes an instance of an object as a constructor that creates such object. For example, the pattern

| Token.Number(1)

describes an instance of Token.Number variant occurrence. The value of “value” field of the instance is 1. “Token.Number(1)” is a syntax of constructor’s call (it creates an instance of an object of Token.Number type). The constructor has a single parameter with value of 1. Parameters of the constructor correspond to variant’s fields since the constructors are created automatically based on a list of fields.
But what is the meaning of “value” in the “Token.Number (value)” pattern? In fact the patterns can be nested (nested repeatedly!). “value” is the “variable” pattern which is already familiar for us. It is matched with any value. This type of a pattern introduces a new variable and associates it with a value of appropriate (in order) parameter. Therefore the sample Token.Number (value) is mached with any instance of the Token.Number occurrence and introduces “value” variable. The “value” variable contains a value that corresponds to the “value” field of this occurrence. Notice here could be any other type of a pattern (e.g., literal or “tuple” pattern if a tuple would be the field’s type) instead of the "variable"pattern.
Variables introduced in the pattern are visible only within the handlers of the occurrences of the match operator. That is they are visible after the “=>” string until the next pattern or the closing curly parenthesis of the match operator.
You may use any name that starts with a lowercase letter or an underscore (the uppercase letter in the patterns sees as the type name) in “variable” pattern. Nemerle compiler considers uppercase letter as the type’s name in the patterns.
For example, you can rewrite the occurrence of the match operator with the “Token.Number (value)” pattern as follows:

| Token.Number(val) => WriteLine(val);

In this case the value of the “value” field will still be put in the “val” variable. You can use any other name that starts with a lowercase letter or underscore.
Contrary to expectations the “Token.Error” pattern is not an abbreviation for “Token.Error ()”.This kind of a pattern is called “a pattern for type-checking”.
Token.Error pattern doesn’t differ from the “Token.Error ()” pattern as this variant’s occurrence has no fields. But it will differ from the “constructor” pattern for Token.Number or Token.Operator occurrences since it only checks a type of a value passed to the match operator paying no attention to the values of its fields. This kind of a pattern doesn’t introduce any variables.
Let’s return to the readInput function and run a new version of our example. We’ll type “3+4*w” string on the console. The console output will be the following:

3 + 4 * w
        ^
expected number or operator
You have entered:
3
+
4
*
Input error! 

The first line is the user input. The second and third lines are the error messages printed by the lexer function. The remaining lines are the result of work of the readInput function.
So we already know how to read tokens. Now we turn to the parsing and calculation of expressions.

The parsing and calculation of expressions

To begin with we’ll write a simplified version of the calculator. It will evaluate expressions without regard to the priorities of the operators and without an accurate diagnosis of errors. This can be done by writing a very simple function:

def parseAndEval(tokens : list[Token]) : option[int]
{
  def loop(firstValue : int, tokens : list[Token]) : option[int]
  {
    match (tokens)
    {
      | Token.Operator("+") :: Token.Number(secondValue) :: tail => loop(firstValue + secondValue, tail)
      | Token.Operator("-") :: Token.Number(secondValue) :: tail => loop(firstValue - secondValue, tail)
      | Token.Operator("*") :: Token.Number(secondValue) :: tail => loop(firstValue * secondValue, tail)
      | Token.Operator("/") :: Token.Number(0)           :: _    => None()
      | Token.Operator("/") :: Token.Number(secondValue) :: tail => loop(firstValue / secondValue, tail)
      | []                                                       => Some(firstValue)
      | _                                                        => None()
    }
  }
  
  match (tokens)
  {
    | Token.Number(n) :: tail => loop(n, tail)
    | _                       => None()
  }
}

We’ll modify the readInput function to call the parseAndEval function:

def readInput()
{
  def inputString = ReadLine();
  
  unless (inputString == "")
  {
    def lexemes = lexer(inputString);
    
    match (parseAndEval(lexemes))
    {
      | Some(value) => WriteLine(value);
      | None        => WriteLine("Expression error!");
    }
      
    readInput();
  }
}

Optional values – “option [T]” type

First of all pay attention on the return type of the parseAndEval function. This function returns the option [int] type.
The “option” is a generalized (generic) type (i.e., the type that has type parameters) which allows us to describe an optional value. The “option” type is needed when a function can’t always return a value. In our case, if the functions transferred parseAndEval incorrect sequence of tokens, it can not evaluate expression and return the correct result. In our case parseAndEval function can’t evaluate expression and return the correct result if we transfer incorrect sequence of tokens into the parseAndEval function.
The “option” type is the variant data type that is declared in the standard library Nemerle as follows:

variant option[T]
{
  | None
  | Some { val : T; }
}

The occurrence of “option [T].Some” is used when there is a real result that you want to return. The occurrence of “option [T].None” is used in the case when there is nothing to return (we need to show that there is no result).
Because the “option” is an ordinary variant we may instantiate it in the following way:

option[int].Some(1)

or

option[int].None()

Naturally it is possible to use type inference:

option[_].Some(1)
option[_].None()

or:

option.Some(1)
option.None()

as Nemerle can display not only parameter types but the number of type parameters.
A careful reader must have noticed that we used Some (…) and None () instead of the above options in the code of parseAndEval function.
That is the type of “option” isn’t indicated. This is because the type of “option” is implicitly opened (as if we added “using” directive in any file).

Description of the parseAndEval function

The ParseAndEval function consists of a nested loop function and the match operator:

  match (tokens)
  {
    | Token.Number(n) :: tail => loop(n, tail)
    | _                       => None()
  }

The first occurrence of the match operator:

| Token.Number(n) :: tail => loop(n, tail)

recognizes the case where the first token is a number (Token.Number) in the list. This is necessary because the recognition of expressions is recursive (or iterative) operation. At each step of the recursion expression looks as follows:

”computed value"  "operator "  "number"

The second occurrence:

| _                       => None()

describes the case where a list of tokens is empty or doesn’t begin with Token.Number. This is an erroneous situation. It forces us to return None () that means the lack of a result.

The loop subfunction

The loop function parses the expression. It receives two match parameters:

  • firstValue is a value of expression‘s left-hand side (calculated by this point);
  • tokens is the rest of the sequence of tokens that should be parsed.

At each iteration the loop function detects one operator token (Token.Operator) and a number following it (i.e., Token.Number token). Then the loop function makes addition, subtraction, multiplication or division (depending on the value of “name” field in Token.Operator).
There are four occurrences in the match statement:

| Token.Operator("+") :: Token.Number(secondValue) :: tail => loop(firstValue + secondValue, tail)

differing only by the operator (see selection).
Let us examine in more details the model of the pattern applied in these occurrences:

Token.Operator("+") :: Token.Number(secondValue) :: tail

This pattern is interesting because it is a complex nested pattern. It consists of a pattern of list’s constructor. In turn the list consists of two elements and a tail (which contains zero or more elements). Here is a pattern without embedded patterns:

“first element” :: “second element ” :: “end of list”

The end of the list is associated with the tail variable in this pattern. You’ve already seen such using. So we don’t dwell on it. But the first and second elements contain “constructor” nested pattern of variant type. The first element contains the pattern matchig with the operator, for example, “Token.Operator (“+”)”. A value of the “name” field is determined by the string literal. This means that the pattern can be matched only with an instance of Token.Operator type. The “name” field of that instance should contains the specified literal (“+”,“-”,“*” or “/”). The second element must be matched with the “Token.Number (secondValue)” pattern. This means that the second element must be of Token.Number type the “value” field of which may have any value. Moreover the value of the “value” field is associated with the “secondValue”variable. Thus the value of this field of the list’s second element will be available through the “secondValue” variable inside the handler of match operator occurrence (that is after “=>” string).
If you take a closer look it is easy to understand how powerful pattern matching is! Judge for yourself. Using only one pattern we not only recognized that the list consists of at least two elements at this particular time but we were also able to recognize that the first element of the list is the operator of a concrete type and the second element is the number (Token.Number). But that’s not all! In addition we have also tied up local variables with the values of the fields we need for calculations. In this manner we recognized complicated data structure and got access to the necessary values. If this code was written in one of the mainstream programming languages (Java, C# or C++) then we would have to write a lot of non-trivial code instead of a concise and easy to read line.
Occurrence:

| Token.Operator("/") :: Token.Number(0) :: _ => None()

triggered when a user enters an expression containing a division by zero. This is an invalid operation which will crash the program if it’s missed. Therefore you want to process it separately. This occurrence is before the following:

| Token.Operator("/") :: Token.Number(secondValue) :: tail => loop(firstValue / secondValue, tail)

so it does not allow calculation of dividing by zero. The whole expression is incorrect since division by zero is inadmissible. And since the expression is incorrect we can’t return any value. Instead it returns None () that symbolizes the lack of results.
Occurrence:

| [] => Some(firstValue)

triggered if there are not a single unparsed token in the list of tokens. This means that the calculations have been successfully completed and we required returning the results of calculations. Values of calculations are accumulated in firstValue parameter. So all you need to do is to return this value pre-wrapping it in “Some” which symbolizes the success of the calculations.
Finally if neither sample was match then we deal with incorrect sequence of tokens, for example, with two consecutive operators (Token.Operator) or with two numbers (Token.Number). Since all correct situations have been recognized by other patterns we can use “_” to parse all other values. This is what happens in the last occurrence of match operator:

  | _  => None()

The value of “None ()” represents the absence of the correct result again.

Developing the “Calculator” example

As you can see it is easy to create a simple calculator on Nemerle. Let’s run the resulting application and try to test it.
Enter “2 * 3 + 4” expression into console and press Enter. You’ll receive the result:

10

Not bad!
Now we will try to enter “2 * 3 + 4 + error”. Thus we will receive:

2 * 3 + 4 + error
            ^
The number or the operator is expected
Error in expression!

It is quite good also. Our lexer not only has revealed an error but also has specified its place in expression.
Now we will try to enter “2 * 3 + 4 + * 5 – 6 / 7”. As a result our calculator will output the message:

2 * 3 + 4 + * 5 - 6 / 7
Error in expression!

Two operators follow successively after “4” figure. That is incorrect and caused an error. However it is very inconvenient that we should try to discover similar errors. It would be more convenient if our calculator itself would specify a place in the expression where an error resides (as it is done in case of lexical analysis).
Let’s improve our calculator by adding more exact diagnostics of errors.

Improving diagnostic messages of the calculator

To improve diagnostic messages of the calculator we have to place additional information into lexemes. We’ll place the information about on what line‘s position they correspond. We’ll add new StartPos field in the Token type because this field is needed for any type of token.
Here’s how Token will look like with this field:

variant Token
{
  | Number   { value : int; }
  | Operator { name  : string; }
  | Error
  
  public StartPos : int;
}

As you can see this is very similar to declaring fields of occurrences of the variant type. The only difference is that the field is declared inside the curly braces belonging to the description of the variant type.
In addition the field is marked by public modifier (see next section).
The general syntax for declaring the field is as follows:

user_attributes? access_modifier? ”mutable”? Identifier “:” type (“=”initializer expression)?

NOTE
Here and below I will give syntax in the EBNF notation.
We’ll give literals used in expressions directly in quotation marks (“”).
“*” – means that the proceeding expression before it (grammar rule or simply a rule) can be repeated zero or more times.
“+” – means that the proceeding expression before it can be repeated one or more times.
“?” – means that the proceeding expression before it can be repeated zero or one time.
The parentheses form a sub-rule. EBNF-operators can be applied to such sub-rule.

The above construction says that the field may consist of:

  • optional user attributes;
  • optional access modifier;
  • optional “mutable” modifier;
  • compulsory identifier (which specifies field’s name);
  • compulsory colon;
  • compulsory type description that follows a colon;
  • optional initialization expression.

If access modifier is not specified in the description of a type member then a member has private access modifier by default (see the following section about access modifiers). This rule is violated for the fields of variant types and fields of enumerations (we’ll discuss them later). Public access is specified for them by default (if the access modifier is not specified then the access modifier is “public”).
Modifier “mutable” indicates that the field must be mutable. This will allow changing the value of the field at any place where the field can be accessed (which is determined by access modifiers).
An optional initialization expression specifies the initial value of the field. The compiler put this expression in the beginning of constructor’s body making object initialization (see “Constructors” section coming shortly).
Well talk about the user attributes at the end of this paper.

Access modifiers

“public” is the access modifier. Any member of the type or type can have different access modifiers. Access modifiers allow you to prevent unauthorized access to type’s internals or a type itself. With their help a programmer can reduce the number of places from which members of a type will be available. It makes a life of programmers who will maintain and develop code easier in the future.
Nemerle supports the following access modifiers:

  • public – a member of the type is available anywhere;
  • internal – a member of the type is available anywhere in the current assembly (library or executable) but isn’t available in other assemblies;
  • protected – a member of the type is available within the type and the inheritors of this type (we’ll discuss the inheritors towards the end of this section);
  • protected internal – a member of the type is available anywhere in the current assembly and in the inheritors of this type located in other assemblies;
  • private – a member of the type is visible only within the current type.

Thus the “public” modifier of the StartPos field means that the field is available anywhere.

The constructors

Since the field is immutable (not marked as mutable) it can not be changed and its initial value may be set only when creating an object (an object of type Token in our case). We use a special method (called “constructor”) to initialize objects (including the variant types). The general syntax for the constructor is as follows:

user_attributes? access_modifier? this “(“ parameters “)” block

Here “this” is the keyword.
The “parameters” is a list of parameters (which is the same for any other functions). The block contains zero or more expressions separated by semicolons and placed in curly braces.
Constructors can be any number (from zero to infinity). If the type doesn’t have a constructor then compiler generates a constructor with no parameters with “public” access modifier.
You may set a value to immutable field only from a constructor or from the initialization expression but it is passed to the constructor too. Therefore we need to add a constructor to Token variant type in order to set the StartPos field.
Here’s how it looks:

variant Token
{
  | Number   { value : int; }
  | Operator { name  : string; }
  | Error
  
  public this(startPos : int)
  {
    StartPos = startPos;
  }
  
  public StartPos : int;
}

Since the Token type is very small I described it completely and highlighted the constructor. As you can see the constructor has the “startPos” parameter. We pass the initial value for the StartPos field through the “startPos” parameter. The constructor copies the value in the field.
I ask once again to draw attention to the fact that the assignment of a value for immutable field is possible only in the constructor! This feature will save us from changing the value of the field accidentally in the future. Even if the code of our application grows to enormous size we’ll still believe that a change of field’s value is possible only in one place. That is why it makes sense to prefer immutable fields in compare with mutable ones. It is possible that not all fields of the object will be initialized in the constructor. What happens with uninitialized values? They will receive the values taken by default for its type. For example, it will be zero for a number.

The inheritance of constructors of variant types by their occurrences

Nemerle automatically generates constructors for all variant’s occurrences in variant type. There are parameters for all fields of variant’s occurrences (in corresponding order) in such constructors. Let’s assume the variant type has its own constructor. Then parameters of the variant type’s constructor are automatically added to the list of parameters of the constructors of variant’s occurrences of variant type. If a variant type has more than one constructor then Nemerle creates a corresponding number of constructors for each variant’s occurrence.
For example, if we have X variant type:

variant X
{
  | A { z : int; y : string; }
  | B
}

then the following constructors will be created for variant’s occurrences by default:
for X.A type:

public this(z : int, y : string)
{
  this.z = z;
  this.y = y
}

for X.B type:

public this()
{
}

If you add a constructor in X type:

variant X
{
  | A { z : int; y : string; }
  | B

  public this(w : double)
  {
  }
}

then Nemerle will create a constructor for X.A type:

public this(w : double, z : int, y : string)
{
  base(w);
  this.z = z;
  this.y = y
}

for X.B type:

public this(w : double)
{
  base(w);
}

NOTE
We should clarify what is meant by “this” and “base” in the body of constructors in these examples. The structure “this.z” means an access to a field of current type. Such the structure is required as parameter’s names of the constructor match the names of the fields. We use such structure to distinguish the field from the parameters. In principle nothing prevents the use of this syntax always (even if the names don’t cross). But this is a matter of taste.
“base (w)” is a call to the constructor of the base type (an occurrence of the variant type is a subtype of the variant type in which it is declared). If the type has a base type then we need to call one of the constructors of the base type.

If there is more than one constructor in a variant type then Nemerle forms the corresponding number of constructors for each of variant’s occurrence (generated by the scheme described above).

NOTE
The presence of several members of the types or functions with the same name (and the constructors always have the same name) is called overloading. These members differ in the numbers, types and locations of parameters. Nemerle supports overloading for almost anything except for local functions and variables. Each declaration of local function or a variable overlaps names coinciding with the name of local function or a variable.

Thus declaring a constructor with one startPos parameter we added the same parameter to all occurrences of Token variant type. It means the old version of the program will not compile. So the old version should be changed.
I will give a truncated code and will highlight amendments (I don’t show the bodies of functions in which there was any change):

def lexer(text : string) : list[Token]
{
  mutable index = 0;
  
  def peek() : char ...
  def read() : char ...
  def isDigit(ch)   ...

  def loop(res : list[Token]) : list[Token]
  {
    def number(ch : char, accumulator : int = 0) : int ...

    def error(startPos)
    {
      WriteLine(string(' ', index - 1) + "^");
      WriteLine("expected a number or an operator ");
      (Token.Error(startPos) :: res).Reverse()
    }
    
    def startPos = index;
    def ch       = read();

    match (ch)
    {
      | ' ' | '\t'            => loop(res) // spaces are ignored
      | '+' | '-' | '*' | '/' => loop(Token.Operator(startPos, 
                                                     ch.ToString()) :: res)
      | '\0'                  => res.Reverse()
      | _ when isDigit(ch)    => loop(Token.Number(startPos, number(ch)) :: res)
      | _                     => error(startPos)
    }
  }
  
  loop([])
}

def parseAndEval(tokens : list[Token]) : option[int]
{
  def loop(firstValue : int, tokens : list[Token]) : option[int]
  {
    match (tokens)
    {
      | Token.Operator("+") :: Token.Number(secondValue) :: tail => loop(firstValue + secondValue, tail)
      | Token.Operator("-") :: Token.Number(secondValue) :: tail => loop(firstValue - secondValue, tail)
      | Token.Operator("*") :: Token.Number(secondValue) :: tail => loop(firstValue * secondValue, tail)
      | Token.Operator("/") :: (Token.Number(0) as zeroNum) :: _ =>
        WriteLine(string(' ', zeroNum.StartPos) + "^");
        WriteLine("Division by zero");
        None()
      
      | Token.Operator("/") :: Token.Number(secondValue) :: tail => loop(firstValue / secondValue, tail)

      | [] => Some(firstValue)
      
      | _ :: Token.Error :: _  | Token.Error :: _  => None() // error message is already displayed!
      
      | unexpectedToken :: _  => 
        WriteLine(string(' ', unexpectedToken.StartPos) + "^");
        WriteLine("Expected '+', '-', '*' or '/' and a number follows it");
        None()
    }
  }
  
  match (tokens)
  {
    | Token.Number(n) :: tail => loop(n, tail)
    | _                       => None()
  }
}

def readInput()
{
  def inputString = ReadLine();
  
  unless (inputString == "")
  {
    def lexemes = lexer(inputString);
    
    match (parseAndEval(lexemes))
    {
      | Some(value) => WriteLine(value);
      | None        => ();
    }
      
    readInput();
  }
}

As you can see we added the current position to the beginning of the argument list of occurrences of Token variant type. Now for each token we keep its text’s position.
There are more substantial changes in the loop function embedded in parseAndEval function. But they are very simple also. Now we not only recognize error situations when parsing expressions but also output the meaningful error messages indicating the specific location in which they happened. Information about the error’s location is just taken from the new “StartPos” field. We no longer need the “Error in expression!” old message since now the error messages are directly output when parsing the expression. Therefore we replace it with “()” void-literal that means “do nothing”.

Once again the pattern matching

There are two new features for you related to pattern matching in this code.
First feature is an opportunity to specify two patterns for one occurrence of the match operator:

| _ :: Token.Error :: _  | Token.Error :: _  => None() // error message is already displayed!

In general you can specify one or more patterns for a single entry of match operator. If there is more than one pattern then each subsequent pattern must begin with a vertical bar (“|”). The “=>” operator should follow the last pattern. Thus there are two patterns in the above excerpt:

_ :: Token.Error :: _

and

Token.Error :: _

The first pattern is compared in the case when the second item of the list contains the Error token (symbolizing an error).
The second pattern is compared in the case when the Error token found in the first item of the list.
Since the error message of lexical parsing has already been displayed on the console (directly under the lexical parsing) then all you need to do is to return “None ()” function symbolizing the lack of a correct result.
The second new feature is the specification of a name of part of the pattern:

| Token.Operator("/") :: (Token.Number(0) as zeroNum) :: _ =>

Virtually any part of the the pattern can be associated with the name. More precisely the name is associated with the expression mapped to a part of the pattern for which the name is given. During binding a local variable is created that can be accessed from the handler of occurrence of match operator. In our case the “zeroNum” name was associated with the second token of the list. This token is needed in order to get the value of its position (the value of its StartPos field). It is necessary to form an error message indicating the exact place of the line in which the error is detected.

HINT
There are parentheses around “Token.Number (0) as zeroNum”. They are required in order the compiler can understand what is the “as” operator related with. The parentheses are not part of the pattern. They help to adjust the priorities of operators.

Testing the result

If you run a new version of the calculator and try to enter expressions that contain errors then our calculator will give intelligible error message and indicate the place where it occurred. Examples:

1/0
  ^
Divide by zero

1/w
  ^
expected a number or an operator
4 / 2 + 5 7
          ^
Expected '+', '-', '* ' or '/ ' and a number follows it

If you enter an expression that contains no errors then the expression will give the correct result:

4 / 2 + 5
7

Priorities of expressions

Launch our calculator, enter the following expression and press Enter:

4 + 3 * 2

I will not be surprised if the answer might surprise you:

14

Since elementary school we know that multiplication and division have precedence over addition and subtraction. Consequently the value of this expression would have to be equal to ten:

4 + (3 * 2) = 4 + 6 = 10

Why is our calculator wrong? The answer is obvious. It doesn’t take into account priorities of the operators and is considering an expression as follows (wrong from our point of view):

(4 + 3) * 2 = 7 * 2 = 14

Let’s rewrite the calculator so that it took into account the priorities.
It seems quite easily to extend the test to ensure the priority of multiplication. But it’s very naive approach that does not lead to good results.
We’ll describe the parser as a set of recursive functions because the grammar of expressions is recursive in nature.
The grammar of expressions can be regarded as a set of additions and subtractions:

A + B - C + ...

where A, B, C, etc. are a multiplication or division. Such operations can occur one or more times. If the expression doesn’t have an addition then it will have the only “A” operation of multiplication. If the expression contains an addition operation then it will have “A + B”, etc. The fact can be described in terms of the grammar as follows:

additionOrSubtraction = multiplicationOrDivision (('+' | '-') multiplicationOrDivision) *

This rule can be described in words like that. Each addition can be considered as a multiplication operation follows the “+” or “-” sign and
another multiplication operation. The asterisk means the expression in parentheses can be repeated zero or more times.
The operation of multiplication/division can be considered as a single number followed by the operation of multiplication or division zero or more times followed by a different number. Grammar rule for the multiplication / division will look like this:

multiplicationOrDivision = simpleExpression (('*' | '/') simpleExpression) *

The “simpleExpression” is the most basic expression in our grammar. While it will correspond the Token.Number in the case of our calculator. In future we will extend the concept of “simpleExpression” by introducing the support for unary minus operation or expressions in parentheses.
With such a view of the grammar it will be easy to write functions that perform the parsing of expressions in accordance with the grammar.
The simplest and most obvious to the human way of constructing a parser is to write a recursive top-down parser. In addition this method provides most understandable error messages.
In a nutshell the essence of this method is as follows. For each rule you must write a function that produces the parsing in accordance with the rule. If one rule refers to another then simply call the appropriate function.
Here is the implementation of the grammar by recursive descent parser:

def parseAndEval(tokens : list[Token]) : option[int]
{
  def parseSimpleExpr(tokens : list[Token]) : option[int] * list[Token]
  {
    | Token.Number(value) :: tail => (Some(value), tail)
    | _                           => (None(),      tokens)
  }

  def parseMulOrDiv(tokens : list[Token]) : option[int] * list[Token]
  {
    def loop(firstValueOpt, tokens)
    {
      | (Some(first), Token.Operator("*") :: tokens) =>
        match (parseSimpleExpr(tokens))
        {
          | (Some(second), tokens) => loop(Some(first * second), tokens)
          | (None, _)              => (None(), tokens)
        }

      | (Some(first), Token.Operator("/") :: tokens) =>
        match (parseSimpleExpr(tokens))
        {
          | (Some(second), tokens) => loop(Some(first / second), tokens)
          | (None, _)              => (None(), tokens)
        }
        
      | (Some, tokens) => (firstValueOpt, tokens)
      | (None, tokens) => (None(), tokens)
    }
    
    loop(parseSimpleExpr(tokens))
  }

  def parseSumOrSub(tokens : list[Token]) : option[int] * list[Token]
  {
    def loop(firstValueOpt, tokens)
    {
      | (Some(first), Token.Operator("+") :: tokens) =>
        match (parseMulOrDiv(tokens))
        {
          | (Some(second), tokens) => loop(Some(first + second), tokens)
          | (None, _)              => (None(), tokens)
        }

      | (Some(first), Token.Operator("-") :: tokens) =>
        match (parseMulOrDiv(tokens))
        {
          | (Some(second), tokens) => loop(Some(first - second), tokens)
          | (None, _)              => (None(), tokens)
        }
        
      | (Some, tokens) => (firstValueOpt, tokens)
      | (None, tokens) => (None(), tokens)
    }
    
    loop(parseMulOrDiv(tokens))
  }
  
  parseSumOrSub(tokens)[0]
}

There is only one place that you aren’t yet familiar in this code:

parseSumOrSub(tokens)[0]

I mean the operation 0 applied to the return value of parseSumOrSub function. This is an indexing operation. In general it has already been used in this example to retrieve the character in the string. This operation is also applicable to a tuple. Indices of tuples can only be integer constants (or rather literal). An index value of “0” allows you to get the first element of the tuple. An index value of “1” allows you to get the second element of the tuple and so on. The index value is checked at compile time. So it’s impossible to pass a value greater or equal than the number of elements in the tuple and less than zero.
Thus the above expression returns the first element of the tuple which in turn returned by parseSumOrSub function. This is a calculated value of the expression wrapped in “Some” or “None” in case of an error. The second element of the tuple contains the tail of the list with tokens. It doesn’t matter what contents the list has after the completion of the expression parsing.

NOTE
Indexing tuples only by literal done with a purpose. This allows the compiler to verify the correctness of the index and calculate the type of the value that we get as a result of tuple’s indexing. If you need to retrieve values from a tuple based on the dynamically derived index then we may create a simple function for this purpose. However this need is likely to indicate that your choise of using the tuple is wrong.

Although this code doesn’t contain any other innovations let us analyze it in more detail.
There are three sub-functions inside the parseAndEval function. They correspond to the three rules I have described above:

  • parseSimpleExpr sub-function corresponds to the “SimpleExpression” rule. It recognizes the number.
  • parseMulOrDiv sub-function corresponds to the “multiplicationOrDivision” rule. It detects zero or more consecutive multiplications. If there are no multiplications then parseAndEval function returns a value from the first call of parseSimpleExpr sub-function.
  • parseSumOrSub sub-function corresponds to the “additionOrSubtraction” rule. It detects zero or more consecutive additions. If there are no additions then parseAndEval function returns a value from the first call of parseMulOrDiv sub-function.

All three functions are built on a single template. They take a list of tokens as an input and return a tuple. The tuple consists of the values of the evaluated expression and the remains of the list of tokens that weren’t parsed by this call. The evaluated expression is wrapped in “option [T]” type that allows you to return None() function in case of calculation’s error.
If successful the function returns the result wrapped in Some () and part of the list passed to this function as a tokens parameter with the exception of recognized tokens. Thus the list is getting shorter and shorter with each parsed rule. As a result if the entire expression is successfully parsed then the parseAndEval function returns the result of calculation wrapped in Some () and an empty list of tokens. In case of failure the parseAndEval function returns None () and the initial list (obtained as a parameter).
The ParseSimpleExpr function is quite simple:

  def parseSimpleExpr(tokens : list[Token]) : option[int] * list[Token]
  {
    | Token.Number(value) :: tail => (Some(value), tail)
    | _                           => (None(),      tokens)
  }

The ParseSimpleExpr function checks whether the first token is a number in the list (that is whether it has the Token.Number type).
If so it returns the number wrapped in Some() and the rest of the list which is bound with a “tail” variable by the pattern.
If the first token is not a number (i. e. the list is empty) the ParseSimpleExpr function returns None () and an initial list of tokens.
Two other functions are more complex but organized in a similar manner. They consist of a call to another rule (parseMulOrDiv for parseSumOrSub and parseSimpleExpr for parseSumOrSub). The results of parsing the rule are passed to the loop function.
Note that the return value of parseXxx functions is a tuple and the loop function takes two parameters. If the elements of a tuple correspond by types and quantities to the function’s parameters then Nemerle allows you to send a tuple directly to the function. Nemerle transforms the tuple to the list of arguments and passes them to function. For example, the code:

loop(parseMulOrDiv(tokens))

we would describe as follows in the absence of this feature:

def (valueOpt, tokens) = parseMulOrDiv(tokens);
loop(valueOpt, tokens)

But thanks to the support of the transformation of tuples into the list of parameters we avoid such conversion.
The loop function (below is its version in parseSumOrSub function) organizes the cycle described in the rules (see above).

def loop(firstValueOpt, tokens)
{
  | (Some(first), Token.Operator("+") :: tokens) =>
    match (parseMulOrDiv(tokens))
    {
      | (Some(second), tokens) => loop(Some(first + second), tokens)
      | (None, _)              => (None(), tokens)
    }

  | (Some(first), Token.Operator("-") :: tokens) =>
    match (parseMulOrDiv(tokens))
    {
      | (Some(second), tokens) => loop(Some(first - second), tokens)
      | (None, _)              => (None(), tokens)
    }
    
  | (Some, tokens) => (firstValueOpt, tokens)
  | (None, tokens) => (None(), tokens)
}

The template:

(Some(first), Token.Operator("+") :: tokens)

is mached if the value of firstValueOpt parameter contains the correct value (i.e. it’s the occurrence of Some) and the first item in the list is the addition operator. In this case the embedded rule is called (I mean parseMulOrDiv) the result of which is also mached with the pattern. If parsing of nested rule was successful then we’ll execute the following occurrence of the match:

| (Some(second), tokens) => loop(Some(first + second), tokens)

We added the first value to the second one and packed in Some again. After that we passed it along with the rest of the list of tokens to the recursive loop function.
If parseMulOrDiv will fail we’ll return None () as the first element of the tuple which leads to a match with the following occurrence:

| (None, _)              => (None(), tokens)

Of course here you could add a code to output an error message but the code was dropped for simplicity. In addition the code doesn’t handle the “division by zero” error.

Refactoring

The code of parsing expressions is small, but picky reader, I think, has already drawn attention to the fact that it contains a lot of similar code. Let’s change the code in such way that, on the one hand, it has not changed behavior, but on the other hand, has no unnecessary duplication.

HINT
Sooner or later we may want to change our code to support new functionality or to correct errors. Duplicate code can prevent it because we have to change more than one area of code. It is very easy to make mistakes, not changing all the similar pieces of code, or modifying a similar but irrelevant code. Over time the amount of duplicate code in your project will grow and sooner or later exceed a certain critical mass. After that it will be great difficult to modify code in this project. To prevent this from happening you need to consistently brush up code in the project. A process of improvement project code without changing its behavior is called refactoring. Pay attention to the fact that the elimination of code duplication and “refactoring” are not synonymous. There are many types of refactoring and a variety of reasons to do refactoring.

Let us carry out the refactoring of our calculator project. The purpose of refactoring will be the elimination of code duplication. First you need to identify pieces of code that are duplicated. Let’s start with the nested loop function (from the parseSumOrSub function):

def loop(firstValueOpt, tokens)
{
  | (Some(first), Token.Operator("+") :: tokens) =>
    match (parseMulOrDiv(tokens))
    {
      | (Some(second), tokens) => loop(Some(first + second), tokens)
      | (None, _)              => (None(), tokens)
    }

  | (Some(first), Token.Operator("-") :: tokens) =>
    match (parseMulOrDiv(tokens))
    {
      | (Some(second), tokens) => loop(Some(first - second), tokens)
      | (None, _)              => (None(), tokens)
    }
    
  | (Some, tokens) => (firstValueOpt, tokens)
  | (None, tokens) => (None(), tokens)
}

The selected fragment is duplicated with the fragment below. In addition, it’s duplicated with the same code from parseMulOrDiv function except a call (parseMulOrDiv), the string literal in the pattern (“+”/ “-” / “*” / “/”) and the actual operations performed on “first” and “second”.
To eliminate this duplication we can do one of two things.

  1. Move duplicate code into separate function.
  2. Take advantage of flexible pattern matching creating a polymorphic processor with two patterns.

Eliminating duplication by moving code to a separate function

Magnificent support of functional programming style in Nemerle allows you to select almost any code snippets into specific function.
Separating the code into a separate function, you need to make sure that all data available before the refactoring transferred to a new function. This may be done by explicit transfer of parameters or by implicit transfer through closures. It’s very important to know a position of the new function in the second case. Indeed, the data may not be available in a place where the new function is declared.
In addition, if (as in our case) there are a lot of extracted fragments and they have some differences (in this case they are operators and string literals in the pattern) then differ snippets of the code also need to be passed as parameters. Many software programmers are well aware how to pass parameters as a data but don’t quite understand how to pass parameters as snippets of code. It’s not difficult if your language (as, for example, Nemerle) may pass functions as parameters.
Here’s how the parseSumOrSub function might look after moving this fragment in a separate parseOperation function.

def parseOperation(first, subRuleParser, tokens, operator, loop)
{
  match (subRuleParser(tokens))
  {
    | (Some(second), tokens) => loop(Some(operator(first, second)), tokens)
    | (None, _)              => (None(), tokens)
  }
}
def parseSumOrSub(tokens : list[Token]) : option[int] * list[Token]
{
  def loop(firstValueOpt, tokens)
  {
    | (Some(first), Token.Operator("+") :: tokens) =>
      def plus(a, b) { a + b }
      parseOperation(first, parseMulOrDiv, tokens, plus, loop)

    | (Some(first), Token.Operator("-") :: tokens) =>
      def minus(a, b) { a + b }
      parseOperation(first, parseMulOrDiv, tokens, minus, loop)
      
    | (Some, tokens) => (firstValueOpt, tokens)
    | (None, tokens) => (None(), tokens)
  }
  
  loop(parseMulOrDiv(tokens))
}

In this code snippet we highlighted the names of functions that are passed and used in parseOperation functions.
The simplest and most illustrative example is the “operator” parameter of parseOperation function. It allows you to abstract the particular operator.

NOTE
Remember and learn how to apply this approach. Of course, this is not a panacea but the approach provides incredible flexibility.

We use “plus” and “minus” functions in only one place in the program. Their bodies are very short expressions. It turns out that the description of the function takes up more space than its body. In fact, these functions are only needed to pass a short expression as a parameter to another function. This is a very common case when using functional programming. Therefore, there are special means in Nemerle to solve this problem. In particular, Nemerle supports lambdas and partial application.

Lambdas

Lambdas are anonymous functions which are declared on a place. Nemerle supports two variants of the lambda syntax:

“Fun” “(“ arguments “)” (":" type_of_return_value)? block

and

“Fun” “(“ arguments “)” (":" type_of_return_value)? block
“(“ arguments “)” “=>” expression | argument “=>”expression

NOTE
The “|” sign means “or” alternative in the grammar. This means that the syntax of lambda can start with either a single argument or any number of arguments in the brackets.

The first syntax is the original Nemerle syntax. It’s very similar to the announcement of the local function where we use “fun” keyword instead of the name. If you replace the “plus” function by “fun” keyword:

def plus(a, b) { a + b }
parseOperation(first, parseMulOrDiv, tokens, plus, loop)

you can write:

parseOperation(first, parseMulOrDiv, tokens, fun(a, b) { a + b }, loop)

In fact, the compiler converts lambdas to local functions. So that eventually the code will be substituted something like:

parseOperation(first, parseMulOrDiv, tokens, 
  { def unique_name(a, b) { a + b }; unique_name }, loop)

So you can regard lambdas as syntactic sugar to local functions.
Lambdas may do all that local functions do except the recursive call themselves (because a lambda just don’t have a name).
The second syntax is actually a macro that is declared in the standard library of macros. It emulates the syntax of lambda expressions in C # 3.0. If you rewrite the above piece of code using this macro then you will have the following code:

parseOperation(first, parseMulOrDiv, tokens, (a, b) => a + b, loop)

If lambdas have no parameters then you specify empty parentheses:

() => ...

If lambda has exactly one parameter then the brackets can be omitted. For example, “increment” function looks like (which returns the value of its parameter, increased by one):

x => x + 1

When you using both types of syntax you may specify (or omit) the type of parameters:

fun(a : int, b : int) { a + b }
(a : int, b : int) => a + b

However, it’s possible to specify the return type of the lambda only in the case of the first type of syntax:

fun(a : int, b : int) : int { a + b }

In Nemerle this isn’t a particular problem because, firstly, type inference helps out almost always, and, secondly, the type can be specified directly in expression:

(a, b) => (a + b) : int

Partial application

Lambdas are very flexible and concise mechanism but they are cumbersome in simple cases such as the conversion of an operator into function.
In cases where you need to convert a single statement or a single function call into a function you can use use partial application in Nemerle. The essence of this is very simple. If we have a binary operator (for example, the “+” operator) then it can be converted into a function substituting the operands by “_” wildcard character. Thus, to obtain an analog of the above lambdas we just need to write:

_ + _

Here’s how a “parseOperation” call will look using partial application:

parseOperation(first, parseMulOrDiv, tokens, _ + _, loop)

To obtain a function similar to the "increment” function (x => x + 1) you can specify a unity as one of the operands specific value:

_ + 1

You can use variables trapped in the external context as values. However, the complex expressions can not be used as values. Use a lambda if you need the complex expressions.
Later I will show how partial application may be applied to functions and methods.

Eliminating code duplication by using multiple patterns

Another way to eliminate code duplication is to use multiple (polymorphic) patterns. So, instead of making the code as a separate function one would simply describe the occurrence of the match operator containing two patterns and the special “with” operator:

| (Some(first), Token.Operator("+") :: tokens) with operator = _ + _
| (Some(first), Token.Operator("-") :: tokens) with operator = _ - _ =>
  match (parseMulOrDiv(tokens))
  {
    | (Some(second), tokens) => loop(Some(operator(first, second)), tokens)
    | (None, tokens)         => (None(), tokens)
  }

In the first case (when the duplicate code was moved into the function) an abstraction from a specific operator occurred due to the transfer of function-operator as a parameter of local function. In this case we use the “operator” variable which is introduced by the “with” keyword.
Note that we give its value of variable for each pattern. The variable takes the value “_ + _” for a pattern recognizing the “+” operator. And the variable takes the value “_ – _” for a pattern recognizing the “-” operator.
It’s impossible to specify the type of a variable introduced through “with”. The compiler itself will infer their types based on the type of initializers (follow directly “=”).
If the types of expressions are different then the compiler will try to deduce the most common type (you’ll learn later what it means). If the compiler will fail to deduce the most common type then it will generate an error message.
In this example the variable’s type will be inferred as:

int * int -> int

that is the type of the variable will be the functional one.
This allows you to make its call which takes place three lines below.

Continuation of refactoring

We eliminated the duplication of a small fragment of code inside “parseSumOrSub” function but we still have the duplication of a large fragment of code. I’m talking about nested “loop” functions in the “parseSumOrSub” and “parseMulOrDiv” functions.
We can take out the code of the loop function from parseSumOrSub and parseMulOrDiv functions (preliminarily abstracted the code) as we did in the case of removal of the part of the functionality into parseOperation function.
However, it’s not easy because these functions differ in the patterns:

| (Some(first), Token.Operator("+") :: tokens) with operator = _ + _
| (Some(first), Token.Operator("-") :: tokens) with operator = _ - _ =>

and

| (Some(first), Token.Operator("*") :: tokens) with operator = _ * _
| (Some(first), Token.Operator("/") :: tokens) with operator = _ / _ =>

and Nemerle patterns are not first-class entities, unfortunately (can’t be placed in a variable or passed as a parameter).
To resolve this problem it is necessary to move the code of checking the type of operator (i.e. “+”, “-", “*”, "/ " string literals) into defenders (guards) of the patterns but the values themselves passed as parameters. Here’s the solution:

def loop(input : option[int] * list[Token], parseSubExpr, op1, f1, op2, f2)
{
  match (input)
  {
    | (Some(first), Token.Operator(op) :: tokens) when op == op1 with f = f1
    | (Some(first), Token.Operator(op) :: tokens) when op == op2 with f = f2 =>
      match (parseSubExpr(tokens))
      {
        | (Some(second), tokens) => 
          loop((Some(f(first, second)), tokens), parseSubExpr, op1, f1, op2, f2)
          
        | (None, _)              => (None(), tokens)
      }

    | (Some as firstOpt, tokens) => (firstOpt, tokens)
    | (None, tokens)             => (None(), tokens)
  }
}

def parseSimpleExpr(tokens : list[Token]) : option[int] * list[Token]
{
  | Token.Number(value) :: tail => (Some(value), tail)
  | _                           => (None(),      tokens)
}
def parseMulOrDiv(tokens : list[Token]) : option[int] * list[Token]
{
  loop(parseSimpleExpr(tokens), parseSimpleExpr, "*", _ * _, "/", _ / _)
}
def parseSumOrSub(tokens : list[Token]) : option[int] * list[Token]
{
  loop(parseMulOrDiv(tokens), parseMulOrDiv, "+", _ + _, "-", _ - _)
}

We transmit strings describing the operators in op1, op2 parameters and functions corresponding to these operators in f1 and f2 parameters. Their names are shortened for brevity.
Thus, we have almost completely gotten rid of code duplication in parseAndEval function and cut the code essentially. The only remaining duplication in the code is duplication of the pattern:

(Some(first), Token.Operator(op) :: tokens)

It can be also eliminated leaving one pattern and move checking inside the handler. But there is no practical meaning to do that, as patterns (which are close to each other) are very small and declarative. All this will ensure that such duplication doesn’t become a problem in the future. Elimination of this duplication will inevitably lead to a complication of handler code that is no better of pattern duplication.
By now, everyone in this implementation should be familiar to you, except, perhaps, the first “input” parameter of the new (universal) loop function. It has the “tuple” type. You are already familiar with tuples. The function parameter is a tuple in that case. In general, it’s nothing new in this. Just pay attention to it.
This parameter is made as a tuple rather than two separate parameters (as before). So we could pass the return value of parseSubExpr function through it. We pass parseSubExpr function as a parameter also. This allows us to get rid of extra variables and make the code a bit more succinct.

Once again the error handling

The new version of parseAndEval function (unlike the old version) evaluates expressions according to the priorities. However, parseAndEval function doesn’t have error messages again which is a degradation of functionality. Let’s fix this defect. We’ll have to change the parseSimpleExpr and parseMulOrDiv functions:

def parseSimpleExpr(tokens : list[Token]) : option[int] * list[Token]
{
  | Token.Number(value) :: tail => (Some(value), tail)
  | Token.Error :: _            => (None(), tokens)
  | unexpectedToken :: _        =>
    WriteLine(string(' ', unexpectedToken.StartPos) + "^");
    WriteLine("Expected number");
    (None(), tokens)
  
  | _                           =>
    WriteLine("Expected number in the end of expression");
    (None(), tokens)
  
}
def parseMulOrDiv(tokens : list[Token]) : option[int] * list[Token]
{
  def div(a, b)
  {
    if (b == 0)
    {
      WriteLine("division by zero");
      0
    }
    else a / b
  }

  loop(parseSimpleExpr(tokens), parseSimpleExpr, "*", _ * _, "/", div)
}

You may see that references to functions are passed to the loop function as parameters. That fact allows creating advanced function that is responsible for the division. This function checks the value of the divisor. If the divisor is zero then we output an error message and returns “0”.
Now our calculator correctly responds to errors. However, the error message of division by zero doesn’t display the location of error in the expression. This is due to the fact that the div function doesn’t get enough information. Although the div function can’t tell the external function that the calculation failed.
Let’s just change our code to fix this problem. We need to create two versions of the loop function, one of which will be implemented by the other. One version will work the same way as the old and the second will be different by type of f1 and f2 functions. Original type of these functions is “int * int → int”. These parameters will be of the “int * int * int → option [int]” type in the new version of the loop function. The new (final) parameter will transfer the position of the current token.
The wrap of the return value into the option[] will inform the caller that the calculation has been a failure or return the result (wrapped in Some) if the calculation is successful.
We have to rename the second “loop” function since the local functions override names already defined in Nemerle. Let us just add “2” to its name. Here’s how new version of parseAndEval function will look like:

def parseAndEval(tokens : list[Token]) : option[int]
{
  def loop2(input : option[int] * list[Token], parseSubExpr, op1, f1, op2, f2)
  {
    match (input)
    {
      | (Some(first), Token.Operator(op) as tok :: tokens) when op == op1 with f = f1
      | (Some(first), Token.Operator(op) as tok :: tokens) when op == op2 with f = f2 =>
        match (parseSubExpr(tokens))
        {
          | (Some(second), tokens) => 
            loop2((f(first, second, tok.StartPos), tokens), parseSubExpr, op1, f1, op2, f2)
            
          | (None, _)              => (None(), tokens)
        }

      | (Some as firstOpt, tokens) => (firstOpt, tokens)
      | (None, tokens)             => (None(), tokens)
    }
  }
  def loop(input : option[int] * list[Token], parseSubExpr, op1, f1, op2, f2)
  {
    loop2(input, parseSubExpr, op1, (a, b, _) => Some(f1(a, b)), 
                               op2, (a, b, _) => Some(f2(a, b)))
  }
  def parseSimpleExpr(tokens : list[Token]) : option[int] * list[Token]
  {
    | Token.Number(value) :: tail => (Some(value), tail)
    | Token.Error :: _            => (None(), tokens)
    | unexpectedToken :: _        =>
      WriteLine(string(' ', unexpectedToken.StartPos) + "^");
      WriteLine("Expected number ");
      (None(), tokens)
    
    | _                           =>
      WriteLine("Expected number in the end of expression ");
      (None(), tokens)
    
  }
  def parseMulOrDiv(tokens : list[Token]) : option[int] * list[Token]
  {
    def div(a, b, currentTokenPos)
    {
      if (b == 0)
      {
        WriteLine(string(' ', currentTokenPos) + "^");
        WriteLine("division by zero");
        None()
      }
      else Some(a / b)
    }

    loop2(parseSimpleExpr(tokens), parseSimpleExpr, "*", (a, b, _) => Some(a * b), "/", div)
  }
  def parseSumOrSub(tokens : list[Token]) : option[int] * list[Token]
  {
    loop(parseMulOrDiv(tokens), parseMulOrDiv, "+", _ + _, "-", _ - _)
  }
  
  parseSumOrSub(tokens)[0]
}

Let’s describe the types of loop2, loop and div functions explicitly to better understand what is happening.

def loop2(
  input          : option[int] * list[Token],
  parseSubExpr   : list[Token] -> option[int] * list[Token],
  op1            : string,
  f1             : int * int * int -> option[int],
  op2            : string,
  f2             : int * int * int -> option[int]
)
def loop(
  input          : option[int] * list[Token],
  parseSubExpr   : list[Token] -> option[int] * list[Token],
  op1            : string,
  f1             : int * int -> int,
  op2            : string,
  f2             : int * int -> int
)
def div(a : int, b : int, currentTokenPos : int) : option[int]

I have highlighted the differences between the loop2 and loop functions so that you’ll see clearly a difference between them. The loop function is implemented through loop2 function. We use simple lambda-adapters for the f1 and f2 parameters:

(a, b, _) => Some(f1(a, b))

transforming function of the type “int * int → int” into the function of the type “int * int * int → option [int]”. The third parameter doesn’t use lambdas. That’s why we specify ‘_’ instead of the last parameter to avoid generation of diagnostic messages about this. It can be used even if we have several unused parameters. When the compiler meets “_” as the name of the parameter it will generate for each parameter a unique name like “_N_wilcard_1234” where will be a unique number instead of 1234. Such options will not be visible in the IDE, so refer to them would be impossible. If the parameter is not used in a function but I’d like to see its value in the debugger then you can name such a parameter that starts with an underscore:

(a, b, _currentTokenPos) => Some(f1(a, b))

Latest “strokes”

To date our calculator has become almost a full application. To use the calculator in real tasks it should support the expressions in brackets and works with negative numbers (in other words, it would support the unary minus).
To implement a support for unary minus we need to extend the parseSimpleExpr function a little bit introducing one more match operator into it:

def parseSimpleExpr(tokens : list[Token]) : option[int] * list[Token]
{
  | Token.Operator("-") :: tail => 
    match (parseSimpleExpr(tail))
    {
      | (Some(value), tokens) => (Some(-value), tokens)
      | failResult            => failResult
    }
...

If we meet a minus in the expression where expected "a simple expression” (SimpleExpr) then we attempt to recognize one more simple expression. If it succeeds then the value of this expression is inverted by applying a unary minus. Thus, after making this change we can compute values of the following expressions:

-1 + -2 = -3
--1 + 2 = 3

Somewhat more complicated to add support of parentheses. The first thing you need to do is to add new types of tokens – OpenBrace (opening parenthesis) and CloseBrace (closing parenthesis).
Further, we’ll add one more occurrence into the parseSimpleExpr function (as in the case of the unary minus). It will recognize the emergence of an open parenthesis. If successful then you need to call parseSumOrSub function (because parseSumOrSub is the topmost rule of our parser) and verify that the first token is a closing parenthesis in the list.
Almost everything you need to do you has to understand without further explanation. So I just give you the complete code for the calculator by selecting the added fragments.
No wonder I said “almost all”. The parseSumOrSub function announced below the parseSimpleExpr function and hence will not be visible in it (accessible). We can’t rearrange these functions as parseSumOrSub uses the parseMulOrDiv and parseMulOrDiv uses parseSimpleExpr in turn. It means that our functions are mutually recursive. You may explicitly declare several functions as a mutually recursive to allow local functions apply not only to local functions defined above but also to the functions announced below them.
That’s why the second and subsequent mutually recursive functions need to be declared not by “def” keyword but by “and”:

def parseSimpleExpr(tokens : list[Token]) : option[int] * list[Token]
{
  ...
}
and parseMulOrDiv(tokens : list[Token]) : option[int] * list[Token]
{
  ...
}
and parseSumOrSub(tokens : list[Token]) : option[int] * list[Token]
{
  ...
}

The Nemerle compiler will understand that functions are mutually recursive and will allow them to call each other. So here’s the complete code for our calculator (we selected new code responsible for the support of parentheses):

using Nemerle.Collections;
using Nemerle.Text;
using Nemerle.Utility;

using System;
using System.Collections.Generic;
using System.Console;
using System.Linq;

variant Token
{
  | Number   { value : int; }
  | Operator { name  : string; }
  | OpenBrace
  | CloseBrace
  | Error

  public this(startPos : int)
  {
    StartPos = startPos;
  }
  
  public StartPos : int;
}

module Program
{
  Main() : void
  {
    def lexer(text : string) : list[Token]
    {
      mutable index = 0;
      
      def peek() : char
      {
        if (index >= text.Length) '\0' else text[index]
      }

      def read() : char
      {
        def ch = peek();

        when (index < text.Length)
          index++;

        ch
      }

      def isDigit(ch) { ch >= '0' && ch <= '9' }

      def loop(res : list[Token]) : list[Token]
      {
        def number(ch : char, accumulator : int = 0) : int
        {
          def highOrderValue    = accumulator * 10;
          def currentOrderValue = ch - '0' : int;
          def currentValue      = highOrderValue + currentOrderValue;
      
          if (isDigit(peek()))
            number(read(), currentValue);
          else
            currentValue
        }

        def error(startPos)
        {
          WriteLine(string(' ', index - 1) + "^");
          WriteLine("expected number or operator");
          (Token.Error(startPos) :: res).Reverse()
        }
        
        def startPos = index;
        def ch       = read();

        match (ch)
        {
          | '('                   => loop(Token.OpenBrace (startPos) :: res)
          | ')'                   => loop(Token.CloseBrace(startPos) :: res)
          | ' ' | '\t'            => loop(res) // we ignore spaces
          | '+' | '-' | '*' | '/' => loop(Token.Operator(startPos, ch.ToString()) :: res)
          | '\0'                  => res.Reverse()
          | _ when isDigit(ch)    => loop(Token.Number(startPos, number(ch)) :: res)
          | _                     => error(startPos)
        }
      }
      
      loop([])
    }
  
    def parseAndEval(tokens : list[Token]) : option[int]
    {
      def loop2(input : option[int] * list[Token], parseSubExpr, op1, f1, op2, f2)
      {
        match (input)
        {
          | (Some(first), Token.Operator(op) as tok :: tokens) when op == op1 with f = f1
          | (Some(first), Token.Operator(op) as tok :: tokens) when op == op2 with f = f2 =>
            match (parseSubExpr(tokens))
            {
              | (Some(second), tokens) => loop2((f(first, second, tok.StartPos), tokens), parseSubExpr, op1, f1, op2, f2)
              | (None, _)              => (None(), tokens)
            }

          | (Some as firstOpt, tokens) => (firstOpt, tokens)
          | (None, tokens)             => (None(), tokens)
        }
      }
      def loop(input : option[int] * list[Token], parseSubExpr, op1, f1, op2, f2)
      {
        loop2(input, parseSubExpr, op1, (a, b, _) => Some(f1(a, b)), 
                                   op2, (a, b, _) => Some(f2(a, b)))
      }
      def parseSimpleExpr(tokens : list[Token]) : option[int] * list[Token]
      {
        | Token.Operator("-") :: tail => 
          match (parseSimpleExpr(tail))
          {
            | (Some(value), tokens) => (Some(-value), tokens)
            | failResult            => failResult
          }
          
        | Token.OpenBrace as openBraceTok :: tail => 
          match (parseSumOrSub(tail))
          {
            | (Some(value), Token.CloseBrace :: tail) => (Some(value), tail)
            | (Some, _) => // there is no closed parenthesis after subexpression!
              WriteLine(string(' ', openBraceTok.StartPos) + "^");
              WriteLine("It’s no closed parenthesis");
              (None(), tokens)
              
            | failResult => failResult
          }
        
        | Token.Number(value) :: tail => (Some(value), tail)
        | Token.Error         :: _    => (None(), tokens)
        | unexpectedToken :: _        =>
          WriteLine(string(' ', unexpectedToken.StartPos) + "^");
          WriteLine("Expected number");
          (None(), tokens)
        
        | _                           =>
          WriteLine("Expected number in the end of expression");
          (None(), tokens)
        
      }
      and parseMulOrDiv(tokens : list[Token]) : option[int] * list[Token]
      {
        def div(a, b, pos)
        {
          if (b == 0)
          {
            WriteLine(string(' ', pos) + "^");
            WriteLine("divizion by zero");
            None()
          }
          else Some(a / b)
        }
        loop2(parseSimpleExpr(tokens), parseSimpleExpr, "*", (a, b, _) => Some(a * b), "/", div)
      }
      and parseSumOrSub(tokens : list[Token]) : option[int] * list[Token]
      {
        loop(parseMulOrDiv(tokens), parseMulOrDiv, "+", _ + _, "-", _ - _)
      }
      
      match (parseSumOrSub(tokens))
      {
        | (None, _)                     => None()

        // The list of tokens must be empty if all expression is parsed!
        | (Some as value, [])           => value

        // we catch additional open parenthesis and so on
        | (Some, unexpectedToken :: _)  => 
          WriteLine(string(' ', unexpectedToken.StartPos) + "^");
          WriteLine("Unrecognized token ");
          None()
      }
    }
    
    def readInput()
    {
      def inputString = ReadLine();
      
      unless (inputString == "")
      {
        def lexemes = lexer(inputString);
        
        match (parseAndEval(lexemes))
        {
          | Some(value) => WriteLine(value);
          | None        => ();
        }
          
        readInput();
      }
    }

    readInput();
  }
}

HINT
Now our calculator is suitable for practical use. Of course, I could give the final version immediately but decided to develop this example in small iterations. There are two reasons for this. First of all the material is easier to express when there are few changes. Second of all I wanted to demonstrate an iterative method of software development. The essence of this method is very simple. We get a simplified implementation of the program very quickly (a prototype) and then we bring it to perfection by successive changes. At some stages the program can’t even do anything useful but we may run and test it at each stage of development. Especially this method is good in tandem with the use of modular and integration testing.
Nemerle perfectly supports this style of programming and I like to use it when write Nemerle programs. I advise you to apply an iterative method of software development also.
It may seem that this method is not applicable when it comes to development of a large project. But it’s not true. We may divide large projects into distinct stages each of which can be developed using this technique.

Let’s test our calculator:

2 * (3 + 4)
14
2 * (3 + 4
    ^
Unclosed parenthesis
2 * (3 + 4))
           ^
Unrecognized token
2 * (3 + -4)
-2
2 * (3 + -4) + 5
3

Modules … or another refactoring

The lexer and parseAndEval local functions contain an mplementation of our calculator. It’s is a good solution for our problem. But what to do if you need to use this code again? For example, we might want to create not only the console version of the calculator but also the GUI version. Or we may want to use our calculator as an engine for expressions calculation in some of the accounting or management automation system.
Of course, you can simply copy the code of parseAndEval and lexer functions into a new project but it’s a bad decision. In the future you may need to change the calculator code (for example, to add another opportunity or to correct an error). As the result you will either have to modify multiple versions of code (in different projects), either version will differ. If you are an experienced programmer then you understand perfectly well the consequences. Code reuse by “Copy & Past” is the worst thing that can come up with programming.
In all modern programming languages the problem is solved by the removal of the common part of code in a shared library (the assembly in the terminology of .Net). However, local functions can’t be placed in the library (they are local!).
To put functions in a library you need to convert them into methods of a module (or class, but more on that later).
I have already described a module at the beginning. But I did not tell you how to create your own modules. It is time to rectify that omission.
Thus the module is a custom data type (as described variant types above) which allows describing a globally visible functions (called static methods in .Net).
Let’s transform the local lexer and parseAndEval functions in static methods (global functions) Lexer and ParseAndEval:

using Nemerle.Collections;
using Nemerle.Text;
using Nemerle.Utility;

using System;
using System.Collections.Generic;
using System.Console;
using System.Linq;

public variant Token
{
  ...
}

public module CalcParser
{
  public Lexer(text : string) : list[Token]
  {
    ...
  }
  
  public ParseAndEval(tokens : list[Token]) : option[int]
  {
    ...
  }
}

module Program
{
  Main() : void
  {
    def readInput()
    {
      def inputString = ReadLine();
      
      unless (inputString == "")
      {
        def lexemes = CalcParser.Lexer(inputString);
        
        match (CalcParser.ParseAndEval(lexemes))
        {
          | Some(value) => WriteLine(value);
          | None        => ();
        }
          
        readInput();
      }
    }

    readInput();
  }
}

Changes are highlighted as always. Besides I don’t give the body and methods of CalcParser.Lexer and CalcParser.ParseAndEval since they are identical to the corresponding bodies of local functions.
Now you can put this module into the library and used repeatedly after the transformation of local functions in the methods of the module CalcParser. In addition, the module can be reused without prejudging the library but then its use will be limited to a single project.
Pay attention to the “public” access modifiers. I have already described them above but mentioned that they were acting on type’s members. But this is not the whole truth. Access modifiers act on types in the same way. “Public” modifier of methods means that methods can be used outside of types. “Public” modifier of the type means that the type (and hence all its public members) can be used outside the assembly.
By default (that is if the access modifier is not specified explicitly) the level of access of members in modules (as well as classes and structures) is a “private”. Thus, if we want to use some member of the module from outside then we need to mark it with a modifier “public”.
Types have “internal” access modifier by default which allows their use in the current assembly but doesn’t allow accessing it from another assembly.
Marking module modifier as “public” we tell the compiler that we want to provide access to the module not only for types of an assembly in which it declared but also for any other type of an assembly.
In addition, we marked the variant type “Token” by “public” access modifier. This is because the public methods of the CalcParser module use a type in its signature (the description of the function without its body). The type used in the signature of the public members of the other public type must also be public. Otherwise, the compiler generates an error message.

WARNING
Don’t make the types or their members public (i.e., don’t mark them by “public” modifier just in case). External users (as is the outer code in this case) should receive only clear API (Application Programming Interface) which should provide only those features that you plan ahead.
All the implementation details should be hidden from external users.

Overloading

Providing an interface of your code to external users you should strive to ensure that it was flexible on the one hand and easy to use on the other.
There are Lexer and ParseAndEval functions in our calculator. But the user will parse the text string in most cases. Calling two functions is not very useful in such circumstances. On the other hand, the user may wish to obtain only a list of tokens or to provide list of tokens (prepared independently) to our parser. This leads to the fact that it is unwise to hide Lexer and ParseAndEval, exposing some ParseAndEval function outside which takes a string and generates lexical analysis and parsing (in one step).
In such cases best of all to provide more low-level but more versatile functions and higher-level function (which takes a string and generates parsing and the calculation in one call).
In fact, a function that takes a list of tokens as input and a function that takes a string as input can be named as ParseAndEval since they both produce the parsing and expression evaluation.
Nemerle allows us to describe one type of two functions having the same name but different in number and / or type of parameters. This is called overloading.
Below is a slightly modified version of the calculator.

public variant Token
{
  ...
}

public module CalcParser
{
  public Lexer(text : string) : list[Token]
  {
    ...
      def error(startPos)
      {
        Error = (startPos, "expected number or operator ");
        (Token.Error(startPos) :: res).Reverse()
      }
    ...
  }
  
  public ParseAndEval(tokens : list[Token]) : option[int]
  {
    ...
  }

  public ParseAndEval(text : string) : option[int]
  {
    ParseAndEval(Lexer(text))
  }
}

module Program
{
  Main() : void
  {
    def readInput()
    {
      def inputString = ReadLine();
      
      unless (inputString == "")
      {
        match (CalcParser.ParseAndEval(inputString))
        {
          | Some(value) => WriteLine(value);
          | None        => ();
        }
          
        readInput();
      }
    }

    readInput();
  }
}

Now the module contains two “ParseAndEval” functions one of which takes a list of tokens and the other the string. In this case we removed the “Lexer” function call from the readInput function and we passed “inputString” string into “ParseAndEval” function. It means we call the second (in order of location) “ParseAndEval” function.
It is said that “ParseAndEval” function is overloaded. More precisely, the name of the function is overloaded.
The process of selecting a specific overloading by a compiler is called overloading resolution. At the end of this paper I will give a precise description of the overloading resolution algorithm used in Nemerle. Now you need to know that the compiler selects a specific overloading by the most exact match argument types to the types of formal parameters of functions.

Built-in DSL and and an example of calculator based on PEG-parser

I hope that “calculator” example is not only allowed you to explore new Nemerle capabilities but also demonstrated that Nemerle is very expressive language. The presence of pattern matching, variant types and local functions allows us to make any code, associated with the recognition, concise and expressive. However, similar opportunities exist in several other languages. But Nemerle provides even more expressive means for solving complex problems.
I want to give another realization of the calculator in this section. This implementation uses the PegGrammar macro. This macro allows describing the grammar of the language in the form of PEG (Parsing Expression Grammar). PEG enables us to describe formal languages in terms of recognition rules of rows of these languages. In essence, this language describes a recursive top-down parser (similar to the one that we wrote by hand in the previous sections). Therefore, this language is very well suited for describing parsers and automatic code generation of parsers for these descriptions.
To create a parser using the PegGrammar macro you need to do two things:

  1. Describe the grammar of the language (including handling of error situations).
  2. Implement handlers for the named rules.
    Language description (rules of a language) is placed in a metaatribut of the PegGrammar class.

WARNING
Classes will be described in next section and attributes a bit later. So far to understand this example it is enough to know that the class is another user-defined type and metaatribut is a kind of macro that is applied to types or their members. If you are familiar with C# then metaatribut can be represented as a custom attribute (Custom Attribut). It leads to the execution of a macro (at compile time) with the same name.

The class marked with such metaatribut is automatically converted into a class of a parser. Macro PegGrammar parses the grammar that is sent to it and generates code for the parser. Macro PegGrammar inserts calls of handlers in the right places.
Handlers are formed as methods of a class which marked by the PegGrammar macro. Handler names must correspond to the rules names. If the parser recognizes a rule it calls the handler with an appropriate name. Parameters of handlers must comply with subrules of processed rule.
The rules are divided into the terminal and nonterminal. Terminal rules describe character sequences (one might say “words” of a language). They can’t be recursive. Nonterminal rules describe the grammar of the language. They can be complex and recursive. For example, the number is a terminal symbol and the addition operation is nonterminal. Boundary between terminal and nonterminal rules is very shaky in the PEG. After recognition of the terminal rule by a parser an object of Nemerle.Peg.NToken type is created. After recognition of the nonterminal rule by a parser an object of Nemerle.Peg.VToken [T] type is created. Simplistically we can say that NToken describes the recognized sequence of symbols and VToken [T] – a value formed by the handler.
If the rule has a handler then it is required to describe the return type of this rule. It stated directly after the name of the rule (after the colon). The rest corresponds to PEG-syntax notation. I will not go into details of this notation here. I will only say that it is very similar to EBNF. The main difference is that instead of the “|” operator, which allows to enumerate equal alternative rules, the PEG uses so-called “/” priority choice operator (ordered choice). Thus, the “rule1 / rule2” means that the parser must first try to identify the rule1 and if that fails try to recognize the rule2. That eliminates the ambiguity. If the rule1 is successfully detected then the parsing is recognized as successful despite the fact that the rule2 can be recognized also.
Below is the complete code for the calculator that was created using the PegGrammar macro. Its behavior is almost similar to that of the latest version of the calculator written by us manually.
The main distinguishing feature of this version of the parser is that it doesn’t have a separate lexical analyzer. Grammar directly describes the desired language (including whitespace).

using Nemerle.Collections;
using Nemerle.Peg;
using Nemerle.Text;
using Nemerle.Utility;
using Nemerle;

using System;
using System.Collections.Generic;
using LRPEGCC;

namespace Calculator
{
  type LoopTokens = NToken * NToken * VToken[int];
  
  [Record] public class ParserFatalError : Exception
  {
    public Pos : int;
  }
  
  [PegGrammar(start,
  grammar
  {  
    any                   = ['\u0000'..'\uFFFF']; // recognizes any character
    digit                 = ['0'..'9']+; // recognizes a number 
    spaces                = (' ' / '\t')*; // recognizes space characters
    
    num                   : int = digit spaces;
    unaryMinus            : int = '-' spaces simplExpr;
    parenthesesExpr       : int = '(' spaces sumOrSub ')' spaces;
    parenthesesExprError  : int = '(' spaces sumOrSub (any / !any);
    simplExpr             : int = num / parenthesesExpr / unaryMinus 
                                  / parenthesesExprError / simplExprError;
    simplExprError        : int = any;
    inputError            : int = any;
    mulOrDiv              : int = simplExpr (('*' / '/') spaces simplExpr)*;
    sumOrSub              : int = mulOrDiv  (('+' / '-') spaces mulOrDiv )*;
    mainRule              : int = sumOrSub inputError?;
    start                 : int = spaces mainRule !any;
  })]
  public class CalcParser
  {    
    private num(digit : NToken, _ : NToken) : int
    {
      int.Parse(digit.GetText())
    }
    
    private unaryMinus(_ : NToken, _ : NToken, simplExpr : VToken[int]) : int
    {
      - simplExpr.Value
    }
    
    private parenthesesExpr(_ : NToken, _ : NToken, simplExpr : VToken[int], 
                            _ : NToken, _ : NToken) : int
    {
      simplExpr.Value
    }
    
    private parenthesesExprError(_ : NToken, _ : NToken, last : VToken[int], 
                                 _ : NToken) : int
    {
      throw ParserFatalError("Expected closed parenthesis or "
        + "'+', '-', '*', '/', followed a number or an expression",
        last.EndPos);
    }
    
    private inputError(tok : NToken) : int
    {
      throw ParserFatalError("Expected '+', '-', '*', '/', followed a number or an expression", tok.StartPos);
    }
    
    private simplExprError(tok : NToken) : int
    {
      throw ParserFatalError("Expected a number or an expression in parentheses", 
        tok.StartPos);
    }
    
    private mainRule(expr : VToken[int], _ : option[VToken[int]]) : int
    {
      expr.Value
    }

    private simplExpr(expr : VToken[int]) : int
    {
      expr.Value
    }
    
    private start(_ : NToken, expr : VToken[int], _ : NToken) : int
    {
      expr.Value
    }
    
    private mulOrDiv(firstExpr : VToken[int], lst : List[LoopTokens]) : int
    {
      DoOpHelper(firstExpr, lst)
    }
    
    private sumOrSub(firstExpr : VToken[int], lst : List[LoopTokens]) : int
    { 
      DoOpHelper(firstExpr, lst)
    }
     
    private DoOpHelper(firstExpr : VToken[int], lst : List[LoopTokens]) : int
    {
      def doOp(x : int, y : int, op : string) : int
      {
        match (op)
        {
          | "*" => x * y
          | "/" => x / y
          | "+" => x + y
          | "-" => x - y
          | _   => assert(false);
        }
      }
           
      mutable r = firstExpr.Value;
      
      foreach ((opTok, _, secondTok) in lst)
        r = doOp(r, secondTok.Value, opTok.GetText());
    
      r  
    }
  }
}

p.
using Calculator;
using System.Console;

module Program
{
  Main() : void
  {
    def calcParser = CalcParser();
    
    def calc(text : string) : void
    {

      try
      {
        match (calcParser.Parse(text))
        {
          | Some(result) => WriteLine(result);
          | None         => ()
        }
      }
      catch 
      { | e is ParserFatalError => 
          WriteLine(string(' ', e.Pos) +  "^");
          WriteLine(e.Message);
      }
    }
    
    def readInput()
    {
      def res = ReadLine();
      
      unless (res == "")
      {
        calc(res);
        readInput();
      }
    }
    
    readInput();
  }
}

As you can see the parser code consists of two parts:

  1. Formal description of a grammar of parsed expressions.
  2. Handlers of rules that contain semantic actions (direct logic).

This greatly facilitates the development of a parser and its support. It is much easier to add a new rule, to change available rule or to change existing semantics than writing a new parse function or to eliminate of code duplication.
This approach is called the language-oriented programming or the use of domain-specific languages.

Let me describe the general idea of this approach. Instead of solving the problem with help of general purpose programming language we create a specialized language. Using such specialized language it is easier to describe the problem and the compiler or interpreter of the language. Description at a specialized language is usually a declarative (i.e., the description of the problem itself not its implementation). This allows us to dramatically reduce the amount of description and makes easier to understand the whole problem entirely.

Nemerle macros allow you to create embedded (in Nemerle) domain-specific languages. PegGrammar is an example of implementation of the compiler of such embedded language. This is a very difficult macro, so we will not write its equivalent. If you are interested in its implementation then the complete source code for this macro and examples demonstrating its use can be found here (the macro is located in a subdirectory LRPEGCC).

It is possible to integrate DSL with other code (written directly on Nemerle) very tightly due to the fact that the macro is running during compilation of the program and can analyze the code.
As you may see this code uses the classes and exceptions to handle error situations. Exceptions allow you to interrupt the normal execution of the program and pass control to the exception handler or interrupt the program (if appropriate exceptions are absent). The throw operator generates exceptions. Exception handling is performed by the try / catch operator.
To fully understand exceptions you have to be introduced to the classes in the next section (of course, if you’re not familiar with them so far). So now it is enough to understand that after the call:

throw ParserFatalError(...)

the control immediately jumps to the exception handler:

      catch 
      { | e is ParserFatalError => 

Thus there is an emergency exit from all functions that were called before an exception. Their return values are ignored. The exception object is the only available and reliable source of information after throwing an exception. I’ll discuss objects, their classes and how to use objects in the next section.

References

⚠️ **GitHub.com Fallback** ⚠️