Nemerle for OOP Programmers Week 3 - vilinski/nemerle GitHub Wiki

This week we will learn more about functional programming in Nemerle. We will find out about tuples, variants, and advanced pattern matching.

Table of Contents

Tuples

Tuples in Nemerle are similar to tuples in math, or in SQL for that matter. They are immutable, heterogeneous (a tuple can hold values of several types at once) finite sequences of objects. The most common examples of tuples is a pair or triple:

def our_pair = (1, 2);
def our_triple = (1, 12, -10);

As of version 0.9.3 of the Nemerle compiler, up to 19 elements can be held in a tuple (the syntax looks clumsy for 5 or 6 elements already, and you cannot use loops for iterating over tuples).

A tuple can hold values of several types:

def another_pair = ("foo", 3.0);

Tuples are useful for returning multiple values from functions or holding several things in a single cell in containers. One can always use record-like classes, however this requires additional hassle of a separate class definition.

Tuple pattern

The most common way of decomposing a tuple is by using a tuple pattern. It looks mostly like the tuple definition:

def divmod (x, y) {
  def div = x / y;
  def mod = x % y;
  (div, mod)
}

match (divmod (142, 13)) {
  | (d, m) =>
    System.Console.WriteLine ($ "div=$d, mod=$m");
}

// Output: div=10, mod=12

Here we define a local function -- introduced in Week 2 -- called divmod, which returns the tuple (div, mod). The result is then matched against a tuple pattern. As you can see, the pattern binds the first element of the tuple to d and the second to m.

Single-case matching has a special, shorter syntax in Nemerle, as shown here:

def (d, m) = divmod (142, 13);
System.Console.WriteLine ($ "div=$d, mod=$m");

The divmod function could be written more compactly as well:

def divmod (x, y) {
  (x / y, x % y)
}

The arguments to the tuple constructor are just expressions, and the names div and mod used for elements didn't really matter.

Tuple indexer

For cases where pattern matching doesn't seem right (for example, you want just the first element of a tuple returned by a nested call), there is a special tuple indexer. Using it, the example above could be rewritten as:

def divmod (x, y) {
  (x / y, x % y)
}

def r = divmod (142, 13);
System.Console.WriteLine ($ "div=$(r[0]), mod=$(r[1])");

Now, the tuple result gets assigned to r, and the $ macro expands it's elements by index. As with arrays, tuple indexing is 0-based (the first element of a tuple has index 0).

Indexing rules

In case tuples have started to look too much like arrays, here are some rules about tuple indexers to help clarify things.

Unlike an array indexer, it is not possible to supply anything beside a constant to the tuple indexer. For example, the following code will not work:

def pair = (2, "foo");
def idx = int.Parse (System.Console.ReadLine ());
def element = pair [idx]; // dynamic access is invalid

In this contrived example, it should be clear why this is prohibited: depending on user input the type of element would be string or int. The only solution would be to use a compile-time type of object for element, however we felt that this wouldn't be useful enough to mandate implementation.

Because tuples are immutable, you also cannot assign values to tuple elements, as in:

def p = (3, 4);
p [0] = 17; // incorrect assignment to tuple

Tuple type

The constructor of tuple types is the star (*). For example the tuple ("foo", 42, 3.1415) has type string * int * double. This notion comes from math, where the times (×) symbol is used, as in dimensions L × W × H.

A fully-typed standalone version of divmod would look like:

static divmod (x : int, y : int) : int * int {
   (x / y , x % y)
}

Side note: relationship between tuple and function types

The tuple type constructor is also used in function types, which is not an accident. We consider a function taking more than one argument exactly the same as a function taking a single tuple argument of the corresponding type. For example the following code is perfectly valid:

def print_two_ints (x, y) {
  System.Console.WriteLine ($ "first=$x, second=$y");
}
def divmod (x, y) {
  (x / y, x % y)
}
print_two_ints (divmod (143, 77))

Side note: tuple assignment

Nemerle supports multiple assignment. That is, one can use a pseudo-tuple of possible assignment targets. For example:

mutable x = 17, y = 32;
(x, y) = (y, x);          // swap x and y
(x, y) = (y + 12, 2 * x); // or even change their values

All the source expressions are evaluated first and the assignment is done afterwards (this is why the swap works fine).

Variants

This section borrows some text from the Grokking Nemerle tutorial.

Variants (called data types or sum types in SML and OCaml) are forms of expressing data of several different kinds.

The simplest example of variants are enum types known from C (or C#).

// C
enum Color {
  Red, 
  Yellow, 
  Green 
}

You can define C#-like enum types in Nemerle too, but we will talk about it next week. Now let us look at the simplest variant type:

// Nemerle
variant Color {
   | Red
   | Yellow
   | Green
}

However, the variant options might be more useful because they can carry some extra data with them:

variant Color {
  | Red
  | Yellow
  | Green
  | Different {
      red : float;
      green : float;
      blue : float;
    }
}

So if the color is neither red, yellow nor green, it can be represented with RGB. You can create variant objects just like any other object, by using its constructor. All variant options have an implicit [Record] macro invocation on them. We talked about this macro 2 weeks ago, it adds a constructor assigning to each field of a class:

def blue = Color.Different (0, 0, 1);
def red = Color.Red ();

In the OO world, modeling variants with sub classing can sometimes be seen:

// C#
class Color {
  class Red : Color { }
  class Green : Color { }
  class Yellow : Color { }
  class Different : Color {
    public float red;
    public float green;
    public float blue;
  
    public Different (float red, float green, float blue) {
      this.red = red;
      this.green = green;
      this.blue = blue;
    }
  }
}

Of course, you need to write a constructor, mark fields as public, and so on. When you're done -- using this kind of stuff can get quite involved -- you will need to use lots of runtime type checks.

On the other hand, Nemerle provides an easy and convenient method of dealing with variants -- pattern matching.

Matching over variants

We already used pattern matching on lists, so you can imagine doing a switch over variant options. For example, a function returning string representation of a color could look like this:

variant Color {
  | Red
  | Yellow
  | Green
  | Different {
      red : float;
      green : float;
      blue : float;
    }
}

def string_of_color (color)
{
  match (color) {
    | Color.Red => "red"
    | Color.Yellow => "yellow"
    | Color.Green => "green"
    | Color.Different (red = r, green = g, blue = b) => 
      $ "rgb($r, $g, $b)"
  }
}

System.Console.WriteLine (string_of_color (Color.Red ()));
System.Console.WriteLine (string_of_color (Color.Different (1, 1, 0)));

/* Output:
red
rgb(1, 1, 0)
*/

The first three patterns state we're not interested in any possible fields in the case of Red, Yellow and Green. The last pattern, for the Different case, binds values of the red, green and blue to r, g and b respectively. You can omit matched fields at will, as well as change the ordering.

It is also possible to use a shortcut here:

| Color.Different (r, g, b) =>

This is only available when you specify all the fields in the given object, and in the right order.

Using variants as trees

The example above, while simple, is not the best usage of variants. Variants are best at handling tree-like data structures. A common example of tree data structures are XML documents. However, we will deal with plain binary trees first.

The following example defines the type of trees of integers (representing sets).

variant Tree {
  | Node {
      left  : Tree;
      elem  : int;
      right : Tree;
    }
  | Null

  public override ToString () : string
  {
    match (this) {
      | Tree.Node (l, e, r) => $ "($l $e $r)"
      | Tree.Null => "."
    }
  }
}

So a tree node is either an inside node with an element and two subtrees or a null tree without any elements inside. We have also overridden the ToString method, so the $ macro and WriteLine work properly on trees (the default implementation would yield "Tree.Node" or "Tree.Null" only).

We can now define a method for inserting elements to the tree. It should be defined inside the Tree variant:

public Insert (e : int) : Tree
{
  match (this) {
    | Tree.Node (l, cur, r) =>
      if (e < cur)
        Tree.Node (l.Insert (e), cur, r)
      else if (e > cur)
        Tree.Node (l, cur, r.Insert (e))
      else
        this
    | Tree.Null =>
      Tree.Node (Tree.Null (), e, Tree.Null ())
  }
}

The function checks if the element to insert is smaller than the element in the current node, and if so, inserts it in the left subtree. If it's bigger, it inserts the element in the right subtree. Otherwise, it has to be equal, so it doesn't insert it again: it just returns the unchanged tree.

If the function hits an empty subtree, it creates a new leaf in that place.

Having a Contains check wouldn't be a bad idea, either:

public Contains (e : int) : bool
{
  match (this) {
    | Tree.Node (l, cur, _) when e < cur => 
      l.Contains (e)

    | Tree.Node (_, cur, r) when e > cur => 
      r.Contains (e)

    | Tree.Node => true
    | Tree.Null => false
  }
}

This function works much like insert -- it checks if the element could be found in the left or in the right subtree, and looks for it there. If not, either it has found the matching node, or null, and returns the appropriate result.

There is however one new feature used here -- when guards in matching. Any matching clause can have an additional condition attached. The function first checks if the pattern matches the value, and if so, the condition is evaluated. If that yields true, the given branch is taken, otherwise it proceeds with further match clauses.

This example could also be written with regular ifs:

public Contains (e : int) : bool
{
  match (this) {
    | Tree.Node (l, cur, r) =>
      if (e < cur)
        l.Contains (e)
else if (e > cur)
        r.Contains (e)
else
  true

    | Tree.Null => false
  }
}

Finally we can test our example:

// we start with an empty tree
def t = Tree.Null ();
// add a few elements
def t = t.Insert (13).Insert (34).Insert (23);
// and display it
System.Console.WriteLine (t); 
System.Console.WriteLine (t.Contains (13)); 
System.Console.WriteLine (t.Contains (42)); 

/* Output:
(. 13 ((. 23 .) 34 .))
True
False
*/

XML trees

As you can see binary trees are not very interesting, so we will go on to XML. Whether XML is interesting remains a doubtful question, but at least it is somewhat more practical.

variant Node {
  | Text { 
      value : string; 
    }
  | Element {
      name : string; 
      children : list [Node];
    }
}

This variant defines a simplistic data structure to hold XML trees. An XML node is either a text node with some specified text inside, or an element node with a name and zero or more children. A sequence of children is represented as a Nemerle list.

For example the following tree:

<xml><tree>
  <branch>
    <leaf/>
  </branch>
  <branch>
    Foo
  </branch>
</tree>
</xml>

would be represented by:

Node.Element ("tree", 
  [Node.Element ("branch", [Node.Element ("leaf", [])]),
   Node.Element ("branch", [Node.Text ("Foo")])])

Of course XML by itself is just a data format. Using data in the above form wouldn't be too easy. So we want some different internal representation of data, and use XML only to save it or send it over the network.

variant RefrigeratorContent
{
  | Beer { name : string; volume : float; }
  | Chips { weight : int; }
  | Ketchup

  public static FromXml (node : Node) : RefrigeratorContent
  {
    match (node) {
      | Node.Element ("ketchup", []) =>
        RefrigeratorContent.Ketchup ()

      | Node.Element ("beer", 
          [Node.Element ("name", [Node.Text (name)]),
           Node.Element ("volume", [Node.Text (volume)])]) =>
        RefrigeratorContent.Beer (name, float.Parse (volume))

      | Node.Element ("chips",
          [Node.Element ("weight", [Node.Text (weight)])]) =>
        RefrigeratorContent.Chips (int.Parse (weight))

      | _ =>
        throw System.ArgumentException ()
    }
  }
}

The most interesting thing here are the nested patterns. Until now we have always used very shallow patterns -- we just checked the top-level object and possibly its bound fields. However it is possible to look very deeply in the structure of object. Instead of binding values of fields to variables, this function checks if they match given patterns. This is all that is happening here -- nested patterns.

Probably the most annoying thing about the example above is the amount of times you have to say RefrigeratorContent and Node. Fortunately both can be skipped, however for quite different reasons.

RefrigeratorContent used for object construction can be skipped, because we are in the body of the RefrigeratorContent variant definition. If we wanted to skip it elsewhere, we would have to say using RefrigeratorContent; first.

On the other hand, Node in matching can be skipped, because we provided the type of the node parameter in the match statement, therefore the compiler will just look up the appropriate variant options in this type.

So the shorter example would be:

variant RefrigeratorContent
{
  | Beer { name : string; volume : float; }
  | Chips { weight : int; }
  | Ketchup

  public static FromXml (node : Node) : RefrigeratorContent
  {
    match (node) {
      | Element ("ketchup", []) => Ketchup ()

      | Element ("beer", [Element ("name", [Text (name)]),
                          Element ("volume", [Text (volume)])]) =>
        Beer (name, float.Parse (volume))

      | Element ("chips", [Element ("weight", [Text (weight)])]) =>
        Chips (int.Parse (weight))

      | _ =>
        throw System.ArgumentException ()
    }
  }
}

Once we have something to put into a refrigerator, we can now build the refrigerator.

[Record]
class Refrigerator
{
  public minimal_temperature : float;
  public content : list [RefrigeratorContent];
  
  public static FromXml (node : Node) : Refrigerator
  {
    match (node) {
      | Element ("refrigerator", 
          Element ("minimal-temperature", [Text (min_temp)]) :: content) =>
        def parse_content (lst) {
	  match (lst) {
	    | x :: xs =>
	      RefrigeratorContent.FromXml (x) :: parse_content (xs)
	    | [] => []
	  }
        }
        Refrigerator (float.Parse (min_temp), parse_content (content)); // (*)
      | _ =>
        throw System.ArgumentException ("node")
    }
  }
}

The reader will easily note that: a) the XML deserialization code looks a bit like a junk b) it can be generated automatically, and c) without matching it would be even worse. Later we will learn how to write macros to generate this kind of code automatically.

We can however still make it somewhat better, without resorting to macros. First off, if you wrote the map example from the previous week, then the parse_content function should familiar to you. In fact it can be replaced with map altogether, like this:

Refrigerator (float.Parse (min_temp), map (content, RefrigeratorContent.FromXml));

We have passed a static global function as a functional value. It works exactly the same as with local functions.

Because the map function is generally useful, it is included in the standard library as a member function of the list data structure. So we remove parse_content, and replace line marked by (*) with:

Refrigerator (float.Parse (min_temp), content.Map (RefrigeratorContent.FromXml));

Side note: skipping match

Because it is quite a common pattern to do matching directly on arguments, Nemerle provides a way to skip match in such cases. If your function starts with | it is implicitly surrounded with match (single_parm) { ... }, or in the case there is more than one parameter, with match (parm1, parm2, ..., parmN) { ... } (so you can use tuple patterns to get to a particular parameter). So we can further shorten the above example by two more lines:

public static FromXml (node : Node) : Refrigerator
{
  | Element ("refrigerator", 
      Element ("minimal-temperature", [Text (min_temp)]) :: content) =>
    Refrigerator (float.Parse (min_temp), content.Map (RefrigeratorContent.FromXml));
  | _ =>
    throw System.ArgumentException ("node")
}

Exercises

Exercise 1: Tree.Iter

Add an Iter (f : int -> void) : void method to the Tree variant, implementing inorder traversal (that is you first traverse the left subtree, then call f on current element, and traverse the right subtree).

Exercise 2: XML parsing

Write a function that reads XML from specified files and puts it into the Node variant defined above. Then write a function to dump your data in a lispy format, something like:

(tree
(branch
(leaf)
)
(branch
($text "Foo")
)
)

Then you can implement indentation of output.

(tree
  (branch
    (leaf))
  (branch
    ($text "Foo")))

Then copy the refrigerator variants from the above, fix any errors you find in them and try to parse the following file:

<xml><?xml version='1.0' encoding='utf-8' ?>
<refrigerator>
  <minimal-temperature>-3.0</minimal-temperature>
  <beer>
    <name>Hyneken</name>
    <volume>0.6</volume>
  </beer>
  <beer>
    <name>Bydweisser</name>
    <volume>0.5</volume>
  </beer>
  <beer>
    <name>Plsner</name>
    <volume>0.5</volume>
  </beer>
  <chips>
    <weight>500</weight>
  </chips>
  <ketchup/>
</refrigerator>
</xml>

Warning: you do not need to write the XML parser. You should not do it, actually. Use the System.Xml namespace from the .NET Framework. In order to link with the System.Xml library you need to compile with the -r:System.Xml option. For example:

  ncc -r System.Xml myprogram.n

should do.

ToC | Week 4

⚠️ **GitHub.com Fallback** ⚠️