Grok Variants and matching - vilinski/nemerle GitHub Wiki

This page is a part of the Grokking Nemerle tutorial.

Variants (called data types or sum types in SML and OCaml) are forms of expressing data of several different kinds.

Matching is a way of destructuring complex data structures, especially variants.

Table of Contents Variants Matching Other patterns Using variants as trees XML trees Exercises

Variants

The simplest example of variants are enum types known from C.

<c>// C enum Color { Red, Yellow, Green } </c>

//  Nemerle
variant Color {
   | Red
   | Yellow
   | Green
}

Note that you can define C#-like enum types in Nemerle anyway.

//  Nemerle
enum Color {
   | Red
   | Yellow
   | Green
}

However, the variant options might be more useful because they can carry some extra data with them:

 variant RgbColor {
   | Red
   | Yellow
   | Green
   | Different {
       red : float;
       green : float;
       blue : float;
     }
 }

So if color is neither red, yellow nor green, it can be represented with RGB. You can create variant object just like any other object, by using its constructor (which is always implicitly provided):

   //  ...
   def _blue = RgbColor.Different (0f, 0f, 1f);
   def _red = RgbColor.Red ();

You can think about variants as of a union with a selector in C. In the OO world, modeling variants with sub classing can be seen sometimes:

// C#
class Color {
  class Red : Color { }
  class Green : Color { }
  class Yellow : Color { }
  class Different : Color {
    float red;
    float green;
    float blue;
  
    public Different (float red, float green, float blue) {
      this.red = red;
      this.green = green;
      this.blue = blue;
    }
  }
}

Of course you need to write a constructor, mark fields public and so on. When you're done -- using this kind of stuff is quite hard -- you need to use lots of runtime type checks.

On the other hand, Nemerle provides an easy and convenient method of dealing with variants -- pattern matching.

Matching

Pattern matching is accomplished with the match expression. Its semantics are to check each pattern in turn, from top to bottom, and execute the expression after the first pattern that matched. If no pattern matched, an exception is raised. This is like the switch statement in C, but using a large dose of steroids.

string_of_color (color : RgbColor) : string
{
  match (color) {
    | RgbColor.Red => "red"
    | RgbColor.Yellow => "yellow"
    | RgbColor.Green => "green"
    | RgbColor.Different (r, g, b) => $"rgb($r, $g, $b)"
  }
}

If the type of the object you are matching on is known at compile time to be RgbColor, you can omit the RgbColor. prefix in the patterns.

The main idea behind patterns is that they match values that look like them. For example, the Nemerle compiler creates a default constructor for the Different variant option with the following body:

public this (red : float, green : float, blue : float)
{
  this.red = red;
  this.green = green;
  this.blue = blue;
}

Therefore, the constructor call Color.Different (r, g, b) creates a new variant option instance with specified arguments. The pattern looks the same -- it binds actual values of red, green and blue fields to r, g and b respectively. You can also spell the field names explicitly:

| Color.Different (red = r, green = g, blue = b) => $"rgb($r, $g, $b)"

Other patterns

We have already seen a so called "constructor pattern" in action. It is used to match over variants. The constructor pattern consists of variant option name (starting with an uppercase letter) followed by optional tuple or record pattern.

// examples of constructor patterns
// plain one, without sub-pattern:
Color.Red 
// followed by tuple pattern:
Color.Different (r, g, b)
// followed by record pattern:
Color.Different (red = r, green = g, blue = b)

The variable pattern matches any value, and binds it to a specified variable. The variable pattern is an identifier starting with a lowercase letter. They are used mostly inside other patterns, but here we give a (rather pointless) example of using them as the top-level pattern:

// it prints 42
match (42) {
  | x => // here x is bound to 42
    printf ("%d\n", x)
}

The throw-away pattern, written _, matches any value and has no further effects. It is a way of specifying the default: case in matching.

match (color) {
  | Color.Red => "red"
  | _ => "other"
}

The tuple pattern consists of one or more patterns separated by commas and surrounded by parens. We have already used them in the section about tuples above. There they were used in def expressions.

// matches any pair, binding its elements
// to specified variables
(first_element, second_element)

// matches a pair whose first element is Foo
(Foo, _)

The record pattern consists of class name and zero or more named patterns separated by commas enclosed in parentheses. It matches a class, whose field values are matched by sub-patterns.

class Foo {
  public number : int;
  public name : string;
}

StringOfFoo (f : Foo) : string
{
  if (f.name == "")
    f.number.ToString ()
  else
    f.name
}

// do the same as above
StringOfFooMatch (f : Foo) : string
{
  match (f) {
    | Foo where (name = "", number = k) =>
      k.ToString ()
    | Foo where (name = s) => 
      s
  }
}

It might be doubtful if StringOfFooMatch is any better than StringOfFoo. Record patterns are mostly useful when used inside a complex pattern, or when they contain complex patterns.

The literal pattern is an integer, character or string constant. It matches the exact specified value.

StringOfInt (n : int) : string
{
  match (n) {
    | 0 => "null"
    | 1 => "one"
    | 2 => "two"
    | 3 => "three"
    | _ => "more"
  }
}

IntOfString (n : string) : int
{
  | "null" => 0
  | "one" => 1
  | "two" => 2
  | "three" => 3
  | _ => 42
}

Note lack of match in the second example. When the function body starts with | -- the match expression is automagically inserted.

The as pattern tries to match a value with a pattern enclosed within it, and in case of success binds the value that matched to a specified variable. We will show it later.

There are also two patterns related to types. The first one is the type enforcement pattern. Its job is to give a hint to type inference engine. It consists of a pattern followed by a colon and a type, like: x : Foo or (x,y) : Foo * Bar. It requires type of the matched value to be statically known to subtype the type specified.

The other pattern related to type is the type check pattern. Is is used to check whether the value matched has the specified type. It consists of a variable followed by the keyword is and a type. If the runtime type of the value matched is subtype of the type specified, the branch is taken and the matched value is bound to specified variable.

match (some_value) {
  | x is SomeType =>
    // use x
  | x is SomeOtherType =>
    // use x
  | _ => ...
}

Using variants as trees

The example above, while simple, is not the best usage of variants. Variants are best at handling tree-like data structures. A common example of tree data structures are XML documents. However, we will deal with plain binary trees first.

The following example defines the type of trees of integers (representing sets).

variant Tree {
  | Node {
      left  : Tree;
      elem  : int;
      right : Tree;
    }
  | Null
}

// return tree t with element e inserted
Insert (t : Tree, e : int) : Tree
{
  match (t) {
    | Tree.Node (l, cur, r) =>
      if (e < cur)
        Tree.Node (Insert (l, e), cur, r)
      else if (e > cur)
        Tree.Node (l, cur, Insert (r, e))
      else
        // node already in the tree,
        // return the same tree
        t
    | Tree.Null =>
      Tree.Node (Tree.Null (), e, Tree.Null ())
  }
}

// check if specified integer is in the tree
Contains (t : Tree, e : int) : bool
{
  match (t) {
    | Tree.Node (l, cur, r) when e < cur => 
      Contains (l, e)

    | Tree.Node (l, cur, r) when e > cur => 
      Contains (r, e)

    | Tree.Node => true
    | Tree.Null => false
  }
}

XML trees

As you can see binary trees are not very interesting, so we will go to XML. Whether XML is interesting remains a doubtful question, but at least it is somewhat more practical.

variant Node {
  | Text { 
      value : string; 
    }
  | Element {
      name : string; 
      children : list [Node];
    }
}

This variant defines a simplistic data structure to hold XML trees. An XML node is either a text node with a specified text inside, or an element node, with a name and zero or more children. A sequence of children is represented as a Nemerle list data structure (Nemerle has even a special syntax for lists). The type is written here list [Node] -- a list of nodes. We will learn more about polymorphic variants later.

For example the following tree:

<xml><tree> &lt;branch&gt; &amp;lt;leaf&amp;gt;&amp;lt;/leaf&amp;gt; &lt;/branch&gt; &lt;branch&gt; Foo &lt;/branch&gt; </tree> </xml>

would be represented by:

Node.Element ("tree", 
  [Node.Element ("branch", [Node.Element ("leaf", [])]),
   Node.Element ("branch", [Node.Text ("Foo")])])

Of course XML by itself is just a data format. Using data in the above form wouldn't be too easy. So we want some different internal representation of data, and use XML only to save it or send it over the network.

class Refrigerator
{
  minimal_temperature : float;
  content : list [RefrigeratorContent];

  public this(min_temp : float, cont : list[RefrigeratorContent])
  {
    minimal_temperature = min_temp;
    content = cont;
  }
}

variant RefrigeratorContent
{
  | Beer { name : string; volume : float; }
  | Chips { weight : int; }
  | Ketchup
}

Now we'll write simple XML parsing function.

ParseRefrigerator (n : Node) : Refrigerator
{
  | Node.Element ("refrigerator", 
      Node.Element ("minimal-temperature", [Node.Text (min_temp)]) 
            :: content) =>
        Refrigerator (System.Single.Parse (min_temp), 
                      ParseRefrigeratorContent (content))
  | _ =>
    throw System.ArgumentException ()
}

ParseRefrigeratorContent (nodes : list [Node]) 
  : list [RefrigeratorContent]
{
  | [] => []
  
  | node :: rest =>
    def food =
      match (node) {
        | Node.Element ("ketchup", []) =>
          RefrigeratorContent.Ketchup ()

        | Node.Element ("beer", 
            [Node.Element ("name", [Node.Text (name)]),
             Node.Element ("volume", [Node.Text (volume)])]) =>
          RefrigeratorContent.Beer (name, System.Single.Parse (volume))

        | Node.Element ("chips",
            [Node.Element ("weight", [Node.Text (weight)])]) =>
          RefrigeratorContent.Chips (System.Int32.Parse (weight))

        | _ =>
          throw System.ArgumentException ()
      };
    food :: ParseRefrigeratorContent (rest)
}

The reader will easily note that a) this code looks a bit like a junk, b) it can be generated automatically and c) in C# it would be even worse. Later we will learn how to write macros to generate this kind of code automatically.

But let's leave the ideology behind. There are probably few interesting things about this example. The first is the usage of list patterns and constructors. We can check if a list is empty, and if not, deconstruct it with the following code:

match (l) {
  | [] => 
    // the list is empty
    // ...
  | head :: rest =>
    // the list isn't empty, the first element 
    // of the list is bound to the 'head' variable
    // and the rest of the list to 'rest'
    // ...
}

We can also construct new lists with :: operator -- it prepends an element to an existing list. If we know all list elements in advance we can use the [ ... ] thing in both expressions and patterns.

The second interesting thing is that we throw an exception in case of problems. We will talk about it later, for now assume, it just terminates the program with an error message.

Exercises

2.2 (2 points). Write a function that reads XML from specified files and puts it into the Node variant defined above. Then write a function to dump your data in a lispy format, something like:

(tree
(branch
(leaf)
)
(branch
($text "Foo")
)
)

For an extra point implement indentation of output.

(tree
  (branch
    (leaf))
  (branch
    ($text "Foo")))

Then copy the Parser functions from above, fix any errors you find in them and try to parse the following file:

<xml><?xml version='1.0' encoding='utf-8' ?> <refrigerator> &lt;minimal&#45;temperature&gt;&#45;3.0&lt;/minimal&#45;temperature&gt; &lt;beer&gt; &amp;lt;name&amp;gt;Hyneken&amp;lt;/name&amp;gt; &amp;lt;volume&amp;gt;0.6&amp;lt;/volume&amp;gt; &lt;/beer&gt; &lt;beer&gt; &amp;lt;name&amp;gt;Bydweisser&amp;lt;/name&amp;gt; &amp;lt;volume&amp;gt;0.5&amp;lt;/volume&amp;gt; &lt;/beer&gt; &lt;beer&gt; &amp;lt;name&amp;gt;Plsner&amp;lt;/name&amp;gt; &amp;lt;volume&amp;gt;0.5&amp;lt;/volume&amp;gt; &lt;/beer&gt; &lt;chips&gt; &amp;lt;weight&amp;gt;500&amp;lt;/weight&amp;gt; &lt;/chips&gt; &lt;ketchup&gt;&lt;/ketchup&gt; </refrigerator> </xml>

Warning: you do not need to write the XML parser. You should not do it, actually. Use the System.Xml namespace from the .NET Framework. In order to link with the System.Xml library you need to compile with the -r:System.Xml option. For example:

  ncc -r System.Xml myprogram.n

should do.