Nemerle for OOP Programmers Week 4 - vilinski/nemerle GitHub Wiki

This week we will learn about the remaining patterns, get some insights about performance of functional programs, and finally learn how to define polymorphic (generic) types.

This is the last lesson about FP during this course.

Table of Contents Advanced pattern matching Type check pattern Record pattern as pattern Type hint pattern Alternative clauses with clauses Performance considerations Tail calls vs loops Programming with accumulators Reverse list Length of list Fold Side note: redefining symbols Parametric polymorphism aka generics Generic types (and immutable collections) Generic methods Generic specifier For constructors For methods For types in static member references Constraints on type variables Excercises One Two Three

Advanced pattern matching

This section discusses a few matching constructs that remain to be covered.

Type check pattern

The type check pattern checks if the given value has the given type. It is similar to is in C# and Java, even the syntax looks alike:

using System.Console;

def check (o : object) {
  match (o) {
    | i is int => WriteLine ($ "an int, $(i * 2) / 2!");
    | s is string => WriteLine ($ "a string: $s!");
    | _ => WriteLine ("something else");
  }
}

check (21);
check ("foo");
check (3.0);

/* Output:
an int, 42 / 2!
a string: foo!
something else
*/

In this matching pattern, you can see how i is used as an int, and s as a string. The is pattern checks if the match value (o, an object) has one of the given types, and if so, binds the value to specified identifier. The identifier is given a static type for each branch in the match. The compiler implicitly casts the value to the type of the matching identifier. So the common C# pattern:

if (x is Foo) {
  Foo y = (Foo)x;
  ... use y ...
} else { ... }

or:

Foo y = x as Foo;
if (y != null) {
  ... use y ...
} else { ... }

becomes:

match (x) {
  | y is Foo => ... use y ...
  | _ => ...
}

This is useful when you need to cast an object as one of its inherited base classes, or as an interface implemented by the object.

Note that the C# construction as doesn't have anything to do with the Nemerle as pattern, presented below.

Record pattern

We have already seen the record pattern as a subpattern of the constructor pattern used for variants. The idea is to take an object and match on its fields:

class MyClass {
  public foo : int;
  public bar : string;
}

def c = MyClass ();
match (c) {
  | (foo = 0, bar = null) =>
    System.Console.WriteLine ("fine");
  | _ =>
    System.Console.WriteLine ("oops");
}

Field matching is quite flexible: you can skip some fields and reorder them.

You can also explicitly state the type of the class you're matching on. This is useful when the compiler complains about unknown types (there are some limitations of type inference in the current implementation, that may prevent it from working here). For instance, you could write a variation of the match above as:

def check_class (c) {
  match (c) {
    | MyClass where (foo = 0) => true
    | _ => false
  }
}

def result = check_class (MyClass ())

The record pattern can also match on individual readable properties:

class MyClass {
  [Accessor]
  public foo : int;
  public bar : string;
}

def c = MyClass ();
match (c) {
  | (Foo = 0) =>
    System.Console.WriteLine ("fine");
  | _ =>
    System.Console.WriteLine ("oops");
}

Properties do not take part in tuple-pattern -> record-pattern automatic conversion, where you can write a tuple pattern that is automatically translated to record pattern (we talked about this in Week 3).

as pattern

The as pattern binds a value matched by a subpattern to an identifier. This is useful when you want to check if a value has a given structure, but you're interested in still handling it as a whole, for example:

match (construct_list ()) {
  | [1, _] as l =>
    handle_one_something (l)
  | _ => {}
}

Another example, perhaps more interesting, is:

variant Foo {
  | A { x : int; mutable y : string; }
  | B
}

match (some_foo ()) {
  | A (3, _) as a =>
    a.y = "three";
  | _ => {}
}

Here the compiler knows the static type of a is Foo.A, so you can assign values to its y field.

Type hint pattern

This pattern matches to explicitly declared types. It is used for hinting the compiler about the static type of given matched value or subvalue. If the compiler gets the wrong idea about a type you have left for it to infer, it will scream. If it doesn't know the type, it will appreciate the hint. Example:

def foo (l) {
  | (s : string) :: _ => s [0]
  | [] => '?'
}

assert (foo (["foo", "bar"]) == 'f');

Here we hint the compiler that the type of s (the first element of the list) will be string so we can use the indexer to get its first character (at the time of this writing, there is a bug in our type inference engine that prevents deducing the type of s early enough for this snippet to work unhinted).

The assert macro checks if given condition is true, and if it's not, it throws an exception.

The example above can be rewritten as:

def foo (l : list [string]) {
  | s :: _ => s [0]
  | [] => '?'
}

assert (foo (["foo", "bar"]) == 'f');

with the same effect, although it is slightly longer (because we have to specify the list part of the type).

Alternative clauses

You can combine several matching clauses in a single branch. For example:

match (my_list) {
  | [1]
  | [1, 2] => WriteLine ("one or one, two");
  
  | [x]
  | [7, x] => WriteLine ($ "$x or seven, $x");

  | _ => WriteLine ("something else");
}

Here, the first two branches match against multiple lists. The second branch even matches against some value x, appearing in both its clauses. If you use a value in a matching clause, it can't change types, and it has to appear for each clause in the branch. In other words, you couldn't do this:

| [7]   // x needs to appear here,
| [7, x] => WriteLine ($ "$x or seven, $x");   // or not here

There is a way to work around this limitation, by using a with clause.

with clauses

When using alternative clauses it is sometimes useful to match some smaller pattern with a value x, and a bigger pattern with x and another value y. Now when we construct a single matching branch for the two patterns, we cannot use y in the smaller, as it doesn't occur there. The rub is we have to include y for all matches in the branch, because of the rule on matching values explained above.

The with clause solves this problem by providing a way of specifying a default y value in such cases. For example:

def foo (_) {
  | [x] with y = 3
  | [x, y] => x * y
  | _ => 42
}

assert (foo (3) == 9);
assert (foo (3, 4) == 12);
assert (foo (3, 4, 5) == 42);

The ability to match irregular length lists by providing with clause defaults comes in handy when creating more complex patterns.

Performance considerations

We now switch gears to the topic of FP performance. Tail calls, an FP method of implementing functions, are covered in detail. We will also examine looping and in-lining, which are compiler optimizations for tail calls that minimize stack use and function call overhead. Accumulators, an FP construct implemented with tail calls, are also introduced here.

Tail calls vs loops

When you have a while loop, like this:

while (true)
  // do something

it is possible to rewrite it like this:

def my_loop () {
  // do something
  my_loop ();
}

my_loop ();

This code calls my_loop which first executes the body of the loop, and at the end calls itself (and executes the body of the loop, and calls itself, and...). As you can see the result is the same as with the while loop.

If the while loop has a condition, it is also possible to rewrite:

while (some_condition)
  // do something

into:

def my_loop () {
  when (some_condition) {
    // do something
    my_loop ();
  }
}

my_loop ();

The performance of the two pieces of code is exactly the same. The Nemerle compiler can see that there is nothing left to do after call to my_loop inside my_loop, and replaces this call with an internal jump. Moreover, it notes that my_loop is called just once outside the body of the function, and therefore inlines it, replacing the call with the function body code itself. These optimizations together eliminate stack overhead in this example.

In fact, the compiler internally transforms all kinds of loops into local functions.

Calling some function f within f, in such a place that there is nothing more left to do in the body of f, is called tail recursion. It can be always replaced with a jump (internally, by the compiler).

To make functional programs run fast, tail recursion is a good idea for loops that are going to execute, say, one million times or more. Of course, there is no point rewriting things that are more readable as while loops to tail recursion, except when you want to obey some strict FP rules, like "never use mutable variables".

However there are problems where a solution with tail recursion is actually shorter and more readable (well, readablity depends on expertise with FP) than one with a loop.

Programming with accumulators

An accumulator is a parameter of a function that is used to hold the result constructed so far. When the function executes for the last time, it returns the accumulator, possibly transformed using some other function.

Reverse list

def rev (l, acc = []) {
  match (l) {
    | x :: xs => rev (xs, x :: acc)
    | [] => acc
  }
}

System.Console.WriteLine (rev ([1, 2, 3]));

// Output: [3, 2, 1]

This function adds the first element of the input list l to the accumulator acc, and calls itself with the rest of the list. It accumulates head elements until the list is exhausted, at which point it returns acc.

Because the result list is built from the end, while the input list is traversed from the beginning, the function returns a reversed list.

This is a common property of programming with list accumulators -- the result lists are reversed.

Note how rev is called inside itself in a tail position. This means the call is transformed to a jump, so it is very fast and doesn't consume stack space.

Length of list

It is also possible to use types other than list for accumulators:

def length (l, acc = 0) {
  match (l) {
    | _ :: xs => length (xs, acc + 1)
    | [] => acc
  }
}

System.Console.WriteLine (length ([4, 44, 22]));

// Output: 3

Here we use an integer as a accumulator, to compute the length of the list.

Fold

There is one more important concept related to programming with accumulators: the fold function. Fold is a built-in method for most Nemerle collections, so it is a handy shortcut to writing certain accumulator functions yourself. We will talk about fold for lists here. The signature is:

List.FoldLeft['a,'b] (l : list ['a], ini : 'b, f : 'a * 'b) : 'b

FoldLeft takes a list of type 'a, an initial value of type 'b, a function relating 'a to 'b, and returns a value of type 'b.

The expression List.FoldLeft ([x1, x2, ..., xN], ini, f) is equivalent to: f (xN, f (... f (x2, f (x1, ini))...). So, folding is a way to efficiently nest function calls on a list of arguments. When folding, the function is applied to the first element of the list, then successively to the result of that call plus the next element, until the whole list is evaluated.

This definition in code may be easier to understand:

def fold (l, acc, f) {
  match (l) {
    | x :: xs =>
      fold (xs, f (x, acc), f)
    | [] => acc
  }
}

This has a very similar form to both functions shown above. The rev function replaces f (x, acc) with x :: acc, and length replaces it with 1 + acc. Now, because function f is a parameter to fold, you can implement both rev and length with the built-in FoldLeft method:

def rev (l) { List.FoldLeft (l, [], fun (x, xs) { x :: xs }) }
def length (l) { List.FoldLeft (l, 0, fun (_, acc) { 1 + acc }) }

One can also use the FoldLeft member of the list variant:

def rev (l) { l.FoldLeft ([], fun (x, xs) { x :: xs }) }
def length (l) { l.FoldLeft (0, fun (_, acc) { 1 + acc }) }

which makes the implementations even shorter.

Gaining proficiency with the fold function may be a little hard at first. It's good idea to take a function using direct tail recursion and try to rewrite it using fold to learn the cases where it is useful (this is only to learn things, of course there is no point rewriting already working stuff just to use fold).

Side note: redefining symbols

As we wrap up our exploration of functional performance, let's take a moment to look at the topic of symbol redefinition.

From Week 2, recall this list filter example:

def filter (l, f) {
  match (l) {
    | x :: xs =>
      def xs' = filter (xs, f);
      if (f (x)) x :: xs'
      else xs'
    | [] => []
  }
}

For clarity, we used xs' as the name for the filtered result of list xs. But, we could have simply reused xs. It would look like:

def xs = filter (xs, f);

While this reminds us of assignment in imperative programming (from this point on in the program the new xs is in scope, so it looks as if the original was changed), it is quite different from the theoretical point of view.

Note that you can rebind an identifier to a different type. For example, the fragment:

def x = [1, 2];
def x = "foo";

is perfectly valid. You could even do that with mutable variables, though a warning would be generated. However, the more common use is to rebind an identifier to the results of some computations on its previous meaning. For example:

def result = 2 * 2;
def result = $ "the result is: $result";

Both bindings of result have similar semantic meaning, though they are quite different from the compiler point of view. The second result is an entirely new entity, and not related to the first in any physical way.

The advantage of this approach is that you won't accidently use xs instead of xs' in your code. The disadvantage is that it is less clear what is going on as xs changes its meaning.

The Nemerle compiler will warn about redefinitions of mutable symbols (and redefinitions with mutable symbols), because it is not a very mainstream programming concept, and the relative likelihood of misunderstanding the difference between redefinition and assignment.

Parametric polymorphism aka generics

Generic types (and immutable collections)

Generic types allow the definition of strongly typed containers that can hold values of arbitrary types. An example of such a container is the array -- the type of elements stored must match the type of the array. Similarly, lists contain only elements that match the list's type. The difference is that for arrays and lists, the type must be specified at compile time.

Restricting a list's type improves static code safety. Instead of using a hold-anything list type, object, you can restrict your list to a single type, say list [int]. This prevents adding a string to this list, and gives the guarantee that when you read an element from it, it will be an integer. This saves you a runtime cast, which is costly and can crash the application if you don't provide extra error handling to handle miscasts.

Of course there are cases when you would want ints and strings in a single list; you can use list [object] in such cases. Then you can add anything to the list, but you need to cast the values taken from it during runtime, with the drawbacks previously mentioned.

Generics offer a middle road: they allow you to build classes that enforce type consistency internally, but don't explicitly define the type, allowing the user to use instances of the class with any arbitrary type at runtime. With generics, you can build type flexibility into your classes, while avoiding the problems associated with runtime casting.

Let's start with a naive Set implementation:

class Set ['a]
{
  mutable storage : list ['a] = [];
  public Add (e : 'a) : void
  {
    when (! Contains (e))
      storage ::= e;
  }
  public Contains (e : 'a) : bool
  {
    storage.Contains (e)
  }
}

def s1 = Set ();
s1.Add (3);
s1.Add (42);
assert (s1.Contains (3));
// s1.Add ("foo"); // error here!
def s2 = Set ();
s2.Add ("foo");
assert (s2.Contains ("foo"));

Instances of the same Set class are used at runtime to handle sets of integers and strings. Having type affinity, however, the same set can't store elements of different types.

Here, the generic parameter 'a is used to define the type of the storage list and method parameters. We talked about generic parameters in Week 2, refer to it for more details.

We can add elements to the set and check if a given element is already there. This is not a very functional approach as the set is updated in-place. So another implementation (and interface!) would be:

class Set ['a]
{
  storage : list ['a] = [];
  
  public this ()
  {
    this ([])
  }

  this (s : list ['a])
  {
    storage = s;
  }
  
  public Add (e : 'a) : Set ['a]
  {
    if (Contains (e)) this
    else Set (e :: storage)
  }

  public Contains (e : 'a) : bool
  {
    storage.Contains (e)
  }
}

def s1 = Set ();
def s1 = s1.Add (3).Add (42);
assert (s1.Contains (3));
// s1.Add ("foo"); // error here!
def s2 = Set ().Add ("foo");
assert (s2.Contains ("foo"));

A few details of this class bear some explanation:

The public constructor, this (), is used when creating instances in outside code
A private constructor, this (s : list ['a]), is used in the Add method when returning a new instance of Set containing the new element
The internal object reference, this, is also used in Add to return itself if the element already exists in the list

The line def s1 = s1.Add (3).Add (42); may also raise questions. Because . binds to the left, elements are added from left to right, with the last element added being the first in the list. It is equivalent to:

def s1 = s1.Add (3);
def s1 = s1.Add (42);

The key idea here is that we create a new set instance for each element added. When we don't explicitly save older instances, they are garbage collected, as is the case here. However, when we do save an instance of a set by maintaining a reference to it, it will not be affected by adding new elements later. There are cases when this is useful.

For example, a map holding name -> value bindings in the compiler needs to be restored when the scope ends. If we use an immutable map for this, all we have to do is save the state before we enter the scope, and restore the saved reference when we leave.

Additionally, it is easy to wrap immutable structures in a mutable object. This is exactly what we did in the first example -- we wrapped an immutable list in a mutable set. The Nemerle standard library contains a handful of immutable collections.

Generic methods

We've already seen how to define generic methods. You just need to specify the type parameters:

public Length['a] (l : list ['a], acc = 0) : int
{
  match (l) {
    | _ :: xs => Length (xs, acc + 1)
    | [] => acc
  }
}

You are required to specify the generic parameters you're going to use. So, the following declaration is not valid:

class C {
  public foo (x : list [T]) : void { }
}

but this one is valid:

class C {
  public foo[T] (x : list [T]) : void { }
}

Generic specifier

There are three kinds of generic specifiers. They are all used to specify generic parameters explicitly in the code. All three cases are optional, the compiler will always try to infer the generic parameters.

For constructors

This is a very obvious one. If you have a Set['a], like the one before, and you want to ensure a set of ints is being created use:

def s = Set.[int] ();
s.Add ("foo"); // error!
def s2 = Set.[int] ();
s2.Add (3); // OK

Note that unlike in C# you don't have to specify this type, compiler will happily do it for you in most cases.

For methods

This is also quite simple, when a method has generic parameters (like the Length method above) you can supply them:

def l1 = Length.[string] (["foo","bar"]); // OK
def l2 = Length.[string] ([1, 2, 3]); // error

For types in static member references

And this one is more complicated. The type parameters are in scope also in static members. This means the following code is valid:

class Foo['a] {
  static public bar () : void
  {
    System.Console.WriteLine (typeof ('a));
  }
}

When you try to call Foo.bar () the compiler cannot know what 'a you had in mind (in this case it will assume System.Object). However you cannot do Foo.bar.[int] () because the bar method is not generic. Therefore you can do: Foo[int].bar (). You also can do: Foo.[int].bar (), so the dot is optional (but only in this case, we know this sucks, we're working on it).

Constraints on type variables

For certain projects, like model-view-controller frameworks, it is sometimes necessary for types to be substituted with typed variables that also conform to some specific interface. We will address this issue in more detail, as it is probably new for most readers.

For example, elements stored in an ordered binary tree must provide a comparison method so they can be properly sorted. To do this, we define an appropriate interface, IComparable, and then build a variant structure, Tree, which defines a node that can store generic type 'a. Further, Tree requires that any 'a to be stored must implement the IComparable interface:

interface IComparable ['a] {
  CompareTo (elem : 'a) : int;
}

variant Tree['a] 
  where 'a : IComparable['a]
{
  | Node {
      left : Tree['a];
      elem : 'a;
      right : Tree['a];
    }
  | Tip
}

In our variant declaration, we have added a where parameter to further constrain the type of 'a that the variant will accept. Such lower bounds on type variables, where the type variables can occur in types that bound them, are called F-bounded polymorphism in type theory.

In fact the IComparable interface is already defined in the standard library, but that is beside the point.

Once we have ensured that all elements in the tree have both subtype IComparable and a generic type 'a, we can use the CompareTo method to insert an element into a tree:

module TreeOperations {
  public Insert['a] (t : Tree['a], e : 'a) : Tree['a]
    where 'a : IComparable['a]
  {
    match (t) {
      | Node (l, c, r) =>
        if (e.CompareTo (c) < 0)
          Node (Insert (l, e), c, r)
        else if (e.CompareTo (c) > 0)
          Node (r, c, Insert (r, e))
        else
          Node (r, e, l)
      | Tip =>
        Node (Tip (), e, Tip ())
    }
  }
}

Now, people familiar with C# or Java might ask, why not simply use something like:

interface IComparable {
  CompareTo (elem : IComparable) : int;
}

variant Tree
{
  | Node {
      left : Tree;
      elem : IComparable;
      right : Tree;
    }
  | Tip
}

But this is only half good. The most common use for a tree is to store elements of some specific type, for example strings. We don't want integers and strings to be stored in the same tree, for the very simple reason that we cannot compare integers with strings in a reasonable way. Well, even if we could, we plainly cannot predict what other types beside integers and strings might implement IComparable, and thus have their value passed to a string's CompareTo.

The above design makes it impossible to ensure statically whether we're using the tree with consistent types. In this scheme, when inserting nodes to the tree, we upcast all elements to IComparable, regardless of type. But we will get a runtime exception when one type's CompareTo method is passed an argument from another, incompatible type. The second drawback is that when we extract elements out of the tree, we need to downcast them to a specific type. This is another possibility for runtime errors in trees of mixed type.

So, to guarantee runtime safety in such a design, we would have to implement CompareTo for all permutations of types we wish to support, plus build a downcast that would somehow coerce all types into the target type. Not only is this a lot of work, it is not likely to work very well. F-bounded polymorphism neatly avoids this mess by ensuring not only conformance to an interface, but to an underlying type as well. This two-layered checking makes type safety possible in systems that are bounded by interface and type.

To better understand this issue, look at the following example:

interface IFrobincatable {
  Frobnicate (x : int) : void;
}

class C1 : IFrobincatable 
{
  public this () {}
  public Frobnicate (_ : int) : void {}
}

class C2 : IFrobincatable 
{
  public this () {}
  public Frobnicate (_ : int) : void {}
}

module M {
  f1['a] (o : 'a) : 'a
    where 'a : IFrobincatable
  {
    o.Frobnicate (3);
    o
  }

  f2 (o : IFrobincatable) : IFrobincatable
  {
    o.Frobnicate (3);
    C1 ()
  }

  Main () : void
  {
    def x1 = f1 (C1 ()); // x1 : C1
    def x2 = f1 (C2 ()); // x2 : C2
    def x3 = f2 (C1 ()); // x3 : IFrobincatable
    def x4 = f2 (C2 ()); // x4 : IFrobincatable
  }
}

In the Main function, calls to f1 always return a value of the type passed to it. However, f2 doesn't give this guarantee: even though the line def x4 = f2 (C2 ()) passed in a value of type C2, what it got back was C1. Even though f2 compiles, users of f2 may get runtime errors or misbehavior because their type assumptions aren't correctly supported.

Excercises

One

Write a version of the functional Set class above using (unbalanced) trees, like the ones we've seen last week. Use the System.IComparable [T] interface.

Two

Write the following tail recursive functions:

Sum (l : list [int]) : int returning sum of elements of the list
RevMap ['a, 'b] (l : list ['a], f : 'a -> 'b) : list ['b] doing exactly what Map does, but returning reversed list
RevFilter ['a] (l : list ['a], f : 'a -> bool) : list ['a] (similar)
Map ['a, 'b] (l : list ['a], f : 'a -> 'b) : list ['b] using RevMap and Rev (remember to ensure f's are called on the list in the same order as with regular map)

Three

Rewrite the functions above (except for Map) to use FoldLeft.

ToC | Week 5