Nemerle for OOP Programmers Week 2 - vilinski/nemerle GitHub Wiki

This week we will learn some basics of functional programming. You're not assumed to know anything about this topic, as we will go into much detail here. There is a lot of new material this week -- this is where we leave the safe and known OO land.

Table of Contents Introduction to Functional Programming Local functions Passing functions as parameters Lexical scoping Lists Pattern matching The wildcard pattern Literal patterns List patterns Variable patterns Simple list manipulation Filter Side note: anonymous functions First N Types and Generics Parametric types Function types Generic functions Side note: names of generic parameters Exercises Exercise 1 Exercise 2: iterators Exercise 3: flatten Exercise 4: SW again

Introduction to Functional Programming

The basic idea of Functional Programming (FP) is to consider the entire program as an expression to be evaluated, and not to think about computation as being performed step by step and updating memory. So the functional program in general does not update this and that memory location, but rather creates new objects (this is why virtually all functional languages rely on the garbage collector.) This is how functions work in math -- they take values and return values, but don't change anything in particular.

There are pure functional languages, which prohibit any change to existing objects (in other words both fields of objects and variables are always immutable.) They are also very often lazy, which means parameters passed to a function are not evaluated until the function actually needs them (which may seem like an optimization to you, unless you've ever tried Haskell :-).

Nemerle is neither pure nor lazy, but it does support the key FP concept of immutability.

The second key concept is the ability to use functions as data, that is, to pass functions as parameters and store them in data structures. This feature is known as first class functions. Nemerle provides this.

There are other related concepts -- for example type inference and parametric polymorphism (generics) -- which are very often used in functional languages. Variant data structures and pattern matching are also notable concepts. We will find out more about these later, but for now we will focus on core features of FP.

To recap, the basic ideas behind FP in Nemerle are that:

You can make fields of persistent data structures immutable (this is even the default, last week you saw that to make a field updatable, you need to use the mutable modifier).
You can define your temporary local variables to be immutable (def is shorter than mutable isn't it? :-)
You can use functions as values, as well as other FP goodies.

OK, I can see this is getting a little boring, so let's proceed with some examples.

Local functions

In addition to defining methods in classes or modules, one can define some small helper functions inside other functions. For example:

class LocalFunctionExample {
  public static Main () : void
  {
    def say_hello () : void {
      System.Console.WriteLine ("Hello!");
    }

    say_hello ();
    say_hello ();
  }
}

It shouldn't come as a big surprise to see "Hello!" written twice by this program. The say_hello local function is defined inside Main, and can only be referenced within Main (it's local to it, right?).

You can also skip Main altogether, as we've seen earlier:

def say_hello () : void {
  System.Console.WriteLine ("Hello!");
}

say_hello ();
say_hello ();

Moreover, because say_hello is local, type inference is supported, so we can skip the void type annotation:

def say_hello () {
  System.Console.WriteLine ("Hello!");
}

say_hello ();
say_hello ();

Passing functions as parameters

Once you have defined a local function, you can pass it to another function as a parameter. For example we can have a special function that runs the passed function f twice:

def run_twice (f) {
  f ();
  f ();
}

We didn't specify a type for f, because type inference takes care of this. Nemerle sees that f is used as a function in the body of run_twice.

Now, we can define a function say_hello, and pass it to run_twice:

def say_hello () {
  System.Console.WriteLine ("Hello!");
}

run_twice (say_hello);

Lexical scoping

Local functions can see all the identifiers defined in their parent function. For example:

def hello = "Hello!";
def say_hello () {
  System.Console.WriteLine (hello);
}

say_hello ();
say_hello ();

When a local function references an identifier, it sees the nearest one defined before it (in program text). This property is called lexical scoping. So here, when say_hello refers to hello, it gets the closest previous definition: "Hello!". Note that other hello identifiers may be defined in other contexts and used without conflict. The rules of lexical scoping keep them separate.

Lists

Lists are a very important data structure in FP, and are used even more often than arrays in imperative programming.

Lists are constructed out of list cells. Each cell has a head (the element held in the cell) and tail -- a pointer to another cell. There is a special cell called nil which serves as an empty list. List cells cannot be changed once they are constructed. On the one hand, it means you cannot update the cell number or order of a list once it is constructed, but on the other hand mischief can't be done to a list that you pass to others.

While lists cannot be changed once constructed, they can share the tail. For example:

       42
       |
       V
  10-->34-->27-->nil

Here the lists [10, 34, 27] and [42, 34, 27] share the list [34, 27]. Because lists are immutable this sharing cannot do any harm. All Nemerle lists share the empty list.

The list constructor is ::, while nil is referred to as []. Therefore 1 :: [] means we want a list cell with 1 in the head and the empty list in the tail, which is another way of saying we want a single element list with 1. After this object is constructed we can make a two element list:

def l1 = 1 :: [];
def l2 = 42 :: l1;

Here, we chain list l1 to l2 to make a longer list. Of course, you can define this list in one expression: 42 :: 1 :: []. If you don't like all those colons, there is shorter notation: [42, 1]. This short form is convenient to use in pattern matching.

Pattern matching

The basic idea behind a matching like this:

match (some_value) {
  | pattern1 => expr1
  | pattern2 => expr2
  ...
}

is to inspect some_value, check if it looks like pattern1, and if so, evaluate expr1 return it as the result. Patterns are checked until a match is found, and the corresponding expression is returned. If all patterns fail to match (which is uncommon, because patterns should be exhaustive, that is, cover all possible values), an exception is thrown.

There are several kinds of patterns, we shall briefly introduce some of them here.

The wildcard pattern

Wildcards are written as _ and match any value, including the null pointer (don't confuse this with the nil list cell). It's like the default: case in switch in C-like languages.

Literal patterns

Each literal (like 42, 3.1415, "foobar", and null) is a valid pattern. This allows one to implement switch functionality with match:

match (some_number) {
  | 1 => "one"
  | 2 => "two"
  | 3 => "three"
  | _ => "a lot"
}

List patterns

When matching list patterns, both the :: constructor and the [1, 2] syntax are available:

match (some_list) {
  | [42, 42] => "two forty-two"
  | [42, _] => "forty-two on first position of two-element list"
  | [_, 42] => "forty-two on second position of two-element list"
  | 42 :: _ => "forty-two on first position"
  | _ :: 42 :: _ => "forty-two on second position"
  | [] => "an empty list!"
  | _ => "another list"
}

Note that the ordering of pattern matching clauses matters, because it is possible they can overlap (here the first case overlaps the second and third, while the second overlaps the fourth, and so on). In the case of multiple possible matches, the first match wins.

Variable patterns

The variable pattern matches any value, just like the wildcard pattern, but additionally binds the value matched to a specified identifier. The identifiers used have to start with a lowercase letter (in order to avoid conflicts with variant options). They are always immutable (there is no mutable modifier allowed here).

For example a function to display all elements of a list would be:

using System.Console;

def display (l) {
  match (l) {
    | head :: tail =>
      Write ($ "$head, ");
      display (tail)
    | [] =>
      WriteLine ("end")
  }
}

display ([1, 2, 3])

// Output: 1, 2, 3, end

The display function takes a list and matches it against the list head :: tail. If the match list is not empty, the element of the current cell is bound to head and the rest of the list is bound to tail. The function prints the value of head and recursively calls itself to print out the rest of the list.

When the function hits the end of the list, nil [] is matched, end is written, and the function terminates.

Simple list manipulation

We now know the basic tools for list manipulation -- list construction and matching. Therefore we can proceed with some example functions.

Filter

filter is a function that is given a list and a functional value as a parameter. It will return the list of elements from the original list for which the functional value returned true.

For example:

def is_even (x) { x % 2 == 0 }
System.Console.WriteLine (filter ([1, 2, 3, 4], is_even));

should print out [2, 4].

In Nemerle, all basic data structures have the ToString method overridden to return contents of the data structure in a human readable form, therefore WriteLine does a good job displaying the list.

So we're back to the filter code:

def filter (l, f) {
  match (l) {
    | x :: xs =>
      def xs' = filter (xs, f);
      if (f (x)) x :: xs'
      else xs'
    | [] => []
  }
}

First, the function does a match on the list argument. For the non- empty list case the head is bound to x and the tail to xs. It is a common convention to call the head x or y and the rest of the list xs or ys.

Once the head and tail are bound the function calls itself to filter the rest of the list. The result is put in value xs'. In Nemerle the apostrophe (') character is a valid part of identifier names. A transform of x is often called x' (say x-prime); this convention comes from mathematics.

Finally, depending on the return value of f (x), the function adds element x to the result, or skips it. If you're looking for a return statement here, you won't find it! As you will recall from Week 0, Nemerle functions simply return the the last value computed, which in this example is either x :: xs' or xs'.

Of course when the list is empty, filtering it yields an empty list.

Side note: anonymous functions

While testing the filter example, you might want to use a shorthand that allows you to filter on any arbitrary expression on the fly. To that end, it is possible to rewrite the even filter as:

System.Console.WriteLine (filter ([1, 2, 3, 4], fun (x) { x % 2 == 0 }));

If we use a local function just once, it is possible to make it anonymous. The expression:

fun (some_parms) { some_expression }

is equivalent to:

{
  def tmp (some_parms) { some_expression }
  tmp
}

that is, defining a local function and returning it right away. Anonymous functions allow us to skip the superflous naming of truly one time, 'throw-away' functions.

First N

The first_n function returns first N elements of the list passed.

def first_n (l, n) {
  if (n == 0)
    []
  else
    match (l) {
      | x :: xs => x :: first_n (xs, n - 1)
      | [] => throw System.ArgumentException ("List too short")
    }
}

System.Console.WriteLine (first_n (["foo", "bar", "baz"], 2))

// Output: ["foo", "bar"]

Note how both arguments to first_n change as the function executes: it recursively calls itself for the rest of the list with the counter decremented. It is also interesting to see how the function return value includes a call to itself.

Another new thing here is the throw expression. It raises an exception passed as an argument. System.ArgumentException is a class from BCL. Handling exceptions will be discussed later (next week probably).

Types and Generics

We have used local functions which rely on type inference throughout this lesson. However type inference is not available for global functions; therefore it is good to know the exact way things get typed in Nemerle.

Parametric types

The list type is parametric (generic), so a list of integers has type list [int] and a list of strings has type list [string]. Note how square brackets ([]) are used for specifying the argument to the type. This is unlike C++, Java and C#, where the triangle brackets (<>) are used.

There are other generic types provided by the BCL and standard Nemerle library. For example Nemerle.Collections.Hashtable [K, V] is a mutable collection mapping values of type K to values of type V (it inherits from System.Collections.Generic.Dictionary [K, V]).

Another important example of parametric types are arrays. Array is a list-like type, so we felt there is no reason to make a special case of it. The syntax for array types in Nemerle is array [''type''], where type is the type of the elements in the array.

Function types

The type of a function taking a string and returning an int, like the int.Parse method, is: string -> int. The -> symbol is called the arrow, so you can read the previous type as string arrow int. When a function returns nothing, we use the void type. For example the System.Threading.Thread.Sleep method, which sleeps for n milliseconds has the type int -> void. Similarly, when a function takes no arguments, we use the void type before the arrow. So the type of Main is void -> void.

We will talk about types of multi-argument functions next week.

We now know how to write basic function types, so we can now write a fully typed version of filter:

def filter (l : list [int], f : int -> bool) : list [int] {
  match (l) {
    | x :: xs =>
      def xs' = filter (xs, f);
      if (f (x)) x :: xs'
      else xs'
    | [] => []
  }
}

The body is exactly the same as previously, but we have provided explicit types for the arguments.

OK, the question here is: why type the function as int? Because we intend to use it with a list of integers. Fine, you might respond, but why limit our function to just integers? As you correctly observe, the structure of the filter function would work equally well for, say, a list of strings or a list of FooBars, if only there was a way to type it less restrictively. This leads us naturally to the idea of generic functions.

Generic functions

What we would really like for our filter function is a generic type, one that would allow the function to work on all kinds of data. Unfortunately, because of subtyping and related theoretically difficult problems, the type inferencer in Nemerle cannot infer generic types for you. No worries, you can still have generics, you just have to specify them yourself, using a generic type declaration for your function:

def funname[''gentype0''] (parmlist) : funtype {body}

Here, we define a generic type parameter, ''gentype0'', to be used in the function. Think of ''gentype0'' as a placeholder for any type. Substitute ''gentype0'' any place you would use a single specific type in a function you want to make generic.

With this knowledge, let's make a generic filter:

using System.Console;

def filter[T] (l : list [T], f : T -> bool) : list [T] {
  match (l) {
    | x :: xs =>
      def xs' = filter (xs, f);
      if (f (x)) x :: xs'
      else xs'
    | [] => []
  }
}

WriteLine (filter ([1, 2, 3, 4], fun (x) { x % 2 == 0 }));
WriteLine (filter (["foo", "bar", "foox"], fun (x) { x.StartsWith ("foo") }));

/* Output:
[2, 4]
["foo", "foox"]
*/

The new, generic definition of filter can be read: for any type T, the first parameter has the type list [T], the second T -> bool and the result type is list [T]. This allows a list of integers to be filtered as easily as a list of strings. However, the function type passed has to match the type of the list (try passing the even lambda expression and a list of strings). This is because we use the same generic type parameter T for list l and function f.

Using the same syntax, you can define generic parameter types for these contexts:

class classname[''gentype0''] {body}
public methodname[''gentype0''] (parmlist) : methodtype {body}
variant varname[''gentype0''] {body} (you will see more on variants in Week 3)

Because generic parameter declarations use the compact list syntax, you can also declare multiple generics:

class classname[''gentype0'', ''gentype1'', ... , ''gentypeN''] {body}

The use of generics in FP can bring a level of abstractness and compactness to code that can be hard to match using the object-oriented paradigm by itself.

Side note: names of generic parameters

There is a convention, originating from ML, to call type parameters 'a, 'b, 'c... which reads alpha, beta, gamma... As Nemerle supports apostrophes in identifiers, you're free to follow it.

In C++ there is a convention to call generic type parameters T, while in C# one mostly uses descriptive names (something like ElementType). Nemerle is still new, so perhaps a generic naming convention for it will emerge, and come into common use. The standard library uses the ML convention.

Exercises

In the first few exercises you should write a handful of short list manipulation functions. This is widely recognized as a good introduction to FP and I couldn't come out with anything better.

The functions should throw an exception for invalid data (empty lists, element not found and so on). System.ArgumentException will be OK.

Functions should come with one or two example invocations.

Exercise 1

head[T] (l : list [T]) : T returning first element of l
tail[T] (l : list [T]) : list [T] returning l without the first element
length[T] (l : list [T]) : int computing the length of l
max (l : list [int]) : int returning maximal element of given list of integers
last[T] (l : list [T]) : T returning the last element of a list
last_n[T] (l : list [T], n : int) : list [T] returning last n elements
nth[T] (l : list [T], n : int) : T returning nth element of l
append[T] (l1 : list [T], l2 : list [T]) : list [T] returning concatenation of l1 and l2 (please don't use the overloaded + operator that does exactly that ;-)
range (l: list[int], from : int, to : int) : list [int] returning list of all integers between from and to inclusive

Exercise 2: iterators

iter[T] (l : list [T], f : T -> void) : void which should apply given function f to each element of l and ignore the results (as there are none)
map[A, B] (l : list [A], f : A -> B) : list [B] applying specified function f to each element of the list l and returning lists of results of applications
forall[T] (l : list [T], p : T -> bool) : bool returning true if given predicate p is true for all elements of the list l of l
find[T] (l : list [T], p : T -> bool) : T returning first element of l for which the condition p is true

Exercise 3: flatten

Write the function: flatten[T] (l : list [list [T]]) : list [T] concatenating given lists together. For example:

System.Console.WriteLine (flatten ([[1, 2], [3, 4], [5]]))

should output [1, 2, 3, 4, 5].

Exercise 4: SW again

Take the Star Wars example from the previous week, possibly enriched with your Imperial Trooper and modify it to hold a lists of persons instead of 3 persons.

ToC | Week 3