Grok Base structure of programs

This page is a part of the Grokking Nemerle tutorial.

This chapter explains what a basic program looks like in Nemerle. This is the longest and probably the hardest lesson, so don't worry if you don't understand everything at once.

Table of Contents

Running the compiler
Methods
Fields
Expressions
A simple function
Imperative loops and value definitions
Local functions
Type inference
External functions
Output formatting
String interpolation
Arrays
Type enforcement and variants of `if` expression
Multidimensional arrays
Miscellaneous information
Exercises -- List 1

Running the compiler

In order to run programs written in Nemerle you need to compile them to .NET bytecode first. This is done with the ncc (Nemerle Compiler Compiler) command. Assuming Nemerle is installed properly on your system, you need to do the following:

write the program text with your favorite text editor and save it as a file with extension .n, for example myfile.n
run the Nemerle compiler by typing ncc myfile.n
the output goes to out.exe
run it by typing out (Windows) or mono out.exe (Linux)

You cannot define functions or values at the top level in Nemerle. You need to pack them into classes or modules. For C#, Java and C++ programmers: a module is a class with all members static; no instance of a module can be created.

For now you should consider the module to be a form of packing related functions together.

class SomeClass
{
   //  ... some comment ...
   some_field : int;

   /*  ... some ...
       ... multiline comment ... */ 
   other_field : string;
}

The above example also introduces two kinds of comments used in Nemerle. The comments starting with // are active until the end of line, while the comments starting with /* are active until */. This is the same as in C (well, C99).
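For comparison with the class above, here is a minimal sketch of a module that can be compiled and run as a complete program. The names Greeter and greeting are made up for illustration; the method syntax used here is explained in the following sections.

```nemerle
// A module is a class whose members are all static,
// so Main needs no explicit 'static' modifier here.
module Greeter
{
  // Hypothetical helper packed into the module.
  greeting (name : string) : string
  {
    "Hello, " + name + "!"
  }

  public Main () : void
  {
    System.Console.WriteLine (greeting ("world"))
  }
}
```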

Methods

Classes can contain methods (functions) as well as fields (values). Both kinds of members can be prefixed with access attributes:

public defines a method or a field that can be accessed from outside the module.
private defines a member that is local to the module. This is the default.
internal defines a member that is local to a given library or executable.

In the method declaration header you first write modifiers, then the name of the method, then parameters with a type specification and finally a type of values returned from this function.

Typing constraints are in general written after a colon (:).

class Foo
{
   public SomeMethod () : void
   {
     //  ...
   }
   private some_other_method (_x : int) : void
   {
     //  ...
   }
   private Frobnicate (_x : int, _y : string) : int
   {
     //  we return some int value here
     0
   }
   internal foo_bar () : void
   {
     //  ...
   }
}

Fields

Fields define global values inside the module.

Fields accept the same access attributes as methods. However, there is one additional very important attribute for fields -- mutable.

By default fields are read only, that is they can be assigned values only in the module initializer function (codenamed this; we will talk about it later). If you want to assign values to fields in other places you need to mark a field mutable.

class Bar
{
   public mutable qux : int;
   private quxx : int;
   mutable bar : float;
}
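The difference between read-only and mutable fields can be sketched as follows. The class Counter and its members are hypothetical, and the example assumes the initializer (this) is written like an ordinary method named this, which is covered later in the tutorial.

```nemerle
class Counter
{
  // Read-only: can be assigned only inside the initializer.
  private step : int;
  // Mutable: can be assigned in any method of the class.
  private mutable total : int;

  public this (step_size : int)
  {
    step = step_size;
    total = 0;
  }

  public Advance () : int
  {
    total = total + step;
    total
  }
}
```

Assigning to step inside Advance would be rejected by the compiler, because step is not marked mutable.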

Expressions

Nemerle makes no distinction between an expression and a statement; there are only expressions. The idea is to easily operate on values, which every expression returns as a result of its computation. The exception is the throw keyword, which breaks control flow by throwing an exception (the breaking statements were adopted in Nemerle using the block construct, which makes jumps localised and fit well into the expressions-only concept).

The { } sequence is also an expression, and it has a value: the value of the last expression in it. This eliminates the need for a return statement in functions. The value returned from a function is the last value computed in its body. You can think of it as an implicit return at the end of each function. This is the same as in ML languages.

Despite all that, the most basic example looks almost like in C#. The entry point for a program is a function called Main. It is also possible to take command line arguments and/or return integer return code from the Main method, consult .NET reference for details.

Note that, unlike in ML, a function call requires ().

class Hello
{
  public static Main () : void
  {
    System.Console.WriteLine ("Hello cruel world!")
  }
}
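As a rough sketch of the other form of Main mentioned above, the following variant takes the command line arguments and returns an integer exit code. The class name HelloArgs and the usage message are made up; arrays are explained later in this chapter.

```nemerle
class HelloArgs
{
  // args holds the command line arguments; the int result
  // becomes the exit code of the process.
  public static Main (args : array [string]) : int
  {
    if (args.Length > 0) {
      System.Console.WriteLine ("Hello, " + args[0]);
      0
    } else {
      System.Console.WriteLine ("usage: out NAME");
      1
    }
  }
}
```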

A simple function

However, the following example (computing Fibonacci sequence) looks somewhat different. You can see the usage of a conditional expression. Note how the value is returned from the function without any explicit return statement.

Note that this example and the following ones are not complete. To be compiled they need to be packed into the module, equipped with the Main function and so on.

fib (n : int) : int
{
   if (n < 2)
     1
   else
     fib (n - 1) + fib (n - 2)
}

Imperative loops and value definitions

It is possible to use regular imperative loops like while and for. Both work as in C. In a for loop the first expression is evaluated once before the loop, the second expression is the condition (the loop is executed as long as the condition holds) and the last expression is evaluated at the end of each iteration.

However, the most important thing about this example is the variable definition used here. Variables (values that can be changed) are defined using the mutable expression. You do not specify a type, but you do specify an initial value. The type of the defined variable is inferred from the initial value (for example the type of 1 is obviously an int). The variable introduced with mutable is visible until the end of the current sequence. Lexical scoping rules apply: the definition of a new value with the same name hides the previous one.

A sequence is a list of expressions enclosed in braces ({}) and separated with semicolons (;). An optional semicolon is allowed at the end of the sequence. Note that the function definition also introduces a sequence (as the function body is written in {}).

mutable defines variables that can be updated using the assignment operator (=). In contrast def defines values that cannot be updated -- in our example we use tmp as such value.

Older versions of the Nemerle compiler required a semicolon after the closing brace inside a sequence. This is no longer mandatory: note that for expression's closing brace is not followed by a semicolon. In some rare cases this can introduce compilation errors -- remember that you can always put a semicolon there!

fib (n : int) : int
{
   mutable last1 = 1;
   mutable last2 = 1;

   for (mutable cur = 1; cur < n; ++cur) {
     def tmp = last1 + last2;
     last1 = last2;
     last2 = tmp;
   }

   last2
}

In this example we see no gain from using def instead of int as you would do in C# (both are 3 characters long :-). However, in most cases type names are far longer. For example:

FooBarQuxxFactory fact = new FooBarQuxxFactory (); // C#
def fact = FooBarQuxxFactory (); // Nemerle
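The lexical scoping rule described earlier (a new definition with the same name hides the previous one) can be illustrated with a small sketch. The function name shadowing is hypothetical, and the example assumes the compiler accepts a second def of the same name within one sequence, as the rule suggests.

```nemerle
shadowing () : void
{
  def x = 1;          // x is an int here
  System.Console.WriteLine (x);
  def x = "one";      // a new x of type string hides the previous one
  System.Console.WriteLine (x)
}
```

Note that the second def does not update the first value; it introduces a separate one that happens to share the name.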

Local functions

Local functions are functions defined within other functions. For this reason they are also called nested functions.

There are three reasons for defining local functions. The first one is not to pollute the class namespace. We have the private keyword for that, so this is not a particularly strong reason.

The second one is that local functions can access variables defined in an outer function. This allows for somewhat different (better?) code structuring than in C. You can have several variables and local functions using them defined in a function.

The most important reason for local functions is, however, the fact that you can pass them to other functions so that they can be run from there, implementing for example iterators for data structures. This is explained in more detail later.

Local functions are defined just like other values with the def keyword. A local function definition looks similar to the global one (apart from the lack of access modifiers, the leading def and the trailing semicolon).

sum_cubes (v1 : int, v2 : int, v3 : int) : int
{
   def cube (x : int) : int {
     x * x * x
   }

   cube (v1) + cube (v2) + cube (v3)
}

Using local functions is one of the ways of implementing loops in Nemerle.

module Sum {
   public Main () : void
   {
     def sum (x : int) : int {
       if (x <= 0)
         0
       else
         x + sum (x - 1)
     }

     System.Console.WriteLine ("Sum of numbers from "+
                               "20 to 0 is: {0}", sum (20))
   }
}

Notice how the local function is used to organize the loop. This is typical for Nemerle. It is therefore quite important for you to grok this concept. Some external links -- tail recursion, recursion.

Here goes another example of a loop constructed with a local function.

fib (n : int) : int
{
   def my_loop (last1 : int, last2 : int, cur : int) : int {
     if (cur >= n)
       last2
     else
       my_loop (last2, last1 + last2, cur + 1)
   }

   my_loop (1, 1, 1)
}

If you are concerned about performance of such form of writing loops -- fear you not. When the function body ends with a call to another function -- no new stack frame is created. It is called a tail call. Thanks to it the example above is as efficient as the for loop we have seen before.
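Note that the earlier sum example is not written in this style: in x + sum (x - 1) the addition still has to happen after the recursive call returns, so that call is not the last thing the function does. A possible tail-recursive variant, sketched here with the made-up names sum_to, loop and acc, carries the partial result in an extra parameter:

```nemerle
sum_to (n : int) : int
{
  // acc accumulates the sum so far; the recursive call is the
  // last thing done in the else branch, so it is a tail call.
  def loop (x : int, acc : int) : int {
    if (x <= 0)
      acc
    else
      loop (x - 1, acc + x)
  }

  loop (n, 0)
}
```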

Mutually recursive functions are defined using special syntax:

f () : void
{
   def a () {
      b ();
   }
   and b () {
      a ();
   }

   a ();
}

Type inference

You can specify types of parameters as well as return types for local functions. However, in some (most?) cases the compiler can guess (infer) the types for you, so you can save your fingers by not typing them. This is always safe, that is, the meaning of the program should not in principle change if type annotations are added.

Sometimes the compiler is unable to safely infer the type information, in which case an error message will be generated to indicate that type annotations are required.

In the following example we have omitted the return type, as well as types of the parameters. The compiler can figure out the types, because literals 1 are used in a few places, which are of the type int.

fib (n : int) : int
{
   def my_loop (last1, last2, cur) {
     if (cur >= n)
       last2
     else
       my_loop (last2, last1 + last2, cur + 1)
   }

   my_loop (1, 1, 1)
}

External functions

One of the best things about Nemerle is that you can use rich class libraries that come with the Framework as well as the third party libraries. Links to the documentation about .NET class libraries can be found here.

New objects are constructed by simply naming the type and supplying arguments to its constructor. Note that unlike in C# or Java you don't use the new keyword to construct new objects. Methods of objects can be invoked later, using the dot operator (some_object.SomeMethod (some_argument)). Static methods are invoked using the NameSpace.TypeName.MethodName () syntax. We will talk more about this object oriented stuff later.

Here is an example:

the_answer_to_the_universe () : int
{
   // Construct new random number generator.
   // This is object construction, explained in more detail later
   def r = System.Random ();

   // Return new random number from [0, 99] range.
   // This is again object invocation, explained later
   r.Next (100)
}

For more information about The answer to the Ultimate Question of Life, the Universe and Everything please visit this site. Please note that this program run on a computer not as powerful as Deep Thought will be right only in 1% of cases.

Output formatting

There are several methods of output formatting in Nemerle.

The most basic .NET method of displaying text on the screen is the System.Console.WriteLine method. In its simplest form it takes a string to be displayed. If, however, you supply it with more than one argument, the first one is treated as a format string and occurrences of {N} are replaced with the value of the (N+2)-th parameter (counting from one), so {0} refers to the second argument.

print_answer () : void
{
  def the_answer = the_answer_to_the_universe ();
  System.Console.WriteLine ("The answer to the Ultimate "+
                            "Question of Life, the "+
                            "Universe and Everything "+
                            "is {0}", the_answer)
}
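A format string can contain several placeholders. The following sketch (the function name print_range is made up) shows how the numbering lines up with the arguments:

```nemerle
print_range () : void
{
  // {0} is replaced by the 2nd argument, {1} by the 3rd.
  System.Console.WriteLine ("Random value between {0} and {1}",
                            0, 99)
}
```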

There are however other means, for example the Nemerle.IO.printf macro that works much like the printf(3) C function or Printf.printf in OCaml. (Well it doesn't yet handle most formatting modifiers, but patches are welcome ;-)

printf_answer () : void
{
  def the_answer = the_answer_to_the_universe ();
  Nemerle.IO.printf ("The answer is %d\n", the_answer);
}

String interpolation

If you have ever programmed in Bourne shell, Perl, or PHP then you may know about string interpolation. These languages take a string "answer is $the_answer" and replace $the_answer with the value of the variable the_answer. The good news is that we have this feature in Nemerle:

interpolate_answer () : void
{
  def the_answer = the_answer_to_the_universe ();
  Nemerle.IO.print ("The answer is $the_answer\n");
}

It comes in two flavors -- the first one is the Nemerle.IO.print (without f) function -- it does string interpolation on its sole argument and prints the result on stdout.

The second version is the $ operator, which returns a string and can be used with other printing mechanisms:

interpolate2_answer () : void
{
  def the_answer = the_answer_to_the_universe ();
  System.Console.WriteLine ($ "The answer is $the_answer");
}

Both versions support special $(...) quotations allowing any code to be expanded, not just variable references:

interpolate3_answer () : void
{
  System.Console.WriteLine (
    $ "The answer is $(the_answer_to_the_universe ())");
}

The $-expansion works on all types.

Arrays

The type of array of T is denoted array [T]. This is a one-dimensional, zero-based array. There are two special expressions for constructing new arrays: array ["foo", "bar", "baz"] will construct a 3-element array of strings, while array (100) creates a 100-element array of something. The something is inferred later, based on how the array is used. The empty array is initialized with 0, 0.0, etc. or null for reference types.

The assignment operator (=) can also be used to assign elements in arrays.

Note the ar.Length expression -- it gets the length of array ar. It looks like a field reference in an array object but under the hood it is a method call. This mechanism is called a property.

Our arrays are subtypes of System.Array, so all methods available for System.Array are also available for array [T].

 class ArraysTest {
   static reverse_array (ar : array [int]) : void
   {
     def loop (left, right) {
       when ((left : int) < right) {
         def tmp = ar[left];
         ar[left] = ar[right];
         ar[right] = tmp;
         loop (left + 1, right - 1)
       }
     }
     loop (0, ar.Length - 1)
   }

   static print_array (ar : array [int]) : void
   {
     for (mutable i = 0; i < ar.Length; ++i)
       Nemerle.IO.printf ("%d\n", ar[i])
   }

   static Main () : void
   {
     def ar = array [1, 42, 3];
     print_array (ar);
     Nemerle.IO.printf ("\n");
     reverse_array (ar);
     print_array (ar);
   }
 }

Type enforcement and variants of `if` expression

One interesting thing about this example is the usage of the type enforcement operator -- colon (:). We use it to enforce the type of left to be int, which simply gives a hint to the compiler that we expect a given expression in a given place to have a given type.

We could have as well written def loop (left : int, right) {, to make the type of the parameter explicit. These are simply two ways to achieve the same thing.

Another interesting thing is the when expression -- it is an if without else. For symmetry purposes we also have if without then called unless. As you might have already noted unless is equivalent to when with the condition negated.

In Nemerle the if expression always needs to have the else clause. It's done this way to avoid stupid bugs with a dangling else:

// C#, misleading indentation hides real code meaning
if (foo)
   if (bar)
     m1 ();
else
   m2 ();

If you do not want the else clause, use when expression, as seen in the example.
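A short sketch of when and unless side by side may help. The function name report and the messages are made up for illustration:

```nemerle
report (errors : int) : void
{
  // Runs the body only if the condition holds.
  when (errors > 0)
    System.Console.WriteLine ("there were errors");

  // Runs the body only if the condition does NOT hold.
  unless (errors > 0)
    System.Console.WriteLine ("all fine")
}
```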

Multidimensional arrays

Multidimensional arrays (unlike arrays of arrays) are also available. The type of a two-dimensional integer array is array [2, int], while the expression constructing it is array .[2] [[1, 12], [3, 4]]. The "2" in both examples refers to the array being two-dimensional. Elements are accessed using the comma syntax: t[1,0].

Both single- and multidimensional empty arrays can be created with array (size) or array (size1, size2, ...). Empty means that they contain nulls or zeros.

Note that there exist two ways of obtaining multidimensional storage. The first one is the multidimensional array mentioned above (type array [N, int]). The second one is an array of arrays (type array [array [int]]), which is just a standard array whose elements are other arrays. This is useful, because you can initialize only some parts of this array, while a multidimensional array allocates all the memory at once. Elements of such an array of arrays are accessed by x[2][3].
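The two kinds of storage can be put next to each other in a small sketch. The module name Grids and the value names are hypothetical, and the last definition assumes the element type of an empty multidimensional array can be inferred from a later assignment, in the same way as described for one-dimensional arrays.

```nemerle
module Grids
{
  public Main () : void
  {
    // Two-dimensional array: all cells are allocated at once.
    def grid = array .[2] [[1, 12], [3, 4]];
    System.Console.WriteLine (grid[1, 0]);   // second row, first column

    // Array of arrays: every row is a separate array object.
    def rows = array [array [1, 12], array [3, 4]];
    System.Console.WriteLine (rows[1][0]);   // same position, other syntax

    // Empty 2 x 3 array; the element type is inferred from usage.
    def empty = array (2, 3);
    empty[0, 0] = 7;
  }
}
```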

Miscellaneous information

The equality predicate is written == and the inequality is != as in C.

The boolean operators (&&, || and !) are all available and work the same as in C.

Exercises -- List 1

1.1. Write a program that prints out to the console:

1 bottle of beer.
2 bottles of beer.
3 bottles of beer.
...
99 bottles of beer.

With an appropriate amount of beer instead of .... The program source code should not exceed 30 lines.

1.2. Implement the bogo sort algorithm for an array of integers. (WARNING: you should not implement the "destroy the universe" step.) Test it by sorting the following array: [4242, 42, -42, 31415].

1.3. As 1.2, but don't use the imperative loops -- rewrite them with recursion.