Code Planting Tutorial Part 5 - GetPoplog/Seed GitHub Wiki

Extending Pop-11 Syntax using Code-planting

One of the most important uses of code-planting is to extend Pop-11 by adding new 'syntax words'. In this part we will introduce the basic mechanism and in the follow-up article we will use it to serious effect and implement Python-like generators.

For this tutorial it helps to know:

  • Assigning to _ discards the value on top of the stack.
  • The <> operator concatenates two sequences (strings, lists, vectors) together to create a new sequence.
  • The meanings of syntax descriptors like infix, prefix and postfix (e.g. '+' is infix but '-' is infix and prefix).
  • Lists in Pop-11 can hold a fixed set of items, but they can also be dynamic lists whose items are generated on demand, which makes them very useful for representing stream-like data.
  • Dynamic localisation (or dlocalisation) is a form of localisation to the duration of a procedure call that is heavily used in Pop-11.
    • It is usually applied to global variables and the effect is that, on procedure entry the current value is saved, and on exit the saved value is restored. The net effect is that the value of the variable is localised to the dynamic scope of the procedure - hence the name.
    • Dlocalisation can also be applied to active variables such as -current_directory- - and many other things.
    • Dlocalisation also works with coroutines. For the details see HELP DLOCAL.
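
As a quick refresher on the points above, here is a minimal sketch. The names depth and show_nested are hypothetical and exist only for this illustration.

```pop11
;;; Assigning to _ discards the value on top of the stack.
'unwanted' -> _;

;;; <> concatenates two sequences of the same kind into a new one.
'foo' <> 'bar' =>
;;; ** foobar
[1 2] <> [3 4] =>
;;; ** [1 2 3 4]

;;; Dynamic localisation: depth is a hypothetical global variable.
vars depth = 0;

define show_nested();
    dlocal depth = depth + 1;   ;;; old value saved on entry, restored on exit
    depth =>                    ;;; prints the localised value
enddefine;

show_nested();
depth =>                        ;;; the saved value is back in force here
```

Inside show_nested the variable holds 1; once the call returns it is 0 again, which is the save-and-restore behaviour described in the bullets.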

Problem - add let syntax to Pop-11

Plenty of functional programming languages introduce local definitions using a syntax a bit like this:

let x = 100 in f( x, x ) endlet

We will demonstrate how to add this syntax into Pop-11. This is a nice easy case because it is an example of 'outfix' syntax, where the form is bracketed by a start keyword (let) and an end keyword (endlet). By contrast the other forms - prefix, infix and postfix - are a bit more complicated.

New syntax words are usually added to Pop-11 using define syntax. It works like this:

define syntax let;
    ... CODE PLANTING GOES HERE ...
enddefine;

We also want to state that endlet is a kind of 'punctuation' word. The special syntax for this looks like:

constant syntax endlet;

Strategy

So now our problem is to fill in the contents of our define syntax let. Overall, the syntax procedure will work through the steps below, reading tokens and planting code as it goes. The new stuff we are going to learn is marked with (*).

  • the word "let" will have been removed from the token stream before we start.
  • so we grab the name of the variable from the token stream and save it in varname. (*)
  • declare that variable. We know how to do that: sysLVARS( varname, 0 )
  • check the next token is = and skip over it. (*)
  • read-and-code-plant for a Pop-11 expression (*)
  • assign it to varname. We also know how to do that: sysPOP( varname )
  • check the next token is in and skip over it (*)
  • read-and-code-plant for a Pop-11 statement (*)
  • check the next token is endlet and skip over it. (*)

In order to do this we need to learn the basics about how we interface to the Pop-11 compiler. (You can read all the details in REF POPCOMPILE by the way.)

Interacting with the token stream

The Pop-11 compiler works by consuming tokens from the variable called proglist. You can manipulate proglist directly if you like, but it is more common to use itemread(), which simply takes the next item off proglist after checking for macro-expansion. If we want to check that the next item in the input stream is a particular word, and skip over it, we use pop11_need_nextitem(item). It returns the item it read, which is why the calls below discard the result with -> _. So we can fill in quite a bit of our new syntax word with just these two:

define syntax let;
    lvars varname = itemread();           ;;; Grab the name from input stream.
    pop11_need_nextitem( "=" ) -> _;      ;;; Check and remove the "=" sign.
    ... COMPILE THE EXPRESSION ...
    pop11_need_nextitem( "in" ) -> _;     ;;; Check and remove the "in" keyword.
    sysLVARS( varname, 0 );
    sysPOP( varname );
    ... COMPILE THE STATEMENT ...
    pop11_need_nextitem( "endlet" ) -> _; ;;; Check and remove the "endlet" keyword.
enddefine;
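
To see itemread on its own, here is a small sketch of a syntax word that reads one token and plants code to push it as a constant. The name quoted is hypothetical and used only for illustration. sysPUSHQ is the standard VM instruction for pushing a quoted item.

```pop11
;;; quoted is a hypothetical syntax word, defined only for this example.
define syntax quoted;
    lvars item = itemread();    ;;; take the next token off proglist
    sysPUSHQ( item );           ;;; plant code that pushes that token as a constant
enddefine;

;;; The token hello is consumed by itemread at compile time, so it is never
;;; looked up as a variable. At run time the word itself is pushed.
quoted hello =>
```

The point of the sketch is the division of labour: reading from the token stream happens while the syntax word runs, and only the planted code runs later.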

Compiling expressions and statements

As one would expect, there are procedures for reading-and-compiling both expressions and statements. The ones we want also check the word that follows.

  • pop11_comp_expr_to(closer) - reads and compiles a single expression and then checks the next token is closer.
  • pop11_comp_stmnt_seq_to(closer) - reads and compiles a sequence of statements separated by ; and then checks the next token is closer.

We can plug these into our new syntax word easily enough - and eliminate a couple of calls to pop11_need_nextitem at the same time.

define syntax let;
    lvars varname = itemread();           ;;; Grab the name from input stream.
    pop11_need_nextitem( "=" ) -> _;      ;;; Check and remove the "=" sign.
    pop11_comp_expr_to( "in" ) -> _;
    sysLVARS( varname, 0 );
    sysPOP( varname );
    pop11_comp_stmnt_seq_to( "endlet" ) -> _;
enddefine;
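
pop11_comp_expr_to can also be tried in isolation. The sketch below assumes nothing beyond the procedures already mentioned plus the built-in negate. The words negated and endnegated are hypothetical names chosen for the example.

```pop11
;;; negated ... endnegated is a hypothetical outfix form for illustration.
constant syntax endnegated;

define syntax negated;
    pop11_comp_expr_to( "endnegated" ) -> _;    ;;; compile one expression, then consume the closer
    sysCALL( "negate" );                        ;;; plant a call to negate on the result
enddefine;

negated 3 + 4 endnegated =>
;;; ** -7
```

Because the expression is compiled as it is read, the code for 3 + 4 is planted first and the call to negate is planted after it, which is exactly the order in which the two will run.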

Adding a lexical scope

There is one more wrinkle, which is that in functional languages the let x = E in S adds a new lexical scope and makes sure that E is evaluated outside of the scope but x and S are inside. Lexical scopes can be managed using sysLBLOCK and sysENDLBLOCK. We just need to put these in the right place. sysLBLOCK also needs to know whether or not it is being used at top-level (immediate execute level), which we can satisfy using the variable popexecute.

define syntax let;
    lvars varname = itemread();           ;;; Grab the name from input stream.
    pop11_need_nextitem( "=" ) -> _;      ;;; Check and remove the "=" sign.
    pop11_comp_expr_to( "in" ) -> _;
    sysLBLOCK( popexecute );              ;;; sysLBLOCK needs to know if it is top-level or not.
    sysLVARS( varname, 0 );
    sysPOP( varname );
    pop11_comp_stmnt_seq_to( "endlet" ) -> _;
    sysENDLBLOCK();
enddefine;

And that's everything. Time to test it.

Testing the implementation

To recap, our implementation looks like this:

constant syntax endlet;
define syntax let;
    lvars varname = itemread();               ;;; Grab the name from input stream.
    pop11_need_nextitem( "=" ) -> _;          ;;; Check and remove the "=" sign.
    pop11_comp_expr_to( "in" ) -> _;          ;;; Compile the expression onto the value stack.
    sysLBLOCK( popexecute );                  ;;; sysLBLOCK needs to know if it is top-level or not.
    sysLVARS( varname, 0 );                   ;;; Declare our variable inside the new lexical scope.
    sysPOP( varname );                        ;;; Assign it the value of our expression.
    pop11_comp_stmnt_seq_to( "endlet" ) -> _; ;;; Compile the statement with the local binding in force.
    sysENDLBLOCK();
enddefine;

We can try it out at top-level like this:

: let y = 'foo' in y <> y endlet =>
** foofoo
: 

Or nested inside a function like this:

define repeat4( x ); lvars x;
    let y = x <> x in y <> y endlet
enddefine;

: repeat4( 'bar' ) =>
** barbarbarbar
:
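
One further check is worth sketching: that the expression really is evaluated outside the new scope. The block below repeats the definitions so that it can be loaded on its own. The procedure shadow_demo is a hypothetical test written for this purpose.

```pop11
constant syntax endlet;

define syntax let;
    lvars varname = itemread();
    pop11_need_nextitem( "=" ) -> _;
    pop11_comp_expr_to( "in" ) -> _;            ;;; compiled before the new scope is opened
    sysLBLOCK( popexecute );
    sysLVARS( varname, 0 );
    sysPOP( varname );
    pop11_comp_stmnt_seq_to( "endlet" ) -> _;
    sysENDLBLOCK();
enddefine;

;;; shadow_demo is a hypothetical test procedure.
define shadow_demo();
    lvars x = 'a';
    let x = x <> 'b' in x endlet;   ;;; the x on the right of = is the outer x
    x                               ;;; the outer x is untouched by the inner binding
enddefine;

shadow_demo() =>
```

The call leaves two results on the stack: the inner x, built from the outer one, and then the outer x, which still holds its original value.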

Reflection

There are a couple of learning points here that may not be entirely obvious. The first is that Pop-11 can be extended by defining syntax words, even though Pop-11 has a human-friendly syntax rather than Lisp S-expressions or Prolog terms, and that writing such extensions is actually fairly simple. This shows that rich extensibility does not depend on having a homoiconic syntax. How does it achieve that?

This leads us to the second important observation, which is that the interface to the Pop-11 compiler is written in terms of procedures that read-and-immediately-plant-code, and there is no intermediate syntax tree object. This is clearly a design limitation because it creates difficulties in designing syntax where we want to reorder expressions or even destructure expressions (e.g. pattern matching).

The advantage of a homoiconic language is that you do not need to explain the syntax tree format: it is self-explanatory. With a human-friendly syntax, the internal representation of the syntax tree is not obvious and needs explaining, which greatly complicates the interface. However, because the Pop-11 compiler interface bypasses the generation of an intermediate syntax-tree object, the need for a complex data type to represent it is eliminated. And this is what makes the interface so simple. This is the subtle tradeoff at the heart of this design.
