HOW TO Add a Parser - Spicery/Nutmeg GitHub Wiki

The Nutmeg compiler can compile a variety of files into the final program-bundle. It uses the file-extension to select a parser, whose job it is to generate a codetree. This gives us the ability to add special purpose mini-languages, or simply ingest file objects into the bundle, or even create entirely new front-end languages for Nutmeg. In this how-to, we show how to add a parser for CSV (comma-separated value) files.

WARNING: The API described in this page is a proposal that we are reviewing. It's not ready yet! (But hopefully will be very soon!)

The Goal

When we have finished this how-to, we will be able to compile CSV files into the program-bundle using the nutmegc command. At runtime this makes a variable available, with the same name as the CSV file (minus the extension), that contains a list of lists of strings.

For example, the following command will add a variable data to the myprog.bundle file that has type List<List<String>> and a const (recursively immutable) value.

% nutmegc myprog.bundle data.csv myprog.nutmeg

Top Level

At the top level, a Nutmeg parser is a Python function that takes a readable text-file object and a match against the file name, and returns a code-tree iterator. Loosely speaking:

type NutmegParser = TextFileObject * Match -> Iterator<CodeTree>

This function is then installed into the Nutmeg system using the decorator @NutmegParserExtension(REGEX_STRING), where the supplied regex is used to match the filename (using fullmatch). This decorator can be loaded from the module nutmeg_extensions. The code that defines the extension also needs to be imported into the main-program of the nutmeg compiler, which is launcher.py.

So our running CSV example will start with a CSV-parser file called (say) 'csv_nutmeg_parser.py', which defines a function csv_nutmeg_parser like this:

from nutmeg_extensions import NutmegParserExtension

@NutmegParserExtension(r'(.*)\.csv$')
def csv_nutmeg_parser( file_object, match ):
   ...code returning codetree iterator...

And we need to edit launcher.py to import this file. We don't need to do anything more than import the file because the decorator links the function into the necessary tables.

import csv_nutmeg_parser
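
To make the "necessary tables" concrete, here is a minimal sketch of how a decorator of this kind could register parsers, and how a compiler could then pick one by filename. It is not the real nutmeg_extensions code: PARSER_TABLE, parser_extension, find_parser and demo_csv_parser are hypothetical names invented for illustration.

```python
import re

# Hypothetical registry: a list of (compiled regex, parser function) pairs.
PARSER_TABLE = []

def parser_extension(regex_string):
    """Hypothetical stand-in for NutmegParserExtension."""
    pattern = re.compile(regex_string)
    def decorator(parser_function):
        # Registration happens as a side effect of defining the function,
        # which is why importing the module is enough to install the parser.
        PARSER_TABLE.append((pattern, parser_function))
        return parser_function
    return decorator

def find_parser(filename):
    """Return (parser, match) for the first pattern that fully matches, else None."""
    for pattern, parser_function in PARSER_TABLE:
        match = pattern.fullmatch(filename)
        if match:
            return parser_function, match
    return None

@parser_extension(r'(.*)\.csv$')
def demo_csv_parser(file_object, match):
    # Placeholder body: a real parser would yield codetrees here.
    yield match.group(1)

found = find_parser('data.csv')
```

Because the whole filename has to match, group 1 of the match is the filename without its .csv suffix, which is what the parser later uses as the variable name.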

What does the CSV Parser need to do?

This lets us get on with the main task of actually parsing the file. Roughly speaking, if we have a small CSV file abbrevs.csv that looks like this:

Abbreviation,Expansion,Comment
LOL,Lots of Love,Acronym
OMG,Opened My Gob,Initialism
KISS,"Kept It Stupid, Sweet!",Acronym

We need it translated into codetrees that are the equivalent of this:

const abbrevs = (
    [
        ['LOL','Lots of Love','Acronym'],
        ['OMG','Opened My Gob','Initialism'],
        ['KISS','Kept It Stupid, Sweet!','Acronym']
    ]
);
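
The rows in that target value can be previewed with Python's standard csv module alone, before any codetrees are involved. The sketch below assumes the file is comma-separated, with the one field that contains a comma wrapped in double quotes; sample_text, header and rows are names chosen here for illustration.

```python
import csv
import io

# In-memory stand-in for abbrevs.csv (assumed comma-separated, with quoting).
sample_text = (
    'Abbreviation,Expansion,Comment\n'
    'LOL,Lots of Love,Acronym\n'
    'OMG,Opened My Gob,Initialism\n'
    'KISS,"Kept It Stupid, Sweet!",Acronym\n'
)

row_reader = csv.reader(io.StringIO(sample_text))
header = next(row_reader)   # the header row is read separately and set aside
rows = list(row_reader)     # the remaining rows: a list of lists of strings
```

Each element of rows corresponds to one inner list in the Nutmeg value above.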

The codetree for const abbrevs = ... will be a BindingCodelet, which has a left-hand side (lhs) and a right-hand side (rhs).

BindingCodelet( 
   lhs = IdCodelet( name='abbrevs', reftype='const' ), 
   rhs = ...
)

Lists are created using the built-in function "List". It is invoked through the SyscallCodelet, which takes two parameters: the name of the built-in and the arguments. A sequence of values is produced using the SeqCodelet. Our list of lists would look like this:

SyscallCodelet(
    name = "List",
    arguments = SeqCodelet(
        SyscallCodelet( 
            name = "List",
            arguments = SeqCodelet(
                StringCodelet( value = 'LOL' ),
                StringCodelet( value = 'Lots of Love' ),
                StringCodelet( value = 'Acronym' )
            )
        ),        
        SyscallCodelet( 
            name = "List",
            arguments = SeqCodelet(
                StringCodelet( value = 'OMG' ),
                StringCodelet( value = 'Opened My Gob' ),
                StringCodelet( value = 'Initialism' )
            )
        ),        
        SyscallCodelet( 
            name = "List",
            arguments = SeqCodelet(
                StringCodelet( value = 'KISS' ),
                StringCodelet( value = 'Kept It Stupid, Sweet!' ),
                StringCodelet( value = 'Acronym' )
            )
        )
    )
)
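
Writing such a tree out by hand gets tedious, so it helps to see it built from plain row data. The sketch below uses simplified stand-in classes so that it runs on its own; the real codelet classes belong to the Nutmeg compiler and their constructors may differ. The helper name list_of_lists_codetree and the children attribute are hypothetical.

```python
from dataclasses import dataclass

# Simplified stand-ins for the Nutmeg codelet classes (assumptions, not the real API).
@dataclass
class StringCodelet:
    value: str

class SeqCodelet:
    def __init__(self, *children):
        # Each positional argument is one item of the sequence.
        self.children = children

@dataclass
class SyscallCodelet:
    name: str
    arguments: SeqCodelet

def list_of_lists_codetree(rows):
    """Build the List-of-Lists syscall tree for an iterable of string rows."""
    inner_lists = [
        SyscallCodelet(
            name="List",
            arguments=SeqCodelet(*(StringCodelet(value=item) for item in row)),
        )
        for row in rows
    ]
    return SyscallCodelet(name="List", arguments=SeqCodelet(*inner_lists))

tree = list_of_lists_codetree([
    ['LOL', 'Lots of Love', 'Acronym'],
    ['OMG', 'Opened My Gob', 'Initialism'],
])
```

The outer call mirrors the outer SyscallCodelet above, and each row contributes one inner SyscallCodelet.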

Implementing the CSV Parser

At this point we are ready to write the CSV parser in its entirety. We will use the csv Python library to do the heavy work of parsing the CSV file, so all we are left with is 'plumbing'.

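import csv
from nutmeg_extensions import NutmegParserExtension
# SeqCodelet, StringCodelet, SyscallCodelet, IdCodelet and BindingCodelet must
# also be in scope; this proposal does not yet say which module provides them.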

@NutmegParserExtension(r'(.*)\.csv$')
def csv_nutmeg_parser( file_object, match ):
   varname = match.group(1)
   row_reader = csv.reader(file_object)
   # Skip the first row
   first_row = next( row_reader )
   inner_lists = []
   for row in row_reader:
       # Unpack so that each string is a separate argument, as in the tree shown earlier
       seq = SeqCodelet( *( StringCodelet( value = rowitem ) for rowitem in row ) )
       syscall = SyscallCodelet( name = "List", arguments = seq )
       inner_lists.append( syscall )
   rhs = SyscallCodelet( name = "List", arguments = SeqCodelet( *inner_lists ) )
   lhs = IdCodelet( name=varname, reftype='const' )
   yield BindingCodelet( lhs=lhs, rhs=rhs )  # Cute way of returning an iterator!

Criticisms

This article does not make it clear that a header row is required - how could that be made optional?
If the CSV file is empty this would give an unpleasant failure message.
Discarding the first row seems like a bad idea. Wouldn't it be better to use named tuples for the rows, using the header row to pick the names?
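
One possible answer to the first two criticisms, sketched with the standard csv module only: read the header with a default so that an empty file yields no rows instead of an exception, and make the header optional through a flag. The function read_rows and its has_header parameter are hypothetical and not part of the proposed Nutmeg API.

```python
import csv
import io

def read_rows(file_object, has_header=True):
    """Return (header, rows); header is None when absent or when the file is empty."""
    row_reader = csv.reader(file_object)
    header = None
    if has_header:
        # next() with a default avoids StopIteration on an empty file.
        header = next(row_reader, None)
    return header, list(row_reader)

empty = read_rows(io.StringIO(''))
with_header = read_rows(io.StringIO('a,b\n1,2\n'))
without_header = read_rows(io.StringIO('1,2\n3,4\n'), has_header=False)
```

How a parser would learn whether a given file has a header (a naming convention, a command-line option, or something else) is left open here. Once the header is kept rather than discarded, it could also supply field names for the rows, as the third criticism suggests.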