SIMPL Archives Part 2 - monsonite/SIMPL GitHub Wiki

The SIMPL Archives - Part 2

Thursday, November 12, 2015

Minimal Text Interpreter - Part 3

The operation and main routines of a minimal text interpreter - Part 3

This post is merely a description of the first implementation of the text interpreter looking at the principal routines. It's so I can remember what I did in 6 months time.

Function

The minimal text interpreter is the first stage of enabling plain text to be converted into computer machine language. Charles H. Moore the inventor of Forth, found an efficient means of doing this in the 1960s whilst working with mainframe computers, so I have chosen to adhere reasonably close to his methods.

A word is a group of consecutive characters ending in whitespace. The first task for the interpreter is to step through the line of characters and identify the individual words. At the same time it can keep track of a word-length counter so that it can reformat the text into a more compact format.

Take the sentence "A word is a group of consecutive characters ending in whitespace"

It is a string of 64 characters, containing 11 words, and a minimum of (11-1) spaces. We first pick out the individual words and store them into a temporary processing buffer, we can also count the number of characters in the word and put it alongside

A 1 word 4 is 2 a 1 group 5 of 2 consecutive 12 characters 10 separated 9 by 2 whitespace 10

So we have 11 entries in our table, with their lengths. If we assume that we are going to restrict the word length to 16 characters - we could express the length as a single hex character

We now crop the words down to the first 3 characters. For those of less than 3 characters, we fill in with spaces:

A 1 wor4 is 2 a 1 gro5 of 2 conC (12) chaA (10) sep9 by 2 whiA (10)

So we have reduced our original 64 input characters to 11 x 4 - which is about a 30% reduction. Tests carried out with Forth showed that the first 3 characters and the length was an optimum way of differentiating between and storing words in the dictionary. For assembler applications where mnemonics tend to be only 3 or 4 characters it is pretty much an optimal first step of decoding the source code, prior to allocating the machine code op-code tokens.

Once the text scanner has reduced each input word to three characters and a length byte, it then become practical to use this shortened representation to perform a word match. All of the system keywords and user words can be stored compactly in this format, along with their jump addresses. If 8 bytes are allocated to the word entry in the look-up table, this allows a 16 bit jump address, a pointer to the table where the original unencoded word is stored and a 1 byte attribute - that can be used to tell the compiler something about the word to help at compilation time.

Consider a system that has a maximum of 256 user words and 256 keywords - each internally coded at 8 bytes per word. Then this would need 2K in Flash for the keywords - not a problem for most small micros, but the 2K user table would rapidly start to eat up the limited RAM resources. Fortunately most small user applications are unlikely to have anything like 256 User words, and so halving this, a 1K user dictionary space would be quite acceptable.

Immediate and Compilation Modes

The above text scanning function can be used in two distinct cases: immediate and compilation modes.

In immediate mode, a word typed into the input buffer will be executed immediately - provided of course that it already exists in the dictionary. If not, it will lead to an error message. Once found in the dictionary the jump address associated with that word is put onto the pc and the processor executes the code it finds at the jump address.

However in compilation mode, the user wants to create a new word definition, and uses a Forth method called the colon definition. As it name suggests a colon definition begins a new line with a colon : This tells the text interpreter, that the proceeding word is going to be new and the interpreter should enter compilation mode. A colon definition begins with a colon: then the name of the new_word, then the definition i.e. the code words associated with that function and finally ends with a semi-colon ;

The Forth word colon definition performs the same operation as a function in C - compiling the function and putting it into memory as a series of threaded calls to the various routines.

: new_word (put the definition here) ;

In C

new_word() { // put definition here }

Once a new_word has been defined as above it can be executed immediately - just by typing its name. This is what gives Forth it's almost unique characteristics of being an interactive and extensible language. The functions are written and compiled and can be tested in isolation of one another - so that a large project may be built interactively in small blocks - testing each block as you go.

Currently only the basics have been implemented - by way of a proof of concept, and running on a 2K RAM Arduino. Later this will be ported to various ARM Cortex parts, the FPGA - softcore ZPUino and ultimately the J1 Forth processor.

There are probably many ways in which this could be implemented - some giving even more codespace and memory efficiency. As a rookie C programmer, I have stuck to really basic coding methods - that I understand. A more experienced programmer would probably find a neater solution using arrays, pointers and the strings library - but for the moment I have kept it simple.

The interpreter resides in a continuous while(1) loop and consists of the following routines:

txt_read

Reads the text from the UART into a 128 character buffer using u_getchar. Checks that the character is printable - i.e. resides between space (32) and tilde ~ (127) in the ascii table and stores it in the buffer. Keeps accepting text until it hits the buffer limit of 128 characters or breaks out of this if it sees a return or newline \r or \n character.

colon_check

This checks if the text starts with a colon, and so is going to be a new colon definition. sets flag colon=1 calls the build_buffer function

word_scan

If the leading character is not a colon, this function determines that the word is either within the body of the definition, or it is for immediate execution. It calls build_buffer, but only builds the header to allow a word match. It should not add the word to the dictionary, if it gets a match and is already there.

build_buffer

This checks the first 3 characters of the word and puts them into a new header slot in the headers table. It also calculates the word length by counting the characters as it stores them into the dictionary table, which it continues until it sees a terminating space character. It increments the dictionary pointer ready for the next word

word_match

This compares the 4 characters of the header of the newly input word with all the headers in the header table. If all 4 characters match then it drops out with a match_address (for the jump address look-up table) and sets a match flag match= 1.

header_list

This is a utility routine which prints out a list of all the headers in the order they are stored in the headers table.

dictionary_list

This is a utility routine which prints out a list of all the words in the dictionary in the order they were stored in the dictionary table.

txt_eval

This is the main character interpretation function which implements the SIMPL language core

is_a_word

Not yet implemented. Returns true if it finds a word and invokes build_buffer and word_match

is_a_num

Not yet implemented. Converts the ascii text to a signed integer and stores it in a parameter table. Might possibly use ascii 0x80 (DEL) to signify to the header builder that the following bytes are a number. Will need a conversion routine to go between printable and internal storage formats.

UART Routines

These provide getchar and putchar support directly to the ATmega328 UART. Saves a huge amount of codespace compared to Serial.print etc

uart_init

Initialises the ATmega328 UART to the correct baudrate and format.

u_putchar

Waits until the Tx register is empty and then transmits the next character

u_getchar

Waits until a character is present in the UART receive register and returns with it

Printing Routines

Having banished Serial.print - I had to implement some really basic functions

printnum()

Sends a 16 bit integer to the UART for serial output

printlong()

Sends a 32 bit integer to the UART for serial output

Thursday, November 12, 2015

A Minimal Text Interpreter - Part 2

A Text Interpreter to run on a Resource Limited Microcontroller - Part 2

In the previous post, I described the basics of a tiny text interpreter, written in C, intended for use on resource limited microcontrollers. The text interpreter would offer a natural language user interface, allowing programming and command line control of various microcontroller projects.

It will also form the basis of a wider range of self-written computing tools, including assembler and compiler, editor and file handler - all of which could be hosted, if necessary on a resource limited target board.

However, for the moment, and for ease of experimentation, the intention was to get the interpreter to run with only 2K of RAM (as per the Arduino Uno).

I envisioned the text interpreter as being a universal resident utility programme (almost akin to a bootloader) that would be initially flashed onto the microcontroller thus allowing a serial command interface and the means to exercise the hardware or develop small interactive programmes.

At work and at home, there are many instances of when I want some degree of interactive control over a small microcontroller project - even if it is just manipulating some port lines or sending and receiving a few serial responses on a terminal programme.

Some Practical Limitations

In order to keep the demands on the interpreter program reasonable it is necessary to put some limits on its capabilities. In particular, the number of words it can recognise and create jump addresses for. For convenience I used a look up table to hold the jump addresses. If the look up table is to remain reasonably compact - then a limit of 256 entries seems reasonable. Restricting the word capacity will also help keep the dictionary and its headers to a manageable size in RAM. This is important when you only have 2K to play with!

As the 4 byte header is in fact a shortform, or compact coding convenience that represents the dictionary, it could be said that in very RAM limited systems that it is not actually a requirement to keep the dictionary in RAM on chip. The only role that the dictionary performs is to allow the header entries to be expanded to the full word at times of listing.

As small micros generally have plenty Flash available, then the dictionary for all the standard words could be programmed into flash - as indeed could their headers. If necessary, a shell hosted by a PC application could be used to host the various dictionaries and source code files needed for particular applications. However, the original aim is that this interpreter vastly increases the user-friendliness of the microcontroller target - even with just a serial terminal as the user interface.

Additionally, I have imposed a word length limit to 16 characters. Imposing this limit means that the word length can be coded as a single hexadecimal digit - which makes it displayable in ascii and human readable. If you can't name something uniquely in 16 characters then you are probably of German extraction.

Vocabularies

Different tasks need different tools, and as the interpreter will be used for a variety of tasks, then it seems reasonable that it can be augmented or tailored towards a particular task. This can be done very conveniently with the use of vocabularies - with a particular vocab being used for a particular task. A vocab that contains the mnemonics of a particular processor's instruction set would be one sensible use when using the interpreter within an assembler, compiler or disassembler.

Forthright

Those of you that are familiar with Forth, will say that I am just creating a portable Forth-like environment, but rather than being coded in the native machine language of the target processor, it has been written in C for convenience and portability.

This is indeed partly true, as the utility I am creating has been inspired by the Forth language - especially in its compactness and low resource requirements. Even in the 1960s Charles Moore was concerned how the tools provided for computing at that time hampered progress, and so set about redefining the whole man-machine interface. He compressed the previously separate editor, compiler, and interpreter programmes (none of which could be co-resident in memory at the same time) into a single compact, convenient package that did the job of all three.

When Forth was first introduced in the late 1960s, mini computers had sub-MHz clock frequencies, and very little RAM, and so benefited greatly from a moderately fast and compact language like Forth. Nowadays, typical microcontrollers have clock frequencies in the 20MHz to 200MHz range and so are not so hindered by what is essentially a virtual machine implementation of a stack processor written in C.

Virtual and Real Stack Machines

I have embarked on this journey because of my wider interest in soft-core and open core processors implemented on FPGAs. Several of these cores are based on stack machines, partly because they may be readily implemented in surprisingly few lines of VHDL or Verilog. Indeed James Bowman's J1 Forth processor is fully described in fewer than 200 lines of verilog.

Whilst a virtual stack machine might not be the easiest fit for a register based processor without performance penalties, it is a wonderful fit for a real stack machine. A number of open-core processors including the ZPUino and James Bowman's J1 are true stack machines. Here the instruction set of the virtual machine have a near one to one direct mapping to the the machine instructions of the stack processor. In this case the text interpreter can be rewritten in the native assembly language of these cpus, to benefit from the vast increase in speed of running without an additional layer of virtual machine.

In order to do this an Assembler will be required that is tailored to the instruction set of the Forth Processor, and this is one of the first tasks that the text interpreter will be used for - the assembly of machine code for a custom processor.

One of the reasons why I am concerning myself with such low level primitive tools, is the need to understand them from the ground up so that they can be implemented on a variety of non-conventional processors.

Whilst the ZPUino will execute Arduino code directly (albeit very inefficiently - because of the C to stack machine programming conflicts), the J1 will need the tools to write it's own language from the ground up - and if you already have the mechanisms of a language in place, plus an easily customisable assembler, then it makes the job a lot easier.

In a later post, I will give an update on the text interpreter and it's application to custom code assemblers.

Wednesday, November 11, 2015

The Subtle Art of Substitution - Part 1

A simple text interpreter that allows code to be invoked by natural language words. Part 1.

Over the weekend and in various bits of spare time I have been developing a tiny text interpreter in C, as part of the larger project of creating some low-overhead tools to run on various microcontroller targets. The toolset will eventually include assemblers and compilers for some custom soft-core processors - but first I need the means to interpret typed text words and execute blocks of code if the word is recognised.

Why this is useful

This text interpreter is intended to provide a more human friendly interface layer to my SIMPL interactive programming language. Writing in high level, more meaningful natural language will greatly enhance the speed at which SIMPL code can be generated.

A natural language interface makes programming tasks much easier. Real words are more memorable than individual ascii characters, and it all makes for more readable code listings. Whilst SIMPL might use lower case "a" to initiate the analog read function, typing "analog" is a lot more reader friendly. An interpreter that follows a few simple parsing rules can offer a much increased speed of programming, yet be modest in the amount of on-chip resources utilised to do this. The code to implement the interpreter is about a 2K to 3K overhead on top of SIMPL - but that will include listing, editing and file handling utilities too.

Substitution and Assemblers

A text interpreter and its ability to execute blocks of code based on parsing the text commands or file it receives is a fundamental part of utility programmes such as assemblers and compilers. Here a set of keyword mnemonics representing instructions can be interpreted and used to assemble machine code instructions by direct substitution.

With a simple text interpreter we can move out from the realms of numerical machine language, and implement the likes of assemblers, dissassemblers and even compilers.

In the case of an assembler, the wordset will comprise of the mnemonics used by the target processor - and the interpreter will merely substitute the human readable mnemonic for the machine instruction numerical opcode.

For example, a certain processor may have an instruction set including mnemonics such as ADD, AND, SUB, XOR etc. The role of the text interpreter is to find these words within the text buffer or input file and create a table consisting of direct machine code instructions, subroutine call addresses and other variables to be passed via the stack to those subroutines.

At a level above the assembler is the compiler. This also takes text based input and generates machine code to run on a specific processor. However, compilers are very complex pieces of software, and it is more likely that I will find an alternative solution - given long enough.

Why do this?

The purpose of the text interpreter is to provide a natural language text interface for a small, resource limited microcontroller - in a similar style to what was provided with the various BASICs of the late 1970's. It's remarkable to think that some fully functioning basics fitted into 4K ROM and 1K of RAM - solely by some very clever programming tricks - in raw assembly language.

Fortunately most embedded programming these days does not have to resort to raw - assembler, and C has become the preferred interchange language layer for portability. C code written for an Arduino, may be fairly easily ported to an ARM - provided that low level I/O routines - such a getchar and putchar are available for the target processor.

Coding up a text interpreter is a good exercise in C coding - and as I am not a natural born coder, any meaningful coding exercise is good practice. I also enjoy the challenge of making something work from scratch block by block - rather than being over reliant on other peoples' code, that I don't even pretend to understand.

Requirements

As a bare minimum, we assume that the microcontroller target can provide a serial UART interface for communicating with a terminal program. I have recoded Serial.print and it's associated functions to use a much smaller routine - which saves memory space.

Ideally the microcontroller should have at least a couple of kilo-bytes of RAM for holding the dictionary and headers making it possible to implement it on anything from an Arduino upwards.

The text interpreter is an extension of the SIMPL interpreter, and can be used for programming tools such as text editors, assemblers, compilers and disassemblers. It provides the means to input text, analyse it for recognised keywords and build up a dictionary and jump table.

Borrowing from Forth experience, the text interpreter (or scanner) will look for a match on the first 3 characters of the input and the length of the word. As a word is typed in, it will initiate a search of the dictionary (of already known words). If a match is found, the word will be substituted for a 4 digit (16 bit) jump address. If the word is not matched, it will be added in full to the dictionary table.

This sounds all very Forth-like, and indeed it is, because it is a proven means to input new text data into a processor's memory using minimum of overheads. The dictionary structure is simple enough that it can easily be parsed for a word-match, and also processed for editing and printing.

As each Forth definition is effectively just a line of text it can easily be handled with a line-editor - again a simple task for a resource limited processor.

Numbers are handled as literals. A quick scan of the text with "is_a_num" will reveal whether it is numerical text - if so it should be converted to a signed integer and put onto the stack.

The output of the text interpreter should be a series of call addresses relating to the functional blocks of code that perform the routines associated with the keyword. In the case of the assembler example, the mnemonics can be translated directly using a look-up table which converts them directly into the machine instruction of the target processor - this is especially relevant if the target is a stack machine - such as the J1 forth processor.

Charles Moore struck on the idea of a language that was designed for solving problems. He envisioned having separate vocabularies for each problem he wanted to solve. For example his assembler application would use a vocabulary tailored to that application - namely the mnemonics as keywords, similarly the SIMPL language would utilise a vocabulary that supported the SIMPL command set. Thus by pointing to a different vocabulary in flash, the processor can readily swap between contexts.

Hop, Skip and Jump, - the Mechanics of Text Interpretation

Short of providing a flow chart - the description below describes the operation of the text interpreter.

The text interpreter will parse through lines of text, taking each "word" as defined by a group of characters terminated by white-space, and check through a list of dictionary words for a match. If there is a match, then the newly scanned word is either a system keyword or a new one that the user has previously added to the dictionary.

If the word does not generate a match with any existing keywords then it is added to the end of the dictionary - thus allowing a match the next time it is used.

In addition to the dictionary, there is a separate array of records that will be known as the "headers". The headers consists of a shortform record of all of the words in the dictionary. The purpose of the headers is to allow an efficient search to be performed on the dictionary entries - as words are listed in the headers by their first three characters and their length. A match on the first 3 characters and the length was proven many years ago to be an effective and efficient means of word recognition- see section 3.2.3 here

Once the header of scanned word has been deemed to match the header of one already in the header table, a jump address pointer can easily be calculated - its actually generated as part of the matching routine. This jump address pointer is decoded by a look up table to generate an actual 16-bit jump address.

For compactness and efficiency,the word matching routine is limited to a maximum vocabulary of 255 words - which is more than enough for most applications.

The text interpreter deals with lines of code. At some point their will be an accompanying package that implements a line editor, as the first step towards a full, screen editor.

The input buffer of a terminal program may be some 250 to 300 characters long. This is more than adequate space to define most sequences of command words. Indeed - it may be beneficial to restrict the input buffer to say 128 characters - as this is what can be displayed sensibly on an 800 x 600 VGA screen.

Word storage format

The shortform entries stored in the dictionary headers can be saved as a group of 4 bytes, consisting of the first 3 characters and the length byte. The routine that searches the headers for the match automatically generates the jump address pointer allowing a lookup to the actual jump address from a table.

    Char1, Char2, Char3, Len

Byte 0 1 2 3

So a word can be expanded by knowing its length and the dictionary pointer to its 1st character The jump address is shortened to a single byte fast look-up from a table.

Implementation

It's taken a bit longer than expected, but after an intensive day thinking, re-thinking, then coding, the tiny (2K) text interpreter is now starting to take shape.

I have put an interim version (#45) on this github gist

The interpreter is written in fairly standard C so it can be ported to a number of devices. If implemented using Arduino using the Serial.print library it uses about 4142 bytes of flash and 1897 of RAM. By using much more efficient custom UART routines for serial input and output, this can be massively reduced to just 2002 bytes of Flash and 1710 bytes of RAM.

Part 2 of this posting will look further at features of the text interpreter and the SIMPL toolset.

Monday, September 21, 2015

A Closer Look at the SIMPL Interpreter

Keeping it SIMPL

Since May 2013, I have been slowly developing a tiny interpreted language that can be used to initialise and exercise hardware when developing with a new processor.

SIMPL is primarily intended to be a very low overhead language, requiring only a serial uart (or bit banged serial) for communication to a PC hosted terminal program.

Commands are in plain, human readable ascii text - with an emphasis on being easy to remember.

SIMPL is based on Ward Cunningham's Txtzyme interpreter - originally for Arduino - but ported onto several other microcontrollers - as it is written mainly in C.

The kernel or SIMPL interpreter needs only a few resources:

2K bytes of program memory (Flash) 35 bytes of RAM UART getchar and putchar functions microsecond delay millisecond delay

On the Arduino these delays are provided by the delay() and delayMicroseconds() functions but can be provided with simple delay loops.

Once you have this 2K of code on-board, you can then start to add it more functionality - that is tailored to your particular application.

Slimming Down the Interpreter Kernel.

As originally written, Ward Cunningham's Txtzyme compiles to 5032bytes of flash and 209 bytes of RAM. (The exact number of compiled bytes may vary on what version of the Arduino IDE you are using).

As it made use of several of the high level functions available in Arduino - such as Serial.print, digitalWrite etc, - it was certainly not optimised for codesize.

I rewrote and enhanced the interpreter - so that now it fits into just short of 2048 bytes, and is written in more generic standard C for easier porting to other processors.

I have also added more functions including arithmetic, bitwise logic and memory operations.

I am sure that if the routines were handcoded in AVR assembly language, that further reductions in codesize could be achieved. However, I wanted a useful kernel that would fit in 2K and was easy to understand.

I have placed the SIMPL kernel here as a Github Gist.

Growing the Kernel

It has long been my intention to make SIMPL an extensible language, and so for this approach I have chosen to use some of the ideas used in Forth.

The kernel can easily be extended from some 30 basic functions to about 85, just by extending the switch/case statement that forms the basic subroutine calling mechanism at the heart of the kernel.

I keeping with Charles Moore's philosopy of "Problem Oriented Languages" the kernel of SIMPL may be extended in whatever way needed for solving the problem, and should as such be considered to be a minimum common starting point - for any cpu.

Once the 2K core of the kernel was established, it was time to add in the extra functionality that allows users to add their own functions. This is done in the spirit of Forth - but with certain limitations to keep the code size down. However, with the added functionality - the code grew from 2Kbytes to 3982 bytes. The main difference is in the amount of RAM that is used - the extra code allocates a User RAM array of 1248 bytes.

If you would like to look at the code and try it out on an Arduino - I have created a Github Gist here.

If you are using a standard Arduino with the LED on Pin 13 change line 64 to:

int d = 13; // d is used to denote the digital port pin for LED operation

As this is a work in progress - more details will emerge in a later post.

Saturday, September 05, 2015

Building from the Ground Up

In my previous post which turned out to be a rant about the ever increasing complexity of software, I decided to dig out a draft post from earlier this year which describes my continuing investigation into how the lower levels of computing languages are implemented.

The purpose is to create a set of low level tools which allow application code to be developed easily and interactively on a microprocessor or soft CPU with limited memory resources.

The microprocessor may be an ARM or a softcore running on a FPGA - and with no C compiler available how do you start to bootstrap the processor from first principles?

Booting from Scratch

In bygone generations of computer hardware, there would be a set of toggle switches which could be set to represent the binary instruction words. These words would be individually deposited into memory and the program counter advanced. A very short routine, consisting of only a couple of dozen instructions would be manually toggled into memory, allowing a paper tape to be loaded. Great ingenuity was applied to the hardware architecture so that these loader routines were short.

Daniel Bailey has devised a fun project - his C88 computer is a home-brew CPU on a FPGA which takes one back to the very early methods of booting a cpu from scratch. This blogpost and video explains it nicely. Daniel will be presenting his C88 at the forthcoming OSHcamp- at the end of September in Hebden Bridge.

As technoogy progressed, and IC memories became available, interchangeable eproms were used to allow the program code to be quickly modified and re-run. However there might be a 20 minute delay between detecting a bug and reprogramming another eprom with the corrected code. To speed up the time needed to program and erase eproms - often a nonvolatile RAM,was used to emulate the eprom and hold the program code. I spent a summer in my late teens programming a Z80 dev board in this manner.

These days almost all microcontrollers have on chip flash memory, and some method for in circuit serial programming. (ICSP). As a PC or laptop is invariably used for program development and code compilation, virtually all of the development tools are hosted on the PC, and very few systems have the means to edit and recompile code on the target system itself.

There has to be another way

There is however another means of working, where a minimal interactive working environment can be loaded onto the target system, and this allows the development and debugging of programs on the target system itself. Very little additional support is needed other than a serial interface. The code can be developed in a text editor running on the microcontroller. This is no different to the way that interactive languages, such as BASIC were developed on home computers in the early to mid 1980s.

Interactive development is a completely different method of working compared to the usual edit, compile, load, test approach that has arisen out of the almost universal use of compiled languages - in particular C and its derivatives. It was only when PCs gained more RAM, disks and cpu speed that C compilation became practical.

Forth was one of the first interactive languages, - developed in the 1960s by Charles H. Moore, and it is Forth that influences my investigations here.

Forth might be compared to assembly language, as it deals with low level operations involving snippets of machine language that are threaded together to create some program function. However it is much more than just an assembly language, as it provides the means to edit, compile and assemble source code, in an extensible manner, using a very compact language.

Forth does away with much of the clutter of other languages, and gives the programmer direct access to the resources of the processor. Forth offers high speed execution, especially on FPGA hardware designed to specifically match the primitive instruction set of the Forth language. Modern, low cost FPGAs can run a Forth engine at a clock frequency of some 50 to 200MHz. At this speed the interactive nature of the language becomes extremely fast, and the time taken to go around the edit-compile-exexute process is slashed to just edit-execute. This is a significant saving in time, and ideas can be tested and rejected in the time otherwise spent compiling a large application.

Forth uses "words" to perform program functions. A Forth word ultimately causes a series of machine language subroutines to be executed. In some respect - a Forth word may be likened to a LEGO block, which can be used with other blocks to make a more complicated structure. These structures may be further combined to build up the entire object - in our analogy - the program application.

Forth words are threaded together, like beads on a string, with the end of execution of one word, rapidly jumping to the beginning of the next.

Brad Rodriguez has an excellent article "Moving Forth" which explains how this threading process is achieved. Although written over 20 years ago - the fundamentals have not changed.

SIMPL

In an attempt to understand the fundamentals of Forth, I studied another threaded, interpreted language, SIMPL - the Serial Interpreted Minimal Programming Language. I have blogged several times about SIMPL since May 2013 - when I first came across it. Refer back to previous posts for details.

SIMPL takes some aspects of Forth, and packages them in an easy to understand manner which may be adapted to run on a variety of microprocessor platforms. I first used SIMPL on a ATmega328 Arduino, but have subsequently ported it to ARM and FPGA soft core processor targets. SIMPL is written in more or less standard C, which makes it portable between, processors, without the complexity of dealing with the machine language of each target processor. It is a stepping stone towards a real Forth.

SIMPL provides an easy way of sequencing blocks of code functions, and doing it in an interactive manner from the serial keyboard interface. It's not so much a programming language - more a way of automating the access to routines - and done in an interactive manner.

As an example:

Suppose you have just written a function in C to plot a graphical character on the screen, and you wish see what it looks like - using a VGA graphics adaptor or a TFT LCD screen. In order to test this function, you need to supply it with parameters for the x,y position that you wish it to be plotted, and possibly the foreground and background colours. If you hardcoded these parameters, and got them wrong, then you would have to go back and edit and recompile until you got the effect you were looking for.

SIMPL provides the mechanism to enter these parameters and execute the function interactively. If you don't like the first result, you can change the parameters with a few key-strokes and test again. You have virtually eliminated the edit-recompile cycle from when you are developing new code functions.

The Mechanics of SIMPL

SIMPL exists as three short functions that provide a complete serial interpreter or shell, to accompany the rest of your program. By calling the SIMPL interpreter each time the main loop is executed provides the means to interact with the rest of the program.

The SIMPL interpreter consists of the following 3 functions executed sequentially within a loop.

txtRead() takes characters from the serial input and loads them into a buffer beginning at address buf - which has been allocated a length of 64 characters

txtChk() looks at the first character to see if it is a colon : If so it copies the input text string to a specific address - to form a "colon definition", an idea borrowed from Forth

txtEval(addr) This is the interpreter - It is pointed to the start address of the text string and evaluates the string a character at a time. The SIMPL interpreter has a few rules on how it treats each character

If it finds numerical characters, it enumerates them to a 16bit integer that it places on the stack.

If the characters are small letters or punctuation characters it will execute a given function each time that character is encountered. For example, you may have allocated the character "g" to plot out your new graphic symbol, each time g is encountered.

Uppercase characters are used as addressing pointers - to point to additional character strings, either in Flash or RAM. The colon definition allows you to compose a new sequence of characters, and access it every time the uppercase character is typed.

Because the text is always loaded into a buffer - it is executed at full speed - and not influenced by the serial baudrate.

In short, SIMPL provides an easy interactive means to call your code subroutines (functions) in a sequence from the keyboard, and provide serial printed output back to the serial terminal.

SIMPL is essentially providing a subroutine address lookup, and call to that subroutine, encoded into the ASCII value of a character. Sequences of characters provide sequences of function calls. Numerical parameters may be passed to those subroutines using an elementary stack based method.

Extending SIMPL

SIMPL is not a fully blown language, but a method of automating the calling of functions according to a stored sequence of characters or tokens. It treats every ASCII character as a new instruction, so you cannot use multiple character words such as ADD to convey a meaning - as this would be interpreted as sequence A followed by sequence D followed by sequence D and so on. This is where it deviates from Forth.

It's control structures are limited to simple loops - controlled by a single down counting loop counter.

SIMPL is the stringing together of subroutines in Flash, that have been created from compiled C code. A 16 bit ADD in SIMPL is going to be somewhat slower than the same operation in a directly executed subroutine.

SIMPL does provide some pointers to how a more Forth like version could be implemented.

It already has the means to interpret an integer number and place it on a stack.

It also has the means to take a sequence of tokens - in this case ASCII characters, look them up in a jump table and retrieve the start address of a code subroutine, which it then executes, before returning to the interpreter to decode the next token. Each 7-bit token provides an index into a jump table of addresses, whom where the next word is executed.

To extend SIMPL, it will be necessary to create some additional text handling code, so that typed words can be processed, added to a dictionary and have a codefield associated with them. How exactly this will be achieved will be the subject of a future post.

Sunday, September 06, 2015

Extending SIMPL

In the earlier post "Building form the ground up" I wrote about how Forth could be used to provide a low level development environment for an unfamiliar processor for which a C compiler was either not available, or not desirable to use.

The first task is to write a simple virtual machine for the target processor, and then use this VM to run your application code. This is similar to what Java bytecode does.

The virtual machine need only have a handful of instructions, including basic arithmetic, logical operations and the ability to access and manipulate memory. From these primitive instructions, all subsequent instructions may be synthesised.

This is the approach described in "The Elements of Computer Systems" (TECS) which is also known as "From Nand to Tetris". The simple machine described, consists of little more than an ALU, a program counter, two registers and a memory interface capable of accessing a 64K x 16 bit address space.

Designing a Virtual Machine

If it is possible to code up a simple virtual machine, on any choice of microcontroller, or softcore CPU FPGA - then it becomes practical to program the virtual machine using the same machine language.

The virtual machine will be a stack machine, and will support a data stack and a return stack. These stack structures will probably be created in RAM on the virtual machine. If this was then implemented in FPGA hardware, the stacks would be separate RAM blocks, with a means of fast access, without having to go through main memory.

The ALU has access to the top and second items of the stack, and will conduct arithmetic and logical operations on these elements, leaving the result on the top of the stack. If operands are required from memory, they will first have to be loaded into the top and 2nd stack locations.

Assume that the virtual machine ALU can perform the following operations. The arithmetic and logical operations are performed on the top 2 members of the data stack, returning the result to the top.

ADD SUB MUL DIV

Multiply and Divide can be synthesised from shift and add/sub - but are time consuming without additional specialist hardware.

AND Bitwise AND OR
XOR NEG

SLL Shift Left Logical SRL Shift Right Logical

@ Fetch a word from memory and place on the top of stack ! Store a word from the top of stack to memory

BRZ Branch if zero JMP Unconditional Jump CALL Call subroutine RET Return from subroutine

LIT Put a literal number onto the stack

So with approximately 20 instructions, we have the basis of a stack machine that can do useful work.

The virtual machine to perform these operations can be written in a high level language - such as C, or actually implemented as a soft core on a FPGA. One such stack processor that lends itself to this is James Bowman's J1 Forth CPU. This compact, no thrills CPU can be defined in under 100 lines of C, or synthesised in FPGA hardware using under 200 lines of Verilog code.

The J1 is a practical CPU, designed for simplicity and optimised for high speed execution of stack based languages. It offers most of the instructions outlined above and forms the basis of an exploration into stack based CPUs. Initially it can be simulated on any processor in C code for experimentation and later blown into real high speed FPGA logic.

Even though the instruction set is very small, the 16 bit instruction word length does not lend itself to easy or memorable coding in machine language. It has several instruction fields, and these have to be populated with the correct bit pattern if we are to make any progress off the starting blocks. The instructions to however map very well onto the Forth language, but there are other alternatives which could be explored.

At the bare minimum, we need an assembler to synthesise the instruction words from the various fields, and once we have a list of 16bit hex instructions we need to load them into RAM and have the simulator step through them.

An assembler typically scans through a text file containing instruction mnemonics, such as ADD, JMP, CALL etc and converts these into the instruction opcodes. It also looks for numerical constants, variables and addresses and assembles these into the instruction. Additionally, it looks for labels that identify subroutine addresses for jumps and includes these in the code.

A relatively simple program: Mnemonics in - machine language out

Another Option

However there might be a different way to do this - in a more interactive nature - and this is where Forth or one of it's near cousins will come in. For the purpose of my exploration, I want to see if the SIMPL interpreter can be used as a means to perform this assembly step.

As we know, the SIMPL interpreter will read a series of characters from a buffer, one at a time, and execute code associated with that character.

So to add 2 numbers (say 45 and 63) in SIMPL and print out their sum

45 63+p

45 is interpreted as a literal to be pushed onto the stack The space is used in SIMPL to push the stack down 63 can then be pushed onto the stack

adds the top two members of the stack leaving the result on the top p prints the result to the serial terminal

This is almost Forth, except in SIMPL there is only a limited stack structure and the space is needed to command the stack to push down to accept another number.

Instead of executing the SIMPL instructions directly, we can hijack the SIMPL interpreter to synthesise the assembly language and the machine language needed for the virtual machine.

So 45 63+p is translated to assembly language

LIT 45 LIT 63 ADD CALL PRINT

Where PRINT is a subroutine that outputs the top of stack contents as a printed integer to the serial port

But the translation to standard traditional assembly language with 3 letter mnemonics is an un-required middle step. The SIMPL interpreter can easily produce the instruction fields and generate the machine language directly:

802D LIT 45 803F LIT 63 6200 ADD 5100 CALL 100 (PRINT is at 0x0100)

So by this process of translation, the SIMPL language is the Assembly Language - we can find enough of the SIMPL character set to map directly onto the J1 instruction set, and any of the other command characters (like p) will invoke calls to subroutines.

It might be remembered that SIMPL uses small letters and punctuation characters as primitives, numbers are enumerated as literals and capitals are reserved for the application words. This means that our little language can have approximately 60 primitive instructions, which is enough to do real work, yet takes up a fraction of the space used by the 170 words used in a typical Forth system. Less words, less typing, yet still more readable than assembly language or machine code.

So lets look at the primitive words and how it might map onto the assembly language

```
ADD
```

```
SUB
```

```
MUL
```

/ DIV % MOD & AND | IOR (Inclusive OR) ^ XOR ~ INV

LIT

@ FETCH ! STORE

: CALL ; RET

< LT = EQ

GT

j JMP z JPZ

I find Stack manipulations are never easy to remember in Forth - words like DUP, OVER, SWAP, NIP, DROP etc but as these are an essential part of the language, they need to have a syntax to code them into the assembly language. Perhaps it might be possible to express the top two stack items as a two member array between parenthesis.

DUP (1,1) OVER (2,1) SWAP (2,1) NIP (x,y) DROP (2,3)

Text Editor and Assembler

In order to start writing code for our new processor, we need a few simple tools to help us. We need a means of entering and displaying text - usually a serial terminal interface, so that we can enter machine instructions into memory and have the virtual machine execute them.

In the early days of microprocessors development kits there was often a hexadecimal keypad, 7 segment display for address and data, and a monitor program. This allowed the machine instructions to be entered directly into memory, and then the means to run code from a certain address. It was quite primitive, frustrating and subject to a lot of human typing mistakes. Nowadays, we can use the power and resources of a laptop to help get new systems up and running.

The first tool we need is a simple text editor. This will take in text from the keyboard, display it in a terminal window and allow basic editing, listing and text file storage and retrieval.

Secondly, no-one wants to code in machine language, so a very basic assembler that converts text mnemonics and numbers - one a line by line basis - to machine language would also be an asset.

For this we need simple text parsing - and in traditional Forth this was done by looking for individual words or numbers separated by spaces.

Forth would take each new word and look it up in the dictionary for a match. The match was based on the first three characters of the word, and it's length. This is quick to do and suitable for most purposes.

Numerical input, which includes integers up to 65535 were generally not stored in the dictionary, and would be converted from ASCII to integer and then placed on the stack.

The text editor and assembler can exist as one program - as they share a lot of features. Principally we will need the means to parse through the entered text looking for keywords and numbers. As there are only 20 or so keywords required to implement the instruction set of the virtual machine, the task of programming these into the assembler is not too difficult.

I have chosen mnemonics that simplify this task - the first 2 characters are unique to each mnemonic, and no mnemonic is more than 4 characters long. We can combine the first two characters to produce a unique number, and then use this number in a series of Switch-Case statements to perform the action needed.

The 16 bit virtual machine can handle integer numbers up to 65535. We need a means of detecting a number within the entered text string, and converting the ASCII characters to an integer. In a similar way to how we uniquely defined the mnemonics by the first 2 characters, we can do a quick test on the string to see if it is a number.

The assembler will convert our inputed mnemonics on a one by one basis into the machine instructions of the virtual machine. More detail on how the assembler should operate is outlined in Chapter 6 of "TECS".

Sometimes it is easier working in binary or hexadecimal, so additional assembler directives, for example BIN, HEX and DEC, could be used to instruct the assembler which base to use to interpret the numerical strings.

Assemblers make use of other directives such as ORG, and labels, to refer to points in the listing. Assemblers can be single pass or two-pass. A single pass assembler will require you to keep track of your own labels, which can be quite difficult if the assembly listing is rich in subroutines. So this is possibly one reason why Forth evolved as a language, it has it's roots in assembly language programming, but the Forth system of "words", provided an efficient and automated means of keeping track of labels and subroutines - it had to, as Forth is composed almost entirely of subroutines.

Charles Moore, created Forth to be an efficient and automated way of creating a common programming environment on each of the wide variety of mainframe machines that he encountered during the 1960s-70s. His virtual machine had to be first coded in the native machine language of the processor, but with the availability of C compilers, the VM can be coded in C.

Moore realised that the tasks of text editor, assembler, compiler and interactive interpreter could be bundled up into one efficient package, which he called Forth. How exactly this was done was at first fairly obscure, and Moore initially kept much of it to himself, before branching out around 1970 and sharing the inner secrets of Forth within the Radio Astronomy community.

If the primitive assembler can only accept valid mnemonics and numbers, then any other unrecognised text string could be considered to be a new word. A word that cannot be found in the dictionary, is treated as a new definition and is composed from the primitive instructions.

This is similar to the use of a macro label within a (two pass) assembler listing. Once a macro has been encountered and given an address, it can be used again.

So the combination of a simple text editor and line assembler will help us to build up the various Forth word definitions from the primitives. Whilst this C program is not a complete Forth system, it is a tool that helps us create the Forth system.