Compiler Architecture - NetLogo/NetLogo GitHub Wiki

See also Engine Architecture and Package Guide, since the design of the compiler and the design of the engine are of course intertwined.

Compiler Versions

When thinking about the NetLogo compiler, it helps to think of it as three separate compilers which share a number of components. The three compilers are:

The Desktop compiler: designed to be run in the front-end GUI, supports all desktop extensions
The Headless compiler: a stripped-down JVM-only compiler designed to support and validate NetLogo Web
The NetLogo Web compiler: A NetLogo-to-Javascript compiler written in scala.js and designed to be run in-browser.

The compiler infrastructure can be divided into several distinct components which will make it helpful to compare the differences between these versions:

Main: The top level of the compiler which coordinates all other activities
Lex: The process of turning a character stream into syntactic tokens
Parse: The process of turning a stream of tokens into an AST
AST Validation: Inspecting the AST to ensure that it represents a valid NetLogo program
AST Conversion: Connecting parsed primitives with their runtime representation
AST Transformation: Inspecting, optimizing, and transforming the AST for use in a particular runtime
Last-Line Optimization: Optimizations which mutate the AST substantially and may make other optimizations impossible (such as constant folding)
Code Generation ("Back end"): Turning the AST into JVM or Javascript code

We can make a table comparing which classes / packages implement which functionality in each version of the compiler. N/A indicates that a given compiler phase is skipped for that compiler.

Phase \ Compiler	Desktop	Headless	Web
Main	org.nlogo.compile.CompilerMain	org.nlogo.compile.Compiler	org.nlogo.tortoise.Compiler
Lex	org.nlogo.lex	org.nlogo.lex	org.nlogo.lex
Parse	org.nlogo.parse	org.nlogo.parse	org.nlogo.parse
AST Validation	org.nlogo.parse	org.nlogo.parse	org.nlogo.parse
AST Conversion	org.nlogo.compile.middle.FrontMiddleBridge	org.nlogo.compile.middle.FrontMiddleBridge	N/A
AST Transformation	org.nlogo.compile.middle.MiddleEnd	org.nlogo.compile.middle.MiddleEnd	N/A
Last-Line Optimization	org.nlogo.compile	org.nlogo.compile.back	N/A
Code Generation	org.nlogo.compile / org.nlogo.generate	org.nlogo.compile.back / org.nlogo.generate	org.nlogo.tortoise.{ Handler, Prims }

Main entry points

The primary class is Compiler, which implements CompilerInterface. The main entry points are compileProgram() and compileMoreCode(). These correspond to two different modes of operation: we're either compiling the entire Code tab, or we're compiling an additional snippet of code (e.g. code in a button, or typed into the Command Center) which is compiled with reference to an already-compiled Code tab.

If compilation fails, a CompilerException is raised.

If compilation succeeds, the result of compilation is a CompilerResults object:

case class CompilerResults(procedures: Seq[Procedure], program: Program)

In the case of compileMoreCode(), procedures will contain only a single Procedure.

Other entry points

Some additional entry points support things like syntax highlighting, syntax checking, etc.

Phases

Compilation proceeds in phases. Each phase is implemented by a different class. If you read through Compiler.compile(), you will see each phase in order. The phases are: Tokenizer, StructureParser, IdentifierParser, ExpressionParser, Visitors, Optimizer, TypeParser, ArgumentStuffer, Assembler, Generator.

The phases may informally be divided into three groups: front end, middle end, and back end. The front end does parsing and some semantic analysis; the middle end does AST rewriting and further semantic analysis; the back end does linearization of control structures and code generation.

This diagram illustrates the overall structure of compilation:

(See the diagram legend below for how to read this diagram)

The phases are described individually below.

Front End: Parse

We can subdivide the front-end into two subphases - parsing and transformations. We will cover these in order.

StructureParser.parseSources

input: String containing program text
output: (StructureResults, Seq[FrontEndProcedure])

StructureParser uses Tokenizer to convert program text into Tokens, then uses scala parser-combinators to determine program declarations. The information in the declarations (such as globals, breeds, etc.) is stored in the new Program that is returned as a part of StructureResults. Procedures are created, and the tokens in each procedure body are stored in a map. Also creates new FrontEndProcedures containing name, arguments, and whether the procedure is a reporter; the tokens in each procedure body are kept in an Iterable[Token] available in a Map on the returned StructureResults.

StructureParser.usedNames

input: StructureResults, ProceduresMap containing old procedures
output: SymbolTable

usedNames produces a global symbol table including names of primitives, breeds, agent variables, and procedure names. This is used by ExpressionParser to determine whether a symbol is undefined. A nice refactor in the future would be to have SymbolTable hold references to the appropriate primitive or agent variable instead of having TransformableTokenStream use TokenMapper through Namer.

ExpressionParser

input: A TransformableTokenStream of named tokens constituting a single procedure body and a SymbolTable
output: A ProcedureDefinition AST object

ExpressionParser is responsible for taking a stream of tokens and turning it into a core AST. The ProcedureDefinition returned is the primary output of the FrontEnd of the compiler.

Front End: Transformations

The Compiler Front-End handles parsing, but also performs several checks and transformations on the resulting AST. These are described below.

LetReducer

input: ProcedureDefinition
output: ProcedureDefinition

Removes a special _letname primitive used only during parsing from the AST.

CarefullyVisitor

input: ProcedureDefinition
output: ProcedureDefinition

Connects the Let set by a carefully primitive on error with the error-message primitive used to access the value of that Let.

ClosureTagger

input: ProcedureDefinition
output: ProcedureDefinition

Determines the variables that anonymous commands and reporters close over and attaches those to the respective _commandlambda and _reporterlambda primitives.

SourceTagger

input: ProcedureDefinition
output: ProcedureDefinition

Sets the source for each _commandlambda and _reporterlambda primitive.

LetVerifier

input: ProcedureDefinition
output: ProcedureDefinition

Forbids statments of the form let x x.

AgentTypeChecker

input: Seq[ProcedureDefinition]
output: mutates FrontEndProcedure attached to ASTs passed in and core prims contained in the AST

Iterates through procedures and blocks determining their agent-class (which agents can they be run by). Attempts to constain each block and procedure as tightly as possible. Raises errors if there are any blocks which cannot be run by any type of agent (for instance [ show end1 pcolor ]).

ControlFlowVerifier

input ProcedureDefinition
output: raises exception if invalid

Validates that the following constraints hold:

stop may not used inside a reporter
report may not be used inside a command or an anonymous command
report must appear immediately inside a reporter and not inside an ask inside a reporter.

Front-Middle Bridge

This part of compilation turns core.ProcedureDefinitions with core primitives into nvm.ProcedureDefinitions with nvm primitives. It also turns FrontEndProcedures into nvm.Procedures (unless they are already those).

Middle end phases

The Middle-End of the compiler is all about platform-specific transformations. Note that all of these transformations are desktop-only, so they are not yet implemented in NetLogo Web. Initially, all of the stages in the middle end mutated the AST instead of producing a new one. While most of the stages in the middle end still do mutate the AST, it is strongly recommended that new changes in the middle end be implemented as transformations on the AST instead of mutations.

LambdaLift

input: ProcedureDefinition
output: Seq[ProcedureDefinition]

Lambda lifting turns all anonymous commands into their own procedures. This is important because it enables further compilation and optimization for each of those procedures. Several of the other stages in middle end optimize by turning let-variables (stored in nvm.Binding) into local variables (stored in an array on nvm.Activation). Lifting command-lambdas into procedures makes them eligible for these types of optimizations.

ReferenceTransformer

input: ProcedureDefinition
output: ProcedureDefinition

Referencer primitives are primitives which accept the name of an agent variable as an argument and operate on that named variable without getting its value from a particular agent. The canonical example is the diffuse primitive, although uphill and downhill primitives also have this behavior, as do certain primitives in the GIS extension. ReferenceTransformer finds these primitives in the AST, turns the appropriate argument into an agent type and variable number, combines that information with the existing Referencer primitive to form a new primitive to replace the original, and removes the reference argument from the AST. ReferenceTransformer also affects extension primitives which accept arguments of ReferenceType. In the case of extension primitives it works slightly differently: it replaces the argument primitive, turning it from an _agentvariable primitive into a constant primitive which returns the agent type, variable number, and name.

ScopeTransformer

input: ProcedureDefinition
output: ProcedureDefinition

SourceTagger

input: ProcedureDefinition
side effects: mutates ASTs

Walks through the AST and tags each primitive with its source code. Primitives have two source fields: source and fullSource. source is given by the user text found in the appropriate SourceLocation (this is the name of the primitive as the user typed it, like "pcolor" or "PI"). fullSource is given by the user text of a primitive and the text of all of its arguments, including any blocks or anonymous procedures. This information is most commonly used to provide effective errors to users.

SimpleOfVisitor

input: ProcedureDefinition
side effects: mutates ASTs

Converts _of(_*variable) to _*variableof.

LambdaVariableVisitor

input: ProcedureDefinition
side effects: mutates ASTs

LocalsVisitor

input: ProcedureDefinition
side effects: mutates ASTs

Changes let variables into Procedure-level local variables (slots in the Activation) where possible.

RepeatVisitor

input: ProcedureDefinition
side effects: mutates ASTs

SetVisitor

input: ProcedureDefinition
side effects: mutates ASTs

Make _set calls more specific, e.g. _setobservervariable.

Optimizer

input: ProcedureDefinition
side effects: mutates ASTs

Optimizer

This is just a ninth visitor, but it's significant enough to deserve treatment as a phase unto itself.

input: ProcedureDefinition
side effects: mutates ASTs

Rewrites parts of the parse tree into forms which should run faster.

For example the combination of _any and _with is rewritten to a call to _anywith which exits early once an example is found.

Another example is the replacing of _with(_patches,_equal(_patchvariable:PXCOR,_constdouble:5.0)) with _patchcolumn:5.

Each optimization is a subclass of compile.api.ReporterMunger or compile.api.CommandMunger.

A full list of optimizations and how they are performed can be found in Optimization List. Note that Desktop and GUI use slightly different sets of Optimizations which can result in slightly different numerical results.

Back end phases

Constant Folder

input: ProcedureDefinition
side effects: mutates Pure Instructions into constants

Prform computations at compile time (see wikipedia article).

ArgumentStuffer

input: ProcedureDefinition
side effects: mutates args arrays in Instructions

Fills the args arrays, in all of the Instructions anywhere in the Procedure, with Reporters.

No real action happens here. We're basically just discarding the "scaffolding" of the tree of AstNodes, leaving just the Command and Reporter objects themselves.

Assembler

input: ProcedureDefinition
side effect: stores Array[Command] in code field of Procedure

Mainly we're just discarding the "scaffolding" of the tree of AstNodes, leaving just the Command and Reporter objects themselves.

Exception: optimization of some types of tail recursion happens here, though. (Perhaps that could be reimplemented as a Visitor?)

Most commands are assembled generically, but some commands are “custom assembled”. Typically these are control structures such as ask and while. Each custom assembled command has an assemble() method which directs its own assembly via an AssemblerAssistant.

Generator

input: Procedure
side effects: alters/replaces Command and Reporter objects in the procedure's code field

Translates some commands and reporters directly into JVM byte code. Replaces Command and Reporter instances with instances of built-on-the-fly custom subclasses of GeneratedCommand and GeneratedReporter.

Optional phase, can be disabled.

Compiler Architecture - NetLogo/NetLogo GitHub Wiki

Compiler Versions