Compiler Architecture - NetLogo/NetLogo GitHub Wiki
See also Engine Architecture and Package Guide, since the design of the compiler and the design of the engine are of course intertwined.
When thinking about the NetLogo compiler, it helps to think of it as three separate compilers which share a number of components. The three compilers are:
- The Desktop compiler: designed to be run in the front-end GUI, supports all desktop extensions
- The Headless compiler: a stripped-down JVM-only compiler designed to support and validate NetLogo Web
- The NetLogo Web compiler: A NetLogo-to-Javascript compiler written in scala.js and designed to be run in-browser.
The compiler infrastructure can be divided into several distinct components which will make it helpful to compare the differences between these versions:
- Main: The top level of the compiler which coordinates all other activities
- Lex: The process of turning a character stream into syntactic tokens
- Parse: The process of turning a stream of tokens into an AST
- AST Validation: Inspecting the AST to ensure that it represents a valid NetLogo program
- AST Conversion: Connecting parsed primitives with their runtime representation
- AST Transformation: Inspecting, optimizing, and transforming the AST for use in a particular runtime
- Last-Line Optimization: Optimizations which mutate the AST substantially and may make other optimizations impossible (such as constant folding)
- Code Generation ("Back end"): Turning the AST into JVM or Javascript code
We can make a table comparing which classes / packages implement which functionality in each version of the compiler. N/A indicates that a given compiler phase is skipped for that compiler.
Phase \ Compiler | Desktop | Headless | Web |
---|---|---|---|
Main | org.nlogo.compile.CompilerMain | org.nlogo.compile.Compiler | org.nlogo.tortoise.Compiler |
Lex | org.nlogo.lex | org.nlogo.lex | org.nlogo.lex |
Parse | org.nlogo.parse | org.nlogo.parse | org.nlogo.parse |
AST Validation | org.nlogo.parse | org.nlogo.parse | org.nlogo.parse |
AST Conversion | org.nlogo.compile.middle.FrontMiddleBridge | org.nlogo.compile.middle.FrontMiddleBridge | N/A |
AST Transformation | org.nlogo.compile.middle.MiddleEnd | org.nlogo.compile.middle.MiddleEnd | N/A |
Last-Line Optimization | org.nlogo.compile | org.nlogo.compile.back | N/A |
Code Generation | org.nlogo.compile / org.nlogo.generate | org.nlogo.compile.back / org.nlogo.generate | org.nlogo.tortoise.{ Handler, Prims } |
The primary class is Compiler
, which implements CompilerInterface
. The main entry points are compileProgram()
and compileMoreCode()
. These correspond to two different modes of operation: we're either compiling the entire Code tab, or we're compiling an additional snippet of code (e.g. code in a button, or typed into the Command Center) which is compiled with reference to an already-compiled Code tab.
If compilation fails, a CompilerException
is raised.
If compilation succeeds, the result of compilation is a CompilerResults
object:
case class CompilerResults(procedures: Seq[Procedure], program: Program)
In the case of compileMoreCode()
, procedures
will contain only a single Procedure.
Some additional entry points support things like syntax highlighting, syntax checking, etc.
Compilation proceeds in phases. Each phase is implemented by a different class. If you read through Compiler.compile()
, you will see each phase in order. The phases are: Tokenizer, StructureParser, IdentifierParser, ExpressionParser, Visitors, Optimizer, TypeParser, ArgumentStuffer, Assembler, Generator.
The phases may informally be divided into three groups: front end, middle end, and back end. The front end does parsing and some semantic analysis; the middle end does AST rewriting and further semantic analysis; the back end does linearization of control structures and code generation.
This diagram illustrates the overall structure of compilation:
(See the diagram legend below for how to read this diagram)
The phases are described individually below.
We can subdivide the front-end into two subphases - parsing and transformations. We will cover these in order.
- input:
String
containing program text - output:
(StructureResults, Seq[FrontEndProcedure])
StructureParser uses Tokenizer
to convert program text into Tokens, then uses scala parser-combinators to determine program declarations.
The information in the declarations (such as globals, breeds, etc.) is stored in the new Program
that is returned as a part of StructureResults. Procedure
s are created, and the tokens in each procedure body are stored in a map.
Also creates new FrontEndProcedures containing name, arguments, and whether the procedure is a reporter;
the tokens in each procedure body are kept in an Iterable[Token]
available in a Map
on the returned StructureResults.
- input:
StructureResults
,ProceduresMap
containing old procedures - output:
SymbolTable
usedNames
produces a global symbol table including names of primitives, breeds, agent variables, and procedure names.
This is used by ExpressionParser
to determine whether a symbol is undefined.
A nice refactor in the future would be to have SymbolTable
hold references to the appropriate primitive or agent variable instead of having TransformableTokenStream
use TokenMapper
through Namer
.
- input: A
TransformableTokenStream
of named tokens constituting a single procedure body and aSymbolTable
- output: A
ProcedureDefinition
AST object
ExpressionParser
is responsible for taking a stream of tokens and turning it into a core AST.
The ProcedureDefinition
returned is the primary output of the FrontEnd
of the compiler.
The Compiler Front-End handles parsing, but also performs several checks and transformations on the resulting AST. These are described below.
- input:
ProcedureDefinition
- output:
ProcedureDefinition
Removes a special _letname
primitive used only during parsing from the AST.
- input:
ProcedureDefinition
- output:
ProcedureDefinition
Connects the Let
set by a carefully
primitive on error with the error-message
primitive used to access the value of that Let
.
- input:
ProcedureDefinition
- output:
ProcedureDefinition
Determines the variables that anonymous commands and reporters close over and attaches those to the respective _commandlambda
and _reporterlambda
primitives.
- input:
ProcedureDefinition
- output:
ProcedureDefinition
Sets the source for each _commandlambda
and _reporterlambda
primitive.
- input:
ProcedureDefinition
- output:
ProcedureDefinition
Forbids statments of the form let x x
.
- input:
Seq[ProcedureDefinition]
- output: mutates
FrontEndProcedure
attached to ASTs passed in and core prims contained in the AST
Iterates through procedures and blocks determining their agent-class (which agents can they be run by).
Attempts to constain each block and procedure as tightly as possible.
Raises errors if there are any blocks which cannot be run by any type of agent (for instance [ show end1 pcolor ]
).
- input
ProcedureDefinition
- output: raises exception if invalid
Validates that the following constraints hold:
-
stop
may not used inside a reporter -
report
may not be used inside a command or an anonymous command -
report
must appear immediately inside a reporter and not inside anask
inside a reporter.
This part of compilation turns core.ProcedureDefinition
s with core primitives into nvm.ProcedureDefinition
s with nvm primitives.
It also turns FrontEndProcedures
into nvm.Procedure
s (unless they are already those).
The Middle-End of the compiler is all about platform-specific transformations. Note that all of these transformations are desktop-only, so they are not yet implemented in NetLogo Web. Initially, all of the stages in the middle end mutated the AST instead of producing a new one. While most of the stages in the middle end still do mutate the AST, it is strongly recommended that new changes in the middle end be implemented as transformations on the AST instead of mutations.
- input:
ProcedureDefinition
- output:
Seq[ProcedureDefinition]
Lambda lifting turns all anonymous commands into their own procedures.
This is important because it enables further compilation and optimization for each of those procedures.
Several of the other stages in middle end optimize by turning let-variables (stored in nvm.Binding
) into local variables (stored in an array on nvm.Activation
).
Lifting command-lambdas into procedures makes them eligible for these types of optimizations.
- input:
ProcedureDefinition
- output:
ProcedureDefinition
Referencer
primitives are primitives which accept the name of an agent variable as an argument and operate on that named variable without getting its value from a particular agent.
The canonical example is the diffuse
primitive, although uphill
and downhill
primitives also have this behavior, as do certain primitives in the GIS extension.
ReferenceTransformer
finds these primitives in the AST, turns the appropriate argument into an agent type and variable number, combines that information with the existing Referencer
primitive to form a new primitive to replace the original, and removes the reference argument from the AST.
ReferenceTransformer
also affects extension primitives which accept arguments of ReferenceType
.
In the case of extension primitives it works slightly differently: it replaces the argument primitive, turning it from an _agentvariable
primitive into a constant primitive which returns the agent type, variable number, and name.
- input:
ProcedureDefinition
- output:
ProcedureDefinition
- input:
ProcedureDefinition
- side effects: mutates ASTs
Walks through the AST and tags each primitive with its source code.
Primitives have two source fields: source
and fullSource
.
source
is given by the user text found in the appropriate SourceLocation
(this is the name of the primitive as the user typed it, like "pcolor"
or "PI"
).
fullSource
is given by the user text of a primitive and the text of all of its arguments, including any blocks or anonymous procedures.
This information is most commonly used to provide effective errors to users.
- input:
ProcedureDefinition
- side effects: mutates ASTs
Converts _of(_*variable)
to _*variableof
.
- input:
ProcedureDefinition
- side effects: mutates ASTs
- input:
ProcedureDefinition
- side effects: mutates ASTs
Changes let variables into Procedure-level local variables (slots in the Activation) where possible.
- input:
ProcedureDefinition
- side effects: mutates ASTs
- input:
ProcedureDefinition
- side effects: mutates ASTs
Make _set
calls more specific, e.g. _setobservervariable
.
- input:
ProcedureDefinition
- side effects: mutates ASTs
This is just a ninth visitor, but it's significant enough to deserve treatment as a phase unto itself.
- input:
ProcedureDefinition
- side effects: mutates ASTs
Rewrites parts of the parse tree into forms which should run faster.
For example the combination of _any
and _with
is rewritten to a call to _anywith
which exits early once an example is found.
Another example is the replacing of _with(_patches,_equal(_patchvariable:PXCOR,_constdouble:5.0))
with _patchcolumn:5
.
Each optimization is a subclass of compile.api.ReporterMunger
or compile.api.CommandMunger
.
A full list of optimizations and how they are performed can be found in Optimization List. Note that Desktop and GUI use slightly different sets of Optimizations which can result in slightly different numerical results.
- input:
ProcedureDefinition
- side effects: mutates
Pure
Instructions into constants
Prform computations at compile time (see wikipedia article).
- input:
ProcedureDefinition
- side effects: mutates args arrays in Instructions
Fills the args arrays, in all of the Instructions anywhere in the Procedure, with Reporters.
No real action happens here. We're basically just discarding the "scaffolding" of the tree of AstNodes, leaving just the Command and Reporter objects themselves.
- input:
ProcedureDefinition
- side effect: stores
Array[Command]
incode
field ofProcedure
Mainly we're just discarding the "scaffolding" of the tree of AstNodes, leaving just the Command and Reporter objects themselves.
Exception: optimization of some types of tail recursion happens here, though. (Perhaps that could be reimplemented as a Visitor?)
Most commands are assembled generically, but some commands are “custom assembled”. Typically these are control structures such as ask
and while
. Each custom assembled command has an assemble()
method which directs its own assembly via an AssemblerAssistant
.
- input:
Procedure
- side effects: alters/replaces
Command
andReporter
objects in the procedure'scode
field
Translates some commands and reporters directly into JVM byte code. Replaces Command and Reporter instances with instances of built-on-the-fly custom subclasses of GeneratedCommand and GeneratedReporter.
Optional phase, can be disabled.