GLRParser Grammar Syntax - RopleyIT/GLRParser GitHub Wiki
In this section we shall describe the syntax for the input grammar files that are to be converted into a parser finite state machine for recognising input token sequences that conform to the grammar. As the grammar syntax is fairly extensive, it is described over several sections.
The basic model is that you write a formal grammar specification and store it in a file with a '.g' extension. The parselr command-line program is then run with your grammar file as an argument. This generates a new file containing C# source code. That C# source file, along with a separately written C# source file containing an input tokeniser, is added to the project in Visual Studio or whichever other C# development environment you are using. References to the Parsing.dll and ParserGenerator.dll libraries are added to the project, and the project is compiled.
The process of constructing an input tokeniser is described elsewhere. Here we focus on the contents of the input grammar file.
The input grammar file contains between two and four discrete sections. These are, in order:
- The `options` section. This contains a list of parser options that are either used by the parser itself to modify its behaviour, or passed on as flags to the compiled output code. This section can be omitted if you are happy to accept the defaults, or if you are generating an in-line parser at runtime.
- The `events` or `tokens` section. This mandatory section contains the list of the different input token types that the input tokeniser may return. For example, if you have written a parser that reads a C# program, your input tokens might include a token representing the keyword `public` in the input stream, or a token representing the operator `+=` in the input stream.
- The `guards` or `conditions` section. This section is optional. Conventional grammars do not impose guard conditions on tokens, so this section is often omitted.
- The `grammar` section. This mandatory section contains all the grammar rules, and is usually by far the most complicated section of the grammar to write. The full syntax for writing it is given elsewhere in this set of documentation.
All sections begin with a keyword as given in the list of section descriptions above. Some section types have two keywords that are aliases for each other, namely `events` or `tokens`, and `guards` or `conditions`. The `grammar` keyword takes an argument in parentheses: the name of the top-level rule that must have been parsed for the grammar to be recognised as complete. All sections have bodies enclosed in curly braces.
```
options
{
    ... options go here ...
}
events
{
    ... events or tokens go here ...
}
guards
{
    ... guard condition function names here ...
}
grammar(rootSymbol)
{
    ... grammar description goes here ...
}
```
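
The layout rules described above (section keywords and their aliases, the fixed order, the two mandatory sections, and the parenthesised argument to `grammar`) can be sketched as a small structural check. This is not part of GLRParser and says nothing about how parselr itself reads a grammar file. It is an illustration written in Python, the names `split_sections` and `check_layout` are hypothetical, and it assumes that section bodies contain only balanced curly braces, ignoring any comments or string literals the real syntax may allow.

```python
import re

# Section keywords taken from the text above, mapped to a canonical name.
# 'tokens' and 'conditions' are the aliases for 'events' and 'guards'.
CANONICAL = {
    "options": "options",
    "events": "events",
    "tokens": "events",
    "guards": "guards",
    "conditions": "guards",
    "grammar": "grammar",
}
ORDER = ["options", "events", "guards", "grammar"]
MANDATORY = {"events", "grammar"}

# A section header: keyword, optional "(argument)", then the opening brace.
HEADER = re.compile(r"\s*([A-Za-z_]\w*)\s*(?:\(\s*([A-Za-z_]\w*)\s*\))?\s*\{")


def split_sections(text):
    """Return (canonical_name, argument, body) for each top-level section."""
    sections = []
    pos = 0
    while text[pos:].strip():
        m = HEADER.match(text, pos)
        if not m or m.group(1) not in CANONICAL:
            raise ValueError(f"expected a section keyword at offset {pos}")
        # Scan forward to the brace that closes this section's body.
        depth, i = 1, m.end()
        while i < len(text) and depth:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth:
            raise ValueError(f"unterminated body for section '{m.group(1)}'")
        sections.append((CANONICAL[m.group(1)], m.group(2), text[m.end():i - 1]))
        pos = i
    return sections


def check_layout(text):
    """Check section order, mandatory sections and the grammar argument."""
    sections = split_sections(text)
    names = [name for name, _, _ in sections]
    if names != [n for n in ORDER if n in names]:
        raise ValueError("sections are duplicated or out of order")
    missing = MANDATORY - set(names)
    if missing:
        raise ValueError(f"missing mandatory section(s): {sorted(missing)}")
    for name, arg, _ in sections:
        if name == "grammar" and arg is None:
            raise ValueError("grammar section needs a root rule name in parentheses")
    return names
```

For example, calling `check_layout` on a file that contains `options`, `tokens` and `grammar(rootSymbol)` sections in that order returns the canonical names `options`, `events` and `grammar`, while a file that lacks an `events` or `tokens` section raises a `ValueError`.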