GLR Parser - RopleyIT/GLRParser GitHub Wiki
The parser tables used by LR(1) and GLR parsers are the same, except for two small differences. First, when constructing a set of parser tables for an LR(1) parser, the parser generator tries to identify places where shift-reduce and reduce-reduce conflicts occur, and will fail to produce a set of parser tables (and hence a parser) if these conflicts are encountered. This is not the case when constructing a GLR parser, as the GLR parsing algorithm merely clones the parser stacks when a conflict is encountered.
Second, there is a property in the IParser
interface (implemented by both the
Parser
class and the GeneralisedParser
class) called
ParserType
. This readonly
string property is set to "LR1"
for a regular lookahead parser, and to
"GLR"
for
a generalised parser.
Using ParseLR to create source code for an offline GLR parser is virtually the same as
for an LR(1) parser, which is described elsewhere.
The primary difference is that the application-specific parser class should have GeneralisedParser
as
its base class rather than Parser
. Since these two base classes both implement the
IParser
interface, and the methods and properties of that interface are all you need to
use from your parser, there is very little you have to do when writing your parser class.
The only thing you will need to be aware of is that you may have to write merge action functions to resolve those places in the grammar where two separate valid parses merge on a reduction to the same non-terminal. You can find out how to write these merge functions here.
Writing inline GLR parsers similarly is very much like writing inline LR(1) parser classes.
The main difference is that you must derive your application-specific parser class from
the GeneralisedParser
base class rather than the Parser
base class. In addition, when you use the parser factory's ParserFactory<MyParser>.InitializeFromGrammar
method, you must make sure the generalisedParser
parameter is set to true
. This will automatically
ensure that the right kind of parser is created and used at runtime to handle the input token
or event streams, and will turn off shift-reduce and reduce-reduce conflict rejection
when building the parser tables. The ParserFactory<MyParser>.CreateInstance()
method is
then used to create each instance of a generalised parser.
When a GLR parser is run on a sequence of input tokens or events, its behaviour is slightly different
to the behaviour of an LR(1) parser. As it is designed to capture every valid parse of an ambiguous grammar,
at the end of parsing you will be left with an array of parse trees, each element of which represents
a valid interpretation of the input token sequence. These parse trees are found in a property of the GeneralisedParser
class whose declaration looks like: IToken[] ParserResults;
where each IToken
is
actually an instance of the NonterminalToken
class, and is the root of a different valid parse tree.
To understand how this tree is constructed beneath a NonterminalToken
object, visit this page.
A second behavioural difference relates to the action code executed when each grammar rule is recognised
and reduced. Action code is no longer executed as soon as a rule is reduced, because there may be
other parse trees being composed at the same time, for which different rules might be recognised, and we don't yet know
which of these parses is the true one. Instead,
the blocks of action code are executed in exactly the same order as they would have been in an LR(1) parser, but
only when you retrieve the value of the ParserResults[N].Value
property after the parse
is over.
Lastly, note that a GLR parser is often used with merge action
functions to resolve those places in the grammar where two separate valid parses merge
on a reduction to the same non-terminal. The intention of these functions is to help the parser when there is
ambiguity in the input grammar, by telling the parser which of the two interpretations to drop and which to keep.
Hence, in many cases, the ParserResults
array property in the GLRParser
class
should end up with a length of one, and the only valid resulting parse tree should be the one in
ParserResults[0]
.