Grammar Multiplicity - RopleyIT/GLRParser GitHub Wiki
Rules in a grammar consist of a token to the left of a
colon, followed by a list of tokens with optional guards applied to them on
the right. Sometimes we wish to express that the same token can occur
optionally, or a number of times. Typically in parser input syntax this has
been done by writing several rules, some of them recursive. For example, if
we wanted to write a rule in which the token optToken
was optional, we might
end up writing a set of rules as follows:
rule: precedingTokens optionalToken followingTokens ;
optionalToken :
optToken
|
;
The action functions or inline action code blocks for the three rules would need to be structured so that the top rule received back, say, a null object from the third (empty) rule, and valid data from the second rule.
Similarly, if we wanted to implement a list of zero to many instances of a token, we would write a pair of child rules using recursion to implement the list:
rule: precedingTokens optionalTokenList followingTokens ;
optionalTokenList :
optionalTokenList optToken
|
;
The recursive rule allows reductions to accept a list of zero or more consecutive
optToken
elements. The action functions or inline action code would typically arrange that
$$
contains
a list of the resulting values from each of the optToken
elements. The list would then arrive
in the parent rule as a list whose length and content could be interrogated in the parent
rule's action code.
A similar construct can be used for a list of one to many instances of an element:
rule: precedingTokens tokenList followingTokens ;
tokenList :
tokenList optToken
| optToken
;
ParseLR grammars have an abbreviated construct that allows optional, optional lists, and non-empty lists of tokens to be directly represented in a grammar rule. Internally, the parser generator creates instances of the rules as shown above.
To represent differing multiplicity on tokens in a grammar rule, the tokens are followed by a multiplicity symbol. The symbol used is consistent with regular expressions, and with other parsers that have similar capability:
SYMBOL USED | MULTIPLICITY |
'?' | Optional token (zero or one occurrences) |
'*' | Optional list of tokens (zero through many occurrences) |
'+' | Non-empty list of tokens (one through many occurrences) |
For example, if we wanted to write a rule in which the first right hand side token was optional, the second was an optional list, and the third a non-empty list, we might write the rule in the form:
rule: first? second* third+ ;
Whenever a token is followed by a multiplicity symbol, the token's $N
value as used
in the action code on rule reduction is of type List<object>
. The elements in
this list are the values from each of the instances of the multiple tokens within the
sequence. In particular, empty lists still return a List<object>
, but with zero
elements in it. Similarly the single instance in a
'?'
multiplicity is still returned
as a single element within a list of objects. Your action code should be written
accordingly.
An example of a rule accessing the individual elements in a list is shown below:
rule:
child*
{
List<object> children = $0 as List<object>;
foreach(object o in children)
DoSomethingWithString((string)o);
}
;
child:
someTokens
{
$$ = SomethingThatGetsAStringFrom($0);
}
;
The extraction and casting to actual type of each of the arguments in a
List<object>
is tedious
and error prone. There is therefore a generic method provided in the
Parsing.Parser
and in the Parsing.GeneralisedParser
base classes with signature
public IList<T> AsList<T>(object
arg);
that takes
one of the token values as the parameter to its constructor, and interprets it as an
IList<T>
for
you. Since the action code is either written inline in the grammar, or within the source code for a partial
parser class, the Parsing
namespace is typically already included by a
using
statement at the top of the
source file. An example of its use is shown below:
rule:
child*
{
IList<string> children = AsList<string>($0);
foreach(string s in children)
DoSomethingWithString(s);
}
;
child:
someTokens
{
$$ = SomethingThatGetsAStringFrom($0);
}
;
Similarly for optional tokens (those using the '?' as their multiplicity indicator) it is
not intuitive to treat these as a collection using the AsList<T>
helper
method. There is an interface in the Parsing
namespace called
IOptional<T>
that contains the boolean property HasValue
and
the generic property T Value
. This allows the writer of action code in
grammars to use a helper function AsOptional<T>($N)
to return an
instance of this interface. An example of its use appears below:
rule:
child?
{
IOptional<string> child = AsOptional<string>($0);
if(child.HasValue)
DoSomethingWithString(child.Value);
}
;
child:
someTokens
{
$$ = SomethingThatGetsAStringFrom($0);
}
;
A distinguishing feature of ParseLR is that each token, terminal or non-terminal, can have a guard condition associated with it when it appears to the right hand side of a grammar rule. In order for the token to be shifted, or for a rule to be reduced, the input token must match and the guard condition evaluated at that point in time must be true. Guard conditions can be used in conjunction with multiplicity symbols too.
What is interesting is that the multiplicity can be
associated with one of two guard conditions. In a rule element with
multiplicity applied, such as token*
for example, a guard
condition might need to be evaluated for each successive instance of
token
. A guard condition might also be needed that applies to the
whole list once all its components have been recognised, and could even be
the guard condition that determines when the list is finished. To capture
this, guard conditions may be placed before and/or after the multiplicity
symbol. Consider the following example:
rule:
rule : earlierTokens token[GuardOnEach]*[GuardOnWholeList] laterTokens ;
In this rule, GuardOnEach
is tested for each instance of token
for it to be
recognised. GuardOnWholeList
must be true for the end of the list. In
practice, the parser generator expands this rule internally as follows:
rule:
rule : earlierTokens zeroToMany_token_GuardOnEach[GuardOnWholeList] laterTokens ;
zeroToMany_token_GuardOnEach : zeroToMany_token_GuardOnEach token[GuardOnEach]
|
;
Similar expansions handle the multiplicities '+' and '?'.