Guarded Tokens - RopleyIT/GLRParser GitHub Wiki
In traditional parsers, terminal and non-terminal tokens were the only type of element used for describing grammar rules. However, parser generators create complex finite state machines that implement/enforce the grammar rules when parsing a stream of terminal input tokens. We can also use the grammar description file therefore to describe a state machine directly rather than use it to describe grammars for languages.
When doing this, we tend to use stylised rules that obey a particular format. The non-terminal token to the left side of a rule represents a state our state machine will jump to. The first non-terminal on the right side of the rule is the state we might transit from. The second token on the right side of the rule is a terminal token representing the input event that will cause the transition between these state.
In conventional state machines, transitions between states are caused by events that are valid input events for the starting state. However, a common extension is for the transition to only be fired if the event occurs, and an associated guard condition is true at the time the event fires. The parser generator used in the off-line parser (parselr.exe) and the inline parser support the specification of a boolean guard expression attached to any token, be it terminal or non-terminal.
A guard expression is always enclosed in square brackets, and consists of guard function names and the boolean operators for not (!), and (&) and or (|). Parentheses can be included in the boolean expressions to make ultra clear the expected precedence of the boolean operations. Hence the following grammar rule, representing a state transition accompanied by a guard expression, has a sensible meaning:
Attached : Unattached MEETSPARTNER[(!Married) & ((Tall & Dark & Handsome) | Rich)];
As standard precedence is enforced, with the not (!) operator being higher precedence than and (&), and both being higher precedence than or (|), the following guard has the same meaning, albeit slightly less readable:
Attached : Unattached MEETSPARTNER[!Married & (Tall & Dark & Handsome | Rich)];
Warning: a close inspection of what this guard condition implies may slightly offend your scruples! Note also that using an LR(1) grammar in this way to write a simple finite state machine is not particularly efficient. The ParseLR toolkit has an alternative, much more efficient technique for creating simple finite state machines that is described elsewhere.
It is also possible to place guard expressions on non-terminal symbols in grammar rules. The meaning of this is that the guard condition is applied to the last terminal symbol in any descendant rules from the non-terminal with the guard after it. Hence in the following set of rules where lower-case tokens represent non-terminals, and upper-case tokens represent terminals:
w : x y[GUARD];
y : T
| V
| // Empty rule
;
x : R S;
the guard maps back onto the last terminal tokens of the rules for x and y as rewritten below. Note that the empty rule for y also allows the guard to map back further onto preceding tokens for y:
w : x y1
| x1 y0
;
y1 : T[GUARD]
| V[GUARD]
;
y0 : // Empty rule
;
x1 : R S[GUARD]
;
The meaning for guards on non-terminals is as follows. The reduction to a non-terminal that has a guard on it will only take place if all descendent rules from that non-terminal have the guard condition evaluating to true immediately before the reduction takes place. If that guard condition is not true, the reduction will not be applied. This means that either a parsing error will occur because the input token and guard stream does not match the grammar, or if another rule with different guard conditions does match the input stream, that would be reduced successfully instead.
In particular, note that applying a guard to the first token in a grammar rule, where that token is non-terminal and has the empty rule among its own rules, will cause the guard to be propagated back to any parent rule in which the grammar rule's left-hand token appears. For example, the following grammar:
a : T b;
b : c[GUARD];
c : U V
| // Empty rule
;
would be rewritten internally by the parser so that the parent rule for non-terminal 'a' would be affected in the following way:
a : T b1
| T[GUARD] b0;
b1 : c1;
b0 : c0;
c1 : U V[GUARD];
c0 : // Empty rule
;
An interesting (though not necessarily useful) consequence of the propagation back of guard conditions is that under some circumstances it is possible for the first thing that happens in an entire parse of a grammar to be a test of a guard condition. This can occur if the top-level grammar non-terminal has as a descendant of its first token an empty rule, and the same descendant somewhere in the grammar has a guard condition applied to it. The parsers created by this parser generator are able to cope with this should it occur.
There may be places in the grammar where the parser is faced with two alternative parsing paths that differ only in the guard condition applied to the next token. More specifically the two paths branch on recognition of the same next token, but that token appears in the current state with two different guard conditions, or one with a guard and the other without.
When this is encountered, the parser generator evaluates the guard conditions on both tokens, and applies the following rules to determine which to test for first when deciding which rule to apply:
- If one token has a guard and the other does not, the token with the guard is checked for first. Hence the transition with no guard is actually only taken when he guard on the first rule is not true;
- If guard condition C1 is always true when guard condition C2 is true, but not vice versa, C1 is said to be a subset of C2. In this case, the token with guard C1 is tested for first. This has the effect of making the real condition for the token with a guard C2 applied to be C2 AND NOT C1;
- In all other cases, the rules with like tokens are ordered by the Hamming weight of the boolean function's truth table. This means that a boolean function for which the output truth table has N true values will be evaluated later than a boolean function with M true values if N is greater than M. In reality, this is the only condition that is evaluated when ordering the guard conditions, but results in the behaviour described in the above two rules. Note that in this case, a warning is generated by the parser generator, because the developer of the grammar ought to inspect the resulting parser behaviour to satisfy themselves that the transitions are being evaluated in the expected order.