Grammar Tokens - RopleyIT/GLRParser GitHub Wiki

The list of tokens or input events

The tokens or events section (the two keywords are synonyms) contains the definitive list of input tokens that the parser recognises. This is the alphabet of tokens that the input tokeniser object must provide; arrival of any value not in this list causes the parser to report an error. Although the two keywords have exactly the same effect, in practice you might use tokens when constructing a traditional parser that reads input tokens from a tokeniser, and events when using the grammar to describe a state machine. In the latter case the tokeniser is really the class that provides the stream of input events to which the state machine responds.

Each token has a name and a value, but the value can be left out if you are happy for the parser generator to provide one automatically. A value you provide yourself must be an integer between 0 and 16383 inclusive. The auto-generated token values lie outside this range, so defined and auto-generated token values can be mixed within the same grammar.

Once the parser has been created from your grammar, it is possible for your tokeniser to look up the values that have been assigned to each token name. Parser generation creates a dictionary-like property in the parser factory whose name is Tokens. If you want to look up the value that was allocated to the token with name SQUIGGLE, and your parser class is named MyParser, the following code will obtain the token value:


int squiggleTokenValue = ParserFactory<MyParser>.Tokens["SQUIGGLE"];

Note that this lookup is quite inefficient if performed repeatedly while scanning input. It is usually better to look the values up once and cache them in named readonly integers, or to assign values to the tokens in the grammar that match those already used in the tokeniser.
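The caching approach can be sketched as follows. This is only an illustration: the class name CachedTokenValues and its fields are hypothetical, and the token names are borrowed from the example grammar later on this page. The constructor takes a lookup delegate rather than the Tokens property itself, so the sketch assumes nothing about the property's exact type beyond the string indexer shown above.

```csharp
using System;

// Hypothetical helper, not part of the generated parser: it looks each
// token value up once and keeps it in a readonly field for fast access.
public sealed class CachedTokenValues
{
    public readonly int Integer;
    public readonly int Identifier;
    public readonly int Plus;

    // 'lookup' is expected to wrap the generated dictionary, for example:
    //   name => ParserFactory<MyParser>.Tokens[name]
    public CachedTokenValues(Func<string, int> lookup)
    {
        if (lookup == null) throw new ArgumentNullException(nameof(lookup));

        // Each name is resolved exactly once, at construction time.
        Integer = lookup("INTEGER");
        Identifier = lookup("IDENTIFIER");
        Plus = lookup("PLUS");
    }
}
```

A tokeniser would typically build one instance at start-up, for example new CachedTokenValues(name => ParserFactory<MyParser>.Tokens[name]), and then compare against its fields while scanning.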

Entries in the tokens section of the grammar are separated by commas. There is no comma between the final entry and the closing curly brace. Each entry consists of the token name, and optionally an equals sign followed by an integer numeric value.

Example


tokens
{
	INTEGER = 1,     // Has a user-specified token value
	IDENTIFIER = 2,  // Also has a user-specified value
	PLUS,            // Will be allocated a value above 16383
	MINUS,
	TIMES,
	DIVIDE,
	LPAREN,
	RPAREN
}
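Because events is a synonym for tokens, a grammar describing a state machine can declare its inputs in the same way. The following is a minimal sketch for an imaginary door controller; the event names are made up for illustration and carry no special meaning to the parser generator.

```text
events
{
	OPEN = 1,    // Has a user-specified event value
	CLOSE = 2,   // Also has a user-specified value
	LOCK,        // Value allocated automatically
	UNLOCK,
	TIMEOUT
}
```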

Note that there are some reserved token names used internally by the parser generator, which you should not use. These are currently: EOF, SOF, ERR, and _Start.
