Using the output - Logicalshift/TameParse GitHub Wiki
One thing that differentiates TameParse from other LR parser generators is that it directly generates an Abstract Syntax Tree from the language rather than using parser actions.
When the tameparse tool is run, it will produce a header file and a C++ file. These should be compiled and linked against the TameParse library, which contains the implementation of the LR algorithm and the basic AST data structures.
For every root nonterminal called '<Root>', the header will contain a definition 'create_parser_Root'. This comes in a few variants, making it possible to receive input symbols from a variety of sources. The simplest to use just takes a standard C++ istream. It is used like this (substituting my_language and Root for the language and root nonterminal names respectively):
my_language::state* state = my_language::create_parser_Root(std::cin);
Calling the parse method will run the parser and generate the AST:
bool success = state->parse();
If success is false, then an error occurred and the parser is stopped at the point where the problem was detected. The lexeme which was rejected is available by calling state->look().item(). Lexemes have full information on their location within the stream for error reporting purposes, and the parser's full state is still available at this point, making it possible to manually recover it to a state where it can parse the remainder of the stream.
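The error path described above can be sketched as follows. This is only an illustration: mock_position, mock_lexeme and describe_rejected are hypothetical stand-ins invented for this example, not TameParse types, and the real lexeme and position classes may expose their data differently.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a lexeme's location in the stream
// (assumption: the real TameParse position type may differ).
struct mock_position {
    int line;
    int column;
    int offset;
};

// Hypothetical stand-in for the lexeme returned by state->look().item().
struct mock_lexeme {
    std::string text;
    mock_position where;
};

// Builds a diagnostic in the common "line:column: message" form
// from the lexeme that the parser rejected.
std::string describe_rejected(const mock_lexeme& lexeme) {
    std::ostringstream out;
    out << lexeme.where.line << ":" << lexeme.where.column
        << ": unexpected '" << lexeme.text << "'";
    return out.str();
}
```

In a real program, the values would presumably be filled in from the rejected lexeme after state->parse() returns false, and the resulting string written to the error stream.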
If successful, the AST will be available. This is of type my_language::Root_n, and can be retrieved by calling state->get_item().item():
const my_language::Root_n* root = static_cast<const my_language::Root_n*>(state->get_item().item());
(A wrapper class is yet to be written to make the cast unnecessary: this will be simplified in the future)
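Until such a wrapper exists, the cast could be hidden behind a small helper. The sketch below is not part of TameParse: node_as, mock_ast_node, mock_Root_n and mock_Expression_n are hypothetical names, and it assumes the AST nodes share a polymorphic base class. It also swaps the static_cast for a dynamic_cast, so that a node of the wrong type yields a null pointer instead of undefined behaviour.

```cpp
// Hypothetical stand-in for a common AST base class
// (assumption: the real base class is polymorphic).
struct mock_ast_node {
    virtual ~mock_ast_node() = default;
};

// Hypothetical stand-ins for two generated node types.
struct mock_Root_n : mock_ast_node {
    int value = 0;
};

struct mock_Expression_n : mock_ast_node {
};

// Returns the node as NodeType, or nullptr if the node is missing
// or is actually of a different type.
template <typename NodeType>
const NodeType* node_as(const mock_ast_node* node) {
    return dynamic_cast<const NodeType*>(node);
}
```

With a helper along these lines, the call site would shrink to something like node_as<my_language::Root_n>(state->get_item().item()), at the cost of a run-time type check.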
Classes generated for nonterminals have the _n suffix to avoid name clashes. The Root_n class is generated according to the language structure. A rule definition like this:
<Root> = identifier
will thus produce a Root_n class containing an identifier field. Similarly for nonterminals:
<Root> = identifier '=' <Expression>
produces a class containing an identifier and an Expression field. By default, fields are named after the item they represent, but more meaningful names can be assigned with the [] syntax:
<Root> = identifier[name] '=' <Expression>[calculation]
giving name and calculation fields instead. The calculation field will be another nonterminal, this time of type my_language::Expression_n. This works identically to the Root_n class, so it's possible to traverse the tree using the visitor pattern, or address particular parts using C++ syntax (for instance, root->calculation->left_hand_side).
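To make the field layout concrete, here is a rough sketch of what the classes for the last rule might look like, together with a plain recursive traversal. The mock_ types are hypothetical and written by hand for this example; the classes TameParse actually generates will differ in detail, and the right_hand_side field is an assumption made only so that the tree has two branches.

```cpp
#include <string>

// Hypothetical shape of a lexeme node such as identifier_n.
struct mock_identifier_n {
    std::string text;
};

// Hypothetical shape of Expression_n; right_hand_side is an assumed field.
struct mock_Expression_n {
    const mock_Expression_n* left_hand_side = nullptr;
    const mock_Expression_n* right_hand_side = nullptr;
};

// Hypothetical shape of Root_n for:
//   <Root> = identifier[name] '=' <Expression>[calculation]
struct mock_Root_n {
    const mock_identifier_n* name = nullptr;
    const mock_Expression_n* calculation = nullptr;
};

// Counts the Expression nodes reachable from the given node,
// walking both branches recursively.
int count_expressions(const mock_Expression_n* node) {
    if (node == nullptr) return 0;
    return 1 + count_expressions(node->left_hand_side)
             + count_expressions(node->right_hand_side);
}
```

A visitor would do the same walk with a callback per node type; the direct recursion is used here only because it is shorter.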
The final piece is lexemes. The identifier, name, is of type my_language::identifier_n. The lexeme that was matched can be retrieved from this by calling root->name->get_lexeme(). The identifier string itself can be retrieved from the lexeme, along with details of where it was located in the source file:
auto lexeme = root->name->get_lexeme();
std::string identifierText = root->name->content<char>(); // Convenience: the matched text, taken directly from the node
std::wstring alsoIdentifierText = lexeme->content<wchar_t>(); // Same text, but via the lexeme and as a wide (Unicode) string
auto where = lexeme->pos(); // Gives line, column and character offset where this lexeme was matched