Physical Phase 0 Level B - adesutherland/CREXX GitHub Wiki
Page Status: This page is work in progress, incomplete, inconsistent and full of errors ... read no further!
In this phase the grammars are implemented by an RE2C-based lexer (stage 1) that drives the Lemon LALR parser (stage 2); finally, the intermediate AST tree is fixed up in stage 3.
Key Principles are:
- Parsing errors are added to the AST tree for later reporting. This approach is designed to ensure that messages can be tailored and made as useful as possible.
- The lexer will be as stateless as possible, and will simply and quickly tokenise the stream. However, it does use state to handle recursive (nested) comments.
- The parser is LALR with only a single token of lookahead: fast, but with limitations. We will create an intermediate AST tree that can be corrected in stage 3.
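The first principle, recording errors in the tree rather than reporting them immediately, can be sketched in C as follows. This is a minimal illustration only: the node kinds, the `AstNode` layout, and the helpers `ast_new`, `ast_add_child`, `ast_count_errors`, and `ast_free` are hypothetical names invented for this example and are not taken from the CREXX source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node kinds; the real AST defines its own set. */
typedef enum { NODE_SAY, NODE_ERROR } NodeKind;

typedef struct AstNode {
    NodeKind kind;
    int line;                 /* source line of the token that produced the node */
    int column;               /* source column of that token */
    char message[64];         /* error text, only meaningful for NODE_ERROR */
    struct AstNode *child;    /* first child */
    struct AstNode *sibling;  /* next sibling */
} AstNode;

/* Allocate a zeroed node; message may be NULL for non-error nodes. */
AstNode *ast_new(NodeKind kind, int line, int column, const char *message)
{
    AstNode *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    n->kind = kind;
    n->line = line;
    n->column = column;
    if (message != NULL)
        strncpy(n->message, message, sizeof n->message - 1); /* stays NUL-terminated */
    return n;
}

/* Append child to the end of parent's child list. */
void ast_add_child(AstNode *parent, AstNode *child)
{
    AstNode **slot;
    assert(parent != NULL);
    slot = &parent->child;
    while (*slot != NULL) slot = &(*slot)->sibling;
    *slot = child;
}

/* Count error nodes in n, its siblings, and all of their descendants. */
int ast_count_errors(const AstNode *n)
{
    int count = 0;
    for (; n != NULL; n = n->sibling) {
        if (n->kind == NODE_ERROR) count++;
        count += ast_count_errors(n->child);
    }
    return count;
}

/* Free n, its siblings, and all of their descendants. */
void ast_free(AstNode *n)
{
    while (n != NULL) {
        AstNode *next = n->sibling;
        ast_free(n->child);
        free(n);
        n = next;
    }
}
```

Because the error lives in the tree with its position, a later stage can rewrite or refine the message text before anything is shown to the user.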
Processing of REXXLEVEL and REXXOPTION will be done by RE2C code in Stage 1. This will set the required lexer and parser options and also populate the AST tree under the REXX node appropriately.
For Level C, REXXLEVEL and REXXOPTION are not used; in this case Stage 0 will populate the AST tree with the appropriate REXX node.
- This stage needs to strip white space and comments, and tokenize keywords and symbols.
- In the specification, CAPITALISED rules and strings need to take whitespace into account; these are therefore processed before the whitespace is removed.
- As keywords are reserved in Level B these can be processed before symbols.
- RE2C can handle case insensitive strings.
- Token char position can be used to detect &| (Abuttal) which is handled in Stage 3
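Two of the points above, matching reserved keywords case-insensitively before falling back to a symbol, and using token character positions to detect abuttal, can be sketched in plain C. In the real lexer the keyword matching would be expressed as RE2C rules; this hand-written stand-in only illustrates the ordering and the position test. The names `classify_word`, `tokens_abut`, `TokenSpan`, and the shortened keyword table are assumptions made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TK_KEYWORD, TK_VAR_SYMBOL } TokenClass;

/* Hypothetical token extent: offset of the first character and length. */
typedef struct {
    size_t start;
    size_t length;
} TokenSpan;

/* Compare two NUL-terminated strings, ignoring ASCII case. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Keywords are reserved, so they are tried first; anything else is a symbol. */
TokenClass classify_word(const char *word)
{
    /* Shortened table for illustration; the full keyword list is longer. */
    static const char *const keywords[] = { "DO", "END", "IF", "SAY", "THEN" };
    size_t i;
    assert(word != NULL);
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (equals_ignore_case(word, keywords[i])) return TK_KEYWORD;
    }
    return TK_VAR_SYMBOL;
}

/* Two tokens abut when the right one starts exactly where the left one ends. */
int tokens_abut(TokenSpan left, TokenSpan right)
{
    return left.start + left.length == right.start;
}
```

Since whitespace tokens never reach the parser, a check like `tokens_abut` is one way a later stage could still tell `a b` from `ab`-style adjacency using only the positions carried on each token.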
Although the language specification defines the AST node types that are required, it is intentionally silent on which lexer tokens may be needed within the parser infrastructure. This is because different implementation technologies and algorithms will require different approaches. This section details the lexer tokens for this implementation.
- KW_ADDRESS
- KW_ARG
- KW_BY
- KW_CALL
- KW_DO
- KW_ELSE
- KW_END
- KW_FOR
- KW_IF
- KW_ITERATE
- KW_LEAVE
- KW_NOP
- KW_OTHERWISE
- KW_PARSE
- KW_PULL
- KW_PROCEDURE
- KW_RETURN
- KW_REXXLEVEL
- KW_REXXOPTION
- KW_SAY
- KW_THEN
- KW_TO
- KW_WHEN
- KW_UPPER
- OP_OR('|' / '&&')
- OP_AND('&')
- OP_EQ('=')
- OP_NEQ('\=', '<>' / '><')
- OP_GT('>')
- OP_LT('<')
- OP_GTE('>=' / '\<')
- OP_LTE('<=' / '\>')
- OP_S_EQ('==')
- OP_S_NEQ('\==')
- OP_S_GT('>>')
- OP_S_LT('<<')
- OP_S_GTE('>>=' / '\<<')
- OP_S_LTE('<<=' / '\>>')
- OP_CONCAT('||')
- OP_PLUS('+')
- OP_MINUS('-')
- OP_MULT('*')
- OP_DIV('/')
- OP_MODULO('mod' / '//')
- OP_IDIV('idiv' / '%')
- OP_POWER('**')
- OP_NOT('\')
These are NOT sent to the Stage 2 Parser.
- SY_COMMENT
- SY_CONTINUATION
- SY_WHITESPACE
- SY_EOS (End of Stream/File)
- SY_EOC (End of Clause ';' or EOL)
- SY_STRING
- SY_NUMBER (Integer only in Phase 0)
- SY_CONST_SYMBOL
- SY_VAR_SYMBOL
- SY_LABEL
- SY_COMMA
- SY_STOP(.)
- SY_CLOSE_BRACKET
- SY_OPEN_BRACKET
- This stage uses the Lemon-generated parser to parse the token stream into an initial intermediate AST tree for final processing in stage 3.
- The Lemon Parser features for error resyncing and token fallback are expected to assist parsing efficiency.
Key Responsibilities of this stage are:
- Validate that procedures are in the right place and reorganise the AST
- Error message fixes:
- 31.2 -> 31.3
- Constant Table Build
- Symbol Table Build and Validation
- Expression Type Safety
- Iterate / Leave: check that these appear in loops only
- If / Then / Else Fixup
- Jumps (Signal) validation
- Error Message Tuning
- Abuttal operator fixup
- Other LALR issues fixup
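As an illustration of one of these responsibilities, the Iterate / Leave check can be written as a recursive walk that tracks how many loops enclose the current node. This is a sketch under stated assumptions, not the CREXX implementation: the `Node` layout, the `Kind` values, and `count_misplaced_jumps` are hypothetical, and every DO node is treated as a loop, which a real check would need to refine if simple (non-repetitive) DO groups are represented by the same node type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds, for illustration only. */
typedef enum { N_PROGRAM, N_DO, N_ITERATE, N_LEAVE, N_SAY } Kind;

typedef struct Node {
    Kind kind;
    struct Node *child;    /* first child */
    struct Node *sibling;  /* next sibling */
} Node;

/* Walk n and its siblings; loop_depth is the number of enclosing DO nodes. */
static int check_level(const Node *n, int loop_depth)
{
    int bad = 0;
    for (; n != NULL; n = n->sibling) {
        if ((n->kind == N_ITERATE || n->kind == N_LEAVE) && loop_depth == 0)
            bad++;  /* ITERATE or LEAVE with no enclosing loop */
        /* Children of a DO node are one loop level deeper. */
        bad += check_level(n->child, loop_depth + (n->kind == N_DO ? 1 : 0));
    }
    return bad;
}

/* Return the number of ITERATE / LEAVE nodes found outside any loop. */
int count_misplaced_jumps(const Node *root)
{
    assert(root != NULL);
    return check_level(root, 0);
}
```

In keeping with the principle of storing errors in the tree, a fuller version would presumably attach an error node at each offending ITERATE or LEAVE instead of only counting them.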
After stage 3 we can move to stage 5 (Assembler Production).
In Phase 1+ we will have stage 4 (Optimisation).