CREXX Physical Architecture - Phase 0 (Bootstrap) Level B

Page Status: This page is work in progress, incomplete, inconsistent and full of errors ... read no further!

In this phase the grammars are implemented by an RE2C-based lexer (stage 1) that drives the Lemon LALR parser (stage 2). Finally, the intermediate AST tree is fixed up in stage 3.

Key Principles are:

Parsing errors are added to the AST tree for later reporting. This approach is designed to ensure that messages can be tailored and made as useful as possible.
The lexer will be as stateless as possible, and will simply and quickly tokenise the stream. However, it does use state to handle recursive comments.
The parser is LALR with only a single token of lookahead: fast, but with limitations. We will create an intermediate AST tree that can be corrected in stage 3.
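
The first principle, recording errors in the tree and reporting them later, can be sketched in C as follows. This is only an illustration: the node kinds, the `AstNode` layout and the helper names (`ast_new`, `ast_add_child`, `ast_report_errors`) are assumptions made for the example and are not taken from the CREXX sources.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node kinds; the real node types come from the language specification. */
typedef enum { NODE_REXX, NODE_SAY, NODE_ERROR } NodeKind;

typedef struct AstNode {
    NodeKind kind;
    int line;                 /* source line of the token that produced the node */
    const char *text;         /* token text, or the message for NODE_ERROR */
    struct AstNode *child;    /* first child */
    struct AstNode *sibling;  /* next sibling */
} AstNode;

/* Allocate a node; aborts on allocation failure to keep the sketch short. */
static AstNode *ast_new(NodeKind kind, int line, const char *text) {
    AstNode *n = calloc(1, sizeof *n);
    if (!n) abort();
    n->kind = kind;
    n->line = line;
    n->text = text;
    return n;
}

/* Append a child at the end of the parent's child list. */
static void ast_add_child(AstNode *parent, AstNode *child) {
    AstNode **slot = &parent->child;
    while (*slot) slot = &(*slot)->sibling;
    *slot = child;
}

/* Walk the tree, print every error node and return how many were found. */
static int ast_report_errors(const AstNode *n, FILE *out) {
    int count = 0;
    for (; n; n = n->sibling) {
        if (n->kind == NODE_ERROR) {
            fprintf(out, "line %d: %s\n", n->line, n->text);
            count++;
        }
        count += ast_report_errors(n->child, out);
    }
    return count;
}
```

Because the error is just another node, a later stage can rewrite or refine its message before anything is printed.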

REXX Level B

Stage 0 - Language level & Options Processing

Processing of REXXLEVEL and REXXOPTION will be done by RE2C code in Stage 1. This will set the required lexer and parser options and also populate the AST tree under the REXX node appropriately.

For Level C, REXXLEVEL and REXXOPTION are not used; in this case Stage 0 will populate the AST tree with the appropriate REXX node.

Stage 1 - Lexer (C / RE2C)

This stage needs to strip whitespace and comments, and tokenise keywords and symbols.
In the specification, CAPITALISED rules and strings need to take whitespace into account, so these are processed before the whitespace is removed.
As keywords are reserved in Level B, they can be processed before symbols.
RE2C can handle case-insensitive strings.
The token character position can be used to detect &| (Abuttal), which is handled in Stage 3.
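
As a rough illustration of the last point, abuttal can be detected by comparing where one token ends with where the next begins. The `Token` layout and the `tokens_abut` helper below are assumptions for this sketch, not the actual CREXX structures.

```c
#include <stddef.h>

/* Hypothetical token record: the lexer stores where each token sits in the source. */
typedef struct Token {
    int type;        /* token id (illustrative) */
    size_t start;    /* offset of the token's first character */
    size_t length;   /* number of characters in the token */
} Token;

/* True when 'next' begins exactly where 'prev' ends, i.e. no whitespace
   or comment was stripped between the two tokens. */
static int tokens_abut(const Token *prev, const Token *next) {
    return prev->start + prev->length == next->start;
}
```

Keeping only offsets on the token lets the lexer drop whitespace freely while Stage 3 can still tell adjacent terms from separated ones.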

Lexer Tokens

Although the language specification defines the AST node types that are required, it is intentionally silent on which lexer tokens may be needed within the parser infrastructure. This is because different implementation technologies and algorithms will require different approaches. This section details the lexer tokens for this implementation.

Keywords

KW_ADDRESS
KW_ARG
KW_BY
KW_CALL
KW_DO
KW_ELSE
KW_END
KW_FOR
KW_IF
KW_ITERATE
KW_LEAVE
KW_NOP
KW_OTHERWISE
KW_PARSE
KW_PULL
KW_PROCEDURE
KW_RETURN
KW_REXXLEVEL
KW_REXXOPTION
KW_SAY
KW_THEN
KW_TO
KW_WHEN
KW_UPPER
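
A hand-written stand-in for case-insensitive keyword recognition might look like the sketch below. It covers only a subset of the keywords above, the numeric token values are placeholders, and `keyword_lookup` and `same_ignoring_case` are hypothetical names; in the real lexer this matching would be left to RE2C.

```c
#include <ctype.h>
#include <stddef.h>

/* Placeholder token codes for a subset of the keywords listed above. */
typedef enum { TK_NONE = 0, KW_ADDRESS, KW_ARG, KW_DO, KW_SAY } KeywordToken;

static const struct { const char *name; KeywordToken token; } keywords[] = {
    {"ADDRESS", KW_ADDRESS}, {"ARG", KW_ARG}, {"DO", KW_DO}, {"SAY", KW_SAY},
};

/* Compare two strings, ignoring ASCII case. */
static int same_ignoring_case(const char *a, const char *b) {
    while (*a && *b) {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Return the keyword token for 'word', or TK_NONE if it is not a keyword. */
static KeywordToken keyword_lookup(const char *word) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (same_ignoring_case(word, keywords[i].name)) return keywords[i].token;
    return TK_NONE;
}
```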

Operators

OP_OR('|' / '&&')
OP_AND('&')
OP_EQ('=')
OP_NEQ('\=', '<>' / '><')
OP_GT('>')
OP_LT('<')
OP_GTE('>=' / '\<')
OP_LTE('<=' / '\>')
OP_S_EQ('==')
OP_S_NEQ('\==')
OP_S_GT('>>')
OP_S_LT('<<')
OP_S_GTE('>>=' / '\<<')
OP_S_LTE('<<=' / '\>>')
OP_CONCAT('||')
OP_PLUS('+')
OP_MINUS('-')
OP_MULT('*')
OP_DIV('/')
OP_MODULO('mod' / '//')
OP_IDIV('idiv' / '%')
OP_POWER('**')
OP_NOT('\')
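
Several operators share a prefix ('>', '>=', '>>', '>>='), so the lexer must prefer the longest spelling. A generated RE2C lexer normally does this by itself; the sketch below only makes the rule explicit for four of the operators. The table, the placeholder enum values and the `match_operator` helper are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder token codes for four of the operators listed above. */
typedef enum { OP_NONE = 0, OP_GT, OP_GTE, OP_S_GT, OP_S_GTE } OpToken;

/* Longest spellings come first so that ">>=" wins over ">>" and ">". */
static const struct { const char *text; OpToken token; } ops[] = {
    {">>=", OP_S_GTE}, {">>", OP_S_GT}, {">=", OP_GTE}, {">", OP_GT},
};

/* Match an operator at the start of 'src'; store its length in *len.
   Returns OP_NONE and sets *len to 0 when nothing matches. */
static OpToken match_operator(const char *src, size_t *len) {
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        size_t n = strlen(ops[i].text);
        if (strncmp(src, ops[i].text, n) == 0) {
            *len = n;
            return ops[i].token;
        }
    }
    *len = 0;
    return OP_NONE;
}
```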

Whitespace & Comments

These are NOT sent to the Stage 2 Parser.

SY_COMMENT
SY_CONTINUATION
SY_WHITESPACE
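
The state the lexer keeps for recursive comments can be as small as a depth counter. The following is a minimal hand-written sketch, assuming `/* ... */` comment delimiters that nest; `skip_comment` is a hypothetical helper and not the RE2C rule used in the implementation.

```c
#include <stddef.h>

/* If 'src' starts with a comment, return the offset just past its matching
   close, honouring nested comments. Returns 0 if 'src' does not start with
   a comment or the comment is never closed. */
static size_t skip_comment(const char *src) {
    size_t i = 0;
    int depth = 0;

    if (src[0] != '/' || src[1] != '*') return 0;

    while (src[i]) {
        if (src[i] == '/' && src[i + 1] == '*') {
            depth++;                   /* entering a (possibly nested) comment */
            i += 2;
        } else if (src[i] == '*' && src[i + 1] == '/') {
            depth--;                   /* leaving one nesting level */
            i += 2;
            if (depth == 0) return i;  /* outermost comment closed */
        } else {
            i++;
        }
    }
    return 0;                          /* ran out of input: unterminated comment */
}
```

An unterminated comment is reported by the return value rather than by printing, which fits the principle of recording errors for later reporting.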

Other Symbols

SY_EOS (End of Stream/File)
SY_EOC (End of Clause ';' or EOL)
SY_STRING
SY_NUMBER (Integer only in Phase 0)
SY_CONST_SYMBOL
SY_VAR_SYMBOL
SY_LABEL
SY_COMMA
SY_STOP(.)
SY_CLOSE_BRACKET
SY_OPEN_BRACKET

Stage 2 - Parser (C / Lemon)

This stage uses the Lemon-generated parser to parse the token stream into an initial intermediate AST tree for final processing in stage 3.
The Lemon parser features for error resynchronisation and token fallback are expected to assist parsing efficiency.

Stage 3 - AST Fixup (C)

Key Responsibilities of this stage are:

Validate that procedures are in the right place and reorganise the AST
Error message fixes:
- 31.2 -> 31.3
Constant Table Build
Symbol Table Build and Validation
Expression Type Safety
Iterate / Leave: check that these appear in loops only
If / Then / Else Fixup
Jumps (Signal) validation
Error Message Tuning
Abuttal operator fixup
Other LALR issues fixup
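
One of the checks above, that Iterate / Leave appear only inside loops, can be sketched as a tree walk that tracks loop depth. The node kinds, the `Node` layout and `count_stray_jumps` are assumptions for this example. As a simplification, the sketch treats every DO node as a loop, which may not hold for simple DO groups.

```c
#include <stddef.h>

/* Hypothetical node kinds, just enough for the check. */
typedef enum { N_PROGRAM, N_DO, N_ITERATE, N_LEAVE, N_SAY } Kind;

typedef struct Node {
    Kind kind;
    struct Node *child;    /* first child */
    struct Node *sibling;  /* next sibling */
} Node;

/* Count ITERATE / LEAVE nodes that are not enclosed by any DO node.
   'loop_depth' is the number of DO nodes above the current position. */
static int count_stray_jumps(const Node *n, int loop_depth) {
    int bad = 0;
    for (; n; n = n->sibling) {
        if ((n->kind == N_ITERATE || n->kind == N_LEAVE) && loop_depth == 0)
            bad++;
        /* Children of a DO node are one loop level deeper. */
        bad += count_stray_jumps(n->child, loop_depth + (n->kind == N_DO));
    }
    return bad;
}
```

In a fuller version each stray node would be replaced by, or annotated with, an error node rather than simply counted.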

Post Stage 3 Processing

After Stage 3 we can move to Stage 5 (Assembler Production).

In Phase 1+ we will have Stage 4 (Optimisation).
