Logical AST Specification - adesutherland/CREXX GitHub Wiki

CREXX Logical AST Specification

Page Status: This page is ready for review for Phase 0 (PoC). It may contain errors and may change based on feedback and implementation experience.

The AST specification covers all phases of the project and all REXX levels, so a single language processor (compiler backend) can be used across the project.

The initial specification handles only the Phase 0 scope and language levels A and B. The intention is to extend it later, where possible without breaking existing code (that is, in a backwards-compatible way).

The AST notation is described here, essentially:

(root child1 child2 child3) - simple tree with 3 children
(root child1 (subroot subchild1 subchild2) child3) - a tree with a subtree
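
To make the notation concrete, the following is a minimal sketch of a reader that turns it into nested Python values. The function names `tokenize` and `parse` and the `(root, [children])` tuple shape are illustrative assumptions, not part of the CREXX code base, and the sketch does not handle quoted strings that contain spaces or parentheses.

```python
def tokenize(text):
    """Split AST notation into '(', ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one node: an atom (leaf) or a parenthesised (root child...) tree.

    A leaf is returned as a plain string, a tree as (root, [children]).
    Tokens are consumed from the front of the list.
    """
    if not tokens:
        raise ValueError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == ")":
        raise ValueError("unexpected ')'")
    if tok != "(":
        return tok  # leaf node
    if not tokens or tokens[0] in ("(", ")"):
        raise ValueError("expected a root name after '('")
    root = tokens.pop(0)
    children = []
    while tokens and tokens[0] != ")":
        children.append(parse(tokens))
    if not tokens:
        raise ValueError("missing ')'")
    tokens.pop(0)  # drop the closing ')'
    return (root, children)


tree = parse(tokenize("(root child1 (subroot subchild1 subchild2) child3)"))
print(tree)
# ('root', ['child1', ('subroot', ['subchild1', 'subchild2']), 'child3'])
```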

AST Node List

Node Type     Description                    Terminal
ABS_POS       Absolute Pos for Parsing       No
ADDRESS       Address Instruction            No
ARG           Argument input for Parsing     Yes
ASSIGN        Assign Instruction             No
BY            BY part of REPEAT part of DO   No
CALL          Call Instruction               No
CONST_SYMBOL  Constant Symbol                Yes
DO            Do Instruction                 No
ENVIRONMENT   Environment for Address        Yes
ERROR         Error Marker                   No
FOR           FOR part of REPEAT part of DO  No
FUNCTION      Function                       No
IF            If Instruction                 No
INSTRUCTIONS  Instruction List               No
ITERATE       Iterate Instruction            No
LABEL         Label                          Yes
LEAVE         Leave Instruction              No
NUMBER        Number                         Yes
OP_ADD        Add/Subtraction Op             No
OP_AND        And Op                         No
OP_COMPARE    Compare Op                     No
OP_CONCAT     Concat Op                      No
OP_MULT       Multiply/divide OP             No
OP_OR         Or Op                          No
OP_POWER      Power Op                       No
OP_PREFIX     Prefix Op (+, -, \)            No
OP_SCONCAT    Concat with space Op           No
OPTIONS       Options for Parsing            No
PARSE         Parse Instruction              No
PATTERN       Pattern for Parsing            No
PROCEDURE     Procedure                      No
PROGRAM_FILE  AST Root for a file            No
PULL          Pull input for Parsing         Yes
REL_POS       Relative Pos for Parsing       No
REPEAT        Repeat part of DO loop         No
RETURN        Return Instruction             No
REXX          Language Level and Options     No
SAY           Say Instruction                No
SIGN          Rel pos direction for Parsing  Yes
STRING        String                         Yes
TARGET        Target for Parsing             No
TEMPLATES     Template List for Parsing      No
TO            TO part of REPEAT part of DO   No
TOKEN         Generic token                  Yes
UPPER         Upper Option for Parsing       Yes
VAR_SYMBOL    Variable                       Yes
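
A consumer of the AST may want to check the Terminal column mechanically. The sketch below copies the node types marked "Yes" in the list above into a set; the names `TERMINAL_NODES`, `is_terminal`, and `check_children` are hypothetical helpers introduced for illustration only.

```python
# Node types marked "Yes" in the Terminal column of the node list.
TERMINAL_NODES = frozenset({
    "ARG", "CONST_SYMBOL", "ENVIRONMENT", "LABEL", "NUMBER", "PULL",
    "SIGN", "STRING", "TOKEN", "UPPER", "VAR_SYMBOL",
})


def is_terminal(node_type):
    """Return True if the node type is a leaf according to the node list."""
    return node_type in TERMINAL_NODES


def check_children(node_type, children):
    """Reject children on a terminal node; non-terminals are not checked here."""
    if is_terminal(node_type) and children:
        raise ValueError(f"{node_type} is terminal and cannot have children")
    return True


print(is_terminal("NUMBER"), is_terminal("DO"))  # True False
```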

Format of Non-Terminal Nodes

File Scope

(PROGRAM_FILE REXX INSTRUCTIONS?) 

Language Options

(REXX level:CONST_SYMBOL options:CONST_SYMBOL*)

Error

(ERROR TOKEN+)

This node is not an actual token; instead it is inserted into the token stream, and the actual offending TOKEN(s) are added as its children.
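
As an illustration of the (ERROR TOKEN+) shape, the sketch below wraps offending token texts in an ERROR node. The helper name `make_error` and the tuple representation (non-terminals as `(type, [children])`, leaves as `(type, text)`) are assumptions made for this example, not the parser's actual data structures.

```python
def make_error(offending_tokens):
    """Build an (ERROR TOKEN+) node from one or more offending token texts."""
    if not offending_tokens:
        # TOKEN+ means at least one child is required.
        raise ValueError("ERROR needs at least one TOKEN child")
    return ("ERROR", [("TOKEN", text) for text in offending_tokens])


node = make_error(["@", "#"])
print(node)
# ('ERROR', [('TOKEN', '@'), ('TOKEN', '#')])
```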

Instructions

(INSTRUCTIONS instruction*)

Where instruction is one of:

  • ADDRESS, ASSIGN, CALL, DO, IF, INSTRUCTIONS, ITERATE, LABEL, LEAVE, PARSE, PROCEDURE, RETURN, SAY

Expressions

Expression (expr) nodes are one of:

expr <-
  (OP_OR expr expr) /
  (OP_AND expr expr) /
  (OP_COMPARE expr expr) /
  (OP_CONCAT expr expr) /
  (OP_SCONCAT expr expr) /
  (OP_ADD expr expr) /
  (OP_MULT expr expr) /
  (OP_POWER expr expr) /
  (OP_PREFIX expr) /
  (FUNCTION expr*) /
  CONST_SYMBOL /
  VAR_SYMBOL /
  NUMBER /
  STRING
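
The expr rule can be checked with a short recursive function. The following is a sketch under stated assumptions: non-terminals are represented as `(type, [children])`, leaves as `(type, text)`, and the name `is_expr` is hypothetical. The grammar above does not say how the specific operator (for example + versus -) or the function name is recorded, so the sketch checks shape only.

```python
BINARY_OPS = {"OP_OR", "OP_AND", "OP_COMPARE", "OP_CONCAT",
              "OP_SCONCAT", "OP_ADD", "OP_MULT", "OP_POWER"}
LEAVES = {"CONST_SYMBOL", "VAR_SYMBOL", "NUMBER", "STRING"}


def is_expr(node):
    """Return True if node matches the expr rule (shape only)."""
    kind, payload = node
    if kind in LEAVES:
        return isinstance(payload, str)
    if not isinstance(payload, list):
        return False
    if kind in BINARY_OPS:
        return len(payload) == 2 and all(is_expr(c) for c in payload)
    if kind == "OP_PREFIX":
        return len(payload) == 1 and is_expr(payload[0])
    if kind == "FUNCTION":
        return all(is_expr(c) for c in payload)  # expr*, may be empty
    return False


# One possible tree for "a + b * 2": (OP_ADD a (OP_MULT b 2))
tree = ("OP_ADD", [("VAR_SYMBOL", "a"),
                   ("OP_MULT", [("VAR_SYMBOL", "b"), ("NUMBER", "2")])])
print(is_expr(tree))  # True
```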

Address

(ADDRESS ENVIRONMENT? expr?)

Assignment

(ASSIGN VAR_SYMBOL expr)

Call

(CALL CONST_SYMBOL expr*)

Do

(DO (REPEAT assignment (TO expr)? (BY expr)? (FOR expr)?) instructions*);

Note that a simple DO / END maps to (INSTRUCTIONS instruction*).
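
The sketch below builds a DO node for a counted loop such as `do i = 1 to 10 by 2; say i; end`. It assumes that "assignment" in the rule means an (ASSIGN VAR_SYMBOL expr) node and that "instructions*" means zero or more instruction nodes placed directly after REPEAT; both readings are interpretations of the rule, and `make_do` is a hypothetical helper. Nodes use an assumed tuple form: `(type, [children])` for non-terminals and `(type, text)` for leaves.

```python
def make_do(assignment, body, to=None, by=None, for_=None):
    """Build (DO (REPEAT assignment (TO e)? (BY e)? (FOR e)?) instruction*)."""
    repeat = [assignment]
    # Optional parts are added in the order given by the rule: TO, BY, FOR.
    for kind, expr in (("TO", to), ("BY", by), ("FOR", for_)):
        if expr is not None:
            repeat.append((kind, [expr]))
    return ("DO", [("REPEAT", repeat)] + list(body))


assignment = ("ASSIGN", [("VAR_SYMBOL", "i"), ("NUMBER", "1")])
body = [("SAY", [("VAR_SYMBOL", "i")])]
loop = make_do(assignment, body, to=("NUMBER", "10"), by=("NUMBER", "2"))
print(loop)
```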

If

(IF expr true:INSTRUCTIONS false:INSTRUCTIONS?)
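
In this rule the true: and false: labels name child positions: the second child is the true branch and the optional third child is the false branch. A minimal sketch of building and reading such a node follows; `make_if` and `branches` are hypothetical helpers, and the tuple form `(type, [children])` is an assumption of this example.

```python
def make_if(condition, true_branch, false_branch=None):
    """Build (IF expr true:INSTRUCTIONS false:INSTRUCTIONS?)."""
    children = [condition, ("INSTRUCTIONS", list(true_branch))]
    if false_branch is not None:
        children.append(("INSTRUCTIONS", list(false_branch)))
    return ("IF", children)


def branches(if_node):
    """Return (true_branch, false_branch_or_None) from an IF node."""
    kind, children = if_node
    if kind != "IF" or len(children) not in (2, 3):
        raise ValueError("not a well-formed IF node")
    false_branch = children[2] if len(children) == 3 else None
    return children[1], false_branch


# A possible tree for: if x > 0 then say "pos"
cond = ("OP_COMPARE", [("VAR_SYMBOL", "x"), ("NUMBER", "0")])
node = make_if(cond, [("SAY", [("STRING", "pos")])])
print(branches(node))
# (('INSTRUCTIONS', [('SAY', [('STRING', 'pos')])]), None)
```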

Iterate

(ITERATE VAR_SYMBOL?)

Label

LABEL

Leave

(LEAVE VAR_SYMBOL?)

Parse

(PARSE (OPTIONS UPPER?) in (TEMPLATES template+))

in <- ARG / PULL;
template <- target / pattern / abs_pos / rel_pos;
target <- (TARGET VAR_SYMBOL?);
pattern <- (PATTERN STRING/VAR_SYMBOL);
abs_pos <- (ABS_POS NUMBER/VAR_SYMBOL);
rel_pos <- (REL_POS SIGN NUMBER/VAR_SYMBOL);
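
The PARSE rules can be exercised with a small shape checker. This is a sketch only: it reads `STRING/VAR_SYMBOL` and `NUMBER/VAR_SYMBOL` as a choice for the last child, represents non-terminals as `(type, [children])` and leaves as `(type, text)` (with `None` for leaves that carry no text), and the names `is_template` and `is_parse` are hypothetical.

```python
def is_template(node):
    """Check one template: target / pattern / abs_pos / rel_pos."""
    kind, children = node
    kinds = [c[0] for c in children]
    if kind == "TARGET":
        return kinds in ([], ["VAR_SYMBOL"])
    if kind == "PATTERN":
        return kinds in (["STRING"], ["VAR_SYMBOL"])
    if kind == "ABS_POS":
        return kinds in (["NUMBER"], ["VAR_SYMBOL"])
    if kind == "REL_POS":
        return kinds in (["SIGN", "NUMBER"], ["SIGN", "VAR_SYMBOL"])
    return False


def is_parse(node):
    """Check (PARSE (OPTIONS UPPER?) in (TEMPLATES template+))."""
    kind, children = node
    if kind != "PARSE" or len(children) != 3:
        return False
    options, source, templates = children
    return (options[0] == "OPTIONS"
            and [c[0] for c in options[1]] in ([], ["UPPER"])
            and source[0] in ("ARG", "PULL")
            and templates[0] == "TEMPLATES"
            and len(templates[1]) >= 1
            and all(is_template(t) for t in templates[1]))


# A possible tree for: parse upper arg first ',' rest
node = ("PARSE", [
    ("OPTIONS", [("UPPER", None)]),
    ("ARG", None),
    ("TEMPLATES", [
        ("TARGET", [("VAR_SYMBOL", "first")]),
        ("PATTERN", [("STRING", ",")]),
        ("TARGET", [("VAR_SYMBOL", "rest")]),
    ]),
])
print(is_parse(node))  # True
```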

Procedure

(PROCEDURE LABEL instructions);

Return

(RETURN expr?)

Say

(SAY expr?)