REXX Assembler (RXAS) - adesutherland/CREXX GitHub Wiki

REXX Assembler Specification

Overview

[Work in Progress]

Additional details of the RXVM virtual machine

Features Required to Support cREXX Languages

The following table maps REXX features to RXAS capabilities to explain the motivation for RXAS capabilities and approach for implementing REXX features. Details of the REXX Features themselves are documented in the REXX specifications, and more details of the RXAS capabilities follow.

| REXX Features | RXAS Capabilities | Implementation Approach | Available |
|---|---|---|---|
| Program Flow | Branches | Generate loops in RX | Yes |
| Simple Variables | Local Registers | | Yes |
| STEM Variables | See Classes | STEMs implemented as class | N/A |
| PROCEDURES | Procedures | | Yes |
| SIGNAL / Labels | branches / labels | may need duplicate code | Yes |
| Static EXPOSE | Pass by Reference args | EXPOSED vars by reference | Yes |
| Dynamic EXPOSE | No Additional | Pool implemented in REXX | N/A |
| VALUE() | No Additional | Pool implemented in REXX | N/A |
| INTERPRET | Dynamically created rxbin | Complex[^1] | No |
| REXXb Classes | Regs contain sub-regs | [^2] | No |
| REXXc Singleton Classes | global regs | [^3] | Yes |
| REXXb Interface | Regs hold function pointer | | No |
| Dynamic / Late binding | footnote | by REXXb class(s) | No |
| Arbitrary Precision Maths | No Additional | in REXXb class(s) | N/A |
| ADDRESS | Platform specific | | No |
| External Functions | Platform specific | [^4] | No |
| Error Reporting | Annotation and debug pool | [^5] | No |

[^1]: Version 1 will require the compiler / assembler linked into the runtime. Version 2 (REXX on REXX) will require advanced REXX parsing.

[^2]: Class Attributes (always private) are stored in sub-registers. Member functions are statically linked and use a naming convention "class.member". The first argument to the member is the object.

[^3]: Singleton Classes replace static classes / members in other languages. The singleton is stored in a global register and exposed with a naming convention "class.@1". Note that "@" cannot be used from REXX programs directly.

[^4]: Dynamic discovery and linking, call-backs to access variable pool. Classic REXX implementation will need to ensure all required variables are available in the variable pool.

[^5]: Source line information stored in a "debug" pool to allow source line error reporting. Run time error conditions (signals) call REXX Exit functions.

RXAS Source Syntax and Structure

Structure

  1. Global Variable definition or Declaration
  2. Procedure Definition or Declaration (repeated)

Comments

/* Block Comment */

* Line Comment

Instructions

These have the format OP_CODE [ARG1[,ARG2[,ARG3]]] where each argument can be a

  • Register - e.g. r0...n (for locals), a0...n (for arguments) or g0...n (for globals)
  • String - e.g. "hello"
  • Integer - e.g. 5
  • Float - e.g. 5.0
  • Function - e.g. proc()
  • Identifiers / Label - e.g. label1
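
The argument kinds listed above can be told apart by their spelling. The following Python sketch is only an illustration of that idea: the regular expressions and the function names `classify` and `parse_instruction` are assumptions based on the examples in this page, not the real RXAS lexer, and it ignores line comments and commas inside strings.

```python
import re

# Hypothetical patterns, tried in order; they are guesses based on the
# examples in this page, not the actual RXAS grammar.
ARG_PATTERNS = [
    ("register", re.compile(r"^[rag]\d+$")),
    ("string", re.compile(r'^".*"$')),
    ("float", re.compile(r"^-?\d+\.\d+$")),
    ("integer", re.compile(r"^-?\d+$")),
    ("function", re.compile(r"^[A-Za-z_][\w.]*\(\)$")),
    ("label", re.compile(r"^[A-Za-z_][\w.]*$")),
]


def classify(arg):
    """Return the kind of a single argument, e.g. 'register' for 'r0'."""
    for kind, pattern in ARG_PATTERNS:
        if pattern.match(arg):
            return kind
    raise ValueError(f"unrecognised argument: {arg!r}")


def parse_instruction(line):
    """Split 'OP_CODE ARG1,ARG2,ARG3' into (opcode, [(kind, text), ...])."""
    opcode, _, rest = line.strip().partition(" ")
    args = [a.strip() for a in rest.split(",")] if rest.strip() else []
    if len(args) > 3:
        raise ValueError("at most three arguments are allowed")
    return opcode, [(classify(a), a) for a in args]


print(parse_instruction("call r4,func(),r2"))
```

Registers are matched before labels so that a name such as r1 is never mistaken for a branch target.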

Directives

RXAS supports the following directives.

| Directive | Description |
|---|---|
| .globals = {INT} | Defines the number of globals defined for the file |
| .locals = {INT} | Defines the number of local registers in a procedure |
| .expose = {ID} | Defines the exposed index of a global register [^6] |

[^6]: or Procedure. These are used for linking between files/modules

File Scope Global Registers

.globals={int}

Defines {int} global registers, g0 ... g({int}-1). These can be used within any procedure in the file.

Global Registers

Any global register marked as exposed is available to any file which also has the corresponding exposed index/name.

File 1

.globals=2            * 2 Global Registers
g0 .expose=namespace.var_name   * 

File 2

.globals=3            * 3 Global Registers
g2 .expose=namespace.var_name   * 

In this case file 1 g0 is mapped to file 2 g2 under the index/name of "namespace.var_name".
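
The mapping can be pictured as a link step that makes both files point at one shared register. The Python sketch below models only that idea; `Register`, `link_globals` and the dictionary layout are hypothetical names chosen for illustration and say nothing about how the real linker is written.

```python
class Register:
    """A single value cell; a stand-in for a VM register."""

    def __init__(self):
        self.value = None


def link_globals(files):
    """files maps file name -> (global count, {global index: exposed name}).

    Returns file name -> list of Register, where registers exposed under
    the same name are the same object in every file.
    """
    exposed = {}  # exposed name -> shared Register
    linked = {}
    for name, (count, exposes) in files.items():
        regs = [Register() for _ in range(count)]
        for index, exposed_name in exposes.items():
            # The first file to expose a name supplies the register;
            # later files are re-pointed at it.
            regs[index] = exposed.setdefault(exposed_name, regs[index])
        linked[name] = regs
    return linked


linked = link_globals({
    "file1": (2, {0: "namespace.var_name"}),
    "file2": (3, {2: "namespace.var_name"}),
})
linked["file1"][0].value = 42
print(linked["file2"][2].value)  # same register as file 1 g0
```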

File Scope Procedure

The locals define how many local registers, r0 to r(locals-1), are needed by the procedure.

* The ".locals" shows the procedure is defined in here
file_scope_proc() .locals=3
...
ret

Global Procedures

Global Procedures can be called between file/modules.

File 1

* The ".locals" shows the procedure is defined in here
proc() .locals=3 .expose=namespace.proc
ret

File 2

* No ".locals" here! Showing that the procedure is only being declared 
rproc() .expose=namespace.proc

main() .locals=3 
call rproc()
ret

In this case main() in File 2 calls rproc(), which is provided globally under the index/name "namespace.proc". In File 1, proc() is exposed under this name, so it is File 1's proc() that is executed.

Note that "namespace" hints at the use of namespaces as part of exposed names; this facility is used by the compiler to define classes.

Also, as shown, names can be mapped: the name used in the calling file does not have to match the name used in the defining file.

RXAS Capabilities (alphabetical)

Branching and Labels

Within a procedure labels can be defined as branch targets. Conditional and unconditional branch instructions can target these labels. The following example shows a loop structure.

   ...                   * Code before loop
l75:                     * Loop start label
   igt r0,r1,r4          * Does an integer compare of r1 and r4 - puts true (1) or false (0) into r0
   brt l37,r0            * Branch if true to l37 (i.e. branch out of the loop)
   ...                   * Instructions in the loop
   inc r1                * Increment r1 (the loop counter)
   br l75                * Unconditional Branch to the start of the loop
l37:                     * Loop end Label 
   ...                   * Instructions following the loop
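
To make the control flow of this loop concrete, here is a toy Python interpreter for just the instructions used above. It is a sketch under assumptions: igt is taken to mean "greater than" (inferred from the name and the loop exit), the `nop` opcode and the `run` function are hypothetical, and iadd is borrowed from the Maths section to give the loop a body.

```python
def run(program, regs):
    """Execute a list of (label, opcode, args) tuples and return the registers."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    pc = 0
    while pc < len(program):
        _, op, args = program[pc]
        pc += 1
        if op == "igt":      # igt dest,a,b: dest = 1 if a > b else 0 (assumed)
            dest, a, b = args
            regs[dest] = 1 if regs[a] > regs[b] else 0
        elif op == "brt":    # brt label,reg: branch if reg is true
            if regs[args[1]]:
                pc = labels[args[0]]
        elif op == "br":     # unconditional branch
            pc = labels[args[0]]
        elif op == "inc":
            regs[args[0]] += 1
        elif op == "iadd":
            dest, a, b = args
            regs[dest] = regs[a] + regs[b]
        elif op == "nop":    # hypothetical placeholder so l37 has an instruction
            pass
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs


program = [
    ("l75", "igt", ("r0", "r1", "r4")),   # loop start: r0 = r1 > r4
    (None, "brt", ("l37", "r0")),         # leave the loop when true
    (None, "iadd", ("r2", "r2", "r1")),   # loop body: r2 += r1
    (None, "inc", ("r1",)),               # loop counter
    (None, "br", ("l75",)),
    ("l37", "nop", ()),                   # loop end
]
regs = run(program, {"r0": 0, "r1": 1, "r2": 0, "r4": 5})
print(regs["r2"])  # sum of 1..5
```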

Code Annotation and Debug Pool

NOT IMPLEMENTED

cREXX needs to support appropriate error messages (including source line number / text), breakpoints, and tracing. RXAS directives (.file and .line) allow the source file and source line to be defined. The .clause directive allows clause boundaries to be defined.

.file = "testfile.rexx"   * Source file name 
proc()   .locals=3
   .regname r1,a          * Maps r1 to an id for debugging purposes 
   .regname r2,b
   .line 9 "a=5; b=6"     * The line number and source string
   .clause                * REXX clause boundary
   load r1,5
   .clause
   load r2,6
   .line 10 "say a+b"
   .clause
   iadd r0,r1,r2
   itos r0
   say r0
   .line 11 "return"
   .clause
   ret

The directives are processed at "build time" to create the debug constant pool, which allows assembler instructions to be mapped back to source lines. In this way error messages can be generated as if the source file was being interpreted "classically" but with no run-time overhead.

In addition, tracing/debugging is implemented by the VM using the clause boundaries stored in the debug pool. The VM can set a breakpoint by replacing the instruction at the appropriate address with a breakpoint instruction. When the breakpoint is reached it uses the clause boundary information to determine where the next breakpoint should be set. Tools can be made available to allow a REXX programmer to set a breakpoint at a REXX source code line number.

The debug pool also contains the information needed to display the REXX variable name stored in a register.

Note that accessing debug information is a significant overhead, as the debug pool needs repeated searching. It is therefore only used for debugging/tracing (or creating an error message), where performance is not critical. The reason for this approach is that when no debugging is in action there is no runtime performance overhead at all (although the rxbin binary file is made larger by the debug pool).
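
As this capability is not implemented yet, the following Python sketch is speculative. It assumes the debug pool stores (instruction address, line number, source text) entries and clause boundary addresses in increasing order, and that addresses are simple instruction indices; `DebugPool` and its methods are hypothetical names. A binary search is used here, but the real pool may be searched differently.

```python
import bisect


class DebugPool:
    """Hypothetical debug pool built from .line and .clause directives."""

    def __init__(self):
        self.line_addresses = []  # address of the first instruction of each .line
        self.lines = []           # matching (line number, source text)
        self.clauses = []         # addresses that start a clause

    def add_line(self, address, line_no, source):
        # Entries are assumed to arrive in increasing address order.
        self.line_addresses.append(address)
        self.lines.append((line_no, source))

    def add_clause(self, address):
        self.clauses.append(address)

    def source_for(self, address):
        """Map an instruction address back to its source line."""
        i = bisect.bisect_right(self.line_addresses, address) - 1
        if i < 0:
            raise KeyError(address)
        return self.lines[i]

    def next_clause(self, address):
        """Address of the next clause boundary, or None at the end."""
        i = bisect.bisect_right(self.clauses, address)
        return self.clauses[i] if i < len(self.clauses) else None


# The proc() example above, numbering its six instructions 0..5.
pool = DebugPool()
pool.add_line(0, 9, "a=5; b=6")
pool.add_line(2, 10, "say a+b")
pool.add_line(5, 11, "return")
for clause_address in (0, 1, 2, 5):
    pool.add_clause(clause_address)

print(pool.source_for(3))   # the itos instruction belongs to line 10
print(pool.next_clause(2))  # where a tracer would place its next breakpoint
```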

Constant Pool

Each File/Module has a constant pool that stores:

  • String Constants
  • Procedure Details
  • PTable information (mapping class to interface procedures for a class and objects)

Dynamic Access to Registers (including Arguments)

Dynamic access to a register is enabled by additional members of the link family of instructions. These allow a register to point to the same value as a dynamically numbered primary register.

alink secondary_reg, arg_reg_num * Links secondary_reg to the argument register with number stored in the int value of arg_reg_num 
glink secondary_reg, global_reg_num * Links secondary_reg to the global register with number stored in the int value of global_reg_num 

Dynamic Type Instructions

The compiler will be able to manage which registers have what values most of the time, but there will be certain dynamic situations where the value type or status is not known. To handle this the compiler uses the register's type flag:

gettp - gets the register type flag (op1 = op2.typeflag)
settp - sets the register type flag (op1.typeflag = op2)
setortp - ORs the register type flag (op1.typeflag = op1.typeflag | op2)
brtpt - if op2.typeflag is true then goto op1
brtpandt - if (op2.typeflag & op3) is true then goto op1

The typeflag is a 64-bit integer and its usage is defined by convention only; see cREXX Calling Convention.
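
The five instructions can be modelled in a few lines of Python. This is a sketch that assumes the OR and AND are bitwise, which is how the calling convention examples later in this page use them; the `Register` class is a stand-in, and the two branch functions simply return whether the branch would be taken.

```python
class Register:
    """Stand-in register carrying only the typeflag."""

    def __init__(self):
        self.typeflag = 0


def gettp(reg):
    return reg.typeflag


def settp(reg, flags):
    reg.typeflag = flags


def setortp(reg, flags):
    reg.typeflag |= flags           # bitwise OR (assumed)


def brtpt(reg):
    return reg.typeflag != 0        # True means "branch taken"


def brtpandt(reg, mask):
    return (reg.typeflag & mask) != 0


r1 = Register()
settp(r1, 1)
setortp(r1, 2)
print(gettp(r1), brtpandt(r1, 2))
```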

Dynamic Procedure Pointers

This capability is to support interfaces. Where the compiler knows the object's class it can link statically to the correct member by using the procedure name (i.e. class_name.member_name()). However, when the object is only known to implement an interface (i.e. its class is not known), the VM needs to dynamically link interface members to the object's class-specific implementation.

Each register that contains an object whose class implements an interface has a pointer to an entry in the constant pool. This entry allows the interface name and member number to be searched at runtime, returning the implementation procedure pointer. This is known as the register's static ptable. The static ptable also stores the name of the object's class.

Note: Where an object's members are dynamically assigned at runtime (not an immediately required capability), the dynamic mapping from member name to procedure pointer will be done within the REXX runtime library (i.e. not applicable to RXAS).

A directive defines the entry

.ptable class_name interface1_name(impl1_1(), impl1_2(), ...) interface2_name(impl2_1() ...) ...

This creates the entry in the constant pool, with links to the procedures implementing each interface's members #1, #2, etc.

The object is linked to the entry with an instruction:

setptable r1,class_name

This sets the register r1 ptable to the entry "class_name" in the constant pool.

Finally the entry can be used at runtime:

srcptable r2,"interface_name",3

In this example r2 is a class instance (object) implementing interface "interface_name". This instruction looks up the object's procedure implementing member #3 of interface "interface_name" and sets the procedure pointer of r2 to this. Then

dyncall r0, r2, r3

calls the procedure in r2, with arguments from r3 and puts the result in r0.
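
The directive and the three instructions fit together as in the Python sketch below. It is a simplified model, not the VM's data layout: `PTable`, `Register`, the `ptable` function and the circle/shape names are all hypothetical, members are numbered from 1 as in the text, and the convention that the first argument of a member is the object is left out.

```python
class PTable:
    """Constant pool entry: class name plus procedures per interface."""

    def __init__(self, class_name, interfaces):
        self.class_name = class_name
        self.interfaces = interfaces  # {interface name: [member #1, member #2, ...]}


class Register:
    def __init__(self):
        self.ptable = None  # set by setptable
        self.proc = None    # set by srcptable


constant_pool = {}


def ptable(class_name, interfaces):
    """Models the .ptable directive."""
    constant_pool[class_name] = PTable(class_name, interfaces)


def setptable(reg, class_name):
    reg.ptable = constant_pool[class_name]


def srcptable(reg, interface_name, member_number):
    # Members are numbered from 1 in the text (#1, #2, ...).
    reg.proc = reg.ptable.interfaces[interface_name][member_number - 1]


def dyncall(proc_reg, args):
    return proc_reg.proc(*args)


def circle_area(radius):
    return 3 * radius * radius  # crude integer "area", for illustration only


def circle_name():
    return "circle"


ptable("circle", {"shape": [circle_area, circle_name]})
r2 = Register()
setptable(r2, "circle")
srcptable(r2, "shape", 2)
print(dyncall(r2, []))
```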

External Functions

Injecting Dynamic Code

Instructions

Type coding

Instructions have a prefix to determine the type: s=string, i=integer, f=float, o=object

Maths

As an example, the add family will have

  • iadd reg,reg,reg
  • fadd reg,reg,reg
  • etc.

Each function just uses the corresponding register's value (int, float, etc).

Load

  • [s/i/f/d]load - loads the corresponding type value only
  • load (i.e. with no prefix) copies all values and the type flag to the target register

Conversion

Converting means setting the value for one type based on the value of another type in the same register, e.g.

  • itos reg - sets the string value to the string representation of the integer value of reg
  • ftos reg
  • stof reg - This converts the string to a float, or triggers a signal if it can't

Note: this replaces prime/master.
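
Because one register holds a value per type, a conversion reads one slot and writes another in the same register. The Python sketch below illustrates this; `Register`, its field names and `ConversionSignal` are hypothetical, Python's own number formatting and parsing stand in for whatever the VM does, and the exception stands in for the signal mentioned above.

```python
class Register:
    """Stand-in register with one slot per value type."""

    def __init__(self):
        self.int_value = 0
        self.float_value = 0.0
        self.string_value = ""


class ConversionSignal(Exception):
    """Stands in for the signal raised when a conversion fails."""


def itos(reg):
    reg.string_value = str(reg.int_value)


def ftos(reg):
    reg.string_value = str(reg.float_value)  # formatting is an assumption


def stof(reg):
    try:
        reg.float_value = float(reg.string_value)
    except ValueError:
        raise ConversionSignal(f"not a number: {reg.string_value!r}")


r0 = Register()
r0.int_value = 9
itos(r0)          # the integer slot is untouched; the string slot becomes "9"
print(r0.string_value)
```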

SAY / ADDRESS etc.

Where an instruction needs a string it will only have a string "version". For clarity we will have

  • say
  • address

but there will not be an isay etc. Instead the compiler might need to do an "itos" first.

Procedures and Arguments

A procedure's registers are independent of the caller's registers. Where registers are shared, the VM maps the procedure's registers to the corresponding registers in the caller.

Each time a procedure/function is called a new "stack frame" is provided. This means that the called function has its own set of registers.

The function header defines how many registers (called locals) the function can access - for practical purposes we can consider that any number of registers can be assigned to a function.

In addition, each file defines a number of global registers that can be shared between procedures.

In a function with 'a' arguments, 'n' locals, and 'm' globals:

  • R0 ... R(n-1) - are local registers to be used by the function
  • R(n) ... R(n+m-1) - are the global registers, i.e. g(0) ... g(m-1)
  • R(n+m) - holds the number of arguments (a)
  • R(n+m+1) ... R(n+m+a) holds the arguments, i.e. a(1) ... a(a)

This ordering allows a dynamic number of arguments.
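
The layout can be written out as a small Python function. This is a sketch: `frame_layout` is a hypothetical helper, and naming the argument count slot a0 is an inference from the register names given earlier (a0...n) and from the calling examples, not something this page states outright.

```python
def frame_layout(n_locals, n_globals, n_args):
    """Map register names to R indexes for one stack frame."""
    layout = {}
    for i in range(n_locals):
        layout[f"r{i}"] = i
    for i in range(n_globals):
        layout[f"g{i}"] = n_locals + i
    base = n_locals + n_globals
    layout["a0"] = base              # holds the number of arguments (assumed name)
    for i in range(1, n_args + 1):
        layout[f"a{i}"] = base + i
    return layout


# 3 locals, 2 globals and 2 arguments
print(frame_layout(3, 2, 2))
```

Because the arguments come last, a frame can accept more arguments without moving the locals or globals.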

cREXX Calling Convention

All arguments within RXAS are passed by reference, so an argument needs to be copied to another register if pass by reference is not wanted. This approach is a way to support moves rather than copies, which is especially important to avoid slow object and string copies.

It is mandatory to use this calling convention between REXX and RXAS procedures. Although not necessary, it is recommended to also use this convention between RXAS procedures.

In this convention the caller is responsible for setting argument registers' typeflag. This is used to indicate if an optional argument is present, and if a pass-by-value string or object argument needs preserving.

The callee (procedure) is responsible for applying default values for optional arguments and, if required, for ensuring that pass-by-value arguments are kept constant (so that changes do not affect the caller's logic). The callee uses the typeflag for this.

Register Type Flag Byte Values

The register typeflag is used to optimise function arguments.

  • Bit 1 - REGTP_VAL - ONLY used for optional arguments; when set (value 1) the register has a specified value

  • Bit 2 - REGTP_NOTSYM - ONLY used for "pass by value" and ONLY for large (string, object) registers; when set (value 2) the argument is not a symbol, so it does not need copying because, even if it is changed, the caller will not use its original value. Note: Small registers (int, float) are always copied as this is faster than setting and checking this flag; the REGTP_NOTSYM flag is not set or read for integers and floats.
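
The callee's decision for an optional pass-by-value string or object argument can be summarised as a small decision function. This Python sketch is a summary of the rules above, not compiler code; `arg_action` and the action names are hypothetical.

```python
REGTP_VAL = 1      # the optional argument was specified
REGTP_NOTSYM = 2   # the argument is not a symbol, so the caller will not reuse it


def arg_action(typeflag, callee_modifies):
    """Decide how a callee binds an optional pass-by-value string/object argument."""
    if not typeflag & REGTP_VAL:
        return "default"        # load the default value
    if not callee_modifies:
        return "use-in-place"   # never changed, so no protection is needed
    if typeflag & REGTP_NOTSYM:
        return "swap"           # the caller does not need the original: cheap move
    return "copy"               # protect the caller's variable with a real copy


for flags in (0, 1, 2, 3):
    print(flags, arg_action(flags, callee_modifies=True))
```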

The following examples demonstrate the calling convention.

Basic Call by Reference

REXX Program

/* Test Basic Call by Reference */
options levelb

a = 4
call func a
say a

func: procedure = .int
  arg expose x = .int
  x = x * 2
  return 0

Annotated Generated RXAS

/*
 * cREXX COMPILER VERSION : cREXX F0034
 * SOURCE                 : scratch
 * BUILT                  : 2021-10-02 17:00:02
 */

.globals=0

main() .locals=5 .expose=scratch.main
   * Line 3: a = 4
   load r1,4
   * Line 4: call func a
   * Line 4: func a
   load r2,1                 * Indicates that there is one argument
   swap r3,r1                * Moves the register for the call (i.e. after r2)
   call r4,func(),r2
   swap r1,r3                * Moves the register back
   * Line 5: say a
   itos r1
   say r1
   * Line 5: 
   ret 0

func() .locals=1 .expose=scratch.func
   * Line 8: x = .int
   * Line 9: x = x * 2
   imult a1,a1,2             * Pass by reference - so nothing extra needs to be done     
   * Line 10: return 0
   ret 0

Optional Call by Reference

REXX Program

/* Test Optional Call by Reference */
options levelb

a = 4
call func a                  /* Optional argument specified */
say a

call func                    /* Optional argument not specified */

func: procedure = .int
  arg expose x = 0           /* Expose makes the argument as pass by reference */
  x = x * 2
  return 0

Annotated Generated RXAS

/*
 * cREXX COMPILER VERSION : cREXX F0034
 * SOURCE                 : scratch
 * BUILT                  : 2021-10-02 17:41:44
 */

.globals=0

main() .locals=6 .expose=scratch.main
   * Line 3: a = 4
   load r1,4
   * Line 4: call func a
   * Line 4: func a
   load r2,1
   settp r1,1                * Set the typeflag to REGTP_VAL (1) indicating the value is specified 
   swap r3,r1                * The swap of registers also includes the flags - the compiler does it this way round
   call r4,func(),r2
   swap r1,r3
   * Line 5: say a
   itos r1
   say r1
   * Line 7: call func
   * Line 7: func
   load r2,1
   settp r3,0                * Set the typeflag to 0 - resetting REGTP_VAL - indicating the value is NOT specified
                             * Note the compiler does not need to do a swap as it sets the flag on the r3 register
   call r5,func(),r2
   * Line 7: 
   ret 0

func() .locals=1 .expose=scratch.func
   * Line 10: x = 0          * Branches on the REGTP_VAL (1) bit - if it is not set the default value is used
   brtpandt l15a,a1,1
   load a1,0
l15a:
   * Line 11: x = x * 2
   imult a1,a1,2
   * Line 12: return 0
   ret 0

Call By Value Integer and Optimisations

In this example, REGTP_NOTSYM is not used as the parameter is an integer.

REXX Program

/* Test Call By Value Integer and Optimisations */
options levelb

a = 4
call func a                  /* Call function with a variable */
a = a + 1                    /* The variable is not a constant - it needs preserving and should not be turned to a constant */
say a

call func                    /* Optional argument not specified */

call func 4                  /* Call function with a constant */

call func2 4                 /* Call function func2 with a constant */

func: procedure = .int
  arg x = 0                  /* Pass by value */
  x = x * 2                  /* Argument is changed - the caller must not be impacted by this modification */
  return 0

func2: procedure = .int      /* In this function x is a constant */
  arg x = 0
  return 0

Annotated Generated RXAS

/*
 * cREXX COMPILER VERSION : cREXX F0034
 * SOURCE                 : scratch
 * BUILT                  : 2021-10-02 20:59:44
 */

.globals=0

main() .locals=7 .expose=scratch.main
   * Line 3: a = 4
   load r1,4
   * Line 4: call func a
   * Line 4: func a
   load r2,1
   settp r1,1                * Set the REGTP_VAL flag as the value is present
   swap r3,r1
   call r4,func(),r2
   swap r1,r3
   * Line 5: a = a + 1
   iadd r1,r1,1
   * Line 6: say a
   itos r1
   say r1
   * Line 8: call func
   * Line 8: func
   load r2,1
   settp r3,0                * Reset the REGTP_VAL flag as the value is NOT present
   call r5,func(),r2
   * Line 10: call func 4
   * Line 10: func 4
   load r2,1
   load r3,4
   settp r3,1                * Set the REGTP_VAL flag as the value is present. Note that REGTP_NOTSYM is not used
   call r6,func(),r2
   * Line 12: call func2 4   * Call function 2 the same way
   * Line 12: func2 4
   load r2,1
   load r3,4
   settp r3,1
   call r7,func2(),r2
   * Line 12: 
   ret 0

func() .locals=2 .expose=scratch.func
   * Line 13: x = 0
   brtpandt l23a,a1,1        * Check REGTP_VAL
   load r1,0                 * Default Value
   br l23b
l23a:
   icopy r1,a1               * Always copy the integer value (fast) to protect the caller from the change to x
l23b:
   * Line 14: x = x * 2
   imult r1,r1,2
   * Line 15: return 0
   ret 0

func2() .locals=1 .expose=scratch.func2
   * Line 20: x = 0
   brtpandt l39a,a1,1
   load a1,0
l39a:                        * In func2 there is no need to copy x as it is not changed anyway
   * Line 21: return 0
   ret 0

Call By Value Strings and Optimisations (optional arguments)

In this example, REGTP_NOTSYM is used because the optional parameter is a string.

REXX Program

/* Test Call By Value Strings and Optimisations (optional arguments) */
options levelb

a = "Peter"
call func a
a = a "and René"             /* Make sure 'a' is not a constant */
say a "were here"

call func

call func "Mike"

call func2 "Mike"

func: procedure = .int
  arg x = "Adrian"
  x = x "likes cREXX level B"
  say x
  return 0

func2: procedure = .int
  arg x = "Adrian"
  say x "in func 2"
  return 0

Annotated Generated RXAS

/*
 * cREXX COMPILER VERSION : cREXX F0034
 * SOURCE                 : scratch
 * BUILT                  : 2021-10-03 12:56:52
 */

.globals=0

main() .locals=8 .expose=scratch.main
   * Line 3: a = "Peter"
   load r1,"Peter"
   * Line 4: call func a
   * Line 4: func a
   load r2,1
   settp r1,1                * Set the REGTP_VAL flag as the value is present
   swap r3,r1
   call r4,func(),r2
   swap r1,r3
   * Line 5: a = a "and René"
   sconcat r1,r1,"and René"  
   * Line 6: say a "were here"
   sconcat r3,r1,"were here"
   say r3
   * Line 8: call func
   * Line 8: func
   load r2,1
   settp r3,2                * This resets REGTP_VAL as there is no specified argument and 
                             * sets REGTP_NOTSYM flag as the argument is not a variable.
                             * Setting REGTP_NOTSYM is correct but arguably pedantic and
                             * unnecessary but this is what the compiler does
   call r5,func(),r2
   * Line 10: call func "Mike"
   * Line 10: func "Mike"
   load r2,1
   load r3,"Mike"
   settp r3,3                * This sets both REGTP_VAL (there is an argument) and REGTP_NOTSYM (it is not a symbol)
   call r6,func(),r2
   * Line 12: call func2 "Mike"
   * Line 12: func2 "Mike"
   load r2,1                 * Calls func2() in the same way
   load r3,"Mike"
   settp r3,3                * This sets both REGTP_VAL and REGTP_NOTSYM
   call r7,func2(),r2
   * Line 12: 
   ret 0

func() .locals=2 .expose=scratch.func
   * Line 15: x = "Adrian"
   brtpandt l28a,a1,1        * Check REGTP_VAL
   load r1,"Adrian"          * Set default Value
   br l28d
l28a:
   brtpandt l28c,a1,2        * Check REGTP_NOTSYM (unless the default value has been used)
   scopy r1,a1               * Have to do a string copy
   br l28d
l28c:
   swap r1,a1                * Can just do a quick swap to get the register number right
l28d:
   * Line 16: x = x "likes cREXX level B"
   sconcat r1,r1,"likes cREXX level B"
   * Line 17: say x
   say r1
   * Line 18: return 0
   ret 0

func2() .locals=2 .expose=scratch.func2
   * Line 21: x = "Adrian"
   brtpandt l43a,a1,1        * Check REGTP_VAL
   load a1,"Adrian"
l43a:                        * Because x is a constant there is no need to worry about REGTP_NOTSYM or to copy the string
   * Line 22: say x "in func 2"
   sconcat r1,a1,"in func 2"
   say r1
   * Line 23: return 0
   ret 0

Arbitrary number of arguments with ...

TO BE IMPLEMENTED (requires arrays)

EXPOSE

TO BE IMPLEMENTED

EXPOSE is syntactic sugar that provides a familiar (but not identical) EXPOSE experience for REXX programmers.

In this example:

exp = 100

say test( "hello")
exit

test: procedure = .string expose exp = .int
  arg message = .string
  return message || exp

This is converted by the compiler to:

exp = 100

say test(exp, "hello")
exit

test: procedure = .string 
  arg expose exp = .int, message = .string
  return message || exp


Procedure Lookup Tables

Registers

Register Data

Each register holds 5 values: string, float, integer and object values, and a type flag which is used to indicate which values are valid. Note that arbitrary precision maths is handled as objects.

In most cases it is for the compiler to decide what values to use, and how/if to set the type flag. Only a few dynamic scenarios will need some extended functions (see following). The type values are 0=unset, 1=string, 2=float, 4=integer, 8=object, 16=interface. Each register can have multiple types set as valid. The compiler sets the valid types explicitly with instructions - this is not an automatic runtime capability. At runtime the VM has no need nor the ability to validate data. Any caching has to be achieved by compiler logic.

An object has a pointer to its static ptable entry in the constant pool as well as an array of sub-registers. These sub-registers are the private attributes of the object.

Register Initialisation

All registers are initialized on entry to a procedure on the register "stack". The rationale is that all the memory can be malloced at once, which is faster/safer. The pointers to globals and arguments are also set up.

In addition, a shadow set of pointers to the procedure's registers is set up. These are used by the unlink instruction: they hold the base/initial register pointers, and unlink sets the register pointer to the pointer held in the respective shadow value.

Registers hold references to their parent/owner for memory freeing purposes. The owner can change; for example, the owner of a returned register is set to the caller. When an object, stack frame or global pool is being deleted, its registers are also freed/deleted if the register's owner is the same as the container being deleted.

Register Re-Mapping Facilities

There are a few scenarios where the contents of a register are needed under another register number: a call requires the arguments to be in consecutive registers, object sub-registers need to be copied to registers, or an arbitrary argument register needs to be accessed (i.e. when the number of arguments is only known at runtime).

Copying the contents of registers to achieve this would be slow (and for large strings or objects, very slow). Also it is inconsistent because an integer copy and an object pointer copy (which is a copy by reference) behave differently (integers become independent but a change to the object changes it in each "copy").

We provide 2 facilities to allow very fast and safe register moves:

  • SWAP. This swaps two registers. This is very fast as it requires only 6 pointer copies (3 swapping the two register pointers and 3 swapping the two shadow pointers). It allows the programmer to swap two registers (arguments, globals, registers) to get the register into a convenient register number (perhaps for a call). Doing the swap again restores the register numbers.

  • LINK/UNLINK. The link instruction makes two register numbers point to the same register (one is primary, the other secondary). Unlink makes the secondary register revert to its original state by using the shadow register pointer. Each instruction only requires one pointer copy.

The behaviour should normally be quite simple - however swapping or linking already linked or swapped registers may cause complex outcomes. Developers should consider the above behaviour descriptions to untangle this!
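
The pointer mechanics of both facilities can be modelled with two Python lists, one for the live register pointers and one for the shadow pointers. This is a sketch of the behaviour described above, not the VM's C code; `Cell` and `Frame` are hypothetical names.

```python
class Cell:
    """The storage a register number points at."""

    def __init__(self, value=None):
        self.value = value


class Frame:
    def __init__(self, size):
        self.regs = [Cell() for _ in range(size)]
        self.shadow = list(self.regs)  # base pointers, used by unlink

    def swap(self, a, b):
        # Swap both the live pointers and the shadow pointers.
        self.regs[a], self.regs[b] = self.regs[b], self.regs[a]
        self.shadow[a], self.shadow[b] = self.shadow[b], self.shadow[a]

    def link(self, secondary, primary):
        self.regs[secondary] = self.regs[primary]    # one pointer copy

    def unlink(self, secondary):
        self.regs[secondary] = self.shadow[secondary]  # one pointer copy


frame = Frame(4)
frame.regs[1].value = "Peter"
frame.swap(3, 1)             # move r1 into r3 ready for a call
print(frame.regs[3].value)
frame.swap(1, 3)             # swapping again restores the numbering
frame.link(2, 1)             # r2 now points at r1's storage
frame.regs[2].value = "Mike"
print(frame.regs[1].value)
frame.unlink(2)              # r2 reverts to its own storage
```

No value is copied at any point, which is why the cost is independent of the size of the string or object held in the register.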

Runtime Error Conditions and Exit Functions

Shell Instructions

Sub-Registers

Super Instructions

Once we see what the code generated by the compiler looks like we may combine some of these instructions in to super instructions for performance reasons.

Variable Pool in REXX

In the current code for the VM we have registers which point to a variable structure (that can contain string, integers etc). These can be considered to be “anonymous variables” – the compiler will assign a register to hold a variable at compile time.

In this way "80%" of the needs of REXX programs will be handled, but not all. Some aspects of REXX need dynamic variable names, e.g. some EXPOSE scenarios, INTERPRET, VALUE, and the REXXCOMM / SAA API. When these are needed the compiler needs to work in what I am calling "Pedantic" mode. In this mode variables are also given a string index in a variable pool, and this index can be searched for dynamically at runtime.

Each stack frame will also include its own variable pool. This is a name/value index (via a HASH or TREE) whereby a variable can be found via the index string. This will be implemented in REXXb.

  • When a procedure exits and the stack frame is torn down, the corresponding variable pool and variables also need freeing.
  • To facilitate EXPOSE, a variable pool index can be linked to a parent pool variable.
  • A register can point to an anonymous variable (as implemented today in the code) or be mapped to a variable in the variable pool.
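
The text says the variable pool will be implemented in REXXb, so the following Python sketch is only an illustration of the structure described in the list above. `Variable`, `VariablePool` and their methods are hypothetical, and a dict stands in for the hash or tree index.

```python
class Variable:
    def __init__(self, value=None):
        self.value = value


class VariablePool:
    """Per stack frame name/value index."""

    def __init__(self, parent=None):
        self.parent = parent
        self.index = {}  # name -> Variable

    def lookup(self, name):
        """Find a variable by name, creating it in this pool if needed."""
        return self.index.setdefault(name, Variable())

    def expose(self, name):
        """Link a name in this pool to the parent pool's variable."""
        self.index[name] = self.parent.lookup(name)


caller_pool = VariablePool()
caller_pool.lookup("exp").value = 100

callee_pool = VariablePool(parent=caller_pool)
callee_pool.expose("exp")
callee_pool.lookup("exp").value += 1       # visible to the caller
callee_pool.lookup("message").value = "x"  # local to the callee

print(caller_pool.lookup("exp").value)
```

When the callee's frame is torn down only the variables created in its own pool need freeing; exposed entries belong to the parent pool.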