Instruction Set Configuration File - michaelkamprath/bespokeasm GitHub Wiki

Description

The instruction set configuration file defines the instruction set and assembly language features used by BespokeASM to assemble machine code. This file can be written in either JSON or YAML format.

Machine Code Compilation

The purpose of this configuration file is to control how machine code is generated for a given instruction set. BespokeASM uses a fixed method for compiling machine code for any instruction. The standard form of an instruction is:

  MNEMONIC [OPERAND1[, OPERAND2[, ...]]]

Each instruction must have at least a mnemonic, and can optionally have one or more operands.

The machine code generated for an instruction consists of two parts: the instruction byte code and the argument values.

  • Instruction Byte code: Indicates which instruction the CPU should execute. It is composed of values specific to the mnemonic and, optionally, each operand. The total size of the packed byte code (mnemonic plus operands) should match the instruction size of the target hardware.
  • Argument values: These are parameters used by the instruction's microcode, such as immediate values or addresses. If multiple operands provide argument values, they are ordered according to the operand order in the instruction.

Both the instruction mnemonic and its operands can contribute to the byte code, but only operands can generate argument values.

For example, consider this assembly instruction:

  mov a,[$8000]      ; copy value at address $8000 into register A

In this case:

  • The instruction mnemonic mov, the operand a (the register A), and the operand [...] (an indirect value) each contribute to the instruction's overall byte code.
  • The numeric value $8000 is the argument for the [...] operand and is placed after the instruction byte code in the final machine code.

The diagram below illustrates this. Here the machine code is for a computer with an 8-bit data word using little-endian byte order:

    Byte 0    Byte 1   Byte 2
  ========== ======== ========
  01 001 110 00000000 10000000
  -- --- --- -----------------
   |  |   |          |
   |  |   |          +-- The second operand's argument value ($8000) in little-endian byte order
   |  |   +------------- The byte code 110 for the second operand ([...])
   |  +----------------- The byte code 001 for the first operand (register A)
   +-------------------- The byte code 01 for the mov mnemonic
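The same layout can be reproduced with a few lines of Python. This is only a sketch of the packing arithmetic shown in the diagram, not BespokeASM's own implementation; pack_fields is a hypothetical helper introduced for illustration.

```python
def pack_fields(fields):
    """Pack (value, bit_size) pairs into one integer, first field most significant."""
    packed, total_bits = 0, 0
    for value, size in fields:
        # Mask each value to its bit size before appending it.
        packed = (packed << size) | (value & ((1 << size) - 1))
        total_bits += size
    return packed, total_bits


# Byte code fields from the diagram: mov (01), register A (001), [...] (110).
bytecode, bits = pack_fields([(0b01, 2), (0b001, 3), (0b110, 3)])

# The argument value $8000 follows the byte code, least significant byte first.
machine_code = bytecode.to_bytes(bits // 8, "big") + (0x8000).to_bytes(2, "little")

print(machine_code.hex(" "))  # 4e 00 80
```

The three byte code fields fill exactly one 8-bit word (0x4E), and the two argument bytes follow it.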

In summary, two types of machine code are generated for each instruction:

  • Instruction Byte code: Specifies the instruction to execute (often used to select a microcode sequence).
  • Argument values: Data that the instruction operates on (e.g., the address for a jump instruction).

Throughout this documentation, "instruction byte code" and "argument value" are used as defined above. In BespokeASM, argument values are always emitted after the instruction's byte code. Operands can affect both the byte code and the argument values.

Data Words and Endianness

Data Words

In BespokeASM, a "word" is a unit of data whose size matches the native data bit size of the target CPU. A word is what a memory address points to. For example:

  • An "8-bit computer" has an 8-bit word size (1 byte).
  • A 32-bit CPU has a 32-bit word size (4 bytes).

Bytecode is represented as a sequence of words, and each word is addressable. For a 32-bit CPU, address 0 points to the first 4-byte word, address 1 to the next, and so on.

The word size of the target CPU is a fundamental configuration in BespokeASM. It is set in the general section of the configuration file. The default word size is 8 bits, but this can be changed as needed.

Word Segments

A word segment is a subdivision of a word, representing the physical ordering of bits within that word. For example, a 16-bit computer might store its 16-bit words so that the first 8 bits are the least significant (little-endian segment order), and the last 8 bits are the most significant. This is distinct from the endianness of multi-word values (e.g., a 32-bit value on an 8-bit CPU). Word segments are rarely used in most CPUs, but may be needed for special cases, such as the bit layout of an EPROM.

In most cases, the word segment size equals the word size. However, if a word needs to be subdivided and reordered, the segment size can be smaller. The segment size must be less than or equal to the word size, and the word size must be evenly divisible by the segment size.

The word segment size is set in the general section of the configuration file. By default, it matches the word size. This feature is rarely needed, so the default is usually sufficient.

Endianness

BespokeASM supports two concepts of endianness:

  1. The order of words within a multi-word value (multi-word endianness)
  2. The order of word segments within a word (intra-word endianness)

Multi-Word Endianness

Multi-Word Endianness refers to the order of words in memory that together represent a multi-word value. For example, a 32-bit value 0x12345678 on an 8-bit CPU could be stored as:

  Address 0: 0x12
  Address 1: 0x34
  Address 2: 0x56
  Address 3: 0x78

(big-endian: most significant byte at the lowest address)

Or:

  Address 0: 0x78
  Address 1: 0x56
  Address 2: 0x34
  Address 3: 0x12

(little-endian: least significant byte at the lowest address)

If the CPU has 16-bit words, the same 32-bit value in little-endian would be:

  Address 0: 0x5678
  Address 1: 0x1234

Multi-word endianness is set in the general section of the configuration file. The default is big.
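The orderings above can be checked with a short Python sketch. The helper name split_into_words is an assumption made for this illustration and is not part of BespokeASM.

```python
def split_into_words(value, value_bits, word_size, endianness="big"):
    """Split a multi-word value into words, listed from the lowest address upward."""
    if value_bits % word_size != 0:
        raise ValueError("value_bits must be a multiple of word_size")
    mask = (1 << word_size) - 1
    # Extract words starting with the least significant one.
    words = [(value >> (i * word_size)) & mask for i in range(value_bits // word_size)]
    return words if endianness == "little" else words[::-1]


for word_size, endianness in [(8, "big"), (8, "little"), (16, "little")]:
    words = split_into_words(0x12345678, 32, word_size, endianness)
    print(word_size, endianness, [hex(w) for w in words])
```

Index 0 of each returned list corresponds to Address 0 in the listings above.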

Intra-Word Endianness

Intra-Word Endianness refers to the order of segments within a word. For example, a 16-bit word with 8-bit segments and a value of 0x1234:

  • Big-endian: 0x1234 (most significant segment first)
  • Little-endian: 0x3412 (least significant segment first)

The address and value of the word remain the same; only the order of segments within the word changes. Intra-word endianness is useful for matching the physical layout required by certain memory devices.

Intra-word endianness is set in the general section of the configuration file. The default is big. This feature is rarely needed, so the default is usually sufficient.
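A minimal sketch of segment reordering, assuming a 16-bit word divided into 8-bit segments. The function reorder_segments is hypothetical and only illustrates the idea; it is not taken from BespokeASM.

```python
def reorder_segments(word, word_size, segment_size, endianness="big"):
    """Return the word with its segments laid out in the requested order."""
    if segment_size > word_size or word_size % segment_size != 0:
        raise ValueError("segment_size must evenly divide word_size")
    if endianness == "big":
        return word  # most significant segment first: nothing to change
    mask = (1 << segment_size) - 1
    result = 0
    # Walk from the least significant segment and emit it first.
    for i in range(word_size // segment_size):
        segment = (word >> (i * segment_size)) & mask
        result = (result << segment_size) | segment
    return result


print(hex(reorder_segments(0x1234, 16, 8, "big")))     # 0x1234
print(hex(reorder_segments(0x1234, 16, 8, "little")))  # 0x3412
```

When the segment size equals the word size, which is the default, both settings leave the word unchanged.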

String Byte Packing for Data Directives

When string_byte_packing is enabled in the general section, quoted strings in .byte and .cstr data directives are packed tightly into words, rather than each character being placed in its own word. This feature is only available if word_size is a multiple of 8 and at least 16. If not set, the default behavior is to place each character in its own word.

For example, with word_size: 16 and string_byte_packing: true, the directive:

.cstr "Hello World"

will produce the following 16-bit words (big-endian):

0x4865, 0x6c6c, 0x6f20, 0x576f, 0x726c, 0x6400

If the string does not fill the last word, the value of string_byte_packing_fill (default 0) is used to pad the remaining bytes. For example, with word_size: 32, string_byte_packing: true, and string_byte_packing_fill: 0xFF:

.byte "Hello World"

will produce:

0x48656c6c, 0x6f20576f, 0x726c64FF

For .cstr, the configured cstr_terminator value is always appended to the string before packing and padding. If the terminator does not fill the last word, the remaining bytes are padded with string_byte_packing_fill.

For example, with word_size: 32, string_byte_packing: true, string_byte_packing_fill: 0xFF, and cstr_terminator: 0xAA:

.cstr "Hello World!"

will produce:

0x48656c6c, 0x6f20576f, 0x726c6421, 0xAAFFFFFF

If string_byte_packing is not enabled, the default behavior is to emit each character (and the terminator for .cstr) in its own word, regardless of string_byte_packing_fill.
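The three packing examples above can be reproduced with the following sketch. It assumes ASCII strings and big-endian placement of characters within each word; pack_string is a hypothetical helper, not BespokeASM's implementation.

```python
def pack_string(text, word_size, fill=0, terminator=None):
    """Pack characters tightly into words, padding the last word with `fill`."""
    if word_size % 8 != 0 or word_size < 16:
        raise ValueError("packing requires word_size to be a multiple of 8 and at least 16")
    data = bytearray(text.encode("ascii"))
    if terminator is not None:
        data.append(terminator)  # .cstr appends the terminator before padding
    bytes_per_word = word_size // 8
    while len(data) % bytes_per_word != 0:
        data.append(fill)
    return [
        int.from_bytes(data[i:i + bytes_per_word], "big")
        for i in range(0, len(data), bytes_per_word)
    ]


# .cstr "Hello World" with word_size: 16 and the default terminator and fill
print([hex(w) for w in pack_string("Hello World", 16, terminator=0)])
# .byte "Hello World" with word_size: 32 and string_byte_packing_fill: 0xFF
print([hex(w) for w in pack_string("Hello World", 32, fill=0xFF)])
# .cstr "Hello World!" with word_size: 32, fill 0xFF, and cstr_terminator: 0xAA
print([hex(w) for w in pack_string("Hello World!", 32, fill=0xFF, terminator=0xAA)])
```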

Configuration Sections

The configuration has the following main sections:

General

The general section defines the general configuration of BespokeASM and various assembly language features. The general section is required. The supported options are:

Option Key Value Type Description
address_size integer The number of bits that is required to represent a memory address.
page_size integer (Optional) The default memory page size in bytes to be used with the .page directive. Defaults to a value of 1.
word_size integer (Optional) The default number of bits in a word. Defaults to a value of 8.
word_segment_size integer (Optional) The default number of bits in a word segment. Defaults to the value of word_size.
endian string (Deprecated, Optional) Defines the endianness of multi-word values. Allowed values are big and little. If not present, this option defaults to big.
multi_word_endianness string (Optional) Defines the endianness of multi-word values. Allowed values are big and little. If not present, this option defaults to big.
intra_word_endianness string (Optional) Defines the endianness of intra-word segments when converting to bytes. Allowed values are big and little. If not present, this option defaults to big.
registers list[string] (Optional) A list of register labels that will be used in this instruction set. Anything that is declared as a register label cannot be used as a constant or address label, and anything not declared as a register label cannot be used as a register operand. If not present, no register labels are defined.
min_version string (Required) The minimum version of BespokeASM that this instruction set configuration file will work with. BespokeASM will also do a counter-minimum version check to make sure this instruction set configuration file has the schema it is expecting.
identifier dictionary (Optional) Configures name and version information for the assembly language defined by this configuration file. This field is used both by language extension generation and source code language requirements. Contains the following key/value items:
  • name - The name of the assembly language. Should not contain spaces, but hyphens and underscores are OK.
  • version - The version of the assembly language. Should be expressed as a semantic version, e.g. "0.1.3".
  • extension - (Optional) The file extension (not including the period) used to identify source files containing this assembly language. Defaults to asm. BespokeASM will compile any text, but this extension is used primarily by language extensions to identify specific assembly language versions.
origin integer (Optional) Defines the default starting origin address for byte code generated with this configuration file. This is an offset from the start of the GLOBAL memory zone. The starting origin defaults to an address of 0 if this option is not present.
cstr_terminator integer (Optional) Defines the terminating character for byte sequences made with the .cstr data directive. Defaults to 0 if unset.
allow_embedded_strings boolean (Optional) If set true, the compiler will allow the embedded string feature. Defaults to false.
string_byte_packing boolean (Optional) If set to true, quoted strings in .byte and .cstr data directives will be packed tightly into words, rather than each character being placed in its own word. Only allowed to be true if word_size is a multiple of 8 and at least 16. Defaults to false.
string_byte_packing_fill integer (Optional) The byte value (0-255) used to pad the last word when string byte packing is enabled and the string does not fill all bytes in the word. Defaults to 0 if not present.
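As an illustration of how these options fit together, the sketch below builds a general section as a Python dictionary, checks it against the constraints stated in this document, and prints it as JSON (one of the two supported file formats). The register names and the min_version string are placeholders, and check_general is a hypothetical helper; BespokeASM performs its own validation.

```python
import json

# Illustrative values only; register names and the version string are placeholders.
general = {
    "address_size": 16,
    "word_size": 16,
    "word_segment_size": 8,
    "multi_word_endianness": "little",
    "intra_word_endianness": "big",
    "registers": ["a", "b", "sp"],
    "min_version": "0.0.0",
    "cstr_terminator": 0,
    "string_byte_packing": True,
    "string_byte_packing_fill": 0,
}


def check_general(section):
    """Return a list of violated constraints, using the documented defaults."""
    problems = []
    word_size = section.get("word_size", 8)
    segment_size = section.get("word_segment_size", word_size)
    if "min_version" not in section:
        problems.append("min_version is required")
    if segment_size < 1 or segment_size > word_size or word_size % segment_size != 0:
        problems.append("word_segment_size must evenly divide word_size")
    for key in ("multi_word_endianness", "intra_word_endianness"):
        if section.get(key, "big") not in ("big", "little"):
            problems.append(f"{key} must be 'big' or 'little'")
    if section.get("string_byte_packing", False) and (word_size % 8 != 0 or word_size < 16):
        problems.append("string_byte_packing needs a word_size that is a multiple of 8 and at least 16")
    if not 0 <= section.get("string_byte_packing_fill", 0) <= 255:
        problems.append("string_byte_packing_fill must be in the range 0-255")
    return problems


print(check_general(general))  # []
print(json.dumps({"general": general}, indent=2))
```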

Predefined Values

Both compiler constants and memory blocks can be defined in the ISA configuration file, and the labels defined with these entities can be used in code compiled with the ISA configuration file. This section is identified with the predefined key and contains a dictionary with the following key/values.

Constants

Compiler constants for numerical values can be defined for use in the instruction set. This subsection, identified by the constants key, contains a list of dictionaries with these keys:

Option Key Value Type Description
name string The label string assigned to this constant value. This case-sensitive label can be used at compile time to reference the assigned integer value.
value integer The integer value assigned to this constant.

Data Blocks

Predefined data blocks can be used to reserve sections of memory for hardware features or common uses (such as buffers). BespokeASM will generate an error if the addresses of compiled code or data should ever overlap with predefined memory blocks. These are defined under the data key as a list of dictionaries:

Option Key Value Type Description
name string The label for the first address value in this data block. This label can be used at compile time to reference the assigned address value.
address integer The start address of the data block.
size integer The number of bytes associated with this data block (minimum 1).
value integer (Optional) The byte value to fill this data block with when generating a binary image. Defaults to 0 if not present.

Memory Zone

A predefined memory zone can be defined in the predefined section under memory_zones, as a list of dictionaries:

Option Key Value Type Description
name string The name of the memory zone.
start integer The start address of the memory zone.
end integer The end address of the memory zone.

The GLOBAL memory zone may be defined here by using the GLOBAL name. If defined, the origin value in the general settings is interpreted as an offset from the GLOBAL zone's start address. If not defined, a default GLOBAL zone is created.

Memory zones are where bytecode for code and data is assembled, while a data block is a preallocated block of bytecode.
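The sketch below shows one possible shape of a predefined section as a Python dictionary, together with the kind of overlap check described for data blocks. All labels and addresses are invented for illustration, an 8-bit word size is assumed so that sizes and addresses use the same unit, and blocks_overlapping is a hypothetical helper rather than BespokeASM code.

```python
# Hypothetical labels and addresses, chosen only to illustrate the structure.
predefined = {
    "constants": [{"name": "UART_STATUS", "value": 0x7F00}],
    "data": [
        {"name": "video_buffer", "address": 0x4000, "size": 0x400, "value": 0},
        {"name": "key_buffer", "address": 0x4400, "size": 16},
    ],
    "memory_zones": [{"name": "GLOBAL", "start": 0x0000, "end": 0x7FFF}],
    "symbols": [{"name": "HAS_UART"}],
}


def blocks_overlapping(section, start, size):
    """Names of predefined data blocks that intersect [start, start + size)."""
    end = start + size
    return [
        block["name"]
        for block in section.get("data", [])
        if start < block["address"] + block["size"] and block["address"] < end
    ]


# Code placed at 0x43F0 that is 0x20 long would run into both reserved blocks.
print(blocks_overlapping(predefined, 0x43F0, 0x20))  # ['video_buffer', 'key_buffer']
print(blocks_overlapping(predefined, 0x2000, 0x100))  # []
```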

Preprocessor Macro Symbols

Preprocessor macro symbols can be predefined in the predefined section under symbols, as a list of dictionaries:

Option Key Value Type Description
name string The name of the preprocessor macro symbol.
value string (Optional) The string replacement value for the macro. If not provided, the empty string is assumed.

Operand Sets

The operand_sets section defines sets of operands for instructions. An operand set represents all possible operand values for a specific operand position and defines the byte code and argument values to be packed when forming machine code. Operand sets are defined separately from instructions to allow reuse. Each operand set consists of one or more distinct operands.

The operand_sets section is a dictionary, where the dictionary key is the name for the operand set, and the value is the configuration of that operand set. The name of the operand set is only used internally within this configuration file and does not directly impact the assembly language that is derived from this configuration file.

Each item listed in the operand_sets consists of a single element titled operand_values, which contains a dictionary that configures each of the operand variants in this operand set.

Operand Configuration Dictionary

The operand configuration dictionary specifies the assembly behavior of a specific operand value. The key is the internal name of the operand value, and the value is a collection of configuration items:

Option Key Value Type Description
type string Specifies one of the operand types and operand addressing modes supported by BespokeASM. The allowed values are:
  • numeric - The Immediate addressing mode. Creates an argument value set to this operand value.
  • indirect_numeric - The Indirect addressing mode.
  • deferred_numeric - The Deferred addressing mode.
  • register - The Register addressing mode.
  • indexed_register - The Indexed Register addressing mode.
  • indirect_register - The Indirect Register addressing mode.
  • indirect_indexed_register - The Indirect Indexed Register addressing mode.
  • enumeration - The Enumeration type operand. The byte code and/or argument value is set by a key-value lookup, where the key is the operand string value, and the value for that key is set by configuration.
  • numeric_enumeration - Similar to the enumeration operand type, but the key is a numeric value that can be set by a numeric expression resolved at compile time.
  • numeric_bytecode - An operand type where the byte code value is set directly by this operand value, subject to configurable bounds.
  • address - An immediate operand that represents a valid address. This address can be validated against a specific memory zone, and can be optionally configured to only emit N least significant bits of the address value to support "fast" address operations like a "zero page" instruction.
  • relative_address - An immediate operand that emits the argument value of the difference between the operand's expression value and this instruction's address. Optionally can use curly brace notation {...}.
  • empty - Used to indicate byte code that should be emitted when no operand is present. This enables instructions that can have a variant behavior for a "no operand" case. This type of operand can only be used with operands configured under the specific_operands Instruction Operands Configuration.
bytecode dictionary (Optional) A dictionary that configures the byte code associated with this operand. If not present this operand will not generate any byte code. This dictionary contains the following keys:
  • value - integer - The value of the byte code. Not used for enumeration, numeric_enumeration, and numeric_bytecode operand types, all of which define the byte code value through alternative means.
  • size - integer - The bit size of the byte code. The value will be masked to this bit size.
  • byte_align - Controls whether the byte code generated from this operand should be aligned to a byte boundary or not.
  • position - string - (Optional) Describes where the byte code from the operand should be placed relative to the base byte code of the instruction mnemonic. Valid values are prefix and suffix. If this option is not present, suffix is the default value used.
  • min - Only used with the numeric_bytecode operand type. Enforces a minimum value that the operand can generate into byte code.
  • max - Only used with the numeric_bytecode operand type. Enforces a maximum value that the operand can generate into byte code.
  • value_dict - Only used with the enumeration and numeric_enumeration operand types. Contains a dictionary that defines the mapping of the operand value to the byte code that should be generated for that operand value. enumeration operand types can use strings as the dictionary keys, while numeric_enumeration operand types must use integers as the keys.
  • memory_zone - When specified for the address operand type, the operand value will be checked to ensure it is a valid address in the indicated memory zone. If not specified, the GLOBAL memory zone will be used for validation. An error will be generated if the indicated memory zone is not valid at the time of compilation.
  • slice_lsb - An optional boolean value for the address operand type that indicates whether only the least significant bits of the address value should be encoded into the byte code. When true, the operand's size value will be used to indicate the number of least significant bits to be encoded. When false, the entire address value will be encoded into byte code. Defaults to false if not present.
  • match_address_msb - An optional boolean value for the address operand type that indicates whether the most significant bits of a sliced address value must match the most significant bits of the address at which this operand's instruction is located. Useful for ensuring a valid address for local or short jump type instructions. Can only be true if slice_lsb is also set to true.
argument dictionary Configures how the operand argument will be emitted into the machine code. Must be present for the numeric, indirect_numeric, enumeration, and numeric_enumeration operand types. Ignored for all other types.

The dictionary contains the following keys:
  • size - integer - The bit size for the operand argument. The emitted value will be masked to this bit size.
  • word_align - boolean - Indicates whether the argument value should be aligned to the next whole word, or can be packed immediately after the prior section's last bit.
  • multi_word_endian - string - (Optional) The multi-word endianness that should be used for this argument. If not present, the multi-word endianness configured in the general section will be used.
  • intra_word_endian - string - (Optional) The intra-word endianness that should be used for this argument. If not present, the intra-word endianness configured in the general section will be used.
  • valid_address - boolean - (Optional) Indicates whether the argument value should be enforced to be within the range defined by the GLOBAL memory zone. Defaults to false if this option is not present. Only used with the numeric, indirect_numeric, deferred_numeric, and relative_address operand types.
  • value_dict - Only used with and required for the enumeration and numeric_enumeration operand types. Contains a dictionary that defines the mapping of the operand value to the byte code that should be generated for that operand value. enumeration operand types can use strings as the dictionary keys, while numeric_enumeration operand types must use integers as the keys.

register string The assembly code representation of the register value to be used for this operand. Must be one of the register values listed in the registers list of the general section. Must be present for the register, indirect_register, and indirect_indexed_register operand types; ignored for all other operand types.
offset dictionary Configures the offset value that is optional for the indirect_register operand type. Ignored for all other types. If not present, then no offset is enabled, and no argument value will be emitted in the machine code. If offset values are enabled, this operand will generate an argument value in the machine code equal to the offset value specified in the assembly code. The compiler will still permit omitting the offset for an indirect_register operand configured to enable offsets. In this case, an offset of zero is implied and will be emitted as the argument value.

The dictionary contains the following keys:
  • size - integer - The bit size for the operand offset. The emitted value will be masked to this bit size.
  • byte_align - boolean - Indicates whether the offset value should be aligned to the next whole byte, or can be packed immediately after the prior section's last bit.
  • max - integer - The maximum value allowed for the offset.
  • min - integer - The minimum value allowed for the offset.
  • endian - string - (Optional) The endianness that should be used for this offset. If not present, the default endianness configured in the general section will be used.

index_operands dictionary Configures the allowed offset operands for the indexed_register and indirect_indexed_register operand types. Contains a dictionary, where the key is an internal name for each offset operand option, and the value is an operand configuration formatted the same as described in this table. When compiling, BespokeASM will attempt to match one operand listed in index_operands. Note that the byte code of the matched index operand will be appended to this operand's configured byte code to form the overall byte code for this operand. If the matched index operand generates an argument, that will be appended to this operand's arguments, if any.
use_curly_braces boolean (Optional) Used only with the relative_address operand type. Determines whether the assembly notation for this operand should use curly braces {...} around the expression that indicates the target address. Defaults to FALSE if not present.
offset_from_instruction_end boolean (Optional) Used only with the relative_address operand type. Indicates whether the relative offset should be calculated from the program counter value at the end of the instruction (TRUE) or the program counter value at the beginning of the instruction (FALSE). Defaults to FALSE (beginning of instruction) if not present.
decorator dictionary (Optional) Indicates whether this operand requires a decorator in order to match. Only supported by the register, indirect_register, and indirect_indexed_register operand types. The decorator configuration dictionary requires two keys:
  • type - string - indicates what decorator is being configured. Supported values are:
    • plus - the + symbol
    • plus_plus - the ++ symbol
    • minus - the - symbol
    • minus_minus - the -- symbol
    • exclamation - the ! symbol
    • at - the @ symbol
  • is_prefix - boolean - (Optional) A boolean value indicating whether the decorator is a prefix (true) or a postfix (false). Defaults to a postfix (false) if not present.

Note: This configuration dictionary is used both by Operand Set configuration and by specific operands in other sections.
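Two of the address-related behaviors in the table, slice_lsb with match_address_msb and relative_address with offset_from_instruction_end, come down to simple arithmetic. The sketch below illustrates that arithmetic under the assumption that negative offsets are masked to the configured bit size as two's complement values; slice_address and relative_offset are hypothetical helpers, not BespokeASM functions.

```python
def slice_address(target, instruction_address, size, match_address_msb=False):
    """Keep only the `size` least significant bits of an address operand."""
    if match_address_msb and (target >> size) != (instruction_address >> size):
        raise ValueError("upper address bits differ from the instruction's address")
    return target & ((1 << size) - 1)


def relative_offset(target, instruction_address, instruction_size, size,
                    offset_from_instruction_end=False):
    """Difference between the target and the instruction's start (or end) address."""
    base = instruction_address
    if offset_from_instruction_end:
        base += instruction_size
    # Masking yields a two's complement encoding for negative offsets.
    return (target - base) & ((1 << size) - 1)


# A "short jump" from 0x1200 to 0x12A4 only needs the low 8 bits.
print(hex(slice_address(0x12A4, 0x1200, 8, match_address_msb=True)))  # 0xa4

# A backward branch from a 2-word instruction at 0x0110 to 0x0100.
print(hex(relative_offset(0x0100, 0x0110, 2, 8)))                                    # 0xf0
print(hex(relative_offset(0x0100, 0x0110, 2, 8, offset_from_instruction_end=True)))  # 0xee
```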

Instructions

The instructions section defines supported instruction mnemonics. Each instruction definition consists of three parts: the mnemonic, the instruction arguments, and the instruction byte code. This section is a key/value dictionary where the keys are the mnemonic strings and the values are dictionaries defining the instruction's arguments and byte code.

Option Key Value Type Description
aliases list[string] (Optional) A list of alternative mnemonics for this instruction. Each alias is accepted as a valid mnemonic in assembly source and language extensions, and generates the same code as the root mnemonic. All aliases must be globally unique across all mnemonics and aliases.
bytecode dictionary A dictionary that describes the base byte code for this instruction that should be emitted to indicate the instruction. The key and values that must be present are:
  • value - The value of the byte code
  • size - The bit size of the byte code. The value will be masked to this bit size.
  • endian - (Optional) The endianness of the instruction prefix byte code. Useful only if the instruction byte code bit size is greater than 8. Defaults to the general endian setting.
  • suffix - (Optional) Defines a byte code fragment that will be appended to the byte code built by the base byte code augmented by operand sourced byte code fragments. This is itself a dictionary that contains the value and size keys with similar meaning, but scoped to the suffix alone. A suffix is only created if this configuration is set.
This base byte code can be augmented by instruction operands in order to form the finalized byte code for the overall instruction.
operands dictionary A dictionary that configures the set of operands that are allowed for this instruction mnemonic. The key and values that are used in this dictionary are described in the table below. If not present, then the instruction mnemonic is assumed to have no operands.
variants list (Optional) This option allows the specification of one or more alternative configurations for the mnemonic. This is useful when a different instruction byte code prefix should be emitted for a certain operand signature. The value of this key is a list, and each list element is another instruction configuration with bytecode and operands as specified above. Variant configurations are processed only if the operands do not match the main configuration; each variant is then tried in the order it appears in the list, and the first match found is used to generate the byte code.

Instruction Aliases

You can define alternative mnemonics (aliases) for an instruction using the aliases field in the instruction's configuration. Aliases are treated as first-class mnemonics: they are accepted in assembly source, generate the same code as the root mnemonic, and are included in language extension syntax highlighting. All aliases must be globally unique across all mnemonics and aliases in the configuration.

  • The aliases field is a list of one or more alternative names for the instruction mnemonic.
  • If aliases is not present, the instruction has no aliases.
  • Aliases are not supported for macros (only for native instructions).

Example:

instructions:
  jsr:
    aliases: [call, jump_to_subroutine]
    bytecode:
      value: 42
      size: 8
  nop:
    bytecode:
      value: 0
      size: 8

In this example, jsr, call, and jump_to_subroutine are all valid mnemonics for the same instruction. Any of these can be used in assembly code, and they will generate the same machine code.
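The global uniqueness rule can be sketched as a simple check over the instructions dictionary. The example below mirrors the configuration above as a Python dictionary and assumes names are compared exactly as written; find_duplicate_mnemonics is a hypothetical helper, not part of BespokeASM.

```python
instructions = {
    "jsr": {
        "aliases": ["call", "jump_to_subroutine"],
        "bytecode": {"value": 42, "size": 8},
    },
    "nop": {"bytecode": {"value": 0, "size": 8}},
}


def find_duplicate_mnemonics(section):
    """Return every mnemonic or alias that is declared more than once."""
    seen, duplicates = set(), []
    for mnemonic, config in section.items():
        # A root mnemonic and its aliases share one namespace.
        for name in [mnemonic, *config.get("aliases", [])]:
            if name in seen:
                duplicates.append(name)
            seen.add(name)
    return duplicates


print(find_duplicate_mnemonics(instructions))  # []

# Giving nop an alias that jsr already uses violates the rule.
instructions["nop"]["aliases"] = ["call"]
print(find_duplicate_mnemonics(instructions))  # ['call']
```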

Instruction Operands Configuration

The operands configuration for an instruction requires at least one of operand_sets or specific_operands, or both.

Option Key Value Type Description
count integer The number of operands this mnemonic must have.
operand_sets dictionary (Optional) Present if operand sets are used to configure the operands of the mnemonic. Contains the following keys and values:
  • list - A list of names for the operand sets to be used as the operand options for this instruction. Must have count number of items in the list, and the position in the list pertains to the position of the operand.
  • disallowed_pairs - (Optional) A list of operand name tuples that represents combinations of operands from the configured operand sets that the compiler should not permit. Operand names are the keys of the operand_values dictionary for a given operand set. The tuple is expressed as a python-style list. For example, [a, b] indicates that, for a mnemonic with two operands, operand a from the first operand set may not be combined with operand b from the second operand set.
  • reverse_argument_order - (Optional) A boolean that indicates whether the machine code for the operand arguments should be emitted in reverse order. This is useful when it is more convenient for the microcode to process the last argument first and then continue in reverse order through the rest. Affects all operand combinations for the operand_sets, but only has any real impact for instruction operands that need 2 or more arguments emitted. Defaults to false if not present.
  • reverse_bytecode_order - (Optional) A boolean that indicates whether the byte code for the operands should be emitted in reverse order. The reversing occurs only within the prefix or suffix scope of the operands' byte code. That is, the operands whose byte code gets emitted as a prefix will be reversed separately from the operands that get emitted as a suffix to the base byte code of the instruction. Affects all operand combinations for the operand_sets, but only has any real impact for instructions with 2 or more operands with byte code. Defaults to false if not present.
specific_operands dictionary (Optional) A dictionary of specific operand combination configurations that are allowed when assembling this instruction. Takes precedence over the operand combinations allowed in the operand_sets configuration for this instruction when both configure the same operand combination. The keys of this dictionary are arbitrary strings used internally to identify a specific operand configuration, and the values are the corresponding operand configurations. Each operand configuration is a dictionary that contains the following keys and values:
  • list - A dictionary of specific operand configurations. The key is an arbitrary string to internally identify the specific operand, and the value is an operand configuration formatted the same as Operand Configuration Dictionary.
  • reverse_argument_order - (Optional) A boolean that indicates whether the machine code for the arguments of this specific operand configuration should be emitted in reverse order. This is useful when it is more convenient for the microcode to process the last argument first and then continue in reverse order through the rest. Only has any real impact for instruction operands that need 2 or more arguments emitted. Defaults to false if not present.
  • reverse_bytecode_order - (Optional) A boolean that indicates whether the byte code for the operands should be emitted in reverse order. The reversing occurs only within the prefix or suffix scope of the operands' byte code. That is, the operands whose byte code gets emitted as a prefix will be reversed separately from the operands that get emitted as a suffix to the base byte code of the instruction. Affects all operand combinations configured in this section, but only has any real impact for instructions with 2 or more operands with byte code. Defaults to false if not present.
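To illustrate how count, list, and disallowed_pairs interact, the sketch below expresses two small operand sets and a two-operand instruction as Python dictionaries and enumerates the operand combinations that remain allowed. The set names, operand names, and byte code values are invented for this example, and allowed_combinations is a hypothetical helper rather than BespokeASM's matching logic.

```python
from itertools import product

# Invented operand sets; the keys follow the Operand Configuration Dictionary.
operand_sets = {
    "dest": {"operand_values": {
        "a": {"type": "register", "register": "a", "bytecode": {"value": 1, "size": 3}},
        "b": {"type": "register", "register": "b", "bytecode": {"value": 2, "size": 3}},
    }},
    "src": {"operand_values": {
        "a": {"type": "register", "register": "a", "bytecode": {"value": 1, "size": 3}},
        "b": {"type": "register", "register": "b", "bytecode": {"value": 2, "size": 3}},
        "imm": {"type": "numeric", "bytecode": {"value": 6, "size": 3},
                "argument": {"size": 8, "word_align": True}},
    }},
}

mov_operands = {
    "count": 2,
    "operand_sets": {
        "list": ["dest", "src"],
        "disallowed_pairs": [["a", "a"], ["b", "b"]],
    },
}


def allowed_combinations(operands, sets):
    """List operand-name tuples permitted by the operand_sets configuration."""
    config = operands["operand_sets"]
    if len(config["list"]) != operands["count"]:
        raise ValueError("list must name exactly `count` operand sets")
    choices = [list(sets[name]["operand_values"]) for name in config["list"]]
    banned = {tuple(pair) for pair in config.get("disallowed_pairs", [])}
    return [combo for combo in product(*choices) if combo not in banned]


print(allowed_combinations(mov_operands, operand_sets))
# [('a', 'b'), ('a', 'imm'), ('b', 'a'), ('b', 'imm')]
```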

Instruction Macros

Instruction macros let you configure a sequence of instructions and then use a single instruction (the macro) to insert that sequence into the byte code. For example, if the ISA of the computer only has a single-byte move instruction named mov, a two-byte move instruction (macro) named mov2 can be constructed from the following sequence of instructions:

     mov [addr1],[addr2]
     mov [addr1+1],[addr2+1]

Then, the macro instruction mov2 [addr1],[addr2] can be defined such that it expands to this sequence.

BespokeASM allows instruction macros to be defined in the ISA configuration file. Once defined, the macro mnemonic can be used in assembly code exactly like a native instruction mnemonic; the only noticeable difference is that an instruction macro generates more byte code than a native instruction. In essence, BespokeASM runs a pre-assembler that expands each macro instruction into the desired set of replacement instruction lines through a string parsing and replacement process. The constructed instruction lines are then assembled with all the other instruction lines from the assembly code to generate the machine code.
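
The pre-assembly step described above can be pictured with a small Python sketch. This is an illustration only, not BespokeASM's actual code: the names `preassemble`, `expand_mov2`, and `MACROS` are hypothetical, the expander is hard-coded rather than driven by the configuration file, and comments and labels on source lines are not handled.

```python
# Hypothetical sketch of a macro pre-pass; not BespokeASM's implementation.

def expand_mov2(operands):
    """Hard-coded stand-in for a configured macro: two single-byte moves."""
    dst, src = (op.strip()[1:-1] for op in operands)  # drop the [ ] brackets
    return [f"mov [{dst}],[{src}]", f"mov [{dst}+1],[{src}+1]"]

MACROS = {"mov2": expand_mov2}  # macro mnemonic -> expander (illustrative)

def preassemble(lines):
    """Replace each macro line with its instruction lines; keep other lines."""
    out = []
    for line in lines:
        mnemonic, _, rest = line.strip().partition(" ")
        expander = MACROS.get(mnemonic.lower())
        if expander is None:
            out.append(line.strip())  # native instruction: pass through
        else:
            out.extend(expander(rest.split(",")))  # macro: insert its sequence
    return out

program = ["mov [$10],[$20]", "mov2 [$2000],[$3000]"]
expanded = preassemble(program)
```

In this sketch the native mov line passes through unchanged, while the mov2 line is replaced in place by two mov lines before any machine code would be generated.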

Defining Instruction Macros

Macros are defined in the macros section of the configuration file. The section is structured similarly to the instructions section: it is a dictionary where each key is the mnemonic of a macro and each value is a list of distinct configurations for that macro. Each configuration in the list is a dictionary with two elements, operands and instructions.

The operands section is configured the same way as the operands section for instructions. However, since no byte code is emitted directly from a macro, any byte code configuration provided for a macro's operands is ignored. The goal of the operands section for a macro is simply to define which operand types are allowed for a specific macro configuration.

The instructions section of a macro definition lists, in order, the instruction templates used to build the instruction sequence that the macro expands into. Each instruction is written as it is to be assembled; the macro mechanism essentially replaces the macro instruction in the assembly code with the assembly code listed in instructions. Before doing so, however, certain tokens that may be present in the instructions section are replaced with finalized values. The tokens have the form @YYY(x), where YYY is the token label and x is an integer indicating which macro operand is the source of its value. The first macro operand is represented by x being 0, the second by 1, and so on. The following macro tokens are supported:

  • @OP(x) - Generates a value based on the whole string of macro operand x.
  • @ARG(x) - Generates a value based on the argument numeric expression of macro operand x.
  • @REG(x) - Generates a value based on the register label used in macro operand x.

The specific value emitted by each macro token depends on the operand type that macro operand x is configured to be in the operands section of this macro configuration. The following table lists what each macro token generates for all supported operand types.

| Operand Type | Operand argument @ARG(x) | Operand register @REG(x) | Entire operand @OP(x) |
|---|---|---|---|
| numeric | The original numeric expression | error | The original numeric expression |
| indirect_numeric | The numeric expression of the indirect address | error | The entire operand, including the [ ] brackets |
| deferred_numeric | The numeric expression of the indirect address | error | The entire operand, including the [[ ]] brackets |
| register | error | The register | The register |
| indirect_register | The offset expression applied to the register | The register | The entire operand, including the [ ] brackets |
| indirect_indexed_register | ? | The base register | The entire operand, including the [ ] brackets |
| enumeration | The string of the enumeration value | error | The string of the enumeration value |
| numeric_enumeration | The numeric expression of the enumeration value | error | The numeric expression of the enumeration value |
| numeric_bytecode | The original numeric expression | error | The original numeric expression |
| empty | error | error | error |

See the table in the original documentation for details on what each token emits for each operand type.
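
One way to read the table is as a lookup from (token, operand type) to either a piece of the operand text or an error. The Python sketch below covers only four operand types (numeric, indirect_numeric, deferred_numeric, register). It is not BespokeASM's implementation: `resolve`, `strip_brackets`, and `MacroTokenError` are hypothetical names, and treating the argument as the text inside the brackets is an assumption about how the operand string is parsed.

```python
# Illustrative subset of the token table; a sketch under stated assumptions.

class MacroTokenError(ValueError):
    """Raised where the table says 'error' (hypothetical exception name)."""

# How many layers of [ ] surround the argument for each operand type.
ARG_BRACKET_DEPTH = {"numeric": 0, "indirect_numeric": 1, "deferred_numeric": 2}
SUPPORTED = ("numeric", "indirect_numeric", "deferred_numeric", "register")

def strip_brackets(text, depth):
    """Remove `depth` layers of [ ] from an operand string."""
    inner = text.strip()
    for _ in range(depth):
        if not (inner.startswith("[") and inner.endswith("]")):
            raise MacroTokenError(f"expected brackets in {text!r}")
        inner = inner[1:-1].strip()
    return inner

def resolve(token, operand_type, operand):
    """Return the text a token would produce for one macro operand."""
    if operand_type not in SUPPORTED:
        raise NotImplementedError(operand_type)  # outside this sketch
    operand = operand.strip()
    if token == "OP":
        return operand  # the entire operand, brackets included
    if token == "ARG" and operand_type in ARG_BRACKET_DEPTH:
        return strip_brackets(operand, ARG_BRACKET_DEPTH[operand_type])
    if token == "REG" and operand_type == "register":
        return operand
    raise MacroTokenError(f"@{token} is an error for {operand_type} operands")
```

For example, `resolve("ARG", "indirect_numeric", "[$8000]")` returns `$8000`, while `resolve("REG", "numeric", "5")` raises the error that the table lists for that combination.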

Example Macro Definition

To illustrate how to configure a macro, the following is a nominal configuration for the mov2 example discussed above:

macros:
  mov2:
    - operands:
        count: 2
        specific_operands:
          indirect_indirect:
            list:
              iaddr1:
                type: indirect_numeric
                argument:
                  size: 16
                  byte_align: true
              iaddr2:
                type: indirect_numeric
                argument:
                  size: 16
                  byte_align: true
      instructions:
        - "mov [@ARG(0)],[@ARG(1)]"
        - "mov [@ARG(0)+1],[@ARG(1)+1]"
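
To see what this configuration would produce, the following Python sketch applies the two instruction templates to a sample macro call. It assumes that token substitution is purely textual, as the string replacement description suggests; the helper `expand` and the labels `dest` and `src` are hypothetical and stand in for BespokeASM's own logic and for labels defined elsewhere in the assembly code.

```python
import re

# Instruction templates copied from the mov2 configuration above.
TEMPLATES = ["mov [@ARG(0)],[@ARG(1)]", "mov [@ARG(0)+1],[@ARG(1)+1]"]
ARG_TOKEN = re.compile(r"@ARG\((\d+)\)")

def expand(templates, operands):
    """Substitute @ARG(x) textually (assumed behavior, hypothetical helper)."""
    # Both operands are indirect_numeric, so @ARG(x) is the text inside [ ].
    args = [op.strip()[1:-1] for op in operands]
    return [ARG_TOKEN.sub(lambda m: args[int(m.group(1))], t) for t in templates]

# Expansion of the macro call: mov2 [dest],[src+4]
lines = expand(TEMPLATES, ["[dest]", "[src+4]"])
for line in lines:
    print(line)
# prints:
#   mov [dest],[src+4]
#   mov [dest+1],[src+4+1]
```

Under this assumption, the second operand's expression is carried into the template as text, so the second line reads `src+4+1` and the assembler evaluates the combined expression later.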

Macro Considerations and Limitations

  • Macros cannot define labels or constants, nor use directives. However, they can use predefined labels and constants in expressions.
  • The instructions listed in the instructions section of a given macro configuration are tightly coupled to the operand types configured for the macro in the operands section. If the instructions do not match what the macro operands provide, errors will be generated during assembly. While operand_sets can be used to configure a macro's operands, take care that all operands listed in the operand set are consistent with each other in terms of how the macro instructions will use them. If the operands are inconsistent, create a separate configuration in the macro's list of configurations.

Examples

Example configuration files can be found in the examples directory of the BespokeASM repository.
