File format - pig-games/asm465 GitHub Wiki

The file format shares much of its structure with the in-editor memory layout of an assembly file.

Use of PETSCII vs Screen codes in file format (and memory representation)

I'm currently leaning towards using Screen codes for all text in both the file format and the memory representation. The reason is simple: it reduces the amount of back-and-forth translation during 'regular' use (loading/saving, editing and assembling of modules). For translation from and to PETSCII and/or ASCII, import/export functionality can be created both in the native assembler itself and on 'modern' platforms. Import/export is assumed to be needed much less frequently, and on the 'modern' PC side it can easily be automated, for example to sync code with GitHub.
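To make the import/export idea more concrete, here is a minimal Python sketch of the byte-level mapping between PETSCII and Screen codes that such a tool would need. It assumes the commonly used C64 mapping for printable characters; the function names `petscii_to_screen` and `screen_to_petscii` are hypothetical, and control codes as well as reverse-video Screen codes ($80-$FF) are deliberately left out.

```python
from typing import Optional


def petscii_to_screen(p: int) -> Optional[int]:
    """Map a printable PETSCII byte to a C64 screen code (None for control codes)."""
    if 0x20 <= p <= 0x3F:
        return p            # space, punctuation and digits are identical
    if 0x40 <= p <= 0x5F:
        return p - 0x40     # '@', letters, '[', pound, ']', up-arrow, left-arrow
    if 0x60 <= p <= 0x7F:
        return p - 0x20     # first graphics block
    if 0xA0 <= p <= 0xBF:
        return p - 0x40     # second graphics block
    if 0xC0 <= p <= 0xFE:
        return p - 0x80     # duplicates of PETSCII $60-$7F and $A0-$BE
    if p == 0xFF:
        return 0x5E         # pi, duplicate of PETSCII $DE
    return None             # $00-$1F and $80-$9F are control codes


def screen_to_petscii(s: int) -> int:
    """Map a non-reversed screen code ($00-$7F) back to one PETSCII byte."""
    if not 0 <= s <= 0x7F:
        raise ValueError("reverse-video screen codes have no single-byte PETSCII form")
    if s <= 0x1F:
        return s + 0x40
    if s <= 0x3F:
        return s
    if s <= 0x5F:
        return s + 0x80     # canonical choice among the duplicates: $C0-$DF
    return s + 0x40         # $60-$7F -> $A0-$BF
```

Where PETSCII has two codes for the same glyph, the sketch picks one canonical code on export; importing either code yields the same Screen code, so the round trip Screen code -> PETSCII -> Screen code is lossless for $00-$7F.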

Grammatical specification of file format

The definition below uses a notation similar to 'Augmented Backus-Naur Form' (ABNF) to formally specify the file format. A few modifications to 'official' ABNF make it easier to specify a Commodore 8-bit data format. In general, character values are assumed to be Screen codes (the default) or PETSCII (when explicitly specified); like everything else here, this may change.

  • Prefer inCase (camelCase) over '-' to separate words in production rule names
  • A multi-line production rule does not have to repeat the rule name on each line
  • Additional core rules:
    • BYTE
    • WORD
    • LEND: end of a tokenised line, defaults to $FF
    • TEND: end-of-text marker byte, defaults to $FF
  • Additional operators:
    • | combines bits into a bitfield (bitwise OR)
  • Terminal value prefixes (where unambiguous, the first % is left out):
    • %%: binary
    • %$: hex
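As an illustration of the prefix rules and the | operator, the following sketch parses terminal values as they appear in the grammar below. The helpers `parse_terminal` and `combine_bits` are hypothetical, and the sketch assumes that a bare % means binary, a bare $ means hex, plain digits are decimal and '_' is only a digit-group separator.

```python
def parse_terminal(text: str) -> int:
    """Parse a terminal value such as %1000_0000, %%1010, %$00, $FF or 10."""
    t = text.replace("_", "")      # '_' only groups digits
    if t.startswith("%%"):
        return int(t[2:], 2)       # explicit binary
    if t.startswith("%$"):
        return int(t[2:], 16)      # explicit hex
    if t.startswith("%"):
        return int(t[1:], 2)       # first '%' left out: binary
    if t.startswith("$"):
        return int(t[1:], 16)      # first '%' left out: hex
    return int(t, 10)              # plain decimal (assumption)


def combine_bits(*terms: str) -> int:
    """The '|' operator: OR the parsed terms into one bitfield."""
    result = 0
    for term in terms:
        result |= parse_terminal(term)
    return result
```

For example, `combine_bits("%1000_0000", "%0000_0010", "%0000_0001")` yields $83, a relative-address line that contains an instruction and defines a label.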
module         = header status body
header         = moduleName fileSize startOfBody symbolData
moduleName     = symbol
fileSize       = WORD
startOfBody    = WORD
status         = addressResolvedLine
body           = *line

symbol         = ALPHA *(ALPHA / '_') TEND
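A possible way to read a symbol and the fixed part of the header is sketched below. It assumes TEND keeps its default value $FF and that WORD values are stored little-endian, as is usual on the 6502 family. Neither symbolData nor status is specified yet, so the hypothetical `read_header` only reports where symbolData starts.

```python
import struct
from typing import Tuple

TEND = 0xFF  # end-of-text marker, default value from the notation


def read_symbol(data: bytes, pos: int) -> Tuple[bytes, int]:
    """Return a symbol's screen-code bytes (without TEND) and the offset after its TEND."""
    end = data.index(TEND, pos)  # raises ValueError if the symbol is unterminated
    return data[pos:end], end + 1


def read_header(data: bytes) -> dict:
    """Read moduleName, fileSize and startOfBody (hypothetical helper)."""
    name, pos = read_symbol(data, 0)
    # WORD assumed little-endian, as usual on the 6502 family
    file_size, start_of_body = struct.unpack_from("<HH", data, pos)
    return {
        "moduleName": name,
        "fileSize": file_size,
        "startOfBody": start_of_body,
        "symbolDataOffset": pos + 4,  # layout of symbolData is not specified yet
    }
```

A module named MAIN (Screen codes $0D $01 $09 $0E) with fileSize $1234 and startOfBody $0040 would then start with the bytes `0D 01 09 0E FF 34 12 40 00`.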

line           = lineType lineSize lineAddress lineContent
              =/ LT_EMPTY lineSize lineAddress

lineType       = [LT_RELAD] | [LT_COMM] | [LT_MCUSE] | [LT_MCDEF] | [LT_DIR] | [LT_LBUSE] | [LT_INST] | [LT_LBDEF]

LT_RELAD       = %1000_0000             ; Line address is relative if set, absolute if unset
LT_COMM        = %0100_0000             ; Line contains a comment
LT_MCUSE       = %0010_0000             ; Line uses a macro
LT_MCDEF       = %0001_0000             ; Line defines a macro
LT_DIR         = %0000_1000             ; Line uses a directive
LT_LBUSE       = %0000_0100             ; Line uses a label
LT_INST        = %0000_0010             ; Line contains instruction
LT_LBDEF       = %0000_0001             ; Line defines a label
LT_EMPTY       = %0000_0000             ; Empty lines
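Because lineType is a bitfield, its flags can be tested and combined with plain bitwise operations. The sketch below, with hypothetical helper names, decodes a lineType byte into flag names and builds one from names; a line with a label definition, an instruction and a comment would be LT_LBDEF | LT_INST | LT_COMM = $43.

```python
from typing import List

# lineType flag bits, in the order of the table above
LINE_TYPE_FLAGS = {
    0x80: "LT_RELAD",
    0x40: "LT_COMM",
    0x20: "LT_MCUSE",
    0x10: "LT_MCDEF",
    0x08: "LT_DIR",
    0x04: "LT_LBUSE",
    0x02: "LT_INST",
    0x01: "LT_LBDEF",
}
NAME_TO_BIT = {name: bit for bit, name in LINE_TYPE_FLAGS.items()}


def decode_line_type(line_type: int) -> List[str]:
    """List the flags set in a lineType byte; 0 means LT_EMPTY."""
    if line_type == 0:
        return ["LT_EMPTY"]
    return [name for bit, name in LINE_TYPE_FLAGS.items() if line_type & bit]


def encode_line_type(*names: str) -> int:
    """OR the named flags together (the '|' bitfield operator)."""
    result = 0
    for name in names:
        result |= NAME_TO_BIT[name]
    return result
```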

lineSize       = BYTE                   ; length of the tokenised line, counted from its first byte,
                                        ; except when lineType == LT_EMPTY: then it is the number of consecutive empty lines.

lineAddress    = WORD
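Since every line starts with lineType and lineSize, a reader can skip from line to line without parsing lineContent. This sketch assumes that lineSize counts the whole record, including the four bytes of lineType, lineSize and lineAddress, that an LT_EMPTY record consists of exactly those four bytes, and that lineAddress is a little-endian WORD. The generator `walk_lines` is hypothetical.

```python
from typing import Iterator, Tuple

LT_EMPTY = 0x00


def walk_lines(body: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (lineType, lineAddress, lineContent) for every source line.

    An LT_EMPTY record expands into lineSize empty lines.
    """
    pos = 0
    while pos < len(body):
        line_type = body[pos]
        line_size = body[pos + 1]
        address = body[pos + 2] | (body[pos + 3] << 8)  # WORD, little-endian (assumed)
        if line_type == LT_EMPTY:
            for _ in range(line_size):
                yield LT_EMPTY, address, b""
            pos += 4                                    # fixed-size record (assumed)
        else:
            if line_size < 4:
                raise ValueError(f"lineSize {line_size} too small at offset {pos}")
            yield line_type, address, body[pos + 4:pos + line_size]
            pos += line_size                            # skip to the next line
```

Storing runs of empty lines as one record keeps blank-line-heavy sources compact while the editor can still reproduce the original layout.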

lineContent    = [labelDef] (instruction / directive / macroUse) [comment]
              =/ labelDef [comment]
              =/ comment
              =/ macroDef [comment]
              
column         = BYTE
labelDef       = column symbol
instruction    = column instToken column [instOperand]
instToken      = BYTE
instOperand    = column (addressingMode | valueDef) [operValue]
operValue      = value
              =/ 'a' / 'A'     ; only for addressingMode AM_ACC
macroUse       = labelUse column [instOperand]
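In instOperand, the addressingMode | valueDef byte packs the addressing mode into bits 0-4 and the value kind into bits 5-7 (the VP_* constants are already shifted). A minimal sketch of packing and unpacking that byte, with hypothetical function names:

```python
from typing import Tuple

AM_MASK = 0x1F  # bits 0-4: addressingMode (AM_IMP .. AM_BPR)
VP_MASK = 0xE0  # bits 5-7: valueDef, VP_* constants are already shifted


def pack_operand_kind(addressing_mode: int, value_def: int) -> int:
    """Combine addressingMode and valueDef into one byte ('|' in the grammar)."""
    if not 0 <= addressing_mode <= AM_MASK:
        raise ValueError("addressingMode must fit in bits 0-4")
    if value_def & ~VP_MASK:
        raise ValueError("valueDef may only use bits 5-7")
    return addressing_mode | value_def


def unpack_operand_kind(b: int) -> Tuple[int, int]:
    """Split the byte back into (addressingMode, valueDef)."""
    return b & AM_MASK, b & VP_MASK
```

For an operand written as `#$0A`, AM_IMM (2) combined with VP_HEX (1<<5) gives $22, followed by the operValue $0A.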

value          = BYTE
              =/ WORD
              =/ labelUse
              =/ expression

labelUse       = column symbol *symbol    ; unresolved label: the '.'-separated names that make up the fully qualified label;
                                          ; the TEND terminating each symbol represents the '.' and, after the last symbol, the end of the fully qualified label
              =/ column %$00 1*BYTE       ; resolved label: a list of label IDs representing the fully qualified label
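The two labelUse forms can be told apart by the first byte after column: a symbol starts with ALPHA, and $00 is the Screen code for '@', so $00 cannot start an unresolved label. The sketch below relies on that. It also assumes the caller already knows where the labelUse ends (for example from lineSize), because the grammar does not yet say how the end of the resolved ID list is marked. `decode_label_use` is a hypothetical name.

```python
from typing import List, Tuple, Union

TEND = 0xFF  # terminates each symbol of an unresolved label


def decode_label_use(data: bytes) -> Tuple[int, str, List[Union[bytes, int]]]:
    """Decode one complete labelUse slice (the caller supplies its exact extent).

    Returns (column, "unresolved", [symbol bytes without TEND, ...])
    or      (column, "resolved",   [label id, ...]).
    """
    column, rest = data[0], data[1:]
    if rest[:1] == b"\x00":
        # $00 is '@' in screen codes, never an ALPHA, so it cannot start a symbol
        return column, "resolved", list(rest[1:])
    parts: List[Union[bytes, int]] = []
    start = 0
    for i, b in enumerate(rest):
        if b == TEND:  # TEND stands for '.' or for the end of the label
            parts.append(rest[start:i])
            start = i + 1
    return column, "unresolved", parts
```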

addressingMode = AM_IMP / AM_ACC / AM_IMM / AM_IMW / AM_BP  / AM_BPQ / AM_BPX / AM_ABS / AM_ABX / AM_ABY 
              =/ AM_REL / AM_RLW / AM_XIN / AM_INY / AM_INZ / AM_IND / AM_BPR
                               ; bits 0-4

valueDef       = VP_HEX / VP_DEC / VP_BIN / VP_OCT / VP_EXP / VP_LAB / VP_CHR
                                ; bits 5-7; where applicable, specifies the base of the operand (for example with AM_IMM)
                                ; or whether the value comes from a label, an expression or a literal character

                                ; TODO: Consider OR'ing the column and valueDef together. The column would then fit in bits 0-4,
                                ;       allowing only 32 spaces between a value and the following ','. This is a limitation, but it
                                ;       saves one byte per value in large value lists in a source file.
                                ;       In my experience the ',' almost always directly follows the previous value anyway.
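The TODO above could look roughly like this, assuming the column byte keeps only bits 0-4 and valueDef keeps its current position in bits 5-7. With the current grammar every .byte value costs four bytes (column, valueDef, column, value), so packing would bring that down to three. The helper names are hypothetical.

```python
from typing import Tuple

COLUMN_MASK = 0x1F  # bits 0-4: column, so only 32 distinct values remain
VP_MASK = 0xE0      # bits 5-7: valueDef (VP_* constants are already shifted)


def pack_column_value_def(column: int, value_def: int) -> int:
    """OR a 5-bit column into the unused low bits of valueDef."""
    if not 0 <= column <= COLUMN_MASK:
        raise ValueError("column must be 0-31 when packed with valueDef")
    if value_def & ~VP_MASK:
        raise ValueError("valueDef may only use bits 5-7")
    return value_def | column


def unpack_column_value_def(b: int) -> Tuple[int, int]:
    """Split the packed byte back into (column, valueDef)."""
    return b & COLUMN_MASK, b & VP_MASK
```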
directive      = DIR_BYTE 1*(column valueDef column byteListValue)    ; the valueDef also represents the ',' between values
              =/ DIR_WORD 1*(column valueDef column wordListValue)    ; valueDef must be != VP_CHR
              =/ DIR_TEXT column *VCHAR TEND
              =/

DIR_BYTE       = 1      ; .byte
DIR_WORD       = 2      ; .word
DIR_TEXT       = 3      ; .text
DIR_

byteListValue  = BYTE / labelUse / expression
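A sketch of how a .byte directive with plain BYTE values could be emitted according to the DIR_BYTE rule above. The meaning of the two column bytes per value (position of the separator/valueDef and position of the value) is an assumption, and `encode_byte_directive` is a hypothetical helper; labels and expressions are not handled.

```python
from typing import Iterable, Tuple

DIR_BYTE = 1      # .byte
VP_HEX = 1 << 5   # hexadecimal value


def encode_byte_directive(items: Iterable[Tuple[int, int, int, int]]) -> bytes:
    """Emit DIR_BYTE 1*(column valueDef column byteListValue) for plain BYTE values.

    Each item is (separator_column, value_def, value_column, value); the meaning
    of the two columns is an assumption for this sketch.
    """
    out = bytearray([DIR_BYTE])
    for sep_column, value_def, value_column, value in items:
        if not 0 <= value <= 0xFF:
            raise ValueError("byteListValue must fit in a BYTE")
        out += bytes([sep_column, value_def, value_column, value])
    if len(out) == 1:
        raise ValueError("the grammar requires at least one value (1*)")
    return bytes(out)
```

Two hex values, with illustrative column numbers, encode as `encode_byte_directive([(8, VP_HEX, 9, 0x0A), (12, VP_HEX, 13, 0xFF)])`, giving nine bytes: the directive token plus four bytes per value.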

AM_IMP         = 0             ; implied        -> valueDef must be 0, no operValue is allowed
AM_ACC         = 1             ; accumulator    -> operValue may only be ['a' / 'A']
AM_IMM         = 2             ; immediate      -> valueDef must be >= 1<<5, operValue must be a BYTE
AM_IMW         = 3             ; immediate word -> valueDef must be >= 1<<5, operValue must be a WORD
AM_BP          = 4             ; base page      -> valueDef must be >= 1<<5, operValue must be a BYTE
AM_BPQ         = 5             ; base page quad -> valueDef must be >= 1<<5, operValue must be a BYTE
AM_BPX         = 6             ; 
AM_ABS         = 7
AM_ABX         = 8
AM_ABY         = 9
AM_REL         = 10
AM_RLW         = 11
AM_XIN         = 12
AM_INY         = 13
AM_INZ         = 14
AM_IND         = 15
AM_BPR         = 16                               
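The constraints given for AM_IMP through AM_BPQ can be checked mechanically. The sketch below implements only those documented rules (the remaining modes have none yet); the function `check_operand` and the representation of operValue as `None`, an `int` or a `str` are assumptions for illustration.

```python
from typing import List, Optional, Union

# Addressing-mode values from the table above; only modes with documented rules are used.
AM_IMP, AM_ACC, AM_IMM, AM_IMW, AM_BP, AM_BPQ = 0, 1, 2, 3, 4, 5
VP_MIN = 1 << 5  # smallest valueDef that selects a value kind (VP_HEX)


def check_operand(addressing_mode: int, value_def: int,
                  oper_value: Optional[Union[int, str]]) -> List[str]:
    """Return the documented constraints an operand breaks (empty list if none).

    oper_value is None when absent, an int for a numeric literal, or a str for
    'a'/'A' (AM_ACC) or a label/expression, which is not range-checked here.
    """
    problems = []
    if addressing_mode == AM_IMP:
        if value_def != 0:
            problems.append("AM_IMP: valueDef must be 0")
        if oper_value is not None:
            problems.append("AM_IMP: no operValue allowed")
    elif addressing_mode == AM_ACC:
        if oper_value not in (None, "a", "A"):
            problems.append("AM_ACC: operValue may only be 'a' or 'A'")
    elif addressing_mode in (AM_IMM, AM_IMW, AM_BP, AM_BPQ):
        if value_def < VP_MIN:
            problems.append("valueDef must be >= 1<<5")
        limit = 0xFFFF if addressing_mode == AM_IMW else 0xFF  # WORD vs BYTE
        if oper_value is None:
            problems.append("operValue is required")
        elif isinstance(oper_value, int) and not 0 <= oper_value <= limit:
            problems.append("operValue out of range")
    return problems
```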

VP_HEX         = 1<<5           ; hexadecimal, e.g.: $0A
VP_DEC         = 2<<5           ; decimal, e.g.:     10
VP_BIN         = 3<<5           ; binary, e.g.:      %00001010
VP_OCT         = 4<<5           ; octal, e.g.:       o12
VP_EXP         = 5<<5           ; expression, e.g.:  SomeLabel + 1
VP_LAB         = 6<<5           ; label, e.g.:       SomeLabel
VP_CHR         = 7<<5           ; literal character, e.g.: 'a'
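To show how the numeric VP_* bases map back to source text, this hypothetical `format_number` renders a numeric operValue in the notation of the examples above. It assumes hex and binary output are zero-padded to the operand width (one byte or one word); VP_EXP, VP_LAB and VP_CHR need label, expression or character data and are rejected here.

```python
VP_HEX, VP_DEC, VP_BIN, VP_OCT = 1 << 5, 2 << 5, 3 << 5, 4 << 5


def format_number(value_def: int, value: int, width: int = 1) -> str:
    """Render a numeric operValue; width is 1 for a BYTE and 2 for a WORD."""
    if value_def == VP_HEX:
        return f"${value:0{2 * width}X}"     # e.g. $0A
    if value_def == VP_DEC:
        return str(value)                    # e.g. 10
    if value_def == VP_BIN:
        return f"%{value:0{8 * width}b}"     # e.g. %00001010
    if value_def == VP_OCT:
        return f"o{value:o}"                 # e.g. o12
    raise ValueError("not a numeric valueDef (VP_EXP, VP_LAB and VP_CHR need more data)")
```

Keeping the base in valueDef means the editor can show a value exactly as it was typed while the stored operValue stays a plain binary BYTE or WORD.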