mc09 L2 Port status Nov 2016 - nealcrook/multicomp6809 GitHub Wiki

These notes describe the state of NitrOS-9 Level 2 for mc09 as of November 2016. They are of historic interest only (if that) because the port is now complete and working.

I deliberately made the mc09 MMU behave like the one on the COCO with the aim of allowing a Level 2 port.

Debug of NitrOS-9 is tricky for me because my emulator (exec09) does not support the CWAI instruction, which is used in the kernel. For the Level 1 port I did lots of debugging in the emulator to get the kernel up and running, then I had to debug on real hardware.

(Update: the Level 2 port activity included removing that restriction from exec09; both Level 1 and Level 2 can boot and run in exec09 now).

I started looking at a Level 2 port but did not make a lot of progress. NitrOS-9 was written a number of years ago and there seems to be little or no active development. I posted a few questions on the coco mailing list but failed to get much technical engagement. The only other people who were interested in this port did not have the knowledge/skills/time to contribute. I kind of lost interest and started looking at FUZIX instead.

The one person who did some actual coding on this was Bill Nobel. He had also given me some help in fixing a bug in my SD driver that came from my limited understanding of the device driver data structures.

22Jan2016 Bill explained to me that the top address range in the MMU has a special behaviour (this is not covered in the NitrOS-9 documentation that I used to specify the behaviour of my design):

  • The top 512 bytes of address space ($FE00-$FFFF) are always RAM and always fixed, no matter what pages are mapped by the MMU. (The Coco3 uses block $3f as this top page, but it could be any block.)

(In addition, on mc09, ffd0-ffdf are always I/O no matter what pages are mapped by the MMU).

02Feb2016 I committed modified RTL to implement this additional MMU behaviour through a special register sequence. Once enabled it cannot be disabled except by reset - Bill said that the MMU is never disabled, so that restriction is fine. In the RTL, I named this mode Fixed RAM Top (FRT). The description of the MMU behaviour is attached here as APPENDIX 1 (it is simply cut from the header of the RTL) but, in summary:

  • FRT is disabled by default.
  • When enabled, it permanently maps the top 512 bytes of physical block 7 to the top 512 bytes of the CPU address space ($FE00-$FFFF), whatever block is mapped into logical block 7 (see APPENDIX 1).
  • To enable FRT, do a byte store of $20 to address $FFDE; do this early in the track34 code. After this, any subsequent writes to $FFDE should always have bits [7] and [5] set to 1.
  • One of the LEDs is connected up to the FRT signal, for debug (it is LED_7). You should see the LED light when you do that byte store, and stay lit from then on.
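The effect of FRT on address translation can be sketched in Python. This is an illustrative model only, not the RTL: the function name and the list-based mapping table are assumptions, and it models CPU reads only (writes to the I/O range have extra side effects described in APPENDIX 1).

```python
BLOCK_SIZE = 0x2000  # the MMU works in 8 KByte blocks

def translate(addr, mapping, frt):
    """Return ("io", addr) or ("ram", physical_address) for a 16-bit CPU read.

    mapping is a hypothetical 8-entry list: logical block -> physical block.
    """
    if 0xFFD0 <= addr <= 0xFFDF:
        return ("io", addr)                      # I/O is always decoded here
    offset = addr % BLOCK_SIZE
    if frt and addr >= 0xFE00:
        return ("ram", 7 * BLOCK_SIZE + offset)  # fixed: top of physical block 7
    block = mapping[addr // BLOCK_SIZE]
    return ("ram", block * BLOCK_SIZE + offset)

# Mapping used by the APPENDIX 2 test: logical 6 -> physical 7, logical 7 -> physical 6
mapping = [0, 1, 2, 3, 4, 5, 7, 6]
for frt in (False, True):
    kind, phys = translate(0xFF00, mapping, frt)
    print(f"FRT={frt}: $FF00 -> {kind} ${phys:05X}")
```

With FRT off, $FF00 follows the mapping table into physical block 6; with FRT on, it lands in physical block 7 regardless of the table, while addresses below $FE00 are unaffected.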

24Jan2016 Bill wrote: "I have never disabled the MMU on any machine once I turn it on. I will have to enable [the FRT] in order for the RAM at that $ffe0-$ffff to get the initial vectors setup at the point REL kicks in and copies the kernel from $2600 to the top of RAM. Once setup, they very rarely get changed after KRNP2 is finished initializing."

APPENDIX 2 has some FORTH code that I wrote to demonstrate that FRT is behaving as I expect.

03Feb2016 Bill wrote: "Looks good so far. I have already started to create a stripped Level 2 make structure in the Level 2 repo (I based it off your Level 1 makefiles)"

"It’s looking good for the conversion also. I started to convert the routines in the vector page I will probably strip the GrfDrv task routine as We don’t need it (this gives me some room in the page). REL & Boot though gives me a lot of space back to the system space even though it ends up being padded out. Even keeping the Level 2 Boot screen debug system."

14Feb2016 Bill wrote: "I have finally got a mc09 makefile working for Level 2. I copied the coco3 make structure to the mc09, and made the changes needed. It has a lot of extra crap included in building the disk images, but it functions."

Bill: Is there a specific reason why you split clock.asm out to a separate asm for the mc09, vs using conditionals?

Neal: When I was coding the L1 version it seemed to have little in common with the existing code so I thought it was cleaner to have a separate file. That might not be true for the L2 version in which case it might make sense to revisit the L1 decision. Let me know your opinion.

Bill: Splitting clock isn’t a problem, I was just curious.

Bill: I have REL converted, but I need to know if there is any pre-process I need to worry about coming from Camelforth. I am assuming that when Camelforth loads the boot track that it just releases execution to REL, no restrictions.

Neal: Your assumption is correct. Moving from CamelForth to NitrOS-9 is a one-way journey with no requirement to preserve any state.

26Feb2016 Bill wrote: "Update for the Level 2 upgrade - I have been moving along quite well until I discovered something tonight with your MMU. Nothing wrong just unexpected behaviour. I found the Level 2 memory size test in KRN doesn’t complete (just gets stuck in a endless loop). It relies on Ghosting of unused (non-existent) banks which you have stated in the MMU doc as undefined (unknown). Well it does not ghost the banks as expected. I will have to find another way to test memory size on the mc09, or hard code it to a specific size. One way or another I will find a way. I have it hard coded for my prototype at 512k right now for development purposes."

"Here is a quick video of the progress: [not attached]"

"The characters printed at the top left of screen start with the ‘R’ which means boot is in REL, next is K which is the KRN init routine, next the 'REL Boot KRN' is the module validation routine and addition of these modules to the module directory, from that point things go haywire, but I have progress. The vector page has been successfully updated to use the Coco3 style for interrupts (just not locked with FRT yet)"

Here is a memory sizing algorithm that I came up with but have not coded/tested:

  • assumption: MMU is set up with "flat" 1-1 mapping
  • pass 1: using one logical block, select each physical block in turn working down from 31 to 0. Write a unique value into (eg) location 0,1 in the block. For example, write the block number and inverse block number
  • pass 2: using one logical block, select each physical block in turn working up from 0 to 31. Check for the unique value and stop when you fail to find it

The aliasing makes real memory (in low address space) repeat at high addresses. Doing pass1 high->low means that the aliased views are overwritten by the real views so that pass2 sees the correct value.

This is destructive, but that can be solved by optimisation 1: assume that at least 64K is present and mapped into the logical address space, and just test blocks 31..8. The fact that these (unmapped) blocks have been messed with should then not matter.
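The two passes can be tried out against a toy memory model in Python. This is a sketch of the untested algorithm above, not code from the port. The `AliasedRam` class is hypothetical and assumes that block numbers beyond the fitted RAM wrap around modulo the fitted size; real hardware may behave differently, since the MMU description leaves this undefined.

```python
class AliasedRam:
    """Toy RAM: only `fitted` blocks exist; higher block numbers alias onto them.

    The modulo aliasing is an assumption made for this sketch.
    """
    def __init__(self, fitted):
        self.fitted = fitted
        self.cells = {}

    def write(self, block, offset, value):
        self.cells[(block % self.fitted, offset)] = value & 0xFF

    def read(self, block, offset):
        return self.cells.get((block % self.fitted, offset), 0)

def size_memory(ram, max_blocks=32):
    """Return the number of distinct physical blocks found."""
    # Pass 1: work down from the top, so real blocks overwrite aliased views.
    for n in range(max_blocks - 1, -1, -1):
        ram.write(n, 0, n)           # block number
        ram.write(n, 1, ~n & 0xFF)   # inverse block number
    # Pass 2: work up from 0 and stop at the first block without its own tag.
    for n in range(max_blocks):
        if ram.read(n, 0) != n or ram.read(n, 1) != (~n & 0xFF):
            return n
    return max_blocks

for fitted in (8, 16, 32):
    print(fitted, size_memory(AliasedRam(fitted)))
```

Each line of output shows the fitted block count next to the detected count; under the aliasing assumption the two agree.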

(Update: the memory sizing routine that I ended up implementing is much simpler than that; it assumes that there are only 2 legal memory sizes, corresponding to 1 or 2 memory chips fitted, and works out which of the two exist).

10Sep2016 Bill wrote: "I am having trouble packing the MMU task switching from Level 2 to fit into the Multicomp MMU. the original routine just uses a loop writing 8 bytes consecutively into the mmu. Your scheme is a little different, It needs a routine that writes 2 bytes. I am not saying I am defeated (I am a determined sob)"

"The problem is in KRN. The following routines sit at the very top of memory in the Vector page. Changing to the new MMU makes it too large to fit in that area. They are the main switching routines for the task registers and MMU. There are shadow registers in system direct page (variables that start with <D.xxx)" -- APPENDIX 3 has the code in question.

So, in conclusion, the current coding challenge is to rewrite the code in APPENDIX 3 so that it will control the Multicomp MMU and fit in the space available; that is what Bill got stuck on. He suggested previously that it's a problem that the MMU registers cannot be read back, but I'm not sure whether that's really true: it seems to me that when you come to set up a mapping, you should not care what the old mapping was. In any case, I have not looked over that code in detail. I can change the programming interface to the MMU (and maybe even make it read/write) if that would solve the problem, but what I cannot do is expand the address space used by the MMU: it's stuck at 2 bytes.
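To show the shape of the problem: the CoCo3 loop stores 8 block numbers to 8 consecutive registers, whereas the Multicomp MMU needs one (MMUADR, MMUDAT) pair per entry. The Python sketch below computes the 16-bit values involved, based on the register description in APPENDIX 1. The function name, the plain list of block numbers standing in for a DAT image, and the default control bits (ROMDIS and MMUEN set, TR clear) are assumptions for illustration, not code from the port.

```python
ROMDIS, TR, MMUEN = 0x80, 0x40, 0x20   # MMUADR control bits (APPENDIX 1)

def task_load_words(dat_image, task, control=ROMDIS | MMUEN):
    """Return the 16-bit values to store at $FFDE to load one task's 8 entries.

    The 6809 is big-endian, so the high byte lands in MMUADR ($FFDE)
    and the low byte in MMUDAT ($FFDF).
    dat_image: 8 physical block numbers, one per logical block (hypothetical form).
    task:      0 selects MAPSEL 0-7 (used when TR=0), 1 selects MAPSEL 8-15.
    control:   ROMDIS/TR/MMUEN bits to write alongside MAPSEL (assumed shadow value).
    """
    words = []
    for logical, physical in enumerate(dat_image):
        mapsel = logical + 8 * task
        words.append(((control | mapsel) << 8) | (physical & 0x7F))
    return words

print([f"{w:04X}" for w in task_load_words(range(8), task=1)])
```

Every MMUADR write also rewrites ROMDIS, TR and MMUEN, so the caller has to supply the intended values of those bits each time. That is one place where a shadow copy in the direct page would be needed, since the registers cannot be read back.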

I think that Bill has some other stuff coded (not sure how much, as I have not seen it). It seems that he is still interested in getting this working but (as with all of us) he has restricted time to spend on his "hobby".

APPENDIX 1: mc09 MMU description

-- OVERVIEW OF MMU OPERATION
-- =========================
--
-- The 6809 can address 64KBytes of memory directly, through a 16-bit address
-- bus. This will be referred to as the "logical address space". The MMU
-- considers the logical address space as 8, 8KByte blocks. Address bits
-- [15:13] identify a logical block number (0-7, 8 in total).
--
-- Up to 1MByte of RAM is supported, referred to as the "physical
-- address space" and needing 20-bit address bus. Address bits [19:13]
-- identify a physical block number (0-127, 128 in total).
--
-- Within the MMU, 6809 address lines [15:13] are used to index a
-- programmable look-up table. Each entry in the table holds a physical
-- block number, which is driven out as address lines/chip selects to RAM.
--
-- There are 16 entries in the table, arranged in two groups of 8. A register
-- bit "TR" is used to select which group is used. This allows software to
-- switch rapidly between two sets of mappings.
--
-- To program a table entry you first select the table entry using a write
-- to one register then select the physical block number for that
-- entry using a write to another register. These two operations can be
-- combined into a single 16-bit write.
--
-- Each physical block can be write-protected so that it acts like ROM.
--
-- Logical block 7 ($E000-$FFFF) acts differently in three ways:
-- 1. The boot ROM sits in this block, overlaying any RAM that is mapped
--    there. The ROM is enabled after reset but can be disabled by a
--    register write.
-- 2. The multicomp I/O is decoded in this block, in address range
--    $FFD0-$FFDF. The I/O is always present. If you map ROM to this
--    block, accesses to ROM are ignored and I/O is accessed instead.
--    If you map RAM to this block, write accesses go to I/O and to
--    RAM (ie, the RAM locations at $FFD0-$FFDF are corrupted).
-- 3. When the "Fixed RAM Top" (FRT) is enabled, the address range
--    $FE00-FFCF, $FFE0-$FFFF are *always* mapped to physical RAM
--    block 7. This 512byte region includes the "vector page" on the COCO
--    (interrupted here by the I/O space). This special mapping is
--    performed for both reads and writes. Furthermore, when this
--    mapping is enabled, I/O writes will corrupt the associated
--    locations in physical RAM block 7, regardless of what RAM block
--    is mapped into logical block 7.
--
-- At reset, the MMU is disabled (giving a 1-1 mapping) but the
-- mapping registers themselves are NOT reset.
--
-- MMU PROGRAMMING INTERFACE
-- =========================
--
-- The software interface is through 2 write-only registers that
-- occupy unused addresses in the SDCARD address space:
-- $FFDE MMUADR
-- $FFDF MMUDAT
--
-- MMUADR
-- b7       ROMDIS Disable ROM. 0 after reset.
-- b6       TR     Select upper group of mapping registers.
-- b5       MMUEN  Enable MMU. 0 after reset.
-- b4       NMI bit.
-- b3       } MAPSEL Select mapping register to
-- b2       } write through MMUDAT. MAPSEL values 0-7 control
-- b1       } the address translation when TR=0, MAPSEL values
-- b0       } 8-15 control the address translation when TR=1.
--
-- MMUDAT
-- b7       WRPROT When 1 the physical block is read-only
-- b6       } Physical block number associated with the logical
-- b5       } block selected by the current value of MAPSEL.
-- b4       }
-- b3       }
-- b2       }
-- b1       }
-- b0       }
--
-- Magic: for NitrosL2, want a fixed 512byte region of r/w memory
-- at the top of the address space. There is no space to provide
-- an enable for this behaviour (which I call FRT for FixedRamTop)
-- and so some special magic is used, as follows:
--
-- IF ROMDIS=1 & MMUEN=1 then a write with b4=0 (see NMI behaviour
-- below) and b7=0 and b5=1 does NOT enable the ROM but actually
-- sets FRT=1. Any write with MMUEN=0 sets FRT=0 again. In summary:
-- Current           Action        End State
-- -----------------+-------------+-----------------
-- ROMDIS MMUEn FRT  ROMDIS MMUEn  ROMDIS MMUEn FRT
-- x      x     x    RESET         0      0     0
-- x      x     x    0      1      0      1     x
-- x      x     x    1      1      1      1     x
-- x      x     x    x      0      x      0     0
-- 1      1     x    0      1      1      1     1
--
-- If you select a physical block that is outside the actual size
-- of your RAM, the behaviour is undefined (it will probably alias).
--
-- When MMUEN=0, logical block 0-7 selects physical block 0-7.
--
-- You can write MMUDAT, MMUADR as separate 8-bit stores or as a 16-bit
-- store.
--
-- The NMI bit should be set using an 8-bit store. On writes to
-- MMUADR with bit4=1, the state of the other data bits is ignored
-- (they do not change). This avoids the need to know the current
-- state of any of the other bits. The NMI bit is self-clearing and
-- generates an NMI edge after a specific delay. As part of a
-- carefully-controlled code sequence it can be used to interrupt
-- after execution of a single instruction (see SINGLE STEP, below)
--
-- Remember, these two registers are WRITE-ONLY!
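
The ROMDIS/MMUEN/FRT table above can be restated as a small Python model. This is a sketch of the table only, not a translation of the RTL: the class and method names are hypothetical, and TR and MAPSEL are left out because the table does not cover them.

```python
class MmuControl:
    """Toy model of the ROMDIS/MMUEN/FRT bits driven by writes to MMUADR."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.romdis, self.mmuen, self.frt = 0, 0, 0

    def write_mmuadr(self, value):
        """Apply an 8-bit write; return True if it was an NMI request."""
        if value & 0x10:
            return True                    # b4=1: other bits are ignored
        b7 = (value >> 7) & 1
        b5 = (value >> 5) & 1
        if b5 == 0:
            self.romdis, self.mmuen, self.frt = b7, 0, 0   # MMUEN=0 clears FRT
        elif b7 == 1:
            self.romdis, self.mmuen = 1, 1                 # FRT unchanged
        elif self.romdis and self.mmuen:
            self.frt = 1                   # magic write: ROMDIS stays 1
        else:
            self.romdis, self.mmuen = 0, 1                 # FRT unchanged
        return False

    def state(self):
        return (self.romdis, self.mmuen, self.frt)

mmu = MmuControl()
mmu.write_mmuadr(0xA0)   # ROMDIS=1, MMUEN=1
mmu.write_mmuadr(0x20)   # b7=0, b5=1 while ROMDIS=MMUEN=1: sets FRT
print(mmu.state())       # (1, 1, 1)
```

The last two writes mirror the sequence used in APPENDIX 2: the ROM is disabled with the MMU on, then a byte store of $20 turns on FRT without re-enabling the ROM.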

APPENDIX 2: CAMELFORTH TEST SEQUENCE FOR FRT BEHAVIOUR

\ test special MMU magic from FORTH
\ from the slash to end of line is a comment - no need to type it

HEX

\ enable MMU with 1-1 mapping
MMUMAP

\ copy FORTH ROM image to block 6 at address C000-DFFF
E000 C000 2000 MOVE

\ map block 6 to address E000 and block 7 to address C000
\ ! is "store" .. <data> <address> !
2706 FFDE !  \ 7 is the CPU block, 6 is the RAM block
2607 FFDE !  \ so this is block 7 going to C000

\ copy FORTH ROM image to block 7 at address C000-DFFF
E000 C000 2000 MOVE

\ branch through RAM, disable ROM, branch back through
\ reset vector (trust me on this one..)
PIVOTRST

\ should be back at the CAMELFORTH banner, but now
\ we are running from RAM.

\ we lost all state so..
HEX

\ should be identical copies. The 20's are unused space in the ROM
DF00 20 DUMP
FF00 20 DUMP

\ write and see it written: ie the destinations are unique
1234 DF00 !
5678 FF00 !
DF00 20 DUMP
FF00 20 DUMP

\ MMU is enabled and ROM is disabled so we are in the right
\ state to enable magic behaviour
20 FFDE C!    \ C! is char-store -- ie, 8-bit

\ now, the top of memory should show data from page 7
DF00 20 DUMP \ should be as before
FF00 20 DUMP \ should show data from page 7 -- see the 1234 NOT the 5678

\ two bytes straddling the border
ABCD DDFF !  \ write to page 7
DDF0 20 DUMP \ page 7 -- see both bytes cross the border
FDF0 20 DUMP \ page 6 -- see the CD because it's in the top $200 bytes

\ and again
789A FDFF !  \ write to page 6
DDF0 20 DUMP \ page 7 -- see the 9A because it's in the top $200 bytes
FDF0 20 DUMP \ page 6 -- see both bytes cross the border
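
The 16-bit stores to FFDE in the sequence above pack an MMUADR byte and an MMUDAT byte together. A short Python sketch of the decoding follows, using the bit layout from APPENDIX 1; the function name and the dictionary keys are hypothetical.

```python
def decode_mmu_word(word):
    """Split a 16-bit store to $FFDE into MMUADR (high byte) and MMUDAT (low byte) fields."""
    adr = (word >> 8) & 0xFF   # big-endian 6809: high byte goes to $FFDE
    dat = word & 0xFF          # low byte goes to $FFDF
    return {
        "romdis": bool(adr & 0x80),
        "tr": bool(adr & 0x40),
        "mmuen": bool(adr & 0x20),
        "nmi": bool(adr & 0x10),
        "mapsel": adr & 0x0F,
        "wrprot": bool(dat & 0x80),
        "block": dat & 0x7F,
    }

for word in (0x2706, 0x2607):
    f = decode_mmu_word(word)
    print(f"{word:04X}: MAPSEL {f['mapsel']} -> physical block {f['block']}, MMUEN={f['mmuen']}")
```

For 2706 this gives MAPSEL 7 and physical block 6, and for 2607 MAPSEL 6 and physical block 7, both with MMUEN set and ROMDIS clear, matching the comments in the test sequence.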

APPENDIX 3: NitrOS9 extract from level2/modules/kernel/krn.asm

* The following routines must appear no earlier than $E00 when assembled, as
* they have to always be in the vector RAM page ($FE00-$FEFF)

* Default routine for D.SysIRQ
S.SysIRQ
        lda     <D.SSTskN       Get current task's GIME task # (0 or 1)
        beq     FastIRQ         Use super-fast version for system state
        clr     <D.SSTskN       Clear out memory copy (task 0)
        jsr     [>D.SvcIRQ]     (Normally routine in Clock calling D.Poll)
        inc     <D.SSTskN       Save task # for system state
        lda     #1              Task 1
        ora     <D.TINIT        Merge task bit's into Shadow version
        sta     <D.TINIT        Update shadow
        sta     >DAT.Task       Save to GIME as well & return
        bra     DoneIRQ Check for error and exit

FastIRQ jsr     [>D.SvcIRQ]     (Normally routine in Clock calling D.Poll)
DoneIRQ bcc     L0E28   No error on IRQ, exit
        IFNE    H6309
        oim     #IntMasks,0,s   Setup RTI to shut interrupts off again
        ELSE
        lda     ,s
        ora     #IntMasks
        sta     ,s
        ENDC
L0E28   rti

* return from a system call
L0E29   clra                    Force System task # to 0 (non-GRDRV)
L0E2B   ldx     <D.SysPrc       Get system process dsc. ptr
        lbsr    TstImg          check image, and F$SetTsk (PRESERVES A)
        orcc    #IntMasks       Shut interrupts off
        sta     <D.SSTskN       Save task # for system state
        beq     Fst2    If task 0, skip subroutine
        ora     <D.TINIT        Merge task bit's into Shadow version
        sta     <D.TINIT        Update shadow
        sta     >DAT.Task       Save to GIME as well & return
Fst2    leas    ,u              Stack ptr=U & return
        rti

* Switch to new process, X=Process descriptor pointer, U=Stack pointer
L0E4C   equ     *
        IFNE    H6309
        oim     #$01,<D.TINIT   switch GIME shadow to user state
        lda     <D.TINIT
        ELSE
        lda     <D.TINIT
        ora     #$01
        sta     <D.TINIT
        ENDC
        sta     >DAT.Task       save it to GIME
        leas    ,y              point to new stack
        tstb                    is the stack at SWISTACK?
        bne     MyRTI           no, we're doing a system-state rti

        IFNE    H6309
        ldf     #R$Size         E=0 from call to L0E8D before
        ldu     #Where+SWIStack point to the stack
        tfm     u+,y+           move the stack from top of memory to user memory
        ELSE
        ldb     #R$Size
        ldu     #Where+SWIStack point to the stack
RtiLoop lda     ,u+
        sta     ,y+
        decb
        bne     RtiLoop
        ENDC
MyRTI   rti                     return from IRQ


* Execute routine in task 1 pointed to by U
* comes from user requested SWI vectors
L0E5E   equ     *
        IFNE    H6309
        oim     #$01,<D.TINIT   switch GIME shadow to user state
        ldb     <D.TINIT
        ELSE
        ldb     <D.TINIT
        orb     #$01
        stb     <D.TINIT
        ENDC
        stb     >DAT.Task
        jmp     ,u

* Flip to task 1 (used by GRF/WINDInt to switch to GRFDRV) (pointed to
*  by <D.Flip1). All regs are already preserved on stack for the RTI
S.Flip1 ldb     #2              get Task image entry numberx2 for Grfdrv (task 1)
        bsr     L0E8D           copy over the DAT image
        IFNE    H6309
        oim     #$01,<D.TINIT
        lda     <D.TINIT        get copy of GIME Task side
        ELSE
        lda     <D.TINIT
        ora     #$01
        sta     <D.TINIT
        ENDC
        sta     >DAT.Task       save it to GIME register
        inc     <D.SSTskN       increment system state task number
        rti                     return

* Setup MMU in task 1, B=Task # to swap to, shifted left 1 bit
L0E8D   cmpb    <D.Task1N       are we going back to the same task
        beq     L0EA3           without the DAT image changing?
        stb     <D.Task1N       nope, save current task in map type 1
        ldx     #$FFA8          get MMU start register for process's
        ldu     <D.TskIPt       get task image pointer table
        ldu     b,u             get address of DAT image
L0E93   leau    1,u             point to actual MMU block
        IFNE    H6309
        lde     #4              get # banks/2 for task
        ELSE
        lda     #4
        pshs    a
        ENDC
L0E9B   lda     ,u++            get a bank
        ldb     ,u++            and next one
        std     ,x++            Save it to MMU
        IFNE    H6309
        dece                    done?
        ELSE
        dec     ,s
        ENDC
        bne     L0E9B           no, keep going
        IFEQ    H6309
        leas    1,s
        ENDC
L0EA3   rts                     return

* Execute FIRQ vector (called from $FEF4)
FIRQVCT ldx     #D.FIRQ         get DP offset of vector
        bra     L0EB8           go execute it

* Execute IRQ vector (called from $FEF7)
IRQVCT  orcc    #IntMasks       disable IRQ's
        ldx     #D.IRQ  get DP offset of vector

* Execute interrupt vector, B=DP Vector offset
L0EB8   clra                    (faster than CLR >$xxxx)
        sta     >DAT.Task       Force to Task 0 (system state)
        IFNE    H6309
        tfr     0,dp    setup DP
        ELSE
        tfr     a,dp
        ENDC
MapGrf  equ     *
        IFNE    H6309
        aim     #$FE,<D.TINIT   switch GIME shadow to system state
        lda     <D.TINIT        set GIME again just in case timer is used
        ELSE
        lda     <D.TINIT
        anda    #$FE
        sta     <D.TINIT
        ENDC
MapT0   sta     >DAT.Task
        jmp     [,x]            execute it

* Execute SWI3 vector (called from $FEEE)
SWI3VCT orcc    #IntMasks       disable IRQ's
        ldx     #D.SWI3         get DP offset of vector
        bra     SWICall         go execute it

* Execute SWI2 vector (called from $FEF1)
SWI2VCT orcc    #IntMasks       disable IRQ's
        ldx     #D.SWI2         get DP offset of vector

* This routine is called from an SWI, SWI2, or SWI3
* saves 1 cycle on system-system calls
* saves about 200 cycles (calls to I.LDABX and L029E) on grfdrv-system,
*  or user-system calls.
SWICall ldb     [R$PC,s]        get callcode of the system call
* NOTE: Alan DeKok claims that this is BAD.  It crashed Colin McKay's
* CoCo 3.  Instead, we should do a clra/sta >DAT.Task.
*         clr   >DAT.Task       go to map type 1
        clra
        sta     >DAT.Task
* set DP to zero
        IFNE    H6309
        tfr     0,dp
        ELSE
        tfr     a,dp
        ENDC

* These lines add a total of 81 additional cycles to each SWI(2,3) call,
* and 36 bytes+12 for R$Size in the constant page at $FExx
*  It takes no more time for a SWI(2,3) from system state than previously,
* ... and adds 14 cycles to each SWI(2,3) call from grfdrv... not a problem.
* For processes that re-vector SWI, SWI3, it adds 81 cycles.  BUT SWI(3)
* CANNOT be vectored to L0EBF cause the user SWI service routine has been
* changed
        lda     <D.TINIT        get map type flag
        bita    #$01            check it without changing it

* Change to LBEQ R.SysSvc to avoid JMP [,X]
* and add R.SysSvc STA >DAT.Task ???
        beq     MapT0           in map 0: restore hardware and do system service
        tst     <D.SSTskN       get system state 0,1
        bne     MapGrf          if in grfdrv, go to map 0 and do system service

* the preceding few lines are necessary, as all SWI's still pass thru
* here before being vectored to the system service routine... which
* doesn't copy the stack from user state.
        sta     >DAT.Task       go to map type X again to get user's stack
* a byte less, a cycle more than ldy #$FEED-R$Size, or ldy #$F000+SWIStack
        leay    <SWIStack,pc    where to put the register stack: to $FEDF
        tfr     s,u             get a copy of where the stack is
        IFNE    H6309
        ldw     #R$Size         get the size of the stack
        tfm     u+,y+           move the stack to the top of memory
        ELSE
        pshs    b
        ldb     #R$Size
Looper  lda     ,u+
        sta     ,y+
        decb
        bne     Looper
        puls    b
        ENDC
        bra     L0EB8           and go from map type 1 to map type 0

* Execute SWI vector (called from $FEFA)
SWIVCT  ldx     #D.SWI          get DP offset of vector
        bra     SWICall         go execute it

* Execute NMI vector (called from $FEFD)
NMIVCT  ldx     #D.NMI          get DP offset of vector
        bra     L0EB8           go execute it

* The end of the kernel module is here
        emod
eom     equ     *

* What follows after the kernel module is the register stack, starting
* at $FEDD (6309) or $FEDF (6809).  This register stack area is used by
* the kernel to save the caller's registers in the $FEXX area of memory
* because it doesn't get "switched out" no matter the contents of the
* MMU registers.
SWIStack
        fcc     /REGISTER STACK/        same # bytes as R$Size for 6809
        IFNE    H6309
        fcc     /63/    if 6309, add two more spaces
        ENDC

        fcb     $55     D.ErrRst

* This list of addresses ends up at $FEEE after the kernel track is loaded
* into memory.  All interrupts come through the 6809 vectors at $FFF0-$FFFE
* and get directed to here.  From here, the BRA takes CPU control to the
* various handlers in the kernel.
        bra     SWI3VCT SWI3 vector comes here
        nop
        bra     SWI2VCT SWI2 vector comes here
        nop
        bra     FIRQVCT FIRQ vector comes here
        nop
        bra     IRQVCT  IRQ vector comes here
        nop
        bra     SWIVCT  SWI vector comes here
        nop
        bra     NMIVCT  NMI vector comes here
        nop