OMU - rosco-pc/propeller-wiki GitHub Wiki

One man Unix on the pPropQL and pPropQL020

Note: The original source of OMU can be found here. Thanks, Steve! Note: This is a work in progress, so the code may change without notice.

OMU was developed (it seems) on some kind of Codata board, using a version of Unix with the tools and libc available at the time. Porting it means switching to another compiler, different hardware, a new libc, and a new executable format and filesystem.

A discussion about this project can be found on the Parallax boards here

Tasks

Getting a usable compiler

  • Grab binutils (2.17 or 2.18) from gnu.org

  • Grab gcc-4.1.2+ from gcc.gnu.org. The core package is enough. Newer versions need, and will compile, some libraries that are not needed here.

To compile you will need the usual tools: make (GNU make), an installed compiler (GNU), autotools, flex, yacc or bison, and so on.

Let's say that all happens in a prg directory:

$ cd prg

$ bzip2 -d < ~/download/binutils-2.17.tar.bz2 | tar x

$ bzip2 -d < ~/download/gcc-4.1.2.tar.bz2 | tar x

$ mkdir gcc-m68k

$ cd gcc-m68k

$ mkdir binutils

$ cd binutils

$ ../../binutils-2.17/configure --target=m68k-unknown-elf

$ make [-j 3]

$ sudo make install

binutils is ready if there were no errors. Otherwise fix the errors and redo the build. After installing a missing program or library, it sometimes helps to remove everything and relaunch configure.

$ cd ..

$ mkdir gcc

$ cd gcc

$ ../../gcc-4.1.2/configure --target=m68k-unknown-elf --disable-libssp

$ make [-j 3]

$ sudo make install
  • Getting the source of OMU to at least compile with gcc:

Grab it here

Well, it compiles... and gives loads of warnings :-( Do not worry, we will take care of them :-). I have now ported all assembler code to gcc syntax and added some missing headers from Linux (!) (they are also GPL v2). To test it I'd rather use a simulator; I love simulators :-). The mixed headers need to be sorted out. A minimal set of headers would be a better option: the Linux ones are just messy, and there are too many of them. Maybe an older 1.x version has better headers.

I'm now at the point where I have a 68k simulator that needs a usable front-end. I started to write a simulator in Python from scratch, but I would rather use something already done, to speed up development. Porting the code now means getting the kernel to run to the point where it wants to execute init. For that, some libc-related routines have to be written, and a usable filesystem and a binary executable format have to be defined and incorporated. I think the best choice would be the ELF format. It supports everything we need and want, although it may contain too many features, and we can cross compile to it from GNU/Linux, *BSD, Solaris or Mac OS X.
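As a rough illustration of what a loader for such executables has to check first, the sketch below parses the fixed 52-byte ELF32 header in big-endian byte order, as the 68k uses. The function names and the demo header are hypothetical and not part of OMU; the field layout and the machine number 4 for the 68k family come from the ELF specification.

```python
import struct

# ELF32 header, big-endian: e_ident[16], e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx.
ELF32_HEADER = struct.Struct(">16sHHIIIIIHHHHHH")
EM_68K = 4    # machine number of the Motorola 68000 family
ET_EXEC = 2   # executable file


def parse_elf32_header(data):
    """Return (entry, phoff, phnum) of a big-endian m68k ELF32 executable."""
    if len(data) < ELF32_HEADER.size:
        raise ValueError("file too short for an ELF header")
    (ident, e_type, e_machine, _version, e_entry, e_phoff, _shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, _shnum,
     _shstrndx) = ELF32_HEADER.unpack_from(data)
    if ident[:4] != b"\x7fELF":
        raise ValueError("bad ELF magic")
    # ident[4] is EI_CLASS (1 = 32 bit), ident[5] is EI_DATA (2 = big-endian)
    if ident[4] != 1 or ident[5] != 2:
        raise ValueError("not a 32-bit big-endian ELF file")
    if e_type != ET_EXEC or e_machine != EM_68K:
        raise ValueError("not an m68k executable")
    return e_entry, e_phoff, e_phnum


def make_demo_header(entry=0x1000):
    """Build a minimal header with made-up values, for demonstration only."""
    ident = b"\x7fELF" + bytes([1, 2, 1]) + bytes(9)
    return ELF32_HEADER.pack(ident, ET_EXEC, EM_68K, 1, entry,
                             52, 0, 0, 52, 32, 1, 0, 0, 0)


if __name__ == "__main__":
    print(parse_elf32_header(make_demo_header()))
```

A kernel-side loader would do the same checks in C and then walk the program headers starting at the returned offset.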

Filesystem

pPropQL has a bootloader that has to load the kernel image from the SD card (the serial port could also be used). Filesystem recognition and reading are built into this minimal bootloader. For QL a FAT fs could be used, but for OMU a Un*x fs would be more appropriate. There are several options, all of them more or less complicated:

  • minix
  • ext2
  • ufs (there are several of these, which one ?)

A possibility would be to have a filesystem contained in a file that resides on a FAT formatted SD card. A couple of programs could be written to manipulate this image. But that does not help with selecting a filesystem type. To keep it simple I'd choose minix. I think it is the simplest of the bunch, especially considering that ufs is a name for several similar but different implementations. The Minix sources for the filesystem are some 200 kbytes. A read-only mounter does not really need much more than reading the superblock, the inode table and the root directory.

After some thought and digging through the sources, I decided that a minix filesystem, either in an image or in a partition, will be the way to go. A V1 filesystem can be as large as 64 MB, more than enough to host the entire system.

The geometry of the image is as follows:

Area            Size / notes
boot block      1 block
superblock      1 block
inode bitmap    1 to 8 blocks, 1 bit per inode, 65536 bits max, 8 kbytes max
zone bitmap     same layout as the inode bitmap, 1 bit per zone
inode table     each inode occupies 32 bytes
data blocks     the rest of the blocks

The function of the zone in a V1 fs is not yet clear to me. I'll have to look closer at the mkfs utility. The filesystem is divided into blocks, with the first block reserved for a boot block, I believe. Each block is 1 kbyte; V2 can have larger blocks and 32-bit pointers to blocks.

The filesystem described in the headers of OMU closely resembles this, so minimal changes should be needed.
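To make the geometry concrete, here is a small sketch that reads a Minix V1 superblock from an image and derives where each area starts. The field order and the magic number 0x137F follow the Minix V1 on-disk format as found in the Linux headers. In that format a zone is a group of 2^s_log_zone_size blocks, so with the usual value of 0 a zone is simply one block. The little-endian byte order, the function names and the demo values are assumptions for illustration; an image written natively on a 68k would need the big-endian format string instead.

```python
import struct

BLOCK_SIZE = 1024
INODE_SIZE = 32
MINIX_V1_MAGIC = 0x137F  # V1 filesystem with 14-character file names
# Fields: s_ninodes, s_nzones, s_imap_blocks, s_zmap_blocks,
# s_firstdatazone, s_log_zone_size, s_max_size (32 bit), s_magic.
# ASSUMPTION: little-endian image, as produced by mkfs.minix on a PC.
SUPERBLOCK = struct.Struct("<HHHHHHIH")


def minix_v1_layout(image):
    """Return the first block number of each area of a Minix V1 image."""
    (ninodes, nzones, imap_blocks, zmap_blocks, first_data_zone,
     log_zone_size, _max_size, magic) = SUPERBLOCK.unpack_from(image, BLOCK_SIZE)
    if magic != MINIX_V1_MAGIC:
        raise ValueError("not a Minix V1 filesystem (magic %#06x)" % magic)
    inode_table = 2 + imap_blocks + zmap_blocks   # after boot + super + bitmaps
    inode_blocks = (ninodes * INODE_SIZE + BLOCK_SIZE - 1) // BLOCK_SIZE
    return {
        "inode_bitmap": 2,
        "zone_bitmap": 2 + imap_blocks,
        "inode_table": inode_table,
        "first_data_zone": first_data_zone,            # as stored on disk
        "computed_first_data": inode_table + inode_blocks,
        "blocks_per_zone": 1 << log_zone_size,
        "total_zones": nzones,
    }


def make_demo_image():
    """Build a tiny image with a made-up superblock, for demonstration only."""
    image = bytearray(8 * BLOCK_SIZE)
    SUPERBLOCK.pack_into(image, BLOCK_SIZE,
                         64, 360, 1, 1, 6, 0, 268966912, MINIX_V1_MAGIC)
    return bytes(image)


if __name__ == "__main__":
    for name, value in minix_v1_layout(make_demo_image()).items():
        print(name, value)
```

With 64 inodes of 32 bytes each the inode table takes 2 blocks, so the computed first data block matches the value stored in the demo superblock.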

Drivers

The most important part for propeller users is probably the interface between the propeller and a third processor. The hardware interface, level shifting and so on have been described in the pPropQL and pPropQL020 pages. Basically the propellers act as memory mapped devices, with an address space, a data bus and read/write strobe signals that are asserted when a device needs to be read or written.

On the propeller side, and depending on the number of peripherals, one or more COGs can listen to these strobe signals and act accordingly.

Let's see some examples.

ROM emulator

The propeller can act as a ROM. For this it only needs to serve a byte of data for every address. Using the multiplexed BUS employed in the mentioned boards, a simple routine that listens to the state of a read strobe can be used:


DAT
                        org     $0

ROMEMU                  mov     OUTA, #0
                        mov     DIRA, c0_c_DIRA

c0_romemu               waitpne c0_c_PROMCS, c0_c_PROMCS ' waits for CS to be asserted
                        mov     c0_v_addr, INA          '@ 2   gets low part of address
                        shr     c0_v_addr, #16          '@ 6
                        and     c0_v_addr, #255         '@10
                        add     c0_v_addr, PAR          '@14 adds ROM offset
                        mov     c0_v_addrh, INA         '@18 now it is safe to get high addr
                        shr     c0_v_addrh, #8          '@22
                        and     c0_v_addrh, c0_c_MSKADDRH
                        add     c0_v_addr, c0_v_addrh   '@30
                        rdbyte  OUTA, c0_v_addr    '@34
                                                        '@56 (max)
                        or      DIRA, c0_c_DATAOUT      '@60
                        waitpeq c0_c_PROMCS, c0_c_PROMCS'@64
                        andn    DIRA, c0_c_DATAOUT
                        jmp     #c0_romemu


c0_c_DIRA               long    0
c0_c_MSKADDRH           long    $00007f00
c0_c_PROMCS             long    1<<25        ' NROMCS, ROM read strobe, active low
c0_c_DATAOUT            long    $ff
c0_v_addr               long    0
c0_v_addrh              long    0
c0_v_data               long    0

The address is sent low byte first and high byte next, 2 M68K cycles apart. The program waits for the strobe signal to go high before disabling the output. OUTA can be used as the destination register because no other output pin is being used by this COG.
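The address arithmetic of the ROM emulator can be modelled in a few lines. This is only a sketch of the shifts and masks in the listing, assuming the multiplexed address byte appears on pins P16 to P23 (which is what the shift by 16 suggests); the function and parameter names are hypothetical.

```python
MSKADDRH = 0x00007F00  # same value as c0_c_MSKADDRH in the listing


def rom_hub_address(ina_first, ina_second, par):
    """Model the HUB address the ROM emulator reads from.

    ina_first  -- INA sampled right after the strobe (low address byte)
    ina_second -- INA sampled 16 clocks later (high address byte)
    par        -- HUB address of the ROM image (the PAR register)
    """
    low = (ina_first >> 16) & 0xFF          # shr #16, and #255
    high = (ina_second >> 8) & MSKADDRH     # shr #8, and c0_c_MSKADDRH
    return par + low + high


if __name__ == "__main__":
    # Low byte $34 then high byte $12 on P16..P23, ROM image at HUB $0100
    print(hex(rom_hub_address(0x34 << 16, 0x12 << 16, 0x0100)))
```

The mask keeps 7 bits of the high byte, so this routine can address at most 32 kbytes of emulated ROM.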

Video Memory

Using a propeller as a video generator is one of the easiest functions to implement. The propeller has special circuitry designed to generate video signals, freeing the COGs from this time-consuming task.

The following example shows how the propeller can act as a memory mapped buffer (only write is shown). The buffer is limited to 4 kbytes because only text is implemented.


DAT
                        org     $0

VIDEOCOG                mov     DIRA, #0

c2_videoemu             waitpne c2_c_VIDEOW, c2_c_VIDEOW ' waits for NVIDEOW to be asserted

                        mov     c2_v_addr, INA          '@ 2   gets low part of address
                        shr     c2_v_addr, #16          '@ 6
                        and     c2_v_addr, #255         '@10
                        add     c2_v_addr, PAR          '@14   adds video buffer offset

                        mov     c2_v_addrh, INA         '@18   now it is safe to get high addr
                        mov     c2_v_data, c2_v_addrh
                        shr     c2_v_addrh, #8          '@26
                        and     c2_v_addrh, c2_c_MSKADDR
                        add     c2_v_addr, c2_v_addrh   '@34
                        wrbyte  c2_v_data, c2_v_addr    '@38

                        waitpeq c2_c_VIDEOW, c2_c_VIDEOW
                        jmp     #c2_videoemu


c2_c_VIDEOW             long    1<<26       ' NVIDEOW strobe input, active low
c2_v_addr               long    0
c2_v_addrh              long    0
c2_v_data               long    0
c2_c_MSKADDR            long    $00000f00   ' only 4 kbytes !!!!

An extra COG generates the corresponding video signal and produces the image according to the data in this buffer (pointed to by PAR). Graphic memory can also be used in the same manner, but a bigger buffer would probably be needed.
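The write path can be modelled the same way. The sketch below assumes that the data byte is present on P0 to P7 during the second sample of INA, which is what the wrbyte of the unshifted copy implies (wrbyte stores only the low 8 bits). The names and the HUB size constant are illustrative.

```python
MSKADDR = 0x00000F00  # same value as c2_c_MSKADDR: 4 kbytes of buffer


def video_write(hub, ina_first, ina_second, par):
    """Model one bus write into the text buffer held in HUB RAM.

    hub is a bytearray standing in for HUB RAM, par the buffer start.
    Returns the HUB address that was written.
    """
    low = (ina_first >> 16) & 0xFF         # low address byte on P16..P23
    high = (ina_second >> 8) & MSKADDR     # 4 more address bits
    addr = par + low + high
    hub[addr] = ina_second & 0xFF          # wrbyte stores the low byte only
    return addr


if __name__ == "__main__":
    hub = bytearray(32 * 1024)
    # Write 'A' ($41) at buffer offset $123, buffer placed at HUB $4000
    where = video_write(hub, 0x23 << 16, (0x01 << 16) | 0x41, 0x4000)
    print(hex(where), chr(hub[where]))
```

Offsets above $FFF wrap inside the 4 kbyte window because of the mask, so a larger graphics buffer would need a wider mask.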

Other IO

A keyboard, serial interface, RTC, SD/MMC card reader, timers and so on can be connected in a manner similar to the one described above. One possible method is for every peripheral to listen to the read or write strobes directly, but that can cause problems and missed reads/writes. A better method is to dedicate 2 COGs to the bus interface, one listening to reads and a second one to writes, and leave the rest free to handle the devices mentioned. As 2 or 3 HUB accesses need to be done, a cycle time of about 1 microsecond is needed. This can be wasteful in some cases but keeps the circuit design simple. More performance can be obtained using an AVR32, ARM or ColdFire processor instead.
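The figure of about 1 microsecond can be checked with a back-of-the-envelope calculation. On the Propeller a normal instruction takes 4 clocks and a HUB instruction between 8 and 23 clocks, depending on how long the COG waits for its HUB window. The 80 MHz system clock and the instruction counts below are assumptions chosen for illustration, not measurements of the actual code.

```python
PLAIN_CLOCKS = 4                 # clocks per ordinary instruction
HUB_BEST, HUB_WORST = 8, 23      # clocks per HUB instruction (rdbyte, wrlong...)


def cycle_time_ns(hub_accesses, plain_instructions, clock_hz=80_000_000):
    """Return (best, worst) handler time in nanoseconds.

    clock_hz defaults to 80 MHz, an assumed system clock.
    """
    best = hub_accesses * HUB_BEST + plain_instructions * PLAIN_CLOCKS
    worst = hub_accesses * HUB_WORST + plain_instructions * PLAIN_CLOCKS
    return (best * 1_000_000_000 / clock_hz,
            worst * 1_000_000_000 / clock_hz)


if __name__ == "__main__":
    # Hypothetical handler: 3 HUB accesses plus 6 ordinary instructions
    print(cycle_time_ns(3, 6))
```

Under these assumptions the handler takes between 600 ns and roughly 1160 ns, which is in line with the 1 microsecond estimate.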

A simple listener that interfaces with other COGs for the different tasks is shown below. Be aware of the fact that the normal FullDuplexSerial object has been modified to use two pointers instead of three (buffer and pointers inside the buffer). The keyboard object can be used as it is and the SD/MMC code is the newest mb_spi by Rokicki/Lonesock. Other peripherals have not yet been implemented. Buffer pointers are written during initialization. This method works because the peripherals have a shared memory interface.


DAT
                        org     $0

' This COG accepts reads from the processor

IORCOG                  mov     c0_p_rxbuffend, c0_p_rxbuff
                        add     c0_p_rxbuffend, #RXBUFFLEN  ' end pointer = buffer start + length

c0_iorloop              waitpne c0_c_IOR, c0_c_IOR      ' waits for IOR to be asserted
                        mov     c0_v_addr, INA          ' @ 2 cycles after IOR is asserted
                        shr     c0_v_addr, #16-4        ' @ 6
                        and     c0_v_addr, #$1f0  wz    '@ 10
            if_nz       jmp     c0_v_addr

                        mov     OUTA, c0_v_ready        ' command 0 reads status of engine A5 == ready
                        or      DIRA, c0_c_DATAOUT      ' activates outputs
                        waitpeq c0_c_IOR, c0_c_IOR
                        andn    DIRA, c0_c_DATAOUT
                        jmp     #c0_iorloop

c0_v_ready              long    0

                        long    0[((($+15)>>4)<<4)-$]

{ This must be at address 01
  Receive status read
}
                        ' Port 1
c0_serrx_01             rdlong  c0_p_ptr2, c0_p_rxtail
                        rdbyte  OUTA, c0_p_ptr2             ' reads last byte received
                        or      DIRA, c0_c_DATAOUT          ' activates outputs
                        waitpeq c0_c_IOR, c0_c_IOR
                        andn    DIRA, c0_c_DATAOUT
                        cmp     c0_p_ptr2, c0_p_rxbuffend   wz
            if_z        mov     c0_p_ptr2, c0_p_rxbuff      ' moves tail to beginning of buffer
            if_nz       add     c0_p_ptr2, #1
                        wrlong  c0_p_ptr2, c0_p_rxtail      ' saves new tail of buffer
                        jmp     #c0_iorloop

c0_p_rxhead             long    0
c0_p_rxtail             long    0
c0_p_rxbuff             long    0
c0_p_rxbuffend          long    0

                        long    0[((($+15)>>4)<<4)-$]

{ returns $0f if there are no bytes to read
}

c0_serstatus_02         rdlong  c0_p_ptr2, c0_p_rxtail
                        rdlong  c0_p_ptr1, c0_p_rxhead
                        cmp     c0_p_ptr1, c0_p_ptr2   wz
                        muxz    OUTA, #$0f
                        or      DIRA, c0_c_DATAOUT
                        waitpeq c0_c_IOR, c0_c_IOR
                        andn    DIRA, c0_c_DATAOUT
                        jmp     #c0_iorloop

c0_p_kbdhead            long    0
c0_p_kbdtail            long    0
c0_p_kbdbuff            long    0

                        long    0[((($+15)>>4)<<4)-$]

c0_kbdrx_03             rdlong  c0_p_ptr2, c0_p_kbdtail
                        rdlong  c0_p_ptr1, c0_p_kbdhead

                        add     c0_p_ptr1, c0_p_kbdbuff
                        rdbyte  OUTA, c0_p_ptr1
                        or      DIRA, c0_c_DATAOUT          ' activates outputs
                        waitpeq c0_c_IOR, c0_c_IOR
                        andn    DIRA, c0_c_DATAOUT
                        sub     c0_p_ptr1, c0_p_kbdbuff
                        add     c0_p_ptr1, #1
                        and     c0_p_ptr1, #15
                        wrlong  c0_p_ptr1, c0_p_kbdtail
                        jmp     #c0_iorloop

' Returns 00 if there are chars to read
' Returns 0F if there are no chars to read

                        long    0[((($+15)>>4)<<4)-$]

c0_kbdstatus_04         rdlong  c0_p_ptr2, c0_p_kbdtail
                        rdlong  c0_p_ptr1, c0_p_kbdhead
                        cmp     c0_p_ptr2, c0_p_ptr1   wz   ' are they ==
                        muxz    OUTA, #$0f
                        or      DIRA, c0_c_DATAOUT          ' activates outputs
                        waitpeq c0_c_IOR, c0_c_IOR
                        andn    DIRA, c0_c_DATAOUT
                        jmp     #c0_iorloop

c0_p_sdcmd              long    0
c0_p_sdblk              long    0
c0_p_sdptr              long    0
' data pointer
c0_v_dptr               long    0
c0_v_cnt                long    0
c0_c_512                long    512

                        long    0[((($+15)>>4)<<4)-$]

                        ' dummy
                        waitpeq c0_c_IOR, c0_c_IOR
                        andn    DIRA, c0_c_DATAOUT
                        jmp     #c0_iorloop

                        long    0[((($+15)>>4)<<4)-$]
' reads the error
' resets the data pointer to the beginning of the buffer
'
c0_readcmd_06           rdlong  c0_v_data, c0_p_sdcmd       ' reads error
                        mov     c0_v_dptr, c0_p_sdptr       ' resets pointer
                        mov     c0_v_cnt, #0                ' counter
                        shr     c0_v_data, #24
                        mov     OUTA, c0_v_data             ' presents MSByte
                        or      DIRA, c0_c_DATAOUT          ' activates outputs
                        waitpeq c0_c_IOR, c0_c_IOR
                        andn    DIRA, c0_c_DATAOUT
                        jmp     #c0_iorloop

                        long    0[((($+15)>>4)<<4)-$]
' Reads the sector data
' after 512 reads the pointer is reset

c0_readdta_07           rdbyte  OUTA, c0_v_dptr             ' reads next data byte
                        or      DIRA, c0_c_DATAOUT          ' activates outputs
                        waitpeq c0_c_IOR, c0_c_IOR
                        andn    DIRA, c0_c_DATAOUT
                        add     c0_v_dptr, #1               ' advances data pointer
                        add     c0_v_cnt, #1
                        cmp     c0_v_cnt, c0_c_512   wz
            if_z        mov     c0_v_cnt, #0
            if_z        mov     c0_v_dptr, c0_p_sdptr       ' resets pointer
                        jmp     #c0_iorloop

                        long    0[((($+15)>>4)<<4)-$]

' Allows reading the last written block number, one byte per port
c0_readblk_08           rdlong  c0_v_data, c0_p_sdblk       ' reads block number
                        shr     c0_v_data, #24
                        mov     OUTA, c0_v_data             ' presents MSByte
                        or      DIRA, c0_c_DATAOUT          ' activates outputs
                        waitpeq c0_c_IOR, c0_c_IOR
                        andn    DIRA, c0_c_DATAOUT
                        jmp     #c0_iorloop

                        long    0[((($+15)>>4)<<4)-$]

c0_readblk_09           rdlong  c0_v_data, c0_p_sdblk       ' reads block number
                        shr     c0_v_data, #16
                        mov     OUTA, c0_v_data             ' presents bits 23..16
                        or      DIRA, c0_c_DATAOUT          ' activates outputs
                        waitpeq c0_c_IOR, c0_c_IOR
                        andn    DIRA, c0_c_DATAOUT
                        jmp     #c0_iorloop

                        long    0[((($+15)>>4)<<4)-$]

c0_readblk_0A           rdlong  c0_v_data, c0_p_sdblk       ' reads block number
                        shr     c0_v_data, #8
                        mov     OUTA, c0_v_data             ' presents bits 15..8
                        or      DIRA, c0_c_DATAOUT          ' activates outputs
                        waitpeq c0_c_IOR, c0_c_IOR
                        andn    DIRA, c0_c_DATAOUT
                        jmp     #c0_iorloop

                        long    0[((($+15)>>4)<<4)-$]       ' == .align 16

c0_readblk_0B           rdlong  c0_v_data, c0_p_sdblk       ' reads block number
                        mov     OUTA, c0_v_data             ' presents LSByte
                        or      DIRA, c0_c_DATAOUT          ' activates outputs
                        waitpeq c0_c_IOR, c0_c_IOR
                        andn    DIRA, c0_c_DATAOUT
                        jmp     #c0_iorloop




c0_c_DIRA               long    0
c0_c_MSKADDRH           long    $00ff0000
c0_c_IOR                long    1<<25
c0_c_DATAOUT            long    $ff

c0_v_addr               long    0
c0_v_addrh              long    0
c0_v_data               long    0
c0_p_ptr1               long    0
c0_p_ptr2               long    0

                        jmp     #c0_iorloop
                        fit     $1f0
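The dispatch used by the listener relies on every handler starting at a COG address that is a multiple of 16 longs, so the port number taken from the bus can be turned into a jump target with one shift and one mask. The sketch below models that calculation and the padding expression used between handlers. It assumes the port number arrives on P16 to P20; the function names are hypothetical.

```python
def handler_address(ina):
    """COG address jumped to for a given INA value (shr #16-4, and #$1f0)."""
    return (ina >> 12) & 0x1F0


def padding_longs(here):
    """Number of filler longs emitted by long 0[((($+15)>>4)<<4)-$]."""
    return (((here + 15) >> 4) << 4) - here


if __name__ == "__main__":
    for port in (0x00, 0x01, 0x07, 0x0B):
        print("port %02X -> COG address $%03X" % (port, handler_address(port << 16)))
    # A handler ending at COG address $13 needs 13 filler longs to reach $20
    print(padding_longs(0x13))
```

Port 0 yields address 0, which is why the listing tests the Z flag and falls through to the status read instead of jumping.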

More to come!