OMU - rosco-pc/propeller-wiki GitHub Wiki
One man Unix on the pPropQL and pPropQL020
Note: The original source of OMU can be found here. Thanks, Steve!
Note: This is a work in progress, so the code may change without notice.
OMU was apparently developed on some kind of Codata board, using a version of Unix with the tools and libc available at the time. Porting it means dealing with a different compiler, different hardware, a new libc, a new executable format and a new filesystem.
A discussion about this project can be found on the Parallax boards here
Tasks
Getting a usable compiler
- Grab binutils (2.17 or 2.18) from gnu.org
- Grab gcc-4.1.2+ from gcc.gnu.org. The core package is enough. Newer versions need and will compile some libraries that are certainly not needed.
To compile you will need the usual tools: make (GNU make), an installed compiler (GNU), autotools, flex, yacc or bison, and so on.
Let's say that everything happens in a prg directory:
$ cd prg
$ bzip2 -d < ~/download/binutils-2.17.tar.bz2 | tar x
$ bzip2 -d < ~/download/gcc-4.1.2.tar.bz2 | tar x
$ mkdir gcc-m68k
$ cd gcc-m68k
$ mkdir binutils
$ cd binutils
$ ../../binutils-2.17/configure --target=m68k-unknown-elf
$ make          # or "make -j 3" for a parallel build
$ sudo make install
binutils is now ready, provided there were no errors. If there were, fix them and redo the build. Removing everything and rerunning configure sometimes helps, for example after installing a missing program or library.
$ cd ..
$ mkdir gcc
$ cd gcc
$ ../../gcc-4.1.2/configure --target=m68k-unknown-elf --disable-libssp
$ make          # or "make -j 3" for a parallel build
$ sudo make install
- Getting the source of OMU to at least compile with gcc:
Grab it here
Well, it compiles... and gives loads of warnings :-( Do not worry, we will take care of them :-). So far I have ported all the assembler code to gcc syntax and added some missing headers from Linux (!) (they are also GPL v2). To test the result I'd rather use a simulator; I love simulators :-). The mixed headers need to be sorted out. A minimal set of headers would be a better option: the Linux ones are just messy, and there are too many of them. Maybe an older 1.x version has better headers.
I'm now at the point where I have a 68k simulator that needs a usable front-end. I started to write a simulator in Python from scratch, but I would rather use something that already exists, to speed up development. Porting the code now means getting the kernel to run to the point where it wants to execute init. For that, some libc-related routines have to be written, and a usable filesystem and a binary executable format have to be defined and incorporated. I think the best choice would be the ELF format. It supports everything we need and want, although it may contain too many features, and we can cross-compile to it from GNU/Linux, *BSD, Solaris or Mac OS X.
Filesystem
pPropQL has a bootloader that has to load the kernel image from the SD card (the serial port could also be used). Filesystem recognition and reading are built into this minimal bootloader. For QL a FAT filesystem could be used, but for OMU a Un*x filesystem would be more appropriate. There are several options, all of them more or less complicated:
- minix
- ext2
- ufs (there are several of these, which one ?)
One possibility would be to keep the filesystem in a file that resides on a FAT-formatted SD card. A couple of programs could be written to manipulate this image. But that does not help with selecting a filesystem type. To keep it simple I'd choose minix. I think it is the simplest of the bunch, especially considering that ufs is a name for several similar but different implementations. The Minix sources for the filesystem are some 200 kbytes. A read-only mounter does not really need much more than reading the superblock, the inode table and the root directory.
After some thought and digging through the sources, I decided that a minix filesystem, either in an image or in a partition, is the way to go. A V1 filesystem can be as large as 64 MB, more than enough to host the entire system.
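As a rough illustration of how little a read-only mounter needs, the sketch below decodes a Minix V1 superblock from an image. The field layout is the one declared in the Linux `minix_fs.h` header, which is an assumption here: the OMU headers may name or order things differently. The function name `parse_superblock` and the `byte_order` parameter are made up for this example. An image written on a PC is little-endian (`"<"`), while one written natively on a 68k would be big-endian (`">"`).

```python
import struct

# First nine fields of the on-disk Minix V1 superblock, following the
# Linux minix_fs.h declaration (assumption: OMU's own headers may differ).
SUPERBLOCK_FORMAT = "HHHHHHIHH"   # 6 x u16, 1 x u32, 2 x u16 = 20 bytes
SUPERBLOCK_FIELDS = (
    "ninodes", "nzones", "imap_blocks", "zmap_blocks",
    "firstdatazone", "log_zone_size", "max_size", "magic", "state",
)
BLOCK_SIZE = 1024                 # V1 block size in bytes
MINIX_V1_MAGIC = 0x137F           # V1 with 14-character file names


def parse_superblock(image, byte_order="<"):
    """Decode the superblock, which lives in block 1 right after the boot block."""
    fmt = byte_order + SUPERBLOCK_FORMAT
    raw = image[BLOCK_SIZE:BLOCK_SIZE + struct.calcsize(fmt)]
    return dict(zip(SUPERBLOCK_FIELDS, struct.unpack(fmt, raw)))
```

A mounter would typically refuse the image unless the decoded `magic` matches `MINIX_V1_MAGIC`, and only then go on to the inode table.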
The geometry of the image is as follows:
Region | Size |
---|---|
boot block | 1 block |
superblock | 1 block |
bitmap | 1 to 8 blocks, 1 bit per block, 65536 bits max, 8 kbytes max |
zone map | same as bitmap ? |
inode table | each inode occupies 32 bytes |
free blocks | the rest of the blocks |
The function of the zone map in a V1 filesystem is not yet clear to me; I'll have to look closer into the mkfs utility. The filesystem is divided into blocks, with the first block reserved for a boot block, I believe. Each block is 1 kbyte. V2 can have larger blocks and 32-bit pointers to blocks.
The filesystem described in the headers of OMU closely resembles this, so minimal changes should be needed.
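The numbers in the geometry table can be checked with a little arithmetic. The sketch below is only that, a sketch: it assumes that the first bitmap tracks inodes and the second tracks zones (as in Minix), that one zone equals one block, and it ignores the reserved bits a real mkfs sets aside. The names `blocks_needed` and `layout` are invented for the example.

```python
BLOCK_SIZE = 1024          # bytes per block in a V1 filesystem
INODE_SIZE = 32            # bytes per on-disk inode
MAX_BLOCKS = 1 << 16       # block numbers are 16 bits wide


def blocks_needed(count, per_block):
    """Round up: number of blocks that hold `count` items, `per_block` each."""
    return (count + per_block - 1) // per_block


def layout(n_inodes, n_blocks):
    """Size in blocks of every region that precedes the data blocks."""
    bits_per_block = BLOCK_SIZE * 8
    return {
        "boot": 1,
        "superblock": 1,
        "inode_bitmap": blocks_needed(n_inodes, bits_per_block),
        "zone_bitmap": blocks_needed(n_blocks, bits_per_block),
        "inode_table": blocks_needed(n_inodes, BLOCK_SIZE // INODE_SIZE),
    }


# Largest V1 filesystem: 65536 blocks of 1 kbyte each, i.e. 64 MB.
MAX_BYTES = MAX_BLOCKS * BLOCK_SIZE
```

With 65536 blocks the zone bitmap comes out at 8 blocks (8 kbytes), which matches the upper limit given in the table.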
Drivers
The most important part for propeller users is probably the interface between the propeller and a third processor. The hardware interface, level shifting and so on have been described in the pPropQL and pPropQL020 pages. Basically the propellers act as memory-mapped devices, with an address space, a data bus and read/write strobe signals that are asserted when a device needs to be read or written.
On the propeller side, and depending on the number of peripherals, one or more COGs can listen to these strobe signals and act accordingly.
Let's see some examples.
ROM emulator
The propeller can act as a ROM. For this it only needs to serve a byte of data for every address. With the multiplexed bus used on the boards mentioned above, a simple routine that watches the state of the read strobe is enough:
DAT
org $0
ROMEMU mov OUTA, #0
mov DIRA, c0_c_DIRA
c0_romemu waitpne c0_c_PROMCS, c0_c_PROMCS ' waits for CS to be asserted
mov c0_v_addr, INA '@ 2 gets low part of address
shr c0_v_addr, #16 '@ 6
and c0_v_addr, #255 '@10
add c0_v_addr, PAR '@14 adds ROM offset
mov c0_v_addrh, INA '@18 now it is safe to get high addr
shr c0_v_addrh, #8 '@22
and c0_v_addrh, c0_c_MSKADDRH
add c0_v_addr, c0_v_addrh '@30
rdbyte OUTA, c0_v_addr '@34
'@56 (max)
or DIRA, c0_c_DATAOUT '@60
waitpeq c0_c_PROMCS, c0_c_PROMCS '@64 waits for CS to be released
andn DIRA, c0_c_DATAOUT
jmp #c0_romemu
c0_c_DIRA long 0
c0_c_MSKADDRH long $00007f00
c0_c_PROMCS long 1<<25 ' NROMCS, ROM read strobe, active low
c0_c_DATAOUT long $ff
c0_v_addr long 0
c0_v_addrh long 0
c0_v_data long 0
The address is sent low byte first, with the high byte following 2 M68K cycles later. The program waits for the strobe signal to go high before disabling the output. OUTA can be used as the destination register because no other output pin is being used by this COG.
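The address assembly performed by the listing above can be modelled on a PC to check the shifts and masks. This is a sketch derived from the listing, not from the schematics: it assumes the multiplexed address byte sits on pins P16..P23 (because of the `shr #16` and `and #255`), and the function name `rom_hub_address` is made up.

```python
MASK_ADDR_HIGH = 0x7F00    # c0_c_MSKADDRH: 7 high bits, 32 kbyte window


def rom_hub_address(ina_first, ina_second, rom_base):
    """Combine two INA samples into the HUB address that rdbyte will use.

    ina_first  -- INA sampled while the low address byte is on the bus
    ina_second -- INA sampled while the high address byte is on the bus
    rom_base   -- HUB offset of the ROM image (passed in PAR)
    """
    low = (ina_first >> 16) & 0xFF              # shr #16, and #255
    high = (ina_second >> 8) & MASK_ADDR_HIGH   # shr #8, and c0_c_MSKADDRH
    return rom_base + low + high
```

Because the high part is masked with `$7f00`, address bit 15 is ignored and the ROM window repeats every 32 kbytes.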
Video Memory
Using a propeller as a video generator is one of the easiest functions to implement. The propeller has special circuitry designed to generate video signals, freeing the COGs from this time-consuming task.
The following example shows how the propeller can act as a memory-mapped buffer (only writing is shown). The buffer is limited to 4 kbytes because only text is implemented.
DAT
org $0
VIDEOCOG mov DIRA, #0
c2_videoemu waitpne c2_c_VIDEOW, c2_c_VIDEOW ' waits for NVIDEOW to be asserted
mov c2_v_addr, INA '@ 2 gets low part of address
shr c2_v_addr, #16 '@ 6
and c2_v_addr, #255 '@10
add c2_v_addr, PAR '@14 adds video buffer offset
mov c2_v_addrh, INA '@18 now it is safe to get high addr
mov c2_v_data, c2_v_addrh
shr c2_v_addrh, #8 '@26
and c2_v_addrh, c2_c_MSKADDR
add c2_v_addr, c2_v_addrh '@34
wrbyte c2_v_data, c2_v_addr '@38
waitpeq c2_c_VIDEOW, c2_c_VIDEOW
jmp #c2_videoemu
c2_c_VIDEOW long 1<<26 ' NVIDEOW strobe input, active low
c2_v_addr long 0
c2_v_addrh long 0
c2_v_data long 0
c2_c_MSKADDR long $00000f00 ' only 4 kbytes !!!!
An extra COG generates the corresponding video signal and produces the image according to the data in this buffer (pointed to by PAR). Graphics memory can be handled in the same manner, but a bigger buffer would probably be needed.
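The write path differs from the ROM case in two ways: the second INA sample also carries the data byte, and the high address mask limits the buffer to 4 kbytes. The model below is a sketch under the assumption that the data byte is on pins P0..P7 (`wrbyte` stores only the low byte of the sample); `video_write` is an invented name.

```python
MASK_ADDR = 0x0F00         # c2_c_MSKADDR: 4 high bits, 4 kbyte buffer


def video_write(buffer, ina_first, ina_second):
    """Store one byte the way the video COG does; returns the buffer offset."""
    low = (ina_first >> 16) & 0xFF           # low address byte
    high = (ina_second >> 8) & MASK_ADDR     # high address bits, masked
    data = ina_second & 0xFF                 # wrbyte keeps the low byte only
    offset = low + high
    buffer[offset] = data
    return offset


text_buffer = bytearray(4096)
# Hypothetical bus cycle: write 'A' (0x41) to offset 0x321.
video_write(text_buffer, 0x21 << 16, (0x03 << 16) | 0x41)
```

Any write above offset `$fff` wraps around into the same 4 kbytes, so the M68K side has to keep its text accesses inside that window.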
Other IO
A keyboard, serial interface, RTC, SD/MMC card reader, timers and so on can be connected in a similar manner to the one described above. One COG listens to reads and a second one to writes. As 2 or 3 HUB accesses are needed per transfer, a cycle time of about 1 microsecond is required. This can be wasteful in some cases but keeps the circuit design simple. More performance can be obtained using an AVR32, ARM or ColdFire processor instead. One possible method is for every peripheral to listen to the read or write strobes directly, but that can cause problems and missed reads/writes. A better method is to dedicate 2 COGs to the bus interface, leaving the rest free to handle the devices themselves.
A simple listener that interfaces with other COGs for the different tasks is shown below. Be aware that the normal FullDuplexSerial object has been modified to use two pointers instead of three (buffer and pointers inside the buffer). The keyboard object can be used as is, and the SD/MMC code is the newest mb_spi by Rokicki/Lonesock. Other peripherals have not yet been implemented. Buffer pointers are written during initialization. This method works because the peripherals have a shared memory interface.
DAT
org $0
' This COG accepts reads from the processor
IORCOG mov c0_p_rxbuffend, c0_p_rxbuff ' end of buffer = start of buffer...
add c0_p_rxbuffend, #RXBUFFLEN ' ...plus its length
c0_iorloop waitpne c0_c_IOR, c0_c_IOR ' waits for IOR to be asserted
mov c0_v_addr, INA ' @ 2 cycles after IOR is asserted
shr c0_v_addr, #16-4 ' @ 6
and c0_v_addr, #$1f0 wz '@ 10
if_nz jmp c0_v_addr
mov OUTA, c0_v_ready ' command 0 reads status of engine A5 == ready
or DIRA, c0_c_DATAOUT ' activates outputs
waitpeq c0_c_IOR, c0_c_IOR
andn DIRA, c0_c_DATAOUT
jmp #c0_iorloop
c0_v_ready long 0
long 0[((($+15)>>4)<<4)-$]
{ This must be at address 01
Receive status read
}
' Port 1
c0_serrx_01 rdlong c0_p_ptr2, c0_p_rxtail
rdbyte OUTA, c0_p_ptr2 ' reads last byte received
or DIRA, c0_c_DATAOUT ' activates outputs
waitpeq c0_c_IOR, c0_c_IOR
andn DIRA, c0_c_DATAOUT
cmp c0_p_ptr2, c0_p_rxbuffend wz
if_z mov c0_p_ptr2, c0_p_rxbuff ' moves tail to beginning of buffer
if_nz add c0_p_ptr2, #1
wrlong c0_p_ptr2, c0_p_rxtail ' saves new tail of buffer
jmp #c0_iorloop
c0_p_rxhead long 0
c0_p_rxtail long 0
c0_p_rxbuff long 0
c0_p_rxbuffend long 0
long 0[((($+15)>>4)<<4)-$]
{ returns $0f if there are no bytes to read
}
c0_serstatus_02 rdlong c0_p_ptr2, c0_p_rxtail
rdlong c0_p_ptr1, c0_p_rxhead
cmp c0_p_ptr1, c0_p_ptr2 wz
muxz OUTA, #$0f
or DIRA, c0_c_DATAOUT
waitpeq c0_c_IOR, c0_c_IOR
andn DIRA, c0_c_DATAOUT
jmp #c0_iorloop
c0_p_kbdhead long 0
c0_p_kbdtail long 0
c0_p_kbdbuff long 0
long 0[((($+15)>>4)<<4)-$]
c0_kbdrx_03 rdlong c0_p_ptr2, c0_p_kbdtail
rdlong c0_p_ptr1, c0_p_kbdhead
add c0_p_ptr1, c0_p_kbdbuff
rdbyte OUTA, c0_p_ptr1
or DIRA, c0_c_DATAOUT ' activates outputs
waitpeq c0_c_IOR, c0_c_IOR
andn DIRA, c0_c_DATAOUT
sub c0_p_ptr1, c0_p_kbdbuff
add c0_p_ptr1, #1
and c0_p_ptr1, #15
wrlong c0_p_ptr1, c0_p_kbdtail
jmp #c0_iorloop
' Returns 00 if there are chars to read
' Returns 0F if there are no chars to read
long 0[((($+15)>>4)<<4)-$]
c0_kbdstatus_04 rdlong c0_p_ptr2, c0_p_kbdtail
rdlong c0_p_ptr1, c0_p_kbdhead
cmp c0_p_ptr2, c0_p_ptr1 wz ' are they ==
muxz OUTA, #$0f
or DIRA, c0_c_DATAOUT ' activates outputs
waitpeq c0_c_IOR, c0_c_IOR
andn DIRA, c0_c_DATAOUT
jmp #c0_iorloop
c0_p_sdcmd long 0
c0_p_sdblk long 0
c0_p_sdptr long 0
' data pointer
c0_v_dptr long 0
c0_v_cnt long 0
c0_c_512 long 512
long 0[((($+15)>>4)<<4)-$]
' dummy
waitpeq c0_c_IOR, c0_c_IOR
andn DIRA, c0_c_DATAOUT
jmp #c0_iorloop
long 0[((($+15)>>4)<<4)-$]
' reads the error
' resets the data pointer to the beginning of the buffer
'
c0_readcmd_06 rdlong c0_v_data, c0_p_sdcmd ' reads error
mov c0_v_dptr, c0_p_sdptr ' resets pointer
mov c0_v_cnt, #0 ' counter
shr c0_v_data, #24
mov OUTA, c0_v_data ' presents MSByte
or DIRA, c0_c_DATAOUT ' activates outputs
waitpeq c0_c_IOR, c0_c_IOR
andn DIRA, c0_c_DATAOUT
jmp #c0_iorloop
long 0[((($+15)>>4)<<4)-$]
' Reads the sector data
' after 512 reads the pointer is reset
c0_readdta_07 rdbyte OUTA, c0_v_dptr ' reads next data byte
add c0_v_dptr, #1 ' advances data pointer
add c0_v_cnt, #1
cmp c0_v_cnt, c0_c_512 wz
if_z mov c0_v_cnt, #0
if_z mov c0_v_dptr, c0_p_sdptr ' resets pointer
or DIRA, c0_c_DATAOUT ' activates outputs
waitpeq c0_c_IOR, c0_c_IOR
andn DIRA, c0_c_DATAOUT
jmp #c0_iorloop
long 0[((($+15)>>4)<<4)-$]
' Allows reading the last written block number, one byte per port, MSByte first
c0_readblk_08 rdlong c0_v_data, c0_p_sdblk ' reads block number
shr c0_v_data, #24
mov OUTA, c0_v_data ' presents MSByte
or DIRA, c0_c_DATAOUT ' activates outputs
waitpeq c0_c_IOR, c0_c_IOR
andn DIRA, c0_c_DATAOUT
jmp #c0_iorloop
long 0[((($+15)>>4)<<4)-$]
c0_readblk_09 rdlong c0_v_data, c0_p_sdblk ' reads block number
shr c0_v_data, #16
mov OUTA, c0_v_data ' presents second byte
or DIRA, c0_c_DATAOUT ' activates outputs
waitpeq c0_c_IOR, c0_c_IOR
andn DIRA, c0_c_DATAOUT
jmp #c0_iorloop
long 0[((($+15)>>4)<<4)-$]
c0_readblk_0A rdlong c0_v_data, c0_p_sdblk ' reads block number
shr c0_v_data, #8
mov OUTA, c0_v_data ' presents third byte
or DIRA, c0_c_DATAOUT ' activates outputs
waitpeq c0_c_IOR, c0_c_IOR
andn DIRA, c0_c_DATAOUT
jmp #c0_iorloop
long 0[((($+15)>>4)<<4)-$] ' == .align 16
c0_readblk_0B rdlong c0_v_data, c0_p_sdblk ' reads block number
mov OUTA, c0_v_data ' presents LSByte
or DIRA, c0_c_DATAOUT ' activates outputs
waitpeq c0_c_IOR, c0_c_IOR
andn DIRA, c0_c_DATAOUT
jmp #c0_iorloop
c0_c_DIRA long 0
c0_c_MSKADDRH long $00ff0000
c0_c_IOR long 1<<25
c0_c_DATAOUT long $ff
c0_v_addr long 0
c0_v_addrh long 0
c0_v_data long 0
c0_p_ptr1 long 0
c0_p_ptr2 long 0
jmp #c0_iorloop
fit $1f0
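Three pieces of arithmetic in the listener are easy to get wrong, so here is a small model of them: the dispatch address computed from INA, the padding produced by the `long 0[...]` alignment lines, and the tail update of the serial receive buffer. It is a sketch based only on the listing: it assumes the port number arrives on pins P16..P20, and the function names are invented.

```python
SLOT_SIZE = 16             # cog longs reserved for each port handler


def handler_address(ina):
    """Mirror `shr #16-4` and `and #$1f0`: port number times SLOT_SIZE."""
    return (ina >> (16 - 4)) & 0x1F0


def padding_longs(here):
    """Mirror `long 0[((($+15)>>4)<<4)-$]`: longs needed to reach the next slot."""
    return (((here + 15) >> 4) << 4) - here


def advance_tail(tail, buf_start, buf_end):
    """Mirror the tail update in c0_serrx_01: wrap at the end, else step by one."""
    return buf_start if tail == buf_end else tail + 1
```

Port 0 yields address 0, which is why the listing tests the zero flag and handles the status read inline instead of jumping. Each handler, together with the variables placed after it, has to fit in 16 longs, otherwise the padding expression pushes the next handler into the wrong slot.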
More to come!