2 Specifications - jpursey/oz-3 GitHub Wiki

Design philosophy

The OZ-3 is a virtual CPU, building on the general architecture ideas and quirks of the OZ-1 and OZ-2 (see History).

Just quirky enough

The goal of the OZ-3 is to be the heart of a "computer" exposed more or less directly in games and toy applications. It has the following top-level goals:

Old school: The OZ-3 is intended to "feel" like an old 16-bit style CPU from the 1980s and 1990s. However, it deliberately is not an actual copy, so programmers (gamers) have a sense of discovery and can't just take existing code and run it.
Unique: It has unique or uncommon features that diverge from most/any other CPU. In the OZ-3, the prime example is the separate and configurable memory banks for code, stack, and two general purpose memory banks (data and extra).
Simple: While the CPU inspiration is certainly on the Z80/x86 side, there is more uniformity in the register layout and opcodes in the default instruction set. This allows for fewer workarounds when programming (for instance, due to limited addressing modes supported by an opcode). It also makes programs shorter as a result, which is important with the relatively small memory footprint.
Less boilerplate: Assembly programming is always going to be verbose, but the OZ-3 has a generous amount of opcode space, and so in true CISC style, the default instruction set supports several complex opcodes for commonly needed tasks (pushing/popping register sets, counting, looping, etc). This also adds to the discovery and options when it comes to optimization.
Extensible: The CPU is intended to be embeddable within a game, and so the architecture and runtime is designed to support external devices, coprocessors (including expanding opcodes), etc. without needing to change the core. You can also provide your own instruction set to either limit or expand upon the default instruction set. In fact, each core can have its own unique instruction set for asymmetric use cases.
Fun: The processor is ridiculous and would never exist as a real processor. The bus requirements combined with exclusive locking semantics on components would be highly impractical in real silicon. The cycle counts are self-consistent and kinda real-ish, but not really. Overall, the design is likely actively hostile to an efficient hardware implementation. But that's not the point. It has the veneer of being real, but the decisions are based more on what would be fun and easy to reason about -- especially in the context of a game. There are even a few "hacks" that have documented/guaranteed behavior for those interested in building their own instruction sets for the OZ-3.

Backward compatibility

While its intended use case is the same as the OZ-1 and OZ-2, the OZ-3 is not backward compatible at all. The obvious reason for this is that there are no existing OZ-1 and OZ-2 programs -- so why bother. However, this is a nostalgia passion project, and backward compatibility can be "fun" (after all, I did make the OZ-2 backward compatible with the OZ-1). So the more nitty-gritty reasons for breaking backward compatibility are:

Simplify the organization of the opcodes (grouping similar opcodes together).
Be flexible in terms of what the instruction set is. While the OZ-3 provides a general-purpose 80's-era instruction set, it is implemented in terms of OZ-3 microcode. This both allows for new bespoke instruction sets for games, as well as providing "realistic" cycle counts based on the microcode implementation. In fact, the OZ-1 and OZ-2 instruction sets could be supported with a custom instruction set (if combined with the math coprocessor).
Replacing the "error" flag with an "overflow" flag in the Z80 style which better supports signed integer use cases. The overflow flag can still double as an error flag when signed overflow is not meaningful (for instance, a divide by zero).
Changing floating point operations to be more of a coprocessor-style approach (separate external registers and parallel execution). The underlying OZ-3 CPU core does not directly support anything except 16-bit add, subtract, and bitwise operations. Of course, the default instruction set implements 32-bit operations and extended operations like integer multiply and divide, but it is implemented against the limitations of the base microcode. Supporting IEEE floating point would be... impractical, to say the least.
Remove the separate concept of an "address stack" and just use the single "stack" memory. The (now single) stack also follows the more typical pattern of pushing onto the stack resulting in decrementing the stack pointer.
Unify the memory model by adding 16 banks of memory (64 KiW each). The code, stack, data, and extra banks for an OZ-3 core can all be assigned to separate (or the same) bank of memory. This also plays nicely with multiple cores.
Overhaul addressing modes in the default instruction set, supporting index style addressing directly and removing the ability to do memory-to-memory operations. A register always holds a result, and a MOV to memory is only supported from a register. Explicit memory-to-memory block operations are supported via a DMA coprocessor. Note, this is only a limitation of the default instruction set. Custom instruction sets can do what they like.

Hardware specification

Overview

The OZ-3 is modeled as a 16-bit processor with up to 8 cores, up to 16 banks of memory, up to 256 double-word ports, and 32 interrupt lines. The following lists the various OZ-3 hardware components and how they work together.

Main memory: The OZ-3 supports up to 16 banks of memory, each with 64 KiW of address space. The memory is general purpose, and is used for code, data, and device memory mapping. Each memory bank has a dedicated 16-bit address bus and 16-bit data bus, which is shared across any cores, coprocessors, and external devices that support memory mapping. Each memory bank is logically divided into 16 pages (4 KiW each), and is statically configured with a 16-bit read mask and a 16-bit write mask indicating the read/write access permissions for each page. In addition, each memory bank supports a contiguous range of pages that can be memory mapped to a single device or coprocessor (or really, anything). The read/write masks cannot be changed by OZ-3 code at runtime; they are part of the system configuration.
Cores: The OZ-3 CPU supports up to 8 cores that run in parallel. Each core has its own dedicated set of registers, interrupt vector, and ability to fetch, decode, and execute opcodes. However, all cores share main memory, ports, and coprocessors. Each core is associated with up to four memory banks simultaneously: one for code (CODE), one for the stack (STACK), and two for general purpose data (DATA and EXTRA).
Ports: The OZ-3 supports 256 logical I/O ports. Each I/O port caches two 16-bit words and a single status word (which may be zero or one). Reading and writing to ports is controlled by three mode flags which may be freely set, cleared, or combined:
- Test (T): This tests the status of the port and either performs the read/write or not as follows: read only if the status is 1, and write only if the status is 0.
- Status (S): This updates the status of the port as follows: read sets the status to 0, and write sets the status to 1.
- Address (A): This updates the internal address of the port after the read/write to the other 16-bit word. For instance, if the A flag is set the first read will be the first word and the second read will be the second word. If a third read occurs, it will be the first word again.
Interrupts: The OZ-3 has 32 interrupts triggered via dedicated interrupt lines per core. Cores, coprocessors, and devices (or the host application) can trigger interrupts on the lines they are connected with. Multiple devices and coprocessors can share the same line; however, because they share a line, they cannot raise interrupts simultaneously. Each interrupt is independently handled by each OZ-3 core, with the address stored in an interrupt vector settable by individual cores. This means two cores may both handle the same interrupt when it is raised, or cores may be assigned to different interrupts (as determined by their software).
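The port mode flags described above can be sketched in a few lines of C++. This is only an illustration and not the oz3 API: the `Port` struct, its member names, and the numeric encoding of the T/S/A flags are all assumptions made up for this example. It also assumes that a read or write refused by the Test flag leaves the status and internal address untouched.

```cpp
#include <cstdint>

// Hypothetical model of a single port (illustrative names, not the oz3 API).
struct Port {
  // Mode flags; the numeric values are assumptions for this sketch.
  static constexpr unsigned kTest = 1;     // T
  static constexpr unsigned kStatus = 2;   // S
  static constexpr unsigned kAddress = 4;  // A

  uint16_t words[2] = {0, 0};  // The two cached 16-bit words.
  int address = 0;             // Which word the next access uses.
  int status = 0;              // Always 0 or 1.

  // Returns true if the read was performed.
  bool Read(unsigned mode, uint16_t& out) {
    if ((mode & kTest) && status != 1) return false;  // T: read only if 1.
    out = words[address];
    if (mode & kStatus) status = 0;      // S: read sets the status to 0.
    if (mode & kAddress) address ^= 1;   // A: move to the other word.
    return true;
  }

  // Returns true if the write was performed.
  bool Write(unsigned mode, uint16_t value) {
    if ((mode & kTest) && status != 0) return false;  // T: write only if 0.
    words[address] = value;
    if (mode & kStatus) status = 1;      // S: write sets the status to 1.
    if (mode & kAddress) address ^= 1;   // A: move to the other word.
    return true;
  }
};
```

With all three flags combined, a writer and a reader can hand a value back and forth: the write only succeeds while the status is 0 and flips it to 1, and the read only succeeds while the status is 1 and flips it back to 0.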

Registers

Registers define the working set memory and status for each OZ-3 core. They are unique to each core, and generally are only directly accessible by code running within the core (cores can access each other's registers in a limited fashion via multi-core instructions).

General purpose registers: The general registers in the OZ-3 are fully interchangeable across opcodes for a given bit-depth. Notably (compared to the Z80/8080 and friends) there are no "accumulator" or "index" registers.
- 16-bit registers: The OZ-3 has 8 general purpose 16-bit registers: R0 to R7.
- 32-bit registers: The OZ-3 combines the 16-bit registers into pairs, yielding 4 general purpose 32-bit registers: D0 to D3. These are aliased over the 16-bit registers in little endian fashion, with R0 being the low 16-bits of D0 and R1 being the high 16-bits of D0.
Special purpose address registers: The OZ-3 also has several dedicated special purpose registers. These refer to addresses in each memory data bank and may be offset with a scalar value when referencing memory. Unlike the general purpose registers, they also have meaning tied to dedicated opcodes, and are manipulated directly by the OZ-3 as part of general operation.
- Base address register: Each memory bank has a dedicated 16-bit register which is treated as the base (zero) address for that memory bank: BC, BS, BD, and BE are the base addresses for the CODE, STACK, DATA, and EXTRA memory banks respectively. All memory accesses are implicitly relative to these register values.
- Instruction Pointer: The IP register indicates where the next instruction is located that the OZ-3 core will execute in the CODE memory bank (relative to the BC register). The IP register wraps around if execution passes the top of memory.
- Stack Pointer: The SP register indicates where the top of the stack is in the STACK memory bank (relative to the BS register). Like the Z80, the top of the stack points to the address of the last pushed value (not the one after it). The stack pointer register starts at zero and wraps around.
- Frame Pointer: The FP register is a secondary pointer into the STACK memory bank (relative to the BS register). It is commonly used to specify the base address of the stack "frame" for a function being run (the previous function's SP value).
Memory Bank: The MB register is a 16-bit register which indicates the banks being used by the core. OZ-3 code can configure the memory banks with the RST instruction in the default instruction set. There are four bank assignments, each represented by 4 bits. Bank assignments can all refer to the same physical memory bank.
- CODE: This specifies the bank the IP register references, and where instructions and their immediate operands are fetched from.
- STACK: This specifies the bank the SP and FP registers reference, and where stack data is stored and retrieved. The general purpose registers R6 and R7 also refer to addresses in this bank.
- DATA: This specifies the primary data bank for all direct and indirect addressing of main memory (data that isn't the stack). The general purpose registers R0 to R3 refer to addresses in this bank. In many cases, the DATA bank and the STACK bank may map to the same physical memory.
- EXTRA: This specifies the secondary data bank for general purpose use. The general purpose registers R4 and R5 refer to addresses in this bank. This may be used to access additional memory, or to provide additional register addressing into the same bank as is mapped to STACK or DATA.
Status/Control register: The ST register is a 16-bit register that contains all status flags for the core. It cannot be accessed directly at all, except internally by instructions via microcode, but individual flags can be set, cleared, or tested. See Flags for details.
Interrupts:
- Interrupt trigger: The IT register is a 32-bit register that contains a bit indicating whether an interrupt was raised for that index. It is set by the triggering code (usually an external device or coprocessor), and then it is cleared when the core handles the interrupt (whether it is mapped to a handler or not).
- Interrupt vector: Each core has its own vector of 32 16-bit addresses which specify where each interrupt handler is located. If the address is zero, then no handler will be called (it does not call address zero). Otherwise, the address is called within the specified code bank when the interrupt fires. The current IP and ST registers are pushed onto the stack (pointed to by SP), so they can be restored later (by the IRET instruction in the default instruction set, which uses the IRT microcode). The I flag in ST is also cleared (so interrupts are disabled while the handler runs).

All 16-bit registers have a well-defined association with a memory bank. This is used by the default instruction set for operations that support memory addressing by register (with or without an offset).

Register	Memory bank association
`R0`	`DATA`
`R1`	`DATA`
`R2`	`DATA`
`R3`	`DATA`
`R4`	`EXTRA`
`R5`	`EXTRA`
`R6`	`STACK`
`R7`	`STACK`
`BC`	`CODE`
`BS`	`STACK`
`BD`	`DATA`
`BE`	`EXTRA`
`IP`	`CODE`
`SP`	`STACK`
`FP`	`STACK`
`MB`	`CODE`
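The 32-bit register aliasing and the general purpose rows of the table above can be sketched in C++ as follows. The `Registers` struct, the `Bank` enum, and the function names are hypothetical and exist only for this example. The text only spells out the layout of D0, so the sketch assumes D1 to D3 follow the same pattern over (R2, R3), (R4, R5), and (R6, R7).

```cpp
#include <cstdint>

enum class Bank { kCode, kStack, kData, kExtra };

// Hypothetical register file holding only R0..R7 (not the oz3 API).
struct Registers {
  uint16_t r[8] = {};

  // D<n> aliases R<2n> as its low word and R<2n+1> as its high word.
  uint32_t GetD(int n) const {
    return static_cast<uint32_t>(r[2 * n]) |
           (static_cast<uint32_t>(r[2 * n + 1]) << 16);
  }

  void SetD(int n, uint32_t value) {
    r[2 * n] = static_cast<uint16_t>(value & 0xFFFF);
    r[2 * n + 1] = static_cast<uint16_t>(value >> 16);
  }
};

// Memory bank associated with general purpose register R<n>, per the table:
// R0..R3 use DATA, R4..R5 use EXTRA, and R6..R7 use STACK.
constexpr Bank BankForR(int n) {
  return n < 4 ? Bank::kData : (n < 6 ? Bank::kExtra : Bank::kStack);
}
```

Because the D registers are pure aliases, writing 0x12345678 to D0 in this model leaves 0x5678 in R0 and 0x1234 in R1, and no separate 32-bit storage is needed.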

Flags

Each OZ-3 core has several flags stored in the ST status and control register. There are two types of flags:

Status: Status flags can be modified by instructions (or by user programs if the configured instruction set exposes that). These have standard meanings, and the underlying OZ-3 microcode and default instruction set operate against these.
Control: Control flags cannot be changed by instructions or by user programs (no matter what the instruction set). However, they can be read by microcode (and thus user programs, if exposed). Control flags may represent internal control state of the CPU (the W flag, for instance), or may be set by external code (the T flag is set by the OZ-3 debugger, for instance, to step through code and handle breakpoints).

Bit	Flag	Name	Type	Description
0	Z	Zero	Status	Set when an operation results in zero
1	S	Sign	Status	Set when an operation results in the high bit set
2	C	Carry	Status	Set when an operation causes an unsigned overflow
3	O	Overflow	Status	Set when an operation causes a signed overflow or an error
4	I	Interrupt	Status	When set, interrupts are enabled
8	T	Trap	Control	When set, the core automatically halts after each instruction
9	W	Wait	Control	When set, the core has a WAIT instruction active
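As a rough illustration of the flag table, the C++ sketch below defines the bit positions listed above and computes the Z, S, C, and O flags for a 16-bit addition using the conventional definitions of carry and signed overflow. The constant and function names are hypothetical, and whether any particular OZ-3 microcode or instruction updates the flags exactly this way is an assumption, not something the table itself states.

```cpp
#include <cstdint>

// Bit positions taken from the flag table; the names are illustrative.
constexpr uint16_t kFlagZ = 1u << 0;  // Zero
constexpr uint16_t kFlagS = 1u << 1;  // Sign
constexpr uint16_t kFlagC = 1u << 2;  // Carry
constexpr uint16_t kFlagO = 1u << 3;  // Overflow
constexpr uint16_t kFlagI = 1u << 4;  // Interrupt enable
constexpr uint16_t kFlagT = 1u << 8;  // Trap (control)
constexpr uint16_t kFlagW = 1u << 9;  // Wait (control)

// Computes a + b as a 16-bit add, stores the sum in `result`, and returns
// the Z/S/C/O flags that the addition would produce.
inline uint16_t AddFlags(uint16_t a, uint16_t b, uint16_t& result) {
  const uint32_t wide = static_cast<uint32_t>(a) + b;
  result = static_cast<uint16_t>(wide);
  uint16_t flags = 0;
  if (result == 0) flags |= kFlagZ;
  if (result & 0x8000) flags |= kFlagS;
  if (wide > 0xFFFF) flags |= kFlagC;  // Unsigned overflow.
  // Signed overflow: both operands share a sign and the result's sign differs.
  if (~(a ^ b) & (a ^ result) & 0x8000) flags |= kFlagO;
  return flags;
}
```

For example, adding 1 to 0x7FFF in this model sets S and O (the signed result wrapped negative) but not C, while adding 1 to 0xFFFF sets Z and C but not O.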

Instruction Set

The OZ-3 core does not have a fixed instruction set. Instead, each OZ-3 core can be configured with either the OZ-3 default instruction set (TODO: link to documentation), an extension of the instruction set, or a completely different application-specific instruction set. Instructions within an instruction set are implemented in terms of OZ-3 microcode assembly which defines the execution engine for the core.

Instruction sets may be defined explicitly in host application code, or externally via an instruction set text file. Each instruction set can have up to 65536 unique instructions (lots!), each with its own custom microcode. The OZ-3 assembler supports all defined syntax for the instruction set and can generate machine code that an OZ-3 core using the same instruction set will be able to run.

Synchronization

The OZ-3 has several cores, coprocessors, and devices all with access to shared resources like memory, ports, CPU cores, and coprocessors. This means there are lots of race conditions and possible synchronization issues, which is not fun. To address this, the OZ-3 provides very strong synchronization guarantees at the microcode level (for CPU cores), and host API level (for application code, devices, and coprocessors).

Each operation that uses a shared resource requires a Lock object, which guarantees exclusive access to the resource. If multiple pieces of code attempt to lock the same object at the same time, they are automatically queued for access in a FIFO fashion. In microcode, this is done via LK and UL operations which are mutually exclusive with each other (preventing instruction-level deadlock possibilities). In the host application API (for devices and coprocessor code), it is done explicitly by calling RequestLock on an oz3::Lockable, or equivalent function. Because the lock is simulated, requesting it does not literally block the calling code; instead, the calling code must wait until the lock indicates it is locked before using the resource. Destroying the lock allows the resource to be locked by other code. See oz3/core/lockable.h for details.
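The FIFO queuing behavior described above can be modeled with a very small C++ class. This is a sketch under stated assumptions and not the actual oz3::Lockable interface: `SimLockable`, its ticket scheme, and the `Request`, `IsLocked`, and `Release` names are invented for this example. It only shows the idea that a request never blocks, and that a requester polls until it reaches the front of the queue.

```cpp
#include <deque>

// Hypothetical stand-in for a lockable shared resource (not the oz3 API).
class SimLockable {
 public:
  // Queues a lock request and returns its ticket. This never blocks; the
  // request owns the resource only once it reaches the front of the queue.
  int Request() {
    queue_.push_back(next_ticket_);
    return next_ticket_++;
  }

  // True when the request identified by `ticket` currently owns the resource.
  bool IsLocked(int ticket) const {
    return !queue_.empty() && queue_.front() == ticket;
  }

  // Releases (or cancels) a request so the next one in line can proceed.
  void Release(int ticket) {
    for (auto it = queue_.begin(); it != queue_.end(); ++it) {
      if (*it == ticket) {
        queue_.erase(it);
        return;
      }
    }
  }

 private:
  std::deque<int> queue_;  // Pending requests in arrival order.
  int next_ticket_ = 0;
};
```

In a simulation loop, each component would call `IsLocked` once per cycle and stay in its blocked state until it returns true, which mirrors the "wait until the lock indicates it is locked" rule.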

The approach optimizes for ease of understanding over maximizing (virtual) performance in the interest of fun. In fact, due to the heavy synchronization guarantees, it provides an interesting opportunity for programmers (gamers) to find optimal ways to use the shared resources. It is also easy for host applications to design new devices and coprocessors that easily comply with the OZ-3 locking semantics.

Concurrent memory access

Each core and coprocessor may be configured to be able to access one or more of the available memory data banks. Fetch and store actions are implicitly fully synchronized, such that no two components are accessing the same memory bank at the same time. This is true for the duration the operation requires the memory bank (both reads and writes). Any other cores and coprocessors will be put into a blocked state until the preceding operation is complete for the specified memory bank operation.

As a simple example, a core may execute the MOV R1 (1234) instruction from the default instruction set (copy 16-bit value at address 1234 to register R1). This will result in the following instruction microcode:

Microcode	Cycles	Description
cpu fetch	---	Lock CODE memory bank.
cpu fetch	1	Set CODE address bus to `IP`.
cpu fetch	1	Read opcode from CODE data bus, and decode it. `a` is set to `R1`. This also advances `IP` by 2.
`LD(C1)`	1	Read 16-bit word containing `1234` from the CODE data bus into register `C1`.
`UL`	---	Unlock CODE memory bank.
`LK(DATA)`	---	Lock DATA memory bank.
`ADR(C1)`	1	Set DATA address bus to `1234` stored in `C1`.
`LD(R1)`	1	Reads 16-bit word from DATA data bus into register `R1`.
`UL`	---	Unlocks DATA memory bank.

The total execution time of this instruction is 5 cycles (memory bank locks and unlocks are logically instantaneous). Other cores and coprocessors are blocked from accessing the CODE memory bank for the first three cycles and the DATA memory bank for the last two cycles. If the OZ-3 core has mapped the CODE and DATA memory banks to the same physical bank, then other cores and coprocessors may still execute between the "fetch" portion and the "load" portion of execution. This is because whenever a UL is executed, the "next in line" automatically claims the lock, and the subsequent LK then blocks the CPU core until the lock is released again.
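The cycle accounting for this example can be checked with a short C++ sketch. The `Step` struct and the function names are hypothetical; the rows simply restate the timed steps from the table above, tagged with the bank that is locked while each step runs (the lock and unlock rows are omitted because they take no cycles).

```cpp
#include <string>
#include <vector>

// One timed row of the microcode table. `bank` is the memory bank that is
// locked while the step runs. All names here are illustrative.
struct Step {
  std::string op;
  std::string bank;
  int cycles;
};

// The timed steps for the MOV example from the table above.
inline std::vector<Step> MovR1FromMemory() {
  return {
      {"fetch: set CODE address bus to IP", "CODE", 1},
      {"fetch: read and decode opcode", "CODE", 1},
      {"LD(C1)", "CODE", 1},
      {"ADR(C1)", "DATA", 1},
      {"LD(R1)", "DATA", 1},
  };
}

// Sums the cycles spent while `bank` is locked. Pass an empty string to
// total every step regardless of bank.
inline int Cycles(const std::vector<Step>& steps, const std::string& bank) {
  int total = 0;
  for (const Step& step : steps) {
    if (bank.empty() || step.bank == bank) total += step.cycles;
  }
  return total;
}
```

Summing with an empty bank name gives the 5 cycle total, while filtering by "CODE" and "DATA" gives the 3 and 2 cycle windows during which other components are blocked from each bank.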

Concurrent port access

A port is a shared connection between cores, coprocessors, and devices. Coprocessors and devices are mapped to specific ports at construction, and general purpose cores can access all ports. Each port is uniquely lockable. Therefore, the only contention comes from multiple cores attempting to read or write to the same port at the same time, or a core attempting to read or write at the same time as a mapped coprocessor or device. Like main memory, all port reads and writes are synchronized. Processors and coprocessors may enter a (likely very brief) blocked state if they attempt to read or write a port while it is locked by another component. There is no synchronization between instructions, however, so it may be important for a core or coprocessor to read/write both the status and value of a port within a single instruction. In the OZ-3 default instruction set, this is done with the INS / OUTS instructions instead of the IN / OUT instructions, which ignore the port status.

Interrupts

Coprocessors, cores, and external devices can all trigger interrupts, and so can attempt to trigger interrupts at the same time. Triggering an interrupt takes no time (in a simulated sense). It is uniquely triggered per core and it can only be cleared by the core itself, so there is no race condition for the actual setting and clearing of the interrupt trigger. However, there are race conditions when triggering an interrupt while an interrupt is in progress. The OZ-3 takes a relatively simple approach to this:

1. Interrupt trigger: Any time (0 cycles)
- Sets the associated bit for the interrupt in the IT register (interrupt trigger)
- Duplicate triggers (before handling) of the same interrupt are ignored (coprocessors and devices are notified whether this happens when they raise the interrupt)
2. Interrupt detection: After each instruction completes (0 cycles)
- If the interrupt enable flag I is not set, then control flow continues (interrupts are disabled).
- If no bits in the IT register are set, then control flow continues (there are no interrupts).
- Otherwise, continues to interrupt handler mapping (#3)
3. Interrupt handler mapping: (0 cycles)
- The lowest-index interrupt set in IT is determined and the IT bit is reset for that index.
- If the handler is not set:
  - If no bits in IT are set, control flow continues with normal code execution.
  - If any bit in IT is set, restart interrupt handler mapping (#3)
- Otherwise, continues to interrupt handling (#4)
4. Interrupt handling: (3 cycles)
- The current IP and ST registers are pushed onto the stack
- Status flag I is cleared. This disables interrupts. The Z, S, C, and O status flags are also cleared.
- Using the OZ-3 default instruction set, the code can then optionally call EI and DI to enable and disable interrupts for the duration of this interrupt handler.
5. Interrupt handling end: An instruction (like IRET from the default instruction set) calls the IRT microcode (min 6 cycles)
- The IRET (or equivalent) instruction is fetched, and IRT microcode is executed.
- The IP and ST registers are popped from the stack.
- This will restore interrupt handling to what it was before the interrupt began.
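The trigger, detection, and handler mapping steps above can be sketched in C++ as follows. This is a simplified model and not the oz3 implementation: the `InterruptState` struct and its members are hypothetical names, and pushing IP and ST onto the stack, clearing the other status flags, and the cycle costs are all left out. A return value of zero from `Dispatch` means "continue normal execution".

```cpp
#include <cstdint>

// Hypothetical per-core interrupt state (illustrative, not the oz3 API).
struct InterruptState {
  uint32_t it = 0;                  // IT register: one pending bit each.
  uint16_t vector[32] = {};         // Handler addresses; 0 means no handler.
  bool interrupts_enabled = false;  // Models the I status flag.

  // Interrupt trigger. Returns false if the interrupt was already pending,
  // in which case the duplicate trigger is ignored.
  bool Raise(int index) {
    const uint32_t bit = uint32_t{1} << index;
    if (it & bit) return false;
    it |= bit;
    return true;
  }

  // Interrupt detection and handler mapping, run after each instruction.
  // Returns the handler address to call, or 0 to continue normal execution.
  uint16_t Dispatch() {
    if (!interrupts_enabled) return 0;
    while (it != 0) {
      int index = 0;
      while (!(it & (uint32_t{1} << index))) ++index;  // Lowest set bit.
      it &= ~(uint32_t{1} << index);                   // Reset the IT bit.
      if (vector[index] != 0) {
        interrupts_enabled = false;  // Interrupt handling clears I.
        return vector[index];
      }
      // No handler mapped: loop again to look for another pending bit.
    }
    return 0;
  }
};
```

Note that in this model an interrupt with no handler is still consumed: its IT bit is reset even though nothing is called, which matches the rule that the bit is cleared whether or not the interrupt is mapped to a handler.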

Coprocessors

The OZ-3 core supports additional low-level functionality through coprocessors. Coprocessors run independently of the general purpose cores and provide additional functionality through dedicated reserved opcodes. Each coprocessor has its own separate opcodes which are fetched and decoded by an OZ-3 core, and then passed with their arguments to the coprocessor for execution. Coprocessors may have direct access to shared resources (memory, ports, interrupts). Some standard OZ-3 coprocessors are defined here. However, specific virtual computer implementations may also define their own coprocessors, with their own behavior and opcodes.

DMA processor

The OZ-3 supports a coprocessor core for doing block memory copies of pages between memory banks. It has eight 16-bit DMA request registers, which are mapped 1-to-1 with the general purpose cores of the OZ-3.

TODO

Math processor

The OZ-3 also supports a separate math coprocessor that supports basic floating point operations. It also supports higher level operations like faster integer multiply/divide, and standard trig functions (sin, cos, etc.).

TODO