Processor Design ‐ 6 - muneeb-mbytes/computerArchitectureCourse GitHub Wiki
Implementation of the controller
The controller can be implemented using either a PLA or a ROM. Implementation via PLA takes place in two parts:
PLA 1. Generation of control signals for datapath
The micro-operation tables are organized by group: PCgrp, MEMgrp, RFgrp and ALUgrp. A total of 10 control states (cs0 to cs9) are required to cover all the necessary micro operations shown in the transition diagram below:
The below table shows the relationship between control states and control signals:
To represent 10 control states, 4 bits are required, so the input to PLA1 is 4 bits wide.
The output signals are grouped by micro operation: 4, 4, 5 and 6 signals for PCgrp, MEMgrp, RFgrp and ALUgrp respectively, for a total of 19 control signals.
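The widths above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the names `GROUP_WIDTHS` and `state_bits` are ours and do not come from the course material.

```python
# Number of control signals per group, as stated in the text.
GROUP_WIDTHS = {"PCgrp": 4, "MEMgrp": 4, "RFgrp": 5, "ALUgrp": 6}
NUM_STATES = 10  # cs0 .. cs9


def state_bits(num_states):
    """Smallest number of bits that can give each state a distinct code."""
    return (num_states - 1).bit_length()


pla1_inputs = state_bits(NUM_STATES)       # width of the state input to PLA1
pla1_outputs = sum(GROUP_WIDTHS.values())  # total control signals out of PLA1
print(pla1_inputs, pla1_outputs)  # 4 19
```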
PLA 2. Generation of next control state
The below table describes the next state given the present state and instruction groups:
Present state | R-class | sw | lw | beq | j |
---|---|---|---|---|---|
cs0 | cs1 | cs1 | cs1 | cs1 | cs1 |
cs1 | cs2 | cs4 | cs4 | cs8 | cs9 |
cs2 | cs3 | x | x | x | x |
cs3 | cs0 | x | x | x | x |
cs4 | x | cs5 | cs6 | x | x |
cs5 | x | cs0 | x | x | x |
cs6 | x | x | cs7 | x | x |
cs7 | x | x | cs0 | x | x |
cs8 | x | x | x | cs0 | x |
cs9 | x | x | x | x | cs0 |
Note: x represents a don't care. The table above can be written more compactly as a 1D table instead of a 2D one by listing each input combination on its own row:
Present state | Instruction group | Next State |
---|---|---|
cs0 | x | cs1 |
cs1 | R-class | cs2 |
cs1 | sw/lw | cs4 |
cs1 | beq | cs8 |
cs1 | j | cs9 |
cs2 | x | cs3 |
cs3 | x | cs0 |
cs4 | sw | cs5 |
cs4 | lw | cs6 |
cs5 | x | cs0 |
cs6 | x | cs7 |
cs7 | x | cs0 |
cs8 | x | cs0 |
cs9 | x | cs0 |
From the above table we can infer that for cs0 regardless of the instruction group, the next state must be cs1. After cs1 the next state depends on the instruction group. The first two columns will serve as inputs to PLA2 to generate the required next state.
We connect the two PLAs and use a 4-bit state register to hold the control state. The register holds the present control state, drives both PLAs, and is loaded with the next state from PLA2 on every clock cycle.
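The behaviour of PLA2 together with the state register can be sketched in Python. The terms are transcribed from the compact next-state table; the representation (a list of tuples, with `None` standing for a don't care) and the helper names `next_state` and `trace` are our own choices for illustration, not part of the design.

```python
# (present state, instruction groups or None for "don't care", next state)
NEXT_STATE_TERMS = [
    ("cs0", None, "cs1"),
    ("cs1", {"R-class"}, "cs2"),
    ("cs1", {"sw", "lw"}, "cs4"),
    ("cs1", {"beq"}, "cs8"),
    ("cs1", {"j"}, "cs9"),
    ("cs2", None, "cs3"),
    ("cs3", None, "cs0"),
    ("cs4", {"sw"}, "cs5"),
    ("cs4", {"lw"}, "cs6"),
    ("cs5", None, "cs0"),
    ("cs6", None, "cs7"),
    ("cs7", None, "cs0"),
    ("cs8", None, "cs0"),
    ("cs9", None, "cs0"),
]


def next_state(present, group):
    """Model of PLA2: return the next state of the first matching term."""
    for state, groups, nxt in NEXT_STATE_TERMS:
        if state == present and (groups is None or group in groups):
            return nxt
    raise ValueError(f"no term for state {present} and group {group}")


def trace(group):
    """Model of the state register: states visited for one instruction."""
    states = ["cs0"]
    while True:
        nxt = next_state(states[-1], group)
        if nxt == "cs0":  # back at the start, the instruction is finished
            return states
        states.append(nxt)


for group in ["R-class", "sw", "lw", "beq", "j"]:
    print(group, trace(group))
```

Under this model an lw passes through five control states, R-class and sw instructions through four, and beq and j through three.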
ROM: A suitable alternative to PLA?
Size of a ROM
A ROM (Read Only Memory) is a general-purpose component that takes n inputs as an address and generates m outputs. It holds $2^n$ words of m bits each, one for every possible input combination, because a ROM stores all possible minterms.
Size of a PLA
A PLA holds k terms, where k is the number of rows in the compact truth table. This is because a PLA stores only the product terms needed for a sum-of-products representation of the required logic function, and each term may contain don't cares.
Comparison
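Since a PLA holds only the required product terms, it never needs more rows than a ROM for the same function ($k \le 2^n$), and usually needs far fewer. Take the following example, in which one PLA term with two don't cares corresponds to four ROM words. For a PLA: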
inputs | outputs |
---|---|
0X110X | 1010 |
For a ROM:
inputs | outputs |
---|---|
001100 | 1010 |
001101 | 1010 |
011100 | 1010 |
011101 | 1010 |
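The expansion of a PLA term into ROM addresses can be sketched as follows. Each X doubles the number of addresses, so the term 0X110X covers $2^2 = 4$ ROM words. The function name `expand_term` is ours.

```python
from itertools import product


def expand_term(term):
    """Expand a PLA input term with X don't cares into all matching ROM addresses."""
    # Each position contributes either its fixed bit or both bit values.
    choices = ["01" if ch in "Xx" else ch for ch in term]
    return ["".join(bits) for bits in product(*choices)]


print(expand_term("0X110X"))
# ['001100', '001101', '011100', '011101']
```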
Control state transitions:
Present State | Opcode | Next State |
---|---|---|
0000 | XXXXXX | 0001 |
0001 | 000000 | 0010 |
0001 | 10X011 | 0100 |
0001 | 000100 | 1000 |
0001 | 000010 | 1001 |
0010 | XXXXXX | 0011 |
0011 | XXXXXX | 0000 |
0100 | 101011 | 0101 |
0100 | 100011 | 0110 |
0101 | XXXXXX | 0000 |
0110 | XXXXXX | 0111 |
0111 | XXXXXX | 0000 |
1000 | XXXXXX | 0000 |
1001 | XXXXXX | 0000 |
For a ROM, a total of 10 inputs are required: 4 for the present state and 6 for the opcode. The memory needs to hold $2^{10}$ words, i.e., 1024 words. In addition, every input combination needs its own word, so to generate next state 0100 from present state 0001, the words for opcodes 100011 and 101011 must both hold 0100. In comparison, a PLA only needs to hold k = 14 terms. Hence, a PLA is far more compact than a ROM here.
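To make the size difference concrete, the sketch below fills a ROM from the 14 PLA terms. The terms follow the compact present-state/instruction-group table, with state csN encoded as the 4-bit value N and the opcodes used in this section. Filling unused addresses with 0000 is our assumption; the text does not say what those words should contain.

```python
from itertools import product

# (present state, opcode pattern, next state); X marks a don't care.
PLA_TERMS = [
    ("0000", "XXXXXX", "0001"),
    ("0001", "000000", "0010"),  # R-class
    ("0001", "10X011", "0100"),  # lw (100011) and sw (101011)
    ("0001", "000100", "1000"),  # beq
    ("0001", "000010", "1001"),  # j
    ("0010", "XXXXXX", "0011"),
    ("0011", "XXXXXX", "0000"),
    ("0100", "101011", "0101"),  # sw
    ("0100", "100011", "0110"),  # lw
    ("0101", "XXXXXX", "0000"),
    ("0110", "XXXXXX", "0111"),
    ("0111", "XXXXXX", "0000"),
    ("1000", "XXXXXX", "0000"),
    ("1001", "XXXXXX", "0000"),
]


def expand(pattern):
    """All concrete bit strings matched by a pattern containing X."""
    choices = ["01" if ch == "X" else ch for ch in pattern]
    return ["".join(bits) for bits in product(*choices)]


def build_rom(terms, unused="0000"):
    """Return a dict mapping every 10-bit address to a next-state word."""
    n = len(terms[0][0]) + len(terms[0][1])
    # Assumption: addresses not covered by any term hold the 'unused' word.
    rom = {format(addr, f"0{n}b"): unused for addr in range(2 ** n)}
    for state, opcode, nxt in terms:
        for addr in expand(state + opcode):
            rom[addr] = nxt
    return rom


rom = build_rom(PLA_TERMS)
print(len(PLA_TERMS), len(rom))  # 14 1024
print(rom["0001" + "100011"], rom["0001" + "101011"])  # 0100 0100
```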
Microprogrammed controller
In this style of design, the controller is treated as a small computer with a memory block (the micro program memory) that generates 19 bits of datapath control signals and 2 bits that control the sequence in which the micro program is executed. The micro program counter (micro PC) is a 4-bit register that steps through the words of the micro program memory so that the right signals are generated at the right time. The micro sequencer ensures that the right address is put into the micro PC.
In reference to the first diagram on this page, there are 2 places where branching takes place: cs1 to cs2, cs4, cs8 or cs9, and cs4 to cs5 or cs6. In micro programming, a multi-way branch is called a dispatch, so our design has 2 dispatches.
The value loaded into the micro PC is one of four choices: PC+1 (the incremented micro PC), reset, dispatch 1 or dispatch 2, which is why 2 sequencing bits are enough. The dispatches generate the target address when branching takes place.
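A minimal sketch of the micro sequencer is given below. It assumes that control-store addresses 0 to 9 correspond to control states cs0 to cs9, so the sequencing field of each word and the two dispatch tables follow directly from the next-state table. The symbolic field names and the table layout are our assumptions; the actual 2-bit encoding is not specified here.

```python
# Sequencing field of each control-store word, addresses 0..9 (assumed to
# match cs0..cs9). "seq" means micro PC + 1.
SEQ_FIELD = ["seq", "dispatch1", "seq", "reset", "dispatch2",
             "reset", "seq", "reset", "reset", "reset"]

# Dispatch tables: instruction group -> target control-store address.
DISPATCH1 = {"R-class": 2, "sw": 4, "lw": 4, "beq": 8, "j": 9}
DISPATCH2 = {"sw": 5, "lw": 6}


def next_upc(upc, group):
    """Value the micro sequencer loads into the 4-bit micro PC."""
    field = SEQ_FIELD[upc]
    if field == "seq":
        return (upc + 1) & 0xF  # keep the result within 4 bits
    if field == "reset":
        return 0
    if field == "dispatch1":
        return DISPATCH1[group]
    return DISPATCH2[group]  # field == "dispatch2"


def run(group):
    """Control-store addresses visited while executing one instruction."""
    upc, visited = 0, []
    while True:
        visited.append(upc)
        upc = next_upc(upc, group)
        if upc == 0:
            return visited


print(run("lw"))  # [0, 1, 4, 6, 7]
print(run("j"))   # [0, 1, 9]
```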
Microprogram
Take the example of the following micro program:
Label | Micro instruction |
---|---|
first | fetch, PCinc, seq |
 | rs2A, rt2B, Paddr, dispatch1 |
1a | arith, seq |
 | res2rd, reset |
1b | Maddr, dispatch2 |
2a | m_wr, reset |
2b | m_rd, seq |
 | mem2rt, reset |
1c | branch, reset |
1d | jump, reset |
The first micro instructions (labelled first) are common to all instructions. In a label such as 1a, the 1 is the dispatch number and the a identifies the branch taken by that dispatch. Micro programs are written in symbolic form; a micro assembler translates them into the contents of the control store, i.e., the micro program memory. There are 2 styles in which micro programs can be written:
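The first step of a micro assembler, splitting the symbolic program into control-store words and resolving labels to addresses, can be sketched as below. The sketch assumes that every micro instruction ends with its sequencing operation (seq, reset, dispatch1 or dispatch2); this is our reading of the sample program, not a documented syntax. It does not produce bit patterns, since the signal encodings are not given.

```python
SEQ_OPS = {"seq", "reset", "dispatch1", "dispatch2"}

SOURCE = """first: fetch, PCinc, seq, rs2A, rt2B, Paddr, dispatch1
1a: arith, seq, res2rd, reset
1b: Maddr, dispatch2
2a: m_wr, reset
2b: m_rd, seq, mem2rt, reset
1c: branch, reset
1d: jump, reset"""


def assemble(source):
    """Return (words, labels): the control-store words and label addresses."""
    words, labels = [], {}
    for line in source.splitlines():
        label, body = line.split(":", 1)
        labels[label.strip()] = len(words)  # address of the next word
        current = []
        for op in (tok.strip() for tok in body.split(",")):
            if op in SEQ_OPS:
                # Assumed rule: a sequencing operation closes the word.
                words.append((tuple(current), op))
                current = []
            else:
                current.append(op)
        if current:
            raise ValueError(f"micro instruction without sequencing op: {current}")
    return words, labels


words, labels = assemble(SOURCE)
print(len(words))  # 10
for addr, (ops, seq) in enumerate(words):
    print(addr, ops, seq)
print(labels)
```

With this reading the program occupies 10 control-store words, one per control state, and the labels 1a to 1d and 2a, 2b give the target addresses of dispatch 1 and dispatch 2.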
Horizontal microprogramming
The micro program above follows this style. In this style, several micro operations can be performed concurrently within the same micro instruction. It generally performs better, but because the control signals are minimally encoded, the micro instructions are wide.
Vertical microprogramming
This style supports less concurrency between micro operations and has lower memory requirements (maximal encoding), but at the cost of performance.
Microcode: Trade-offs
Microcode in general is easier to design and write. Because the values are held in memory, they are easier to change, and microcode can emulate other architectures and make use of internal registers. Implementation requires an off-chip ROM, however, and the disadvantage is that an off-chip ROM is slower.