Control Unit - CarlosCraveiro/RISCV_based_processor GitHub Wiki
As described before, the processor architecture is inspired by the RV32C standard. However, the project works with 8 registers, 16-bit registers and modified 16-bit instructions, so the instruction set is called RV16Cm (RISC-V 16-bit compact-modified instruction set). This name was conceived by the designers, who are not aware of any such standard existing previously, but it is a convenient way of adapting the RV standards for didactic purposes.
Firstly, it is important to present the RV32C instruction set, which inspires most of RV16Cm. RV32C instructions are 16 bits wide (in accordance with their purpose as compact instructions), so, technically, the 16-bit architecture could be compatible with RV32C. However, the state machine of this project was never intended to implement every RV32C instruction, both for simplicity and for hardware reasons (for instance, the ALU was chosen not to support floating-point numbers), so the RV32C instructions were modified to create the so-called RV16Cm. To visualize the modifications, the following tables show both sets of instruction formats.
| Format | Meaning | Fields (bit 15 down to bit 0) |
|---|---|---|
| CR | Register | funct4 [15:12], rd/rs1 [11:7], rs2 [6:2], op [1:0] |
| CI | Immediate | funct3 [15:13], imm [12], rd/rs1 [11:7], imm [6:2], op [1:0] |
| CSS | Stack-relative Store | funct3 [15:13], imm [12:7], rs2 [6:2], op [1:0] |
| CIW | Wide Immediate | funct3 [15:13], imm [12:5], rd [4:2], op [1:0] |
| CL | Load | funct3 [15:13], imm [12:10], rs1 [9:7], imm [6:5], rd [4:2], op [1:0] |
| CS | Store | funct3 [15:13], imm [12:10], rs1 [9:7], imm [6:5], rs2 [4:2], op [1:0] |
| CB | Branch | funct3 [15:13], offset [12:10], rs1 [9:7], offset [6:2], op [1:0] |
| CJ | Jump | funct3 [15:13], jump target [12:2], op [1:0] |
- Table that describes RV32C instruction formats. Source: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-209.pdf
| Fields | 3 bits | 5 bits | 3 bits | 3 bits | 2 bits |
|---|---|---|---|---|---|
| CRm | Funct3 | 00000 | rd/rs1 | rs2 | op |
| CIm | Funct3 | Imm8[7:3] | rd/rs1 | Imm8[2:0] | op |
| CLm | Funct3 | Addr8[7:3] | rd | Addr8[2:0] | op |
| CSm | Funct3 | Addr8[7:3] | rs1 | Addr8[2:0] | op |
| CBm | Funct3 | Addr11[7:3] | Addr11[10:8] | Addr11[2:0] | op |
- Table that describes RV16Cm instruction formats. Source: the authors
It is important to point out some main differences:
- RV16Cm uses only 5 instruction formats: CR (Compact-Register), CI (Compact-Immediate), CL (Compact-Load), CS (Compact-Store) and CB (Compact-Branch). So, naturally, RV16Cm instructions do not support stack-relative stores, unconditional branching to absolute values (jump) or wide immediates.
- Not all the functions of each original instruction format are implemented. This is easy to see in CRm, which has a Funct3 field instead of Funct4 (allowing fewer instructions than the original CR).
- The load, store and branch formats had their immediates renamed to addresses and have only one register field (load and store) or none (branch), which is exactly one fewer than the original CL/CS/CB formats. For simplicity of immediate handling and instruction implementation, loads, stores and branches use absolute addresses only. This has implications for register and memory access, which are discussed in a later section.
- The fields do not vary in size across the instruction formats (every row of the RV16Cm table uses the same five field widths). In the original formats, bit 5 (counting from right to left), for example, can be part of the 5-bit field rs2 in the CR format or of the 8-bit immediate imm in CIW. In RV16Cm, it is always part of a 3-bit field.
These modifications all target a simpler and more didactic implementation of a RISC-V-inspired processor, the last one being perhaps the most important for didactic purposes.
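Because every RV16Cm format uses the same five field widths, one fixed set of bit slices is enough to split any instruction. The following is only a minimal sketch of that idea; the module name `rv16cm_fields` and the port names `hi5`, `mid3` and `lo3` are hypothetical and are not taken from the project sources.

```verilog
// Hypothetical helper (not from the project sources): splits a 16-bit
// RV16Cm instruction into the five fixed-width fields of the format table.
module rv16cm_fields (
    input  wire [15:0] instr,
    output wire [2:0]  funct3, // bits 15:13
    output wire [4:0]  hi5,    // bits 12:8 (zeros, Imm8[7:3] or Addr[7:3])
    output wire [2:0]  mid3,   // bits 7:5  (rd/rs1, rd, rs1 or Addr11[10:8])
    output wire [2:0]  lo3,    // bits 4:2  (rs2, Imm8[2:0] or Addr[2:0])
    output wire [1:0]  op      // bits 1:0
);
    assign funct3 = instr[15:13];
    assign hi5    = instr[12:8];
    assign mid3   = instr[7:5];
    assign lo3    = instr[4:2];
    assign op     = instr[1:0];
endmodule
```

For instance, the word 16'h0028 splits into funct3 = 000, hi5 = 00000, mid3 = 001, lo3 = 010 and op = 00, which is a CRm add with rd = 1 and rs2 = 2.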
As mentioned earlier, the modified instruction formats do not implement all of the original instructions. Some instructions are not possible due to field changes (CRm cannot have all instructions because it uses Funct3 instead of Funct4, for example), some lose their meaning, and others are impossible because of other hardware limitations of the project (such as the lack of floating point). However, that does not mean the instructions implemented here are the only possible ones: there is definitely room for adding new instructions and states to the finite state machine. The 9 instructions implemented here were chosen because they are among the most basic ones and are enough for basic operations; with a proper assembler, this machine can run non-trivial programs. The following sub-sections describe the implemented instructions.
The Compact Register modified (CRm) format implements the 5 functions of the ALU, operating on registers rd and rs2 and storing the result in rd. Its op field equals 00 and the Funct3 field distinguishes between the 5 different instructions. The following table describes the fields and meaning of each instruction of the CRm format.
| Instruction | Operation | Funct3 | Unused | rd/rs1 | rs2 | Op |
|---|---|---|---|---|---|---|
| add rd, rs2 | rd <- rd + rs2 | 000 | 00000 | rd/rs1[2:0] | rs2[2:0] | 00 |
| sub rd, rs2 | rd <- rd - rs2 | 001 | 00000 | rd/rs1[2:0] | rs2[2:0] | 00 |
| and rd, rs2 | rd <- rd AND rs2 | 010 | 00000 | rd/rs1[2:0] | rs2[2:0] | 00 |
| or rd, rs2 | rd <- rd OR rs2 | 011 | 00000 | rd/rs1[2:0] | rs2[2:0] | 00 |
| slt rd, rs2 | rd <- rd SLT rs2 | 101 | 00000 | rd/rs1[2:0] | rs2[2:0] | 00 |
- Table that describes the instructions of CRm format. Source: the authors
Notice that:
- The Funct3 field of each operation is exactly the code that the ALU uses to represent that operation, meaning that no extra decoder is needed for the ALU.
- The choice of a Funct3 field instead of the original Funct4 is not due to lack of space, since there are 5 unused bits that could hold it or encode more registers if the architecture needed them. As mentioned earlier, the choice targets simplicity.
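To illustrate why no extra decoder is needed, the sketch below shows an ALU whose control input uses the Funct3 codes from the CRm table directly. It is not the project's ALU: the module and port names are hypothetical, the signed comparison for slt is an assumption, and so is deriving the zero flag combinationally from the result.

```verilog
// Hypothetical 16-bit ALU sketch: alu_control takes the same codes as the
// Funct3 column of the CRm table, so Funct3 can drive it directly.
module alu_sketch (
    input  wire [15:0] src_a,
    input  wire [15:0] src_b,
    input  wire [2:0]  alu_control,
    output reg  [15:0] alu_result,
    output wire        zero
);
    always @(*) begin
        case (alu_control)
            3'b000:  alu_result = src_a + src_b;  // add
            3'b001:  alu_result = src_a - src_b;  // sub
            3'b010:  alu_result = src_a & src_b;  // and
            3'b011:  alu_result = src_a | src_b;  // or
            // slt: assumed here to be a signed comparison
            3'b101:  alu_result = ($signed(src_a) < $signed(src_b)) ? 16'd1 : 16'd0;
            default: alu_result = 16'd0;          // unused codes
        endcase
    end

    // Assumption: the zero flag is derived from the current result
    assign zero = (alu_result == 16'd0);
endmodule
```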
The Compact Immediate modified (CIm) format implements ALU functions that operate on immediates. For the scope of this project, only one function is implemented: addi, which adds an immediate to a register and saves the result in the same register. This keeps the processor functional (addi is especially important for iterating) without making its implementation redundant (the other 4 ALU operations are already available on registers). Its op field is 01, and the 8-bit immediate is sign-extended to 16 bits before being added to the register operand. The following table describes the fields and meaning of the sole instruction implemented for the CIm format.
| Instruction | Operation | Funct3 | Imm[7:3] | rd/rs1 | Imm[2:0] | Op |
|---|---|---|---|---|---|---|
| addi rd, Imm | rd <- rd + s_ext(Imm) | 000 | Imm[7:3] | rs1[2:0] | Imm[2:0] | 01 |
- Table that describes the instructions of CIm format. Source: the authors
The Compact Load modified (CLm) format implements the load instructions, which bring data from memory into a register. Again, only one instruction is implemented: load word (lw), which loads one word of data (16 bits in this architecture) from an address into a register. Its op field is 10, which it shares with the CSm instruction format; the two differ in the Funct3 field. The following table describes the fields and meaning of the load word instruction implemented for the CLm format:
| Instruction | Operation | Funct3 | Addr[7:3] | rd | Addr[2:0] | Op |
|---|---|---|---|---|---|---|
| lw rd,Addr | rd <- M[Addr] | 000 | Addr[7:3] | rd[2:0] | Addr[2:0] | 10 |
- Table that describes the instructions of CLm format. Source: the authors
The Compact Store modified (CSm) format implements the store instructions, which save data from a register into memory. It also has only one instruction: store word (sw), which saves one word of data (16 bits in this architecture) from a register to a memory address. Its op field is also 10. The following table describes the fields and meaning of the store word instruction implemented for the CSm format:
| Instruction | Operation | Funct3 | Addr[7:3] | rs1 | Addr[2:0] | Op |
|---|---|---|---|---|---|---|
| sw rs1,Addr | M[Addr] <- rs1 | 001 | Addr[7:3] | rs1[2:0] | Addr[2:0] | 10 |
- Table that describes the instructions of CSm format. Source: the authors
The Compact Branch modified (CBm) format implements the conditional branch instructions, which set the PC to a given address when a condition holds. Once more, because additional instructions would only add complexity and be redundant for didactic purposes, there is only one: bneqz, which branches if the zero flag is false and can be used extensively in loops, for example. Its op field is 11. The following table describes the fields and meaning of the branch if not equal to zero instruction of the CBm format:
| Instruction | Operation | Funct3 | Addr[7:3] | Addr[10:8] | Addr[2:0] | Op |
|---|---|---|---|---|---|---|
| bneqz Addr | PC <- Addr if Zero==False | 000 | Addr[7:3] | Addr[10:8] | Addr[2:0] | 11 |
- Table that describes the instructions of CBm format. Source: the authors
PS: The PC is a register named program counter, described in the registers section.
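The field tables of the five formats can be turned into encoding helpers. The following is a small simulation-only sketch, not part of the project: the function names and the register names r1 and r3 in the messages are hypothetical, and only the field layouts come from the tables above.

```verilog
// Hypothetical encoders built from the RV16Cm field tables.
module rv16cm_encode_demo;
    // CRm: funct3 | 00000 | rd/rs1 | rs2 | op = 00
    function [15:0] enc_crm(input [2:0] funct3, input [2:0] rd, input [2:0] rs2);
        enc_crm = {funct3, 5'b00000, rd, rs2, 2'b00};
    endfunction

    // CIm addi: 000 | Imm[7:3] | rd/rs1 | Imm[2:0] | op = 01
    function [15:0] enc_addi(input [2:0] rd, input [7:0] imm);
        enc_addi = {3'b000, imm[7:3], rd, imm[2:0], 2'b01};
    endfunction

    // CLm lw: 000 | Addr[7:3] | rd | Addr[2:0] | op = 10
    function [15:0] enc_lw(input [2:0] rd, input [7:0] addr);
        enc_lw = {3'b000, addr[7:3], rd, addr[2:0], 2'b10};
    endfunction

    // CSm sw: 001 | Addr[7:3] | rs1 | Addr[2:0] | op = 10
    function [15:0] enc_sw(input [2:0] rs1, input [7:0] addr);
        enc_sw = {3'b001, addr[7:3], rs1, addr[2:0], 2'b10};
    endfunction

    // CBm bneqz: 000 | Addr[7:3] | Addr[10:8] | Addr[2:0] | op = 11
    function [15:0] enc_bneqz(input [10:0] addr);
        enc_bneqz = {3'b000, addr[7:3], addr[10:8], addr[2:0], 2'b11};
    endfunction

    initial begin
        $display("add   r1, r2    -> %h", enc_crm(3'b000, 3'd1, 3'd2)); // 0028
        $display("addi  r1, -1    -> %h", enc_addi(3'd1, 8'hFF));       // 1f3d
        $display("lw    r3, 0x10  -> %h", enc_lw(3'd3, 8'h10));         // 0262
        $display("sw    r3, 0x10  -> %h", enc_sw(3'd3, 8'h10));         // 2262
        $display("bneqz 0x004     -> %h", enc_bneqz(11'h004));          // 0013
    end
endmodule
```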
This sub-section describes the memory addresses that are accessed by the processor. First of all, the RAM is not implemented on the FPGA and is used as an external component (on the test benches, the Altera RAM memory from the FPGA kit is used). The instructions section describes how many address bits each instruction format carries: the lw and sw instructions receive an 8-bit address, while bneqz receives an 11-bit address. Mapping this to memory with some math:
$ LSWAddr = 2^{8} - 1 = 255 = 0x0FF $
$ BAddr = 2^{11} - 1 = 2047 = 0x7FF $
where LSWAddr and BAddr are the largest addresses that the load/store and branch instructions can access, respectively. This is a limitation of the choice to access addresses only by absolute value: the whole address must fit into the instruction and, with a reduced instruction of 16 bits, fewer addresses are accessible.
So, basically, the data memory range is 0x000 - 0x0FF, while the program memory range is 0x000 - 0x7FF.
Since the architecture is based on RISC-V but does not follow it exactly, it is important to state clearly the adaptations made in RV16Cm:
| | RV32C | RV16Cm |
|---|---|---|
| Data bus | 32 bits | 16 bits |
| Instruction size | 16 bits | 16 bits |
| Register size | 32 bits | 16 bits |
| Number of registers | 16 | 8 |
| Register instructions | 2 registers | 2 registers |
| | rd <- rd op rs | rd <- rd op rs |
| Load instructions | rt <- M[rs + sign_ext(imm)] | rd <- M[imm] |
| Store instructions | M[rs + sign_ext(imm)] <- rt | M[imm] <- rs1 |
| Branch instructions | U/J type | J type |
| | bneq rs,rt, label | bneqz label |
| | if rs != rt, PC <- PC + sign_ext(label) | if ZF=0, PC <- label |
With every instruction properly documented, the control unit can be fully described as a finite state machine (FSM) that goes through all the states needed to perform the instructions: fetch, decode and execute. But since the processor is multi-cycle, one instruction may take more than one clock cycle to execute, meaning that the FSM is not as simple as 3 states. Firstly, it is important to list every input and output that the FSM must handle. The main diagram of the processor shows all the hardware attached to the control machine, and thus all the inputs and outputs it has to deal with. That information is summarized in the following table:
| Variable | Input or Output | Number of bits | Meaning |
|---|---|---|---|
| op | Input | 2 | It is the op field from the instruction |
| Funct3 | Input | 3 | It is the Funct3 field from the instruction |
| Zero | Input | 1 | It is the Zero flag from the ALU, which indicates whether its last result was 0 |
| PCUpdate | Output | 1 | Controls if the PC should be updated |
| Branch | Output | 1 | Controls if it is a branch case |
| AdrSrc | Output | 1 | Controls a MUX that determines whether the address fed to the memory is read from the PC (0) or from the output of the ResultSrc MUX (1) |
| MemWrite | Output | 1 | Determines if the memory should be written - connected to WriteEnable of the memory |
| IRWrite | Output | 1 | Determines if the Instruction Register should be written to - connected to its enable |
| ResultSrc | Output | 2 | Controls a MUX that sends its result to the PC (as input) and to the MUX controlled by AdrSrc. The inputs of the MUX are a register that holds the ALU result (ALUOut, 00), another register that holds the data read from memory (Data, 01) and the direct result from the switching circuit of the ALU (ALUResult, 10) |
| ALUControl | Output | 3 | Controls the ALU different operations |
| ALUSrcB | Output | 2 | Controls whether the SrcB input of the ALU is the register that holds the RD2 output of the register bank (00), the sign/zero extension from the Extend block (01), or a hardwired 2 (10) |
| ALUSrcA | Output | 2 | Controls whether the SrcA input of the ALU is the PC (00), a hardwired 0 (01), or the register that holds the RD1 output of the register bank (10) |
| ImmSrc | Output | 2 | Controls the Extend module, stating whether the immediate should be sign- or zero-extended and where the immediate is placed in the instruction |
| RegWrite | Output | 1 | Controls the write on the register bank - connected to WE3 |
- Table that contains every input and output of the finite state machine
It is important to note that:
- The Funct3 field that selects the operation to be done on the ALU, as described earlier, uses the same operation codes as ALUControl, which means that in the instructions where it is used one is connected directly to the other without any additional decoder.
- The hardwired 2 on the MUX controlled by ALUSrcB is used to increment the PC in Fetch, while the hardwired 0 on the MUX controlled by ALUSrcA lets an extended immediate pass through the ALU unchanged (0 + immediate) in MemAdr and BNEQZ.
- PCUpdate and Branch are not actually outputs of the control unit, but only of the finite state machine. Together, through a switching circuit, both compose PCWrite, which determines whether the PC is written and is directly attached to the EN of the PC register. Since bneqz branches when the zero flag is false, the switching circuit that defines PCWrite is: $ PCWrite = (PCUpdate) OR (Branch AND (NOT Zero)) $
Secondly, it is just as important to list all the states conceived for the FSM:
| State name | State function |
|---|---|
| Fetch | Fetches the next instruction from the address in PC and increments PC |
| Decode | Decodes the instruction and jumps to the corresponding state |
| MemAdr | Gets the memory address from an instruction by parsing it in the immediate module and adding it to 0 on the ALU |
| MemRead | Uses the ALUOut register (holding the address computed previously) to read an address from memory |
| MemWB | Writes the data read from memory back to the corresponding register of the register file |
| MemWrite | Accesses the address in ALUOut and writes to it the value of RD1 (one of the two simultaneous read outputs of the register bank) |
| ExecuteR | Executes a CRm instruction by operating on the right registers of the register bank and performing the corresponding operation on the ALU |
| ALUWB | Writes the ALU result back to the register bank |
| ExecuteI | Executes a CIm instruction by correctly parsing the immediate in the immediate module and performing the corresponding operation on the ALU |
| BNEQZ | Parses the immediate accordingly in the immediate module, adds it to 0 on the ALU and places the result in PC to branch |
- Table that describes the states of the FSM
With all control variables of the control unit defined, the following truth table describes how each state should behave (d marks a don't-care bit; Branch and PCUpdate together form PCWrite):
| State | Branch | PCUpdate | AdrSrc | MemWrite | IRWrite | ResultSrc | ALUControl | ALUSrcB | ALUSrcA | ImmSrc | RegWrite |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Fetch | 0 | 1 | 0 | 0 | 1 | 10 | 000 | 10 | 00 | dd | 0 |
| Decode | 0 | 0 | d | 0 | 0 | dd | ddd | dd | dd | dd | 0 |
| MemAdr | 0 | 0 | d | 0 | 0 | dd | 000 | 01 | 01 | 01 | 0 |
| MemRead | 0 | 0 | 1 | 0 | 0 | 00 | ddd | dd | dd | dd | 0 |
| MemWB | 0 | 0 | d | 0 | 0 | 01 | ddd | dd | dd | dd | 1 |
| MemWrite | 0 | 0 | 1 | 1 | 0 | 00 | ddd | dd | dd | dd | 0 |
| ExecuteR | 0 | 0 | d | 0 | 0 | dd | funct3 | 00 | 10 | dd | 0 |
| ALUWB | 0 | 0 | d | 0 | 0 | 00 | funct3 | dd | dd | dd | 1 |
| ExecuteI | 0 | 0 | d | 0 | 0 | dd | funct3 | 01 | 10 | 00 | 0 |
| BNEQZ | not(ZF) | 0 | d | 0 | 0 | 10 | 000 | 01 | 01 | 11 | 0 |
The truth table allows a classification of the finite state machine (FSM) that composes the control unit: the outputs depend only on the current state, not on the inputs, which means the FSM is a Moore machine. There are 2 clear exceptions that would invalidate this classification: in the ExecuteR, ALUWB and ExecuteI states, the ALUControl signal follows the Funct3 input from the instruction, and in the BNEQZ state, the Branch signal follows the ZF input from the ALU. Instead of reclassifying the FSM as a Mealy machine (a more general case, but one that could make the implementation less modular), it was chosen to isolate those outputs from the FSM.
So, basically, the ALUControl output comes from a separate decoder that uses the Funct3 field, while the Branch output is not implemented as a signal: instead, a switching circuit uses the current state directly (this is logically equivalent to an implicit Branch output that is always 1 in the BNEQZ row of the truth table). To clarify, the code snippet that sets pc_write (located in control_unit.v) is:
pc_write = pc_update | (~zero_flag & (curr_state == `BNEZ));
As described, the switching circuit sits outside the FSM (which is described in CU_main_decoder.v), directly in the control unit, and it does not use a branch signal, only the curr_state signal.
Another important point is that the truth table contains many don't-care terms. This indicates that many parts of the processor are unused in many states, meaning that multiple states could be processed simultaneously. That is, the don't-care terms suggest that the processor's speed would improve considerably with a future pipelined implementation.
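As a rough illustration of how the truth table maps to hardware, the sketch below writes the state-only (Moore) outputs as one case statement that fills a 13-bit vector. It is not the contents of CU_main_decoder.v: the module name, the port names, the bit order inside buffer_out, the combinational always block and the choice of 0 for every don't-care bit are all assumptions. ALUControl and Branch are left out, as in the text.

```verilog
// Hypothetical Moore output decoder for the FSM truth table.
// Don't-care bits are filled with 0 here (an arbitrary choice).
module main_decoder_sketch (
    input  wire [3:0] curr_state,
    output wire       pc_update,
    output wire       adr_src,
    output wire       mem_write,
    output wire       ir_write,
    output wire [1:0] result_src,
    output wire [1:0] alu_src_b,
    output wire [1:0] alu_src_a,
    output wire [1:0] imm_src,
    output wire       reg_write
);
    // Binary state encoding, same values as the state parameters of the project
    localparam fetch    = 4'b0000, decode = 4'b0001, memadr   = 4'b0010,
               memread  = 4'b0011, memwb  = 4'b0100, memwrite = 4'b0101,
               executer = 4'b0110, aluwb  = 4'b0111, executei = 4'b1000,
               bneq     = 4'b1001;

    reg [12:0] buffer_out;
    assign {pc_update, adr_src, mem_write, ir_write, result_src,
            alu_src_b, alu_src_a, imm_src, reg_write} = buffer_out;

    always @(*) begin
        case (curr_state)
            // PCUpdate_AdrSrc_MemWrite_IRWrite_ResultSrc_ALUSrcB_ALUSrcA_ImmSrc_RegWrite
            fetch:    buffer_out = 13'b1_0_0_1_10_10_00_00_0;
            decode:   buffer_out = 13'b0_0_0_0_00_00_00_00_0;
            memadr:   buffer_out = 13'b0_0_0_0_00_01_01_01_0;
            memread:  buffer_out = 13'b0_1_0_0_00_00_00_00_0;
            memwb:    buffer_out = 13'b0_0_0_0_01_00_00_00_1;
            memwrite: buffer_out = 13'b0_1_1_0_00_00_00_00_0;
            executer: buffer_out = 13'b0_0_0_0_00_00_10_00_0;
            aluwb:    buffer_out = 13'b0_0_0_0_00_00_00_00_1;
            executei: buffer_out = 13'b0_0_0_0_00_01_10_00_0;
            bneq:     buffer_out = 13'b0_0_0_0_10_01_01_11_0;
            default:  buffer_out = 13'b0;  // invalid state: all outputs 0
        endcase
    end
endmodule
```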
With all input variables of the control unit defined, the following state table describes how each state transitions according to the inputs:
| Current State | Op Field | Funct3 Field | Next State |
|---|---|---|---|
| Fetch | dd | ddd | Decode |
| Decode | 00 | ddd | ExecuteR |
| | 01 | ddd | ExecuteI |
| | 10 | ddd | MemAdr |
| | 11 | ddd | BNEQZ |
| MemAdr | 10 | 000 | MemRead |
| | 10 | 001 | MemWrite |
| MemRead | dd | ddd | MemWB |
| MemWB | dd | ddd | Fetch |
| MemWrite | dd | ddd | Fetch |
| ExecuteR | dd | ddd | ALUWB |
| ALUWB | dd | ddd | Fetch |
| ExecuteI | dd | ddd | ALUWB |
| BNEQZ | dd | ddd | Fetch |
flowchart TB
Fetch((______FETCH______<br> AdrSrc = 0 <br> IRWrite <br> ALUSrcA = 00 <br> ALUSrcB = 10 <br> ALUControl = 000 <br> ResultSrc = 10 <br> PCUpdate)) --> Decode((DECODE))
Decode --> |"op = 10 (CLm or CSm)"|MemAdr((MemAdr <br> ALUSrcA = 01 <br> ALUSrcB = 01 <br> ALUControl = 000))
Decode -->|"op = 00 (CRm)"| ExecuteR((ExecuteR <br> ALUSrcA = 10 <br> ALUSrcB = 00 <br> ALUControl = funct3))
Decode -->|"op = 01 (CIm)"| ExecuteI((ExecuteI <br> ALUSrcA = 10 <br> ALUSrcB = 01 <br> ALUControl = funct3))
Decode -->|"op = 11 (CBm)"| BNEQZ((BNEQZ <br> ALUSrcA = 01 <br> ALUSrcB = 01 <br> ResultSrc = 10 <br> ALUControl = 000 <br> Branch))
MemAdr --> |"funct3 = 000 (CLm)"|MemRead((MemRead <br> ResultSrc = 00 <br> AdrSrc = 1))
MemAdr --> |"funct3 = 001 (CSm)"|MemWrite((MemWrite <br> ResultSrc = 00 <br> AdrSrc = 1 <br> MemWrite))
MemRead --> MemWB((MemWB <br> ResultSrc = 01 <br> RegWrite))
ExecuteR & ExecuteI --> ALUWB((ALUWB <br> ResultSrc = 00 <br> RegWrite))
MemWB & MemWrite & ALUWB & BNEQZ --> Fetch
- Flowchart of the FSM
There are some important things to mention about the diagram that describes the FSM:
- On the diagram, the 1-bit signals are 1 where they are mentioned and 0 where they are omitted, while the wider signals are always shown with their corresponding value
- The state transitions are all well defined in terms of inputs, but there can be invalid inputs, which are handled by holding the current state. That is, if the inputs do not match any specified transition, the default is to stay in the same state. This is a design choice meant to minimize state transitions, reducing signal switching and thus heat dissipation on an integrated circuit. Note that no test bench or quantitative analysis was made for this design choice; it simply follows a convention.
- The state-flow always starts on Fetch after a reset
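The transitions of the flowchart, together with the hold-state default described in the list above, can be sketched as combinational next-state logic. This is an illustration under assumptions, not the contents of CU_sequential.v: the module and port names are hypothetical, and only the state encodings and transitions come from this page.

```verilog
// Hypothetical next-state logic following the FSM flowchart.
module next_state_sketch (
    input  wire [3:0] curr_state,
    input  wire [1:0] op,
    input  wire [2:0] funct3,
    output reg  [3:0] next_state
);
    localparam fetch    = 4'b0000, decode = 4'b0001, memadr   = 4'b0010,
               memread  = 4'b0011, memwb  = 4'b0100, memwrite = 4'b0101,
               executer = 4'b0110, aluwb  = 4'b0111, executei = 4'b1000,
               bneq     = 4'b1001;

    always @(*) begin
        // Default: hold the current state when no transition matches
        next_state = curr_state;
        case (curr_state)
            fetch: next_state = decode;
            decode: begin
                case (op)
                    2'b00:   next_state = executer;  // CRm
                    2'b01:   next_state = executei;  // CIm
                    2'b10:   next_state = memadr;    // CLm or CSm
                    2'b11:   next_state = bneq;      // CBm
                    default: next_state = curr_state;
                endcase
            end
            memadr: begin
                if (op == 2'b10 && funct3 == 3'b000)      next_state = memread;
                else if (op == 2'b10 && funct3 == 3'b001) next_state = memwrite;
            end
            memread:  next_state = memwb;
            executer: next_state = aluwb;
            executei: next_state = aluwb;
            memwb, memwrite, aluwb, bneq: next_state = fetch;
            default:  next_state = curr_state;  // unused state codes also hold
        endcase
    end
endmodule
```

With this default, an unused state code can only be left through the reset, which is one consequence of choosing to hold the state on invalid inputs.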
Apart from the main control unit, there is a module called the immediate module that operates in parallel with the FSM, parsing the immediates from the instructions and providing them correctly according to the instruction format. This module is described here in the Control Unit section because the two were implemented together (even on the same dev branch) owing to their shared role of parsing.
Basically, it receives the instruction and a 2-bit signal that comes from the FSM and indicates the immediate format of the instruction. The instruction format descriptions show that the immediate varies in size, having either 8 or 11 bits. Additionally, the immediates must be extended to fit the 16-bit bus to the ALU. This extension can be done in two ways: instructions whose immediates are operands of an ALU operation are sign-extended, while instructions whose immediates are addresses are zero-extended.
Because of this, the immediate module control signal (ImmSrc) is 2 bits wide: the first bit selects the size and the second one selects which extension is performed. The following table describes the control signal of immediate extension:
| ImmSrc | Size of immediate | Extension to be performed | Instruction Format | FSM State that uses it |
|---|---|---|---|---|
| 00 | 8 bits | Sign extension | CIm | ExecuteI |
| 01 | 8 bits | Zero extension | CLm or CSm | MemAdr |
| 10 | 11 bits | Sign extension | - | - |
| 11 | 11 bits | Zero extension | CBm | BNEQZ |
- Table that describes the ImmSrc control of the immediate module
As sign extension of an 11-bit immediate (ImmSrc = 10) should never happen in the conceived FSM, it is not written in the code, and the default case sets the immediate to a 16-bit 0 value.
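The ImmSrc table can be condensed into a single combinational case statement. The sketch below inlines the extensions instead of using the parametrized sign and zero extension modules of the project, and its module and signal names are hypothetical; the bit positions of the immediate and of the address come from the CIm, CLm/CSm and CBm format tables.

```verilog
// Hypothetical immediate-extension sketch driven by ImmSrc.
module imm_extend_sketch (
    input  wire [15:0] instr,
    input  wire [1:0]  imm_src,
    output reg  [15:0] imm_ext
);
    // 8-bit field: instr[12:8] holds bits 7:3, instr[4:2] holds bits 2:0
    wire [7:0]  imm8   = {instr[12:8], instr[4:2]};
    // 11-bit branch address: instr[7:5] additionally holds bits 10:8
    wire [10:0] addr11 = {instr[7:5], instr[12:8], instr[4:2]};

    always @(*) begin
        case (imm_src)
            2'b00:   imm_ext = {{8{imm8[7]}}, imm8};  // 8 bits, sign extension (CIm)
            2'b01:   imm_ext = {8'b0, imm8};          // 8 bits, zero extension (CLm/CSm)
            2'b11:   imm_ext = {5'b0, addr11};        // 11 bits, zero extension (CBm)
            default: imm_ext = 16'b0;                 // ImmSrc = 10 is never used
        endcase
    end
endmodule
```

For example, an addi whose immediate field is 8'hFF yields 16'hFFFF (that is, -1) with ImmSrc = 00, while the same bits read with ImmSrc = 01 yield 16'h00FF.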
Firstly, one important feature of the Verilog code is the description of the states as parameters. For example, the state Fetch is defined as follows:
parameter fetch = 4'b0000;
This representation not only improves code readability, but also allows the developers to use a special feature of the Intel® Quartus® software, which is used for loading the Verilog hardware description onto an Altera FPGA kit (the brand that the developers have access to in the laboratory).
The Quartus® State Machine Editor can change the state values to different sequences (not only the binary sequence 0000, 0001, 0010... but also, for example, the Gray sequence 000, 001, 011...). This allows the developers to test not only the hardcoded default binary sequence, but also other sequences. The sequence chosen for the state values may affect how the synthesis tool maps the circuit onto the FPGA, affecting the number of logic cells, for example.
So, basically, the states are defined as:
parameter fetch = 4'b0000;
parameter decode = 4'b0001;
parameter memadr = 4'b0010;
parameter memread = 4'b0011;
parameter memwb = 4'b0100;
parameter memwrite = 4'b0101;
parameter executer = 4'b0110;
parameter aluwb = 4'b0111;
parameter executei = 4'b1000;
parameter bneq = 4'b1001;
The files that contain the control unit code are modularized as follows:
control_unit.v is the top control unit module, meaning it instantiates the other ones: a decoder called the main decoder that sets most of the outputs (CU_main_decoder.v), a module that updates the state (CU_sequential.v) and another decoder that sets the ALU outputs (alu_decoder.v), in addition to the switching circuit that defines pc_write, explained above.
So, basically, CU_main_decoder.v implements the FSM truth table (apart from ALUControl), while CU_sequential.v implements the FSM state table. As said earlier, alu_decoder.v sets only the ALUControl signal. Inside the decoders there are only case statements (inside always blocks that check for the positive edge of the clock) that are really simple and do nothing beyond implementing the tables described throughout this text. The case statements reflect the Moore machine properties described: the case in CU_main_decoder.v depends only on the state, while the one in alu_decoder.v depends on the funct3 input. The only specific note is that, in CU_main_decoder.v, as there were too many outputs, to make the code briefer they are all assigned to a buffer_out reg that is 13 bits wide and mapped to the outputs accordingly.
So, basically, control_unit.v has some instantiations to wire the decoders together, and the state is updated inside an always block that provides the sequential behaviour on each clock cycle and also implements a synchronous reset. The code snippet for the always block is simply:
always @(posedge clk) begin
    // Updates the state register; non-blocking assignments avoid simulation races
    if (reset == 1'b1) curr_state <= fetch;  // synchronous reset back to Fetch
    else curr_state <= next_state;
end
where next_state is a wire connected to the output of the CU_sequential instance.
Before the instantiations, other regs are defined simply to split the instruction into the funct3 and op fields.
The immediate module follows a similar modularization, with CU_immediate_extension.v as the topmost module, which may call N_to_16_sig_extend.v or N_to_16_zero_extend.v depending on the imm_src signal inside the case statement (in this case, the always block is sensitive to the instruction, meaning the immediate module is a switching circuit and does not depend on clk). N_to_16_sig_extend.v and N_to_16_zero_extend.v are not as trivial as the modules above, but they basically use Verilog syntax and parametrized modules to perform sign/zero extension from N bits (passed as a parameter, used only for N=11 or N=8) to 16 bits.
Besides, all the case statements have a default that sets the outputs to all 0 in case of invalid inputs.