Pipeline protection - janomach/the-hardisc GitHub Wiki
Basic concept
A fault in a system may lead to an unexpected output. A fault-tolerant system must deliver the correct function even in the presence of a fault. This can be achieved by executing the system function in three identical copies and voting on the correct result (2 out of 3), assuming that at most one copy is faulty. Such a system may be a processor with three identical cores executing the same instructions, which leads to an increase of more than 200% in area and power consumption.
The processor delivers the system function through transfers on the data bus interface. If the execution of an instruction is affected by a fault, the processor should avoid sending a bus transfer. Once a bus transfer has been sent, the processor should avoid sending it again. The Hardisc fulfills these two requirements by replicating the pipeline. The first requirement means that it is enough to detect faults before a data bus transfer. The second requirement means that the system must tolerate faults after a bus transfer. This separates the pipeline into two sections. The first section (before a data bus request) is duplicated (pipelines d0 and d1), whereas the second is triplicated (pipelines t0, t1, and t2). When a discrepancy is detected, the instruction is marked as corrupted, and a possible transfer to the data bus is not initiated.
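The two protection styles can be contrasted with a small behavioral sketch. It is not taken from the Hardisc sources; the function names `majority` and `may_issue_transfer` are hypothetical and only model the idea of voting (tolerating a fault) versus comparing (detecting a fault).

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def may_issue_transfer(d0_request: int, d1_request: int) -> bool:
    """Before the bus request, detection is enough: issue only if both copies agree."""
    return d0_request == d1_request


# One faulty copy is outvoted by the two correct ones.
print(hex(majority(0xCAFE, 0xCAFE, 0xCAFF)))
# A mismatch between the duplicated copies suppresses the transfer.
print(may_issue_transfer(0x1000, 0x1004))
```

Comparison needs only two copies but cannot tell which copy is wrong, which is acceptable as long as the instruction can still be restarted. Voting needs three copies and is reserved for the part of the pipeline where a restart is no longer allowed.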
The EX stage expands the duplicated logic into triplicated logic. Pipeline d0 is connected to t0 and t2; d1 is connected only to t1. Load and store instructions do not need the Executor inside the EX stage, so the EX stage contains only two copies of it. The first Executor receives inputs from d0 and saves results to t0 and t2, whereas the second receives inputs from d1 and is connected to t1. The MA stage compares the product registers of pipelines t0 and t1, written in the EX stage, to detect discrepancies between the Executors. If different results are stored in these registers, the instruction is marked as corrupted and directly restarted. TMR protects all EXMA and MAWB registers. Each copy of a register in a TMR circuit is driven by one of the tX pipelines. The voter at the output of the TMR circuit is also triplicated, and each voter copy drives one tX pipeline.
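The fan-out from two Executors to three pipelines, the MA-stage comparison, and a TMR register with a triplicated voter can be sketched as follows. This is a simplified software model under assumptions: `expand`, `ma_detects_fault`, and `TmrRegister` are hypothetical names, and real hardware evaluates all of this in parallel within a clock cycle.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority."""
    return (a & b) | (a & c) | (b & c)


def expand(d0_result: int, d1_result: int) -> tuple:
    """EX stage: the d0 Executor feeds t0 and t2, the d1 Executor feeds t1."""
    return d0_result, d1_result, d0_result  # (t0, t1, t2)


def ma_detects_fault(t0: int, t1: int) -> bool:
    """MA stage: t0 and t1 come from different Executors, so they must match."""
    return t0 != t1


class TmrRegister:
    """Three register copies, each written by one tX pipeline."""

    def __init__(self, value: int = 0):
        self.copies = [value, value, value]

    def write(self, t0: int, t1: int, t2: int) -> None:
        self.copies = [t0, t1, t2]

    def read(self) -> tuple:
        # Three independent voters, one per tX pipeline, so a fault in a
        # single voter affects only one pipeline.
        return tuple(vote(*self.copies) for _ in range(3))


t0, t1, t2 = expand(42, 42)
reg = TmrRegister()
reg.write(t0, t1, t2)
reg.copies[1] ^= 0x8          # inject a single-bit fault into one copy
print(ma_detects_fault(t0, t1), reg.read())
```

Comparing t0 with t1 is sufficient in this model because t2 carries the same Executor output as t0.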
Restarting instruction
If a discrepancy is detected during instruction execution, the instruction is restarted by flushing the pipeline and jumping to the address saved in the Reset Point register, which is protected by TMR. If this register were not present in the architecture and the instruction address were instead propagated through the pipeline registers (and the IFB), TMR would be needed to protect those address registers, which would significantly increase area and power consumption.
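A minimal model of the restart is sketched below. The text does not state when the Reset Point register is updated, so the sketch assumes it advances whenever the oldest instruction completes, and it assumes sequential 4-byte instructions; `RestartModel` and its methods are hypothetical names.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority."""
    return (a & b) | (a & c) | (b & c)


class RestartModel:
    def __init__(self, start: int = 0):
        self.reset_point = [start, start, start]  # TMR-protected copies
        self.pc = start
        self.in_flight = []                       # addresses in the pipeline

    def fetch(self) -> None:
        self.in_flight.append(self.pc)
        self.pc += 4

    def complete_oldest(self) -> None:
        # Assumption: the Reset Point moves to the next unfinished instruction.
        done = self.in_flight.pop(0)
        self.reset_point = [done + 4] * 3

    def restart(self) -> None:
        # Flush the pipeline and jump to the voted Reset Point address.
        self.in_flight.clear()
        self.pc = vote(*self.reset_point)


core = RestartModel()
for _ in range(3):
    core.fetch()
core.complete_oldest()
core.reset_point[2] ^= 0x10   # a fault in one copy is outvoted
core.restart()
print(hex(core.pc), core.in_flight)
```

Only one address needs TMR protection in this scheme, instead of one address per pipeline stage.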
Register File (DEPRECATED)
A single-error correction, double-error detection (SEC-DED) code protects the register file. The syndrome is checked when the register file is read in the OP stage. The checksum is generated in the WB stage, separately inside each tX pipeline. As the register file contains only a single write port, TMR logic selects valid data for the write port from the three pipelines. If a correctable error is detected in the requested register during a read, the instruction is marked as corrupted and restarted from the MA stage. The value read from the register file is routed directly to the OP stage, in parallel with the syndrome evaluation and correction logic. This keeps the correction logic off the critical path. If an uncorrectable error is detected, the instruction is marked as not correctable. In this case, an exception is reported if the instruction propagates to the MA stage.
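The following sketch illustrates SEC-DED behavior on a 32-bit value. The text does not say which code the Hardisc uses, so this assumes a textbook extended Hamming code (38 Hamming positions plus one overall parity bit); `encode` and `decode` are hypothetical names.

```python
PARITY_POS = [1, 2, 4, 8, 16, 32]
DATA_POS = [p for p in range(1, 39) if p not in PARITY_POS]  # 32 positions


def encode(data: int) -> int:
    """Return a 39-bit codeword; bit p (1..38) is Hamming position p, bit 0 is overall parity."""
    word = 0
    for i, pos in enumerate(DATA_POS):
        if (data >> i) & 1:
            word |= 1 << pos
    for p in PARITY_POS:
        # Parity over all positions whose index has bit p set.
        parity = 0
        for pos in range(1, 39):
            if pos & p and (word >> pos) & 1:
                parity ^= 1
        if parity:
            word |= 1 << p
    if bin(word).count("1") & 1:   # make the total number of ones even
        word |= 1
    return word


def decode(word: int):
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 39):
        if (word >> pos) & 1:
            syndrome ^= pos
    odd = bin(word).count("1") & 1
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome <= 38:
        word ^= 1 << syndrome      # syndrome 0 means the overall parity bit flipped
        status = "corrected"
    else:
        status = "uncorrectable"   # even number of flipped bits
    data = 0
    for i, pos in enumerate(DATA_POS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status


cw = encode(0xDEADBEEF)
print(decode(cw ^ (1 << 17)))                 # single-bit fault
print(decode(cw ^ (1 << 5) ^ (1 << 20))[1])   # double-bit fault
```

In terms of the paragraph above, a 'corrected' status corresponds to marking the instruction as corrupted and restarting it, while 'uncorrectable' corresponds to the exception path.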
Automatic Correction Mechanism
A configurable automatic correction mechanism (ACM) decreases the probability of fault accumulation. The ACM ensures the faulty register is corrected before the instruction propagates back to the OP stage. The register file has two read ports, with addresses in the pipeline registers idop_rs1 and idop_rs2. The ACM may be configured through the custom hrdctrl0 CSR to automatically use unused read ports to search for and correct faulty records in the register file. If the instruction in the ID stage does not require a particular read port, or the OP stage is empty, the ACM may replace the read port address supplied by the decoder. The new address is taken from a 5-bit register, acm_add, which is incremented in a round-robin fashion whenever the ACM changes one of the read addresses. This ensures that all registers are checked periodically, which lowers fault accumulation. Addresses for the read ports are taken from the d0 pipeline, and the read values are pushed to both the d0 and d1 pipelines. Each time a correctable error is detected, the ACM saves the corrected value to internal registers so that it can be written back to the register file in the following clock cycles. The ACM postpones the write until the instruction in the WB stage does not require the single write port. The logic inside the ACM is duplicated to protect it from faulty behavior. If the copies of the ACM logic differ, the correction process is terminated, so the faulty value must be detected again before it can be corrected.
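The read-port stealing and the deferred write-back can be modeled roughly as below. The text does not specify what happens when both read ports are free in the same cycle, so this sketch assumes the ACM takes at most one port per cycle and prefers the first; the class `Acm` and its method names are hypothetical, and only `acm_add` comes from the description above.

```python
class Acm:
    def __init__(self):
        self.acm_add = 0        # 5-bit round-robin scrub address
        self.pending = None     # (register, corrected value) awaiting write-back

    def select(self, rs1, rs2, rs1_needed, rs2_needed, op_empty=False, enabled=True):
        """Return the addresses actually applied to the two read ports."""
        rs1_free = op_empty or not rs1_needed
        rs2_free = op_empty or not rs2_needed
        if enabled and rs1_free:
            rs1 = self._next_address()
        elif enabled and rs2_free:
            rs2 = self._next_address()
        return rs1, rs2

    def _next_address(self):
        addr = self.acm_add
        self.acm_add = (self.acm_add + 1) & 0x1F   # wrap after register 31
        return addr

    def report_correctable(self, reg, corrected_value):
        """Remember a corrected value until the write port becomes available."""
        self.pending = (reg, corrected_value)

    def try_write_back(self, wb_needs_port):
        """Return the pending write if the WB stage leaves the port free."""
        if self.pending is None or wb_needs_port:
            return None
        entry, self.pending = self.pending, None
        return entry


acm = Acm()
print(acm.select(3, 4, rs1_needed=True, rs2_needed=False))   # port 2 is borrowed
acm.report_correctable(0, 0x1234)
print(acm.try_write_back(wb_needs_port=True))                # port busy, wait
print(acm.try_write_back(wb_needs_port=False))               # write goes through
```

The duplication of the ACM logic and the comparison between its copies are left out of the model for brevity.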
Huge advantage
The protected core contains only one instance of the predictor, which receives the address currently being fetched only from pipeline d0, but its prediction is signaled to both pipelines. The address in pipeline d1 should be the same, but if it is affected by a fault and contains a different address, the pipeline will be restarted, and the prediction will be discarded. Since even the unprotected pipeline can detect all types of misprediction, the whole predictor can be left unprotected. The same applies to the RAS. It is fed from d0, and only one instance is present. Any fault inside the predictor or the RAS can lead only to a misprediction. The information that a prediction was performed (the prediction flag) must be protected, so the flag is saved to the IFB of both pipelines. The target address is saved into the BoP. Any fault inside the BoP corrupts the saved address, which shows up as a misprediction and leads to an instruction restart. This means that even the BoP does not have to be protected, and the protected pipeline contains only one instance of the BoP. Having only one predictor is a huge advantage of the Hardisc, and it is impossible in systems protected by replicating the cores. It greatly reduces the area and power consumption of the protected system.
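The reasoning can be illustrated with a small check that treats every predictor, RAS, or BoP fault as an ordinary misprediction. This is a sketch under assumptions: `check_prediction` and its return strings are hypothetical, and the real pipeline resolves branches in hardware rather than through a function call.

```python
def check_prediction(flag_d0: bool, flag_d1: bool, bop_target: int,
                     actual_taken: bool, actual_target: int) -> str:
    """Classify the outcome of one control-flow instruction."""
    if flag_d0 != flag_d1:
        # The duplicated prediction flags disagree: a fault hit one pipeline.
        return "restart"
    if flag_d0 and (not actual_taken or bop_target != actual_target):
        # Wrong direction or wrong (possibly corrupted) target address.
        return "mispredict"
    if not flag_d0 and actual_taken:
        return "mispredict"
    return "continue"


# A fault flips one bit of the target stored in the single BoP instance.
good_target = 0x8000_0040
corrupted = good_target ^ 0x4
print(check_prediction(True, True, good_target, True, good_target))
print(check_prediction(True, True, corrupted, True, good_target))
```

Because the corrupted target is caught by the same comparison that catches a genuine misprediction, no extra copy of the BoP is needed in this model.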