No data value dependent exceptions for RISC V - riscvarchive/riscv-CMOs-discuss GitHub Wiki
The original RISC-V architects, such as Andrew Waterman and Krste Asanovic, seem to strongly desire that there be no data value dependent exceptions (traps/faults/whatever) in RISC-V.
As far as I can tell, and as they have informed me, there are at the moment no such data value dependent traps and exceptions in RISC-V.
(Some folks think that this is incorrect. If so, please point us to examples. I suspect that the original RISC-V architects are correct for the specs approved so far: unpriv, priv, and probably the V spec. But it would be really good to know if that is not true for those specs, and also for specs that are underway in their TGs but not yet approved.)
This is not a hard and fast requirement of the RISC-V ISA. As far as I can tell, it is not in any of the RISC-V specifications, not even as rationale.
It is just a design principle that the original architects say they followed, and which they recommend be followed by future RISC-V ISA extensions.
Not necessarily by implementations. Implementations are free to define implementation specific data value dependent exceptions, e.g. to trap and emulate certain unusual values, like the classic FP denorm and NaN support. But there is no portable way of coding such trap handlers, and there will not be until a standard for such nonstandard behavior is created.
- Glew prediction: I give it five years. Until then, this will be a source of fragmentation of ISA and portable code.
Although this wiki page was composed for the CMOs TG, to discuss CMOs related instruction definition issues related to wanting to avoid data value dependent exceptions, most of this discussion is not really CMO dependent, and should be moved out.
Obviously, memory address dependent exceptions are allowed, whether the address is completely constant or is obtained by a calculation involving a register value (and possibly CSR values).
Exceptions dependent on the values in a CSR are allowed. But not at the time of the CSR write. One might think of this as implying that exceptions that can be known at decode time are allowed, but not so much exceptions that are dependent on general-purpose register values.
Krste: most important is to avoid exceptions that depend on the data value that is calculated by an instruction. E.g. there is no exception when producing a floating-point NaN; instead, a sticky bit is set as a side effect.
Next most important is to avoid exceptions on the input operands to an instruction that come from registers or memory. (Not necessarily CSR values?)
Krste did not say, but I think it should be obvious that we don't care as much about exceptions that are purely dependent on a constant that is in the instruction, since that can be determined by the decoder. We might prefer to avoid it, so that the decoder doesn't need to look at those bits as often. But it is nowhere near as bad as a data value dependent exception that depends on a general-purpose register or memory value.
- general-purpose scalar x registers? NO - prefer no data value dependent exceptions
- values read from memory by load or store instructions? NO - prefer not
- CSR register contents? OK
- values read from memory by non-instructions, e.g. by page table walkers: OK
- values in vector registers: NO
The obvious motivations are reduced hardware cost and/or increased performance. But let us explore this further:
Simplistic first generation out of order machines required instructions to be complete before they could be committed/retired/graduated.
Next: you can retire/commit an instruction before its result is known, as long as it is known not to produce an exception, i.e. as long as it is "safe". Integer instructions do not normally produce exceptions, except possibly for overflow. It was already known from in-order microarchitectures how to detect safe results more quickly than the full computation, as a way of reducing dataflow latency. (Sometimes the detection is 100% accurate; sometimes it is accurate most of the time, so that some safe executions are delayed unnecessarily. Nevertheless, it is still an average latency reduction.)
NEXT: You can actually defer execution entirely until after retirement/commit for instructions that are guaranteed to have no exceptions on either data input values or data output values.
TBD: pointer to Stephan Jourdain's PhD thesis showing that this obtains performance equivalent to a significant increase in out of order window, with a fairly large savings of hardware.
NEXT: you can use this knowledge not just for postretirement execution, but to actually elide large sequences from the retirement stream, improving performance further.
It is obviously easier to do this if you can know at decode time whether such data value dependent exceptions are possible. RISC-V permits such implementations, although it does not require them. Data value dependent exceptions can be provided, but are HW implementation specific, and do not necessarily interfere with RISC-V ISA compatibility as long as the trap and emulate code complies with the ISA definition.
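The idea of detecting "safe" results more quickly than the full computation can be sketched in C. This is only an illustration under stated assumptions: it imagines a hypothetical ISA with a trapping 32-bit signed add (RISC-V has no such instruction), and the function names `add32_overflows` and `add32_known_safe` are invented here. The quick check looks at only the top two bits of each operand, so it is conservative: it never calls an overflowing add safe, but it sometimes fails to recognize a safe one.

```c
#include <stdint.h>

/* Exact check: does a + b overflow a 32-bit signed integer?
   Computed in 64 bits so the check itself cannot overflow. */
int add32_overflows(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;
    return wide > INT32_MAX || wide < INT32_MIN;
}

/* Conservative early check using only bits 31 and 30 of each operand.
   If the two bits agree, the operand lies in [-2^30, 2^30), so the sum of
   two such operands lies in [-2^31, 2^31) and cannot overflow.
   Returns 1 for "known safe", 0 for "not known safe" (which does not
   necessarily mean the add overflows). */
int add32_known_safe(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    int a_small = ((ua >> 31) & 1u) == ((ua >> 30) & 1u);
    int b_small = ((ub >> 31) & 1u) == ((ub >> 30) & 1u);
    return a_small && b_small;
}
```

For example, `add32_known_safe(0x40000000, -1)` returns 0 even though that add does not overflow: this is the "some safe executions might actually be delayed" case. When an instruction cannot raise a data value dependent exception at all, no such check is needed.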
With respect to the common floating-point exceptions, NaNs and denorms, more and more real implementations just do the whole thing in hardware and/or use non-IEEE approaches like flush to zero.
GPUs led the way in this regard, because taking an exception on a GPU is even more difficult than it is on a CPU. In fact, historically many GPUs did not or could not take exceptions at all; even page-fault support for GPUs is only recent. Doing it all in hardware avoided these issues.
At first GPUs tended to use flush to zero for denorms, and whatever for NaN support.
As GPUs became more commonly used in HPC applications, NaN support became more common, and eventually so did denorm support directly in hardware, without slowdown.
It is possible to imagine user level code that takes advantage of exceptions. For the purposes of this page, that means data value dependent exceptions, such as integer overflow in a dynamically typed language, used to switch from a 32 or 64-bit integer to BigNum handling.
(The literature, mostly academic but with some infamous industry implementations, and not just LISP machines, has also proposed user level exceptions for things like dynamic on-the-fly profiling for JITs: the sort of thing proposed by "Using hardware counters to improve dynamic compilation", but with less OS overhead. Some have even proposed taking such exceptions on cache fills and evictions.)
However, there is no good standard for such exceptions wrt user code: neither with exceptions going directly to user level exception handlers, nor even with the exception going to the operating system and the OS providing services to the user process.
(What about UNIX/Linux signals? Received wisdom is that the three most common things to do in a signal handler are (i) set a flag variable and return immediately, (ii) (messy) throw away all that the program was doing and restart at some convenient point, perhaps the main command loop, and (iii) clean up and exit. Some might go so far as to say that (i) and (iii) are the only things you can safely and portably do in a signal handler, and that (ii) is possible only with knowledge of specific ABIs, compilers, and runtimes, and is therefore much less portable.)
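As a reminder of how little the portable option does, here is a minimal ISO C sketch of the "set a flag variable and return immediately" handler. The names `got_signal`, `on_signal` and `demo_flag_handler` are invented for this example, and `SIGINT` is used only because it is one of the signals ISO C guarantees to exist.

```c
#include <signal.h>

/* Flag written by the handler. volatile sig_atomic_t is the object type
   ISO C guarantees can be safely written from a signal handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;   /* set a flag and return immediately, nothing else */
}

/* Installs the handler, raises SIGINT in this process, restores the
   default disposition, and returns the flag value (or -1 on failure). */
int demo_flag_handler(void)
{
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);             /* the handler runs before raise() returns */
    signal(SIGINT, SIG_DFL);
    return (int)got_signal;
}
```

Note that the handler learns nothing about which instruction or data value was involved, and repairs nothing. Doing more than this, e.g. fixing up a result and resuming, is exactly the part that depends on specific ABIs, compilers, and runtimes.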
Exception handling is portable at the OS level, with respect to page-faults, etc. But this is of much less interest with respect to data value dependent exceptions, except as described elsewhere.
Lacking standards, there is much less motivation to provide user level exception handlers, and in particular data value dependent exceptions.
This has not always been the case: for example, at the dawn of RISC, programming languages such as Pascal that required integer overflow detection were common. Hence early RISC ISAs like MIPS dedicated a large fraction of their instruction encoding space to having both overflow handling and non-overflow handling versions of most common integer instructions. (This also led to confusion, such as ADD being the mnemonic for signed integer add with overflow detection, and ADDU being the mnemonic not just for unsigned addition, but also for signed addition without integer overflow trapping.)
We are all familiar with debug breakpoints or watch points that generate exceptions at memory addresses or program counters.
It is also possible and often useful to define debug breakpoints on particular data values. Typically comparing bits under a certain mask.
The rule on no data value dependent exceptions for RISC-V means that this cannot be part of the architecture. As usual, it can be provided in an implementation specific manner.
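The "comparing bits under a certain mask" condition is simple to state in C. The sketch below is purely illustrative: the `data_breakpoint` struct and the `data_breakpoint_hit` function are hypothetical names, not part of any RISC-V debug specification.

```c
#include <stdint.h>

/* Hypothetical data value breakpoint: trigger when the observed value
   equals `match` in every bit position where `mask` has a 1. */
typedef struct {
    uint64_t match;
    uint64_t mask;
} data_breakpoint;

/* Returns 1 if `value` matches the breakpoint under its mask, else 0.
   XOR leaves a 1 wherever value and match differ; the AND discards
   differences in bit positions the mask does not care about. */
int data_breakpoint_hit(const data_breakpoint *bp, uint64_t value)
{
    return ((value ^ bp->match) & bp->mask) == 0;
}
```

With `match = 0xAB00` and `mask = 0xFF00`, the value `0x12AB34` hits and `0x12AC34` does not. Raising an exception on such a hit is what would make the exception data value dependent.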
See also instruction encoding or opcode dependent debug exception.
Remember the infamous Intel FDIV bug?
Bad results for only a few particular input values?
No data value dependent exceptions makes this harder to patch, e.g. you cannot use data value debug breakpoints. (Although Intel did not do this: the patches on existing silicon intercepted all FDIVs.)
Integer overflows are one of the most common causes of security bugs exploited by malware. Often the problem is not the integer overflow in itself, but that the overflow or zero divide results in a bad memory address.
RISC-V does not provide integer overflow exceptions. The unprivspec (Volume I: RISC-V Unprivileged ISA, V20191213) says:
"We did not include special instruction-set support for overflow checks on integer arithmetic operations in the base instruction set, as many overflow checks can be cheaply implemented using RISC-V branches. Overflow checking for unsigned addition requires only a single additional branch instruction after the addition: `add t0, t1, t2; bltu t0, t1, overflow`.
For signed addition, if one operand’s sign is known, overflow checking requires only a single branch after the addition: `addi t0, t1, +imm; blt t0, t1, overflow`. This covers the common case of addition with an immediate operand.
For general signed addition, three additional instructions after the addition are required, leveraging the observation that the sum should be less than one of the operands if and only if the other operand is negative.
`add t0, t1, t2; slti t3, t2, 0; slt t4, t0, t1; bne t3, t4, overflow`
In RV64I, checks of 32-bit signed additions can be optimized further by comparing the results of ADD and ADDW on the operands."
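For readers who think in C rather than assembly, the two general checks from the spec text translate roughly as follows. This is a sketch, not compiler output; the function names are invented here. The signed version does the addition in unsigned arithmetic because signed overflow is undefined behavior in C, and it assumes the conversion back to `int64_t` wraps, which is implementation-defined in ISO C but is what GCC and other mainstream two's-complement compilers do.

```c
#include <stdint.h>

/* Unsigned add, mirroring `add; bltu`: the addition overflowed if and
   only if the wrapped sum is less than one of the operands. */
int add_u64_overflows(uint64_t a, uint64_t b, uint64_t *sum)
{
    *sum = a + b;     /* unsigned arithmetic wraps; well defined in C */
    return *sum < a;
}

/* General signed add, mirroring `add; slti; slt; bne`: without overflow,
   the sum is less than a if and only if b is negative. */
int add_s64_overflows(int64_t a, int64_t b, int64_t *sum)
{
    /* Assumption: converting the wrapped unsigned sum back to int64_t
       yields the two's-complement result (implementation-defined). */
    *sum = (int64_t)((uint64_t)a + (uint64_t)b);
    return (b < 0) != (*sum < a);
}
```

Both functions return nonzero on overflow and always store the wrapped sum, so the caller decides what to do next (branch to an error path, promote to a wider type, and so on). That decision is made in software, by code that was written with the check in mind.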
This is all well and good if the integer code in question was written or compiled or generated with such overflow checks.
It doesn't help you if the code lacks such checks, and the need for them was only discovered afterwards, in a deployed system.
=> It makes it harder to patch security bugs in the field. Not necessarily much harder: you can always patch the binaries, if there are not too many of them. Or you can use dynamic binary translation (DBT). But it removes an option for patching security bugs in the field.
Once again, RISC-V implementations can always provide implementation specific trap and emulate.
Note: integer overflow exceptions are NOT something you can trap and emulate to remain compatible with the RISC-V ISA definition. Of necessity, if you are going to do things like killing a process or handling the integer overflow in a safe way, you are extending the ISA, not implementing the existing definition. The only way you can be compatible with the existing definition is to provide a binary patch.
(or possibly an instruction encoding or opcode dependent debug exception)
Note: this and some other similar statements are my own opinion, and certainly not those of the RISC-V organization. I should probably move them out of this RISC-V GitHub repo/wiki and into my personal computer architecture repo/wiki. Nevertheless, I think they are relevant RISC-V discussions, although certainly not appropriate to actual RISC-V proposals.
The "no data value dependent exceptions" principle means that there can be no precise exceptions related to error correcting codes on memory loads.
It is pretty standard practice to not provide such exceptions for correctable ECC errors, since execution can proceed.
It is by now pretty standard practice not to provide such precise exceptions for non-correctable ECC errors, which basically means that, as part of the standard architecture, you cannot recover from an uncorrectable ECC error. At least not without a lot of knowledge of the program under execution, knowledge that would allow you to determine that the value that is exposed to the instruction set architecture is not actually depended on by the user program.
IIRC This means that RISC-V cannot attain IBM 360 levels of RAS, since IIRC the IBM 360 or Z series machine check architecture permits such precise exceptions.
Note: the Intel machine check architecture supports, but does not require, such precise exceptions for non-correctable ECC errors. The machine check status register can indicate that the exception is precise, although for the most part AFAIK all Intel implementations so far have made all machine check exceptions imprecise.
(ARM?)
Note: any implementation that has a store buffer may very well have imprecise uncorrectable ECC errors at the time the store is committed. For that matter, microarchitectures that have small cache lines in inner caches and larger cache lines in outer caches will have similarly imprecise uncorrectable ECC errors on evictions/merges.
More and more systems have ECC on registers. Same considerations apply.
It is highly desirable to be able to scrub ECC errors from wherever they reside. Obviously from DRAM with ECC, but also from cache lines with ECC, registers, ...
If there is a user level instruction that accomplishes this, call it CBO.SCRUB.$id (which might be CMO.FLUSH.$id, or which might be a separate scrub instruction), then no data value dependent exceptions means:
- either the CBO.SCRUB must perform the scrubbing in hardware, without requiring software help
- or software should look at some return value from CBO.SCRUB
- or an implementation dependent data value dependent trap may be used, to emulate the scrubbing
It is highly likely that certain privilege levels are allowed to perform some CMOs directly, but not others.
For example, user level CMOs
- might be allowed to perform non-lossy operations like FLUSH and CLEAN, but not lossy operations like DISCARD.
- They might also not be allowed to perform operations like CMO.ALL, because of interrupt blocking and/or forward progress issues.
- Similarly, they might not be allowed to perform operations like CMO.UX, which name a cache line/entry by its (set, way) in a cache.
- Because of excessive FUD of process migration flushing entire cache via CMO.ALL and CMO.UX
  - we can't guarantee that flushes of an entire cache can be migrated or otherwise permitted to user mode
  - but it is obvious that certain such entire cache flushes can be migrated or otherwise permitted to user mode
    - e.g. instantaneous invalidations
    - e.g. flushes of shared coherent caches
So unless we can group these things in such a way that the interception can be based on a CSR or on the instruction encoding, essentially we would have to say that you cannot intercept such operations at fine grain. You have to intercept all in their class, or none.
This is of course also applicable not just to user level CMOs but also to guest operating system CMOs.
I suspect that this could be a significant impediment to virtualization/hypervisor performance. In general, when you're building a hypervisor/virtual machine system, you want to allow the guest operating system to do as many things unmediated as possible.
Once again: This only applies to code that is supposed to run everywhere on a RISC-V system.
- implementations are always capable of providing implementation dependent data value dependent exceptions
Don't get confused by statements such as the following in the unprivspec (Volume 1, Unprivileged Spec v. 20191213)
"Exceptional conditions" does not mean "exception that redirects control flow out of user code to an exception handler". It just means that the RISC-V ISA requires the appropriate values, e.g. the canonical NaN, to be produced and placed in the destination register, and the appropriate sticky flags to be set in the floating-point status register.
Also, observe that implementations are allowed to provide data dependent traps to machine mode software handlers. The architecture just does not require it of all implementations. The trap handler, if it returns, is required to put values in the destination register and floating-point status flags as required by the ISA.
Ditto subnormal numbers: "Operations on subnormal numbers are handled in accordance with the IEEE 754-2008 standard." Or the implementation may perform an implementation specific trap and emulate.
At least when I am first writing this, I find it really hard to express in passive third-person language, without referring to people who have explained it to me. Especially since I do not want to give the impression that this is my idea. In fact, I disagree with some aspects of it, although I'm happy to go along with it[*]. I mainly want to share my understanding, as conveyed to me by Krste and Andrew, as well as the conjectures and thoughts from my discussions with other members of the CMO TG, etc.
Note *: hmm, perhaps I should say that this is my idea, since sometimes it seems to me like the best way to get an idea rejected is to say that I think it's a good idea.