Common EXCCAUSEs Explained - mhightower83/Arduino-ESP8266-misc GitHub Wiki

Exception 0

"IllegalInstructionCause - Illegal Instruction" - Several situations hide behind that description. Most commonly, the processor tried to execute memory that reads as zeros. Scenarios that cause this:

  1. An interrupt service routine (ISR) or a function in its call tree is in the ICACHE address space. The address of an ISR-safe function should start with 0x4010xxxx (IRAM) or 0x4000xxxx (Boot ROM range). If the function reported in the decode starts with 0x402xxxxx, it is executing from flash through the instruction cache (ICACHE). Use the IRAM_ATTR macro on every function that is called from an ISR context.

  2. inline is only a suggestion to the compiler. The compiler may still emit the inline function as a separate function, or inline it at some call sites while calling a separate copy from others. When an inline function is used by an ISR, add __attribute__((__always_inline__)) to its definition.

  3. The ill instruction is used to force an Exception 0. As expected, its opcode is 0x000000.

  4. Beware when using C++ Templates. This use is complicated by the compiler not passing the IRAM_ATTR attribute forward.

  5. (TODO: Need more research on the concerns of class operators. I did not follow the specifics.)

  6. The safest path is to avoid C++ classes for ISRs. And avoid class instantiation inside an ISR.

There may be other rare situations; however, #1 may be the most common. It is a good practice to keep ISRs as short and simple as possible. Allocate any DRAM needed outside the routine and defer more detailed processing to the main loop.
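As a rough sketch of the advice above (not taken from any particular project), an ISR can live in IRAM, call only helpers that are forced inline, and do no more than set a flag for the main loop. The names gpio_isr, bump, and poll_events are hypothetical, and the empty IRAM_ATTR fallback exists only so the snippet also builds off-target; on the ESP8266 Arduino core the macro comes from the core headers and the ISR would be registered with attachInterrupt().

```cpp
#include <cstdint>

// On the ESP8266 Arduino core, IRAM_ATTR places a function in IRAM.
// This empty fallback is an assumption so the sketch also builds off-target.
#ifndef IRAM_ATTR
#define IRAM_ATTR
#endif

// Shared between the ISR and the main loop, so both are volatile.
static volatile uint32_t g_edge_count = 0;
static volatile bool g_event_pending = false;

// Forced inline so the compiler cannot leave an out-of-line copy in flash.
static inline __attribute__((__always_inline__)) void bump(volatile uint32_t& counter) {
  counter = counter + 1;
}

// Hypothetical ISR: short, no heap allocation, no flash access.
void IRAM_ATTR gpio_isr() {
  bump(g_edge_count);
  g_event_pending = true;
}

// Called from loop(): the detailed processing happens here, outside
// interrupt context. Returns the edges counted since the previous call.
// A real sketch would briefly mask interrupts around the read-and-clear.
uint32_t poll_events() {
  if (!g_event_pending) return 0;
  g_event_pending = false;
  uint32_t n = g_edge_count;
  g_edge_count = 0;
  return n;
}
```

The ISR touches only DRAM variables and IRAM code, so it does not depend on the flash cache being online.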

Why do ISRs need to be in IRAM?

The first answer that comes to mind is "to keep execution fast". While true, that is not the main reason on the ESP8266. The ESP8266 has three places to execute code from: ROM, IRAM, and ICACHE (instruction cache: instructions read from flash and saved for repeated execution).

The ICACHE is populated by using the SPI0 bus to read from Flash memory. When executing from ICACHE, the instruction cache has to own access to Flash memory during this time to satisfy a cache miss during execution. For other Flash read/write operations to occur, the ICACHE must give up access for a while. While offline, any attempt to access instructions in the ICACHE address range is read as zero. An ISR or its call tree running from ICACHE is always at risk of an Exception 0. For more details on ISRs see Other Causes for Crashes. Similarly, attempts to read strings or table entries at this time from the ICACHE address range may read zeros.
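When reading a decoded stack trace, it can help to sort code addresses by region. The helper below is a minimal sketch that uses only the address prefixes quoted under Exception 0 (0x4000xxxx, 0x4010xxxx, 0x402xxxxx); the masks are an approximation of those prefixes, not an exact memory map, and the names classify_code_address and is_isr_safe are made up for this example.

```cpp
#include <cstdint>

enum class CodeRegion { BootRom, Iram, IcacheFlash, Other };

// Classify a code address by the prefixes quoted in the text.
// Assumption: the masks mirror those prefixes, not the precise region sizes.
CodeRegion classify_code_address(uint32_t addr) {
  if ((addr & 0xFFFF0000u) == 0x40000000u) return CodeRegion::BootRom;
  if ((addr & 0xFFFF0000u) == 0x40100000u) return CodeRegion::Iram;
  if ((addr & 0xFFF00000u) == 0x40200000u) return CodeRegion::IcacheFlash;
  return CodeRegion::Other;
}

// True when code at this address can run while the flash cache is offline.
bool is_isr_safe(uint32_t addr) {
  CodeRegion r = classify_code_address(addr);
  return r == CodeRegion::BootRom || r == CodeRegion::Iram;
}
```

An epc1 value for which is_isr_safe() returns false, seen together with Exception 0 inside an ISR, points at scenario #1.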

Exception 2

WIP

"InstructionFetchErrorCause - Processor internal physical address or data error during instruction fetch" - For a "Processor internal physical address" event, registers EXCVADDR and epc1 have an address that does not point to instruction memory. I don't know how to create a "data error during instruction fetch" event.

This exception was reported in these recent issues:

Exception 3

"LoadStoreErrorCause - Processor internal physical address or data error during load or store" - Register EXCVADDR will have the pointer address used by the load or store operation. And, register EPC1 will have the code address performing the access. IRAM and ICACHE storage must be accessed as aligned 32-bit data elements. Accessing 8- or 16-bit elements there can cause this exception. An unaligned access would instead fail with Exception 9.

  1. This exception is common when accessing PROGMEM storage incorrectly. To access PROGMEM you need to use the supporting APIs.

  2. You can construct word-aligned pointers to access ICACHE/PROGMEM with some success. However, the compiler tries hard to optimize, and each newer compiler does a better job of it. I have seen a case where a word pointer was used to read a word, but only one byte of it was used; a newer compiler turned the word read into a byte read. This works fine for DRAM, but not with IRAM or ICACHE.
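To illustrate the word-access idea, here is a minimal sketch of reading a single byte by loading the containing aligned 32-bit word and shifting. It assumes a little-endian target (the ESP8266 is one). The function name read_byte_via_word is hypothetical; in a real sketch the supporting APIs such as pgm_read_byte() or memcpy_P() are the better choice.

```cpp
#include <cstdint>

// Read one byte from memory that only tolerates aligned 32-bit loads.
// Assumes little-endian byte order within the word.
uint8_t read_byte_via_word(const void* addr) {
  uintptr_t a = reinterpret_cast<uintptr_t>(addr);
  // Round the address down to the containing 32-bit word.
  const volatile uint32_t* word =
      reinterpret_cast<const volatile uint32_t*>(a & ~static_cast<uintptr_t>(3));
  // The volatile qualifier is meant to keep the compiler from narrowing
  // this 32-bit load into a byte load.
  uint32_t w = *word;
  unsigned shift = static_cast<unsigned>(a & 3u) * 8u;
  return static_cast<uint8_t>(w >> shift);
}
```

The volatile pointer is the design choice that matters here: it addresses the optimization hazard described in #2, where a plain word pointer was quietly turned into a byte access.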

Exception 9

"LoadStoreAlignmentCause - Load or store to an unaligned address" - Register EXCVADDR will have the pointer address used by the load or store operation. And, register EPC1 will have the code address performing the access. A common example is a 16-bit value addressed at an odd address, or a 32-bit value at an address that is not a multiple of 4.

  1. Can be caused by incorrectly accessing PROGMEM storage. To access PROGMEM you need to use the supporting APIs.

  2. A bad data pointer is the obvious cause. A common scenario is using a pointer that was stored in a block of allocated memory after that block has been freed. Once the freed block is reallocated, the stored pointer is overwritten. Zero your references to freed memory so this error is discovered sooner.
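One way to sidestep alignment faults when pulling multi-byte fields out of a byte buffer in DRAM (a received packet, for example) is to assemble the value from individual byte loads instead of casting the pointer. This is a sketch with hypothetical helper names, not something from the core; data in flash still needs the PROGMEM APIs.

```cpp
#include <cstdint>

// Decode a little-endian 32-bit value from any address, aligned or not,
// using only byte loads. Intended for buffers in DRAM.
uint32_t read_le32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Same idea for a 16-bit field that may sit at an odd address.
uint16_t read_le16(const uint8_t* p) {
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}
```

By contrast, casting the buffer address to a uint32_t pointer and dereferencing it when the address is odd is the pattern that would raise Exception 9.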

Exception 20

"InstFetchProhibitedCause - An instruction fetch referenced a page mapped with an attribute that does not permit instruction fetch" - An attempt to execute code at an address that is not instruction memory. Register EXCVADDR will have the invalid execution address. And, register EPC1 will have the code address performing the access. Unfortunately, the SDK does not install a handler for this exception, and the Boot ROM's default handler executes a breakpoint that is never seen unless you build with gdb. Without gdb, the breakpoint turns into a Hardware WDT Reset.

  1. Calling a NULL callback function or an uninitialized pointer with an invalid instruction memory address value.

  2. A callback function address stored in a block of allocated memory that was previously freed. The freed allocation is reallocated and the value is overwritten.

  3. Calling a weak function that was declared, but no function was defined in the build: no linker errors or warnings for this scenario.
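A defensive pattern for the first two scenarios is to initialize callback slots to nullptr, test before calling, and clear the slot before its owner is freed. The sketch below uses made-up names (EventSource, fire_event, unregister). It only catches null pointers; a slot holding garbage will still crash.

```cpp
using EventCallback = void (*)(int value);

// Hypothetical holder for a user-supplied callback.
struct EventSource {
  EventCallback on_event = nullptr;  // nullptr means "not registered"
};

// Invoke the callback only if one is registered.
// Returns false instead of jumping to address 0.
bool fire_event(const EventSource& src, int value) {
  EventCallback cb = src.on_event;  // copy once, then test and call the copy
  if (cb == nullptr) return false;
  cb(value);
  return true;
}

// Clear the slot before the owning object is freed or reused.
void unregister(EventSource& src) { src.on_event = nullptr; }
```

The same kind of null test can typically be applied to a weak function before calling it, since an undefined weak symbol normally resolves to address zero.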

Exception 28

"LoadProhibitedCause - A load referenced a page mapped with an attribute that does not permit loads" - Memory read with an invalid address. For us, these are commonly memory address ranges without any memory behind them on the SoC. Register EXCVADDR will have the invalid pointer address used in the load operation. And register EPC1 will have the code address performing the access.

  1. Reading memory with a NULL or uninitialized pointer containing an invalid address value.

  2. When out of memory (OOM), malloc returns a NULL pointer. Be sure to verify success after malloc or other Heap allocation functions.

  3. Another possible cause is attempting to dereference a read pointer stowed in a previously freed heap allocation. After a subsequent reallocation of the block, the pointer is destroyed. Be sure to zero old references to freed Heap allocations. For debugging, it may also help to zero any reference pointers in a Heap allocation before free.

  4. Attempts to read strings, tables, or any other elements in Flash from an ISR may sometimes read zeros. Pointers to pointers in the Flash address space are more likely to result in this exception than other data reads. See Why do ISRs need to be in IRAM?.

  5. There may be an issue with the VTables: "Flash" option and an ISR using a C++ class with virtual methods. Try VTables: "DRAM" or VTables: "IRAM". Depending on how the VTables are accessed, this could also be expressed as an Exception 0. See Why do ISRs need to be in IRAM?.
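For scenario #2, the check after allocation can be sketched as follows. The allocator is passed in as a parameter purely so the failure path can be exercised; AllocFn, default_alloc, failing_alloc, and dup_string are hypothetical names for this illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

using AllocFn = void* (*)(size_t);

void* default_alloc(size_t n) { return std::malloc(n); }

// Stand-in allocator that always fails, to exercise the OOM path.
void* failing_alloc(size_t) { return nullptr; }

// Duplicate a string on the heap. Returns nullptr on OOM so the caller
// never reads or writes through a NULL pointer.
char* dup_string(const char* src, AllocFn alloc = default_alloc) {
  if (src == nullptr) return nullptr;
  size_t len = std::strlen(src) + 1;
  char* copy = static_cast<char*>(alloc(len));
  if (copy == nullptr) return nullptr;  // verify before first use
  std::memcpy(copy, src, len);
  return copy;
}
```

A caller then tests the result before use, for example: if (char* s = dup_string(name)) { /* use s */ std::free(s); }.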

Exception 29

"StoreProhibitedCause - A store referenced a page mapped with an attribute that does not permit stores" - Memory writes to an invalid address. For us, these are commonly memory address ranges without any memory behind them on the SoC. Register EXCVADDR will have the invalid pointer address attempting the store operation. And register EPC1 will have the code address performing the access.

  1. Writing memory with a NULL or uninitialized pointer containing an invalid address value.

  2. When out of memory (OOM), malloc returns a NULL pointer. Be sure to verify success after malloc or other Heap allocation functions.

Side note: the SDK may miss checking for a NULL return after malloc, e.g., 'ieee80211_setup_ratetable()' does not check for a NULL allocation result and then crashes with Exception 29 in 'memcpy'. Consider the OOM counter a hint that you have run dangerously low on memory.

  3. Another possible cause is attempting to dereference a write pointer stowed in a previously freed heap allocation. After a subsequent reallocation of the block, the pointer is destroyed. Be sure to zero old references to freed Heap allocations. For debugging, it may also help to zero any reference pointers in a Heap allocation before free.
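The "zero old references" advice can be sketched with a small helper that frees a block and nulls the caller's pointer in one step, plus a release function that scrubs pointers stored inside the block before freeing it. Node, free_and_null, and release_node are names invented for this example.

```cpp
#include <cstdlib>

struct Node {
  int* payload;  // pointer stored inside a heap block
};

// Free a heap block and zero the caller's reference in one step.
template <typename T>
void free_and_null(T*& ref) {
  std::free(ref);  // free(nullptr) is a harmless no-op
  ref = nullptr;
}

// Release a Node: scrub its inner pointer first so a stale Node* cannot
// hand out a dangling payload after the block is reallocated.
void release_node(Node*& node) {
  if (node == nullptr) return;
  free_and_null(node->payload);
  free_and_null(node);
}
```

With references zeroed, a later stale access goes through a NULL pointer and fails immediately with Exception 28 or 29, rather than silently corrupting whatever now occupies the reallocated block.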