Cortex M7 sample boot up flow
- Cortex-M7 (ARMv7-M, 500 MHz)
  - I-TCM (32 KB), 64-bit port
  - D-TCM (32 KB), 2x32-bit port
  - AXIM (64-bit)
    - I-Cache 32 KB (64-bit)
    - D-Cache 32 KB (32-bit)
    - AXI Master → NoC @300 MHz
      - BOOT & LIB ROM, 928 KB
      - MAIN RAM (~5 MB), 32/64-bit wide
      - qSPI Controller → Memory-Mapped Interface → HyperRAM (External)
      - Other AXI Slaves
Memory map
| Address Range | Label |
|---|---|
| 0x00000000–0x1FFFFFFF | Code |
| 0x20000000–0x3FFFFFFF | SRAM |
| 0x40000000–0x5FFFFFFF | Peripheral |
| 0x60000000–0x9FFFFFFF | External RAM |
| 0xA0000000–0xDFFFFFFF | External device |
| 0xE0000000–0xE00FFFFF | Private Peripheral Bus |
| 0xE0100000–0xFFFFFFFF | Vendor-specific memory |
Use a DSB, followed by an ISB instruction or exception return to ensure that the new MPU configuration is used by subsequent instructions.
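As a minimal sketch of that sequence, the helpers below build the `MPU_RBAR` and `MPU_RASR` values for one region from the architectural ARMv7-M field positions. The region number, the 8 MB size, the attribute choice (Normal, write-back, executable, full access) and the `HYPERRAM_BASE` name in the comment are assumptions for illustration, not values taken from this platform.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR field positions (architectural). */
#define RASR_XN_POS    28u
#define RASR_AP_POS    24u
#define RASR_TEX_POS   19u
#define RASR_S_POS     18u
#define RASR_C_POS     17u
#define RASR_B_POS     16u
#define RASR_SIZE_POS   1u
#define RASR_ENABLE     1u

#define AP_FULL_ACCESS  3u  /* read/write, privileged and unprivileged */

/* Build an MPU_RASR value for a region of 2^size_log2 bytes. */
static inline uint32_t mpu_rasr(unsigned size_log2, uint32_t ap, uint32_t tex,
                                int c, int b, int s, int xn)
{
    assert(size_log2 >= 5u && size_log2 <= 32u);  /* 32 bytes .. 4 GB */
    return ((uint32_t)(xn != 0) << RASR_XN_POS) |
           ((ap & 7u) << RASR_AP_POS) |
           ((tex & 7u) << RASR_TEX_POS) |
           ((uint32_t)(s != 0) << RASR_S_POS) |
           ((uint32_t)(c != 0) << RASR_C_POS) |
           ((uint32_t)(b != 0) << RASR_B_POS) |
           ((uint32_t)(size_log2 - 1u) << RASR_SIZE_POS) |  /* SIZE = log2(bytes) - 1 */
           RASR_ENABLE;
}

/* Build an MPU_RBAR value that also selects the region (VALID = 1). */
static inline uint32_t mpu_rbar(uint32_t base, unsigned region)
{
    return (base & 0xFFFFFFE0u) | (1u << 4) | ((uint32_t)region & 0xFu);
}

/*
 * On target, with CMSIS register names, the values would be applied
 * roughly like this (HYPERRAM_BASE is a hypothetical name):
 *
 *   MPU->CTRL = 0;                            // disable while updating
 *   MPU->RBAR = mpu_rbar(HYPERRAM_BASE, 1);
 *   MPU->RASR = mpu_rasr(23, AP_FULL_ACCESS, 0, 1, 1, 0, 0);  // 8 MB
 *   MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_ENABLE_Msk;
 *   __DSB();                                  // MPU writes complete
 *   __ISB();                                  // refetch with new attributes
 */
```

Keeping the encoding in pure functions makes it possible to check the register values on a host before they are written on the target.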
Interrupt handling
- Thread mode (Unprivileged)
- Handler mode (Privileged)
- Handler runs
- Exception return
- Back to Thread mode (Unprivileged)
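The exception return step is driven by the EXC_RETURN value that the handler loads into the PC. The sketch below decodes the ARMv7-M EXC_RETURN bits and can help when inspecting LR inside a handler; the function names are made up for this example. Note that EXC_RETURN selects the mode and stack only. Whether Thread mode is unprivileged after the return depends on CONTROL.nPRIV.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 3: 1 = return to Thread mode, 0 = return to Handler mode. */
static inline bool exc_return_to_thread(uint32_t exc_return)
{
    return (exc_return & (1u << 3)) != 0u;
}

/* Bit 2: 1 = unstack from PSP, 0 = unstack from MSP. */
static inline bool exc_return_uses_psp(uint32_t exc_return)
{
    return (exc_return & (1u << 2)) != 0u;
}

/* Bit 4: 0 = extended stack frame with floating-point state. */
static inline bool exc_return_has_fp_frame(uint32_t exc_return)
{
    return (exc_return & (1u << 4)) == 0u;
}
```

For example, `0xFFFFFFFD` means "return to Thread mode using PSP, basic frame", while `0xFFFFFFF1` returns to Handler mode (a nested exception).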
Debugging
The debugging hardware of the Cortex-M processor is based on the CoreSight architecture.
Unlike traditional ARM processors, the CPU core itself does not have a Joint Test Action Group (JTAG) interface. Instead, the debug interface module is decoupled from the core, and a Debug Access Port (DAP) bus interface is provided at the core level. Through this bus interface, external debuggers can access the control registers of the debug hardware, as well as system memory, even while the processor is running.
Chip manufacturers can also include an Embedded Trace Macrocell (ETM) to allow instruction trace.
The data watchpoint function is provided by the Data Watchpoint and Trace (DWT) unit.
Execution Paths
Operation type/path
| Operation Type | Path Used |
|---|---|
| Instruction fetch | I-Code |
| Vector fetch | I-Code |
| Data load | D-Code/System |
| Data store | System |
| Exception stacking | System |
| PPB access | System |
| ITCM fetch | ITCM |
| DTCM access | DTCM |
ARM recommends that you locate the vector table in either the CODE, SRAM, External RAM, or External Device areas of the system memory map.
Using the Peripheral, Private peripheral bus, or Vendor-specific memory areas can lead to unpredictable behavior in some systems.
This is because the processor uses different interfaces for load/store instructions and vector fetch in these memory areas.
If the vector table is located in a region of memory that is cacheable, you must treat any load or store to the vector as self-modifying code and use cache maintenance instructions to synchronize the update to the data and instruction caches.
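When the vector table is relocated (for example copied into RAM), its address must also satisfy the ARMv7-M alignment rule: a power of two at least as large as the table, with a minimum of 128 bytes. The sketch below computes that alignment; the helper names and the `ram_table`/`rom_table` names in the comment are hypothetical.

```c
#include <stdint.h>

/* Required vector table alignment in bytes for a given number of IRQs. */
static inline uint32_t vtor_alignment(uint32_t num_irqs)
{
    uint32_t bytes = (16u + num_irqs) * 4u;  /* 16 system entries + IRQs */
    uint32_t align = 128u;                   /* architectural minimum */
    while (align < bytes) {
        align <<= 1;
    }
    return align;
}

/* Nonzero if addr is a legal VTOR value for that many IRQs. */
static inline int vtor_address_ok(uint32_t addr, uint32_t num_irqs)
{
    return (addr & (vtor_alignment(num_irqs) - 1u)) == 0u;
}

/*
 * On target, a relocation into cacheable RAM could look roughly like:
 *
 *   memcpy(ram_table, rom_table, table_bytes);
 *   SCB_CleanDCache_by_Addr((uint32_t *)ram_table, table_bytes);
 *   SCB_InvalidateICache();
 *   SCB->VTOR = (uint32_t)ram_table;
 *   __DSB();
 *   __ISB();
 */
```

With 240 external interrupts the table has 256 entries, so it must sit on a 1024-byte boundary.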
If code in ITCM:
Core → ITCM (direct, no cache, deterministic)
If code in MAIN RAM:
Core → I-Cache → AXI → NoC → MAIN RAM
If code in HyperRAM:
Core → I-Cache → AXI → NoC → qSPI → HyperBus → HyperRAM
This path is:
- Longer
- Higher latency
- Dependent on qSPI configuration
- Dependent on burst mode
- Dependent on dummy cycles
Executing from HyperRAM means:
- Instruction fetch goes through I-Cache
- Cache miss triggers AXI burst
- AXI request goes through NoC
- Request reaches the qSPI controller
- qSPI translates to HyperBus transaction
- HyperRAM returns data
If any of these is misconfigured:
- wrong latency
- no burst
- MPU region wrong
- cache disabled
- qSPI not in memory-mapped mode
→ HardFault / IBUSERR / PRECISERR
If your Cortex-M7 (ARMv7-M) crashes when executing code from HyperRAM, this is almost always a cache, MPU, or memory-attribute issue, not a problem with the CPU itself.
MPU not configured for executable memory
By default:
- External memory may be marked XN (Execute Never)
- Or as Device memory
- Or as Strongly Ordered
If the MPU region is wrong → fault on the first instruction fetch (MemManage IACCVIOL, escalated to HardFault when the MemManage handler is not enabled)
Check:
- CFSR
- HFSR
- MMFSR
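MMFSR is the low byte of CFSR (BFSR is bits 15:8, UFSR bits 31:16), so a single read of CFSR covers all three. The following is a small sketch of a decoder for the bits most relevant to instruction fetch problems; the function names are made up for this example, and on target the input would come from `SCB->CFSR` (with `SCB->HFSR`, `SCB->MMFAR` and `SCB->BFAR` giving further detail).

```c
#include <stdint.h>

/* Selected CFSR bits (ARMv7-M). */
#define CFSR_IACCVIOL     (1u << 0)   /* MMFSR: instruction access violation */
#define CFSR_DACCVIOL     (1u << 1)   /* MMFSR: data access violation */
#define CFSR_IBUSERR      (1u << 8)   /* BFSR: bus error on instruction fetch */
#define CFSR_PRECISERR    (1u << 9)   /* BFSR: precise data bus error */
#define CFSR_IMPRECISERR  (1u << 10)  /* BFSR: imprecise data bus error */
#define CFSR_UNDEFINSTR   (1u << 16)  /* UFSR: undefined instruction */
#define CFSR_INVSTATE     (1u << 17)  /* UFSR: invalid EPSR state (Thumb bit) */

/* Nonzero if the fault was raised by the instruction fetch itself. */
static inline int cfsr_is_fetch_fault(uint32_t cfsr)
{
    return (cfsr & (CFSR_IACCVIOL | CFSR_IBUSERR)) != 0u;
}

/* Return a short description of the first recognised cause. */
static inline const char *cfsr_describe(uint32_t cfsr)
{
    if (cfsr & CFSR_IACCVIOL)    return "IACCVIOL: fetch from XN or MPU-protected memory";
    if (cfsr & CFSR_IBUSERR)     return "IBUSERR: bus error on instruction fetch";
    if (cfsr & CFSR_DACCVIOL)    return "DACCVIOL: data access blocked by MPU";
    if (cfsr & CFSR_PRECISERR)   return "PRECISERR: precise data bus error";
    if (cfsr & CFSR_IMPRECISERR) return "IMPRECISERR: imprecise data bus error";
    if (cfsr & CFSR_UNDEFINSTR)  return "UNDEFINSTR: fetched data is not valid code";
    if (cfsr & CFSR_INVSTATE)    return "INVSTATE: branch target without Thumb bit";
    return "no recognised cause";
}
```

In a HardFault handler, a set FORCED bit in HFSR indicates that one of these configurable faults was escalated, so CFSR is the register to decode.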
I-Cache enabled but memory marked non-cacheable
If:
- I-Cache ON
- HyperRAM region is Normal memory but its attributes (TEX, C, B, S) are configured incorrectly
You can get:
- Instruction fetch fault (IBUSERR)
- Bus fault
- Random crash
HyperRAM latency too slow for instruction fetch
HyperRAM is:
- High latency
- External
- Variable latency (refresh inside device)
Instruction fetch is very timing sensitive.
If:
- Controller not configured for burst
- Configured latency lower than the device requires
- Wrong dummy cycles
CPU fetches corrupted data → invalid instruction → crash.
D-Cache coherence issue
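If code was copied to HyperRAM and the D-Cache was not cleaned before execution jumps there, the CPU fetches stale data: the copied instructions may still sit in the D-Cache, and the I-Cache may hold old contents of the same addresses.
You should call: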
SCB_CleanDCache_by_Addr(...)
SCB_InvalidateICache();
__DSB();
__ISB();
before jumping.
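The clean operation works on whole cache lines (32 bytes on Cortex-M7), so the range passed to `SCB_CleanDCache_by_Addr` is best expanded to line boundaries. A minimal sketch of that calculation follows; `cache_align_range`, `cache_range_t`, `dst` and `len` are hypothetical names, and aligning explicitly avoids relying on how a particular CMSIS version treats unaligned ranges.

```c
#include <stdint.h>

#define CACHE_LINE 32u  /* Cortex-M7 L1 cache line size in bytes */

typedef struct {
    uint32_t start;  /* line-aligned start address */
    uint32_t size;   /* size in bytes, multiple of CACHE_LINE */
} cache_range_t;

/* Expand [addr, addr + len) to whole cache lines. */
static inline cache_range_t cache_align_range(uint32_t addr, uint32_t len)
{
    cache_range_t r;
    uint32_t end = (addr + len + CACHE_LINE - 1u) & ~(CACHE_LINE - 1u);
    r.start = addr & ~(CACHE_LINE - 1u);
    r.size = end - r.start;
    return r;
}

/*
 * On target, after copying len bytes of code to dst:
 *
 *   cache_range_t r = cache_align_range((uint32_t)dst, len);
 *   SCB_CleanDCache_by_Addr((uint32_t *)r.start, (int32_t)r.size);
 *   SCB_InvalidateICache();
 *   __DSB();
 *   __ISB();
 */
```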
HyperRAM controller not configured for memory-mapped mode
Many OctoSPI/HyperBus controllers require:
- Memory-mapped mode enabled
- Proper wrap/burst configuration
- Linear addressing mode
If not → instruction fetch = garbage.
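One cheap way to catch garbage before jumping is to read the first two words through the memory-mapped window and check that they look like a vector table. This is only a sketch and assumes the image placed in HyperRAM starts with a vector table (initial SP, then reset handler); the function names and the address ranges passed in are assumptions, not part of this platform's boot code.

```c
#include <stdint.h>

/* Nonzero if a lies in [base, base + size). */
static inline int in_range(uint32_t a, uint32_t base, uint32_t size)
{
    return a >= base && (a - base) < size;
}

/*
 * image[0] = initial stack pointer, image[1] = reset handler address.
 * Returns nonzero if both values are plausible.
 */
static inline int image_looks_valid(const uint32_t *image,
                                    uint32_t code_base, uint32_t code_size,
                                    uint32_t ram_base, uint32_t ram_size)
{
    uint32_t sp = image[0];
    uint32_t reset = image[1];

    if (sp == 0u || sp == 0xFFFFFFFFu) return 0;  /* blank or floating bus */
    if ((sp & 7u) != 0u) return 0;                /* stack must be 8-byte aligned */
    if (!in_range(sp - 1u, ram_base, ram_size)) return 0;  /* SP may equal top of RAM */
    if ((reset & 1u) == 0u) return 0;             /* Thumb bit must be set */
    return in_range(reset & ~1u, code_base, code_size);
}
```

A passing check does not prove the controller is configured correctly, but an all-ones or misaligned value is a strong hint that memory-mapped mode, wrap, or latency settings are wrong.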