ARM Cortex M RTOS Context Switching

30 Oct 2019 by Chris Coleman

Many embedded systems reach a level of complexity where having a basic set of scheduling primitives and ability to run different tasks can be helpful. The operation of switching from one task to another is known as a context switch. A Real Time Operating System (RTOS) will typically provide this functionality. Having a foundational knowledge of how the core of an RTOS works can be a valuable skill set for an embedded engineer to have.

In this article we will explore how context switching works on ARM Cortex-M MCUs. We will discuss how the hardware was designed to support this operation, features that impact the context switching implementation such as the Floating Point Unit (FPU), and common pitfalls seen when porting an RTOS to a platform. We will also walk through a practical example of analyzing the FreeRTOS context switcher, xPortPendSVHandler, utilizing gdb to strengthen our understanding.

If you’d rather listen to me present this information and see some demos in action, watch this webinar recording.


Table of Contents

  • Cortex-M ARM MCU Features
  • Cortex-M Operation Modes
  • Registers
  • Stack Pointers
  • Context State Stacking
  • RTOS Context Switching
  • Demystifying the FreeRTOS Context Switcher
  • The Port
  • Compiling the code and launching it with GDB
  • Context Switching
  • Starting the FreeRTOS Scheduler
  • Closing
  • Reference Links

Cortex-M ARM MCU Features

To understand how RTOS context switching works for ARM Cortex-M MCUs, it’s critical to have foundational knowledge about the primitives the architecture provides to make it possible.

In this section we go through these building blocks by distilling down the information spread across the ARM Cortex-M reference manuals and the ARM Architecture Procedure Calling Standard (AAPCS) [1], which defines the Application Binary Interface (ABI) a compiler must abide by for ARM.

NOTE: If you already have a good understanding of these concepts, feel free to switch over this section (pun intended).

Cortex-M Operation Modes

When a Cortex-M based MCU is running from an exception handler such as an Interrupt Service Routine (ISR), it is known as running in Handler Mode. The rest of the time the MCU runs in Thread Mode.

The core can operate at either a privileged or unprivileged level. Certain instructions and operations are only allowed when the software is executing as privileged. For example, unprivileged code may not access NVIC registers. In Handler Mode, the core is always privileged. In Thread Mode, the software can execute at either level.

Switching Thread Mode from the unprivileged to privileged level can only happen when running from Handler Mode.

These different configurations enable use cases where privileged code, such as the RTOS kernel, can be better sandboxed from unprivileged application code. We will cycle back to this terminology throughout the article.
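To make the mode and privilege rules concrete, here is a minimal host-side sketch in C. It assumes we are handed raw IPSR and CONTROL register values (for example, copied out of a debugger); the type and function names are hypothetical and are not part of any vendor header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { MODE_THREAD, MODE_HANDLER } core_mode_t;

typedef struct {
    core_mode_t mode;
    bool privileged;
} exec_state_t;

/* IPSR holds the active exception number in bits [8:0]; 0 means Thread Mode. */
#define IPSR_EXCEPTION_MASK 0x1FFu
/* CONTROL.nPRIV (bit 0): 0 = privileged Thread Mode, 1 = unprivileged. */
#define CONTROL_NPRIV_MASK  0x1u

/* Hypothetical helper: classify the execution state from two register values. */
exec_state_t classify_exec_state(uint32_t ipsr, uint32_t control)
{
    exec_state_t state;
    if ((ipsr & IPSR_EXCEPTION_MASK) != 0u) {
        /* Any active exception means Handler Mode, which is always privileged. */
        state.mode = MODE_HANDLER;
        state.privileged = true;
    } else {
        state.mode = MODE_THREAD;
        state.privileged = (control & CONTROL_NPRIV_MASK) == 0u;
    }
    return state;
}

void classify_exec_state_demo(void)
{
    /* Exception 14 is PendSV: Handler Mode, privileged even if nPRIV is set. */
    exec_state_t in_pendsv = classify_exec_state(14u, 1u);
    assert(in_pendsv.mode == MODE_HANDLER && in_pendsv.privileged);

    /* Thread Mode with nPRIV set runs unprivileged. */
    exec_state_t user_task = classify_exec_state(0u, 1u);
    assert(user_task.mode == MODE_THREAD && !user_task.privileged);
}
```

The demo mirrors the rule in the text: the nPRIV bit only matters in Thread Mode, because Handler Mode ignores it.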


Compiling the code and launching it with GDB

Compile the code

    $ make
    Compiling main.c
    Compiling startup.c
    Compiling freertos_kernel/tasks.c
    Compiling freertos_kernel/queue.c
    Compiling freertos_kernel/list.c
    Compiling freertos_kernel/timers.c
    Compiling freertos_kernel/portable/GCC/ARM_CM4F/port.c
    Compiling freertos_kernel/portable/MemMang/heap_1.c
    Linking library
    Generated build/nrf52.elf

In one terminal, start a GDB Server

    $ JLinkGDBServer -if swd -device nRF52840_xxAA
    SEGGER J-Link GDB Server V6.52a Command Line Version

Flash the code on the nRF52 and start gdb

    $ arm-none-eabi-gdb-py --eval-command="target remote localhost:2331" --ex="mon reset" --ex="load" --ex="mon reset" --se=build/nrf52.elf
    GNU gdb (GNU Tools for Arm Embedded Processors 8-2019-q3-update) 8.3.0.20190703-git
    Copyright (C) 2019 Free Software Foundation, Inc.
    [...]
    Resetting target
    Loading section .interrupts, size 0x40 lma 0x0
    Loading section .text, size 0x194d lma 0x40
    Loading section .data, size 0x4 lma 0x1990
    Start address 0x40, load size 6545
    Transfer rate: 2130 KB/sec, 2181 bytes/write.
    Resetting target
    (gdb)


A context switch may also occur outside of the periodic tick if a task decides to yield (portYIELD).

Both of these paths trigger the PendSV exception, where the real magic happens:

    // FreeRTOSConfig.h
    #define vPortSVCHandler SVC_Handler
    #define xPortPendSVHandler PendSV_Handler
    #define xPortSysTickHandler SysTick_Handler

    // port.c
    void xPortPendSVHandler( void )
    {
    /* This is a naked function. */

    __asm volatile
    (
    "       mrs r0, psp                     \n"
    "       isb                             \n"
    "                                       \n"
    "       ldr     r3, pxCurrentTCBConst   \n" /* Get the location of the current TCB. */
    "       ldr     r2, [r3]                \n"
    "                                       \n"
    "       tst r14, #0x10                  \n" /* Is the task using the FPU context?  If so, push high vfp registers. */
    "       it eq                           \n"
    "       vstmdbeq r0!, {s16-s31}         \n"
    "                                       \n"
    "       stmdb r0!, {r4-r11, r14}        \n" /* Save the core registers. */
    "       str r0, [r2]                    \n" /* Save the new top of stack into the first member of the TCB. */
    "                                       \n"
    "       stmdb sp!, {r0, r3}             \n"
    "       mov r0, %0                      \n"
    "       msr basepri, r0                 \n"
    "       dsb                             \n"
    "       isb                             \n"
    "       bl vTaskSwitchContext           \n"
    "       mov r0, #0                      \n"
    "       msr basepri, r0                 \n"
    "       ldmia sp!, {r0, r3}             \n"
    "                                       \n"
    "       ldr r1, [r3]                    \n" /* The first item in pxCurrentTCB is the task top of stack. */
    "       ldr r0, [r1]                    \n"
    "                                       \n"
    "       ldmia r0!, {r4-r11, r14}        \n" /* Pop the core registers. */
    "                                       \n"
    "       tst r14, #0x10                  \n" /* Is the task using the FPU context?  If so, pop the high vfp registers too. */
    "       it eq                           \n"
    "       vldmiaeq r0!, {s16-s31}         \n"
    "                                       \n"
    "       msr psp, r0                     \n"
    "       isb                             \n"
    "                                       \n"
    "                                       \n"
    "       bx r14                          \n"
    "                                       \n"
    "       .align 4                        \n"
    "pxCurrentTCBConst: .word pxCurrentTCB  \n"
    ::"i"(configMAX_SYSCALL_INTERRUPT_PRIORITY)
    );
    }


    // step over the first two instructions (can also type "si 2" for short)
    (gdb) stepi 2

Moving on we have:

    "       ldr     r3, pxCurrentTCBConst   \n" /* Get the location of the current TCB. */
    "       ldr     r2, [r3]                \n"

    [...]
    "pxCurrentTCBConst: .word pxCurrentTCB  \n"

What’s happening here is that the word stored at the label pxCurrentTCBConst, which holds the address of the C variable pxCurrentTCB, is loaded into $r3. Then the value of pxCurrentTCB gets loaded into $r2. We can confirm this by stepping through the two instructions and comparing the registers with the C types.

    (gdb) stepi
    (gdb) p/x $r3
    $23 = 0x20000008
    (gdb) p &pxCurrentTCB
    $24 = (TCB_t * volatile *) 0x20000008
    (gdb) stepi
    (gdb) p/x $r2
    $25 = 0x20000610
    (gdb) p pxCurrentTCB
    $26 = (TCB_t * volatile) 0x20000610 <ucHeap+1208>

Next we have:

    "       tst r14, #0x10                  \n" /* Is the task using the FPU context?  If so, push high vfp registers. */
    "       it eq                           \n"
    "       vstmdbeq r0!, {s16-s31}         \n"

This set of instructions checks whether the FPU context was active prior to exception entry. This can be resolved by reading the information passed via the $lr register on exception entry (more details above). If bit 4 (mask 0x10) is 0, the FPU context is active; otherwise it is not. The tst (Test) instruction performs a bitwise AND between the register and the immediate value provided (so in this case r14 & 0x10). It then updates the condition flags in the PSR based on the result. These include the Zero condition flag, which is set when the result of the AND is zero.

The it (If-Then) instruction is then used to conditionally execute further instructions based on the current state in the condition flags. it eq is shorthand for “if the result of the last comparison was zero then execute the instruction that follows”. The following instruction pushes the callee-saved floating point registers onto the psp (currently stored in r0). In our case the result is not zero so this instruction should be skipped. We should see the psp remains unchanged from the value we originally dumped:

    (gdb) p/x $r0
    $1 = 0x200005a0
    (gdb) si 4
    (gdb) x/i $pc
    => 0x1446 <PendSV_Handler+22>: stmdb r0!, {r4, r5, r6, r7, r8, r9, r10, r11, lr}
    (gdb) p/x $r0
    $1 = 0x200005a0

WARNING: Over the years I’ve seen a lot of nasty stack overflows arise here which can be tricky to track down. As soon as an FPU instruction is used, an additional 132 bytes of FPU state (s0-s31 plus FPSCR) will be pushed on the stack, which can lead to unexpected overflows of small embedded stacks.
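To put numbers on that warning, here is a minimal host-side sketch of the arithmetic. It assumes the ARMv7-M frame layout with the FP extension: an 8-word basic hardware frame, 18 extra hardware-stacked words when an FP context is active (s0-s15, FPSCR, and one reserved word), and the software-saved registers from the handler above. The function names are hypothetical, and any 4-byte alignment padding the core may insert on exception entry is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXC_RETURN_FTYPE_MASK 0x10u /* bit 4: 0 = extended (FP) frame, 1 = basic frame */

#define WORD_BYTES            4u
#define HW_BASIC_FRAME_WORDS  8u    /* r0-r3, r12, lr, return address, xPSR */
#define HW_FP_EXTRA_WORDS     18u   /* s0-s15, FPSCR, one reserved word */
#define SW_CORE_WORDS         9u    /* r4-r11 and r14, saved by the handler */
#define SW_FP_WORDS           16u   /* s16-s31, saved by the handler */

/* Mirrors "tst r14, #0x10" followed by "it eq": true when bit 4 is clear. */
bool exc_return_has_fp_context(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_FTYPE_MASK) == 0u;
}

/* Bytes of task stack consumed by one context switch (alignment padding ignored). */
uint32_t context_switch_stack_bytes(uint32_t exc_return)
{
    uint32_t words = HW_BASIC_FRAME_WORDS + SW_CORE_WORDS;
    if (exc_return_has_fp_context(exc_return)) {
        words += HW_FP_EXTRA_WORDS + SW_FP_WORDS;
    }
    return words * WORD_BYTES;
}

void context_switch_stack_demo(void)
{
    /* 0xFFFFFFFD is the lr value seen in this walkthrough: no FP context. */
    assert(!exc_return_has_fp_context(0xFFFFFFFDu));
    assert(context_switch_stack_bytes(0xFFFFFFFDu) == 68u);

    /* 0xFFFFFFED is the same return type with bit 4 cleared: FP context active. */
    assert(exc_return_has_fp_context(0xFFFFFFEDu));
    assert(context_switch_stack_bytes(0xFFFFFFEDu) == 204u);
}
```

The difference between the two cases is 136 bytes: the 132 bytes of register state mentioned above plus the one reserved word in the hardware frame. When sizing a task stack, it is safest to budget for the larger figure if there is any chance the task touches the FPU.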

This brings us to the next part which is pretty self explanatory:

    "       stmdb r0!, {r4-r11, r14}        \n" /* Save the core registers. */
    "       str r0, [r2]                    \n" /* Save the new top of stack into the first member of the TCB. */

We push all the callee-saved core registers onto the process stack using the stmdb (Store Multiple, Decrement Before) instruction and then update the first word of the TCB that pxCurrentTCB points to with the new stack location (stored in r0).
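If the addressing modes are unfamiliar, the following host-side C model may help. It is only a sketch: `mini_tcb_t` is a hypothetical stand-in for the real TCB, the stack is a plain array, and the restore function models the mirror-image `ldmia` step the handler performs later when switching a task back in.

```c
#include <assert.h>
#include <stdint.h>

#define SAVED_CORE_REGS 9u /* r4-r11 and r14, in that order */

/* Hypothetical stand-in for the TCB: only the first member matters here. */
typedef struct {
    uint32_t *top_of_stack; /* must stay at offset 0, like pxTopOfStack */
} mini_tcb_t;

/* Models "stmdb r0!, {r4-r11, r14}" followed by "str r0, [r2]". */
uint32_t *save_core_context(uint32_t *psp, const uint32_t regs[SAVED_CORE_REGS],
                            mini_tcb_t *tcb)
{
    /* Decrement before: the lowest-numbered register lands at the lowest address. */
    psp -= SAVED_CORE_REGS;
    for (uint32_t i = 0; i < SAVED_CORE_REGS; i++) {
        psp[i] = regs[i];
    }
    tcb->top_of_stack = psp;
    return psp;
}

/* Models "ldr r0, [r1]" followed by "ldmia r0!, {r4-r11, r14}". */
uint32_t *restore_core_context(const mini_tcb_t *tcb, uint32_t regs[SAVED_CORE_REGS])
{
    uint32_t *psp = tcb->top_of_stack;
    for (uint32_t i = 0; i < SAVED_CORE_REGS; i++) {
        regs[i] = psp[i];
    }
    return psp + SAVED_CORE_REGS; /* increment after */
}

void save_restore_demo(void)
{
    uint32_t stack[32] = {0};
    uint32_t *psp = &stack[32]; /* full-descending stack, initially empty */
    const uint32_t regs[SAVED_CORE_REGS] = {4, 5, 6, 7, 8, 9, 10, 11, 0xFFFFFFFDu};
    uint32_t restored[SAVED_CORE_REGS] = {0};
    mini_tcb_t tcb = {0};

    uint32_t *saved_top = save_core_context(psp, regs, &tcb);
    assert(saved_top == psp - 9); /* 9 words = 36 bytes below the old psp */
    assert(tcb.top_of_stack == saved_top);
    assert(saved_top[0] == 4u && saved_top[8] == 0xFFFFFFFDu);

    uint32_t *new_psp = restore_core_context(&tcb, restored);
    assert(new_psp == psp);
    assert(restored[0] == 4u && restored[8] == 0xFFFFFFFDu);
}
```

Because the stack pointer is the first member of the TCB, a pointer to the TCB is also a pointer to the saved stack location, which is why the assembly can store to `[r2]` with no offset.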

pxCurrentTCB is a FreeRTOS symbol that is always populated with the running task. A TCB (Task Control Block) contains various state associated with the task. Looking at the source code we see the first word is:

    typedef struct tskTaskControlBlock /* The old naming convention is used to prevent breaking kernel aware debuggers. */
    {
        volatile StackType_t *pxTopOfStack; /*< Points to the location of the last item placed on the tasks stack.  THIS MUST BE THE FIRST MEMBER OF THE TCB STRUCT. */
        [...]

We can also confirm this is what is happening by comparing the C types with register values within gdb:

    (gdb) si
    0x0000144a 435 __asm volatile
    (gdb) p/x $r0
    $2 = 0x2000057c
    (gdb) x/i $pc
    => 0x144a <PendSV_Handler+26>: str r0, [r2, #0]
    (gdb) p/x pxCurrentTCB->pxTopOfStack
    $3 = 0x200005a4
    (gdb) si
    0x0000144c 435 __asm volatile
    (gdb) p/x pxCurrentTCB->pxTopOfStack
    $4 = 0x2000057c

Awesome! At this point we have saved all the register state of the original task and recorded that location within the task specific TCB context. Now it’s time to actually context switch over to a new task:

    "       stmdb sp!, {r0, r3}             \n"
    "       mov r0, %0                      \n"
    "       msr basepri, r0                 \n"
    "       dsb                             \n"
    "       isb                             \n"
    "       bl vTaskSwitchContext           \n"

    [...]
    ::"i"(configMAX_SYSCALL_INTERRUPT_PRIORITY)

We see the port uses a GCC feature known as Extended Asm [15] to mix C macros with ARM assembly. This block prepares to call the context switch logic, vTaskSwitchContext, a C function which determines the next task to run. First, r0 and r3 (caller-saved “argument” registers that the C function is free to clobber) are saved on the active stack (always msp for exceptions). Next, interrupts with a priority at or below configMAX_SYSCALL_INTERRUPT_PRIORITY are masked, since the data structures used by vTaskSwitchContext can only be accessed safely without interruption. If these interrupts were not masked, the context switching code could be preempted and a call to a FreeRTOS *_FromISR() API could corrupt the data structures.
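The masking rule behind `msr basepri, r0` can be sketched as a small predicate. This is a host-side model only, assuming priorities are already in the 8-bit format the hardware registers use and considering only exceptions with configurable priority. The function name and the 0x40 threshold are hypothetical and are not taken from this project's FreeRTOSConfig.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: is an interrupt of the given priority blocked by BASEPRI? */
bool irq_masked_by_basepri(uint8_t basepri, uint8_t irq_priority)
{
    if (basepri == 0u) {
        return false; /* BASEPRI == 0 disables masking entirely */
    }
    /* A lower numeric value is more urgent; same-or-less urgent is masked. */
    return irq_priority >= basepri;
}

void basepri_demo(void)
{
    const uint8_t threshold = 0x40u; /* hypothetical syscall priority threshold */

    assert(!irq_masked_by_basepri(threshold, 0x20u)); /* more urgent: still fires */
    assert(irq_masked_by_basepri(threshold, 0x40u));  /* same level: masked */
    assert(irq_masked_by_basepri(threshold, 0xE0u));  /* less urgent: masked */

    /* Writing 0 back to BASEPRI, as the handler does after the call, unmasks everything. */
    assert(!irq_masked_by_basepri(0u, 0xE0u));
}
```

One consequence worth noticing is that interrupts more urgent than the threshold keep running during the context switch, which is why they must never call FreeRTOS APIs.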

When lowering the effective execution level, an isb instruction is required for the new priority to be visible for future instructions. The dsb instruction shouldn’t be explicitly necessary here [16]. Finally we call the C function. From FreeRTOS documentation we can conclude what will happen:

    /*
     * THIS FUNCTION MUST NOT BE USED FROM APPLICATION CODE.  IT IS ONLY
     * INTENDED FOR USE WHEN IMPLEMENTING A PORT OF THE SCHEDULER AND IS
     * AN INTERFACE WHICH IS FOR THE EXCLUSIVE USE OF THE SCHEDULER.
     *
     * Sets the pointer to the current TCB to the TCB of the highest priority task
     * that is ready to run.
     */
    portDONT_DISCARD void vTaskSwitchContext( void ) PRIVILEGED_FUNCTION;

So when the function returns pxCurrentTCB should be populated with the new task to switch to. Let’s try it out!

    // display the name
    (gdb) p pxCurrentTCB->pcTaskName
    $6 = "Ping", '\000' <repeats 11 times>
    (gdb) si 5
    0x00001460 435 __asm volatile
    (gdb) x/i $pc
    => 0x1460 <PendSV_Handler+48>: bl 0x760
    // step over the function call using "next instruction" ("ni")
    (gdb) ni
    0x00001464 435 __asm volatile
    (gdb) x/i $pc
    => 0x1464 <PendSV_Handler+52>: mov.w r0, #0
    (gdb) p pxCurrentTCB->pcTaskName
    $7 = "Pong", '\000' <repeats 11 times>

We see that the pxCurrentTCB has changed from the “Ping” task to the “Pong” task.
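For intuition about what "highest priority task that is ready to run" means, here is a deliberately simplified selection routine. It is not how FreeRTOS implements vTaskSwitchContext (the kernel keeps per-priority ready lists and rotates among tasks of equal priority). The struct, the function name, and the priorities assigned to the tasks are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    unsigned priority; /* larger number = higher priority, as in FreeRTOS */
    bool ready;
} toy_task_t;

/* Returns the highest-priority ready task, or NULL if none is ready. */
const toy_task_t *toy_select_next(const toy_task_t *tasks, size_t count)
{
    const toy_task_t *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (tasks[i].ready && (best == NULL || tasks[i].priority > best->priority)) {
            best = &tasks[i];
        }
    }
    return best;
}

void toy_select_demo(void)
{
    /* Hypothetical snapshot: "Ping" has just blocked, "Pong" is ready to run. */
    const toy_task_t tasks[] = {
        { "Ping", 2u, false },
        { "Pong", 2u, true  },
        { "IDLE", 0u, true  },
    };
    const toy_task_t *next = toy_select_next(tasks, sizeof(tasks) / sizeof(tasks[0]));
    assert(next != NULL);
    assert(strcmp(next->name, "Pong") == 0);
}
```

In a real system the idle task is always ready, so the selection never comes back empty; the NULL return exists only to keep the sketch well defined.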

Upon return from the function call, all interrupts are re-enabled by resetting basepri to 0, and the values of r0 and r3 saved prior to the function invocation are restored by popping them off the stack. No synchronization instructions are required for the msr call because the ARM core will actually take care of this for you when the execution priority increases [17]. Let’s step over this block:

    "       mov r0, #0                      \n"
    "       msr basepri, r0                 \n"
    "       ldmia sp!, {r0, r3}             \n"

    (gdb) si 3
    0x0000146e in PendSV_Handler () at freertos_kernel/portable/GCC/ARM_CM4F/port.c:435
    435 __asm volatile
    (gdb) x/i $pc
    => 0x146e <PendSV_Handler+62>: ldr r1, [r3, #0]

Now it’s time to actually start up the new task. Recall when we saved the “Ping” task state above, we placed the location of the task stack in pxTopOfStack. So to recover the task state of the “Pong” task we just need to do the opposite. First this requires loading up the new TCB_t that pxCurrentTCB points to:

    "       ldr r1, [r3]                    \n" /* The first item in pxCurrentTCB is the task top of stack. */
    "       ldr r0, [r1]                    \n"

    (gdb) p/x pxCurrentTCB
    $8 = 0x200003b0
    (gdb) si
    0x00001470 435 __asm volatile
    (gdb) p/x $r1
    $11 = 0x200003b0
    (gdb) p/x pxCurrentTCB->pxTopOfStack
    $12 = 0x20000324
    (gdb) si
    0x00001472 435 __asm volatile
    (gdb) p/x $r0
    $13 = 0x20000324

r0 now holds a pointer to the top of the stack for the task we want to switch to. First we pop the callee-saved core registers using the ldmia (Load Multiple, Increment After) instruction, then we check the restored value in the $lr / $r14 register to determine if there is any FPU state which needs to be restored as well (in our case, there is none):

    "       ldmia r0!, {r4-r11, r14}        \n" /* Pop the core registers. */
    "                                       \n"
    "       tst r14, #0x10                  \n" /* Is the task using the FPU context?  If so, pop the high vfp registers too. */
    "       it eq                           \n"
    "       vldmiaeq r0!, {s16-s31}         \n"
    "                                       \n"

    (gdb) si 4
    0x00001480 435 __asm volatile
    (gdb) x/i $pc
    => 0x1480 <PendSV_Handler+80>: msr PSP, r0

Now r0 points to the location of the program stack exactly as it was when the “Pong” task originally got context switched out via the PendSV exception handler! Let’s take a look at the stack just like we did on exception entry:

    // contains $r0 value for "Pong" task
    (gdb) x/a $r0
    0x20000348 <ucHeap+496>: 0x0 <g_pfnVectors>
    (gdb) < enter for $r1 >
    0x2000034c <ucHeap+500>: 0x200003b4 <ucHeap+604>
    (gdb) < enter for $r2 >
    0x20000350 <ucHeap+504>: 0x20000000
    (gdb) < enter for $r3 >
    0x20000354 <ucHeap+508>: 0x10000000
    (gdb) < enter for $r12 >
    0x20000358 <ucHeap+512>: 0x0 <g_pfnVectors>
    (gdb) < enter for $r14 / $lr >
    0x2000035c <ucHeap+516>: 0x5d9 <xTaskResumeAll+40>
    (gdb) < enter for ReturnAddress - the pc that should be fetched upon return >
    0x20000360 <ucHeap+520>: 0xe2a <xQueueReceive+246>
    (gdb) < psr value>
    0x20000364 <ucHeap+524>: 0x61000000

The last thing we need to do is set the psp to the value in r0 and branch to the link register (r14), which holds the special EXC_RETURN value we just recovered from the “Pong” task stack with the ldmia instruction. This tells the hardware how to return to Thread Mode and correctly restore the context state that was automatically saved:

    "       msr psp, r0                     \n"
    "       isb                             \n"
    "                                       \n"
    "                                       \n"
    "       bx r14                          \n"
    "                                       \n"

Based on the dump of the top of the “Pong” stack we did above, we expect to see $sp=0x20000368, $pc=0xe2a, $lr=0x5d9 after the branch to r14. Let’s give it a try:

    // state prior to branching
    (gdb) info reg
    [...]
    sp             0x20002a88 0x20002a88
    lr             0xfffffffd 4294967293
    pc             0x1480 0x1480 <PendSV_Handler+80>
    xpsr           0x2100000e 553648142
    msp            0x20002a88 536881800
    psp            0x200005a0 536872352
    [...]
    (gdb) si 3
    // state after branching
    xQueueReceive (xQueue=0x20000160 <ucHeap+8>, pvBuffer=pvBuffer@entry=0x2000039c <ucHeap+580>, xTicksToWait=<optimized out>, xTicksToWait@entry=4294967295) at freertos_kernel/queue.c:1378
    1378 portYIELD_WITHIN_API();
    (gdb) info reg
    sp             0x20000368 0x20000368 <ucHeap+528>
    lr             0x5d9 1497
    pc             0xe2a 0xe2a <xQueueReceive+246>
    xpsr           0x61000000 1627389952
    msp            0x20002a88 536881800
    psp            0x20000368 536871784
    (gdb) bt
    #0 xQueueReceive (xQueue=0x20000160 <ucHeap+8>, pvBuffer=pvBuffer@entry=0x2000039c <ucHeap+580>, xTicksToWait=<optimized out>, xTicksToWait@entry=4294967295) at freertos_kernel/queue.c:1378
    #1 0x00000084 in prvQueuePongTask (pvParameters=<optimized out>) at main.c:41
    #2 0x00001334 in ?? () at freertos_kernel/portable/GCC/ARM_CM4F/port.c:703

The values match! We’ve successfully completed a context switch and are running the pong task!
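The prediction we just checked can also be reproduced on a host with a few lines of C. This sketch models only the basic (non-FP) hardware frame and the registers compared above; the struct and function names are hypothetical, and the frame values are copied from the gdb dump of the "Pong" stack.

```c
#include <assert.h>
#include <stdint.h>

/* The 8 words the hardware stacks on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, return_address, xpsr;
} basic_frame_t;

/* xPSR bit 9 in the stacked value records 4 bytes of alignment padding. */
#define XPSR_STACK_ALIGN_MASK (1u << 9)

typedef struct {
    uint32_t sp, lr, pc;
} resumed_regs_t;

/* Hypothetical model of what exception return does with a basic frame. */
resumed_regs_t unstack_basic_frame(uint32_t frame_addr, const basic_frame_t *frame)
{
    resumed_regs_t out;
    out.sp = frame_addr + (uint32_t)sizeof(basic_frame_t);
    if ((frame->xpsr & XPSR_STACK_ALIGN_MASK) != 0u) {
        out.sp += 4u; /* undo the padding word inserted on entry */
    }
    out.lr = frame->lr;
    out.pc = frame->return_address;
    return out;
}

void unstack_demo(void)
{
    /* Values from the "Pong" stack dump, with the frame starting at 0x20000348. */
    const basic_frame_t pong = {
        .r0 = 0x0u, .r1 = 0x200003b4u, .r2 = 0x20000000u, .r3 = 0x10000000u,
        .r12 = 0x0u, .lr = 0x5d9u, .return_address = 0xe2au, .xpsr = 0x61000000u,
    };
    resumed_regs_t regs = unstack_basic_frame(0x20000348u, &pong);
    assert(regs.sp == 0x20000368u);
    assert(regs.pc == 0xe2au);
    assert(regs.lr == 0x5d9u);
}
```

The 0x20 byte jump in the stack pointer is simply the size of the eight stacked words, which is a handy sanity check when reading exception frames by hand in a debugger.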

Starting the FreeRTOS Scheduler

The astute reader may wonder how the scheduler starts in the first place. What happens if a PendSV gets triggered but there isn’t a currently running task because the system just booted?!

There are several different strategies, but a common pattern an RTOS will follow when creating a new task is to initialize the task stack to look like it had been context switched out by the scheduler. The scheduler itself is then started by triggering an SVC exception with the svc instruction. This way, starting a thread is nearly identical to context switching to a thread.
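As a rough illustration of that pattern, the sketch below fills a fresh stack so that it looks like the task was just switched out by the PendSV handler walked through above: a basic hardware frame on top, followed by r14 (EXC_RETURN) and r4-r11. It is a host-side approximation and not the port's actual pxPortInitialiseStack. The function name is hypothetical and addresses are passed as plain 32-bit values so that it runs anywhere.

```c
#include <assert.h>
#include <stdint.h>

#define INITIAL_XPSR       0x01000000u /* only the Thumb state bit set */
#define INITIAL_EXC_RETURN 0xFFFFFFFDu /* return to Thread Mode, use psp, basic frame */

/* Hypothetical helper: returns the value to record as the task's top of stack. */
uint32_t *init_task_stack(uint32_t *stack_top, uint32_t entry_addr,
                          uint32_t param, uint32_t exit_addr)
{
    uint32_t *sp = stack_top;

    /* Hardware-stacked frame, pushed from the highest address down. */
    *--sp = INITIAL_XPSR;     /* xPSR */
    *--sp = entry_addr & ~1u; /* return address: where the task starts running */
    *--sp = exit_addr;        /* lr: where the task lands if it ever returns */
    *--sp = 0u;               /* r12 */
    *--sp = 0u;               /* r3 */
    *--sp = 0u;               /* r2 */
    *--sp = 0u;               /* r1 */
    *--sp = param;            /* r0: first argument to the task function */

    /* Software-saved registers, matching "stmdb r0!, {r4-r11, r14}". */
    *--sp = INITIAL_EXC_RETURN; /* r14 */
    for (int i = 0; i < 8; i++) {
        *--sp = 0u;             /* r11 down to r4 */
    }
    return sp;
}

void init_task_stack_demo(void)
{
    uint32_t stack[64] = {0};
    /* Hypothetical addresses; bit 0 is set as it would be for Thumb function pointers. */
    uint32_t *top = init_task_stack(&stack[64], 0x00000085u, 0x1234u, 0x00001335u);

    assert(top == &stack[64] - 17);    /* 8 hardware words + 9 software words */
    assert(top[8] == INITIAL_EXC_RETURN);
    assert(top[9] == 0x1234u);         /* r0 */
    assert(top[15] == 0x00000084u);    /* pc with bit 0 cleared */
    assert(top[16] == INITIAL_XPSR);
}
```

With a stack prepared this way, the first switch into a task can reuse the same restore sequence as every later switch.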

During initialization you will also usually find a couple extra configuration settings such as:

  • Configuration as to whether tasks operate at the privileged or unprivileged level
  • FP Extension configuration (i.e. whether or not the FPU is enabled and what context stacking scheme to use)

For example, the port used in the example does the following FPU configuration:

    static void vPortEnableVFP( void )
    {
        __asm volatile
        (
            "   ldr.w r0, =0xE000ED88       \n" /* The FPU enable bits are in the CPACR. */
            "   ldr r1, [r0]                \n"
            "                               \n"
            "   orr r1, r1, #( 0xf << 20 )  \n" /* Enable CP10 and CP11 coprocessors, then save back. */
            "   str r1, [r0]                \n"
            "   bx r14                      "
        );
    }
    [...]
    BaseType_t xPortStartScheduler( void )
    {
        [...]
        /* Ensure the VFP is enabled - it should be anyway. */
        vPortEnableVFP();

        /* Lazy save always. */
        *( portFPCCR ) |= portASPEN_AND_LSPEN_BITS;

        [...]

To start FreeRTOS and get tasks to run, vTaskStartScheduler needs to be called. If you are interested in taking a closer look at this logic, I’d recommend looking at the pxPortInitialiseStack, xPortStartScheduler, and vPortSVCHandler functions in port.c.
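The two register updates above boil down to setting a handful of bits. Here is a small host-side sketch of just that bit arithmetic, operating on plain values instead of the memory-mapped registers. The macro and function names are hypothetical; the bit positions are the architectural ones (CP10 and CP11 access fields in CPACR bits [23:20], LSPEN and ASPEN in FPCCR bits 30 and 31).

```c
#include <assert.h>
#include <stdint.h>

#define CPACR_CP10_CP11_FULL_ACCESS (0xFu << 20) /* CP10 bits [21:20], CP11 bits [23:22] */
#define FPCCR_LSPEN                 (1u << 30)   /* lazy state preservation enable */
#define FPCCR_ASPEN                 (1u << 31)   /* automatic FP state preservation enable */

/* Value to write back to CPACR to grant full access to the FPU. */
uint32_t cpacr_enable_fpu(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL_ACCESS;
}

/* Value to write back to FPCCR to select automatic, lazy FP context stacking. */
uint32_t fpccr_enable_lazy_stacking(uint32_t fpccr)
{
    return fpccr | FPCCR_ASPEN | FPCCR_LSPEN;
}

void fpu_config_demo(void)
{
    assert(cpacr_enable_fpu(0x00000000u) == 0x00F00000u);
    assert(fpccr_enable_lazy_stacking(0x00000000u) == 0xC0000000u);

    /* Both operations are read-modify-write: unrelated bits are preserved. */
    assert(cpacr_enable_fpu(0x00000003u) == 0x00F00003u);
}
```

On target, the results would be written to the registers at their fixed addresses, as the assembly in vPortEnableVFP does for the CPACR.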

Closing

We hope you learned something interesting about how the ARM Cortex-M architecture hardware helps to enable multi-tasking and developed a better understanding of how the FreeRTOS implementation works.

We’d love to hear about interesting RTOS bugs you have tracked down or other topics you would like to see covered. Let me know in the discussion area below!


Reference Links

[1] ARM Architecture Procedure Calling Standard (AAPCS)
[2] ARMv7-M Architecture Reference Manual
[3] ARMv8-M link
[4] Cortex-M4F Lazy Stacking and Context Switch App note
[5] See B1.5.6 Exception entry behavior
[6] FreeRTOS
[7] nRF52840 Development Kit
[8] JLinkGDBServer
[9] GNU ARM Embedded toolchain for download
[10] Creating a FreeRTOS Project
[11] Github FreeRTOS Kernel
[12] FreeRTOS Heap documentation
[13] FreeRTOSConfig.h documentation
[14] ISB after mrs
[15] Extended Asm
[16] Discussion about DSB in FreeRTOS port
[17] See “Visibility of changes in execution priority resulting from executing an MSR instruction”

Chris Coleman is a founder and CTO at Memfault. Prior to founding Memfault, Chris worked on the embedded software teams at Sun, Pebble, and Fitbit.
