In a previous blog, I showed how you can display the stacks of Micrium OS Kernel-based applications using µC/Probe. In this post, I’ll describe why it’s important to size your stacks at design time and to check task stacks at run-time to catch stack overflows. I will first explore how to determine the size of task stacks and then describe several ways to detect overflows. These detection methods are presented in order from most to least preferable, based on the likelihood of detecting the overflow.
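In a Micrium OS Kernel-based application (as with most real-time kernels), each task requires its own stack. The size of the stack required by a task is application specific. It’s possible to manually figure out the stack space needed by adding up:
- The memory required by all nested function calls: for each call, the return address, any arguments passed on the stack, and the local variables.
- The storage for a full CPU context (the saved CPU registers).
- The storage for another full CPU context for each nested ISR, if the CPU does not have a separate stack to handle ISRs.
- The stack space needed by those ISRs.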
Adding all this up is a tedious chore, and the resulting number is a minimum requirement. In practice, you would not size the stack that precisely; you want to plan for “surprises”, so the number you come up with should probably be multiplied by a safety factor, possibly 1.5 to 2.0. The calculation also assumes that the exact path of the code is known at all times, which is not always possible. Specifically, when calling a function such as printf(), it might be difficult or nearly impossible to even guess how much stack space printf() will require. Indirect function calls through tables of function pointers can also be problematic. Generally speaking, start with a fairly large stack and monitor stack usage at run-time to see how much stack space is actually used after the application runs for a while. For more information, see “Exploring the Micrium OS Kernel Built-In Performance Measurements” in the Blog section of the Silicon Labs website (www.silabs.com).
Also, avoid writing recursive code: its stack usage depends on how deeply the code recurses, which is often data-dependent and difficult to bound.
Some toolchains, such as those from Keil and IAR, provide this information in a link map: for each function, the link map indicates the worst-case stack usage. However, these tools do not account for indirect calls (i.e. function pointers) or assembly language routines. GCC offers partial support: it can report per-function stack usage (with the -fstack-usage option) but not a call graph. This information makes it much easier to evaluate the stack usage of each task. You still need to add the stack space for a full CPU context, plus another full CPU context for each nested ISR (if the CPU does not have a separate stack to handle ISRs), plus whatever stack space those ISRs need. Again, allow for a safety net and multiply this value by some factor.
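To make the arithmetic concrete, here is a minimal C sketch of the budget described above. The structure, its field names and the byte counts in the usage note are all hypothetical: the call-chain figure would come from your toolchain’s stack-usage report, and on a CPU with a separate ISR stack the two ISR terms would be zero. The safety factor is passed in tenths to stay in integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task stack budget, in bytes, for a 32-bit CPU. */
typedef struct {
    uint32_t call_chain_bytes;  /* worst-case nesting: return addresses, args, locals */
    uint32_t cpu_context_bytes; /* one full saved CPU context */
    uint32_t isr_nesting;       /* max nested ISRs sharing this stack (0 if separate ISR stack) */
    uint32_t isr_locals_bytes;  /* stack used by those ISRs' own code */
} stk_budget_t;

/* Returns the estimated stack size in bytes, scaled by a safety factor given
   in tenths (15 = 1.5x, 20 = 2.0x) and rounded up to whole 4-byte entries. */
static uint32_t stk_estimate_bytes(const stk_budget_t *b, uint32_t factor_x10)
{
    uint32_t raw;
    uint32_t scaled;

    assert(factor_x10 >= 10u);                    /* never shrink the estimate */
    raw = b->call_chain_bytes
        + b->cpu_context_bytes                    /* the task's own context */
        + b->isr_nesting * b->cpu_context_bytes   /* one context per nested ISR */
        + b->isr_locals_bytes;
    scaled = (raw * factor_x10 + 9u) / 10u;       /* apply factor, round up */
    return (scaled + 3u) & ~3u;                   /* round up to 4-byte entries */
}
```

With a 200-byte call chain, a 64-byte context and two nested ISRs using 40 bytes, the raw total is 432 bytes, which becomes 648 bytes at 1.5x and 864 bytes at 2.0x.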
If your kernel monitors stack usage at run-time then it’s a good idea to display that information and keep an eye on your stacks while developing and testing the product. Stack overflows are common and can lead to some curious behaviors. In fact, whenever someone mentions that his or her application behaves “strangely,” insufficient stack size is the first thing that comes to mind.
Just so that we are on the same page, below is a description of what a stack overflow is. For the sake of discussion, it’s assumed here that stacks grow from high-memory to low-memory. Of course, the same issue occurs when the stack grows in the other direction. Refer to Figure 1.
Figure 1 – Stack Overflow
F1-(1) The CPU’s SP (Stack Pointer) register points somewhere inside the stack space allocated for a task. The task is about to call the function foo() as shown in Listing 1.
Listing 1 – Example of possible stack overflow
F1-(2) Calling foo() causes the CPU to save the caller’s return address onto the stack. Of course, exactly what is saved, and where, depends greatly on the CPU and the compiler.
F1-(3) The compiler then adjusts the stack pointer to make room for local variables. Unfortunately, at this point we have overflowed the stack (the SP points outside the storage area assigned to the stack), and just about anything foo() does will corrupt whatever data lies beyond the stack base. Depending on the code flow, the array might never be used, in which case the problem would not be immediately apparent. However, if foo() calls another function, it is highly likely that something outside the stack will be touched.
F1-(4) So, when foo() starts to execute code, the stack pointer has an offset of 48 bytes from where it was prior to calling foo() (assuming a stack entry is 4 bytes wide).
F1-(5) We typically don’t know what resides here. It could be the stack of another task, it could be variables, data structures or an array used by the application. Overwriting whatever resides here can cause strange behaviors: values computed by another task may not be what you expected and could cause decisions in your code to take the wrong path, or your system may work fine under normal conditions but then fail. We just don’t know and it’s actually quite difficult to predict. In fact, the behavior can change each time you make changes to your code.
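A function of the kind Figure 1 describes might look like the following sketch. It is not the original listing: the 10-element int array is an assumption chosen so that a 4-byte return address, a 4-byte i and a 40-byte array add up to the 48-byte offset of F1-(4). Real compilers may also push saved registers or add padding, and some CPUs (Cortex-M, for instance) keep the return address in a link register and only push it when the callee itself calls another function.

```c
#include <stddef.h>

#define FOO_ARRAY_LEN    10u  /* assumed array size */
#define STK_ENTRY_BYTES   4u  /* assumed 32-bit stack entry */

/* A function whose locals alone need 11 stack entries. */
void foo(void)
{
    int i;
    int array[FOO_ARRAY_LEN];

    for (i = 0; i < (int)FOO_ARRAY_LEN; i++) {
        array[i] = i;
    }
    (void)array;  /* array is written but never read in this sketch */
}

/* Bytes the SP moves on entry to foo() under the simplified model of
   Figure 1: a pushed return address plus one entry per local int. */
static size_t foo_entry_offset_bytes(void)
{
    size_t entries = 1u               /* return address, F1-(2) */
                   + 1u               /* i */
                   + FOO_ARRAY_LEN;   /* array[], F1-(3) */
    return entries * STK_ENTRY_BYTES;
}
```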
There are a number of techniques that can be used to detect stack overflows. Some make use of hardware, while others are performed entirely in software. As we will see shortly, having the capability in hardware is preferable because overflows are detected almost as soon as they happen, which helps avoid those strange behaviors and makes them faster to diagnose.
Hardware stack overflow detection mechanisms generally trigger an exception handler. The exception handler typically saves the current PC (Program Counter) and possibly other CPU registers onto the current task’s stack. Because the exception occurs when we are attempting to access data outside of the stack, saving these registers will itself overwrite some variables or another stack in your application, assuming there is RAM beyond the base of the overflowed stack.
In most cases, the application developer will need to decide what to do about a stack overflow condition. Should the exception handler place the embedded system in a known safe state and reset the CPU, or simply do nothing? If you decide to reset the CPU, you might find a way to record that an overflow occurred, and which task caused it, so that you can notify a user after the reset.
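If you choose to reset, one common pattern is to leave a small record in RAM that the startup code does not clear, and check it at boot. The sketch below assumes such a RAM section exists; the names, the magic value and the check field are hypothetical, and how the section is declared (for example a `.noinit` section with GCC) depends on your toolchain and linker script. Storing only a task identifier keeps the handler’s own stack needs small, which matters because the handler may be running on the overflowed stack.

```c
#include <stdbool.h>
#include <stdint.h>

#define OVF_MAGIC 0x4F564631u  /* arbitrary marker value (hypothetical) */

typedef struct {
    uint32_t magic;    /* OVF_MAGIC when the record is valid */
    uint32_t task_id;  /* hypothetical identifier of the offending task */
    uint32_t check;    /* ~task_id, guards against random RAM contents */
} ovf_record_t;

/* On target, this would live in RAM that startup code does not clear,
   e.g. via a toolchain-specific section attribute (assumption). */
static ovf_record_t ovf_record;

/* Called from the overflow exception handler, just before resetting. */
static void ovf_record_save(uint32_t task_id)
{
    ovf_record.task_id = task_id;
    ovf_record.check   = (uint32_t)~task_id;
    ovf_record.magic   = OVF_MAGIC;   /* written last: marks record valid */
}

/* Called once at boot: returns true and the task ID if the previous run
   ended in a stack overflow, then invalidates the record. */
static bool ovf_record_take(uint32_t *task_id)
{
    bool valid = (ovf_record.magic == OVF_MAGIC)
              && (ovf_record.check == (uint32_t)~ovf_record.task_id);

    if (valid) {
        *task_id = ovf_record.task_id;
    }
    ovf_record.magic = 0u;
    return valid;
}
```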
Some processors (unfortunately very few of them) have simple yet highly effective stack pointer overflow detection registers. This feature is, however, available on processors based on the Armv8-M CPU architecture. When the CPU’s stack pointer goes below (or above, depending on stack growth) the value set in this register (let’s call it the SP_Limit register), an exception is generated. The drawing in Figure 2 shows how this works.
Figure 2 – Using a Stack Limit Register to Detect Stack Overflows
F2-(1) The SP_Limit register is loaded by the context switch code of the kernel when the task is switched in.
F2-(2) The location SP_Limit points to could be at the very base of the stack or, preferably, far enough above it to leave the exception handler room to save registers on the offending stack while handling the exception.
F2-(3) As the stack grows, if the SP register ever goes below SP_Limit, an exception is generated. As we’ve seen, when your code calls a function that uses local variables, the SP can easily end up outside the stack upon entry to the function. One way to reduce the likelihood of this happening is to move SP_Limit further away from the Stack Base Address.
The Micrium OS Kernel was designed from the get-go to support CPUs with a stack limit register. Each task has its own value to load into SP_Limit, and this value is stored in the task’s Task Control Block (TCB). Because the value differs from task to task, the SP_Limit register used by the CPU’s stack overflow detection hardware must be changed whenever the Micrium OS Kernel performs a context switch. The sequence of events must be performed in the following order:
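1- Set SP_Limit to 0. This ensures the stack pointer is never below SP_Limit while the new task’s SP is being loaded; otherwise, loading an SP that lies below the previous task’s limit would trigger a false exception. Note that I assumed here that the stack grows from high memory to low memory, but the concept works in a similar fashion if the stack grows in the opposite direction.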
2- Load the SP register.
3- Get the value of the SP_Limit that belongs to the new task from its TCB. Set the SP_Limit register to this value.
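The three steps can be modelled in plain C to show why the order matters. In this sketch, `cpu_sp` and `cpu_sp_limit` are ordinary variables standing in for the CPU registers (on Armv8-M the actual limit registers are MSPLIM and PSPLIM), the TCB layout is hypothetical, and stacks grow down. The `margin_bytes` argument places SP_Limit above the stack base to leave room for the exception frame, as F2-(2) suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated CPU registers (stand-ins for SP and SP_Limit). */
static uintptr_t cpu_sp;
static uintptr_t cpu_sp_limit;

/* Hypothetical minimal TCB. */
typedef struct {
    uintptr_t stk_ptr;    /* SP saved when the task was switched out */
    uintptr_t stk_limit;  /* value to load into SP_Limit */
} tcb_t;

/* Limit placed margin_bytes above the stack base (stack grows down). */
static uintptr_t stk_limit_for(uintptr_t stk_base, uintptr_t margin_bytes)
{
    return stk_base + margin_bytes;
}

/* Models the hardware comparison; a real CPU raises an exception here. */
static int sp_below_limit(void)
{
    return cpu_sp < cpu_sp_limit;
}

/* Switch-in sequence, in the order given above. */
static void ctx_switch_in(const tcb_t *next)
{
    cpu_sp_limit = 0u;               /* 1- nothing can be below 0 */
    cpu_sp       = next->stk_ptr;    /* 2- load the new task's SP */
    assert(!sp_below_limit());       /* no false exception possible */
    cpu_sp_limit = next->stk_limit;  /* 3- arm the new task's limit */
}
```

Without step 1, switching from a task whose stack sits higher in memory to one whose stack sits lower would momentarily put the new SP below the old task’s limit and raise a false exception.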
The SP_Limit register provides a simple way to detect stack overflows.
Arm Cortex-M processors are typically equipped with an MPU (Memory Protection Unit), which monitors the address bus to see if your code is allowed to access certain memory locations or I/O ports. MPUs are relatively simple devices to use but are somewhat complex to set up. However, if all you want to do is detect stack overflows, an MPU can be put to good use without a great deal of initialization code. The MPU is already on your chip, meaning it’s available at no extra cost to you, so why not use it? In the discussion that follows, we’ll set up an MPU region that says “if ever you write to this region, the MPU will trigger a CPU exception.”
One way to set up your stacks is to locate ALL of the stacks together in contiguous memory starting at the base of RAM, with the C stack as the first stack at the base of RAM, as shown in Figure 3.
Figure 3 – Locating Task Stacks Continuously
As the kernel context switches between tasks, it moves a single MPU ‘protection window’ (I will call it the “RED Zone”) from task to task as shown in Figure 4. Note that the RED Zone is located below the base address of each of the stacks. This allows you to make use of the full stack area before the MPU detects an overflow.
Figure 4 – Moving the RED Zone During Context Switches
As shown, the RED Zone can be positioned below the stack base address. The size of the RED Zone depends on a number of factors. For example, on the MPU of a Cortex-M CPU, the size of the RED Zone must be a power of 2 (32, 64, 128, 256, etc.), and stacks must be aligned to the size of the RED Zone. On processors based on the Armv8-M architecture, this restriction has been removed and the MPU region size granularity is 32 bytes; however, on Armv8-M you’d use its stack limit register feature instead. The larger the RED Zone, the more likely we are to detect a stack overflow when a function call allocates large arrays on the stack. However, locating RED Zones below the stack base address has other issues. For one thing, you cannot allocate a buffer on a task’s stack and pass its pointer to another task, because while that other task runs, the buffer could lie under the RED Zone, and accessing it would cause an exception. Allocating buffers on a task’s stack is not good practice anyway, so getting slapped by an MPU violation is a kind punishment.
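The size and alignment rules above can be checked in code when the RED Zone is computed. The following sketch assumes the ARMv7-M (PMSAv7) MPU, where a region’s SIZE field encodes 2^(SIZE+1) bytes and an AP field value of 0b110 makes the region read-only, so any write faults. The function and macro names are hypothetical; on a real part, the resulting values would be written to the MPU’s RBAR and RASR registers (MPU->RBAR and MPU->RASR in CMSIS) during the context switch.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR fields (assumed layout, see lead-in). */
#define RASR_ENABLE      (1u << 0)
#define RASR_SIZE_SHIFT  1u           /* region size = 2^(SIZE+1) bytes */
#define RASR_AP_SHIFT    24u
#define RASR_AP_RO       6u           /* read-only, privileged and unprivileged */
#define RASR_XN          (1u << 28)   /* execute never */

static bool is_pow2(uint32_t x) { return x != 0u && (x & (x - 1u)) == 0u; }

static uint32_t log2_u32(uint32_t x)
{
    uint32_t n = 0u;
    while (x > 1u) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Computes a RED Zone placed just below a task's stack base (Figure 4).
   Returns false if the pre-Armv8-M size/alignment rules are not met. */
static bool redzone_below(uint32_t stk_base, uint32_t rz_size,
                          uint32_t *rz_base, uint32_t *rasr)
{
    if (!is_pow2(rz_size) || rz_size < 32u) {
        return false;              /* power of 2, at least 32 bytes */
    }
    if ((stk_base & (rz_size - 1u)) != 0u || stk_base < rz_size) {
        return false;              /* stack aligned to rz_size, room below it */
    }
    *rz_base = stk_base - rz_size; /* also aligned, since stk_base is */
    *rasr    = RASR_XN
             | (RASR_AP_RO << RASR_AP_SHIFT)
             | ((log2_u32(rz_size) - 1u) << RASR_SIZE_SHIFT)
             | RASR_ENABLE;
    return true;
}
```

For a stack based at 0x20000400 with a 32-byte RED Zone, this yields a region base of 0x200003E0 and a RASR value of 0x16000009; a stack at 0x20000410 is rejected because it is not 32-byte aligned.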
You may also ask: “Why should the C stack be located at the start of RAM?” Because, in most cases, once multitasking has started, the C stack is never used again and is thus lost. Overflowing into RAM that is no longer used might not be a big deal but, technically, it should not be allowed. Placing the C stack there simply gives us room to store the CPU registers that are stacked on the offending task’s stack during an MPU exception sequence.
If you are not able to allocate storage for your task stacks in contiguous memory as I outlined in the previous section, then we need to use the MPU differently. Instead, we can reserve a portion of RAM at the base of each stack and generate an exception if anything gets written to that area. The kernel would reconfigure the MPU during a context switch to protect the new task’s stack. This is shown in Figure 5.
Figure 5 – Locating the RED Zone inside a Task’s Stack
Again, the size of the RED Zone depends on a number of factors. As previously discussed, for the MPU on a Cortex-M CPU (except for Armv8-M), the size must be a power of 2 (32, 64, 128, 256, etc.). Also, stacks must be aligned to the size of the RED Zone. The larger the RED Zone, the more likely we can detect a stack overflow when a function call allocates large arrays on the stack. However, in this case, the RED Zone takes away storage space from the stack because, by definition, a write to the RED Zone will generate an exception and thus cannot be performed by the task. If the size of a stack is 512 bytes (i.e. 128 stack entries for a 32-bit wide stack), a 64-byte RED Zone would consume 12.5% of your available stack and thus leave only 448 bytes for your task, so you might need to allocate larger stacks to compensate.
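The cost of an in-stack RED Zone is simple arithmetic, but a small helper makes it explicit. In this sketch (hypothetical names, pre-Armv8-M size and alignment rules assumed), the RED Zone starts at the stack base and the task can only use the memory above it.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t rz_base;       /* RED Zone starts at the stack base (Figure 5) */
    uint32_t usable_base;   /* lowest address the task may write */
    uint32_t usable_bytes;  /* stack left for the task */
    uint32_t cost_x10;      /* RED Zone share, in tenths of a percent */
} rz_inside_t;

static rz_inside_t redzone_inside(uint32_t stk_base, uint32_t stk_size,
                                  uint32_t rz_size)
{
    rz_inside_t r;

    assert(rz_size >= 32u && (rz_size & (rz_size - 1u)) == 0u); /* power of 2 */
    assert((stk_base & (rz_size - 1u)) == 0u);                  /* aligned */
    assert(rz_size < stk_size);

    r.rz_base      = stk_base;
    r.usable_base  = stk_base + rz_size;
    r.usable_bytes = stk_size - rz_size;
    r.cost_x10     = (rz_size * 1000u) / stk_size;
    return r;
}
```

For the 512-byte stack and 64-byte RED Zone described above, it reports 448 usable bytes and a cost of 125 tenths of a percent, i.e. 12.5%.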
As shown in Figure 6, if a function call ‘skips over’ the RED Zone by allocating local storage for an array or a large data structure, the code might never write to the RED Zone and thus bypass the stack overflow detection mechanism altogether. In other words, if the RED Zone is too small, foo() might write to i and to parts of array[], but never to the part that overlaps the RED Zone.
Figure 6 – Bypassing the RED Zone
To avoid this, local variables and arrays should always be initialized as shown in Listing 2.
Listing 2 – Initializing local variables to better detect stack overflows
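The effect of Figure 6, and of initializing locals, can be simulated on a host. In the sketch below, an array of `uint32_t` stands in for RAM, a four-entry RED Zone filled with a pattern sits at the stack base, and foo()’s oversized array is placed so that it straddles the RED Zone; all sizes and names are hypothetical. Keep in mind that an optimizing compiler may remove stores to a local array that is never read, so on a real target the initialization has to survive optimization for this technique to help.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_MEM_ENTRIES  64u
#define SIM_STK_BASE     16u          /* stack occupies sim_mem[16..63] */
#define SIM_RZ_DEPTH      4u          /* RED Zone: sim_mem[16..19] */
#define SIM_RZ_PATTERN    0xABCD2345u
#define ARRAY_LEN        24u          /* foo()'s oversized local array */

/* Simulated RAM; index 0 is the lowest address. Entries below
   SIM_STK_BASE stand for whatever resides beyond the stack. */
static uint32_t sim_mem[SIM_MEM_ENTRIES];

static void rz_fill(void)
{
    for (size_t k = 0u; k < SIM_RZ_DEPTH; k++) {
        sim_mem[SIM_STK_BASE + k] = SIM_RZ_PATTERN;
    }
}

static bool rz_intact(void)
{
    for (size_t k = 0u; k < SIM_RZ_DEPTH; k++) {
        if (sim_mem[SIM_STK_BASE + k] != SIM_RZ_PATTERN) {
            return false;
        }
    }
    return true;
}

/* foo() as in Figure 6: it touches only the ends of its array, so the
   write to array[0] lands beyond the stack and skips the RED Zone. */
static void foo_body_sparse(uint32_t *array)
{
    array[0] = 1u;
    array[ARRAY_LEN - 1u] = 2u;
}

/* foo() with every local initialized on entry: the part of the array
   that overlaps the RED Zone gets overwritten, so a check can see it. */
static void foo_body_initialized(uint32_t *array)
{
    memset(array, 0, ARRAY_LEN * sizeof(array[0]));
    array[0] = 1u;
    array[ARRAY_LEN - 1u] = 2u;
}
```

Placing the array at `&sim_mem[8]` makes it span entries 8 to 31: the sparse version corrupts entry 8 yet leaves the RED Zone intact, while the initialized version overwrites the RED Zone and is caught.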
The Micrium OS Kernel has a built-in RED Zone stack overflow detection mechanism, but it’s implemented in software. This software-based approach is enabled by setting OS_CFG_TASK_STK_REDZONE_EN to DEF_ENABLED in os_cfg.h. When enabled, the Micrium OS Kernel creates a monitored zone at the end of a task’s stack, which is filled with a special value when the task is created. The actual value is not that critical; we used 0xABCD2345 as an example (but it could be anything). However, it’s wise to avoid values that the application is likely to write, such as zero. The size of the RED Zone is defined by OS_CFG_TASK_STK_REDZONE_DEPTH. By default, the RED Zone is eight CPU_STK elements deep, so the usable stack space is reduced by 8 stack entries. This is shown in Figure 7.
The Micrium OS Kernel checks the RED Zone at each context switch. If the RED Zone has been overwritten, or if the stack pointer is out of bounds, the Micrium OS Kernel informs the user by calling OSRedzoneHitHook(). The hook allows the user to gracefully shut down the application since, at this point, the stack corruption may have caused irreversible damage. The hook, if defined, must ultimately call CPU_SW_EXCEPTION() or otherwise stop the Micrium OS Kernel from proceeding with corrupted data.
Since the RED Zone is typically small, it’s all the more important to initialize local variables, large arrays and data structures upon entry to a function so that this mechanism can detect the overflow.
The software RED Zone is nice because it’s portable across any CPU architecture. The drawback is that it consumes CPU cycles, which may be valuable, at every context switch.
Figure 7 – Software-based RED Zone
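The mechanism can be sketched in portable C as follows. This is not the Micrium OS Kernel’s code: the TCB fields, `redzone_hit_hook()` and the hit counter are hypothetical, while the depth of eight entries and the 0xABCD2345 pattern mirror the values described above. Stacks are assumed to grow down, so the RED Zone occupies the lowest entries, and the check would run at each context switch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RZ_DEPTH    8u           /* mirrors the default depth described above */
#define RZ_PATTERN  0xABCD2345u

typedef uint32_t stk_t;          /* stand-in for a CPU_STK-sized entry */

/* Hypothetical TCB fields; stk_base is the lowest entry of the stack. */
typedef struct {
    stk_t  *stk_base;
    size_t  stk_size;            /* in entries */
    stk_t  *stk_ptr;             /* saved SP */
} tcb_t;

static unsigned rz_hits;         /* counts hook calls, for illustration */

/* Hypothetical hook. A real one must bring the system to a safe state
   and stop it, rather than return and let the kernel carry on. */
static void redzone_hit_hook(const tcb_t *tcb)
{
    (void)tcb;
    rz_hits++;
}

/* At task creation: fill the RED Zone at the low end of the stack. */
static void task_stk_init(tcb_t *tcb, stk_t *stk, size_t size)
{
    tcb->stk_base = stk;
    tcb->stk_size = size;
    tcb->stk_ptr  = stk + size;  /* empty, full-descending stack */
    for (size_t k = 0u; k < RZ_DEPTH; k++) {
        stk[k] = RZ_PATTERN;
    }
}

/* At a context switch: SP must stay above the RED Zone and inside the
   stack, and every RED Zone entry must still hold the pattern. */
static bool task_stk_check(const tcb_t *tcb)
{
    const stk_t *lo = tcb->stk_base + RZ_DEPTH;   /* lowest usable entry */
    const stk_t *hi = tcb->stk_base + tcb->stk_size;
    bool ok = (tcb->stk_ptr >= lo) && (tcb->stk_ptr <= hi);

    for (size_t k = 0u; ok && k < RZ_DEPTH; k++) {
        ok = (tcb->stk_base[k] == RZ_PATTERN);
    }
    if (!ok) {
        redzone_hit_hook(tcb);
    }
    return ok;
}
```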
Although not actually an automatic stack overflow detection mechanism, determining the ideal size of a stack at run-time is highly useful and is a feature available in the Micrium OS Kernel. Specifically, you allocate more stack space than you anticipate needing, then monitor and possibly display the actual maximum stack usage at run-time. This is fairly easy to do. First, the task stack needs to be cleared (i.e. filled with zeros) when the task is created; a value other than zero could also be used. Next, a low priority task (the statistics task in the Micrium OS Kernel) walks the stack of each created task, from the bottom towards the top, counting the number of zero entries. When the statistics task finds a non-zero value, the process is stopped and the usage of the stack can be computed (in number of stack entries used or as a percentage). From this, you can adjust the size of the stacks (by recompiling the code) to allocate a more reasonable value (either increasing or decreasing the amount of stack space for each task). For this to be effective, however, you need to run the application long enough, and under enough stress, for the stack to grow to its highest value. This is illustrated in Figure 8.
Figure 8 – Determining Actual Stack Usage at Run-Time
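The stack-walking idea can be sketched in a few lines of C. This is a simplified stand-in for what OSTaskStkChk() does, not its actual implementation; the names are hypothetical, the stack is assumed to grow down from high to low memory, and entries are cleared to zero at creation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t stk_t;

/* At task creation: clear the whole stack. */
static void stk_clear(stk_t *stk, size_t size)
{
    memset(stk, 0, size * sizeof(stk[0]));
}

typedef struct {
    size_t   free;      /* untouched entries, counted from the base */
    size_t   used;      /* entries at or above the first non-zero one */
    unsigned pct_used;  /* integer percentage of the stack used */
} stk_usage_t;

/* Walk from the base (lowest address) towards the top and stop at the
   first non-zero entry. */
static stk_usage_t stk_check(const stk_t *stk, size_t size)
{
    stk_usage_t u;
    size_t n = 0u;

    assert(size > 0u);
    while (n < size && stk[n] == 0u) {
        n++;
    }
    u.free     = n;
    u.used     = size - n;
    u.pct_used = (unsigned)((u.used * 100u) / size);
    return u;
}
```

A zero legitimately written by the task at its deepest point is indistinguishable from an untouched entry, so the result can slightly underestimate the high-water mark.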
The Micrium OS Kernel provides a function, OSTaskStkChk(), that determines the stack usage of a task at run-time. In fact, the Micrium OS Kernel’s statistics task, OS_StatTask(), calls this function for each created task every 1/10th of a second. This is what µC/Probe displays, as described in my other article: see “Exploring the Micrium OS Kernel Built-In Performance Measurements” in the Blog section of the Silicon Labs website (www.silabs.com).
This blog described different techniques to detect stack overflows. Stack overflows can occur in both single- and multi-threaded environments. Even though we can detect overflows, there is typically no way to safely continue execution after one occurs and, in many cases, the only recourse is to reset the CPU or halt execution altogether. However, before taking such drastic measures, your code should bring the embedded system to a known and safe state. For example, you might turn off motors and actuators, open or close valves, and so on. Even though you are in a shutdown state, you might still be able to use kernel services to perform this work.