Official Blog of Silicon Labs

      • Updated: Your Z-Wave Smart Locks are Safe and Secure

Lance Looper | 05/23/2018 | 02:37 PM

        We want to be very clear: installed, previously paired Z-Wave devices are secure and not vulnerable to a downgrade attack. This represents practically all 100 million Z-Wave devices in homes today.

This type of attack would require physical proximity to the device during the pairing (inclusion) process. Pairing is done during initial installation or reinstallation, and it must be initiated by the homeowner (or installation professional), which means the homeowner is present at the time of the attempted attack. It would not be possible to execute an attack without the homeowner becoming aware that the link is running S0, just as they would be for any other S0 device added to an S2 controller.

        We take what Pen Test Partners has reported very seriously and are taking steps to tighten the certification requirements regarding warnings presented to the user. We also believe any warning for a security step needs to be explicit. We are updating the specification to ensure that any user will not only get a warning during a downgrade to S0 but will have to acknowledge the warning and accept it to continue inclusion.

        We believe it's important for all smart home devices to have the highest possible levels of security available, and our development team will continue to work with the security community to make improvements to the Z-Wave specification.




      • Timing 101 #8: The Case of the Cycle-to-Cycle Jitter Rule of Thumb

kgsmith | 05/23/2018 | 01:41 PM


In this post, The Case of the Cycle-to-Cycle Jitter Rule of Thumb, I will review a rule of thumb that can be used for estimating the RMS cycle-to-cycle jitter if all you have available is the RMS period jitter. The reason I’m doing so this month is that a couple of colleagues recently asked me to reconcile a particular Timing Knowledge Base article with one of our app notes. I first observed this rule of thumb in the lab and subsequently learned more about it.


        What’s the Rule of Thumb?

It’s really simple. If the period jitter distribution is Gaussian (normal), then the cycle-to-cycle jitter can be estimated from the period jitter as follows:

        Jcc (RMS) = sqrt(3) * Jper (RMS)

        I first recorded this in a Timing Knowledge Base article Estimating RMS Cycle to Cycle Jitter from RMS Period Jitter. I wrote at the time the following statement:

        The sqrt(3) factor arises from the definitions of period jitter and cycle-to-cycle jitter in terms of the timing jitter of each clock edge versus a reference clock.

        I will spend a little bit more time on this thought today and attack the problem from several different angles.


        What’s the Question?

In our application note, A Primer On Jitter, Jitter Measurement and Phase-Locked Loops, the figure below shows the following slopes for post-processing phase noise into timing jitter metrics. Period jitter and cycle-to-cycle jitter are shown as high pass filters with 20 dB/dec and 40 dB/dec slopes, respectively. This is correct and a useful illustration to keep in mind.

The question is: how can RMS cycle-to-cycle jitter be larger than RMS period jitter, per the sqrt(3) rule, when the cycle-to-cycle jitter filter has a steeper slope? The answer is that it’s not just the slope that determines the end result. More on that later.


        Some Terminology

        Before proceeding, here are a couple of definitions adapted from AN279: Estimating Period Jitter from Phase Noise.

        • Cycle-to-cycle jitter - The short-term variation in clock period between adjacent clock cycles. This jitter measure, abbreviated here as JCC, may be specified as either an RMS or peak-to-peak quantity.
        • Period jitter - The short-term variation in clock period over all measured clock cycles, compared to the average clock period. This jitter measure, abbreviated here as JPER, may be specified as either an RMS or peak-to-peak quantity.

        The distinction between these time domain jitter measurements is important, hence the italicized terms above. (By the way, you can find old examples in the academic and trade literature where these terms may mean different things, so always double-check the context). The terms here are as used presently and in standards such as JEDEC Standard No. 65B, “Definition of Skew Specifications for Standard Logic Devices”.


        Example Lab Measurement

        First, the following example lab measurement comes straight from the KB article. The annotated image has been made more compact for convenience.

        There are three items called out on the screen capture.

1. The period distribution after 1 million cycles appears Gaussian and comes very close to meeting the 68–95–99.7% rule for ±1, ±2, and ±3 standard deviations, respectively.
2. The measured RMS period jitter is the standard deviation of the period jitter distribution, or about 1.17 ps. We can therefore estimate the RMS cycle-to-cycle jitter as sqrt(3) * 1.17 ps ≈ 2.03 ps.
3. The actual measured cycle-to-cycle jitter is 2.05 ps, which is reasonably close to the estimate.


        Example Excel Demonstration

You can also demonstrate this rule in Excel simulations. Exploring the effect, I generated a spreadsheet in which I took an ideal clock edge and then jittered the edges by taking random samples from a Gaussian distribution. I then took the period measurements and the cycle-to-cycle measurements over five trials each, for 30 edges and for 100 edges, with the clock edges representing a jittery 100 MHz clock. Note that since the cycle-to-cycle jitter results are signed, i.e. positive or negative, we should expect the standard deviation of these quantities to be larger, all else being equal. The 100-edge trials were usually much closer to the sqrt(3) rule than the 30-edge trials, but you could still see the general effect even over just 30 edges.

        If you are interested in playing with it, the spreadsheet is attached as CCJ_ROT_Demonstrator.xlsx


        An Explanation

        So how does this rule of thumb arise? As mentioned previously, I first observed this in the lab years ago and learned I could count on it. Yet, I have seen little written about this. Eventually I ran across Statek Technical Note 35, An overview of oscillator jitter. The explanation below is a somewhat simplified and modified version of that derivation where the quantities are expected values for a “large” time series (recall my comments about 100 edges converging to the rule better than 30 edges.)

Let t(n) represent the timing jitter of edge n, i.e. the difference in time of a jittery edge versus an ideal edge, and let its variance be

σj² = Var(t(n))

Every period measured is then the difference between 2 successive edge values, where each edge’s jitter has variance σj². Period jitter is sometimes referred to as the first difference of the timing jitter. Since cycle-to-cycle jitter is the difference between adjacent periods, it can be referred to as the second difference of the timing jitter.

If each edge’s jitter is independent, then the variance of the period jitter can be written as

σPER² = Var(t(n+1) − t(n)) = Var(t(n+1)) + Var(t(n)) = 2σj²
This is just what we would expect per the Variance Sum Law, which states that for independent (uncorrelated) variables:

Var(X ± Y) = Var(X) + Var(Y)
However, we can’t calculate cycle-to-cycle jitter quite as easily, since every cycle-to-cycle measurement uses one “interior” clock edge twice and we must account for this. Instead we write:

JCC(n) = (t(n+2) − t(n+1)) − (t(n+1) − t(n)) = t(n+2) − 2t(n+1) + t(n)

σCC² = Var(t(n+2)) + 4Var(t(n+1)) + Var(t(n)) − 4Cov(t(n+2), t(n+1)) − 4Cov(t(n+1), t(n)) + 2Cov(t(n+2), t(n))
Since each edge’s jitter is assumed to be independent and to have the same statistical properties, we can drop the cross-correlation (covariance) terms and write:

σCC² = σj² + 4σj² + σj² = 6σj²
The ratio of the variances is therefore

σCC² / σPER² = 6σj² / 2σj² = 3

and taking the square root gives exactly the sqrt(3) rule of thumb.
        This is an interesting and unexpected result, at least to me :)  


        Post-Processing Phase Noise

AN279: Estimating Period Jitter from Phase Noise describes how one can estimate period jitter from phase noise by applying a 4[sin(pi*f*tau)]^2 weighting factor to the phase noise integration. The weighting factor is predominantly a +20 dB/dec high pass filter until reaching a peak at the half-carrier frequency.

It turns out that you can use a similar approach to calculate cycle-to-cycle jitter. This requires applying a {4[sin(pi*f*tau)]^2}^2 or 16[sin(pi*f*tau)]^4 weighting factor, which is predominantly a +40 dB/dec high pass filter until reaching a peak at the half-carrier frequency. This is exactly what AN687 refers to.

        So how can a sharper HPF skirt integrate such that cycle-to-cycle jitter is larger than the period jitter and the sqrt(3) rule applies?

I had to dig up the old Matlab program I used when writing that app note. Fortunately, I still had the file and the original data. I then ran a modified version of the program and compared the results when the maximum fOFFSET of the phase noise integration is truncated at fc/2 versus extended out to fc. The answer is that while the cycle-to-cycle HPF skirt is steeper, its maximum is also higher. See the plots below. The wide blue trace is the period-jitter-weighted (filtered) phase noise and the wide red trace is the cycle-to-cycle-jitter-weighted phase noise. It’s the larger far-offset phase noise contributions that make the difference.


The original data was for a 160 MHz CMOS oscillator, which had a scope-measured period jitter at the time of about 2 ps. It was for that reason, to be conservative, that I often ran the integration further out than fc/2. Scopes are lower noise now, and it would be interesting to find the original device under test and measure it on a better instrument. My main interest here is to see if the sqrt(3) relationship holds true. As you can see, the rule of thumb holds up in both cases.


Well, I hope you have enjoyed this Timing 101 article. The sqrt(3) rule of thumb for cycle-to-cycle jitter holds up well in the lab, in Excel spreadsheet simulations, and when post-processing phase noise.

        As always, if you have topic suggestions, or there are questions you would like answered, appropriate for this blog, please send them to with the words Timing 101 in the subject line.  I will give them consideration and see if I can fit them in. Thanks for reading. Keep calm and clock on.








      • Top 5 Reasons to Subscribe to the Support and Community Newsletter

Nari | 05/04/2018 | 04:14 AM

Did you know you can sign up for our monthly newsletter tailored specifically for Silicon Labs community members? Here are the top five reasons why you should subscribe:

        Stay Informed on Hot Topics

        First, you will get informed of the most popular forum discussions among peer engineers. We feature the most interesting topics within IoT, Internet Infrastructure, and Industrial Automation on a monthly basis.

        Get the Latest Resources

        Second, you can access the latest training resources about Silicon Labs products. You will receive information about the latest video tutorial, webinar, or knowledge base article once a month in your inbox.

        Be Inspired

Third, you will be surprised to see how many inspiring projects and real-life applications have been built by our members and customers. We introduce these cool examples featuring Silicon Labs parts to you through our community newsletter.

        Learn about the Newest Products

        Fourth, you will stay on top of our latest hardware and software releases as well as new product launches.

        Stay Connected

        Fifth, the community is the place to share your knowledge and connect with each other. Through our featured member section in the newsletter, you will get to know our distinguished community members better.



      • SystemView: How to enable it in a Dynamic Multiprotocol Application

Juan Benavides | 05/03/2018 | 11:24 AM

Silicon Labs' Dynamic Multiprotocol technology allows you to support multiple wireless protocols on a single chip.

        This technology time-slices the radio and rapidly changes configurations to enable different wireless protocols to operate reliably at the same time. 

        The technology leverages Micrium OS Kernel to run each wireless stack as a separate RTOS task.

You are probably aware of the many benefits of SystemView, a tool to record and analyze Micrium OS Kernel events in real time.

        To enable SystemView, Simplicity Studio offers this utility that inserts the required C files and configures the project include paths all by the press of a button. It sounds great, except that it is usually broken by constant changes in the different SDKs from Silicon Labs.

In this blog, I'm going to describe how to add SystemView to your DMP project manually, for those situations in which the fancy tools just won't work.


        Inserting the SystemView Recorder Files to your DMP Project

        Right-click over the project name to open the context menu and select the options New -> Folder

Click the button Advanced >>, select the option Link to alternate location (Linked Folder), and enter the following path:



        As shown in the image below:

        Figure 1. Adding SystemView to your Project Manually



        Inserting the Include Paths in your Compiler Configuration

        Right-click over the project name to open the context menu and select the option Properties.

Select the option C/C++ General > Paths and Symbols and add the following include paths to all Languages and Configurations:




        Resolving a couple of conflicts by excluding some C Files from compilation

        Locate the file SEGGER_SYSVIEW_Config_MicriumOSKernel.c in the Project Explorer at dev-cfg > source

        Right click over the file SEGGER_SYSVIEW_Config_MicriumOSKernel.c to open the context menu and select the option Properties.

        Select the option C/C++ Build and exclude this file from compilation by selecting the checkbox Exclude resource from build.

        Similarly, locate the file SEGGER_RTT.c in the Project Explorer at debug-basic-library > EFR32

        Right click over the file SEGGER_RTT.c to open the context menu and select the option Properties.

        Select the option C/C++ Build and exclude this file from compilation by selecting the checkbox Exclude resource from build.



        Enabling the Trace Recorder

        Open the file os_cfg.h located at the following path:



        Locate and set the macro OS_CFG_TRACE_EN to DEF_ENABLED



        Finding the memory address of the RTT block

        Re-compile your project and launch a Debug Session.

        Click the button Probe located on the top toolbar of Simplicity Studio.

        Once Probe is opened, type in the keyword _RTT in the Symbol Browser panel in Probe (at the bottom of the application) and make a note of the memory address as illustrated in the image below:

        Figure 2. Finding the RTT Block's Memory Address with Probe



        Starting a Recording

        The SystemView host application is available from SEGGER.

        Once you have the application installed on your computer, start SystemView.

        Press F5 to start a recording.

        Select the option Address for the RTT Control Block Detection and enter the address you found with Probe as shown below:

        Figure 3. SystemView RTT Block Address


        It may be that as the DMP SDK and/or Simplicity Studio evolve, the tool to insert SystemView automatically finally works. I will keep checking if that's the case and I will delete this blog if it's no longer relevant. In the meantime, I hope it will help someone.


        Disclaimer: The views, thoughts, and opinions expressed in this blog belong solely to the author, and not necessarily to Silicon Labs.

      • Upgradeable Security is Not Optional for the IoT

Lance Looper | 05/01/2018 | 10:32 AM

We have yet to see the full-fledged economic value of billions of new IoT devices entering multiple industries, but we can prepare ourselves for what we know will come along with it. As with any new innovation or market, malicious adversaries and attackers will lurk and invade for their own piece of the pie.

        Despite the looming security threats, companies and developers designing new IoT products often like to focus their attention on the application itself versus proper security. Security slows the time-to-market and is often viewed as inconvenient because it increases cost.

But no one wants to design an application that’s prone to hacking or data theft. Undesirable events like high-profile hacks can lead to serious brand damage and loss of customer trust; the worst case is a slowdown or permanent reduction in the adoption of IoT.

        When it comes to security, IoT is no different than previous technology innovations such as PCs, smartphones, and the Internet itself. If security is not addressed sufficiently by the creators of the technology – in this case, IoT product designers - the oversight could have devastating effects on the entire market, and it will no doubt have negative consequences for the individual companies opting to design irresponsibly.

        Varying Degrees of Security

To avoid these scenarios, designers need to change how they view IoT security. Unfortunately, it’s not as simple as a “to have or not to have” decision. Security is not binary. The reality is that there are many different levels of security. A device can only be considered secure in the context of an attacker: it is secure when its level of security is higher than the capabilities of that attacker.

        Moreover, the capabilities of the attacker are typically non-static, and therefore, the security level will change over time. The improved capabilities of the attacker can come about in several different ways, from the discovery and/or publication of issues and vulnerabilities to broader availability of equipment and tools.

History has taught us some valuable lessons about how fast security threats can change for an object. A typical lifetime of an IoT device depends on the application, but in industrial applications, 20 years is a common timeframe. A device launched in 1998, for example, was once vulnerable only to nation-state attacks; today it must be able to withstand DPA attacks by hobbyists with $300 for tools, some spare time, and lots of coffee. Predicting the future capabilities of a class of adversaries is very difficult if not impossible, especially over a 20-year timespan. How will the adversary look in 2040? One might speculate whether it will even be human.

        Bootloader Benefits

        The only reasonable way to counter future attack scenarios is for the security of the device to evolve with the increased capabilities of the adversary. This requires IoT security with upgradable software.

Of course, there is functionality requiring hardware primitives that cannot be retrofitted via software updates. However, it is incredible what can be solved in software when the alternative is a truck roll. Even so, it is impossible to predict and account for all future attacks.

        Secure updates involve authenticating, integrity checking, and potentially encrypting the software for the device. The software handling such security updates is the bootloader, typically referred to as a secure bootloader. The secure bootloader itself, along with its corresponding cryptographic keys, constitutes the root-of-trust in the system and needs to have the highest level of security. A secure bootloader is functionality IoT vendors should expect to get from the IC manufacturers.

The authentication and integrity check should be implemented using asymmetric cryptography, with only public keys in the device. This way, it is not necessary to protect the signature-checking key in the devices. Since protecting keys in deployed devices is (or at least should be) harder than protecting keys under the control of the device owner, it is also acceptable to use the same bootloader keys for many devices.

        Encrypting the Software

Encrypting the software running on the IoT device has two benefits. First, it protects what vendors consider to be intellectual property (IP) from both competitors and counterfeiting. Second, encryption makes it more difficult for adversaries to analyze the software for vulnerabilities. Encrypting the new software for secure boot does, however, involve secret keys in the device, and protecting secret keys inside a device in the field is becoming increasingly harder. At the same time, newer devices have increased resistance to DPA attacks. Furthermore, a common countermeasure against DPA attacks is limiting the number of cryptographic operations that can take place, to make it infeasible to gather sufficient data to leak the key. Even though protecting the key is difficult and motivated adversaries will likely extract it, key protection raises the cost of an attack.

        Another consequence of secure updates is the likely future need for more memory in the IoT device. This is a complicated trade-off for several reasons. First, software tends to expand to the memory available in the device. So, a larger memory device requires discipline from the software team to leave room for future updates. The other complication is the value of free memory in the future versus the device’s initial cost. More memory tends to increase the cost of the device. This cost must be justified both from the device maker and the consumer point of view.

        Finally, it is important to have a plan for distributing the security updates. For most devices, these updates use the device’s existing Internet connection. But in some cases, this requires adding or using physical interfaces such as USB drives (i.e., sneakernet). It is also important to consider that the devices might be behind firewalls or in some cases disconnected from the Internet.

IoT device software is often fully owned and managed by the device maker, meaning the device maker should have proven processes in place to protect the signing keys internally and, in particular, to control who can issue updates.

        Securing the Future

There is no such thing as a 100 percent secure device, especially over the entire duration of a product’s lifecycle.

Yet it is possible to understand and prepare for the most likely threats, and to safeguard against future threats, by designing in upgradable software. IoT developers must adapt to this critical mindset of responsible security design. Otherwise, they are placing their innovations, and IoT’s market potential, into the hands of adversaries.

        For more on upgradeable security, Silicon Labs’ senior director of product security Lars Lydersen hosted a webinar in which he provided the insight and background to help in evaluating what security functionality is necessary in an IoT design. 


      • SystemView: How to prevent overflows

Juan Benavides | 04/30/2018 | 11:37 AM

Almost all Silicon Labs Starter/Development Kits include an onboard J-Link debugger, which is great not only for debugging and flashing your embedded application, but also for running SystemView.

SystemView, as you probably know, is the tool to record and analyze your Micrium OS Kernel events in real time.

However, the onboard J-Link can be too slow to keep up, depending on the rate of kernel events your embedded application is generating. Overflow events occur when the SystemView buffer on your embedded target fills up.

In this blog I'm going to discuss the basic steps to prevent overflows, and then I will describe the ultimate way to prevent them.


        Steps to Prevent Overflows

        1. Increase the buffer size to store the events:

        Open the configuration file SEGGER_SYSVIEW_Conf.h and set the buffer size to 4096 as shown below:

        #define SEGGER_SYSVIEW_RTT_BUFFER_SIZE           4096


        2. In case you are running Simplicity Studio, close Simplicity Studio and let SystemView run by itself.


        3. In case you are running Probe, close Probe and let SystemView run by itself.


        4. Open the configuration file os_cfg_trace.h and decrease the number of events by disabling the following features:

        #define  OS_CFG_TRACE_API_ENTER_EN               DEF_DISABLED
        #define  OS_CFG_TRACE_API_EXIT_EN                DEF_DISABLED


        5. If you are still having overflows after making the changes above, then the ultimate way to prevent overflows is to buy a much faster External J-Link from SEGGER:

        Most of our Starter Kits have a Debug Connector that you can use to connect the external J-Link. 

        The following section describes how to connect your external J-Link to a Silicon Labs Starter Kit.


        Connecting your External J-Link

        1. First you need to configure your Starter Kit to reroute the debugging circuitry to the external debug connector. 

        Open Simplicity Studio, select your Starter Kit, locate the section Debug Mode: MCU and then press the link Change as shown in the image below:

        Figure 1. Simplicity Studio: Debug Mode


        2. You may be asked to download an adapter firmware image. If so, press the button Yes.


        3. The default Debug Mode is called MCU which means that your debugger is the Onboard J-Link.


        4. Select from the drop-down the option IN which means that your debugger is an External J-Link as shown in the image below:

        Figure 2. Debug Mode: IN (External J-Link)


        5. Connect your external J-Link to the debug connector on your Silicon Labs Starter Kit.

        Depending on your kit, it may be one of the ones shown below.

        The J-Link 19-pin 0.05" Cortex-M Debug Connector shown in Figure 3 may require a J-Link 19-pin Cortex-M Adapter available from SEGGER.

        Figure 3. J-Link 19-pin 0.05" Cortex-M Debug Connector


        On the other hand, the standard 20-pin 0.1" JTAG Debug Connector shown in Figure 4 does not require any adapters and can be connected directly to your external J-Link.

        Figure 4. Standard 20-pin 0.1" JTAG Debug Connector


        For more information on how to configure your Starter Kit in the appropriate debugging mode, please consult your Starter Kit's User's Guide, the section On-Board Debugger, Debug Modes.


        Related Links:

        SystemView Installer:

        SystemView User's Manual:

        J-Link Debug Probes:

        J-Link 19-pin Cortex-M Adapter:


      • Migration Guide: from FreeRTOS to Micrium OS

Juan Benavides | 04/25/2018 | 03:56 PM

Whether you are currently running your embedded application on Silicon Labs hardware or on another vendor's, the migration path is the same, as illustrated in Figure 1.

        You should start from a working Micrium OS example and then move your embedded application over to the example project.

        Figure 1. Migration Paths


        Once you move your embedded application to the Micrium OS baseline example project, you need to change the code that calls the FreeRTOS API.

The purpose of the attached PDF document is to describe the differences between the two kernels and to offer plenty of side-by-side examples and mapping tables to help you in the process of migrating your embedded application.


      • Silicon Labs Acquires Z-Wave, Bringing Complete Smart Home Connectivity Under One Roof

Lance Looper | 04/19/2018 | 07:47 AM

        Today, we’ve announced the acquisition of Sigma Designs’ Z-Wave business. Adding Z-Wave to our wireless portfolio gives ecosystem providers and developers of smart home solutions access to the broadest range of wireless connectivity options available today.

        Together, we’ll open the door to millions of potential users of smart home technologies by expanding access to a large and varied network of ecosystems and partners. Z-Wave’s reputation as a leading mesh networking technology for the smart home with traction in more than 2,400 certified interoperable Z-Wave devices from more than 700 manufacturers and service providers worldwide, coupled with Silicon Labs’ position as the leader in silicon, software, and solutions for the IoT, make this a great match.

        Silicon Labs has been actively driving the IoT for years, and we recognize the large following Z-Wave has among developers and end customers in the smart home. With our experience in mesh technologies, we are uniquely positioned to advance the standard and grow adoption with input from the Z-Wave Alliance and partners.

        Adding Z-Wave to Silicon Labs’ extensive portfolio of connectivity options allows us to create a unified vision for the technologies underpinning the smart home market: a secure, interoperable customer experience is at the heart of how smart home products are designed, deployed and managed. Our vision for the smart home is one where various technologies work securely together, where any device using any of our connectivity technologies easily joins the home network, and where security updates or feature upgrades occur automatically or on a pre-determined schedule.

        You can learn more in the press release as well as our website.

      • Detecting Stack Overflows with the Micrium OS Kernel

Jean Labrosse | 04/12/2018 | 02:21 PM


In a previous blog, I showed how you can display the stacks of a Micrium OS Kernel-based application using µC/Probe. In this post, I’ll describe the importance of sizing your stacks at design time and checking task stacks at run-time to catch stack overflows. I will first explore how to determine the size of task stacks and then go into different stack overflow detection methods. They are listed in order of the most preferable to the least preferable, based on the likelihood of detecting the overflow.

        How do you determine the size of a task stack?

        In a Micrium OS Kernel-based (and most real-time kernels) application, each task requires its own stack.  The size of the stack required by a task is application specific.  It’s possible to manually figure out the stack space needed by adding up:

        • The memory required by all function call nesting. For each function call hierarchy level:
          • Depending on the CPU architecture, one pointer for the return address of a function call.  Some CPUs actually save the return address in a special register reserved for that purpose (often called the Link Register).  However, if the function calls another function, the Link Register must be saved by the caller so, it might be wise to assume that the pointer is pushed onto the stack anyway. 
          • The memory required by the arguments passed in those function calls.  Arguments are often passed in CPU registers but again, if a function calls other functions the register contents will be saved onto the stack anyway.  I would thus highly recommend that you assume that arguments are passed on the stack for the purpose of determining the size of a task’s stack.
          • Storage of local variables for those functions
          • Additional stack space for state saving operations inside the functions
        • The storage for a full CPU context (depends on the CPU) plus FPU registers as needed
        • The storage of another full CPU context for each nested ISR (if the CPU doesn’t have a separate stack to handle ISRs)
        • The stack space needed for local variables used by those ISRs. 

Adding all this up is a tedious chore, and the resulting number is a minimum requirement. Most likely you would not allocate the size of the stack that precisely, so that you can plan for “surprises”. The number you come up with should probably be multiplied by some safety factor, possibly 1.5 to 2.0. The stack usage calculation assumes that the exact path of the code is known at all times, which is not always possible. Specifically, when calling a function such as printf() it might be difficult or nearly impossible to even guess just how much stack space printf() will require. Indirect function calls through tables of function pointers can also be problematic. Generally speaking, start with a fairly large stack space and monitor the stack usage at run-time to see just how much stack space is actually used after the application runs for a while. For more information, see “Exploring the Micrium OS Kernel Built-In Performance Measurements” in the Blog section of the Silicon Labs website.

        Also, avoid writing recursive code because stack usage is typically non-deterministic with this type of code.

        Some compilers/linkers, such as Keil and IAR, are clever enough to provide this information in the link map.  Specifically, for each function, the link map indicates the worst-case stack usage.  However, these tools will not account for indirect calls (i.e. through function pointers) or assembly language routines.  GCC has partial support, providing per-function stack usage but not a call graph.  This feature clearly enables you to better evaluate stack usage for each task.  It is still necessary to add the stack space for a full CPU context plus another full CPU context for each nested ISR (if the CPU does not have a separate stack to handle ISRs), plus whatever stack space is needed by those ISRs.  Again, allow for a safety net and multiply this value by some factor.  

        If your kernel monitors stack usage at run-time then it’s a good idea to display that information and keep an eye on your stacks while developing and testing the product.  Stack overflows are common and can lead to some curious behaviors.  In fact, whenever someone mentions that his or her application behaves “strangely,” insufficient stack size is the first thing that comes to mind.

        What are Stack Overflows?

        Just so that we are on the same page, below is a description of what a stack overflow is.  For the sake of discussion, it’s assumed here that stacks grow from high-memory to low-memory.  Of course, the same issue occurs when the stack grows in the other direction. Refer to Figure 1.

        Figure 1 – Stack Overflow

        F1-(1) The CPU’s SP (Stack Pointer) register points somewhere inside the stack space allocated for a task.  The task is about to call the function foo() as shown in Listing 1.

        Listing 1 – Example of possible stack overflow
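        The listing itself appears as an image in the original post; based on the description that follows (a function whose locals push SP roughly 48 bytes past the call site), a hypothetical reconstruction of its shape might be:

```c
#include <assert.h>

/* Hypothetical reconstruction of Listing 1 (the original is an image).
 * foo()'s locals (an int plus a ten-entry array, along with the saved
 * return address) move the stack pointer well past where it was at the
 * call, which can overflow an undersized task stack. */
int foo(void)
{
    int i;
    int array[10];            /* 40 bytes of locals on a 32-bit CPU */

    for (i = 0; i < 10; i++) {
        array[i] = i;         /* these writes land below the old SP */
    }
    return array[9];
}
```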

        F1-(2) Calling foo() causes the CPU to save the return address of the caller onto the stack.  Of course, that depends greatly on the CPU and the compiler.

        F1-(3) The compiler then adjusts the stack pointer to make room for local variables. Unfortunately, at this point, we have overflowed the stack (the SP points outside the storage area assigned for the stack) and just about anything foo() does will corrupt whatever data lies beyond the stack base.  In fact, depending on the code flow, the array might never be used, in which case the problem would not be immediately apparent.  However, if foo() calls another function, there is a high likelihood that something outside the stack will be touched.

        F1-(4) So, when foo() starts to execute code, the stack pointer has an offset of 48 bytes from where it was prior to calling foo() (assuming a stack entry is 4 bytes wide).

        F1-(5) We typically don’t know what resides here.  It could be the stack of another task, it could be variables, data structures or an array used by the application.  Overwriting whatever resides here can cause strange behaviors: values computed by another task may not be what you expected and could cause decisions in your code to take the wrong path, or your system may work fine under normal conditions but then fail.  We just don’t know and it’s actually quite difficult to predict.  In fact, the behavior can change each time you make changes to your code.

        Detecting Stack Overflows

        There are a number of techniques that can be used to detect stack overflows.  Some make use of hardware while some are performed entirely in software.  As we will see shortly, having the capability in hardware is preferable since stack overflows can be detected nearly immediately as they happen, which can help avoid those strange behaviors and aid in solving them faster.  

        Hardware stack overflow detection mechanisms generally trigger an exception handler.  The exception handler typically saves the current PC (Program Counter) and possibly other CPU registers onto the current task’s stack.  Of course, because the exception occurs when we are attempting to access data outside of the stack, the handler will overwrite some variables or another stack in your application, assuming there is RAM beyond the base of the overflowed stack.  

        In most cases the application developer will need to decide what to do about a stack overflow condition.  Should the exception handler place the embedded system in a known safe state and reset the CPU or simply do nothing?  If you decide to reset the CPU, you might figure out a way to store the fact that an overflow occurred and which task caused the overflow so you can notify a user upon reset.

        Technique 1: Using a Stack Limit Register

        Some processors (unfortunately very few of them) have simple yet highly effective stack pointer overflow detection registers.  This feature is, however, available on processors based on the Armv8-M CPU architecture.  When the CPU’s stack pointer goes below (or above, depending on the direction of stack growth) the value set in this register (let’s call it the SP_Limit register), an exception is generated. The drawing in Figure 2 shows how this works.

        Figure 2 – Using a Stack Limit Register to Detect Stack Overflows

        F2-(1) The SP_Limit register is loaded by the context switch code of the kernel when the task is switched in.

        F2-(2) The location where the SP_Limit points to could be at the very base of the stack or, preferably, at a location that would allow the exception handler enough room to save enough registers on the offending stack to handle the exception.

        F2-(3) As the stack grows, if the SP register ever goes below the SP_Limit, an exception is generated.  As we’ve seen, when your code calls a function and uses local variables, the SP register can easily end up outside the stack upon entry into a function.  One way to reduce the likelihood of this happening is to move the SP_Limit further away from the Stack Base Address.  

        The Micrium OS Kernel was designed from the get-go to support CPUs with a stack limit register.  Each task contains its own value to load into the SP_Limit and this value is placed in the Task Control Block (TCB).  The value of the SP_Limit register used by the CPU’s stack overflow detection hardware needs to be changed whenever the Micrium OS Kernel performs a context switch.  The sequence of events to do this must be performed in the following order:

        1- Set SP_Limit to 0.  This ensures the stack pointer is never below the SP_Limit register.  Note that I assumed here that the stack grows from high memory to low memory but the concept works in a similar fashion if the stack grows in the opposite direction.

        2- Load the SP register.

        3- Get the value of the SP_Limit that belongs to the new task from its TCB.  Set the SP_Limit register to this value.
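        This ordering can be sketched as follows, with SP and SP_Limit modeled as plain variables rather than CPU registers; the structure and field names below are illustrative only, not the actual Micrium OS Kernel TCB layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative per-task values, as saved in the TCB. */
typedef struct {
    uint32_t *StkPtr;        /* saved stack pointer for the task */
    uint32_t *StkLimitPtr;   /* per-task value for SP_Limit      */
} OS_TCB;

static uint32_t *SP;         /* stands in for the CPU's SP register    */
static uint32_t *SP_Limit;   /* stands in for the stack limit register */

static void switch_in(const OS_TCB *p_tcb)
{
    SP_Limit = NULL;               /* 1: disarm so loading SP cannot fault */
    SP       = p_tcb->StkPtr;      /* 2: load the new task's SP            */
    SP_Limit = p_tcb->StkLimitPtr; /* 3: re-arm the limit from the TCB     */
}
```

        Step 1 matters because, between steps, SP may briefly point into the new task’s stack while SP_Limit still holds the old task’s value.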

        The SP_Limit register provides a simple way to detect stack overflows.  

        Technique 2: Using an MPU – Stacks Are Contiguous

        Arm Cortex-M processors are typically equipped with an MPU (Memory Protection Unit), which monitors the address bus to see whether your code is allowed to access certain memory locations or I/O ports.  MPUs are relatively simple devices to use but somewhat complex to set up.  However, if all you want to do is detect stack overflows, then an MPU can be put to good use without a great deal of initialization code.   The MPU is already on your chip, meaning it’s available at no extra cost to you, so why not use it?  In the discussion that follows, we’ll set up an MPU region that says, “if you ever write to this region, the MPU will trigger a CPU exception.”  

        One way to setup your stacks is to locate ALL of the stacks together in contiguous memory, starting the stacks at the base of RAM, and locating the C stack as the first stack at the base of RAM as shown in Figure 3.

        Figure 3 – Locating Task Stacks Contiguously

        As the kernel context switches between tasks, it moves a single MPU ‘protection window’ (I will call it the “RED Zone”) from task to task as shown in Figure 4.  Note that the RED Zone is located below the base address of each of the stacks.  This allows you to make use of the full stack area before the MPU detects an overflow.

        Figure 4 – Moving the RED Zone During Context Switches

        As shown, the RED Zone can be positioned below the stack base address. The size of the RED Zone depends on a number of factors.  For example, the size of the RED Zone on the MPU of a Cortex-M CPU must be a power of 2 (32, 64, 128, 256, etc.).  Also, stacks must be aligned to the size of the RED Zone.  On processors based on the Armv8-M architecture, this restriction has been removed and the MPU region size granularity is 32 bytes.  However, with the Armv8-M, you’d use its stack limit register feature instead.  The larger the RED Zone, the more likely we can detect a stack overflow when a function call allocates large arrays on the stack.  However, locating RED Zones below the stack base address has other issues. For one thing, you cannot allocate buffers on a task’s stack and pass that pointer to another task because it’s possible that the allocated buffer would be overlaid by the RED Zone, thus causing an exception.  However, allocating buffers on a task’s stack is not good practice anyway, so getting slapped with an MPU violation is a fitting punishment.  
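        The power-of-2 and alignment rules can be captured in a small validity check; this helper is a sketch for illustration, not part of any MPU driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Checks the pre-Armv8-M Cortex-M MPU constraints described above:
 * the RED Zone size must be a power of 2 no smaller than 32 bytes,
 * and the region base must be aligned to that size. */
static bool redzone_cfg_valid(uint32_t base, uint32_t size)
{
    bool is_pow2 = (size >= 32u) && ((size & (size - 1u)) == 0u);

    return is_pow2 && ((base % size) == 0u);
}
```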

        You may also ask: “Why should the C stack be located at the start of RAM?” Because in most cases, once multitasking has started, the C stack is never used again and is thus lost.  Overflowing into RAM that is no longer used might not be a big deal but, technically, it should not be allowed.  Reusing the C stack’s RAM simply allows us to store the CPU registers that are stacked on the offending task’s stack during an MPU exception sequence. 

        Technique 3: Using an MPU – Stacks are non-contiguous

        If you are not able to allocate storage for your task stacks in contiguous memory as I outlined in the previous section, then we need to use the MPU differently.  What we can do here is reserve a portion of RAM towards the base of each stack and, if anything gets written in that area, generate an exception.  The kernel reconfigures the MPU during a context switch to protect the new task’s stack.  This is shown in Figure 5.

        Figure 5 – Locating the RED Zone inside a Task’s Stack

        Again, the size of the RED Zone depends on a number of factors.  As previously discussed, for the MPU on a Cortex-M CPU (except for Armv8-M), the size must be a power of 2 (32, 64, 128, 256, etc.).  Also, stacks must be aligned to the size of the RED Zone.  The larger the RED Zone, the more likely we can detect a stack overflow when a function call allocates large arrays on the stack.  However, in this case, the RED Zone takes away storage space from the stack because, by definition, a write to the RED Zone will generate an exception and thus cannot be performed by the task.   If the size of a stack is 512 bytes (i.e. 128 stack entries for a 32-bit wide stack), a 64-byte RED Zone would consume 12.5% of your available stack and thus leave only 448 bytes for your task, so you might need to allocate larger stacks to compensate.  

        As shown in Figure 6, if a function call ‘skips over’ the RED Zone by allocating local storage for an array or a large data structure, then the code might never write into the RED Zone and thus bypass the stack overflow detection mechanism altogether.  In other words, if the RED Zone is too small, foo() might just use i and array[0] to array[5] but nothing that happens to overlap the RED Zone.

        Figure 6 – Bypassing the RED Zone

        To avoid this, local variables and arrays should always be initialized as shown in Listing 2.  

        Listing 2 – Initializing local variables to better detect stack overflows
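        Listing 2 is shown as an image in the original post; a hypothetical reconstruction of the idea is to zero the locals at function entry, so every stack entry gets written.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reconstruction of Listing 2.  memset() forces a write
 * across the entire array, so an allocation that overlaps the RED Zone
 * cannot be skipped over silently. */
int foo(void)
{
    int i;
    int array[10];

    memset(array, 0, sizeof(array));  /* touches every entry, RED Zone included */

    for (i = 0; i < 6; i++) {
        array[i] = i;
    }
    return array[5];
}
```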


        Technique 4: Software-based RED Zones

        The Micrium OS Kernel has a built-in RED Zone stack overflow detection mechanism, but it’s implemented in software.  This software-based approach is enabled by setting OS_CFG_TASK_STK_REDZONE_EN to DEF_ENABLED in os_cfg.h. When enabled, the Micrium OS Kernel creates a monitored zone at the end of a task's stack, which is filled upon task creation with a special value.  The actual value is not that critical, and we used 0xABCD2345 as an example (it could be anything).  However, it’s wise to avoid values that could be used by the application, such as zero.  The size of the RED Zone is defined by OS_CFG_TASK_STK_REDZONE_DEPTH.  By default, the size of the RED Zone is eight CPU_STK elements deep.  The usable stack space is thus effectively reduced by eight stack entries.  This is shown in Figure 7.

        The Micrium OS Kernel checks the RED Zone at each context switch.  If the RED Zone has been overwritten or if the stack pointer is out-of-bounds the Micrium OS Kernel informs the user by calling OSRedzoneHitHook().  The hook allows the user to gracefully shutdown the application since at this point the stack corruption may have caused irreversible damage.  The hook, if defined, must ultimately call CPU_SW_EXCEPTION() or otherwise stop the Micrium OS Kernel from proceeding with corrupted data. 

        Since the RED Zone is typically small, it’s ever so important to initialize local variables, large arrays, and data structures upon entry into a function in order to detect an overflow using this mechanism. 

        The software RED Zone is nice because it’s portable across any CPU architecture.  However, the drawback is that it consumes possibly valuable CPU cycles during a context switch.
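        The mechanism can be sketched as follows; this is a simplified model of what’s described above, not the actual Micrium OS Kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REDZONE_DEPTH  8u            /* default OS_CFG_TASK_STK_REDZONE_DEPTH */
#define REDZONE_VAL    0xABCD2345u   /* sentinel; any uncommon value works    */

/* Fill the bottom of the stack with the sentinel at task creation.
 * stk_base points at the lowest address of the stack area. */
static void redzone_init(uint32_t *stk_base)
{
    for (uint32_t i = 0u; i < REDZONE_DEPTH; i++) {
        stk_base[i] = REDZONE_VAL;
    }
}

/* Called at each context switch: any overwritten sentinel means the
 * task overflowed its stack. */
static bool redzone_intact(const uint32_t *stk_base)
{
    for (uint32_t i = 0u; i < REDZONE_DEPTH; i++) {
        if (stk_base[i] != REDZONE_VAL) {
            return false;
        }
    }
    return true;
}
```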

        Figure 7 – Software-based RED Zone


        Technique 5: Determining the actual stack usage at run-time

        Although not actually an automatic stack overflow detection mechanism, determining the ideal size of a stack at run-time is highly useful and is a feature available in the Micrium OS Kernel.  Specifically, you’d allocate more stack space than is anticipated to be used for the stack then, monitor and possibly display actual maximum stack usage at run-time.  This is fairly easy to do.  First, the task stack needs to be cleared (i.e. filled with zeros) when the task is created.  You should note that we could have used a different value than zero.  Next, a low priority task (the statistics task in the Micrium OS Kernel) walks the stack of each created task, from the bottom towards the top, counting the number of zero entries. When the statistics task finds a non-zero value, the process is stopped and the usage of the stack can be computed (in number of stack entries used or as a percentage).  From this, you can adjust the size of the stacks (by recompiling the code) to allocate a more reasonable value (either increase or decrease the amount of stack space for each task).  For this to be effective, however, you need to run the application long enough and under stress for the stack to grow to its highest value.  This is illustrated in Figure 8.  
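        The walk can be sketched as follows; this is a simplified model of the approach, not the actual OSTaskStkChk() code.

```c
#include <assert.h>
#include <stdint.h>

/* Count untouched (still-zero) entries from the grow-toward end of a
 * zero-filled stack up to the first non-zero entry.  Used entries are
 * then stk_size minus the returned count. */
static uint32_t stk_free_entries(const uint32_t *stk_base, uint32_t stk_size)
{
    uint32_t free_cnt = 0u;

    while ((free_cnt < stk_size) && (stk_base[free_cnt] == 0u)) {
        free_cnt++;
    }
    return free_cnt;
}
```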

        Figure 8 – Determining Actual Stack Usage at Run-Time

        The Micrium OS Kernel provides a function, OSTaskStkChk(), that determines the stack usage of a task at run-time; in fact, the Micrium OS Kernel’s statistics task, OS_StatTask(), calls this function for each created task every 1/10th of a second.  This is what µC/Probe displays, as described in my other article: see “Exploring the Micrium OS Kernel Built-In Performance Measurements” in the Blog section of the Silicon Labs website.


        This blog described different techniques to detect stack overflows.  Stack overflows can occur in both single- and multi-threaded environments.  Even though we can detect overflows, there is typically no way to safely continue execution after one occurs and, in many cases, the only recourse is to reset the CPU or halt execution altogether.  However, before taking such a drastic measure, it’s recommended that your code bring your embedded system to a known, safe state. For example, you might turn off motors and actuators, open or close valves, and so on. Even in a shutdown state you might still be able to use kernel services to perform this work. 


      • Exploring the Micrium OS Kernel's Built-In Performance Measurements - Part 2

        Jean Labrosse | 04/102/2018 | 01:14 PM


        The Micrium OS Kernel has a rich set of built-in instrumentation that collects real-time performance data.  This data can be used to provide invaluable insight into your kernel‑based application, allowing you to have a better understanding of the run-time behavior of your system.  Having this information readily available can, in some cases, uncover potential real-time programming errors and allow you to optimize your application. 

        In Part I of this post we examined, via µC/Probe, a number of the statistics built into the Micrium OS Kernel, including those for stack usage, CPU usage (total and per-task), context-switch counts, and signaling times for task semaphores and queues.  

        In this post, we'll examine the kernel’s built-in ability to measure interrupt-disable and scheduler lock time on a per‑task basis.  Once again, we used µC/Probe to display, at run‑time, these values.

        Measuring Interrupt Disable Time

        Kernels often need to disable interrupts to manage critical sections of code. In fact, it’s often useful, if not necessary, to get a sense of just how much time interrupts are disabled in an application.  Micriµm added this capability in one of its utility modules, µC/CPU, which is provided with the Micrium OS Kernel.  Disabling and enabling interrupts is the fastest and most efficient way to ensure that code executes atomically.  However, this must be done with great care so as not to overly impact the responsiveness of your application.  

        Throughout Micriµm’s code, you will notice the sequence as shown in Listing 1.

        Listing 1, Protecting a Critical Section of Code

        The ‘:’ indicates a section of code that executes.  The actual sequence is unimportant for this discussion.  CPU_SR_ALLOC() is a macro that creates a local variable called cpu_sr.  cpu_sr is used to save the current state of the CPU’s interrupt disable flag, which is typically found in the CPU’s status register, hence the ‘sr’ in the name.

        CPU_CRITICAL_ENTER() is also a macro; it saves the state of the CPU interrupt disable flag by placing the current value of the status register in cpu_sr before disabling further interrupts.  CPU_CRITICAL_EXIT() simply restores the state of the status register (from the saved cpu_sr).
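        The usage pattern in Listing 1 (shown as an image in the original post) can be sketched like this, with the µC/CPU macros modeled as plain C so the save/restore order is visible; the macro bodies below are illustrative stand-ins, not the real port code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t CPU_SR;

static volatile int int_en = 1;   /* stand-in for the CPU's interrupt-enable flag */

#define CPU_SR_ALLOC()        CPU_SR cpu_sr
#define CPU_CRITICAL_ENTER()  do { cpu_sr = (CPU_SR)int_en; int_en = 0; } while (0)
#define CPU_CRITICAL_EXIT()   do { int_en = (int)cpu_sr; } while (0)

static int shared_counter;

void update_counter(void)
{
    CPU_SR_ALLOC();               /* creates the local cpu_sr            */

    CPU_CRITICAL_ENTER();         /* save flag state, disable interrupts */
    shared_counter++;             /* the protected critical section      */
    CPU_CRITICAL_EXIT();          /* restore the saved flag state        */
}
```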

        You enable interrupt disable time measurement by setting a configuration #define called CPU_CFG_INT_DIS_MEAS_EN to DEF_ENABLED.  In this case, the macros CPU_CRITICAL_ENTER() and CPU_CRITICAL_EXIT() are each automatically altered to include a function call.  CPU_CRITICAL_ENTER() calls CPU_IntDisMeasStart() immediately upon disabling interrupts, and CPU_CRITICAL_EXIT() calls CPU_IntDisMeasStop() just before re-enabling interrupts.  These functions are presented in Listings 2 and 3, respectively. 

        Listing 2, Interrupt Disable Time Measurement – Start Measurement Function

        L2-(1) Here we keep track of how often interrupts are disabled.  This value is not actually involved in the interrupt disable time measurement.

        L2-(2) A counter is used to track nesting of CPU_CRITICAL_ENTER() calls.  In practice, however, it’s rare, if ever, that we actually nest this macro.  However, the code is included in case your application needs to do this.  That being said, it’s not recommended that you do.

        Here we read the value of a free-running counter by calling CPU_TS_TmrRd().  If you enable interrupt disable time measurements, you (or the implementer of the port for the CPU you are using) will need to setup a free-running counter that would ideally be a 32‑bit up counter that is clocked at the same rate as the CPU.  The reason a 32-bit counter is preferable is because we use Time Stamping (thus the ‘TS’ acronym) elsewhere for delta‑time measurements and a 32-bit counter avoids having to account for overflows when measuring relatively long times.  Of course, you can also use a 16-bit counter but most likely it would be clocked at a slower rate to avoid overflows.   The value read from the timer is stored in a global variable (global to the µC/CPU module) called CPU_IntDisMeasStart_cnts. 

        L2-(3) We then increment the nesting counter.

        Listing 3, Interrupt Disable Time Measurement – End Measurement Function

        L3-(4) A local variable of data type CPU_TS_TMR is allocated and is used in the delta-time measurement.  This variable typically matches the word width of the free-running counter and is always an unsigned integer.

        L3-(5) We decrement the nesting counter and, if this is the last nested level, we proceed with reading the free-running counter in order to obtain a time-stamp of the end of interrupt disable time.

        L3-(6) The time difference is computed by subtracting the start time from the end time (i.e. stop time).  The calculation always yields the proper delta time because the free-running counter is an up counter and, we used unsigned math.  

        L3-(7) We keep track of two separate interrupt disable time peak detectors.  One peak detector is used to determine the maximum interrupt disable time of each task (i.e. CPU_IntDisMeasMaxCur_cnts) and is reset during each context switch. CPU_IntDisMeasMax_cnts is the global maximum interrupt disable time and is never reset.  
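        The delta computation in L3-(6) works even if the counter wraps between the two reads, thanks to unsigned arithmetic; a quick sketch:

```c
#include <assert.h>
#include <stdint.h>

/* With a free-running unsigned up-counter, (stop - start) is the
 * elapsed count modulo 2^32, which is correct as long as no more than
 * one full wrap occurs between the two reads. */
static uint32_t ts_delta(uint32_t start, uint32_t stop)
{
    return stop - start;
}
```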

        Interrupt disable time is measured in free-running counter counts.  To get the execution time in µs, you would need to know the clocking rate of the free-running counter.  For example, if you get 1000 counts and the counter is clocked at 100 MHz then the interrupt disable time would be 10 µs.
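        The conversion is just counts divided by the counter’s tick rate; a minimal sketch, assuming the frequency is a whole number of MHz:

```c
#include <assert.h>
#include <stdint.h>

/* Convert free-running counter counts to microseconds, given the
 * counter's clock frequency in Hz. */
static uint32_t counts_to_us(uint32_t counts, uint32_t tmr_freq_hz)
{
    return counts / (tmr_freq_hz / 1000000u);
}
```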

        As you’d expect, the code to measure interrupt disable time adds measurement artifacts, which should be accounted for. This overhead is found in the variable CPU_IntDisMeasOvrhd_cnts and is determined during initialization.  However, the overhead is NOT accounted for in the code shown above.  CPU_IntDisMeasOvrhd_cnts is also expressed in free‑running counter counts. Depending on the CPU architecture, the measurement overhead (i.e. calling CPU_IntDisMeasStart() and CPU_IntDisMeasStop()) typically amounts to between 50 and 75 CPU instructions.

        µC/Probe is able to display the interrupt disable time on a per-task basis for the Micrium OS Kernel when you enable its kernel-awareness capabilities.  µC/Probe also takes care of converting counts to µs, so all values are displayed in µs, as shown in Figure 1.

        Each row represents the value for one task.  Not shown in the figure is that each task has a name, so you can of course know which time is associated with which task.  The highlighted row clearly shows that one task disables interrupts for longer than the average.  This could be a problem, or it might be within expected limits; it’s something the developer might need to investigate.

        The per-task interrupt disable time came in handy recently as it helped us discover that a driver was disabling interrupts for over 2,500 microseconds!  This value stuck out like a sore thumb, so we were able to quickly identify and correct the issue.  Without this measurement, I’m not sure we would have been able to identify and correct this problem as quickly as we did.


        Figure 1 – Per-Task CPU Interrupt Disable Times

        Per-task Scheduler Lock Time

        In order to avoid long interrupt disable times, the Micrium OS Kernel locks the scheduler, and allows your application to lock it as well. This has the same effect as temporarily making the current task the highest-priority task.  However, while the scheduler is locked, interrupts are still accepted and processed.

        As shown in Figure 2, the Micrium OS Kernel measures the maximum scheduler lock time on a per‑task basis.  Again, not shown in the figure is that there is a name associated with each task allowing you to know which time is associated with what task.  

        The values are displayed in microseconds and, for most tasks shown in this example, the scheduler never gets locked. However, the Micrium OS Kernel’s timer task locks the scheduler for close to 15 microseconds.  In my example, I created 20 soft timers and selected a configuration option to lock the scheduler to ensure that timers are updated atomically.  If mutual exclusion semaphores are enabled when you use the Micrium OS Kernel, then that is the mechanism used to gain exclusive access to soft timers.

        As a general rule, you should avoid locking the scheduler in your application but, if you do so, try to lock it for shorter periods than what the Micrium OS Kernel does.


        Figure 2 – Per-Task Scheduler Lock Time


        In this post we examined, on a per‑task basis, statistics built into the Micrium OS Kernel for measuring interrupt disable time as well as scheduler lock time. These metrics can be invaluable in determining whether your embedded application satisfies some of its requirements. With µC/Probe, you get to see performance metrics and kernel status information live.