This blog series compares use cases for 8-bit and 32-bit MCUs and serves as a guide on how to choose between the two MCU architectures. Most 32-bit examples focus on ARM Cortex-M devices, which behave very similarly across MCU vendor portfolios.
There is a lot more architectural variation on the 8-bit MCU side, so apples-to-apples comparisons among 8-bit vendors are harder. For the sake of comparison, we use the well-understood 8051 architecture, which remains popular among embedded developers.
I was in the middle of the show floor talking to an excitable man with a glorious accent. When I told him about our 8-bit MCU offerings, he stopped me and asked, “But why would I want to use an 8-bit MCU?”
This wasn't the first time I had heard the question, and it certainly won’t be the last.
It's a natural assumption that, just as the horse-drawn buggy gave way to the automobile and snail mail gave way to email, 8-bit MCUs have been eclipsed by 32-bit devices. While that transition may happen in some distant future, the current situation isn't that simple. It turns out that 8- and 32-bit MCUs are still complementary technologies: each excels at certain tasks, and the two perform on par at others.
The trick is figuring out when a particular application lends itself to a particular MCU architecture.
Asking “Is Star Trek better than Star Wars?” is similar to asking “Is ARM Cortex better than 8051?”
The truth is that while both questions are interesting, neither has a single right answer. Each fits different applications very well. (And Star Wars is clearly superior. Just kidding. Please don’t comment-bomb me.)
For MCUs, the much better question to ask is "Which MCU will best help me solve the problem I'm working on today?" Different jobs require different tools, and the goal is to understand how best to apply the available 8-bit and 32-bit devices.
Before we begin comparing architectures, it's important to note that I am comparing modern 8-bit technology with modern 32-bit technology. On the 8-bit side, I am using Silicon Labs’ EFM8 line of 8051-based MCUs, which are built on modern process technology and are far more efficient than the original 8051.
Development tools are also important. Modern embedded firmware development requires a full-featured IDE, ready-made firmware libraries, extensive examples, comprehensive evaluation and starter kits, and helper applications to simplify tasks such as hardware configuration, library management and production programming.
ARM has an army of tool developers supporting its impressive IDE. On the 8-bit side, I used the Silicon Labs IDE, Simplicity Studio, which compares nicely with the various suites available for both ARM and 8-bit development.
The first generalization is that ARM Cortex-M cores excel in large systems (> 64 KB of code), while 8051 devices excel in smaller systems (< 8 KB of code). The middle ground could go either way, depending on what the system is doing. In many cases, the peripheral mix also plays a deciding role. If you need three UARTs, an LCD controller, four timers and two ADCs, chances are you won't find all of those on an 8-bit part, while many 32-bit parts support that feature set.
For systems sitting in the middle ground where either architecture might do the job, the big trade-off is between the ease of use that comes with an ARM core and the cost and physical size advantages that can be gained with an 8051 device.
The unified memory model of the ARM Cortex-M architecture, coupled with full C99 support in all common compilers, makes it very easy to write firmware for this architecture. In addition, there is a huge set of libraries and third-party code to draw from. Of course, the penalty for that ease-of-use is cost. Ease-of-use is an important factor for applications with high complexity, short time-to-market or inexperienced firmware developers.
While there is some cost advantage when comparing equivalent 8- and 32-bit parts, the real difference is in the cost floor. It's common to find 8-bit parts as small as 2 KB/512 bytes (flash/RAM), while 32-bit parts rarely go below 8 KB/2 KB. This range of memory sizes allows a system developer to move down to a significantly lower-cost solution in systems that don't need a lot of resources. For this reason, applications that are extremely cost-sensitive or can fit in a very small memory footprint will favor an 8051 solution.
8-bit parts also generally have an advantage in physical size. For example, the smallest 32-bit QFN package offered by Silicon Labs is 4 mm x 4 mm, while our 8051-based 8-bit parts are as small as 2 mm x 2 mm in QFN packages. Applications that are severely space-constrained often need to use an 8051 device to satisfy that constraint.
One of the major reasons for the lower cost of an 8051 MCU is that it generally uses flash and RAM more efficiently than an ARM Cortex-M core, which allows systems to be implemented with fewer resources. The larger the system, the less impact this will have.
However, the 8051 does not always have this memory advantage. In some situations, an ARM core is just as efficient as an 8051 core. For example, a 32-bit math operation requires only one instruction on an ARM device but multiple 8-bit instructions on an 8051 MCU.
The ARM architecture has two major disadvantages at small flash/RAM sizes: code-space efficiency and predictability of RAM usage.
The first and most obvious issue is general code-space efficiency. The 8051 core uses 1-, 2- or 3-byte instructions, and ARM cores use 2- or 4-byte instructions. The 8051 instructions are smaller on average, but that advantage is offset by the fact that the ARM core can often do more work with one instruction than the 8051. The 32-bit math case is just one such example. In practice, instruction width results in only moderately denser code on the 8051.
In systems that contain distributed access to variables, the ARM core's load/store architecture is often more important than instruction width. Consider the implementation of a semaphore where a variable needs to be decremented (allocated) or incremented (freed) in numerous locations scattered around the code. An ARM core must load the variable into a register, operate on it and then store it back, which takes three instructions. The 8051 core, on the other hand, can operate directly on a variable in directly addressable internal RAM and requires only one instruction. As the amount of work done on a variable at one time goes up, the load/store overhead becomes negligible, but where only a little work is done at a time, load/store can dominate and give the 8051 a clear efficiency advantage.
While semaphores are not common constructs in embedded software, simple counters and flags are used extensively in control-oriented applications and behave the same way. A lot of common MCU code falls into this category.
The other piece of the puzzle is that an ARM processor makes much more liberal use of the stack than an 8051 core. In general, 8051 devices push only the return address (2 bytes) onto the stack for each function call, and place local variables and parameters, which would normally live on the stack, in statically allocated RAM instead. In some cases this creates problems, because it means functions are not reentrant by default. However, it also means that the amount of stack space that must be reserved is small and fairly predictable, which matters in MCUs with limited RAM.
As a simple example, I created the following program. Then I measured the stack depth inside funcB and found that the Cortex-M0+ core's stack consumed 48 bytes, while the 8051 core's stack consumed only 16 bytes. Of course, the 8051 core also statically allocated 8 bytes of RAM, for 24 bytes total. In larger systems the difference is negligible, but in a system with only 256 bytes of RAM, it becomes important.
The next post will dive into architecture specifics and take a more nuanced look at where each architecture excels.
One thing that should be mentioned above in the program memory size comparisons is that a typical ARM part eats a lot of code space just to get started, whereas the '51 gets by with just a few bytes.