


By Jack Ganssle


Published 12/05/2008

My February column for Embedded Systems Design was an attempt to show that the emperor, at least when talking about multicore technology, has no clothes. Multicore is being hyped as the solution to clock rate stagnation, when it really addresses two problems:
- A handful of "embarrassingly parallel" problems can derive great performance benefits from SMP.
- In many applications one can reduce power consumption by using more processors at slower clock rates.

Actually, there is a third problem that multicore solves: the vendors' need to sell us more transistors as they continue to exploit Moore's Law.

Now a study shows that even for classic embarrassingly parallel problems like weather simulation, multicore offers little benefit. The curve in that article is priceless: as the number of cores grows from two to 64, performance plummets by a factor of five. Additional processors nullify each other.

Call it the Nulticore Effect.

One might think that more CPUs == faster systems, but in traditional symmetrical multiprocessing groups of cores share the same memory bus, a bus that even with a single core is already as congested as 101 at rush hour. Memory simply can't keep up with a single-cycle machine that can swallow a couple of instructions per nanosecond. We all know this; it's the reason a modern processor is crammed full of complex circuits like pipelines and cache. Every access to the bus entails numerous wait states which bring the system to a screeching halt. Add more cores, all demanding access to that same bus, and system performance is bound to drop.

Other problems surface. We know that absent scheduling algorithms like RMA (which itself is highly problematic) preemptive multitasking is not deterministic. Though most embedded systems use preemptive multitasking, there's no way to ensure the system won't fail from a perfect storm of interrupts and task switches. And it's hard - really hard in a complex system - to get multitasking right. Add in multiple cores, each of which is constantly blocking the others from memory, and determinism looks about as likely as every school kid's plan to become an NBA star. Reentrantly sharing memory is tough enough with a single processor; when many share the same data the demands on developers to produce perfectly locked and reentrant code become overwhelming.

Then there's the little issue of parallelizing programs, an unsolved problem that is to supercomputing what the holy grail is to the Knights Templar - plenty of rumors, lots of speculation, but no hard results.

There are a lot of smart people working on these problems and I've no doubt they will be solved at some point. But today a generally better approach is asymmetric multiprocessing, where each core has its own memory space. More on that later.