
Multicore Madness

Summary: Here's one speed limit that's heavily enforced.

Route 140 here in Finksburg is heavily patrolled - break the speed limit and you'll likely face a fine.

But some speed limits can't be exceeded, no matter how much one wishes to. The speed of light comes to mind. As does the speed at which a teenager's brain matures.

Then there's Amdahl's Law, which puts a hard ceiling on what parallelism can buy. The maximum speedup is 1 / (f + (1 - f)/n), where f is the fraction of a problem that cannot be parallelized and n is the number of processors. In a system where, say, only 50% of the problem can be executed in parallel, even an infinite number of CPUs can do no better than halve the execution time.
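
A few lines of C make that ceiling concrete. This is just a back-of-envelope sketch of the formula above, with the 50% serial fraction from the example plugged in:

#include <stdio.h>

/* Amdahl's Law: speedup = 1 / (f + (1 - f) / n)
   f = fraction of the work that cannot be parallelized
   n = number of processors */
static double amdahl(double f, int n)
{
    return 1.0 / (f + (1.0 - f) / n);
}

int main(void)
{
    double f = 0.5;   /* assume half the problem is serial, per the example above */
    int n;
    for (n = 1; n <= 64; n *= 2)
        printf("%2d cores: %.2fx speedup\n", n, amdahl(f, n));
    return 0;
}

Run it and the speedup crawls from 1.33x at two cores to about 1.97x at 64; the serial half caps it at 2x no matter how many cores you add.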

Gustafson's Law suggests that Amdahl is too conservative: as some problems scale up, the parallel portion grows faster than the sequential portion, so the speedup keeps climbing with more processors. Google's PageRank algorithm is one example. I suspect that in most embedded systems, though, Gustafson won't apply.
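
For contrast, here's the same sort of sketch for Gustafson's formulation. His scaled speedup is n - f(n - 1), where f is the serial fraction measured on the parallel machine; the assumption is that the parallel part of the workload grows with the core count:

#include <stdio.h>

/* Gustafson's Law: scaled speedup = n - f * (n - 1)
   f = serial fraction of the run time on the parallel machine.
   Assumes the parallel part of the workload grows with n. */
static double gustafson(double f, int n)
{
    return n - f * (n - 1);
}

int main(void)
{
    double f = 0.5;   /* same 50% serial fraction as the Amdahl example */
    int n;
    for (n = 1; n <= 64; n *= 2)
        printf("%2d cores: %.1fx scaled speedup\n", n, gustafson(f, n));
    return 0;
}

With the same 50% serial fraction this predicts a 32.5x scaled speedup at 64 cores, versus Amdahl's 2x ceiling - but only for workloads that really do grow with the hardware.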

However, I believe Amdahl and Gustafson are optimistic in many cases, especially when working with symmetric multicore processors. These have two or more identical cores, each with its own L1 cache; they share L2 and a common memory bus. Executing out of L1 they will scream. But that cache is tiny - often only 32 KB. Go to L2 - or worse, main memory - and the brake lights come on. Dozens of wait states stall processing, and bus contention occurs whenever more than one CPU needs memory at the same time. This effect is hard to model since it is both non-deterministic and very problem-specific.
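
To see why, consider a rough model of the average cost of a memory access. The numbers below are illustrative guesses, not measurements from any particular part: one cycle for an L1 hit, 15 for L2, 100 for DRAM, with the DRAM penalty scaled by the number of cores queuing for the shared bus as a crude worst-case stand-in for contention:

#include <stdio.h>

/* Rough model of average memory-access cost in cycles.
   Latencies are illustrative, not measured from any real device:
   L1 hit = 1 cycle, L2 hit = 15 cycles, DRAM = 100 cycles.
   Bus contention is crudely approximated by multiplying the DRAM
   penalty by the number of cores fighting for the shared bus. */
static double avg_cycles(double l1_hit, double l2_hit, int cores)
{
    double l1 = 1.0, l2 = 15.0, dram = 100.0 * cores;
    double l1_miss = 1.0 - l1_hit;
    return l1_hit * l1 + l1_miss * (l2_hit * l2 + (1.0 - l2_hit) * dram);
}

int main(void)
{
    int cores;
    for (cores = 1; cores <= 8; cores *= 2)
        printf("%d core(s): %.1f cycles per access at 95%% L1, 80%% L2 hit rates\n",
               cores, avg_cycles(0.95, 0.80, cores));
    return 0;
}

Even with a 95% L1 hit rate, the off-chip misses dominate once a few cores start fighting for the same bus.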

But Sandia National Labs has come up with some interesting data showing that, even on traditional parallel problems, multicore's advantages diminish very quickly (https://share.sandia.gov/news/resources/news_releases/more-chip-cores-can-mean-slower-supercomputing-sandia-simulation-shows/). Going from two to four cores nets a serious reduction in execution time. Double again, to 8 cores, and there's no gain. Each additional doubling slows the system down - by a lot. A 64-core solution is about half an order of magnitude slower than one with just four.

Multicore, as it is being pushed by the major semiconductor vendors, can in some cases offer significant advantages in both speed and power. But I think the benefits are being oversold: memory bandwidth is a hugely limiting factor. Alternatives such as asymmetric multiprocessing are often a better solution, depending, of course, on the nature of the problem being addressed.

A new processor technology from Venray Technology (https://www.venraytechnology.com/home.htm) is an interesting twist on the memory bandwidth problem. Instead of adding DRAM to a CPU, they add CPUs to DRAM: small (20k-transistor) processors are tightly integrated with the memory. A typical arrangement marries four of these cores with 64 MB of DRAM, which puts the CPU transistor count at roughly 0.01% of the memory's. Venray's web site is long on marketing-speak and short on tech details, but the idea is compelling.

Published January 26, 2012