My February column in Embedded Systems Design Magazine was an attempt to show
that the emperor, at least when talking about multicore technology, has
no clothes. Multicore is being hyped as the solution to clock rate
stagnation, when it really addresses two problems:
- A handful of “embarrassingly parallel” problems can derive great
performance benefits from SMP.
- In many applications one can reduce power consumption by using more
processors at slower clock rates.
Actually, there is a third problem that multicore solves: the
vendors’ need to sell us more transistors as they continue to exploit
Moore’s Law.
Now a study in IEEE Spectrum
shows that even for classic embarrassingly parallel problems like
weather simulations, multicore offers little benefit. The curve in that
article is priceless. As the number of cores grows from two to 64,
performance plummets by a factor of five. Additional processors nullify
each other.
Call it the Nulticore Effect.
One might think that more CPUs equals faster systems, but in
traditional symmetric multiprocessing, groups of cores share the
same memory bus, a bus that even with a single core is already as
congested as Highway 101 at rush hour. Memory simply can’t keep up with
a single-cycle machine that can swallow a couple of instructions per
nanosecond.
We all know this; it’s the reason a modern processor is crammed full
of complex circuits like pipelines and cache. Every access to the bus
entails numerous wait states which bring the system to a screeching
halt. Add more cores, all demanding access to that same bus, and system
performance is bound to drop.
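A back-of-the-envelope sketch of that argument, under loud assumptions: suppose a single core running alone keeps the shared bus busy for some fraction of the time (the `bus_fraction` parameter below is hypothetical, as is the function name). The bus can then feed at most 1 / bus_fraction cores' worth of work, no matter how many cores are bolted on.

```python
def speedup_bound(cores, bus_fraction):
    """Upper bound on speedup when every core shares one memory bus.

    bus_fraction is the share of time a single core keeps the bus busy
    when running alone. It is a made-up parameter for this toy model,
    not a measured figure.
    """
    if cores < 1 or not 0 < bus_fraction <= 1:
        raise ValueError("need cores >= 1 and 0 < bus_fraction <= 1")
    # The bus saturates once total demand reaches 100 percent, so it can
    # feed at most 1 / bus_fraction cores' worth of work.
    return min(float(cores), 1.0 / bus_fraction)


if __name__ == "__main__":
    for n in (1, 2, 4, 16, 64):
        print(n, speedup_bound(n, 0.5))
```

This toy model only shows the speedup flattening once the bus saturates. It does not model the extra contention overhead that could make performance actually fall as cores are added.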
Other problems surface. We know that absent scheduling algorithms
like RMA (rate monotonic analysis) - which itself is highly problematic
- preemptive multitasking is not deterministic. Though most embedded
systems use preemptive multitasking, there’s no way to ensure the
system won’t fail from a perfect storm of interrupts and task switches.
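For readers who haven't seen it, the best-known rate monotonic check is the Liu and Layland utilization bound: n independent periodic tasks on one processor, with deadlines equal to their periods and priorities assigned by rate, are guaranteed schedulable if total utilization does not exceed n(2^(1/n) - 1). Here is a minimal sketch; the function names and the (execution time, period) tuple format are my own choices for illustration.

```python
def rm_utilization_bound(n):
    """Liu and Layland bound for n tasks under rate monotonic priorities."""
    if n < 1:
        raise ValueError("need at least one task")
    return n * (2 ** (1.0 / n) - 1)


def passes_rm_bound(tasks):
    """tasks is a list of (worst_case_execution_time, period) pairs.

    Returns True when total utilization is within the bound. The test is
    sufficient but not necessary: a False result means "not proven
    schedulable by this test", not "will miss deadlines".
    """
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rm_utilization_bound(len(tasks))


if __name__ == "__main__":
    print(rm_utilization_bound(2))             # roughly 0.83
    print(passes_rm_bound([(1, 4), (2, 8)]))   # utilization 0.5
    print(passes_rm_bound([(2, 4), (4, 8)]))   # utilization 1.0
```

Note the assumptions baked in: a single processor, independent tasks, and known worst-case execution times. Those are exactly the things that get shaky in real systems.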
And it’s hard - really hard in a complex system - to get
multitasking right. Add in multiple cores, each of which is constantly
blocking the others from memory, and determinism looks about as likely
as every school kid’s plan to become an NBA star.
Reentrantly sharing memory is tough enough with a single processor;
when many share the same data the demands on developers to produce
perfectly locked and reentrant code become overwhelming.
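To make the locking burden concrete, here is a minimal sketch that uses Python threads as a stand-in for cores sharing one variable. The `SharedCounter` class and `run_workers` function are hypothetical names for illustration. Every access to the shared value must go through the lock; miss one, and the read-modify-write can interleave with another thread's.

```python
import threading


class SharedCounter:
    """A counter that is safe to increment from several threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # The read-modify-write below must not interleave with another
        # thread's, so it runs only while holding the lock.
        with self._lock:
            self.value += 1


def run_workers(n_threads, n_increments):
    """Start n_threads workers that each increment a shared counter."""
    counter = SharedCounter()

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers(4, 10000))
```

One counter and one lock is the easy case. The trouble starts when dozens of shared structures each need their own discipline, and every developer on the team has to get every one of them right.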
Then there’s the little issue of parallelizing programs, an unsolved
problem that is to supercomputing what the Holy Grail is to the Knights
Templar - plenty of rumors, lots of speculation, but no hard results.
There are a lot of smart people working on these problems and I’ve
no doubt they will be solved at some point. But today a generally
better approach is asymmetric multiprocessing, where each core has its
own memory space. More on that later.
Jack G. Ganssle is a lecturer and consultant on embedded
development issues. He conducts seminars on embedded systems and helps
companies with their embedded challenges. Contact him at jack@ganssle.com. His website is www.ganssle.com.