LONDON – EDA companies, academic institutions, stalwarts of IT and some inspired startups are wrestling with one of the fundamental questions besetting the electronics industry: what’s the best way to go parallel?
They are studying programming languages, compilers and architectures as engineers start to realize that the old-fashioned ways of doing hardware and software do not work well in multiprocessor designs.
In Europe, the research organization IMEC (Leuven, Belgium) is planning a major tilt at the issue while startups such as Imperas Inc. (Palo Alto, Calif.) and Compaan Design BV (Leiden, Netherlands) see commercial opportunities crying out to be exploited.
The general problem has been known for a long time. It’s how to best generalize, and if possible automate, the efficient assignment of parallelized software to an unknown and variable number of processing resources. But it remains a big hole in the theoretical underpinning of modern electronics, and now it is becoming urgent for the semiconductor chipmakers and for the EDA industry whose charter is to support them.
Parallel programming for multicore processors “is the biggest challenge in computer science today,” said David A. Patterson, a professor at the University of California, Berkeley, speaking at a Microsoft Research open day held recently in Redmond, Wash.
Right now much of that programming work is done manually: partitioning between software and hardware; assigning software to particular processors and hardware accelerators; testing the paths between processors; and avoiding bottlenecks, data contentions, deadlocks and so on. The result is chips of a few cores that work, but not necessarily in the most efficient way. And manually rendered systems cannot scale with Moore’s Law, as the number of interprocessor connections and interactions scales exponentially with the number of processors on chip.
For now the debate often focuses on whether parallel processing will be heterogeneous – with many different processor cores on a chip – or homogeneous, with arrays of identical resources.
For Ralph von Vignau, senior director at NXP BV (Eindhoven, Netherlands), speaking as a panelist at the Design, Automation and Test in Europe exhibition and conference in Nice, France, the nondeterminism of the C language is a big issue. Nonetheless, von Vignau argued about parallel processing from the platform point of view. “There are more and more special processors coming on chip to run software,” he said. He added that there are concerns over unchecked software interactions, latencies and the possible need to time-stamp information as it is passed around a chip.
For Gary Smith, founder and principal analyst at Gary Smith EDA (Santa Clara, Calif.), the kinds of heterogeneous multiprocessing designed today are the low-hanging fruit that will be picked first. The bigger challenge is larger homogeneous multiprocessing, which he believes cuts across some of the fundamental assumptions at the heart of the IT industry.
“If we accept restrictions and the strong coupling of modules with their own software, then the tasks are isolated; it is not a big issue,” said Smith. He dubbed this the “padded cell” approach to multiprocessing. “But if we want to use a scalable homogeneous architecture it is very different.” Smith cited Amdahl’s law, named after computer architect Gene Amdahl, that the speedup of a program by using multiple processors in parallel is limited by the sequential fraction of the program, and pointed out that for general-purpose processors and applications the practical limit is about four. “Intel is already at four cores on a chip, so where are they going to go?”
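Amdahl’s law can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 25 percent sequential fraction is an assumption chosen because it produces a speedup ceiling of four, which is one possible reading of the “about four” figure Smith cited.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law: 1 / (s + (1 - s) / N)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def speedup_ceiling(serial_fraction: float) -> float:
    """Upper bound on speedup as the processor count grows without limit."""
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    SERIAL = 0.25  # assumed sequential fraction, not a figure from the article
    for cores in (1, 2, 4, 8, 16, 64):
        print(f"{cores:3d} cores: speedup {amdahl_speedup(SERIAL, cores):.2f}")
    print(f"ceiling: {speedup_ceiling(SERIAL):.1f}")
```

Under that assumption the gain per added core shrinks quickly, and no number of cores can push the speedup past the ceiling set by the sequential part of the program.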
Smith predicted EDA and the academics associated with the sector were among those most likely to solve the parallel-processing problem. “EDA engineers are experienced in concurrent programming. The software world doesn’t understand,” Smith said. “EDA may come to a solution faster than the software community.”
It is certainly true that EDA companies are already wrestling with simulation and verification software that has to be parallelized across server farms of up to 10,000 computing “blades” to allow applications to finish in a reasonable time. And the general rule is that today’s system will tend to become tomorrow’s system-on-chip (SoC).
Nor does Smith rule out “a deterministic, concurrent language to replace C,” citing research teams attacking the problem.
Microsoft Research, for one, is working on a language that automates the creation of parallel constructs. Meanwhile, Darpa’s High Productivity Computing Systems (HPCS) program is evaluating Cray’s Chapel, IBM’s X10 and Sun’s Fortress languages as candidates for programming next-generation systems, which are expected to use tens of thousands of processors.
However, the existence of billions of lines of C code suggests that some sort of extension to C may be the more popular way forward.
“The C language isn’t appropriate [for expressing concurrency] but it’s not practical to get rid of C,” said Simon Davidmann, cofounder and CEO of Imperas. “The practical approach is to use C with an API so you don’t use C as the method of communication. The approach is then to use the API for communication, as some sort of coordinating script to describe the software architecture and its attributes.”
The topology of hardware architectures can be defined using XML-based descriptions as defined by the Spirit consortium, and UML can be used to express software architecture topologies, said Peter Flake, Imperas’ chief scientist, and Frank Schirrmeister, the company’s vice president of marketing, writing in EDA Tech Forum Vol 4, Issue 1.
However, Davidmann is not a fan of just-in-time or run-time compilation, one way proposed to cope with the need to assign software across variable resources, at least not for embedded applications. “Linux OS is all dynamic. This works well with things that don’t have hard real-time needs. But real-time tends to need static allocation. Basically there’s the OS way of doing things and the hardware-replacement way of doing things.”
“The software guys have to realize that with multiprocessors there’s going to be a lot of testing and debug that has to be done. That’s why we have joined the Multicore Association to try and develop a consensus on the APIs,” said Davidmann.
Martijn de Lange, founder and CEO of ACE Associated Compiler Experts BV (Amsterdam, Netherlands), pointed out there are many types of parallelism and many aspects to compilation, many levers that can be used. “There’s data parallelism, such as radar and image processing – that’s a nice paradigm to work in. There’s task parallelism such as an operating system with threads and tasks, and pipeline parallelism often found in networking applications.”
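The three kinds of parallelism de Lange lists can be shown side by side in a small sketch. Everything here is hypothetical and chosen for illustration (the function names, the pixel and packet data); it uses Python threads to show the shape of each pattern, not to demonstrate real speedup, and it has nothing to do with how ACE’s compilers work.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from threading import Thread


def brighten(pixel: int) -> int:
    """Per-element operation used for the data-parallel case."""
    return min(pixel + 10, 255)


def data_parallel(pixels):
    # Data parallelism: the same operation applied to every element,
    # with the elements shared out among the workers.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(brighten, pixels))


def task_parallel(values):
    # Task parallelism: different, independent jobs running side by side.
    with ThreadPoolExecutor(max_workers=2) as pool:
        total = pool.submit(sum, values)
        largest = pool.submit(max, values)
        return total.result(), largest.result()


_DONE = object()  # sentinel marking the end of the stream


def pipeline_parallel(packets):
    # Pipeline parallelism: stage 1 cleans a packet while stage 2
    # is still measuring the previous one.
    link = Queue()
    lengths = []

    def clean():
        for packet in packets:
            link.put(packet.strip())
        link.put(_DONE)

    def measure():
        while (item := link.get()) is not _DONE:
            lengths.append(len(item))

    stages = [Thread(target=clean), Thread(target=measure)]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return lengths
```

A compiler or tool flow would have to recognize which of these patterns a piece of code fits before it could map the work onto multiple cores.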
ACE is a supplier of products and services for compiler development. CoSy allows companies to build their own compilers for custom processors and often for multiprocessors. “We’ve helped Clearspeed with their MIMD approach and Imagination with their Meta multithreading processor. Compaan has a top-down approach that takes executable C code and automatically parallelizes it,” said de Lange.
Compaan Design is a 2005 startup led by founder Bart Kienhuis, previously a researcher at Philips Research and a postdoctoral researcher under Professor Ed Lee at University of California, Berkeley. Compaan has chosen to focus on a particular application domain to make headway in parallel programming. Compaan has developed a technology that it claims enables the automatic generation of a pipelined, streaming architecture of both hardware and software components in programming heterogeneous multicore platforms. Compaan offers to find all the parallelism in an application model and uses the Kahn Process Network model to express it.
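In a Kahn Process Network, independent processes communicate only through unbounded first-in, first-out channels, and a process that reads from an empty channel simply waits. The sketch below is a minimal hand-written illustration of that model, not output from Compaan’s tools; the process names, the scaling factors and the `END` sentinel are all assumptions made for the example.

```python
from queue import Queue
from threading import Thread

END = object()  # hypothetical end-of-stream token


def source(out_a, out_b, samples):
    """Emit every sample on both output channels, then close them."""
    for sample in samples:
        out_a.put(sample)
        out_b.put(sample)
    out_a.put(END)
    out_b.put(END)


def scale(inp, out, factor):
    """Multiply each incoming token by a constant."""
    while True:
        token = inp.get()  # blocking read: waits until a token arrives
        if token is END:
            out.put(END)
            return
        out.put(token * factor)


def add(in_a, in_b, out):
    """Read one token from each input, in a fixed order, and sum them."""
    while True:
        a = in_a.get()
        b = in_b.get()
        if a is END or b is END:
            out.put(END)
            return
        out.put(a + b)


def run_network(samples):
    # Queue() with no maxsize is unbounded, matching Kahn's channel model.
    c1, c2, c3, c4, c5 = (Queue() for _ in range(5))
    processes = [
        Thread(target=source, args=(c1, c2, samples)),
        Thread(target=scale, args=(c1, c3, 2)),
        Thread(target=scale, args=(c2, c4, 3)),
        Thread(target=add, args=(c3, c4, c5)),
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
    results = []
    while (token := c5.get()) is not END:
        results.append(token)
    return results
```

Because each process blocks on its reads and never polls a channel, the output stream is the same however the threads happen to be scheduled, which is the property that makes the model attractive for mapping streaming applications onto several cores.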
Rudy Lauwereins, IMEC vice president responsible for design research, has an answer to the heterogeneous vs. homogeneous multiprocessing question: they will coexist. His group of nearly 200 researchers is focused on software-defined radio and on the design of wireless systems and multimedia baseband chips for the application.
Lauwereins pointed out that in this application domain, but also in many others, the need to go to multiprocessing is driven by the need to support multiple services and minimize energy consumption in a scalable way.
“We foresee tens of processors within tiles that are domain-specific,” said Lauwereins. “So you have to parallelize within the task within the application.” Lauwereins emphasized that the allocation of resources needs to be done at run-time. “This platform should be scalable but you should be able to write the software once. You write the software to scale within the resources.”
IMEC is launching a three-year program to address next-generation wireless systems that may need to handle 2 Gbits/s or 3 Gbits/s of data transmission, either on an ultrawideband network or possibly on a 60-GHz carrier. “We need at least a factor of 20 more compute performance,” said Lauwereins. “We think we can get a factor of four from process but for agile radio we need extremely parallel processing internally,” he added.
Lauwereins said the IMEC research program already has a couple of commercial partners and that he would assign about 70 researchers to work on the wireless part and 70 researchers to work on the multiprocessing problems in the baseband SoC. The digital work divides into three areas: compiler research, design and debug support, and technology.