YAAA! – Yet Another Automobile Analogy
by Ken Wood on Dec 10, 2009

It’s dangerous for me to own one, which is why I drive a truck, but I love fast cars.
Whether production, stock, or homemade, fast cars are always intriguing, challenging and sexy. I especially enjoy reading facts like this from Car and Driver, “…the Veyron’s fuel consumption at 253 mph was 3.0 mpg. At full throttle, its 26.4 gallon fuel tank would empty in just 12 minutes 46 seconds. After 15 minutes at a continuous 253 mph, the tires would melt”. Attending the Supercomputing ’09 conference in Portland a couple of weeks ago got me thinking about what many of the High Performance Computing leaders are proposing: performance with practicality (or restrictions).
The Challenge! Build a 1 exaflop supercomputer without melting the datacenter. IBM’s Sequoia project is proposed as a 20 petaflops supercomputer that will require ~20 megawatts of power to run. With today’s state-of-the-art components and architectures, a 1 exaflop supercomputer is expected to require 3 gigawatts of power to run JUST the system. Another 700 megawatts is estimated to run the facility, for a total of 3.7 gigawatts. These impressive numbers are just for the computing. The storage end of this is an additional cost on top of the power and facility budget.
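To make the scale of those figures easier to see, here is a small back-of-the-envelope sketch in Python. It uses only the numbers quoted above, taken at face value; the variable names and the `flops_per_watt` helper are my own, and how the 3 gigawatt estimate was derived is not something this sketch tries to reproduce.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the post.
# Integer units keep the arithmetic exact.
PETA = 10**15
EXA = 10**18
MEGA = 10**6
GIGA = 10**9

sequoia_flops = 20 * PETA        # proposed Sequoia performance
sequoia_watts = 20 * MEGA        # ~20 megawatts, as quoted
exa_flops = 1 * EXA              # the 1 exaflop target
exa_system_watts = 3 * GIGA      # quoted estimate for the system alone
exa_facility_watts = 700 * MEGA  # quoted estimate for the facility


def flops_per_watt(flops, watts):
    """Return sustained floating point operations per second per watt."""
    return flops / watts


# Total power budget: system plus facility.
total_watts = exa_system_watts + exa_facility_watts

# How many Sequoia-sized machines add up to one exaflop.
scale_factor = exa_flops // sequoia_flops

if __name__ == "__main__":
    print(f"Total power: {total_watts / GIGA:.1f} gigawatts")
    print(f"Sequoia-sized machines per exaflop: {scale_factor}")
    print(f"Sequoia efficiency: {flops_per_watt(sequoia_flops, sequoia_watts):.2e} flops/watt")
    print(f"Exaflop efficiency, system plus facility: {flops_per_watt(exa_flops, total_watts):.2e} flops/watt")
```

The point of the exercise is the flops-per-watt line: whatever the exact inputs turn out to be, that ratio is the number that has to improve for an exaflop machine to be practical.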
Again, this is based on today’s state-of-the-art knowledge and technology, and it assumes we massively scale out with no real concern for operational expense (OPEX). So one area being revisited is combining scale-out and scale-up architectures: continue to cluster, but do so with much more powerful computers that can do more with less, probably for more dollars. That means that instead of using many low-cost systems, you use fewer high-cost systems with more specialized processing for communications and networking, memory processing, IO, and what I’ll call process processing.
Process processing is similar to the Cell processor found in the PlayStation 3. Only a small percentage of computing goes to operating system traffic and other non-data-processing work, yet multi-core processors are designed with identical heavyweight cores that do everything. With Cell processing, only one core handles the OS traffic, i.e., it processes meta-data and meta-instructions. The remaining processors are lighter weight and perform only data calculation work. This sort of swings back to the old CISC (Complex Instruction Set Computer) vs. RISC (Reduced Instruction Set Computer) wars that occupied everyone’s time back in the early to mid ’90s. However, this could usher in asymmetric hybrid computing nodes: many-core systems with a few operating system traffic processors and a horde of specialized processors that perform the floating point operations and other calculation-intensive computing, network and communication processing, IO processing, memory handling, etc. The new battleground will be whether to share everything with everything, similar to graphics memory versus shared system memory and multi-core processors sharing all of the work, or to break everything up into specialized and “tuned for the specific job” processors.
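As a rough illustration of the asymmetric idea, here is a toy Python sketch in which one "control core" does nothing but route work to pools of specialized workers. Everything in it is hypothetical: the task kinds, pool sizes, and handler functions are made up for the example, and Python threads only stand in for real specialized cores. It is not how the Cell processor or any actual hybrid node is programmed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pool sizes: many light "compute" workers, few for everything else.
POOL_SIZES = {"compute": 4, "network": 1, "io": 1}


def run_compute(x):
    """Stand-in for calculation-heavy work done on an offload core."""
    return x * x


def run_network(message):
    """Stand-in for work done on a communications processor."""
    return f"sent:{message}"


def run_io(name):
    """Stand-in for work done on an IO processor."""
    return f"wrote:{name}"


# The control core only needs to know which kind of worker handles which task.
HANDLERS = {"compute": run_compute, "network": run_network, "io": run_io}


def control_core(tasks):
    """Route each (kind, payload) task to its specialized pool.

    Results are returned in the same order the tasks were submitted.
    """
    pools = {kind: ThreadPoolExecutor(max_workers=size)
             for kind, size in POOL_SIZES.items()}
    try:
        futures = [pools[kind].submit(HANDLERS[kind], payload)
                   for kind, payload in tasks]
        return [future.result() for future in futures]
    finally:
        for pool in pools.values():
            pool.shutdown()


if __name__ == "__main__":
    work = [("compute", 3), ("io", "block0"), ("network", "hello"), ("compute", 5)]
    print(control_core(work))
```

Notice that the control core never does any of the real work itself; it only decides where the work goes, which is the division of labor described above.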
My bet is on the latter. I’ve always noticed that in order to make everything shared and working as one, everything had to be running like one, i.e., one fast clock rate and processors consuming lots of power waiting for something to do. This is counter to many specialized processors in a system tuned to doing specific jobs more efficiently and usually better. The idea here is that you don’t need a maximum clock rate if you know what’s being processed. Think of a system with several general-purpose CPUs for OS processing as we have today, and a thousand offload cores for calculations, processors for communications and networking, and processors for IO. Then scale out these nodes. The early stages of these types of systems do exist today and are always evolving. The difficult part is getting the user and programmer interfaces as simple to use as today’s general-purpose OS and processors. Until these hybrid systems become as easy to program and use as a basic Linux or Windows system, they will only ever be as accessible as race cars and race tracks are to the rest of us.


