Techno Musings

To Multi-Core or Not to Multi-Core?

An article over at InfoWorld brings up again something that I’ve talked about in reference to BlueArc: thinking in parallel.  The old adage about chewing gum, walking, holding a conversation, and trying to do a math problem roughly gets at why this is a challenging task.  (I’m going to get a bit esoteric here, so bear with me for a minute.)  Our brains do indeed do many things in parallel, most of which we aren’t even aware of, like maintaining those good old autonomic functions, which frees the conscious mind to do more interesting tasks.  The way that I like to think about it is that the human brain has put together a bit of scaffolding which is like an Operating System (OS).  This OS is rather like our “lizard brain” in that it keeps track of housekeeping functions that, if our conscious mind had to deal with them, would lead us all to catatonia.  I can imagine the chaos if everything were processed in our conscious mind: “breathe, breathe, oh wait, move arm, blink, oh, breathe, oops, forgot to make the heart beat,” and deadlock, quite literally.  In my example here the “lizard brain” is rather like an OS, and the conscious mind is like the business application, storage application, game, or productivity application.  So I’m basically going to leave behind the OS/“lizard brain” and move on to the application/conscious mind, and state that we need both an OS (“lizard brain”) and an application (conscious mind) to solve useful problems.

Our conscious minds are wonderful things, but they are heavily serial in nature.  We tend to be more productive when we think about one task at a time.  I know that folks who post comments on the site will talk about multitasking, but speaking for myself, even as a divergent thinker, if I’m given time pressure and a single task I tend to produce quality work on time.  Where we can get parallelism without thinking too much is through an offload technique.  In fact I’m employing one right now: typing.  Basically I’ve practiced typing so much that I don’t have to think about where the keys are or look at the keyboard any longer.  I think in words and sentences and allow the offloaded logic deep in my brain to press the keys, to create the words, to generate the meaning, to communicate to you.  My conscious mind is largely not attending to the typing function, and as a result I’m pretty efficient.  This is rather like the parallelism that is desired by computer scientists and processor manufacturers alike.  However, the usual approach of adding more processors or cores to the problem and forcing developers to start thinking in parallel is challenging because our brains, while they process things in parallel, don’t think in parallel.

The referenced article shows that Microsoft and Intel are investing $20M in establishing research organizations to study how to create applications with a high degree of concurrency to take advantage of the ever-escalating number of cores per processor.  However, are there already successful models to look at?  For the sake of my post, offloading is the key.  One example of offloading in the digital world is the graphics processing unit (GPU) in most PCs today.  Basically this is an offload engine which allows the higher-order applications and OS to merely tell the GPU, “hey, go and do this”; it does it and then comes back with the results, not bothering the OS or application with the details.  This is rather like my hands typing.  My primary application, my conscious mind, is free to think about the task of writing this article without having to be concerned about hunting and pecking to find every key; in a sense I have a TPU (Typing Processing Unit), and I can just feed it words and out they come.  Just to be clear to the folks that enjoy the Intel and AMD architectures: I’m not saying that a certain number of cores is not a good idea, but I think there is a limit, and offloading discrete tasks to special-purpose systems, much like GPUs, is a perfectly valid approach.
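
To make the offload idea a little more concrete, here is a minimal Python sketch.  It is only an illustration under assumptions of my own: the names `typing_unit` and `compose` are hypothetical, and a worker thread stands in for a real offload engine such as a GPU.  The caller hands over a discrete task, carries on with its own thinking, and only collects the finished result.

```python
from concurrent.futures import ThreadPoolExecutor


def typing_unit(words):
    """Hypothetical 'TPU': turns words into individual keystrokes."""
    return [ch for word in words for ch in word + " "]


def compose(executor, sentence):
    """The 'conscious mind': delegates the keystrokes, keeps thinking."""
    # Hand the detail work to the offload engine; a future comes back at once.
    future = executor.submit(typing_unit, sentence.split())
    # Meanwhile the caller is free to work on something else entirely.
    next_idea = sentence.upper()
    # Collect the finished result without ever seeing how it was produced.
    keystrokes = future.result()
    return next_idea, "".join(keystrokes).rstrip()


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as engine:
        idea, typed = compose(engine, "offload the details")
        print(idea)
        print(typed)
```

The point of the sketch is the shape of the interaction rather than the speed: the application sees one request and one result, and the details stay inside the offload unit.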

I’m sure you are asking what the point is and how this is related to storage.  Well, Hitachi’s approach to storage already uses offload techniques today on both the block and file lines.  We have general-purpose CPUs, ASICs, and FPGAs coupled to key offloaded functions like RAID algorithms, file systems, etc. to realize our designs.  In short, I think Hitachi realized long ago that a hybrid architecture with offloading to different systems is a great way to go.  After all, it is only natural!

What are your thoughts on this point?


Comments (3)


Shmuel Shottan on 31 Mar 2009 at 5:02 pm

Indeed a bit esoteric, yet very well written.
Achieving a large degree of parallelism has been the Holy Grail ever since I started to design processors and systems.
The additional point I would like to add is about the applications. Some applications indeed lend themselves well to achieving parallelism with a multiplicity of cores; most do not.
SIMD (single instruction stream, multiple data streams) designs have achieved a high degree of parallelism in past implementations, provided the data has parallelism. (The vector machines of the past are an example.)
MIMD (multiple instruction streams, multiple data streams) is the model the referenced article refers to. Past implementations have been either of the shared memory type (cache coherent), like the Sequent Symmetry, or message-passing implementations. Both require synchronization and create chatter that does limit scalability. (This might be the place to interject and point out that I do believe in further acceleration toward achieving large-scale parallelism, and to admit for the record that my claim to fame in the early ’90s was as the development lead for the AST Manhattan symmetric multiprocessing system.)
Having been impacted by Michael’s esoteric description…this was like chewing gum, walking, holding a conversation and doing math while having to communicate to a steering oversight committee at the same time, which made the tasks non-autonomic.
This leads me to Michael’s offloading examples. Coprocessors have achieved the task of providing parallelism. Coprocessors, by their definition, work in conjunction with a CPU.
DSPs, VLIW special-purpose processors and systolic arrays were used for solving special-purpose applications. This brings us to the point of using the appropriate resource for the relevant task.
Compilers will improve; the $20M investment is only one more point in a long journey. It is indeed about the compilers. The compiler needs to place the data such that the communication overhead is reduced, and to exploit larger parallelism. The compiler development team with whom I was working in the early ’90s did miracles, yet achieved little in the way of solving the challenge of running existing applications. Future progress will hinge on adopting programming extensions for newly developed applications.
As long as Moore’s law provided the easy way out by making the single processor faster, progress slowed. Necessity will accelerate the adoption of new paradigms in parallelism.
Back to the hybrid model: The best implementations have indeed used offloading. BlueArc’s implementation offloads the file system execution to a special purpose designed offload engine, while leveraging the processing core for data applications.
I would like to conclude with the reason for the offloading of the file system functions. The key reason for it was the need to achieve massive fine-grain parallelism. Since a NAS system will “park” on network and storage resources, any implementation that requires a multiplicity of processors will create synchronization chatter larger than the advantage of adding processing elements beyond a very small number. Thus, the idea of offloading to a “co-processor” required designing, from the ground up, an inherently parallelized and pipelined processing element. Choosing a state machine approach and leveraging ASIC design methodology by implementing in FPGAs provided the massive parallelism for the file system, as the synchronization was “free”.
And, I did write this paragraph practicing Michael’s recommended method of thinking about the sentences and offloading the typing. It works!
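
The synchronization-chatter argument in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions and not a measurement of any real system: it takes an Amdahl-style speedup and adds a hypothetical cost that grows linearly with each additional processor. The function names, the 5% serial fraction, and the 1% per-processor synchronization cost are all made up for illustration.

```python
def speedup(n, serial_fraction=0.05, sync_cost=0.01):
    """Amdahl-style speedup with an assumed linear synchronization penalty.

    serial_fraction and sync_cost are illustrative values, not measured ones.
    """
    parallel_time = (1.0 - serial_fraction) / n   # work that divides across n
    chatter_time = sync_cost * (n - 1)            # assumed cost per extra processor
    return 1.0 / (serial_fraction + parallel_time + chatter_time)


def best_processor_count(max_n=64, **model):
    """Return the processor count in 1..max_n with the highest modeled speedup."""
    return max(range(1, max_n + 1), key=lambda n: speedup(n, **model))


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32, 64):
        print(n, round(speedup(n), 2))
    print("best:", best_processor_count())
```

In this model the speedup rises, peaks at a modest processor count, and then falls as the chatter term dominates; raising `sync_cost` moves the peak toward a smaller number of processors.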

Michael Hay on 15 Apr 2009 at 4:51 am

Shmuel, thanks for the thoughtful comments. It appears that Barry is being quite contrary with his most recent post and may have read Dune recently.

[...] I think that you might have read Dune recently. Anyway I want to get back to a point in my previous posts on programming for multi-core processors.  The notion Barry asserts that Hitachi is somehow [...]
