Infiniband, a Dead Man Walking? – UPDATED
by Michael Hay on June 28, 2010
Okay, now that I have grabbed your attention, I want to open up the discussion on this topic. Specifically, I’ve been watching as IB has been put into several appliances of late: Oracle Exadata, Clustrix KVS, etc. There is a series of well-established uses too, such as RDMA for HPC, NetApp’s usage for internode communications, SGI’s mapping of NUMAflex on top of IB, Xsigo’s use for I/O aggregation, etc. However, these are all niche plays, and IB was meant to be the I/O messiah in the datacenter (or solve world hunger, I forget), but alas IB hasn’t really lived up to the hype. With all of that in mind, is there a replacement on the horizon? (Mind you, by replacement I’m thinking 3+ years out…)
There is a dark horse data center I/O contender (thanks, Phil, for the term, though I’m using it in a different light) and it is not RapidIO. Drum roll please: the dark horse is our friend PCIe. I know that a lot of you might think that PCIe is nothing more than slots for putting HBAs, HCAs, CNAs, GPUs, etc. into servers, laptops and deskside workstations, but there is more to it. In fact, there are whole other uses being explored today beyond that slot in the pizza box (yes, this is a silly reference to solving world hunger).
Just for fun I started poking around to see what I could find in the way of PCIe usage beyond I/O slots. Here are a few findings:
- Switching – there are several vendors offering external PCIe switches to solve a variety of problems. Depending on the application, connections come in the form of x4 and x8 links, allowing for up to 40 Gb/s of raw bandwidth at extremely low latencies of 200ns-300ns. For reference, this is basically comparable to IB in both bandwidth and latency.
- I/O expansion/consolidation – several companies on the market today offer I/O expansion shelves coupled to servers; Hitachi is one such company, for the BladeSymphony 2000. External I/O expansion adds flexibility and density for I/O interfaces, GPUs (for GPGPU applications), storage, and other connections on high density servers that don’t traditionally have a lot of I/O slots.
- Internode communications – at least one company, One Stop Systems, is doing this today, and Hitachi is using PCIe inside of our blade servers to make the magic of Virtage real. Specifically, we have the ability to gang 4 blades together to create a single SMP system, and the underlying HW behind this is PCIe. (Rick has an interesting take on Virtage here.)
- Flash Storage – at least two companies are doing this today mostly in partnership with the likes of Marvell, FusionIO and Texas Memory. The usual package is basically a PCIe I/O expansion shelf packed with Flash storage and attached to a server with PCIe as the interconnect.
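For anyone who wants to sanity check the bandwidth comparison in the Switching bullet, here is a minimal sketch. It assumes PCIe 2.0 signaling (5 GT/s per lane) and 8b/10b encoding on both PCIe and IB, which is my assumption about the generation of parts involved rather than something the vendors' pages spell out. The function name and the table of link types are made up for illustration.

```python
# Rough link bandwidth math for PCIe and InfiniBand links.
# Assumption: PCIe 1.x/2.0 and IB SDR/DDR/QDR all use 8b/10b encoding,
# so 8 of every 10 bits on the wire carry data.

def link_bandwidth_gbps(rate_per_lane, lanes):
    """Return (raw, usable) bandwidth in Gb/s for one direction of a link."""
    raw = rate_per_lane * lanes
    usable = raw * 8 / 10  # 8b/10b encoding overhead
    return raw, usable

# (label, per-lane signaling rate in GT/s, lane count)
links = [
    ("PCIe 2.0 x4", 5.0, 4),
    ("PCIe 2.0 x8", 5.0, 8),
    ("IB DDR 4x", 5.0, 4),
    ("IB QDR 4x", 10.0, 4),
]

for label, rate, lanes in links:
    raw, usable = link_bandwidth_gbps(rate, lanes)
    print(f"{label}: {raw:.0f} Gb/s raw, {usable:.0f} Gb/s usable")
```

Under those assumptions an x8 link and a QDR 4x IB link both land at 40 Gb/s raw, which is where the "basically comparable" claim comes from.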
Beyond these activities other uses and features are emerging; notable is the work within the PCI-SIG to tunnel Ethernet over PCIe (see the PLX article on this topic). One of the key points in the PLX paper concerns cost. As part of my digging to prepare for this post I looked at the price points for IB and PCIe interface cards as well as comparable switches. What I found is that for less than $800 you can get either an x8 PCIe card or a DDR IB card. There are actual price differences, but since they are within $30 USD of each other I’d call the prices about the same. As for switches, I looked at configurations with fewer than 12 ports. For PCIe, the cost of an 8-port switch was cited at $2485 in “OEM quantities”, so just for fun let’s double the price to $4970. The smallest IB switch that I could find is an 18-port 1U rack-mount system that comes in at around $5532. Per port, that makes the PCIe switch roughly twice as expensive as the IB one. So, while the interface card costs are about the same, based on my quick “back of the envelope” price checking, the switching costs of PCIe, when compared to IB, rule out large scale usage today for huge HPC systems. However, since PCIe technology is in every laptop, deskside, desktop, rackmount, and blade server on the planet and IB is not, expect the prices of PCIe things to drop, especially when considering a 3 year or greater time horizon. (Note, just for a bit of trivia: in Feb. 2008 PLX celebrated 2,000,000 PCIe chip shipments for a variety of applications. I’m sure I could spend more time hunting the PLX SEC filings for more updated numbers, but I do have a day job to do so…)
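To make the switch comparison a little more concrete, here is a quick per-port calculation. It is only a sketch: it takes the quoted $2485 OEM price for the 8-port PCIe switch, doubles it as a stand-in for a street price (the 2x markup is my guess, not a quoted figure), and compares that with the roughly $5532 18-port IB switch. The variable and function names are just for illustration.

```python
# Back-of-the-envelope cost per switch port, using the prices from the post.

def cost_per_port(price_usd, ports):
    """Price of a switch divided evenly across its ports."""
    return price_usd / ports

PCIE_OEM_PRICE = 2485      # 8-port PCIe switch, "OEM quantities"
ASSUMED_MARKUP = 2         # assumption: street price is about 2x OEM
IB_PRICE = 5532            # 18-port 1U IB switch

pcie_per_port = cost_per_port(PCIE_OEM_PRICE * ASSUMED_MARKUP, 8)
ib_per_port = cost_per_port(IB_PRICE, 18)

print(f"PCIe: ${pcie_per_port:.2f} per port")
print(f"IB:   ${ib_per_port:.2f} per port")
print(f"PCIe costs {pcie_per_port / ib_per_port:.1f}x as much per port")
```

The total prices look similar, but the IB switch spreads its cost over more than twice as many ports, and that per-port gap is what hurts PCIe at HPC scale today.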
With all of that said, I want to propose my personal hypothesis about IB vs. PCIe. When thinking beyond the 3 year mark, and assuming Ethernet on top of PCIe for in-rack use or between a fixed number of racks and 10GigE or 40GigE for long haul within the data center, I hypothesize that IB will not make it as a technology. There, I said it, and now I open up the floodgates to input via Twitter or comments on this point. Let’s see what happens…
References:
UPDATE
There is a new article talking about wireless PCIe here.
Comments (5)
Greg Knieriemen on 29 Jun 2010 at 6:26 am
Michael: Where does FCoE fit into this (or doesn’t it?)
Michael Hay on 29 Jun 2010 at 2:36 pm
Greg, good question. I think that the PLX article, which talks about tunneling Ethernet (including FCoE) on top of PCIe, implies that while the physical media might change from native Ethernet to Ethernet mapped over PCIe, the Ethernet protocol itself persists. Note that one of the articles in my reference section also points to an RDMA-like stack called RoCE (pronounced rocky) being built on top of Ethernet, so add that to the mix as well. Does this answer your question?
Greg Knieriemen on 30 Jun 2010 at 8:15 am
Yep… perfectly. I forgot about RoCE. Thanks!
Michael Hay on 30 Jun 2010 at 4:06 pm
No problem!