Answering Ilja’s Request for Server Capacity – HDS Style
by Ken Wood on Nov 30, 2011
For everyone that celebrates the holiday, Happy [late!] Thanksgiving. The week of Thanksgiving in the US is a great time to catch-up on many of those little work chores that pile up or slip through the cracks while traveling and prioritizing big tasks ahead of fun work stuff.
This morning I was catching up on some HDS Industry Influencer Summit bloggers’ and analysts’ write-ups and opinions from the days of Nov 10th and 11th. As I wrote previously my blog, I was in a concurrent breakout event with several industry bloggers on the “other stage” during the afternoon’s main session. I was following links to various write-ups and ran across Ilja’s Coolen’s blog. Funny thing is, it was a write-up from a separate event back in March in the UK that I didn’t participate in. So, when I finished writing this blog, I realized what happened: too much link-clicking, and I ultimately ended up at an article that was six months old. While it is several months old, Ilja’s post did ask a very good question that still I would like to respond to on server packaging and capacity. In his blog, he states:
“Hitachi is able to deliver a completely filled 42u rack with 320 high density micro servers. The total rack would consume less than 12 Kilowatt. Whether or not this is a great accomplishment, actually depends on the total processing capacity this rack would have. I need to dig deeper into this to make a comparison.”
I have, in the past, performed paper exercises of sizing computing horsepower for initial comparisons. When everyone uses the same processor chips from either Intel or AMD, do they all perform the same? Where are the differentiators? To cut through the fat, one area is packaging.
The whole discussion of rack mount servers versus blade server systems has been in debate for a number of years, and it is somewhat falling into a religious discussion. The argument for commodity-like advantages of pizza box servers over the easier to manage enterprise blade server architectures is not going to be resolved here. But what I will offer is some of my insight into performance and the advantage of packaging using the same processor chipsets.
Using a standard formula to calculate floating-point operations per second (FLOPS) then applying to a server system such as the one I use here:
(Number of FLOPS per Cycle) * (Clock Cycle) * (Number of Cores)
Adding additional system level information will yield the system’s or blade’s overall calculated FLOPS performance (of course this is a brute force approach, but it is a good starting point). For instance, using the Compute Blade 320 (CB320) blade server option as stated in Ilja’s blog and as mentioned by Lynn McLean in her presentation:
- Using the X5670 XEON based blade
- (there is a reason I’m using this one and not the fastest processor option for now)
- 4 FLOPs per clock cycle
- times 2.93GHz
- times 6 cores per processor
- times 2 processors per blade
- Single precision FLOPS – 4 times 2.93GHz times 6 times 2 = 140 GFLOPS
- times 10 blades in a system= 1.4 TFLOPS
- times 7 CB320 systems in a rack = ~10 TFLOPS
I should note here also, these are general purpose computing FLOPS as compared to GPGPU FLOPS, which require additional coding and compiling steps to take advantage of this technology. This means almost everything running on these systems can take advantage of the performance assuming internode awareness (application is scale-out aware).
If you followed this same formula for the standard 1U rack mount server using the same chipset and core count, it would result in about 6 TFLOPS (140 GFLOPS per server times 42 servers in a fully populated rack) compared to almost 10 TFLOPS per rack using the CB320 (70 blade servers in a rack). So, net results of this exercise is packaging and density of 70 blade servers in a rack compared to 42 servers using the standard 1U rack mount servers yields a 40% improvement in floor space requirements for the same computational capability. Stated in a more measurable metric, the CB320 yields 235 GFLOPS per rack U compared to 140 GFLOPS for a standard 1U server in the same 42U rack. On the higher end of the CB320 product line, the X5690 blade, the calculated floating-point performance for this blade is 166 GFLOPS, which would put the rack’s total calculated performance at 11.6 TFLOPS or 277 GFLOPS per rack U.
Hopefully this helps answer Ilja Coolen’s question about server density and capacity. The other point to finish off this article is the notion of data intensive and computational intensive architectures. I’ve just shown some data that suggests the CB320 has the capacity to be very computationally capable. On the other hand, the CB2000 has the capability to be very data intensive. The specifications for the CB2000 states 16 GB/s total bandwidth in a single blade system and 64 GB/s of total bandwidth from a fully configured rack. Combined, these two systems form a formidable platform for solving Big Data challenges. Not that Big Data problems are a floating-point intensive workloads, but you never know.
More on this thought in future blogs.
Thanks for your response on my question in a post I did many moons ago.
It is a clear answer to that question and the numbers look impressive. I guess the next step would be to conquer the European market and take away a chunk of Dell’s and HP’s market share. But if I remember correctly, the Hitachi servers are only sold in a “complete solution” package, so taking away server vendors’ market shares is not part of HDS strategy.