There has been so much discussion in the past 2 weeks over Hitachi Dynamic Provisioning (HDP) and the size of our pages, which are large compared to other thin provisioning implementations. As we all recall from our first course in Computing 101 and memory management, large pages in memory provide better performance, but at the expense of using more capacity. In the “olde” days of extremely expensive memory, large pages were considered a very bad thing. Apply the same reasoning to storage, though, and the argument changes, since: (1) the scale of costs is dramatically different, (2) we’re only looking at disk access, not the majority of I/O, which is typically satisfied out of cache, and (3) in our HDP implementation, we can actually use the entire 42 MB page, depending on whether the next WRT is sequential or random. I won’t rehash all the arguments here; you can read them in Hu’s blog, Marc Farley’s, Nigel Poulton’s (replete with video; nice touch, Nigel!), not to mention Tony Asaro’s this morning.
But there is a related topic I’ve been wanting to write about for a while now: the memory-management trade-off as it relates to HDP and page size. Before I dive in, let me take you back in time and (re)introduce some buzzwords and concepts from the late 1970s and early 1980s: “Access Density” and “Short Stroking”. Access Density is a measure of a hard drive’s IOPS per unit of capacity, and Short Stroking was what a lot of folks were doing to compensate for declining Access Density. Both terms are, of course, still used, but rarely outside the domain of DBAs. Put 10 DBAs in a room and you’ll know what I mean (I used to be a DBA, so I’m kinda allowed to poke them).
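To make point (3) concrete, here is a minimal sketch of how page size affects the capacity a thin pool actually commits. It assumes a hypothetical allocator that commits a whole page on the first write to it, 1 MB writes at whole-MB offsets, and a 1 MB page purely for comparison; it is not HDP’s actual allocation logic, and it ignores cache entirely.

```python
def allocated_mb(write_offsets_mb, page_mb):
    """Capacity committed when the first write to a page allocates the whole page.

    Hypothetical model: each write is 1 MB at an integer MB offset, so with an
    integer page size a write never straddles two pages.
    """
    pages_touched = {offset // page_mb for offset in write_offsets_mb}
    # Thin provisioning allocates whole pages: cost = pages touched * page size.
    return len(pages_touched) * page_mb


sequential = list(range(420))               # 420 contiguous 1 MB writes
scattered = [i * 1000 for i in range(100)]  # 100 writes spaced 1000 MB apart

for page_mb in (1, 42):
    print(f"page={page_mb} MB: sequential -> {allocated_mb(sequential, page_mb)} MB, "
          f"scattered -> {allocated_mb(scattered, page_mb)} MB")
```

In this toy model, sequential writes fill 42 MB pages completely (420 MB written, 420 MB allocated), while widely scattered writes commit a full 42 MB page apiece. That is the capacity cost of large pages; whether it matters depends on the write pattern and, per point (1), on how cheap the capacity is.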
The problem at the time was that capacities on those 14-inch-diameter behemoth drives kept increasing, but nobody could spin them faster than 3200 RPM. Spinning a 14-inch drive reminds me of those circus performers spinning dinner plates on the ends of sticks: by today’s standards, the tolerances were pretty bad. More capacity at the same RPM. Sound familiar? That is the Access Density problem.
The answer to this dilemma was “Short Stroking”, meaning just putting less data on the drive (to reduce the IOPS demand). It even included physically placing data in the center of the platters so that the heads wouldn’t have to seek (stroke, hence the term) as far. And you all thought Short Stroking was a description of my golf game? It actually is, but that’s a discussion left for another day…
Enter the new decade. Anyone see a repeat here? It’s been quietly happening again, since the drive folks are stuck with 15,000 RPM drives, a nine-year-old technology. Unfortunately, that will not change; there is no 20,000 RPM drive in our future. And while we all want utilization rates to rise from the measly 20%-30% industry average, the fact is that, for many applications, they will not unless we start adopting available technology.
I’ve spoken with many customers who were delighted with their 36GB 15K drives, tolerated their 76GB counterparts, but limit the amount of data they store on their 300GB or larger drives. In other words, they’re Short Stroking, especially in the database world, and if nothing is done, utilization rates might actually drop further. Performance still rules, the drive guys won’t be giving us any relief, and flash drives are still too pricey. So what do we do?
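The arithmetic behind this kind of Short Stroking is simple. As a crude sketch, assume (hypothetically) that every 15K RPM drive delivers about 180 random IOPS regardless of capacity, and that a customer wants to keep the Access Density they were happy with on 36GB drives. The 180 IOPS figure is an illustrative assumption, not a measurement.

```python
DRIVE_IOPS = 180  # hypothetical random IOPS for one 15K RPM drive, same at every capacity


def access_density(iops, capacity_gb):
    """Access Density: IOPS available per GB stored."""
    return iops / capacity_gb


def short_stroke_gb(iops, target_density):
    """How many GB you can fill before Access Density drops below the target."""
    return iops / target_density


target = access_density(DRIVE_IOPS, 36)  # the level customers were "delighted" with
for capacity_gb in (36, 76, 300):
    usable = min(capacity_gb, short_stroke_gb(DRIVE_IOPS, target))
    print(f"{capacity_gb:>3} GB: {access_density(DRIVE_IOPS, capacity_gb):.2f} IOPS/GB, "
          f"fill {usable:.0f} GB ({usable / capacity_gb:.0%} utilization)")
```

Under these assumptions, holding the 36GB drive’s Access Density means filling only 36GB of a 300GB drive, about 12% utilization: bigger drives at the same RPM push utilization down, not up.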
Well, HDP, and Thin Provisioning in general, address the utilization rate problem; I think we’ll all agree on that one. But does it address the Access Density problem? Read on…
Our implementation of HDP lays out data in 42MB stripes (32MB stripes on the HDS AMS2xxx midrange array, which was announced today) in a wide-striping scheme that has delivered as much as a 700% improvement (measured) in workload throughput, and sometimes more!! Access Density, what’s that? A 700% improvement in workload throughput is equivalent to having 105,000 RPM drives (very crude arithmetic and workload assumptions, but you get the point; don’t beat me up on this!). We’re seeing many of our customers adopt HDP purely for performance, not just for thin provisioning. A customer I spoke to last week had a batch job that went from 10-15 minutes to less than 1 minute using HDP wide striping (150,000-225,000 RPM drives?).
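The “equivalent RPM” figures above come from one crude assumption: that throughput scales linearly with spindle speed, so a speedup factor can be read as a multiplier on a 15,000 RPM base. A minimal sketch of that back-of-the-envelope arithmetic (the function name is hypothetical):

```python
BASE_RPM = 15_000  # today's fastest spinning drives


def equivalent_rpm(speedup, base_rpm=BASE_RPM):
    """Crude model: treat a throughput speedup as a proportional RPM increase."""
    return speedup * base_rpm


# The text reads the "700% improvement" as a 7x speedup to get 105,000 RPM.
print(equivalent_rpm(7))

# Batch job: 10-15 minutes before, under 1 minute after.
before_min, after_min = (10, 15), 1
print([equivalent_rpm(b / after_min) for b in before_min])
```

Read literally, a 700% improvement is an 8x speedup (120,000 RPM by the same arithmetic); either way, the point is an order-of-magnitude gain no spindle speed will deliver.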
And it gets better…
Since most of the effort in provisioning, especially for databases, goes into “provisioning for performance”, you no longer even have to do that. I defy anyone to manually provision in a way that outperforms the wide striping we do with HDP. So now I’m telling our customers to just throw the data into an HDP pool and let us manage performance. For thin provisioning, there are some caveats with some older file systems, but for performance, there are no drawbacks.
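As a minimal sketch of why wide striping takes “provisioning for performance” off the table, assume a hypothetical pool of 8 parity groups and simple round-robin placement of 42 MB pages on first write. This is not HDP’s actual placement or rebalancing logic, only an illustration of the effect: a volume’s pages end up spread across every group in the pool instead of the few spindles a DBA would have hand-picked.

```python
from collections import Counter

PAGE_MB = 42  # HDP page size on the enterprise array, per the text


class Pool:
    """Hypothetical wide-striping pool: pages are placed round-robin across parity groups."""

    def __init__(self, num_groups):
        self.num_groups = num_groups
        self.next_group = 0
        self.page_map = {}  # (volume, page index) -> parity group

    def group_for(self, volume, offset_mb):
        key = (volume, offset_mb // PAGE_MB)
        if key not in self.page_map:  # allocate the page on first write
            self.page_map[key] = self.next_group
            self.next_group = (self.next_group + 1) % self.num_groups
        return self.page_map[key]


pool = Pool(num_groups=8)
# A 4 GB "database" volume written 42 MB at a time.
for offset in range(0, 4096, PAGE_MB):
    pool.group_for("db01", offset)

# Pages per parity group: random I/O to db01 now lands on all 8 groups.
load = Counter(pool.page_map.values())
print(sorted(load.items()))
```

Here the 98 pages land 12 or 13 to a group, so no parity group carries more than one page above any other. A traditional volume carved from a single parity group would put all 98 pages, and all of their I/O, on one set of spindles.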
So with HDP we have:
1. Thin provisioning – 30%-40% improvement in utilization is not uncommon
2. Massive improvements in performance – orders of magnitude
3. Provisioning for performance – What’s that?
Maybe soon we can get all the world’s DBAs to have a bonfire party to burn all those old whitepapers they’ve been clutching for 30 years… I’ll provide the matches…
Comments (4)
Bas Raayman on 30 Jun 2009 at 1:06 am
Claus,
I can see your point about the 105K RPM drive, but wouldn’t the 42 MB page size (or the reasoning behind it) become obsolete if you were to use flash drives…?
Bas
Bas Raayman on 30 Jun 2009 at 1:07 am
By the way, I think it’s Marc Farley and not Marc Foley.
Claus on 30 Jun 2009 at 11:27 am
Firstly, Marc, I apologize for messing up your name (note to self: no more midnight posts!!). Thank you, Bas, for pointing that out. I’m always having my own name misspelled, so I should know better.
As to your other question, I would never argue that we’d be as fast as flash but I would argue that we’re cheaper!!
The way I look at it, wide striping becomes an interesting alternative somewhere between standard FC drives and flash. What’s interesting is that both boost performance for the same type of workload, namely a high R/W ratio with random reads. There’s a WRT performance benefit as well, but it’s less dramatic.
It’s the customer’s vote on this one. We just provide the candidates…
Hu Yoshida » Blog Archive » HDP Is More Than Thin Provisioning on 06 Jul 2009 at 11:20 pm
[...] Claus Mikkelsen visited a financial customer in New York where they were excited about the performance improvements which come from the wide striping of pages across the width of the HDP pool. They saw a 10 – 15 min batch job run in less than a minute. Claus points out that a major task of provisioning, especially for databases, is to provision for performance. Since HDP stripes across all the disks in the HDP pool, and now with v05, automatically rebalances the stripe when new pages are added to the pool, performance tuning by manually balancing spindle usage is a thing of the past. [...]