The Capacity Illusion
by Hu Yoshida on Oct 8, 2006
StorageMojo by Robin Harris is a blog I always find interesting, since he draws a lot of discussion. His recent post, "Utilization versus cost: the capacity illusion", commented on two of my posts: one about the low cost of storage, which enables Web 2.0 companies to offer free storage to their subscribers, and one about a large enterprise customer who complains about the underutilization of his expensive enterprise storage. He finds both of these observations to be true, "but in dire conflict". He points out that storage suffers from a "capacity illusion" and really should be rated in terms of IOPS and management costs. His post has already spawned 8 very thoughtful comments.
I agree that the days of measuring the cost of storage based on capacity should be over. Unfortunately, the industry pundits still measure capacity shipped by server price band, which is another measurement that is becoming irrelevant.
I don’t feel that the two observations I made are in conflict. The falling cost of capacity does enable the Web 2.0 companies to offer a free GB of storage to each subscriber and still make money, and it has also led the enterprise customer to overbuy storage because buying more was cheaper than trying to manage what he had. The Web 2.0 guys have only been doing this for a short period of time, less than a year in some cases. The enterprise guys have been compounding their problem for a much longer period. Imagine a Web 2.0 company that signs up a million users, each of whom fills up a GB of storage. That could be a petabyte of primary storage, plus 4 to 5 petabytes of storage for protection, management, etc. If both of these types of companies continue to buy storage without managing it, they are headed for disaster.
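The arithmetic behind the million-user example can be sketched in a few lines of Python. This is only a rough illustration: the function name `capacity_estimate`, the decimal conversion of 1 PB = 1,000,000 GB, and the treatment of the "4 to 5 petabytes" as a simple multiplier on primary capacity are assumptions made for the sketch, not a sizing method.

```python
# Rough sketch of the million-user example.
# Assumption: decimal units, so 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000


def capacity_estimate(users, gb_per_user=1, overhead_multiplier=4):
    """Return (primary_pb, overhead_pb, total_pb).

    overhead_multiplier is a hypothetical knob: the PB of protection and
    management copies carried for each PB of primary storage.
    """
    primary_pb = users * gb_per_user / GB_PER_PB
    overhead_pb = primary_pb * overhead_multiplier
    return primary_pb, overhead_pb, primary_pb + overhead_pb


if __name__ == "__main__":
    # The post's range: 4 to 5 PB of overhead per PB of primary storage.
    for multiplier in (4, 5):
        primary, overhead, total = capacity_estimate(1_000_000, 1, multiplier)
        print(f"x{multiplier}: primary={primary} PB, "
              f"overhead={overhead} PB, total={total} PB")
```

Under these assumptions, a "free" gigabyte per subscriber turns into 5 to 6 petabytes that someone has to buy and manage.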
Changing the conversation from GBs to IOPS could also be an illusion. One of the comments on the StorageMojo post suggested that vendors publish SPC-1 and SPC-2 storage performance specifications. While HDS is an active member of the Storage Performance Council, we do not publish SPC-1 or SPC-2 benchmarks, primarily because these benchmarks include price-per-performance metrics. Since other vendors OEM or resell our storage, we leave it to them to run these benchmarks. There are also a lot of questions about the value of these benchmarks, as Robin points out in one of his comments.
Ideally, the value of storage, and of IT as a whole, should be reflected in the company’s bottom line. The right storage, on the right tier, at the right time, should help to minimize costs and improve service to end users, thereby increasing business growth. Storage and IT costs should track to revenue. The right storage depends on the application requirements. Sometimes it is high capacity, sometimes high IOPS, sometimes high transfer rates, sometimes functionality like distance replication, etc.
As Robin points out in one of his comments, "…getting customers focused on more important issues ultimately pays dividends: happier customers; more accurate competition; higher margins."
Comments (2)
Hu, David Merrill dropped by StorageMojo.com as well, adding to the substantial number of comments on the post. Here is most of my response to David:
When RAID arrays were developed, capacity was expensive and I/Os relatively cheap, which is why the bright but impoverished academics and students at Cal came up with the idea of using small, cheap, unreliable drives to build a big reliable drive.
Now the world is different, or at least the technology is: capacity is cheap, and I/Os are expensive and getting more so. Therefore, I submit, if Patterson et al. were designing a fast, very big, very reliable drive today, it would look very different.
How? For one thing, lots of copies on different disks would provide both reliability and performance. Writes would be bunched for sequential write performance. Overwriting would be a background garbage collection function rather than a function of writing. Variable stripe writes might be implemented for performance. Cheap mirrored independent controllers might maintain small write caches if needed, rather than today’s costly dual-port caches.
I don’t know what a really clever team of engineers would design. I’m real confident that it would not be the 20-year-old architecture we use today. That is why the discontinuities I saw in Hu’s posts are significant: customers are feeling uncomfortable, they don’t know why, and it is pointing to a bigger problem.
The company that designs and successfully markets the next generation (RAID 2.0?) storage array is going to make a hell of a lot of money. Why doesn’t HDS do it?
I think you miss another interesting point about the SPC benchmarks: HDS has consistently been in the bottom 5% for performance and throughput among the results published over the past years. Sun’s OEM of HDS storage has produced published results, and they show very, very poor performance.
If you also see no benefit in publishing your SPC benchmarks, that’s fine, but hiding behind price performance continues a marketecture rather than an architecture approach.