Clearthought on Cartesian Scaling
by Hu Yoshida on Feb 22, 2010
Vinod Subramaniam, the founder and CEO of Clearthought Inc., provided the following comment on my post on Cartesian scaling.
“I tend to look at this entire scalability issue from a different perspective. Storage vendors have to classify customers as Capacity Hungry or IO Hungry. Once this is done the foundation is laid for choosing a scalability model.
For a capacity hungry customer, e.g., 500 TB and around 20,000 IOPS, it may not make sense to pick a USPV and the scale up model. It may make more sense to pick a USPVM with AMS2500s behind it and the scale out model. For an IO hungry customer, e.g., 100 TB and 200,000 IOPS, it will make sense to pick a USPV and the scale up model.
The Storage Industry’s mantra of every 1 TB added requires 1 GB of cache needs to be revisited. A capacity hungry customer may need 1 GB of cache for every 10 TB added. An IO hungry customer may need 2 GB of cache for every TB added.
Array sizing and capacity planning (TB, processors, cache) is mostly black magic at best.”
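The ratios Vinod quotes reduce to simple arithmetic. The sketch below is purely illustrative — the function name and profile labels are my own, and the ratios are the rules of thumb from the comment above, not vendor sizing guidance:

```python
def cache_gb_needed(capacity_tb, profile):
    """Estimate cache from the rule-of-thumb ratios quoted above.

    Ratios (GB of cache per TB added) are illustrative only:
      traditional     : 1 GB per 1 TB  (the industry mantra)
      capacity_hungry : 1 GB per 10 TB
      io_hungry       : 2 GB per 1 TB
    """
    ratios = {
        "traditional": 1.0,      # 1 GB cache per TB
        "capacity_hungry": 0.1,  # 1 GB cache per 10 TB
        "io_hungry": 2.0,        # 2 GB cache per TB
    }
    return capacity_tb * ratios[profile]

# The two example customers from the comment:
print(cache_gb_needed(500, "capacity_hungry"))  # 500 TB capacity-hungry -> 50.0 GB
print(cache_gb_needed(100, "io_hungry"))        # 100 TB IO-hungry -> 200.0 GB
```

Note how far apart the answers land: the capacity hungry customer needs a tenth of what the old mantra prescribes, while the IO hungry customer needs double — which is the whole argument for classifying the customer before choosing a scalability model.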
Vinod makes a good point. A lot of the old rules of thumb that storage administrators have used in the past are changing. I/O hungry customers with a lot of virtual servers or active databases may need much more cache per capacity than ever before. Unstructured data may need high performance up front but becomes static very rapidly. 60% to 80% of the data we store is reference data that could be on lower cost tiers of storage until it is needed. While utilization is driven up to contain costs, demand buffers must be created to contain wider and wider swings in volumes as businesses react instantly to unexpected events in Dubai or other global markets.
It is getting harder and harder to predict what you will need in the next 3 to 5 years for your storage needs. The best course is to go with storage systems that can give you the most flexibility and scalability. A storage system that can scale in all three directions, up with processing power, out with capacity, and externally through virtualization, provides the best solution for the long term.
So if we continue the 3-dimensional Cartesian model, we have to include the ROA-hungry user who wants to extend the useful life of older arrays. The third dimension is virtualization, which assimilates external capacity to be shared by the pool. If the 2-dimensional model is hard to balance, considering another view will necessitate more planning AND new storage metrics.