New Considerations for Tiered Storage
by Hu Yoshida on Jan 26, 2010
Tiered storage is one of those terms people use freely and assume everyone understands. The basic concept is that you can reduce the cost of storage by assigning your data to different cost tiers of storage depending on the requirements of the data. However, the technologies used to implement tiered storage differ, and they can make a great deal of difference in the value or benefits that can be derived. In fact, some implementations of tiered storage may end up adding complexity and cost. Here are a number of considerations that may be helpful.
Often I hear people talk about assigning data to tiers of storage based upon the “value” of the data, and they go through a very complicated study to classify the data by “value”. Some companies have spent several years on this classification and never finished. First, I would say that all data is valuable, or you shouldn’t be keeping it. Second, I would split out primary data from replicated data. Replicas are growing faster than primary data, since we cannot afford to stop applications today to do backups, development/test, data transformation, data mining, data distribution, disaster recovery, and so on. Rather than disrupt the application server to make these copies, it is simpler to have the storage system make them, especially if we want a consistent copy across a group of related volumes. These copies do not have to be on the same tier of storage as the primary data. With storage virtualization you can snap them off to lower-cost tiers in the same storage system or to lower-cost, externally attached storage systems. There are also technologies that reduce the time and capacity required for making copies, such as copy-on-write and Dynamic (thin) Provisioning.
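To make the copy-on-write idea concrete, here is a minimal sketch in Python. It is purely illustrative: the class names and in-memory block model are my own assumptions, not any storage system's actual implementation. The point is that a snapshot costs almost nothing until a primary block is overwritten, which is why copies need not consume a full tier's worth of capacity.

```python
# Minimal copy-on-write snapshot sketch (illustrative only).

class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)

class CowSnapshot:
    """Shares the primary's blocks until they are overwritten."""
    def __init__(self, volume):
        self.volume = volume
        self.saved = {}  # block index -> original data, filled on first write

    def read(self, i):
        # Serve preserved blocks from the snapshot area, the rest from primary.
        return self.saved.get(i, self.volume.blocks[i])

def write(volume, snapshots, i, data):
    # Before overwriting, preserve the old block in each snapshot (copy on write).
    for snap in snapshots:
        if i not in snap.saved:
            snap.saved[i] = volume.blocks[i]
    volume.blocks[i] = data

vol = Volume(["a", "b", "c"])
snap = CowSnapshot(vol)
write(vol, [snap], 1, "B")
print(vol.blocks[1], snap.read(1))  # → B b
```

Until the write, the snapshot held no data at all; after it, only the single changed block was copied.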
Another way to classify a tier is by performance. This makes sense if there is a significant difference in price/performance between the storage tiers. Today we offer 200 GB flash drives, 600 GB SAS disks, and 2 TB SATA disks in our modular AMS 2000 product, which can move and copy data between internal tiers of storage without disruption to the application. As you can imagine, the differences in performance and cost per GB between these types of media can be very significant. There are also performance differences in rotation speed and RAID mapping which may matter for some workloads assigned to static tiers, but those differences may not justify the work of dynamically moving data up and down tiers on a frequent basis. Today, movement of data between tiers of storage is done by volumes or files, and moving large volumes and files is a very heavy workload that you might not want to perform frequently. You can start by allocating a volume to a mid-tier of storage initially, and if it turns out to need higher performance you can promote it to a higher-performance tier with storage virtualization. Storage virtualization provides forgiveness if you happen to make a bad choice with your initial allocation.
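The "start mid-tier, promote only if needed" approach amounts to a simple decision rule. The sketch below is a hypothetical illustration; the tier names and the 10 ms latency target are assumptions for the example, not product defaults.

```python
# Hypothetical tier-placement rule: promote a volume when it misses its
# latency target, demote when it is comfortably under (below half the target).

TIERS = ["flash", "sas", "sata"]  # fastest to slowest

def choose_tier(current, avg_latency_ms, target_ms=10.0):
    idx = TIERS.index(current)
    if avg_latency_ms > target_ms and idx > 0:
        return TIERS[idx - 1]             # promote to a faster tier
    if avg_latency_ms < target_ms / 2 and idx < len(TIERS) - 1:
        return TIERS[idx + 1]             # demote to a cheaper tier
    return current                        # leave it where it is

print(choose_tier("sas", 25.0))  # → flash
```

The hysteresis band (between half the target and the target) is what keeps a volume from ping-ponging between tiers, which matters when each move is a heavy full-volume copy.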
With the USP and USP V, dynamic movement of data across tiers can be automated through policies in Tiered Storage Manager that are triggered by time or by events generated by Tuning Manager. In the USP and USP V, we can also shred the old volume after migration to ensure privacy. In the USP V, performance for a given tier of storage may be increased several times over by wide-striping a volume across a large number of spindles. Wide striping is a feature of Dynamic (thin) Provisioning, which also shortens the time to move thin volumes compared with normal fat, over-allocated volumes.
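The policy model described above can be sketched in a few lines of Python. This is an illustration of the general pattern (a predicate that watches metrics or the clock, paired with a migration action), not the Tiered Storage Manager API; all names and thresholds here are my own assumptions.

```python
# Illustrative sketch of policy-driven migration triggered by time or events.

import time

class Policy:
    def __init__(self, predicate, action):
        self.predicate = predicate  # inspects a volume's metrics or the clock
        self.action = action        # e.g. migrate the volume to another tier

def evaluate(policies, volume):
    for p in policies:
        if p.predicate(volume):
            p.action(volume)

# Example policy: demote volumes idle for more than 30 days to the SATA tier.
def idle_30_days(vol):
    return time.time() - vol["last_access"] > 30 * 86400

def demote_to_sata(vol):
    vol["tier"] = "sata"

vol = {"tier": "sas", "last_access": time.time() - 40 * 86400}
evaluate([Policy(idle_30_days, demote_to_sata)], vol)
print(vol["tier"])  # → sata
```

An event-driven policy would look the same, except the predicate would fire on an alert from a monitoring component rather than on elapsed time.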
Data centers that implement disaster recovery classify applications on the basis of RPO/RTO and assign critical application data to storage systems capable of distance replication for business continuity. Typically, if an application must recover in hours, it uses enterprise storage to do synchronous and/or asynchronous replication. Enterprise storage used to consist of one tier of very expensive storage. Today, however, with Hitachi Data Systems’ storage virtualization, any internal or external tier of storage can be replicated for business continuity through the replication services of the USP or USP V. Here again we can use Dynamic (thin) Provisioning to reduce the time and capacity needed for replication.
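Classifying applications by RPO/RTO boils down to a small mapping. The thresholds and strategy names below are illustrative assumptions for the sketch, not a standard; every shop sets its own cut-offs.

```python
# Hedged sketch of mapping recovery objectives to a replication strategy.

def replication_for(rpo_minutes, rto_hours):
    """RPO: how much data loss is tolerable; RTO: how fast recovery must be."""
    if rpo_minutes == 0:
        return "synchronous"    # zero data loss: every write mirrored in-line
    if rto_hours <= 4:
        return "asynchronous"   # must recover within hours at a distance
    return "backup"             # periodic backup copies are sufficient

print(replication_for(0, 1))  # → synchronous
```

With virtualization in front of the tiers, the same strategy can be applied to a cheap external tier as to the enterprise tier, which is the point of the paragraph above.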
I have heard tiered storage referred to as HSM, or Hierarchical Storage Management. HSM has a specific meaning for mainframes. Mainframes used to have one tier of expensive storage, so to save capacity they would compress data and move it to a migration level 2 pool of storage. When the data was needed again, it had to be decompressed back to migration level 1 before it could be used. With the USP and USP V we can use virtualization to move mainframe volumes to low-cost SATA disk tiers, either internal or external to the USPs. When we want to access the volumes, we can do so directly from the SATA tier without moving the data for decompression.
With the price and performance differences we see today between flash drives, FC/SAS disks, and large-capacity SATA disks, the benefits of tiered storage have become very compelling. Storage virtualization makes it easy to copy, move, and replicate data between internal and external tiers of storage without disruption to the applications. Additional services like copy-on-write and Dynamic (thin) Provisioning can decrease the workload needed to do the tiering.
Comments (19)
Happy New Year, Hu!
Just curious – you specifically state the drives supported by the AMS, but make no mention of which drives and capacities are supported on the USP-V.
Symmetrix V-Max supports 200GB and 400GB SSDs, 300GB, 450GB and 600GB 10/15K rpm FC drives, and both 1TB and 2TB 7200rpm SATA drives (CLARiiON supports those plus some lower-speed SATA drives as well).
(Just for fun, with 2TB drives, a single V-Max scales to 3.6+ PB usable – what’s the largest usable INTERNAL capacity on a single USP-V these days?)
I wish Barry had something more constructive to say about storage tiering or virtualization than the currently supported drive sizes inside a box.
I hope he would spend his time a bit better; then perhaps V-Max could migrate thin LUNs to different tiers inside the V-Max. Right now it can’t. I think this is a bit more important than drive sizes… This is from the customer perspective…
soikki, I agree! Plus, it’s not like the marketed size of a drive is what we customers actually get. Why not state the actual usable size of the drives if you are going to play that game?
HNY too Hu.
All this talk of what is essentially “online” data migration is interesting, but with SVC we’ve been doing this since 2003, and USP and USP V for a similar amount of time. EMC have made ***A BIG SONG AND DANCE*** about something IBM and HDS have been able to do for years. That just shows that EMC spend more on marketing than we do… Energetic Marketing Corp…
The ability to move data without disruption was interesting several years ago, and now it is a necessity.
That said, we all only support the movement of an entire volume now, and we have tools and management software that can help to automate this…. however the devil is in the detail as always.
With EMC and FAST, you can do volume migration, but only with EMC kit.
With HDS, you can do volume migration between HDS and a few other vendors’ kit, but you need to buy the big monolithic thing to do it…
With IBM, you can do volume migration between over 200 different storage controllers, from the small vendors right up to the big 3, BUT you don’t need to spend 100s of 1000s of dollars to start. With SVC, your virtualization, tiering, and storage purchase decisions are divorced from each other.
All that said, the REAL interesting and soon-to-be must-have migration and tiering solution is going to be the sub-volume optimization schemes.
EMC have said FAST v2 will do this (again, only between EMC kit), and IBM have said our own smart tiering solution will be available later this year (again, between ANYONE’S kit).
What about HDS ?
OK, so now we have 3 simple questions:
1) What drive sizes and types does the USP-V support internally?
2) When and where is the USP-V self-aware AUTOMATION part of Storage Tiering (as opposed to professional-services-developed scripted full LUN relocations with TSM)?
3) When will the USP-V support sub-LUN automated tiering (and will it be limited to the GINORMOUS allocation unit of 42MB that is the foundation of Dynamic Provisioning)?
Inquiring minds want to know… all this talk about tiering; when does the Hitachi USP-V version get AUTOMATED???
The Storage Anarchist has focused on drives which every vendor has or will have in the near future. We all buy them from the same media vendors, like Hitachi Global Storage Technologies, Seagate, and STEC.
As customers Soikki and Steven point out, the value is in non-disruptive tiering, and as Barry Whyte points out, IBM and HDS have been tiering across a number of different storage vendors for many years now. Our modular AMS product can also do tiering within the frame. EMC has had to create a new product, V-Max, to provide a feature which many other vendors already provide.
As far as whether the IBM or HDS solution costs more or is more efficient, the “devil will be in the details” as Barry says. I will respond to Barry in a subsequent post.
I will not talk about unannounced products or capabilities in this blog. But as far as “Smart Tiering” goes, our Tiered Storage Manager can non-disruptively move thin-provisioned volumes across internal or virtualized external storage pools, based on policies that are triggered by time or by events that may come from Tuning Manager. I would say that is already pretty “smart”.
Let’s discuss features from the customer’s perspective?
I would still like to point out that v-max still isn’t capable of doing storage tiering even inside itself. You can only do LUN migrations for “fat” LUNs, not thin LUNs. LUN migrations really are essential.
SA-Barry, when will migration of thin LUNs inside the v-max be available? Until then you cannot say that v-max tiering is usable. I thought it would be available by now, but apparently it isn’t.
“all this talk about tiering, when does the v-max get one?”
Also, v-max is still dependent on meta LUNs. The biggest LUN you can make is around 250 GB; anything bigger, and you have to create meta LUNs (the equivalent of HDS LUSE). This is very restrictive and requires many more clicks in the GUI than the competition. SA-Barry, when do we get rid of this restriction? Storage provisioning is largely about adding capacity for existing SAN-connected hosts. Having to use a meta structure instead of just provisioning, e.g., a 500 GB LUN (or extending one) is much more complicated.
A humble customer request: Please do concentrate more on getting your own product as good as possible, instead of barking at the others. On blogs, make constructive discussion, not war…
Something is better on the EMC-box, something on the HDS-box. And this is the way it’s probably going to be…
And… sorry for a second header…
A question, or challenge for each and everyone… Can you imagine that the following could be done with some storage or virtualization solution:
Automated storage tiering at the sub-LUN level, so that I have X amount of flash, Y amount of FC or SAS, and Z amount of SATA capacity. I have hundreds of servers (a few hundred TB) per array, and the performance criteria for the servers are automatically handled by the storage array or virtualization device, so that the important stuff automatically goes to flash and the unimportant goes to SATA. This selection would not be affected by, e.g., backups.
I would charge every server the same “tier price”, as otherwise it would be a nightmare to do chargeback (say what you will…). On some high-performing servers I would maybe lose some money, as they use more of the expensive stuff, but on the majority of the capacity I would probably make a profit, as most of the stuff would be on SATA.
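Sketched as arithmetic, that single blended “tier price” is just a capacity-weighted average of the per-tier costs. The prices and capacity mix below are made-up numbers, purely to illustrate the idea:

```python
# Blended chargeback rate: capacity-weighted average cost per GB.
# All figures below are invented for illustration.

def blended_rate(mix):
    """mix: {tier: (fraction_of_capacity, cost_per_gb)}; fractions sum to 1."""
    return sum(frac * cost for frac, cost in mix.values())

mix = {
    "flash": (0.05, 30.0),  # 5% of capacity on flash at $30/GB
    "sas":   (0.35, 5.0),   # 35% on SAS at $5/GB
    "sata":  (0.60, 1.0),   # 60% on SATA at $1/GB
}
print(round(blended_rate(mix), 2))  # → 3.85
```

So every server would pay $3.85/GB here; the flash-hungry ones are subsidized by the majority sitting on SATA, which is exactly the trade-off described above.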
I have noticed that quite often the tendency is to buy a higher tier than the hosts require.
I would not know about the hosts or their data or applications as those do not belong to my company (service provider).
SA-Barry: I know that v-max is on this road, but what do you think, is the above too high a hope?
All: what do you think, is this a bit too wild idea?
I don’t know, maybe it’s just me, but you seem to be avoiding very simple questions. Your hand-waving that we all support the same drives is misdirection, hiding the fact that the USP-V doesn’t yet support the 400GB STEC SSD, although EMC has been supporting them since before Hitachi shipped its first 73GB SSD (remember, Hu, that was back when you were telling beat writers that nobody needed SSDs). And Hitachi has been just as slow on SATA drives, so I’m guessing you don’t support the 2TB SATA yet either (indeed, Hu, you must also admit that there was a time you were telling people that SATA had no place inside an enterprise storage array, either).
Given the slowness of Hitachi’s uptake of new drive capacities (and technologies: it took over a year for the USP-V to support the STEC SSDs), I can only conclude that Hitachi has not embraced the idea of multi-tiered storage arrays, and thus offers customers a much more complicated and costly solution than the tightly integrated multi-tiering that EMC has led the market in for the past 4 years.
And that you continue to claim TSM is synonymous with automated tiering is sad. The truth remains: TSM does nothing more than the “Virtual LUN” feature on CLARiiON, DMX and V-Max, namely non-disruptive MANUAL (human-initiated) relocation of a LUN to a different drive type.
With TSM, there is no automated decision-making without a costly Hitachi/HDS services engagement, and even then the focus is primarily on relocating hot spots to less heavily used devices rather than relocating workloads to Flash or SATA. Essentially you have a services-led alternative to Symmetrix Optimizer, but nothing even barely approaching the automated intelligence of FAST (v1 OR v2).
But all I originally asked was “what drives are you supporting these days on USP-V?” Clearly, the answer must be uncomfortable for some reason, or you’d just answer the question.
And judging by all the bluster and misdirection since, the embarrassment must be embarrassingly HUGE!
Storage anarchist: Please do answer the EMC directed questions by the other commenters. Who gives a flying rat about drive capacities? Unless your array is near the physical limit, the customer (like me) couldn’t care less. Automation, ease of use, and high ROI are FAAARRR more important than if my array has 1TB SATA drives or 2TB drives.
Soikki (& Robert)-
I cannot respond to many of your specific questions, as (like Hu) I cannot discuss specifics that are not yet announced. Please contact your account team to arrange an appropriate NDA session to discuss futures, if you’d like.
As to what I think about your suggestions around sub-LUN-level tiering and charge-back… in my opinion, that is the objective of automated tiering. In fact, I’d take it a small step further – the storage administrator wants the ability to create different “effective” tiers by using different blends of device type (SSD, 15K, SATA, etc.) and perhaps even different spindle counts, cache allocations, relative priority, and the like. Collectively, these could be used to create “Platinum”, “Gold”, “Silver” and even “Bronze” offerings, each priced in accordance with the allowed allocation of resources.
Not that all customers might want to do it this way, but the solution should support both the “one size fits all” approach you describe, and the more granular “pick the size you can afford” approach I’ve outlined.
In my opinion, that is.
I’m still left wondering why nobody will answer my simple question about which drive types and sizes are supported on the USP-V, though. Why so easy to spell out which drives are supported on the AMS (as Hu did in his original post), but not to do the same for the USP-V?
Surely the list of currently supported drives is not NDA-protected information?
Barry, Thanks for boosting my traffic. You can look it up here.
I see the link, containing a list of capacities and general drive categories….but the storage anarchist is asking what manufacturers you support. Are you supporting STEC? Seagate? HDS only?
With all of this finger pointing among all of you storage bloggers, one has to wonder: When is one of these storage vendors going to buy out STEC and really turn the screws on all of the rest?
Jeff, currently we support STEC flash drives just like EMC does. All the major storage systems vendors buy flash and hard disk drives from the same few media vendors. There isn’t much differentiation there. Whether one vendor qualifies some media a month or two before another doesn’t really matter, since we will all end up with the same media sooner or later. The difference comes in the architecture and functions that surround it. Expensive media like flash drives should go in a system that can support dynamic tiering and dynamic provisioning. Yesterday Reuters reported that the excess inventory at EMC will impact their Q1 forecast, so it seems that EMC is not selling as much as they had expected. http://www.reuters.com/article/idCNSGE61M0JP20100223?rpc=44
EMC says in the market that the best architecture for tiered storage is “in a box”.
HITACHI says that TIERED STORAGE can have tier 2 or 3 “virtualized”, with midrange behind enterprise storage.
What is your opinion about that? What are the advantages of each one?
Paula, please see my new blog post on your question. . .