
Hu Yoshida's Blog - Vice President | Chief Technology Officer


Hu's Blog

What is the difference between internal and external tiers of storage?

by Hu Yoshida on Mar 10, 2010

Paula Sequeira posted this comment on my post on New Considerations for Tiered Storage:

“EMC says in the market that the best architecture of Tiered Storage is “in a box”.
HITACHI says that TIERED STORAGE can be with tier 2 or 3 “virtualized”, midrange behind enterprise storage.
What is your opinion about that? What are the advantages of each one?”

Since this is asked a lot, I decided to answer it in a new post.

EMC has no choice other than tiered storage in a box, since they do not support “virtualized” or externally attached storage on the DMX or VMax. That locks you into their more expensive DMX or VMax storage and prevents you from using their more economical Clariion for tier 2 or tier 3 storage requirements. While the DMX can copy volumes to internal tiers of storage, it requires host-based software or external appliances to move volumes between tiers. The VMax only recently provided the ability to move volumes between internal tiers of storage.

The USP and USP V can support internal or external tiers of storage for moves and copies. With virtualization, or external attach, we can support most vendors’ storage systems as long as they support standard FC ports. You don’t have to buy higher-capacity or lower-cost disks to install as an internal tier if you already have existing legacy storage that can be used for that purpose through virtualization. External tiers of storage do not require a vendor lock-in as internal tiers of storage do.

Externally attached tier 2 or 3 storage has other advantages over internal tier 2 or 3 storage, since it does not take up expensive tier 1 real estate, tier 1 power and cooling, or valuable tier 1 resources like cache connections, RAID controllers, and back-end disk connections. Data for external storage is passed through to the external array, where the back-end processing for RAID and the reads and writes to disk are offloaded from the tier 1 virtualization controller.
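
To make the pass-through idea concrete, here is a minimal sketch in Python (the class names are illustrative only, not an HDS API): a virtualization controller keeps a map from virtual LUNs to either an internal tier or an externally attached array, and for external tiers it simply forwards the I/O so the back-end RAID work is done by the external box.

# Minimal illustrative model of tier 1 virtualization with external tiers.
# The names (VirtualizationController, InternalTier, ...) are hypothetical.

class InternalTier:
    def __init__(self, name):
        self.name = name
    def handle_io(self, lun, op, block):
        # RAID and disk I/O consume tier 1 cache, controllers, and back-end paths
        return f"{self.name}: {op} block {block} of {lun} serviced internally"

class ExternalTier:
    def __init__(self, name, array):
        self.name, self.array = name, array
    def handle_io(self, lun, op, block):
        # I/O is passed through; RAID and disk work are offloaded to the external array
        return f"{self.name}: {op} block {block} of {lun} forwarded to {self.array}"

class VirtualizationController:
    """Tier 1 controller presenting one namespace over internal and external tiers."""
    def __init__(self):
        self.lun_map = {}                     # virtual LUN -> backing tier
    def map_lun(self, lun, tier):
        self.lun_map[lun] = tier
    def io(self, lun, op, block):
        return self.lun_map[lun].handle_io(lun, op, block)

ctl = VirtualizationController()
ctl.map_lun("prod_db", InternalTier("tier1_fc"))
ctl.map_lun("archive", ExternalTier("tier3_sata", array="legacy array behind FC"))
print(ctl.io("prod_db", "read", 42))
print(ctl.io("archive", "read", 42))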

External tiers of storage allow unbundling for more flexible configurations and longer life cycles. If you have to end-of-life the DMX or VMax, you have to end-of-life all internal tiers of storage along with it. With the USP and USP V, you can end-of-life the internal and external tiers of storage independently. External tier 2 or tier 3 storage may not need a maintenance contract after the warranty expires. It may be good enough to maintain this type of storage on a time-and-materials basis and extend the life of the storage asset beyond the normal 3 or 5 years, especially for 1 or 2 TB SATA disks that are infrequently used.

The advantages of external tiers of storage over internal tiers of storage are flexibility, no vendor lock-in, and lower cost.


Comments (14)

the storage anarchist on 11 Mar 2010 at 8:26 pm

Actually, Hu (or whoever it is that is writing your content), Symmetrix can readily push and pull data to/from different tiers of external storage without any “host based software or external appliances.”

the storage anarchist on 12 Mar 2010 at 4:46 am

(No offense intended, by the way – it’s just that your blog so frequently misrepresents the truth I guess I assume it’s not your fault).

Vinod Subramaniam on 13 Mar 2010 at 10:54 am

The biggest reason for Tiering Storage is the economics involved.

There are three ways to classify tiers:

A. Based on Performance Requirements
B. Based on Availability Requirements
C. Based on Access Patterns (Archival)

Internal tiering only satisfies A above. Populating a storage device with different classes of drives does not help a customer take advantage of the fact that not all applications are the same in terms of availability. If 40% of a customer’s data does not require 99.999% availability, then it makes economic sense to dump that data on a modular array. Archiving less frequently used data within a single storage device also does not let a customer take advantage of cheaper archival media.
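
As a rough illustration of the economics (the per-TB prices below are assumptions for the sake of the example, not vendor figures), the saving from moving that 40% of data onto a modular array works out like this:

# Back-of-the-envelope tiering economics; all prices are assumed, not quoted.
total_tb          = 100        # total usable capacity
low_avail_share   = 0.40       # fraction that does not need 99.999% availability
enterprise_per_tb = 20_000     # assumed $/TB on the enterprise (tier 1) array
modular_per_tb    = 6_000      # assumed $/TB on a modular (tier 2/3) array

all_enterprise = total_tb * enterprise_per_tb
tiered = (total_tb * (1 - low_avail_share) * enterprise_per_tb
          + total_tb * low_avail_share * modular_per_tb)

print(f"all on enterprise: ${all_enterprise:,.0f}")           # $2,000,000
print(f"tiered:            ${tiered:,.0f}")                   # $1,440,000
print(f"saving:            ${all_enterprise - tiered:,.0f}")  # $560,000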

I think a better argument would be:

Which model lends itself to more efficient tiering in terms of economics: appliance-based virtualization or array-controller-based virtualization?

– Vinod

Hu Yoshida on 13 Mar 2010 at 5:54 pm

Hello Barry, thanks for being such a fan of my blog. This comment sets a record for you in terms of brevity.

There seem to be two Symmetrix now, DMX and VMax. I was referring to the DMX with its static point-to-point Direct Matrix. Does your comment apply to DMX? Since you use the term push and pull, are you referring to Open Replicator? Can you use SATA tiers for mainframes? Once you move the data, can you access it from the new tier without disruption to the application? Is this a combination of PowerPath Migration Enabler, TimeFinder/Clone, Host Copy, Invista, and/or Open Replicator? Can you tell us how you do dynamic tiering in a DMX?

I believe you can do this in VMax, since you can move data across the RapidIO switch between what are, essentially, loosely coupled Clariion-type nodes.

I would like to hear more and if I am wrong I will help set the record straight.

the storage anarchist on 15 Mar 2010 at 4:47 am

Hu, you really should take the time to learn how Symmetrix works – from someone who really understands it (and not whomever you’ve been listening to lately). Your questions indicate a deep lack of comprehension about both the hardware architecture and the software services supported by Symmetrix. Somebody, it seems, is feeding you bad information.

As to your questions:

Yes, DMX can push and pull data, just like V-Max. It has nothing to do with Direct vs. Virtual Matrix either – the same software provides the functionality on both platforms.

Yes, Open Replicator provides array-based push and pull on both DMX and V-Max. Open Replicator/Live Migration (which is free to all Symmetrix customers) is frequently used to relocate data off of old storage platforms, both EMC (Symm & CLARiiON), as well as a plethora of 3rd party arrays. Its operation is nearly identical to the way UVM+TSM are used with USP-V to pull data off of old storage (e.g. for tech refresh). As with UVM+TSM, insertion without the use of PowerPath Migration Enabler is a disruptive task, but the outages are typically seconds or minutes. And both DMX and V-Max can move the data 2-6x faster than a USP-V can, greatly shortening the transition time for a tech refresh/replacement.

Yes, SATA (and EFD) tiers are supported for both mainframe and open systems, on both V-Max and DMX4 (DMX3 did not support EFD or SATA, but does offer lower-cost fibre channel drives as a tier).

Yes, the combinations you list work on both DMX and V-Max.

While V-Max uses hardware very similar to that used by CLARiiON, the software platform is 100% Symmetrix Enginuity. In fact, there is NO CLARiiON software in a V-Max, and every single line of code in V-Max’ Enginuity evolved from the same source code tree that supports DMX.

As I have tried to help you understand before – Symmetrix is fundamentally a multitude of individual processing nodes sharing a large global memory. Whether these nodes are formed on Power4-based “slices” running front-end, back-end and remote replication services and interconnected to memory with a Direct Matrix (as in DMX), or they are Intel Xeon-based nodes running the I/O services on multiple cores and interconnected to memory via the RapidIO fabric as in V-Max, they are still very much Symmetrix. The code differences required to support the different interconnects are extremely minor, and though the code endian-ness had to be modified for V-Max, the fact that you can use SRDF between V-Max and DMX demonstrates that the transformation was completed without any adverse effects on interoperability.

Vinod Subramaniam on 15 Mar 2010 at 2:52 pm

Barry

As a customer, I am very curious about this comment:

I’m trying to understand this better. Please correct the assumptions if they are wrong.

“Its operation is nearly identical to the way UVM+TSM are used with USP-V to pull data off of old storage (e.g. for tech refresh)”

Let me try to understand this clearly.

Let’s look at the way UVM + TSM works.

The basic sauce in any virtualization solution is “resource sharing through address remapping”.

In the world of storage there are many levels of virtualization.

1. A single HDD is addressed through a CHS (Cylinder, Head, Sector) scheme. On an open systems OS, the file system or volume manager layer uses a linear LBA scheme, which is remapped by the driver to the CHS address.
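
For reference, the textbook LBA-to-CHS remapping for a drive with a fixed logical geometry looks like this (shown as a small Python helper; modern drives use zoned recording, so the geometry itself is logical):

def lba_to_chs(lba, heads_per_cyl, sectors_per_track):
    """Classic linear LBA -> (cylinder, head, sector) mapping; sectors are 1-based."""
    cylinder = lba // (heads_per_cyl * sectors_per_track)
    head = (lba // sectors_per_track) % heads_per_cyl
    sector = (lba % sectors_per_track) + 1
    return cylinder, head, sector

print(lba_to_chs(5000, heads_per_cyl=16, sectors_per_track=63))   # (4, 15, 24)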

2. In the case of eight HDDs configured as 7D+1P by a RAID controller, the RAID controller presents a virtual CHS scheme to the OS. The driver on the OS writes to the virtual CHS scheme and the RAID controller translates this to multiple physical CHS addresses.

3. Now consider a 7D+1P RAID group on an AMS2500, and assume that the entire RAID group is one LUN. When this LUN is virtualized behind a USPV and presented to a host, the host OS writes to a virtual CHS address; the USPV recognizes that this is an external RAID group and translates this CHS address to a CHS address on the AMS2500.

4. Call the OS CHS address in 3 above CHS_X, the USPV CHS address CHS_Y, and the AMS CHS address CHS_Z.

5. Now consider this scenario:

VMware Server A is currently running at 30% overall utilization and has storage mapped from the AMS, which is virtualized behind a USPV. Server A currently has 6 VMs that are lightly loaded, so I decide to stack 12 more VMs on this server. Over time the response times on the LUNs grow from 8ms to 14ms owing to the growing I/O workload.

With UVM + TSM, I can do the following without any intervention from either the server admins or the application admins:

Migrate the data off the AMS2500 to USPV internal disk with zero outage. We have seen transfer rates using TSM of between 500 GB/hr and 1 TB/hr.

What happened behind the scenes is the following:

The existing virtualization scheme was CHS_X -> CHS_Y -> CHS_Z.

The modified virtualization scheme is CHS_X -> CHS_T.

Where CHS_X = CHS address of the LUN seen by the OS
CHS_Y = CHS address of the LDEV on the external RAID group seen by the USPV
CHS_Z = CHS address of the LDEV on the AMS2500 RAID group
CHS_T = CHS address of the LDEV on the USPV internal RAID group
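
A tiny sketch of that remapping chain (purely illustrative; the identifiers are not HDS interfaces): before the migration the USPV resolves the host address through its external LDEV to the AMS, and after the migration the same host address resolves to an internal USPV RAID group, which is why the host never notices the move.

# Illustrative model of UVM address remapping before and after a TSM migration.
# CHS addresses are reduced to opaque strings; the class name is hypothetical.

class VirtualLDEV:
    def __init__(self, host_addr, backing):
        self.host_addr = host_addr      # CHS_X: what the host OS sees
        self.backing = backing          # where the USPV resolves it to

    def resolve(self):
        return " -> ".join([self.host_addr] + self.backing)

# Before migration: host -> USPV external LDEV -> AMS2500 RAID group
ldev = VirtualLDEV("CHS_X", ["CHS_Y (USPV external LDEV)", "CHS_Z (AMS2500 RAID group)"])
print(ldev.resolve())

# TSM copies the data, then the USPV swaps the mapping behind the same CHS_X.
# The host address never changes, so the application sees zero outage.
ldev.backing = ["CHS_T (USPV internal RAID group)"]
print(ldev.resolve())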

Now let me explain the way I understand Open Replicator / Live Migration would work for the same VMware scenario above, if VMware Server A was originally on a Clariion and had to be moved to the Symmetrix.

1. Carve out LUNs on the Symmetrix which are target LUNs for VMware
2. Map the LUNs on the Symmetrix to VMware Server A
3. Use Open Replicator to copy data from the Clariion to the Symmetrix
4. Use PowerPath Migration Enabler to cut over from the Clariion to the Symmetrix.

Note that this procedure requires more work from both server and storage admins as compared to TSM + UVM.
Also, can you specify the transfer rates with Open Replicator? You mentioned that it is 2x to 6x TSM + UVM. So are you transferring data at a rate of 2 TB to 6 TB/hr?

the storage anarchist on 23 Mar 2010 at 5:58 am

My point was specific to the task of a tech refresh migration (although the mechanisms are frequently used for other purposes).

In the case of a tech refresh, where the objective is specifically to take ALL of the data off of an existing array (say, at the end of its lease) and move it to another array, the steps for Symmetrix and USP-V are the same:

1. Create new target LUNs of equal or larger size on the new (Symm/USP-V)
2. Connect new (Symm/USP-V) to the existing array via FC, zone the existing LUNs to the new (Symm/USP-V)
3. Stop the application hosts running against the old storage
4. Map/Mask the new LUNs on the (Symm/USP-V) to the hosts
5. Start the new array’s migration process to copy the data into the (Symm/USP-V)
6. Restart the hosts, now pointing to the (not-yet-fully-copied) new LUNs on the (Symm/USP-V)
7. Continue on as normal while the new (Symm/USP-V) moves the data transparently in the background – grabbing data that is not yet copied from the old array on demand, and mirroring writes to both the old and new arrays in case Something Bad Happens (see the sketch after this list)
8. When the copies are completed, disconnect and decommission the old array.
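
Step 7 above is the interesting part. Here is a hedged sketch of its semantics in Python (purely illustrative, not the actual Open Replicator or TSM implementation): reads of not-yet-copied tracks are pulled from the old array on demand, and writes are mirrored to both arrays until the background copy completes.

# Illustrative model of on-demand pull plus write mirroring during a migration.

class MigratingLUN:
    def __init__(self, old, new, n_tracks):
        self.old, self.new = old, new   # track -> data maps standing in for arrays
        self.copied = set()             # tracks already moved to the new array
        self.n_tracks = n_tracks

    def read(self, track):
        if track not in self.copied:    # grab uncopied data from the old array on demand
            self.new[track] = self.old[track]
            self.copied.add(track)
        return self.new[track]

    def write(self, track, data):
        self.new[track] = data          # mirror writes to both arrays until the
        self.old[track] = data          # copy completes, in case Something Bad Happens
        self.copied.add(track)

    def background_copy(self):
        for track in range(self.n_tracks):
            if track not in self.copied:
                self.new[track] = self.old[track]
                self.copied.add(track)

old_array = {t: f"data{t}" for t in range(8)}
lun = MigratingLUN(old_array, new={}, n_tracks=8)
print(lun.read(3))          # pulled on demand from the old array
lun.write(5, "updated")     # mirrored to old and new
lun.background_copy()       # once complete, the old array can be decommissioned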

As for your vMotion case, it is frequently far more practical and less complex to merely use vMotion to relocate an ESX server farm from one array to another.

As to copy rates for Open Replicator, indeed 2-6 TB an hour is attainable with sufficient spindles in source and target. More importantly, V-Max provides much higher copy rates than USP-V with far less impact on other applications utilizing the array.

nikkel&dime on 31 Mar 2010 at 11:51 am

“My point was specific to the task of a tech refresh migration”

The above itself proves how “Energetic Marketing Corporation” works.

We are talking here about tiered storage data movement strategy in general; the one-time “tech refresh migration” is needed far less frequently than the data movement based on preset performance/access pattern/cost policy that we are discussing. In regards to the vMotion comment, what about other apps that rely on the storage vendor to do the job (not the one-time migration, but the policy-based ones)? It seems that you are tailoring the facts to suit your EMC (see above) propaganda.

alex Lopez on 01 Apr 2010 at 5:50 pm

1 – Total cost of Ownership
2 – Documentation

I’m a huge fan of your blogs, but storage tiering is not cheap! You will pay $$$ to attach the storage array to the USP-V in our case, such as license, cache, and ports.
We invested over $200K to attach an AMS2500 to the USP-V, and that does not include PS (professional services).
We now have a connected AMS2500, and we have to do the job ourselves. But HDS documentation is scarce, and we can’t get enough resources from HDS to help us.
In the end, if you try to virtualize, tier, etc., make sure that you do your homework in every aspect of your project, not only in the technology piece.

Alex

Hubert Yoshida on 06 Apr 2010 at 9:41 am

Alex, thanks for reading my blogs. As you mentioned, the cost of tiering will include many factors. Once external storage is attached, additional functions like replication for business continuance are often added at additional cost. However, the real measure is the return that you get from that investment. Computer Economics published a report on “Storage Virtualization Adoption Trend and Economic Experience” in November of 2009 which showed that 78% of storage virtualization adopters achieved a positive rate of return in two years, while 17% broke even. Only 5% had a negative experience. This report can be obtained at:
http://www.computereconomics.com/custom.cfm?name=postPaymentGateway.cfm&id=1509&CFID=6706177&CFTOKEN=15320529

I will be reaching out to you to evaluate your experience and see how we can improve our documentation and support.

Thanks for your comment.

[...] Yoshida seemed to hit a nerve a few weeks ago with some blog comments around tiering, the economics and practicality of in-box tiering (aka intermix) and the out-of-box [...]

Gunpai Chiramanaphan on 23 Nov 2010 at 12:21 pm

Are there any comparisons between the EMC DMX-3 and the AMS2500 when used with IBM AIX and an SAP environment? Is the IOPS of the AMS2500 normally higher or lower than that of the EMC DMX-3?

Hu Yoshida on 23 Nov 2010 at 3:32 pm

Any comparison of AMS 2000 and DMX-3 would be an apples to oranges comparison. The AMS 2000 is a dual controller modular array while the DMX-3 is a multi-controller, global cache, enterprise array. A multi-controller array with large global cache can support multiple parallel I/O streams across multiple controllers, while a dual controller with separate controller caches is designed for fast performance through one controller or the other. So if you need to support multiple workloads, the DMX-3 would have higher IOPs than an AMS 2000, but if you are running one or two workloads, the AMS 2000 will have higher IOPs since it has less overhead than a multi-processor, global cache system. In cases where the I/O is very random and the large cache has little benefit, the performance of the AMS 2000 could be competitive with its faster, point to point, Serial Attached SCSI back end.

If multi-stream IOPs is not the major requirement, the AMS 2000 has many other advantages since it is a current generation modular array. It has many new features like load balancing across the controllers, Dynamic Provisioning (thin provisioning and wide striping), dynamic tiering, point to point Serial Attached SCSI drives, the latest SSD, SAS, and SATA disks, 8 Gb/s FC front end connections, and much lower pricing than a DMX-3 which is an N-2 generation system.
