How fast is FAST?
by Hu Yoshida on Dec 10, 2009
EMC announced FAST version 1 this week and one of the more insightful articles was by Beth Pariseau of SearchStorage.com.
I am fairly certain EMC briefed Beth on this announcement and that she had access to their references, so it’s safe to say her information is pretty accurate. Here are my thoughts on what I read in the PR and from Beth’s article:
Is EMC behind the competition? – Hitachi has had policy-based file and LUN level tiering for some time. HDS announced a non-disruptive tiering feature called “Dynamic Optimizer” over 10 years ago in the Freedom 7700 product, which later became Cruise Control, then Volume Migrator, then Tiered Storage Manager. EMC is still behind: it cannot extend FAST to external storage systems through storage virtualization, and it cannot support FAST with thin provisioning. At this time it is not known how FAST v1 will play with other EMC features like TimeFinder, SRDF, etc. FAST v2 with sub LUN level tiering is still in the future, and Compellent already offers that today. Whether EMC will be the first of the enterprise storage array vendors to provide this capability is still an open question until they deliver it. However, being first is not as important as meeting market requirements. An increasing market requirement is performance.
Performance impact – With the trend toward consolidated multi-core processors, stacking of operating systems with VMware, and increasing storage network bandwidth to 8 Gb/s FC and 10 Gb/s FCoE, the demand for storage system performance will increase dramatically. Migration of LUNs will impact performance through contention for storage array resources, so customers must have visibility into application windows where data migrations can occur. Beth reports that previous customer experience with DMX Symm Optimizer found that the whole frame was locked during a migration. Remember VMax is still a static “Bin File” mapped cache and not a dynamic cache like the USP V. I believe they will need to lock some resources during the migration. Movement of data within a storage frame requires a lot of metadata management. They have to manage the metadata within their data cache over the same internal paths that are used for data access. The VMax and CX4 are single processor controllers that are loosely coupled. They are not multi-processors like the USP V, and they don’t have a separate control memory with separate internal connections to the processors for efficient processing of metadata.
Complexity issues – Creation of 256 different tiers according to RAID level, drive speed or drive type, assignment of storage groups to tiering policy, and the percentage of a storage group’s capacity that can reside on any tier all imply that there is a lot to consider, and a heavy amount of EMC services will probably be required to implement this. New utilities are required to help administrators discover available storage tiers and add policies for migrating data, along with a new “wizard” to assess existing storage efficiencies that can be gained with more tiered storage. Sounds like a lot of new things to manage.
Customer fear of automation – Beth also reports that customers at EMC World were wary of automation, fearful that it may cause them to lose control of their data center. When errors occur in an automated process, the errors get propagated automatically. There must be some controls to avoid thrashing. A bigger question may be whether tiering is a task which should be done automatically at all, especially at the LUN level. Moving or migrating LUNs is a very heavy task and should be done sparingly. Hitachi offers tiering for copies of primary data that do not need to be assigned to the same tier of storage as the primary data. We can assign primary data to an initial tier of storage and if for some reason it needs to be moved, Tiered Storage Manager software can dynamically move it to another tier, but moving the volume up and down tiers at the slightest change in usage will have an impact on system performance and is not best practice.
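To make the point about thrashing controls concrete, here is a minimal Python sketch of one way an automated tiering policy could be damped: a dead band between the promote and demote thresholds, plus a minimum dwell time between moves. This is not how FAST or Tiered Storage Manager works internally. The names (`Lun`, `plan_move`, `apply_move`) and every threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Lun:
    name: str
    tier: int                      # 0 is the fastest tier (assumption)
    last_move_hour: int = -10**9   # "never moved" sentinel


def plan_move(lun, iops, hour, promote_above=5000, demote_below=1000,
              min_dwell_hours=24, lowest_tier=2):
    """Return the tier the LUN should move to, or None to leave it alone.

    All thresholds are illustrative defaults, not vendor values.
    """
    # Dwell time: refuse to move a LUN that was moved recently.
    if hour - lun.last_move_hour < min_dwell_hours:
        return None
    # Dead band: only act when load is clearly above or below the band.
    if iops > promote_above and lun.tier > 0:
        return lun.tier - 1
    if iops < demote_below and lun.tier < lowest_tier:
        return lun.tier + 1
    return None


def apply_move(lun, iops, hour, **policy):
    """Plan a move and, if one is warranted, record it on the LUN."""
    target = plan_move(lun, iops, hour, **policy)
    if target is not None:
        lun.tier = target
        lun.last_move_hour = hour
    return target


if __name__ == "__main__":
    lun = Lun("db01", tier=1)
    for hour, iops in [(0, 6000), (1, 500), (30, 3000), (30, 500)]:
        print(hour, iops, apply_move(lun, iops, hour))
```

With these sample readings the LUN is promoted once, a dip one hour later is ignored because of the dwell time, a mid-range reading falls inside the dead band, and only a sustained low reading after the dwell period triggers a demotion. The wider the band and the longer the dwell time, the fewer heavy LUN moves the array has to absorb.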
No thin provisioning – this is lacking in FAST v1. The benefits of dynamic or thin provisioning far outweigh the benefits of LUN tiering, and in the case of Hitachi Dynamic Provisioning (HDP) it has no impact on performance. HDP provides dynamic provisioning of LUNs, wide striping performance, thin provisioning economics, thin moves/copies for operational efficiencies, and zero page reclaim for immediate payback. HDP is easy to manage, has better economics, and better performance. Usually once you assign storage to an HDP pool, with automatic wide striping, there will be little need to migrate it for performance reasons, even with SATA disks.
FAST is only available on VMax and Clariion CX4 – why not DMX? You can do FAST within a VMax or within a CX4 but not between VMax and CX4. What about other storage assets? With USP V we can provide HDP and tiering to internal as well as externally attached storage from other vendors. You are not locked in to a single vendor as you are with EMC’s VMax or CX4. Our modular AMS2000 already has the capability to do HDP and tiering within a frame.
Celerra file level FAST – apparently there is a different FAST for NAS. Our HNAS already has the capability to do policy-based, file-aware tiering across internal and external heterogeneous storage, archiving to HCP, and content-aware search with Hitachi Data Discovery Suite. Setting policies for file-aware tiering is as easy as selecting a menu option.
FAST v2 – FAST v1 sounds like the Virtual LUN feature that was introduced in the Clariion FLARE 16 operating system some time ago. However, there are some new management tools. To add interest and encourage customers to buy VMax, EMC announced a roadmap for FAST v2 which includes sub LUN level migration, de-duplication, spin down, and the possibility of migration between disparate EMC arrays like VMax and CX. This will take a lot more CPU power and cache resources for meta data processing and will place an even heavier load on the two controller architectures of the VMax and Clariion. This will also require another learning curve for customers. By the time they figure out how to set tiers and policies for LUN level tiering, they will have to learn how to do sub LUN level tiering and policy management. Since the pools will be configured for LUN tiering, will the pools have to be reconfigured for sub LUN tiering and will that require BIN File changes on the VMax?
Agility – Implementing FAST on modular architectures like VMax and CX4 limits the ability to meet changing storage requirements. If you exceed the ports, cache, or storage capacity of a VMax engine or CX, you have to add another VMax engine or CX frame; you cannot incrementally add ports, cache, or capacity as you can to a USP V. Without Dynamic Provisioning you cannot automatically load balance by striping across the width of a storage pool and rebalance the pool if more capacity is added. You would have to take a hit on resources in order to move LUNs from one tier to LUNs on another tier. Without storage virtualization, you cannot extend this to your existing assets. You would need to rip and replace your assets with a VMax or CX4.
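For readers who have not worked with wide striping, the following Python sketch shows the general idea of spreading a pool’s pages across every disk and then rebalancing when capacity is added. It is only an illustration under simple assumptions (equal-sized pages, round-robin placement, disks are only added and never removed); it does not describe the actual HDP page allocation or rebalance algorithm, and the function names `stripe` and `rebalance` are made up for this example.

```python
from collections import Counter


def stripe(num_pages, disks):
    """Place pages round-robin so every disk in the pool shares the load."""
    return {page: disks[page % len(disks)] for page in range(num_pages)}


def rebalance(layout, disks):
    """Even out page counts after disks are added to the pool.

    Returns the new layout and the number of pages that had to move.
    Assumes every disk already used in `layout` is still in `disks`.
    """
    layout = dict(layout)
    pages_on = {d: [] for d in disks}
    for page, disk in sorted(layout.items()):
        pages_on[disk].append(page)

    # Target: page counts per disk differ by at most one.
    base, extra = divmod(len(layout), len(disks))
    target = {d: base + (1 if i < extra else 0) for i, d in enumerate(disks)}

    # Collect pages from over-full disks...
    surplus = []
    for d in disks:
        while len(pages_on[d]) > target[d]:
            surplus.append(pages_on[d].pop())

    # ...and hand them to under-full (typically new) disks.
    moved = 0
    for d in disks:
        while len(pages_on[d]) < target[d]:
            page = surplus.pop()
            pages_on[d].append(page)
            layout[page] = d
            moved += 1
    return layout, moved


if __name__ == "__main__":
    pool = ["d0", "d1", "d2"]
    layout = stripe(12, pool)
    layout, moved = rebalance(layout, pool + ["d3"])
    print(Counter(layout.values()), "pages moved:", moved)
```

The point of the sketch is that only the surplus pages move to the new disk, while the rest of the pool stays where it is. That is a much lighter operation than relocating whole LUNs between tiers.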
Summary – This announcement of FAST sounds like a re-announcement of Virtual LUNs, which have been available on the FLARE operating systems since release 16. There will be a lot of things to learn about performance and management and integration with other key features like thin provisioning and SRDF before this can be cut into production. This is a catch-up announcement, and the roadmap for FAST v2 only raises more questions about how FAST will perform at the sub LUN level. The impact of FAST on storage subsystem performance will be the key to its acceptance.
I do not have much information on FAST, or VMax for that matter, so many of my opinions are speculation. If any of you have experience with FAST or VMax and can share your experiences or opinions, please add a comment to this post. Tell us what you like about it, what works, and what doesn’t.
Comments (6)
This post includes numerous technical inaccuracies.
For example, both Symmetrix and CLARiiON are in fact multi-processor based, with each processor providing multiple cores. V-Max scales to multiple closely-coupled engines to form a larger cluster by adding processor, memory, ports and drives to a running system (non-disruptively).
The FAST management interface is inherently simple, scalable and intuitive, with wizards to help storage admins to jump-start the implementation and management.
The operation of FAST is highly automated, inherently managing performance impact of both analysis and relocation operations while adapting to changing workloads.
FAST (and VLUN) can relocate LUNs far, far faster than can a USP-V, and it can relocate LUNs without any impact to running applications or to dependent/active replication sessions (local or remote).
V-Max inherently allows for numerous config changes to occur concurrently, radically reducing the frequency and impact of the “lock-out” you referenced above.
And indeed, there will be a learning curve for FAST, just as there has been a learning curve that preceded the adoption of SATA and Flash drives. And just as with SATA and Flash, EMC’s customers are able to begin that learning curve long before they could with almost everyone else’s storage products.
Should you desire to understand V-Max, FAST and the modern-day Symmetrix architecture, my prior offer to explain it to you remains open. Drop me an email and we can schedule something.
Barry A. Burke
Chief Strategy Officer
Symmetrix & Virtualization Product Group
Test post – will you accept comments from me?
Barry, Thanks for the invitation to educate me on your products. I accept your invitation and would like to schedule some time at the next SNW event to have that session.
I am under the impression that a VMax engine and a Clariion have dual, active/passive processors where each processor has its own front end ports, cache, and back end disk directors. What comes in and out the front port of one processor is written or read in that processor’s cache and passed through the backend RAID directors to the LUNs that are assigned to that processor (LUN Ownership). While they may have dual processors, only one processor does the work on an I/O. I am aware that you can loosely couple VMax engines together over a Rapid I/O switch, but I question whether you can write data that comes in from one engine’s front end ports to another engine’s cache, and then to a third engine’s RAID director’s LUNs. Is that possible? Can you do I/O load balancing across multiple VMax engines? Can you create a virtual provision pool across multiple VMax engines?
You can with a USP V, or a DMX for that matter. These are tightly coupled multiprocessors with a shared global cache. You can come in through multiple port processors, access a common cache, and drive I/O through a different set of back end processors to the LUNs.
Your impression about the I/O flow in V-Max (and CLARiiON, actually) is incorrect. First off, all the processors in both are active/active, not active/passive. With both, any I/O can be serviced by any port and destined to any disk.
V-Max scales out using the Virtual Matrix. V-Max engines are not “loosely” coupled with RapidIO – RapidIO plays the same inter-process(or) communications role as does the Direct Matrix in DMX. The one ASIC in the V-Max manages the interface to the RapidIO fabric, as well as providing numerous data manipulation services to the local nodes.
The simplest way to explain it, since you seem to have a better grasp on the DMX architecture, is this: V-Max assigns specific tasks to each core in each director (there are 8 cores per node). These “tasks” are analogous to prior Symm FA, DA, RA (etc.) ports. Incoming I/O requests are received by an FA, and queued to the appropriate DA pair. The DA pair could be local (the two nodes in an Engine), or they could in fact be targeted to a pair of DA processes running within a different engine. In this case, the I/O is queued over the RapidIO fabric – the fabric itself carries multiple simultaneous communication sessions, moving commands and data at different priorities.
Net-net – very similar to DMX, just all the I/O processes run on cores instead of “slices”, and the Virtual Matrix replaces the Direct Matrix.
There’s much more to the Virtual Matrix than I can explain here, such as how all the memory that resides within each node (2 per engine) is utilized as a single, scalable, globally addressable cache. But hopefully this helps you understand that V-Max is truly a scale-out architecture where any I/O can be serviced by any port on any engine, and any I/O can be destined for any disk behind any engine, and any block of data can be cached on any node/engine on the Virtual Matrix that the software desires.
Alas, I’ll not be going to SNW, but I’d be happy to chat via phone, or to have you out to our offices should your travels find you in the Boston MA area (you’ll probably want to wait until the spring, though – it’s already winter here).
I look forward to more discussions with you… you can email me anytime at work or using the link on my blog.