Monolithic versus modular storage is not an either/or question
by Hu Yoshida on Aug 3, 2010
Those of you who subscribe to Gartner reports may have seen their recent report:
“Choosing Between Monolithic Versus Modular Storage: Robustness, Scalability and Price Are the Tiebreakers”
While I agree with some of their definitions of monolithic and modular storage, it is no longer a question of one versus the other. With the Hitachi USP V/VM we combine the best of both worlds, by providing a “monolithic” or enterprise tier 1 front-end with lower cost modular back-end storage.
I agree with their description of monolithic storage as having many controllers that share direct access to a large, high performance, global cache, supporting a large number of host connections, including mainframes, and providing redundancy to ensure high availability and reliability.
I also agree with their definition of modular storage, which contains two variants, a dual controller architecture with separate cache memory and a scale out architecture that can have many nodes with separate caches in each node. I also agree that modular storage is easier to expand capacity by adding modules of storage trays, and that their acquisition costs are lower due to their simpler design (no global cache).
The differences between monolithic storage and modular storage
The key difference between monolithic storage and modular storage is the cache architecture. A dynamic global cache enables the tight coupling or pooling of all the storage resources in a monolithic storage system. As we add incremental resources like front-end port processors, cache modules, back-end array processors, disk modules, and program products like Hitachi Universal Replication, they are tightly coupled through the global cache so that they create a common pool of storage resources, which can be dynamically configured to scale up or to scale out to meet different host server requirements. Separate caches, in controllers or in nodes, create silos of storage resources. Host server volumes can only access the storage resources that are in the controller or node to which they are attached. The host server may access another volume in another controller or node, but it cannot have one volume extend across multiple controllers or nodes. Since this is not a common pool of storage resources, this leads to fragmentation and under-utilization of resources within the controllers or nodes. One node may be running at 90% utilization while other nodes are idling at 10% or 20%.
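The fragmentation argument above is easy to see with a little arithmetic. The following sketch (illustrative only, with made-up capacity and workload numbers) compares per-node utilization when workloads are pinned to individual nodes against the utilization of the same total workload spread over a single shared pool:

```python
# Illustrative sketch, not any vendor's algorithm: compare utilization when
# cache/compute resources are siloed per node versus pooled globally.

def siloed_utilization(workloads, node_capacity):
    """Per-node utilization when each workload is pinned to one node."""
    return [w / node_capacity for w in workloads]

def pooled_utilization(workloads, node_capacity, nodes):
    """Utilization when all nodes contribute to one global pool."""
    return sum(workloads) / (node_capacity * nodes)

# Four nodes, each with 100 units of capacity; workload is unevenly placed.
workloads = [90, 20, 10, 20]              # one hot node, three nearly idle
print(siloed_utilization(workloads, 100))  # [0.9, 0.2, 0.1, 0.2]
print(pooled_utilization(workloads, 100, 4))  # 0.35
```

The siloed system has one node near saturation while three idle, even though the aggregate pool is only 35% busy.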
While most analysts like Gartner acknowledge that dual controller systems, with limited amounts of cache and compute capacity, cannot match monolithic systems in performance and throughput, they assume that “multinode scale-out architectures hold the promise of helping modular systems to asymptotically approach monolithic storage system levels of throughput.”
I disagree, since the throughput and performance that you get from a multi-node scale-out architecture is limited to a distribution of the workload across multiple nodes. Unless the distribution is perfectly balanced across the nodes, you have the fragmentation that I mentioned earlier. Even if the cumulative total of cache and compute capacity is the same as what is in a monolithic storage system, it is not tightly coupled into a common pool of resources, and cannot match their performance and throughput.
The Hitachi AMS 2000 family of modular storage is a dual controller storage system with separate caches. However, there is additional intelligence in the architecture that enables load balancing of LUN ownership between the two controller caches to ensure that one controller is not overworked while the other controller is idling. There are some single thread workloads where modular storage can outperform monolithic storage, but in multithreaded workloads the monolithic storage will have higher performance and throughput due to its larger cache, multiple compute processors, and load balancing across storage port processors.
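The load-balancing idea described above can be sketched as a simple greedy migration of LUN ownership. This is a conceptual illustration only; the AMS 2000's actual algorithm is not public, and the LUN names and IOPS figures are invented:

```python
# Hypothetical sketch of controller-level LUN ownership balancing:
# migrate a LUN from the busier controller only if doing so strictly
# shrinks the load gap between the two controllers.

def rebalance(ctrl_a, ctrl_b):
    """ctrl_a / ctrl_b: dicts of {lun_id: iops} owned by each controller."""
    moved = True
    while moved:
        moved = False
        load_a, load_b = sum(ctrl_a.values()), sum(ctrl_b.values())
        src, dst = (ctrl_a, ctrl_b) if load_a > load_b else (ctrl_b, ctrl_a)
        gap = abs(load_a - load_b)
        # consider the busiest LUNs first; a move of size s shrinks the
        # gap to |gap - 2s|, which is an improvement only when s < gap
        for lun, iops in sorted(src.items(), key=lambda kv: -kv[1]):
            if iops < gap:
                dst[lun] = src.pop(lun)
                moved = True
                break
    return ctrl_a, ctrl_b

a = {"L1": 500, "L2": 300, "L3": 200}   # overworked controller (1000 IOPS)
b = {"L4": 100}                          # idling controller (100 IOPS)
rebalance(a, b)
print(sum(a.values()), sum(b.values()))  # 500 600
```

After one migration the controllers carry 500 and 600 IOPS, so neither is overworked while the other idles.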
So while there are important differences between monolithic and modular storage, the best way to use them is to use them in a tiered configuration. Since 60% to 80% of storage does not need tier 1 performance, it does not need to be on tier 1 storage. However, all your storage needs tier 1 protection and availability. You can achieve that by virtualizing modular storage as tier 2 or 3 storage behind a tier 1 monolithic storage front-end. The modular dual controller or multi-node scale out storage systems now sit behind a global cache, and become part of a pool of common resources that can be dynamically allocated based on business requirements. The advantages of modular storage around cost and ease of expansion are coupled with the advantages of monolithic (enterprise) tier 1 functionality and performance, with common management, protection, and search.
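The cost side of the tiering argument is straightforward to quantify. The sketch below uses made-up $/GB placeholders (not real pricing) to show how a 70/30 split, within the 60% to 80% range above, pulls the blended cost well below an all-tier-1 design:

```python
# Back-of-the-envelope tiering economics. The $/GB figures are invented
# placeholders for illustration, not actual tier 1 or modular pricing.

def blended_cost_per_gb(tier1_fraction, tier1_cost, tier23_cost):
    """Weighted average cost across tier 1 and virtualized tier 2/3."""
    return tier1_fraction * tier1_cost + (1 - tier1_fraction) * tier23_cost

all_tier1 = blended_cost_per_gb(1.0, 10.0, 3.0)  # everything on tier 1
tiered    = blended_cost_per_gb(0.3, 10.0, 3.0)  # 70% on virtualized modular

print(all_tier1)  # 10.0
print(tiered)     # roughly half the all-tier-1 figure
```

The data still gets tier 1 protection, availability, and common management through the monolithic front end; only the capacity cost changes.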
USP V/VM: the best of both worlds
One of the disadvantages cited for monolithic storage is the higher cost. That is only true in smaller configurations if all the storage capacity resides in the monolithic system. If most or even all of the storage capacity resides on external modular storage that is virtualized behind a USP V/VM, the cost of the combination will be even lower since all the storage is now efficiently managed as a common pool of storage resources, saving operational as well as capital costs. Since the USP V/VM does Dynamic Provisioning, it can save time and the costs for provisioning external modular storage, thin provision and reclaim unused capacity, and wide stripe the modular storage for higher performance. The data mobility provided by the USP V/VM will increase availability by non-disruptively moving the data off of the modular storage during scheduled down times or for technology refresh migrations, further reducing operational costs over standalone modular storage.
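The wide-striping idea mentioned above can be illustrated with a simple round-robin page layout. This is a sketch of the general concept, not Hitachi Dynamic Provisioning itself, and the page size and RAID group names are assumptions for the example:

```python
# Conceptual sketch of wide striping in a thin-provisioned pool: pool pages
# are laid out round-robin across all backing RAID groups, so every
# volume's I/O is spread over every spindle rather than one array group.

PAGE_MB = 42  # page granularity chosen for the example, an assumption

def page_location(page_index, raid_groups):
    """Map a pool page to (raid_group, offset_within_group), round-robin."""
    group = page_index % len(raid_groups)
    offset = page_index // len(raid_groups)
    return raid_groups[group], offset

groups = ["RG-0", "RG-1", "RG-2", "RG-3"]
print([page_location(i, groups) for i in range(6)])
# consecutive pages land on consecutive RAID groups, wrapping around
```

Because allocation happens at page granularity on demand, unused capacity can also be reclaimed back into the pool instead of sitting stranded inside an over-provisioned volume.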
Host servers are going through a massive consolidation with the availability of multi-core processors and virtual server platforms like VMware and Hyper-V. These virtual server platforms are driving 10 to 20 times the I/O workload of non-virtual servers, and virtual server clusters are driving as much as 100 times this load through one file system. This type of workload requires a monolithic storage system that can scale up through a tightly coupled, global cache on the front end while the majority of the storage capacity resides on lower cost modular storage that is virtualized behind it.
So I do agree with Gartner for the most part on the differences between monolithic and modular storage, but I do not think it has to be an either/or decision as to which storage you choose. I believe the best choice is a combination of modular storage that is virtualized behind monolithic storage as we do with the USP V/VM. This way you can have the best of modular storage combined with the best of monolithic storage, at the lowest total cost.
Where do you fall on this issue?
Comments (6)
Here are my thoughts at the risk of sounding like a megalomaniac.
Gartner is asking the wrong question, or at least barking up the wrong tree, which is why there is no right answer to this question. There are two major focus areas for optimization in a storage array. One is the capacity itself, and the field of capacity management and estimation does a good job of optimizing this area. The other focus area is the infrastructure surrounding the capacity: the front-end CPUs, back-end CPUs, and cache. This area has very little focus and is mostly estimated using vendor-supplied rules of thumb that are heavily outdated. The old 80/20 rule applies here: 80% of storage arrays run at 20% average CPU utilization, particularly on the back-end CPUs.
So the way to go for a customer is to ask the question: is the workload I/O hungry or capacity hungry?
Once there is a clear answer to that question, what is needed from storage vendors (whether modular, enterprise, or grid) is a way of partitioning the array and assigning workloads to partitions that have differing IOPS capability.
Vendors do offer a capability for partitioning, e.g. the USP V. But there are limitations. The back end on the USP V is shared and cannot be partitioned, and all cache algorithms, including destaging algorithms, are global.
The customer should be given the capability of partitioning an array and, in addition, carving out virtual CPUs (vCPUs) that he can assign to partitions. For example, a 20TB SATA disk partition that is used to hold videos probably needs only 4 front-end vCPUs and maybe 2 back-end vCPUs. At the other extreme, a partition running SAP BW might need 16 front-end vCPUs and 32 back-end vCPUs. Workloads also shift with time and are subject to seasonal highs and lows based on business cycles. Customers should have the capability to steal vCPUs from one partition and assign them to other partitions.
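The vCPU-stealing proposal above can be sketched in a few lines. This describes a hypothetical feature the commenter is asking for, not a shipping array API; the partition names and vCPU counts come from the comment's own examples:

```python
# Sketch of the proposed capability: partitions own front-end and back-end
# vCPUs, and an administrator can reassign vCPUs as workloads shift.

class Partition:
    def __init__(self, name, fe_vcpus, be_vcpus):
        self.name, self.fe, self.be = name, fe_vcpus, be_vcpus

def move_vcpus(donor, recipient, fe=0, be=0):
    """Reassign vCPUs between partitions, refusing to over-draw the donor."""
    if donor.fe < fe or donor.be < be:
        raise ValueError("donor partition lacks the requested vCPUs")
    donor.fe -= fe
    donor.be -= be
    recipient.fe += fe
    recipient.be += be

video = Partition("video-archive", fe_vcpus=4, be_vcpus=2)  # 20TB SATA
sap = Partition("sap-bw", fe_vcpus=16, be_vcpus=32)
move_vcpus(video, sap, fe=2)  # seasonal peak: shift capacity to SAP BW
print(sap.fe, video.fe)  # 18 2
```

The point is that I/O horsepower, not just capacity, becomes a first-class resource that can follow the business cycle.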
So back to the 80/20 rule again, and the scale-out versus scale-up argument. Vendors need to add two dimensions to scaling: capacity and CPU. If you are scaling simply because the array is maxed out on the number of drives, maybe you need to implement some form of storage virtualization and tiering, particularly controller-based virtualization. If you are scaling because the CPUs are maxed out, then you need to scale up.
80% of customers scale or replace arrays because they are maxed out on capacity.
20% of customers scale or replace arrays because they are maxed out on CPUs.
I think I agree with most of your comments and there’s no denying I’ve always been a fan of Hitachi’s UVM product. However I do find that the technology benefits of multi-tiering are countered by the different approach to management that’s required. For instance, I believe many people still think UVM virtualisation is a 1:1 LUN pass-through technology – that the 10GB LUN on the modular storage has to be presented as a 10GB LUN to the USP V host. Of course that isn’t true; as you know, a modular array can be configured with large LUNs that are then carved up by the USP V as if they were RAID groups.
If perception about UVM is less than optimal, then understanding of the best approach to its implementation probably is too: should the customer stripe across the modular devices? Should external LUNs be presented from many storage ports? How do I create a layout I can expand in the future?
Ultimately of course, I believe the benefits of UVM outweigh the additional management overhead for most customers. But it’s not a simple technical discussion.
Chris, you are right, nothing is ever as simple as it seems.
Hu, in general I agree with your comments, and there are some interesting metrics out there that both back you up and (to some degree) refute your thesis. These can be found, by and large, on the SPC (Storage Performance Council) website. If you look at the SPC-1 benchmark, where heavier and heavier workloads are applied until no more IOPS can be obtained from a given storage array, the dual-controller modular arrays from essentially all vendors hit some “knee” on their performance curve, and suddenly latency and lag skyrocket as you attempt to add workload. This seems to happen around 60%-80% of max performance.
If you look at the HDS USP and its SUN and HP brethren, there is no “knee”, the performance line is nearly flat, lag and latency increase very slowly, until you max out with no more IOPS available from the platform, and the max is much higher than any dual-controller array. Clearly, the dual-controller arrays are bottlenecking on something in the mix, almost certainly CPU power in the controller or maxed out controller bus systems. The very large amount of storage used for the SPC-1 benchmark pretty much limits the influence of cache, for most products. So, your thesis is proven and well documented in this one (admittedly fairly simple-minded) metric. I/O loads vary, but in the aggregate, this is what would be expected to happen out in the real world as well, with multiple transactional loads and other random loads eventually finding the bottleneck at peak load time.
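The “knee” described above is exactly what elementary queueing theory predicts when a controller saturates. The sketch below uses a textbook M/M/1 model with an assumed 5 ms service time (not an SPC-1 result) to show how response time stays nearly flat and then shoots up near saturation:

```python
# Textbook M/M/1 queueing model: mean response time grows as
# 1 / (1 - utilization), producing a flat curve that bends sharply
# ("knees") as the bottleneck resource approaches saturation.

def response_time(service_ms, utilization):
    """Mean response time of an M/M/1 queue at the given utilization."""
    assert 0 <= utilization < 1, "model is only defined below saturation"
    return service_ms / (1.0 - utilization)

for u in (0.5, 0.8, 0.9, 0.95):
    print(u, round(response_time(5.0, u), 1))
# latency at 95% load is 10x the latency at 50% load
```

A system whose controllers bottleneck at 60%-80% of peak sits right where this curve starts to bend, which matches the SPC-1 behavior the comment describes.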
The fly in the ointment is 3PAR, which is not really modular storage, having up to 8 controllers fronting shared storage. But your assertion is that this type of multi-node array without shared global cache will still not perform as well as the USP/USP-V. In fact, 3PAR exhibits almost exactly the same type of near-flat performance scaling, with no ‘knee’ on the line at all, just like USP. No bottleneck appears as you approach max IOPS. And 3PAR has a higher IOPS max than USP. The USP SPC-1 test was from 2008, and 3PAR was from 2009 (I believe), but these were the latest published figures for either. USP @ ~200K IOPS, and 3PAR @ ~225K IOPS.
I suspect that this is the type of possible performance from multi-node modular arrays that Gartner was referring to in their paper (which I have NOT seen, so I don’t know for sure), but it lines up nicely with the one line from Gartner that you quoted, where they assumed that “multinode scale-out architectures hold the promise of helping modular systems to asymptotically approach monolithic storage system levels of throughput.”
Quite frankly, the 3PAR SPC-1 result makes their assertion a bit obsolete, at least the part about “asymptotically approaching” monolithic performance. Multi-node modular arrays without shared global cache can outperform monolithic, even at extremely heavy pseudo-random transactional loads. Of course, real world environments never match benchmark loads, no matter how much pseudo-randomness is embodied in the benchmark, which is EMC’s pretend reason for not participating (in reality, if they could outperform the rest, they’d be there with bells on, can there be any question of that?)
So, I am watching the new USP/V/VS with great interest, to see how it stacks up using the industry standard SPC benchmark (if HDS will go there). I would expect it to handily put HDS back into the lead for max IOPS, but 3PAR and others have not been sitting idle either.