Redefining Modular and Monolithic Storage
by Hu Yoshida on Feb 7, 2013
If you Google modular storage and monolithic storage, you will find that many of the analysts’ definitions are very similar. Basically, modular storage is considered to be lower-cost, two-controller systems that are usually packaged in a 19-inch rack. Monolithic storage is considered to be big, expensive boxes that are great for mainframes but overkill for the requirements of open systems. Lately some analysts have been defining a tier 1.5 storage system, which is typically a cluster of modular systems that is supposed to provide the availability and performance of a monolithic system at lower cost.
Steve Duplessie of ESG wrote a paper that changes this perception. In fact, he says that modular systems are also monolithic.
The title of Steve’s paper is “Why Virtualization is Broken and How to Fix It”.
This is an excellent read. It changed my mind about what I was calling monolithic and modular storage.
Steve defines monolithic storage as a “self contained compute, memory and I/O system in a box.” What the industry has been defining as modular storage is really monolithic storage because the controllers are self contained in a box. It is monolithic because you cannot increase the power of this type of storage without replacing the box. Steve’s definition of monolithic storage is not about the size or cost of a storage system, but about its ability to scale up.
Steve goes on to say that monolithic storage systems have been this way since the beginning of the industrial computing era. The only major improvement has been clustering, “where one monolithic system could take over for another monolithic system should a failure occur.” Monolithic architectures, either clustered or standalone, have been finite and static. This definition applies to all storage systems with dual controllers, like VNX and Hitachi Unified Storage (HUS), and also to multi-cluster storage systems like 3PAR and VMAX. They are all monolithic storage systems because they are clusters of “self contained compute, memory and I/O system in a box.” Clustering does not enable a monolithic node to scale up.
What is wrong with monolithic storage? According to Steve, this type of architecture was adequate until the move to server virtualization.
“If one physical server contains 10 virtual machines vying for data access on one physical array, the likelihood of varied and unpredictable I/O clearly arises. And it is causing performance problems throughout the industry today.”
“Server cores are cheap and getting cheaper, but storage is actually becoming more expensive. We are trying to fix the problem the same way we fixed monolithic infrastructure problems: Make it bigger and faster… and destroy whatever efficiency and utilization gains we picked up elsewhere.”
Steve’s paper concludes with the “Bigger Truth” about what it takes to truly scale to meet virtual machine requirements:
• The industry must stop developing monolithic answers to grid problems.
• Develop applications to take advantage of multiple physical components – not just bigger individual ones.
• These storage systems also must be built so that “No physical failure of anything should ever keep an executable from completing its task.”
Hitachi Data Systems recognized this problem a number of years ago and has designed a non-traditional storage architecture to address this. Hitachi’s VSP as well as its predecessors, USP, NSC, USP V, USP VM, and the new HUS VM are architected around an internal switch that connects multiple combinations of compute, memory, and I/O systems. They are connected in a way that enables all the compute, memory and I/O systems in these storage arrays to be utilized as a single pool of storage resources. With storage virtualization this architecture also enables external storage from other vendors to be included in this pool of storage resources.
VSP and HUS VM can scale up to meet the demands of server virtualization and fix it where it is broken.
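The architectural contrast drawn above can be illustrated with a toy model. This is a conceptual sketch only: the class and method names are invented for illustration and do not correspond to any real HDS or competitor API. The point it demonstrates is the one in the article, that clustering fixed boxes adds failover but not shared capability, while a switch-connected design grows a single pool by adding individual components.

```python
# Toy model of the two architectures discussed above.
# All names here are illustrative, not real product interfaces.

class MonolithicArray:
    """Fixed dual-controller box: capability is set when it is built."""
    def __init__(self, processors, cache_gb, io_ports):
        self.processors = processors
        self.cache_gb = cache_gb
        self.io_ports = io_ports
    # The only way to get more power is to replace the whole box.

class SwitchedArray:
    """Switch-connected pool: grows by adding individual components."""
    def __init__(self):
        self.processors = 0
        self.cache_gb = 0
        self.io_ports = 0
    def add_processor_blade(self, n=1):
        self.processors += n      # added compute joins the shared pool
    def add_cache_module(self, gb):
        self.cache_gb += gb       # one global cache, not per-controller
    def add_io_board(self, ports):
        self.io_ports += ports

# Clustering monolithic boxes gives failover, not shared resources:
cluster = [MonolithicArray(2, 64, 8), MonolithicArray(2, 64, 8)]
# Each node is still limited to its own 2 processors and 64 GB cache.

# The switched design scales the single pool in place:
pool = SwitchedArray()
pool.add_processor_blade(4)
pool.add_cache_module(256)
pool.add_io_board(16)
print(pool.processors, pool.cache_gb, pool.io_ports)  # 4 256 16
```

The model makes the definitional point concrete: in the clustered case, no amount of adding boxes raises the ceiling of any one box, whereas the switched pool scales up component by component.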
Comments (7)
Hu, in my opinion the definitions of monolithic and modular storage do not fit the current products. I would use:
1. Monolithic storage for subsystems that are cache-centered, with any-host-to-any-back-end access, automated internal path selection, large scalability, large connectivity, advanced functionality, and advanced remote copy techniques. Extensive redundancy – a component failure will have negligible impact on operation. Examples: HDS VSP and predecessors, IBM DS8000, and EMC Symmetrix and DMX.
2. Clustered modular subsystems, where the number of modules is two or more. For example: 3PAR InServ, EMC V-Max, NetApp with ONTAP 8.1, NEC HYDRAstor, XIV and V7000 (cluster of 4) from IBM, and Pillar Axiom 600. Some of them may deliver enterprise or near-enterprise functionality.
3. Two-sided subsystems, which are based on two controller units with mirrored cache, in an active/active or active/passive configuration. Limited scalability, connectivity, and functionality. A failure of a component on one of the cards may have a major impact on performance, with 50% of connectivity lost, and the user risks data loss because the cache is no longer mirrored. Examples – too many to list, but most of what is called mid-range.
Forgot to add in 3 that the multi-pathing is done by software or virtualization on the host.
All of the above relates to block access. Most NAS clusters can fall into category 2 as well.
Hello Josh, your description is similar to what I would have said before I read Steve Duplessie’s paper.
If we go back to the Merriam-Webster definition of monolithic, which defines it as “consisting of or constituting a single unit” or “exhibiting or characterized by often rigidly fixed uniformity,” it seems to say that monolithic systems do not scale.
What you describe in point 1 is a scalable architecture where we can dynamically add resources and functionality that are all connected through a global cache. This is more like a grid architecture. What we have been calling modular and clusters of modular storage are really monolithic storage architectures because they do not scale by adding individual components like processors, cache, and I/O ports. Modular storage is a single unit containing a fixed configuration of processor, cache, and I/O ports. When you cluster these single units together, the resources in one unit cannot add any processing power, or cache, or I/O ports to the other units.
Hello Hu, I will not argue with Webster’s definition of monolithic; however, this word emerged in the ’90s as a definition of the high-end storage from EMC, HDS, and IBM (I think Gartner was the first to use it). This was a bad definition because these subsystems scale much better than the modular ones. Anyhow, this is not the only misused definition in our industry; architecture and technology are much more misused. Ionic or Corinthian columns are architecture, but whether they are made of marble (in an ancient Greek temple) or sugar (on a wedding cake) is technology. x86 is architecture and CMOS is technology, but how many times do you read about architecture and technology in talk about software, for example?
I fully agree with what you wrote: “What we have been calling modular and clusters of modular storage are really monolithic storage architectures because they do not scale by adding individual components like processors, cache, and I/O ports. Modular storage is a single unit containing a fixed configuration of processor, cache, and I/O ports.” However, the reality is different, because the use of monolithic and modular is well entrenched today and has nothing to do with the subsystem’s structure or capability.
In my opinion, all the current definitions of storage are bad and I hope that some independent body such as SNIA or IEEE will decide on new definitions. I personally don’t use them. I use “enterprise high-end” or “mid-range” instead.
“Grid” structure is also something different; the grid idea was using loosely connected free resources over a network, which is different from the multi-module clusters I mentioned in the previous comment. In my opinion the term “modular” may best fit these structures because they scale by adding modules.
Hello Josh, I agree that definitions have become confusing.
However, enterprise and midrange are also confusing terms, because midrange systems now run hypervisors, whose scalability demands a global-cache solution.
In my opinion, the difference between a modular (midrange) array and an enterprise (monolithic) array is the ability to connect to mainframe data.
If it cannot store mainframe data, it is called a midrange array.
Hello Prakhar, there are many customers today who run their critical enterprise applications on open systems servers and require the same performance, availability, and scalability that was required of mainframes. They also need the same performance, availability, and scalability in their storage system. So mainframe capability is no longer a requirement for enterprise systems.