Differences between DMX and VMax
by Hu Yoshida on Dec 15, 2009
If you saw the comments from EMC’s Barry Burke on my last blog post, you will have read his explanation of how VMax works.
“V-Max scales out using the Virtual Matrix. V-Max engines are not “loosely” coupled with RapidIO – RapidIO plays the same inter-process(or) communications role as does the Direct Matrix in DMX. The one ASIC in the V-Max manages the interface to the RapidIO fabric, as well as providing numerous data manipulation services to the local nodes.”
“The simplest way to explain it, since you seem to have a better grasp on the DMX architecture is this: V-Max assigns specific tasks to each core in each director (there are 8 cores per node). These “tasks” are analogous to prior Symm FA, DA, RA (etc.) ports. Incoming I/O requests are received by an FA, and queued to the appropriate DA pair. The DA pair could be local (the two nodes in an Engine), or they could in fact be targeted to a pair of DA processes running within a different engine. In this case, the I/O is queued over the RapidIO fabric – the fabric itself carries multiple simultaneous communication sessions, moving commands and data at different priorities.”
“Net-net – very similar to DMX, just all the I/O processes run on cores instead of “slices”, and the Virtual Matrix replaces the Direct Matrix.”
From the information I can gather about the DMX and the VMax, I do not believe they are very similar. The main difference is not about the connections of a Direct Matrix or Virtual Matrix switch. The main difference is that the DMX has a global cache and the VMax has local caches.
Here are schematics of the DMX and of VMax that I have downloaded from Barry’s Blog.
In the DMX, the Front End Adapters and Back End Device Adapters are separate processors and are tightly coupled through an internal Direct Matrix to a global cache. All the FAs and DAs are working directly with the same cache and any FA can read or write to any DA.
In the VMax, the FAs and DAs are bundled with a cache into one multi-core processor. Although EMC labels this as Global Memory, it is really local cache memory. Two of these processors make up one VMax engine. It looks very similar to the Clariion architecture, with two processor complexes, each with its own local cache and its own FA/DA. Instead of a true global cache that is connected directly to the FAs and DAs through an internal Direct Matrix, the VMax cache consists of local caches that are bundled with FAs and DAs and are connected through an external RapidIO switch. This type of connection requires a store and forward cache architecture, which I would not classify as a global cache.
Reads or writes come in through a Front End port to the local cache in a VMax processor. If the I/O is queued for a Back End in another VMax engine, it must be forwarded across the RapidIO switch to the cache that belongs to the processor that owns the Back End port. Communication between the local caches requires a store and forward across an external switch. When a read arrives at an FE port, the command is stored in the local cache, then forwarded across the external switch to the local cache where the BE port is located. On a read miss, status is passed back over the switch to the FE cache, and the BE reads the data from its local disk into its local cache, where it is stored and then forwarded to the FE cache. With writes, the commands and data are stored and forwarded to two separate VMax engines to provide cache write protection. As Barry says, “the I/O is queued over the RapidIO fabric – the fabric itself carries multiple simultaneous communication sessions, moving commands and data at different priorities.” That sounds like a lot of overhead.
The DMX would not have to do this since all the FE and BE processors work with the same cache for commands, meta data, and data. There is no need to store and forward.
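To make the hop counting concrete, here is a toy Python model of the read-miss path as I have described it above. It is a sketch under my own assumptions, not EMC or Hitachi code: the step names, the `Step` class and the `read_miss_path` and `fabric_crossings` functions are hypothetical, and I assume that a Back End in the same engine needs no hop over the RapidIO switch. It simply counts how many times something is stored in one cache and forwarded across the external switch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str
    crosses_fabric: bool  # True if this step moves commands, status or data over the external switch


def read_miss_path(architecture: str, be_is_local: bool) -> list[Step]:
    """Toy model of a read miss as described in this post (hypothetical, not vendor code)."""
    if architecture == "dmx":
        # Every FA and DA works directly with the same global cache,
        # so the DA stages the data and the FA reads it with no forwarding.
        return [
            Step("FA checks global cache (miss)", False),
            Step("DA reads data from disk into global cache", False),
            Step("FA returns data to host from global cache", False),
        ]
    if architecture == "vmax":
        if be_is_local:
            # Assumption: a Back End in the same engine needs no switch hop.
            return [
                Step("command stored in FE local cache", False),
                Step("BE reads data from disk into local cache", False),
                Step("FE returns data to host", False),
            ]
        # Back End owned by another engine: store and forward over the switch.
        return [
            Step("command stored in FE local cache", False),
            Step("command forwarded to BE engine's local cache", True),
            Step("read-miss status passed back to FE cache", True),
            Step("BE reads data from disk into its local cache", False),
            Step("data forwarded to FE cache", True),
            Step("FE returns data to host", False),
        ]
    raise ValueError(f"unknown architecture: {architecture!r}")


def fabric_crossings(steps: list[Step]) -> int:
    """Count the steps that cross the external switch."""
    return sum(step.crosses_fabric for step in steps)


if __name__ == "__main__":
    for arch, local in [("dmx", False), ("vmax", True), ("vmax", False)]:
        path = read_miss_path(arch, be_is_local=local)
        print(f"{arch:5} local_be={local!s:5} crossings={fabric_crossings(path)}")
```

Under these assumptions, a VMax read miss with a remote Back End crosses the switch three times (command out, status back, data back), while the DMX path and the local-Back-End VMax path cross it zero times. The count alone says nothing about latency, since, as Barry notes, the fabric carries multiple simultaneous sessions at different priorities.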
The USP V is similar to the DMX in that it is a tightly coupled multiprocessor with a global cache. The difference is that the USP V uses a dynamic crossbar switch between the processors and the global cache modules, and command and meta data are kept in a separate control memory connected directly to the processors, to avoid contention between accesses to meta data and user data. This gives the USP V higher scale-up as well as scale-out performance, and the ability to virtualize externally attached storage from heterogeneous vendors. EMC appears to have copied the switching concept from Hitachi to gain scale-out capability, but they put it in the wrong place for scale-up performance.
If any of you have tested the VMax, please let us know about your experiences.
Comments (9)
I give up.
You continue to provide inaccurate discourse about that which you admittedly do not understand, even after I extended the sincere offer to explain it to you in person, on the phone or via e-mail.
My offer still stands, but clearly you prefer to continue on your own. Alas, I guess it serves your employer’s interests to keep trying to belittle, deride and cast as much FUD about V-Max as you can. Even if you undermine your own credibility in the process.
So be it. It is probably for the better from EMC’s perspective anyway – the less you understand about V-Max, the easier it is for EMC to win deals against HDS/Hitachi/HP (et al).
(And I almost thought Claus’ olive branch was real…)
I posted recently on my blog about how little value I find in a blog which just goes on and on about a competing product. If you don’t reference your own product until the last few paragraphs, it just comes across as worthless trash-talk. I wonder what your audience is for this blog; is it fellow HDSers who will love the anti-EMC commentary? Is it the true believers who feel that it fully justifies their purchase? Because I don’t believe it is actually anyone who wants to understand the HDS value proposition and what you are actually doing in the storage world.
Focus on what you do well and not on what the opposition is doing. I tell you what, if an HDS salesman turned up and went on and on about EMC, they wouldn’t get another meeting. Perhaps you could lead by example.
And actually, it is worse for HDS: if you turn out to fundamentally misunderstand the implementation of a product, how the hell can I expect you to virtualise that product safely?
What you just outlined pertains merely to a simple I/O operation (read or write). The overhead described would also apply to internal cache management. As data in cache is changed, expired or destaged, the engine responsible for maintaining the data would be required to broadcast any updates with respect to the status of that data to all the other engines participating in the configuration. The issue with sharing cache in a clustered environment is that as you add engines (servers), the performance payback decreases, because the inter-engine chatter increases disproportionately (instead of having to update one partner per change, now you have to update two, or four, or six, or eight...). Eventually your configuration reaches the point where adding an engine/node (cache management instance) means more time is spent servicing internal cache management than external I/O requests. When V-MAX was first announced, there was a statement that the configuration could grow to 256 engines. That statement was quickly retracted, and I strongly suspect it was for the very reason outlined above. I’m not saying the design is bad, so long as the engines have enough juice to handle the overhead without affecting performance.
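A rough sketch of that arithmetic, assuming every cache state change is broadcast to every other engine and that each engine makes the same number of changes per second (both are assumptions for illustration, not measured V-Max behaviour). The function names and the 1,000 changes per second figure are hypothetical. A capped variant, in which at most two peers are updated per change, is included for contrast.

```python
from typing import Callable


def broadcast_fanout(engines: int) -> int:
    """Assumption: every other engine is told about each cache change."""
    return engines - 1


def capped_fanout(engines: int, cap: int = 2) -> int:
    """Contrast: at most `cap` peers are updated, however large the system gets."""
    return min(cap, engines - 1)


def messages_per_second(engines: int, changes_per_engine: int,
                        fanout: Callable[[int], int]) -> int:
    """Total inter-engine update messages when each engine makes
    `changes_per_engine` cache changes per second (illustrative figure)."""
    return engines * changes_per_engine * fanout(engines)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        b = messages_per_second(n, 1000, broadcast_fanout)
        c = messages_per_second(n, 1000, capped_fanout)
        print(f"{n:2} engines: broadcast={b:7} capped={c:7}")
```

With broadcast, going from 8 to 16 engines takes the toy total from 56,000 to 240,000 messages per second, more than four times as much, because the traffic grows roughly with the square of the engine count. With a fixed cap it only doubles, from 16,000 to 32,000.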
I’ll await the SPC benchmark results.
V-Max is a loosely coupled structure, and which interface is used is irrelevant. In fact, it reminds me of the NUMA architectures of the '90s. If the requested I/O is not in the cache or not connected to a DA of the same module, it must be retrieved from another module. To maintain redundancy, it is recommended to distribute host connections between two different modules. In System z this will be more critical, because it supports up to 8 paths for each volume.
So far there is no performance feedback from V-Max users; however, an EMC presentation shows a 2.3x performance improvement for RAID 5 (Oracle). This is a fairly low factor considering the generation change, and in absolute figures the DMX performance is not very impressive either.
Locutus – one small nit: the Symmetrix architecture does not require all N nodes to be notified of every update. Although the data is mirrored across different engines (or between the 2 nodes in a single engine), at most 2 nodes must be updated.
Josh – thanks, it seems you do indeed understand the architecture. And indeed, spreading connections across at least 2 different nodes, if not 2 different engines, is appropriate for any HA environment. The Virtual Matrix makes this just as transparent as (and faster than) the Direct Matrix does.
But since when is a better than 2x performance improvement “low?”
Thanks for the correction Barry…. and 2X is respectable.
Barry, I would really like to meet you in person. I offered to meet you at SNW. Unfortunately I do not get much opportunity to visit Hopkinton, so if you ever get to Santa Clara please let me know, and I will make time to meet you if I am in town.
Martin G., this was a response to Barry’s comment on my previous post. If you or Barry find errors in my post, please point them out.
We actually met once, back when I was working with Mercury Computer Systems, running the shared storage division with a product called SANergy (later sold to IBM/Tivoli).
I’ll see if/when my schedule may bring me to the valley area again – hopefully we can arrange something next time I’m out there.
Meanwhile, if you have questions about the Symmetrix V-Max architecture that I haven’t already answered in my blog (I did do a fairly detailed review back on April 14th), I’ll be happy to answer you in detail via e-mail. This blog-based banter is perhaps not the most efficient communications vehicle at our disposal.