VMware Alternate Pathing: The Importance of Being Active/Active
by Hu Yoshida on Apr 22, 2011
VMware provides many benefits in server consolidation, performance, scalability, availability, and ease of use. However, this introduces a greater demand on storage systems, since standalone host servers that once had their own LUNs now share a VMFS file system (datastore), which requires the single LUN to provide a solid, high-performance foundation for the consolidated VMs. Each VM must share its connections to the datastore with other virtual machines that act independently, with different peak times and different access patterns. When you throw in additional activities like cloning and vMotion, you can see that a single shared LUN approach is susceptible to workload imbalances, which are difficult to manage. ESX 4.1 provides VAAI to relieve many of the bottlenecks around formatting VMDK disks, cloning, and SCSI reservations, as described in a previous post by Michael Heffernan.
Another area of concern with storage is the use of alternate path connections to multiple storage controllers for load balancing and availability. There are two parts to this concern: the alternate path software within VMware and the architecture of the storage subsystem.
DeinosCloud has provided an excellent series of blogs on the subject of alternate pathing with VMware. A recent post describes the ESX Native Multipathing Plug-In (NMP), which provides three basic path selection choices:
- Most Recently Used (MRU): ESX chooses the most recently used path to access a device. If that path becomes unavailable or busy it switches to the next available path and continues using that path.
- Round Robin (RR): This uses a path selection algorithm that rotates through all the available paths to provide a form of load balancing.
- Fixed: This uses a designated path or the first available path and stays with that path until it becomes unavailable. It then switches to an alternate path but reverts back when the designated path becomes available again.
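The three policies above can be sketched in a few lines of Python. This is only an illustrative model of the selection behavior as described in the list, not VMware's implementation; the `Path` class and the three policy class names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical stand-in for one host-to-storage path."""
    name: str
    up: bool = True


class MostRecentlyUsed:
    """Stay on the current path; on failure move to the next live path and stay there."""

    def __init__(self, paths):
        self.paths = paths
        self.current = 0

    def select(self):
        n = len(self.paths)
        for step in range(n):
            idx = (self.current + step) % n
            if self.paths[idx].up:
                self.current = idx  # remember it: no failback later
                return self.paths[idx]
        raise RuntimeError("no path available")


class RoundRobin:
    """Rotate through every live path, one I/O at a time."""

    def __init__(self, paths):
        self.paths = paths
        self.next_idx = 0

    def select(self):
        n = len(self.paths)
        for step in range(n):
            idx = (self.next_idx + step) % n
            if self.paths[idx].up:
                self.next_idx = (idx + 1) % n
                return self.paths[idx]
        raise RuntimeError("no path available")


class Fixed:
    """Use the designated path whenever it is up; otherwise any live alternate."""

    def __init__(self, paths, preferred=0):
        self.paths = paths
        self.preferred = preferred

    def select(self):
        if self.paths[self.preferred].up:
            return self.paths[self.preferred]  # reverts as soon as it recovers
        for path in self.paths:
            if path.up:
                return path
        raise RuntimeError("no path available")
```

The difference between MRU and Fixed shows up only after a failed path recovers: in this model MRU keeps using the alternate, while Fixed returns to the designated path.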
While Round Robin or “Most Recently Used” may be the best choices for load balancing, they do not work well with Active/Passive and asymmetric (also known as ALUA, or Asymmetric Logical Unit Access) dual-controller modular storage systems, which assign LUN ownership to one controller or favor one controller over the other. If path selection on this type of storage architecture is rotated, LUN ownership bounces back and forth between the controllers, and the result is thrashing between the caches in the two controllers.
One way to solve this problem is to use a third-party multipathing plug-in. In addition to VAAI, VMware provides a Pluggable Storage Architecture that coordinates the simultaneous operation of multiple third-party multipathing plug-ins. So far, EMC provides this in PowerPath/VE, and Dell provides its plug-in with the EqualLogic Multipathing Extension Module to work with its ALUA controllers. The difference is that EMC charges for its plug-in and Dell’s is free.
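To make the thrashing concern concrete, here is a toy Python model. It assumes a deliberately simplified array in which any I/O that arrives at the non-owning controller forces a LUN ownership transfer; real arrays differ in when and whether they transfer ownership, and the function name and controller labels are hypothetical.

```python
from itertools import cycle, islice


def count_ownership_transfers(io_controllers, initial_owner):
    """Count LUN ownership changes for a sequence of I/Os.

    io_controllers: the controller each I/O lands on, in order.
    Assumption: an I/O on the non-owning controller always moves ownership.
    """
    owner = initial_owner
    transfers = 0
    for controller in io_controllers:
        if controller != owner:
            owner = controller
            transfers += 1
    return transfers


# Eight I/Os rotated across paths to both controllers, LUN initially owned by A.
rotated = list(islice(cycle(["A", "B"]), 8))
# Eight I/Os that all stay on paths to the owning controller.
pinned = ["A"] * 8

print(count_ownership_transfers(rotated, "A"))  # 7
print(count_ownership_transfers(pinned, "A"))   # 0
```

Under this assumption, almost every rotated I/O triggers a transfer, while I/O kept on the owning controller's paths triggers none.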
Another way to solve this problem is to use an Active/Active storage architecture, which can work directly with the native multipathing in VMware without the cost of additional software or the need to create plug-ins.
The Hitachi AMS 2000 is an Active/Active modular storage system. While the AMS 2000 is a dual controller storage system, it can automatically load balance the LUN ownership between the two controller processors.
Why is this important?
First, it makes configuration much simpler, since you do not have to manually allocate LUN ownership to each controller. It also balances the load on the two controllers automatically so that performance can be optimized. Without this capability, dual-controller systems are especially susceptible to VMware ESX host workload imbalances, which require administrators to spend time manually diagnosing and mitigating load imbalance problems. Lastly, it does not require a plug-in to VMware, which may incur overhead and additional licensing costs.
Like other applications, the VMware server is assigned to one controller on a dual-controller modular storage system. When a VMware server begins to spin up virtual machines, the I/O workload on that controller is multiplied by the number of virtual machines, creating a bottleneck on that controller. The AMS 2000 avoids this bottleneck by automatically load balancing across the two controllers. For a description of how this works, see our best practices guide on using VMware vSphere on the AMS 2000 Family.
Doing the load balancing with a plug-in to the VMware kernel lets it manage any storage system that attaches to the HBA, but the plug-in is specific to VMware, and the load balancing is done indirectly through management of control paths. With an Active/Active controller, load balancing is handled in the storage system for any host that attaches to it, VMware or otherwise, and the storage system has direct knowledge of how busy the controllers are.
The advantages of using an Active/Active storage system like the AMS 2000 with VMware are:
- Native hardware support
- Eliminate single points of failure
Which approach would you prefer?
Comments (4)
First of all, thanks for pinging back to my two blog posts. Much appreciated.
Second, I wanted to clarify ALUA and path thrashing, as your post might be a bit confusing on these topics.
VMware NMP assigns a default PSP for every logical device based on the SATP associated with the physical paths for that device. If NMP says to use the RR PSP policy, then traffic will rotate through optimized paths only, and not through ALL available paths as your article might suggest. So you might have a lot of proxied I/Os but no path thrashing here.
How come? ALUA-compliant storage devices communicate with the VMware PSP framework through Target Port Group Support (TPGS) about the characteristics of the paths, such as whether a path is optimized or non-optimized. Based on that, the RR PSP policy knows which optimized paths to rotate through…
Now, with regard to path thrashing: this is a problem likely to happen when choosing a Fixed PSP policy with an asymmetric storage device, where the ownership of the LUN ping-pongs between the various SPs as I/O arrives.
Using the Fixed PSP policy is much more tedious, as fixed paths must be set the same way across all ESX hosts and on all LUNs.
I hope I have clarified ALUA and path thrashing.
PowerPath/VE is not required to use VMware servers with EMC Symmetrix, CLARiiON or VNX storage arrays and with VPLEX; they all work just fine with the standard VMware multipathing. Both Symmetrix (all generations) and VNX are active/active arrays (Symmetrix and VPLEX are actually active*n, as LUNs can be exported on all front-end ports, if desired).
What PowerPath/VE introduces is actively-managed multipathing that monitors response times of the available paths and optimizes I/O requests across the paths for maximum performance. Far more intelligent than simple MRU or RR, PowerPath/VE can provide as much as 20X more throughput and 1/20th the response time, especially when used in configurations consisting of multiple VMware servers sharing common SAN and storage array infrastructure. By increasing utilization of the HBAs and the network paths significantly further than does the standard VMware multipathing, PowerPath/VE allows for even higher application consolidation onto fewer VMware servers, reducing overall total cost of ownership, even considering the price of PowerPath/VE.
Hello Didier, thanks for the clarification. Your posts and comments have been very helpful in understanding this area.
Thanks for your comment, Barry. Just so we are clear, based on your statement, I am assuming that it is no longer a requirement to define a preferred and secondary controller for each and every LUN within the CLARiiON and VNX?