VAAI – Driving The Need For Storage Computers
by Hu Yoshida on Oct 6, 2011
Chris Evans’ recent post warns that enabling the host to issue direct commands to the storage system through interfaces like VAAI can have performance impacts, creating more work to balance environments simply because too many VAAI requests may be thrown at the array. Chris is quick to add that he is not against VAAI or the concept of offloading that kind of heavy lifting, as long as the array can handle the workload.
Hitachi’s Virtual Storage Platform (VSP) has been designed to address this type of workload. It has a pool of global processors, separate from the I/O processors, so that non-I/O workloads associated with storage program products like replication, and the offload of host commands like VAAI UNMAP, have minimal impact on I/O performance. This pool of global processors is connected to, and communicates with, front-end and back-end I/O processors, as well as a global pool of cache modules, across an internal switch matrix. There is also a separate control store in level-two memory in the processors, which eliminates contention between the metadata for control functions and the I/O data that resides in the global cache pool. The pool of global processors starts with a pair of quad-core Intel processors (eight cores) and can scale non-disruptively up to four pairs, for a total of 32 cores, as more workload is offloaded onto the VSP. Other storage systems do not have an internal switch architecture with separate processors for I/O and separate processors for general processing; they must use the same processors for all the work, including I/O, storage program products, and offloaded host commands.
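To illustrate the design principle (not the actual VSP firmware, which is proprietary), here is a minimal Python sketch in which latency-sensitive I/O and offloaded work run on separate worker pools, so an offloaded command never queues behind, or contends with, an I/O request. The pool sizes and the `handle_read`/`handle_unmap` stand-ins are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: one pool dedicated to latency-sensitive I/O,
# a separate "global" pool for offloaded work (replication, VAAI UNMAP),
# mirroring the VSP idea of separate I/O and global processors.
io_pool = ThreadPoolExecutor(max_workers=8, thread_name_prefix="io")
global_pool = ThreadPoolExecutor(max_workers=8, thread_name_prefix="global")

def handle_read(block: int) -> str:
    # Stands in for a cache/disk read on the I/O path.
    return f"data@{block}"

def handle_unmap(page: int) -> str:
    # Stands in for an offloaded host command.
    return f"freed page {page}"

io_future = io_pool.submit(handle_read, 42)
offload_future = global_pool.submit(handle_unmap, 7)
print(io_future.result())       # I/O completes regardless of offload load
print(offload_future.result())
```

With a single shared pool, a burst of offloaded commands would occupy the same workers that serve reads and writes; keeping the pools separate is the software analogue of the separate processor pools described above.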
Specifically, in the case of the VAAI UNMAP command that Chris mentions in his blog post, VSP has another advantage over other storage products in the way we do dynamic paging. Unlike products that page on a chunk/chunklet design (where you reserve a large chunk and index into a smaller chunklet size), VSP uses a 42 MB page, which is managed directly. An UNMAP command carries very little overhead, since it is simply a matter of changing a flag bit on a page from mapped to free. Other SCSI commands communicate the page size to the host, so the responsibility for alignment to page boundaries lies with the host. As long as the host issues the UNMAP along page boundaries, this function should have very little impact on the performance of storage systems that use dynamic paging. All of this is done automatically, so there should be no need to add complex controls from a user perspective. Both our AMS and VSP have been tested in synchronous mode with the original vSphere 5 test, and we saw no impact.
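The mechanics of a flag-bit UNMAP can be sketched in a few lines of Python. This is a hypothetical illustration, not VSP code: a simple page table where freeing a page is a single bit flip, and where the alignment responsibility described above is enforced on the caller.

```python
PAGE_SIZE = 42 * 1024 * 1024  # VSP's 42 MB page

# Hypothetical page table: page_map[i] is True while page i is mapped.
page_map = [True] * 16

def unmap(offset: int, length: int) -> None:
    """Free whole pages; the caller (host) must align to page boundaries."""
    if offset % PAGE_SIZE or length % PAGE_SIZE:
        raise ValueError("UNMAP range must be aligned to page boundaries")
    first = offset // PAGE_SIZE
    for page in range(first, first + length // PAGE_SIZE):
        page_map[page] = False   # O(1) per page: just clear the flag bit

unmap(0, 2 * PAGE_SIZE)
print(page_map[:4])   # [False, False, True, True]
```

Because no data is copied and no chunklet index is walked, the cost of the command is proportional only to the number of page flags flipped, which is why a page-aligned UNMAP adds so little load.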
Michael Heffernan explains:
The UNMAP command for space reclamation in vSphere 5.0 is used during operations like vMotion, snapshot consolidation, and deletion of VMDKs in order to free up space on the thin-provisioned volume (some people describe this as a garbage collector). How this command is implemented by the storage vendors is determined by the performance of the array and its ability to execute the command without impacting response time to the ESX host. So the question is asked: how is this command expected to be implemented? It has been proposed to implement it either synchronously or asynchronously (i.e., based on a scheduling mechanism). I believe we will see over the next few months how this is architected and how VMware works with the storage vendors to optimize this feature and ensure it is implemented in the best possible way. Implementing VAAI, and now adding new commands like UNMAP, proves that it is necessary to ensure your storage array has been designed to handle this type of integration.
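The two proposals Heffernan mentions can be contrasted with a small hypothetical sketch: a synchronous UNMAP reclaims space inline before the command completes, while an asynchronous one merely queues the request and lets a background scheduler reclaim space off the latency-sensitive path. The function names and data structures here are illustrative assumptions, not VMware or vendor APIs.

```python
from collections import deque

pending = deque()     # queued asynchronous UNMAP requests
freed_pages = set()   # pages that have actually been reclaimed

def unmap_sync(page: int) -> None:
    # Synchronous: space is reclaimed before the command returns,
    # so the cost lands on the I/O path.
    freed_pages.add(page)

def unmap_async(page: int) -> None:
    # Asynchronous: just record the request and return immediately.
    pending.append(page)

def run_scheduler() -> int:
    """Background pass that drains queued UNMAPs; returns pages reclaimed."""
    count = 0
    while pending:
        freed_pages.add(pending.popleft())
        count += 1
    return count

unmap_sync(1)
unmap_async(2)
assert 2 not in freed_pages   # not yet reclaimed; host saw no delay
run_scheduler()
print(sorted(freed_pages))    # [1, 2]
```

The trade-off is visible even at this scale: the synchronous path gives immediate reclamation at the cost of host-visible latency, while the asynchronous path defers the work, which is why an array's ability to absorb the synchronous case matters.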
It is difficult to devise a meaningful benchmark for the effects of this type of offload; the proof will be in customer experience. One customer reported a 20% reduction in processor and memory usage, along with a 300% to 400% improvement in VM provisioning time, through the use of ESX 4.1 VAAI on a VSP, in contrast to a non-VAAI-enabled storage system.
Using storage to offload host processing and memory to make applications more efficient is a positive direction, as Chris would agree. Storage must be more than a container; today's storage systems must become storage computers, and that requires a change in the way they are architected.
What are your thoughts on the need for storage to offload the host?
Glad to hear HDS was not impacted by this problem. Is there going to be an HDS firmware release supporting UNMAP soon, or is VMware still refusing to certify anyone?
SCSI UNMAP was actually my favorite feature, and Storage DRS does not make a lot of sense without it.