I recently spoke to two storage analysts about the effect of server virtualization on storage resources. Both agreed that virtual machines will multiply the I/O workload coming from the VM hardware platform by roughly the number of VMs it hosts, and that the resulting I/O will be very random.
They agreed that the increasing workload requires a storage system that can scale up as well as scale out, as I noted in my previous post on Scale up or Scale out. Their next question was about the need for cache when I/O loads are random, since random I/Os are unpredictable and do not benefit from cache prefetch or cache reuse.
I believe there are at least three reasons for cache. The first reason is to mask the latency of the media on the back end through reuse and prefetch, as noted above. When the I/O is very random, reuse and prefetch do little, and the only way to address back-end latency is with faster media such as Flash disks, or with wide striping, which is available with most thin provisioning implementations. Thin provisioning also helps to make more efficient use of expensive flash disks by eliminating the waste of allocated but unused capacity.
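To illustrate why reuse helps so little when I/O is very random, here is a minimal sketch that replays two synthetic workloads through a simple LRU cache and compares hit ratios. Everything in it is an assumption made for illustration: the function names, the cache size, the address-space size, and the choice of LRU itself. It is not a model of any particular storage system's cache.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(accesses, cache_blocks):
    """Replay a list of block addresses through an LRU cache of
    `cache_blocks` entries and return the fraction of accesses that hit."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for block in accesses:
        total += 1
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


def random_workload(n, address_space, seed=0):
    """Hypothetical random workload: n uniform accesses over a large address space."""
    rng = random.Random(seed)
    return [rng.randrange(address_space) for _ in range(n)]


def hot_set_workload(n, hot_blocks):
    """Hypothetical cache-friendly workload: n accesses cycling over a small hot set."""
    return [i % hot_blocks for i in range(n)]


if __name__ == "__main__":
    cache_blocks = 1_000
    friendly = lru_hit_ratio(hot_set_workload(10_000, 100), cache_blocks)
    scattered = lru_hit_ratio(random_workload(10_000, 1_000_000), cache_blocks)
    print(f"hot-set hit ratio: {friendly:.3f}")
    print(f"random hit ratio:  {scattered:.3f}")
```

With uniformly random accesses, the hit ratio tends toward the ratio of cache size to address space, which is why faster media or wide striping has to carry the load instead.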
The second reason for cache memory is the support of functions like snapshots, copy on write, distance replication, tiering, and thin provisioning. All these functions require access to control data or metadata, which needs to be stored in cache memory, and they are requiring more and more memory for that metadata. This metadata has to be mirrored in cache and/or stored on disk. Thin provisioning is a very heavy user of metadata. One competitor has 8 GB of cache in each drawer. When you read their user manual, they say that 3 GB of that 8 GB is used for control data, leaving only 5 GB for user data. Another vendor requires 143 KB for every thin device and 8 KB for every GB of reported thin device size. Not only does this take cache away from production data, but it also impacts cache performance through the contention of production data with metadata traffic.
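The metadata figures quoted above can be turned into a quick back-of-the-envelope calculation. The sketch below is only that: the function names are hypothetical, the device counts in the usage example are made up, and it assumes 1 KB = 1024 bytes and 1 GB = 1024^3 bytes. The only inputs taken from the text are 143 KB per thin device, 8 KB per GB of reported size, and the 3 GB of 8 GB split.

```python
KB = 1024
GB = 1024 ** 3


def thin_metadata_bytes(num_devices, reported_gb_per_device,
                        per_device_kb=143, per_gb_kb=8):
    """Estimate metadata for thin devices using the per-device and
    per-GB figures quoted in the post (defaults), in bytes."""
    per_device = per_device_kb * KB + reported_gb_per_device * per_gb_kb * KB
    return num_devices * per_device


def user_cache_fraction(total_cache_gb, control_gb):
    """Fraction of cache left for user data after control data is set aside."""
    return (total_cache_gb - control_gb) / total_cache_gb


if __name__ == "__main__":
    # Hypothetical configuration: 1,000 thin devices reporting 2,048 GB each.
    meta = thin_metadata_bytes(num_devices=1_000, reported_gb_per_device=2_048)
    print(f"thin provisioning metadata: {meta / GB:.2f} GB")

    # The drawer example from the post: 3 GB of 8 GB used for control data.
    print(f"cache left for user data: {user_cache_fraction(8, 3):.1%}")
```

Even with modest per-device numbers, the metadata footprint grows linearly with the reported capacity, so it competes directly with production data whenever both share the same cache.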
The USP V addresses this problem by servicing control data out of a separate control store memory, which is accessed over buses that are separate from the switched connections to the data store. The result is no contention and more efficient operation of functions like thin provisioned wide striping, which is important for the support of random I/O.
The third function of cache is to provide tight coupling of the storage resources so that they can be focused on the processing of an I/O load. In this case I am talking about the ability to scale up as well as out, with tight coupling through a global cache. As a virtual server platform scales up by adding more virtual machines, migrating to a faster multicore processor, or switching from 4 Gb/s FC to 8 Gb/s FC or 10 Gb/s FCoE, that virtual server can access more of the storage port processors, cache, and back-end directors through a single global cache image. Loosely coupled modular storage systems will not be able to scale up to provide this capability.
So random I/O may not benefit directly from cache reuse and prefetch, but it does benefit from metadata caching and from the tight coupling of all the storage system’s resources, such as port processors, control memory, production cache, and back-end directors, to service a random I/O request.