Storage Performance Efficiency
by Hu Yoshida on Feb 17, 2012
This is a continuation of my series on storage efficiencies, looking specifically at storage performance efficiencies. Here are just a few considerations.
Drive Performance Efficiency
While large caches in front of disk drives help to mask the performance of mechanical disks with cache hits, two operations are still limited by the mechanical performance of disks: cache misses, where the data needs to be retrieved from disk, and cache destaging, where data needs to be written from cache to disk. Disk drive performance is determined by rotation time, seek time, and RAID configuration. In the past, disks would be tiered by rotation speed and/or RAID type. RAID 1 with 15K RPM disks would be the highest performing tier, and slower RPM disks using RAID 5 were used for lower storage tiers. Low cost, high capacity SATA disks were used for the lowest tier.
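As a rough illustration of how rotation and seek time bound what a single drive can do, here is a minimal Python sketch. The average rotational latency (half a revolution) is standard arithmetic; the seek times and the function names are assumptions picked for the example, not vendor figures, and transfer time, queuing, and RAID overhead are ignored.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def estimated_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random IOPS for one drive, ignoring transfer time and queuing."""
    service_ms = avg_seek_ms + avg_rotational_latency_ms(rpm)
    return 1000.0 / service_ms


# Seek times below are illustrative assumptions, not measured values.
for label, rpm, seek_ms in [("15K RPM", 15_000, 3.5), ("7.2K RPM", 7_200, 8.5)]:
    latency = avg_rotational_latency_ms(rpm)
    iops = estimated_random_iops(rpm, seek_ms)
    print(f"{label}: {latency:.2f} ms rotational latency, ~{iops:.0f} random IOPS")
```

Under these assumptions the 15K RPM drive services a little more than twice as many random I/Os as the 7.2K RPM drive, which is why rotation speed was a natural basis for tiering.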
For high performance applications, a volume would be manually spread across many RAID groups in order to wide stripe the access activity across many disk arms. Some users would underpopulate the disk, restricting the allocation of data to a limited number of tracks in order to reduce the seek time. While Flash drives have been available for high performance applications, their higher cost relative to disk drives restricted their use: most volumes placed on Flash drives would see only a small portion of the volume being accessed while the rest of the volume sat idle on this expensive real estate. However, recent changes in technology have altered the need for many of these practices and provided ways to make more efficient use of Flash drives.
Wide Stripe Performance
Hitachi Dynamic Provisioning (HDP) creates pools of pages that are spread across many RAID groups. When a volume is created, it is written a page at a time across the width of the pool. This automatically provides wide stripe performance. The following chart was done by the Enterprise Strategy Group:
- The red line is the response time for 8 logical devices, each mapped to a single RAID 10 group consisting of 2 drives mirrored to 2 drives, for a total of 32 drives.
- The yellow line shows the performance of the 8 logical devices that are manually wide striped across all 32 drives through the use of a software logical volume manager.
- The green line shows the same 8 logical devices wide striped automatically by the HDP software in VSP.
The wide striping shows a 716% improvement in transactions at 20 ms response time over logical devices that are not wide striped. There is nothing magic about wide striping; the difference between doing it automatically and doing it manually is operational efficiency. To widen the stripe manually, you need to stop the application, unload the data, add another RAID group, and then reload the data. With HDP we just add the RAID group and rebalance the stripe in the background without any interruption to the application.
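To make the idea concrete, here is a toy Python model of a pool that places pages round-robin across RAID groups and rebalances when a group is added. It is only a sketch of the general technique: the `Pool` class, its method names, and the rebalancing rule are hypothetical and do not describe how HDP is actually implemented.

```python
from collections import Counter


class Pool:
    """Toy pool that places pages round-robin across RAID groups."""

    def __init__(self, raid_groups):
        self.raid_groups = list(raid_groups)
        self.placement = {}  # (volume, page_no) -> RAID group name
        self._next = 0

    def write_page(self, volume, page_no):
        """Place a page on first write; later writes reuse the same location."""
        key = (volume, page_no)
        if key not in self.placement:
            group = self.raid_groups[self._next % len(self.raid_groups)]
            self.placement[key] = group
            self._next += 1
        return self.placement[key]

    def pages_per_group(self):
        counts = Counter(self.placement.values())
        return {group: counts[group] for group in self.raid_groups}

    def add_raid_group(self, name):
        """Add a group, then move pages from the fullest groups onto it."""
        self.raid_groups.append(name)
        fair_share = len(self.placement) // len(self.raid_groups)
        for _ in range(fair_share):
            counts = Counter(self.placement.values())
            donor = counts.most_common(1)[0][0]
            key = next(k for k, g in self.placement.items() if g == donor)
            self.placement[key] = name


pool = Pool(["RG0", "RG1", "RG2", "RG3"])
for vol in range(8):
    for page in range(10):
        pool.write_page(f"LDEV{vol}", page)
print(pool.pages_per_group())  # 20 pages on each of the 4 groups

pool.add_raid_group("RG4")
print(pool.pages_per_group())  # 16 pages on each of the 5 groups
```

The application-facing address `(volume, page_no)` never changes during the rebalance; only the mapping behind it does, which is what lets the stripe be widened without unloading and reloading the data.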
Thin Provisioning Performance
HDP also provides thin provisioning by provisioning capacity only as it is actually written to the pool. This not only saves capacity that would otherwise be allocated but unused, it also makes functions like snapshots, copies, and replication more efficient by eliminating the need to replicate unused space.
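A minimal sketch of allocate-on-write, assuming a fixed page size, shows why copy and replication work shrinks along with capacity. The `ThinVolume` class, its methods, and the 42 MB page size used here are illustrative assumptions, not a description of the product's internals.

```python
PAGE_SIZE_MB = 42  # assumed page size, for illustration only


class ThinVolume:
    """Toy thin volume: pages are allocated only when first written."""

    def __init__(self, virtual_size_mb):
        # Round up so the whole virtual size is addressable.
        self.virtual_pages = -(-virtual_size_mb // PAGE_SIZE_MB)
        self.allocated = {}  # page index -> last data written

    def write(self, offset_mb, data):
        page = offset_mb // PAGE_SIZE_MB
        if not 0 <= page < self.virtual_pages:
            raise ValueError("write beyond end of volume")
        self.allocated[page] = data

    def allocated_mb(self):
        return len(self.allocated) * PAGE_SIZE_MB

    def pages_to_replicate(self):
        """Only written pages need to be copied or replicated."""
        return sorted(self.allocated)


vol = ThinVolume(virtual_size_mb=42_000)  # presents 1000 pages to the host
for offset in (0, 10, 100, 4200):
    vol.write(offset, b"data")

print(vol.virtual_pages)         # 1000
print(vol.allocated_mb())        # 126
print(vol.pages_to_replicate())  # [0, 2, 100]
```

In this example a copy or replication pass touches 3 pages instead of 1000, because the unwritten pages were never allocated in the first place.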
Page Level Tiering Performance
With VSP we have taken paging one step further by providing page level tiering, or Hitachi Dynamic Tiering (HDT). HDT now makes it economical to use Flash drives since we no longer need to store a whole volume in the Flash tier. Page level tiering retains the hot pages of the volume on a small amount of SSD while the bulk of the data that is less active is migrated down to lower cost tiers of HDD. The application experiences SSD-like performance for most of its data requests without having to pay for a whole volume’s worth of expensive SSD. Page level tiering can also be used to optimize the use of different levels of HDD performance.
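The economics can be sketched with a simple ranking: count accesses per page, keep the hottest pages on SSD, and see what fraction of I/O the SSD tier then serves. This is a simplified illustration; the function names and the access counts are made up for the example, and the real HDT monitoring and relocation policy is not described in this post.

```python
def place_pages(access_counts, ssd_page_capacity):
    """Return (ssd_pages, hdd_pages): the hottest pages go to SSD."""
    ranked = sorted(access_counts, key=lambda page: (-access_counts[page], page))
    return set(ranked[:ssd_page_capacity]), set(ranked[ssd_page_capacity:])


def ssd_hit_fraction(access_counts, ssd_pages):
    """Fraction of all accesses that land on pages held in the SSD tier."""
    total = sum(access_counts.values())
    if total == 0:
        return 0.0
    return sum(access_counts[page] for page in ssd_pages) / total


# Hypothetical skewed workload: 2 hot pages out of 10.
counts = {0: 500, 1: 300}
counts.update({page: 25 for page in range(2, 10)})

ssd, hdd = place_pages(counts, ssd_page_capacity=2)
print(sorted(ssd))                    # [0, 1]
print(ssd_hit_fraction(counts, ssd))  # 0.8
```

With this assumed skew, holding 20% of the pages on SSD serves 80% of the accesses, which is the effect that makes a small Flash tier worthwhile.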
Serial Attached SCSI Performance
Serial Attached SCSI (SAS) is a performance differentiator versus Fibre Channel Arbitrated Loop (FCAL) for drive attachment. SAS is a point-to-point architecture, which eliminates the contention between drives on an arbitrated loop. SAS is available with transfer speeds of 3Gbps and 6Gbps while FCAL for disk attachment is currently at 4Gbps. Because it is point-to-point, a 3Gbps SAS connection can provide better performance than a 4Gbps FCAL. Look for storage systems with SAS drives if performance is a concern.
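A back-of-the-envelope comparison shows why the lower link speed can still win. The sketch below assumes an idealized model in which a loop's bandwidth is shared evenly among the drives that are active at once, while each point-to-point link is dedicated to its drive; protocol overhead, arbitration time, and any sharing further up the SAS topology are ignored, and the drive count is an assumption.

```python
def per_drive_gbps_shared_loop(loop_gbps, active_drives):
    """Idealized share of a loop's bandwidth when drives contend evenly."""
    if active_drives < 1:
        raise ValueError("need at least one active drive")
    return loop_gbps / active_drives


def per_drive_gbps_point_to_point(link_gbps):
    """Each drive has its own link, so it is not divided among drives."""
    return link_gbps


active = 8  # assumed number of drives transferring at the same time
print(per_drive_gbps_shared_loop(4.0, active))  # 0.5
print(per_drive_gbps_point_to_point(3.0))       # 3.0
```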
VSD Global Processor Performance
Aside from faster processors, a major performance difference in VSP is the addition of a pool of global Virtual Storage Directors (VSD), which are quad core Intel processors that are separate from the Front End Directors (FED) and Back End Directors (BED), the front and back end RISC processors that handle the specialized I/O processing. This enables VSP to offload additional functions like dynamic provisioning, page level tiering, replication, copies, and application/server functions that come through APIs like VAAI, so that they do not contend for resources with the front and back end I/O processors. Other storage systems add all these additional functions to the I/O processors, which leads to contention for processor resources and impacts performance.
Storage Port Performance
I/O requests come to the storage controller through the front-end ports that are connected to cache. Since VSP has a global cache, which can be accessed by multiple ports, applications can use alternate paths to load balance across the ports for improved performance. The AMS does not have a global cache but can load balance the I/O across the dual controller caches by managing the LUN ownership. Other dual controller systems must assign LUN ownership to each controller and cannot load balance I/O like the AMS. The AMS and VSP can also assign port priorities to manage port performance.
The most important performance element of a storage system is cache, which masks the latencies of the back end drives. The effectiveness of cache depends on the workload, but the larger the cache, the more opportunity there is to service the I/O request at cache speeds rather than drive speeds. The VSP cache is a write-protected cache, which means that only the writes are mirrored in cache to prevent data loss in the event of a cache module failure. This means that more cache is available for use than in other controllers that have to mirror the entire cache, including both reads and writes. The VSP cache can be partitioned so that a higher speed server or application does not dominate the entire cache at the expense of other users. The VSP cache is also designed to support the virtualization of other storage systems. It can do a cache write through so that the write is mirrored in the external storage system and more cache is available in VSP. In many cases the performance of externally virtualized storage is enhanced by the larger cache and load balancing features of VSP. Caches also have algorithms to detect access patterns, which they use to optimize staging and destaging of data.
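Two pieces of arithmetic sit behind these points, sketched here in Python under simple assumptions. The first is the usual weighted average of cache and drive service times by hit ratio. The second is a simplified capacity model in which mirrored data occupies two copies: if everything is mirrored, half the cache is usable, while if only the write share is mirrored, more of it is. The function names, latencies, cache size, and write fraction are all assumptions for illustration, not VSP specifications.

```python
def effective_response_ms(hit_ratio, cache_ms, drive_ms):
    """Average response time when hits are served from cache and misses from drives."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * drive_ms


def usable_cache_gb(total_gb, write_fraction, mirror_reads):
    """Usable capacity when mirrored data is held as two copies.

    With usable capacity u and write share w, writes take 2*w*u and reads
    take (1-w)*u, so u * (1 + w) = total. Mirroring reads too gives u = total / 2.
    """
    if mirror_reads:
        return total_gb / 2
    return total_gb / (1 + write_fraction)


# Assumed figures: 0.5 ms cache service time, 6 ms drive service time.
for hit_ratio in (0.5, 0.9):
    print(hit_ratio, round(effective_response_ms(hit_ratio, 0.5, 6.0), 2))

# Assumed 256 GB of cache with 25% of cached data being pending writes.
print(usable_cache_gb(256, 0.25, mirror_reads=True))   # 128.0
print(round(usable_cache_gb(256, 0.25, mirror_reads=False), 1))  # 204.8
```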
These are just a few of the higher-level functions that deliver storage performance efficiencies. I would welcome any additional efficiencies that you might identify.
For other posts on maximizing storage and capacity efficiencies, check these out: http://blogs.hds.com/capacity-efficiency.php