How do you increase storage utilization?
by Hu Yoshida on Mar 16, 2010
A while back we did a storage assessment for a non-HDS customer and showed him that his storage utilization was actually around 30%, which is typical in most accounts. While that was not surprising to the operations people, it was a surprise to the financial people, who could not understand why 70% of their storage capacity, in this case several hundred TBs, was not being utilized.
Management was embarrassed, and immediately fingers were being pointed at the storage architect and storage administrators, who in turn pointed to the application users who were asking for far more storage than they appeared to need. Management decided that they didn’t need to buy more storage and decreed that storage utilization should be managed to 60%. They decided to stay with their current vendor and buy that vendor’s proprietary software tools to better manage storage utilization. IT operations and storage administrators had to work overtime to implement the tools, monitor allocation and usage, enforce the allocation mandates, and recover from increasing outages caused by out-of-space conditions.
In my view that was the wrong decision, since mandating an increase in utilization by working harder was not going to solve the problem. Low storage utilization has been a standard practice to reduce operational costs and provide flexibility. In their case, they bought storage on a three-year cycle, so they had to have a lead-time buffer of storage capacity that would hold them until the next acquisition cycle. The IT operations people knew that additional capacity was needed beyond what their users requested, for administration of backup, business continuity, development test, data transformation, and data mining, so they added to the capacity buffer. Application users knew that they needed headroom to grow their applications, and bad things happen if you run out of capacity. Lacking a crystal ball to accurately predict their growth, they requested more storage than they expected to use. Storage administrators who wanted to avoid those midnight and weekend calls to shift storage around when someone ran out of capacity would also add to that buffer. Low utilization is one way to manage growth in a dynamic business environment. However, increasing utilization by working harder and micromanaging allocations could lead to more costs and less business agility.
A more effective way to increase utilization is to implement storage virtualization and Dynamic Provisioning. Dynamic Provisioning eliminates the waste of allocated but unused space and allows application users to over-allocate as much as they think they need, so that they never run out of capacity. Storage virtualization enables existing storage and lower-tier storage to contribute to Dynamic Provisioning pools through virtualization in an enterprise storage controller. With storage virtualization there is no vendor lock-in for storage capacity. Storage virtualization enables the dynamic movement or reallocation of storage capacity during prime time, eliminating the need for midnight and weekend callouts. Storage virtualization and Dynamic Provisioning also eliminate the need for three- to five-year lead-time buffers for storage acquisition. The lead-time buffers become virtual, and you can add storage capacity incrementally, as you need it, taking advantage of the 30% to 35% yearly price erosion in storage capacity costs.
When you increase utilization you run a greater risk of outages caused by out-of-space conditions, especially if you have silos of storage. If you have 10 storage frames running at 60% utilization, there is a good chance that you will run out of capacity on one of those frames. Even if you did thin provisioning in each of those frames, one of them could still run out of capacity. That is where storage virtualization can help, by pooling all the frames into one common pool of storage capacity. The excess capacity in any or all of the other storage frames can be used to absorb a peak demand from any application connected to the pool.
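The silo-versus-pool point can be illustrated with a toy Monte Carlo sketch. All numbers here are invented for illustration: 10 frames of 50 TB each, demand averaging the 60% target but landing unevenly across frames. A silo outage occurs when any one frame overflows; a pool outage only when total demand exceeds total capacity.

```python
# Toy simulation: outage risk for 10 siloed frames vs. one virtualized
# pool, at the same average 60% utilization. Numbers are illustrative.
import random

random.seed(1)

FRAMES = 10
FRAME_TB = 50        # raw capacity per frame (made-up figure)
MEAN_DEMAND = 30     # 60% of 50 TB, the utilization target
SPREAD = 10          # spikes land unevenly from frame to frame

def trial():
    demands = [random.gauss(MEAN_DEMAND, SPREAD) for _ in range(FRAMES)]
    siloed = any(d > FRAME_TB for d in demands)   # some one frame overflows
    pooled = sum(demands) > FRAMES * FRAME_TB     # the whole pool overflows
    return siloed, pooled

siloed_outages = pooled_outages = 0
for _ in range(10_000):
    s, p = trial()
    siloed_outages += s
    pooled_outages += p

print(f"siloed outage rate: {siloed_outages / 10_000:.1%}")
print(f"pooled outage rate: {pooled_outages / 10_000:.1%}")
```

With these assumed numbers the siloed configuration hits an out-of-space condition in a meaningful fraction of trials, while the pooled configuration almost never does, because a spike in one frame is absorbed by slack in the others.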
In the hype over thin provisioning, some analysts were claiming that thin provisioning could enable users to run their storage at 60% to 80% utilization of real capacity. I would caution against running utilization of real capacity at that level, especially if it is not part of a virtualized pool of storage. For example, if you have a 50 TB storage frame and you have it at 80% utilization with thin provisioning, you only have 10 TB of headroom to support that new business application or a sudden spike in demand when something spooks the currency markets on the other side of the globe. Remember, your users think they have 60 or 80 TB allocated to their applications when the real capacity is only 40 TB. So thin provisioning without the pooling benefit of storage virtualization can be risky when you start to drive higher utilizations.
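The arithmetic in that example is worth making explicit. A small sketch, using the paragraph's own figures (50 TB frame, 80% utilization, roughly 70 TB promised to users):

```python
# Worked version of the paragraph's numbers: a 50 TB frame, thin
# provisioned, running at 80% utilization of real capacity, with
# users believing they hold 60-80 TB (70 TB assumed here).
raw_tb = 50
utilization = 0.80
allocated_tb = 70                     # illustrative midpoint of 60-80 TB

used_tb = raw_tb * utilization        # 40 TB of data actually written
headroom_tb = raw_tb - used_tb        # only 10 TB left for spikes
overcommit = allocated_tb / raw_tb    # 1.4x promised vs. real capacity

print(f"used: {used_tb} TB, headroom: {headroom_tb} TB, "
      f"overcommit: {overcommit:.1f}x")
```

The gap between the 70 TB users think they have and the 10 TB of real headroom is exactly the risk the paragraph describes.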
Even with storage virtualization and Dynamic Provisioning, I would recommend staying below 60% utilization to avoid running out of capacity before you are able to acquire and provision more storage. Another thing to remember is that not all file systems or databases are thin-provisioning friendly, and if your thin provisioning solution does not have the ability to reclaim deleted file space, you will have to defrag or your thin provisioning pool will be eaten up by deleted-file holes.
Storage capacity is cheap and getting cheaper, while operating costs continue to increase. I see nothing wrong in using a buffer of storage capacity as a management tool to reduce operational costs. Yes, you can certainly improve utilization with Dynamic Provisioning and storage virtualization, but don’t go overboard. Leave yourself enough headroom for growth and the unexpected spike in demand.
Comments (5)
I think HDS should recommend that customers classify applications first before recommending Capacity Depletion Limits and Over Provisioning Limits for Dynamic Provisioning. One could use something like the following:
1. Predictable and controlled growth rate: HR databases
2. Predictable but uncontrolled growth rate: video editing and archival
3. Unpredictable but controlled growth rate: OLTP databases with a web interface
4. Unpredictable and uncontrolled growth rate: home directories without quotas
One could run applications that fall in class 1 above at 70% utilization and applications that fall in class 4 at around 40%, with classes 2 and 3 somewhere in between.
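The commenter's classification boils down to a small lookup from two growth traits to a utilization ceiling. A minimal sketch, where the 70% and 40% figures are the commenter's and the middle values are an assumed interpolation:

```python
# Map (predictable, controlled) growth traits to a suggested utilization
# ceiling. 0.70 and 0.40 come from the comment; 0.55 for the middle
# classes is an assumed midpoint, not from the source.
UTIL_CEILING = {
    (True,  True):  0.70,  # 1: predictable, controlled (HR databases)
    (True,  False): 0.55,  # 2: predictable, uncontrolled (video archival)
    (False, True):  0.55,  # 3: unpredictable, controlled (web OLTP)
    (False, False): 0.40,  # 4: unpredictable, uncontrolled (home dirs)
}

def suggested_ceiling(predictable: bool, controlled: bool) -> float:
    """Return the utilization ceiling for an application class."""
    return UTIL_CEILING[(predictable, controlled)]

print(suggested_ceiling(True, True))    # class 1
print(suggested_ceiling(False, False))  # class 4
```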
[...] Yoshida has an interesting view on his recent post discussing storage utilisation rates. His concluding remark suggests running at a maximum of 60% [...]
Thin-friendly stacks offer the best approach to optimizing the current crop of thin provisioning (TP) systems. At best, a TP system that supports zero-page reclamation can recover some of the storage allocated to a filesystem post-migration, but at that point zero-page reclamation ceases to be useful.
Zero-page reclamation cannot recover blocks that a filesystem has used but is no longer using, since those blocks do not contain only nulls.
Filesystems do not typically write lots of blocks containing nulls; it certainly isn’t routinely done during, say, a file deletion, and copy-on-write filesystems like ZFS do not want to punish performance further by writing nulls to the blocks they have just copied from.
Defragmentation should not be necessary, and in any case it does not always help, for example where a filesystem stripes small amounts of metadata across the volume, particularly with large-block TP systems.
What is needed are migration tools that copy only the blocks being used by the filesystem: no blocks containing nulls, and no blocks that have been used but are not currently in use (which zero-page reclamation cannot recover). This kind of migration technology largely removes the need for zero-page reclamation.
Secondly, you need a filesystem that is efficient at reusing deleted blocks, keeps its data as contiguous as possible, and does not scatter metadata everywhere.
Finally, you need a filesystem that can provide a list of deleted blocks (not containing nulls) to the TP system, and a TP system that can reclaim these non-null but unused blocks.
Without all three of these capabilities, TP systems will underperform, and the underperformance will get worse over time as non-thin-friendly filesystems create and discard more and more blocks that the TP system cannot recover even though the filesystem is no longer using them.
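The degradation described above can be modeled with a toy simulation. This is a sketch under invented assumptions: a filesystem randomly writes and deletes blocks on a thin pool, fresh writes always go to new blocks (the non-thin-friendly case), and the only variable is whether deleted blocks are handed back to the pool.

```python
# Toy model: physical allocation in a thin pool with and without
# deleted-block reclamation. Block behavior is invented for illustration.
import random

random.seed(0)

def simulate(reclaim_deleted: bool, steps: int = 1000):
    live = set()        # blocks the filesystem is actually using
    allocated = set()   # blocks the thin pool has backed with real storage
    next_block = 0
    for _ in range(steps):
        if live and random.random() < 0.5:
            b = random.choice(tuple(live))
            live.remove(b)             # file deleted...
            if reclaim_deleted:
                allocated.discard(b)   # ...and the pool gets the block back
        else:
            # Non-thin-friendly filesystems tend to allocate fresh blocks
            # rather than reuse deleted ones, so new writes get new blocks.
            live.add(next_block)
            allocated.add(next_block)
            next_block += 1
    return len(live), len(allocated)

live_a, alloc_a = simulate(reclaim_deleted=False)
live_b, alloc_b = simulate(reclaim_deleted=True)
print(f"no reclaim: {live_a} live blocks, {alloc_a} allocated")
print(f"reclaim:    {live_b} live blocks, {alloc_b} allocated")
```

Without reclamation the pool's allocation only ever grows, ending far above the live data; with reclamation, allocation tracks live data exactly, which is the gap the commenter is pointing at.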
VxFS/VxVM from Symantec ticks all of these boxes. Most other host stacks do not. But I am biased: I work for Symantec.
I somewhat tend to agree with Andrew.
I guess NAS vendors are in a better position than block storage vendors as far as thin provisioning is concerned. The NAS device has awareness of the filesystem metadata and can reclaim unused space far more efficiently.
They are also in a better position with encrypted filesystems and software dedupe, e.g. CommVault.
What is needed from block storage vendors is better vertical integration with operating systems. Maybe something like a tweak to the SCSI protocol, where the OS communicates the LBA/CHS address of the metadata to the RAID controller via a special SCSI command, and the RAID controller then stores the pool address of the metadata in control memory. The RAID controller can then better understand the unused space and reclaim it.
Zero Page Reclaim may not work with encrypted filesystems or with software dedupe. Also, what if I was into astrophotography and stored photos and videos of deep-sky objects (DSOs) such as M45, where most of the image is RGB 000000? Would ZPR go ahead and reclaim the zeroes?
[...] posting my blog on increasing storage utilization with Dynamic Provisioning, I visited a very large customer who was in the process of implementing [...]