Data Center Transformation Part 4: Dynamic Provisioning
by Hu Yoshida on Jul 27, 2010
This is the fourth part in my series on data center transformation. My last post was on storage transformation and the impact of storage virtualization on the data center. In this post I will address the impact of Dynamic Provisioning on the Data Center.
The history of storage provisioning
The provisioning of storage has been a major effort since the introduction of random access disk drives in 1956. Prior to disk drives, data was stored on punched cards or magnetic tapes, and provisioning storage was a matter of throwing more cards in the reader or hanging another reel of tape. Disks were more difficult to provision: they were very expensive and had limited capacity, so they were only used for applications that needed random access to data, and the cost was shared by allowing multiple users to consume parts of the capacity.
Originally, only mainframes could afford to use DASD (Direct Access Storage Devices), and IBM developed catalogs, naming conventions, and a Job Control Language that you used to specify your allocation of storage with a Data Definition statement. This specified a data set name, the volume serial number of the storage you wanted to use, primary and secondary units of capacity that you expected to use, and disposition of the extents after use, whether you wanted to keep the data set, release the capacity that you didn’t need, catalog it, or delete it. Later, this was simplified through the use of Systems Managed Storage which allowed the user to define data classes and storage groups and set policies for storage and data management. Then non-mainframe systems started to use external disk storage in the early 1990s, but without the disciplines that were developed by IBM for the mainframes.
At first it seemed simpler. You did not have to work with job control languages and worry about how much space you used since it was not shared in those days. But as open systems began to explode and Storage Area Networks (SANs) consolidated more servers to larger and larger storage systems, life became much more complicated, and storage administrators were required to provision storage on behalf of many applications that shared the same storage resources. Users estimated how much capacity they needed for an application, then doubled it to make sure they had enough, before they asked a storage administrator for an allocation of storage.
The storage administrator would look at his spreadsheets to see where capacity was available, format the disks, create a RAID group, carve out the LUNs, concatenate the LUNs for a specific volume size, and stripe the LUNs across multiple spindles, or limit the allocation to certain bands on a disk to minimize seek time between tracks. The storage administrator might even add some more buffer capacity to avoid getting a request for more capacity in the middle of the night on a weekend, or add replicas of the allocation as offline processing copies for backup, data mining, extract/transform/load, development test, etc. All these copies also replicated the over-allocation. Provisioning storage this way could take days or even weeks. If new storage had to be acquired, it could take months to requisition, acquire, and install the additional capacity.
The advantages of dynamic provisioning
Dynamic provisioning helps to solve these problems by creating a pool of preformatted 42 MB pages which can consist of hundreds of RAID groups or spindles. When a user asks for an allocation, the storage administrator can allocate a virtual volume in a matter of minutes. No storage is actually used until the user starts to write to the allocation. Storage is physically allocated a page at a time and the pages are striped across the width of the provisioning pool which automatically wide stripes the data and gets the maximum number of spindles working on each I/O request. By allocating storage by pages from a common pool of pages, we have the greatest flexibility and agility in provisioning storage for new application requirements. If one application happens to need more storage pages than it had originally requested we can borrow from the pool of pages that other applications may have requested but are not using. When copies are needed for backup or data mining, or data needs to be moved to another tier of storage, the copies and moves only use the pages that are actually used and not the whole allocation that the application had requested. This not only requires less storage but also reduces the operations time by doing “thin” copies and “thin” moves.
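The allocate-on-write behavior described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how the USP V/VM implements it: the class names `ThinPool` and `VirtualVolume`, the round-robin placement, and the pool dimensions are all hypothetical. Only the 42 MB page size comes from the post.

```python
PAGE_SIZE_MB = 42  # page size stated in the post


class ThinPool:
    """Toy pool of preformatted pages spread over several RAID groups."""

    def __init__(self, raid_groups, pages_per_group):
        self.raid_groups = raid_groups
        # Free page count per RAID group (hypothetical bookkeeping).
        self.free = {rg: pages_per_group for rg in range(raid_groups)}
        self.next_rg = 0  # round-robin cursor, an assumed placement policy

    def take_page(self):
        """Hand out one free page, rotating across RAID groups."""
        for _ in range(self.raid_groups):
            rg = self.next_rg
            self.next_rg = (self.next_rg + 1) % self.raid_groups
            if self.free[rg] > 0:
                self.free[rg] -= 1
                return rg
        raise RuntimeError("pool exhausted")


class VirtualVolume:
    """A volume whose advertised size is independent of the pages it holds."""

    def __init__(self, pool, size_mb):
        self.pool = pool
        self.size_mb = size_mb
        self.page_map = {}  # virtual page index -> RAID group holding it

    def write(self, offset_mb):
        """Back the touched page with physical storage on first write only."""
        if not 0 <= offset_mb < self.size_mb:
            raise ValueError("offset outside the volume")
        page = offset_mb // PAGE_SIZE_MB
        if page not in self.page_map:
            self.page_map[page] = self.pool.take_page()
        return self.page_map[page]

    def used_mb(self):
        return len(self.page_map) * PAGE_SIZE_MB


pool = ThinPool(raid_groups=4, pages_per_group=100)
vol = VirtualVolume(pool, size_mb=42_000)  # a large allocation, nothing used yet
for i in range(8):
    vol.write(i * PAGE_SIZE_MB)
print(vol.used_mb(), sorted(set(vol.page_map.values())))
```

In this sketch the volume consumes nothing until the first write, and consecutive pages land on different RAID groups, which is the wide-striping effect in miniature.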
Thin provisioning is only one of the benefits of Dynamic Provisioning. Thin provisioning is not the Holy Grail that some vendors and analysts promote it to be. Not every file system or storage application is thin provisioning friendly. Some write formatting information or metadata across the entire allocation, in which case thin provisioning from the storage side would not be effective. Some file systems may start out thin but become fat very quickly as they add and delete files, unless they tell the storage device where in the file share they deleted a file. Symantec does this with a SCSI “Write Same” command which describes the extent boundaries that were freed up for Dynamic Provisioning to recover.
Dynamic Provisioning with HDS USP V/VM
With storage virtualization and Dynamic Provisioning in the USP V/VM, any external storage that is virtualized can be included in the Dynamic Provisioning pool. Another feature of Dynamic Provisioning is the ability to do Zero Page Reclaim (ZPR). If external storage is virtualized behind the USP V/VM, we can move a “fat” volume from an external storage system into a Dynamic Provisioning pool of pages without disruption to the application. Once this is done we can check the pages to see whether any of them contain only zeros and release those pages back to the pool. Some of our customers have recovered as much as 40% of their existing capacity simply by attaching to the USP V/VM and moving their fat volumes into a Dynamic Provisioning pool. I am not aware of any other thin provisioning solution that can convert fat volumes as easily as that.
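The idea behind Zero Page Reclaim can be illustrated with a short Python sketch: scan each allocated page, and if it holds nothing but zeros, drop its mapping so the page returns to the pool. The function name `zero_page_reclaim`, the dictionary-based page map, and the tiny page size are assumptions made for illustration; they do not describe the actual USP V/VM internals.

```python
PAGE_SIZE = 4096  # bytes; shrunk for the demo, the post cites 42 MB pages


def zero_page_reclaim(page_map, read_page):
    """Unmap every virtual page whose physical page is all zeros.

    page_map  -- dict of virtual page index -> physical page id (hypothetical)
    read_page -- callable returning the bytes of a physical page (hypothetical)
    Returns the virtual page indices that were released.
    """
    reclaimed = []
    for vpage in list(page_map):  # copy the keys, since we delete while scanning
        data = read_page(page_map[vpage])
        if not any(data):  # iterating bytes yields ints; all zero means empty
            del page_map[vpage]
            reclaimed.append(vpage)
    return reclaimed


# A "fat" volume of three pages, only one of which holds real data.
physical = {
    10: bytes(PAGE_SIZE),
    11: b"\x01" + bytes(PAGE_SIZE - 1),
    12: bytes(PAGE_SIZE),
}
page_map = {0: 10, 1: 11, 2: 12}
released = zero_page_reclaim(page_map, physical.__getitem__)
print(released, page_map)
```

A read of an unmapped page would simply return zeros, so the application sees the same data before and after the reclaim.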
Most storage vendors do not use pages for thin provisioning. They will use chunks and chunklets within the larger chunk as their unit of thin provisioning. Using smaller granularity chunks or pages has a cost. Nothing is free. The cost in this case is metadata that describes the pages or chunks and the processing of a mapping table to keep track of the configuration. In the USP V/VM, we keep this metadata in a separate control store so that it does not impact the performance of the data store. Storage systems that do not have a separate control store must process this in their data store or on external disk which creates a performance penalty. In order to reduce the impact of metadata, one method is to define a large chunk and then index into the chunklet for the unit of provisioning. Since our AMS 2000 does not have a separate control store like the USP V/VM, we also use that technique with the AMS 2000.
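To make the metadata cost concrete, here is a rough back-of-the-envelope calculation in Python. The numbers are assumptions chosen only for illustration: a 100 TiB pool, 8 bytes per mapping entry, 42 MB treated as 42 MiB, and the two smaller chunk sizes are hypothetical and not taken from any vendor's product. The point is simply that the mapping table grows in inverse proportion to the unit of provisioning.

```python
MIB = 2**20
TIB = 2**40
ENTRY_BYTES = 8  # hypothetical size of one mapping-table entry


def mapping_entries(pool_bytes, unit_bytes):
    """Number of mapping entries needed to cover the pool (integer ceiling)."""
    return -(-pool_bytes // unit_bytes)


def table_bytes(pool_bytes, unit_bytes, entry_bytes=ENTRY_BYTES):
    """Approximate mapping-table size for a given provisioning unit."""
    return mapping_entries(pool_bytes, unit_bytes) * entry_bytes


pool = 100 * TIB  # hypothetical pool size
units = [
    ("42 MiB page", 42 * MIB),
    ("1 MiB chunk (hypothetical)", 1 * MIB),
    ("16 KiB chunklet (hypothetical)", 16 * 1024),
]
for label, unit in units:
    size_mib = table_bytes(pool, unit) / MIB
    print(f"{label:32s} {mapping_entries(pool, unit):>14,d} entries  {size_mib:10.1f} MiB")
```

Under these assumptions, the finer the granularity, the larger the table that must be kept and searched, which is why where that table lives matters for performance.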
Another feature of Dynamic Provisioning is wide striping of pages for performance. As pages are created for a volume they are striped across the width of the Dynamic Provisioning pool, which may be hundreds of disk spindles. The performance gains that come from striping a volume across many spindles are not magic. Storage administrators have been doing that for years with software volume managers. The difference is that Dynamic Provisioning does this automatically. If an administrator decides to increase the stripe width for a volume, he must stop access to the volume, back it up, reformat and restripe the spindles, then reload the application. With Dynamic Provisioning, we can dynamically add disks to the pool and the data will be automatically restriped across the new pool width with no impact to the application. Although databases are not thin provisioning friendly, our customers find that wide striping performance is a major benefit. Customers running databases such as Oracle with ASM, which provides the ability to expand databases, also find the ability to dynamically provision new capacity in minutes to be another major benefit.
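The restriping step can be pictured as a rebalancing problem: when spindles are added, move just enough pages from the fuller spindles to the new ones so that every spindle carries an even share. The Python sketch below is one simple way to do that, not the algorithm the USP V/VM uses; the function name `rebalance` and the dictionary layout are hypothetical.

```python
def rebalance(placement, spindles):
    """Even out pages across spindles after the pool has grown.

    placement -- dict of page id -> spindle id, updated in place (hypothetical)
    spindles  -- list of all spindle ids now in the pool
    Returns the number of pages that had to move.
    """
    counts = {s: 0 for s in spindles}
    for s in placement.values():
        counts[s] += 1

    base, extra = divmod(len(placement), len(spindles))
    # Let the fullest spindles keep the remainder to avoid needless moves.
    by_load = sorted(spindles, key=lambda s: counts[s], reverse=True)
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(by_load)}

    # One slot per missing page on every under-filled spindle.
    receivers = [s for s in spindles for _ in range(max(0, target[s] - counts[s]))]

    moved = 0
    for page, s in sorted(placement.items()):
        if counts[s] > target[s]:
            dest = receivers.pop()
            placement[page] = dest
            counts[s] -= 1
            counts[dest] += 1
            moved += 1
    return moved


# Twelve pages striped over four spindles, then two spindles are added.
placement = {page: page % 4 for page in range(12)}
moved = rebalance(placement, spindles=[0, 1, 2, 3, 4, 5])
print(moved, placement)
```

Only the surplus pages move, and in a real array such moves would happen in the background while the volume stays online, which is the contrast with the stop, back up, restripe, and reload cycle described above.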
The primary benefit of Dynamic Provisioning is agility, and it is a perfect complement to the agility provided by virtual servers. When a sudden event occurs, triggered by other events across the globe, it may require a data center to spin up a dozen virtual servers in a matter of minutes. But servers are not very useful without storage. Dynamic Provisioning can provide storage resources in a matter of minutes. When we combine this with storage virtualization in the USP V/VM, the benefits of Dynamic Provisioning can be extended to external storage from other vendors, using existing assets to transform the data center. These benefits include:
- Dynamic Provisioning of Storage in a matter of minutes
- Thin Provisioning to eliminate the waste of over allocation
- No performance impact with USP V/VM Architecture
- Thin Copies and Thin Moves to reduce the operational costs of copies and moves
- Zero page reclaim to reclaim wasted allocation from existing storage
- Automatic wide striping for increased performance and automatic tuning
- Combined with storage virtualization, it can enhance existing storage assets with all of the above benefits
Comments (3)
Good insight. I like to read history lessons especially when there are very few who can narrate these.
With software volume managers you do not need to stop/backup etc. while adding spindles. VxVM has the option of online relayout. # vxassist -g dbaseg relayout vol03 ncol=+2
I have heard that the lottery industry still uses tapes for active storage. And then I browsed for information and found an HDS case study. This is for Française Des Jeux, the French national lottery, it says
“With mirroring at two separate sites, the IT staff can now start recovery operations of data cartridges from one side or the other without transferring cartridges, so production systems are never interrupted. Française Des Jeux was also able to significantly reduce maintenance costs.”
Any idea, why they are still living in history?? Or have I heard it wrong??
[...] my last post on Data Center Transformation Part 4: Dynamic Provisioning, where I talked about the benefits of Hitachi Dynamic Provisioning, Lucas Mearian published a [...]
[...] Very fast provisioning (mean time to provision) can be enabled with dynamic provisioning [...]