When is Thin Provisioning Too Thin?
by Hu Yoshida on Jun 14, 2008
In open systems, application users must preallocate the storage capacity they need before they begin to write to storage, and they must request enough capacity so that their application does not run out in the middle of a peak processing period. Since there is no accurate way to predict how much capacity they will need, users request more storage than they expect to need as a standard practice. Thin provisioning is a storage feature that allows a storage system to share a pool of capacity among many users by “thinly” provisioning each request, consuming only the physical capacity that is actually used rather than the capacity that was requested. This takes the guesswork out of provisioning storage and makes more efficient use of capacity by eliminating the waste of allocated but unused space. Since allocated-but-unused capacity is the biggest reason for the low utilization of storage, storage vendors are rushing to implement this capability.
The common approach to thin provisioning is to chop the storage capacity into chunks and allocate physical storage in increments of chunks, or groups of chunks, as it is required. This requires additional overhead to keep track of the chunk assignments compared with normal storage provisioning. It also requires management of the holes these chunks come from. When you allocate chunks to different users with different I/O requests, you get fragmentation of the storage pool, and at some point those fragments have to be cleaned up or the pool fills with unusable fragments. The smaller the chunks, the more overhead is required. One vendor uses a 4 KB chunk. Another offers a choice of chunk sizes from 32 KB to 256 KB. This approach to thin provisioning can create even more complexity and fragmentation, and can have a significant impact on performance and throughput, especially if the overhead occurs in the data path.
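The allocate-on-first-write behavior described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; the class name, the shared free list, and the 32 KB chunk size (one of the sizes mentioned above) are my own choices for the example.

```python
# Hypothetical sketch of chunk-based thin provisioning: a virtual volume
# maps logical chunk indexes to physical chunks only when first written.
CHUNK_SIZE = 32 * 1024  # 32 KB, one of the vendor chunk sizes mentioned

class ThinVolume:
    def __init__(self, pool):
        self.pool = pool   # shared free list of physical chunk ids
        self.map = {}      # logical chunk index -> physical chunk id

    def write(self, offset, length):
        first = offset // CHUNK_SIZE
        last = (offset + length - 1) // CHUNK_SIZE
        for idx in range(first, last + 1):
            if idx not in self.map:          # allocate on first touch
                self.map[idx] = self.pool.pop()
        # ...the actual data write to the mapped chunks would follow

pool = list(range(1000))   # 1000 free 32 KB chunks shared by all volumes
vol = ThinVolume(pool)
vol.write(0, 4096)         # a 4 KB write consumes one whole 32 KB chunk
print(len(vol.map))        # → 1
```

Note that even this toy version needs a per-chunk mapping table, which is exactly the bookkeeping overhead the paragraph above describes, and the smaller the chunk, the more entries that table holds.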
The Hitachi approach is significantly different. Our chunking unit is a 42 MB page that is striped across the width of a thin provisioned storage pool. By this I mean that, if the storage pool consists of 32 disks, the 42 MB page is striped across all 32 disks. This enables all the disk arms to service the I/O that goes against this page. The Enterprise Strategy Group tested this in their labs, comparing the performance of a random-access workload with an 8 KB block size on 8 spindles in a RAID 10 configuration against 32 spindles in four RAID 10 groups, and saw a 716% improvement in throughput with a 118% faster response time at the start point.
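To make the wide-striping idea concrete, here is a minimal sketch of how a 42 MB page could be laid out round-robin across a 32-disk pool. This is an assumption for illustration only (the stripe unit and placement function are mine, not Hitachi's internals); the point is simply that every disk in the pool holds a slice of the page, so every spindle can serve I/O against it.

```python
# Sketch: round-robin striping of one 42 MB page across the pool width.
PAGE = 42 * 1024 * 1024        # 42 MB page
DISKS = 32                     # pool width from the example above
STRIPE_UNIT = PAGE // DISKS    # bytes of the page placed on each disk

def disk_for(page_offset):
    """Which disk in the pool holds a given byte of the page,
    assuming a simple round-robin stripe with the unit above."""
    return (page_offset // STRIPE_UNIT) % DISKS

print(disk_for(0))             # → 0   (first stripe unit, first disk)
print(disk_for(PAGE - 1))      # → 31  (last byte lands on the last disk)
```

Contrast this with a 4 KB or 32 KB chunk, which is too small to spread across 32 spindles at all; it lands on whichever disk (or narrow RAID group) holds it.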
Why did we choose a 42 MB page rather than a 4 KB, 32 KB, or even a 256 KB chunk? Well, those small chunk sizes don’t stripe well across the width of a large thin provisioning pool. By using a uniform page size that spans the width of the storage pool we also minimize fragmentation. We believe that volumes today are much bigger than they were 10 or even 5 years ago. With 500 GB and 1 TB disks today, who is allocating volumes in KB or even MB sizes? Some file systems have a 2 TB limitation, which was fine 10 years ago but is too limiting today. Even at the disk media level, there is a movement to increase the traditional 512 byte sector to a 4 KB sector for increased density, performance, and availability. Not only is a 42 MB page more relevant to today’s volume sizes, it also requires less overhead to provision one 42 MB page than the equivalent 10,500 x 4 KB chunks.
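The overhead claim at the end of the paragraph is simple arithmetic, counting in decimal units as the post does (42 MB taken as 42,000 KB):

```python
# Back-of-envelope check: mapping entries needed to cover one 42 MB span.
page_kb = 42_000          # 42 MB in decimal KB, as the post counts it
chunk_kb = 4              # the smallest vendor chunk size mentioned
entries_per_page = page_kb // chunk_kb

print(entries_per_page)   # → 10500 mapping entries for 4 KB chunks
                          #   versus a single entry for one 42 MB page
```

Every one of those entries has to be created, tracked, and consulted on I/O, which is where the per-chunk overhead comes from.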
In terms of overhead, Hitachi’s thin provisioning overhead is masked because we store the provisioning information in a separate mirrored cache, or control store, rather than in the data cache. That means the work of provisioning is done outside of the data path.
So as more storage vendors join the thin provisioning ranks, look to see how “thin” they are and how that affects performance and manageability. Hitachi’s thin provisioning will be criticized for being “fat” with a 42 MB page size versus the conventional wisdom of 4 KB or 32 KB chunks. Instead of following the crowd with small chunks, we chose this page size so that we could reduce overhead, simplify the provisioning process, and increase performance by wide striping across the width of the thin provisioning pool.
Comments (2)
When you allocate a 42MB chunk, I assume this is 42MB of contiguous LBA ranges for the volume. So if you have an application that happened to write 4K randomly over the disk at say 50MB intervals, would you end up consuming the entire capacity fairly quickly?
Or is this where the “de-fragmentation” you mention comes in?
With smaller chunk sizes you don’t need to de-fragment. You allocate a contiguous 32K LBA range, and from then on that 32K exists in the volume. So a corresponding 4K random write only allocates 32K times the number of writes; it would take a long time to allocate the whole capacity. The 50MB gaps remain unallocated.