The Changing Storage Pyramid
by Hu Yoshida on Aug 9, 2009
The storage pyramid has been used to describe the hierarchy of storage products since the beginning of modern storage history. Basically, it maps storage products against total industry storage capacity. The tip of the pyramid is the fastest-performing and usually most expensive storage product, and the lower levels of the pyramid are occupied by storage products that are progressively lower in cost and performance. Not surprisingly, the lower cost, higher capacity storage products account for more of the total capacity; as data retention periods lengthen, the base of the pyramid broadens.
For most of that history this pyramid has consisted of two types of products: magnetic disk on the top and magnetic tape on the bottom. Several products made brief appearances, such as optical disk, mass storage libraries, and solid state DRAM devices. But for the most part the pyramid's layers were made up of different cost and performance levels of disk and tape, with tape occupying a major part of the area in this pyramid.
Lately, that has begun to change. With the introduction of SATA disks, disk densities are far outstripping the densities of tape, and the lower operational costs associated with online, random access make disk more competitive with tape for long-term retention. New technologies like de-duplication, spin-down, and storage virtualization, which enables policy-based migration for technology refresh, put greater pressure on tape for long-term retention. While tape will not go away for some time, its place in the storage hierarchy is diminishing.
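To make the de-duplication point concrete, here is a minimal sketch of content-addressed block storage: duplicate blocks are detected by hash and stored only once. The class and its behavior are illustrative assumptions, not any vendor's implementation.

```python
import hashlib

class DedupStore:
    """Hypothetical sketch: store each unique data block once, keyed by content hash."""

    def __init__(self):
        self.blocks = {}      # content hash -> block data (one physical copy)
        self.refcounts = {}   # content hash -> number of logical references

    def write(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data           # only a new unique block consumes space
        self.refcounts[digest] = self.refcounts.get(digest, 0) + 1
        return digest                            # the hash serves as the logical address

    def read(self, digest: str) -> bytes:
        return self.blocks[digest]

store = DedupStore()
a = store.write(b"backup image, week 1")
b = store.write(b"backup image, week 1")   # duplicate write from a later backup
assert a == b and len(store.blocks) == 1   # stored only once
```

Because repeated backup cycles mostly rewrite identical blocks, this kind of scheme lets disk hold far more logical data than its raw capacity, which is part of what erodes tape's cost advantage.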
The introduction of flash drives adds a new tip to the storage pyramid. Since flash is much lower in cost than DRAM SSD devices and fits into current disk systems as if it were a magnetic disk, its position in the storage pyramid is assured. However, the cost of flash drives will remain an order of magnitude higher than magnetic disk for some time, so its use will be very selective.
Block and file virtualization with dynamic tiering will enable data to be promoted to flash when flash-level performance is required. The majority of data, for the majority of the time, will not require that level of performance and will be more cost-effectively stored on lower cost magnetic disk.
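A dynamic tiering policy of this kind can be sketched in a few lines: track I/O "heat" per block and give the scarce flash slots to the hottest blocks. The threshold, capacity, and block-naming scheme here are assumptions for illustration only.

```python
from collections import Counter

class TieringPolicy:
    """Illustrative sketch: promote hot blocks to flash, leave cold blocks on disk."""

    def __init__(self, promote_threshold=100, flash_capacity=2):
        self.heat = Counter()                    # I/O count per block
        self.promote_threshold = promote_threshold
        self.flash_capacity = flash_capacity
        self.flash = set()                       # blocks currently resident on flash

    def record_io(self, block):
        self.heat[block] += 1
        self.rebalance()

    def rebalance(self):
        # The hottest blocks that exceed the threshold win the flash slots.
        hot = [b for b, n in self.heat.most_common(self.flash_capacity)
               if n >= self.promote_threshold]
        self.flash = set(hot)

policy = TieringPolicy(promote_threshold=3, flash_capacity=1)
for _ in range(3):
    policy.record_io("lun0:block42")             # a frequently accessed block
policy.record_io("lun1:block7")                  # a rarely accessed block
assert "lun0:block42" in policy.flash            # hot block promoted to flash
assert "lun1:block7" not in policy.flash         # cold block stays on disk
```

The point the sketch makes is the same as the paragraph above: only the small hot fraction of data ever needs flash, so a modest flash tier can serve it while the bulk of capacity stays on cheaper disk.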
So going forward, I believe the storage pyramid will be predominantly made up of magnetic disks with different performance and capacity characteristics. SSD will occupy the top of the pyramid, but will not occupy a large portion of it until the cost per capacity comes down to within several times the cost of high-performance disk. Large-capacity disk will continue to cut into tape, and tape will occupy a smaller and smaller portion of the pyramid.
First, thanks for the interesting post. Most accounts that I've seen are top heavy, with Tier 1 being Enterprise FC (60-80%), Tier 2 being modular FC (20-30%), Tier 3 being SATA (10-20%), and finally tape as Tier 4. Flipping this pyramid over in a petabyte environment is quite a task. HDS does have block virtualization that allows tiering based on performance requirements. Does HDS have any products in the pipeline for file virtualization?
Second, a lot of vendors are talking about tiering within the box. I would like to take this further and propose a tiered cache architecture within the box, so that the rule of 1 GB of cache per 1 TB of usable storage no longer applies. I think HDS did a good job with deferred destage for RAID 5, or what is called RAID 5+.
However, with HDP how does RAID 5+ work? If there are heavy small random I/O writes to the HDP pool, isn't the cache write-pending (WP) level going to be higher with deferred destage and the large stripe size?
So here's what I propose: an array with a Level 1 (Tier 0) cache of 64 GB which is fixed, a Level 2 (Tier 1) cache of 2 TB of SSD configured as a RAID 10 HDP pool, with Level 3 (Tier 3) being either FC or SAS disk and Level 4 (Tier 4) being SATA. The LRU and other cache algorithms work only on the SSD pool, with the user given the option of using the algorithms on L1 cache if required (mainly for mainframe apps like TPF). Also, the watermark for the SSD cache would be user-tunable, like the old days with Graphtrack and the 7700s.
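The proposed two-level cache can be sketched as a small fixed L1 in front of a large LRU-managed SSD pool, with staging and eviction happening only at the SSD level. Everything here (slot counts, the backend callback) is a simplified assumption, not a description of any shipping controller.

```python
from collections import OrderedDict

class TwoLevelCache:
    """Illustrative sketch of the proposal: a fixed L1 DRAM cache in front of
    a larger SSD L2 pool, with LRU eviction applied only at the SSD level."""

    def __init__(self, l1_slots=2, l2_slots=4):
        self.l1 = OrderedDict()   # small, fixed DRAM cache
        self.l2 = OrderedDict()   # large SSD pool, LRU-managed
        self.l1_slots = l1_slots
        self.l2_slots = l2_slots

    def read(self, block, backend):
        if block in self.l1:
            return self.l1[block]                 # L1 hit: DRAM speed
        if block in self.l2:
            self.l2.move_to_end(block)            # L2 hit: SSD speed, refresh LRU
            return self.l2[block]
        data = backend(block)                     # miss: fetch from spinning disk
        self.l2[block] = data                     # stage into the SSD pool
        if len(self.l2) > self.l2_slots:
            self.l2.popitem(last=False)           # LRU eviction from SSD only
        return data

cache = TwoLevelCache()
disk_reads = []
backend = lambda b: disk_reads.append(b) or b"data"
cache.read("blk1", backend)                       # miss: goes to disk, staged to SSD
cache.read("blk1", backend)                       # hit: served from the SSD pool
assert disk_reads == ["blk1"]                     # disk was touched only once
```

Because growth is absorbed by enlarging the SSD pool rather than the fixed L1, this is exactly the property argued for below: capacity upgrades stop forcing DRAM cache upgrades.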
What does this buy the end user ?
1. Every time there is a storage upgrade, there is no need to upgrade the L1 cache.
2. The read pool being large (1 TB) would result in higher read hits; not as fast as L1 cache, but still fast enough from SSD for open systems.
3. Deferred destage is more efficient for HDP pools
Add block virtualization and external storage, and tiering is complete. No cache upgrades each time you add external storage.
Taking this a step further, add auto-flush to tape based on a watermark on capacity utilization of Tier 4 (SATA)…
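A watermark-driven auto-flush like this might look as follows: when SATA-tier utilization crosses a high watermark, migrate the oldest blocks to tape until utilization drops back below a low watermark. The thresholds and oldest-first policy are assumptions for illustration, not HDS defaults.

```python
def flush_to_tape(tier4_blocks, capacity, high_watermark=0.85, low_watermark=0.70):
    """Hypothetical sketch: drain the SATA tier to tape once utilization
    crosses the high watermark, stopping at the low watermark."""
    flushed = []
    if len(tier4_blocks) / capacity >= high_watermark:
        # Flush oldest-first until we are back under the low watermark.
        while len(tier4_blocks) / capacity > low_watermark:
            flushed.append(tier4_blocks.pop(0))   # oldest block goes to tape
    return flushed

sata = [f"blk{i}" for i in range(90)]             # 90 of 100 slots used (90%)
to_tape = flush_to_tape(sata, capacity=100)
assert len(sata) == 70 and to_tape[0] == "blk0"   # drained back to the 70% mark
```

The two-watermark design avoids thrashing: nothing moves until the high mark is hit, and each flush frees enough headroom that the next one is some time away.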
What is your take on this ?