Storage Fusion – StoraFgUeSION – SfTuOsRiAoGnE
by Ken Wood on Oct 14, 2009
“HybridStorage” refers to a mix of flash and magnetic storage media blended together to increase performance and efficiency. This particular effort is part of the HybridStorage Alliance’s work to promote this approach and educate the industry on its benefits and advantages. While I’m not personally involved in this organization’s efforts, Hitachi is. This brings back memories of the floptical disk and the NeXT machine. However, I’m not going to discuss these devices here in detail. What I am going to describe is Storage Fusion, which is a term I just made up because Hybrid storage was taken. I love thesauruses. Also, this is different from Controlled Instancing, which places data at the right place, at the right time, for the right reason.
Specifically, what I see HybridStorage proposing is that certain types of data can benefit from different types of storage media, not that we storage aficionados didn’t know this. So HybridStorage is tiered storage media within an HDD form factor. Similarly, but at a higher level, Storage Fusion proposes that certain types of data need different types of storage or file systems. For example, with the proliferation of internet data and various services, it was noticed a while back that most of the bits on the web are static. While the storage industry calls this stuff “fixed content”, reference data, or write once-read many (WORM), these terms tend to be used to describe compliance or archiving types of storage uses, which have a less active nature. In reality, there are a lot of bit patterns out there that never change, are highly active (read a lot) and are usually just replaced instead of being modified.
I just used the term “bit patterns” but could also use “byte patterns” or binaries, because most people use the term “data” to describe “their data” or information. What is an application, an operating system program, a device driver or any other binary when stored on disk? This too is data, but it is rarely referred to as “data”. An interesting difference with this type of data is that in some cases there are millions of copies of it. Think of your Windows Vista or XP, or a Linux distro. For every installed version there is at least one CD/DVD copy associated with it, site licenses notwithstanding. With many Linux distros there’s the online copy and usually a plethora of mirror sites, your downloaded ISO images and possibly the CD/DVD version you’ve burned. So protection of this data is usually not an issue; rebuilding your systems is always annoying, but doable. Unless you make a Ghost image like I do.
So let’s scale this scenario up to, say, 500 TB and climbing. There’s data and there’s “data”. For some reason, when we get to this level of complexity, we end up saying “operational data” and “user generated data”. Operational data is the programs, applications, binaries, templates and configuration information that constitute the “state” of the system. User generated data, with “user” including another machine, is the data that is stored by users or produced as the results of applications.
Take an online picture storage and sharing web site for example. The web servers, database servers and application servers running the applications of this web service could have their storage stored and provisioned centrally. This would allow the compute nodes to essentially be stateless and easily replaceable during failures or upgrades. Likewise, the user generated data would be stored centrally in a content repository like the newly announced Hitachi Content Platform. The operational data is relatively small compared to the user generated data, but it is highly active, with much of it pinned in system memory. Booting nodes, loading application programs, and serving up static web content and the assorted graphics that are locally stored are candidates for Storage Fusion, where the file system is comprised of different storage types and media for the different types of data being stored. Hitachi Tiered Storage Manager does just this under the file systems. The operational data would end up on SSD or high speed SAS/FC drives using RAID-5 or RAID-10 protection levels depending on the databases. The bulk of the storage would be magnetic, using SATA drives and probably a RAID-6 protection level, and would store the user generated content, in this case pictures, which has a much lower activity level that continues to diminish over time. However, unlike Enterprise data management, this type of data cannot be deleted by the service provider so long as the service is available and the subscriber is paying the fees.
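To make the placement idea above a little more concrete, here is a minimal Python sketch of such a policy. It is only an illustration and not how Hitachi Tiered Storage Manager works: the `Tier` class, the `place` function and the rule that databases get RAID-10 while other operational data gets RAID-5 are all assumptions I made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """A hypothetical storage tier: a media type plus a RAID protection level."""
    media: str
    raid: str


# Hypothetical tiers, named after the media and RAID levels discussed above.
FAST_PARITY = Tier(media="SSD or SAS/FC", raid="RAID-5")
FAST_MIRROR = Tier(media="SSD or SAS/FC", raid="RAID-10")
BULK = Tier(media="SATA", raid="RAID-6")


def place(kind: str, is_database: bool = False) -> Tier:
    """Pick a tier for a class of data.

    kind is "operational" (programs, binaries, configuration, static content)
    or "user_generated" (in the example above, the uploaded pictures).
    Assumption: databases go to RAID-10, other operational data to RAID-5.
    """
    if kind == "operational":
        return FAST_MIRROR if is_database else FAST_PARITY
    if kind == "user_generated":
        return BULK
    raise ValueError(f"unknown data kind: {kind!r}")


if __name__ == "__main__":
    print(place("operational"))
    print(place("operational", is_database=True))
    print(place("user_generated"))
```

A real policy would also look at activity over time, since the user generated content keeps cooling off after the initial upload, but even a two-way split like this captures the basic premise.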
Actually, when you think about this scenario and other similar use cases, only a small percentage of any of the data stored is ever modified. The databases are probably the most active with regard to updating data. Most of the activity is reading both operational data and user generated data, uploading data and storing it, maybe moving it somewhere after the initial load, and removing data. In these cases you could craft ways of using optimized file systems and object storage with different media types and RAID levels to obtain the optimal storage infrastructure to meet performance demands.