The Answer to the Question
by Michael Hay on Oct 6, 2010
However, competitors have criticized the Hitachi Virtual Storage Platform, including its chosen page size. In fact, during an interview at the tail end of launch week I was asked my opinion of the criticisms. My response to the reporter's question was honest laughter and the following statement: "Thus far I have found the competitive responses downright amusing."
Let me relate the source of my amusement. With the rise of unstructured content, file sizes have ballooned radically: high-resolution satellite images, virtual machine images, podcasts, and desktop demonstrations persisted as movies, CGI renderings, and the like. (It is not uncommon to see 20MB Microsoft PowerPoint files or 130MB AVIs, with people passing around Microsoft SharePoint links instead of the documents themselves. In fact, for an internal project I just uploaded a 130MB AVI file that captures an early demo of proof-of-concept technology.) It is more than just file size that is growing, though; the volume of content creation is also accelerating, putting unstructured content on track to account for the supermajority of storage growth by 2014.
Another facet of unstructured content is a kind of clustered aging behavior. For example, all of the content tied to a specific project tends to age together: as the project winds down, access frequency progressively drops toward inactivity. If the project contains large files (a highly likely occurrence today), ranging in size from a few megabytes to nearly a quarter of a gigabyte, then they all age together, and HDT (Hitachi Dynamic Tiering) can demote the whole set efficiently by moving a relatively small number of pages.
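To make the clustered-aging point concrete, here is a minimal back-of-the-envelope sketch in Python. The 42MB HDT page size and the 768KB Symmetrix extent size are the figures quoted in the comments below; the project file sizes are purely illustrative assumptions.

    import math

    PAGE_MB = 42       # HDT page size quoted in the comments below
    EXTENT_KB = 768    # Symmetrix Virtual Provisioning extent, for contrast

    # Hypothetical aging project: a handful of large unstructured files (sizes in MB)
    project_files_mb = [130, 250, 20, 95, 180]   # AVIs, PPTs, renderings, etc.
    total_mb = sum(project_files_mb)

    # If the project data is laid out reasonably contiguously, demoting the whole
    # project touches roughly total_capacity / page_size pages.
    pages_42mb = math.ceil(total_mb / PAGE_MB)
    extents_768kb = math.ceil(total_mb * 1024 / EXTENT_KB)

    print(f"{total_mb} MB of cold project data:")
    print(f"  ~{pages_42mb} page moves at {PAGE_MB} MB/page")
    print(f"  ~{extents_768kb} extent moves at {EXTENT_KB} KB/extent")

Under these assumptions the whole project is demoted in a few dozen page moves rather than many hundreds of extent moves.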
In fact, even EMC recognizes that there are efficiency requirements mandating larger page/chunk/allocation sizes. For example, in the CLARiiON CX FASTv2 implementation the sub-LUN granularity is a 1024MB chunk, while its "thin provisioning" requires an initial capacity investment of 3196MB. And finally, EMC's VMware defaults to a 1MB block size in VMFS; depending on the number and size of the VM images, VMFS can support block sizes up to 8MB. (Note that I cannot comment on the kilobyte-scale chunk size in the FASTv2 implementation for VMAX because, as of the writing of this post, I don't believe it is generally available. Perhaps someone can school me on availability?)
Inflating file sizes, the acceleration of unstructured content creation, and behaviors like clustered aging make me personally question the need for kilobyte-scale pages, or sub-LUN chunks in general; they look like a solution to a problem that is on its way out. And this is why, ladies and gentlemen, I find the comments from our competition amusing.
Comments (6)
December 2010 for FASTv2 on Symmetrix, apparently.
Chris, is this a real date, or is it just like the others?
Come on, gents, get the lingo right… it's FAST VP now. VP for Virtual Provisioning.
Tut… if you can’t talk the talk, how do you expect to be able to walk the walk?
Hmmm…an interesting perspective.
But nonetheless interesting.
While the object size of items stored in a file system has indeed increased (in proportion to the growth in the megapixel counts of digital cameras over the past few years, some would say), the unit of operation within databases has not drastically changed in more than a decade.
So, if you are building automated tiering to optimize storage utilization and performance for digital media files, then I’d probably agree with you that “as big as possible” is best.
But, if your objective is to optimize the utilization of very expensive capacity (DRAM or Flash, for example) for business applications, then it stands to reason that you want to support a granularity that aligns with the I/O sizes used by database engines and file systems and applications. None of these operate on an entire file or object as a unit, heck, even a PowerPoint presentation file these days is actually a linked-list of individual objects within a single file – including many of the objects that you have deleted while creating the presentation!
Oracle databases operate on 8K pages, and NTFS operates (by default) on 4K pages. And despite the growth in digital media density, my 23 Megapixel Canon 7D camera (one of the higher resolution semi-pro cameras available today) creates raw files that are 23MB or less in size (plus a JPG file that is only 7-8MB, and a thumbnail that is only 40K or so).
Thus, for my photo store directories, any extent size larger than 23MB is a waste, causing the system to read 42MB of data off a tier just to service my request to open a single picture. And if Oracle’s random I/O requests an 8K page that isn’t in cache, it’s hard to imagine that reading (much less relocating) 42MB to support that 8K random read miss (RRM) is optimal.
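To put rough numbers on that waste argument, here is a hedged sketch; the request and extent sizes are the ones quoted above, and treating a promotion as moving every extent a request touches is a simplifying assumption (real arrays batch relocations asynchronously).

    KB, MB = 1024, 1024 * 1024

    requests = {"Oracle 8K random read miss": 8 * KB,
                "23MB camera raw file": 23 * MB}

    extents = {"768KB extent": 768 * KB,
               "42MB page": 42 * MB,
               "1GB chunk": 1024 * MB}

    for req_name, req_bytes in requests.items():
        for ext_name, ext_bytes in extents.items():
            # Data relocated if the tiering engine moves every extent touched by the request
            moved = -(-req_bytes // ext_bytes) * ext_bytes   # ceiling division
            print(f"{req_name} via {ext_name}: moves {moved / MB:.2f} MB "
                  f"({moved / req_bytes:.0f}x the data actually requested)")

For the 8K read miss the amplification works out to roughly 100x at 768KB, several thousand-fold at 42MB, and over 100,000x at 1GB.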
But in the end, the optimal extent size for automated tiering will be determined by the platform upon which it is implemented. Hitachi’s unit of granularity is 42MB (for some reasonably good reasons), the DS8700’s is 1GB (probably the DS8800’s too, but we won’t know until it actually supports Easy Tier), and the SVC/V7000 supports a configurable size but recommends the default of 256MB. CLARiiON uses 1GB, and Symmetrix (as I have explained before) will use the Virtual Provisioning extent size of 768KB as its basis.
Significant to the choice of size is the amount of metadata required to track utilization of each extent. Most mid-tier arrays (and the DS8K series, apparently) choose not to keep a lot of metadata in memory for their auto-tiering, a decision that forces larger extent sizes (so there are fewer chunks to keep track of). With a large-cache array like Symmetrix, we have the option of tracking more extents, and thus smaller ones. And when I/O sizes are larger than the smallest extent, it is a simple matter of stringing several of them together for the I/O operations; unfortunately for the memory-limited systems, you can’t get any smaller than the smallest extent.
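The memory trade-off can be sketched with simple arithmetic. The pool size and the per-extent metadata footprint below are hypothetical assumptions, chosen only to show how the extent count scales with extent size.

    TB = 1024 ** 4

    pool_bytes = 100 * TB        # hypothetical tiered pool
    meta_per_extent = 16         # hypothetical bytes of tracking state per extent

    for name, extent_bytes in [("768KB", 768 * 1024),
                               ("42MB", 42 * 1024 ** 2),
                               ("256MB", 256 * 1024 ** 2),
                               ("1GB", 1024 ** 3)]:
        count = pool_bytes // extent_bytes
        meta_mb = count * meta_per_extent / 1024 ** 2
        print(f"{name:>6}: {count:>12,} extents to track, ~{meta_mb:,.0f} MB of metadata")

The same arithmetic cuts both ways: an array with plenty of memory can afford to track the smaller extents at the top of the list, while a design that wants its entire mapping table resident in control memory is pushed toward the larger sizes at the bottom.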
All that said, let me tell you what *I* find really funny:
For almost a full year after EMC first shipped Flash drives (in 2008), HDS’ CTO was telling the world that “nobody needed flash.” And then you try to claim innovation leadership for delivering functionality that EMC defined and described to the industry in 2009.
Now THAT’S funny.
But you miss the whole purpose of sub-LUN automated tiering. If customers wanted to relocate whole files or objects, they would use an HSM to manage aging policies at the file level (which they can do with EMC Celerra today, by the way).
The point of block-level FAST is that file/object granularity is too big and would waste a lot of expensive DRAM or Flash capacity holding the parts of those files and databases that are never accessed or that have been deleted. And for this, the SMALLER the granularity, the better: less data has to be relocated, and less of the expensive capacity is consumed by data that will never be accessed.
Bigger, in the case of automated tiering, is NEVER better (which is why CLARiiON implemented FAST Cache in addition to FAST, by the way; FAST Cache is able to operate at a much finer granularity than FAST alone).
I can’t confirm or deny the VMAX FAST VP date at this point in time, but you won’t have to wait much longer for an answer.
Nigel, glad to know the formal product name. However, in this case perhaps VP really means Virtual Product…
Barry, it has been a long time since you have commented here. Welcome back. While I won’t address your comment point by point, I will summarize generally what I was trying to get at.
Many modern file systems, like XFS and ZFS, tend to write using models like “allocate on flush” to avoid fragmentation. While not a 100% guarantee of perfectly ordered blocks, this is much closer to a sequentially ordered stream. This write behavior means that birds of a feather flock together: files in the same directory or location in the file system tend to be closely linked in behaviors such as aging, so lots of contiguous files tend to become candidates for tier demotion at the same time. Further, fundamental mechanisms like prefetching tend to pull back data near the requested data, because we, as humans or as systems, tend to look at or need nearby information. I point this out because, as I stated earlier, the growth of unstructured content continues to accelerate, and both the amount of content and the object sizes are ballooning.
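Here is a toy sketch of that “birds of a feather” behavior; the file layout, access dates, and reuse of the 42MB page size are illustrative assumptions, not measurements. Files written contiguously land on adjacent pages, so when a project goes idle, whole pages go cold together.

    PAGE_MB = 42

    # (file size in MB, last-access day): an old project written contiguously,
    # followed by a newer project that is still being touched.
    files = [(130, 10), (250, 12), (95, 11),   # old project, idle since ~day 12
             (180, 95), (60, 96)]              # current project, recently accessed

    page_last_access = {}
    offset_mb = 0
    for size_mb, last_day in files:
        first = offset_mb // PAGE_MB
        last = (offset_mb + size_mb - 1) // PAGE_MB
        for page in range(first, last + 1):
            page_last_access[page] = max(page_last_access.get(page, 0), last_day)
        offset_mb += size_mb

    today = 100
    cold = [p for p, day in page_last_access.items() if today - day > 30]
    print(f"{len(cold)} of {len(page_last_access)} pages are demotion candidates")

In this toy layout only the single page straddling the boundary between the old and the new project stays hot; every other page holding the old project’s data becomes a clean demotion candidate despite the coarse granularity.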
However, while our system is designed to account for the future, we did not leave OLTP-style applications and the like behind. We in fact did our homework, looking at a variety of smaller page/chunk sizes, and found that our choice of a meatier page did not detrimentally impact application I/O. In fact, because we have a meaty page size we are able to keep the active mapping tables in our control store (note there are secondary copies), leading to a faster I/O experience for the application.
Further, with respect to our statements on FLASH storage, what we’ve always said is that not all applications are created equal and not all of them need FLASH. This sets a more pragmatic tone that correlates application I/O behavior to storage media type, and it is something we’ve been pursuing for a long time. It is certain that EMC’s tone does have significant market impact: all you have to do is look at the historical stock price and market capitalization of some FLASH memory suppliers.
Finally, as to your point about engineering versus marketing innovation: you are correct that EMC beat Hitachi to the punch on marketing innovation; you got the message out first. We lead with engineering innovation, and the proof is in delivery to market because, as you said, your product is not yet available.