Google File System Revisited
by Michael Hay on Aug 15, 2009
There is a great article about GFS V2 (that is the current name used in the public press), which talks about a new file system being created to remove limitations imposed by the original, nearly decade-old version of GFS. The key challenge is that application I/O patterns are changing due to increased direct interaction with end users, and the main culprits are YouTube and GMail. Both applications demand near-instantaneous response times for their users, yet they are very different applications from one another: video streaming and email.

What I find interesting, and what I want to put out there, is that Google appears to be running into many of the problems that have long plagued clustered file systems. The question I wonder about is whether they have realized this, and what, if anything, they are cooking up to resolve these issues.
After literally billions of dollars in VC funding, significant corporate R&D, and many open source efforts, the general-purpose scale-out clustered file system has yet to emerge. The evidence is that unique file systems are quietly cropping up to solve specific application problems. (Ultimately, this is why NetApp will fail: they are trying to patch their way to perfection with one file system that solves all problems.)

A few years ago I had a great conversation with a customer who talked about this very fact, and with a lot of hindsight I've strengthened my belief that this gentleman was correct. The problem with clustered file systems stems from the difficulty of lock management, metadata management, right-sizing the block size, and so on across many nodes. What results, again, are specialized file systems that are good for a specific workload. Take SGI's CXFS, which works great for media streaming, a workload typically characterized by large-block sequential I/O. HP's PolyServe product, by contrast, was plagued by scaling limitations before HP's purchase and ultimately proved better suited to facilitating MS-SQL DBMS clustering.

So my thesis is that we will ultimately end up with file systems that are more specialized than general-purpose. Already, with the likes of OCFS and the MS Exchange mailstores, we are seeing application-tuned file systems; the question becomes what comes next. Ironically, if NetApp had done the right thing with the Spinnaker purchase, they would have adopted the Spinnaker file system in ONTAP rather than porting its functionality into AWFL, err, WAFL.

In closing, I can only hope that Google recognizes the trend I see and, rather than stopping development on GFS V1, creates a new file system with a different name, since both have merits for different application I/O profiles.
Comments (4)
Your comments on clustered file systems touch on an increasingly asked question. Are file systems (clustered or not) really the right technology for massive file/content repositories? We all know about the issue of scaling to billions of files and petabytes of capacity. But I'm curious about Hitachi's perspective on the future of storage for file-based information. Is block going to do the trick? Are you working towards object-based storage for the future, like your competitors? Also, I'm curious where you see the benefit of using NAS versus object storage.
Dan, as I started to write my reply, I realized it was long enough to warrant its own post. I'll update the thread shortly with the URL to the post.
Dan, I posted an answer here: http://blogs.hds.com/technomusings/wp-admin/post.php?action=edit&post=755&message=6