The Rainbow Connection
by Michael Hay on October 18, 2009
Digressions from the previous post
If you had not guessed already, I’m sticking with the Kermit the Frog theme here. And just for grins, here is the Rainbow Connection with Kermit and Debbie Harry singing.
In my previous post I talked about files being “the objects” stored out there in the cloud. I know, I know, that is a really obvious point, but as I mentioned, even truly novel ideas are obvious when you look at them on paper. One of the greatest quotes about learning a horrible yet obvious truth is “Soylent Green is people!” from the 1973 film Soylent Green. In the film the main character, through various adventures, discovers that the food people are eating is, well, people. I feel the same way about my point that files are objects.
There’s a pot of gold at the end of the rainbow?

Okay, sorry for the digressions. In this post I want to touch on scale and infrastructure in the cloud. My first point: when an effort is genuinely new and no commercial software exists for it yet, new technologies of course emerge. There is also always a hype cycle, which everyone is familiar with and which is rather like a gold rush. Hopefully the consequences this time are nowhere near those of the dotcom crash. Speaking of hype, I’m sure that during the early part of the RDBMS hype cycle databases were going to replace everything in the world, including file systems. Oops, that didn’t happen. The same goes for the hype surrounding distributed systems versus the mainframe, where the distributed system was going to replace the mainframe and the mainframe was a dinosaur (yeah, a T-Rex!). Well, that did not happen either. So cloud is not the be-all and end-all that the world is hyping it up to be. Cloud is about a new set of problems that cannot be solved using traditional commercial off-the-shelf (COTS) approaches. That is why Google, Microsoft, Amazon, Apple and others are using a lot of manpower and a mixture of commercial and open source stacks to scratch their own business itches. This is largely my point in citing good old Kermit’s song, whose lyrics talk about rainbows being visions, but only illusions. So I agree with Tony about where we are in the hype cycle: the promise of cloud solving everything sounds like a great vision, but when you finally get there, like the pot of gold at the end of the rainbow, it turns out to be a myth.
Hammer in a nail not a screw
I think back to several years ago when I talked to a budding social networking service (SNS) company. They were looking for a specialized, application-specific file system rather than a general-purpose one: a flat file system tuned for their particular application, which required billions of small files and really, really random I/O. Essentially, two of NetApp’s highest-end systems were not cutting the mustard, with their CPU busy rate pegged as close to 100% as possible. The bulk of the company’s image I/O was handled by their Squid web cache, which answered 95% of the I/O requests from their application and users, while the NetApp systems were on their knees with only the remaining 5% of the workload. Basically, the WAFL (Write Anywhere File Layout) file system is not well suited to really small file random I/O over the long term unless you run what is effectively a defrag on it. (In fact, if it were read optimized we would call it the Read Anywhere File Layout, or RAFL for short. Wanna buy a ticket?) Because the company in question could never take one of the systems down (the CPUs were pegged at 100%), they could never run the defrag, and this caused them significant performance and availability problems. I know that in the intervening time NetApp has surely worked to remedy this, but I want to use it as an illustration of the fact that you have to use the right tool for the job. If there had been a specific image repository or store optimized for that I/O workload and capacity, I think the company would have been okay. They were effectively trying to hammer in a screw, which is fatal for the screw; NetApp was the wrong tool for their application. Hopefully you can take the leap here and see that the SNS domain is an emerging effort and therefore requires new technologies to solve its problems.
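To put the 95% and 5% split in perspective, here is a minimal back-of-the-envelope sketch. The request rate of 20,000 per second is purely a hypothetical number of mine (the company never gave me absolute figures), and the function name is made up for illustration. What it shows is how sensitive the back end is to the cache hit ratio: if the hit ratio slips from 95% to 90%, the load reaching the filers doubles.

```python
def backend_requests_per_second(total_rps: float, cache_hit_ratio: float) -> float:
    """Requests per second that miss the cache and reach the storage back end."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_rps * (1.0 - cache_hit_ratio)


# Hypothetical front-end load; the anecdote gives no absolute numbers.
total_rps = 20_000
for hit_ratio in (0.95, 0.90):
    misses = backend_requests_per_second(total_rps, hit_ratio)
    print(f"hit ratio {hit_ratio:.0%}: {misses:,.0f} requests/s reach the filers")
```

With systems that are already pegged at 100% CPU on the 5% of requests that miss, there is no headroom left for even a small drop in cache effectiveness.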
Modern Internet geology
One of the most famous communities on the Internet is Slashdot, which largely predates many of today’s web communities as well as architectures that are now famous, like Google’s. If you take a look at their architecture you will find that they dedicate special-purpose systems to storing static content, a common practice for many web applications. What I see going on in the cloud segment today, with respect to emerging applications and businesses, is the big infrastructure crunch. That is to say, offerings like the Hitachi Content Platform (HCP), which serves up static content via HTTP/REST, allow these emerging businesses to stop worrying about the mundane parts of their infrastructure and instead focus on providing value-added features to their end users. The image I have in mind is of sedimentary rock: it starts out as living, vibrant things with a lot of action and motion, which eventually fall into a lake or another body of water and become brown peat moss, or middleware. Give things a few million years and this once-vibrant stuff turns into, well, rock that you can build new structures on top of. The infrastructure crunch is really just like that. It is about taking the things that need scale and performance, sticking them in a layer that can provide these attributes, and then letting the upper application layers forget about them. In the case of the Slashdot architecture, HCP could crunch down the static content tier and hold that content, and then they could forget about having to manage these systems and move on to supplying more value-added functions and features.
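For anyone who wants to see what "static content behind plain HTTP" looks like from the application's side, here is a small sketch. To be clear, this is not HCP's API or Slashdot's setup: it uses only Python's standard library to stand in for a content store, and the function name `fetch_static_object` and the file `logo.txt` are my own inventions for illustration. The point is simply that the application layer needs nothing more than a name and an HTTP GET.

```python
import http.client
import http.server
import pathlib
import tempfile
import threading
from functools import partial


def fetch_static_object(root: pathlib.Path, name: str) -> bytes:
    """Serve `root` over HTTP on a loopback port and GET one object from it."""
    # Stand-in for a dedicated static content tier (an assumption, not HCP).
    handler = partial(http.server.SimpleHTTPRequestHandler, directory=str(root))
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        # The "application" side: a plain GET by object name.
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        try:
            conn.request("GET", "/" + name)
            response = conn.getresponse()
            if response.status != 200:
                raise LookupError(f"{name}: HTTP {response.status}")
            return response.read()
        finally:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "logo.txt").write_bytes(b"static bytes")
    print(fetch_static_object(root, "logo.txt"))
```

Everything about scale, replication and performance lives behind that GET, which is exactly the layer the application gets to forget about.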
Mature applications
With all of this talk about emerging applications, what about mature applications like ERP systems, email systems, groupware systems, RDBMSs, etc.? In the race to deploy things in the cloud, these systems need some approach to get them into a cloud-like infrastructure, which seems to me like a utility approach to IT infrastructure. In my opinion Hyper-V, Xen, VMware, Oracle VM and others can do just that. While the current driver for virtual machine deployments is really cost savings (note that even Green gets people’s attention because it saves money), in the long term I believe virtualization will be the way that mature applications make it into the cloud. The switching cost of changing these applications is super high, and the cost of engineering new applications which are functionally equivalent is super high too. So in my humble opinion we will see architectures much like what Amazon has deployed, but inside a company, as the private cloud. My colleague Hu has just put up a great post on what he views as an ineffective architecture for these workloads, VMAX. I think Hu is on to something here and I want to augment his point: people who have already made an investment in their SANs and existing infrastructures want to reuse their investments, and rightly so. Throwing out these SAN architectures, which have served companies for years as strong infrastructures, is a short-sighted approach.
Conclusion
So, just to summarize a bit: the objects that everyone is talking about are files. As we pass through this cloud hype I think we will find that a lot of what we thought it would be was basically an illusion, the pot of gold at the end of the rainbow. You need the right tool for the right job, and COTS solutions today don’t completely solve the infrastructure problems of emerging applications. Currently we are in a phase I call the infrastructure crunch, which seeks to commercialize portions of these emerging technology architectures for added scale, reliability and performance. Finally, we cannot forget about the mature applications (heck, even Amazon did not do that), and we have to help people and companies maximize the return on the investments they’ve already made.
Comments (2)
ircguru on 08 Nov 2009 at 3:45 am
Nice Post, btw do you know any good usenet archives and or mailing list archives site for unix / linux / bsd
Michael Hay on 08 Nov 2009 at 5:09 pm
Not sure if I do or not, have you checked out Google groups since I believe they are the stewards of the uunet archives?



