Of Web 2.0 Storage – Part 1 of 2
by Michael Hay on Jan 4, 2008
Maui is a nice vacation spot; EMC, however, has tried to make it something more by coupling it to Hulk. By the way, I do have a question about Hulk: is it the really smart but Evil, Maniacal, Caustic gray version of the Hulk, or is it the big green simpleton? Joking aside, there is a recurring trend: storage users are looking for scalable, web-based storage derived from commodity infrastructure. Specifically, it appears that several companies are trying either to offer services that compete with Amazon’s S3 or to build a system similar to Google’s. (Yes, I realize there are a whole bunch of other online storage service providers, and the much-rumored GDrive hasn’t seen the light of day, but mentioning Amazon and Google should give you the idea.)
Commentary on S3
Over the past two to three years, Amazon has been rethinking not only the core paradigm of storage but its basic protocol too. REST, an HTTP-based style of interface, is the way into and out of S3. Using HTTP drastically simplifies the semantics needed for dealing with file objects, simply because it doesn’t have to carry the legacy that both NFS and CIFS do. Two other areas in which Amazon has been innovating are the way it monetizes the infrastructure (users are charged for both the capacity they consume and access to the data stored) and the way it fosters a community of startups and open source projects working with the infrastructure. Key examples of startups and open source projects wrapped around S3 include.
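To make the point about simplified semantics a little more concrete, here is a minimal sketch of how a handful of HTTP-style verbs can cover the whole life cycle of an object addressed by a key. The class name `TinyObjectStore`, its `handle` method, and the status codes chosen are illustrative assumptions for this post; this is a toy in-memory model, not Amazon's actual API.

```python
from typing import Dict, Optional, Tuple


class TinyObjectStore:
    """Toy in-memory store driven by HTTP-like verbs (hypothetical, not S3's API)."""

    def __init__(self) -> None:
        # Maps an object key (the URL path) to the stored bytes.
        self._objects: Dict[str, bytes] = {}

    def handle(self, method: str, path: str,
               body: Optional[bytes] = None) -> Tuple[int, bytes]:
        """Apply one request to the store and return (status code, response body)."""
        if method == "PUT":
            # Create or overwrite the object at this key.
            self._objects[path] = body or b""
            return 200, b""
        if method == "GET":
            if path in self._objects:
                return 200, self._objects[path]
            return 404, b""
        if method == "HEAD":
            # Existence check without transferring the data.
            return (200 if path in self._objects else 404), b""
        if method == "DELETE":
            # Deleting a missing key is treated as success.
            self._objects.pop(path, None)
            return 204, b""
        return 405, b""  # verb not supported by this sketch


store = TinyObjectStore()
store.handle("PUT", "/photos/maui.jpg", b"...image bytes...")
status, data = store.handle("GET", "/photos/maui.jpg")
```

There is no open, lock, seek, or close in this model, which is roughly the kind of legacy a file protocol such as NFS or CIFS has to carry and a key-plus-verb interface can leave out.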
An ecosystem of startups and open source projects using and building on top of Amazon’s Simple Storage Service means S3 has gotten beyond the science experiment stage. (By the way, Amazon does now have an SLA, but it is three 9s, so if you are paranoid about data safety, S3 may not be for you.) One interesting point about the Amazon service is that while you pay to move data in and out, you don’t pay to move data from one part of their infrastructure to another, although you still pay to store it. By doing this, Amazon appears to be giving users an incentive to keep their data inside its infrastructure, with a natural side effect being a potentially reduced latency penalty. In essence, if you are concerned that running your application at your site while storing your data at Amazon will be too slow, simply co-locate the application using EC2, and voila! Latency issues are hopefully handled. However, there is one thing that Amazon is clearly not doing: touching the end user directly. They are leaving that to companies like SmugMug. In short, I think the Amazon approach is rapidly becoming a de facto standard. I just hope they get some competition soon.
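The arithmetic behind two of the points above can be sketched in a few lines: what "three 9s" allows in downtime, and how a bill behaves when storage and external transfer are charged but internal transfer is not. The function names and every price passed in below are placeholders I made up for illustration; they are not Amazon's actual rates or billing rules.

```python
HOURS_PER_YEAR = 365 * 24


def downtime_budget_hours(availability: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of unavailability permitted by an availability target over a period."""
    return (1.0 - availability) * period_hours


def monthly_bill(stored_gb: float, gb_in: float, gb_out: float,
                 gb_internal: float, price_store: float,
                 price_in: float, price_out: float,
                 price_internal: float = 0.0) -> float:
    """Estimate a monthly charge; internal transfer defaults to free (assumption)."""
    return (stored_gb * price_store
            + gb_in * price_in
            + gb_out * price_out
            + gb_internal * price_internal)


# Three 9s (99.9%) over a year.
yearly_budget = downtime_budget_hours(0.999)

# Hypothetical per-GB prices, chosen only to show the shape of the calculation.
bill = monthly_bill(stored_gb=100, gb_in=10, gb_out=20, gb_internal=500,
                    price_store=0.15, price_in=0.10, price_out=0.20)
```

With a 99.9% target the yearly budget comes out to roughly 8.76 hours. In the bill, the 500 GB moved internally adds nothing to the total, which is the incentive to co-locate the application with the data.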
I use the above as background because, of late, I have personally been hearing from several customers who require this kind of infrastructure, either internally, to build their own applications, or because they want to offer online Web 2.0 storage to their customers. Key requirements I’ve witnessed include:
- Scales to petabytes under a single namespace
- Relatively few points of management for the system
- Usage of a REST style protocol for access
- Implementation of various capacity optimization strategies such as single instance storage and compression
- Performance measured in tens of thousands of objects per second for the aggregated system
- Value-added services capable of performing advanced read caching, capacity balancing, automated node management (failover, etc.), reclamation of previously unused space, etc.
- Usage of commodity components; in some cases users would like to procure their own infrastructure separately from the software provider, which is a fancy way of saying you need the ability to sell software independently of the hardware
- Cool candy features such as checks for data authenticity and integrity, and energy-friendly infrastructures that demand less direct and indirect (i.e., cooling) power consumption
- High performance media streaming
I could go on and on, but the point is that this is not “your father’s typical storage infrastructure,” although it can still take advantage of RAID-protected, multi-tier, scale-up storage infrastructures.
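Two of the requirements in the list above, single instance storage with compression and checks for data integrity, fit together naturally, and a rough sketch may help show why. The class `SingleInstanceStore` and its methods are hypothetical names of mine; it uses Python's standard `hashlib` and `zlib` modules and assumes whole-object deduplication keyed on a SHA-256 digest, which is only one of several ways such a feature could be built.

```python
import hashlib
import zlib
from typing import Dict


class SingleInstanceStore:
    """Toy content-addressed store: identical objects are kept once, compressed."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}  # digest -> compressed content
        self._names: Dict[str, str] = {}    # object name -> digest

    def put(self, name: str, data: bytes) -> str:
        """Store data under a name and return its SHA-256 digest."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            # First time this content is seen: compress and keep one copy.
            self._blobs[digest] = zlib.compress(data)
        self._names[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        """Return the data for a name, verifying it against the stored digest."""
        digest = self._names[name]
        data = zlib.decompress(self._blobs[digest])
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("integrity check failed for " + name)
        return data

    def blob_count(self) -> int:
        """Number of distinct pieces of content actually stored."""
        return len(self._blobs)

    def stored_bytes(self) -> int:
        """Total compressed bytes held, after deduplication."""
        return sum(len(blob) for blob in self._blobs.values())


store = SingleInstanceStore()
payload = b"same vacation photo " * 500
store.put("alice/maui.jpg", payload)
store.put("bob/copy-of-maui.jpg", payload)
```

After the two `put` calls, both names resolve to the same content but only one compressed copy is held, and the digest that enables the deduplication doubles as the integrity check on every read.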
Hopefully this provides you with a taste of what is needed for this market. In the next posting I’ll be talking about how Hitachi can respond today with existing products and technologies.