Of Web 2.0 Storage – Part 2 of 2
by Michael Hay on Jan 19, 2008
In part one of this series I talked mostly about what is needed in Web 2.0 storage systems, or at least what customers have asked me about personally. Reviewing the core requirements I’ve witnessed, I know that Hitachi can solve many of them today, and not through hulking infrastructures that haven’t yet seen the light of day (rhyme and pun intended). I’ll list several below and talk about how Hitachi can respond today.
- Petabyte scaling under a single system image – yes
- Fewer points of management – yes
- Ingestion of 10s of thousands of objects/second – yes
- REST-style protocol for access – yes (see the sketch after this list)
- Implementation of capacity optimization features – yes
- Implementation of value-added services (capacity balancing, automated node management/control, garbage collection, authenticity checking, etc.) – yes
- Usage of commodity components and/or RAID backed storage and ability to sell software independently of hardware – yes
- High performance media streaming – yes
- Local and wide area content distribution/replication – yes
- Low latency rich media streaming – partially
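To make the REST-style access item concrete, here is a minimal sketch of what that kind of interface looks like in practice: store an object with an HTTP PUT and read it back with a GET. The endpoint, path layout, and absence of authentication are hypothetical placeholders for illustration, not HCAP’s actual API.

```python
# Minimal sketch of REST-style object storage access over HTTP.
# The endpoint and path layout below are hypothetical placeholders,
# not HCAP's actual API.
import requests

BASE_URL = "https://archive.example.com/rest"  # hypothetical endpoint


def put_object(path: str, data: bytes) -> None:
    """Store an object at the given path via HTTP PUT."""
    resp = requests.put(f"{BASE_URL}/{path}", data=data)
    resp.raise_for_status()


def get_object(path: str) -> bytes:
    """Retrieve an object's contents via HTTP GET."""
    resp = requests.get(f"{BASE_URL}/{path}")
    resp.raise_for_status()
    return resp.content


if __name__ == "__main__":
    put_object("photos/2008/skyline.jpg", b"...jpeg bytes...")
    print(len(get_object("photos/2008/skyline.jpg")), "bytes retrieved")
```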
These are merely a few of the requirements I wanted to relate, but the point is that we can and do solve these problems for our customers now. We have live systems that scale into the hundreds of terabytes, meaning that, like Amazon, we have a mature platform; quite pointedly, EMC is only now approaching the science-experiment stage with both Maui and Hulk.
As to what this orderable product is that can do all of this, drum roll please: it is the software running at the core of the Hitachi Content Archive Platform. Internally we code-name the platform “Prime” (as in Optimus Prime of the Transformers, who are “more than meets the eye”). The talented engineers behind HCAP and Prime have been hard at work carefully taking the pulse of the industry, and have made something that very much mirrors what could back an online storage service like S3. This very novel system has seen significant improvements: core capabilities that make it ultra-reliable, encryption that protects data privacy, various capacity optimization features (e.g. single instance storage), and, because the system is web-based object storage at its core, the ability to be highly customized to meet user requirements or deployed out of the box.
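Since I mentioned single instance storage, here is a minimal sketch of the general technique behind it, content-addressed storage: identical payloads hash to the same digest and are physically kept only once. This illustrates the concept only; it is not HCAP’s implementation.

```python
# Minimal sketch of single-instance (content-addressed) storage:
# identical objects hash to the same digest and are stored once.
# This illustrates the general technique, not HCAP's implementation.
import hashlib


class SingleInstanceStore:
    def __init__(self):
        self._blobs = {}   # digest -> content; each unique payload kept once
        self._index = {}   # logical name -> digest

    def put(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # store payload only if new
        self._index[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        return self._blobs[self._index[name]]


store = SingleInstanceStore()
store.put("a/report.pdf", b"same bytes")
store.put("b/copy-of-report.pdf", b"same bytes")
assert len(store._blobs) == 1  # duplicate payload stored only once
```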
I do want to provide some level of detail on the peta-scaling of the platform. As of HCAP V2, Hitachi deploys what we call SAIN (SAN + Array of Independent Nodes), disaggregating the storage from the nodes so that storage and front-end processing nodes scale independently. Specifically, each node can sport up to 64 LUs at 2TB each, and also includes all of the goodness you’d expect from a SAN-attached system: multi-pathing, encryption of data in-flight/at-rest, swapping of LUs between nodes, and proven RAID-backed, full-featured Hitachi storage, all leading to maximum reliability, performance, and efficient scaling.

When storage capacity scaling is coupled to software/node scale-out (we’ve tested node scaling to 80 nodes and have no architected limit), a true peta-scale system emerges: over 10PB realizable today, since 80 nodes x 64 LUs/node x 2TB/LU = 10,240TB. And if you size the LUs at the 16TB architected ceiling, the total addressable capacity of the system becomes 80 nodes x 64 LUs/node x 16TB/LU = 81,920TB, but hey, who’s counting.

However, if users want to perform their own hardware procurement, we can and do sell the software apart from the Hitachi storage and nodes. But because the maximum number of hard drives an x86-class system can hold today is 48, building a similar 10PB-scale system from such boxes would require roughly 214 nodes, or about 2.7 times the 80 SAN-attached nodes. Note that as of today I’m only aware of Sun’s Thumper carrying a maximum of 48 hard drives, and I’m assuming each drive holds 1TB of capacity; if there is another DAS system that can hold more, then let me know. So while it is possible, and we do sell the software independently of the nodes, at really large capacities scaling out on a DAS architecture starts to make a lot less sense than what Hitachi has with SAIN. And yes, the pun is intended for SAIN, because those that do not sport a SAIN architecture are inSAIN.
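For readers who want to check the math, here is the capacity arithmetic above as a small script. The node count, LU figures, and 16TB LU ceiling are the numbers quoted in this post, with the 48-drive, 1TB-per-drive DAS box as the comparison point.

```python
# Capacity arithmetic from the figures quoted above.
NODES = 80             # tested node count (no architected limit)
LUS_PER_NODE = 64      # LUs per front-end node
LU_TB_TODAY = 2        # LU size deployable today, in TB
LU_TB_MAX = 16         # architected LU ceiling, in TB

realizable_tb = NODES * LUS_PER_NODE * LU_TB_TODAY   # 10,240 TB (~10 PB)
addressable_tb = NODES * LUS_PER_NODE * LU_TB_MAX    # 81,920 TB (~80 PB)

# DAS comparison: a 48-drive x86 box with 1 TB drives holds 48 TB,
# so matching ~10 PB takes roughly 214 such nodes.
DAS_TB_PER_NODE = 48 * 1
das_nodes = -(-realizable_tb // DAS_TB_PER_NODE)     # ceiling division -> 214

print(f"SAIN realizable: {realizable_tb} TB, addressable: {addressable_tb} TB")
print(f"DAS nodes needed for {realizable_tb} TB: {das_nodes}")
```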