The “Black Box” of Archiving
by Claus Mikkelsen on Apr 27, 2009
We all know what a “black box” signifies. “It is a technical term for a device, system or object when it is viewed in terms of its input, output and transfer characteristics without any knowledge required of its internal workings.” I know this because I just read the Wikipedia definition and did a copy/paste into this blog. I think Wikipedia just defined itself as a black box. I don’t know its inner workings but I do know that when I need the information, it is there. Thank you Jimmy Wales!!
But we mostly identify it with aircraft. When bad stuff happens, as with US Airways Flight 1549, it was the black box that corroborated the pilots’ account that the aircraft had an unfortunate “aviary moment” on its way to the Hudson River runway. Fortunately, all survived quite nicely.
But here is a device that hour after hour, day after day, and year after year, is constantly recording data so that when the data is needed, it is available. No one really cares about all of the data it has captured over the years, just that when “specific” data is needed, it is retrievable.
Today’s ever-increasing mountains of data are no different. We’re creating data at rates unfathomable even just a few years ago. On the one hand, we can’t just erase stuff we don’t think we’ll ever need again because, sure enough, the day after we erase it, we’ll need it. So we’re lulled into thinking we’ll just keep everything forever. That’s not right either, so it’s no wonder “archiving” has been a big focus recently. I think “archive” is an unfortunate term since it implies a one-way street. Sort of like a landfill: put it there and eventually cover it up. But when you think about it, it’s really all about the retrieval. Accumulating these mountains of data without being able to easily and quickly retrieve exactly what’s needed, when it’s needed, is a wasted exercise.
So today, HDS (re)announced the capabilities of our Hitachi Content Archive Platform (HCAP), but more importantly, how it has been used by our customers. Why did we do this (other than because it’s a really cool product)? Well, like we do with all of our products, we are constantly interviewing and surveying our customers to discover the good, the bad, and the ugly so we can improve, and when the results come back overwhelmingly positive, as in this case, why not announce it?
Too often, products like this get ignored among the overall datacenter priorities, and they shouldn’t be. Being able to quickly and easily install a “black box” (yes it is, see picture) with your defined policies, so that it seamlessly archives data no longer needed and (this is the REALLY important part) brings it back when it is needed, is not only crucial but a big time and cost saver. You don’t need to understand the “internal workings”, just that when you need to retrieve the data, it is easily available. But know that one of the great things about HCAP is that all of the archived data is pooled in a way that allows direct retrieval using search commands that bypass the need for the original archiving software. Yes, just simple search.
Other findings from our survey: some companies were able to archive over 80% of their online data; almost half achieved a return on investment (payback) in less than 12 months; 20% ingest more than a million objects per month; and almost all were able to manage their entire HCAP environment with a single administrator. So take a look at the announcement. And if you really want to understand the “internal workings” of HCAP, look no further than your nearest HDS representative or check out the website.