An Economic View for Big Data Planning
by David Merrill on Jun 6, 2013
Two weeks ago I spoke at a CIO forum in Australia, sharing the stage with IDC on the topic of trends and directions with big data. Last week, same topic but in Thailand. Next week, same topic but in London.
I will probably take this topic of ‘big data economics’ and break it down into some bite-size (or blog-size) messages based on what I have seen over the years. I have been observing and measuring Hadoop and Azure storage infrastructure costs for about three years now. Back in early 2010, I am not even sure it was called big data, since we have had this type of analytics and data warehousing function for years. What has changed, and what analysts and surveys keep showing, is a rapid acceleration in the amount of data, the variety of data and the impact of machine-to-machine generated data. So this is where we can start on some economic concepts, as well as some simple points. Here is a summary of my observations:
- Data for analytic purposes is growing rapidly
- The retention of this data varies, but it tends to have a short shelf-life; much of it can be disposed of after a period of time (it is the data that is thrown out, not the information it generates)
- The variety of data makes the task of storing, cataloging and referencing the data interesting
- New storage architectures, file systems and infrastructures are popping-up in data centers (Hadoop, Azure) and on the web (S3, many others)
- I have seen new big data IT initiatives start up over the past 12-18 months, and many of them begin on a storage and server infrastructure that is not (economically) optimized for the demands of big data (volume, variety, velocity and value)
- When it comes to designing a cost-efficient, in-house big data infrastructure, IT planners are often unable to determine (or to be told) the business value and criticality of the analytic work that these large data stores will provide to the business. Without the business value understood, it is impossible to align or optimize cost structures to be commensurate with that value
- Some ambitious projects have a hard time getting the funding necessary, since the current storage and server infrastructure cannot support the rate of growth and volume at a price that is right for the job
- First-generation big data infrastructures are good to start with, but when the scale-out begins they fail at a technical, operational and economic level
- Many people associate or lump big data projects together with cloud projects. I think this is a mistake. Even though cloud offerings may support the end-game, the two can be viewed and managed as separate initiatives
My quick summary of the situation is this: we need fundamentally new IT infrastructures and architectures to (cost) effectively support these new initiatives. Your current storage, server and operational processes are probably not suited to the new unit-cost structures needed in the future.
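To make the idea of a ‘unit cost’ concrete, here is a minimal sketch in Python of a cost-per-usable-TB-per-month calculation. All of the figures, names and ratios below are hypothetical assumptions for illustration only (not HDS data or measured costs); the point is simply that acquisition cost, operating cost and usable-capacity overhead combine into a single unit rate you can compare across architectures.

```python
# Hypothetical unit-cost sketch: cost per usable TB per month for a storage
# tier. Every number here is an illustrative assumption, not measured data.

def unit_cost_per_tb_month(capex, opex_per_year, years, raw_tb, usable_ratio):
    """Total cost of ownership divided by usable capacity and months of service."""
    total_cost = capex + opex_per_year * years   # acquisition + operations
    usable_tb = raw_tb * usable_ratio            # RAID/replication overhead
    months = years * 12
    return total_cost / (usable_tb * months)

# A general-purpose enterprise array (hypothetical figures)
enterprise = unit_cost_per_tb_month(
    capex=500_000, opex_per_year=100_000, years=4,
    raw_tb=200, usable_ratio=0.6)

# A scale-out commodity tier sized for big data (also hypothetical);
# usable_ratio of ~0.33 reflects, e.g., 3x replication in HDFS
commodity = unit_cost_per_tb_month(
    capex=300_000, opex_per_year=60_000, years=4,
    raw_tb=600, usable_ratio=0.33)

print(f"enterprise: ${enterprise:.2f}/TB/month")
print(f"commodity:  ${commodity:.2f}/TB/month")
```

Even with replication overhead working against it, the commodity tier in this made-up example comes out cheaper per usable TB, which is the kind of comparison the business-value discussion above should drive.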
There are several older blogs on this topic in the HDS blog library.
In my next blog entry, I will share some ideas around a “total cost ratio” that may be helpful as you start technical and operational plans to build and deploy big data infrastructures.