But Which Big Data Again?
by Michael Hay on Jan 4, 2012
As I have mentioned before, there is more to the Big Data story than Data Warehousing. Let me state my conclusion first and then back my way into the “why”.
I would say that the next tool in the arsenal of any Big Data question is Search!
However, the big “S” Search that I’m talking about comes before any analytic query across data residing in a data mart, Key Value Store, Columnar Data Store, or any other NoSQL (not only SQL) system. Since, in the era of the big bang of Data, the supermajority of data is potentially exabytes in scale and a mix of structured, unstructured, and semi-structured types, I argue that this pre-Search may indeed be the most important of all.
In his post, Philip Russom talks about this very point: an early step in the overall analytic process, which he calls “Discovery Analytics,” and which comes before the institutionalization phase that requires formal ETL to place the data into a DWH or NoSQL store. This is not dissimilar to the early phases of eDiscovery, which include a kind of raw search across mounds of content. Results from this search are then passed to a case management tool for further refinement and analysis. This Discovery Analytic process, to use Philip’s term, identifies the insightful diamonds in the rough which can literally transform, refine, revolutionize, or save an enterprise. Without this phase we are left with no seed to initiate a longer-term or deep and recurring analytic process, the kind that Mr. Russom dubs institutionalized.
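To make the idea of a raw, pre-ETL search a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not anything taken from Philip’s post or from a particular product: the document ids, the sample text, and the function names (`tokenize`, `build_index`, `search`) are all hypothetical. It builds a tiny inverted index over a mixed pile of content and returns the items that match every query term, which is the kind of candidate set one might then hand to a deeper analytic process.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every query term (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits

# Hypothetical raw content of mixed types; nothing has been through ETL yet.
documents = {
    "email-17": "Customer reports pump failure after firmware update",
    "log-03": "WARN pump pressure low; firmware=2.1",
    "memo-42": "Quarterly sales review for the pump division",
}

index = build_index(documents)
candidates = search(index, "pump firmware")
print(sorted(candidates))  # ['email-17', 'log-03']
```

A real deployment would of course need ranking, security trimming, and an index that scales far beyond memory, but even this toy version shows the point: the search runs directly over the content as it is, with no schema or warehouse in place first.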
My worry is that the industry is largely leaving Search, or Discovery Analytics, out of the general discussions surrounding Big Data. Instead there appears to be a fascination with NoSQL data stores, feeding Hadoop, releasing your own version of Hadoop, evolving BI tools to handle Big Data, etc. Perhaps this is because Search is not trendy enough to warrant hype and excitement, but I suppose if we modify the name to “Discovery Analytics” things could change.
Rest assured that worrying about Search within the enterprise can yield real and tangible results beyond Big Data. In fact, Forrester, at least, stated that as of 2009 information workers spent almost half a day a week merely finding things inside an enterprise. To me, this means that if enterprises and the vendors who supply them focus on Search as Discovery Analytics, we could improve the lives of everyday users and put in the rebar needed to pave the path towards managing Big Data.
Furthermore, I think that an added and positive consequence of focusing on Search is the real potential to start the democratization of the Data Scientist. In my humble opinion, this cannot happen soon enough, so that the role does not become entrenched in an almost ivory-tower-esque way throughout the industry.
Here’s to a Big-Data-verse for the people, of the people, and by the people.