Big Data? Big Deal!
by Frank Wilkinson on Oct 19, 2011
There have been many articles, opinions and positions written about the big data phenomenon in the past year, and if you were not confused before, you may now be trying to decipher its impact on your organization and your business. There is no doubt about what big data implies, or about its effect on business and beyond the data center. It is an important milestone as we enter the next phase of the IT era, as we look not only to the data created by business applications, databases, machine generated data, email and file data, but also to social media generated data and business productivity tools (such as business social media applications).
They all have their impact on how we run our businesses, and that is the easy part to discern from big data issues. What about insight? How will we better utilize our data objects, object stores and their associated metadata to drive real opportunities and insight? This was part of the premise of business intelligence solutions, and what they were going to deliver, right?
Not so much!
Seriously Big Data
There are many reasons to take big data issues seriously, not to mention how best to manage the data, integrate it, leverage it, and back it all up. Those are challenges in themselves. For me the greater issue is: how will we interact with the data and make decisions based upon factors that can have an immediate impact on our business? Sure, integration, mobility, backup and exponential data growth are important, and they do relate to how we can gain better insight for making more accurate business decisions, but the fact is, if we have not figured out how to manage the data growth by now, we are in serious trouble. Data consolidation, converged infrastructures and scalable architectures are the beginning of solving how best to approach big data impacts, but what do you do with the data, how will you leverage its insights, and what will you use to extract insight from the data?
Data mashup dashboards and business dashboards are not necessarily new concepts, and while some have made an impact on the market, none of them have gone beyond basic information without leaving you feeling like you forgot something (like that feeling you get when you leave the house in the morning).
Analytics are becoming more relevant for unstructured data, just as they did for structured data. This is such a big market for enterprise vendors such as IBM, EMC, NetApp and Hitachi that we are all hard at work building better solutions to help this effort. But that does not mean all solutions will be created equal, nor does it mean that they will offer identical or similar approaches. What will ultimately decide between them is end user adoption, usability and customization.
The question I often ask myself is: “What is really needed in an architecture that will lend itself to providing greater clarity and insight into the data?” This is a complex problem, and one with many possible answers. I believe that we are facing a dynamic shift in how solutions will be designed in the future, with a heavier emphasis on truly understanding how businesses run and keying in on the requirements to help business make more intelligent decisions. Throughout the past twenty years we have focused on helping to improve processes, IO, smarter applications and the underlying hardware technologies. I feel, however, the next step is determining the best way to marry hardware and software functions more tightly with more internal hand-off. As an example of this, Michael Hay has written a post describing the essence of autopoiesis, the marriage and tighter integration between hardware and software functions. This is a start. But how?
There is no doubt that our systems and solutions have become much more intelligent, as well as more complex, but at the same time they have become less dynamically integrated, with no common connector to share information in a truly usable way. Such a connector could be leveraged for greater insight into what is happening within the systems and applications, and could report data into a business dashboard (which can be used as the single pane of glass view of the data and its relevance to the business).
What Are We Searching For? What Will We Discover?
If I want to understand a particular event, such as a sales engagement, what would I like to know about it? Is there information already residing within my data stores? What type of data? Could I also have instant access to social media data, and what kind of insight could I gain from it to assist my decisions going forward?
What I am talking about is how we can have a true 360 degree view of the data that we have access to, as well as data that we need to have access to. While we are in the midst of a big data revolution, the issue is how we can find what we are searching for when the results are collected from various, and perhaps disparate, data silos and presented in a correlated view. This would enable data to be displayed with relevancy across data streams and data types. As an example: if I perform a search for John Dowe, the results returned should show all of John’s activity related to the constructed query. In this case, the results may show some relevant emails, documents, John’s network or server activity, log file data (he was trying to gain access to a SharePoint site which he does not have access to), some voice mails and perhaps some security surveillance video of him while on the company campus. Let’s add in his social network activities, to see if he has breached corporate security by discussing corporate secrets (yes, Big Brother is watching). The individual pieces of data are not very interesting by themselves, but if we can tie the data together, it can show that John was up to something very nefarious: email correspondence with a competitor, voice mails that captured a conversation between John and a corporate spy, and security footage with audio capturing him in the parking lot exchanging a sealed envelope.
Is that interesting? Absolutely!
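To make the idea a little more concrete, here is a minimal sketch of what a correlated view across silos might look like. Everything in it is an assumption made for illustration: the `Record` type, the `make_connector` and `correlated_search` functions, the silo names and the sample rows are all hypothetical and do not describe any actual product. Each "connector" is simply a function that searches one silo, and the correlated view is the merged result ordered by time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    """One hit from one silo (hypothetical shape, for illustration only)."""
    source: str          # which silo the record came from
    subject: str         # the person the record is about
    timestamp: datetime  # when the activity happened
    summary: str         # short human-readable description


# A connector is just a function that searches a single silo by subject.
Connector = Callable[[str], List[Record]]


def make_connector(source: str, rows: List[tuple]) -> Connector:
    """Build a toy in-memory connector from (subject, iso_time, summary) rows."""
    records = [
        Record(source, subject, datetime.fromisoformat(ts), summary)
        for subject, ts, summary in rows
    ]

    def search(subject: str) -> List[Record]:
        return [r for r in records if r.subject.lower() == subject.lower()]

    return search


def correlated_search(connectors: Dict[str, Connector], subject: str) -> List[Record]:
    """Query every silo and merge the hits into a single timeline."""
    hits: List[Record] = []
    for search in connectors.values():
        hits.extend(search(subject))
    return sorted(hits, key=lambda r: r.timestamp)


# Made-up sample data standing in for three separate silos.
connectors = {
    "email": make_connector("email", [
        ("John Dowe", "2011-10-03T09:15:00", "Message sent to a competitor's domain"),
        ("Jane Roe", "2011-10-03T10:00:00", "Weekly status report"),
    ]),
    "logs": make_connector("logs", [
        ("John Dowe", "2011-10-03T08:50:00", "Denied access to a SharePoint site"),
    ]),
    "voicemail": make_connector("voicemail", [
        ("John Dowe", "2011-10-04T17:30:00", "Voice mail from an unknown caller"),
    ]),
}

timeline = correlated_search(connectors, "John Dowe")
for record in timeline:
    print(f"{record.timestamp:%Y-%m-%d %H:%M}  [{record.source}]  {record.summary}")
```

The point of the sketch is only that the silos stay separate while the view is unified. A real system would need far more than a name match and a sort, such as identity resolution, access control and relevance ranking.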
How Do We Get There?
The Search is NOT over!
Search is a term which is too widely used and not fully understood for its potential abilities. Whether you are searching data as part of an eDiscovery process, for corporate governance or simply to find relevance around a certain subject, search is the tool we want to leverage, and its results are what we want to gain insight from and base our decisions on. Search is just the beginning, but to be fair, it’s just search, and what we need is more data and data types to help us see a true picture of what we are looking at.
Starry Night Over the Rhône by Van Gogh is a beautiful painting depicting a wondrous night sky. But what if you could only see the Rhône river? What would we know about the rest of the painting? Would we know that the artist wanted us to know that it is evening? Would we know that there are other aspects and objects? Of course not; we would only see what we have access to. Much like search and discovery of data, we only know what we know. Nothing more.
There are often discussions around what it would take to get there, and the answer I hear quite often is that it is too hard and too costly. To get to where business needs us to be, we need to think about how we can develop THE NEXT smarter architectures and infrastructures, leveraging open source solutions and common connectors while also leveraging PCIe, smarter controllers, FPGAs and SSDs, and embedding software functions more closely to the data streams.
Sounds hard, right? Yes, but not impossible. We have spent the last two decades adding more and more complexity and infrastructure to handle the large amounts of data we generate, but we fell short of true integration across the data center and business applications. I do agree that there are technical obstacles to overcome in order to have better integration, such as how we can push search functions and their associated IO down to the hardware layer to minimize the IO across the infrastructure. This is where HDS shines, as we have the best scientists, engineers and resources working on and solving these problems. Let’s not forget that Hitachi develops some of the world’s best technology solutions and core IP that is utilized in almost every facet of life and is leveraged in some of the technology you have in your data center. We know a thing or two about taking the overly complex and making it simple.
While I cannot go into great detail about our current research and development efforts, rest assured that we have been working to decipher the technologies necessary to bring THE NEXT, whatever it may be.
Check back soon for my next blog, which will discuss next generation discovery needs and data correlation.


