by Frank Wilkinson on Sep 26, 2011
This is my first official blog with Hitachi Data Systems. I started with HDS on of all days, Valentines Day 2011! This reminded me of what I love: of course my wife and children, but also what I have chosen as a career. I mean, who can’t get excited everyday when you have the unique opportunity to try and make the world (or at least technology) a better place?
Some background on me: I have been in the technology industry for over 15 years. I am a Master Architect, both hardware and software, and have worked in sales, pre-sales and strategy roles for most of my career.
For the past 10+ years my main focus has been around search and eDiscovery, and I have had the pleasure to work with some of the industry’s top thought leaders and developers, who actually created this space for archiving and search in 1999. I have helped create this market, and along with some others, we have watched our “child” grow into what it is today, which is far bigger than most thought it would be: hungry for better, faster, richer search capabilities and faster insight to data. I have worked for some of the largest providers of search and eDiscovery solutions and it has been a joy ride all the way.
Like I said, I started on Valentines Day of this year and I think it is apropos, that I have the fortunate opportunity to work for the best IT Solutions company in the world! I get to do what I love and that is to create new concepts and technology, and bring new ideas for Content Cloud solutions for HDS. Who wouldn’t be excited?!
Speaking of exciting…
Big Data Problems
IT is not a mystery in that at some point we were going to run into issues of having too much data which led to too much overly complex architectures, networking latency issues and backup issues—just to name a few. The reality is this is not a new problem, but one that has always existed; just look back at the first Disk Pak that you bought, 10mb and we thought that was enough capacity. When that space ran out we bought a larger one, and thought that was going to be enough capacity! We simply added capacity as needed and along with that we grew our complexities for backup, recovery, replication, networking, etc. You get the picture.
As the decades flew by we had grown to the point of an information overload and grew data beyond our wildest imaginations. While archiving content gave us some relief for email and file data, it also added an extra capability that allowed the ingest data to be indexed, making the data searchable. This was the beginning of the next era for search and eDiscovery.
Fast forward a few years and now we see that there are hundreds of companies who offer search and archiving solutions—perhaps you have one of those solutions in your enterprise today.
I am not telling you something that you don’t already know, but I will tell you that Big Data impacts your ability for greater insight to your data.
Why do we keep data?
Why don’t we expunge that data when its usefulness expires?
Well, we know that data comes in all types and forms, and while some have a high relevance to your business—such as a financial record or contract–we also keep data for historical reference.
So What’s The Issue Here?
The reality is that we have tired to address the big data issue with options like archiving, data de-duplication, application retirement, consolidation, file planning, etc. Please don’t get me wrong, we need these technologies in order to hold the Big Data beast at bay, and these technologies also offer the baseline for building the next generation architectures. Since data has grown exponentially over the past several years, we have tired our best to contain data in its place while trying to adjust our architectures for the next great thing.
So Where Are We Today?
We have complicated and rigid architectures which may not lend themselves to take advantage of Cloud solutions. We have complicated applications and integration issues. We have a ton of meta-data, objects and even some archiving solutions in place—perhaps too many and not easily integrated together. We’ll even throw in some business intelligence solutions.
So What Do We Have?
A big mess!
Maybe it’s just me, and perhaps I see things from too simplistic a view point, but the data we create should serve the business and allow us to make better business decisions that react to the markets with more agility. As Michael Hay and my friends over at the Techno-Musings blog preach, data should not hold us hostage and cause such pains. To me, that is what is important: how can we make business’ run better and more efficiently while reducing risk and reducing the size complexity of IT, while maximizing new technology to gain better insight to the data? This is what I love, and it is my passion (just ask anyone who knows me).
I do have a point to all of this, I promise!
The challenge, as I see it, is not giving you more solutions and adding more complexities, as some companies are trying to sell you, but rather the opposite. How can we look to technology to help solve some basic issues with regards to Big Data?
We have all been promised at one time or another that technology will unlock the value in IT. Well, I am here to tell you that we are almost there, but we still have work to do.
While there are many challenges to address, there are a few which make it to the top of the list:
- Objects and Meta-data: In order for greater insight, we know that we need to do a much better job at exploring and expanding meta-data capabilities. And while we are at it, a better way to standardize meta-data abstraction and find relationships between dissociative meta data.
- Search: we need to think bigger than just search. Analytics need to be more tightly integrated with the data and meta-data. I call it “Content Analytics,” since that is really what we need. If all the data and its associated meta-data can be made to be more intelligent—and yes we are talking way beyond meta-data tagging here, and yes I am leaving out all the other parts to this like (data dictionaries, indices, BI, etc.) because they are understood, at least for now—then we will be able to provide a much richer set of data and meta-data. This could lend itself to a more refined result, giving us greater insight. But how do we get there?
- Open Frameworks and API’s: If we really want to think about the impact that big data has today and its strong hold on our data centers and architecture, then we need to think of a better way for data to communicate with business’. Part of this can be addressed with the adoption and implementation of such open source solutions like OpenSearch, which allows for the sharing of data through a common and structured format for data to be shared and even extended with formats like RSS and Atom just to name a few.
I Told You I Had A Point…..Here It Is!
When I look at the different variants and the many ways in which meta-data is utilized today, and how we are just learning how to maximize its potential, we are still tied to the old architectures, and thus tied to a less spectacular way to gain insight of our data. Being able to leverage that analysis will help us make better business decisions. I am not talking about Business Intelligence solutions, although I do ponder that perhaps if data and its meta-data were able to leverage a common open source set of API’s, then we may have something here. Perhaps we could then be the well for all intelligent meta-data and be the providers and or helpers who other solutions can leverage for meta-data analytics. In order to shrink our architectures and squeeze out more benefits from our data, we need to think about a new approach to meta-data management and what we could expect from next generation solutions.
In Michael Hay’s recent blog post Interacting with Cloud Stores, Michael articulates precisely what is needed around a common set of open sourced APIs, which can be leveraged throughout the enterprise across all applications, storage, data objects and meta-data in order to get us to be able to link and share meta-data across not only the enterprise, but with external meta-data from social media networks (and even from public cloud solutions). Meta-data is the Holy Grail, and the more we can leverage, embed unique markers and add intelligence into our meta-data, the more we can begin to reap the benefits of clearer insight to that data.
As you can probably tell, my blog will take on the challenge of looking at Big Data in many different ways, but most importantly around Content Cloud, Search and eDiscovery.
I look forward to your thoughts and comments!