Big Data Variety
by Hu Yoshida on May 14, 2012
In past posts we talked about Big Data Volume and Velocity requirements and how they could be addressed with Hitachi Data Systems block, file and content storage. Today we will be looking at big data variety.
The reason that big data is getting such attention today is the greater variety of new data types that are available. This data is being generated by many new sources, like click streams, smart meters, smart phones, RFID, NFC (Near Field Communication), etc. Almost every new piece of equipment has sensors built in that can transmit information. In some verticals the increasing use of tools creates many different types of data. A stay in the hospital may involve x-rays, CT scans, MRIs, sonograms, electrocardiograms, PET scans and other monitoring tools for patient care, pharmacy and billing. This variety of data provides more information to solve a particular problem or provide better service. The trick is to capture these different types of data in a way that makes it easy to correlate the information they contain.
In order to harness the power of this big data variety, we must be able to virtualize the data from the application that created it so that it can be used with other applications. How do you virtualize the data from the application? If you simply separate the data from the application, it is just a bunch of bits unless you put that data into a container with the metadata that describes the data content and the policies that govern it. Once you do this, you have created an object, which is self-describing and is not dependent on the application that created it. There are many ways to create an object. Here are some examples of how objects are created and stored within Hitachi Content Platform (HCP).
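To make the idea of a self-describing object concrete, here is a minimal sketch in Python. It is not HCP's API or object format; the class name `ContentObject`, its fields and the sample values are all hypothetical, chosen only to show raw bits packaged together with descriptive metadata and governing policies.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContentObject:
    """Hypothetical container: raw bits plus what describes and governs them."""

    data: bytes                                    # the bits, as the application wrote them
    metadata: dict = field(default_factory=dict)   # describes the content
    policies: dict = field(default_factory=dict)   # governs retention, access, etc.

    def fingerprint(self) -> str:
        # A content hash lets any application verify the bits without the creator.
        return hashlib.sha256(self.data).hexdigest()

    def describe(self) -> str:
        # Built only from metadata and size; the payload is not interpreted.
        kind = self.metadata.get("modality", "unknown")
        return f"{kind} object, {len(self.data)} bytes"


# Illustrative values only; no real patient data or HCP policy names.
scan = ContentObject(
    data=b"\x00\x01\x02\x03",
    metadata={"modality": "CT scan", "patient_id": "P-001"},
    policies={"retention_years": 7},
)
print(scan.describe())
```

Because the description and the policies travel with the bits, a second application can work out what the object is without knowing anything about the tool that produced it.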
HCP is a multi-tenant content or object store that can store a variety of data using standard protocols like NFS or HTTP in the same virtualized storage pool, which can scale to petabytes and billions of objects. HCP can store content from different data owners together in the same object pool, but assign each data owner its own secure tenant space so that unauthorized users cannot access it. However, with the right permissions, a user can do a content-aware search across all the modalities of data. A good example of this is in healthcare with the use of Hitachi Clinical Repository (HCR).
Here you see a variety of data sources or modalities within the hospital. Hitachi Data Ingestors (HDI) at remote clinics can ingest data into a central HCP, where each source is kept as a separate tenant. An authorized patient care provider can access all the clinical tests and evaluations done on a patient to coordinate his care. There may be others who do not need to see the patient’s personal data, but may be authorized to do an analysis on the effectiveness of procedures or medication. All this variety of data can be managed by HCP so that authorized people can see all the data that is relevant to a certain task.
A content platform like HCP that can ingest a variety of data objects is a key enabler for big data variety. Having all the different data objects in the same repository makes it easier to correlate the different varieties of data. Storing the metadata with the data makes it possible to do a content-aware query or search against the metadata without having to access the actual data. If each variety of data were stored in its own repository, you would have to process or query each repository separately or manually merge the data from each repository, which may be impractical if you are also concerned with the volume and velocity of big data.
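The benefit of a single repository with metadata stored alongside the data can be illustrated with a small Python sketch. This is an assumption-laden toy, not how HCP implements search: the `Record` type, the `search` function, the tenant names and the metadata keys are all hypothetical. It shows one query running across several tenants, filtered by the caller's permissions, and matching on metadata alone.

```python
from dataclasses import dataclass


@dataclass
class Record:
    tenant: str       # hypothetical tenant (data owner) name
    metadata: dict    # descriptive fields stored with the object
    payload: bytes    # the actual data; never read by search()


def search(records, allowed_tenants, **criteria):
    """Return records in permitted tenants whose metadata matches every criterion."""
    return [
        r for r in records
        if r.tenant in allowed_tenants
        and all(r.metadata.get(key) == value for key, value in criteria.items())
    ]


# One shared repository holding several modalities from different tenants.
repository = [
    Record("radiology", {"patient_id": "P-001", "modality": "x-ray"}, b"..."),
    Record("cardiology", {"patient_id": "P-001", "modality": "electrocardiogram"}, b"..."),
    Record("radiology", {"patient_id": "P-002", "modality": "MRI"}, b"..."),
]

# A care provider authorized for both tenants sees every test for the patient.
hits = search(repository, {"radiology", "cardiology"}, patient_id="P-001")
modalities = sorted(r.metadata["modality"] for r in hits)

# A user authorized for one tenant sees only that tenant's objects.
restricted = search(repository, {"radiology"}, patient_id="P-001")
```

With separate repositories, the same question would need one query per repository followed by a manual merge of the results, which is the extra work the single object pool avoids.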