Chewing Metadata…
November 22nd, 2007
Yes, I have seen and heard a lot about Green IT lately, especially after two weeks of interesting SNW conferences in Dallas and Frankfurt last month, including about the new SNIA Green Storage Initiative. But I won’t mention the G word in this note… especially since my fellow blogger Hu already covers this very eloquently.
This time I’d like to look at some of the challenges the world of data storage will face with so-called metadata and its use for data and information management. Some of the publications and related work associated with the Semantic Conference earlier this year got me thinking about what it could mean for data storage and data management. Across many IT layers, from applications to management systems, IT tools build up a huge amount of reference data and contextual information about the data that is created, accessed, transported, shared or protected. ILM and DLM requirements led the storage industry to conclude that it would be very useful to organize the use of, and access to, such metadata. Doing so makes it possible to optimize the management of IT assets and data assets according to policies tied to business and IT operational efficiency, including managing the changing value of data. This is a great development, as storage solutions have plenty of potential to optimize where, when and how data should be stored; just think of thin provisioning, tiered storage or data de-duplication as examples.
Handling metadata is so important that the storage industry is looking at easing access to it, especially in environments where it matters a lot, such as fixed-content storage or archiving. The SNIA XAM initiative is a typical example of a future industry standard meant to enable open access to both data and metadata without depending on a specific application framework. So as storage solutions get closer to information management requirements (and here I mean that storage technologies and standards will give access to actionable resources and services that help achieve information management objectives), it is very likely that the amount of metadata will increase, starting with what can be generated by the application that created the data in the first place. Alongside this growth, we can also anticipate a diversification of metadata that makes it richer, more relevant and more dynamic. Multi-tenancy metadata is likely to become commonplace.
So if we can foresee that access to metadata will become easier, will the use of metadata become any easier too? This is a genuine question, since metadata interpretation is very much application dependent today. Of course, “basic” details such as creation date and time, last modification date and time, data type and so on can be leveraged outside the application context. More contextual references, however, could be harder to use. As an example, take a set of CRM records that have been tagged as “critical” because the CRM application has determined that the status of those customers required such tagging. While the context of that decision is obvious within the CRM application, it is much less evident outside it. The significance of the tag “critical” is also a difficult notion to export outside the application context. Is “critical” a level of danger or a sales process indicator?
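To make the ambiguity concrete, here is a minimal Python sketch. The vocabularies, URIs and meanings are all hypothetical, invented purely for illustration; they do not come from any real CRM product or from the XAM work. It only shows that a bare tag leaves a storage policy with several candidate meanings, whereas a tag qualified by the vocabulary it came from has exactly one.

```python
# Hypothetical vocabularies: each application defines what "critical" means.
VOCABULARIES = {
    "http://example.org/crm/status#": {
        "critical": "customer account at risk; sales follow-up required",
    },
    "http://example.org/safety/level#": {
        "critical": "hazard level requiring immediate containment",
    },
}


def interpret(tag, namespace=None):
    """Return the meanings a tag can have, given an optional namespace."""
    if namespace is not None:
        meaning = VOCABULARIES.get(namespace, {}).get(tag)
        return [meaning] if meaning is not None else []
    # Without a namespace, every vocabulary that defines the tag is a candidate.
    return [terms[tag] for terms in VOCABULARIES.values() if tag in terms]


bare = interpret("critical")
qualified = interpret("critical", "http://example.org/crm/status#")
print(len(bare), "possible meanings without context")
print(len(qualified), "meaning with the CRM namespace:", qualified[0])
```

A storage-side policy engine in this sketch would have to refuse to act on the bare tag, or ask a human, which is exactly the automation problem at stake.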
We can see that it becomes relatively difficult to automate data management tasks based on metadata that is hard to interpret or that requires human intervention. With the anticipated explosion of metadata and metadata types, it is not difficult to believe that handling contextual metadata will become increasingly complex and will require a different approach if storage solutions are to make sense of it.
Several years ago, the architects of the Web faced a similar situation. The Web is by definition a highly distributed environment where data is disseminated in many ways and formats. It is therefore challenging for any web user, human or machine, to combine different sets of data in order to (re)construct or determine a set of information matching specified objectives. The human brain, however, is very quick at performing these tasks, combining data from different sources even when different terminologies are used. What could be done to help computers and applications perform in analogous ways on the web? In addition to the traditional parameters used to call services, a notion of semantics could be added. The Semantic Web project started by W3C in the 1990s helped to specify how metadata should be constructed so that it is “portable” and can be interpreted directly by the computer: a machine-processable format comprising a common naming scheme (URIs), a common data model for expressing metadata (RDF), including where to find such metadata on the web, and a common vocabulary (ontologies). The W3C standards RDF (and RDFS) and OWL have thus helped position semantic technologies as a way to extend the Web into a metadata-based infrastructure that supports reasoning. The semantic field has since been extended to address larger notions such as interoperable content, executable knowledge and semantic infrastructure. Web 2.0 applications have also leveraged such semantic concepts.
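As a rough illustration of the RDF idea, where every statement is a (subject, predicate, object) triple and every name is a URI, here is a small Python sketch. It uses plain tuples rather than a real RDF library, and every URI, predicate and tier name in it is hypothetical. The point is only that once the meaning of a status is itself published as a statement, a program outside the CRM application can follow the links without any built-in knowledge of the CRM.

```python
# Hypothetical namespaces, for illustration only.
CRM = "http://example.org/crm/"
STORAGE = "http://example.org/storage/"

# Each statement is a (subject, predicate, object) triple of URIs.
triples = {
    (CRM + "record/1001", CRM + "status", CRM + "status#critical"),
    (CRM + "record/1002", CRM + "status", CRM + "status#normal"),
    # The vocabulary itself says what each status implies for storage.
    (CRM + "status#critical", STORAGE + "suggestedTier", STORAGE + "tier/1"),
    (CRM + "status#normal", STORAGE + "suggestedTier", STORAGE + "tier/3"),
}


def objects(subject, predicate):
    """Return all objects of statements matching subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def suggested_tiers(record):
    """Follow record -> status -> suggestedTier across the triples."""
    tiers = set()
    for status in objects(record, CRM + "status"):
        tiers |= objects(status, STORAGE + "suggestedTier")
    return tiers


print(suggested_tiers(CRM + "record/1001"))
```

A real deployment would more likely rely on an RDF store and an ontology language such as RDFS or OWL instead of a hand-written lookup like this one.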
Could semantic technologies enhance data management solutions? Future trends for the storage industry describe the next generation of storage solutions as more autonomous, self-healing, and capable of reacting to, or even anticipating, the rapid evolution of the IT infrastructure in response to business changes. Such “intelligence” will have to include a significant amount of contextual analysis based on information managed by the different layers involved in IT operations, and that information must be accessible to and interpretable by management frameworks, including those driving storage services. So yes, it looks like semantic technologies could be leveraged to help with metadata handling for data storage solutions as well as ILM/DLM best practices. This is not an obvious step forward, as many hurdles exist, including open questions around metadata formats and the need for a data model. But addressing complex metadata contexts will become necessary to achieve a higher level of management intelligence, including for sophisticated storage services. Maybe even to further the greening of IT… Ooops! No G word, I said.

