The changing enterprise data profile - IDC
by Hu Yoshida on Dec 9, 2007
Rick Villars of IDC presented a study on the changing enterprise data profile at our APAC Storage Summit in Ho Chi Minh City last week. He showed a chart of IDC’s analysis of worldwide enterprise disk consumption by data type from 2005 through 2011.
Note that IDC is now using exabytes (EB) as the unit of measure for storage capacity. The chart shows only the external disk storage capacity in the enterprise. It does not include the internal storage capacity that resides in workstations and PCs, nor the enterprise data that resides on other media like tape or optical. Each of these categories could equal or exceed the amount shown in the chart.
IDC’s chart shows traditional structured, transactional data growing at a 32.3% compound annual growth rate over this period. On top of that, Rick showed the explosive growth of file-based, unstructured data at 63.7%, essentially twice the growth rate of structured data. IDC predicts that by 2010 unstructured data will surpass the amount of structured data stored in the enterprise. This means that from 2010 on, the majority of data will be acquired through file services.
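To make the compounding concrete, here is a minimal Python sketch of what those two growth rates imply. Only the 32.3% and 63.7% rates and the 2005 through 2011 period come from the IDC chart. The function names are illustrative, and the assumption that the crossover lands exactly five years after 2005 is mine, since IDC only says "by 2010."

```python
# Compound annual growth rates quoted from the IDC chart.
STRUCTURED_CAGR = 0.323
UNSTRUCTURED_CAGR = 0.637


def growth_multiple(cagr: float, years: int) -> float:
    """Factor by which capacity grows after compounding `cagr` for `years`."""
    return (1.0 + cagr) ** years


def implied_start_ratio(fast: float, slow: float, years: int) -> float:
    """Starting fast/slow capacity ratio that reaches 1.0 after `years`."""
    return growth_multiple(slow, years) / growth_multiple(fast, years)


# Growth over the six years from 2005 to 2011.
structured_x = growth_multiple(STRUCTURED_CAGR, 6)
unstructured_x = growth_multiple(UNSTRUCTURED_CAGR, 6)

# Assumption (not from IDC): the two curves cross exactly five years
# after 2005, that is, in 2010.
ratio_2005 = implied_start_ratio(UNSTRUCTURED_CAGR, STRUCTURED_CAGR, 5)

print(f"structured:   {structured_x:.1f}x over six years")
print(f"unstructured: {unstructured_x:.1f}x over six years")
print(f"implied 2005 unstructured/structured ratio: {ratio_2005:.2f}")
```

Under these assumptions, structured data grows a little more than fivefold while unstructured data grows about nineteenfold, and unstructured capacity would have started at roughly a third of structured capacity in 2005. That starting ratio is inferred from the crossover year, not read from the chart.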
On top of this, IDC added a category for replicated data, which includes backup, discovery, archiving, business analysis, and, I would assume, business continuance. That data is projected to grow at 43.9%, driven by compliance, the need to retain data longer, real-time mining of data, and the need for multi-data-center business continuance. Replicated data is projected to surpass traditional data by 2009. The size of this category surprised me, since I would have assumed it to be much higher. It is not uncommon today to see 3 to 20 or more copies of production data for backup cycles, business continuance, development and test, and data mining. Usually, replicated data is not considered a separate form of data, since it is a copy of unstructured or structured data. The file-based portion of replicated data will grow in proportion to the growth in unstructured data, and may grow even more as structured data is replicated as archived or nearline data.
Overall, enterprise storage will grow tenfold over the six years from 2005 to 2011. This projection includes some assumptions about improvements in operational efficiency and the deployment of new technologies like thin provisioning, deduplication, and Virtualization 2.0, which Rick defined as storage virtualization with nondisruptive data mobility: the ability to move and migrate data without disruption to the application. (Storage Virtualization 1.0 would be volume virtualization. Whether the volume virtualization resided in the server or in the network, the movement of data was not virtualized and required a disruptive read/write across the server or network virtualization engine. The USP V not only virtualizes volumes, and the capacity within the volumes with thin provisioning, it also virtualizes data movement in its global cache and masks it from the network and the application servers, which meets the IDC definition of Virtualization 2.0.)
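As a rough check on the tenfold figure, the sketch below converts a total growth multiple into the compound annual growth rate it implies. The function name is illustrative and the calculation is mine, not part of the IDC study. It assumes the six-year span from 2005 to 2011 stated above.

```python
def implied_cagr(multiple: float, years: int) -> float:
    """Annual growth rate that compounds to `multiple` over `years`."""
    return multiple ** (1.0 / years) - 1.0


# Tenfold growth over the six years from 2005 to 2011.
overall_cagr = implied_cagr(10.0, 6)

print(f"implied overall growth rate: {overall_cagr:.1%}")
```

The result is a little under 47% per year, which sits between the 32.3% rate for structured data and the 63.7% rate for unstructured data, as one would expect for a blended total.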
What is clear from this study is the need for high-performance, massively scalable NAS systems to address the explosion in file-based, unstructured data. In order to scale to petabytes, these systems need to support Virtualization 2.0 as IDC defines it. They must be able to provide data mobility without changing the mount point, move data across tiers of storage based on policies, and replicate and migrate data for business continuance and technology refresh. Instead of requiring separate storage arrays for NAS and Active Archive, they must be able to leverage the same storage arrays that we use for structured data, with common management and policy engines.
To this end, HDS recently announced enhancements to our high-performance HNAS portfolio. Building on our successful partnership with BlueArc, we announced two models of the HNAS 2000, with significant enhancements in clustering, data protection, virtual servers, and integrated management. The HNAS 2000 Nearline scales to 2 PB and can be used to address the replicated segment of data growth, providing economical remote replication, a cost-effective archive, and tape replacement. The base HNAS 2000 is an entry-level HNAS that is fully upgradeable to our HNAS 2200 and 2100. All the HNAS systems have been enhanced with deeper integration with the HiCommand management software, which also manages our USP V and AMS storage platforms. These systems are designed to use our USP V or AMS storage systems as their repository and to leverage the block virtualization services in those products along with HNAS file services.
Get ready for the coming surge in unstructured data with scalable, high-performance, file-based systems that can integrate with your existing investment in traditional block storage systems.
Comments (3)
[...] Storage requirements are exploding causing more and more small and medium businesses to employ creative solutions to stem the tide. In December, Hu Yoshida, CTO of Hitachi Data Systems (HDS), posted a blog entry about projected enterprise data growth. The entire posting is worth a read, but the included chart really paints the picture well: [...]
[...] solutions to stem the tide. In December, Hu Yoshida, CTO of Hitachi Data Systems (HDS), posted a blog entry about projected enterprise data growth. The entire posting is worth a read, but the included chart [...]
[...] noting the torrid growth of data for a number of years. For example, in the past, research firm IDC has said that enterprise data storage would grow tenfold from 2005-2011, with transactional data growing at [...]