Storage Data Protection Efficiencies
by Hu Yoshida on Feb 24, 2012
Data protection has two roles. One is to protect the data itself; the other is to protect the application that uses that data.
Data must be protected from loss, corruption, denial of access, and unauthorized access.

- Protection from loss combines techniques such as erasure codes (of which RAID is a common example) with mirroring, snapshots, backup, replication, and archive.
- Hashing, encryption, and point-in-time copies are ways to detect, protect against, and recover from corruption.
- Denial of access can be prevented by partitioning common shared resources (storage networks, storage ports, spindles, and cache) so that no single user can dominate a resource at the expense of the others.
- Unauthorized access can be prevented by security measures such as authentication, authorization, passwords, encryption, and scrubbing.

This is not an exhaustive list of exposures and remedies, but here are some considerations for data protection efficiencies.
Backup is probably the most inefficient way to protect data. Backing up data is disruptive to the application unless a point-in-time copy is taken so that the backup can be made from the copy. Even then, there is a slight disruption while the application syncs its in-flight data to the point in time before the copy is made. Point-in-time copies are made on a periodic basis, so that if the primary storage is lost, or if an application corrupts the data, it can be recovered from a prior point-in-time copy.
This means that several days or weeks of copies need to be kept, just in case. Today these copies are usually kept on disk until the user decides to roll them up and copy them to tape. Processes must be followed to ensure the recycling of the disks and the retention of the tapes, which means maintaining a library of removable media that needs to be refreshed on a regular basis.
If backup is such an inefficient way to protect data, why don’t we just mirror it so that we always have a copy to recover from? That is a great idea for efficiency, except that an update which corrupts or deletes the primary data is immediately reflected in the mirror, leaving a corrupted mirror. Mirrors and backups can also be corrupted by the hardware and software involved in the mirroring or backup process. One way to protect against this type of corruption is to create a hash, checksum, or erasure code for the data as it is stored.
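To make the checksum idea concrete, here is a toy sketch in Python (illustrative only; real storage systems compute checksums in hardware or at the block layer, and the class and method names here are my own invention): a digest is stored with each block on write and re-verified on read, so silent corruption is detected instead of being faithfully mirrored.

```python
import hashlib


class ChecksummedStore:
    """Toy block store that keeps a SHA-256 digest alongside each block."""

    def __init__(self):
        self._blocks = {}  # block_id -> (data, digest)

    def write(self, block_id, data: bytes):
        digest = hashlib.sha256(data).hexdigest()
        self._blocks[block_id] = (data, digest)

    def read(self, block_id) -> bytes:
        data, digest = self._blocks[block_id]
        # Re-hash on read; a mismatch means the data changed under us.
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError(f"corruption detected in block {block_id}")
        return data

    def _corrupt(self, block_id):
        # Simulate bit rot for demonstration purposes.
        data, digest = self._blocks[block_id]
        self._blocks[block_id] = (b"\x00" + data[1:], digest)


store = ChecksummedStore()
store.write("b1", b"payroll record")
assert store.read("b1") == b"payroll record"
store._corrupt("b1")
try:
    store.read("b1")   # raises IOError: the corruption is caught here,
except IOError:        # whereas a mirror would have copied it silently
    pass
```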
There are a number of ways to improve the efficiency of this process, such as copy-on-write, continuous data protection (which is a form of copy-on-write), and deduplication. Thin provisioning also helps eliminate the backup of allocated but unused capacity. However, the best way to make backup more efficient is to eliminate it altogether when the data is not being modified, as is the case for most unstructured data. An example of this is the use of Hitachi Data Ingester (HDI) with Hitachi Content Platform (HCP).
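The copy-on-write mechanism mentioned above can be sketched in a few lines of Python. This is a hypothetical toy model, not any vendor's implementation: a snapshot starts out sharing every block with the live volume, and a block's old contents are copied into the snapshot only the first time that block is overwritten, so unchanged data is never duplicated.

```python
class CowVolume:
    """Toy volume with copy-on-first-write snapshots."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # live block map
        self.snapshots = []         # each snapshot: {block_id: old_data}

    def snapshot(self):
        self.snapshots.append({})   # empty: nothing copied yet
        return len(self.snapshots) - 1

    def write(self, block_id, data):
        # Preserve the pre-write contents for any snapshot that has not
        # yet captured this block (copy on first write).
        for snap in self.snapshots:
            if block_id not in snap:
                snap[block_id] = self.blocks.get(block_id)
        self.blocks[block_id] = data

    def read_snapshot(self, snap_id, block_id):
        snap = self.snapshots[snap_id]
        # Fall through to the live volume for blocks never overwritten.
        return snap[block_id] if block_id in snap else self.blocks.get(block_id)


vol = CowVolume({0: "A", 1: "B"})
s = vol.snapshot()
vol.write(0, "A2")                       # old "A" is copied into the snapshot
assert vol.read_snapshot(s, 0) == "A"    # point-in-time view preserved
assert vol.read_snapshot(s, 1) == "B"    # shared block, never copied
```

A backup taken from the snapshot reads a consistent point-in-time image while the application keeps writing to the live volume.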
Since files are replicated to HCP, there is no need for backup. If a branch office is destroyed, the data can be recovered at the HCP site or from other branch offices that are authorized to access that branch’s data. Objects in HCP are encrypted and single-instance stored to eliminate duplicates. A hash of each object is created on ingestion to prove immutability when it is retrieved.
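Single-instance storage and the ingestion hash work together naturally, since the hash can double as the storage key. Here is a minimal sketch of that pattern in Python (my own illustrative model, not HCP's internal format): duplicate content is stored once, and re-hashing on retrieval proves the object has not been altered.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed object store with single-instance storage."""

    def __init__(self):
        self._objects = {}  # content hash -> bytes (stored once)
        self._names = {}    # object name -> content hash

    def ingest(self, name, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(digest, data)  # duplicates are not re-stored
        self._names[name] = digest
        return digest

    def retrieve(self, name) -> bytes:
        digest = self._names[name]
        data = self._objects[digest]
        # Re-hashing on retrieval proves the content is unchanged.
        assert hashlib.sha256(data).hexdigest() == digest
        return data


store = ObjectStore()
h1 = store.ingest("branch1/report.doc", b"Q4 figures")
h2 = store.ingest("branch2/copy-of-report.doc", b"Q4 figures")
assert h1 == h2                  # duplicate content detected by hash
assert len(store._objects) == 1  # stored only once
assert store.retrieve("branch1/report.doc") == b"Q4 figures"
```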
HCP supports versioning and auto-pruning of content. If an object on HDI changes, a new version is pushed to HCP. HCP maintains the older versions for a customer-specific duration, which is very useful for recovery if the new version is corrupted or deleted.
Once that period elapses, HCP automatically ages out (deletes) the historical version. HCP can support many remote HDI systems in many-to-one and chain topologies on the same physical HCP, but in different tenants for safe multi-tenancy. An HDI system can have read-write access to one HCP tenant and read-only access to other tenants for sharing of information.
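The versioning-with-retention behaviour described above can be modeled in a few lines. This is a simplified sketch under my own assumptions (times are plain integers standing in for days, retention is measured from when each version was written, and the current version is always kept):

```python
from collections import defaultdict


class VersionedStore:
    """Toy versioned store that ages out superseded versions."""

    def __init__(self, retention_days):
        self.retention_days = retention_days
        self.versions = defaultdict(list)  # name -> [(time, data), ...]

    def put(self, name, data, now):
        # Each overwrite appends a new version; older ones become history.
        self.versions[name].append((now, data))

    def prune(self, now):
        # Keep the current version unconditionally; drop superseded
        # versions once they fall outside the retention window.
        for name, vers in self.versions.items():
            current = vers[-1]
            kept = [v for v in vers[:-1] if now - v[0] <= self.retention_days]
            self.versions[name] = kept + [current]

    def history(self, name):
        return [data for _, data in self.versions[name]]


s = VersionedStore(retention_days=30)
s.put("contract.pdf", "v1", now=0)
s.put("contract.pdf", "v2", now=10)   # v1 becomes recoverable history
s.prune(now=20)
assert s.history("contract.pdf") == ["v1", "v2"]  # v1 still in the window
s.prune(now=45)
assert s.history("contract.pdf") == ["v2"]        # v1 aged out past 30 days
```

Within the retention window, a corrupted or deleted "v2" could still be recovered from "v1".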
In addition to HDI, our High Performance NAS (HNAS) can also tier files and stub them out to an HCP. Multiple HDI and HNAS systems can ingest into the same HCP, which provides secure multi-tenancy. HDI and HNAS work with HCP to eliminate backup for unstructured data that comes in via NFS or CIFS.
Unstructured data is primarily write-once, read-many. The majority of data in a database is not changed after it is entered and is rarely accessed after its initial business purpose is served. Yet we spend a great deal of time backing up data that does not change, when all we need are two copies for data protection. If this type of data is ingested into HCP, HCP can automatically copy it to a local or remote HCP, eliminating the need for backup.
HCP can be used with applications like Exchange, SharePoint, and SAP NetWeaver to reduce the working set that needs to be backed up. HCP can also augment backup solutions like Hitachi Data Protection Suite, which can back up Microsoft applications, dedupe the backup, move the backup from disk to tape without the need to rehydrate, and ingest the backup into HCP for long-term retention with the ability to do content-aware recovery.
The flip side of data protection is availability. HCP is designed to service read requests in the face of a number of hardware failures (system, node, disk, network, etc.) while minimizing the amount of capacity that each object consumes.
HCP and HDI provide an extremely efficient data protection and data availability solution, which can eliminate or greatly reduce the need for backup.
Replication is another data protection technique, used for site disasters. Replication can be done over synchronous distances with no data loss, or across asynchronous distances with the loss of some in-flight data, which may be acceptable for a specified recovery point objective. Combinations of synchronous and asynchronous replication may be used to provide out-of-region disaster recovery with no data loss.
Hitachi Universal Replicator’s (HUR) unique journaling feature provides an efficiency for asynchronous data replication that is not available from other vendors. Other asynchronous replication techniques hold the data in the primary site’s cache until it is sent to and acknowledged by the secondary site. If delays occur at the secondary site, or the link is temporarily blocked, data backs up in the primary cache until it “punctures cache” and the replicated data is lost, requiring the replication process to be restarted.
This can be characterized as a push model for replication: the primary site pushes the data without any knowledge of whether the secondary site is ready to receive it or whether the link is still up. With HUR, data to be replicated is stored on disk and a journal entry is sent to the secondary site. Using this journal, the secondary site “pulls” the data from the primary site. If delays occur, the data builds up on the primary site’s disk until the secondary site can catch up using the journal. This makes more efficient use of the link, since it can be configured for less than peak bandwidth. When peaks occur and replication data backs up, the journal can catch up during off-peak periods. This journal-based approach makes asynchronous replication more efficient and more reliable. It is also an efficient way to provide delta resync across multiple out-of-region recovery sites.
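The pull model can be sketched in miniature. This is my own toy illustration of the general technique, not HUR's actual protocol: the primary appends each write to a journal (a deque here, standing in for journal volumes on disk) instead of holding it in cache, the secondary pulls entries at its own pace, and the primary trims the journal only after acknowledgement. A slow link just makes the journal grow rather than overflowing primary cache.

```python
from collections import deque


class Primary:
    def __init__(self):
        self.journal = deque()  # stands in for journal volumes on disk
        self.seq = 0

    def write(self, data):
        self.seq += 1
        self.journal.append((self.seq, data))

    def pull(self, after_seq, limit):
        # Secondary asks for the next entries past what it has applied.
        return [e for e in self.journal if e[0] > after_seq][:limit]

    def ack(self, through_seq):
        # Trim only what the secondary has confirmed applying.
        while self.journal and self.journal[0][0] <= through_seq:
            self.journal.popleft()


class Secondary:
    def __init__(self, primary):
        self.primary = primary
        self.applied = 0
        self.data = []

    def catch_up(self, limit=2):  # `limit` models link bandwidth per cycle
        for seq, data in self.primary.pull(self.applied, limit):
            self.data.append(data)
            self.applied = seq
        self.primary.ack(self.applied)


p = Primary()
s = Secondary(p)
for d in ["w1", "w2", "w3", "w4", "w5"]:
    p.write(d)                 # a burst: the journal absorbs the backlog
s.catch_up()                   # pulls w1, w2 over the limited link
assert len(p.journal) == 3     # acknowledged entries are trimmed
s.catch_up()
s.catch_up()                   # off-peak cycles drain the remaining backlog
assert s.data == ["w1", "w2", "w3", "w4", "w5"]
assert not p.journal
```

Because the sequence numbers record exactly how far each secondary has applied, the same journal can drive delta resync for more than one recovery site.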
In closing, the two major efficiencies for Hitachi data protection and availability include:
- Eliminating or reducing backup through the use of HDI/HCP
- Improving the efficiency of asynchronous replication with the use of HUR and journal technology
I’m interested in hearing your feedback. What do you think?
Comments (4)
Very well written.
[...] for articles on data replication and disaster recovery techniques, I discovered an interesting post by Hu Yoshida, VP and CTO at HDS. In it, Hu describes the benefits of the Hitachi Universal [...]
Hu. Thanks for an excellent blog post. I’ll admit, I had trouble with this statement, however: “Combinations of synchronous and asynchronous replication may be used to provide out of region disaster recovery with no data loss.” Here’s a link to some additional thoughts about zero data loss data protection.
What, in your opinion, is the need for an In-Region Recovery Data Center, if the data can be fully protected within the Primary Data Center? It seems to us that the In-Region Recovery Data Center can be completely eliminated, leaving need for only the Out-of-Region Remote Data Center.
Actually, there has been renewed interest in the regional data center for businesses where RPO and RTO are extremely short, such as in the manufacturing of critical multi-stage components and in financial transactions. During the tsunami in Japan, some manufacturing sites that depended on remote asynchronous recovery were not able to restart their operations, since the plant and the data center that controlled or monitored the process were destroyed at the same time. The asynchronous, out-of-region sites were too far behind to do them any good in recovering their work in process. If they had had a regional site that was in sync with their primary site, they could have recovered much faster. That in-sync regional site could also replicate asynchronously to a remote site for out-of-region recovery with no data loss. There are a number of banks using this approach today to support requirements for out-of-region disaster recovery with no data loss.