How to reduce backup storage requirements by more than 90% without data deduplication
by Rich Vining on Oct 14, 2013
One of the most egregious causes of the copy data problem for many organizations is the common practice of performing full backups every weekend. The architecture of the backup application forces this practice, as it requires a recent full backup data set as the basis for efficient recovery. But each full backup set contains mostly the same files that were backed up in the previous full backup, and the one before that.
Below is a simple example to illustrate this, showing the differences between the common “full + differential”, “full + incremental” and “incremental-forever” backup models. First, the basic definitions of these models.
- Full + differential: copies any new or changed data since the last full backup; a periodic full backup helps to keep the size of the differential set from growing out of control.
- Full + incremental: copies the new or changed data since the last full or incremental backup; a periodic full backup helps to keep the number of incremental sets from growing out of control.
- Incremental-forever: starts with a full backup, then copies only the new and changed data during each backup; it never performs another full backup.
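To make the definitions concrete, here is a minimal sketch of how the daily selection differs between the models. It assumes each file carries a "last modified" day number and that a backup picks up everything modified after a reference backup; the file names and the `changed_since` helper are hypothetical and purely illustrative.

```python
def changed_since(files, reference_day):
    """Return the names of files modified after the backup taken on reference_day."""
    return sorted(name for name, modified in files.items() if modified > reference_day)


# Hypothetical data: the day each file was last modified.
# Day 0 is the weekend full backup; days 1..5 are Monday..Friday.
files = {"a.doc": 0, "b.xls": 1, "c.ppt": 2, "d.txt": 3}

# Wednesday (day 3) backup under each model:
differential = changed_since(files, reference_day=0)  # everything since the last full
incremental = changed_since(files, reference_day=2)   # everything since Tuesday's backup

print(differential)  # ['b.xls', 'c.ppt', 'd.txt']
print(incremental)   # ['d.txt']
```

The only difference between the two daily jobs is the reference point: the differential always compares against the last full backup, while the incremental compares against whatever backup ran most recently.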
The differential backups require more storage and will copy the same files multiple times during the week, but they offer the benefit of faster, more reliable recoveries, since you need to restore only the last full backup set and then the last differential set (a two-step recovery process). However, the size of the differential backup will increase each day until a new full backup is completed. Doing differentials forever would eventually be the same as doing full backups every day.
In comparison, the full + incremental method uses a little less storage, and the daily backups will transfer less data, but recovery can be complicated by needing to restore multiple incremental data sets, in the correct order.
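The recovery difference between the two models can be sketched in a few lines. This is an illustration only, assuming a backup history is a simple list of (label, kind) pairs ordered oldest first; the `restore_chain` function and the labels are hypothetical, not taken from any backup product.

```python
def restore_chain(history, model):
    """Return the backup sets to restore, in order, to recover the latest state.

    history: list of (label, kind) tuples, oldest first; kind is "full" or "daily".
    model:   "differential" or "incremental".
    """
    last_full = max(i for i, (_, kind) in enumerate(history) if kind == "full")
    dailies = [label for label, _ in history[last_full + 1:]]
    chain = [history[last_full][0]]
    if model == "differential":
        chain += dailies[-1:]   # only the most recent differential is needed
    elif model == "incremental":
        chain += dailies        # every incremental, in the order it was taken
    else:
        raise ValueError(f"unknown model: {model}")
    return chain


history = [("Sun-full", "full"), ("Mon", "daily"), ("Tue", "daily"), ("Wed", "daily")]

print(restore_chain(history, "differential"))  # ['Sun-full', 'Wed']
print(restore_chain(history, "incremental"))   # ['Sun-full', 'Mon', 'Tue', 'Wed']
```

By Friday the incremental chain has six sets to apply in sequence, and a problem with any one of them affects everything restored after it.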
The incremental-forever backup solutions on the market track each new file within their repositories and can present a full view of the data from any point in time within the retention period. This enables a one-step recovery process, which is faster and less error-prone than the other models. And of course, this method eliminates the periodic full backups.
Better backup, better recovery
For this example, let’s assume we have a typical business, school or agency that operates 5 days per week, 50 weeks per year. They have 100TB of data and a total data change rate of 50% per year (50TB). This equates to 1% per week (1TB) and 0.2% per day (200GB). They retain their backups for 12 weeks for operational recovery, assuming that data that needs to be retained longer is archived.
The full + differential model copies 200GB on Monday, 400GB on Tuesday, and so on up to 1TB on Friday (3TB across the week), and then copies the full 100TB during the weekend. The full + incremental and the incremental-forever models each copy 200GB per weekday (1TB across the week), but the full + incremental model copies the full 100TB on the weekend while the incremental-forever system takes the weekend off.
Including the initial full backup (100TB), the total backup storage capacity needed for 12 weeks for each model is:
- Full + differential: 1,336TB (1.3PB)
- Full + incremental: 1,312TB (1.3PB)
- Incremental-forever: 112TB (0.1PB)
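For readers who want to check the arithmetic or plug in their own numbers, the totals above can be reproduced with a short script. It follows the example's simplifying assumptions (decimal units where 1TB = 1,000GB, a constant 200GB daily change, every backup set kept for the whole 12 weeks); the variable and function names are just for illustration.

```python
FULL_GB = 100_000        # 100TB of protected data
DAILY_CHANGE_GB = 200    # 0.2% per day
WEEKDAYS = 5
WEEKS = 12


def weekly_gb(model):
    """Data copied in one week (weekdays plus weekend) under the given model."""
    incrementals = DAILY_CHANGE_GB * WEEKDAYS
    if model == "full + differential":
        # Each differential holds everything changed since the last full backup.
        differentials = sum(DAILY_CHANGE_GB * day for day in range(1, WEEKDAYS + 1))
        return differentials + FULL_GB
    if model == "full + incremental":
        return incrementals + FULL_GB
    if model == "incremental-forever":
        return incrementals          # no weekend full backup
    raise ValueError(f"unknown model: {model}")


def total_tb(model):
    """Initial full backup plus 12 weeks of backups, in TB."""
    return (FULL_GB + WEEKS * weekly_gb(model)) // 1000


totals = {m: total_tb(m) for m in
          ("full + differential", "full + incremental", "incremental-forever")}
for model, tb in totals.items():
    print(f"{model}: {tb:,}TB")
```

Changing `WEEKS`, `FULL_GB` or `DAILY_CHANGE_GB` shows how quickly the weekend full backups come to dominate the total.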
That’s a 91% reduction in capacity requirements compared with the full + incremental model (about 92% compared with full + differential), without spending any money or compromising system performance on data deduplication. How much does 1.2PB of backup storage cost to acquire, manage and maintain? Actually, it’s 2.4PB of extra storage, since we’ll want to replicate the backup repository for off-site disaster recovery. If the backup data is retained for longer than 12 weeks, the savings grow even larger.
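The effect of a longer retention period is easy to see with a small calculation. The sketch below reuses the example's figures (100TB full backup, 1TB of changes per week) and compares incremental-forever against full + incremental; the retention periods other than 12 weeks are hypothetical values chosen only to show the trend.

```python
FULL_TB = 100
WEEKLY_CHANGE_TB = 1


def capacity_tb(weeks, weekend_full):
    """Initial full backup plus `weeks` of backups, with or without weekend fulls."""
    per_week = WEEKLY_CHANGE_TB + (FULL_TB if weekend_full else 0)
    return FULL_TB + weeks * per_week


def reduction(weeks):
    """Fraction of capacity saved by incremental-forever vs. full + incremental."""
    return 1 - capacity_tb(weeks, False) / capacity_tb(weeks, True)


for weeks in (12, 26, 52):
    saved = capacity_tb(weeks, True) - capacity_tb(weeks, False)
    print(f"{weeks} weeks: {reduction(weeks):.1%} less capacity, "
          f"{saved:,}TB saved ({2 * saved:,}TB with an off-site replica)")
```

At 12 weeks this gives the 1,200TB (1.2PB) saving quoted above, or 2,400TB once the replica is counted, and the percentage keeps climbing as retention grows.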
Continuous vs. scheduled incremental-forever
As with all choices in technologies, there are some trade-offs involved when selecting an incremental-forever backup model. The classic, scheduled approach to incremental backup used by most data protection applications requires the scanning of the file system being protected to determine which files have changed since the last backup. If the system contains millions or even billions of files, that scan can take many hours, consuming most of the available backup window. Copying the actual data to be backed up takes relatively little time.
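To see why the scan dominates, consider what a scheduled incremental job has to do before it copies anything. The following is a simplified sketch, assuming changes are detected by comparing each file's modification time against the time of the last backup; real backup applications may use other signals, and the `scan_for_changes` name is hypothetical.

```python
import os


def scan_for_changes(root, last_backup_time):
    """Walk the whole tree under `root` and return the relative paths of files
    modified after `last_backup_time` (seconds since the epoch)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                modified = os.path.getmtime(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it in this sketch
            if modified > last_backup_time:
                changed.append(os.path.relpath(path, root))
    return sorted(changed)
```

Calling `scan_for_changes("/data", last_backup_time)` has to visit every file to find the small fraction that changed, so the cost grows with the total number of files rather than with the amount of new data.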
This scanning time can be completely eliminated by using a continuous data protection (CDP) approach, which captures each new file, or block of data, as it is written to the disk. There are only a few solutions on the market, including Hitachi Data Instance Manager, that combine the benefits of incremental-forever and continuous data protection.
The CDP model will require a little more storage than the scheduled incremental model, since it will be capturing multiple versions of some files during the day as they are edited. But that’s a good thing: each captured version is another point you can recover to. And the storage required will still be far less than with the solutions that require full backups.
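As a rough illustration of the idea, here is a toy model of continuous capture. It is not how HDIM or any other product is implemented; it simply assumes every write passes through a layer that records it, in time order, so no scan is ever needed and every intermediate version is kept. The `ContinuousCapture` class and its methods are hypothetical.

```python
from collections import defaultdict


class ContinuousCapture:
    """Toy stand-in for a CDP layer: each write is recorded as it happens."""

    def __init__(self):
        # path -> list of (timestamp, data), appended in time order
        self.versions = defaultdict(list)

    def write(self, path, data, timestamp):
        self.versions[path].append((timestamp, data))

    def view_at(self, timestamp):
        """Return the latest version of each file at or before `timestamp`."""
        view = {}
        for path, history in self.versions.items():
            earlier = [data for t, data in history if t <= timestamp]
            if earlier:
                view[path] = earlier[-1]
        return view


cdp = ContinuousCapture()
cdp.write("report.doc", "draft 1", timestamp=9)
cdp.write("notes.txt", "todo list", timestamp=10)
cdp.write("report.doc", "draft 2", timestamp=11)

print(cdp.view_at(10))  # {'report.doc': 'draft 1', 'notes.txt': 'todo list'}
print(cdp.view_at(12))  # {'report.doc': 'draft 2', 'notes.txt': 'todo list'}
```

A once-a-day scheduled incremental would have kept only "draft 2" of the report; the continuous model stores both drafts, which is the extra storage, and the extra recovery point, described above.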
To learn more about how HDIM can reduce the costs and complexity of protecting your data, watch this video [link TBD], or, to request a demo, send a note to email@example.com.