HDS and Diligent Deduplication
by Hu Yoshida on Jul 10, 2006
Today HDS is announcing a global reseller relationship with the premier VTL vendor, Diligent. Diligent and HDS have been working together for several years now in services engagements and have developed a relationship which we are taking to the next step, where we will sell and support the Diligent products as part of our integrated solutions for the Virtual Tape Library market. We will sell Diligent’s mainframe as well as open systems VTL products and integrate them with our TagmaStore replication and multi-tiering solutions with the USP and NSC intelligent controllers and AMS modular storage arrays.
We will also resell their unique deduping technology, which they brand as ProtecTIER and announced at the recent Gartner Planet Storage event in June. This was the icing on the cake for us, since it gives customers a dramatic reduction in storage costs for any given amount of data and guarantees 100% data integrity.
ProtecTIER is a unique deduping, or factoring, technology. Unlike other deduping technologies, it does not rely on chunking the data, hashing each chunk, and then comparing the hash to previous hashes to determine whether the chunk has been seen before. That hash-based approach eliminates data blocks with duplicate hashes. However, it does not protect against hash collisions, where the same hash is generated from different strings of digital data.
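To make the hash-based approach concrete, here is a minimal Python sketch of it. The fixed 4 KB chunk size, the choice of SHA-1, and the function names `dedup_by_hash` and `rebuild` are assumptions made for illustration and do not describe any particular vendor's product. Note that a chunk is treated as a duplicate on the strength of its digest alone.

```python
import hashlib


def dedup_by_hash(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy per SHA-1 digest.

    No byte comparison is performed: two different chunks with the same
    digest (a collision) would silently be treated as identical.
    """
    store = {}    # digest -> chunk bytes, stored once
    recipe = []   # ordered digests needed to rebuild the stream
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha1(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original stream from the stored chunks."""
    return b"".join(store[digest] for digest in recipe)
```

With three identical 4 KB chunks followed by one different chunk, the store ends up holding two chunks while the recipe still lists four.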
ProtecTIER scans the data stream against a cached index, looking for similarities. When it finds a similarity, it reads the similar data and compares it, byte by byte, to the new data stream to ensure that the two are the same before the duplicate is deleted. In this way it avoids hash collisions and ensures data integrity. The efficiency of deduplication increases as the amount of data increases and more similarities are found.
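The general idea of "find a candidate, then verify byte by byte" can be sketched as follows. This is not Diligent's algorithm, whose details are not given here. It is a simplified illustration that assumes fixed-size chunks and uses a weak checksum (`zlib.adler32`) purely as a hint. The class name `VerifiedStore` and the `signature` parameter are hypothetical.

```python
import zlib


class VerifiedStore:
    """Toy store: a weak signature only nominates candidates; a chunk is
    dropped as a duplicate only after a full byte comparison succeeds."""

    def __init__(self, chunk_size=4096, signature=zlib.adler32):
        self.chunk_size = chunk_size
        self.signature = signature  # hypothetical pluggable hint function
        self.chunks = []            # unique chunks actually stored
        self.index = {}             # signature -> positions in self.chunks

    def write(self, data: bytes):
        recipe = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            sig = self.signature(chunk)
            match = None
            for pos in self.index.get(sig, []):
                if self.chunks[pos] == chunk:  # byte-by-byte verification
                    match = pos
                    break
            if match is None:
                # Signature miss or false hit: store the chunk as new data.
                self.chunks.append(chunk)
                match = len(self.chunks) - 1
                self.index.setdefault(sig, []).append(match)
            recipe.append(match)
        return recipe

    def read(self, recipe) -> bytes:
        return b"".join(self.chunks[pos] for pos in recipe)
```

Because the comparison is exact, even a signature function that collides on every chunk cannot cause different data to be merged. It only costs extra comparisons.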
ProtecTIER can achieve dramatic levels of deduplication, as much as 25 to 1 or more. This is especially helpful in backup environments, where weekly full backups and daily incrementals might have to be retained for several months, and only 10% of the data may actually have changed.
ProtecTIER will also help to reduce the volume of data that needs to be sent off site electronically for disaster recovery. Last year there were numerous instances of backup tapes being lost on their way to off site storage facilities. Many analysts are now recommending tapeless backup, backing up to disk and sending it off site electronically rather than by tape courier for greater security. Now with the combination of ProtecTIER deduplication and HDS universal replication products (replication between heterogeneous tiers of storage) this can be done even more securely and cost effectively.
An additional advantage of an HDS/Diligent partnership is the ability to exploit FC SATA disks behind a USP/NSC which supports ESCON/FICON attachment to the Diligent mainframe VTL solutions. All other mainframe VTL solutions will require the use of expensive ESCON/FICON disk storage subsystems.
The Diligent VTL and ProtecTIER solutions are a great fit with our TagmaStore multi-tier storage solutions for optimizing TCSO.
Comments (15)
Hu, sounds a great product, however whenever I hear about compression products that will allegedly save me space, time and effort I tend to be a bit sceptical about the compression rates. Is there a tool that can be used to validate the compression ratio of existing backups?
Chris, I understand your skepticism. I was skeptical too at first. But we have dozens of customers with this solution in production and their factoring ratios range from 13:1 to 44:1. So, 25:1 is not just a “marketing” claim, it is more of an average number. In addition, we recently invited several analysts into our labs to conduct their own tests and they have written reports that back up these claims as well.
Do I understand it correctly that ProtecTIER is actually a part of the VTL solution?
If the answer is yes, then its performance would obviously depend on the data being compressed AND (even more importantly) on the backup software being used. I really doubt there can be any improvement when this technology is used in conjunction with any decent backup software with an incremental backup strategy and host-level software compression.
Anyway, it would be nice to see detailed reports.
Alex, yes, ProtecTIER has all the capabilities of a standard VTL solution, plus it does data deduplication on the fly. You are right about the type of data having a major impact on the de-duplication factoring you will see. Large files and databases factor well. Very small files factor poorly. Incremental backups obviously don’t have as much duplicate data as full backups, but think about this . . . if you change one word in a document that is 20 pages long, an incremental backup will back up the entire file. But ProtecTIER will only write the small portion of the file that was changed. ProtecTIER de-duplicates down to the byte level. Does that make sense?
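The one-word-change example can be illustrated with a small Python sketch. It uses the standard library's `difflib.SequenceMatcher` to count how many bytes of a new file version are not already present in the old one. This only illustrates byte-level differencing in general and is not how ProtecTIER is implemented. The function name `new_bytes_needed` is hypothetical.

```python
from difflib import SequenceMatcher


def new_bytes_needed(old: bytes, new: bytes) -> int:
    """Count the bytes in `new` that cannot be copied from `old`."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            total += j2 - j1  # bytes that must be written as new data
    return total


old = b"The quick brown fox jumps over the lazy dog. " * 20
new = old.replace(b"lazy", b"tidy", 1)  # change a single word
print(len(new), new_bytes_needed(old, new))
```

A file-level incremental backup would copy all of `new`, while a byte-level comparison only needs to record the handful of bytes that differ.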
Thanks, Victor; yes it makes sense. I also missed the fact that VTL must be primarily targeted at the mainframe market.
I still think that in the open systems world the typical ratio will be much lower, especially when backing up database data (take Oracle RMAN as an example). That kind of data doesn’t have this much redundancy.
Hi Alex, our VTL solutions are both mainframe and Open Systems. We have several customers that love ProtecTIER for their Database backups because you only back up the changed bits not the whole thing. Even if the ratio is much lower, say 10:1, think about the economic power of that. 10TB of storage for the cost of 1TB. Of course, you have to factor ProtecTIER costs in but the more storage you need the lower the Total Cost of Ownership is.
Alex, consider other types of data – email for instance – huge amount of duplication (especially all those joke emails that circulate to everyone). Also consider those backups where incrementals are done because full backups take too long and would take too much media, but where restore from a full backup would be much more desirable – well now you can take full backups every night, effectively the stored data will be the incremental.
We have been selling a de-dup product, the DataDomain Restorer, for a couple of years and see compression rates from 1:5 to 1:80. Even on PACS x-ray pictures we have seen compression rates up to 1:5. On tape the compression might be negative. The best is databases with a lot of “air” that are repeated every day/night. I am really looking forward to the unique de-dup technology combined with the possible speed of Fibre Channel.
Tom, how much data is being stored by your customers with the DataDomain solution? I’ve heard the performance of the DataDomain solution is good until the storage hits a size around 10-20TB. At that point their index needs to be stored on disk because it won’t fit in memory. When this happens their performance tanks to 60 MB/s or less. Have you also heard of these performance problems?
I have a couple of questions:
1. What kind of performance can you get with this? I would imagine that the ProtecTIER deduplication engine could impose a performance bottleneck of some sort, if not initially, then perhaps with larger data pipes. I.e., if you have 10, 20, or 30 LTO-3 drives that you’re looking to replace, would this solution be able to provide comparable performance, or are there limits?
2. Is the performance scalable? I.e., can you add I/O engines as you can in some other solutions such as Sepaton’s?
3. With the Hitachi integrated product, will you be able to add non-Hitachi storage capacity, or will this only be supported as a fully integrated, Hitachi-only solution?
Different backup products have different licensing policies… For example, Veritas NetBackup Enterprise Server requires no license for backup to a file system (as with the classic DataDomain). However, additional licenses must be installed to back up data to VTLs ($1,000 per VTL terabyte – http://searchstorage.techtarget.com/originalContent/0,289142,sid5_gci1182931,00.html ). Thus several licensing conflicts may arise due to compression or deduplication technologies.
There is a performance trade-off when using ProtecTIER. For example, our standard VTFOpen product writes about 480 MB/s, while ProtecTIER writes about 260 MB/s. But what you lose in performance, you gain in being able to store a lot more data on a smaller physical amount of disk. Each solution addresses a different customer need. Some customers want the pure performance of VTFOpen to shrink the backup window. Others want to reduce the amount of data to replicate to an off-site location for DR purposes. Of course, you can add more servers to increase bandwidth, and our VTL solution works with most storage systems.
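As a rough illustration of what that trade-off means for a backup window, the arithmetic can be sketched in a few lines of Python. The sketch assumes the figures above are sustained rates in megabytes per second, uses decimal units (1 TB = 1,000,000 MB), and assumes that adding servers scales bandwidth linearly, which may not hold in practice. The function name and the 10 TB workload are hypothetical.

```python
def backup_window_hours(data_tb: float, mb_per_s: float, servers: int = 1) -> float:
    """Hours needed to write data_tb terabytes at a sustained rate.

    Assumes decimal units and ideal linear scaling across servers.
    """
    total_mb = data_tb * 1_000_000
    return total_mb / (mb_per_s * servers) / 3600


for label, rate in (("VTFOpen", 480), ("ProtecTIER", 260)):
    print(f"{label}: {backup_window_hours(10, rate):.1f} h for 10 TB")
```

Under these assumptions a 10 TB backup takes roughly 5.8 hours at 480 MB/s and roughly 10.7 hours at 260 MB/s on a single server.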
It should be known that there are discussions around VTL, led by Victor, in our recently launched forums:
Swing by here:
You can participate too, just register (and no, we won’t send you anything unless you check the box for it)
I’m trying to imagine a hash collision with a well-constructed hash such as SHA-1, and I would really have to see one demonstrated before I could believe in such a beast. The whole point of such hashing is strong collision resistance.
And if you could get one, which according to the literature I see might happen once in several quadrillion years, then generate two hashes, one forward and one backwards, and you will never have a collision.
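For a sense of scale, the chance of an accidental collision can be estimated with the standard birthday approximation, p ≈ 1 − exp(−n(n−1)/2 / 2^b) for n chunks and a b-bit hash. The short Python sketch below assumes an ideal hash with uniformly distributed output and covers only accidental collisions, not deliberately constructed ones. The function name and the chunk count are hypothetical.

```python
import math


def collision_probability(num_chunks: int, hash_bits: int = 160) -> float:
    """Birthday-bound estimate of at least one accidental collision.

    Assumes an ideal hash with uniformly distributed hash_bits-bit output.
    """
    pairs = num_chunks * (num_chunks - 1) // 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / 2 ** hash_bits)


# One trillion stored chunks with a 160-bit hash such as SHA-1.
print(collision_probability(10 ** 12))
```

The estimate is extremely small for realistic chunk counts, but it is not zero, which is the distinction a byte-by-byte comparison is meant to remove.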
I see the advantage of deduplication, but I also see some problems. One problem is that we are battling huge amounts of data to be backed up without solving the source of the problem, namely the obsolete data that is backed up and has to be restored, e.g. in case of a disaster. The second problem, if I understand correctly, is that we can store a huge amount of data on a single (virtual) tape. However, tapes cannot be shared between users (even if the virtual tape is in reality on disk), which can lead to problems, since the chance increases that more than one user will try to access the same tape while it is in use by another, e.g. when using some backup tools.