Why RAID and Erasure Codes Need to be Considered in Disk Purchases
by Hu Yoshida on Jan 11, 2012
Recently, I spent a few days with Garth Gibson, a computer scientist at Carnegie Mellon University and the founder of Panasas, an enterprise server and storage company. Garth and I were in Singapore for a review with the Data Storage Institute.
Garth is best known for the research paper that he authored with David Patterson and Randy Katz in 1988, “A Case for Redundant Arrays of Inexpensive Disks (RAID)”, which was the catalyst for the RAID storage industry. Once this paper was published and presented at conferences, it took only a few years for all the major storage vendors to deliver RAID storage systems and for customers to adopt this new technology. The rate of this adoption was phenomenal.
When I asked Garth about this, he said the reason RAID was adopted so quickly was that it was delivered as a paper and not as a patent, so it was freely available to the industry. The paper also included a taxonomy that defined RAID levels 1 to 5, along with mathematical calculations of Mean Time To Failure (MTTF), a key factor in fault tolerance.
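As an illustration of the kind of reliability arithmetic the paper made popular, the sketch below uses the widely cited single-parity approximation for the mean time to data loss (MTTDL) of one group of N disks: MTTF² / (N × (N − 1) × MTTR). It assumes independent, exponentially distributed failures and a repair time much shorter than the MTTF. The function name and inputs are hypothetical, and the paper's own notation differs.

```python
def mttdl_single_parity(mttf_hours: float, disks: int, mttr_hours: float) -> float:
    """Approximate mean time to data loss for one single-parity (RAID 5 style) group.

    Data is lost only if a second disk in the group fails while the first is
    being repaired. Assumes independent failures and MTTR much smaller than MTTF.
    """
    if disks < 2:
        raise ValueError("a parity group needs at least two disks")
    return mttf_hours ** 2 / (disks * (disks - 1) * mttr_hours)


# Hypothetical inputs: 1,000,000-hour drive MTTF, 8-disk group, 24-hour rebuild.
hours = mttdl_single_parity(1_000_000, 8, 24)
print(f"MTTDL ~ {hours:,.0f} hours (~{hours / 8766:,.0f} years)")
```

Because MTTR sits in the denominator, doubling the rebuild time halves the MTTDL, which is why the longer rebuilds of high-capacity drives matter.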
Adoption was also helped by the availability of relatively inexpensive 5.25-inch disk drives and by the premise that an array of inexpensive disk drives could replace more expensive enterprise storage systems with the same reliability and performance. The industry was quick to drop the term “Inexpensive” in favor of “Independent,” and RAID was redefined as Redundant Array of Independent Disks.
Tracing RAID’s Origins
Actually, the concept of RAID was introduced much earlier, and Garth does not claim to have invented it. The earliest patent on RAID was filed by Norman “Ken” Ouchi of IBM, who was issued U.S. Patent 4,092,732, titled “System for recovering data stored in a failed memory unit,” in 1978. (Claus Mikkelsen and I worked with Ken at IBM.)
This patent described what Garth and his colleagues later defined as RAID 5. Ken’s patent also mentioned mirroring (RAID 1) and dedicated parity (RAID 4) as prior art at that time. So RAID had been around for some time but was not widely adopted until the 1988 RAID paper, which gave it a name, a taxonomy, and a financial justification.
RAID as a fault tolerance mechanism for storage is running out of gas as disk densities increase and, with them, the probability of multiple drive failures. RAID levels up to RAID 5 protect against only a single drive failure in a RAID group. As densities increase, RAID rebuild times also increase, which hurts performance through drive contention and raises the probability that another drive fails before the rebuild completes. Uncorrectable read errors are also a growing problem: the more bits a rebuild must read from the surviving drives, the more likely it is to hit one.
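The rebuild-window argument can be made concrete with two back-of-the-envelope numbers: how long a rebuild takes, and how likely it is to hit an uncorrectable read error (URE) while reading every surviving drive. This sketch assumes a URE rate of 1 in 10^14 bits (a figure often quoted for desktop-class drives; enterprise drives are usually specified at 1 in 10^15 or better), independent errors, and a rebuild limited only by sustained throughput. The function names and drive parameters are hypothetical.

```python
import math


def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Lower bound on rebuild time: drive capacity divided by sustained rebuild rate."""
    return capacity_tb * 1e12 / (rebuild_mb_per_s * 1e6) / 3600


def p_ure_during_rebuild(surviving_disks: int, capacity_tb: float,
                         ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one URE while reading every bit on the surviving
    disks, assuming independent errors at a constant per-bit rate (a simplification)."""
    bits_read = surviving_disks * capacity_tb * 1e12 * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to avoid rounding problems.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Hypothetical 7D+1P group of 2 TB drives rebuilt at 50 MB/s.
print(f"rebuild time ~ {rebuild_hours(2, 50):.1f} h")
print(f"P(URE during rebuild) ~ {p_ure_during_rebuild(7, 2):.0%}")
```

Under these assumptions the chance of hitting a URE is well over half, and with RAID 5 that error cannot be corrected from parity because the only redundancy is already being consumed by the rebuild. Treat the numbers as an illustration of the trend rather than a prediction.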
Getting Familiar with Erasure Codes
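In information theory, an erasure code is a Forward Error Correction (FEC) code for the binary erasure channel, in which each transmitted symbol (usually a 0 or 1) either arrives intact or is erased, that is, known to be lost. The code allows the erased symbols to be reconstructed. Erasure codes can be used in networks as well as in storage; a failed disk is an erasure in exactly this sense, because the system knows which data is missing.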
RAID is really a simple form of erasure code, in which a parity or checksum is appended to a number of records so that if one record is lost, it can be reconstructed by summing, or more precisely XORing, the remaining records and the parity.
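A minimal sketch of single-parity reconstruction, assuming equal-sized blocks (real arrays work on fixed-size stripe units): the parity block is the byte-wise XOR of the data blocks, and because XOR is its own inverse, XORing the survivors with the parity reproduces any one missing block. The helper name `xor_blocks` is hypothetical.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


data = [b"disk0 data", b"disk1 data", b"disk2 data"]
parity = xor_blocks(data)

# Simulate losing disk 1: XOR of the surviving blocks and the parity recreates it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```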
If we want to recover from more than one lost record, additional redundancy must be added, and the check values can no longer be a simple XOR; they are computed as polynomials over a finite (Galois) field. This is where you will begin to hear more about erasure codes. RAID 6 is typically implemented as a polynomial erasure code; it was introduced with large-capacity drives in the last decade to protect against dual drive failures. Because RAID 6 has two redundancy records, it requires more overhead than RAID 5 in both capacity and processing, and as a result it was not widely adopted until recently.
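The sketch below illustrates the idea with the P+Q construction used by many RAID 6 implementations (the Linux md driver is one example): P is ordinary XOR parity, and Q weights each data block by a power of a generator in the Galois field GF(2^8) before summing. Two independent equations let any two lost data blocks be solved for. The field polynomial 0x11D, generator 2, and all function names are assumptions for illustration; vendors' proprietary RAID 6 codes may differ, and some use XOR-only schemes instead.

```python
# Minimal RAID 6 style P+Q sketch over GF(2^8).
# Assumed field polynomial: x^8 + x^4 + x^3 + x^2 + 1 (0x11D), generator g = 2.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8): carry-less multiply reduced mod 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D           # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero element: a^255 == 1, so a^254 == a^-1."""
    return gf_pow(a, 254)


def compute_pq(data):
    """P = XOR of all blocks; Q = XOR of g^i * block_i, byte by byte."""
    n = len(data[0])
    p, q = bytearray(n), bytearray(n)
    for i, block in enumerate(data):
        coeff = gf_pow(2, i)
        for k in range(n):
            p[k] ^= block[k]
            q[k] ^= gf_mul(coeff, block[k])
    return bytes(p), bytes(q)


def recover_two(data, x, y, p, q):
    """Rebuild lost data blocks x and y (x != y) from the survivors plus P and Q."""
    n = len(p)
    # Fold the surviving blocks out of P and Q, leaving:
    #   pxy = D_x ^ D_y   and   qxy = g^x*D_x ^ g^y*D_y
    pxy, qxy = bytearray(p), bytearray(q)
    for i, block in enumerate(data):
        if i in (x, y):
            continue
        coeff = gf_pow(2, i)
        for k in range(n):
            pxy[k] ^= block[k]
            qxy[k] ^= gf_mul(coeff, block[k])
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)  # nonzero: g = 2 has order 255, so g^x != g^y
    dx = bytes(gf_mul(qxy[k] ^ gf_mul(gy, pxy[k]), inv) for k in range(n))
    dy = bytes(pxy[k] ^ dx[k] for k in range(n))
    return dx, dy


disks = [b"RAID", b"six!", b"P+Q ", b"demo"]
p, q = compute_pq(disks)
damaged = [disks[0], None, disks[2], None]    # two data disks lost
assert recover_two(damaged, 1, 3, p, q) == (b"six!", b"demo")
```

The cases where P or Q is itself one of the two lost blocks are simpler (recompute the missing check value, or fall back to XOR) and are omitted here.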
RAID 6 also helps with uncorrectable read errors, because an error encountered while rebuilding a single failed drive can still be corrected from the second parity. Today, we strongly recommend the use of RAID 6 with RAID pools where data is striped across many RAID groups, since a dual drive failure in any one RAID group would cause data loss for all the applications using the pool. An additional parity drive in each RAID group is inexpensive compared with the application downtime and the cost of recovering an entire provisioning pool.
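The pool argument can be sketched with two small calculations: the capacity cost of the extra parity drive, and how exposure grows when a pool depends on many RAID groups at once. The sketch assumes groups fail independently; the per-group loss probability and group count are hypothetical placeholders, not measured values.

```python
def usable_fraction(data_disks: int, parity_disks: int) -> float:
    """Fraction of raw capacity left for data in one RAID group."""
    return data_disks / (data_disks + parity_disks)


def pool_loss_probability(p_group_loss: float, groups: int) -> float:
    """Chance that at least one of `groups` RAID groups loses data, assuming
    independent groups. In a wide-striped pool, any such loss affects every
    application using the pool."""
    return 1.0 - (1.0 - p_group_loss) ** groups


for label, d, p in [("RAID 5 7D+1P", 7, 1), ("RAID 6 6D+2P", 6, 2), ("RAID 6 14D+2P", 14, 2)]:
    print(f"{label}: {usable_fraction(d, p):.1%} usable")

# Hypothetical: 0.1% chance that any one group loses data over some period, 16 groups.
print(f"pool exposure: {pool_loss_probability(0.001, 16):.2%}")
```

A wider RAID 6 group (14D+2P) has the same capacity efficiency as RAID 5 7D+1P, so the second parity drive need not cost usable capacity, though wider groups involve more drives in each rebuild.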
However, RAID 6 is not a long-term panacea, since it protects against only dual drive failures. As drive densities continue to increase, it won’t be long before we are concerned about three or more drive failures in a RAID group. Storage vendors are working to address these long-term requirements.
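One way to see why each extra parity drive buys margin: a group that tolerates m failures loses data only when at least m + 1 drives are down at once. The sketch uses a simple binomial model with independent failures; real failures are often correlated (for example, drives from the same batch), so this understates risk. The group size and per-drive probability are hypothetical.

```python
from math import comb


def p_at_least(k: int, n: int, q: float) -> float:
    """Probability that at least k of n drives fail in the same window,
    assuming each fails independently with probability q."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))


# Hypothetical 16-drive group, 1% chance that each drive fails within one rebuild window.
for parity in (1, 2, 3):
    print(f"{parity} parity drive(s): P(data loss) ~ {p_at_least(parity + 1, 16, 0.01):.2e}")
```

With these inputs, each additional parity drive lowers the loss probability by more than an order of magnitude, which is the motivation for triple-parity and more general erasure codes.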
Although storage system vendors source their disks from the same disk manufacturers, the reliability of disks in a storage system varies depending on how well the system vendor scrubs the drives for errors, the effectiveness of its proprietary error detection and recovery software, its maintenance practices, and its proprietary implementation of erasure codes. Users will need to consider the vendor’s track record for disk availability and then weigh the cost and performance trade-offs of different erasure codes.
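Scrubbing is one of the practices that differ between vendors. Conceptually, a background scrub reads every stripe, recomputes the redundancy, and flags mismatches so they can be repaired while redundancy is still intact, rather than discovered during a rebuild. This sketch assumes simple XOR parity and in-memory stripes; the `scrub` function is hypothetical, and real scrubbers also rely on drive-level checks and vendor-specific logic not shown here.

```python
def scrub(stripes: list[tuple[list[bytes], bytes]]) -> list[int]:
    """Return indices of stripes whose stored XOR parity no longer matches the data."""
    bad = []
    for index, (data_blocks, stored_parity) in enumerate(stripes):
        expected = bytearray(len(stored_parity))
        for block in data_blocks:
            for i, b in enumerate(block):
                expected[i] ^= b
        if bytes(expected) != stored_parity:
            bad.append(index)
    return bad


good = ([b"ab", b"cd"], bytes([ord("a") ^ ord("c"), ord("b") ^ ord("d")]))
corrupt = ([b"ab", b"cX"], good[1])   # one byte silently changed
print(scrub([good, corrupt]))         # prints [1]
```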
Comments (4)
True, but realistically it all comes down to the profile of the data. While RAID has limited recovery capabilities (redundancy limitations), it is extremely efficient with minimal overhead, so it might still be the best choice for smaller, non-distributed, response-time-critical workloads (e.g., databases). Other workloads (unstructured file/print, video streaming, etc.) that can be distributed and are less response-time-critical can be moved to other technologies, such as information dispersal algorithms (IDAs), which offer greater levels of redundancy but aren’t as lean.
I guess what we are starting to see is that RAID is no longer one-size-fits-all. As with obesity in the general population, considerations need to be made for the extreme ends of the scale (no pun intended).
The ACM article “Triple-Parity RAID and Beyond,” as well as StorageMojo’s “raid 5 died in 2009,” are other good reads on why RAID 5 isn’t going to cut it anymore.
I’ve noticed in the past year that I’m getting less pushback from customers when recommending RAID 10 and RAID 6 with large dynamic provisioning pools for wide striping in place of traditional RAID 5. That said, I occasionally run into “interesting” configurations (like a 92-disk RAID 5 group), and the storage admins who love RAID 5 seem to be digging in more than ever.
I’ve heard mixed things about RAID for SSDs. I’ve heard of some arrays having problems keeping up with parity calculations and getting significantly better benchmarks with RAID 10 than with RAID 5/6, while others still argue for RAID 5 because the disks are small and rebuilds are extremely quick. Does anyone have any thoughts, experience, or recommendations here?
Hu, good blog. Not sure if you saw Randy’s post on the same topic… http://www.evaluatorgroup.com/2012/egi_blogs/life-after-raid-storage-soup-blog-by-randy-kerns/
John, sadly people are economically focused, and most have never had an unrecoverable error with RAID 5. The usual arguments relate to RAID 6 being 33% slower on random-write workloads and RAID 10 yielding only n/2 usable capacity (for some reason people still think disk is expensive). As for the probability of a failure occurring, yes, the risk is there, and yes, it is orders of magnitude greater than it was with smaller disks. But on a probabilistic scale the numbers are still fairly small, and the mathematical models don’t account for all the technologies that preemptively detect and correct errors, so they are generally not indicative of what occurs in the real world (like most mathematical models). There is a good chance that a large number of RAID 5 groups with small rebuild domains (smaller than 7D+1P) will never experience a failure that can’t be recovered. That said, we still push people onto RAID 6 and 10. At the end of the day, RAID is a form of insurance, and when managing people’s data you want the best insurance you can get.
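The 33% figure matches the common write-penalty rule of thumb, sketched here under stated assumptions: a small random write costs about 2 back-end I/Os on RAID 10 (two mirror writes), 4 on RAID 5 (read old data and parity, write both) and 6 on RAID 6 (the same for data, P and Q). Controller write caching, which can hide much of this, is ignored, and the drive count and per-drive IOPS are hypothetical.

```python
# Back-end disk I/Os per small random host write (rule-of-thumb values).
WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4, "RAID 6": 6}


def random_write_iops(disks: int, iops_per_disk: float, level: str) -> float:
    """Host-visible random-write IOPS, ignoring controller caching."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


# Hypothetical 8-drive group of 150-IOPS disks.
r5 = random_write_iops(8, 150, "RAID 5")
r6 = random_write_iops(8, 150, "RAID 6")
print(f"RAID 6 delivers {r6 / r5:.0%} of RAID 5 random-write IOPS")
```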