RAID Controllers and RAID levels
Posted by MatjažS Admin, last modified by MatjažS Admin on 23 September 2013 08:37 AM
And now on to our guest editorial by Mike Pepe...
Choosing the right way to protect your data can be a daunting task. Many system administrators simply add more drives to a system, implement mirroring and consider the task done. However, there are many options available, and understanding the implications of one data protection scheme versus another will help you make the best choice.
The basic forms of RAID
RAID (Redundant Array of Independent Disks) has been around for decades. The most commonly encountered RAID types are identified by number: 0, 1, 5 and (most recently) 6. Let's quickly review these RAID levels and what they really mean:
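These levels can be summarized by how much capacity each gives up for protection. The Python sketch below is a simplified model, not a vendor specification: it assumes identical drives, treats RAID-1 as an n-way mirror, and uses a hypothetical `basic_raid` helper to report usable capacity (in whole drives) and the number of drive failures each level is guaranteed to survive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidLayout:
    usable_drives: int        # usable capacity, measured in whole drives
    failures_tolerated: int   # drives that can fail without losing data


def basic_raid(level: int, drives: int) -> RaidLayout:
    """Usable capacity and guaranteed fault tolerance for RAID 0, 1, 5 and 6.

    Simplified model: identical drives, RAID-1 treated as an n-way mirror.
    """
    minimum = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"RAID-{level} needs at least {minimum[level]} drives")
    if level == 0:
        return RaidLayout(drives, 0)        # striping only, no redundancy
    if level == 1:
        return RaidLayout(1, drives - 1)    # every drive holds a full copy
    if level == 5:
        return RaidLayout(drives - 1, 1)    # one drive's worth of parity
    return RaidLayout(drives - 2, 2)        # RAID-6: two drives' worth of parity


for level, n in [(0, 4), (1, 2), (5, 5), (6, 5)]:
    layout = basic_raid(level, n)
    print(f"RAID-{level} with {n} drives: {layout.usable_drives} usable, "
          f"survives {layout.failures_tolerated} failure(s)")
```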
RAID-10 and compound RAID levels
RAID level 10 is a compound RAID level. More precisely, it's RAID 1+0 (or sometimes RAID 0+1), and it combines both striping and mirroring. A RAID-10 must consist of at least four drives (two two-drive mirrors striped together is the minimum), but it can consist of any number of drives equal to the stripe width (n) multiplied by the number of mirrored copies (m).
RAID-10 arrays combine excellent performance with good data integrity. There are potentially a great number of drives to read data from, giving a theoretical read performance of up to n × m times that of a single drive. Their biggest drawbacks are capacity, which is only that of n drives; cost, since you must purchase n × m drives; and write performance, which depends on how well your data stripes across the drives, something we will explain later.
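The RAID-10 figures above follow directly from n and m. A minimal sketch, assuming identical drives and perfectly balanced I/O, so the speedups are theoretical ceilings rather than measurements; `raid10_profile` is a hypothetical helper, not part of any controller's tooling:

```python
def raid10_profile(n: int, m: int, drive_tb: float) -> dict:
    """Theoretical RAID-10 figures for stripe width n and m mirrored copies.

    Hypothetical helper for illustration; real arrays fall short of these
    ideal speedups.
    """
    if n < 2 or m < 2:
        raise ValueError("RAID-10 needs n >= 2 and m >= 2 (at least four drives)")
    return {
        "drives_to_buy": n * m,
        "usable_tb": n * drive_tb,       # only one copy of each stripe member counts
        "max_read_speedup": n * m,       # any copy of any member can serve a read
        "max_write_speedup": n,          # each write must land on all m copies
        "guaranteed_failures": m - 1,    # losing every copy of one member is fatal
    }


print(raid10_profile(n=2, m=2, drive_tb=4))   # the four-drive minimum
print(raid10_profile(n=4, m=3, drive_tb=4))   # wider stripe, three-way mirror
```

With m = 2, a second drive failure is survivable only if it hits a different mirror pair, which is why the guaranteed tolerance is m - 1.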
Other compound RAID levels are possible, for instance striping across multiple RAID-5 arrays (RAID 50) or mirroring two RAID-5 arrays (RAID 51), although not every controller supports these more complex configurations. These compound levels are not officially standardized, so arrays built with them may not be portable across systems or controllers.
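The capacity trade-offs of these two compound layouts can be sketched the same way. This is a simplified model assuming identical drives and identical RAID-5 groups; `raid50` and `raid51` are hypothetical helpers that return usable capacity (in drives) and the number of failures each layout is guaranteed to survive in the worst case.

```python
def raid50(groups: int, drives_per_group: int) -> tuple[int, int]:
    """Stripe across `groups` RAID-5 sets: (usable drives, guaranteed failures)."""
    if groups < 2 or drives_per_group < 3:
        raise ValueError("need at least two RAID-5 groups of three drives")
    # Each RAID-5 group gives up one drive to parity; the stripe adds no redundancy.
    # Worst case, two failures in the same group take down the whole array.
    return groups * (drives_per_group - 1), 1


def raid51(drives_per_side: int) -> tuple[int, int]:
    """Mirror of two identical RAID-5 sets: (usable drives, guaranteed failures)."""
    if drives_per_side < 3:
        raise ValueError("each RAID-5 side needs at least three drives")
    # Capacity is that of one RAID-5 side. Any three failures leave at least one
    # side with at most one failed drive, so that side still holds all the data;
    # four failures split 2 + 2 can take out both sides.
    return drives_per_side - 1, 3


print(raid50(groups=2, drives_per_group=4))   # (6, 1)
print(raid51(drives_per_side=4))              # (3, 3)
```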
Different types of RAID controllers
Now that we've reviewed the different ways we can use multiple disk drives for varying degrees of performance, capacity and reliability, you may already have decided on the best scheme for your application, but RAID type is only one part of the equation. How you control these disks is also important. We can group RAID controllers into three distinct categories.
Hardware RAID controllers generally offer the best performance, since they are, in effect, self-contained computers dedicated to running RAID arrays. The controller manages every aspect of the RAID, leaving the host system free to do other work while the controller handles everything behind the scenes. Hardware RAID controllers often have their own cache to improve performance, and often offer an optional battery backup that preserves the contents of the write cache if power fails before that data has been written to disk. All this power has a price, in this case literally: the best high-end RAID controllers can be very expensive. There are other potential pitfalls as well, which we will discuss a little later.
Software-based RAID uses your host operating system to virtualize your storage into RAID (or RAID-like) groups. For instance, mirroring (RAID-1) your boot disk in Windows Disk Administrator is a simple example of software RAID. At the other end of the complexity spectrum, Windows Server 2012 introduced a storage management system called Storage Spaces. With Spaces, you pool your disks and can then apply different protection schemes (such as simple, mirror or parity) to individual virtual disks carved from that pool, rather than at the physical partition or disk level. Software RAID is the least expensive option in most cases, since the functionality is part of the operating system and requires no additional hardware beyond, at most, relatively low-cost host bus adapters if you need more ports to connect disks. Software RAID also has the potential to be the most flexible. For instance, in Windows it is possible to create a RAID-1 mirror using half the capacity of two disks, and then create a RAID-0 volume out of the remaining storage. You'd then have one volume for data that needs protection and one for data that's not critical, all on the same two disks. The main disadvantage of software RAID is that your operating system must manage it, so performance may suffer because CPU time is spent on disk I/O rather than on your application. We'll examine the real-world implications of this later.
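The capacity of the split-disk layout described above is simple arithmetic. This sketch assumes two identical disks and ignores file system overhead; `split_two_disks` is a hypothetical helper, and `mirror_fraction` is the share of each disk given to the RAID-1 volume (0.5 matches the example in the text).

```python
def split_two_disks(disk_gb: float, mirror_fraction: float = 0.5) -> dict:
    """Usable capacity when two equal disks are partly mirrored and the rest striped."""
    if not 0.0 <= mirror_fraction <= 1.0:
        raise ValueError("mirror_fraction must be between 0 and 1")
    mirrored_slice = disk_gb * mirror_fraction   # taken from each disk
    striped_slice = disk_gb - mirrored_slice     # what is left on each disk
    return {
        "raid1_gb": mirrored_slice,       # two copies of the same data: one slice usable
        "raid0_gb": 2 * striped_slice,    # both leftover slices, no redundancy
    }


print(split_two_disks(2000))   # {'raid1_gb': 1000.0, 'raid0_gb': 2000.0}
```

The protected volume gets a quarter of the raw capacity and the unprotected volume gets half, which is the trade-off the split buys you.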
Somewhere in the middle are "hybrid" RAID controllers. These controllers cost marginally more than (or in some cases the same as) non-RAID host bus adapters. They generally rely on firmware and OS drivers that run on the host CPU to provide the RAID functionality, so in that sense they are not much different from software RAID. However, these devices may have some form of caching or dedicated hardware to speed up RAID operations: for instance, a hardware parity calculator for RAID-5 and RAID-6 arrays. These devices therefore sit somewhere between software and hardware RAID in functionality, and the pros and cons of both may apply.
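The RAID-5 parity that such a parity calculator offloads is a byte-wise XOR across the data chunks of a stripe. The sketch below is a software illustration only: `xor_blocks` is a hypothetical helper, and RAID-6's second parity uses Galois-field arithmetic that this does not cover. It computes parity for one stripe and then rebuilds a lost chunk from the survivors.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID-5 style parity)."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must be the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


data = [b"AAAA", b"BBBB", b"CCCC"]   # three data chunks in one stripe
parity = xor_blocks(data)            # what the parity engine would compute

# The drive holding data[1] fails: XOR of the survivors and the parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Every byte written to a RAID-5 array passes through this kind of calculation, which is why dedicated hardware for it can relieve the host CPU.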
Choosing the right RAID controller
So which one is best? Most people would assume that a high-end hardware RAID controller is obviously the best choice, but that's not always the case. At the entry level of the server spectrum, a high-end RAID controller can cost more than an entire server! Some have their own out-of-band network interface for configuration and can be rather complex for a non-specialist to get working. Interchangeability is also a potential issue for hardware and hybrid RAID controllers: if your controller fails, you'd likely need at least something from the same product family, with similar firmware installed, to ensure you can read and recover the disks. Good luck to the system administrator who has to track down a specific version of a RAID controller that hasn't been made in half a decade!
Contrast this with software RAID, where there's a very good chance that disks transplanted from a failed server into any machine running the same operating system will be back up and running very quickly. Unless you keep a spare RAID controller card handy, recoverability in a crisis may be better here. Software and hybrid solutions do use your system's resources to a much greater degree than hardware solutions, but except in the most demanding and critical systems, the few percentage points of processor utilization are unlikely to be noticed.
Price versus performance is a second key decision point, but let's talk more about recoverability. We touched on this earlier as a key advantage of software-based RAID: the RAID volume should be readable in any machine running the same operating system, whereas with a hardware RAID controller there's a good chance that your RAID volume would not be readable with another brand or type of controller. There is one exception: a RAID-1 mirror that consumes an entire disk. Such volumes are often simple block-by-block copies of what would normally be written to a single disk, so in many cases it is indeed possible to take one copy from a failed mirror volume, put it into any machine and read it.
Recovery time and reliability
Recovery time and reliability are another point of consideration. At the time of writing, 4TB is the largest drive capacity available. The average transfer rate on a drive of this size is somewhere around 180 megabytes per second, which means it would take over six hours (4,000,000 MB ÷ 180 MB/s ≈ 22,200 seconds) just to fill the drive completely. (In the real world, rebuilding an active RAID-5 using drives of this size would take two or more times that!)
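The fill-time arithmetic is easy to check. This sketch assumes decimal units (1 TB = 1,000,000 MB, as drive vendors quote them) and a constant sequential rate; `hours_to_fill` is a hypothetical helper. At a constant 180 MB/s the result is roughly six hours, and lower sustained rates (for example on a drive's slower inner tracks, or on a busy array) stretch it further.

```python
def hours_to_fill(capacity_tb: float, mb_per_s: float) -> float:
    """Hours to write a whole drive at a constant sequential rate (1 TB = 10**6 MB)."""
    seconds = capacity_tb * 1_000_000 / mb_per_s
    return seconds / 3600


best_case = hours_to_fill(4, 180)
print(f"{best_case:.1f} h to fill one 4TB drive")            # 6.2 h to fill one 4TB drive
print(f"{2 * best_case:.1f} h for a rebuild at twice that")   # 12.3 h for a rebuild at twice that
```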
Why is this important? Consider a RAID-5 built with five 4TB drives. One drive fails and is replaced, and the rebuild process begins. Because hard drives are electro-mechanical devices, they have a specified error rate. In this case our drives have a 1 in 10^14 chance of an uncorrectable error for each bit read. To reconstruct the RAID-5, we must read all four surviving drives, a total of 16TB of data, which is 1.28 × 10^14 bits! There's a very real chance that during the rebuild we'll encounter an uncorrectable error on one of the remaining drives. If the controller then deems that drive bad, the RAID-5 array has two dead drives, the entire array fails, and our data disappears.
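The scale of that risk can be estimated with a simple model that treats every bit as failing independently at the drive's rated error rate. Real drive errors are not independent, and a rated figure is usually an upper bound, so this is an illustration rather than a prediction; `p_unrecoverable_error` is a hypothetical helper.

```python
import math


def p_unrecoverable_error(bytes_read: float, bit_error_rate: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error while reading `bytes_read` bytes.

    Assumes each bit fails independently at the rated error rate.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)**bits, computed with log1p/expm1 so the tiny p is not lost to rounding
    return -math.expm1(bits * math.log1p(-bit_error_rate))


surviving_data = 4 * 4e12                                   # four surviving 4TB drives
print(f"{surviving_data * 8:.3g} bits")                     # 1.28e+14 bits
print(f"{p_unrecoverable_error(surviving_data):.0%}")       # 72%
```

Under this model the expected number of errors during the rebuild is about 1.28, which is why the chance of hitting at least one is so high.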
RAID-6 will help here, since it continues operating even if two drives fail. However, given the high likelihood of an error, even RAID-6 starts to look less and less attractive.
The value of triple redundancy
Given that there is a statistical chance of catastrophic failure in a parity-based RAID group, you should always keep a few things in mind. First and foremost, RAID is not a replacement for a sound backup (and recovery) strategy. Make sure you have backups in place, and test them periodically to make sure they can be restored. Second, consider triple-redundant options, such as RAID-10 striping with three-way mirroring.
It's probably safe to say that many people have encountered random silent corruption in their daily lives: the picture that won't display anymore, or the video that breaks at some point during playback. Sure, these things can happen with single drives and single copies of data, but they also appear when disk mirroring is in place. Why would that be? Consider the following scenario: a server running a two-drive RAID-1 array crashes or loses power, and a spurious write corrupts a random sector on one of the drives. When the machine comes back up, the controller detects the dirty shutdown, resynchronizes the mirror, and finds that the two copies of that block differ. Which one is correct? It's entirely possible the RAID controller doesn't know, and there's potentially a 50% chance that it'll guess wrong, permanently corrupting the file.
What if there were three copies instead of two? In that case, the RAID can take a vote: if two of the blocks agree, that's probably the "right" data. Add a checksumming file system, such as Windows Server 2012's ReFS, on top of Storage Spaces with three-way mirroring, and the chances of silent corruption in your data drop dramatically.
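The difference between two and three copies, and what a stored checksum adds, can be sketched in a few lines. This is an illustration, not how any particular controller or ReFS is implemented: `majority_vote` and `checksum_pick` are hypothetical helpers, and SHA-256 stands in for whatever checksum a real file system stores.

```python
import hashlib
from collections import Counter
from typing import Optional


def majority_vote(copies: list[bytes]) -> Optional[bytes]:
    """Return the block most mirror copies agree on, or None when there is no majority."""
    block, count = Counter(copies).most_common(1)[0]
    return block if count > len(copies) // 2 else None


def checksum_pick(copies: list[bytes], expected_sha256: str) -> Optional[bytes]:
    """Return the first copy whose checksum matches the one stored when it was written."""
    for block in copies:
        if hashlib.sha256(block).hexdigest() == expected_sha256:
            return block
    return None


good, bad = b"holiday-photo", b"holiday-phot\x00"
print(majority_vote([good, bad]))          # None: a two-way mirror cannot decide
print(majority_vote([good, bad, good]))    # b'holiday-photo'
stored = hashlib.sha256(good).hexdigest()  # checksum saved alongside the data
print(checksum_pick([bad, good], stored))  # b'holiday-photo', even with only two copies
```

Voting needs a majority of copies to agree, while a checksum identifies the good copy directly; combining both is what makes silent corruption so much less likely.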
Stripe size and RAID performance
Also consider the performance of your stripes. RAID types that stripe data across disks have what is known as a "stripe size". A common stripe size is 64KB, meaning that data is written to each drive in 64KB chunks. In a 4-drive RAID-5, for example, one full stripe then spans 256KB across the disks (4 drives at 64KB each, of which 192KB is data and 64KB is parity). There is nothing wrong with this, as long as your files, or more precisely your writes, are generally larger than a full stripe. If they are not, updating a small file within a stripe requires reading existing data back (the rest of the stripe, or at least the old data and old parity), modifying it, recalculating parity, and then writing the changed data and parity back to the drives! If you have a lot of very small files, the performance penalty to write or modify them can be enormous.
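The cost of small writes can be sketched with a simple I/O-count model. It assumes a 4-drive RAID-5 with 64KB chunks, writes aligned to the start of a stripe, and the read-modify-write strategy (read old data and old parity, then write new data and new parity); real controllers may instead re-read the rest of the stripe or absorb writes in cache, so treat the counts as illustrative. `raid5_write_ios` and `updated_parity` are hypothetical helpers.

```python
def raid5_write_ios(write_kb: int, drives: int = 4, chunk_kb: int = 64) -> int:
    """Physical I/Os for one write starting at a stripe boundary (simplified model).

    Only writes that fit inside a single stripe are modelled.
    """
    data_per_stripe = (drives - 1) * chunk_kb   # 192 KB for 4 drives x 64 KB
    if not 0 < write_kb <= data_per_stripe:
        raise ValueError("this sketch only models writes within one stripe")
    if write_kb == data_per_stripe:
        return drives                 # full-stripe write: parity comes from new data alone
    chunks = -(-write_kb // chunk_kb)   # data chunks touched (ceiling division)
    return 2 * chunks + 2               # read + write each touched chunk, plus parity


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write parity update: new parity = old parity ^ old data ^ new data."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


# A 4 KB update costs as many I/Os as a full 192 KB stripe write, and the two
# reads must complete before the two writes can start.
print(raid5_write_ios(4), raid5_write_ios(192))   # 4 4

# The shortcut gives the same parity as recomputing it from the whole stripe.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\x0f"
parity = bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))
new_d1 = b"\xff\x00"
assert updated_parity(parity, d1, new_d1) == bytes(
    a ^ b ^ c for a, b, c in zip(d0, new_d1, d2))
```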
A few guidelines concerning RAID
Armed with these basics, you'll find that the data protection scheme you choose is a balance between needed capacity, performance, and the ever-present constraints of budget. That said, here are some guidelines based on real-world experience:
About Mike Pepe
Mike Pepe joined Microsoft in 2006 after working in the IT field for ten years providing clustering, backup, and storage solutions for the telecommunications industry. He is currently a Service Engineer for the Bing Information Platform at Microsoft, working on datacenter-scale automation and service design for Bing.