RAID Levels: RAID 0, RAID 1, RAID 10, RAID 5, RAID 6 (Complete Tutorial)
Hard disk drives are among the most complex devices attached to a computer system (or a server machine). Their complexity comes from the fact that they are mechanical storage devices: most of a disk drive's internal parts move, including the head that travels to fetch data for the user.
Because of these moving parts, there is a significant chance that a disk drive will fail. Advances in disk technology have removed the mechanical parts altogether to make the solid state drive, commonly called an SSD. However, SSDs still have some shortcomings, so they cannot yet completely replace mechanical disk drives. Using a single hard disk in a server machine is not advisable at all, because that disk becomes a single point of failure (a heavy risk of data loss).
RAID addresses most of these problems, because it is a fast, fault-tolerant, high-performing solution.
What is RAID?
RAID stands for redundant array of independent disks. The name indicates that there are multiple disk drives and that each one is independent. How data is distributed between these drives depends on the RAID level used.
The main advantage of RAID is that the whole array of disks can be presented to the operating system as a single disk.
RAID is fault tolerant because, in most RAID levels, data is stored redundantly on multiple disks. So even if one disk fails (or sometimes even two), the data is safe and the operating system may not even be aware of the failure. Data loss is prevented because the data can be recovered from the disks that have not failed.
Let's understand the terminology used in RAID before getting into the different RAID levels.
What is striping in RAID?
Writing data to a single disk is slower than spreading it across multiple disks: with striping, data is split into small chunks that are written to, and later read from, different disks in parallel.
When data is read from several disks at once, the combined throughput of all the disks is available, so the CPU spends less time waiting for data.
Each disk is divided into small chunks (usually somewhere between 4 KB and 512 KB).
In the example diagram shown above, you can see that all three disks contain different data: the data to be stored is split into chunks and spread across the disks.
There is no redundancy in this method, but it is known for its high performance.
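The chunking described above can be sketched in a few lines of Python. This is a toy model under simple assumptions: disks are plain Python lists, the chunk size is fixed, and chunks are dealt to disks in order (chunk 0 to disk 0, chunk 1 to disk 1, and so on). The names `stripe` and `unstripe` are hypothetical helpers, not part of any RAID tool.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Split data into fixed-size chunks and deal them round-robin across disks."""
    # Assumption: simple round-robin layout; real controllers work on block devices.
    if num_disks < 1 or chunk_size < 1:
        raise ValueError("num_disks and chunk_size must be positive")
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        # Chunk k goes to disk k mod N, so consecutive chunks land on different disks.
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Read the chunks back row by row (disk 0, 1, ..., N-1) to rebuild the data."""
    out = bytearray()
    rows = max((len(disk) for disk in disks), default=0)
    for row in range(rows):
        for disk in disks:
            if row < len(disk):
                out += disk[row]
    return bytes(out)


layout = stripe(b"ABCDEFGHIJKL", num_disks=3, chunk_size=2)
print(layout)            # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF', b'KL']]
print(unstripe(layout))  # b'ABCDEFGHIJKL'
```

If any one of these lists is lost, every chunk stored on it is gone and no other disk holds a copy, which is why striping on its own offers no redundancy.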
What is mirroring in RAID?
Mirroring is a mechanism in which the same data is written to more than one disk drive. The main advantage of mirroring (keeping identical copies of the data on two or more disks) is that it provides 100 percent redundancy.
Suppose two drives are in mirroring mode: both of them contain an exact copy of the same data, so even if one disk fails, the data is safe on the other.
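A toy model of mirroring, assuming each disk is a Python dictionary from block number to data. The `Mirror` class is purely illustrative: writes go to every healthy member, and a read can be served by any member that has not failed.

```python
class Mirror:
    """Toy mirror set: identical copies of every block on each member disk."""

    def __init__(self, num_disks: int = 2) -> None:
        # Assumption: a disk is modelled as a dict of block number -> data.
        self.disks: list[dict[int, bytes]] = [{} for _ in range(num_disks)]
        self.failed: set[int] = set()

    def write(self, block: int, data: bytes) -> None:
        # The same data is written to every disk that is still working.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Any healthy member holds a full copy, so the first one will do.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise OSError("every disk in the mirror has failed")

    def fail(self, index: int) -> None:
        self.failed.add(index)


mirror = Mirror(num_disks=2)
mirror.write(0, b"hello")
mirror.fail(0)         # simulate losing the first disk
print(mirror.read(0))  # b'hello', served from the surviving copy
```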
What is parity in RAID?
Parity is an interesting method used to rebuild data when one of the disks fails. Although it is interesting to understand how parity works, you will find relatively little documentation about it on the internet.
Parity makes use of a well-known binary operation called XOR (exclusive OR).
XOR takes two inputs and produces one output. Some examples of XOR operations are shown below.
|1st operand|2nd operand|XOR output|
|---|---|---|
|0|0|0|
|0|1|1|
|1|0|1|
|1|1|0|
A simple rule for XOR: if the two inputs differ, the output is 1; if they are the same, the output is 0.
In the example table above, think of the first two columns as hard disks in a RAID array and the third column (the XOR output) as a parity disk.
Now if one of the data disks fails, you can easily reconstruct its contents by XOR-ing the parity disk with the disk that has not failed.
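The XOR rule and the rebuild step can be checked directly in Python, where `^` is the bitwise XOR operator. The two byte strings below are made-up stand-ins for the contents of two data disks, and `xor_bytes` is a hypothetical helper name.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# XOR truth table: the output is 1 only when the inputs differ.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, a ^ b)

# Assumption: two tiny illustrative "disks" of two bytes each.
disk1 = b"\x0f\xa5"
disk2 = b"\xf0\x3c"
parity = xor_bytes(disk1, disk2)  # what the parity disk stores

# disk1 fails: XOR-ing the survivor with the parity gives its contents back,
# because (d1 ^ d2) ^ d2 == d1.
rebuilt = xor_bytes(disk2, parity)
print(rebuilt == disk1)  # True
```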
Parity in RAID can be of two types:
- Dedicated parity (the XOR of the data blocks is stored on a single, dedicated parity disk)
- Distributed parity (parity blocks are spread across all disks in the array instead of living on one disk)
RAID 5 uses distributed parity, so it can survive one disk failure.
RAID 6 uses double distributed parity, so it can survive two disk failures.
In the distributed parity picture above, if one disk fails, you can rebuild its contents from the data and parity blocks on the other disks.
In each stripe, one block on one of the disks holds the parity of the data blocks on the other disks, and the disk holding the parity changes from stripe to stripe.
So if a disk fails, its data blocks can be reconstructed from the remaining data and parity blocks, and its parity blocks can be recomputed from the other disks' data.
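Here is a toy sketch of distributed parity, assuming equal-size blocks, one parity block per stripe, and a parity position that moves one disk to the left on each stripe (real RAID 5 implementations use several rotation schemes; this one is only for illustration). `build_raid5` and `rebuild_disk` are hypothetical names. Because the XOR of all blocks in a stripe is zero, the same rebuild loop recovers data blocks and parity blocks alike.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR any number of equal-length blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def build_raid5(data_blocks: list[bytes], num_disks: int) -> list[list[bytes]]:
    """Lay out data blocks stripe by stripe, with one rotating parity block per stripe."""
    per_stripe = num_disks - 1
    if num_disks < 3 or len(data_blocks) % per_stripe:
        raise ValueError("need >= 3 disks and a whole number of stripes")
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for s in range(len(data_blocks) // per_stripe):
        stripe = data_blocks[s * per_stripe:(s + 1) * per_stripe]
        parity = xor_blocks(stripe)
        # Assumption: parity starts on the last disk and moves left each stripe.
        parity_disk = (num_disks - 1 - s) % num_disks
        data_iter = iter(stripe)
        for d in range(num_disks):
            disks[d].append(parity if d == parity_disk else next(data_iter))
    return disks


def rebuild_disk(disks: list[list[bytes]], failed: int) -> list[bytes]:
    """Recreate every block (data or parity) of the failed disk from the survivors."""
    survivors = [disk for i, disk in enumerate(disks) if i != failed]
    stripes = len(survivors[0])
    return [xor_blocks([disk[s] for disk in survivors]) for s in range(stripes)]


blocks = [b"AA", b"BB", b"CC", b"DD", b"EE", b"FF"]
array = build_raid5(blocks, num_disks=3)
for failed in range(3):
    print(failed, rebuild_disk(array, failed) == array[failed])  # True for every disk
```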
What are hot spares in RAID?
A hot spare is an extra drive added to the disk array to increase fault tolerance. If your RAID array has a hot spare, the RAID controller automatically starts rebuilding data onto the hot spare when one of the disks in the array fails.
Once the rebuild is complete, the hot spare takes over the role of the failed drive.
You can replace the failed drive later.
RAID management software provides a way to designate the hot spare drive for your array.
What is hot swap in RAID?
Hot swapping is the ability to replace a failed disk drive without shutting down or rebooting the machine.
In other words, hot swapping enables you to replace a component without interrupting the normal operation of a server machine.
Different Levels of RAID
RAID (redundant array of independent disks) can be classified into different levels based on how it operates and the level of redundancy it provides.
There is no one-size-fits-all RAID level. Choosing a suitable RAID level for your application depends on the following factors:
- The performance it provides
- The level of redundancy it provides
- How well its read and write performance matches your workload's mix of reads and writes
Let's discuss some of the widely used RAID levels.
RAID 0 or No RAID
If your main priority is performance, RAID 0 is a good fit.
An important fact to keep in mind is that RAID 0 does not provide any kind of redundancy: if even one drive fails, the data on the whole array is effectively lost.
It is simply striping across your disk array. Data is broken into smaller chunks that are spread across all the disks you have.
It has no mirroring and no parity (which means no redundancy!).
In fact, RAID 0 is arguably not RAID at all, because RAID was primarily built for redundancy and RAID 0 provides none, although it does provide high performance.
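To see why a single failure is so costly, here is a back-of-the-envelope sketch. It assumes each disk fails independently with the same probability `p` over some period (a simplifying assumption; real failures are often correlated). A RAID 0 array survives only if every disk survives, so its loss probability grows with the number of disks. `raid0_loss_probability` is a hypothetical helper.

```python
def raid0_loss_probability(p: float, num_disks: int) -> float:
    """Chance that a RAID 0 array loses data over some period.

    Assumption: each disk fails independently with the same probability p.
    The array survives only if every disk survives: (1 - p) ** num_disks.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return 1.0 - (1.0 - p) ** num_disks


# Assumption: an illustrative 3% per-disk failure chance, not measured data.
for n in (1, 2, 4, 8):
    print(n, round(raid0_loss_probability(0.03, n), 4))
```

With that made-up 3% figure, the risk of losing the whole array rises to roughly 6% with two disks, 11% with four, and 22% with eight.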
RAID Level 1 (RAID 1)
RAID 1 relies entirely on mirroring: all data on one drive is duplicated to another drive. It is suited to situations where fault tolerance is of primary importance.
RAID 1 starts at a minimum of 2 drives and can go up to 32, depending on the RAID controller (many controllers require an even number of disks).
Striping and parity are not used in RAID 1.
For RAID 1, you can refer to the diagram shown in the mirroring section of this article.
RAID Level 5 (RAID 5)
RAID 5 uses striping, so data is spread across the disks in the array, and it also provides redundancy with the help of parity.
RAID 5 is one of the most cost-effective solutions for both performance and redundancy. Striping generally improves performance, and the parity used at this level is distributed parity.
The minimum number of disks required for RAID 5 is 3, and the maximum can go up to 32 (depending on the RAID controller used).
One important point is that reads in RAID 5 are much faster than writes. Reads can use the combined throughput of all the disks, whereas every write must also update the parity block of its stripe: a small write typically means reading the old data and old parity, then writing the new data and new parity.
As a reference, see the distributed parity diagram shown in the parity section of this article.
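The RAID 5 write penalty can be seen in a small sketch. Assuming a single data block in a stripe is overwritten, the controller does not need to read the whole stripe: because XOR is its own inverse, new parity = old parity XOR old data XOR new data. That still costs two reads and two writes for one logical write. The function name `raid5_small_write` and the sample bytes are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Return the new parity block after overwriting one data block.

    Disk I/O for this single logical write: read old data, read old parity,
    write new data, write new parity (4 operations instead of 1).
    """
    # XOR-ing old_data back into the parity cancels it out (x ^ x == 0).
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Assumption: a three-block stripe with made-up contents.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xaa\x55"
parity = xor_bytes(xor_bytes(d0, d1), d2)
new_d1 = b"\xff\x00"
new_parity = raid5_small_write(d1, parity, new_d1)
# Same result as recomputing parity over the whole stripe, without reading d0 or d2.
print(new_parity == xor_bytes(xor_bytes(d0, new_d1), d2))  # True
```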
RAID Level 6 (RAID 6)
RAID 6 is very similar to RAID 5, but it has one added advantage.
The added advantage is that it can survive 2 drive failures instead of 1. This is again achieved with the help of parity: RAID 6 uses double distributed parity. The two parity blocks in each stripe have to be independent of each other (the second one is usually computed with a Reed-Solomon style code rather than a second plain XOR), so that two missing blocks can be worked out.
You can clearly see in the diagram above that every stripe contains two parity blocks, stored on different disks.
So even if two disks fail at the same time, the data can still be recreated.
The performance of RAID 6 is very similar to that of RAID 5: it is much better suited for reads than writes, and its writes are somewhat slower than RAID 5's because each one must update two parity blocks.
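Why can't RAID 6 simply store the same XOR parity twice? Two copies of one parity only tell you D_x XOR D_y, which is not enough to find two unknown blocks. A common answer is P+Q parity: P is the ordinary XOR, and Q weights each data block by a different power of a generator in the finite field GF(2^8), which gives two independent equations per byte. The sketch below assumes that scheme, with generator 2 and polynomial 0x11d (the parameters in H. Peter Anvin's well-known description of Linux software RAID-6); hardware controllers may use other codes, and the stripe layout (which disks hold P and Q) is ignored here. All function names are hypothetical.

```python
from typing import Optional

POLY = 0x11D  # assumption: x^8 + x^4 + x^3 + x^2 + 1, generator g = 2


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8): carry-less multiply, reduced by POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def gf_pow2(n: int) -> int:
    """Return g**n for the generator g = 2."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, 2)
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse by brute force (fine for a 256-element field)."""
    for candidate in range(1, 256):
        if gf_mul(a, candidate) == 1:
            return candidate
    raise ZeroDivisionError("0 has no inverse in GF(2^8)")


def compute_pq(data: list[bytes]) -> tuple[bytes, bytes]:
    """P is the plain XOR of the data blocks; Q is the XOR of g**i * D_i."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, block in enumerate(data):
        coeff = gf_pow2(i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(blocks: list[Optional[bytes]], p: bytes, q: bytes) -> list[bytes]:
    """Rebuild exactly two missing data blocks (marked None) from P and Q."""
    lost = [i for i, block in enumerate(blocks) if block is None]
    if len(lost) != 2:
        raise ValueError("this sketch only handles two lost data blocks")
    x, y = lost
    zero = bytes(len(p))
    # P and Q of the survivors alone; XOR-ing them out leaves only D_x and D_y.
    p_s, q_s = compute_pq([zero if block is None else block for block in blocks])
    gy = gf_pow2(y)
    inv = gf_inv(gf_pow2(x) ^ gy)  # nonzero because x != y
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        pxy = p[j] ^ p_s[j]  # = D_x ^ D_y
        qxy = q[j] ^ q_s[j]  # = g**x * D_x ^ g**y * D_y
        dx[j] = gf_mul(qxy ^ gf_mul(gy, pxy), inv)
        dy[j] = pxy ^ dx[j]
    return [bytes(dx) if i == x else bytes(dy) if i == y else block
            for i, block in enumerate(blocks)]


data = [b"disk0", b"disk1", b"disk2", b"disk3"]
p, q = compute_pq(data)
damaged = [data[0], None, data[2], None]  # data disks 1 and 3 lost
print(recover_two(damaged, p, q) == data)  # True
```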
RAID 10 (Combination of RAID 0 and RAID 1)
RAID 10 is a good solution that gives you both the performance advantage of RAID 0 striping and the redundancy of RAID 1 mirroring.
RAID 10 is a combination of RAID 0 and RAID 1, so you get the qualities of both levels.
Let's understand how data is stored in a RAID 10 array.
In the diagram above, each piece of data has a duplicate copy within a RAID 1 group, and the data is also striped across multiple RAID 1 groups for performance.
It is well suited to heavy I/O workloads and also provides 100 percent redundancy. The minimum number of drives required is 4. It is quite expensive, because one disk in every RAID 1 group holds only a mirror copy, so half of the raw capacity goes to redundancy.
But it is an excellent choice for both performance and redundancy.
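The RAID 10 layout can be sketched as a mapping from a logical chunk number to the disks that hold it. This assumes consecutive disks are paired (disks 0 and 1 form the first mirror pair, disks 2 and 3 the second, and so on); actual disk ordering depends on the controller or software. `raid10_locate` and `survives` are hypothetical helpers.

```python
def raid10_locate(chunk: int, num_disks: int) -> tuple[int, int, tuple[int, int]]:
    """Map a logical chunk to (mirror pair, row on each disk, the two disks holding it)."""
    # Assumption: disks 0+1 form pair 0, disks 2+3 form pair 1, and so on.
    if num_disks < 4 or num_disks % 2:
        raise ValueError("this sketch assumes an even number of disks, at least 4")
    pairs = num_disks // 2
    pair = chunk % pairs  # RAID 0: consecutive chunks go to different pairs
    row = chunk // pairs  # how far down each disk the chunk sits
    return pair, row, (2 * pair, 2 * pair + 1)  # RAID 1: both disks in the pair


def survives(failed: set[int], num_disks: int) -> bool:
    """The array keeps working as long as no mirror pair has lost both disks."""
    return all(not {2 * p, 2 * p + 1} <= failed for p in range(num_disks // 2))


for chunk in range(4):
    print(chunk, raid10_locate(chunk, num_disks=4))
print(survives({0, 2}, num_disks=4))  # True: one disk lost from each pair
print(survives({0, 1}, num_disks=4))  # False: both copies in pair 0 lost
```

Under this layout, a RAID 10 array always survives one failure and can survive up to half of its disks failing, provided no two failures hit the same mirror pair.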
Summary of different RAID levels
- RAID 0 uses striping for high performance. RAID 0 is arguably not true RAID, as it does not provide fault tolerance.
- RAID 1 uses mirroring for redundancy.
- RAID 5 uses striping as well as parity for redundancy. It is well suited for read-heavy workloads with relatively few writes.
- RAID 6 uses striping and double parity for redundancy, so it can survive two disk failures.
- RAID 10 is a combination of RAID 1 and RAID 0. It provides strong redundancy through mirroring, and performance because data is striped across multiple RAID 1 groups.
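To compare the levels side by side, here is a rough calculator for usable capacity and guaranteed fault tolerance, assuming identical disks and ignoring controller metadata overhead. The minimum disk counts for RAID 1, 5, and 10 come from this article; the minimums of 2 for RAID 0 and 4 for RAID 6 are common conventions rather than universal rules. `raid_summary` is a hypothetical helper.

```python
def raid_summary(level: str, num_disks: int, disk_size_tb: float) -> tuple[float, int]:
    """Return (usable capacity in TB, disk failures the array always survives)."""
    # Assumption: identical disks, no controller metadata overhead.
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level {level!r}")
    if num_disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == "0":
        return num_disks * disk_size_tb, 0        # striping only, no redundancy
    if level == "1":
        return disk_size_tb, num_disks - 1        # every disk holds a full copy
    if level == "5":
        return (num_disks - 1) * disk_size_tb, 1  # one disk's worth of parity
    if level == "6":
        return (num_disks - 2) * disk_size_tb, 2  # two disks' worth of parity
    if num_disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    # RAID 10: half the raw space holds mirror copies; one failure is always
    # survivable, more if each failure hits a different mirror pair.
    return num_disks // 2 * disk_size_tb, 1


for level in ("0", "1", "5", "6", "10"):
    capacity, tolerance = raid_summary(level, num_disks=4, disk_size_tb=2.0)
    print(f"RAID {level}: {capacity} TB usable, survives {tolerance} failure(s)")
```

With four 2 TB disks this reports 8 TB usable for RAID 0, 2 TB for RAID 1, 6 TB for RAID 5, and 4 TB for both RAID 6 and RAID 10, which makes the cost of each level's redundancy easy to see.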