RAID controllers use multiple physical disks that appear as a single logical drive. RAID levels 0, 1, 5 are commonly used. RAID 0 stripes data across disks for speed but has no redundancy. RAID 1 mirrors data onto two disks for redundancy but is expensive. RAID 5 stripes data across disks and uses parity for redundancy, avoiding bottlenecks of RAID 4. Larger RAID groups can implement dual distributed parity for fault tolerance from two drive failures. Nesting RAID levels can boost performance by combining redundancy with RAID 0 striping. Rebuilding failed drives uses parity calculation with XOR to reconstruct lost data.
2. RAID
• Redundant Array of Independent Disks
• Redundant Array of Inexpensive Disks
• 6 levels in common use
• Not a hierarchy
• Set of physical disks viewed as single logical
drive by O/S
• Data distributed across physical drives
• Can use redundant capacity to store parity
information
3. • Each RAID scheme affects reliability and
performance in different ways. Every
additional disk included in an array
increases the likelihood that one will fail,
but by using error checking and/or
mirroring, the array as a whole can be
made more reliable by the ability to
survive and recover from a failure.
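The trade-off described above can be illustrated with a rough calculation. This minimal Python sketch assumes every disk fails independently with the same probability p over some fixed period, which is a simplification (real failures are often correlated, and rebuild time is ignored). The function names and the value p = 0.03 are illustrative only.

```python
def p_any_disk_fails(n: int, p: float) -> float:
    """Probability that at least one of n independent disks fails."""
    return 1.0 - (1.0 - p) ** n


def p_mirror_pair_lost(p: float) -> float:
    """Probability that both disks of a two-way mirror fail (data is lost)."""
    return p * p


p = 0.03  # assumed per-disk failure probability, for illustration only
for n in (1, 2, 4, 8):
    # More disks: a failure somewhere in the array becomes more likely.
    print(n, "disks:", round(p_any_disk_fails(n, p), 4))

# With mirroring, data is lost only if both copies fail.
print("mirror pair lost:", round(p_mirror_pair_lost(p), 4))
```

Under these assumptions, a failure somewhere in the array becomes more likely as disks are added, while losing both copies of a mirrored pair is far less likely than losing a single disk.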
4. RAID 0
• No redundancy
• Data striped across all disks
• Round Robin striping
• Increase speed
– Multiple data requests probably not on same
disk
– Disks seek in parallel
– A set of data is likely to be striped across
multiple disks
5. RAID 1
• Mirrored Disks
• Data is striped across disks
• 2 copies of each stripe on separate disks
• Read from either
• Write to both
• Recovery is simple
– Swap faulty disk & re-mirror
– No down time
• Expensive
6. RAID 2
• Disks are synchronized
• Very small stripes
– Often single byte/word
• Error correction calculated across
corresponding bits on disks
• Multiple parity disks store Hamming code
error correction in corresponding positions
• Lots of redundancy
– Expensive
7. RAID 3
• Similar to RAID 2
• Only one redundant disk, no matter how
large the array
• Simple parity bit for each set of
corresponding bits
• Data on failed drive can be reconstructed
from surviving data and parity info
• Very high transfer rates
8. RAID 4
• Each disk operates independently
• Good for high I/O request rate
• Large stripes
• Bit by bit parity calculated across stripes
on each disk
• Parity stored on parity disk
9. RAID 5
• Like RAID 4
• Parity striped across all disks
• Round robin allocation for parity stripe
• Avoids RAID 4 bottleneck at parity disk
• Commonly used in network servers
• N.B. DOES NOT MEAN 5 DISKS!!!!!
10. RAID-0
Disk 1: Strip 0, Strip 4, Strip 8, Strip 12
Disk 2: Strip 1, Strip 5, Strip 9, Strip 13
Disk 3: Strip 2, Strip 6, Strip 10, Strip 14
Disk 4: Strip 3, Strip 7, Strip 11, Strip 15
• Striped, non-redundant
• Parallel access to multiple disks
• Excellent data transfer rate (for small strips)
• Excellent I/O request processing rate (for large strips)
• Typically used for applications requiring high performance for non-critical data
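The round-robin placement shown on this slide can be sketched in a few lines of Python. This is an illustration, not a controller implementation; the function name `locate_strip` and the convention of numbering disks from 1 and rows from 0 are assumptions made for the example.

```python
def locate_strip(strip: int, num_disks: int) -> tuple[int, int]:
    """Return (disk, row) for a logical strip under round-robin striping.

    Disks are numbered from 1 and rows from 0 (assumed convention).
    """
    return strip % num_disks + 1, strip // num_disks


# Four-disk array, as in the layout on this slide.
for s in (0, 5, 10, 15):
    disk, row = locate_strip(s, 4)
    print(f"Strip {s} -> disk {disk}, row {row}")
```

Consecutive strips land on different disks, which is why a large sequential transfer or several independent requests can keep all disks busy at once.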
11. RAID-1
Disk 1: Strip 0, Strip 1, Strip 2, Strip 3
Disk 2 (mirror): Strip 0, Strip 1, Strip 2, Strip 3
• Mirrored/replicated (most costly form of redundancy)
• I/O request rate: good for reads, fair for writes
• Data transfer rate: good for reads; writes slightly slower
– Read can be serviced by the disk with the shorter seek distance
– Write must be handled by both disks
• Typically used in system drives and critical files
– Banking, insurance data
– Web (e-commerce) servers
12. Combining RAID-0 and RAID-1
Disk 1: Strip 0, Strip 4, Strip 8, Strip 12
Disk 2: Strip 1, Strip 5, Strip 9, Strip 13
Disk 3: Strip 2, Strip 6, Strip 10, Strip 14
Disk 4: Strip 3, Strip 7, Strip 11, Strip 15
Disk 5 (mirror of Disk 1): Strip 0, Strip 4, Strip 8, Strip 12
Disk 6 (mirror of Disk 2): Strip 1, Strip 5, Strip 9, Strip 13
Disk 7 (mirror of Disk 3): Strip 2, Strip 6, Strip 10, Strip 14
Disk 8 (mirror of Disk 4): Strip 3, Strip 7, Strip 11, Strip 15
• Can combine RAID-0 and RAID-1:
– Mirrored stripes (RAID 0+1, or RAID 01)
– Example: the layout above
– Striped mirrors (RAID 1+0, or RAID 10)
• Data transfer rate: good for reads and writes
• Reliability: good
• Efficiency: poor (100% overhead in terms of disk utilization)
13. RAID-2
b0 b1 b2 b3 f0(b) f1(b) f2(b)
• Hamming codes capable of detecting two or more erasures
– E.g., single error-correcting, double error-detecting (SEC-DED)
• Problem with small writes (similar to DRAM cycle time/access time)
• Poor I/O request rate
• Excellent data transfer rate
14. RAID-3
b0 b1 b2 b3 P(b)
• Fine-grained (bit) interleaving with parity
– E.g., parity = sum modulo 2 (XOR) of all bits
• Disks are synchronized, parity computed by disk controller
• When one disk fails… (how do you know?)
– Data is recovered by subtracting (modulo 2, i.e., XORing) all data in good disks from parity disk
– Recovering from failures takes longer than in mirroring, but failures are rare, so this is acceptable
– Hot spares used to reduce vulnerability in reduced mode
• Performance:
– Poor I/O request rate
– Excellent data transfer rate
• Typically used in large I/O request size applications, such as imaging or CAD
15. RAID-4
• Coarse-grained striping with parity
• Unlike RAID-3, not all disks need to be read on each write
– New parity computed from the difference between old and new data
• Drawback:
– Like RAID-3, parity disk involved in every write; serializes small writes
• I/O request rate: excellent for reads, fair for writes
• Data transfer rate: good for reads, fair for writes
Disk 1: Blk 0, Blk 4, Blk 8, Blk 12
Disk 2: Blk 1, Blk 5, Blk 9, Blk 13
Disk 3: Blk 2, Blk 6, Blk 10, Blk 14
Disk 4: Blk 3, Blk 7, Blk 11, Blk 15
Disk 5 (parity): P(0-3), P(4-7), P(8-11), P(12-15)
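The parity update for a small write on this slide (new parity from the difference between old and new data) can be sketched as follows. This is a minimal Python illustration with blocks modelled as small integers; the function names and the example block values are assumptions, not part of the original slides.

```python
from functools import reduce
from operator import xor


def full_parity(blocks: list[int]) -> int:
    """Parity of a stripe: XOR of all its data blocks."""
    return reduce(xor, blocks, 0)


def small_write_parity(old_parity: int, old_data: int, new_data: int) -> int:
    """New parity computed from the old parity and the change in one block."""
    return old_parity ^ old_data ^ new_data


# Example values standing in for Blk 0 to Blk 3 of one stripe.
stripe = [0b01101101, 0b11010100, 0b00001111, 0b10100000]
parity = full_parity(stripe)

# Overwrite one block: only that data disk and the parity disk are touched.
new_block = 0b11110000
parity = small_write_parity(parity, stripe[2], new_block)
stripe[2] = new_block

# The shortcut agrees with recomputing parity over the whole stripe.
print(parity == full_parity(stripe))
```

Because every such update reads and writes the single parity disk, concurrent small writes queue up behind it, which is the bottleneck RAID 5 is designed to remove.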
16. RAID-5
Disk 1: Blk 0, Blk 4, Blk 8, Blk 12
Disk 2: Blk 1, Blk 5, Blk 9, P(12-15)
Disk 3: Blk 2, Blk 6, P(8-11), Blk 13
Disk 4: Blk 3, P(4-7), Blk 10, Blk 14
Disk 5: P(0-3), Blk 7, Blk 11, Blk 15
• Key idea: reduce load on parity disk
• Block-interleaved distributed parity
• Multiple writes can occur simultaneously
– Block 0 can be accessed in parallel with Block 5
– First needs disks 1 and 5; second needs disks 2 and 4
• I/O request rate: excellent for reads, good for writes
• Data transfer rate: good for reads, good for writes
• Typically used for high request rate, read-intensive data lookup
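The rotating parity placement on this slide can be reproduced with a short Python sketch. The rotation rule used here (parity on the last disk for the first row, moving one disk to the left on each following row) is inferred from the five-disk layout on the slide; real controllers may use other rotations. The function name `raid5_locate` is hypothetical.

```python
def raid5_locate(block: int, num_disks: int) -> tuple[int, int, int]:
    """Return (data_disk, parity_disk, row) for a logical block.

    Disks are numbered from 1, rows from 0. Parity starts on the last
    disk and moves one disk to the left per row (assumed rotation).
    """
    row, k = divmod(block, num_disks - 1)       # k = index among the row's data blocks
    parity = num_disks - 1 - (row % num_disks)  # 0-based parity disk for this row
    data = k if k < parity else k + 1           # skip over the parity disk
    return data + 1, parity + 1, row


for blk in (0, 5):
    data_disk, parity_disk, row = raid5_locate(blk, 5)
    print(f"Blk {blk}: data on disk {data_disk}, parity on disk {parity_disk}")
```

For Block 0 and Block 5 the two disk pairs do not overlap, so the two writes can proceed at the same time, matching the example on the slide.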
17. Striped set with dual distributed parity (RAID 6). Provides fault tolerance from two drive failures; the array continues to operate with up to two failed drives. This makes larger RAID groups more practical, especially for high availability systems. This becomes increasingly important because large-capacity drives lengthen the time needed to recover from the failure of a single drive.
18. Nesting RAID Levels
When nesting RAID levels, a RAID type that provides redundancy is typically combined with RAID 0 to boost performance. With these configurations it is preferable to have RAID 0 on top and the redundant array at the bottom, because fewer disks then need to be regenerated when a disk fails.
19. RAID 01 and RAID 10
The minimum number of disks required to implement this level of RAID is 4. The difference between RAID 0+1 and RAID 1+0 is the location of each RAID system: RAID 0+1 is a mirror of stripes, whereas RAID 1+0 is a stripe of mirrors. The size of a RAID 0+1 array can be calculated as follows, where n is the number of drives (must be even) and c is the capacity of the smallest drive in the array:
Size = (n × c) / 2
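As a small worked example of the size formula, the Python sketch below takes the drive capacities as a list and applies Size = (n × c) / 2. The function name `raid01_size` and the sample capacities (in GB) are illustrative assumptions.

```python
def raid01_size(capacities: list[float]) -> float:
    """Usable size of a RAID 0+1 array: (n * c) / 2.

    n is the number of drives (even, at least 4) and c is the capacity
    of the smallest drive in the array.
    """
    n = len(capacities)
    if n < 4 or n % 2 != 0:
        raise ValueError("RAID 0+1 needs an even number of drives, at least 4")
    c = min(capacities)
    return n * c / 2


print(raid01_size([500, 500, 500, 500]))   # four equal drives
print(raid01_size([500, 1000, 500, 750]))  # mixed drives: the smallest one limits the array
```

Both calls give the same result, because capacity beyond that of the smallest drive is not used.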
21. RAID level 30 is also known as striping of dedicated parity arrays. It is a combination of RAID level 3 and RAID level 0. RAID 30 provides high data transfer rates, combined with high data reliability. RAID 30 is best implemented on two RAID 3 disk arrays with data striped across both disk arrays. RAID 30 breaks up data into smaller blocks, and then stripes the blocks of data to each RAID 3 set. RAID 3 breaks up data into smaller blocks, calculates parity by performing an Exclusive OR on the blocks, and then writes the blocks to all but one drive in the array. The parity data created using the Exclusive OR is then written to the last drive in each RAID 3 array. The size of each block is determined by the stripe size parameter, which is set when the RAID is created.
25. Rebuilding Failed Drives
Parity Calculation
Parity data in a RAID environment is calculated using the Boolean XOR function. For example, here is a simple RAID 4 three-disk setup consisting of two drives that hold 8 bits of data each and a third drive that will be used to hold parity data.
Drive 1: 01101101
Drive 2: 11010100
To calculate parity data for the two drives, an XOR is performed on their data,
i.e. 01101101 XOR 11010100 = 10111001
The resulting parity data, 10111001, is then stored on Drive 3, the dedicated parity drive.
Should any of the three drives fail, the contents of the failed drive can be reconstructed on a replacement (or "hot spare") drive by subjecting the data from the remaining drives to the same XOR operation. If Drive 2 were to fail, its data could be rebuilt using the XOR results of the contents of the two remaining drives, Drive 3 and Drive 1:
Drive 3: 10111001
Drive 1: 01101101
i.e. 10111001 XOR 01101101 = 11010100
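The worked example on this slide can be checked with a few lines of Python. This is only a sketch: each drive is modelled as a `bytes` object holding a single byte, and the helper name `xor_drives` is an assumption made for the example.

```python
def xor_drives(*drives: bytes) -> bytes:
    """XOR equally sized drive images together, byte by byte."""
    out = bytearray(len(drives[0]))
    for drive in drives:
        for i, byte in enumerate(drive):
            out[i] ^= byte
    return bytes(out)


drive1 = bytes([0b01101101])
drive2 = bytes([0b11010100])

# Parity stored on Drive 3.
drive3 = xor_drives(drive1, drive2)
print(f"{drive3[0]:08b}")   # 10111001

# Drive 2 fails: rebuild it from the parity drive and the surviving data drive.
rebuilt = xor_drives(drive3, drive1)
print(f"{rebuilt[0]:08b}")  # 11010100
```

The same function handles both steps because XOR is its own inverse: XORing the parity with the surviving data cancels that data out and leaves the missing drive's contents.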
26. Hamming Code
• An error correction code that separates the bits holding the original value (data bits) from the error correction bits (check bits)
• The difference (XOR) between the calculated and actual check bits gives the position of the bit that is wrong
27. For M data bits and K check bits, we must have:
2^K - 1 >= M + K
Calculate the check bits for M = 8?
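One way to answer the question on this slide is to try increasing values of K until the inequality 2^K - 1 >= M + K holds. The Python sketch below does exactly that; the function name `check_bits_needed` is an assumption made for the example.

```python
def check_bits_needed(m: int) -> int:
    """Smallest K such that 2**K - 1 >= M + K."""
    k = 1
    while 2 ** k - 1 < m + k:
        k += 1
    return k


for m in (4, 8, 16):
    print(f"M = {m}: K = {check_bits_needed(m)}")
```

For M = 8, K = 3 is too small (7 < 11) and K = 4 is enough (15 >= 12), so an 8-bit data word is stored as a 12-bit Hamming word.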
28. Hamming Code Calculation
Bit position              12    11    10    9     8     7     6     5     4     3     2     1
Position no. (binary)     1100  1011  1010  1001  1000  0111  0110  0101  0100  0011  0010  0001
Data bits                 D8    D7    D6    D5          D4    D3    D2          D1
Check bits (powers of 2)                          C4                      C3          C2    C1
Calculate the Hamming word for data = 00111001
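The exercise above can be worked through in code. The Python sketch below assumes the usual conventions: check bits sit at the power-of-two positions (1, 2, 4, 8), each check bit is the XOR (even parity) of the data positions whose binary position number includes that power of two, and the data word is written most significant bit first, so D8 lands at position 12. The function names `hamming_encode` and `syndrome` are hypothetical.

```python
def hamming_encode(data: str) -> str:
    """Build the Hamming word (highest position first) for a data bit string."""
    m = len(data)
    k = 0
    while 2 ** k - 1 < m + k:           # number of check bits needed
        k += 1
    n = m + k
    word = [0] * (n + 1)                # 1-indexed positions; word[0] is unused
    bits = [int(b) for b in reversed(data)]  # bits[0] is D1
    i = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):             # not a power of two: a data position
            word[pos] = bits[i]
            i += 1
    for j in range(k):
        p = 1 << j                      # check bit position 1, 2, 4, 8, ...
        parity = 0
        for pos in range(1, n + 1):
            if pos & p and pos != p:
                parity ^= word[pos]
        word[p] = parity
    return "".join(str(word[pos]) for pos in range(n, 0, -1))


def syndrome(word: str) -> int:
    """XOR of the positions of all 1 bits: 0 if consistent, else the position of a single flipped bit."""
    n = len(word)
    s = 0
    for pos in range(1, n + 1):
        if word[n - pos] == "1":
            s ^= pos
    return s


encoded = hamming_encode("00111001")
print(encoded)            # 001101001111
print(syndrome(encoded))  # 0
```

Flipping any single bit of the stored word makes `syndrome` return that bit's position, which is the "difference between calculated and actual check bits" described on the Hamming Code slide.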
29. RAID is not Backup
A RAID system used as a main drive is not a replacement for backing up data. Data may become damaged or destroyed without harm to the drive(s) on which they are stored. For example, some of the data may be overwritten by a system malfunction; a file may be damaged or deleted by user error or malice and not noticed for days or weeks. RAID can also be overwhelmed by catastrophic failure that exceeds its recovery capacity and, of course, the entire array is at risk of physical damage by fire, natural disaster, or human forces.
30. Classes of RAID
Failure-resistant disk systems (FRDS) (meets a minimum of criteria 1 - 6):
• Protection against data loss and loss of access to data due to disk drive failure
• Reconstruction of failed drive content to a replacement drive
• Protection against data loss due to a "write hole"
• Protection against data loss due to host and host I/O bus failure
• Protection against data loss due to replaceable unit failure
• Replaceable unit monitoring and failure indication
31. Failure-tolerant disk systems (FTDS) (meets a minimum of criteria 7 - 15):
• Disk automatic swap and hot swap
• Protection against data loss due to cache failure
• Protection against data loss due to external power failure
• Protection against data loss due to a temperature out of operating range
• Replaceable unit and environmental failure warning
• Protection against loss of access to data due to device channel failure
• Protection against loss of access to data due to controller module failure
• Protection against loss of access to data due to cache failure
• Protection against loss of access to data due to power supply failure
32. Disaster-tolerant disk systems (DTDS) (meets a minimum of criteria 16 - 21):
• Protection against loss of access to data due to host and host I/O bus failure
• Protection against loss of access to data due to external power failure
• Protection against loss of access to data due to component replacement
• Protection against loss of data and loss of access to data due to multiple disk failure
• Protection against loss of access to data due to zone failure
• Long-distance protection against loss of data due to zone failure