Some slides on the original design of RAID, a Redundant Array of Inexpensive Disks. Demonstrates the tradeoffs between the varying RAID levels and gives some historical context.
Overview of Redundant Disk Arrays
1. Andrew Robinson
University of Michigan
<androbin@umich.edu>
Redundant Arrays of
Inexpensive Disks (RAID)
What a cool idea!
2. Authors
• David A Patterson
• Garth Gibson
• Randy H Katz
Officially published in 1988.
3. Overview
• What is RAID?
• Why bother?
• What is RAID, really?
• How well does it work?
• How's it holding up?
4. What is RAID?
• Take a bunch of disks and make them appear as one disk.
• Put data on all of them
• Use all at once to gain performance
• Duplicate data to gain reliability
• Buy cheap disks to gain dollars
7. CPUs and Memory kept getting faster…
• Exponential growth everywhere!
• CPU Performance: 1.4X increase per year
- More transistors
- Better architecture
• Memory Performance: 1.4-2X increase per year
- Invention of caches
- SRAM technology
8. … but disks did not.
• It's hard to make things spin exponentially faster every year (they tend to fly apart).
• Disk seek time improved at a rate of approximately 7% a year.
• Caching had been employed to buffer I/O activity; this works reasonably well for predictable workloads.
9. Slow I/O Makes Slow Computers
• Amdahl's Law describes the impact of improving only some pieces while leaving the others unchanged.
$$S = \frac{1}{(1 - f) + f / k}$$
S - The effective speedup
f - Fraction of work in faster mode
k - Speedup while in faster mode
10. …really slow.
• If applications spend 10% of their time in I/O, then when computers are 10 times faster, they will only appear about 5 times faster.
Something needed to be done.
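The Amdahl's Law arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `amdahl_speedup` is made up for illustration, and it assumes the 10% of time spent in I/O gets no faster at all, so f = 0.9 of the work runs k times faster.

```python
def amdahl_speedup(f: float, k: float) -> float:
    """Effective speedup S when a fraction f of the work runs k times faster."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / ((1.0 - f) + f / k)


# Assumption: 10% of the time is I/O and is not sped up, so f = 0.9.
for k in (10, 100):
    print(f"k = {k:>3}: effective speedup = {amdahl_speedup(0.9, k):.2f}x")
```

With k = 10 the effective speedup is a little over 5x, and even with k = 100 it stays below 10x, because the untouched I/O time dominates.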
11. What should we do?
• Single Large Expensive Disks (SLED) are not improving fast enough.
• Larger memory or solid state drives weren't practical
• Small personal hard drives are emerging… can we do something with those?
14. Why didn't someone do this before?
• Standards like SCSI have finally allowed drive makers to integrate features seen in traditional mainframe controllers.
15. There is a problem…
• A hundredfold increase in the number of disks means a hundredfold decrease in total reliability
$$\mathrm{MTTF}_{\mathrm{DiskArray}} = \frac{\mathrm{MTTF}_{\mathrm{SingleDisk}}}{n_{\mathrm{Disks}}}$$
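A quick numeric sketch of the MTTF relation follows. It assumes independent disk failures at a constant rate, which is the simple model the formula implies. The function name `array_mttf_hours` and the 30,000-hour single-disk figure are illustrative assumptions, not values taken from these slides.

```python
def array_mttf_hours(single_disk_mttf_hours: float, n_disks: int) -> float:
    """MTTF of an array with no redundancy: the single-disk MTTF divided by the disk count."""
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    return single_disk_mttf_hours / n_disks


# Assumed, illustrative single-disk MTTF (not a number from the slides).
SINGLE_DISK_MTTF_HOURS = 30_000

for n in (1, 10, 100):
    hours = array_mttf_hours(SINGLE_DISK_MTTF_HOURS, n)
    print(f"{n:>3} disks: {hours:>8.0f} hours (~{hours / 24:.1f} days)")
```

Under these assumptions a 100-disk array without redundancy would be expected to lose a disk roughly every two weeks, which is why the check disks on the following slides matter.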
17. A couple levels… a single idea
• RAID manages the tradeoff between performance and reliability
• RAID comes in levels (RAID1 to RAID5)
• These levels represent points in the performance-reliability space
18. Groups, Disks, and Check Disks
• RAID organizes disks into reliability groups
• Some of the disks in a group store error correcting data
D = Total disks with data
G = Data disks in a group
C = Number of check disks in a group
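Given G and C, the share of a group's capacity that holds data is G / (G + C). The sketch below assumes G counts data disks only, which is consistent with the usable-storage percentages quoted on the evaluation slides. The function name is hypothetical, and the (G, C) pairs are the ones the later slides mention.

```python
def usable_storage_percent(g: int, c: int) -> float:
    """Percent of a group's capacity that holds data, assuming G data disks and C check disks."""
    if g < 1 or c < 0:
        raise ValueError("need g >= 1 and c >= 0")
    return 100.0 * g / (g + c)


# (label, G, C) pairs taken from the RAID1, RAID2, and RAID3 slides.
cases = [
    ("RAID1 mirroring", 1, 1),
    ("RAID2, G=10", 10, 4),
    ("RAID2, G=25", 25, 5),
    ("RAID3, G=10", 10, 1),
    ("RAID3, G=25", 25, 1),
]
for label, g, c in cases:
    print(f"{label:<16} {usable_storage_percent(g, c):5.1f}%")
```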
19. Metrics
• Useable Storage - Percent of storage that holds data, excluding parity information
• Performance - Hard to capture in one number:
- Reads, Writes, and Read-Modify-Write Access Patterns
- Sequential and Random Data Distribution
20. RAID1 - The Naive Approach
• Mirroring of all data
• To read:
- Use either disk
• To write:
- Send to both disks simultaneously
• Minor read performance increase.
21. Evaluation
Pros
• Reads can occur simultaneously
• Seek times can improve with special controllers
• Predictable performance
Cons
• Useable storage is cut in half
• All other performance metrics are left the same
Alright for large sequential jobs and transaction processing jobs
22. RAID2 - Bit Level Striping
• Uses Hamming Code for Error Detection
• Requires many check disks
- For 10 data disks, 4 check disks
- For 25 data disks, 5 check disks
• Can detect errors, and determine the at-fault disk
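The check-disk counts above match the standard bound for a single-error-correcting Hamming code: the smallest C such that 2^C >= G + C + 1. The sketch below applies that bound. It is an assumption that this is exactly how the slide's numbers were derived, and `hamming_check_disks` is a made-up name.

```python
def hamming_check_disks(data_disks: int) -> int:
    """Smallest c with 2**c >= data_disks + c + 1 (single-error-correcting Hamming bound)."""
    if data_disks < 1:
        raise ValueError("data_disks must be at least 1")
    c = 1
    while 2 ** c < data_disks + c + 1:
        c += 1
    return c


for g in (10, 25):
    print(f"{g} data disks -> {hamming_check_disks(g)} check disks")
```

Because C grows only logarithmically with G, larger groups spend a smaller share of their disks on checking.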
24. Evaluation
Pros
• Better useable storage, 71% for G=10, 83% for G=25
Cons
• Dismal small random data access performance: 3-9% of RAID1 or SLED
Good for large sequential jobs, bad for transaction processing systems.
25. RAID3 - Byte Level Striping
• Simpler parity error correction
• Only a single check disk required for error detection
• Cannot determine which disk failed, but that's usually pretty obvious
• Transfers of large contiguous blocks perform well
27. Evaluation
Pros
• Even better useable storage, 91% for G=10, 96% for G=25
Cons
• Small random data access performance: Just as bad as RAID2
Even better for large sequential jobs, bad for transaction processing systems.
28. What is parity?
• Parity is calculated as an XOR of the data blocks.
• XOR is reversible:
- 1011 (A1) XOR 1100 (A2) => 0111 (AP) "parity"
- 0111 (AP) XOR 1011 (A1) => 1100 (A2)
- 0111 (AP) XOR 1100 (A2) => 1011 (A1)
• This makes error detection and reconstruction possible!
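The XOR example can be reproduced directly. This is a minimal sketch using the same 4-bit values as the slide, with hypothetical helpers `parity` and `reconstruct`. Real arrays apply the same operation to whole blocks rather than 4-bit integers.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR all data blocks together to get the parity block."""
    return reduce(xor, blocks, 0)


def reconstruct(surviving_blocks, parity_block):
    """Recover the one missing block by XORing the parity with every surviving block."""
    return reduce(xor, surviving_blocks, parity_block)


a1, a2 = 0b1011, 0b1100
ap = parity([a1, a2])
print(f"AP = {ap:04b}")                                # 0111, as on the slide
print(f"A2 recovered = {reconstruct([a1], ap):04b}")   # 1100
print(f"A1 recovered = {reconstruct([a2], ap):04b}")   # 1011
```

The same two functions work for any number of data blocks, as long as only one block is missing at a time.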
29. RAID4 - Block Level Striping
• Like RAID3, but more parallel
• Interleave data at sector level rather than bit level
• Allows for servicing of multiple block requests by different drives
• Still keeps all the parity information on a single drive
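One way to picture block-level striping with a dedicated parity drive is a simple address mapping. The layout below is an assumed, illustrative one (round-robin data placement, parity on the last disk), not a layout specified in these slides, and `raid4_locate` is a hypothetical name.

```python
def raid4_locate(block, data_disks):
    """Map a logical block to (stripe, data_disk, parity_disk) under an assumed round-robin layout."""
    if block < 0 or data_disks < 1:
        raise ValueError("need block >= 0 and data_disks >= 1")
    stripe, data_disk = divmod(block, data_disks)
    parity_disk = data_disks  # assumed: the single check disk sits after the data disks
    return stripe, data_disk, parity_disk


DATA_DISKS = 4
for block in range(6):
    stripe, data_disk, parity_disk = raid4_locate(block, DATA_DISKS)
    print(f"block {block}: stripe {stripe}, data disk {data_disk}, parity disk {parity_disk}")
```

Reads of blocks 0 through 3 land on four different data disks and can be serviced in parallel, but every write also touches the same parity disk, which becomes the bottleneck.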
31. Evaluation
Pros
• Finally better small random access. Reads are fast!
Cons
• Small writes and read-modify-writes are still slow.
Good for large sequential jobs, still not great for transaction processing systems.
32. RAID5 - Block Level Striping with Distributed Parity
• Instead of checksums on a single disk, we distribute them across all disks.
• Allows us to support multiple writes per group
34. Evaluation
Pros
• Really good usable storage
• Finally decent small random data access performance across the board!
Cons
• Slightly worse write performance: data must be written to two disks simultaneously
Finally, a system that works well for both applications!
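To see why distributing parity allows multiple writes per group, here is a sketch of one possible rotating-parity layout. Real implementations use several different rotation schemes, so the rotation direction and the name `raid5_locate` are assumptions for illustration only.

```python
def raid5_locate(block, n_disks):
    """Map a logical block to (stripe, data_disk, parity_disk) with parity rotating per stripe."""
    if block < 0 or n_disks < 2:
        raise ValueError("need block >= 0 and n_disks >= 2")
    stripe, index = divmod(block, n_disks - 1)
    # Assumed rotation: parity starts on the last disk and moves one disk left per stripe.
    parity_disk = (n_disks - 1 - stripe) % n_disks
    # Data blocks fill the remaining disks in order, skipping the parity disk.
    data_disk = index if index < parity_disk else index + 1
    return stripe, data_disk, parity_disk


N_DISKS = 5
for block in (0, 5):
    stripe, data_disk, parity_disk = raid5_locate(block, N_DISKS)
    print(f"block {block}: stripe {stripe}, data disk {data_disk}, parity disk {parity_disk}")
```

In this layout a small write to block 0 touches disks 0 and 4, while a small write to block 5 touches disks 1 and 3, so the two writes can proceed at the same time. With a single parity disk both writes would queue on that disk.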
36. As a Whole
• RAID has many different levels that achieve different tradeoffs in reliability and performance
• For some (or many) use cases, almost all of them will outperform a SLED at the same cost.
40. RAID has held up remarkably well
• Data centers around the world use RAID technology.
• The small, inexpensive disk is the de facto standard of storage
• The ideas developed for RAID have been applied to many not-RAID things
41. Some open questions
• What will become of RAID as new, super fast storage media start to become cost effective?
• How does it fit in with massive internet-scale storage farms?
42. Take Aways
• RAID offers a significant advantage over SLED for the same cost
- RAID5 offers a 10x improvement in performance, reliability, and power consumption while reducing the size of the array.
• RAID allows for modular growth (add more disks)
• Cost effective option to meet the challenge of exponential growth in processor and memory speeds
43. References
• "A Case for Redundant Arrays of Inexpensive Disks" by David A Patterson, Garth Gibson, and Randy H Katz
• "RAID: A Personal Recollection of How Storage Became a System" by Randy H Katz
• Slides by David Luo and Ramasubramanian K.
• Images generously borrowed from Wikipedia <http://en.wikipedia.org/wiki/RAID>
----- Meeting Notes (1/21/12 13:53) -----
Invented around 1987.
Patterson - Berkeley
Gibson - Currently at CMU
Katz - Berkeley
Exploits a clever XOR trick to avoid reading data off of all the disks when recalculating parity.
Each small write requires 2 disks and 4 accesses: 2 reads and 2 writes.
Each small read requires only 1 access.
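The XOR trick mentioned in the notes can be sketched as follows: the new parity is the old parity XOR the old data XOR the new data, so only the target data disk and the parity disk are touched. The model below is a toy (one stripe held in a Python list, one integer per disk), and `small_write` is a hypothetical name.

```python
def small_write(disks, data_disk, parity_disk, new_data):
    """Update one block and its parity in place; return the number of disk accesses used."""
    old_data = disks[data_disk]        # access 1: read old data
    old_parity = disks[parity_disk]    # access 2: read old parity
    new_parity = old_parity ^ old_data ^ new_data
    disks[data_disk] = new_data        # access 3: write new data
    disks[parity_disk] = new_parity    # access 4: write new parity
    return 4


# Toy stripe: three data disks plus one parity disk (index 3).
stripe_disks = [0b1011, 0b1100, 0b0110, 0b1011 ^ 0b1100 ^ 0b0110]
accesses = small_write(stripe_disks, data_disk=1, parity_disk=3, new_data=0b0001)

# The shortcut gives the same parity as recomputing it from every data disk.
assert stripe_disks[3] == stripe_disks[0] ^ stripe_disks[1] ^ stripe_disks[2]
print(f"accesses: {accesses}, new parity: {stripe_disks[3]:04b}")
```

The other data disks are never read, which is what keeps a small write at 2 disks and 4 accesses regardless of group size.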