Overview of Redundant Disk Arrays

Andrew Robinson
University of Michigan
<androbin@umich.edu>

Redundant Arrays of
Inexpensive Disks (RAID)
What a cool idea!

Authors
• David A Patterson
• Garth Gibson
• Randy H Katz

Officially published in 1988.

Overview
• What is RAID?
• Why bother?
• What is RAID, really?
• How well does it work?
• How’s it holding up?

What is RAID?
• Take a bunch of disks and make them appear
as one disk.
• Put data on all of them
• Use all at once to gain performance
• Duplicate data to gain reliability
• Buy cheap disks to gain dollars

This seems like a lot of work…

why bother?

CPUs and Memory kept getting faster…

• Exponential growth everywhere!
• CPU Performance: 1.4X increase per year
– More transistors
– Better architecture
• Memory Performance: 1.4-2X increase per
year
– Invention of caches
– SRAM technology

… but disks did not.
• It’s hard to make things spin exponentially
faster every year (they tend to fly apart).
• Disk seek time improved at a rate of
approximately 7% a year.
• Caching had been employed to buffer I/O
activity; this works reasonably well for
predictable workloads.

Slow I/O Makes Slow Computers
• Amdahl’s Law describes the impact of
improving only some pieces while leaving the
others unchanged.

$$S = \frac{1}{(1 - f) + f / k}$$

S – The effective speedup
f – Fraction of work in faster mode
k – Speedup while in faster mode

…really slow.
• If applications spend 10% of their time in I/O,
when computers are 10 times faster, they will
only appear about 5 times faster.
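
A minimal sketch of the arithmetic, assuming the 10% of time spent in I/O is not sped up at all and the remaining 90% runs k times faster. The function name `amdahl_speedup` is made up for illustration.

```python
def amdahl_speedup(f: float, k: float) -> float:
    """Effective speedup S when a fraction f of the work runs k times faster."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / ((1.0 - f) + f / k)


# 10% of the time is I/O (unchanged); the other 90% gets 10x faster.
print(round(amdahl_speedup(0.9, 10), 2))   # 5.26
# Even a 100x faster CPU gives less than a 10x overall speedup.
print(round(amdahl_speedup(0.9, 100), 2))  # 9.17
```

The I/O share caps the result: as k grows, S approaches 1 / (1 - f) = 10 and never exceeds it.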

Something needed to be done.

What should we do?
• Single Large Expensive Disks (SLED) are not
improving fast enough.
• Larger memory or solid state drives weren’t
practical

• Small personal hard drives are emerging… can
we do something with those?

Why didn’t someone do this before?
• Standards like SCSI have finally allowed drive
makers to integrate features seen in
traditional mainframe controllers.

There is a problem…
• A hundredfold increase in number of disks
means a hundredfold decrease in
total reliability

$$MTTF_{DiskArray} = \frac{MTTF_{SingleDisk}}{n_{Disks}}$$
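
To get a feel for the numbers, here is a small sketch of that formula. It assumes disks fail independently at a constant rate, and the 30,000-hour single-disk MTTF is an illustrative value, not a measurement from these slides.

```python
def array_mttf(single_disk_mttf_hours: float, n_disks: int) -> float:
    """MTTF of an array with no redundancy: any single failure loses data."""
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    return single_disk_mttf_hours / n_disks


# Illustrative assumption: each disk has an MTTF of 30,000 hours.
hours = array_mttf(30_000, 100)
print(hours)                 # 300.0
print(round(hours / 24, 1))  # 12.5 (days)
```

With a hundred such disks and no redundancy, a failure would be expected in under two weeks.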

that’s all really nice, but

what is RAID, really?

A couple levels… a single idea
• RAID manages the tradeoff between
performance and reliability
• RAID comes in levels (RAID1 to RAID5)
• These levels represent points in the
performance-reliability space

Groups, Disks, and Check Disks
• RAID organizes disks into reliability groups
• Some of the disks in a group store error
correcting data

D = Total disks with data
G = Disks in a group
C = Number of check disks in a group

Metrics
• Useable Storage – Percent of storage that
holds data, excluding parity information
• Performance – Tough to make one number:
– Reads, Writes, and Read-Modify-Write Access
Patterns
– Sequential and Random Data Distribution
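
The useable storage metric can be sketched as a one-line calculation. This assumes G counts only the data disks in a group, with the C check disks added on top; the function name is hypothetical.

```python
def useable_storage(g: int, c: int) -> float:
    """Fraction of a group's raw capacity that holds data rather than check information."""
    if g < 1 or c < 0:
        raise ValueError("need at least one data disk and a non-negative check count")
    return g / (g + c)


# (label, data disks G, check disks C)
examples = [
    ("one data disk, one mirror", 1, 1),
    ("G=10 with 4 check disks", 10, 4),
    ("G=25 with 1 check disk", 25, 1),
]
for label, g, c in examples:
    print(f"{label}: {useable_storage(g, c):.0%}")
# one data disk, one mirror: 50%
# G=10 with 4 check disks: 71%
# G=25 with 1 check disk: 96%
```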

RAID1 – The Naive Approach
• Mirroring of all data
• To read:
– Use either disk
• To write:
– Send to both disks
simultaneously

• Minor read
performance increase.

Evaluation
Pros
• Reads can occur simultaneously
• Seek times can improve with special controllers
• Predictable performance
Cons
• Useable storage is cut in half
• All other performance metrics are left the same

Alright for large sequential jobs and transaction
processing jobs

RAID2 – Bit Level Striping
• Uses Hamming Code for Error Detection
• Requires many check disks
– For 10 data disks, 4 check disks
– For 25 data disks, 5 check disks
• Can detect errors, and determine the at-fault
disk
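
The check disk counts above are consistent with the usual bound for a single-error-correcting Hamming code: C check bits must be able to name any one of the G + C positions, plus the "no error" case. A sketch under that assumption (the function name is hypothetical):

```python
def hamming_check_disks(data_disks: int) -> int:
    """Smallest C with 2**C >= data_disks + C + 1 (single-error-correcting Hamming code)."""
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    c = 1
    while 2 ** c < data_disks + c + 1:
        c += 1
    return c


print(hamming_check_disks(10))  # 4
print(hamming_check_disks(25))  # 5
```

The overhead grows only logarithmically with group size, which is why larger groups use storage more efficiently.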

Evaluation
Pros
• Better useable storage, 71% for G=10, 83% for G=25
Cons
• Dismal small random data access performance: 3-9%
of RAID1 or SLED

Good for large sequential jobs, bad for transaction
processing systems.

RAID3 – Byte Level Striping
• Simpler parity error correction
• Only a single check disk required for error
detection
• Cannot determine which disk failed, but that’s
usually pretty obvious
• Transfers of large contiguous blocks are fast

Evaluation
Pros
• Even better useable storage, 91% for G=10, 96%
for G=25
Cons
• Small random data access performance: Just as bad as
RAID2

Even better for large sequential jobs, bad for
transaction processing systems.

What is parity?
• Parity is calculated as an XOR of the data
blocks.
• XOR is reversible:
– 1011 (A1) XOR 1100 (A2) => 0111 (AP) “parity”
– 0111 (AP) XOR 1011 (A1) => 1100 (A2)
– 0111 (AP) XOR 1100 (A2) => 1011 (A1)

• This makes error detection and reconstruction
possible!
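
The same 4-bit example can be run as a short sketch. Blocks are modelled as plain integers here, and the helper names `parity` and `reconstruct` are made up for illustration.

```python
from functools import reduce


def parity(blocks):
    """XOR all blocks together to get the parity block."""
    return reduce(lambda a, b: a ^ b, blocks)


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the one missing block by XORing the parity with every survivor."""
    return parity(list(surviving_blocks) + [parity_block])


a1, a2 = 0b1011, 0b1100
ap = parity([a1, a2])
print(format(ap, "04b"))                     # 0111
print(format(reconstruct([a1], ap), "04b"))  # 1100 (A2 recovered)
print(format(reconstruct([a2], ap), "04b"))  # 1011 (A1 recovered)
```

Reconstruction only works when the array already knows which block is missing; parity alone does not say which disk is at fault.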

RAID4 – Block Level Striping
• Like RAID3, but with more parallelism
• Interleave data at sector level rather than bit
level
• Allows for servicing of multiple block requests
by different drives
• Still keeps all the parity information on a
single drive

Evaluation
Pros
• Finally better small random access. Reads are fast!
Cons
• Small writes and read-modify-writes are still slow.

Good for large sequential jobs, still not great for
transaction processing systems.
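
Why small writes stay slow can be seen from the parity update itself. A sketch, assuming the standard XOR identity (new parity = old parity XOR old data XOR new data) and integer-valued blocks; the function name is hypothetical.

```python
def small_write_parity(old_parity: int, old_data: int, new_data: int) -> int:
    """Update parity for one changed block without reading the rest of the stripe."""
    return old_parity ^ old_data ^ new_data


a1, a2 = 0b1011, 0b1100
ap = a1 ^ a2                     # 0111, parity before the write
new_a2 = 0b0101
new_ap = small_write_parity(ap, a2, new_a2)
print(format(new_ap, "04b"))     # 1110
print(new_ap == (a1 ^ new_a2))   # True: matches parity recomputed from scratch
```

Each small write therefore costs two reads (old data, old parity) and two writes (new data, new parity), and with a single check disk every one of those parity accesses lands on the same drive.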

RAID5 – Block Level Striping with
Distributed Parity
• Instead of keeping parity on a single check disk, we
distribute it across all disks.
• Allows us to support multiple writes per group
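
A sketch of how spreading parity lets two small writes proceed in parallel. The rotation rule used here (parity for stripe s lives on disk s mod n) is an illustrative assumption, not necessarily the placement used in the paper, and both function names are hypothetical.

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """Disk holding parity for a stripe (hypothetical simple rotation)."""
    return stripe % n_disks


def disks_touched_by_small_write(stripe: int, data_disk: int, n_disks: int) -> set:
    """A small write touches the target data disk and that stripe's parity disk."""
    p = parity_disk(stripe, n_disks)
    if data_disk == p:
        raise ValueError("that disk holds parity for this stripe")
    return {data_disk, p}


w1 = disks_touched_by_small_write(stripe=0, data_disk=2, n_disks=5)  # disks 0 and 2
w2 = disks_touched_by_small_write(stripe=1, data_disk=3, n_disks=5)  # disks 1 and 3
print(w1.isdisjoint(w2))  # True: the two writes can run at the same time
```

With all parity on one check disk, both writes would have included that disk and been forced to queue behind each other.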

Evaluation
Pros
• Really good useable storage
• Finally decent small random data access performance
across the board!
Cons
• Slightly worse write performance: data must be
written to two disks simultaneously

Finally, a system that works well for both applications!

sounds complicated,

how well does it work?

As a Whole
• RAID has many different levels that achieve
different tradeoffs in reliability and
performance
• Almost all of them will, for some (or many) use
cases, outperform a SLED at the same
cost.

Read-Modify-Write Per Disk
Performance

wow, raid sounds awesome,

how’s it holding up?

RAID has held up remarkably well
• Data centers around the world use RAID
technology.
• The small, inexpensive disk is the de facto
standard of storage
• The ideas developed for RAID have been
applied to many not-RAID things

Some open questions
• What will become of RAID as new, super fast
storage mediums start to become cost
effective?
• How does it fit in with massive internet-scale
storage farms?

Take Aways
• RAID offers significant advantage over SLED for
the same cost
– RAID5 offers 10x improvement in performance,
reliability, and power consumption while reducing size
of array.
• RAID allows for modular growth (add more disks)
• Cost effective option to meet challenge of
exponential growth in processor and memory
speeds

References
• “A Case for Redundant Arrays of Inexpensive
Disks” by David A Patterson, Garth Gibson,
and Randy H Katz
• “RAID: A Personal Recollection of How Storage
Became a System” by Randy H Katz
• Slides by David Luo and Ramasubramanian K.
• Images generously borrowed from Wikipedia
<http://en.wikipedia.org/wiki/RAID>
