SlideShare a Scribd company logo
1 of 68
Download to read offline
Dependable Systems
!
Hardware Dependability - Redundancy
Dr. Peter Tröger

!
Sources:

!
Siewiorek, Daniel P.; Swarz, Robert S.: 

Reliable Computer Systems. third. Wellesley, MA : A. K. Peters, Ltd., 1998. , 156881092X

Roland Trauner, IBM Mainframe Summit, Hasso Plattner Institute, 2012

IBM zEnterprise System Technical Guide, IBM RedBooks

Some images (C) Elena Dubrova, ESDLab, Kungl Tekniska Högskolan
Dependable Systems Course PT 2014
Redundancy (Reiteration)
• Redundancy for error detection and forward error recovery

• Redundancy types: spatial, temporal, informational (presentation, version)

• Redundant not mean identical functionality, just perform the same work

• Static redundancy implements error mitigation

• Fault does not show up, since it is transparently removed

• Examples: Voting, error-correcting codes, N-modular redundancy 

• Dynamic redundancy implements error processing
• After fault detection, the system is reconfigured to avoid a failure

• Examples: Back-up sparing, duplex and share, pair and spare

• Hybrid approaches
2
Dependable Systems Course PT 2014
System-Failure Response Strategies
[Sieworek / Swarz]
3
System reliability
Redundant systemNonredundant system
Fault detection Fault tolerance
Masking redundancy Dynamic redundancy
Reconfiguration Retry Online Repair
Dependable Systems Course PT 2014
Redundancy
• Redundancy is never for free !

• Hardware: Additional components, area, power, shielding, ...

• Software: Development costs, maintenance costs, ...

• Information: Extra hardware for decoding and encoding

• Time: Faster processing (CPU) necessary to achieve same performance

• Tradeoff: Costs vs. benefit of redundancy; additional design and testing effort

• Sphere of replication [Mukherjee]
• Identifies logical domain protected by the fault detection scheme

• Questions: For which components are faults detected ? 

Which outputs must be compared ? Which inputs must be replicated ?
4
Dependable Systems Course PT 2014
Sphere of Replication
• Components outside the
sphere must be protected by
other means

• Level of output comparison
decides upon fault coverage

• Larger sphere tends to
decrease the required
bandwidth on input and output

• More state changing
happens just inside the
sphere

• Vendor might be restricted on
choice of sphere size
5
HDD HDD
Input Replication
Output
Comparison
I/O Bus
HDD HDD
Input Replication
Output
Comparison
HDD HDD
Controll
er
Controll
er
RAID1
RAID10
Dependable Systems Course PT 2014
Masking / Static Redundancy: Voting
• Exact voting: Only one correct result possible 

• Majority vote for uneven module numbers

• Generalized median voting - Select result that is the median, 

by iteratively removing extremes

• Formalized plurality voting - Divide results in partitions, 

choose random member from the largest partition

• Inexact voting: Comparison at high level might lead to multiple correct results

• Non-adaptive voting - Use allowable result discrepancy, put boundary on
discrepancy minimum or maximum (e.g. 1,4 = 1,3)

• Adaptive voting - Rank results based on past experience with module results

• Compute the correct value based on „trust“ in modules from experience

• Example: Weighted sum R=W1*R1 + W2*R2 + W3*R3 with W1+W2+W3=1
6
Voter
A B C
Dependable Systems Course PT 2014
Static Redundancy: N-Modular Redundancy
• Fault is transparently removed on detection
• Triple-modular redundancy (TMR)

• 2/3 of the modules must deliver correct results 

• Generalization with N-modular redundancy (NMR)

• m+1/N of the modules must deliver correct result, with N=2m+1 

• Standard case without any redundancy is called simplex
7
RT MR = RV · R2 of 3
= RV (R3
M + 3R2
M (1 RM ))
Dependable Systems Course PT 2014
N-Modular Redundancy (with perfect voter)
8
Module 1
Module 2
Module 3 Voter
...
Module N
Input Output
RNMR =
mX
i=0
✓
N
i
◆
(1 R)i
RN i
✓
n
k
◆
=
n!
k!(n k)!
R2 of 3 =
✓
3
0
◆
(1 R)0
R3
+
✓
3
1
◆
(1 R)R2
R2 of 3 = R3
+ 3(1 R)R2
R3 of 5 = ...
• TMR is appropriate if RTMR > RM (for given t)

• TMR with perfect voter only improves system reliability when RM > 0.5

• Voter needs 

to have RV>0.9 

to reach 

RTMR > RM
Dependable Systems Course PT 2014
TMR Reliability
90 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9
0,25
0,5
0,75
TMR
Single Module
System Reliability
Module Reliability
Dependable Systems Course PT 2014
Imperfect Voters
10
• Redundant voters

• Module errors do not propagate

• Voter errors propagate only by one stage

• Assumption of multi-step process, final voter still needed
Dependable Systems Course PT 2014
Hardware Voting
• Smallest hardware solution is the 1-bit majority voter

• f=ab + ac + bc
• Delivers the bit that has the majority

• Requires 2 gate delays and 4 gates

• Hardware voting can become expensive

• 128 gates and 256 flip-flops for 32-bit voter

• Input must be synchronized

• Central clock source may be single point of failure

• Can be solved by special event latching
11
Dependable Systems Course PT 2014
Dynamic Redundancy
• Reconfiguration of the system in response to an error state

• Prevents error propagation

• Triggered by internal fault detection in the unit, or external error detection based
on the outputs

• Dynamic redundancy combines error confinement with fault detection
• Still questions of coverage and diagnosability
• On transient errors, good modules may be deactivated

• Typically solved by combination of dynamic redundancy with retry approach

• Typical approaches: Duplex, sparing, degradation, compensation
12
• Reconfigurable duplication : Have relevant modules redundant, switch on failure

• Identification on mismatch („test“)

• Self-diagnostics procedure

• Self-checking logic

• Watchdog timer, e.g.

for having components resetting

each other (e.g. split brain)

• Outside arbiter for signatures

or black box tests

• Test interval depends on application scenario - each clock period / bus cycle / ...

• Also called dual-modular redundancy
• Reliability computation as with parallel / serial component diagram
Dependable Systems Course PT 2014
Duplex Systems
13
• Combination of working module and a set of spare modules (,replacement parts‘)

• Hot spares: Receive input with main modules, have results immediately

• Warm spares: Are running, but receive input only after switching

• Cold spares: Need to be started before switching
Dependable Systems Course PT 2014
Back-Up Sparing
14
MODULES
1
2
n
SWITCH
OUTPUT
INPUT
HOT, WARM AND COLD SPARES
• Special cases for combination of duplex (with comparator) and sparing (with switch)

• Pair and spare - Multiple duplex pairs, connected as standby sparing setup 

• Two replicated modules operate as duplex pair (lockstep execution), 

connected by comparator as voting circuit

• Same setting again as spare unit, 

spare units connected by switch

• On module output mismatch, comparators 

signal switch to perform failover

• Commercially used, e.g. Stratus XA/R Series 300
Dependable Systems Course PT 2014
Pair and Spare
15
MODULES
1
2
3
OUTPUTINPUT
COMPARATOR
4 COMPARATOR
SWITCH/COMPARATOR
Dependable Systems Course PT 2014
Graceful Degradation
• Performance design of the system allows continued operation with spares

• Many commercial systems supports this, but lack automated error processing

• Example: Operating system support for CPU off-lining, but no MCA handling

• Designed-In Resources:
• Replaceable or bypass-able components (f.e. caches, disks, processors)

• Support for operation with degraded performance
• Added-On Resources:
• Redundant units used for excess capacity during normal operation

• Still non-degraded performance on failure

• Interconnect reconfiguration: Use alternative paths in the network

• Hardware solutions in telco industry, today replaced by software solutions
16
Dependable Systems Course PT 2014
Example: Spanning Tree Protocol
• Modern implementation of interconnect reconfiguration for dynamic redundancy

• Bridges for connecting different Ethernet sub-networks 

• By default no coordination, better products use the spanning tree protocol

• Explicit removal of redundant paths (loops), 

while still supporting all point-to-point communication

• Each bridge has its own MAC address, protocol based on broadcast

• Create a tree of bridges, starting from a chosen root bridge

• All paths start from the root bridge

• Ports participating in redundant paths have to be switched off

• Cost model for paths to make a choice (root distance, speed)
17
Dependable Systems Course PT 2014
Example: Spanning Tree Protocol
• Determine root bridge

• Send your ID (MAC address) to a
multicast group, smallest ID wins

• Each non-root bridge determines
the ,cheapest‘ path to the root bridge

• This port becomes the root port (RP)

• For multiple bridges in a segment,
the ,cheapest‘ representative is elected -

designated port (DP)
• All ports that are not DP or RP are
deactivated -

blocked port (BP)
18
Dependable Systems Course PT 2014
Hybrid Approaches
• N-modular redundancy 

with spares
• Also called hybrid redundancy

• System has basic NMR 

configuration

• Disagreement detector replaces

modules with spares if their 

output is not matching

the voting result

• Reliability as long as the spare pool is not exhausted

• Improves fault masking capability of NMR

• Can tolerate two faults with one spare, while classic NMR would need 5
modules with majority voting to tolerate two faults
19
Adds fault
localization
Dependable Systems Course PT 2014
TMR with Spares
20
NASA Technical Report 32-1467, 1969
• Basic reliability computation based on the assumption of similar module failure
rates in spares and non-spares

• At least any two of all S+3 modules must survive
Dependable Systems Course PT 2014
Comparison TMR vs. TMR/S vs. NMR
21
NASATechnicalReport32-1467,1969
#Units = 2N + 1
Dependable Systems Course PT 2014
Hybrid Approaches
• Self-purging redundancy
• Active redundant modules, each can remove itself from the system if faulty

• Basic idea: Test for agreement with the voting result, otherwise 0
22
• If module output does not match
to system output, 0 is delivered

• Works fine with threshold voters
Dependable Systems Course PT 2014
Hybrid Approaches
23
• Sift-out modular redundancy (N-2), no voter required

• Pair-wise comparison of module outputs by comparator

• N inputs and N-over-2 outputs

• Detector uses these signals to identify the faulty module, includes also
memory cells for failed modules

• Collector sifts out the 

faulty input, based on

information from

detector
Dependable Systems Course PT 2014
Hybrid Approaches
• Triple Duplex Architecture

• TMR with duplex modules, used in the Shinkansen (Japanese train)

• Fault masking with comparator, no more contribution to voting from faulty one

• Allows tolerating another fault in the further operation, 

since comparator localizes again the faulty module

• Adds again fault location

capability to redundancy

scheme

• Supports also hot

plugging of deactivated

components
24
Dependable Systems Course PT 2014
The Real World of Hardware Redundancy -
Replacement Frequencies [Schroeder 2007]
25
760 node cluster,
2300 disks
ISP, multiple sites,
26700 disks
ISP, multiple sites,
9200 machines,
39000 disks
Dependable Systems Course PT 2014
IBM System z
26
© 2011 IBM Corporation56
IBM zEnterprise
Scheduled(CIE+DisruptivePatches+ECs)
Planned- (MES+Driver Upgrades)
Unscheduled(UIRA)
Sources of Outages
Pre z9
-Hrs/Year/Syst-
Prior
Servers
z9 EC Z10 EC z196
Unscheduled
Outages
Scheduled
Outages
Planned
Outages
Preplanning
requirements
Power &
Thermal
Management
Increased Focus over time
  
 
 





Temperature = Silicon Reliability Worst Enemy
Wearout = Mechanical Components Reliability Worst Enemy.
System z overall RAS Strategy
…..Continuing our RAS focus helps avoid outages
ImpactofOutage
Dependable Systems Course PT 2014
IBM System z
• Machine-Check-Handling mechanism in z/Series

• Equipment malfunction detection

• Permit automatic recovery

• Error states are reported by machine-check interruption

• Data error detection through information redundancy

• Recovery from machine-detected error states

• Error checking and correction - use circuitry redundancy
• CPU retry - checkpoint written at instruction-based synchronization points

• Channel-subsystem recovery - restart of I/O components

• Unit deletion - automated degradation of malfunctioning units
27
Dependable Systems Course PT 2014
IBM System z - Processor Books
28
© 2011 IBM Corporation52
IBM zEnterprise
z196 Cache / Book Topology
Fully connected 4 Book system:
 96 total cores
 Total system cache
 768 MB shared L4 (eDRAM)
 576 MB L3 (eDRAM)
 144 MB L2 private (SRAM)
 19.5 MB L1 private (SRAM)
CP1CP1CP2CP2
CP4CP4 CP5CP5CP3CP3
SC0SC1
Mem1 Mem0
FBCs
Mem2
CP0CP0
Book:
FBCs
• 4 PU‘s (cores) per
CP + co-processors

• SC: Storage Control

• 96-192 MB L4
cache per SC,
accessible from
other MCMs

• FBC: Fabric book
connectivity

• Support for dynamic
book addition and
repair

• 2+1 redundancy for
book power supply
Multi-Chip Module (MCM)
Dependable Systems Course PT 2014
IBM System z
29
© 2011 IBM Corporation57
IBM zEnterprise
z196 RAS Design of
……..Fully Redundant I/O Subsystem – of existing IO cage and drawers
Fully Redundant I/O Design
 SAP / CP sparing
 SAP Reassignment
 I/O Reset & Failover
 I/O Mux Reset / Failover
 Redundant I/O Adapter
 Redundant I/O interconnect
 Redundant Network Adapters
 Redundant ISC links
 Redundant Crypto processors
 I/O Switched Fabric
 Network Switched/Router Fabric
 High Availability Plugging Rules
 I/O and coupling fanout
rebalancing on CBA
 Channel Initiated Retry
 High Data Integrity Infrastructure
 I/O Alternate Path
 Network Alternate Path
 Virtualization Technology
processor
Failover
I/O Mux
I/O Hub I/O Hub
I/O Mux I/O Mux I/O Mux
Failover
I/Oadapter.
I/Oadapter.
Networkadapter.
Networkadapter.
CryproAccelerator
CryptoAccelerator
ISCLinks
ISCLinks
Alternate Path
I/O
Switch
Control
Unit
I/O
Switch
Control
Unit
Network
Switch
End
User
Network
Switch
End
User
Dependable Systems Course PT 2014
IBM System z196
• Prepared for Unscheduled Outages

• Advanced Memory RAIM (Redundant
Array of Independent Memory) design

• Enhanced Reed-Solomon code (ECC)
– 90B/64B

• Protection against Channel/DIMM
failures

• Chip marking for fast DRAM
replacements

• Mirrored Key cache

• Improved chip packaging

• Improved condensation management

• Integrated TCP/IP checksum
generation/checking

• Integrated EPO switch cover
(protecting the switch during repair
actions)

• Continued focus on Firmware

• Prepared for Scheduled Outages

• Double memory data bus lane sparing
(reducing repair actions)

• Single memory clock bus sparing

• Field Repair of interface between
processor chip and cache chip and
between cache chips (fabric bus)

• Fast bitline delete on L3/L4 cache
(largest caches)

• Power distribution using N+2 Voltage
Transformation Modules (VTM)

• Blower management by sensing
altitude and humidity

• Redundant (N+2) humidity sensors

• Redundant (N+2) altitude sensors

• Unified Resource Manager for zBX
30
Dependable Systems Course PT 2014
Memory Redundancy
• Redundancy of memory data for error masking

• Replication / coding at different levels

• Examples

• STAR (Self-testing and self-repairing computer, for early spacecrafts), 1971

• COMTRAC (Computer-aided traffic control 

system for Shinkansen train system)

• Stratus (Commercial fault-tolerant system)

• 3B20 by AT & T (Commercial fault-tolerant system)

• Most modern memory controllers in servers
31
Dependable Systems Course PT 2014
Coding Checks in Memory Hardware
• Primary memory
• Parity code

• m-out-of-n resp. m-of-n resp. m/n code

• Checksumming

• Berger Code

• Hamming code

• Secondary storage
• RAID codes

• Reed-Solomon code
32
Dependable Systems Course PT 2014
Hamming Distance (Richard Hamming, 1950)
• Hamming distance: Number of positions on which two words differ 

• Alternative definition: Number of necessary substitutions to make A to B

• Alternative definition: Number of errors that transform A to B

!
!
!
!
• Minimum Hamming distance: Minimum distance between any two code words

• To detect d single-bit errors, a code must have a min. distance of at least d + 1

• If at least d+1 bits change in the transmitted code, a new (valid) code appears

• To correct d single-bit errors, the minimum distance must be 2d+1
33
(C)Wikipedia
Dependable Systems Course PT 2014
Parity Codes
• Add parity bit to the information word

• Even parity: If odd number of ones, add one to make the number of ones even

• Odd parity: If even number of ones, add one to make the number of ones odd

• Variants

• Bit-per-word parity

• Bit-per-byte parity 

(Example: 

Pentium data cache)

• Bit-per-chip parity

• ...
34
Dependable Systems Course PT 2014
Two-Dimensional Parity
35
Example: Odd Parity
Allows
fault
location
Dependable Systems Course PT 2014
Code Properties
36
Code Bits / Word
Number of 

possible words
Coverage
Even Parity 4 8
Any single bit error, 

no double-bit error,

not all-0s or all 1s errors
Odd Parity 4 8
Any single bit error, 

no double-bit error,
all-0s or all 1s errors
2/4 4 6
Any single bit error, 

33% of double bit errors
Dependable Systems Course PT 2014
Checksumming
• General idea: Multiply components of a data word and add them up to a checksum

• Good for large blocks of data, low hardware overhead

• Might require substantial time, memory overhead

• 100% coverage for single faults

• Example: ISBN 10 check digit

• Final number of 10 digit ISBN number is a check digit, computed by:

• Multiply all data digits by their position number, starting from right

• Take sum of the products, and set the check digit so that result MOD 11 = 0

• Check digit „01“ is represented by X

• More complex approaches get better coverage (e.g. CRC) for more CPU time
37
Dependable Systems Course PT 2014
Hamming Code
• Every data word with n bits is extended 

by k control bits

• Control bits through standard parity approach 

(even parity), implementable with XOR

• Inserted at power-of-two positions 

• Parity bit pX is responsible for all position
numbers with the X-least significant bit set
(e.g. p1 responsible for ,red dot‘ positions)

• Parity bits check overlapping parts of the data
bits, computation and check are cheap, 

but significant overhead in data

• Minimum Hamming distance of three 

(single bit correction)

• Application in DRAM memory
38
takenfromWikimediaCommons
Dependable Systems Course PT 2014
Hamming Code for 26bit word size
39
takenfromWikimediaCommons
Dependable Systems Course PT 2014
Hamming Code
• Hamming codes are known to produce 10% to 40% of data
overhead

• Syndrome determines which bit (if any) is corrupted

• Example ECC

• Majority of „one-off“ soft errors in DRAM happens from
background radiation

• Denser packages, lower voltage for higher frequencies

• Most common approach is the SECDEC Hamming code

• "Single error correction, double error
detection" (SECDEC)

• Hamming code with additional parity

• Modern BIOS implementations perform 

threshold-based warnings on frequent correctable errors
40
(C) Z. Kalbarczyk
Dependable Systems Course PT 2014
Memory Redundancy
• Standard technology in DRAMs

• Bit-per-byte parity, check on read access

• Implemented by additional parity memory chip

• ECC with Hamming codes - 7 check bits for 32 bit data words, 8 bit for 64 bit 

• Leads to 72 bit data bus between DIMM and chipset

• Computed by memory controller on write, checked on read

• Study by IBM: ECC memory achieves R=0.91 over three years

• Can correct single bit errors and detect double bit errors

• Hewlett Packard Advanced ECC (1996)

• Can detect and correct single bit and double bit errors
41
Dependable Systems Course PT 2014
Memory Redundancy
• IBM ChipKill
• Originally developed for NASA Pathfinder project, now in X-Series

• Corrects up to 4 bit errors, detects up to 8 bit errors

• Implemented in chipset and firmware, works with standard ECC modules

• Based on striping approach with parity checks (similar to RAID)

• 72 bit data word is split in 18 bit chunks, distributed on 4 DIMM modules

• 18 DRAM chips per module, one bit per chip

• HP Hot Plug RAID Memory
• Five memory banks, fifth bank for parity information

• Corrects single bit, double bit, 4-bit, 8-bit errors; hot plugging support
42
Dependable Systems Course PT 2014
Memory Redundancy
• Dell PowerEdge Servers, 2005 (taken from www.dell.com)
43
Dependable Systems Course PT 2014
Memory Redundancy
• Fujitsu System Board D2786 for RX200 S5 (2010)

• Independent Channel Mode: Standard operational module, always use first slot

• Mirrored Channel Mode: Identical modules on slot A/B (CPU1) and D/E (CPU2)
44
Dependable Systems Course PT 2014
IBM System z - Memory RAID
45
• System z10 EC memory design

• Four Memory Controllers (MCUs) organized in two pairs, 

each MCU with four redundant channels 

• 16 to 48 DIMMs per book, plugged in groups of 8 

• 8 DIMMs (4 or 8 GB) per feature, 32 or 64 GB physical memory per feature 

• 64 to 384 GB physical memory per book = 64 to 384 GB for use 

(HSA and customer) 

• z196 memory design: 

• Three MCUs, each with five channels. The fifth channel in each z196 MCU is
required to implement Redundant Array of Independent Memory (RAIM)

• Detected and corrected: Bit, lane, DRAM, DIMM, socket, and complete memory
channel failures, including many types of multiple failures
Dependable Systems Course PT 2014
IBM System z196 - Memory RAID
46
© 2011 IBM Corporation62
IBM zEnterprise
Ch4
Ch3
Ch2
Ch1
ASIC
ASIC
Ch0
DIMM
CLK
Diff
CLK
Diff
C
R
C
C
R
C
DRAM
X
X
X
X
Layers of Memory Recovery
ECC
 Powerful 90B/64B Reed Solomon code
DRAM Failure
 Marking technology; no half sparing
needed
 2 DRAM can be marked
 Call for replacement on third DRAM
Lane Failure
 CRC with Retry
 Data – lane sparing
 CLK – RAIM with lane sparing
DIMM Failure (discrete components,
VTT Reg.)
 CRC with Retry
 Data – lane sparing
 CLK – RAIM with lane sparing
DIMM Controller ASIC Failure
 RAIM Recovery
Channel Failure
 RAIM Recovery
MCU0 ECC
RAIM
X
X
X
2- Deep Cascade
Using Quad High DIMMs
z196 RAIM Memory Controller Overview
Dependable Systems Course PT 2014
IBM z10 EC Memory Structure
47 © 2011 IBM Corporation63
IBM zEnterprise
z10 EC Memory Structure
DATA
CHECK
DATA
CHECK
ECC
Level 2 Cache
Key Cache
MCU 0
16B 16B
2B
Key Cache
MCU 1
16B 16B
2B
Key Cache
MCU 2
16B 16B
2B
Key Cache
MCU 3
16B 16B
2B
Dependable Systems Course PT 2014
IBM System z196 - RAIM
48
© 2011 IBM Corporation64
IBM zEnterprise
z196 Redundant Array of Independent Memory (RAIM) Structure
DATA
CHECK
DATA
CHECK
ECC
RAIM Parity
Level 3 Cache
Key Cache
MCU 0
16B 16B
2B
Key Cache
MCU 1
16B 16B
2B
Key Cache
MCU 2
16B 16B
2B
Extra column
provides RAIM
function
Dependable Systems Course PT 2014
Disk Redundancy
• Typical measure is the annual failure rate (AFR) - average number of failures / year





• Can be interpreted as failure probability during a year

• MTBF = Mean time before failure, here

• Disk MTTF: On average, one failure takes place in the given disk hours

• Example: Seagate Barracuda ST3500320AS: MTTF=750000h=85.6 years

• With thousand disks, on average every 750h (a month) some disk fails

• Measured by the manufacturer under heavy load and physical stress

• AFR=0.012
49
AFR = 1
MT BFyears
= 8760
MT BFhours
Dependable Systems Course PT 2014
RAID
• Redundant Array of Independent Disks (RAID) [Patterson et al. 1988]

• Improve I/O performance and / or reliability by building raid groups

• Replication for information reconstruction on disk failure (degrading)

• Requires computational effort (dedicated controller vs. software)

• Assumes failure independence
50
Dependable Systems Course PT 2014
RAID Reliability Comparison
• Treat disk failing as Bernoulli experiment - independent events, identical probability

• Probability for k events of probability p in n runs

!
• Probability for a failure of a RAID 1 mirror - all disks unavailable:

!
• Probability for a failure of a RAID 0 strip set - any faults disk leads to failure:
51
Bn,p(k) = pk
(1 p)n k n
k
⇥
pallfail = n
n pfail
n
(1 pfail)0
= pfail
n
panyfail = 1 pallwork
= 1
✓
n
n
◆
(1 pfail)n
pfail
0
= 1 (1 pfail)n
• Works for RAID levels were second outage during repair is fatal

• Basic idea is that groups of data disks are protected by additional check disks

• D - Total number of data disks

• G - Number of data disks in a group (e.g. G=1 in RAID1)

• C - Number of redundant check disks (parity / mirror) in a group 

(e.g. C=1 in RAID1)

• nG = D / G = number of groups, G+C : Number of disks in a group
Dependable Systems Course PT 2014
RAID MTTF Calculation [Patterson]
52
MTTFGroup =
MTTFDisk
G + C
·
1
pSecondF ailureDuringRepair
• Assuming exponential distribution, the probability for a second disk failure during the
repair time can be determined by:

!
!
• So:
Dependable Systems Course PT 2014
RAID MTTF Calculation [Patterson]
53
MTTFGroup =
MTTFDisk
G + C
·
1
pSecondF ailureDuringRepair
pSecondF ailure =
MTTR
MT T FDisk
G+C 1
MTTFRaid =
MTTFGroup
nG
=
MTTFDisk
2
(G + C) ⇤ nG ⇤ (G + C 1) ⇤ MTTR
Dependable Systems Course PT 2014
RAID 0
• Raid 0 - Block-level striping

• I/O performance improvement with many channels and drives

• One controller per drive

• Optimal stripe size depends on I/O request size, random vs. sequential I/O,
concurrent vs. single-threaded I/O

• Fine-grained striping: Good load balancing, catastrophic data loss

• Coarse-grained striping: Good recovery for small files, worse performance

• One option: Strip size = Single-threaded I/O size / number of disks

• Parallel read supported, but positioning overhead for small concurrent accesses

• No fault tolerance
54
(C) Wikipedia
MTTFRaid0 =
MTTFDisk
N
Dependable Systems Course PT 2014
RAID 1
• Raid 1 - Mirroring and duplexing

• Duplicated I/O requests

• Decreasing write performance, up to double read rate of single disk

• RAID controller might allow concurrent read and write per mirrored pair

• Highest overhead of all solutions, smallest disk determines resulting size

• Reliability is given by probability that one disk fails and the second fails while the
first is repaired

• With D=1, G=1, C=1 and the generic formula, we get
55
(C) Wikipedia
MTTFRaid1 =
MTTFDisk
2
·
MTTFDisk
MTTRDisk
Dependable Systems Course PT 2014
Raid 2/3
• Raid 2 - Byte-level striping with Hamming code -based check disk

• No commercial implementation due to ECC storage overhead

• Online verification and correction during read
• Raid 3 - Byte-level striping with dedicated XOR parity disk

• All data disks used equally, one XOR parity disk as bottleneck (C=1)

• Bad for concurrent small accesses, good sequential performance (streaming)

• Separate code is needed to identify a faulty disk

• Disk failure has only small impact on throughput

• RAID failure if more than one disk fails:
56
(C) Wikipedia
MTTFRaid3 =
MTTFDisk
D + C
·
MT T FDisk
D+C 1
MTTRDisk
Dependable Systems Course PT 2014
Parity With XOR
• Self-inverse operation

• 101 XOR 011 = 110, 110 XOR 011 = 101
57
Disk Byte
1 1 1 0 0 1 0 0 1
2 0 1 1 0 1 1 1 0
3 0 0 0 1 0 0 1 1
4 1 1 1 0 1 0 1 1
Parity 0 1 0 1 1 1 1 1
Disk Byte
1 1 1 0 0 1 0 0 1
Parity 0 1 0 1 1 1 1 1
3 0 0 0 1 0 0 1 1
4 1 1 1 0 1 0 1 1
Hot Spare 0 1 1 0 1 1 1 0
MTTFRaid5 =
MTTFDisk
N
·
MT T FDisk
N 1
MTTRDisk
Dependable Systems Course PT 2014
RAID 4 / 5
• Raid 4 - Block-level striping with dedicated parity disk

• RAID 3 vs. RAID 4: Allows concurrent block access

• Raid 5 - Block-level striping with distributed parity

• Balanced load as with Raid 0, better reliability

• Bad performance for small block writing

• Most complex controller design, difficult rebuild

• When block in a stripe is changed, old block and parity
must be read to compute new parity

• For every changed data bit, flip parity bit
58
(C)Wikipedia
Dependable Systems Course PT 2014
RAID 6 / 01 / 10
• Raid 6 - Block-level striping with two parity schemes

• Extension of RAID5, can sustain multiple drive

failures at the same time

• High controller overhead to compute parities,

poor write performance

• Raid 01 - Every mirror is a Raid 0 stripe 

(min. 4 disks)

• Raid 10 - Every stripe is a Raid 1 mirror 

(min. 4 disks)

• RAID DP - RAID 4 with second parity

disk

• Additional parity includes first parity + all but one of the data blocks (diagonal)

• Can deal with two disk outages

59
• Take the same number of disks

in different constellations

• AFRDisk = 0.029, MTTR=8h

• RAID5 has bad reliability, but

offers most effective capacity

• In comparison to RAID5, RAID10

can deal with two disk errors

• Also needs to consider different 

resynchronization times

• RAID10: Only one disk needs to be copied to the spare

• RAID5 / RAID-DP: All disks must be read to compute parity

• Use RAID01 only in 2+2 combination
Dependable Systems Course PT 2014
RAID Analysis (Schmidt)
60
Dependable Systems Course PT 2014
RAID Analysis (TecChannel.de)
61
RAID 0 RAID 1 RAID 10 RAID 3 RAID 4 RAID 5 RAID 6
Number of
drives n > 1 n = 2 n > 3 n > 2 n > 2 n > 2 n > 3
Capacity
overhead (%) 0 50 50 100 / n 100 / n 100 / n 200 / n
Parallel reads n 2 n / 2 n - 1 n - 1 n -1 n - 2
Parallel
writes n 1 1 1 1 n / 2 n / 3
Maximum
read
throughput
n 2 n / 2 n - 1 n - 1 n - 1 n - 2
Maximum
write
throughput
n 1 1 1 1 n / 2 n / 3
Dependable Systems Course PT 2014
Software RAID
• Software layer above block-based device driver(s)

• Windows Desktop / Server, Mac OS X, Linux, ...

• Multiple problems

• Computational overhead for RAID levels beside 0 and 1

• Boot process

• Legacy partition formats

• Driver-based RAID

• Standard disk controller with special firmware

• Controller covers boot stage, device driver takes over in protected mode
62
Dependable Systems Course PT 2014
Disk Redundancy: Google
• Failure Trends in a Large Disk Drive Population [Pinheiro2007]

• > 100.000 disks, SATA / PATA consumer hard disk drives, 5400 to 7200 rpm

• 9 months of data gathering in Google data centers

• Statistical analysis of SMART data

• Failure event: „A drive is considered to 

have failed if it was replaced as part 

of a repairs procedure.“
• Prediction models based on SMART only 

work in 56% of the cases
63
Dependable Systems Course PT 2014
Disk Redundancy: Google
• Failure rates are correlated with drive model, manufacturer and drive age

• Indication for infant mortality

• Impact from utilization (25th percentile, 50-75th percentile, 75th percentile) 

• Reversing effect in third year - „Survival of the fittest“ theory
64
Dependable Systems Course PT 2014
Disk Redundancy: Google
65
• Temperature effects only at high end of temperature range, with old drives
Dependable Systems Course PT 2014
IBM System z - Redundant I/O
• Each processor book 

has up to 8 dual port fanouts

• Direct data transfer 

between memory and 

PCI/e (8 GBps) or 

Infiniband (6 GBps)

• Optical and copper 

connectivity supported

• Fanout cards are 

hot-pluggable, without 

loosing the I/O connectivity

• Air-moving devices (AMD) have N+1 redundancy for fanouts, memory and power
66
Chapter 2. CPC hardware components 31
well as MES orders.
2.2 Book concept
The central processor complex (CPC) uses a packaging design for its processors based on
books. A book contains a multi-chip module (MCM), memory, and connectors to I/O cages
and other CPCs. Books are located in the processor cage in frame A. The z196 has from one
book to four books installed. A book and its components are shown in Figure 2-3.
Figure 2-3 Book structure and components
1 Dense Wave Division Multiplexing
MCM @ 1800W
Refrigeration Cooled or
Water Cooled
Backup Air Plenum
8 I/O FAN OUT
2 FSP
3x DCA 14X DIMMs
100mm High
16X DIMMs
100mm High
11 VTM Card Assemblies
8 Vertical
3 Horizontal
Rear
Front
DCA Power Supplies
Fanout
Cards
Cooling
from/to MRU
MCM
Memory
Memory
Dependable Systems Course PT 2014
IBM System z - Redundant I/O
• PCI/e I/O drawer supports up to 32 

I/O cards from fanouts in 4 domains

• One PCI/e switch card per domain

• Two cards provide backup path for
each other (f.e. with cable failure)

• 16 cards max. per switch
67
Figure 4-2 illustrates the I/O structure of a z196. An InfiniBand (IFB) cable connects the
HCA2-C fanout to an IFB-MP card in the I/O cage. The passive connection between two
IFB-MP cards allows for redundant I/O interconnection. The IFB cable between an HCA2-C
fanout in a book and each IFB-MP card in the I/O cage supports a 6 GBps bandwidth.
Figure 4-2 z196 I/O structure when using I/O cages
I/O Cage
for
Redundant I/O Interconnect
Passive
Connection
8 x 6 GBps
I/O Interconnect
IFB-MP IFB-MP
I/O Domain
Crypto
FICON/FCP
OSA
ESCON
I/O Domain
Crypto
FICON/FCP
OSA
ISC-3
Processor Memory
HCA2-C Fanout
Dependable Systems Course PT 2014
IBM System z - Redundant I/O
68
The HCA3-O LR (1xIFB) fanout comes with 4 ports and each other fanout comes with two
ports.
Figure 4-10 illustrates the IFB connection from the CPC cage to an I/O cage and an I/O
drawer, and the PCIe connection from the CPC cage to an PCIe I/O drawer.
Figure 4-10 PCIe and InfiniBand I/O Infrastructure
Book 3Book 1
Memory
x16 PCIe
Gen2 8 GBps
Memory Memory Memory
Book 0 Book 2
PCIe
switch
OSA-Express4S
PCIe (8x) PCIe (8x) HCA2 (8x)
PCIe
switch
PCIe
switch
FICON Express8S
PCIe
switch
4 GB/s PCIe
Gen2 x8
…
SC1, SC0 (FBC)
PU PU
PU PU PU
PU
SC1, SC0 (FBC)
PU PU
PU PU PU
PU
SC1, SC0 (FBC)
PU PU
PU PU PU
PU
SC1, SC0 (FBC)
PU PU
PU PU PU
PU
SC1, SC0 (FBC)
PU PU
PU PU PU
PU
SC1, SC0 (FBC)
PU PU
PU PU PU
PU
SC1, SC0 (FBC)
PU PU
PU PU PU
PU
SC1, SC0 (FBC)
PU PU
PU PU PU
PU
PCIe I/O drawerPCIe I/O drawer
FanoutsFanouts
6 GBps
HCA2 (8x)
IFB-MP
RII
IFB-MP
FICON Express8
2 GBps mSTI
Channels
1 GBps
mSTI
2 GBps mSTI
Ports
2GBps
mSTI
FPGA
RII IFB-MPIFB-MP
OSA-Express3
I/O Cage & I/O drawerI/O Cage & I/O drawer
RII
RII
FPGA

More Related Content

What's hot

Internet Of Things (Question Paper) [October – 2018 | Choice Based Syllabus]
Internet Of Things (Question Paper) [October – 2018 | Choice Based Syllabus]Internet Of Things (Question Paper) [October – 2018 | Choice Based Syllabus]
Internet Of Things (Question Paper) [October – 2018 | Choice Based Syllabus]Mumbai B.Sc.IT Study
 
Chap 2 discrete_time_signal_and_systems
Chap 2 discrete_time_signal_and_systemsChap 2 discrete_time_signal_and_systems
Chap 2 discrete_time_signal_and_systemsProf. Ihab Ali
 
Digital modulation
Digital modulationDigital modulation
Digital modulationAnkur Kumar
 
Computer organization-and-architecture-questions-and-answers
Computer organization-and-architecture-questions-and-answersComputer organization-and-architecture-questions-and-answers
Computer organization-and-architecture-questions-and-answersappasami
 
Neural networks...
Neural networks...Neural networks...
Neural networks...Molly Chugh
 
discrete time signals and systems
 discrete time signals and systems  discrete time signals and systems
discrete time signals and systems Zlatan Ahmadovic
 
Dynamic interconnection networks
Dynamic interconnection networksDynamic interconnection networks
Dynamic interconnection networksPrasenjit Dey
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural networkDEEPASHRI HK
 
Eye diagram in Communication
Eye diagram in CommunicationEye diagram in Communication
Eye diagram in CommunicationSivanesh M
 
Computer organisation -morris mano
Computer organisation  -morris manoComputer organisation  -morris mano
Computer organisation -morris manovishnu murthy
 
Moore and mealy machine
Moore and mealy machineMoore and mealy machine
Moore and mealy machineMian Munib
 
Unit v computer, number system
Unit  v computer, number systemUnit  v computer, number system
Unit v computer, number systemindra Kishor
 
Neural network in matlab
Neural network in matlab Neural network in matlab
Neural network in matlab Fahim Khan
 
IOT System Management with NETCONF-YANG.pptx
IOT System Management with NETCONF-YANG.pptxIOT System Management with NETCONF-YANG.pptx
IOT System Management with NETCONF-YANG.pptxArchanaPandiyan
 

What's hot (20)

Pipelining slides
Pipelining slides Pipelining slides
Pipelining slides
 
Internet Of Things (Question Paper) [October – 2018 | Choice Based Syllabus]
Internet Of Things (Question Paper) [October – 2018 | Choice Based Syllabus]Internet Of Things (Question Paper) [October – 2018 | Choice Based Syllabus]
Internet Of Things (Question Paper) [October – 2018 | Choice Based Syllabus]
 
Switching units
Switching unitsSwitching units
Switching units
 
Chap 2 discrete_time_signal_and_systems
Chap 2 discrete_time_signal_and_systemsChap 2 discrete_time_signal_and_systems
Chap 2 discrete_time_signal_and_systems
 
Digital modulation
Digital modulationDigital modulation
Digital modulation
 
Computer organization-and-architecture-questions-and-answers
Computer organization-and-architecture-questions-and-answersComputer organization-and-architecture-questions-and-answers
Computer organization-and-architecture-questions-and-answers
 
PLDs
PLDsPLDs
PLDs
 
Neural networks...
Neural networks...Neural networks...
Neural networks...
 
discrete time signals and systems
 discrete time signals and systems  discrete time signals and systems
discrete time signals and systems
 
Adc assignment final
Adc assignment finalAdc assignment final
Adc assignment final
 
Evolutionary Computing
Evolutionary ComputingEvolutionary Computing
Evolutionary Computing
 
Dynamic interconnection networks
Dynamic interconnection networksDynamic interconnection networks
Dynamic interconnection networks
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Eye diagram in Communication
Eye diagram in CommunicationEye diagram in Communication
Eye diagram in Communication
 
Computer organisation -morris mano
Computer organisation  -morris manoComputer organisation  -morris mano
Computer organisation -morris mano
 
Neural network
Neural networkNeural network
Neural network
 
Moore and mealy machine
Moore and mealy machineMoore and mealy machine
Moore and mealy machine
 
Unit v computer, number system
Unit  v computer, number systemUnit  v computer, number system
Unit v computer, number system
 
Neural network in matlab
Neural network in matlab Neural network in matlab
Neural network in matlab
 
IOT System Management with NETCONF-YANG.pptx
IOT System Management with NETCONF-YANG.pptxIOT System Management with NETCONF-YANG.pptx
IOT System Management with NETCONF-YANG.pptx
 

Similar to Dependable Systems - Hardware Dependability with Redundancy (14/16)

39245203 intro-es-iv
39245203 intro-es-iv39245203 intro-es-iv
39245203 intro-es-ivEmbeddedbvp
 
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)Peter Tröger
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Switched networks (LAN Switching – Switches)
Switched networks (LAN Switching – Switches)Switched networks (LAN Switching – Switches)
Switched networks (LAN Switching – Switches)Fleurati
 
Progressive Migration From 'e' to SystemVerilog: Case Study
Progressive Migration From 'e' to SystemVerilog: Case StudyProgressive Migration From 'e' to SystemVerilog: Case Study
Progressive Migration From 'e' to SystemVerilog: Case StudyDVClub
 
Top schools in gudgao
Top schools in gudgaoTop schools in gudgao
Top schools in gudgaoEdhole.com
 
Top schools in gudgao
Top schools in gudgaoTop schools in gudgao
Top schools in gudgaoEdhole.com
 
Optimica Compiler Toolkit - Overview
Optimica Compiler Toolkit - OverviewOptimica Compiler Toolkit - Overview
Optimica Compiler Toolkit - OverviewModelon
 
Top schools in noida
Top schools in noidaTop schools in noida
Top schools in noidaEdhole.com
 
MIPI DevCon 2016: How MIPI Debug Specifications Help Me to Develop System SW
MIPI DevCon 2016: How MIPI Debug Specifications Help Me to Develop System SWMIPI DevCon 2016: How MIPI Debug Specifications Help Me to Develop System SW
MIPI DevCon 2016: How MIPI Debug Specifications Help Me to Develop System SWMIPI Alliance
 
Approximation techniques used for general purpose algorithms
Approximation techniques used for general purpose algorithmsApproximation techniques used for general purpose algorithms
Approximation techniques used for general purpose algorithmsSabidur Rahman
 
The art of system and solution testing
The art of system and solution testingThe art of system and solution testing
The art of system and solution testinggaoliang641
 
Performance Enhancement with Pipelining
Performance Enhancement with PipeliningPerformance Enhancement with Pipelining
Performance Enhancement with PipeliningAneesh Raveendran
 
You name it, we analyze it
You name it, we analyze itYou name it, we analyze it
You name it, we analyze itJim Gilsinn
 
Lessons learned validating 60,000 pages of api documentation
Lessons learned validating 60,000 pages of api documentationLessons learned validating 60,000 pages of api documentation
Lessons learned validating 60,000 pages of api documentationBob Binder
 
Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)Peter Tröger
 
Dependable Systems -Dependability Threats (2/16)
Dependable Systems -Dependability Threats (2/16)Dependable Systems -Dependability Threats (2/16)
Dependable Systems -Dependability Threats (2/16)Peter Tröger
 
PLC lecture by raj nayak
PLC lecture by raj nayakPLC lecture by raj nayak
PLC lecture by raj nayakRaj Nayak
 

Similar to Dependable Systems - Hardware Dependability with Redundancy (14/16) (20)

39245203 intro-es-iv
39245203 intro-es-iv39245203 intro-es-iv
39245203 intro-es-iv
 
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Switched networks (LAN Switching – Switches)
Switched networks (LAN Switching – Switches)Switched networks (LAN Switching – Switches)
Switched networks (LAN Switching – Switches)
 
Progressive Migration From 'e' to SystemVerilog: Case Study
Progressive Migration From 'e' to SystemVerilog: Case StudyProgressive Migration From 'e' to SystemVerilog: Case Study
Progressive Migration From 'e' to SystemVerilog: Case Study
 
Top schools in gudgao
Top schools in gudgaoTop schools in gudgao
Top schools in gudgao
 
Top schools in gudgao
Top schools in gudgaoTop schools in gudgao
Top schools in gudgao
 
Optimica Compiler Toolkit - Overview
Optimica Compiler Toolkit - OverviewOptimica Compiler Toolkit - Overview
Optimica Compiler Toolkit - Overview
 
Top schools in noida
Top schools in noidaTop schools in noida
Top schools in noida
 
MIPI DevCon 2016: How MIPI Debug Specifications Help Me to Develop System SW
MIPI DevCon 2016: How MIPI Debug Specifications Help Me to Develop System SWMIPI DevCon 2016: How MIPI Debug Specifications Help Me to Develop System SW
MIPI DevCon 2016: How MIPI Debug Specifications Help Me to Develop System SW
 
1.instrumentation ii
1.instrumentation ii1.instrumentation ii
1.instrumentation ii
 
Approximation techniques used for general purpose algorithms
Approximation techniques used for general purpose algorithmsApproximation techniques used for general purpose algorithms
Approximation techniques used for general purpose algorithms
 
The art of system and solution testing
The art of system and solution testingThe art of system and solution testing
The art of system and solution testing
 
Performance Enhancement with Pipelining
Performance Enhancement with PipeliningPerformance Enhancement with Pipelining
Performance Enhancement with Pipelining
 
You name it, we analyze it
You name it, we analyze itYou name it, we analyze it
You name it, we analyze it
 
13 risc
13 risc13 risc
13 risc
 
Lessons learned validating 60,000 pages of api documentation
Lessons learned validating 60,000 pages of api documentationLessons learned validating 60,000 pages of api documentation
Lessons learned validating 60,000 pages of api documentation
 
Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)
 
Dependable Systems -Dependability Threats (2/16)
Dependable Systems -Dependability Threats (2/16)Dependable Systems -Dependability Threats (2/16)
Dependable Systems -Dependability Threats (2/16)
 
PLC lecture by raj nayak
PLC lecture by raj nayakPLC lecture by raj nayak
PLC lecture by raj nayak
 

More from Peter Tröger

WannaCry - An OS course perspective
WannaCry - An OS course perspectiveWannaCry - An OS course perspective
WannaCry - An OS course perspectivePeter Tröger
 
Cloud Standards and Virtualization
Cloud Standards and VirtualizationCloud Standards and Virtualization
Cloud Standards and VirtualizationPeter Tröger
 
Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2Peter Tröger
 
OpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissionsOpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissionsPeter Tröger
 
Design of Software for Embedded Systems
Design of Software for Embedded SystemsDesign of Software for Embedded Systems
Design of Software for Embedded SystemsPeter Tröger
 
Humans should not write XML.
Humans should not write XML.Humans should not write XML.
Humans should not write XML.Peter Tröger
 
What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.Peter Tröger
 
Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)Peter Tröger
 
Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)Peter Tröger
 
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)Peter Tröger
 
Dependable Systems -Software Dependability (15/16)
Dependable Systems -Software Dependability (15/16)Dependable Systems -Software Dependability (15/16)
Dependable Systems -Software Dependability (15/16)Peter Tröger
 
Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)Peter Tröger
 
Dependable Systems - Introduction (1/16)
Dependable Systems - Introduction (1/16)Dependable Systems - Introduction (1/16)
Dependable Systems - Introduction (1/16)Peter Tröger
 
Dependable Systems -Dependability Means (3/16)
Dependable Systems -Dependability Means (3/16)Dependable Systems -Dependability Means (3/16)
Dependable Systems -Dependability Means (3/16)Peter Tröger
 
Dependable Systems -Dependability Attributes (5/16)
Dependable Systems -Dependability Attributes (5/16)Dependable Systems -Dependability Attributes (5/16)
Dependable Systems -Dependability Attributes (5/16)Peter Tröger
 
Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0Peter Tröger
 
Operating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - SummaryOperating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - SummaryPeter Tröger
 
Operating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyOperating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyPeter Tröger
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsPeter Tröger
 
Operating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - ThreadsOperating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - ThreadsPeter Tröger
 

More from Peter Tröger (20)

WannaCry - An OS course perspective
WannaCry - An OS course perspectiveWannaCry - An OS course perspective
WannaCry - An OS course perspective
 
Cloud Standards and Virtualization
Cloud Standards and VirtualizationCloud Standards and Virtualization
Cloud Standards and Virtualization
 
Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2
 
OpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissionsOpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissions
 
Design of Software for Embedded Systems
Design of Software for Embedded SystemsDesign of Software for Embedded Systems
Design of Software for Embedded Systems
 
Humans should not write XML.
Humans should not write XML.Humans should not write XML.
Humans should not write XML.
 
What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.
 
Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)
 
Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)
 
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
 
Dependable Systems -Software Dependability (15/16)
Dependable Systems -Software Dependability (15/16)Dependable Systems -Software Dependability (15/16)
Dependable Systems -Software Dependability (15/16)
 
Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)
 
Dependable Systems - Introduction (1/16)
Dependable Systems - Introduction (1/16)Dependable Systems - Introduction (1/16)
Dependable Systems - Introduction (1/16)
 
Dependable Systems -Dependability Means (3/16)
Dependable Systems -Dependability Means (3/16)Dependable Systems -Dependability Means (3/16)
Dependable Systems -Dependability Means (3/16)
 
Dependable Systems -Dependability Attributes (5/16)
Dependable Systems -Dependability Attributes (5/16)Dependable Systems -Dependability Attributes (5/16)
Dependable Systems -Dependability Attributes (5/16)
 
Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0
 
Operating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - SummaryOperating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - Summary
 
Operating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyOperating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - Concurrency
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management Concepts
 
Operating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - ThreadsOperating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - Threads
 

Recently uploaded

Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17Celine George
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 

Recently uploaded (20)

Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 

Dependable Systems - Hardware Dependability with Redundancy (14/16)

  • 1. Dependable Systems ! Hardware Dependability - Redundancy Dr. Peter Tröger ! Sources: ! Siewiorek, Daniel P.; Swarz, Robert S.: Reliable Computer Systems. third. Wellesley, MA : A. K. Peters, Ltd., 1998. , 156881092X Roland Trauner, IBM Mainframe Summit, Hasso Plattner Institute, 2012 IBM zEnterprise System Technical Guide, IBM RedBooks Some images (C) Elena Dubrova, ESDLab, Kungl Tekniska Högskolan
  • 2. Dependable Systems Course PT 2014 Redundancy (Reiteration) • Redundancy for error detection and forward error recovery • Redundancy types: spatial, temporal, informational (presentation, version) • Redundant not mean identical functionality, just perform the same work • Static redundancy implements error mitigation • Fault does not show up, since it is transparently removed • Examples: Voting, error-correcting codes, N-modular redundancy • Dynamic redundancy implements error processing • After fault detection, the system is reconfigured to avoid a failure • Examples: Back-up sparing, duplex and share, pair and spare • Hybrid approaches 2
  • 3. Dependable Systems Course PT 2014 System-Failure Response Strategies [Sieworek / Swarz] 3 System reliability Redundant systemNonredundant system Fault detection Fault tolerance Masking redundancy Dynamic redundancy Reconfiguration Retry Online Repair
  • 4. Dependable Systems Course PT 2014 Redundancy • Redundancy is never for free ! • Hardware: Additional components, area, power, shielding, ... • Software: Development costs, maintenance costs, ... • Information: Extra hardware for decoding and encoding • Time: Faster processing (CPU) necessary to achieve same performance • Tradeoff: Costs vs. benefit of redundancy; additional design and testing effort • Sphere of replication [Mukherjee] • Identifies logical domain protected by the fault detection scheme • Questions: For which components are faults detected ? 
 Which outputs must be compared ? Which inputs must be replicated ? 4
  • 5. Dependable Systems Course PT 2014 Sphere of Replication • Components outside the sphere must be protected by other means • Level of output comparison decides upon fault coverage • Larger sphere tends to decrease the required bandwidth on input and output • More state changing happens just inside the sphere • Vendor might be restricted on choice of sphere size 5 HDD HDD Input Replication Output Comparison I/O Bus HDD HDD Input Replication Output Comparison HDD HDD Controll er Controll er RAID1 RAID10
  • 6. Dependable Systems Course PT 2014 Masking / Static Redundancy: Voting • Exact voting: Only one correct result possible • Majority vote for uneven module numbers • Generalized median voting - Select result that is the median, 
 by iteratively removing extremes • Formalized plurality voting - Divide results in partitions, 
 choose random member from the largest partition • Inexact voting: Comparison at high level might lead to multiple correct results • Non-adaptive voting - Use allowable result discrepancy, put boundary on discrepancy minimum or maximum (e.g. 1,4 = 1,3) • Adaptive voting - Rank results based on past experience with module results • Compute the correct value based on „trust“ in modules from experience • Example: Weighted sum R=W1*R1 + W2*R2 + W3*R3 with W1+W2+W3=1 6 Voter A B C
  • 7. Dependable Systems Course PT 2014 Static Redundancy: N-Modular Redundancy • Fault is transparently removed on detection • Triple-modular redundancy (TMR) • 2/3 of the modules must deliver correct results • Generalization with N-modular redundancy (NMR) • m+1/N of the modules must deliver correct result, with N=2m+1 • Standard case without any redundancy is called simplex 7 RT MR = RV · R2 of 3 = RV (R3 M + 3R2 M (1 RM ))
  • 8. Dependable Systems Course PT 2014 N-Modular Redundancy (with perfect voter) 8 Module 1 Module 2 Module 3 Voter ... Module N Input Output RNMR = mX i=0 ✓ N i ◆ (1 R)i RN i ✓ n k ◆ = n! k!(n k)! R2 of 3 = ✓ 3 0 ◆ (1 R)0 R3 + ✓ 3 1 ◆ (1 R)R2 R2 of 3 = R3 + 3(1 R)R2 R3 of 5 = ...
  • 9. • TMR is appropriate if RTMR > RM (for given t) • TMR with perfect voter only improves system reliability when RM > 0.5 • Voter needs 
 to have RV>0.9 
 to reach 
 RTMR > RM Dependable Systems Course PT 2014 TMR Reliability 90 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 0,25 0,5 0,75 TMR Single Module System Reliability Module Reliability
  • 10. Dependable Systems Course PT 2014 Imperfect Voters 10 • Redundant voters • Module errors do not propagate • Voter errors propagate only by one stage • Assumption of multi-step process, final voter still needed
  • 11. Dependable Systems Course PT 2014 Hardware Voting • Smallest hardware solution is the 1-bit majority voter • f=ab + ac + bc • Delivers the bit that has the majority • Requires 2 gate delays and 4 gates • Hardware voting can become expensive • 128 gates and 256 flip-flops for 32-bit voter • Input must be synchronized • Central clock source may be single point of failure • Can be solved by special event latching 11
  • 12. Dependable Systems Course PT 2014 Dynamic Redundancy • Reconfiguration of the system in response to an error state • Prevents error propagation • Triggered by internal fault detection in the unit, or external error detection based on the outputs • Dynamic redundancy combines error confinement with fault detection • Still questions of coverage and diagnosability • On transient errors, good modules may be deactivated • Typically solved by combination of dynamic redundancy with retry approach • Typical approaches: Duplex, sparing, degradation, compensation 12
  • 13. • Reconfigurable duplication : Have relevant modules redundant, switch on failure • Identification on mismatch („test“) • Self-diagnostics procedure • Self-checking logic • Watchdog timer, e.g.
 for having components resetting
 each other (e.g. split brain) • Outside arbiter for signatures
 or black box tests • Test interval depends on application scenario - each clock period / bus cycle / ... • Also called dual-modular redundancy • Reliability computation as with parallel / serial component diagram Dependable Systems Course PT 2014 Duplex Systems 13
  • 14. • Combination of working module and a set of spare modules (,replacement parts‘) • Hot spares: Receive input with main modules, have results immediately • Warm spares: Are running, but receive input only after switching • Cold spares: Need to be started before switching Dependable Systems Course PT 2014 Back-Up Sparing 14 MODULES 1 2 n SWITCH OUTPUT INPUT HOT, WARM AND COLD SPARES
  • 15. • Special cases for combination of duplex (with comparator) and sparing (with switch) • Pair and spare - Multiple duplex pairs, connected as standby sparing setup • Two replicated modules operate as duplex pair (lockstep execution), 
 connected by comparator as voting circuit • Same setting again as spare unit, 
 spare units connected by switch • On module output mismatch, comparators 
 signal switch to perform failover • Commercially used, e.g. Stratus XA/R Series 300 Dependable Systems Course PT 2014 Pair and Spare 15 MODULES 1 2 3 OUTPUTINPUT COMPARATOR 4 COMPARATOR SWITCH/COMPARATOR
  • 16. Dependable Systems Course PT 2014 Graceful Degradation • Performance design of the system allows continued operation with spares • Many commercial systems supports this, but lack automated error processing • Example: Operating system support for CPU off-lining, but no MCA handling • Designed-In Resources: • Replaceable or bypass-able components (f.e. caches, disks, processors) • Support for operation with degraded performance • Added-On Resources: • Redundant units used for excess capacity during normal operation • Still non-degraded performance on failure • Interconnect reconfiguration: Use alternative paths in the network • Hardware solutions in telco industry, today replaced by software solutions 16
  • 17. Dependable Systems Course PT 2014 Example: Spanning Tree Protocol • Modern implementation of interconnect reconfiguration for dynamic redundancy • Bridges for connecting different Ethernet sub-networks • By default no coordination, better products use the spanning tree protocol • Explicit removal of redundant paths (loops), 
 while still supporting all point-to-point communication • Each bridge has its own MAC address, protocol based on broadcast • Create a tree of bridges, starting from a chosen root bridge • All paths start from the root bridge • Ports participating in redundant paths have to be switched off • Cost model for paths to make a choice (root distance, speed) 17
  • 18. Dependable Systems Course PT 2014 Example: Spanning Tree Protocol • Determine root bridge • Send your ID (MAC address) to a multicast group, smallest ID wins • Each non-root bridge determines the ,cheapest‘ path to the root bridge • This port becomes the root port (RP) • For multiple bridges in a segment, the ,cheapest‘ representative is elected -
 designated port (DP) • All ports that are not DP or RP are deactivated -
 blocked port (BP) 18
  • 19. Dependable Systems Course PT 2014 Hybrid Approaches • N-modular redundancy 
 with spares • Also called hybrid redundancy • System has basic NMR 
 configuration • Disagreement detector replaces
 modules with spares if their 
 output is not matching
 the voting result • Reliability as long as the spare pool is not exhausted • Improves fault masking capability of NMR • Can tolerate two faults with one spare, while classic NMR would need 5 modules with majority voting to tolerate two faults 19 Adds fault localization
  • 20. Dependable Systems Course PT 2014 TMR with Spares 20 NASA Technical Report 32-1467, 1969 • Basic reliability computation based on the assumption of similar module failure rates in spares and non-spares • At least any two of all S+3 modules must survive
  • 21. Dependable Systems Course PT 2014 Comparison TMR vs. TMR/S vs. NMR 21 NASATechnicalReport32-1467,1969 #Units = 2N + 1
  • 22. Dependable Systems Course PT 2014 Hybrid Approaches • Self-purging redundancy • Active redundant modules, each can remove itself from the system if faulty • Basic idea: Test for agreement with the voting result, otherwise 0 22 • If module output does not match to system output, 0 is delivered • Works fine with threshold voters
  • 23. Dependable Systems Course PT 2014 Hybrid Approaches 23 • Sift-out modular redundancy (N-2), no voter required • Pair-wise comparison of module outputs by comparator • N inputs and N-over-2 outputs • Detector uses these signals to identify the faulty module, includes also memory cells for failed modules • Collector sifts out the 
 faulty input, based on
 information from
 detector
  • 24. Dependable Systems Course PT 2014 Hybrid Approaches • Triple Duplex Architecture • TMR with duplex modules, used in the Shinkansen (Japanese train) • Fault masking with comparator, no more contribution to voting from faulty one • Allows tolerating another fault in the further operation, 
 since comparator localizes again the faulty module • Adds again fault location
 capability to redundancy
 scheme • Supports also hot
 plugging of deactivated
 components 24
  • 25. Dependable Systems Course PT 2014 The Real World of Hardware Redundancy - Replacement Frequencies [Schroeder 2007] 25 760 node cluster, 2300 disks ISP, multiple sites, 26700 disks ISP, multiple sites, 9200 machines, 39000 disks
  • 26. Dependable Systems Course PT 2014 IBM System z 26 © 2011 IBM Corporation56 IBM zEnterprise Scheduled(CIE+DisruptivePatches+ECs) Planned- (MES+Driver Upgrades) Unscheduled(UIRA) Sources of Outages Pre z9 -Hrs/Year/Syst- Prior Servers z9 EC Z10 EC z196 Unscheduled Outages Scheduled Outages Planned Outages Preplanning requirements Power & Thermal Management Increased Focus over time             Temperature = Silicon Reliability Worst Enemy Wearout = Mechanical Components Reliability Worst Enemy. System z overall RAS Strategy …..Continuing our RAS focus helps avoid outages ImpactofOutage
  • 27. Dependable Systems Course PT 2014 IBM System z • Machine-Check-Handling mechanism in z/Series • Equipment malfunction detection • Permit automatic recovery • Error states are reported by machine-check interruption • Data error detection through information redundancy • Recovery from machine-detected error states • Error checking and correction - use circuitry redundancy • CPU retry - checkpoint written at instruction-based synchronization points • Channel-subsystem recovery - restart of I/O components • Unit deletion - automated degradation of malfunctioning units 27
  • 28. Dependable Systems Course PT 2014 IBM System z - Processor Books 28 © 2011 IBM Corporation52 IBM zEnterprise z196 Cache / Book Topology Fully connected 4 Book system:  96 total cores  Total system cache  768 MB shared L4 (eDRAM)  576 MB L3 (eDRAM)  144 MB L2 private (SRAM)  19.5 MB L1 private (SRAM) CP1CP1CP2CP2 CP4CP4 CP5CP5CP3CP3 SC0SC1 Mem1 Mem0 FBCs Mem2 CP0CP0 Book: FBCs • 4 PU‘s (cores) per CP + co-processors • SC: Storage Control • 96-192 MB L4 cache per SC, accessible from other MCMs • FBC: Fabric book connectivity • Support for dynamic book addition and repair • 2+1 redundancy for book power supply Multi-Chip Module (MCM)
  • 29. Dependable Systems Course PT 2014 IBM System z 29 © 2011 IBM Corporation57 IBM zEnterprise z196 RAS Design of ……..Fully Redundant I/O Subsystem – of existing IO cage and drawers Fully Redundant I/O Design  SAP / CP sparing  SAP Reassignment  I/O Reset & Failover  I/O Mux Reset / Failover  Redundant I/O Adapter  Redundant I/O interconnect  Redundant Network Adapters  Redundant ISC links  Redundant Crypto processors  I/O Switched Fabric  Network Switched/Router Fabric  High Availability Plugging Rules  I/O and coupling fanout rebalancing on CBA  Channel Initiated Retry  High Data Integrity Infrastructure  I/O Alternate Path  Network Alternate Path  Virtualization Technology processor Failover I/O Mux I/O Hub I/O Hub I/O Mux I/O Mux I/O Mux Failover I/Oadapter. I/Oadapter. Networkadapter. Networkadapter. CryproAccelerator CryptoAccelerator ISCLinks ISCLinks Alternate Path I/O Switch Control Unit I/O Switch Control Unit Network Switch End User Network Switch End User
  • 30. Dependable Systems Course PT 2014 IBM System z196 • Prepared for Unscheduled Outages
 • Advanced Memory RAIM (Redundant Array of Independent Memory) design • Enhanced Reed-Solomon code (ECC) – 90B/64B • Protection against Channel/DIMM failures • Chip marking for fast DRAM replacements • Mirrored Key cache • Improved chip packaging • Improved condensation management • Integrated TCP/IP checksum generation/checking • Integrated EPO switch cover (protecting the switch during repair actions) • Continued focus on Firmware • Prepared for Scheduled Outages
 • Double memory data bus lane sparing (reducing repair actions) • Single memory clock bus sparing • Field Repair of interface between processor chip and cache chip and between cache chips (fabric bus) • Fast bitline delete on L3/L4 cache (largest caches) • Power distribution using N+2 Voltage Transformation Modules (VTM) • Blower management by sensing altitude and humidity • Redundant (N+2) humidity sensors • Redundant (N+2) altitude sensors • Unified Resource Manager for zBX 30
  • 31. Dependable Systems Course PT 2014 Memory Redundancy • Redundancy of memory data for error masking • Replication / coding at different levels • Examples • STAR (Self-testing and self-repairing computer, for early spacecrafts), 1971 • COMTRAC (Computer-aided traffic control 
 system for Shinkansen train system) • Stratus (Commercial fault-tolerant system) • 3B20 by AT & T (Commercial fault-tolerant system) • Most modern memory controllers in servers 31
  • 32. Dependable Systems Course PT 2014 Coding Checks in Memory Hardware • Primary memory • Parity code • m-out-of-n resp. m-of-n resp. m/n code • Checksumming • Berger Code • Hamming code • Secondary storage • RAID codes • Reed-Solomon code 32
  • 33. Dependable Systems Course PT 2014 Hamming Distance (Richard Hamming, 1950) • Hamming distance: Number of positions on which two words differ • Alternative definition: Number of necessary substitutions to make A to B • Alternative definition: Number of errors that transform A to B ! ! ! ! • Minimum Hamming distance: Minimum distance between any two code words • To detect d single-bit errors, a code must have a min. distance of at least d + 1 • If at least d+1 bits change in the transmitted code, a new (valid) code appears • To correct d single-bit errors, the minimum distance must be 2d+1 33 (C)Wikipedia
  • 34. Dependable Systems Course PT 2014 Parity Codes • Add parity bit to the information word • Even parity: If odd number of ones, add one to make the number of ones even • Odd parity: If even number of ones, add one to make the number of ones odd • Variants • Bit-per-word parity • Bit-per-byte parity 
 (Example: 
 Pentium data cache) • Bit-per-chip parity • ... 34
  • 35. Dependable Systems Course PT 2014 Two-Dimensional Parity 35 Example: Odd Parity Allows fault location
  • 36. Dependable Systems Course PT 2014 Code Properties 36 Code Bits / Word Number of 
 possible words Coverage Even Parity 4 8 Any single bit error, 
 no double-bit error,
 not all-0s or all 1s errors Odd Parity 4 8 Any single bit error, 
 no double-bit error, all-0s or all 1s errors 2/4 4 6 Any single bit error, 
 33% of double bit errors
  • 37. Dependable Systems Course PT 2014 Checksumming • General idea: Multiply components of a data word and add them up to a checksum • Good for large blocks of data, low hardware overhead • Might require substantial time, memory overhead • 100% coverage for single faults • Example: ISBN 10 check digit • Final number of 10 digit ISBN number is a check digit, computed by: • Multiply all data digits by their position number, starting from right • Take sum of the products, and set the check digit so that result MOD 11 = 0 • Check digit „01“ is represented by X • More complex approaches get better coverage (e.g. CRC) for more CPU time 37
  • 38. Dependable Systems Course PT 2014 Hamming Code • Every data word with n bits is extended 
 by k control bits • Control bits through standard parity approach 
 (even parity), implementable with XOR • Inserted at power-of-two positions • Parity bit pX is responsible for all position numbers with the X-least significant bit set (e.g. p1 responsible for ,red dot‘ positions) • Parity bits check overlapping parts of the data bits, computation and check are cheap, 
 but significant overhead in data • Minimum Hamming distance of three 
 (single bit correction) • Application in DRAM memory 38 takenfromWikimediaCommons
  • 39. Dependable Systems Course PT 2014 Hamming Code for 26bit word size 39 takenfromWikimediaCommons
  • 40. Dependable Systems Course PT 2014 Hamming Code • Hamming codes are known to produce 10% to 40% of data overhead • Syndrome determines which bit (if any) is corrupted • Example ECC • Majority of „one-off“ soft errors in DRAM happens from background radiation • Denser packages, lower voltage for higher frequencies • Most common approach is the SECDEC Hamming code • "Single error correction, double error detection" (SECDEC) • Hamming code with additional parity • Modern BIOS implementations perform 
 threshold-based warnings on frequent correctable errors 40 (C) Z. Kalbarczyk
  • 41. Dependable Systems Course PT 2014 Memory Redundancy • Standard technology in DRAMs • Bit-per-byte parity, check on read access • Implemented by additional parity memory chip • ECC with Hamming codes - 7 check bits for 32 bit data words, 8 bit for 64 bit • Leads to 72 bit data bus between DIMM and chipset • Computed by memory controller on write, checked on read • Study by IBM: ECC memory achieves R=0.91 over three years • Can correct single bit errors and detect double bit errors • Hewlett Packard Advanced ECC (1996) • Can detect and correct single bit and double bit errors 41
  • 42. Dependable Systems Course PT 2014 Memory Redundancy • IBM ChipKill • Originally developed for NASA Pathfinder project, now in X-Series • Corrects up to 4 bit errors, detects up to 8 bit errors • Implemented in chipset and firmware, works with standard ECC modules • Based on striping approach with parity checks (similar to RAID) • 72 bit data word is split in 18 bit chunks, distributed on 4 DIMM modules • 18 DRAM chips per module, one bit per chip • HP Hot Plug RAID Memory • Five memory banks, fifth bank for parity information • Corrects single bit, double bit, 4-bit, 8-bit errors; hot plugging support 42
  • 43. Dependable Systems Course PT 2014 Memory Redundancy • Dell PowerEdge Servers, 2005 (taken from www.dell.com) 43
  • 44. Dependable Systems Course PT 2014 Memory Redundancy • Fujitsu System Board D2786 for RX200 S5 (2010) • Independent Channel Mode: Standard operational module, always use first slot • Mirrored Channel Mode: Identical modules on slot A/B (CPU1) and D/E (CPU2) 44
  • 45. Dependable Systems Course PT 2014 IBM System z - Memory RAID 45 • System z10 EC memory design • Four Memory Controllers (MCUs) organized in two pairs, 
 each MCU with four redundant channels • 16 to 48 DIMMs per book, plugged in groups of 8 • 8 DIMMs (4 or 8 GB) per feature, 32 or 64 GB physical memory per feature • 64 to 384 GB physical memory per book = 64 to 384 GB for use 
 (HSA and customer) • z196 memory design: • Three MCUs, each with five channels. The fifth channel in each z196 MCU is required to implement Redundant Array of Independent Memory (RAIM) • Detected and corrected: Bit, lane, DRAM, DIMM, socket, and complete memory channel failures, including many types of multiple failures
  • 46. Dependable Systems Course PT 2014 IBM System z196 - Memory RAID 46 © 2011 IBM Corporation62 IBM zEnterprise Ch4 Ch3 Ch2 Ch1 ASIC ASIC Ch0 DIMM CLK Diff CLK Diff C R C C R C DRAM X X X X Layers of Memory Recovery ECC  Powerful 90B/64B Reed Solomon code DRAM Failure  Marking technology; no half sparing needed  2 DRAM can be marked  Call for replacement on third DRAM Lane Failure  CRC with Retry  Data – lane sparing  CLK – RAIM with lane sparing DIMM Failure (discrete components, VTT Reg.)  CRC with Retry  Data – lane sparing  CLK – RAIM with lane sparing DIMM Controller ASIC Failure  RAIM Recovery Channel Failure  RAIM Recovery MCU0 ECC RAIM X X X 2- Deep Cascade Using Quad High DIMMs z196 RAIM Memory Controller Overview
  • 47. Dependable Systems Course PT 2014 IBM z10 EC Memory Structure 47 © 2011 IBM Corporation63 IBM zEnterprise z10 EC Memory Structure DATA CHECK DATA CHECK ECC Level 2 Cache Key Cache MCU 0 16B 16B 2B Key Cache MCU 1 16B 16B 2B Key Cache MCU 2 16B 16B 2B Key Cache MCU 3 16B 16B 2B
  • 48. Dependable Systems Course PT 2014 IBM System z196 - RAIM 48 © 2011 IBM Corporation64 IBM zEnterprise z196 Redundant Array of Independent Memory (RAIM) Structure DATA CHECK DATA CHECK ECC RAIM Parity Level 3 Cache Key Cache MCU 0 16B 16B 2B Key Cache MCU 1 16B 16B 2B Key Cache MCU 2 16B 16B 2B Extra column provides RAIM function
  • 49. Dependable Systems Course PT 2014 Disk Redundancy • Typical measure is the annual failure rate (AFR) - average number of failures / year
 
 
 • Can be interpreted as failure probability during a year • MTBF = Mean time before failure, here • Disk MTTF: On average, one failure takes place in the given disk hours • Example: Seagate Barracuda ST3500320AS: MTTF=750000h=85.6 years • With thousand disks, on average every 750h (a month) some disk fails • Measured by the manufacturer under heavy load and physical stress • AFR=0.012 49 AFR = 1 MT BFyears = 8760 MT BFhours
  • 50. Dependable Systems Course PT 2014 RAID • Redundant Array of Independent Disks (RAID) [Patterson et al. 1988] • Improve I/O performance and / or reliability by building raid groups • Replication for information reconstruction on disk failure (degrading) • Requires computational effort (dedicated controller vs. software) • Assumes failure independence 50
  • 51. Dependable Systems Course PT 2014 RAID Reliability Comparison • Treat disk failing as Bernoulli experiment - independent events, identical probability • Probability for k events of probability p in n runs ! • Probability for a failure of a RAID 1 mirror - all disks unavailable: ! • Probability for a failure of a RAID 0 strip set - any faults disk leads to failure: 51 Bn,p(k) = pk (1 p)n k n k ⇥ pallfail = n n pfail n (1 pfail)0 = pfail n panyfail = 1 pallwork = 1 ✓ n n ◆ (1 pfail)n pfail 0 = 1 (1 pfail)n
  • 52. • Works for RAID levels were second outage during repair is fatal • Basic idea is that groups of data disks are protected by additional check disks • D - Total number of data disks • G - Number of data disks in a group (e.g. G=1 in RAID1) • C - Number of redundant check disks (parity / mirror) in a group 
 (e.g. C=1 in RAID1) • nG = D / G = number of groups, G+C : Number of disks in a group Dependable Systems Course PT 2014 RAID MTTF Calculation [Patterson] 52 MTTFGroup = MTTFDisk G + C · 1 pSecondF ailureDuringRepair
  • 53. • Assuming exponential distribution, the probability for a second disk failure during the repair time can be determined by: ! ! • So: Dependable Systems Course PT 2014 RAID MTTF Calculation [Patterson] 53 MTTFGroup = MTTFDisk G + C · 1 pSecondF ailureDuringRepair pSecondF ailure = MTTR MT T FDisk G+C 1 MTTFRaid = MTTFGroup nG = MTTFDisk 2 (G + C) ⇤ nG ⇤ (G + C 1) ⇤ MTTR
  • 54. Dependable Systems Course PT 2014 RAID 0 • Raid 0 - Block-level striping • I/O performance improvement with many channels and drives • One controller per drive • Optimal stripe size depends on I/O request size, random vs. sequential I/O, concurrent vs. single-threaded I/O • Fine-grained striping: Good load balancing, catastrophic data loss • Coarse-grained striping: Good recovery for small files, worse performance • One option: Strip size = Single-threaded I/O size / number of disks • Parallel read supported, but positioning overhead for small concurrent accesses • No fault tolerance 54 (C) Wikipedia MTTFRaid0 = MTTFDisk N
  • 55. Dependable Systems Course PT 2014 RAID 1 • Raid 1 - Mirroring and duplexing • Duplicated I/O requests • Decreasing write performance, up to double read rate of single disk • RAID controller might allow concurrent read and write per mirrored pair • Highest overhead of all solutions, smallest disk determines resulting size • Reliability is given by probability that one disk fails and the second fails while the first is repaired • With D=1, G=1, C=1 and the generic formula, we get 55 (C) Wikipedia MTTFRaid1 = MTTFDisk 2 · MTTFDisk MTTRDisk
  • 56. Dependable Systems Course PT 2014 Raid 2/3 • Raid 2 - Byte-level striping with Hamming code -based check disk • No commercial implementation due to ECC storage overhead • Online verification and correction during read • Raid 3 - Byte-level striping with dedicated XOR parity disk • All data disks used equally, one XOR parity disk as bottleneck (C=1) • Bad for concurrent small accesses, good sequential performance (streaming) • Separate code is needed to identify a faulty disk • Disk failure has only small impact on throughput • RAID failure if more than one disk fails: 56 (C) Wikipedia MTTFRaid3 = MTTFDisk D + C · MT T FDisk D+C 1 MTTRDisk
  • 57. Dependable Systems Course PT 2014 Parity With XOR • Self-inverse operation • 101 XOR 011 = 110, 110 XOR 011 = 101 57 Disk Byte 1 1 1 0 0 1 0 0 1 2 0 1 1 0 1 1 1 0 3 0 0 0 1 0 0 1 1 4 1 1 1 0 1 0 1 1 Parity 0 1 0 1 1 1 1 1 Disk Byte 1 1 1 0 0 1 0 0 1 Parity 0 1 0 1 1 1 1 1 3 0 0 0 1 0 0 1 1 4 1 1 1 0 1 0 1 1 Hot Spare 0 1 1 0 1 1 1 0
  • 58. MTTFRaid5 = MTTFDisk N · MT T FDisk N 1 MTTRDisk Dependable Systems Course PT 2014 RAID 4 / 5 • Raid 4 - Block-level striping with dedicated parity disk • RAID 3 vs. RAID 4: Allows concurrent block access • Raid 5 - Block-level striping with distributed parity • Balanced load as with Raid 0, better reliability • Bad performance for small block writing • Most complex controller design, difficult rebuild • When block in a stripe is changed, old block and parity must be read to compute new parity • For every changed data bit, flip parity bit 58 (C)Wikipedia
  • 59. Dependable Systems Course PT 2014 RAID 6 / 01 / 10 • Raid 6 - Block-level striping with two parity schemes • Extension of RAID5, can sustain multiple drive
 failures at the same time • High controller overhead to compute parities,
 poor write performance • Raid 01 - Every mirror is a Raid 0 stripe 
 (min. 4 disks) • Raid 10 - Every stripe is a Raid 1 mirror 
 (min. 4 disks) • RAID DP - RAID 4 with second parity
 disk • Additional parity includes first parity + all but one of the data blocks (diagonal) • Can deal with two disk outages
 59
  • 60. • Take the same number of disks
 in different constellations • AFRDisk = 0.029, MTTR=8h • RAID5 has bad reliability, but
 offers most effective capacity • In comparison to RAID5, RAID10
 can deal with two disk errors • Also needs to consider different 
 resynchronization times • RAID10: Only one disk needs to be copied to the spare • RAID5 / RAID-DP: All disks must be read to compute parity • Use RAID01 only in 2+2 combination Dependable Systems Course PT 2014 RAID Analysis (Schmidt) 60
  • 61. Dependable Systems Course PT 2014 RAID Analysis (TecChannel.de) 61 RAID 0 RAID 1 RAID 10 RAID 3 RAID 4 RAID 5 RAID 6 Number of drives n > 1 n = 2 n > 3 n > 2 n > 2 n > 2 n > 3 Capacity overhead (%) 0 50 50 100 / n 100 / n 100 / n 200 / n Parallel reads n 2 n / 2 n - 1 n - 1 n -1 n - 2 Parallel writes n 1 1 1 1 n / 2 n / 3 Maximum read throughput n 2 n / 2 n - 1 n - 1 n - 1 n - 2 Maximum write throughput n 1 1 1 1 n / 2 n / 3
  • 62. Dependable Systems Course PT 2014 Software RAID • Software layer above block-based device driver(s) • Windows Desktop / Server, Mac OS X, Linux, ... • Multiple problems • Computational overhead for RAID levels beside 0 and 1 • Boot process • Legacy partition formats • Driver-based RAID • Standard disk controller with special firmware • Controller covers boot stage, device driver takes over in protected mode 62
  • 63. Dependable Systems Course PT 2014 Disk Redundancy: Google • Failure Trends in a Large Disk Drive Population [Pinheiro2007] • > 100.000 disks, SATA / PATA consumer hard disk drives, 5400 to 7200 rpm • 9 months of data gathering in Google data centers • Statistical analysis of SMART data • Failure event: „A drive is considered to 
 have failed if it was replaced as part 
 of a repairs procedure.“ • Prediction models based on SMART only 
 work in 56% of the cases 63
  • 64. Dependable Systems Course PT 2014 Disk Redundancy: Google • Failure rates are correlated with drive model, manufacturer and drive age • Indication for infant mortality • Impact from utilization (25th percentile, 50-75th percentile, 75th percentile) • Reversing effect in third year - „Survival of the fittest“ theory 64
  • 65. Dependable Systems Course PT 2014 Disk Redundancy: Google 65 • Temperature effects only at high end of temperature range, with old drives
  • 66. Dependable Systems Course PT 2014 IBM System z - Redundant I/O • Each processor book 
 has up to 8 dual port fanouts • Direct data transfer 
 between memory and 
 PCI/e (8 GBps) or 
 Infiniband (6 GBps) • Optical and copper 
 connectivity supported • Fanout cards are 
 hot-pluggable, without 
 loosing the I/O connectivity • Air-moving devices (AMD) have N+1 redundancy for fanouts, memory and power 66 Chapter 2. CPC hardware components 31 well as MES orders. 2.2 Book concept The central processor complex (CPC) uses a packaging design for its processors based on books. A book contains a multi-chip module (MCM), memory, and connectors to I/O cages and other CPCs. Books are located in the processor cage in frame A. The z196 has from one book to four books installed. A book and its components are shown in Figure 2-3. Figure 2-3 Book structure and components 1 Dense Wave Division Multiplexing MCM @ 1800W Refrigeration Cooled or Water Cooled Backup Air Plenum 8 I/O FAN OUT 2 FSP 3x DCA 14X DIMMs 100mm High 16X DIMMs 100mm High 11 VTM Card Assemblies 8 Vertical 3 Horizontal Rear Front DCA Power Supplies Fanout Cards Cooling from/to MRU MCM Memory Memory
  • 67. Dependable Systems Course PT 2014 IBM System z - Redundant I/O • PCI/e I/O drawer supports up to 32 
 I/O cards from fanouts in 4 domains • One PCI/e switch card per domain • Two cards provide backup path for each other (f.e. with cable failure) • 16 cards max. per switch 67 Figure 4-2 illustrates the I/O structure of a z196. An InfiniBand (IFB) cable connects the HCA2-C fanout to an IFB-MP card in the I/O cage. The passive connection between two IFB-MP cards allows for redundant I/O interconnection. The IFB cable between an HCA2-C fanout in a book and each IFB-MP card in the I/O cage supports a 6 GBps bandwidth. Figure 4-2 z196 I/O structure when using I/O cages I/O Cage for Redundant I/O Interconnect Passive Connection 8 x 6 GBps I/O Interconnect IFB-MP IFB-MP I/O Domain Crypto FICON/FCP OSA ESCON I/O Domain Crypto FICON/FCP OSA ISC-3 Processor Memory HCA2-C Fanout
  • 68. Dependable Systems Course PT 2014 IBM System z - Redundant I/O 68 The HCA3-O LR (1xIFB) fanout comes with 4 ports and each other fanout comes with two ports. Figure 4-10 illustrates the IFB connection from the CPC cage to an I/O cage and an I/O drawer, and the PCIe connection from the CPC cage to an PCIe I/O drawer. Figure 4-10 PCIe and InfiniBand I/O Infrastructure Book 3Book 1 Memory x16 PCIe Gen2 8 GBps Memory Memory Memory Book 0 Book 2 PCIe switch OSA-Express4S PCIe (8x) PCIe (8x) HCA2 (8x) PCIe switch PCIe switch FICON Express8S PCIe switch 4 GB/s PCIe Gen2 x8 … SC1, SC0 (FBC) PU PU PU PU PU PU SC1, SC0 (FBC) PU PU PU PU PU PU SC1, SC0 (FBC) PU PU PU PU PU PU SC1, SC0 (FBC) PU PU PU PU PU PU SC1, SC0 (FBC) PU PU PU PU PU PU SC1, SC0 (FBC) PU PU PU PU PU PU SC1, SC0 (FBC) PU PU PU PU PU PU SC1, SC0 (FBC) PU PU PU PU PU PU PCIe I/O drawerPCIe I/O drawer FanoutsFanouts 6 GBps HCA2 (8x) IFB-MP RII IFB-MP FICON Express8 2 GBps mSTI Channels 1 GBps mSTI 2 GBps mSTI Ports 2GBps mSTI FPGA RII IFB-MPIFB-MP OSA-Express3 I/O Cage & I/O drawerI/O Cage & I/O drawer RII RII FPGA