2. Recovery System
Failure Classification
Storage Structure
Recovery and Atomicity
Log-Based Recovery
Remote Backup Systems
3. Failure Classification
Transaction failure :
Logical errors: transaction cannot complete due to
some internal error condition
System errors: the database system must terminate
an active transaction due to an error condition (e.g.,
deadlock)
System crash: a power failure or other hardware or
software failure causes the system to crash.
Fail-stop assumption: non-volatile storage contents
are assumed to not be corrupted by system crash
Database systems have numerous integrity checks
to prevent corruption of disk data
4. Disk failure: a head crash or similar disk failure destroys all or
part of disk storage
Destruction is assumed to be detectable: disk drives use
checksums to detect failures.
Recovery Algorithms
Recovery algorithms are techniques to ensure database
consistency and transaction atomicity and durability despite
failures
Focus of this chapter
Recovery algorithms have two parts
1. Actions taken during normal transaction processing to ensure
enough information exists to recover from failures
2. Actions taken after a failure to recover the database contents
to a state that ensures atomicity, consistency and durability
5. Recovery Algorithms
Recovery algorithms are techniques to
ensure database consistency and
transaction atomicity and durability despite
failures
Focus of this chapter
Recovery algorithms have two parts
1. Actions taken during normal transaction
processing to ensure enough information exists
to recover from failures
2. Actions taken after a failure to recover the
database contents to a state that ensures
atomicity, consistency and durability
6. Storage Structure
Volatile storage:
does not survive system crashes
examples: main memory, cache memory
Nonvolatile storage:
survives system crashes
examples: disk, tape, flash memory,
non-volatile (battery backed up) RAM
Stable storage:
a mythical form of storage that survives all failures
approximated by maintaining multiple copies on
distinct nonvolatile media
7. Stable-Storage Implementation
Maintain multiple copies of each block on separate disks
copies can be at remote sites to protect against disasters such
as fire or flooding.
Failure during data transfer can still result in inconsistent
copies: Block transfer can result in
Successful completion
Total failure: destination block was never updated
Execute output operation as follows (assuming two copies of
each block):
Write the information onto the first physical block.
When the first write successfully completes, write the
same information onto the second physical block.
The output is completed only after the second write
successfully completes.
8. Stable-Storage Implementation (Cont.)
Protecting storage media from failure during data transfer
(cont.):
Copies of a block may differ due to failure during output
operation. To recover from failure:
1. First find inconsistent blocks:
1. Expensive solution: Compare the two copies of every
disk block.
2. Better solution:
Record in-progress disk writes on non-volatile storage
(Non-volatile RAM or special area of disk).
2. If either copy of an inconsistent block is detected to have an
error (bad checksum), overwrite it by the other copy. If both
have no error, but are different, overwrite the second block
by the first block.
9. Data Access
Physical blocks are those blocks residing on the disk.
Buffer blocks are the blocks residing temporarily in main
memory.
Block movements between disk and main memory are initiated
through the following two operations:
input(B) transfers the physical block B to main memory.
output(B) transfers the buffer block B to the disk, and
replaces the appropriate physical block there.
We assume, for simplicity, that each data item fits in, and is
stored inside, a single block.
10. Example of Data Access
x
Y A
B
x1
y1
buffer
Buffer Block A
Buffer Block B
input(A)
output(B)
read(X)
write(Y)
disk
work area
of T1
work area
of T2
memory
x2
11. Recovery and Atomicity
Modifying the database without ensuring that the
transaction will commit may leave the database in an
inconsistent state.
Consider transaction Ti that transfers $50 from account A
to account B; goal is either to perform all database
modifications made by Ti or none at all.
Several output operations may be required for Ti (to output
A and B). A failure may occur after one of these
modifications have been made but before all of them are
made.
12. Log-Based Recovery
A log is kept on stable storage.
The log is a sequence of log records, and maintains a
record of update activities on the database.
When Ti finishes it last statement, the log record <Ti
commit> is written.
We assume for now that log records are written directly to
stable storage (that is, they are not buffered)
Two approaches using logs
Deferred database modification
Immediate database modification
13. Checkpoints
Problems in recovery procedure as discussed earlier :
1. searching the entire log is time-consuming
2. we might unnecessarily redo transactions which have already
3. output their updates to the database.
Streamline recovery procedure by periodically performing check
pointing
1. Output all log records currently residing in main memory
onto stable storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record < checkpoint> onto stable storage.
14. Example of Checkpoints
T1 can be ignored (updates already output to disk due to
checkpoint)
T2 and T3 redone.
T4 undone
Tc
Tf
T1
T2
T3
T4
checkpoint system failure