SlideShare a Scribd company logo
1 of 45
Download to read offline
1/18/12 1
4. Database System
Recovery
CSEP 545 Transaction Processing
for E-Commerce
Philip A. Bernstein
Copyright ©2012 Philip A. Bernstein
1/18/12 2
Outline
1. Introduction
2. Recovery Manager
3. Two Non-Logging Algorithms
4. Log-based Recovery
5. Media Failure
3
1. Introduction
• A database may become inconsistent because of
– Transaction failure (abort)
– Database system failure (possibly caused by OS crash)
– Media crash (disk-resident data is corrupted)
• The recovery system ensures the database contains
exactly those updates produced by committed
transactions
– I.e. atomicity and durability, despite failures
1/18/12
4
Assumptions
• Two-phase locking, holding write locks until after
a transaction commits. This implies
– Recoverability
– No cascading aborts
– Strictness (never overwrite uncommitted data)
• Page-level everything (for now)
– Database is a set of pages
– Page-granularity locks
– A transaction’s read or write operation operates on an
entire page
– We’ll look at record granularity later
1/18/12
5
Storage Model
• Stable database - survives system failures
• Cache (volatile) - contains copies of some pages,
which are lost by a system failure
Stable Database
Log
Read, Write
Fetch, Flush
Pin, Unpin, Deallocate
Cache Manager
Cache
Read, Write
1/18/12
6
Stable Storage
• Write(P) overwrites the entire contents of P on the
disk
• If Write is unsuccessful, the error might be
detected on the next read ...
– e.g. page checksum error => page is corrupted
• … or maybe not
– Write correctly wrote to the wrong location
• Write is the only operation that’s atomic with
respect to failures and whose successful execution
can be determined by recovery procedures.
1/18/12
7
The Cache
• Cache is divided into page-sized slots.
• Dirty bit tells if the page was updated since it was last
written to disk.
• Pin count tells number of pin ops without unpins
Page Dirty Bit Cache Address Pin Count
P2 1 91976 1
P47 0 812 2
P21 1 10101 0
• Fetch(P) - read P into a cache slot. Return slot address.
• Flush(P) - If P’s slot is dirty and unpinned, then write it to
disk (i.e. return after the disk acks).
1/18/12
8
The Cache (cont’d)
• Pin(P) - make P’s slot non-flushable & non-replaceable.
– Non-flushable because P’s content may be inconsistent.
– Non-replaceable because someone has a pointer into P or is
accessing P’s content.
• Unpin(P) - release it.
• Deallocate(P) - allow P’s slot to be reused (even if dirty).
1/18/12
9
Big Picture
Database
System
Query Optimizer
Query Executor
Access Method
(record-oriented files)
Page-oriented Files
Database
Recovery manager
Cache manager
Page file manager
Fetch, Flush
Pin, Unpin,
Deallocate
• Record manager is the main user of the cache manager.
• It calls Fetch(P) and Pin(P) to ensure the page is in main
memory, non-flushable, and non-replaceable.
1/18/12
10
Latches
• A page is a data structure with many fields.
• A latch is a short-term lock that gives its owner
access to a page in main memory.
• A read latch allows the owner to read the content.
• A write latch allows the owner to modify the
content.
• The latch is usually a bit in a control structure,
not an entry in the lock manager. It can be set and
released much faster than a lock.
• There’s no deadlock detection for latches.
1/18/12
11
The Log
• A sequential file of records describing updates:
– Address of updated page.
– Id of transaction that did the update.
– Before-image and after-image of the page.
• Whenever you update the cache, also update the log.
• Log records for Commit(Ti) and Abort(Ti).
• Some older systems separated before-images and
after-images into separate log files.
• If opi conflicts with and executes before opk, then
opi’s log record must precede opk’s log record.
– Recovery will replay operations in log-record-order.1/18/12
12
The Log (cont’d)
• To update records on a page:
– Fetch(P) read P into cache
– Pin(P) ensure P isn’t flushed
– write lock (P) for two-phase locking
– write latch (P) get exclusive access to P
– update P update P in cache
– log the update to P append it to the log
– unlatch (P) release exclusive access
– Unpin(P) allow P to be flushed
1/18/12
13
2. Recovery Manager
• Processes Commit, Abort and Restart
• Commit(T)
– Write T’s updated pages to stable storage atomically,
even if the system crashes
• Abort(T)
– Undo the effects of T’s writes
• Restart = recover from system failure
– Abort all transactions that were not committed at the time
of the previous failure
– Fix stable storage so it includes all committed writes and
no uncommitted ones (so it can be read by new txns)
1/18/12
14
Recovery Manager Model
Stable Database
Log
Read,
Write
Pin, Unpin
Fetch
Cache Manager
Cache
Read, Write
Recovery Manager
Flush
Deallocate
Transaction 1 Transaction 2 Transaction N
Commit, Abort, Restart
Read,
Write
Flush, dealloc for normal operat’n
Restart uses Fetch, Pin, Unpin1/18/12
15
Implementing Abort(T)
• Suppose T wrote page P.
• If P was not transferred to stable storage,
then deallocate its cache slot.
• If it was transferred, then P’s before-image must be
in stable storage (else you couldn’t undo after a
system failure).
• Undo Rule - Do not flush an uncommitted update of
P until P’s before-image is stable. (Ensures undo is
possible.)
– Write-Ahead Log Protocol - Do not … until P’s
before-image is in the log.1/18/12
16
Avoiding Undo
• Avoid the problem implied by the Undo Rule by
never flushing uncommitted updates.
– Avoids stable logging of before-images.
– Don’t need to undo updates after a system failure.
• A recovery algorithm requires undo if an update of
an uncommitted transaction can be flushed.
– Usually called a steal algorithm, because it allows a dirty
cache page to be “stolen.”
1/18/12
17
Implementing Commit(T)
• Commit must be atomic. So it must be implemented
by a disk write.
• Suppose T wrote P, T committed, and then the
system fails. P must be in stable storage.
• Redo rule - Don’t commit a transaction until the
after-images of all pages it wrote are in stable
storage (in the database or log). (Ensures redo is
possible.)
– Often called the Force-At-Commit rule.
1/18/12
18
Avoiding Redo
• To avoid redo, flush all of T’s updates to the stable
database before it commits. (They must be in stable
storage.)
– Usually called a Force algorithm, because updates are
forced to disk before commit.
– It’s easy, because you don’t need stable bookkeeping of
after-images.
– But it’s inefficient for hot pages. (Consider TPC-A/B.)
• Conversely, a recovery algorithm requires redo if a
transaction may commit before all of its updates are
in the stable database.
1/18/12
19
Avoiding Undo and Redo?
• To avoid both undo and redo
– Never flush uncommitted updates (to avoid undo), and
– Flush all of T’s updates to the stable database before it
commits (to avoid redo).
• Thus, it requires installing all of a transaction’s
updates into the stable database in one write to disk
• It can be done, but it isn’t efficient for short
transactions and record-level updates.
– Use shadow paging.
1/18/12
20
Implementing Restart
• To recover from a system failure
– Abort transactions that were active at the failure.
– For every committed transaction, redo updates that are in
the log but not the stable database.
– Resume normal processing of transactions.
• Idempotent operation - many executions of the
operation have the same effect as one execution.
• Restart must be idempotent. If it’s interrupted by a
failure, then it re-executes from the beginning.
• Restart contributes to unavailability. So make it fast!
1/18/12
21
3. Log-based Recovery
• Logging is the most popular mechanism for
implementing recovery algorithms.
• The recovery manager implements
– Commit - by writing a commit record to the log and
flushing the log (satisfies the Redo Rule).
– Abort - by using the transaction’s log records to restore
before-images.
– Restart - by scanning the log and undoing and redoing
operations as necessary.
• The algorithms are fast since they use sequential log
I/O in place of random database I/O. They greatly
affect TP and Restart performance.1/18/12
22
Implementing Commit
• Every commit requires a log flush.
• If you can do K log flushes per second, then K is
your maximum transaction throughput.
• Group Commit Optimization - when processing
commit, if the last log page isn’t full, delay the
flush to give it time to fill.
• If there are multiple data managers on a system,
then each data mgr must flush its log to commit.
– If each data mgr isn’t using its log’s update bandwidth,
then a shared log saves log flushes.
– A good idea, but rarely supported commercially.1/18/12
23
Implementing Abort
• To implement Abort(T), scan T’s log records and install
before images.
• To speed up Abort, back-chain each transaction’s update
records.
Transaction Descriptors
Transaction last log record
T7
Start of Log
End of Log
Ti Pk null pointer
Ti Pm backpointer
Ti’s first
log record
1/18/12
24
Satisfying the Undo Rule
• To implement the Write-Ahead Log Protocol, tag each
cache slot with the log sequence number (LSN) of the last
update record to that slot’s page.
Page Dirty Cache Pin LSN
Bit Address Count
P47 1 812 2
P21 1 10101 0
Log
Start
End
On disk
Main
Memory
• Cache manager won’t flush a page P until P’s last updated
record, pointed to by LSN, is on disk.
• P’s last log record is usually stable before Flush(P),
so this rarely costs an extra flush
• LSN must be updated while latch is held on P’s slot1/18/12
25
Implementing Restart (rev 1)
• Assume undo and redo are required.
• Scan the log backwards, starting at the end.
– How do you find the end?
• Construct a commit list and recovered-page-list
during the scan (assuming page level logging).
• Commit(T) record => add T to commit list
• Update record for P by T
– if P is not in the recovered-page-list then
• Add P to the recovered-page-list.
• If T is in the commit list, then redo the update,
else undo the update.
26
Checkpoints
• Problem - Prevent Restart from scanning back to the
start of the log
• A checkpoint is a procedure to limit the amount of
work for Restart
• Cache-consistent checkpointing
– Stop accepting new update, commit, and abort operations
– Make list of [active transaction, pointer to last log record]
– Flush all dirty pages
– Append a checkpoint record to log; include the list
– Resume normal processing
• Database and log are now mutually consistent
1/18/12
27
Restart Algorithm (rev 2)
• No need to redo records before last checkpoint, so
– Starting with the last checkpoint, scan forward in the log.
– Redo all update records. Process all aborts.
Maintain list of active transactions (initialized to content
of checkpoint record).
– After you’re done scanning, abort all active transactions.
• Restart time is proportional to the amount of log
after the last checkpoint.
• Reduce restart time by checkpointing frequently.
• Thus, checkpointing must be cheap.
1/18/12
28
Fuzzy Checkpointing
• Make checkpoints cheap by avoiding synchronized flushing
of dirty cache at checkpoint time.
– Stop accepting new update, commit, and abort operations
– Make a list of all dirty pages in cache
– Make list of [active transaction, pointer to last log record]
– Append a checkpoint record to log; include the list
– Resume normal processing
– Initiate low priority flush of all dirty pages
• Don’t checkpoint again until all of the last checkpoint’s
dirty pages are flushed.
• Restart begins at second-to-last (penultimate) checkpoint.
• Checkpoint frequency depends on disk bandwidth.1/18/12
29
Operation Logging
• Record locking requires (at least) record logging.
– Suppose records x and y are on page P
– w1[x] w2[y] abort1 commit2 (not strict w.r.t. pages)
• Record logging requires Restart to read a page
before updating it. This reduces log size.
• Further reduce log size by logging description of an
update, not the entire before/after image of record.
– Only log after-image of an insertion
– Only log fields being updated
• Now Restart can’t blindly redo.
– E.g., it must not insert a record twice
1/18/12
30
LSN-based logging
• Each database page P’s header has the LSN of the last log
record whose operation updated P.
• Restart compares log record and page LSN before redoing
the log record’s update U.
– Redo the update only if LSN(P) < LSN(U)
• Undo is a problem. If U’s transaction aborts and you undo
U, what LSN to put on the page?
– Suppose T1 and T2 update records x and y on P
– w1[x] w2[y] c2 a1 (what LSN does a1 put on P?)
– not LSN before w1[x] (which says w2[y] didn’t run)
– not w2[y] (which says w1[x] wasn’t aborted)
1/18/12
31
LSN-based logging (cont’d)
• w1[x] w2[y] c2 a1 (what LSN does a1 put on P?)
• Why not use a1’s LSN?
– must latch all of T1’s updated pages before logging a1
– else, some w3[z] on P could be logged after a1 but be
executed before a1, leaving a1’s LSN on P instead of
w3[z]’s.
1/18/12
32
Logging Undo’s
• Log the undo(U) operation, and use its LSN on P
– CLR = Compensation Log Record = a logged undo
– Do this for all undo’s (during normal abort or recovery)
• This preserves the invariant that the LSN on each page P
exactly describes P’s state relative to the log.
– P contains all updates to P up to and including the LSN
on P, and no updates with larger LSN.
• So every aborted transaction’s log is a palindrome
of update records and undo records.
• Restart processes Commit and Abort the same way
– It redoes the transaction’s log records.
– It only aborts active transactions after the forward scan1/18/12
33
Logging Undo’s (cont’d)
• Tricky issues
– Multi-page updates (it’s best to avoid them)
– Restart grows the log by logging undos.
Each time it crashes, it has more log to process
• Optimization - CLR points to the transaction’s log
record preceding the corresponding “do”.
– Splices out undone work
– Avoids undoing undone work during abort
– Avoids growing the log due to aborts during Restart
DoA1 DoB1 DoC1 UndoC1 UndoB1... ... ... ... ...
1/18/12
34
Restart Algorithm (rev 3)
• Starting with the penultimate checkpoint, scan forward in
the log.
– Maintain list of active transactions (initialized to content
of checkpoint record).
– Redo an update record U for page P only if
LSN(P) < LSN(U).
– After you’re done scanning, abort all active transactions.
Log undos while aborting. Log an abort record when
you’re done aborting.
• This style of record logging, logging undo’s, and replaying
history during restart was popularized in the ARIES
algorithm by Mohan et al at IBM, published in 1992.
1/18/12
35
Analysis Pass
• Log flush record after a flush occurs (to avoid redo)
• To improve redo efficiency, pre-analyze the log
– Requires accessing only the log, not the database
• Build a Dirty Page Table that contains list of dirty pages
and, for each page, the oldestLSN that must be redone
– Flush(P) says to delete P from Dirty Page Table
– Write(P) adds P to Dirty Page Table, if it isn’t there
– Include Dirty Page Table in checkpoint records
– Start at last checkpt record, scan forward building the table
• Also build list of active txns with lastLSN
1/18/12
36
Analysis Pass (cont’d)
• Start redo at oldest oldestLSN in Dirty Page Table
– Then scan forward in the log, as usual
– Only redo records that might need it,
that is, those where LSN(redo record)  oldestLSN,
hence there’s no later flush record
– Also use Dirty Page Table to guide page prefetching
• Prefetch pages in oldestLSN order in Dirty Page Table
1/18/12
37
Logging B-Tree Operations
• To split a page
– log records deleted from the first page (for undo)
– log records inserted to the second page (for redo)
– they’re the same records, so long them once!
• This doubles the amount of log used for inserts
– log the inserted data when the record is first inserted
– if a page has N records, log N/2 records, every time a
page is split, which occurs once for every N/2 insertions
1/18/12
38
User-level Optimizations
• If checkpoint frequency is controllable,
then run some experiments.
• Partition DB across more disks to reduce
restart time (if Restart is multithreaded).
• Increase resources (e.g. cache) available to
restart program.
1/18/12
39
Shared Disk System
• Use version number on the page and in the lock
Process A
P
r2
Process B
P
r7
P
• Can cache a page in two processes
that write-lock different records
• Only one process at a time can
have write privilege
• Use a global lock manager
• When setting a write lock on P,
may need to refresh the cached
copy from disk (if another process
recently updated it)
1/18/12
40
Shared Disk System
• When a process sets the lock, it tells the lock
manager version number of its cached page.
• A process increments the version number the first
time it updates a cached page.
• When a process is done with an updated page, it
flushes the page to disk and then increments
version number in the lock.
• Need a shared log manager, possibly with local
caching in each machine.
1/18/12
41
4. Media Failures
• A media failure is the loss of some of stable storage.
• Most disks have MTBF over 10 years.
• Still, if you have 10 disks ...
• So shadowed disks are important.
– Writes go to both copies. Handshake between Writes to
avoid common failure modes (e.g. power failure).
– Service each read from one copy.
• To bring up a new shadow
– Copy tracks from good disk to new disk, one at a time.
– A Write goes to both disks if the track has been copied.
– A read goes to the good disk, until the track is copied.1/18/12
42
RAID
• RAID - redundant array of inexpensive disks
– Use an array of N disks in parallel
– A stripe is an array of the ith block from each disk
– A stripe is partitioned as follows:
... ...
M data blocks N-M error
correction blocks
• Each stripe is one logical block, which can
survive a single-disk failure.
1/18/12
43
Where to Use Disk Redundancy?
• Preferably for both the DB and log.
• But at least for the log
– In an undo algorithm, it’s the only place that
has certain before images.
– In a redo algorithm, it’s the only place that has
certain after images.
• If you don’t shadow the log, it’s a single
point of failure.
1/18/12
44
Archiving
• An archive is a database snapshot used for media recovery.
– Load the archive and redo the log
• To take an archive snapshot
– write a start-archive record to the log
– copy the DB to an archive medium
– write an end-archive record to the log
(or simply mark the archive as complete)
• So, the end-archive record says that all updates before the
start-archive record are in the archive
• Can use the standard LSN-based Restart algorithm to
recover an archive copy relative to the log.
1/18/12
45
Archiving (cont’d)
• To archive the log, use 2 pairs of shadowed disks. Dump
one pair to archive (e.g. tape) while using the other pair for
on-line logging. (I.e. ping-pong to avoid disk contention)
– Optimization - only archive committed pages and
purge undo information from the log before archiving
• To do incremental archive, use an archive bit in each page.
– Each page update sets the bit.
– To archive, copies pages with the bit set, then clear it.
• To reduce media recovery time
– rebuild archive from incremental copies
– partition log to enable fast recovery of a few corrupted
pages1/18/12

More Related Content

What's hot

[Altibase] 13 backup and recovery
[Altibase] 13 backup and recovery[Altibase] 13 backup and recovery
[Altibase] 13 backup and recoveryaltistory
 
[Altibase] 12 replication part5 (optimization and monitoring)
[Altibase] 12 replication part5 (optimization and monitoring)[Altibase] 12 replication part5 (optimization and monitoring)
[Altibase] 12 replication part5 (optimization and monitoring)altistory
 
System management with rpm and yadt
System management with rpm and yadtSystem management with rpm and yadt
System management with rpm and yadtIngmar Krusch
 
Database recovery techniques
Database recovery techniquesDatabase recovery techniques
Database recovery techniquespusp220
 
Dbms ii mca-ch11-recovery-2013
Dbms ii mca-ch11-recovery-2013Dbms ii mca-ch11-recovery-2013
Dbms ii mca-ch11-recovery-2013Prosanta Ghosh
 
17. Recovery System in DBMS
17. Recovery System in DBMS17. Recovery System in DBMS
17. Recovery System in DBMSkoolkampus
 
Data base recovery
Data base recoveryData base recovery
Data base recoveryVisakh V
 
10 Problems with your RMAN backup script
10 Problems with your RMAN backup script10 Problems with your RMAN backup script
10 Problems with your RMAN backup scriptYury Velikanov
 
Lesson12 recovery architectures
Lesson12 recovery architecturesLesson12 recovery architectures
Lesson12 recovery architecturesRaval Vijay
 
database recovery techniques
database recovery techniques database recovery techniques
database recovery techniques Kalhan Liyanage
 
Collaborate 2012 - RMAN Eliminate the mystery
Collaborate 2012 - RMAN Eliminate the mysteryCollaborate 2012 - RMAN Eliminate the mystery
Collaborate 2012 - RMAN Eliminate the mysteryNelson Calero
 
Topic 4 database recovery
Topic 4 database recoveryTopic 4 database recovery
Topic 4 database recoveryacap paei
 
Rtos princples adn case study
Rtos princples adn case studyRtos princples adn case study
Rtos princples adn case studyvanamali_vanu
 
Database recovery
Database recoveryDatabase recovery
Database recoveryStudent
 

What's hot (20)

[Altibase] 13 backup and recovery
[Altibase] 13 backup and recovery[Altibase] 13 backup and recovery
[Altibase] 13 backup and recovery
 
[Altibase] 12 replication part5 (optimization and monitoring)
[Altibase] 12 replication part5 (optimization and monitoring)[Altibase] 12 replication part5 (optimization and monitoring)
[Altibase] 12 replication part5 (optimization and monitoring)
 
System management with rpm and yadt
System management with rpm and yadtSystem management with rpm and yadt
System management with rpm and yadt
 
Ch8
Ch8Ch8
Ch8
 
Database recovery techniques
Database recovery techniquesDatabase recovery techniques
Database recovery techniques
 
Aries
AriesAries
Aries
 
Recovery
RecoveryRecovery
Recovery
 
Unit 07 dbms
Unit 07 dbmsUnit 07 dbms
Unit 07 dbms
 
Dbms ii mca-ch11-recovery-2013
Dbms ii mca-ch11-recovery-2013Dbms ii mca-ch11-recovery-2013
Dbms ii mca-ch11-recovery-2013
 
17. Recovery System in DBMS
17. Recovery System in DBMS17. Recovery System in DBMS
17. Recovery System in DBMS
 
Data base recovery
Data base recoveryData base recovery
Data base recovery
 
10 Problems with your RMAN backup script
10 Problems with your RMAN backup script10 Problems with your RMAN backup script
10 Problems with your RMAN backup script
 
Lesson12 recovery architectures
Lesson12 recovery architecturesLesson12 recovery architectures
Lesson12 recovery architectures
 
database recovery techniques
database recovery techniques database recovery techniques
database recovery techniques
 
Collaborate 2012 - RMAN Eliminate the mystery
Collaborate 2012 - RMAN Eliminate the mysteryCollaborate 2012 - RMAN Eliminate the mystery
Collaborate 2012 - RMAN Eliminate the mystery
 
Topic 4 database recovery
Topic 4 database recoveryTopic 4 database recovery
Topic 4 database recovery
 
Rtos princples adn case study
Rtos princples adn case studyRtos princples adn case study
Rtos princples adn case study
 
Recovery system
Recovery systemRecovery system
Recovery system
 
Database recovery
Database recoveryDatabase recovery
Database recovery
 
ARIES Recovery Algorithms
ARIES Recovery AlgorithmsARIES Recovery Algorithms
ARIES Recovery Algorithms
 

Viewers also liked

19 structured files
19 structured files19 structured files
19 structured filesashish61_scs
 
The buransh mahotsav_report
The buransh mahotsav_report The buransh mahotsav_report
The buransh mahotsav_report Prasanna Kapoor
 
5 application serversforproject
5 application serversforproject5 application serversforproject
5 application serversforprojectashish61_scs
 
14 scaleabilty wics
14 scaleabilty wics14 scaleabilty wics
14 scaleabilty wicsashish61_scs
 
04 transaction models
04 transaction models04 transaction models
04 transaction modelsashish61_scs
 
Digital: An Inflection Point for Mankind
Digital: An Inflection Point for MankindDigital: An Inflection Point for Mankind
Digital: An Inflection Point for MankindBernard Panes
 
10 Ways to be a great Solution Architect
10 Ways to be a great Solution Architect10 Ways to be a great Solution Architect
10 Ways to be a great Solution ArchitectBernard Panes
 
Mostafa Wael Farouk -Techno-Functional
Mostafa Wael Farouk -Techno-FunctionalMostafa Wael Farouk -Techno-Functional
Mostafa Wael Farouk -Techno-FunctionalMostafa Wael
 

Viewers also liked (15)

19 structured files
19 structured files19 structured files
19 structured files
 
Ibrahim Thesis
Ibrahim ThesisIbrahim Thesis
Ibrahim Thesis
 
01 whirlwind tour
01 whirlwind tour01 whirlwind tour
01 whirlwind tour
 
The buransh mahotsav_report
The buransh mahotsav_report The buransh mahotsav_report
The buransh mahotsav_report
 
5 application serversforproject
5 application serversforproject5 application serversforproject
5 application serversforproject
 
Jeopardy (cecilio)
Jeopardy (cecilio)Jeopardy (cecilio)
Jeopardy (cecilio)
 
14 scaleabilty wics
14 scaleabilty wics14 scaleabilty wics
14 scaleabilty wics
 
04 transaction models
04 transaction models04 transaction models
04 transaction models
 
Digital: An Inflection Point for Mankind
Digital: An Inflection Point for MankindDigital: An Inflection Point for Mankind
Digital: An Inflection Point for Mankind
 
Assignment.1
Assignment.1Assignment.1
Assignment.1
 
Solution6.2012
Solution6.2012Solution6.2012
Solution6.2012
 
10 Ways to be a great Solution Architect
10 Ways to be a great Solution Architect10 Ways to be a great Solution Architect
10 Ways to be a great Solution Architect
 
Solution8 v2
Solution8 v2Solution8 v2
Solution8 v2
 
11 bpm
11 bpm11 bpm
11 bpm
 
Mostafa Wael Farouk -Techno-Functional
Mostafa Wael Farouk -Techno-FunctionalMostafa Wael Farouk -Techno-Functional
Mostafa Wael Farouk -Techno-Functional
 

Similar to 4 db recovery

Backing Up and Recovery
Backing Up and RecoveryBacking Up and Recovery
Backing Up and RecoveryMaham Huda
 
ch 5 Daatabase Recovery.ppt
ch 5 Daatabase Recovery.pptch 5 Daatabase Recovery.ppt
ch 5 Daatabase Recovery.pptAdemeCheklie
 
Presentation recovery manager (rman) configuration and performance tuning ...
Presentation    recovery manager (rman) configuration and performance tuning ...Presentation    recovery manager (rman) configuration and performance tuning ...
Presentation recovery manager (rman) configuration and performance tuning ...xKinAnx
 
10 ways to improve your rman script
10 ways to improve your rman script10 ways to improve your rman script
10 ways to improve your rman scriptMaris Elsins
 
Databases: Backup and Recovery
Databases: Backup and RecoveryDatabases: Backup and Recovery
Databases: Backup and RecoveryDamian T. Gordon
 
recovery system
recovery systemrecovery system
recovery systemshreeuva
 
18 philbe replication stanford99
18 philbe replication stanford9918 philbe replication stanford99
18 philbe replication stanford99ashish61_scs
 
Apache Apex Fault Tolerance and Processing Semantics
Apache Apex Fault Tolerance and Processing SemanticsApache Apex Fault Tolerance and Processing Semantics
Apache Apex Fault Tolerance and Processing SemanticsApache Apex
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataHakka Labs
 
The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...
The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...
The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...Dave Stokes
 
Lecture1414_20592_Lecture1419_Transactions.ppt (2).pdf
Lecture1414_20592_Lecture1419_Transactions.ppt (2).pdfLecture1414_20592_Lecture1419_Transactions.ppt (2).pdf
Lecture1414_20592_Lecture1419_Transactions.ppt (2).pdfbadboy624277
 
MySQL Performance - Best practices
MySQL Performance - Best practices MySQL Performance - Best practices
MySQL Performance - Best practices Ted Wennmark
 
Proper Care and Feeding of a MySQL Database for Busy Linux Administrators
Proper Care and Feeding of a MySQL Database for Busy Linux AdministratorsProper Care and Feeding of a MySQL Database for Busy Linux Administrators
Proper Care and Feeding of a MySQL Database for Busy Linux AdministratorsDave Stokes
 
16. PagingImplementIssused.pptx
16. PagingImplementIssused.pptx16. PagingImplementIssused.pptx
16. PagingImplementIssused.pptxMyName1sJeff
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataRyan Bosshart
 

Similar to 4 db recovery (20)

Backing Up and Recovery
Backing Up and RecoveryBacking Up and Recovery
Backing Up and Recovery
 
ch-5 advanced db.pdf
ch-5 advanced db.pdfch-5 advanced db.pdf
ch-5 advanced db.pdf
 
5-Recovery.ppt
5-Recovery.ppt5-Recovery.ppt
5-Recovery.ppt
 
DBMS Vardhaman.pdf
DBMS Vardhaman.pdfDBMS Vardhaman.pdf
DBMS Vardhaman.pdf
 
ch 5 Daatabase Recovery.ppt
ch 5 Daatabase Recovery.pptch 5 Daatabase Recovery.ppt
ch 5 Daatabase Recovery.ppt
 
Presentation recovery manager (rman) configuration and performance tuning ...
Presentation    recovery manager (rman) configuration and performance tuning ...Presentation    recovery manager (rman) configuration and performance tuning ...
Presentation recovery manager (rman) configuration and performance tuning ...
 
10 ways to improve your rman script
10 ways to improve your rman script10 ways to improve your rman script
10 ways to improve your rman script
 
Operating System
Operating SystemOperating System
Operating System
 
Databases: Backup and Recovery
Databases: Backup and RecoveryDatabases: Backup and Recovery
Databases: Backup and Recovery
 
recovery system
recovery systemrecovery system
recovery system
 
18 philbe replication stanford99
18 philbe replication stanford9918 philbe replication stanford99
18 philbe replication stanford99
 
Apache Apex Fault Tolerance and Processing Semantics
Apache Apex Fault Tolerance and Processing SemanticsApache Apex Fault Tolerance and Processing Semantics
Apache Apex Fault Tolerance and Processing Semantics
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...
The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...
The Proper Care and Feeding of a MySQL Database for Busy Linux Admins -- SCaL...
 
Lecture1414_20592_Lecture1419_Transactions.ppt (2).pdf
Lecture1414_20592_Lecture1419_Transactions.ppt (2).pdfLecture1414_20592_Lecture1419_Transactions.ppt (2).pdf
Lecture1414_20592_Lecture1419_Transactions.ppt (2).pdf
 
MySQL Performance - Best practices
MySQL Performance - Best practices MySQL Performance - Best practices
MySQL Performance - Best practices
 
Proper Care and Feeding of a MySQL Database for Busy Linux Administrators
Proper Care and Feeding of a MySQL Database for Busy Linux AdministratorsProper Care and Feeding of a MySQL Database for Busy Linux Administrators
Proper Care and Feeding of a MySQL Database for Busy Linux Administrators
 
16. PagingImplementIssused.pptx
16. PagingImplementIssused.pptx16. PagingImplementIssused.pptx
16. PagingImplementIssused.pptx
 
Optimizing Linux Servers
Optimizing Linux ServersOptimizing Linux Servers
Optimizing Linux Servers
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast Data
 

More from ashish61_scs (20)

7 concurrency controltwo
7 concurrency controltwo7 concurrency controltwo
7 concurrency controltwo
 
Transactions
TransactionsTransactions
Transactions
 
22 levine
22 levine22 levine
22 levine
 
21 domino mohan-1
21 domino mohan-121 domino mohan-1
21 domino mohan-1
 
20 access paths
20 access paths20 access paths
20 access paths
 
17 wics99 harkey
17 wics99 harkey17 wics99 harkey
17 wics99 harkey
 
16 greg hope_com_wics
16 greg hope_com_wics16 greg hope_com_wics
16 greg hope_com_wics
 
15 bufferand records
15 bufferand records15 bufferand records
15 bufferand records
 
14 turing wics
14 turing wics14 turing wics
14 turing wics
 
13 tm adv
13 tm adv13 tm adv
13 tm adv
 
11 tm
11 tm11 tm
11 tm
 
10b rm
10b rm10b rm
10b rm
 
10a log
10a log10a log
10a log
 
09 workflow
09 workflow09 workflow
09 workflow
 
08 message and_queues_dieter_gawlick
08 message and_queues_dieter_gawlick08 message and_queues_dieter_gawlick
08 message and_queues_dieter_gawlick
 
06 07 lock
06 07 lock06 07 lock
06 07 lock
 
05 tp mon_orbs
05 tp mon_orbs05 tp mon_orbs
05 tp mon_orbs
 
03 fault model
03 fault model03 fault model
03 fault model
 
02 fault tolerance
02 fault tolerance02 fault tolerance
02 fault tolerance
 
Solution5.2012
Solution5.2012Solution5.2012
Solution5.2012
 

Recently uploaded

HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxleah joy valeriano
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsManeerUddin
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 

Recently uploaded (20)

HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture hons
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 

4 db recovery

  • 1. 1/18/12 1 4. Database System Recovery CSEP 545 Transaction Processing for E-Commerce Philip A. Bernstein Copyright ©2012 Philip A. Bernstein
  • 2. 1/18/12 2 Outline 1. Introduction 2. Recovery Manager 3. Two Non-Logging Algorithms 4. Log-based Recovery 5. Media Failure
  • 3. 3 1. Introduction • A database may become inconsistent because of – Transaction failure (abort) – Database system failure (possibly caused by OS crash) – Media crash (disk-resident data is corrupted) • The recovery system ensures the database contains exactly those updates produced by committed transactions – I.e. atomicity and durability, despite failures 1/18/12
  • 4. 4 Assumptions • Two-phase locking, holding write locks until after a transaction commits. This implies – Recoverability – No cascading aborts – Strictness (never overwrite uncommitted data) • Page-level everything (for now) – Database is a set of pages – Page-granularity locks – A transaction’s read or write operation operates on an entire page – We’ll look at record granularity later 1/18/12
  • 5. 5 Storage Model • Stable database - survives system failures • Cache (volatile) - contains copies of some pages, which are lost by a system failure Stable Database Log Read, Write Fetch, Flush Pin, Unpin, Deallocate Cache Manager Cache Read, Write 1/18/12
  • 6. 6 Stable Storage • Write(P) overwrites the entire contents of P on the disk • If Write is unsuccessful, the error might be detected on the next read ... – e.g. page checksum error => page is corrupted • … or maybe not – Write correctly wrote to the wrong location • Write is the only operation that’s atomic with respect to failures and whose successful execution can be determined by recovery procedures. 1/18/12
  • 7. 7 The Cache • Cache is divided into page-sized slots. • Dirty bit tells if the page was updated since it was last written to disk. • Pin count tells number of pin ops without unpins Page Dirty Bit Cache Address Pin Count P2 1 91976 1 P47 0 812 2 P21 1 10101 0 • Fetch(P) - read P into a cache slot. Return slot address. • Flush(P) - If P’s slot is dirty and unpinned, then write it to disk (i.e. return after the disk acks). 1/18/12
  • 8. 8 The Cache (cont’d) • Pin(P) - make P’s slot non-flushable & non-replaceable. – Non-flushable because P’s content may be inconsistent. – Non-replaceable because someone has a pointer into P or is accessing P’s content. • Unpin(P) - release it. • Deallocate(P) - allow P’s slot to be reused (even if dirty). 1/18/12
  • 9. 9 Big Picture Database System Query Optimizer Query Executor Access Method (record-oriented files) Page-oriented Files Database Recovery manager Cache manager Page file manager Fetch, Flush Pin, Unpin, Deallocate • Record manager is the main user of the cache manager. • It calls Fetch(P) and Pin(P) to ensure the page is in main memory, non-flushable, and non-replaceable. 1/18/12
  • 10. 10 Latches • A page is a data structure with many fields. • A latch is a short-term lock that gives its owner access to a page in main memory. • A read latch allows the owner to read the content. • A write latch allows the owner to modify the content. • The latch is usually a bit in a control structure, not an entry in the lock manager. It can be set and released much faster than a lock. • There’s no deadlock detection for latches. 1/18/12
  • 11. 11 The Log • A sequential file of records describing updates: – Address of updated page. – Id of transaction that did the update. – Before-image and after-image of the page. • Whenever you update the cache, also update the log. • Log records for Commit(Ti) and Abort(Ti). • Some older systems separated before-images and after-images into separate log files. • If opi conflicts with and executes before opk, then opi’s log record must precede opk’s log record. – Recovery will replay operations in log-record-order.1/18/12
  • 12. 12 The Log (cont’d) • To update records on a page: – Fetch(P) read P into cache – Pin(P) ensure P isn’t flushed – write lock (P) for two-phase locking – write latch (P) get exclusive access to P – update P update P in cache – log the update to P append it to the log – unlatch (P) release exclusive access – Unpin(P) allow P to be flushed 1/18/12
  • 13. 13 2. Recovery Manager • Processes Commit, Abort and Restart • Commit(T) – Write T’s updated pages to stable storage atomically, even if the system crashes • Abort(T) – Undo the effects of T’s writes • Restart = recover from system failure – Abort all transactions that were not committed at the time of the previous failure – Fix stable storage so it includes all committed writes and no uncommitted ones (so it can be read by new txns) 1/18/12
  • 14. 14 Recovery Manager Model Stable Database Log Read, Write Pin, Unpin Fetch Cache Manager Cache Read, Write Recovery Manager Flush Deallocate Transaction 1 Transaction 2 Transaction N Commit, Abort, Restart Read, Write Flush, dealloc for normal operat’n Restart uses Fetch, Pin, Unpin1/18/12
  • 15. 15 Implementing Abort(T) • Suppose T wrote page P. • If P was not transferred to stable storage, then deallocate its cache slot. • If it was transferred, then P’s before-image must be in stable storage (else you couldn’t undo after a system failure). • Undo Rule - Do not flush an uncommitted update of P until P’s before-image is stable. (Ensures undo is possible.) – Write-Ahead Log Protocol - Do not … until P’s before-image is in the log.1/18/12
  • 16. 16 Avoiding Undo • Avoid the problem implied by the Undo Rule by never flushing uncommitted updates. – Avoids stable logging of before-images. – Don’t need to undo updates after a system failure. • A recovery algorithm requires undo if an update of an uncommitted transaction can be flushed. – Usually called a steal algorithm, because it allows a dirty cache page to be “stolen.” 1/18/12
  • 17. 17 Implementing Commit(T) • Commit must be atomic. So it must be implemented by a disk write. • Suppose T wrote P, T committed, and then the system fails. P must be in stable storage. • Redo rule - Don’t commit a transaction until the after-images of all pages it wrote are in stable storage (in the database or log). (Ensures redo is possible.) – Often called the Force-At-Commit rule. 1/18/12
  • 18. 18 Avoiding Redo • To avoid redo, flush all of T’s updates to the stable database before it commits. (They must be in stable storage.) – Usually called a Force algorithm, because updates are forced to disk before commit. – It’s easy, because you don’t need stable bookkeeping of after-images. – But it’s inefficient for hot pages. (Consider TPC-A/B.) • Conversely, a recovery algorithm requires redo if a transaction may commit before all of its updates are in the stable database. 1/18/12
  • 19. 19 Avoiding Undo and Redo? • To avoid both undo and redo – Never flush uncommitted updates (to avoid undo), and – Flush all of T’s updates to the stable database before it commits (to avoid redo). • Thus, it requires installing all of a transaction’s updates into the stable database in one write to disk • It can be done, but it isn’t efficient for short transactions and record-level updates. – Use shadow paging. 1/18/12
  • 20. 20 Implementing Restart • To recover from a system failure – Abort transactions that were active at the failure. – For every committed transaction, redo updates that are in the log but not the stable database. – Resume normal processing of transactions. • Idempotent operation - many executions of the operation have the same effect as one execution. • Restart must be idempotent. If it’s interrupted by a failure, then it re-executes from the beginning. • Restart contributes to unavailability. So make it fast! 1/18/12
  • 21. 21 3. Log-based Recovery • Logging is the most popular mechanism for implementing recovery algorithms. • The recovery manager implements – Commit - by writing a commit record to the log and flushing the log (satisfies the Redo Rule). – Abort - by using the transaction’s log records to restore before-images. – Restart - by scanning the log and undoing and redoing operations as necessary. • The algorithms are fast since they use sequential log I/O in place of random database I/O. They greatly affect TP and Restart performance.1/18/12
  • 22. 22 Implementing Commit • Every commit requires a log flush. • If you can do K log flushes per second, then K is your maximum transaction throughput. • Group Commit Optimization - when processing commit, if the last log page isn’t full, delay the flush to give it time to fill. • If there are multiple data managers on a system, then each data mgr must flush its log to commit. – If each data mgr isn’t using its log’s update bandwidth, then a shared log saves log flushes. – A good idea, but rarely supported commercially.1/18/12
  • 23. 23 Implementing Abort • To implement Abort(T), scan T’s log records and install before images. • To speed up Abort, back-chain each transaction’s update records. Transaction Descriptors Transaction last log record T7 Start of Log End of Log Ti Pk null pointer Ti Pm backpointer Ti’s first log record 1/18/12
  • 24. 24 Satisfying the Undo Rule • To implement the Write-Ahead Log Protocol, tag each cache slot with the log sequence number (LSN) of the last update record to that slot’s page. Page Dirty Cache Pin LSN Bit Address Count P47 1 812 2 P21 1 10101 0 Log Start End On disk Main Memory • Cache manager won’t flush a page P until P’s last updated record, pointed to by LSN, is on disk. • P’s last log record is usually stable before Flush(P), so this rarely costs an extra flush • LSN must be updated while latch is held on P’s slot1/18/12
  • 25. 25 Implementing Restart (rev 1) • Assume undo and redo are required. • Scan the log backwards, starting at the end. – How do you find the end? • Construct a commit list and recovered-page-list during the scan (assuming page level logging). • Commit(T) record => add T to commit list • Update record for P by T – if P is not in the recovered-page-list then • Add P to the recovered-page-list. • If T is in the commit list, then redo the update, else undo the update.
  • 26. 26 Checkpoints • Problem - Prevent Restart from scanning back to the start of the log • A checkpoint is a procedure to limit the amount of work for Restart • Cache-consistent checkpointing – Stop accepting new update, commit, and abort operations – Make list of [active transaction, pointer to last log record] – Flush all dirty pages – Append a checkpoint record to log; include the list – Resume normal processing • Database and log are now mutually consistent 1/18/12
  • 27. 27 Restart Algorithm (rev 2) • No need to redo records before last checkpoint, so – Starting with the last checkpoint, scan forward in the log. – Redo all update records. Process all aborts. Maintain list of active transactions (initialized to content of checkpoint record). – After you’re done scanning, abort all active transactions. • Restart time is proportional to the amount of log after the last checkpoint. • Reduce restart time by checkpointing frequently. • Thus, checkpointing must be cheap. 1/18/12
  • 28. 28 Fuzzy Checkpointing • Make checkpoints cheap by avoiding synchronized flushing of dirty cache at checkpoint time. – Stop accepting new update, commit, and abort operations – Make a list of all dirty pages in cache – Make list of [active transaction, pointer to last log record] – Append a checkpoint record to log; include the list – Resume normal processing – Initiate low priority flush of all dirty pages • Don’t checkpoint again until all of the last checkpoint’s dirty pages are flushed. • Restart begins at second-to-last (penultimate) checkpoint. • Checkpoint frequency depends on disk bandwidth.1/18/12
  • 29. 29 Operation Logging • Record locking requires (at least) record logging. – Suppose records x and y are on page P – w1[x] w2[y] abort1 commit2 (not strict w.r.t. pages) • Record logging requires Restart to read a page before updating it. This reduces log size. • Further reduce log size by logging description of an update, not the entire before/after image of record. – Only log after-image of an insertion – Only log fields being updated • Now Restart can’t blindly redo. – E.g., it must not insert a record twice 1/18/12
  • 30. 30 LSN-based logging • Each database page P’s header has the LSN of the last log record whose operation updated P. • Restart compares log record and page LSN before redoing the log record’s update U. – Redo the update only if LSN(P) < LSN(U) • Undo is a problem. If U’s transaction aborts and you undo U, what LSN to put on the page? – Suppose T1 and T2 update records x and y on P – w1[x] w2[y] c2 a1 (what LSN does a1 put on P?) – not LSN before w1[x] (which says w2[y] didn’t run) – not w2[y] (which says w1[x] wasn’t aborted) 1/18/12
  • 31. 31 LSN-based logging (cont’d) • w1[x] w2[y] c2 a1 (what LSN does a1 put on P?) • Why not use a1’s LSN? – must latch all of T1’s updated pages before logging a1 – else, some w3[z] on P could be logged after a1 but be executed before a1, leaving a1’s LSN on P instead of w3[z]’s. 1/18/12
  • 32. 32 Logging Undo’s • Log the undo(U) operation, and use its LSN on P – CLR = Compensation Log Record = a logged undo – Do this for all undo’s (during normal abort or recovery) • This preserves the invariant that the LSN on each page P exactly describes P’s state relative to the log. – P contains all updates to P up to and including the LSN on P, and no updates with larger LSN. • So every aborted transaction’s log is a palindrome of update records and undo records. • Restart processes Commit and Abort the same way – It redoes the transaction’s log records. – It only aborts active transactions after the forward scan1/18/12
  • 33. 33 Logging Undo’s (cont’d) • Tricky issues – Multi-page updates (it’s best to avoid them) – Restart grows the log by logging undos. Each time it crashes, it has more log to process • Optimization - CLR points to the transaction’s log record preceding the corresponding “do”. – Splices out undone work – Avoids undoing undone work during abort – Avoids growing the log due to aborts during Restart DoA1 DoB1 DoC1 UndoC1 UndoB1... ... ... ... ... 1/18/12
  • 34. 34 Restart Algorithm (rev 3) • Starting with the penultimate checkpoint, scan forward in the log. – Maintain list of active transactions (initialized to content of checkpoint record). – Redo an update record U for page P only if LSN(P) < LSN(U). – After you’re done scanning, abort all active transactions. Log undos while aborting. Log an abort record when you’re done aborting. • This style of record logging, logging undo’s, and replaying history during restart was popularized in the ARIES algorithm by Mohan et al at IBM, published in 1992. 1/18/12
  • 35. 35 Analysis Pass • Log flush record after a flush occurs (to avoid redo) • To improve redo efficiency, pre-analyze the log – Requires accessing only the log, not the database • Build a Dirty Page Table that contains list of dirty pages and, for each page, the oldestLSN that must be redone – Flush(P) says to delete P from Dirty Page Table – Write(P) adds P to Dirty Page Table, if it isn’t there – Include Dirty Page Table in checkpoint records – Start at last checkpt record, scan forward building the table • Also build list of active txns with lastLSN 1/18/12
  • 36. 36 Analysis Pass (cont’d) • Start redo at oldest oldestLSN in Dirty Page Table – Then scan forward in the log, as usual – Only redo records that might need it, that is, those where LSN(redo record)  oldestLSN, hence there’s no later flush record – Also use Dirty Page Table to guide page prefetching • Prefetch pages in oldestLSN order in Dirty Page Table 1/18/12
  • 37. 37 Logging B-Tree Operations • To split a page – log records deleted from the first page (for undo) – log records inserted to the second page (for redo) – they’re the same records, so long them once! • This doubles the amount of log used for inserts – log the inserted data when the record is first inserted – if a page has N records, log N/2 records, every time a page is split, which occurs once for every N/2 insertions 1/18/12
  • 38. 38 User-level Optimizations • If checkpoint frequency is controllable, then run some experiments. • Partition DB across more disks to reduce restart time (if Restart is multithreaded). • Increase resources (e.g. cache) available to restart program. 1/18/12
  • 39. 39 Shared Disk System • Use version number on the page and in the lock Process A P r2 Process B P r7 P • Can cache a page in two processes that write-lock different records • Only one process at a time can have write privilege • Use a global lock manager • When setting a write lock on P, may need to refresh the cached copy from disk (if another process recently updated it) 1/18/12
  • 40. 40 Shared Disk System • When a process sets the lock, it tells the lock manager version number of its cached page. • A process increments the version number the first time it updates a cached page. • When a process is done with an updated page, it flushes the page to disk and then increments version number in the lock. • Need a shared log manager, possibly with local caching in each machine. 1/18/12
  • 41. 41 4. Media Failures • A media failure is the loss of some of stable storage. • Most disks have MTBF over 10 years. • Still, if you have 10 disks ... • So shadowed disks are important. – Writes go to both copies. Handshake between Writes to avoid common failure modes (e.g. power failure). – Service each read from one copy. • To bring up a new shadow – Copy tracks from good disk to new disk, one at a time. – A Write goes to both disks if the track has been copied. – A read goes to the good disk, until the track is copied.1/18/12
  • 42. 42 RAID • RAID - redundant array of inexpensive disks – Use an array of N disks in parallel – A stripe is an array of the ith block from each disk – A stripe is partitioned as follows: ... ... M data blocks N-M error correction blocks • Each stripe is one logical block, which can survive a single-disk failure. 1/18/12
  • 43. 43 Where to Use Disk Redundancy? • Preferably for both the DB and log. • But at least for the log – In an undo algorithm, it’s the only place that has certain before images. – In a redo algorithm, it’s the only place that has certain after images. • If you don’t shadow the log, it’s a single point of failure. 1/18/12
  • 44. 44 Archiving • An archive is a database snapshot used for media recovery. – Load the archive and redo the log • To take an archive snapshot – write a start-archive record to the log – copy the DB to an archive medium – write an end-archive record to the log (or simply mark the archive as complete) • So, the end-archive record says that all updates before the start-archive record are in the archive • Can use the standard LSN-based Restart algorithm to recover an archive copy relative to the log. 1/18/12
  • 45. 45 Archiving (cont’d) • To archive the log, use 2 pairs of shadowed disks. Dump one pair to archive (e.g. tape) while using the other pair for on-line logging. (I.e. ping-pong to avoid disk contention) – Optimization - only archive committed pages and purge undo information from the log before archiving • To do incremental archive, use an archive bit in each page. – Each page update sets the bit. – To archive, copies pages with the bit set, then clear it. • To reduce media recovery time – rebuild archive from incremental copies – partition log to enable fast recovery of a few corrupted pages1/18/12