SlideShare ist ein Scribd-Unternehmen logo
1 von 42
ARIES Recovery Algorithm
ARIES: A Transaction Recovery Method
Supporting Fine Granularity Locking and Partial
Rollback Using Write-Ahead Logging
C. Mohan, D. Haderle, B. Lindsay,
H. Pirahesh, and P. Schwarz
ACM Transactions on Database Systems, 17(1), 1992

Slides prepared by
S. Sudarshan
1

©Silberschatz, Korth and Sudarshan
Recovery Scheme Metrics
Concurrency
Functionality
Complexity
Overheads:
 Space and I/O (Seq and random) during
Normal processing and recovery

Failure Modes:
 transaction/process, system and media/device

2

©Silberschatz, Korth and Sudarshan
Key Features of Aries

Physical Logging, and
Operation logging
 e.g. Add 5 to A,

or

insert K in B-tree B

Page oriented redo
 recovery independence amongst objects

Logical undo (may span multiple pages)
WAL + Inplace Updates

3

©Silberschatz, Korth and Sudarshan
Key Aries Features (contd)
Transaction Rollback
 Total vs partial (up to a savepoint)
 Nested rollback - partial rollback followed by another
(partial/total) rollback

Fine-grain concurrency control
 supports tuple level locks on records, and key value locks on
indices

4

©Silberschatz, Korth and Sudarshan
More Aries Features
Flexible storage management
 Physiological redo logging:
 logical operation within a single page
 no need to log intra-page data movement for compaction
 LSN used to avoid repeated redos (more on LSNs later)

Recovery independence
 can recover some pages separately from others

Fast recovery and parallelism

5

©Silberschatz, Korth and Sudarshan
Latches and Locks
Latches
 used to guarantee physical consistency
 short duration
 no deadlock detection
 direct addressing (unlike hash table for locks)
 often using atomic instructions
 latch acquisition/release is much faster than lock

acquisition/release

Lock requests
 conditional, instant duration, manual duration, commit duration

6

©Silberschatz, Korth and Sudarshan
Buffer Manager
Fix, unfix and fix_new (allocate and fix new pg)
Aries uses steal policy - uncommitted writes may be
output to disk (contrast with no-steal policy)
Aries uses no-force policy (updated pages need not
be forced to disk before commit)
dirty page: buffer version has updated not yet reflected
on disk
 dirty pages written out in a continuous manner to disk

7

©Silberschatz, Korth and Sudarshan
Buffer Manager (Contd)
BCB: buffer control blocks
 stores page ID, dirty status, latch, fix-count

Latching of pages = latch on buffer slot
 limits number of latches required
 but page must be fixed before latching

8

©Silberschatz, Korth and Sudarshan
Some Notation

LSN: Log Sequence Number
 = logical address of record in the log

Page LSN: stored in page
 LSN of most recent update to page

PrevLSN: stored in log record
 identifies previous log record for that transaction

Forward processing (normal operation)
Normal undo

vs. restart undo

9

©Silberschatz, Korth and Sudarshan
Compensation Log Records
CLRs: redo only log records
Used to record actions performed during transaction
rollback
 one CLR for each normal log record which is undone

CLRs have a field UndoNxtLSN indicating which log
record is to be undone next
 avoids repeated undos by bypassing already undo records

– needed in case of restarts during transaction rollback)
 in contrast, IBM IMS may repeat undos, and AS400 may even

undo undos, then redo the undos

10

©Silberschatz, Korth and Sudarshan
Normal Processing
Transactions add log records
Checkpoints are performed periodically
 contains
 Active transaction list,
 LSN of most recent log records of transaction, and
 List of dirty pages in the buffer (and their recLSNs)

– to determine where redo should start

11

©Silberschatz, Korth and Sudarshan
Recovery Phases
Analysis pass
 forward from last checkpoint

Redo pass
 forward from RedoLSN, which is determined in analysis pass

Undo pass
 backwards from end of log, undoing incomplete transactions

12

©Silberschatz, Korth and Sudarshan
Analysis Pass
RedoLSN = min(LSNs of dirty pages recorded
in checkpoint)
 if no dirty pages, RedoLSN = LSN of checkpoint
 pages dirtied later will have higher LSNs)

scan log forwards from last checkpoint
 find transactions to be rolled back (``loser'' transactions)
 find LSN of last record written by each such transaction

13

©Silberschatz, Korth and Sudarshan
Redo Pass

Repeat history, scanning forward from RedoLSN
 for all transactions, even those to be undone
 perform redo only if page_LSN < log records LSN
 no locking done in this pass

14

©Silberschatz, Korth and Sudarshan
Undo Pass
Single scan backwards in log, undoing actions of
``loser'' transactions
 for each transaction, when a log record is found, use prev_LSN
fields to find next record to be undone
 can skip parts of the log with no records from loser
transactions
 don't perform any undo for CLRs (note: UndoNxtLSN for CLR
indicates next record to be undone, can skip intermediate
records of that transactions)

15

©Silberschatz, Korth and Sudarshan
Data Structures Used in Aries

16

©Silberschatz, Korth and Sudarshan
Log Record Structure

Log records contain following fields
 LSN
 Type (CLR, update, special)
 TransID
 PrevLSN (LSN of prev record of this txn)
 PageID (for update/CLRs)
 UndoNxtLSN (for CLRs)
 indicates which log record is being compensated
 on later undos, log records upto UndoNxtLSN can be skipped

 Data (redo/undo data); can be physical or logical

17

©Silberschatz, Korth and Sudarshan
Transaction Table

Stores for each transaction:
 TransID, State
 LastLSN (LSN of last record written by txn)
 UndoNxtLSN (next record to be processed in rollback)

During recovery:
 initialized during analysis pass from most recent checkpoint
 modified during analysis as log records are encountered, and
during undo

18

©Silberschatz, Korth and Sudarshan
Dirty Pages Table
During normal processing:
 When page is fixed with intention to update
 Let L = current end-of-log LSN (the LSN of next log record to be

generated)
 if page is not dirty, store L as RecLSN of the page in dirty pages

table

 When page is flushed to disk, delete from dirty page table
 dirty page table written out during checkpoint
 (Thus RecLSN is LSN of earliest log record whose effect is not
reflected in page on disk)

19

©Silberschatz, Korth and Sudarshan
Dirty Page Table (contd)
During recovery
 load dirty page table from checkpoint
 updated during analysis pass as update log records are
encountered

20

©Silberschatz, Korth and Sudarshan
Normal Processing Details

21

©Silberschatz, Korth and Sudarshan
Updates
Page latch held in X mode until log record is logged
 so updates on same page are logged in correct order
 page latch held in S mode during reads since records may get
moved around by update
 latch required even with page locking if dirty reads are allowed

Log latch acquired when inserting in log

22

©Silberschatz, Korth and Sudarshan
Updates (Contd.)
Protocol to avoid deadlock involving latches
 deadlocks involving latches and locks were a major problem in
System R and SQL/DS
 transaction may hold at most two latches at-a-time
 must never wait for lock while holding latch
 if both are needed (e.g. Record found after latching page):
 release latch before requesting lock and then reacquire latch (and

recheck conditions in case page has changed inbetween).
Optimization: conditional lock request

 page latch released before updating indices
 data update and index update may be out of order

23

©Silberschatz, Korth and Sudarshan
Split Log Records
Can split a log record into undo and redo parts
 undo part must go first
 page_LSN is set to LSN of redo part

24

©Silberschatz, Korth and Sudarshan
Savepoints
Simply notes LSN of last record written by transaction
(up to that point) - denoted by SaveLSN
can have multiple savepoints, and rollback to any of
them
deadlocks can be resolved by rollback to appropriate
savepoint, releasing locks acquired after that savepoint

25

©Silberschatz, Korth and Sudarshan
Rollback
Scan backwards from last log record of txn
 (last log record of txn = transTable[TransID].UndoNxtLSN

 if log record is an update log record
 undo it and add a CLR to the log

 if log record is a CLR
 then UndoNxt = LogRec.UnxoNxtLSN
 else UndoNxt = LogRec.PrevLSN

 next record to process is UndoNxt; stop at SaveLSN or
beginning of transaction as required

26

©Silberschatz, Korth and Sudarshan
More on Rollback
Extra logging during rollback is bounded
 make sure enough log space is available for rollback in case of
system crash, else BIG problem

In case of 2PC, if in-doubt txn needs to be aborted,
rollback record is written to log then rollback is carried
out

27

©Silberschatz, Korth and Sudarshan
Transaction Termination

prepare record is written for 2PC
 locks are noted in prepare record

prepare record also used to handle non-undoable
actions e.g. deleting file
 these pending actions are noted in prepare record and

executed only after actual commit

end record written at commit time
 pending actions are then executed and logged using special
redo-only log records

end record also written after rollback
28

©Silberschatz, Korth and Sudarshan
Checkpoints
begin_chkpt record is written first
transaction table, dirty_pages table and some other file
mgmt information are written out
end_chkpt record is then written out
 for simplicity all above are treated as part of end_chkpt record

LSN of begin_chkpt is then written to master record in
well known place on stable storage
incomplete checkpoint
 if system crash before end_chkpt record is written

29

©Silberschatz, Korth and Sudarshan
Checkpoint (contd)
Pages need not be flushed during checkpoint
 are flushed on a continuous basis

Transactions may write log records during checkpoint
Can copy dirty_page table fuzzily (hold latch, copy
some entries out, release latch, repeat)

30

©Silberschatz, Korth and Sudarshan
Restart Processing
Finds checkpoint begin using master record
Do restart_analysis
Do restart_redo
 ... some details of dirty page table here

Do restart_undo
reacquire locks for prepared transactions
checkpoint

31

©Silberschatz, Korth and Sudarshan
Result of Analysis Pass
Output of analysis
 transaction table
 including UndoNxtLSN for each transaction in table

 dirty page table: pages that were potentially dirty at time of
crash/shutdown
 RedoLSN - where to start redo pass from

Entries added to dirty page table as log records are
encountered in forward scan
 also some special action to deal with OS file deletes

This pass can be combined with redo pass!

32

©Silberschatz, Korth and Sudarshan
Redo Pass
Scan forward from RedoLSN
 If log record is an update log record, AND is in
dirty_page_table AND LogRec.LSN >= RecLSN of the page in
dirty_page_table
 then if pageLSN < LogRec.LSN then perform redo; else just
update RecLSN in dirty_page_table

Repeats history: redo even for loser transactions
(some optimization possible)

33

©Silberschatz, Korth and Sudarshan
More on Redo Pass
Dirty page table details
 dirty page table from end of analysis pass (restart dirty page
table) is used and set in redo pass (and later in undo pass)

Optimizations of redo
 Dirty page table info can be used to pre-read pages during
redo
 Out of order redo is also possible to reduce disk seeks

34

©Silberschatz, Korth and Sudarshan
Undo Pass
Rolls back loser transaction in reverse order in single
scan of log
 stops when all losers have been fully undone
 processing of log records is exactly as in single transaction
rollback

1

2

3

4

4'

3'

5

35

6

6'

5' 2'

1'

©Silberschatz, Korth and Sudarshan
Undo Optimizations

Parallel undo
 each txn undone separately, in parallel with others
 can even generate CLRs and apply them separately , in
parallel for a single transaction

New txns can run even as undo is going on:
 reacquire locks of loser txns before new txns begin
 can release locks as matching actions are undone

36

©Silberschatz, Korth and Sudarshan
Undo Optimization (Contd)
If pages are not available (e.g media failure)
 continue with redo recovery of other pages
 once pages are available again (from archival dump) redos of the

relevant pages must be done first, before any undo

 for physical undos in undo pass
 we can generate CLRs and apply later; new txns can run on other

pages

 for logical undos in undo pass
 postpone undos of loser txns if the undo needs to access these

pages - ``stopped transaction''
 undo of other txns can proceed; new txns can start provided

appropriate locks are first acquired for loser txns

37

©Silberschatz, Korth and Sudarshan
Transaction Recovery
Loser transactions can be restarted in some cases
 e.g. Mini batch transactions which are part of a larger transaction

38

©Silberschatz, Korth and Sudarshan
Checkpoints During Restart
Checkpoint during analysis/redo/undo pass
 reduces work in case of crash/restart during recovery
 (why is Mohan so worried about this!)

 can also flush pages during redo pass
 RecLSN in dirty page table set to current last-processed-record

39

©Silberschatz, Korth and Sudarshan
Media Recovery
For archival dump
 can dump pages directly from disk (bypass buffer, no latching
needed) or via buffer, as desired
 this is a fuzzy dump, not transaction consistent

 begin_chkpt location of most recent checkpoint completed
before archival dump starts is noted
 called image copy checkpoint
 redoLSN computed for this checkpoint and noted as media

recovery redo point

40

©Silberschatz, Korth and Sudarshan
Media Recovery (Contd)
To recover parts of DB from media failure
 failed parts if DB are fetched from archival dump
 only log records for failed part of DB are reapplied in a redo
pass
 inprogress transactions that accessed the failed parts of the
DB are rolled back

Same idea can be used to recover from page
corruption
 e.g. Application program with direct access to buffer crashes
before writing undo log record

41

©Silberschatz, Korth and Sudarshan
Nested Top Actions
Same idea as used in logical undo in our advanced
recovery mechanism
 used also for other operations like creating a file (which can
then be used by other txns, before the creater commits)
 updates of nested top action commit early and should not be
undone

Use dummy CLR to indicate actions should be skipped
during undo

42

©Silberschatz, Korth and Sudarshan

Weitere ähnliche Inhalte

Andere mochten auch (17)

Ch2 another design
Ch2 another designCh2 another design
Ch2 another design
 
Ddl
DdlDdl
Ddl
 
Soal kombinatorik
Soal kombinatorikSoal kombinatorik
Soal kombinatorik
 
Ch3
Ch3Ch3
Ch3
 
E3 chap-12
E3 chap-12E3 chap-12
E3 chap-12
 
Uji hipotesis
Uji hipotesisUji hipotesis
Uji hipotesis
 
E3 chap-14
E3 chap-14E3 chap-14
E3 chap-14
 
E3 chap-09
E3 chap-09E3 chap-09
E3 chap-09
 
App b
App bApp b
App b
 
Ch16
Ch16Ch16
Ch16
 
Peubah acak
Peubah acakPeubah acak
Peubah acak
 
E3 chap-18
E3 chap-18E3 chap-18
E3 chap-18
 
Ch1 introduction
Ch1   introductionCh1   introduction
Ch1 introduction
 
App a
App aApp a
App a
 
Imk pertemuan-4
Imk pertemuan-4Imk pertemuan-4
Imk pertemuan-4
 
Ch9
Ch9Ch9
Ch9
 
E3 chap-04
E3 chap-04E3 chap-04
E3 chap-04
 

Ähnlich wie Aries

Transaction unit 1 topic 4
Transaction unit 1 topic 4Transaction unit 1 topic 4
Transaction unit 1 topic 4
avniS
 
CS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlCS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency Control
J Singh
 
dokumen.tips_silberschatz-korth-and-sudarshan1-transactions-transaction-conce...
dokumen.tips_silberschatz-korth-and-sudarshan1-transactions-transaction-conce...dokumen.tips_silberschatz-korth-and-sudarshan1-transactions-transaction-conce...
dokumen.tips_silberschatz-korth-and-sudarshan1-transactions-transaction-conce...
DrCViji
 
ch17_Transaction management in Database Management System
ch17_Transaction management in Database Management Systemch17_Transaction management in Database Management System
ch17_Transaction management in Database Management System
coolscools1231
 

Ähnlich wie Aries (20)

Transaction unit 1 topic 4
Transaction unit 1 topic 4Transaction unit 1 topic 4
Transaction unit 1 topic 4
 
Relational Database Management System
Relational Database Management SystemRelational Database Management System
Relational Database Management System
 
UNIT-IV: Transaction Processing Concepts
UNIT-IV: Transaction Processing ConceptsUNIT-IV: Transaction Processing Concepts
UNIT-IV: Transaction Processing Concepts
 
Transaction slide
Transaction slideTransaction slide
Transaction slide
 
CS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlCS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency Control
 
Transaction
TransactionTransaction
Transaction
 
dokumen.tips_silberschatz-korth-and-sudarshan1-transactions-transaction-conce...
dokumen.tips_silberschatz-korth-and-sudarshan1-transactions-transaction-conce...dokumen.tips_silberschatz-korth-and-sudarshan1-transactions-transaction-conce...
dokumen.tips_silberschatz-korth-and-sudarshan1-transactions-transaction-conce...
 
ch17.pptx
ch17.pptxch17.pptx
ch17.pptx
 
ch17_Transaction management in Database Management System
ch17_Transaction management in Database Management Systemch17_Transaction management in Database Management System
ch17_Transaction management in Database Management System
 
ch14.ppt
ch14.pptch14.ppt
ch14.ppt
 
DBMS Transcations
DBMS TranscationsDBMS Transcations
DBMS Transcations
 
Aries
AriesAries
Aries
 
Top schools in ghaziabad
Top schools in ghaziabadTop schools in ghaziabad
Top schools in ghaziabad
 
Write behind logging
Write behind loggingWrite behind logging
Write behind logging
 
Top 17 Data Recovery System
Top 17 Data Recovery SystemTop 17 Data Recovery System
Top 17 Data Recovery System
 
Ch17
Ch17Ch17
Ch17
 
Transactionsmanagement
TransactionsmanagementTransactionsmanagement
Transactionsmanagement
 
Ch15
Ch15Ch15
Ch15
 
Adbms 34 transaction processing and recovery
Adbms 34 transaction processing and recoveryAdbms 34 transaction processing and recovery
Adbms 34 transaction processing and recovery
 
Database architecture
Database architectureDatabase architecture
Database architecture
 

Kürzlich hochgeladen

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Kürzlich hochgeladen (20)

Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 

Aries

  • 1. ARIES Recovery Algorithm ARIES: A Transaction Recovery Method Supporting Fine Granularity Locking and Partial Rollback Using Write-Ahead Logging C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz ACM Transactions on Database Systems, 17(1), 1992 Slides prepared by S. Sudarshan 1 ©Silberschatz, Korth and Sudarshan
  • 2. Recovery Scheme Metrics Concurrency Functionality Complexity Overheads:  Space and I/O (Seq and random) during Normal processing and recovery Failure Modes:  transaction/process, system and media/device 2 ©Silberschatz, Korth and Sudarshan
  • 3. Key Features of Aries Physical Logging, and Operation logging  e.g. Add 5 to A, or insert K in B-tree B Page oriented redo  recovery independence amongst objects Logical undo (may span multiple pages) WAL + Inplace Updates 3 ©Silberschatz, Korth and Sudarshan
  • 4. Key Aries Features (contd) Transaction Rollback  Total vs partial (up to a savepoint)  Nested rollback - partial rollback followed by another (partial/total) rollback Fine-grain concurrency control  supports tuple level locks on records, and key value locks on indices 4 ©Silberschatz, Korth and Sudarshan
  • 5. More Aries Features Flexible storage management  Physiological redo logging:  logical operation within a single page  no need to log intra-page data movement for compaction  LSN used to avoid repeated redos (more on LSNs later) Recovery independence  can recover some pages separately from others Fast recovery and parallelism 5 ©Silberschatz, Korth and Sudarshan
  • 6. Latches and Locks Latches  used to guarantee physical consistency  short duration  no deadlock detection  direct addressing (unlike hash table for locks)  often using atomic instructions  latch acquisition/release is much faster than lock acquisition/release Lock requests  conditional, instant duration, manual duration, commit duration 6 ©Silberschatz, Korth and Sudarshan
  • 7. Buffer Manager Fix, unfix and fix_new (allocate and fix new pg) Aries uses steal policy - uncommitted writes may be output to disk (contrast with no-steal policy) Aries uses no-force policy (updated pages need not be forced to disk before commit) dirty page: buffer version has updated not yet reflected on disk  dirty pages written out in a continuous manner to disk 7 ©Silberschatz, Korth and Sudarshan
  • 8. Buffer Manager (Contd) BCB: buffer control blocks  stores page ID, dirty status, latch, fix-count Latching of pages = latch on buffer slot  limits number of latches required  but page must be fixed before latching 8 ©Silberschatz, Korth and Sudarshan
  • 9. Some Notation LSN: Log Sequence Number  = logical address of record in the log Page LSN: stored in page  LSN of most recent update to page PrevLSN: stored in log record  identifies previous log record for that transaction Forward processing (normal operation) Normal undo vs. restart undo 9 ©Silberschatz, Korth and Sudarshan
  • 10. Compensation Log Records CLRs: redo only log records Used to record actions performed during transaction rollback  one CLR for each normal log record which is undone CLRs have a field UndoNxtLSN indicating which log record is to be undone next  avoids repeated undos by bypassing already undo records – needed in case of restarts during transaction rollback)  in contrast, IBM IMS may repeat undos, and AS400 may even undo undos, then redo the undos 10 ©Silberschatz, Korth and Sudarshan
  • 11. Normal Processing Transactions add log records Checkpoints are performed periodically  contains  Active transaction list,  LSN of most recent log records of transaction, and  List of dirty pages in the buffer (and their recLSNs) – to determine where redo should start 11 ©Silberschatz, Korth and Sudarshan
  • 12. Recovery Phases Analysis pass  forward from last checkpoint Redo pass  forward from RedoLSN, which is determined in analysis pass Undo pass  backwards from end of log, undoing incomplete transactions 12 ©Silberschatz, Korth and Sudarshan
  • 13. Analysis Pass RedoLSN = min(LSNs of dirty pages recorded in checkpoint)  if no dirty pages, RedoLSN = LSN of checkpoint  pages dirtied later will have higher LSNs) scan log forwards from last checkpoint  find transactions to be rolled back (``loser'' transactions)  find LSN of last record written by each such transaction 13 ©Silberschatz, Korth and Sudarshan
  • 14. Redo Pass Repeat history, scanning forward from RedoLSN  for all transactions, even those to be undone  perform redo only if page_LSN < log records LSN  no locking done in this pass 14 ©Silberschatz, Korth and Sudarshan
  • 15. Undo Pass Single scan backwards in log, undoing actions of ``loser'' transactions  for each transaction, when a log record is found, use prev_LSN fields to find next record to be undone  can skip parts of the log with no records from loser transactions  don't perform any undo for CLRs (note: UndoNxtLSN for CLR indicates next record to be undone, can skip intermediate records of that transactions) 15 ©Silberschatz, Korth and Sudarshan
  • 16. Data Structures Used in Aries 16 ©Silberschatz, Korth and Sudarshan
  • 17. Log Record Structure Log records contain following fields  LSN  Type (CLR, update, special)  TransID  PrevLSN (LSN of prev record of this txn)  PageID (for update/CLRs)  UndoNxtLSN (for CLRs)  indicates which log record is being compensated  on later undos, log records upto UndoNxtLSN can be skipped  Data (redo/undo data); can be physical or logical 17 ©Silberschatz, Korth and Sudarshan
  • 18. Transaction Table Stores for each transaction:  TransID, State  LastLSN (LSN of last record written by txn)  UndoNxtLSN (next record to be processed in rollback) During recovery:  initialized during analysis pass from most recent checkpoint  modified during analysis as log records are encountered, and during undo 18 ©Silberschatz, Korth and Sudarshan
  • 19. Dirty Pages Table During normal processing:  When page is fixed with intention to update  Let L = current end-of-log LSN (the LSN of next log record to be generated)  if page is not dirty, store L as RecLSN of the page in dirty pages table  When page is flushed to disk, delete from dirty page table  dirty page table written out during checkpoint  (Thus RecLSN is LSN of earliest log record whose effect is not reflected in page on disk) 19 ©Silberschatz, Korth and Sudarshan
  • 20. Dirty Page Table (contd) During recovery  load dirty page table from checkpoint  updated during analysis pass as update log records are encountered 20 ©Silberschatz, Korth and Sudarshan
  • 22. Updates Page latch held in X mode until log record is logged  so updates on same page are logged in correct order  page latch held in S mode during reads since records may get moved around by update  latch required even with page locking if dirty reads are allowed Log latch acquired when inserting in log 22 ©Silberschatz, Korth and Sudarshan
  • 23. Updates (Contd.) Protocol to avoid deadlock involving latches  deadlocks involving latches and locks were a major problem in System R and SQL/DS  transaction may hold at most two latches at-a-time  must never wait for lock while holding latch  if both are needed (e.g. Record found after latching page):  release latch before requesting lock and then reacquire latch (and recheck conditions in case page has changed inbetween). Optimization: conditional lock request  page latch released before updating indices  data update and index update may be out of order 23 ©Silberschatz, Korth and Sudarshan
  • 24. Split Log Records Can split a log record into undo and redo parts  undo part must go first  page_LSN is set to LSN of redo part 24 ©Silberschatz, Korth and Sudarshan
  • 25. Savepoints Simply notes LSN of last record written by transaction (up to that point) - denoted by SaveLSN can have multiple savepoints, and rollback to any of them deadlocks can be resolved by rollback to appropriate savepoint, releasing locks acquired after that savepoint 25 ©Silberschatz, Korth and Sudarshan
  • 26. Rollback Scan backwards from last log record of txn  (last log record of txn = transTable[TransID].UndoNxtLSN  if log record is an update log record  undo it and add a CLR to the log  if log record is a CLR  then UndoNxt = LogRec.UnxoNxtLSN  else UndoNxt = LogRec.PrevLSN  next record to process is UndoNxt; stop at SaveLSN or beginning of transaction as required 26 ©Silberschatz, Korth and Sudarshan
  • 27. More on Rollback Extra logging during rollback is bounded  make sure enough log space is available for rollback in case of system crash, else BIG problem In case of 2PC, if in-doubt txn needs to be aborted, rollback record is written to log then rollback is carried out 27 ©Silberschatz, Korth and Sudarshan
  • 28. Transaction Termination prepare record is written for 2PC  locks are noted in prepare record prepare record also used to handle non-undoable actions e.g. deleting file  these pending actions are noted in prepare record and executed only after actual commit end record written at commit time  pending actions are then executed and logged using special redo-only log records end record also written after rollback 28 ©Silberschatz, Korth and Sudarshan
  • 29. Checkpoints begin_chkpt record is written first transaction table, dirty_pages table and some other file mgmt information are written out end_chkpt record is then written out  for simplicity all above are treated as part of end_chkpt record LSN of begin_chkpt is then written to master record in well known place on stable storage incomplete checkpoint  if system crash before end_chkpt record is written 29 ©Silberschatz, Korth and Sudarshan
  • 30. Checkpoint (contd) Pages need not be flushed during checkpoint  are flushed on a continuous basis Transactions may write log records during checkpoint Can copy dirty_page table fuzzily (hold latch, copy some entries out, release latch, repeat) 30 ©Silberschatz, Korth and Sudarshan
  • 31. Restart Processing Finds checkpoint begin using master record Do restart_analysis Do restart_redo  ... some details of dirty page table here Do restart_undo reacquire locks for prepared transactions checkpoint 31 ©Silberschatz, Korth and Sudarshan
  • 32. Result of Analysis Pass Output of analysis  transaction table  including UndoNxtLSN for each transaction in table  dirty page table: pages that were potentially dirty at time of crash/shutdown  RedoLSN - where to start redo pass from Entries added to dirty page table as log records are encountered in forward scan  also some special action to deal with OS file deletes This pass can be combined with redo pass! 32 ©Silberschatz, Korth and Sudarshan
  • 33. Redo Pass Scan forward from RedoLSN  If log record is an update log record, AND is in dirty_page_table AND LogRec.LSN >= RecLSN of the page in dirty_page_table  then if pageLSN < LogRec.LSN then perform redo; else just update RecLSN in dirty_page_table Repeats history: redo even for loser transactions (some optimization possible) 33 ©Silberschatz, Korth and Sudarshan
  • 34. More on Redo Pass Dirty page table details  dirty page table from end of analysis pass (restart dirty page table) is used and set in redo pass (and later in undo pass) Optimizations of redo  Dirty page table info can be used to pre-read pages during redo  Out of order redo is also possible to reduce disk seeks 34 ©Silberschatz, Korth and Sudarshan
  • 35. Undo Pass Rolls back loser transaction in reverse order in single scan of log  stops when all losers have been fully undone  processing of log records is exactly as in single transaction rollback 1 2 3 4 4' 3' 5 35 6 6' 5' 2' 1' ©Silberschatz, Korth and Sudarshan
  • 36. Undo Optimizations Parallel undo  each txn undone separately, in parallel with others  can even generate CLRs and apply them separately , in parallel for a single transaction New txns can run even as undo is going on:  reacquire locks of loser txns before new txns begin  can release locks as matching actions are undone 36 ©Silberschatz, Korth and Sudarshan
  • 37. Undo Optimization (Contd) If pages are not available (e.g media failure)  continue with redo recovery of other pages  once pages are available again (from archival dump) redos of the relevant pages must be done first, before any undo  for physical undos in undo pass  we can generate CLRs and apply later; new txns can run on other pages  for logical undos in undo pass  postpone undos of loser txns if the undo needs to access these pages - ``stopped transaction''  undo of other txns can proceed; new txns can start provided appropriate locks are first acquired for loser txns 37 ©Silberschatz, Korth and Sudarshan
  • 38. Transaction Recovery Loser transactions can be restarted in some cases  e.g. Mini batch transactions which are part of a larger transaction 38 ©Silberschatz, Korth and Sudarshan
  • 39. Checkpoints During Restart Checkpoint during analysis/redo/undo pass  reduces work in case of crash/restart during recovery  (why is Mohan so worried about this!)  can also flush pages during redo pass  RecLSN in dirty page table set to current last-processed-record 39 ©Silberschatz, Korth and Sudarshan
  • 40. Media Recovery For archival dump  can dump pages directly from disk (bypass buffer, no latching needed) or via buffer, as desired  this is a fuzzy dump, not transaction consistent  begin_chkpt location of most recent checkpoint completed before archival dump starts is noted  called image copy checkpoint  redoLSN computed for this checkpoint and noted as media recovery redo point 40 ©Silberschatz, Korth and Sudarshan
  • 41. Media Recovery (Contd) To recover parts of DB from media failure  failed parts if DB are fetched from archival dump  only log records for failed part of DB are reapplied in a redo pass  inprogress transactions that accessed the failed parts of the DB are rolled back Same idea can be used to recover from page corruption  e.g. Application program with direct access to buffer crashes before writing undo log record 41 ©Silberschatz, Korth and Sudarshan
  • 42. Nested Top Actions Same idea as used in logical undo in our advanced recovery mechanism  used also for other operations like creating a file (which can then be used by other txns, before the creater commits)  updates of nested top action commit early and should not be undone Use dummy CLR to indicate actions should be skipped during undo 42 ©Silberschatz, Korth and Sudarshan