Weitere ähnliche Inhalte Ähnlich wie In the Beginning, There Was the Log - Webinar with Martin Hubel (20) Mehr von Embarcadero Technologies (20) Kürzlich hochgeladen (20) In the Beginning, There Was the Log - Webinar with Martin Hubel1. In the beginning,
there was the log
Martin Hubel
+ 1 905-764-7498
martin@mhubel.com
1
2. Agenda
1.
The role of the database recovery log
2.
How backups fit in
3.
Recovery scenarios
4.
Planning for recovery
2
3. Disclaimer
•
Every attempt has been made to be accurate with product information. Trademarks are the property of their respective owners.
•
I work mainly on DB2, but have used and supported other products.
•
All DBMS mentioned in today’s webinar are commercial grade, used by hundreds of companies to manage their enterprises. No attempt is made here to compare DBMS products for functionality. Copyright©2014 Martin Hubel Consulting Inc. 3
4. Assumptions
•
This presentation is about OLTP applications
•
The need for backups is known
•
The need for recovery is also understood
•
Databases are enabled for recovery
•
BI / DW have different recovery requirements
•
Data is recovered / rebuilt / reload from another source
Copyright©2014 Martin Hubel Consulting Inc. 4
5. The database recovery log
•
Log contains information necessary to:
•
Redo completed units of work
•
Undo incomplete units of work
•
Rebuild system status
•
Today’s presentation will deal mainly with units of work (transactions) and their effect on the recovery log
Copyright©2014 Martin Hubel Consulting Inc. 5
6. A successful transaction
•
Single transaction can contain many update actions
•
All updates must complete
•
Then a commit makes changes permanent
•
Successful termination is recorded in the recovery log
•
Any failure rolls changes back to beginning
•
The DB is considered consistent before the transaction and after the commit
Copyright©2014 Martin Hubel Consulting Inc. 6
•Begin UR •End UR
•INS A
•UPD B
•UPD C
•INS A
•DEL D
•INS E
•
One Business Transaction 7. Unit of work
•
Updates performed in the subsystem between two points of consistency
•
Point of consistency (commit point) is a point where all recoverable data for an application is consistent with other data
•
Equivalent terms: Synchpoint, Logical Unit of Work or Unit of Recovery (UR)
•
Some environments require a 2 phase commit
Copyright©2014 Martin Hubel Consulting Inc. 7
8. Batch transactions
•
Batch jobs handle transaction can contain many update actions
•
Many business transactions are included in a single unit of work
•
The commit makes changes permanent for all transactions
•
Any failure rolls changes back to beginning
•
Many tens, hundreds or even thousands of transactions are processed in a single UR
Copyright©2014 Martin Hubel Consulting Inc. 8
•Begin UR •End UR
9. Rollback
Copyright©2014 Martin Hubel Consulting Inc. 9
• During rollback, data changes are not removed
•The log is read and the changes reversed
•The data is returned to its last consistent state (last commit)
• Some badly designed applications issue rollback statements following
if validation fails
•Can be a lot of log activity for nothing
•Always complete all validation prior to performing first update
•Begin UR
•
End UR
•INS A
•UPD B
•UPD C
•INS E
•One Business Transaction
•DEL E
•UPD C`
•UPD B`
•DEL A
•
Failure 10. Performance of new application
•
A new financial application had a light transaction processing load
•
Transactions were complex and had a lot of validation and updates to perform
•
Users regularly had to correct input before the transaction was ready for processing
•
Performance was poor. Design showed:
•
After each validation check, the SQL updates to that point were performed
•
If a further check failed, the updates were reversed via rollback
•
This problem was found via a high volume of log activity
•
Much of it for no benefit
Copyright©2014 Martin Hubel Consulting Inc. 10
11. Transaction Recovery/Backout
•
Some DBMS products default to this type of recovery (rollback of incomplete URs only)
•
Logging (log retain) in these products must be turned on for full recovery
Copyright©2014 Martin Hubel Consulting Inc. 11
TIMEOUTWorkRollbackWaitWorkRollbackDEADLOCKWorkWaitContinueabba 12. Log files
•
“In the beginning, there was the log”
•
DBMS writes to the active log file
•
May be dual logs or mirrored by DBMS
May be on RAID devices at hardware level
•
Logs use buffered I/O
For changes to data
•
Logs are written synchronously at commit time
•
In this way, the log is close to being correct if failure occurs
•
Design I/O system to avoid waits for log writes
•
DBMS must be able to write changes to the log
Copyright©2014 Martin Hubel Consulting Inc. 12
13. Archiving logs
•
Current log is called the active log
•
Log files fill up and switch automatically to new files
•
Called archive logs
•
Archive files may remain in same directory for some period
•
May be automatically offloaded to some facility like TSM
•
Problems can occur in the archival process
•
If log file system fills, the DBMS stops!
Active log
Copyright©2014 Martin Hubel Consulting Inc. 13
Other
storage
Archive logs
14. Batch, long running, and large transactions
•
Thousands (or more) updates can be performed, e.g. by:
•
A batch job
•
A single SQL statement
•
Long-running transactions cause problems:
•
Many locks taken during the updates
•
Many log records and log files written before a commit
•
Commit frequency must be managed
•
Rollback is also problematic
•
Requires log to be read, and to write the opposite action to reverse the change
Copyright©2014 Martin Hubel Consulting Inc. 14
15. “The audit table is too big”
•
The new DBA/Linux sysadminnotices the size of the audit table is 30 million rows
•
Issues a DELETE FROM AUDIT_TABLE WHERE DATE < ‘60 days ago’
•
10M rows will be deleted
•
The effect:
•
All users locked out of the application
•
No estimate of completion time
Copyright©2014 Martin Hubel Consulting Inc. 15
16. “The audit table is too big” -2
•
After many complaints and about 2 hours, the DBA cancels the Delete
•
Rollback starts:
•
Rollback has to read the log to finds the updates and write the opposite
•
As such, rollback typically takes twice as long
•
The users continue to be unhappy and are now more vocal
Copyright©2014 Martin Hubel Consulting Inc. 16
17. “The audit table is too big” -3
•
The DBA, using his sysadminkills, starts killing DB2 processes
•
Kill -9 has immediate effects to stop DB2 and the rollback
•
But now, the rollback is incomplete
•
At restart, DB2 must remove the effect of the incomplete rollback and start the full rollback afterwards
•
Estimated full recovery time is 24 hours
•
Some behind-the-scenes involvement reduces the outage to 8 hours on a 24x7 system
•
CIO asks: “What should we do with the DBA?”
Copyright©2014 Martin Hubel Consulting Inc. 17
18. ... And Now
•
A word from Embarcadero...
Copyright©2014 Martin Hubel Consulting Inc. 18
19. Can the log be kept forever?
•
Hypothetically yes, the answer is no
•
Not practical
•
Recovery would take too long for one thing
•
Log retention would also be impractical
•
Backups taken within the DBMS are recorded and provide starting points for recovery
•
Once all databases or tables (tablespaces) have been copied, the oldest logs can be deleted
Copyright©2014 Martin Hubel Consulting Inc. 19
20. Backup copies
•
Backups provide a starting point for the recovery process
•
Need at least one full backup
•
Two generations are safer
•
DBMS options for backups:
•
Database
•
Table or tablespace
•
Incremental or delta
•
Online or offline
•
Once all databases have a valid backup, the older log files can be deleted
•
Set generous retention periods
Copyright©2014 Martin Hubel Consulting Inc. 20
21. Gaps in the log
•
A big recovery consideration is to ensure there are no gaps within the log range
•
A gap indicates that some log records are missing, and DBMS is unable to determine what those records might have contained
•
As such, forward log processing stops
•
It is advantageous to turn logging off for certain situations or operations
•
E.g, large loads
Copyright©2014 Martin Hubel Consulting Inc. 21
22. Implications of turning off logging
•
Logging is not free
•
Significant performance impact for certain operations such as batch updates or load operations
•
Not logged operations affect the ability to forward recover objects using the log
•
DBMS does not know what or how data was changed
•
Therefore, DBMS requires a backup following not logged operations to re-establish a recovery point
Copyright©2014 Martin Hubel Consulting Inc. 22
23. Production mortgage system
•
DBA group was responsible for backups
•
Forgot to take regular backups
•
Last backup was January 26
•
System group managed the logs
•
Set a “reasonable” retention period of 60 days
•
On April 1, they deleted the logs older than Feb 1
•
On April 8, a hardware problem occurred
•
Production was recovered to January 26
Copyright©2014 Martin Hubel Consulting Inc. 23
24. Application recovery scenarios
•
Transaction recovery/backout
•
Application backout
•
Point-in-time recovery
•
Checkpoint restart
•
Data synchronization
•
Data inconsistencies
•
Out of space
•
Media failure
•
Accidentally deleted data
Copyright©2014 Martin Hubel Consulting Inc. 24
25. Recovery guidelines
•
Test your recovery procedures
•
Make a backup copy prior to recovery -things can get worse
•
Make a copy after recovery if you think it might happen again
•
Make two copies using the DBMS backup utility
•
Periodically run integrity checking utilities to ensure recoverability
Copyright©2014 Martin Hubel Consulting Inc. 25
26. Common user errors -1
•
Delete or write over archive log or image copy
•
Without necessary backups, recovery may be impossible
•
Possible storage management issue
•
Misuse of disk level dump/restore
•
Valid for disaster planning, but dump everything –
Logs
DBMS system objects
User data
Copyright©2014 Martin Hubel Consulting Inc. 26
27. Common user errors -2
•
Common Programming Errors
•
Logical unit of work identification
•
Commit/CHKP placed incorrectly
•
Too many commits
•
Coding methods may vary throughout application, e.g.
Use of separate read and write routines
Checking of SQL return code codes
Copyright©2014 Martin Hubel Consulting Inc. 27
28. Application Backout
•
Required when multiple jobs running concurrently against same database
•
Work was committed against database
•
Backoutmay not be possible or practical in all cases
Copyright©2014 Martin Hubel Consulting Inc. 28
Startof Job TimeNormalJob End Job 1Job 2Job 3Job 4Backout, repair, and rerun 29. SQL change in production
•
“Oops, we thought that was the test system”
•
An SQL error changed all addresses to the same value
•
780,000 rows in table
•
“Martin, please change them back”
•
No, you can’t restore from a backup
New customers have signed up
Old customers may have updated their information
Copyright©2014 Martin Hubel Consulting Inc. 29
30. SQL change in production
•
While this was pressing, it was not a “show stopper”
•
Fortunately, I had unloaded this table for some reason
•
This became a source for the correct addresses
•
Alternatives might have been a redirected restore / transportable tablespace
•
We were able to load the backup table
•
Found SQL on internet and adapted it
•
Ultimately updated 362,000 addresses
Copyright©2014 Martin Hubel Consulting Inc. 30
31. Summary
•
All update activity is recorded in the DBMS log
•
Log performance is part of overall DBMS performance
•
Backups and logs are both used for recovery
•
Plan and test recovery scenarios prior to needing them
Copyright©2014 Martin Hubel Consulting Inc. 31
32. Martin Hubel
Martin Hubel Consulting Inc.
martin@mhubel.com
905-764-7498
In the beginning, there was the log
32