1. PostgreSQL Write-Ahead Log
Heikki Linnakangas
Copyright 2009 EnterpriseDB Corporation. All rights reserved.
2. What is Write-Ahead Log?
• Also known as the transaction log or redo log
• Practically every database management system
has one
• Also used by journaling file systems, transaction
managers etc.
3. Write-Ahead Log concepts
• A transaction log is a sequence of log records
• One log record for every change
• Each WAL record is assigned an LSN (Log
Sequence Number)
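As a rough illustration of these concepts, here is a minimal Python sketch of an append-only log in which each record's LSN is the byte offset where the record starts. The `ToyLog` class and its payloads are hypothetical; real WAL records also carry headers and alignment padding, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Append-only log; a record's LSN is its starting byte offset."""
    records: list = field(default_factory=list)  # (lsn, payload) pairs
    next_lsn: int = 0                            # where the next record starts

    def append(self, payload: bytes) -> int:
        lsn = self.next_lsn
        self.records.append((lsn, payload))
        self.next_lsn += len(payload)
        return lsn


log = ToyLog()
first = log.append(b"heap insert")    # 11 bytes
second = log.append(b"btree insert")  # 12 bytes
third = log.append(b"commit")         # 6 bytes
print(first, second, third)           # 0 11 23
```

Because LSNs only ever increase, comparing two LSNs tells you which change was logged first.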
4. PostgreSQL Write-Ahead Log
• Introduced in version 7.1 by Vadim B. Mikheev
– Released in 2001
• REDO log only
– No UNDO log
– Instantaneous rollbacks
– No limit on transaction size
5. PostgreSQL Example
INSERT INTO tbl (id, data) VALUES (123, 'foo')
@ 0/D023940: xid 734: Heap - insert: rel 1663/12040/16402; tid 0/2
@ 0/D023988: xid 734: Btree - insert: rel 1663/12040/16404; tid 1/2
@ 0/D0239D0: xid 734: Transaction - commit: 2012-03-30 19:08:10.262228+03
LOG: xlog flush request 0/D023A00
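The LSNs in this output are printed as two hexadecimal numbers separated by a slash. A small Python sketch (the helper name `parse_lsn` is made up for illustration) converts them to integers so they can be compared. Assuming each record starts where the previous one ends, the differences give the space each record occupies.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN printed as 'hi/lo' (two hex numbers) to one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


# The three records and the flush request from the example above.
lsns = ["0/D023940", "0/D023988", "0/D0239D0", "0/D023A00"]
positions = [parse_lsn(s) for s in lsns]
gaps = [b - a for a, b in zip(positions, positions[1:])]
print(gaps)  # [72, 72, 48]
```

Under that assumption, the heap insert and the B-tree insert each take 72 bytes of WAL and the commit record takes 48, and the flush request points just past the commit record.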
6. PostgreSQL WAL structure
• WAL is divided into WAL segments
– Each segment is a file in pg_xlog directory
$ ls -l pgsql/data/pg_xlog/
-rw------- 1 heikki heikki      239 2009-11-03 10:39 000000010000000000000034.00000020.backup
-rw------- 1 heikki heikki 16777216 2009-11-03 15:26 00000001000000000000003A
-rw------- 1 heikki heikki 16777216 2009-11-03 15:26 00000001000000000000003B
-rw------- 1 heikki heikki 16777216 2009-11-03 10:39 00000001000000000000003C
-rw------- 1 heikki heikki 16777216 2009-11-03 13:49 00000001000000000000003D
-rw------- 1 heikki heikki 16777216 2009-11-03 15:14 00000001000000000000003E
drwx------ 2 heikki heikki      160 2009-11-03 15:26 archive_status
7. Tools
• Developer tools
– WAL_DEBUG compile option
– Xlogdump http://xlogviewer.projects.postgresql.org/
8. WAL-first rule
• The log record for an operation is always written
to disk before the affected data pages
– WAL first rule!
• In case of a crash, the WAL is replayed to
reconstruct the unsaved changes
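A toy Python simulation of the WAL-first rule follows. The `ToyStore` class is hypothetical and greatly simplified: the log is treated as durable the moment a record is appended, while data pages live only in memory until something flushes them.

```python
class ToyStore:
    """Key/value store with a durable log and lazily written data pages."""

    def __init__(self):
        self.wal_on_disk = []      # survives a crash
        self.pages_on_disk = {}    # survives a crash
        self.pages_in_memory = {}  # lost in a crash

    def set(self, key, value):
        # WAL-first: the log record is on disk before the page may be.
        self.wal_on_disk.append((key, value))
        self.pages_in_memory[key] = value

    def crash(self):
        self.pages_in_memory = {}

    def recover(self):
        # Start from the pages that reached disk, then replay the log.
        self.pages_in_memory = dict(self.pages_on_disk)
        for key, value in self.wal_on_disk:
            self.pages_in_memory[key] = value


store = ToyStore()
store.set("id=123", "foo")
store.crash()    # the data page was never written to disk
store.recover()  # replaying the WAL reconstructs the change
print(store.pages_in_memory)  # {'id=123': 'foo'}
```

The change survives the crash even though its data page never reached disk, because the log record did.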
9. Checkpoints
• Without truncation, the WAL would grow indefinitely
• Checkpoints allow us to truncate it
1. Flush all dirty data pages to disk
2. Write a checkpoint record
3. Truncate away old WAL
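The three steps can be sketched in Python as below. This is a toy model with made-up names, and it assumes nothing else writes to the log while the checkpoint runs; a real checkpoint runs concurrently with normal activity and has to account for that.

```python
def checkpoint(wal, dirty_pages, disk_pages):
    """Toy checkpoint: returns the part of the WAL that must be kept."""
    # 1. Flush all dirty data pages to disk.
    disk_pages.update(dirty_pages)
    dirty_pages.clear()
    # 2. Write a checkpoint record at the current end of the log.
    wal.append(("checkpoint", None))
    # 3. Records before the checkpoint record are no longer needed
    #    for crash recovery, so they can be truncated away.
    return wal[-1:]


wal = [("a", 1), ("b", 2)]
dirty = {"a": 1, "b": 2}
disk = {}
wal = checkpoint(wal, dirty, disk)
print(wal)   # [('checkpoint', None)]
print(disk)  # {'a': 1, 'b': 2}
```

After the checkpoint, crash recovery only has to replay the WAL written since the checkpoint record.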
10. WAL filenames
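00000001000000000000003E
• 00000001: TLI (timeline ID)
• 00000000: Log id
• 0000003E: Seg no
• Log id and Seg no together give the LSN where the segment starts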
• Timeline ID is used to distinguish WAL generated
before and after a PITR recovery
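The naming scheme can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes 16 MB segments (the 16777216-byte files in the directory listing) and timeline 1.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # assumed: 16777216 bytes per segment


def split_wal_filename(name: str):
    """Split a 24-hex-digit WAL file name into (timeline, log id, seg no)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def wal_filename(timeline: int, log_id: int, offset: int) -> str:
    """Name of the segment holding byte `offset` within log `log_id`."""
    return f"{timeline:08X}{log_id:08X}{offset // SEGMENT_SIZE:08X}"


print(split_wal_filename("00000001000000000000003E"))  # (1, 0, 62)
# LSN 0/D023940 from the earlier example, on timeline 1:
print(wal_filename(1, 0, 0xD023940))  # 00000001000000000000000D
```

Going from an LSN to a file name this way is handy when you need to find which archived segment contains a given record.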
11. Crash-safety
• Many database actions involve modifying more
than one page.
– Example: Splitting an index page.
• Writing two data pages A and B is not atomic.
• One write must happen before the other.
– We need to give the operating system and disk
controller freedom to reorder the writes; we
don't even know which one will happen first.
• We could crash in between.
12. WAL Summary
Write-Ahead Logging is critical for:
• Durability
– Once you commit, your data is safe
• Consistency
– No corruption on crash
13. Advanced features
But there's more!
The Write-Ahead Log also allows:
• Online backup
• Point-in-time Recovery
• Replication
14. WAL Archiving
• Normally, old WAL is deleted at a checkpoint.
• But it can also be archived
• Archive can be anything
– Directory on another server
– Tape drive
15. WAL Archiving
In your postgresql.conf file:
archive_mode=on
wal_level=archive
archive_command = 'cp %p /mnt/walarchive/%f'
Restart server, and it will copy all WAL files to
/mnt/walarchive as they're generated
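The server runs archive_command once for each completed WAL file, with %p replaced by the path of the file and %f by its file name. A rough Python equivalent of the `cp` command above is sketched below; `archive_segment` is a hypothetical helper, and the refusal to overwrite an existing file is an extra safety check that the plain `cp` does not perform.

```python
import shutil
from pathlib import Path


def archive_segment(segment_path: str, archive_dir: str) -> Path:
    """Copy one WAL file into the archive, refusing to overwrite."""
    source = Path(segment_path)            # plays the role of %p
    target = Path(archive_dir) / source.name  # archive_dir/%f
    if target.exists():
        raise FileExistsError(f"{target} is already archived")
    shutil.copy2(source, target)
    return target
```

Raising an error here corresponds to the command exiting with a non-zero status, which tells the server that the file has not been archived yet.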
16. WAL archive
~$ ls -l /mnt/walarchive/
total 114688
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000001
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000002
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000003
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000004
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000005
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000006
-rw------- 1 heikki heikki 16777216 30.3. 18:11 000000010000000000000007
17. Online Backup
• Normally, you need to shut down the database
while you take a backup
– or use a filesystem snapshot (e.g. ZFS or a SAN)
• WAL archiving makes that unnecessary
18. Online Backup
1. SELECT pg_start_backup('label');
2. rsync / tar / whatever
3. SELECT pg_stop_backup();
4. Include all archived WAL files from archive
directory in the backup
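The four steps can be written down as a dry-run plan in Python. This sketch only builds the command lines and does not execute anything; the function name, directory names and output prefix are placeholders, and the label is assumed not to contain a single quote.

```python
def backup_plan(label: str, data_dir: str, archive_dir: str, out_prefix: str):
    """Return the four backup steps as argv lists, without running them."""
    return [
        ["psql", "-c", f"SELECT pg_start_backup('{label}')"],
        ["tar", "czf", f"{out_prefix}.tar.gz", data_dir],
        ["psql", "-c", "SELECT pg_stop_backup()"],
        ["tar", "czf", f"{out_prefix}-xlog.tar.gz", archive_dir],
    ]


for step in backup_plan("my first backup", "data", "archivedir", "backup"):
    print(" ".join(step))
```

The order matters: the WAL archive is packed only after pg_stop_backup() has returned, so that every segment written during the backup is included.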
19. Online backup: Step 1
/mnt$ ls -l
total 8
drwxr-xr-x  2 heikki heikki 4096 30.3. 18:11 archivedir
drwx------ 14 heikki heikki 4096 30.3. 18:10 data
/mnt$ psql -c "SELECT pg_start_backup('my first backup')"
pg_start_backup
-----------------
0/C000020
(1 row)
20. Online backup: Step 2
/mnt$ tar czf ~/backup.tar.gz data
/mnt$ ls -l ~/backup.tar.gz
-rw-r--r-- 1 heikki heikki 39670143 30.3. 18:33
/home/heikki/backup.tar.gz
21. Online backup: Step 3
/mnt$ psql -c "SELECT pg_stop_backup()"
NOTICE: pg_stop_backup complete, all required
WAL segments have been archived
pg_stop_backup
----------------
0/C0000A8
(1 row)
22. Online backup: Step 4
/mnt$ tar czf ~/backup-xlog.tar.gz archivedir/*
/mnt$ ls -l /home/heikki/backup*.tar.gz
-rw-r--r-- 1 heikki heikki 39670143 30.3. 18:33
/home/heikki/backup.tar.gz
-rw-r--r-- 1 heikki heikki 41937910 30.3. 18:38
/home/heikki/backup-xlog.tar.gz
23. Point-in-Time Recovery
postgres=# DROP TABLE customers;
DROP TABLE
Oops!
• Once you've enabled WAL archiving and taken a
backup, you can use the archived WAL to restore
to any point in time
24. Point-in-Time Recovery
recovery.conf:
# By default, recovery will rollforward to the end of the WAL log.
# If you want to stop rollforward at a specific point, you
# must set a recovery target.
...
#recovery_target_name = '' # e.g. 'daily backup 2011-01-26'
#
#recovery_target_time = '' # e.g. '2004-07-14 22:39:00 EST'
#
#recovery_target_xid = ''
#
#recovery_target_inclusive = true
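How a time target and recovery_target_inclusive interact can be illustrated with a toy replay loop in Python. This is a simplified model rather than the server's actual recovery code: `replay_until` is a made-up function, and apart from xid 734 (taken from the earlier example, with the fractional seconds dropped) the transactions are invented.

```python
from datetime import datetime


def replay_until(commits, target_time, inclusive=True):
    """Return the xids replayed before recovery stops at target_time.

    commits: (xid, commit_time) pairs in WAL order.
    """
    replayed = []
    for xid, commit_time in commits:
        if inclusive:
            past_target = commit_time > target_time
        else:
            past_target = commit_time >= target_time
        if past_target:
            break  # stop rolling forward here
        replayed.append(xid)
    return replayed


commits = [
    (734, datetime(2012, 3, 30, 19, 8, 10)),
    (735, datetime(2012, 3, 30, 19, 9, 0)),   # hypothetical
    (736, datetime(2012, 3, 30, 19, 10, 0)),  # hypothetical, e.g. the DROP TABLE
]
target = datetime(2012, 3, 30, 19, 9, 0)
print(replay_until(commits, target))                   # [734, 735]
print(replay_until(commits, target, inclusive=False))  # [734]
```

With the inclusive setting, a transaction that committed exactly at the target time is kept; without it, recovery stops just before that transaction.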
25. Replication
• You can transmit WAL as it is generated to a
standby server
– Replication!
• This can be done using the WAL archive, or by
streaming directly from the server over a TCP
connection
• Standby server can be used for read-only queries
(Hot Standby)
26. Thank you
Write-Ahead Logging makes it possible for a
database to maintain consistency. It also allows
many advanced features: online backup, PITR and
replication.
Questions?