2. A quick recap!
Earlier we saw:
- What is Streaming Replication
- How to set up Streaming Replication in v9.3
- How v9.3 enhancements made switchover and switchback easier
- How v9.3 enhancements ease the setup of Replication using pg_basebackup
3. What are we going to do today
- See the new enhancements in v9.4 that take away the pain of guessing the right wal_keep_segments value
- See the new time-lagging replication capability in v9.4
- A short intro to logical replication, introduced in v9.4
4. Parameter Changes in v9.4
- New recovery.conf parameters:
  - primary_slot_name
  - recovery_min_apply_delay
- New postgresql.conf parameter:
  - max_replication_slots
- New parameter value in postgresql.conf:
  - wal_level can now take the value 'logical' (a config sketch follows)
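A minimal sketch of how these could look in the config files (values are illustrative, not recommendations; the slot name is hypothetical):

    # postgresql.conf (primary)
    max_replication_slots = 2      # new in v9.4
    wal_level = logical            # new value in v9.4; hot_standby is still valid

    # recovery.conf (secondary)
    primary_slot_name = 'my_slot'            # hypothetical slot name
    recovery_min_apply_delay = '2h'          # new in v9.4: delay WAL apply by 2 hours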
6. How DBAs do it today
- Guess a proper value for wal_keep_segments based on transaction volume
- Keep monitoring the transaction rate (a monitoring sketch follows)
- Increase wal_keep_segments proactively
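As a rough illustration of that monitoring, a query like this on the primary (using v9.4 function and column names) shows how far each standby has fallen behind in bytes:

    -- On the primary: per-standby send and replay lag
    SELECT client_addr,
           pg_xlog_location_diff(pg_current_xlog_location(), sent_location)   AS send_lag_bytes,
           pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lag_bytes
    FROM pg_stat_replication;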
7. What if you ‘guessed’ a wrong value
- A smaller value means replication may go out of sync
  - Need to rebuild the secondary node from a base backup of the Primary node
  - Set up the replication again
  - Guess the 'wal_keep_segments' value again
- A larger value means you might be wasting storage space
- Rebuild replication if the secondary server goes down
- Archiving WALs to avoid these issues = more storage
8. How is that going to change
- Create a replication slot on the primary server
- Add it to recovery.conf on the secondary server
- The primary server will keep WAL files until the server using the replication slot has received them
- No guesswork!
- If the secondary server goes down, pending WALs are still kept on the Primary Server (see the slot query below)
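To see what the primary is retaining for a slot, the v9.4 catalog view pg_replication_slots can be queried (a sketch):

    -- On the primary: list slots, whether a standby is currently connected,
    -- and the oldest WAL position the slot is still holding
    SELECT slot_name, slot_type, active, restart_lsn
    FROM pg_replication_slots;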
9. Caveats
- If the secondary server goes down for a long time:
  - WAL files will continue to accumulate on the primary server
  - The replication slot needs to be dropped manually in such cases (a cleanup sketch follows)
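A sketch of that manual cleanup, assuming the slot is named 'testingv94' as in the demo later:

    -- Find inactive slots that are pinning WAL on the primary
    SELECT slot_name, active, restart_lsn
    FROM pg_replication_slots
    WHERE NOT active;

    -- Drop the slot so the primary can recycle the retained WAL files
    SELECT pg_drop_replication_slot('testingv94');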
11. Why would you need it?
- Ever tried Point-in-Time Recovery?
  - Stop the Production Database
  - Restore from a backup
  - Reapply the transaction logs/archived WALs since the backup
  - Stop at the time just before the application issue/bug introduced data inconsistency/corruption
- What if the backup size is huge?
- What if there are too many archived WALs to be applied?
- Higher Recovery Time = Higher Downtime = Loss of business
12. Setup a time-lagging DR
- Set up a time-lagging DR in PostgreSQL v9.4 with an acceptable amount of time-lag, let's say 2 hours
- If there is a need for Point-in-Time recovery:
  - Stop the Primary server
  - Apply only the pending WALs (not everything since the last backup, only the last 2 hours)
  - Stop recovery before the point of corruption (a sketch follows)
  - Promote the secondary server to be the primary
  - Change the connection configuration
- Less time taken to bring up the server = Reduced loss of business
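One way to steer this (a sketch, not the only approach): pause WAL replay on the delayed standby before the bad transactions would be applied, verify the data, then promote:

    -- On the delayed standby: stop applying further WAL (v9.4 function names)
    SELECT pg_xlog_replay_pause();
    -- Confirm where replay stopped
    SELECT pg_last_xact_replay_timestamp();

    # Then promote it from the shell (data directory path is illustrative)
    pg_ctl promote -D /var/lib/pgsql/9.4/standby_data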
13. Backdated Reporting and Time-travel Queries
- Do correlation/comparative queries to check profit margin as compared to yesterday
- Pull data from the Primary Server and a Secondary Server lagging by a day
- Pull reports from yesterday's database
- Pause recovery on the secondary and pull reports (see the pause/resume sketch below)
- Reduces the downtime needed on the Primary DB for end-of-day reporting
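Pausing and resuming replay on the reporting standby is one function call each way (v9.4 names; these were renamed to pg_wal_replay_* in later releases):

    -- On the secondary, before pulling reports:
    SELECT pg_xlog_replay_pause();
    SELECT pg_is_xlog_replay_paused();   -- returns true while paused
    -- When reporting is done:
    SELECT pg_xlog_replay_resume();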
14. Demo
- Primary Server running on port 5532 on localhost
- Secondary Server running on port 6532 on localhost
- postgresql.conf on primary:
  - max_replication_slots = 2
  - max_wal_senders = 2
  - wal_level = hot_standby
  - archive_mode = off # no archiving set up
- Create a replication slot on the primary:
  - select * from pg_create_physical_replication_slot('testingv94');
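For a physical slot in v9.4 the call should return the slot name with a NULL xlog position, since WAL is only reserved once a standby connects (a sketch of the expected psql output):

    postgres=# select * from pg_create_physical_replication_slot('testingv94');
     slot_name  | xlog_position
    ------------+---------------
     testingv94 |
    (1 row)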
15. Demo
- recovery.conf on the Secondary Server:
  - standby_mode = on
  - primary_conninfo = 'host=127.0.0.1 port=5532 user=postgres'
  - primary_slot_name = 'testingv94'
  - recovery_min_apply_delay = '1min'
- postgresql.conf on the secondary:
  - hot_standby = on
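To confirm the delay is in effect, one could compare clock time against the last replayed transaction on the secondary (a sketch using the v9.4 function name):

    -- On the secondary: how far behind is replay?
    SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;
    -- With recovery_min_apply_delay = '1min', this should hover around
    -- one minute under steady write traffic on the primary.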