Swapping Pacemaker/Corosync for repmgr
pgDay Asia 2016
Ang Wei Shan
17th March 2016
Disclaimer: I don’t work for 2ndQuadrant

Agenda
● Introduction
● Challenges with Pacemaker/Corosync
● PgBouncer
● Linux’s UCARP
● 2ndQuadrant’s repmgr
● Demo

● Database Administrator
● More than 4 years of experience with databases
● Worked with most of the major RDBMSs
● ≈ 350 days with PostgreSQL

Pacemaker/Corosync
● Open-source alternative to Red Hat Cluster Suite
● Extremely popular choice in the open-source world
● Made up of 2 separate software stacks
○ Pacemaker
○ Corosync/Heartbeat
● Difficult to configure correctly

Online: [ node1 node2 ]
Full list of resources:
stonith_node1 (stonith:fence_ipmilan): Stopped
stonith_node2 (stonith:fence_ipmilan): Stopped
vip-slave (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Stopped: [ pgsql:1 ]
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node1
vip-rep (ocf::heartbeat:IPaddr2): Started node1
Node Attributes:
* Node node1:
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 00000070C6FCF9F0
+ pgsql-status : PRI
* Node node2:
+ master-pgsql : -INFINITY
+ pgsql-data-status : DISCONNECT
+ pgsql-status : STOP
Migration summary:
* Node node2:
stonith_node1: migration-threshold=1000000 fail-count=1000000 last-failure='Thu Feb 4 13:43:49 2016'
pgsql:0: migration-threshold=1 last-failure='Thu Feb 4 13:46:14 2016'
* Node node1:
stonith_node2: migration-threshold=1000000 fail-count=1000000 last-failure='Thu Feb 4 13:38:45 2016'
pgsql_start_0 (node=node2, call=84, rc=1, status=complete): unknown error

Feb 4 17:08:12 node1 attrd[3149]: notice: attrd_perform_update: Sent delete 46: node=node1, attr=last-failure-stonith_node2, id=<n/a>, set=(null), section=status
Feb 4 17:08:12 node1 stonith-ng[3147]: notice: stonith_device_register: Device 'stonith_node2' already existed in device list (2 active devices)
Feb 4 17:08:14 node1 stonith-ng[3147]: notice: log_operation: Operation 'monitor' [8201] for device 'stonith_node2' returned: -1001 (Generic Pacemaker error)
Feb 4 17:08:14 node1 stonith-ng[3147]: warning: log_operation: stonith_node2:8201 [ ERROR: Failed to authenticate to https://cathy.rocketwork.com.sg:4000 as node1 with key /etc/chef/client.pem ]
Feb 4 17:08:14 node1 stonith-ng[3147]: warning: log_operation: stonith_node2:8201 [ Getting status of IPMI:10.51.113.22... Spawning: '/usr/bin/ipmitool -I lanplus -H '10.51.113.22' -U 'pacemaker' -L 'OPERATOR' -P '' -v chassis power status'... ]
Feb 4 17:08:14 node1 stonith-ng[3147]: warning: log_operation: stonith_node2:8201 [ Failed ]
Feb 4 17:08:15 node1 crmd[3151]: error: process_lrm_event: LRM operation stonith_node2_start_0 (call=52, status=4, cib-update=48, confirmed=true) Error
Feb 4 17:08:15 node1 crmd[3151]: warning: status_from_rc: Action 5 (stonith_node2_start_0) on node1 failed (target: 0 vs. rc: 1): Error
Feb 4 17:08:15 node1 crmd[3151]: warning: update_failcount: Updating failcount for stonith_node2 on node1 after failed start: rc=1 (update=INFINITY, time=1454576895)
Feb 4 17:08:15 node1 attrd[3149]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-stonith_daina2 (INFINITY)
Feb 4 17:08:15 node1 crmd[3151]: warning: update_failcount: Updating failcount for stonith_node2 on node1 after failed start: rc=1 (update=INFINITY, time=1454576895)
Feb 4 17:08:15 node1 crmd[3151]: notice: run_graph: Transition 12 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=unknown): Stopped
Feb 4 17:08:15 node1 attrd[3149]: notice: attrd_perform_update: Sent update 51: fail-count-stonith_node2=INFINITY
Feb 4 17:08:15 node1 attrd[3149]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-stonith_da

PgBouncer
● Lightweight connection pooler for PostgreSQL
● Open-source
● Acts as the single point of entry to the database
● Useful for managing a large number of incoming connections to the database
● Latest version - v1.7.2

UCARP
● Common Address Redundancy Protocol (CARP)
● Portable userland implementation of CARP, which originated in OpenBSD
● Allows multiple hosts to share a single IP address
● Manages the virtual IP for failover
● Provides client connectivity to PgBouncer
● Latest version - v1.5.2
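
How hosts sharing one virtual IP settle on a single owner can be sketched roughly as follows. This is a simplified model of the CARP election rule, not UCARP's code: each host advertises at an interval of advbase + advskew/256 seconds, and the host that advertises most frequently holds the address. The class and function names (`CarpHost`, `elect_master`) are made up for this illustration, and details such as preemption and authenticated advertisements are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarpHost:
    """Hypothetical model of one host taking part in a CARP group."""
    name: str
    advbase: int = 1   # base advertisement interval, in seconds
    advskew: int = 0   # skew in 1/256ths of a second; higher means lower priority

    def advertisement_interval(self) -> float:
        # CARP advertises every advbase + advskew/256 seconds.
        return self.advbase + self.advskew / 256


def elect_master(hosts):
    """Return the host with the shortest advertisement interval."""
    return min(hosts, key=lambda h: h.advertisement_interval())


if __name__ == "__main__":
    group = [CarpHost("node1", advskew=0), CarpHost("node2", advskew=100)]
    print("VIP owner:", elect_master(group).name)

    # Simulate node1 failing: only the remaining hosts take part.
    survivors = [h for h in group if h.name != "node1"]
    print("VIP owner after failure:", elect_master(survivors).name)
```

Giving the preferred PgBouncer host the lowest skew makes it the normal owner of the virtual IP, with the other host taking over only when the advertisements stop.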

repmgr
● Developed by 2ndQuadrant
● Open-source
● Manages replication and failover for your PostgreSQL HA cluster
● Latest version - v3.1.1

● Linux or Unix only
● repmgr 2.0 is for PostgreSQL 9.0 to 9.4
● repmgr 3.0 is for PostgreSQL 9.3 or higher
● Does not take care of client failover!!
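
The version ranges above overlap for PostgreSQL 9.3 and 9.4, which a small lookup makes explicit. This is only a sketch that encodes the two ranges stated on this slide; the function name `supported_repmgr_series` is made up for the example and is not part of repmgr.

```python
def supported_repmgr_series(pg_version):
    """Return the repmgr series usable with a PostgreSQL version.

    pg_version is a (major, minor) tuple such as (9, 4).
    The ranges are taken from the slide, not queried from repmgr itself.
    """
    series = []
    if (9, 0) <= pg_version <= (9, 4):
        series.append("2.0")   # repmgr 2.0: PostgreSQL 9.0 to 9.4
    if pg_version >= (9, 3):
        series.append("3.0")   # repmgr 3.0: PostgreSQL 9.3 or higher
    return series


if __name__ == "__main__":
    for version in [(9, 2), (9, 4), (9, 5)]:
        print(version, supported_repmgr_series(version))
```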

● Automatic failover capabilities
● Provisioning of standby servers
● 2 main tools
○ repmgr => Performs administrative tasks
○ repmgrd => Performs monitoring, automatic failover and event notifications

● Requires a database to store cluster metadata
● Runs as the postgres user
● Requires password-less SSH connectivity between all hosts
● Recommended to run with an odd number of nodes in the cluster

The decision whether a server can be promoted depends on whether the majority of servers are
"visible". If you have three servers - primary and standby in one location, and a second
standby in another location - and the network to the second standby goes down, the second
standby will see it's in the minority (its location represents 1/3 of the servers) and won't
promote itself.
If you have two servers in each location, you'd need an additional witness server so one
location still has a "majority" - otherwise in the event of a network disconnection you might
end up with one standby in each location promoting itself.
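
The majority rule described above can be illustrated with a short sketch. This is not repmgr's code: the function `has_quorum` and its `allow_tie` flag are hypothetical, added only to show why an even split is risky and why a witness server resolves it.

```python
def has_quorum(visible: int, total: int, allow_tie: bool = False) -> bool:
    """Decide whether a node may promote itself.

    visible: servers this node can see, counting itself.
    total:   servers registered in the cluster, including any witness.
    allow_tie: hypothetical switch; when True, seeing exactly half is enough.
    """
    if total <= 0 or not 0 <= visible <= total:
        raise ValueError("visible must be between 0 and total, and total > 0")
    if allow_tie:
        return 2 * visible >= total
    return 2 * visible > total


if __name__ == "__main__":
    # Three servers: the isolated second standby sees only itself.
    print(has_quorum(1, 3))                  # minority, must not promote
    print(has_quorum(2, 3))                  # the other location keeps a majority

    # Two servers per location, link between the locations is down.
    print(has_quorum(2, 4))                  # no strict majority on either side
    print(has_quorum(2, 4, allow_tie=True))  # a tie-accepting rule lets both sides promote

    # Adding a witness to one location gives that side 3 of 5.
    print(has_quorum(3, 5))
```

With four servers split two and two, a strict rule leaves neither side able to promote, while a rule that accepts a tie allows both sides to promote. The witness makes the total odd, so exactly one side can hold a majority.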

Thank you for your attention!
weishan.ang@gmail.com
newbiedba.wordpress.com
sg.linkedin.com/in/weishan
