Zabbix High-Availability Design Using Open Source Pacemaker
Dong hyun Kim
Opensource Business Team
Enterprise Linux Senior Engineer
kimdonghyun0916@gmail.com
Korea Community
# Whoami
■ Systems and Infrastructure Geek
■ Enterprise Linux Infrastructure Engineer (Red Hat)
■ Work
• Technology Research : New Technology Research (Container, Openstack, Etc)
• Technical Support : Troubleshooting, Debugging, Performance Tuning
• Consulting : IT Downsizing, Infra Optimization
■ I love linux ♥
• Linux Blog:
http://rhlinux.tistory.com/
• Red Hat Linux Engineer Group:
http://cafe.naver.com/iamstrong
• ClusterLabs Korea(Pacemaker):
https://www.facebook.com/groups/clusterlabskorea
# In this Session

☞ Pacemaker's Story - The Open Source, High Availability Cluster
☞ Overview of HA architectural components
☞ USE CASE EXAMPLE
☞ General features
The Open Source, High Availability Cluster
HA for OpenSource Technology
What is Pacemaker?

■ Pacemaker is :
• A High-Availability and Load-Balancing stack for the Linux platform
• A Python-based unified, scriptable cluster shell

■ Users decide the cluster resource policy themselves :
• Freedom to create, delete, and change Resource Agents configuration
• Broadly satisfies the HA requirements of applications used across many industries (public sector, securities/finance, etc.)
• Easy configuration and management of fence agents as resources - STONITH (Shoot The Other Node In The Head)

■ Monitor and Control Resources :
• SystemD / LSB / OCF services
• Cloned services : N+1, N+M, N nodes
• Multi-state (Master/Slave, Primary/Secondary)
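As a quick illustration of cloned and multi-state resources (a minimal sketch; the ping resource and its host_list address are hypothetical placeholders, and pgsql refers to the database resource built later in this deck):

[root@zabbix-svr01 ~]# pcs resource create ping ocf:pacemaker:ping host_list=192.168.0.1 --clone   # run one copy on every node
[root@zabbix-svr01 ~]# pcs resource master pgsql-ha pgsql                                          # wrap an existing resource as Master/Slave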
High-Availability in the Open Source Ecosystem
• Late 1990s : two completely independent attempts to build an open-source high-availability platform begin - SUSE's "Linux HA" project and Red Hat's "Cluster Services"
• 1998 : the Linux-HA project introduces a new protocol called "Heartbeat"; heartbeat v1.0 is released later
• 2002 : Red Hat ships "Red Hat Cluster Manager" version 1
• 2003 : SUSE's Lars Marowsky-Brée conceives of a new project called the "crm"
• 2004 : Novell and Red Hat developers attend the Cluster Summit together
• 2005 : "Heartbeat version 2" released (Linux-HA)
• 2007 : the part of the Heartbeat v2.1.3 package called "Pacemaker" appears
• 2008 : Pacemaker version 0.6.0 released, with support for OpenAIS
• 2009 : the new "Corosync" project is announced
• 2010 : Pacemaker version 1.1 (Red Hat); Pacemaker adds support for cman; the Heartbeat project reaches version 3
• 2018 : Pacemaker v2.0 released (ClusterLabs)
• 2019 : Pacemaker version 2.0 (Red Hat)
• Coverage keeps expanding through technology agreements among global vendors
• Today, ClusterLabs is rapidly integrating and evolving the components created in the Heartbeat project together with other solutions
OpenSource Project Progress

(Diagram: upstream projects and the vendor stacks they feed, layered as Resources Layer / Resource Allocation Layer / Messaging-Infrastructure Layer.)
• Linux-HA / ClusterLabs (community developers) : Pacemaker, resource-agents, fence-agents, corosync, Heartbeat, cluster-glue, booth - the upstream releases
• SUSE Enterprise Linux (Novell developers) : Pacemaker on corosync, with crmsh (CLI), Hawk (GUI), Pacemaker-mgmt, and booth
• Red Hat Enterprise Linux (Red Hat developers) : Pacemaker on corosync, with PCS (CLI) and PCSD (GUI)
“Mission Critical Linux”
Pacemaker - Architecture Component

• Resource Agents : agent scripts (Open Cluster Framework)
• Pacemaker (resource management) : LRMd, STONITHd, CRMd, CIB, PEngine
• Cluster Abstraction Layer
• Corosync : membership, messaging, quorum
Pacemaker - High level architecture

(Diagram: two Pacemaker nodes; the CIB is replicated between them as XML over Corosync.)
• Resources Layer : Resource Agents (RAs)
• Resource Allocation Layer : Cluster Resource Manager (CRM), Local Resource Manager (LRM), Policy Engine (PE), Cluster Information Base (CIB, replicated)
• Messaging / Infrastructure Layer : Corosync and the Corosync services
Quick Overview of Components - CRMd

■ CRMd (Cluster Resource Management daemon)
• Acts as the main controlling process
• The daemon that routes all resource operations
• Handles every action performed within the Resource Allocation Layer
• Manages the Cluster Information Base (CIB)
• Resources managed by CRMd are passed to client systems, queried, instantiated, or modified as needed
Quick Overview of Components - CIB
■ CIB (Cluster Information Base)
• The configuration-management daemon; the configuration lives in an XML file (in-memory data)
• Synchronizes each node's configuration and status information, as provided by the DC (Designated Co-ordinator)
• The CIB can be changed with the cibadmin command, or through the crm shell or the pcs utility
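For example, the live CIB can be inspected or dumped like this (a small sketch; the actual XML is cluster-specific):

[root@zabbix-svr01 ~]# cibadmin --query --scope resources    # print only the resources section of the CIB
[root@zabbix-svr01 ~]# pcs cluster cib > cib.xml             # pcs equivalent: dump the whole CIB to a file
[root@zabbix-svr01 ~]# pcs cluster cib-push cib.xml          # push an edited copy back into the cluster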
Quick Overview of Components - PEngine
■ PEngine (PE, or Policy Engine)
• Decides the next cluster state based on the current state and configuration
• The PE process runs on every node but is active only on the DC[1]
• Applies user-requested policies - clones, domains, and so on - across different service environments
• Verifies dependencies when resources fail over to another cluster node
[1] DC (Designated Controller): executes the PE's instructions in the required order by relaying them through the cluster messaging infrastructure to the Local Resource Management daemon (LRMd) on other nodes or to CRMd peers.
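The scheduler's decisions can also be previewed without acting on them, using Pacemaker's crm_simulate tool (illustrative; shipped with the pacemaker-cli tools):

[root@zabbix-svr01 ~]# crm_simulate --live-check    # show what the PE would do against the live CIB, without executing anything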
Quick Overview of Components - LRMd
■ LRMd (Local Resource Management daemon)
• Acts as the interface between CRMd and each resource, passing CRMd's commands on to the agents
• Calls its own Resource Agents (RAs) on behalf of the CRM
• Runs start / stop / monitor actions as instructed and reports the results back to the CRM
Quick Overview of Components - RAs
■ RAs (Resource Agents)
• A standardized interface defined for cluster resources
• Provide the scripts that start / stop / monitor a local resource
• Resource Agents (RAs) are invoked by the LRM
• Contributed by many developers and distributed via GitHub so they can be applied to many application environments

■ Three RA types supported by Pacemaker:
• LSB : Linux Standard Base "init scripts"
• OCF : Open Cluster Framework
- /usr/lib/ocf/resource.d/heartbeat
- /usr/lib/ocf/resource.d/pacemaker
• Stonith Resource Agents
http://linux-ha.org/wiki/OCF_Resource_Agent
http://linux-ha.org/wiki/LSB_Resource_Agents
https://github.com/ClusterLabs/resource-agents
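To make the contract concrete, here is a minimal OCF-style agent skeleton (illustrative only - "myapp" is a hypothetical service; real agents in resource-agents also implement meta-data XML, validate-all, and the full OCF exit-code set):

#!/bin/sh
# Minimal OCF-style resource agent sketch for a hypothetical "myapp" service.
case "$1" in
  start)   /usr/sbin/myapp --daemon; exit $? ;;                         # 0 = OCF_SUCCESS
  stop)    pkill -f /usr/sbin/myapp; exit 0 ;;
  monitor) pgrep -f /usr/sbin/myapp >/dev/null && exit 0 || exit 7 ;;   # 7 = OCF_NOT_RUNNING
  *)       exit 3 ;;                                                    # 3 = OCF_ERR_UNIMPLEMENTED
esac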
Quick Overview of Components - STONITHD
■ STONITHD - "Shoot The Other Node In The Head Daemon"
• The service daemon used to fence nodes
• Application-level fencing can be configured

■ Fence devices most often used in practice:
• Power fencing : HP iLO, Dell DRAC, IBM IMM, IPMI appliances, etc.
• I/O fence agents : Fibre Channel switch fencing; software-based SBD (the most common choice in the SUSE ecosystem)
• Listing fence device options : # ccs -h <host> --lsfenceopts

■ Absolutely required for data integrity:
• The safest way to hand resources over to another node in the cluster
• In an enterprise-grade Linux HA cluster, fencing is a requirement, not an option
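On the Pacemaker/pcs side, the corresponding way to discover fence agents and their options is (assuming fence-agents-all is installed, as in the build later in this deck):

[root@zabbix-svr01 ~]# pcs stonith list                      # list every installed fence agent
[root@zabbix-svr01 ~]# pcs stonith describe fence_ipmilan    # show the parameters one agent accepts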
What is fencing?

A mechanism to protect data from, and guard against, 'planned or unplanned' system downtime such as:
Kernel panic
System freeze
Live hang / recovery
Quick Overview of Components - Corosync
■ Corosync
• The base cluster infrastructure that Pacemaker needs in order to operate
• An open-source group messaging system used in general clustering, cloud computing, and high-availability environments

■ Communication layer : messaging and membership
• Totem single-ring ordering and membership protocol
• Basic constraint : prefers multicast (or broadcast) communication
• Networking over UDP/IP and InfiniBand
• UDPU, unicast UDP (supported from CentOS 6.2 onward)

■ Supports cluster filesystems (GFS2, OCFS2, cLVM2, etc.)
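For reference, the totem/quorum core of a two-node corosync.conf might look like this (a sketch only; node names match this deck's examples, and udpu is the choice when multicast is unavailable):

totem {
    version: 2
    cluster_name: zabbix-cluster
    transport: udpu                # unicast UDP instead of multicast
}

nodelist {
    node {
        ring0_addr: cluster-node1
        nodeid: 1
    }
    node {
        ring0_addr: cluster-node2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1                    # relax quorum rules for a 2-node cluster
}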
Quick Overview of Components - User Interface

■ High-availability management
• The Pacemaker configuration system is provided as a unified cluster configuration and management tool
• crm shell : Cluster Resource Manager Shell (SLES)
• pcs : Pacemaker Cluster System (Red Hat)

■ Benefits
• Clusters can be bootstrapped easily, getting a first cluster up and running quickly
• Add, remove, and change resources and their interdependencies
• Set and verify detailed cluster options online
pcsd web UI (Red Hat)
Hawk web UI (SLES)
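Creating the same virtual-IP resource in each tool looks roughly like this (illustrative; the IP is the one used later in this deck):

pcs   : pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.0.50 op monitor interval=5s
crmsh : crm configure primitive vip ocf:heartbeat:IPaddr2 params ip=192.168.0.50 op monitor interval=5s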
USE CASE EXAMPLE

Requirements for adopting a redundancy solution for Zabbix high availability:
• System software whose stability is proven, as part of the standard infrastructure TA domain
• Mid- to long-term cost efficiency of adoption
• An open-source-based redundancy solution backed by in-house engineering capability
• Applicability to a wide range of operating environments
Architecture Design Model

• FrontEnd (WEB, DMZ) : Active / Standby with a Virtual IP - CentOS 7 Update 4, Pacemaker 1.1, Apache HTTP Server 2.4, Zabbix 4
• Database (PRIVATE) : Active / Standby with a Virtual IP and shared volumes - CentOS 7 Update 4, Pacemaker 1.1, PostgreSQL 10
• CLIENTS : Windows, Linux, Unix, Appliance, etc.; Network (Router) Appliance
• INFRA : Virtual Machine, Cloud (Instance, Container), Dedicated (Legacy)
Detail : Zabbix Server HA

Virtual IP : 192.168.0.50
• Active node (192.168.0.51) - Hostname : zabbix-svr01 / Clustername : cluster-node1 / fence device : fence_sbd
• Standby node (192.168.0.52) - Hostname : zabbix-svr02 / Clustername : cluster-node2 / fence device : fence_sbd
• On both nodes : CentOS 7.4, Pacemaker 1.1, Apache 2.4, Zabbix Server 4, SBD
• iSCSI shared storage : a shared SCSI device dedicated to SBD (1 GB or larger)
On All nodes: Install HA Component

1. Install the Pacemaker packages
[root@zabbix-svr01 ~]# yum -y install pcs pacemaker fence-agents-all sbd watchdog
~~~ output omitted ~~~
---> Package fence-agents-vmware-soap.x86_64 0:4.2.1-11.el7 will be installed
--> Processing Dependency: python-suds for package: fence-agents-vmware-soap-4.2.1-11.el7.x86_64
---> Package fence-agents-wti.x86_64 0:4.2.1-11.el7 will be installed
---> Package fence-virt.x86_64 0:0.3.2-13.el7 will be installed
--> Processing Dependency: firewalld-filesystem for package: fence-virt-0.3.2-13.el7.x86_64
---> Package gnutls.x86_64 0:3.3.29-8.el7 will be installed
--> Processing Dependency: trousers >= 0.3.11.2 for package: gnutls-3.3.29-8.el7.x86_64
--> Processing Dependency: libnettle.so.4()(64bit) for package: gnutls-3.3.29-8.el7.x86_64
~~~ output omitted ~~~
2. Check the installed Pacemaker packages
[root@zabbix-svr01 ~]# rpm -qa | egrep -i '^pcs|^pacemaker|^fence-agents-all|^corosync|^sbd|^watchdog'
pcs-0.9.162-5.el7.x86_64
pacemaker-1.1.18-11.el7.x86_64
corosync-2.4.3-2.el7.x86_64
sbd-1.3.1-7.el7.x86_64
watchdog-5.13-11.el7.x86_64

3. Create the cluster user
[root@zabbix-svr01 ~]# echo <Cluster Password> | passwd --stdin hacluster
On node1: Cluster Setup

4. Start the PCS daemon (Pacemaker Cluster Service)
[root@zabbix-svr01 ~]# systemctl start pcsd; systemctl enable pcsd
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.

5. Authenticate the cluster nodes
[root@zabbix-svr01 ~]# pcs cluster auth cluster-node1 cluster-node2
Username: hacluster
Password: <Cluster_Password>
cluster-node1: Authorized
cluster-node2: Authorized

6. Create the Zabbix cluster
[root@zabbix-svr01 ~]# pcs cluster setup --name zabbix-cluster cluster-node1 cluster-node2

7. Start the cluster on all nodes
[root@zabbix-svr01 ~]# pcs cluster start --all
cluster-node1: Starting Cluster…
cluster-node2: Starting Cluster…
On node1: Cluster Setup

8. Enable the cluster service on all nodes
[root@zabbix-svr01 ~]# pcs cluster enable --all

9. Check the cluster status
[root@zabbix-svr01 ~]# pcs status
Cluster name: zabbix-cluster
Stack: corosync
Current DC: cluster-node2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 28 13:38:00 2019
Last change: Mon Nov 25 19:00:27 2019 by hacluster via crmd on cluster-node1
2 nodes configured
Online: [ cluster-node1 cluster-node2 ]
Full list of resources:
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
On node1: Fencing Device

10. Check the disk block for the fencing device (SBD, Storage-Based Death)
[root@zabbix-svr01 ~]# lsscsi
[0:0:0:0] cd/dvd QEMU QEMU DVD-ROM 1.5. /dev/sr0
[5:0:0:0] disk LIO-ORG lun15 4.0 /dev/sda
[root@zabbix-svr01 ~]# cd /dev/disk/by-id/
[root@zabbix-svr01 by-id]# ll
total 0
~~~ output omitted ~~~
lrwxrwxrwx 1 root root 9 Nov 28 13:44 scsi-3600140595dea00f1f1d492499f682780 -> ../../sda
lrwxrwxrwx 1 root root 9 Nov 28 13:44 wwn-0x600140595dea00f1f1d492499f682780 -> ../../sda

11. Create the SBD device
[root@zabbix-svr01 ~]# sbd -d /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 create
[root@zabbix-svr01 ~]# sbd -d /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 dump
==Dumping header on disk /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780
~~~ output omitted ~~~
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 10
==Header on disk /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 is dumped
[root@zabbix-cluster1 by-id]# sbd -d /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 list
0 cluster-node1 clear
1 cluster-node2 clear
On node1: Fencing Device

12. Load the watchdog module
[root@zabbix-svr01 ~]# cat /etc/modules-load.d/softdog.conf
[root@zabbix-svr01 ~]# modprobe -v softdog
[root@zabbix-svr01 ~]# lsmod | grep softdog
[root@zabbix-svr01 ~]# ls -al /dev/ | grep -i watchdog
crw------- 1 root root 10, 130 Dec 2 10:06 watchdog
crw------- 1 root root 252, 0 Dec 2 10:06 watchdog0

13. Configure fence sbd
[root@zabbix-svr01 ~]# pcs stonith sbd device setup --device=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780
[root@zabbix-svr01 ~]# pcs stonith sbd enable --device=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 --watchdog=/dev/watchdog0@cluster-node1 --watchdog=/dev/watchdog0@cluster-node2 SBD_WATCHDOG_TIMEOUT=10
[root@zabbix-svr01 ~]# pcs property set stonith-watchdog-timeout=10
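Before trusting SBD in production it is worth checking its state and, in a test window only, fencing the peer on purpose (a disruptive sketch, not part of the original runbook):

[root@zabbix-svr01 ~]# pcs stonith sbd status            # is SBD enabled and running on each node?
[root@zabbix-svr01 ~]# pcs stonith fence cluster-node2   # deliberately fence the standby and watch it reboot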
On node1: Pacemaker Resource Setup

14. Create the fence_sbd stonith resource
[root@zabbix-svr01 ~]# pcs stonith create sbd fence_sbd devices=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780

15. Add the Apache HTTP server to group zabbix-svc
[root@zabbix-svr01 ~]# pcs resource create httpd systemd:httpd op monitor interval=10s --group zabbix-svc

16. Control the zabbix-server daemon
[root@zabbix-svr01 ~]# pcs resource create zabbix-server systemd:zabbix-server op monitor interval=10s --group zabbix-svc

17. Add a VIP for the Zabbix server application
[root@zabbix-svr01 ~]# pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.0.50 op monitor interval=5s --group zabbix-svc

18. Prevent resources from moving back after recovery
[root@zabbix-svr01 ~]# pcs resource defaults resource-stickiness=100
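A group implies ordering and colocation: members start in the listed order and always run on the same node. Had standalone resources been used instead, the equivalent explicit constraints would look like this (illustrative):

[root@zabbix-svr01 ~]# pcs constraint colocation add zabbix-server with httpd INFINITY
[root@zabbix-svr01 ~]# pcs constraint order httpd then zabbix-server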
On node1: Pacemaker Status

19. Pacemaker status
[root@zabbix-svr01 ~]# pcs status
Cluster name: zabbix-cluster
Stack: corosync
Current DC: cluster-node1 (version 1.1.18-11.el7_4.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 28 13:38:00 2019
Last change: Mon Nov 25 19:00:27 2019 by hacluster via crmd on cluster-node1
2 nodes configured
4 resources configured
Online: [ cluster-node1 cluster-node2 ]
Full list of resources:
sbd (stonith:fence_sbd): Started cluster-node1
vip (ocf::heartbeat:IPaddr2): Started cluster-node1
Resource Group: zabbix-svc
httpd (systemd:httpd): Started cluster-node1
zabbix-server (systemd:zabbix-server): Started cluster-node1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
sbd: active/enabled
On node1: Pacemaker Configure

20. Show the Pacemaker configuration
[root@zabbix-svr01 ~]# pcs config show --all
Cluster Name: zabbix-cluster
Corosync Nodes:
cluster-node1 cluster-node2
Pacemaker Nodes:
cluster-node1 cluster-node2
Resources:
Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.0.50
Operations: monitor interval=5s timeout=20s (vip-monitor-interval-10s)
start interval=0s timeout=20s (vip-start-interval-0s)
stop interval=0s timeout=20s (vip-stop-interval-0s)
Group: zabbix-svc
Resource: httpd (class=systemd type=httpd)
Operations: monitor interval=10 timeout=100 (web-monitor-interval-60)
start interval=0s timeout=100 (web-start-interval-0s)
stop interval=0s timeout=100 (web-stop-interval-0s)
Resource: zabbix-server (class=systemd type=zabbix-server)
Operations: monitor interval=10 timeout=100 (zabbix-server-monitor-interval-60)
start interval=0s timeout=100 (zabbix-server-start-interval-0s)
stop interval=0s timeout=100 (zabbix-server-stop-interval-0s)
Stonith Devices:
Resource: sbd (class=stonith type=fence_sbd)
Attributes: devices=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780
Operations: monitor interval=60s (sbd-monitor-interval-60s)
Fencing Levels:
~~~ output omitted ~~~
On Active node: Service Check

21. Check the virtual IP address
[root@postgres-cluster2 ~]# ip addr show | grep secondary
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:28:33:5a brd ff:ff:ff:ff:ff:ff
inet 192.168.0.51/24 brd 192.168.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.0.50/24 brd 192.168.0.255 scope global secondary eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe28:335a/64 scope link
valid_lft forever preferred_lft forever
22. Check the HTTP server service
[root@zabbix-cluster2 ~]# ps auxw | grep apache
apache 1388 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1389 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1390 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1391 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1392 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
root 18556 0.0 0.0 112648 920 pts/0 R+ 16:36 0:00 grep --color=auto apache
23. Check the Zabbix application service
[root@zabbix-cluster2 ~]# ps auxw | grep zabbix
zabbix 1410 0.0 0.0 178004 3520 ? S 13:37 0:00 /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf
zabbix 1411 0.0 0.0 178052 2928 ? S 13:37 0:00 /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.024625 sec, idle 60 sec]
zabbix 1412 0.0 0.0 178004 2264 ? S 13:37 0:00 /usr/sbin/zabbix_server: alerter #1 started
zabbix 1413 0.0 0.0 178004 2264 ? S 13:37 0:00 /usr/sbin/zabbix_server: alerter #2 started
zabbix 1414 0.0 0.0 178004 2264 ? S 13:37 0:00 /usr/sbin/zabbix_server: alerter #3 started
~~~ output omitted ~~~
Detail : Database Server HA

Virtual IP : 192.168.0.60
• Active node (192.168.0.61) - Hostname : postgres-svr01 / Clustername : cluster-node1 / fence device : fence_sbd
• Standby node (192.168.0.62) - Hostname : postgres-svr02 / Clustername : cluster-node2 / fence device : fence_sbd
• On both nodes : CentOS 7.4, Pacemaker 1.1, PostgreSQL 10, XFS filesystems on LVM, VIP, SBD
• iSCSI shared storage : a shared SCSI device dedicated to SBD (1 GB or larger), plus the PostgreSQL area (Archive / Log / Data) on LVM
On node1: PostgreSQL Install and Setup (same as pages 19-23)

1. Zabbix-specific PostgreSQL settings and tuning
[root@postgres-svr1 ~]# cat /data/zabbix/postgresql.conf
listen_addresses = '*'
log_destination = 'stderr'
logging_collector = on
wal_level = logical
archive_mode = on
#archive_command = 'dd conv=fdatasync bs=256k if=%p of=/archive/temp/%f && mv -vf /archive/temp/%f /archive/zabbix'
archive_command = 'true'
log_min_duration_statement = 2000
log_line_prefix = '%t %u@%r/%d (%p) '
log_statement = 'ddl'
shared_preload_libraries = '$libdir/pg_stat_statements,$libdir/auto_explain,$libdir/passwordcheck'
track_functions = all
track_activity_query_size = 65536
pg_stat_statements.max = 10000
pg_stat_statements.track = all
auto_explain.log_min_duration = '5min'
shared_buffers = 1011406kB
autovacuum_max_workers = 5
max_replication_slots = 3
hot_standby = on
max_wal_senders = 2
max_wal_size = 2GB
min_wal_size = 2GB
log_temp_files = 1024kB
max_connections = 200
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
temp_file_limit = 100GB
autovacuum_work_mem = 287MB
On node1: Pacemaker Resource Setup

2. Create the fence_sbd stonith resource
[root@postgres-svr1 ~]# pcs stonith create sbd fence_sbd devices=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780

3. Prevent resources from moving back after recovery
[root@postgres-svr1 ~]# pcs resource defaults resource-stickiness=100

4. Add the LVM / XFS filesystems to group postgres-svc
[root@postgres-svr1 ~]# pcs resource create lvm LVM volgrpname=VG01 exclusive=true --group postgres-svc
[root@postgres-svr1 ~]# pcs resource create archive Filesystem device="/dev/VG01/archive" directory="/archive" fstype="xfs" options="noatime,nodiratime,nobarrier" op monitor interval=20s --group postgres-svc
[root@postgres-svr1 ~]# pcs resource create pg_wal Filesystem device="/dev/VG02/pg_wal" directory="/pg_wal" fstype="xfs" options="noatime,nodiratime,nobarrier" op monitor interval=20s --group postgres-svc
[root@postgres-svr1 ~]# pcs resource create data Filesystem device="/dev/VG03/data" directory="/data" fstype="xfs" options="noatime,nodiratime,nobarrier" op monitor interval=20s --group postgres-svc

5. Control the PostgreSQL service
[root@postgres-svr1 ~]# pcs resource create pgsql ocf:heartbeat:pgsql pgctl=/postgres/10/bin/pg_ctl psql=/postgres/10/bin/psql pgdata=/data/zabbix config=/data/zabbix/postgresql.conf op monitor interval=10s --group postgres-svc

6. Add a VIP for the PostgreSQL database
[root@postgres-svr1 ~]# pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.0.60 op monitor interval=5s --group postgres-svc
On node1: Pacemaker Status

7. Pacemaker status
[root@postgres-svr1 ~]# pcs status
Cluster name: postgres-cluster
Stack: corosync
Current DC: cluster-node2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 28 16:36:25 2019
Last change: Wed Nov 20 14:40:03 2019 by hacluster via cibadmin on cluster-node1
2 nodes configured
7 resources configured
Online: [ cluster-node1 cluster-node2 ]
Full list of resources:
sbd (stonith:fence_sbd): Started cluster-node1
Resource Group: postgres-svc
vip (ocf::heartbeat:IPaddr2): Started cluster-node2
lvm (ocf::heartbeat:LVM): Started cluster-node2
pg_wal (ocf::heartbeat:Filesystem): Started cluster-node2
archive (ocf::heartbeat:Filesystem): Started cluster-node2
data (ocf::heartbeat:Filesystem): Started cluster-node2
pgsql (ocf::heartbeat:pgsql): Started cluster-node2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
sbd: active/enabled
On node1: Pacemaker Configure

8. Show the Pacemaker configuration
[root@postgres-svr1 ~]# pcs config show
~~~ output omitted ~~~
Resources:
Group: postgres-svc
Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.0.60
Operations: monitor interval=5s timeout=20s (vip-monitor-interval-10s)
start interval=0s timeout=20s (vip-start-interval-0s)
stop interval=0s timeout=20s (vip-stop-interval-0s)
Resource: pg_wal (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/VG02/pg_wal directory=/pg_wal fstype=xfs
Operations: monitor interval=20 timeout=40 (pg_wal-monitor-interval-20)
notify interval=0s timeout=60 (pg_wal-notify-interval-0s)
start interval=0s timeout=60 (pg_wal-start-interval-0s)
stop interval=0s timeout=60 (pg_wal-stop-interval-0s)
Resource: archive (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/VG01/archive directory=/archive fstype=xfs
Operations: monitor interval=20 timeout=40 (archive-monitor-interval-20)
notify interval=0s timeout=60 (archive-notify-interval-0s)
start interval=0s timeout=60 (archive-start-interval-0s)
stop interval=0s timeout=60 (archive-stop-interval-0s)
Resource: data (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/VG03/data directory=/data fstype=xfs
Operations: monitor interval=20 timeout=40 (data-monitor-interval-20)
notify interval=0s timeout=60 (data-notify-interval-0s)
start interval=0s timeout=60 (data-start-interval-0s)
stop interval=0s timeout=60 (data-stop-interval-0s)
Resource: pgsql (class=ocf provider=heartbeat type=pgsql)
Attributes: config=/data/zabbix/postgresql.conf pgctl=/postgres/10/bin/pg_ctl pgdata=/data/zabbix psql=/postgres/10/bin/psql restart_on_promote=true
Meta Attrs: migration-threshold=3
Operations: demote interval=0s timeout=120 (pgsql-demote-interval-0s)
methods interval=0s timeout=5 (pgsql-methods-interval-0s)
monitor interval=30 timeout=30 (pgsql-monitor-interval-30)
monitor interval=29 role=Master timeout=30 (pgsql-monitor-interval-29)
notify interval=0s timeout=90 (pgsql-notify-interval-0s)
promote interval=0s timeout=120 (pgsql-promote-interval-0s)
start interval=0s timeout=120 (pgsql-start-interval-0s)
stop interval=0s timeout=120 (pgsql-stop-interval-0s)
~~~ output omitted ~~~
On Active node: Service Check

9. Check the virtual IP address
[root@postgres-svr1 ~]# ip addr show | grep secondary
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:28:33:5a brd ff:ff:ff:ff:ff:ff
inet 192.168.0.61/24 brd 192.168.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.0.60/24 brd 192.168.0.255 scope global secondary eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe28:335a/64 scope link
valid_lft forever preferred_lft forever
10. Check the PostgreSQL service
[root@postgres-svr1 ~]# ps auxw | grep postgres
postgres 2603 0.0 0.1 1180040 4980 ? S 09:45 0:01 /postgres/10/bin/postgres -D /data/zabbix -c config_file=/data/zabbix/postgresql.conf
postgres 2604 0.0 0.0 96560 1648 ? Ss 09:45 0:00 postgres: logger process
postgres 2606 0.0 0.0 1180188 2312 ? Ss 09:45 0:00 postgres: checkpointer process
postgres 2607 0.0 0.0 1180180 1820 ? Ss 09:45 0:00 postgres: writer process
postgres 2608 0.0 0.0 1180040 1820 ? Ss 09:45 0:01 postgres: wal writer process
~~~ output omitted ~~~

11. Check the filesystem mounts
[root@postgres-svr1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
~~~ output omitted ~~~
/dev/sdb 100G 65M 100G 1% /postgres
/dev/mapper/VG02-pg_wal 50G 2.1G 48G 5% /pg_wal
/dev/mapper/VG01-archive 50G 33M 50G 1% /archive
/dev/mapper/VG03-data 50G 1.6G 49G 4% /data
Failover Scenario - Primary Node Down

When an HW / SW failure takes down the active node, Pacemaker restarts each resource group on the standby node, while clients keep using the same virtual IPs:
• FrontEnd resource group : Virtual IP, Apache HTTP Server, Zabbix Server
• Database resource group : Virtual IP, Filesystem (shared volumes), PostgreSQL
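One way to rehearse this scenario (a sketch; both steps are disruptive and belong in a test window) is to drain the active node gracefully, then crash it outright:

[root@zabbix-svr01 ~]# pcs cluster standby cluster-node1     # drain node1; the group restarts on cluster-node2
[root@zabbix-svr01 ~]# pcs cluster unstandby cluster-node1
[root@zabbix-svr01 ~]# echo c > /proc/sysrq-trigger          # hard test: trigger a kernel panic that SBD/watchdog must handle (requires sysrq enabled)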
General Features - Pacemaker 2.0

Pacemaker 2.0 renames the internal daemons:

Previous name       | Current name         | Purpose
attrd               | pacemaker-attrd      | Node attribute management
cib                 | pacemaker-based      | Cluster information (CIB) management
crmd                | pacemaker-controld   | Cluster coordination
lrmd                | pacemaker-execd      | Local resource agent execution
stonithd            | pacemaker-fenced     | Node fencing
pacemaker_remoted   | pacemaker-remoted    | Remote resource agent execution
pengine             | pacemaker-schedulerd | Scheduling

(Diagram: on each node, pacemakerd supervises pacemaker-based, pacemaker-controld, pacemaker-execd, pacemaker-fenced, pacemaker-schedulerd, and pacemaker-attrd, on top of corosync's cpg (Cluster Process Group) and quorum services; Node 1 and Node 2 communicate over the network.)
General Features - Kronosnet (KNET)

■ What is Kronosnet?
• https://www.kronosnet.org
• A library that secures availability for network transport
• A network abstraction layer designed for high-availability use cases where redundancy, security, fault tolerance, and fast failover are the core requirements

■ Project Features
• Allows up to 8 separate network links per host, giving network communication between cluster nodes high availability
• Automatically restores a failed link once communication recovers, accelerating disaster recovery and shortening downtime
• Supports multiple network protocols (UDP/SCTP)
• Works across subnets, passing through firewalls

(Diagram: Pacemaker on Corosync; the totem stack - totempg, totemsrp, totemnet/totemknet - uses libknet to drive NIC 1 and NIC 2.)
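With Corosync 3, enabling Kronosnet with a second link is a corosync.conf change along these lines (a sketch assuming Corosync 3.x; the second-ring addresses are hypothetical):

totem {
    version: 2
    transport: knet
    link_mode: passive             # use the backup link only when the primary fails
}

nodelist {
    node {
        ring0_addr: 192.168.0.51   # primary link
        ring1_addr: 10.0.0.51      # hypothetical second link
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.0.52
        ring1_addr: 10.0.0.52
        nodeid: 2
    }
}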
At the end of….

What do you think high availability really is? And how have you been thinking about it?
No solution can guarantee you 99.9% availability.
There is no such thing as a universal Best Practice.
Only: Test! Test! TEST!!!
REFERENCE
Everything in the whole stack is *open-source*. Go googling…
Configuring the Red Hat High Availability Add-On with Pacemaker
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html-single/Configuring_the_Red_Hat_High_Availability_Add-
On_with_Pacemaker/index.html
SAP on Red Hat Technical Documents
http://www.redhat.com/f/pdf/ha-sap-v1-6-4.pdf
Red Hat Reference Architecture Series
http://www.sistina.com/rhel/resource_center/reference_architecture.html
Clusterlabs
http://clusterlabs.org/doc/
http://blog.clusterlabs.org/
OpenStack HA
http://www.slideshare.net/kenhui65/openstack-ha
High Availability on Linux - the SUSE way
https://tuna.moe/assets/slides/SFD2015/SUSE_HA_arch_overview.pdf
Github ClusterLabs (Booth)
https://github.com/ClusterLabs/booth
Dong hyun Kim | Manager
Opensource Business Team, kt ds
kim.donghyun@kt.com
Weitere ähnliche Inhalte

Was ist angesagt?

가상화 기술과 컨테이너 기술의 차이점과 기대 효과
가상화 기술과 컨테이너 기술의 차이점과 기대 효과가상화 기술과 컨테이너 기술의 차이점과 기대 효과
가상화 기술과 컨테이너 기술의 차이점과 기대 효과Opennaru, inc.
 
CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드
CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드
CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드Opennaru, inc.
 
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)충섭 김
 
왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항
왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항
왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항rockplace
 
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요Jo Hoon
 
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항Ji-Woong Choi
 
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법Open Source Consulting
 
Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016
Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016
Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016문기 박
 
[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험NHN FORWARD
 
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/NeutronOverview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/Neutronvivekkonnect
 
eBPF - Observability In Deep
eBPF - Observability In DeepeBPF - Observability In Deep
eBPF - Observability In DeepMydbops
 
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트) 마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트) Amazon Web Services Korea
 
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교 및 구축 방법
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교  및 구축 방법[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교  및 구축 방법
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교 및 구축 방법Open Source Consulting
 
잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다
잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다
잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다Arawn Park
 
Introduction to Nexus Repository Manager.pdf
Introduction to Nexus Repository Manager.pdfIntroduction to Nexus Repository Manager.pdf
Introduction to Nexus Repository Manager.pdfKnoldus Inc.
 
우아한 모노리스
우아한 모노리스우아한 모노리스
우아한 모노리스Arawn Park
 
SDN입문 (Overlay and Underlay)
SDN입문 (Overlay and Underlay)SDN입문 (Overlay and Underlay)
SDN입문 (Overlay and Underlay)NAIM Networks, Inc.
 
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항Opennaru, inc.
 
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개OpenStack Korea Community
 

Was ist angesagt? (20)

Kubernetes Basics
Kubernetes BasicsKubernetes Basics
Kubernetes Basics
 
가상화 기술과 컨테이너 기술의 차이점과 기대 효과
가상화 기술과 컨테이너 기술의 차이점과 기대 효과가상화 기술과 컨테이너 기술의 차이점과 기대 효과
가상화 기술과 컨테이너 기술의 차이점과 기대 효과
 
CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드
CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드
CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드
 
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
 
왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항
왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항
왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항
 
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
 
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
 
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
 
Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016
Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016
Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016
 
[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험
 
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/NeutronOverview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
 
eBPF - Observability In Deep
eBPF - Observability In DeepeBPF - Observability In Deep
eBPF - Observability In Deep
 
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트) 마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
 
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교 및 구축 방법
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교  및 구축 방법[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교  및 구축 방법
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교 및 구축 방법
 
잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다
잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다
잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다
 
Introduction to Nexus Repository Manager.pdf
Introduction to Nexus Repository Manager.pdfIntroduction to Nexus Repository Manager.pdf
Introduction to Nexus Repository Manager.pdf
 
우아한 모노리스
우아한 모노리스우아한 모노리스
우아한 모노리스
 
SDN입문 (Overlay and Underlay)
SDN입문 (Overlay and Underlay)SDN입문 (Overlay and Underlay)
SDN입문 (Overlay and Underlay)
 
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
 
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
 

Ähnlich wie [발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community)

Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015 Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015 Roger Zhou 周志强
 
2.1 Red_Hat_Cluster1.ppt
2.1 Red_Hat_Cluster1.ppt2.1 Red_Hat_Cluster1.ppt
2.1 Red_Hat_Cluster1.pptManoj603126
 
2008-09-09 IBM Interaction Conference, Red Hat Update for System z
2008-09-09 IBM Interaction Conference, Red Hat Update for System z2008-09-09 IBM Interaction Conference, Red Hat Update for System z
2008-09-09 IBM Interaction Conference, Red Hat Update for System zShawn Wells
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfhik_lhz
 
Fastsocket Linxiaofeng
Fastsocket LinxiaofengFastsocket Linxiaofeng
Fastsocket LinxiaofengMichael Zhang
 
RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)Sumant Garg
 
Using SoC Vendor HALs in the Zephyr Project - SFO17-112
Using SoC Vendor HALs in the Zephyr Project - SFO17-112Using SoC Vendor HALs in the Zephyr Project - SFO17-112
Using SoC Vendor HALs in the Zephyr Project - SFO17-112Linaro
 
Open stack ha design & deployment kilo
Open stack ha design & deployment   kiloOpen stack ha design & deployment   kilo
Open stack ha design & deployment kiloSteven Li
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HAtcp cloud
 
OpenStack High Availability
OpenStack High AvailabilityOpenStack High Availability
OpenStack High AvailabilityJakub Pavlik
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetesTed Jung
 
2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...
2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...
2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...Shawn Wells
 
Cloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarCloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarArun Kumar
 
Cloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarCloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarArun Kumar
 
Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBDDan Frincu
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container EcosystemVinay Rao
 
Linux sever building
Linux sever buildingLinux sever building
Linux sever buildingEdmond Yu
 

Ähnlich wie [발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community) (20)

Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015 Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015
 
2.1 Red_Hat_Cluster1.ppt
2.1 Red_Hat_Cluster1.ppt2.1 Red_Hat_Cluster1.ppt
2.1 Red_Hat_Cluster1.ppt
 
2008-09-09 IBM Interaction Conference, Red Hat Update for System z
2008-09-09 IBM Interaction Conference, Red Hat Update for System z2008-09-09 IBM Interaction Conference, Red Hat Update for System z
2008-09-09 IBM Interaction Conference, Red Hat Update for System z
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmf
 
Fastsocket Linxiaofeng
Fastsocket LinxiaofengFastsocket Linxiaofeng
Fastsocket Linxiaofeng
 
Building a Router
Building a RouterBuilding a Router
Building a Router
 
RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)
 
Rac on NFS
Rac on NFSRac on NFS
Rac on NFS
 
Using SoC Vendor HALs in the Zephyr Project - SFO17-112
Using SoC Vendor HALs in the Zephyr Project - SFO17-112Using SoC Vendor HALs in the Zephyr Project - SFO17-112
Using SoC Vendor HALs in the Zephyr Project - SFO17-112
 
Rhel7 vs rhel6
Rhel7 vs rhel6Rhel7 vs rhel6
Rhel7 vs rhel6
 
Open stack ha design & deployment kilo
Open stack ha design & deployment   kiloOpen stack ha design & deployment   kilo
Open stack ha design & deployment kilo
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HA
 
OpenStack High Availability
OpenStack High AvailabilityOpenStack High Availability
OpenStack High Availability
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetes
 
2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...
2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...
2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...
 
Cloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarCloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumar
 
Cloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarCloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumar
 
Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBD
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container Ecosystem
 
Linux sever building
Linux sever buildingLinux sever building
Linux sever building
 

Kürzlich hochgeladen

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Kürzlich hochgeladen (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

[발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community)

  • 1. 오픈소스 Pacemaker 홗용한 Zabbix 이중화 방안 Dong hyun Kim Opensource Business Team Enterprise Linux Senior Enginner kimdonghyun0916@gmail.com Korea Community
  • 2. # Whoami  Systems and Infrastructure Geek  Enterprise Linux Infrastructure Engineer (Red Hat)  Work • Technology Research : New Technology Research (Container, Openstack, Etc) • Technical Support : Troubleshooting, Debugging, Performace Tuning • Consulting : IT Downsizing, Infra Optimization  I love linux ♥ • Linux Blog: http://rhlinux.tistory.com/ • Red Hat Linux Engineer Group: http://cafe.naver.com/iamstrong • ClusterLabs Korea(Pacemaker): https://www.facebook.com/groups/clusterlabskorea 1
  • 3. 2 Pacemaker‟s Story - The Open Source, High Availability Cluster Overview of HA architectural components USE CASE EXAMPLE General futures ☞ ☞ ☞ ☞ # In this Session
  • 4. 3 The Open Source, High Availability Cluster
  • 5. 4 HA for OpenSource Technology
  • 6.  Pacemaker is : • LINUX PLATFORM을 위한 High-Availability와 Load-Balancing Stack 제공 • Python-based Unified, scriptable, cluster shell  클러스터 리소스 정책을 사용자가 직접 결정 : • Resource Agents 설정을 만들고 지우고 변경하는 것에 대한 자유로움 • 다양한 산업(공공, 증권/금융 등)군에서 사용 어플리케이션에서 요구하는 HA조건을 대체로 만족 • 리소스형태 fence agents 설정관리 용이 STONITH(Shoot The Other Node In The Head)  Monitor and Control Resource : • SystemD / LSB / OCF Services • Cloned Services : N+1, N+M, N nodes • Multi-state (Master/Slave, Primary/Secondary) 5 What is Pacemaker?
  • 7. High-Availability in the Open Source Ecosystem 6 • 2003, SUSE's Lars Marowsky-Brée conceived of a new project called the "crm" • 2009s, “Corosync” 새로운 Project 발표 • 2010s, Pacemaker version 1.1(Red Hat) • 2010s, Pacemaker added support for cman • 2010s. Heartbeat project reached version 3 • 2018s. Pacemaker v2.0 Release(ClusterLabs) • 2019s, Pacemaker version 2.0(Red Hat) • 2008, Pacemaker version 0.6.0 was release - support for OpenAIS • 2007, Pacemaker (Heartbeat v 2.1.3) - Heartbeat package called "Pacemaker“ • 2002s, REDHAT "Red Hat Cluster Manager" Version 1 • 1990s, 오픈소스 고가용성 플랫폼을 만들고자 완전히 독립된 두 회사의 시도는 1990년대 후반부터 시작 - SUSE's "Linux HA" project - Red Hat's “Cluster Services" • 1998s, "Heartbeat“ 불리우는 새로운 프로토콜 'Linux-HA'프로젝트,이후 heartbeat v1.0 발표 • Global Vendors 갂 기술 협약을 통해 적용범위 확대 • 오늘날, Clusterlabs는 Heartbeat Project 에서생성된 Component들과 다른 솔루션형태로 빠르게 통합 및 변화 • 2004s, Cluster Summit에 Novell와 Red Hat developers 함께 참석 • 2005s, "Heartbeat version 2“released(Linux-HA)
  • 8. 7 Linux-HA / ClusterLabsSUSE Enterprise Linux Red Hat Enterprise Linux Pacemaker-mgmt Hwak (GUI) booth crmsh (CLI) PacemakerPacemaker resource-agents Heartbeatcorosync cluster-glue Community Developer Novell Developer Red Hat Developer OpenSource Project Progress fence-agents PCSD (GUI) Pacemaker corosync Resources Layer ResourceAllocation Layer Messaging/ InfrastructureLayer PCS (CLI) Upstream Release UpstreamRelease booth
  • 10. 9 Resource Agents - Agent Scripts - Open Cluster Framework Resource Agents Pacemaker - Resource Management LRMd Stonith CRMd CIB PEngine Corosync - Membership - Messaging - Quorum Cluster Abstraction Layer Corosync Pacemaker - Architecture Component
  • 11. 10 Pacemaker - High level architecture Messaging / Infrastructure Layer Resource Allocation Layer Resources Layer XML XML Pacemaker Node #1 Corosync Cluster Resource Manager CRM Corosync Services Local Resource Manager LRM Policy Engine Cluster Information Base CIB (복제) Resource Agents RAs Pacemaker Node #2
  • 12. 11 Quick Overview of Components - CRMd  CRMd(Cluster Resource Management daemon) • main controlling process 역할 담당 • 모든 리소스 작업을 라우팅해주는 데몬 • Resource Allocation Layer내에서 수행되는 모든 동작 처리 • Cluster Information Base (CIB) 관리 • CRMd에 의해 관리된 리소스는 필요에 따라 클라이언트 시스템에 전달, 쿼리되거나 인스턴스화하여 변경 RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) CRM Resource Allocation Layer PELRM STONITH
  • 13. Quick Overview of Components - CIB  CIB (Cluster Information Base) • 설정 정보 관리 데몬. XML파일로 설정 (In-memory data) • DC(Designated Co-ordinator)에 의해 제공되는 각 노드별 설정내용 및 상태 정보를 동기화 • CIB 은 cibadmin 명령어를 사용하여 변경할수 있고, crm shell 또는 pcs utility 사용 RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) CRM Resource Allocation Layer PELRM 12 STONITH
  • 14. Quick Overview of Components - PEngine RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) CRM Resource Allocation Layer PELRM  PEngine (PE or Policy Engine) • 현재 클러스터 상태 및 구성을 기반으로 다음 상태를 결정 • PE프로세스는 각 노드에서 실행되지만, DC[1]에서만 홗성화 • 여러 서비스홖경에 따라 Clone 및 domain 등 사용자 요구에 따라 정책 부여 • 다른 클러스터 노드로 리소스 전홖시 의졲성 확인 13 STONITH [1] DC(Deginated Controller): 클러스터 메시징 인프라를 통해 다른 노드의 로컬 리소스 관리 데몬(LRMd) 또는 CRMd peer로 전달하여 필요한 순서로 PE의 instructions 수행
  • 15. Quick Overview of Components - LRMd RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) CRM Resource Allocation Layer PELRM  LRMd (Local Resource Management Daemon) • CRMd와 각 리소스 사이에 인터페이스 역할을 수행하며, CRMd의 명령을 agent에 전달 • CRM을 대싞하여 자기 자싞의 RAs(Resource Agents) 호출 • CRM수행되어 보고된 결과에 따라 start / stop / monitor를 동작 14 STONITH
  • 16. Quick Overview of Components - RAs RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) CRM Resource Allocation Layer PELRM  RAs (Resource Agents) • 클러스터리소스를 위해 정의된 규격화된 인터페이스 • local resource의 start / stops / monitors 스크립트 제공 • RAs(Resource Agents)는 LRM에 의해 호출 • 수많은 Contributer들이 여러 Application홖경에 적용될수 있도록 github 통해 배포  Pacemaker제공 RA 지원 타입 3가지: • LSB : Linux Standard Base “init scripts” • OCF : Open Cluster Framework - /usr/lib/ocf/resource.d/heartbeat - /usr/lib/ocf/resource.d/pacemaker • Stonith Resource Agents http://linux-ha.org/wiki/OCF_Resource_Agent http://linux-ha.org/wiki/LSB_Resource_Agents https://github.com/ClusterLabs/resource-agents 15 STONITH
  • 17. Quick Overview of Components - STONITHD RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) CRM Resource Allocation Layer PELRM  STONITHD “Shoot The Other Node In The Head Daemon” • fence node에서 사용되는 서비스 데몬 • Application-level fencing 설정 가능  실무에서 가장 많이 사용되는 fence device: • Power fencing: HP iLO, Dell DRAC, IBM IMM, IPMI Appliance 등 • I/O fence agents: Fibre Channel Switch fencing, 소프트웨어 기반의 SBD (SUSE진영 가장 많이 사용) • Listing Fence Device : # ccs -h <host> --lsfenceopts  Data integrity (데이터 무결성)을 위해 반드시 필요 • 클러스터내 다른 노드로 리소스를 전홖하기 위한 가장 최상의 방법 • “Enterprise”을 지향하는 Linux HA Cluster에서는 선택이 아닌 필수 16 STONITH
  • 18. 17 What is fencing? „Planned or Unplanned‟ 시스템 다운타임으로 부터 데이타보호하고 예방하기 위한 장치 Kernel panic System freeze Live hang / recovery
  • 19. Quick Overview of Components - Corosync RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) STONITH CRM Resource Allocation Layer PELRM  Corosync • Pacemaker 작동에 필요한 기본 클러스터 인프라 • 일반적인 클러스터, 클라우드컴퓨팅 그리고 고가용성 홖경에서 사용되는 오픈소스 그룹 메시징시스템.  Communication Layer : messaging and membership • Totem single-ring ordering and membership protocol • 기본적인 제약 조건 : 브로드캐스트를 통한 멀티캐스트 통싞 방식을 선호 • UDP/IP and InfiniBand 기반의 networks 통싞 • UDPU (CentOS 6.2+ 이상부터 지원)  클러스터 파일시스템 지원 (GFS2, OCFS2, cLVM2 등) 18
Quick Overview of Components - User Interface
 High-availability administration
• The Pacemaker configuration system is provided as a unified cluster configuration and management tool
• crm shell : Cluster Resource Manager Shell (SLES)
• pcs : Pacemaker/Corosync Configuration System (Red Hat)
 Benefits
• Bootstrapping is easy, so the first cluster can be brought up and running quickly
• Add, remove, and modify resources and their interdependencies
• Set and inspect detailed cluster options online
pcsd web UI (Red Hat) / Hawk web UI (SLES)
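The same operation maps cleanly between the two shells; for example, dumping the full cluster configuration (a sketch):

[root@zabbix-svr01 ~]# pcs config           # Red Hat (pcs)
crm(live)# configure show                   # SLES (crm shell)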
USE CASE EXAMPLE
ZABBIX: requirements for adopting an HA solution
• System software whose stability is proven within the infrastructure standard TA domain
• Cost efficiency of adoption over the mid to long term
• An open-source-based HA solution backed by in-house expertise
• Applicability to a variety of operating environments
Architecture Design Model
[Diagram: clients reach an Active/Standby web front end and an Active/Standby database tier, each behind its own Virtual IP; the database nodes share storage volumes; the front end sits in the DMZ, the database in the PRIVATE zone]
• FrontEnd : CentOS 7 Update 4, Pacemaker 1.1, Apache HTTP Server 2.4, Zabbix 4
• Database : CentOS 7 Update 4, Pacemaker 1.1, PostgreSQL 10
• CLIENTS : Windows, Linux, Unix, Appliance, etc. / Network (Router) Appliance
• INFRA : Virtual Machine / Cloud (Instance, Container) / Dedicated (Legacy)
Detail : Zabbix Server HA
• Node 1 - Hostname: zabbix-svr01, Clustername: cluster-node1, fence device: fence_sbd, IP 192.168.0.51 (Active)
• Node 2 - Hostname: zabbix-svr02, Clustername: cluster-node2, fence device: fence_sbd, IP 192.168.0.52 (Standby)
• Virtual IP : 192.168.0.50
• Stack on each node : CentOS 7.4, Pacemaker 1.1, Apache 2.4, Zabbix Server 4
• Shared SCSI device (1 GB or larger) on iSCSI shared storage, used for SBD
On All nodes: Install HA Component

1. Install the Pacemaker packages
[root@zabbix-svr01 ~]# yum -y install pcs pacemaker fence-agents-all sbd watchdog
~~~ (output truncated) ~~~
---> Package fence-agents-vmware-soap.x86_64 0:4.2.1-11.el7 will be installed
--> Processing Dependency: python-suds for package: fence-agents-vmware-soap-4.2.1-11.el7.x86_64
---> Package fence-agents-wti.x86_64 0:4.2.1-11.el7 will be installed
---> Package fence-virt.x86_64 0:0.3.2-13.el7 will be installed
--> Processing Dependency: firewalld-filesystem for package: fence-virt-0.3.2-13.el7.x86_64
---> Package gnutls.x86_64 0:3.3.29-8.el7 will be installed
--> Processing Dependency: trousers >= 0.3.11.2 for package: gnutls-3.3.29-8.el7.x86_64
--> Processing Dependency: libnettle.so.4()(64bit) for package: gnutls-3.3.29-8.el7.x86_64
~~~ (output truncated) ~~~

2. Check the Pacemaker packages
[root@zabbix-svr01 ~]# rpm -qa | egrep -i '^pcs|^pacemaker|^fence-agents-all|^corosync|^sbd|^watchdog'
pcs-0.9.162-5.el7.x86_64
pacemaker-1.1.18-11.el7.x86_64
corosync-2.4.3-2.el7.x86_64
sbd-1.3.1-7.el7.x86_64
watchdog-5.13-11.el7.x86_64

3. Create the cluster user
[root@zabbix-svr01 ~]# echo <Cluster Password> | passwd --stdin hacluster
On node1: Cluster Setup

4. Start the PCS daemon (Pacemaker Cluster Service)
[root@zabbix-svr01 ~]# systemctl start pcsd; systemctl enable pcsd
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service

5. Authenticate the cluster nodes
[root@zabbix-svr01 ~]# pcs cluster auth cluster-node1 cluster-node2
Username: hacluster
Password: <Cluster_Password>
cluster-node1: Authorized
cluster-node2: Authorized

6. Create the Zabbix cluster
[root@zabbix-svr01 ~]# pcs cluster setup --name zabbix-cluster cluster-node1 cluster-node2

7. Start the cluster on all nodes
[root@zabbix-svr01 ~]# pcs cluster start --all
cluster-node1: Starting Cluster...
cluster-node2: Starting Cluster...
On node1: Cluster Setup

8. Enable the cluster service on all nodes
[root@zabbix-svr01 ~]# pcs cluster enable --all

9. Check the cluster status
[root@zabbix-svr01 ~]# pcs status
Cluster name: zabbix-cluster
Stack: corosync
Current DC: cluster-node2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 28 13:38:00 2019
Last change: Mon Nov 25 19:00:27 2019 by hacluster via crmd on cluster-node1
2 nodes configured

Online: [ cluster-node1 cluster-node2 ]

Full list of resources:

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
On node1: Fencing Device

9. Check the disk block for the fencing device (SBD, Storage-Based Death)
[root@zabbix-svr01 ~]# lsscsi
[0:0:0:0] cd/dvd QEMU QEMU DVD-ROM 1.5. /dev/sr0
[5:0:0:0] disk LIO-ORG lun15 4.0 /dev/sda

[root@zabbix-svr01 ~]# cd /dev/disk/by-id/
[root@zabbix-svr01 by-id]# ll
total 0
~~~ (output truncated) ~~~
lrwxrwxrwx 1 root root 9 Nov 28 13:44 scsi-3600140595dea00f1f1d492499f682780 -> ../../sda
lrwxrwxrwx 1 root root 9 Nov 28 13:44 wwn-0x600140595dea00f1f1d492499f682780 -> ../../sda

10. Create the SBD device
[root@zabbix-svr01 ~]# sbd -d /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 create
[root@zabbix-svr01 ~]# sbd -d /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 dump
==Dumping header on disk /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780
~~~ (output truncated) ~~~
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 10
==Header on disk /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 is dumped

[root@zabbix-cluster1 by-id]# sbd -d /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 list
0 cluster-node1 clear
1 cluster-node2 clear
On node1: Fencing Device

11. Load the softdog watchdog module
[root@zabbix-svr01 ~]# cat /etc/modules-load.d/softdog.conf
[root@zabbix-svr01 ~]# modprobe -v softdog
[root@zabbix-svr01 ~]# lsmod | grep softdog
[root@zabbix-svr01 ~]# ls -al /dev/ | grep -i watchdog
crw------- 1 root root 10, 130 Dec 2 10:06 watchdog
crw------- 1 root root 252, 0 Dec 2 10:06 watchdog0

12. Configure fence sbd
[root@zabbix-svr01 ~]# pcs stonith sbd device setup --device=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780
[root@zabbix-svr01 ~]# pcs stonith sbd enable --device=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 --watchdog=/dev/watchdog0@cluster-node1 --watchdog=/dev/watchdog0@cluster-node2 SBD_WATCHDOG_TIMEOUT=10
[root@zabbix-svr01 ~]# pcs property set stonith-watchdog-timeout=10
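Note that pcs stonith sbd enable only takes effect after the cluster is restarted; afterwards the per-node SBD state can be verified (a sketch; pcs stonith sbd status requires a reasonably recent pcs):

[root@zabbix-svr01 ~]# pcs cluster stop --all && pcs cluster start --all
[root@zabbix-svr01 ~]# pcs stonith sbd status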
On node1: Pacemaker Resource Setup

11. Create fence_sbd
[root@zabbix-svr01 ~]# pcs stonith create sbd fence_sbd devices=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780

12. Prevent resources from moving back after recovery
[root@zabbix-svr01 ~]# pcs resource defaults resource-stickiness=100

13. Add the Apache HTTP server to group zabbix-svc
[root@zabbix-svr01 ~]# pcs resource create httpd systemd:httpd op monitor interval=10s --group zabbix-svc

14. Control the zabbix-server daemon
[root@zabbix-svr01 ~]# pcs resource create zabbix-server systemd:zabbix-server op monitor interval=10s --group zabbix-svc

15. VIP for the Zabbix server application
[root@zabbix-svr01 ~]# pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.0.50 op monitor interval=5s --group zabbix-svc
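Members of a resource group are implicitly colocated and started in the order they were added, so no explicit constraints are needed here; for reference, equivalent standalone rules would look like this (a sketch):

[root@zabbix-svr01 ~]# pcs constraint order start vip then start httpd
[root@zabbix-svr01 ~]# pcs constraint colocation add httpd with vip INFINITY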
On node1: Pacemaker Status

16. Pacemaker status
[root@zabbix-svr01 ~]# pcs status
Cluster name: zabbix-cluster
Stack: corosync
Current DC: cluster-node1 (version 1.1.18-11.el7_4.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 28 13:38:00 2019
Last change: Mon Nov 25 19:00:27 2019 by hacluster via crmd on cluster-node1
2 nodes configured
4 resources configured

Online: [ cluster-node1 cluster-node2 ]

Full list of resources:

sbd (stonith:fence_sbd): Started cluster-node1
vip (ocf::heartbeat:IPaddr2): Started cluster-node1
Resource Group: zabbix-svc
    httpd (systemd:httpd): Started cluster-node1
    zabbix-server (systemd:zabbix-server): Started cluster-node1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
  sbd: active/enabled
On node1: Pacemaker Configure

17. Pacemaker config show
[root@zabbix-svr01 ~]# pcs config show --all
Cluster Name: zabbix-cluster
Corosync Nodes:
 cluster-node1 cluster-node2
Pacemaker Nodes:
 cluster-node1 cluster-node2

Resources:
 Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.50
  Operations: monitor interval=5s timeout=20s (vip-monitor-interval-10s)
              start interval=0s timeout=20s (vip-start-interval-0s)
              stop interval=0s timeout=20s (vip-stop-interval-0s)
 Group: zabbix-svc
  Resource: httpd (class=systemd type=httpd)
   Operations: monitor interval=10 timeout=100 (web-monitor-interval-60)
               start interval=0s timeout=100 (web-start-interval-0s)
               stop interval=0s timeout=100 (web-stop-interval-0s)
  Resource: zabbix-server (class=systemd type=zabbix-server)
   Operations: monitor interval=10 timeout=100 (zabbix-server-monitor-interval-60)
               start interval=0s timeout=100 (zabbix-server-start-interval-0s)
               stop interval=0s timeout=100 (zabbix-server-stop-interval-0s)

Stonith Devices:
 Resource: sbd (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780
  Operations: monitor interval=60s (sbd-monitor-interval-60s)
Fencing Levels:
~~~ (output truncated) ~~~
On Active node: Service Check

18. Check the virtual IP address
[root@postgres-cluster2 ~]# ip addr show | grep secondary
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:28:33:5a brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.51/24 brd 192.168.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.0.50/24 brd 192.168.0.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe28:335a/64 scope link
       valid_lft forever preferred_lft forever

19. Check the HTTP server service
[root@zabbix-cluster2 ~]# ps auxw | grep apache
apache 1388 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1389 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1390 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1391 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1392 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
root 18556 0.0 0.0 112648 920 pts/0 R+ 16:36 0:00 grep --color=auto apache

20. Check the Zabbix application service
[root@zabbix-cluster2 ~]# ps auxw | grep zabbix
zabbix 1410 0.0 0.0 178004 3520 ? S 13:37 0:00 /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf
zabbix 1411 0.0 0.0 178052 2928 ? S 13:37 0:00 /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.024625 sec, idle 60 sec]
zabbix 1412 0.0 0.0 178004 2264 ? S 13:37 0:00 /usr/sbin/zabbix_server: alerter #1 started
zabbix 1413 0.0 0.0 178004 2264 ? S 13:37 0:00 /usr/sbin/zabbix_server: alerter #2 started
zabbix 1414 0.0 0.0 178004 2264 ? S 13:37 0:00 /usr/sbin/zabbix_server: alerter #3 started
~~~ (output truncated) ~~~
Detail : Database Server HA
• Node 1 - Hostname: postgres-svr01, Clustername: cluster-node1, fence device: fence_sbd, IP 192.168.0.61 (Active)
• Node 2 - Hostname: postgres-svr02, Clustername: cluster-node2, fence device: fence_sbd, IP 192.168.0.62 (Standby)
• Virtual IP : 192.168.0.60
• Stack on each node : CentOS 7.4, Pacemaker 1.1, PostgreSQL 10 on LVM + XFS filesystems
• Shared SCSI device (1 GB or larger) used for SBD, plus iSCSI shared storage for the PostgreSQL areas (Archive / Log / Data)
On node1: PostgreSQL Install and Setup (cluster bootstrap identical to the Zabbix server setup above)

1. Zabbix-specific settings and tuning
[root@postgres-svr1 ~]# cat /data/zabbix/postgresql.conf
listen_addresses = '*'
log_destination = 'stderr'
logging_collector = on
wal_level = logical
archive_mode = on
#archive_command = 'dd conv=fdatasync bs=256k if=%p of=/archive/temp/%f && mv -vf /archive/temp/%f /archive/zabbix'
archive_command = 'true'
log_min_duration_statement = 2000
log_line_prefix = '%t %u@%r/%d (%p) '
log_statement = 'ddl'
shared_preload_libraries = '$libdir/pg_stat_statements,$libdir/auto_explain,$libdir/passwordcheck'
track_functions = all
track_activity_query_size = 65536
pg_stat_statements.max = 10000
pg_stat_statements.track = all
auto_explain.log_min_duration = '5min'
shared_buffers = 1011406kB
autovacuum_max_workers = 5
max_replication_slots = 3
hot_standby = on
max_wal_senders = 2
max_wal_size = 2GB
min_wal_size = 2GB
log_temp_files = 1024kB
max_connections = 200
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
temp_file_limit = 100GB
autovacuum_work_mem = 287MB
On node1: Pacemaker Resource Setup

2. Create fence_sbd
[root@postgres-svr1 ~]# pcs stonith create sbd fence_sbd devices=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780

3. Prevent resources from moving back after recovery
[root@postgres-svr1 ~]# pcs resource defaults resource-stickiness=100

4. Add the LVM / XFS filesystems to group postgres-svc
[root@postgres-svr1 ~]# pcs resource create lvm LVM volgrpname=VG01 exclusive=true --group postgres-svc
[root@postgres-svr1 ~]# pcs resource create archive Filesystem device="/dev/VG01/archive" directory="/archive" fstype="xfs" options="noatime,nodiratime,nobarrier" op monitor interval=20s --group postgres-svc
[root@postgres-svr1 ~]# pcs resource create pg_wal Filesystem device="/dev/VG02/pg_wal" directory="/pg_wal" fstype="xfs" options="noatime,nodiratime,nobarrier" op monitor interval=20s --group postgres-svc
[root@postgres-svr1 ~]# pcs resource create data Filesystem device="/dev/VG03/data" directory="/data" fstype="xfs" options="noatime,nodiratime,nobarrier" op monitor interval=20s --group postgres-svc

5. Control the PostgreSQL service
[root@postgres-svr1 ~]# pcs resource create pgsql ocf:heartbeat:pgsql pgctl=/postgres/10/bin/pg_ctl psql=/postgres/10/bin/psql pgdata=/data/zabbix config=/data/zabbix/postgresql.conf op monitor interval=10s --group postgres-svc

6. VIP for the PostgreSQL database
[root@postgres-svr1 ~]# pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.0.60 op monitor interval=5s --group postgres-svc
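With exclusive=true the LVM agent expects the volume groups to be excluded from automatic activation at boot; a common way to arrange this on CentOS 7 is shown below (a sketch; "rootvg" stands in for whatever local VG holds the OS):

# /etc/lvm/lvm.conf - activate only the local OS volume group at boot,
# leaving VG01/VG02/VG03 under exclusive cluster control
volume_list = [ "rootvg" ]

# rebuild the initramfs so the setting also applies during early boot
[root@postgres-svr1 ~]# dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)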
On node1: Pacemaker Status

7. Pacemaker status
[root@postgres-svr1 ~]# pcs status
Cluster name: postgres-cluster
Stack: corosync
Current DC: cluster-node2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 28 16:36:25 2019
Last change: Wed Nov 20 14:40:03 2019 by hacluster via cibadmin on cluster-node1
2 nodes configured
7 resources configured

Online: [ cluster-node1 cluster-node2 ]

Full list of resources:

sbd (stonith:fence_sbd): Started cluster-node1
Resource Group: postgres-svc
    vip (ocf::heartbeat:IPaddr2): Started cluster-node2
    lvm (ocf::heartbeat:LVM): Started cluster-node2
    pg_wal (ocf::heartbeat:Filesystem): Started cluster-node2
    archive (ocf::heartbeat:Filesystem): Started cluster-node2
    data (ocf::heartbeat:Filesystem): Started cluster-node2
    pgsql (ocf::heartbeat:pgsql): Started cluster-node2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
  sbd: active/enabled
On node1: Pacemaker Configure

8. Pacemaker config show
[root@postgres-svr1 ~]# pcs config show
~~~ (output truncated) ~~~
Resources:
 Group: postgres-svc
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=192.168.0.60
   Operations: monitor interval=5s timeout=20s (vip-monitor-interval-10s)
               start interval=0s timeout=20s (vip-start-interval-0s)
               stop interval=0s timeout=20s (vip-stop-interval-0s)
  Resource: pg_wal (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/VG02/pg_wal directory=/pg_wal fstype=xfs
   Operations: monitor interval=20 timeout=40 (pg_wal-monitor-interval-20)
               notify interval=0s timeout=60 (pg_wal-notify-interval-0s)
               start interval=0s timeout=60 (pg_wal-start-interval-0s)
               stop interval=0s timeout=60 (pg_wal-stop-interval-0s)
  Resource: archive (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/VG01/archive directory=/archive fstype=xfs
   Operations: monitor interval=20 timeout=40 (archive-monitor-interval-20)
               notify interval=0s timeout=60 (archive-notify-interval-0s)
               start interval=0s timeout=60 (archive-start-interval-0s)
               stop interval=0s timeout=60 (archive-stop-interval-0s)
  Resource: data (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/VG03/data directory=/data fstype=xfs
   Operations: monitor interval=20 timeout=40 (data-monitor-interval-20)
               notify interval=0s timeout=60 (data-notify-interval-0s)
               start interval=0s timeout=60 (data-start-interval-0s)
               stop interval=0s timeout=60 (data-stop-interval-0s)
  Resource: pgsql (class=ocf provider=heartbeat type=pgsql)
   Attributes: config=/data/zabbix/postgresql.conf pgctl=/postgres/10/bin/pg_ctl pgdata=/data/zabbix psql=/postgres/10/bin/psql restart_on_promote=true
   Meta Attrs: migration-threshold=3
   Operations: demote interval=0s timeout=120 (pgsql-demote-interval-0s)
               methods interval=0s timeout=5 (pgsql-methods-interval-0s)
               monitor interval=30 timeout=30 (pgsql-monitor-interval-30)
               monitor interval=29 role=Master timeout=30 (pgsql-monitor-interval-29)
               notify interval=0s timeout=90 (pgsql-notify-interval-0s)
               promote interval=0s timeout=120 (pgsql-promote-interval-0s)
               start interval=0s timeout=120 (pgsql-start-interval-0s)
               stop interval=0s timeout=120 (pgsql-stop-interval-0s)
~~~ (output truncated) ~~~
On Active node: Service Check

9. Check the virtual IP address
[root@postgres-svr1 ~]# ip addr show | grep secondary
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:28:33:5a brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.61/24 brd 192.168.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.0.60/24 brd 192.168.0.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe28:335a/64 scope link
       valid_lft forever preferred_lft forever

10. Check the PostgreSQL service
[root@postgres-svr1 ~]# ps auxw | grep postgres
postgres 2603 0.0 0.1 1180040 4980 ? S 09:45 0:01 /postgres/10/bin/postgres -D /data/zabbix -c config_file=/data/zabbix/postgresql.conf
postgres 2604 0.0 0.0 96560 1648 ? Ss 09:45 0:00 postgres: logger process
postgres 2606 0.0 0.0 1180188 2312 ? Ss 09:45 0:00 postgres: checkpointer process
postgres 2607 0.0 0.0 1180180 1820 ? Ss 09:45 0:00 postgres: writer process
postgres 2608 0.0 0.0 1180040 1820 ? Ss 09:45 0:01 postgres: wal writer process
~~~ (output truncated) ~~~

11. Check the filesystem mounts
[root@postgres-svr1 ~]# df -h
Filesystem                Size  Used  Avail  Use%  Mounted on
~~~ (output truncated) ~~~
/dev/sdb                  100G   65M   100G    1%  /postgres
/dev/mapper/VG02-pg_wal    50G  2.1G    48G    5%  /pg_wal
/dev/mapper/VG01-archive   50G   33M    50G    1%  /archive
/dev/mapper/VG03-data      50G  1.6G    49G    4%  /data
Failover Scenario - Primary Node Down
When a HW / SW failure takes down the active node, Pacemaker moves each resource group to the standby node:
• FrontEnd resource group : Virtual IP, Apache HTTP Server, Zabbix Server
• Database resource group : Virtual IP, PostgreSQL, Filesystem
The standby node takes over the shared volumes and continues serving clients.
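A controlled failover can be rehearsed by putting the active node into standby and watching the resources move (a sketch):

[root@zabbix-svr01 ~]# pcs cluster standby cluster-node1
[root@zabbix-svr01 ~]# pcs status                          # resources should restart on cluster-node2
[root@zabbix-svr01 ~]# pcs cluster unstandby cluster-node1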
General Future - Pacemaker 2.0
The cluster daemons were renamed in Pacemaker 2.0:

Previous name        Current name            Purpose
attrd                pacemaker-attrd         manages node attributes
cib                  pacemaker-based         manages cluster configuration information
crmd                 pacemaker-controld      coordinates the cluster
lrmd                 pacemaker-execd         executes resource agents locally
stonithd             pacemaker-fenced        performs node fencing
pacemaker_remoted    pacemaker-remoted       executes resource agents on remote nodes
pengine              pacemaker-schedulerd    runs the scheduler

[Diagram: on each node, pacemakerd supervises pacemaker-based, pacemaker-attrd, pacemaker-schedulerd, pacemaker-execd, pacemaker-fenced, and pacemaker-controld; the nodes communicate through corosync cpg (Cluster Process Group) and quorum over the network]
General Future - Kronosnet (KNET)
 What is Kronosnet?
• https://www.kronosnet.org
• A library for securing the availability of network transport
• A network abstraction layer designed for high-availability use cases where redundancy, security, fault tolerance, and fast failover are core requirements
 Project Features
• Allows up to 8 separate network links per host, giving inter-node cluster communication high availability
• Automatically brings failed links back once communication is restored, speeding up disaster recovery and shortening downtime
• Supports multiple network protocols (UDP/SCTP)
• Works across subnets and through firewalls

[Diagram: Pacemaker on top of Corosync (totempg, totemsrp, totemnet, totemknet), with libknet driving two NICs]
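In Corosync 3.x, knet is the default transport and additional links are declared per node (a sketch; the 10.0.0.x addresses for the second link are illustrative):

totem {
    version: 2
    cluster_name: zabbix-cluster
    transport: knet
    link_mode: passive     # use link 0, fail over to link 1
}
nodelist {
    node {
        nodeid: 1
        ring0_addr: 192.168.0.51
        ring1_addr: 10.0.0.51
    }
    node {
        nodeid: 2
        ring0_addr: 192.168.0.52
        ring1_addr: 10.0.0.52
    }
}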
At the end....
What do you think high availability is? And how have you been thinking about it?
No solution can guarantee 99.9% high availability, and there is no such thing as a Best Practice.
There is only Test! Test! TEST!!!
REFERENCE
All *open-source* in the whole stack. Go, googling, ...

Configuring the Red Hat High Availability Add-On with Pacemaker
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html-single/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/index.html
SAP on Red Hat Technical Documents
http://www.redhat.com/f/pdf/ha-sap-v1-6-4.pdf
Red Hat Reference Architecture Series
http://www.sistina.com/rhel/resource_center/reference_architecture.html
Clusterlabs
http://clusterlabs.org/doc/
http://blog.clusterlabs.org/
OpenStack HA
http://www.slideshare.net/kenhui65/openstack-ha
High Availability on Linux - the SUSE way
https://tuna.moe/assets/slides/SFD2015/SUSE_HA_arch_overview.pdf
Github ClusterLabs (Booth)
https://github.com/ClusterLabs/booth
Dong hyun Kim | Manager
Opensource Business Team, kt ds
kim.donghyun@kt.com