Zabbix High-Availability Design Using Open Source Pacemaker
Dong hyun Kim
Opensource Business Team
Enterprise Linux Senior Engineer
kimdonghyun0916@gmail.com
Korea Community
# Whoami
■ Systems and Infrastructure Geek
■ Enterprise Linux Infrastructure Engineer (Red Hat)
■ Work
• Technology Research : New Technology Research (Container, Openstack, Etc)
• Technical Support : Troubleshooting, Debugging, Performance Tuning
• Consulting : IT Downsizing, Infra Optimization
■ I love linux ♥
• Linux Blog:
http://rhlinux.tistory.com/
• Red Hat Linux Engineer Group:
http://cafe.naver.com/iamstrong
• ClusterLabs Korea(Pacemaker):
https://www.facebook.com/groups/clusterlabskorea
# In this Session

☞ Pacemaker's Story - The Open Source, High Availability Cluster
☞ Overview of HA architectural components
☞ USE CASE EXAMPLE
☞ General features
The Open Source, High Availability Cluster
HA for OpenSource Technology
What is Pacemaker?

■ Pacemaker is :
• A High-Availability and Load-Balancing stack for the Linux platform
• A Python-based unified, scriptable cluster shell

■ Users decide the cluster resource policy themselves :
• Freedom to create, delete, and change Resource Agents configuration
• Broadly satisfies the HA requirements of applications used across many industries (public sector, securities/finance, etc.)
• Easy configuration and management of fence agents as resources - STONITH (Shoot The Other Node In The Head)

■ Monitor and Control Resources :
• SystemD / LSB / OCF services
• Cloned services : N+1, N+M, N nodes
• Multi-state (Master/Slave, Primary/Secondary)
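As a quick illustration of cloned and multi-state resources (a minimal sketch; the ping resource and its host_list address are hypothetical placeholders, and pgsql refers to the database resource built later in this deck):

[root@zabbix-svr01 ~]# pcs resource create ping ocf:pacemaker:ping host_list=192.168.0.1 --clone   # run one copy on every node
[root@zabbix-svr01 ~]# pcs resource master pgsql-ha pgsql                                          # wrap an existing resource as Master/Slave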
High-Availability in the Open Source Ecosystem
• Late 1990s : two completely independent attempts to build an open-source high-availability platform begin - SUSE's "Linux HA" project and Red Hat's "Cluster Services"
• 1998 : the Linux-HA project introduces a new protocol called "Heartbeat"; heartbeat v1.0 is released later
• 2002 : Red Hat ships "Red Hat Cluster Manager" version 1
• 2003 : SUSE's Lars Marowsky-Brée conceives of a new project called the "crm"
• 2004 : Novell and Red Hat developers attend the Cluster Summit together
• 2005 : "Heartbeat version 2" released (Linux-HA)
• 2007 : the part of the Heartbeat v2.1.3 package called "Pacemaker" appears
• 2008 : Pacemaker version 0.6.0 released, with support for OpenAIS
• 2009 : the new "Corosync" project is announced
• 2010 : Pacemaker version 1.1 (Red Hat); Pacemaker adds support for cman; the Heartbeat project reaches version 3
• 2018 : Pacemaker v2.0 released (ClusterLabs)
• 2019 : Pacemaker version 2.0 (Red Hat)
• Coverage keeps expanding through technology agreements among global vendors
• Today, ClusterLabs is rapidly integrating and evolving the components created in the Heartbeat project together with other solutions
OpenSource Project Progress

(Diagram: upstream projects and the vendor stacks they feed, layered as Resources Layer / Resource Allocation Layer / Messaging-Infrastructure Layer.)
• Linux-HA / ClusterLabs (community developers) : Pacemaker, resource-agents, fence-agents, corosync, Heartbeat, cluster-glue, booth - the upstream releases
• SUSE Enterprise Linux (Novell developers) : Pacemaker on corosync, with crmsh (CLI), Hawk (GUI), Pacemaker-mgmt, and booth
• Red Hat Enterprise Linux (Red Hat developers) : Pacemaker on corosync, with PCS (CLI) and PCSD (GUI)
“Mission Critical Linux”
Pacemaker - Architecture Component

• Resource Agents : agent scripts (Open Cluster Framework)
• Pacemaker (resource management) : LRMd, STONITHd, CRMd, CIB, PEngine
• Cluster Abstraction Layer
• Corosync : membership, messaging, quorum
Pacemaker - High level architecture

(Diagram: two Pacemaker nodes; the CIB is replicated between them as XML over Corosync.)
• Resources Layer : Resource Agents (RAs)
• Resource Allocation Layer : Cluster Resource Manager (CRM), Local Resource Manager (LRM), Policy Engine (PE), Cluster Information Base (CIB, replicated)
• Messaging / Infrastructure Layer : Corosync and the Corosync services
Quick Overview of Components - CRMd

■ CRMd (Cluster Resource Management daemon)
• Acts as the main controlling process
• The daemon that routes all resource operations
• Handles every action performed within the Resource Allocation Layer
• Manages the Cluster Information Base (CIB)
• Resources managed by CRMd are passed to client systems, queried, instantiated, or modified as needed
Quick Overview of Components - CIB
■ CIB (Cluster Information Base)
• The configuration-management daemon; the configuration lives in an XML file (in-memory data)
• Synchronizes each node's configuration and status information, as provided by the DC (Designated Co-ordinator)
• The CIB can be changed with the cibadmin command, or through the crm shell or the pcs utility
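For example, the live CIB can be inspected or dumped like this (a small sketch; the actual XML is cluster-specific):

[root@zabbix-svr01 ~]# cibadmin --query --scope resources    # print only the resources section of the CIB
[root@zabbix-svr01 ~]# pcs cluster cib > cib.xml             # pcs equivalent: dump the whole CIB to a file
[root@zabbix-svr01 ~]# pcs cluster cib-push cib.xml          # push an edited copy back into the cluster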
Quick Overview of Components - PEngine
■ PEngine (PE, or Policy Engine)
• Decides the next cluster state based on the current state and configuration
• The PE process runs on every node but is active only on the DC[1]
• Applies user-requested policies - clones, domains, and so on - across different service environments
• Verifies dependencies when resources fail over to another cluster node
[1] DC (Designated Controller): executes the PE's instructions in the required order by relaying them through the cluster messaging infrastructure to the Local Resource Management daemon (LRMd) on other nodes or to CRMd peers.
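The scheduler's decisions can also be previewed without acting on them, using Pacemaker's crm_simulate tool (illustrative; shipped with the pacemaker-cli tools):

[root@zabbix-svr01 ~]# crm_simulate --live-check    # show what the PE would do against the live CIB, without executing anything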
Quick Overview of Components - LRMd
■ LRMd (Local Resource Management daemon)
• Acts as the interface between CRMd and each resource, passing CRMd's commands on to the agents
• Calls its own Resource Agents (RAs) on behalf of the CRM
• Runs start / stop / monitor actions as instructed and reports the results back to the CRM
Quick Overview of Components - RAs
■ RAs (Resource Agents)
• A standardized interface defined for cluster resources
• Provide the scripts that start / stop / monitor a local resource
• Resource Agents (RAs) are invoked by the LRM
• Contributed by many developers and distributed via GitHub so they can be applied to many application environments

■ Three RA types supported by Pacemaker:
• LSB : Linux Standard Base "init scripts"
• OCF : Open Cluster Framework
- /usr/lib/ocf/resource.d/heartbeat
- /usr/lib/ocf/resource.d/pacemaker
• Stonith Resource Agents
http://linux-ha.org/wiki/OCF_Resource_Agent
http://linux-ha.org/wiki/LSB_Resource_Agents
https://github.com/ClusterLabs/resource-agents
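To make the contract concrete, here is a minimal OCF-style agent skeleton (illustrative only - "myapp" is a hypothetical service; real agents in resource-agents also implement meta-data XML, validate-all, and the full OCF exit-code set):

#!/bin/sh
# Minimal OCF-style resource agent sketch for a hypothetical "myapp" service.
case "$1" in
  start)   /usr/sbin/myapp --daemon; exit $? ;;                         # 0 = OCF_SUCCESS
  stop)    pkill -f /usr/sbin/myapp; exit 0 ;;
  monitor) pgrep -f /usr/sbin/myapp >/dev/null && exit 0 || exit 7 ;;   # 7 = OCF_NOT_RUNNING
  *)       exit 3 ;;                                                    # 3 = OCF_ERR_UNIMPLEMENTED
esac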
Quick Overview of Components - STONITHD
■ STONITHD - "Shoot The Other Node In The Head Daemon"
• The service daemon used to fence nodes
• Application-level fencing can be configured

■ Fence devices most often used in practice:
• Power fencing : HP iLO, Dell DRAC, IBM IMM, IPMI appliances, etc.
• I/O fence agents : Fibre Channel switch fencing; software-based SBD (the most common choice in the SUSE ecosystem)
• Listing fence device options : # ccs -h <host> --lsfenceopts

■ Absolutely required for data integrity:
• The safest way to hand resources over to another node in the cluster
• In an enterprise-grade Linux HA cluster, fencing is a requirement, not an option
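On the Pacemaker/pcs side, the corresponding way to discover fence agents and their options is (assuming fence-agents-all is installed, as in the build later in this deck):

[root@zabbix-svr01 ~]# pcs stonith list                      # list every installed fence agent
[root@zabbix-svr01 ~]# pcs stonith describe fence_ipmilan    # show the parameters one agent accepts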
What is fencing?

A mechanism to protect data from, and guard against, 'planned or unplanned' system downtime such as:
Kernel panic
System freeze
Live hang / recovery
Quick Overview of Components - Corosync
■ Corosync
• The base cluster infrastructure that Pacemaker needs in order to operate
• An open-source group messaging system used in general clustering, cloud computing, and high-availability environments

■ Communication layer : messaging and membership
• Totem single-ring ordering and membership protocol
• Basic constraint : prefers multicast (or broadcast) communication
• Networking over UDP/IP and InfiniBand
• UDPU, unicast UDP (supported from CentOS 6.2 onward)

■ Supports cluster filesystems (GFS2, OCFS2, cLVM2, etc.)
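For reference, the totem/quorum core of a two-node corosync.conf might look like this (a sketch only; node names match this deck's examples, and udpu is the choice when multicast is unavailable):

totem {
    version: 2
    cluster_name: zabbix-cluster
    transport: udpu                # unicast UDP instead of multicast
}

nodelist {
    node {
        ring0_addr: cluster-node1
        nodeid: 1
    }
    node {
        ring0_addr: cluster-node2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1                    # relax quorum rules for a 2-node cluster
}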
Quick Overview of Components - User Interface

■ High-availability management
• The Pacemaker configuration system is provided as a unified cluster configuration and management tool
• crm shell : Cluster Resource Manager Shell (SLES)
• pcs : Pacemaker Cluster System (Red Hat)

■ Benefits
• Clusters can be bootstrapped easily, getting a first cluster up and running quickly
• Add, remove, and change resources and their interdependencies
• Set and verify detailed cluster options online
pcsd web UI (Red Hat)
Hawk web UI (SLES)
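Creating the same virtual-IP resource in each tool looks roughly like this (illustrative; the IP is the one used later in this deck):

pcs   : pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.0.50 op monitor interval=5s
crmsh : crm configure primitive vip ocf:heartbeat:IPaddr2 params ip=192.168.0.50 op monitor interval=5s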
USE CASE EXAMPLE

Requirements for adopting a redundancy solution for Zabbix high availability:
• System software whose stability is proven, as part of the standard infrastructure TA domain
• Mid- to long-term cost efficiency of adoption
• An open-source-based redundancy solution backed by in-house engineering capability
• Applicability to a wide range of operating environments
Architecture Design Model

• FrontEnd (WEB, DMZ) : Active / Standby with a Virtual IP - CentOS 7 Update 4, Pacemaker 1.1, Apache HTTP Server 2.4, Zabbix 4
• Database (PRIVATE) : Active / Standby with a Virtual IP and shared volumes - CentOS 7 Update 4, Pacemaker 1.1, PostgreSQL 10
• CLIENTS : Windows, Linux, Unix, Appliance, etc.; Network (Router) Appliance
• INFRA : Virtual Machine, Cloud (Instance, Container), Dedicated (Legacy)
Detail : Zabbix Server HA

Virtual IP : 192.168.0.50
• Active node (192.168.0.51) - Hostname : zabbix-svr01 / Clustername : cluster-node1 / fence device : fence_sbd
• Standby node (192.168.0.52) - Hostname : zabbix-svr02 / Clustername : cluster-node2 / fence device : fence_sbd
• On both nodes : CentOS 7.4, Pacemaker 1.1, Apache 2.4, Zabbix Server 4, SBD
• iSCSI shared storage : a shared SCSI device dedicated to SBD (1 GB or larger)
On All nodes: Install HA Component

1. Install the Pacemaker packages
[root@zabbix-svr01 ~]# yum -y install pcs pacemaker fence-agents-all sbd watchdog
~~~ output omitted ~~~
---> Package fence-agents-vmware-soap.x86_64 0:4.2.1-11.el7 will be installed
--> Processing Dependency: python-suds for package: fence-agents-vmware-soap-4.2.1-11.el7.x86_64
---> Package fence-agents-wti.x86_64 0:4.2.1-11.el7 will be installed
---> Package fence-virt.x86_64 0:0.3.2-13.el7 will be installed
--> Processing Dependency: firewalld-filesystem for package: fence-virt-0.3.2-13.el7.x86_64
---> Package gnutls.x86_64 0:3.3.29-8.el7 will be installed
--> Processing Dependency: trousers >= 0.3.11.2 for package: gnutls-3.3.29-8.el7.x86_64
--> Processing Dependency: libnettle.so.4()(64bit) for package: gnutls-3.3.29-8.el7.x86_64
~~~ output omitted ~~~
2. Check the installed Pacemaker packages
[root@zabbix-svr01 ~]# rpm -qa | egrep -i '^pcs|^pacemaker|^fence-agents-all|^corosync|^sbd|^watchdog'
pcs-0.9.162-5.el7.x86_64
pacemaker-1.1.18-11.el7.x86_64
corosync-2.4.3-2.el7.x86_64
sbd-1.3.1-7.el7.x86_64
watchdog-5.13-11.el7.x86_64

3. Create the cluster user
[root@zabbix-svr01 ~]# echo <Cluster Password> | passwd --stdin hacluster
On node1: Cluster Setup

4. Start the PCS daemon (Pacemaker Cluster Service)
[root@zabbix-svr01 ~]# systemctl start pcsd; systemctl enable pcsd
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.

5. Authenticate the cluster nodes
[root@zabbix-svr01 ~]# pcs cluster auth cluster-node1 cluster-node2
Username: hacluster
Password: <Cluster_Password>
cluster-node1: Authorized
cluster-node2: Authorized

6. Create the Zabbix cluster
[root@zabbix-svr01 ~]# pcs cluster setup --name zabbix-cluster cluster-node1 cluster-node2

7. Start the cluster on all nodes
[root@zabbix-svr01 ~]# pcs cluster start --all
cluster-node1: Starting Cluster…
cluster-node2: Starting Cluster…
On node1: Cluster Setup

8. Enable the cluster service on all nodes
[root@zabbix-svr01 ~]# pcs cluster enable --all

9. Check the cluster status
[root@zabbix-svr01 ~]# pcs status
Cluster name: zabbix-cluster
Stack: corosync
Current DC: cluster-node2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 28 13:38:00 2019
Last change: Mon Nov 25 19:00:27 2019 by hacluster via crmd on cluster-node1
2 nodes configured
Online: [ cluster-node1 cluster-node2 ]
Full list of resources:
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
On node1: Fencing Device

10. Check the disk block for the fencing device (SBD, Storage-Based Death)
[root@zabbix-svr01 ~]# lsscsi
[0:0:0:0] cd/dvd QEMU QEMU DVD-ROM 1.5. /dev/sr0
[5:0:0:0] disk LIO-ORG lun15 4.0 /dev/sda
[root@zabbix-svr01 ~]# cd /dev/disk/by-id/
[root@zabbix-svr01 by-id]# ll
total 0
~~~ output omitted ~~~
lrwxrwxrwx 1 root root 9 Nov 28 13:44 scsi-3600140595dea00f1f1d492499f682780 -> ../../sda
lrwxrwxrwx 1 root root 9 Nov 28 13:44 wwn-0x600140595dea00f1f1d492499f682780 -> ../../sda

11. Create the SBD device
[root@zabbix-svr01 ~]# sbd -d /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 create
[root@zabbix-svr01 ~]# sbd -d /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 dump
==Dumping header on disk /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780
~~~ output omitted ~~~
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 10
==Header on disk /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 is dumped
[root@zabbix-cluster1 by-id]# sbd -d /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 list
0 cluster-node1 clear
1 cluster-node2 clear
On node1: Fencing Device

12. Load the watchdog module
[root@zabbix-svr01 ~]# cat /etc/modules-load.d/softdog.conf
[root@zabbix-svr01 ~]# modprobe -v softdog
[root@zabbix-svr01 ~]# lsmod | grep softdog
[root@zabbix-svr01 ~]# ls -al /dev/ | grep -i watchdog
crw------- 1 root root 10, 130 Dec 2 10:06 watchdog
crw------- 1 root root 252, 0 Dec 2 10:06 watchdog0

13. Configure fence sbd
[root@zabbix-svr01 ~]# pcs stonith sbd device setup --device=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780
[root@zabbix-svr01 ~]# pcs stonith sbd enable --device=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 --watchdog=/dev/watchdog0@cluster-node1 --watchdog=/dev/watchdog0@cluster-node2 SBD_WATCHDOG_TIMEOUT=10
[root@zabbix-svr01 ~]# pcs property set stonith-watchdog-timeout=10
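Before trusting SBD in production it is worth checking its state and, in a test window only, fencing the peer on purpose (a disruptive sketch, not part of the original runbook):

[root@zabbix-svr01 ~]# pcs stonith sbd status            # is SBD enabled and running on each node?
[root@zabbix-svr01 ~]# pcs stonith fence cluster-node2   # deliberately fence the standby and watch it reboot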
On node1: Pacemaker Resource Setup

14. Create the fence_sbd stonith resource
[root@zabbix-svr01 ~]# pcs stonith create sbd fence_sbd devices=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780

15. Add the Apache HTTP server to group zabbix-svc
[root@zabbix-svr01 ~]# pcs resource create httpd systemd:httpd op monitor interval=10s --group zabbix-svc

16. Control the zabbix-server daemon
[root@zabbix-svr01 ~]# pcs resource create zabbix-server systemd:zabbix-server op monitor interval=10s --group zabbix-svc

17. Add a VIP for the Zabbix server application
[root@zabbix-svr01 ~]# pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.0.50 op monitor interval=5s --group zabbix-svc

18. Prevent resources from moving back after recovery
[root@zabbix-svr01 ~]# pcs resource defaults resource-stickiness=100
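A group implies ordering and colocation: members start in the listed order and always run on the same node. Had standalone resources been used instead, the equivalent explicit constraints would look like this (illustrative):

[root@zabbix-svr01 ~]# pcs constraint colocation add zabbix-server with httpd INFINITY
[root@zabbix-svr01 ~]# pcs constraint order httpd then zabbix-server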
On node1: Pacemaker Status

19. Pacemaker status
[root@zabbix-svr01 ~]# pcs status
Cluster name: zabbix-cluster
Stack: corosync
Current DC: cluster-node1 (version 1.1.18-11.el7_4.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 28 13:38:00 2019
Last change: Mon Nov 25 19:00:27 2019 by hacluster via crmd on cluster-node1
2 nodes configured
4 resources configured
Online: [ cluster-node1 cluster-node2 ]
Full list of resources:
sbd (stonith:fence_sbd): Started cluster-node1
vip (ocf::heartbeat:IPaddr2): Started cluster-node1
Resource Group: zabbix-svc
httpd (systemd:httpd): Started cluster-node1
zabbix-server (systemd:zabbix-server): Started cluster-node1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
sbd: active/enabled
On node1: Pacemaker Configure

20. Show the Pacemaker configuration
[root@zabbix-svr01 ~]# pcs config show --all
Cluster Name: zabbix-cluster
Corosync Nodes:
cluster-node1 cluster-node2
Pacemaker Nodes:
cluster-node1 cluster-node2
Resources:
Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.0.50
Operations: monitor interval=5s timeout=20s (vip-monitor-interval-10s)
start interval=0s timeout=20s (vip-start-interval-0s)
stop interval=0s timeout=20s (vip-stop-interval-0s)
Group: zabbix-svc
Resource: httpd (class=systemd type=httpd)
Operations: monitor interval=10 timeout=100 (web-monitor-interval-60)
start interval=0s timeout=100 (web-start-interval-0s)
stop interval=0s timeout=100 (web-stop-interval-0s)
Resource: zabbix-server (class=systemd type=zabbix-server)
Operations: monitor interval=10 timeout=100 (zabbix-server-monitor-interval-60)
start interval=0s timeout=100 (zabbix-server-start-interval-0s)
stop interval=0s timeout=100 (zabbix-server-stop-interval-0s)
Stonith Devices:
Resource: sbd (class=stonith type=fence_sbd)
Attributes: devices=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780
Operations: monitor interval=60s (sbd-monitor-interval-60s)
Fencing Levels:
~~~ output omitted ~~~
On Active node: Service Check

21. Check the virtual IP address
[root@postgres-cluster2 ~]# ip addr show | grep secondary
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:28:33:5a brd ff:ff:ff:ff:ff:ff
inet 192.168.0.51/24 brd 192.168.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.0.50/24 brd 192.168.0.255 scope global secondary eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe28:335a/64 scope link
valid_lft forever preferred_lft forever
22. Check the HTTP server service
[root@zabbix-cluster2 ~]# ps auxw | grep apache
apache 1388 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1389 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1390 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1391 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1392 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
root 18556 0.0 0.0 112648 920 pts/0 R+ 16:36 0:00 grep --color=auto apache
23. Check the Zabbix application service
[root@zabbix-cluster2 ~]# ps auxw | grep zabbix
zabbix 1410 0.0 0.0 178004 3520 ? S 13:37 0:00 /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf
zabbix 1411 0.0 0.0 178052 2928 ? S 13:37 0:00 /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.024625 sec, idle 60 sec]
zabbix 1412 0.0 0.0 178004 2264 ? S 13:37 0:00 /usr/sbin/zabbix_server: alerter #1 started
zabbix 1413 0.0 0.0 178004 2264 ? S 13:37 0:00 /usr/sbin/zabbix_server: alerter #2 started
zabbix 1414 0.0 0.0 178004 2264 ? S 13:37 0:00 /usr/sbin/zabbix_server: alerter #3 started
~~~ output omitted ~~~
Detail : Database Server HA

Virtual IP : 192.168.0.60
• Active node (192.168.0.61) - Hostname : postgres-svr01 / Clustername : cluster-node1 / fence device : fence_sbd
• Standby node (192.168.0.62) - Hostname : postgres-svr02 / Clustername : cluster-node2 / fence device : fence_sbd
• On both nodes : CentOS 7.4, Pacemaker 1.1, PostgreSQL 10, XFS filesystems on LVM, VIP, SBD
• iSCSI shared storage : a shared SCSI device dedicated to SBD (1 GB or larger), plus the PostgreSQL area (Archive / Log / Data) on LVM
On node1: PostgreSQL Install and Setup (same as pages 19-23)

1. Zabbix-specific PostgreSQL settings and tuning
[root@postgres-svr1 ~]# cat /data/zabbix/postgresql.conf
listen_addresses = '*'
log_destination = 'stderr'
logging_collector = on
wal_level = logical
archive_mode = on
#archive_command = 'dd conv=fdatasync bs=256k if=%p of=/archive/temp/%f && mv -vf /archive/temp/%f /archive/zabbix'
archive_command = 'true'
log_min_duration_statement = 2000
log_line_prefix = '%t %u@%r/%d (%p) '
log_statement = 'ddl'
shared_preload_libraries = '$libdir/pg_stat_statements,$libdir/auto_explain,$libdir/passwordcheck'
track_functions = all
track_activity_query_size = 65536
pg_stat_statements.max = 10000
pg_stat_statements.track = all
auto_explain.log_min_duration = '5min'
shared_buffers = 1011406kB
autovacuum_max_workers = 5
max_replication_slots = 3
hot_standby = on
max_wal_senders = 2
max_wal_size = 2GB
min_wal_size = 2GB
log_temp_files = 1024kB
max_connections = 200
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
temp_file_limit = 100GB
autovacuum_work_mem = 287MB
On node1: Pacemaker Resource Setup

2. Create the fence_sbd stonith resource
[root@postgres-svr1 ~]# pcs stonith create sbd fence_sbd devices=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780

3. Prevent resources from moving back after recovery
[root@postgres-svr1 ~]# pcs resource defaults resource-stickiness=100

4. Add the LVM / XFS filesystems to group postgres-svc
[root@postgres-svr1 ~]# pcs resource create lvm LVM volgrpname=VG01 exclusive=true --group postgres-svc
[root@postgres-svr1 ~]# pcs resource create archive Filesystem device="/dev/VG01/archive" directory="/archive" fstype="xfs" options="noatime,nodiratime,nobarrier" op monitor interval=20s --group postgres-svc
[root@postgres-svr1 ~]# pcs resource create pg_wal Filesystem device="/dev/VG02/pg_wal" directory="/pg_wal" fstype="xfs" options="noatime,nodiratime,nobarrier" op monitor interval=20s --group postgres-svc
[root@postgres-svr1 ~]# pcs resource create data Filesystem device="/dev/VG03/data" directory="/data" fstype="xfs" options="noatime,nodiratime,nobarrier" op monitor interval=20s --group postgres-svc

5. Control the PostgreSQL service
[root@postgres-svr1 ~]# pcs resource create pgsql ocf:heartbeat:pgsql pgctl=/postgres/10/bin/pg_ctl psql=/postgres/10/bin/psql pgdata=/data/zabbix config=/data/zabbix/postgresql.conf op monitor interval=10s --group postgres-svc

6. Add a VIP for the PostgreSQL database
[root@postgres-svr1 ~]# pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.0.60 op monitor interval=5s --group postgres-svc
On node1: Pacemaker Status

7. Pacemaker status
[root@postgres-svr1 ~]# pcs status
Cluster name: postgres-cluster
Stack: corosync
Current DC: cluster-node2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 28 16:36:25 2019
Last change: Wed Nov 20 14:40:03 2019 by hacluster via cibadmin on cluster-node1
2 nodes configured
7 resources configured
Online: [ cluster-node1 cluster-node2 ]
Full list of resources:
sbd (stonith:fence_sbd): Started cluster-node1
Resource Group: postgres-svc
vip (ocf::heartbeat:IPaddr2): Started cluster-node2
lvm (ocf::heartbeat:LVM): Started cluster-node2
pg_wal (ocf::heartbeat:Filesystem): Started cluster-node2
archive (ocf::heartbeat:Filesystem): Started cluster-node2
data (ocf::heartbeat:Filesystem): Started cluster-node2
pgsql (ocf::heartbeat:pgsql): Started cluster-node2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
sbd: active/enabled
On node1: Pacemaker Configure

8. Show the Pacemaker configuration
[root@postgres-svr1 ~]# pcs config show
~~~ output omitted ~~~
Resources:
Group: postgres-svc
Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.0.60
Operations: monitor interval=5s timeout=20s (vip-monitor-interval-10s)
start interval=0s timeout=20s (vip-start-interval-0s)
stop interval=0s timeout=20s (vip-stop-interval-0s)
Resource: pg_wal (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/VG02/pg_wal directory=/pg_wal fstype=xfs
Operations: monitor interval=20 timeout=40 (pg_wal-monitor-interval-20)
notify interval=0s timeout=60 (pg_wal-notify-interval-0s)
start interval=0s timeout=60 (pg_wal-start-interval-0s)
stop interval=0s timeout=60 (pg_wal-stop-interval-0s)
Resource: archive (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/VG01/archive directory=/archive fstype=xfs
Operations: monitor interval=20 timeout=40 (archive-monitor-interval-20)
notify interval=0s timeout=60 (archive-notify-interval-0s)
start interval=0s timeout=60 (archive-start-interval-0s)
stop interval=0s timeout=60 (archive-stop-interval-0s)
Resource: data (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/VG03/data directory=/data fstype=xfs
Operations: monitor interval=20 timeout=40 (data-monitor-interval-20)
notify interval=0s timeout=60 (data-notify-interval-0s)
start interval=0s timeout=60 (data-start-interval-0s)
stop interval=0s timeout=60 (data-stop-interval-0s)
Resource: pgsql (class=ocf provider=heartbeat type=pgsql)
Attributes: config=/data/zabbix/postgresql.conf pgctl=/postgres/10/bin/pg_ctl pgdata=/data/zabbix psql=/postgres/10/bin/psql restart_on_promote=true
Meta Attrs: migration-threshold=3
Operations: demote interval=0s timeout=120 (pgsql-demote-interval-0s)
methods interval=0s timeout=5 (pgsql-methods-interval-0s)
monitor interval=30 timeout=30 (pgsql-monitor-interval-30)
monitor interval=29 role=Master timeout=30 (pgsql-monitor-interval-29)
notify interval=0s timeout=90 (pgsql-notify-interval-0s)
promote interval=0s timeout=120 (pgsql-promote-interval-0s)
start interval=0s timeout=120 (pgsql-start-interval-0s)
stop interval=0s timeout=120 (pgsql-stop-interval-0s)
~~~ output omitted ~~~
On Active node: Service Check

9. Check the virtual IP address
[root@postgres-svr1 ~]# ip addr show | grep secondary
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:28:33:5a brd ff:ff:ff:ff:ff:ff
inet 192.168.0.61/24 brd 192.168.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.0.60/24 brd 192.168.0.255 scope global secondary eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe28:335a/64 scope link
valid_lft forever preferred_lft forever
10. Check the PostgreSQL service
[root@postgres-svr1 ~]# ps auxw | grep postgres
postgres 2603 0.0 0.1 1180040 4980 ? S 09:45 0:01 /postgres/10/bin/postgres -D /data/zabbix -c config_file=/data/zabbix/postgresql.conf
postgres 2604 0.0 0.0 96560 1648 ? Ss 09:45 0:00 postgres: logger process
postgres 2606 0.0 0.0 1180188 2312 ? Ss 09:45 0:00 postgres: checkpointer process
postgres 2607 0.0 0.0 1180180 1820 ? Ss 09:45 0:00 postgres: writer process
postgres 2608 0.0 0.0 1180040 1820 ? Ss 09:45 0:01 postgres: wal writer process
~~~ output omitted ~~~

11. Check the filesystem mounts
[root@postgres-svr1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
~~~ output omitted ~~~
/dev/sdb 100G 65M 100G 1% /postgres
/dev/mapper/VG02-pg_wal 50G 2.1G 48G 5% /pg_wal
/dev/mapper/VG01-archive 50G 33M 50G 1% /archive
/dev/mapper/VG03-data 50G 1.6G 49G 4% /data
Failover Scenario - Primary Node Down

When an HW / SW failure takes down the active node, Pacemaker restarts each resource group on the standby node, while clients keep using the same virtual IPs:
• FrontEnd resource group : Virtual IP, Apache HTTP Server, Zabbix Server
• Database resource group : Virtual IP, Filesystem (shared volumes), PostgreSQL
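One way to rehearse this scenario (a sketch; both steps are disruptive and belong in a test window) is to drain the active node gracefully, then crash it outright:

[root@zabbix-svr01 ~]# pcs cluster standby cluster-node1     # drain node1; the group restarts on cluster-node2
[root@zabbix-svr01 ~]# pcs cluster unstandby cluster-node1
[root@zabbix-svr01 ~]# echo c > /proc/sysrq-trigger          # hard test: trigger a kernel panic that SBD/watchdog must handle (requires sysrq enabled)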
General Features - Pacemaker 2.0

Pacemaker 2.0 renames the internal daemons:

Previous name       | Current name         | Purpose
attrd               | pacemaker-attrd      | Node attribute management
cib                 | pacemaker-based      | Cluster information (CIB) management
crmd                | pacemaker-controld   | Cluster coordination
lrmd                | pacemaker-execd      | Local resource agent execution
stonithd            | pacemaker-fenced     | Node fencing
pacemaker_remoted   | pacemaker-remoted    | Remote resource agent execution
pengine             | pacemaker-schedulerd | Scheduling

(Diagram: on each node, pacemakerd supervises pacemaker-based, pacemaker-controld, pacemaker-execd, pacemaker-fenced, pacemaker-schedulerd, and pacemaker-attrd, on top of corosync's cpg (Cluster Process Group) and quorum services; Node 1 and Node 2 communicate over the network.)
General Features - Kronosnet (KNET)

■ What is Kronosnet?
• https://www.kronosnet.org
• A library that secures availability for network transport
• A network abstraction layer designed for high-availability use cases where redundancy, security, fault tolerance, and fast failover are the core requirements

■ Project Features
• Allows up to 8 separate network links per host, giving network communication between cluster nodes high availability
• Automatically restores a failed link once communication recovers, accelerating disaster recovery and shortening downtime
• Supports multiple network protocols (UDP/SCTP)
• Works across subnets, passing through firewalls

(Diagram: Pacemaker on Corosync; the totem stack - totempg, totemsrp, totemnet/totemknet - uses libknet to drive NIC 1 and NIC 2.)
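With Corosync 3, enabling Kronosnet with a second link is a corosync.conf change along these lines (a sketch assuming Corosync 3.x; the second-ring addresses are hypothetical):

totem {
    version: 2
    transport: knet
    link_mode: passive             # use the backup link only when the primary fails
}

nodelist {
    node {
        ring0_addr: 192.168.0.51   # primary link
        ring1_addr: 10.0.0.51      # hypothetical second link
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.0.52
        ring1_addr: 10.0.0.52
        nodeid: 2
    }
}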
At the end of….

What do you think high availability really is? And how have you been thinking about it?
No solution can guarantee you 99.9% availability.
There is no such thing as a universal Best Practice.
Only: Test! Test! TEST!!!
REFERENCE
Everything in the whole stack is *open-source*. Go googling…
Configuring the Red Hat High Availability Add-On with Pacemaker
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html-single/Configuring_the_Red_Hat_High_Availability_Add-
On_with_Pacemaker/index.html
SAP on Red Hat Technical Documents
http://www.redhat.com/f/pdf/ha-sap-v1-6-4.pdf
Red Hat Reference Architecture Series
http://www.sistina.com/rhel/resource_center/reference_architecture.html
Clusterlabs
http://clusterlabs.org/doc/
http://blog.clusterlabs.org/
OpenStack HA
http://www.slideshare.net/kenhui65/openstack-ha
High Availability on Linux - the SUSE way
https://tuna.moe/assets/slides/SFD2015/SUSE_HA_arch_overview.pdf
Github ClusterLabs (Booth)
https://github.com/ClusterLabs/booth
Dong hyun Kim | Manager
Opensource Business Team, kt ds
kim.donghyun@kt.com
Weitere ähnliche Inhalte

Was ist angesagt?

가상화 기술과 컨테이너 기술의 차이점과 기대 효과
가상화 기술과 컨테이너 기술의 차이점과 기대 효과가상화 기술과 컨테이너 기술의 차이점과 기대 효과
가상화 기술과 컨테이너 기술의 차이점과 기대 효과Opennaru, inc.
 
CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드
CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드
CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드Opennaru, inc.
 
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)충섭 김
 
왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항
왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항
왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항rockplace
 
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요Jo Hoon
 
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항Ji-Woong Choi
 
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법Open Source Consulting
 
Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016
Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016
Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016문기 박
 
[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험NHN FORWARD
 
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/NeutronOverview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/Neutronvivekkonnect
 
eBPF - Observability In Deep
eBPF - Observability In DeepeBPF - Observability In Deep
eBPF - Observability In DeepMydbops
 
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트) 마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트) Amazon Web Services Korea
 
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교 및 구축 방법
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교  및 구축 방법[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교  및 구축 방법
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교 및 구축 방법Open Source Consulting
 
잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다
잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다
잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다Arawn Park
 
Introduction to Nexus Repository Manager.pdf
Introduction to Nexus Repository Manager.pdfIntroduction to Nexus Repository Manager.pdf
Introduction to Nexus Repository Manager.pdfKnoldus Inc.
 
우아한 모노리스
우아한 모노리스우아한 모노리스
우아한 모노리스Arawn Park
 
SDN입문 (Overlay and Underlay)
SDN입문 (Overlay and Underlay)SDN입문 (Overlay and Underlay)
SDN입문 (Overlay and Underlay)NAIM Networks, Inc.
 
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항Opennaru, inc.
 
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개OpenStack Korea Community
 

Was ist angesagt? (20)

Kubernetes Basics
Kubernetes BasicsKubernetes Basics
Kubernetes Basics
 
가상화 기술과 컨테이너 기술의 차이점과 기대 효과
가상화 기술과 컨테이너 기술의 차이점과 기대 효과가상화 기술과 컨테이너 기술의 차이점과 기대 효과
가상화 기술과 컨테이너 기술의 차이점과 기대 효과
 
CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드
CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드
CI / CD ( 지속적인 통합 / 지속적인 전달 ) 발표 자료 다운로드
 
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
쿠버네티스를 이용한 기능 브랜치별 테스트 서버 만들기 (GitOps CI/CD)
 
왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항
왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항
왜 컨테이너인가? - OpenShift 구축 사례와 컨테이너로 환경 전환 시 고려사항
 
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
 
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
 
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
 
Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016
Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016
Cloud, sdn and nfv 기술동향 atto-research-박문기-20171016
 
[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험
 
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/NeutronOverview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
 
eBPF - Observability In Deep
eBPF - Observability In DeepeBPF - Observability In Deep
eBPF - Observability In Deep
 
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트) 마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
 
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교 및 구축 방법
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교  및 구축 방법[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교  및 구축 방법
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교 및 구축 방법
 
잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다
잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다
잘 키운 모노리스 하나 열 마이크로서비스 안 부럽다
 
Introduction to Nexus Repository Manager.pdf
Introduction to Nexus Repository Manager.pdfIntroduction to Nexus Repository Manager.pdf
Introduction to Nexus Repository Manager.pdf
 
우아한 모노리스
우아한 모노리스우아한 모노리스
우아한 모노리스
 
SDN입문 (Overlay and Underlay)
SDN입문 (Overlay and Underlay)SDN입문 (Overlay and Underlay)
SDN입문 (Overlay and Underlay)
 
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
컨테이너 (PaaS) 환경으로의 애플리케이션 전환 방법과 고려사항
 
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
[OpenInfra Days Korea 2018] (Track 2) Neutron LBaaS 어디까지 왔니? - Octavia 소개
 

Ähnlich wie [발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community)

Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015 Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015 Roger Zhou 周志强
 
2.1 Red_Hat_Cluster1.ppt
2.1 Red_Hat_Cluster1.ppt2.1 Red_Hat_Cluster1.ppt
2.1 Red_Hat_Cluster1.pptManoj603126
 
2008-09-09 IBM Interaction Conference, Red Hat Update for System z
2008-09-09 IBM Interaction Conference, Red Hat Update for System z2008-09-09 IBM Interaction Conference, Red Hat Update for System z
2008-09-09 IBM Interaction Conference, Red Hat Update for System zShawn Wells
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfhik_lhz
 
Fastsocket Linxiaofeng
Fastsocket LinxiaofengFastsocket Linxiaofeng
Fastsocket LinxiaofengMichael Zhang
 
RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)Sumant Garg
 
Using SoC Vendor HALs in the Zephyr Project - SFO17-112
Using SoC Vendor HALs in the Zephyr Project - SFO17-112Using SoC Vendor HALs in the Zephyr Project - SFO17-112
Using SoC Vendor HALs in the Zephyr Project - SFO17-112Linaro
 
Open stack ha design & deployment kilo
Open stack ha design & deployment   kiloOpen stack ha design & deployment   kilo
Open stack ha design & deployment kiloSteven Li
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HAtcp cloud
 
OpenStack High Availability
OpenStack High AvailabilityOpenStack High Availability
OpenStack High AvailabilityJakub Pavlik
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetesTed Jung
 
2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...
2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...
2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...Shawn Wells
 
Cloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarCloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarArun Kumar
 
Cloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarCloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarArun Kumar
 
Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBDDan Frincu
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container EcosystemVinay Rao
 
Linux sever building
Linux sever buildingLinux sever building
Linux sever buildingEdmond Yu
 

Ähnlich wie [발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community) (20)

Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015 Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015
 
2.1 Red_Hat_Cluster1.ppt
2.1 Red_Hat_Cluster1.ppt2.1 Red_Hat_Cluster1.ppt
2.1 Red_Hat_Cluster1.ppt
 
2008-09-09 IBM Interaction Conference, Red Hat Update for System z
2008-09-09 IBM Interaction Conference, Red Hat Update for System z2008-09-09 IBM Interaction Conference, Red Hat Update for System z
2008-09-09 IBM Interaction Conference, Red Hat Update for System z
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmf
 
Fastsocket Linxiaofeng
Fastsocket LinxiaofengFastsocket Linxiaofeng
Fastsocket Linxiaofeng
 
Building a Router
Building a RouterBuilding a Router
Building a Router
 
RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)
 
Rac on NFS
Rac on NFSRac on NFS
Rac on NFS
 
Using SoC Vendor HALs in the Zephyr Project - SFO17-112
Using SoC Vendor HALs in the Zephyr Project - SFO17-112Using SoC Vendor HALs in the Zephyr Project - SFO17-112
Using SoC Vendor HALs in the Zephyr Project - SFO17-112
 
Rhel7 vs rhel6
Rhel7 vs rhel6Rhel7 vs rhel6
Rhel7 vs rhel6
 
Open stack ha design & deployment kilo
Open stack ha design & deployment   kiloOpen stack ha design & deployment   kilo
Open stack ha design & deployment kilo
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HA
 
OpenStack High Availability
OpenStack High AvailabilityOpenStack High Availability
OpenStack High Availability
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetes
 
2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...
2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...
2010-01-28 NSA Open Source User Group Meeting, Current & Future Linux on Syst...
 
Cloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarCloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumar
 
Cloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarCloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumar
 
Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBD
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container Ecosystem
 
Linux sever building
Linux sever buildingLinux sever building
Linux sever building
 

Kürzlich hochgeladen

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Kürzlich hochgeladen (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

[발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community)

  • 1. 오픈소스 Pacemaker 홗용한 Zabbix 이중화 방안 Dong hyun Kim Opensource Business Team Enterprise Linux Senior Enginner kimdonghyun0916@gmail.com Korea Community
  • 2. # Whoami  Systems and Infrastructure Geek  Enterprise Linux Infrastructure Engineer (Red Hat)  Work • Technology Research : New Technology Research (Container, Openstack, Etc) • Technical Support : Troubleshooting, Debugging, Performace Tuning • Consulting : IT Downsizing, Infra Optimization  I love linux ♥ • Linux Blog: http://rhlinux.tistory.com/ • Red Hat Linux Engineer Group: http://cafe.naver.com/iamstrong • ClusterLabs Korea(Pacemaker): https://www.facebook.com/groups/clusterlabskorea 1
  • 3. 2 Pacemaker‟s Story - The Open Source, High Availability Cluster Overview of HA architectural components USE CASE EXAMPLE General futures ☞ ☞ ☞ ☞ # In this Session
  • 4. 3 The Open Source, High Availability Cluster
  • 5. 4 HA for OpenSource Technology
  • 6.  Pacemaker is : • LINUX PLATFORM을 위한 High-Availability와 Load-Balancing Stack 제공 • Python-based Unified, scriptable, cluster shell  클러스터 리소스 정책을 사용자가 직접 결정 : • Resource Agents 설정을 만들고 지우고 변경하는 것에 대한 자유로움 • 다양한 산업(공공, 증권/금융 등)군에서 사용 어플리케이션에서 요구하는 HA조건을 대체로 만족 • 리소스형태 fence agents 설정관리 용이 STONITH(Shoot The Other Node In The Head)  Monitor and Control Resource : • SystemD / LSB / OCF Services • Cloned Services : N+1, N+M, N nodes • Multi-state (Master/Slave, Primary/Secondary) 5 What is Pacemaker?
  • 7. High-Availability in the Open Source Ecosystem 6 • 2003, SUSE's Lars Marowsky-Brée conceived of a new project called the "crm" • 2009s, “Corosync” 새로운 Project 발표 • 2010s, Pacemaker version 1.1(Red Hat) • 2010s, Pacemaker added support for cman • 2010s. Heartbeat project reached version 3 • 2018s. Pacemaker v2.0 Release(ClusterLabs) • 2019s, Pacemaker version 2.0(Red Hat) • 2008, Pacemaker version 0.6.0 was release - support for OpenAIS • 2007, Pacemaker (Heartbeat v 2.1.3) - Heartbeat package called "Pacemaker“ • 2002s, REDHAT "Red Hat Cluster Manager" Version 1 • 1990s, 오픈소스 고가용성 플랫폼을 만들고자 완전히 독립된 두 회사의 시도는 1990년대 후반부터 시작 - SUSE's "Linux HA" project - Red Hat's “Cluster Services" • 1998s, "Heartbeat“ 불리우는 새로운 프로토콜 'Linux-HA'프로젝트,이후 heartbeat v1.0 발표 • Global Vendors 갂 기술 협약을 통해 적용범위 확대 • 오늘날, Clusterlabs는 Heartbeat Project 에서생성된 Component들과 다른 솔루션형태로 빠르게 통합 및 변화 • 2004s, Cluster Summit에 Novell와 Red Hat developers 함께 참석 • 2005s, "Heartbeat version 2“released(Linux-HA)
  • 8. 7 Linux-HA / ClusterLabsSUSE Enterprise Linux Red Hat Enterprise Linux Pacemaker-mgmt Hwak (GUI) booth crmsh (CLI) PacemakerPacemaker resource-agents Heartbeatcorosync cluster-glue Community Developer Novell Developer Red Hat Developer OpenSource Project Progress fence-agents PCSD (GUI) Pacemaker corosync Resources Layer ResourceAllocation Layer Messaging/ InfrastructureLayer PCS (CLI) Upstream Release UpstreamRelease booth
  • 10. 9 Resource Agents - Agent Scripts - Open Cluster Framework Resource Agents Pacemaker - Resource Management LRMd Stonith CRMd CIB PEngine Corosync - Membership - Messaging - Quorum Cluster Abstraction Layer Corosync Pacemaker - Architecture Component
  • 11. 10 Pacemaker - High level architecture Messaging / Infrastructure Layer Resource Allocation Layer Resources Layer XML XML Pacemaker Node #1 Corosync Cluster Resource Manager CRM Corosync Services Local Resource Manager LRM Policy Engine Cluster Information Base CIB (복제) Resource Agents RAs Pacemaker Node #2
  • 12. 11 Quick Overview of Components - CRMd  CRMd(Cluster Resource Management daemon) • main controlling process 역할 담당 • 모든 리소스 작업을 라우팅해주는 데몬 • Resource Allocation Layer내에서 수행되는 모든 동작 처리 • Cluster Information Base (CIB) 관리 • CRMd에 의해 관리된 리소스는 필요에 따라 클라이언트 시스템에 전달, 쿼리되거나 인스턴스화하여 변경 RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) CRM Resource Allocation Layer PELRM STONITH
  • 13. Quick Overview of Components - CIB  CIB (Cluster Information Base) • 설정 정보 관리 데몬. XML파일로 설정 (In-memory data) • DC(Designated Co-ordinator)에 의해 제공되는 각 노드별 설정내용 및 상태 정보를 동기화 • CIB 은 cibadmin 명령어를 사용하여 변경할수 있고, crm shell 또는 pcs utility 사용 RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) CRM Resource Allocation Layer PELRM 12 STONITH
  • 14. Quick Overview of Components - PEngine RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) CRM Resource Allocation Layer PELRM  PEngine (PE or Policy Engine) • 현재 클러스터 상태 및 구성을 기반으로 다음 상태를 결정 • PE프로세스는 각 노드에서 실행되지만, DC[1]에서만 홗성화 • 여러 서비스홖경에 따라 Clone 및 domain 등 사용자 요구에 따라 정책 부여 • 다른 클러스터 노드로 리소스 전홖시 의졲성 확인 13 STONITH [1] DC(Deginated Controller): 클러스터 메시징 인프라를 통해 다른 노드의 로컬 리소스 관리 데몬(LRMd) 또는 CRMd peer로 전달하여 필요한 순서로 PE의 instructions 수행
  • 15. Quick Overview of Components - LRMd RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) CRM Resource Allocation Layer PELRM  LRMd (Local Resource Management Daemon) • CRMd와 각 리소스 사이에 인터페이스 역할을 수행하며, CRMd의 명령을 agent에 전달 • CRM을 대싞하여 자기 자싞의 RAs(Resource Agents) 호출 • CRM수행되어 보고된 결과에 따라 start / stop / monitor를 동작 14 STONITH
  • 16. Quick Overview of Components - RAs RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) CRM Resource Allocation Layer PELRM  RAs (Resource Agents) • 클러스터리소스를 위해 정의된 규격화된 인터페이스 • local resource의 start / stops / monitors 스크립트 제공 • RAs(Resource Agents)는 LRM에 의해 호출 • 수많은 Contributer들이 여러 Application홖경에 적용될수 있도록 github 통해 배포  Pacemaker제공 RA 지원 타입 3가지: • LSB : Linux Standard Base “init scripts” • OCF : Open Cluster Framework - /usr/lib/ocf/resource.d/heartbeat - /usr/lib/ocf/resource.d/pacemaker • Stonith Resource Agents http://linux-ha.org/wiki/OCF_Resource_Agent http://linux-ha.org/wiki/LSB_Resource_Agents https://github.com/ClusterLabs/resource-agents 15 STONITH
  • 17. Quick Overview of Components - STONITHD RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) CRM Resource Allocation Layer PELRM  STONITHD “Shoot The Other Node In The Head Daemon” • fence node에서 사용되는 서비스 데몬 • Application-level fencing 설정 가능  실무에서 가장 많이 사용되는 fence device: • Power fencing: HP iLO, Dell DRAC, IBM IMM, IPMI Appliance 등 • I/O fence agents: Fibre Channel Switch fencing, 소프트웨어 기반의 SBD (SUSE진영 가장 많이 사용) • Listing Fence Device : # ccs -h <host> --lsfenceopts  Data integrity (데이터 무결성)을 위해 반드시 필요 • 클러스터내 다른 노드로 리소스를 전홖하기 위한 가장 최상의 방법 • “Enterprise”을 지향하는 Linux HA Cluster에서는 선택이 아닌 필수 16 STONITH
  • 18. 17 What is fencing? „Planned or Unplanned‟ 시스템 다운타임으로 부터 데이타보호하고 예방하기 위한 장치 Kernel panic System freeze Live hang / recovery
  • 19. Quick Overview of Components - Corosync RA Resource Layer Messaging/Infrastructure Layer Corosync RA RA CIB (XML) STONITH CRM Resource Allocation Layer PELRM  Corosync • Pacemaker 작동에 필요한 기본 클러스터 인프라 • 일반적인 클러스터, 클라우드컴퓨팅 그리고 고가용성 홖경에서 사용되는 오픈소스 그룹 메시징시스템.  Communication Layer : messaging and membership • Totem single-ring ordering and membership protocol • 기본적인 제약 조건 : 브로드캐스트를 통한 멀티캐스트 통싞 방식을 선호 • UDP/IP and InfiniBand 기반의 networks 통싞 • UDPU (CentOS 6.2+ 이상부터 지원)  클러스터 파일시스템 지원 (GFS2, OCFS2, cLVM2 등) 18
Quick Overview of Components - User Interface
 High-availability administration
• The Pacemaker configuration system is provided as a unified cluster configuration and management tool
• crm shell : Cluster Resource Manager Shell (SLES)
• pcs : Pacemaker/Corosync Configuration System (Red Hat)
 Benefits
• Bootstrapping is easy, so the first cluster can be brought up and running quickly
• Add, remove, and modify resources and their interdependencies
• Set and inspect detailed cluster options online
pcsd web UI (Red Hat) / Hawk web UI (SLES)
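The same operation maps cleanly between the two shells; for example, dumping the full cluster configuration (a sketch):

[root@zabbix-svr01 ~]# pcs config           # Red Hat (pcs)
crm(live)# configure show                   # SLES (crm shell)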
USE CASE EXAMPLE
ZABBIX: requirements for adopting an HA solution
• System software whose stability is proven within the infrastructure standard TA domain
• Cost efficiency of adoption over the mid to long term
• An open-source-based HA solution backed by in-house expertise
• Applicability to a variety of operating environments
Architecture Design Model
[Diagram: clients reach an Active/Standby web front end and an Active/Standby database tier, each behind its own Virtual IP; the database nodes share storage volumes; the front end sits in the DMZ, the database in the PRIVATE zone]
• FrontEnd : CentOS 7 Update 4, Pacemaker 1.1, Apache HTTP Server 2.4, Zabbix 4
• Database : CentOS 7 Update 4, Pacemaker 1.1, PostgreSQL 10
• CLIENTS : Windows, Linux, Unix, Appliance, etc. / Network (Router) Appliance
• INFRA : Virtual Machine / Cloud (Instance, Container) / Dedicated (Legacy)
Detail : Zabbix Server HA
• Node 1 - Hostname: zabbix-svr01, Clustername: cluster-node1, fence device: fence_sbd, IP 192.168.0.51 (Active)
• Node 2 - Hostname: zabbix-svr02, Clustername: cluster-node2, fence device: fence_sbd, IP 192.168.0.52 (Standby)
• Virtual IP : 192.168.0.50
• Stack on each node : CentOS 7.4, Pacemaker 1.1, Apache 2.4, Zabbix Server 4
• Shared SCSI device (1 GB or larger) on iSCSI shared storage, used for SBD
On All nodes: Install HA Component

1. Install the Pacemaker packages
[root@zabbix-svr01 ~]# yum -y install pcs pacemaker fence-agents-all sbd watchdog
~~~ (output truncated) ~~~
---> Package fence-agents-vmware-soap.x86_64 0:4.2.1-11.el7 will be installed
--> Processing Dependency: python-suds for package: fence-agents-vmware-soap-4.2.1-11.el7.x86_64
---> Package fence-agents-wti.x86_64 0:4.2.1-11.el7 will be installed
---> Package fence-virt.x86_64 0:0.3.2-13.el7 will be installed
--> Processing Dependency: firewalld-filesystem for package: fence-virt-0.3.2-13.el7.x86_64
---> Package gnutls.x86_64 0:3.3.29-8.el7 will be installed
--> Processing Dependency: trousers >= 0.3.11.2 for package: gnutls-3.3.29-8.el7.x86_64
--> Processing Dependency: libnettle.so.4()(64bit) for package: gnutls-3.3.29-8.el7.x86_64
~~~ (output truncated) ~~~

2. Check the Pacemaker packages
[root@zabbix-svr01 ~]# rpm -qa | egrep -i '^pcs|^pacemaker|^fence-agents-all|^corosync|^sbd|^watchdog'
pcs-0.9.162-5.el7.x86_64
pacemaker-1.1.18-11.el7.x86_64
corosync-2.4.3-2.el7.x86_64
sbd-1.3.1-7.el7.x86_64
watchdog-5.13-11.el7.x86_64

3. Create the cluster user
[root@zabbix-svr01 ~]# echo <Cluster Password> | passwd --stdin hacluster
On node1: Cluster Setup

4. Start the PCS daemon (Pacemaker Cluster Service)
[root@zabbix-svr01 ~]# systemctl start pcsd; systemctl enable pcsd
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service

5. Authenticate the cluster nodes
[root@zabbix-svr01 ~]# pcs cluster auth cluster-node1 cluster-node2
Username: hacluster
Password: <Cluster_Password>
cluster-node1: Authorized
cluster-node2: Authorized

6. Create the Zabbix cluster
[root@zabbix-svr01 ~]# pcs cluster setup --name zabbix-cluster cluster-node1 cluster-node2

7. Start the cluster on all nodes
[root@zabbix-svr01 ~]# pcs cluster start --all
cluster-node1: Starting Cluster...
cluster-node2: Starting Cluster...
On node1: Cluster Setup

8. Enable the cluster service on all nodes
[root@zabbix-svr01 ~]# pcs cluster enable --all

9. Check the cluster status
[root@zabbix-svr01 ~]# pcs status
Cluster name: zabbix-cluster
Stack: corosync
Current DC: cluster-node2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 28 13:38:00 2019
Last change: Mon Nov 25 19:00:27 2019 by hacluster via crmd on cluster-node1
2 nodes configured

Online: [ cluster-node1 cluster-node2 ]

Full list of resources:

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
On node1: Fencing Device

9. Check the disk block for the fencing device (SBD, Storage-Based Death)
[root@zabbix-svr01 ~]# lsscsi
[0:0:0:0] cd/dvd QEMU QEMU DVD-ROM 1.5. /dev/sr0
[5:0:0:0] disk LIO-ORG lun15 4.0 /dev/sda

[root@zabbix-svr01 ~]# cd /dev/disk/by-id/
[root@zabbix-svr01 by-id]# ll
total 0
~~~ (output truncated) ~~~
lrwxrwxrwx 1 root root 9 Nov 28 13:44 scsi-3600140595dea00f1f1d492499f682780 -> ../../sda
lrwxrwxrwx 1 root root 9 Nov 28 13:44 wwn-0x600140595dea00f1f1d492499f682780 -> ../../sda

10. Create the SBD device
[root@zabbix-svr01 ~]# sbd -d /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 create
[root@zabbix-svr01 ~]# sbd -d /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 dump
==Dumping header on disk /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780
~~~ (output truncated) ~~~
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 10
==Header on disk /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 is dumped

[root@zabbix-cluster1 by-id]# sbd -d /dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 list
0 cluster-node1 clear
1 cluster-node2 clear
On node1: Fencing Device

11. Load the softdog watchdog module
[root@zabbix-svr01 ~]# cat /etc/modules-load.d/softdog.conf
[root@zabbix-svr01 ~]# modprobe -v softdog
[root@zabbix-svr01 ~]# lsmod | grep softdog
[root@zabbix-svr01 ~]# ls -al /dev/ | grep -i watchdog
crw------- 1 root root 10, 130 Dec 2 10:06 watchdog
crw------- 1 root root 252, 0 Dec 2 10:06 watchdog0

12. Configure fence sbd
[root@zabbix-svr01 ~]# pcs stonith sbd device setup --device=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780
[root@zabbix-svr01 ~]# pcs stonith sbd enable --device=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780 --watchdog=/dev/watchdog0@cluster-node1 --watchdog=/dev/watchdog0@cluster-node2 SBD_WATCHDOG_TIMEOUT=10
[root@zabbix-svr01 ~]# pcs property set stonith-watchdog-timeout=10
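Note that pcs stonith sbd enable only takes effect after the cluster is restarted; afterwards the per-node SBD state can be verified (a sketch; pcs stonith sbd status requires a reasonably recent pcs):

[root@zabbix-svr01 ~]# pcs cluster stop --all && pcs cluster start --all
[root@zabbix-svr01 ~]# pcs stonith sbd status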
On node1: Pacemaker Resource Setup

11. Create fence_sbd
[root@zabbix-svr01 ~]# pcs stonith create sbd fence_sbd devices=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780

12. Prevent resources from moving back after recovery
[root@zabbix-svr01 ~]# pcs resource defaults resource-stickiness=100

13. Add the Apache HTTP server to group zabbix-svc
[root@zabbix-svr01 ~]# pcs resource create httpd systemd:httpd op monitor interval=10s --group zabbix-svc

14. Control the zabbix-server daemon
[root@zabbix-svr01 ~]# pcs resource create zabbix-server systemd:zabbix-server op monitor interval=10s --group zabbix-svc

15. VIP for the Zabbix server application
[root@zabbix-svr01 ~]# pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.0.50 op monitor interval=5s --group zabbix-svc
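Members of a resource group are implicitly colocated and started in the order they were added, so no explicit constraints are needed here; for reference, equivalent standalone rules would look like this (a sketch):

[root@zabbix-svr01 ~]# pcs constraint order start vip then start httpd
[root@zabbix-svr01 ~]# pcs constraint colocation add httpd with vip INFINITY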
On node1: Pacemaker Status

16. Pacemaker status
[root@zabbix-svr01 ~]# pcs status
Cluster name: zabbix-cluster
Stack: corosync
Current DC: cluster-node1 (version 1.1.18-11.el7_4.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 28 13:38:00 2019
Last change: Mon Nov 25 19:00:27 2019 by hacluster via crmd on cluster-node1
2 nodes configured
4 resources configured

Online: [ cluster-node1 cluster-node2 ]

Full list of resources:

sbd (stonith:fence_sbd): Started cluster-node1
vip (ocf::heartbeat:IPaddr2): Started cluster-node1
Resource Group: zabbix-svc
    httpd (systemd:httpd): Started cluster-node1
    zabbix-server (systemd:zabbix-server): Started cluster-node1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
  sbd: active/enabled
On node1: Pacemaker Configure

17. Pacemaker config show
[root@zabbix-svr01 ~]# pcs config show --all
Cluster Name: zabbix-cluster
Corosync Nodes:
 cluster-node1 cluster-node2
Pacemaker Nodes:
 cluster-node1 cluster-node2

Resources:
 Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.50
  Operations: monitor interval=5s timeout=20s (vip-monitor-interval-10s)
              start interval=0s timeout=20s (vip-start-interval-0s)
              stop interval=0s timeout=20s (vip-stop-interval-0s)
 Group: zabbix-svc
  Resource: httpd (class=systemd type=httpd)
   Operations: monitor interval=10 timeout=100 (web-monitor-interval-60)
               start interval=0s timeout=100 (web-start-interval-0s)
               stop interval=0s timeout=100 (web-stop-interval-0s)
  Resource: zabbix-server (class=systemd type=zabbix-server)
   Operations: monitor interval=10 timeout=100 (zabbix-server-monitor-interval-60)
               start interval=0s timeout=100 (zabbix-server-start-interval-0s)
               stop interval=0s timeout=100 (zabbix-server-stop-interval-0s)

Stonith Devices:
 Resource: sbd (class=stonith type=fence_sbd)
  Attributes: devices=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780
  Operations: monitor interval=60s (sbd-monitor-interval-60s)
Fencing Levels:
~~~ (output truncated) ~~~
On Active node: Service Check

18. Check the virtual IP address
[root@postgres-cluster2 ~]# ip addr show | grep secondary
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:28:33:5a brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.51/24 brd 192.168.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.0.50/24 brd 192.168.0.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe28:335a/64 scope link
       valid_lft forever preferred_lft forever

19. Check the HTTP server service
[root@zabbix-cluster2 ~]# ps auxw | grep apache
apache 1388 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1389 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1390 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1391 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
apache 1392 0.0 0.1 379952 7348 ? S 13:37 0:00 /usr/sbin/httpd -DFOREGROUND
root 18556 0.0 0.0 112648 920 pts/0 R+ 16:36 0:00 grep --color=auto apache

20. Check the Zabbix application service
[root@zabbix-cluster2 ~]# ps auxw | grep zabbix
zabbix 1410 0.0 0.0 178004 3520 ? S 13:37 0:00 /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf
zabbix 1411 0.0 0.0 178052 2928 ? S 13:37 0:00 /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.024625 sec, idle 60 sec]
zabbix 1412 0.0 0.0 178004 2264 ? S 13:37 0:00 /usr/sbin/zabbix_server: alerter #1 started
zabbix 1413 0.0 0.0 178004 2264 ? S 13:37 0:00 /usr/sbin/zabbix_server: alerter #2 started
zabbix 1414 0.0 0.0 178004 2264 ? S 13:37 0:00 /usr/sbin/zabbix_server: alerter #3 started
~~~ (output truncated) ~~~
Detail : Database Server HA
• Node 1 - Hostname: postgres-svr01, Clustername: cluster-node1, fence device: fence_sbd, IP 192.168.0.61 (Active)
• Node 2 - Hostname: postgres-svr02, Clustername: cluster-node2, fence device: fence_sbd, IP 192.168.0.62 (Standby)
• Virtual IP : 192.168.0.60
• Stack on each node : CentOS 7.4, Pacemaker 1.1, PostgreSQL 10 on LVM + XFS filesystems
• Shared SCSI device (1 GB or larger) used for SBD, plus iSCSI shared storage for the PostgreSQL areas (Archive / Log / Data)
On node1: PostgreSQL Install and Setup (cluster bootstrap identical to the Zabbix server setup above)

1. Zabbix-specific settings and tuning
[root@postgres-svr1 ~]# cat /data/zabbix/postgresql.conf
listen_addresses = '*'
log_destination = 'stderr'
logging_collector = on
wal_level = logical
archive_mode = on
#archive_command = 'dd conv=fdatasync bs=256k if=%p of=/archive/temp/%f && mv -vf /archive/temp/%f /archive/zabbix'
archive_command = 'true'
log_min_duration_statement = 2000
log_line_prefix = '%t %u@%r/%d (%p) '
log_statement = 'ddl'
shared_preload_libraries = '$libdir/pg_stat_statements,$libdir/auto_explain,$libdir/passwordcheck'
track_functions = all
track_activity_query_size = 65536
pg_stat_statements.max = 10000
pg_stat_statements.track = all
auto_explain.log_min_duration = '5min'
shared_buffers = 1011406kB
autovacuum_max_workers = 5
max_replication_slots = 3
hot_standby = on
max_wal_senders = 2
max_wal_size = 2GB
min_wal_size = 2GB
log_temp_files = 1024kB
max_connections = 200
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
temp_file_limit = 100GB
autovacuum_work_mem = 287MB
On node1: Pacemaker Resource Setup

2. Create fence_sbd
[root@postgres-svr1 ~]# pcs stonith create sbd fence_sbd devices=/dev/disk/by-id/scsi-3600140595dea00f1f1d492499f682780

3. Prevent resources from moving back after recovery
[root@postgres-svr1 ~]# pcs resource defaults resource-stickiness=100

4. Add the LVM / XFS filesystems to group postgres-svc
[root@postgres-svr1 ~]# pcs resource create lvm LVM volgrpname=VG01 exclusive=true --group postgres-svc
[root@postgres-svr1 ~]# pcs resource create archive Filesystem device="/dev/VG01/archive" directory="/archive" fstype="xfs" options="noatime,nodiratime,nobarrier" op monitor interval=20s --group postgres-svc
[root@postgres-svr1 ~]# pcs resource create pg_wal Filesystem device="/dev/VG02/pg_wal" directory="/pg_wal" fstype="xfs" options="noatime,nodiratime,nobarrier" op monitor interval=20s --group postgres-svc
[root@postgres-svr1 ~]# pcs resource create data Filesystem device="/dev/VG03/data" directory="/data" fstype="xfs" options="noatime,nodiratime,nobarrier" op monitor interval=20s --group postgres-svc

5. Control the PostgreSQL service
[root@postgres-svr1 ~]# pcs resource create pgsql ocf:heartbeat:pgsql pgctl=/postgres/10/bin/pg_ctl psql=/postgres/10/bin/psql pgdata=/data/zabbix config=/data/zabbix/postgresql.conf op monitor interval=10s --group postgres-svc

6. VIP for the PostgreSQL database
[root@postgres-svr1 ~]# pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.0.60 op monitor interval=5s --group postgres-svc
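With exclusive=true the LVM agent expects the volume groups to be excluded from automatic activation at boot; a common way to arrange this on CentOS 7 is shown below (a sketch; "rootvg" stands in for whatever local VG holds the OS):

# /etc/lvm/lvm.conf - activate only the local OS volume group at boot,
# leaving VG01/VG02/VG03 under exclusive cluster control
volume_list = [ "rootvg" ]

# rebuild the initramfs so the setting also applies during early boot
[root@postgres-svr1 ~]# dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)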
On node1: Pacemaker Status

7. Pacemaker status
[root@postgres-svr1 ~]# pcs status
Cluster name: postgres-cluster
Stack: corosync
Current DC: cluster-node2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Nov 28 16:36:25 2019
Last change: Wed Nov 20 14:40:03 2019 by hacluster via cibadmin on cluster-node1
2 nodes configured
7 resources configured

Online: [ cluster-node1 cluster-node2 ]

Full list of resources:

sbd (stonith:fence_sbd): Started cluster-node1
Resource Group: postgres-svc
    vip (ocf::heartbeat:IPaddr2): Started cluster-node2
    lvm (ocf::heartbeat:LVM): Started cluster-node2
    pg_wal (ocf::heartbeat:Filesystem): Started cluster-node2
    archive (ocf::heartbeat:Filesystem): Started cluster-node2
    data (ocf::heartbeat:Filesystem): Started cluster-node2
    pgsql (ocf::heartbeat:pgsql): Started cluster-node2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
  sbd: active/enabled
On node1: Pacemaker Configure

8. Pacemaker config show
[root@postgres-svr1 ~]# pcs config show
~~~ (output truncated) ~~~
Resources:
 Group: postgres-svc
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=192.168.0.60
   Operations: monitor interval=5s timeout=20s (vip-monitor-interval-10s)
               start interval=0s timeout=20s (vip-start-interval-0s)
               stop interval=0s timeout=20s (vip-stop-interval-0s)
  Resource: pg_wal (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/VG02/pg_wal directory=/pg_wal fstype=xfs
   Operations: monitor interval=20 timeout=40 (pg_wal-monitor-interval-20)
               notify interval=0s timeout=60 (pg_wal-notify-interval-0s)
               start interval=0s timeout=60 (pg_wal-start-interval-0s)
               stop interval=0s timeout=60 (pg_wal-stop-interval-0s)
  Resource: archive (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/VG01/archive directory=/archive fstype=xfs
   Operations: monitor interval=20 timeout=40 (archive-monitor-interval-20)
               notify interval=0s timeout=60 (archive-notify-interval-0s)
               start interval=0s timeout=60 (archive-start-interval-0s)
               stop interval=0s timeout=60 (archive-stop-interval-0s)
  Resource: data (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/VG03/data directory=/data fstype=xfs
   Operations: monitor interval=20 timeout=40 (data-monitor-interval-20)
               notify interval=0s timeout=60 (data-notify-interval-0s)
               start interval=0s timeout=60 (data-start-interval-0s)
               stop interval=0s timeout=60 (data-stop-interval-0s)
  Resource: pgsql (class=ocf provider=heartbeat type=pgsql)
   Attributes: config=/data/zabbix/postgresql.conf pgctl=/postgres/10/bin/pg_ctl pgdata=/data/zabbix psql=/postgres/10/bin/psql restart_on_promote=true
   Meta Attrs: migration-threshold=3
   Operations: demote interval=0s timeout=120 (pgsql-demote-interval-0s)
               methods interval=0s timeout=5 (pgsql-methods-interval-0s)
               monitor interval=30 timeout=30 (pgsql-monitor-interval-30)
               monitor interval=29 role=Master timeout=30 (pgsql-monitor-interval-29)
               notify interval=0s timeout=90 (pgsql-notify-interval-0s)
               promote interval=0s timeout=120 (pgsql-promote-interval-0s)
               start interval=0s timeout=120 (pgsql-start-interval-0s)
               stop interval=0s timeout=120 (pgsql-stop-interval-0s)
~~~ (output truncated) ~~~
On Active node: Service Check

9. Check the virtual IP address
[root@postgres-svr1 ~]# ip addr show | grep secondary
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:28:33:5a brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.61/24 brd 192.168.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.0.60/24 brd 192.168.0.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe28:335a/64 scope link
       valid_lft forever preferred_lft forever

10. Check the PostgreSQL service
[root@postgres-svr1 ~]# ps auxw | grep postgres
postgres 2603 0.0 0.1 1180040 4980 ? S 09:45 0:01 /postgres/10/bin/postgres -D /data/zabbix -c config_file=/data/zabbix/postgresql.conf
postgres 2604 0.0 0.0 96560 1648 ? Ss 09:45 0:00 postgres: logger process
postgres 2606 0.0 0.0 1180188 2312 ? Ss 09:45 0:00 postgres: checkpointer process
postgres 2607 0.0 0.0 1180180 1820 ? Ss 09:45 0:00 postgres: writer process
postgres 2608 0.0 0.0 1180040 1820 ? Ss 09:45 0:01 postgres: wal writer process
~~~ (output truncated) ~~~

11. Check the filesystem mounts
[root@postgres-svr1 ~]# df -h
Filesystem                Size  Used  Avail  Use%  Mounted on
~~~ (output truncated) ~~~
/dev/sdb                  100G   65M   100G    1%  /postgres
/dev/mapper/VG02-pg_wal    50G  2.1G    48G    5%  /pg_wal
/dev/mapper/VG01-archive   50G   33M    50G    1%  /archive
/dev/mapper/VG03-data      50G  1.6G    49G    4%  /data
Failover Scenario - Primary Node Down
When a HW / SW failure takes down the active node, Pacemaker moves each resource group to the standby node:
• FrontEnd resource group : Virtual IP, Apache HTTP Server, Zabbix Server
• Database resource group : Virtual IP, PostgreSQL, Filesystem
The standby node takes over the shared volumes and continues serving clients.
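A controlled failover can be rehearsed by putting the active node into standby and watching the resources move (a sketch):

[root@zabbix-svr01 ~]# pcs cluster standby cluster-node1
[root@zabbix-svr01 ~]# pcs status                          # resources should restart on cluster-node2
[root@zabbix-svr01 ~]# pcs cluster unstandby cluster-node1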
General Future - Pacemaker 2.0
The cluster daemons were renamed in Pacemaker 2.0:

Previous name        Current name            Purpose
attrd                pacemaker-attrd         manages node attributes
cib                  pacemaker-based         manages cluster configuration information
crmd                 pacemaker-controld      coordinates the cluster
lrmd                 pacemaker-execd         executes resource agents locally
stonithd             pacemaker-fenced        performs node fencing
pacemaker_remoted    pacemaker-remoted       executes resource agents on remote nodes
pengine              pacemaker-schedulerd    runs the scheduler

[Diagram: on each node, pacemakerd supervises pacemaker-based, pacemaker-attrd, pacemaker-schedulerd, pacemaker-execd, pacemaker-fenced, and pacemaker-controld; the nodes communicate through corosync cpg (Cluster Process Group) and quorum over the network]
General Future - Kronosnet (KNET)
 What is Kronosnet?
• https://www.kronosnet.org
• A library for securing the availability of network transport
• A network abstraction layer designed for high-availability use cases where redundancy, security, fault tolerance, and fast failover are core requirements
 Project Features
• Allows up to 8 separate network links per host, giving inter-node cluster communication high availability
• Automatically brings failed links back once communication is restored, speeding up disaster recovery and shortening downtime
• Supports multiple network protocols (UDP/SCTP)
• Works across subnets and through firewalls

[Diagram: Pacemaker on top of Corosync (totempg, totemsrp, totemnet, totemknet), with libknet driving two NICs]
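In Corosync 3.x, knet is the default transport and additional links are declared per node (a sketch; the 10.0.0.x addresses for the second link are illustrative):

totem {
    version: 2
    cluster_name: zabbix-cluster
    transport: knet
    link_mode: passive     # use link 0, fail over to link 1
}
nodelist {
    node {
        nodeid: 1
        ring0_addr: 192.168.0.51
        ring1_addr: 10.0.0.51
    }
    node {
        nodeid: 2
        ring0_addr: 192.168.0.52
        ring1_addr: 10.0.0.52
    }
}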
At the end....
What do you think high availability is? And how have you been thinking about it?
No solution can guarantee 99.9% high availability, and there is no such thing as a Best Practice.
There is only Test! Test! TEST!!!
REFERENCE
All *open-source* in the whole stack. Go, googling, ...

Configuring the Red Hat High Availability Add-On with Pacemaker
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html-single/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/index.html
SAP on Red Hat Technical Documents
http://www.redhat.com/f/pdf/ha-sap-v1-6-4.pdf
Red Hat Reference Architecture Series
http://www.sistina.com/rhel/resource_center/reference_architecture.html
Clusterlabs
http://clusterlabs.org/doc/
http://blog.clusterlabs.org/
OpenStack HA
http://www.slideshare.net/kenhui65/openstack-ha
High Availability on Linux - the SUSE way
https://tuna.moe/assets/slides/SFD2015/SUSE_HA_arch_overview.pdf
Github ClusterLabs (Booth)
https://github.com/ClusterLabs/booth
Dong hyun Kim | Manager
Opensource Business Team, kt ds
kim.donghyun@kt.com