SlideShare ist ein Scribd-Unternehmen logo
1 von 23
Galera

Cluster

Node Recovery
Seppo Jaakola
Codership
Agenda

●

Node Recovery Scenarios

●

Incremental State Transfer

●

State Snapshot Transfer

●

Full Cluster recovery

www.codership.com
2
Node Recovery Scenarios
Node drops from cluster gracefully and joins back
Replication state is stored in grastate.dat le
Joining happens by Incremental State Transfer (IST)

●
●

Joining after node crash
●
●

Node has either known or unknown state
Joining can happen by IST or full State Snapshot Transfer (SST)
is required

Full Cluster recovery
●
●
●
●

e.g. data center power down
All nodes with known or unknown states
The node with latest known state must be identi ed
New cluster needs to be bootstrapped

www.codership.com
3
Joining One Node to Cluster
Automatic Node Joining

Cluster handshake

MySQL

joiner

MySQL

Galera Replication

www.codership.com
5
Automatic Node Joining

Cluster selects donor to help
the joiner to join
Send state

MySQL

joiner

Donor
IST or SST

Galera Replication

www.codership.com
6
Automatic Node Joining

Catch up

MySQL

MySQL

Galera Replication

joiner

Slave queue

www.codership.com
7
Automatic Node Joining

MySQL

MySQL

MySQL

Galera Replication

www.codership.com
8
Incremental State Transfer
Incremental State Transfer
●

●

●

Every node in Galera Cluster has a log of replicated write
sets: gcache
Gcache is mmap le, available disk space is upper limit
for size allocation
If joining node has past history in the cluster and donor
has long enough gcache containing joiner's seqno
position => then IST can be used for synchronization

www.codership.com
10
Incremental State Transfer

Request to join

Node-1

GTID: seqno-n

Node-n

Joiner

Donor
seqno-n+m

grastate.dat

seqno-n
gcache

Group ID:seqno

gcache

www.codership.com
11
Incremental State Transfer

Node-1

Node-n

Joiner
Donor
apply

seqno-n+m

Send IST events
grastate.dat

gcache

seqno-n

Group ID:seqno

gcache

www.codership.com
12
Incremental State Transfer
●

●

●

Node synchronization by IST is very e/ective and least
intrusive method for the donor
gcache.size parameter de nes how big cache will be
maintained
Use database size and write rate to optimize gcache:
➢
➢

●

●

gcache < database size
Write rate de nes how long tail is available in cache

If joiner node had crashed and IST was used to
synchronize it back, then it is essential that InnoDB
recovery works (innodb_doublewrite)
If IST is not possible, donor will switch automatically to
SST method
www.codership.com
13
State Snapshot Transfer
SST Request

MySQL

joiner

MySQL
SST Request

Galera Replication

●

wsrep_sst_method

www.codership.com
15
SST Method

wsrep_sst_mysqldump

MySQL

donor

wsrep_sst_rsync

joiner

Galera Replication
wsrep_sst_xtrabackup

www.codership.com
16
State Snapshot Transfer
●
●

●
●

●

To send full database state
wsrep_sst_method to choose the method:
➢ mysqldump
➢ rsync
➢ Xtrabackup
Open API for creating new SST methods
All SST methods cause at least some service break in
donor node
If node has crashed, InnoDB recovery will happen
during startup. But with SST, this InnoDB recovery is
more or less useless

www.codership.com
17
Full Cluster Recovery
Full Cluster Recovery
All nodes dropped from cluster:
1. Find the node which has latest changes
2. Bootstrap new cluster from the latest node

www.codership.com
19
Node With Latest Changes
Check grastate.dat les:
1. File has valid seqno

# GALERA saved state
version: 2.1
uuid:
5ee99582-bb8d-11e2-b8e3-23de375c1d30
seqno:
8204503945773

●

Graceful shutdown

●

Find node which has biggest seqno

2. No seqno, but group ID is there

# GALERA saved state
version: 2.1
uuid:
5ee99582-bb8d-11e2-b8e3-23de375c1d30
seqno:
-1

●

Crash during transaction processing

●

Use –wsrep-recover to dig out the last seqno

3. No seqno, no group ID
●

# GALERA saved state
version: 2.1
uuid:
00000000-0000-0000-0000-000000000000
seqno:
-1

Crash during DDL

http://www.codership.com/wiki/doku.php?id=mysql_galera_restart
www.codership.com
20
--wsrep-recover
MySQL stores last committed GTID in InnoDB data
header, transactionally

●

This GTID can be read by starting mysqld with
–wsrep-recover option

●

<path to bin>/mysqld

–wsrep-recover –defaults- le=<path to my.cnf>

●

Mysqld will read InnoDB header les and shutdown immediately

●

Last wsrep position is printed in mysql error le
130514 18:39:13 [Note] WSREP: Recovered position: 5ee99582-bb8d-11e2-b8e3-23de375c1d30:8204503945771

www.codership.com
21
Bootstrapping New Cluster
When the latest node has been identi ed, start this
node as rst node in cluster

●

●
●

service mysql start –wsrep_new_cluster
service mysql start –wsrep_cluster_address=gcomm://

Start all other nodes. my.cnf should have
wsrep_cluster_address pointing to all other nodes

●

●
●

service mysql start
Don't re all nodes at once, rather start them one by one

www.codership.com
22
Questions?

Thank you for listening!
Happy Clustering :-)

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterMySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
 
Almost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
Almost Perfect Service Discovery and Failover with ProxySQL and OrchestratorAlmost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
Almost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
 
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
 
Maxscale switchover, failover, and auto rejoin
Maxscale switchover, failover, and auto rejoinMaxscale switchover, failover, and auto rejoin
Maxscale switchover, failover, and auto rejoin
 
Cobbler - Fast and reliable multi-OS provisioning
Cobbler - Fast and reliable multi-OS provisioningCobbler - Fast and reliable multi-OS provisioning
Cobbler - Fast and reliable multi-OS provisioning
 
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
MySQL Group Replication - HandsOn Tutorial
MySQL Group Replication - HandsOn TutorialMySQL Group Replication - HandsOn Tutorial
MySQL Group Replication - HandsOn Tutorial
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
My sql failover test using orchestrator
My sql failover test  using orchestratorMy sql failover test  using orchestrator
My sql failover test using orchestrator
 
MariaDB Galera Cluster presentation
MariaDB Galera Cluster presentationMariaDB Galera Cluster presentation
MariaDB Galera Cluster presentation
 
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group ReplicationPercona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
 
Galera Replication Demystified: How Does It Work?
Galera Replication Demystified: How Does It Work?Galera Replication Demystified: How Does It Work?
Galera Replication Demystified: How Does It Work?
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance Analysis
 
Errant GTIDs breaking replication @ Percona Live 2019
Errant GTIDs breaking replication @ Percona Live 2019Errant GTIDs breaking replication @ Percona Live 2019
Errant GTIDs breaking replication @ Percona Live 2019
 
Troubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issuesTroubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issues
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
 
Keepalived+MaxScale+MariaDB_운영매뉴얼_1.0.docx
Keepalived+MaxScale+MariaDB_운영매뉴얼_1.0.docxKeepalived+MaxScale+MariaDB_운영매뉴얼_1.0.docx
Keepalived+MaxScale+MariaDB_운영매뉴얼_1.0.docx
 
Case Study: MySQL migration from latin1 to UTF-8
Case Study: MySQL migration from latin1 to UTF-8Case Study: MySQL migration from latin1 to UTF-8
Case Study: MySQL migration from latin1 to UTF-8
 

Ähnlich wie Galera Cluster - Node Recovery - Webinar slides

Plny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practicesPlny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practices
Dimas Prasetyo
 
FOSDEM 2012: MySQL synchronous replication in practice with Galera
FOSDEM 2012: MySQL synchronous replication in practice with GaleraFOSDEM 2012: MySQL synchronous replication in practice with Galera
FOSDEM 2012: MySQL synchronous replication in practice with Galera
FromDual GmbH
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
DataStax Academy
 

Ähnlich wie Galera Cluster - Node Recovery - Webinar slides (20)

Plny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practicesPlny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practices
 
MySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMySQL Live Migration - Common Scenarios
MySQL Live Migration - Common Scenarios
 
MySQL Galera 集群
MySQL Galera 集群MySQL Galera 集群
MySQL Galera 集群
 
Advanced percona xtra db cluster in a nutshell... la suite plsc2016
Advanced percona xtra db cluster in a nutshell... la suite plsc2016Advanced percona xtra db cluster in a nutshell... la suite plsc2016
Advanced percona xtra db cluster in a nutshell... la suite plsc2016
 
Percona XtraDB 集群内部
Percona XtraDB 集群内部Percona XtraDB 集群内部
Percona XtraDB 集群内部
 
Highly Available Load Balanced Galera MySql Cluster
Highly Available Load Balanced  Galera MySql ClusterHighly Available Load Balanced  Galera MySql Cluster
Highly Available Load Balanced Galera MySql Cluster
 
MariaDB Galera Cluster - Simple, Transparent, Highly Available
MariaDB Galera Cluster - Simple, Transparent, Highly AvailableMariaDB Galera Cluster - Simple, Transparent, Highly Available
MariaDB Galera Cluster - Simple, Transparent, Highly Available
 
Advanced Percona XtraDB Cluster in a nutshell... la suite
Advanced Percona XtraDB Cluster in a nutshell... la suiteAdvanced Percona XtraDB Cluster in a nutshell... la suite
Advanced Percona XtraDB Cluster in a nutshell... la suite
 
How to understand Galera Cluster - 2013
How to understand Galera Cluster - 2013How to understand Galera Cluster - 2013
How to understand Galera Cluster - 2013
 
Structured streaming in Spark
Structured streaming in SparkStructured streaming in Spark
Structured streaming in Spark
 
PXC (Xtradb) Failure and Recovery
PXC (Xtradb) Failure and RecoveryPXC (Xtradb) Failure and Recovery
PXC (Xtradb) Failure and Recovery
 
FOSDEM 2012: MySQL synchronous replication in practice with Galera
FOSDEM 2012: MySQL synchronous replication in practice with GaleraFOSDEM 2012: MySQL synchronous replication in practice with Galera
FOSDEM 2012: MySQL synchronous replication in practice with Galera
 
Webinar slides: Introducing Galera 3.0 - Now supporting MySQL 5.6
Webinar slides: Introducing Galera 3.0 - Now supporting MySQL 5.6Webinar slides: Introducing Galera 3.0 - Now supporting MySQL 5.6
Webinar slides: Introducing Galera 3.0 - Now supporting MySQL 5.6
 
Galera cluster for high availability
Galera cluster for high availability Galera cluster for high availability
Galera cluster for high availability
 
M|18 Creating a Reference Architecture for High Availability at Nokia
M|18 Creating a Reference Architecture for High Availability at NokiaM|18 Creating a Reference Architecture for High Availability at Nokia
M|18 Creating a Reference Architecture for High Availability at Nokia
 
M|18 Under the Hood: Galera Cluster
M|18 Under the Hood: Galera ClusterM|18 Under the Hood: Galera Cluster
M|18 Under the Hood: Galera Cluster
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Galera Cluster 4 presentation at Percona Live Austin 2019
Galera Cluster 4 presentation at Percona Live Austin 2019 Galera Cluster 4 presentation at Percona Live Austin 2019
Galera Cluster 4 presentation at Percona Live Austin 2019
 
Percona XtraDB 集群安装与配置
Percona XtraDB 集群安装与配置Percona XtraDB 集群安装与配置
Percona XtraDB 集群安装与配置
 
MariaDB Auto-Clustering, Vertical and Horizontal Scaling within Jelastic PaaS
MariaDB Auto-Clustering, Vertical and Horizontal Scaling within Jelastic PaaSMariaDB Auto-Clustering, Vertical and Horizontal Scaling within Jelastic PaaS
MariaDB Auto-Clustering, Vertical and Horizontal Scaling within Jelastic PaaS
 

Mehr von Severalnines

Webinar slides: How to Migrate from Oracle DB to MariaDB
Webinar slides: How to Migrate from Oracle DB to MariaDBWebinar slides: How to Migrate from Oracle DB to MariaDB
Webinar slides: How to Migrate from Oracle DB to MariaDB
Severalnines
 
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControlWebinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Severalnines
 
Webinar slides: How to Manage Replication Failover Processes for MySQL, Maria...
Webinar slides: How to Manage Replication Failover Processes for MySQL, Maria...Webinar slides: How to Manage Replication Failover Processes for MySQL, Maria...
Webinar slides: How to Manage Replication Failover Processes for MySQL, Maria...
Severalnines
 
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Severalnines
 
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Severalnines
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Severalnines
 
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDBWebinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Severalnines
 
Webinar slides: How to Measure Database Availability?
Webinar slides: How to Measure Database Availability?Webinar slides: How to Measure Database Availability?
Webinar slides: How to Measure Database Availability?
Severalnines
 
Webinar slides: Designing Open Source Databases for High Availability
Webinar slides: Designing Open Source Databases for High AvailabilityWebinar slides: Designing Open Source Databases for High Availability
Webinar slides: Designing Open Source Databases for High Availability
Severalnines
 

Mehr von Severalnines (20)

Cloud's future runs through Sovereign DBaaS
Cloud's future runs through Sovereign DBaaSCloud's future runs through Sovereign DBaaS
Cloud's future runs through Sovereign DBaaS
 
Tips to drive maria db cluster performance for nextcloud
Tips to drive maria db cluster performance for nextcloudTips to drive maria db cluster performance for nextcloud
Tips to drive maria db cluster performance for nextcloud
 
Working with the Moodle Database: The Basics
Working with the Moodle Database: The BasicsWorking with the Moodle Database: The Basics
Working with the Moodle Database: The Basics
 
SysAdmin Working from Home? Tips to Automate MySQL, MariaDB, Postgres & MongoDB
SysAdmin Working from Home? Tips to Automate MySQL, MariaDB, Postgres & MongoDBSysAdmin Working from Home? Tips to Automate MySQL, MariaDB, Postgres & MongoDB
SysAdmin Working from Home? Tips to Automate MySQL, MariaDB, Postgres & MongoDB
 
(slides) Polyglot persistence: utilizing open source databases as a Swiss poc...
(slides) Polyglot persistence: utilizing open source databases as a Swiss poc...(slides) Polyglot persistence: utilizing open source databases as a Swiss poc...
(slides) Polyglot persistence: utilizing open source databases as a Swiss poc...
 
Webinar slides: How to Migrate from Oracle DB to MariaDB
Webinar slides: How to Migrate from Oracle DB to MariaDBWebinar slides: How to Migrate from Oracle DB to MariaDB
Webinar slides: How to Migrate from Oracle DB to MariaDB
 
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControlWebinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControl
 
Webinar slides: How to Manage Replication Failover Processes for MySQL, Maria...
Webinar slides: How to Manage Replication Failover Processes for MySQL, Maria...Webinar slides: How to Manage Replication Failover Processes for MySQL, Maria...
Webinar slides: How to Manage Replication Failover Processes for MySQL, Maria...
 
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
 
Disaster Recovery Planning for MySQL & MariaDB
Disaster Recovery Planning for MySQL & MariaDBDisaster Recovery Planning for MySQL & MariaDB
Disaster Recovery Planning for MySQL & MariaDB
 
MariaDB Performance Tuning Crash Course
MariaDB Performance Tuning Crash CourseMariaDB Performance Tuning Crash Course
MariaDB Performance Tuning Crash Course
 
Performance Tuning Cheat Sheet for MongoDB
Performance Tuning Cheat Sheet for MongoDBPerformance Tuning Cheat Sheet for MongoDB
Performance Tuning Cheat Sheet for MongoDB
 
Advanced MySql Data-at-Rest Encryption in Percona Server
Advanced MySql Data-at-Rest Encryption in Percona ServerAdvanced MySql Data-at-Rest Encryption in Percona Server
Advanced MySql Data-at-Rest Encryption in Percona Server
 
Polyglot Persistence Utilizing Open Source Databases as a Swiss Pocket Knife
Polyglot Persistence Utilizing Open Source Databases as a Swiss Pocket KnifePolyglot Persistence Utilizing Open Source Databases as a Swiss Pocket Knife
Polyglot Persistence Utilizing Open Source Databases as a Swiss Pocket Knife
 
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
 
Webinar slides: Our Guide to MySQL & MariaDB Performance Tuning
Webinar slides: Our Guide to MySQL & MariaDB Performance TuningWebinar slides: Our Guide to MySQL & MariaDB Performance Tuning
Webinar slides: Our Guide to MySQL & MariaDB Performance Tuning
 
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDBWebinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
 
Webinar slides: How to Measure Database Availability?
Webinar slides: How to Measure Database Availability?Webinar slides: How to Measure Database Availability?
Webinar slides: How to Measure Database Availability?
 
Webinar slides: Designing Open Source Databases for High Availability
Webinar slides: Designing Open Source Databases for High AvailabilityWebinar slides: Designing Open Source Databases for High Availability
Webinar slides: Designing Open Source Databases for High Availability
 

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Galera Cluster - Node Recovery - Webinar slides

  • 2. Agenda ● Node Recovery Scenarios ● Incremental State Transfer ● State Snapshot Transfer ● Full Cluster recovery www.codership.com 2
  • 3. Node Recovery Scenarios Node drops from cluster gracefully and joins back Replication state is stored in grastate.dat le Joining happens by Incremental State Transfer (IST) ● ● Joining after node crash ● ● Node has either known or unknown state Joining can happen by IST or full State Snapshot Transfer (SST) is required Full Cluster recovery ● ● ● ● e.g. data center power down All nodes with known or unknown states The node with latest known state must be identi ed New cluster needs to be bootstrapped www.codership.com 3
  • 4. Joining One Node to Cluster
  • 5. Automatic Node Joining Cluster handshake MySQL joiner MySQL Galera Replication www.codership.com 5
  • 6. Automatic Node Joining Cluster selects donor to help the joiner to join Send state MySQL joiner Donor IST or SST Galera Replication www.codership.com 6
  • 7. Automatic Node Joining Catch up MySQL MySQL Galera Replication joiner Slave queue www.codership.com 7
  • 8. Automatic Node Joining MySQL MySQL MySQL Galera Replication www.codership.com 8
  • 10. Incremental State Transfer ● ● ● Every node in Galera Cluster has a log of replicated write sets: gcache Gcache is mmap le, available disk space is upper limit for size allocation If joining node has past history in the cluster and donor has long enough gcache containing joiner's seqno position => then IST can be used for synchronization www.codership.com 10
  • 11. Incremental State Transfer Request to join Node-1 GTID: seqno-n Node-n Joiner Donor seqno-n+m grastate.dat seqno-n gcache Group ID:seqno gcache www.codership.com 11
  • 12. Incremental State Transfer Node-1 Node-n Joiner Donor apply seqno-n+m Send IST events grastate.dat gcache seqno-n Group ID:seqno gcache www.codership.com 12
  • 13. Incremental State Transfer ● ● ● Node synchronization by IST is very e/ective and least intrusive method for the donor gcache.size parameter de nes how big cache will be maintained Use database size and write rate to optimize gcache: ➢ ➢ ● ● gcache < database size Write rate de nes how long tail is available in cache If joiner node had crashed and IST was used to synchronize it back, then it is essential that InnoDB recovery works (innodb_doublewrite) If IST is not possible, donor will switch automatically to SST method www.codership.com 13
  • 15. SST Request MySQL joiner MySQL SST Request Galera Replication ● wsrep_sst_method www.codership.com 15
  • 17. State Snapshot Transfer ● ● ● ● ● To send full database state wsrep_sst_method to choose the method: ➢ mysqldump ➢ rsync ➢ Xtrabackup Open API for creating new SST methods All SST methods cause at least some service break in donor node If node has crashed, InnoDB recovery will happen during startup. But with SST, this InnoDB recovery is more or less useless www.codership.com 17
  • 19. Full Cluster Recovery All nodes dropped from cluster: 1. Find the node which has latest changes 2. Bootstrap new cluster from the latest node www.codership.com 19
  • 20. Node With Latest Changes Check grastate.dat les: 1. File has valid seqno # GALERA saved state version: 2.1 uuid: 5ee99582-bb8d-11e2-b8e3-23de375c1d30 seqno: 8204503945773 ● Graceful shutdown ● Find node which has biggest seqno 2. No seqno, but group ID is there # GALERA saved state version: 2.1 uuid: 5ee99582-bb8d-11e2-b8e3-23de375c1d30 seqno: -1 ● Crash during transaction processing ● Use –wsrep-recover to dig out the last seqno 3. No seqno, no group ID ● # GALERA saved state version: 2.1 uuid: 00000000-0000-0000-0000-000000000000 seqno: -1 Crash during DDL http://www.codership.com/wiki/doku.php?id=mysql_galera_restart www.codership.com 20
  • 21. --wsrep-recover MySQL stores last committed GTID in InnoDB data header, transactionally ● This GTID can be read by starting mysqld with –wsrep-recover option ● <path to bin>/mysqld –wsrep-recover –defaults- le=<path to my.cnf> ● Mysqld will read InnoDB header les and shutdown immediately ● Last wsrep position is printed in mysql error le 130514 18:39:13 [Note] WSREP: Recovered position: 5ee99582-bb8d-11e2-b8e3-23de375c1d30:8204503945771 www.codership.com 21
  • 22. Bootstrapping New Cluster When the latest node has been identi ed, start this node as rst node in cluster ● ● ● service mysql start –wsrep_new_cluster service mysql start –wsrep_cluster_address=gcomm:// Start all other nodes. my.cnf should have wsrep_cluster_address pointing to all other nodes ● ● ● service mysql start Don't re all nodes at once, rather start them one by one www.codership.com 22
  • 23. Questions? Thank you for listening! Happy Clustering :-)