SlideShare ist ein Scribd-Unternehmen logo
1 von 29
Downloaden Sie, um offline zu lesen
Failover, or not Failover,
that is the question
Percona Live MySQL Conference and Expo 2013
Massimo Brignoli, SkySQL
Henrik Ingo, Nokia
Please share and reuse this presentation licensed under the Creative Commonse Attribution License
Agenda
● Why HA is more difficult for databases
● Steps to failover
● Monitoring
● Automating failover
● Sounds great!
What could possibly go wrong?
● Amazon Dynamo
● Galera and NDB
Fault tolerance = redundancy
● RAID
● 2 power units per server
● Cluster of servers
● 2 kidneys per person
● Redudancy at all levels:
Software, Hardware, Network, Electricity...
A chain is as strong as the weakest link.
Durability
"Durability is an interesting concept.
If I flush a transaction to disk,
it is said to be durable.
But if I then take a backup,
it is even more durable."
Heikki Tuuri
Why High Availability is
More Difficult for Databases
Redundancy of server
AND
Redundancy of data
WHILE
Performing thousands of write operations
per second onto the dataset
What failover?
1. Primary server
2. Secondary / Standby server
for redundancy
3. In case Primary fails,
Secondary server must
become the new Primary
Steps to failover (theory)
1. Notice failure
2. Move VIP
3. Continue
Automating failover
Generic Clustering Solutions
● Pacemaker/Corosync
● Linux Heartbeat
● Red Hat Cluster Suite
● Solaris Cluster
● Windows Server Failover
Clustering
● etc...
MySQL Specific Solutions
● MMM
● PRM
● MHA
● JDBC connector
Steps to failover (DRBD)
VIP
VIP
1. Have DRBD
2. Notice failure
3. Shutdown MySQL on primary
4. Unmount disk on primary
5. Mount disk on secondary
6. Start MySQL on secondary
7. Wait for InnoDB recovery
8. Wait for InnoDB recovery
9. Wait for InnoDB recovery
10. Unset VIP on primary
11. Set VIP on secondary
12. Continue
13. Should you add a new secondary?
Steps to failover (MySQL replication)
VIP
VIP
1. Have replication
2. Notice failure
3. Make slave writable
4. Make master read-only
5. Unset VIP on master
6. Set VIP on slave
7. Continue
8. Should you add a new slave?
What if you have more than 2 servers?
(MySQL replication)
VIP
VIP
?
● MySQL replication failover with more than 2
servers can be a hassle.
● Which slave should become the new master?
● All slaves must be pointed to the new master.
● They must figure out where to continue
replication (binlog position)
● MySQL 5.6 GTID helps.
MHA and SkySQL...
● Combination of
resource manager
+ scripts
● Automating failover
process:
○ New Master
selection
○ Slaves
reconfiguration
○ VIP management
○ Missing binlogs
retrieval
Sounds great,
what could possibly go wrong?
Sounds great,
what could possibly go wrong?
VIP
1. Have replication
○ Ok, is it working? What if it's not working?
○ Is it replicating in the right direction?
○ Does your bash script handle binlog positions correctly?
○ Asynchronous?
2. Notice failure
○ Polling interval
○ Who is polling?
○ ...and from where?
○ How is he handling failure himself?
○ False positives
○ Is failover the right response to every failure?
3. STONITH
○ Shutdown MySQL on Primary? How? It's not responding...
○ Unmount disk on Primary? How? It's not responding...
○ "You need a STONITH device"! Hehe, nice try...
4. Move VIP
○ Unset VIP on Master/Primary? How? It's not responding...
○ Set VIP on Secondary/Slave. This will work fine. Unfortunately.
5. Continue
6. Add back new/same Secondary
○ Automatically of course. Even if it just failed 15 seconds ago.
VIP
Case Github
https://github.com/blog/1261-github-availability-this-week
● MySQL replication, Pacemaker, Corosync, Percona Replication Manager
● PRM health check fails due to high load during schema migration.
● Failover!
● New node has cold caches, so even worse performance.
● Failover! (back)
● Disable PRM
● A slave is found outdated as replication is not happening
● Enable PRM and hope it will fix it
● Pacemaker segfaults, causing cluster partition
● PRM selects the outdated node as master, shuts down others
● All kinds of data inconsistencies
● Restart PRM on all nodes
● ...
Case Github
Lesson learned:
Automated failover is dangerous
Cold cache is dangerous
But...
Not automating is also dangerous
Baron Schwartz:
75% of replication failures are human errors
http://www.percona.com/about-us/mysql-white-paper/causes-of-downtime-in-production-mysql-servers
80% of Aviation accidents
are caused by human errors
http://asasi.org/papers/2004/Shappell%20et%20al_HFACS_ISASI04.pdf
80% Events caused by human errors, 70% of
them due to organization weaknesses
http://www.hss.doe.gov/sesa/corporatesafety/hpc/fundamentals.html
Are we solving the right problem?
Instead of automating the problem...
Eliminate the problem!
Amazon Dynamo
R + W > N
Voldemort, Cassandra, RIAK, DynamoDB, S3
http://openlife.cc/blogs/2012/september/failover-evil
N=3, R=W=2
R + W > N
Eventual consistency is internal only
R + W > N
Failover?
Single node failure is a non-event!
For relational databases?
Synchronous replication is
special case of Dynamo:
W=N & R=1
Or is there a failover after all?
Due to W=N, writers actually notice node
failures! Cluster reconfiguration needed.
(Readers are ok.)
?
Example: Galera
OKTimeout
Example: MySQL NDB Cluster
What have we learned?
● Failover with DRBD is painful because it is
slow.
● Failover with MySQL replication is painful
because it's a mess.
● Amazon Dynamo has no failover
● Galera Cluster has no failover but needs
cluster reconfiguration. Same thing...
● MySQL NDB Cluster has failover but you
can't see it.

Weitere ähnliche Inhalte

Was ist angesagt?

Highly Available MySQL/PHP Applications with mysqlnd
Highly Available MySQL/PHP Applications with mysqlndHighly Available MySQL/PHP Applications with mysqlnd
Highly Available MySQL/PHP Applications with mysqlndJervin Real
 
High Availability with MariaDB Enterprise
High Availability with MariaDB EnterpriseHigh Availability with MariaDB Enterprise
High Availability with MariaDB EnterpriseMariaDB Corporation
 
Zero Downtime Schema Changes - Galera Cluster - Best Practices
Zero Downtime Schema Changes - Galera Cluster - Best PracticesZero Downtime Schema Changes - Galera Cluster - Best Practices
Zero Downtime Schema Changes - Galera Cluster - Best PracticesSeveralnines
 
MySQL Multi Master Replication
MySQL Multi Master ReplicationMySQL Multi Master Replication
MySQL Multi Master ReplicationMoshe Kaplan
 
MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera ClusterAbdul Manaf
 
Galera cluster for MySQL - Introduction Slides
Galera cluster for MySQL - Introduction SlidesGalera cluster for MySQL - Introduction Slides
Galera cluster for MySQL - Introduction SlidesSeveralnines
 
Galera cluster for high availability
Galera cluster for high availability Galera cluster for high availability
Galera cluster for high availability Mydbops
 
Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2Marco Tusa
 
M|18 Writing Stored Procedures in the Real World
M|18 Writing Stored Procedures in the Real WorldM|18 Writing Stored Procedures in the Real World
M|18 Writing Stored Procedures in the Real WorldMariaDB plc
 
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)M Malai
 
Percona XtraDB Cluster ( Ensure high Availability )
Percona XtraDB Cluster ( Ensure high Availability )Percona XtraDB Cluster ( Ensure high Availability )
Percona XtraDB Cluster ( Ensure high Availability )Mydbops
 
MySQL? Load? Clustering! Balancing! PECL/mysqlnd_ms 1.4
MySQL? Load? Clustering! Balancing! PECL/mysqlnd_ms 1.4MySQL? Load? Clustering! Balancing! PECL/mysqlnd_ms 1.4
MySQL? Load? Clustering! Balancing! PECL/mysqlnd_ms 1.4Ulf Wendel
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMariaDB plc
 
Plny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practicesPlny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practicesDimas Prasetyo
 
Vote NO for MySQL
Vote NO for MySQLVote NO for MySQL
Vote NO for MySQLUlf Wendel
 
MySQL topology healing at OLA.
MySQL topology healing at OLA.MySQL topology healing at OLA.
MySQL topology healing at OLA.Mydbops
 
MySQL 5.6 Global Transaction Identifier - Use case: Failover
MySQL 5.6 Global Transaction Identifier - Use case: FailoverMySQL 5.6 Global Transaction Identifier - Use case: Failover
MySQL 5.6 Global Transaction Identifier - Use case: FailoverUlf Wendel
 

Was ist angesagt? (20)

Highly Available MySQL/PHP Applications with mysqlnd
Highly Available MySQL/PHP Applications with mysqlndHighly Available MySQL/PHP Applications with mysqlnd
Highly Available MySQL/PHP Applications with mysqlnd
 
High Availability with MariaDB Enterprise
High Availability with MariaDB EnterpriseHigh Availability with MariaDB Enterprise
High Availability with MariaDB Enterprise
 
Zero Downtime Schema Changes - Galera Cluster - Best Practices
Zero Downtime Schema Changes - Galera Cluster - Best PracticesZero Downtime Schema Changes - Galera Cluster - Best Practices
Zero Downtime Schema Changes - Galera Cluster - Best Practices
 
Introduction to Galera Cluster
Introduction to Galera ClusterIntroduction to Galera Cluster
Introduction to Galera Cluster
 
MySQL Multi Master Replication
MySQL Multi Master ReplicationMySQL Multi Master Replication
MySQL Multi Master Replication
 
MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera Cluster
 
Galera cluster for MySQL - Introduction Slides
Galera cluster for MySQL - Introduction SlidesGalera cluster for MySQL - Introduction Slides
Galera cluster for MySQL - Introduction Slides
 
Galera cluster for high availability
Galera cluster for high availability Galera cluster for high availability
Galera cluster for high availability
 
Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2
 
M|18 Writing Stored Procedures in the Real World
M|18 Writing Stored Procedures in the Real WorldM|18 Writing Stored Procedures in the Real World
M|18 Writing Stored Procedures in the Real World
 
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
Ansible is Our Wishbone(Automate DBA Tasks With Ansible)
 
Planning for Disaster Recovery (DR) with Galera Cluster
Planning for Disaster Recovery (DR) with Galera ClusterPlanning for Disaster Recovery (DR) with Galera Cluster
Planning for Disaster Recovery (DR) with Galera Cluster
 
Percona XtraDB Cluster ( Ensure high Availability )
Percona XtraDB Cluster ( Ensure high Availability )Percona XtraDB Cluster ( Ensure high Availability )
Percona XtraDB Cluster ( Ensure high Availability )
 
MySQL? Load? Clustering! Balancing! PECL/mysqlnd_ms 1.4
MySQL? Load? Clustering! Balancing! PECL/mysqlnd_ms 1.4MySQL? Load? Clustering! Balancing! PECL/mysqlnd_ms 1.4
MySQL? Load? Clustering! Balancing! PECL/mysqlnd_ms 1.4
 
Galera Cluster DDL and Schema Upgrades 220217
Galera Cluster DDL and Schema Upgrades 220217Galera Cluster DDL and Schema Upgrades 220217
Galera Cluster DDL and Schema Upgrades 220217
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at Facebook
 
Plny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practicesPlny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practices
 
Vote NO for MySQL
Vote NO for MySQLVote NO for MySQL
Vote NO for MySQL
 
MySQL topology healing at OLA.
MySQL topology healing at OLA.MySQL topology healing at OLA.
MySQL topology healing at OLA.
 
MySQL 5.6 Global Transaction Identifier - Use case: Failover
MySQL 5.6 Global Transaction Identifier - Use case: FailoverMySQL 5.6 Global Transaction Identifier - Use case: Failover
MySQL 5.6 Global Transaction Identifier - Use case: Failover
 

Ähnlich wie Failover or not to failover

Nagios Conference 2014 - Andy Brist - Nagios XI Failover and HA Solutions
Nagios Conference 2014 - Andy Brist - Nagios XI Failover and HA SolutionsNagios Conference 2014 - Andy Brist - Nagios XI Failover and HA Solutions
Nagios Conference 2014 - Andy Brist - Nagios XI Failover and HA SolutionsNagios
 
MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability SolutionsLenz Grimmer
 
Mysqlhacodebits20091203 1260184765-phpapp02
Mysqlhacodebits20091203 1260184765-phpapp02Mysqlhacodebits20091203 1260184765-phpapp02
Mysqlhacodebits20091203 1260184765-phpapp02Louis liu
 
MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability SolutionsLenz Grimmer
 
PoC: Using a Group Communication System to improve MySQL Replication HA
PoC: Using a Group Communication System to improve MySQL Replication HAPoC: Using a Group Communication System to improve MySQL Replication HA
PoC: Using a Group Communication System to improve MySQL Replication HAUlf Wendel
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia DatabasesJaime Crespo
 
Buytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerBuytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerkuchinskaya
 
Become a MySQL DBA - webinar series - slides: Which High Availability solution?
Become a MySQL DBA - webinar series - slides: Which High Availability solution?Become a MySQL DBA - webinar series - slides: Which High Availability solution?
Become a MySQL DBA - webinar series - slides: Which High Availability solution?Severalnines
 
Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29liufabin 66688
 
Advanced Percona XtraDB Cluster in a nutshell... la suite
Advanced Percona XtraDB Cluster in a nutshell... la suiteAdvanced Percona XtraDB Cluster in a nutshell... la suite
Advanced Percona XtraDB Cluster in a nutshell... la suiteKenny Gryp
 
UKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningUKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningFromDual GmbH
 
EXPERIENCE WITH MYSQL HA SOLUTION AND GROUP REPLICATION
EXPERIENCE WITH MYSQL HA SOLUTION AND GROUP REPLICATIONEXPERIENCE WITH MYSQL HA SOLUTION AND GROUP REPLICATION
EXPERIENCE WITH MYSQL HA SOLUTION AND GROUP REPLICATIONMysql User Camp
 
MySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveMySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveUlf Wendel
 
MySQL for Large Scale Social Games
MySQL for Large Scale Social GamesMySQL for Large Scale Social Games
MySQL for Large Scale Social GamesYoshinori Matsunobu
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialJean-François Gagné
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentJean-François Gagné
 
MongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-SetMongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-SetVivek Parihar
 

Ähnlich wie Failover or not to failover (20)

Nagios Conference 2014 - Andy Brist - Nagios XI Failover and HA Solutions
Nagios Conference 2014 - Andy Brist - Nagios XI Failover and HA SolutionsNagios Conference 2014 - Andy Brist - Nagios XI Failover and HA Solutions
Nagios Conference 2014 - Andy Brist - Nagios XI Failover and HA Solutions
 
Shootout at the PAAS Corral
Shootout at the PAAS CorralShootout at the PAAS Corral
Shootout at the PAAS Corral
 
MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability Solutions
 
Mysqlhacodebits20091203 1260184765-phpapp02
Mysqlhacodebits20091203 1260184765-phpapp02Mysqlhacodebits20091203 1260184765-phpapp02
Mysqlhacodebits20091203 1260184765-phpapp02
 
MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability Solutions
 
PoC: Using a Group Communication System to improve MySQL Replication HA
PoC: Using a Group Communication System to improve MySQL Replication HAPoC: Using a Group Communication System to improve MySQL Replication HA
PoC: Using a Group Communication System to improve MySQL Replication HA
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
Buytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerBuytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemaker
 
Become a MySQL DBA - webinar series - slides: Which High Availability solution?
Become a MySQL DBA - webinar series - slides: Which High Availability solution?Become a MySQL DBA - webinar series - slides: Which High Availability solution?
Become a MySQL DBA - webinar series - slides: Which High Availability solution?
 
Fail over fail_back
Fail over fail_backFail over fail_back
Fail over fail_back
 
Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29Drupal Con My Sql Ha 2008 08 29
Drupal Con My Sql Ha 2008 08 29
 
Kvm optimizations
Kvm optimizationsKvm optimizations
Kvm optimizations
 
Advanced Percona XtraDB Cluster in a nutshell... la suite
Advanced Percona XtraDB Cluster in a nutshell... la suiteAdvanced Percona XtraDB Cluster in a nutshell... la suite
Advanced Percona XtraDB Cluster in a nutshell... la suite
 
UKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningUKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL Tuning
 
EXPERIENCE WITH MYSQL HA SOLUTION AND GROUP REPLICATION
EXPERIENCE WITH MYSQL HA SOLUTION AND GROUP REPLICATIONEXPERIENCE WITH MYSQL HA SOLUTION AND GROUP REPLICATION
EXPERIENCE WITH MYSQL HA SOLUTION AND GROUP REPLICATION
 
MySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveMySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspective
 
MySQL for Large Scale Social Games
MySQL for Large Scale Social GamesMySQL for Large Scale Social Games
MySQL for Large Scale Social Games
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated Environment
 
MongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-SetMongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-Set
 

Mehr von Henrik Ingo

Introduction to new high performance storage engines in mongodb 3.0
Introduction to new high performance storage engines in mongodb 3.0Introduction to new high performance storage engines in mongodb 3.0
Introduction to new high performance storage engines in mongodb 3.0Henrik Ingo
 
Meteor - The next generation software stack
Meteor - The next generation software stackMeteor - The next generation software stack
Meteor - The next generation software stackHenrik Ingo
 
MongoDB for Oracle Experts - OUGF Harmony 2014
MongoDB for Oracle Experts - OUGF Harmony 2014 MongoDB for Oracle Experts - OUGF Harmony 2014
MongoDB for Oracle Experts - OUGF Harmony 2014 Henrik Ingo
 
Building Your First MongoDB App
Building Your First MongoDB AppBuilding Your First MongoDB App
Building Your First MongoDB AppHenrik Ingo
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorHenrik Ingo
 
Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19
Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19
Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19Henrik Ingo
 
Spatial functions in MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others
Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and othersSpatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others
Spatial functions in MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and othersHenrik Ingo
 
Introducing Xtrabackup Manager
Introducing Xtrabackup ManagerIntroducing Xtrabackup Manager
Introducing Xtrabackup ManagerHenrik Ingo
 
Froscon 2012 how big corporations play the open source game
Froscon 2012   how big corporations play the open source gameFroscon 2012   how big corporations play the open source game
Froscon 2012 how big corporations play the open source gameHenrik Ingo
 
Databases and the Cloud
Databases and the CloudDatabases and the Cloud
Databases and the CloudHenrik Ingo
 
Fixed in drizzle
Fixed in drizzleFixed in drizzle
Fixed in drizzleHenrik Ingo
 
Choosing a MySQL High Availability solution - Percona Live UK 2011
Choosing a MySQL High Availability solution - Percona Live UK 2011Choosing a MySQL High Availability solution - Percona Live UK 2011
Choosing a MySQL High Availability solution - Percona Live UK 2011Henrik Ingo
 
Froscon2011: How i learned to use sql and then learned not to use it
Froscon2011:  How i learned to use sql and then learned not to use itFroscon2011:  How i learned to use sql and then learned not to use it
Froscon2011: How i learned to use sql and then learned not to use itHenrik Ingo
 
How to grow your open source project 10x and revenues 5x OSCON2011
How to grow your open source project 10x and revenues 5x OSCON2011How to grow your open source project 10x and revenues 5x OSCON2011
How to grow your open source project 10x and revenues 5x OSCON2011Henrik Ingo
 

Mehr von Henrik Ingo (14)

Introduction to new high performance storage engines in mongodb 3.0
Introduction to new high performance storage engines in mongodb 3.0Introduction to new high performance storage engines in mongodb 3.0
Introduction to new high performance storage engines in mongodb 3.0
 
Meteor - The next generation software stack
Meteor - The next generation software stackMeteor - The next generation software stack
Meteor - The next generation software stack
 
MongoDB for Oracle Experts - OUGF Harmony 2014
MongoDB for Oracle Experts - OUGF Harmony 2014 MongoDB for Oracle Experts - OUGF Harmony 2014
MongoDB for Oracle Experts - OUGF Harmony 2014
 
Building Your First MongoDB App
Building Your First MongoDB AppBuilding Your First MongoDB App
Building Your First MongoDB App
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop Connector
 
Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19
Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19
Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19
 
Spatial functions in MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others
Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and othersSpatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others
Spatial functions in MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others
 
Introducing Xtrabackup Manager
Introducing Xtrabackup ManagerIntroducing Xtrabackup Manager
Introducing Xtrabackup Manager
 
Froscon 2012 how big corporations play the open source game
Froscon 2012   how big corporations play the open source gameFroscon 2012   how big corporations play the open source game
Froscon 2012 how big corporations play the open source game
 
Databases and the Cloud
Databases and the CloudDatabases and the Cloud
Databases and the Cloud
 
Fixed in drizzle
Fixed in drizzleFixed in drizzle
Fixed in drizzle
 
Choosing a MySQL High Availability solution - Percona Live UK 2011
Choosing a MySQL High Availability solution - Percona Live UK 2011Choosing a MySQL High Availability solution - Percona Live UK 2011
Choosing a MySQL High Availability solution - Percona Live UK 2011
 
Froscon2011: How i learned to use sql and then learned not to use it
Froscon2011:  How i learned to use sql and then learned not to use itFroscon2011:  How i learned to use sql and then learned not to use it
Froscon2011: How i learned to use sql and then learned not to use it
 
How to grow your open source project 10x and revenues 5x OSCON2011
How to grow your open source project 10x and revenues 5x OSCON2011How to grow your open source project 10x and revenues 5x OSCON2011
How to grow your open source project 10x and revenues 5x OSCON2011
 

Kürzlich hochgeladen

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Kürzlich hochgeladen (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Failover or not to failover

  • 1. Failover, or not Failover, that is the question Percona Live MySQL Conference and Expo 2013 Massimo Brignoli, SkySQL Henrik Ingo, Nokia Please share and reuse this presentation licensed under the Creative Commonse Attribution License
  • 2. Agenda ● Why HA is more difficult for databases ● Steps to failover ● Monitoring ● Automating failover ● Sounds great! What could possibly go wrong? ● Amazon Dynamo ● Galera and NDB
  • 3. Fault tolerance = redundancy ● RAID ● 2 power units per server ● Cluster of servers ● 2 kidneys per person ● Redudancy at all levels: Software, Hardware, Network, Electricity... A chain is as strong as the weakest link.
  • 4. Durability "Durability is an interesting concept. If I flush a transaction to disk, it is said to be durable. But if I then take a backup, it is even more durable." Heikki Tuuri
  • 5. Why High Availability is More Difficult for Databases Redundancy of server AND Redundancy of data WHILE Performing thousands of write operations per second onto the dataset
  • 6. What failover? 1. Primary server 2. Secondary / Standby server for redundancy 3. In case Primary fails, Secondary server must become the new Primary
  • 7. Steps to failover (theory) 1. Notice failure 2. Move VIP 3. Continue
  • 8. Automating failover Generic Clustering Solutions ● Pacemaker/Corosync ● Linux Heartbeat ● Red Hat Cluster Suite ● Solaris Cluster ● Windows Server Failover Clustering ● etc... MySQL Specific Solutions ● MMM ● PRM ● MHA ● JDBC connector
  • 9. Steps to failover (DRBD) VIP VIP 1. Have DRBD 2. Notice failure 3. Shutdown MySQL on primary 4. Unmount disk on primary 5. Mount disk on secondary 6. Start MySQL on secondary 7. Wait for InnoDB recovery 8. Wait for InnoDB recovery 9. Wait for InnoDB recovery 10. Unset VIP on primary 11. Set VIP on secondary 12. Continue 13. Should you add a new secondary?
  • 10. Steps to failover (MySQL replication) VIP VIP 1. Have replication 2. Notice failure 3. Make slave writable 4. Make master read-only 5. Unset VIP on master 6. Set VIP on slave 7. Continue 8. Should you add a new slave?
  • 11. What if you have more than 2 servers? (MySQL replication) VIP VIP ? ● MySQL replication failover with more than 2 servers can be a hassle. ● Which slave should become the new master? ● All slaves must be pointed to the new master. ● They must figure out where to continue replication (binlog position) ● MySQL 5.6 GTID helps.
  • 12. MHA and SkySQL... ● Combination of resource manager + scripts ● Automating failover process: ○ New Master selection ○ Slaves reconfiguration ○ VIP management ○ Missing binlogs retrieval
  • 13. Sounds great, what could possibly go wrong?
  • 14. Sounds great, what could possibly go wrong? VIP 1. Have replication ○ Ok, is it working? What if it's not working? ○ Is it replicating in the right direction? ○ Does your bash script handle binlog positions correctly? ○ Asynchronous? 2. Notice failure ○ Polling interval ○ Who is polling? ○ ...and from where? ○ How is he handling failure himself? ○ False positives ○ Is failover the right response to every failure? 3. STONITH ○ Shutdown MySQL on Primary? How? It's not responding... ○ Unmount disk on Primary? How? It's not responding... ○ "You need a STONITH device"! Hehe, nice try... 4. Move VIP ○ Unset VIP on Master/Primary? How? It's not responding... ○ Set VIP on Secondary/Slave. This will work fine. Unfortunately. 5. Continue 6. Add back new/same Secondary ○ Automatically of course. Even if it just failed 15 seconds ago. VIP
  • 15. Case Github https://github.com/blog/1261-github-availability-this-week ● MySQL replication, Pacemaker, Corosync, Percona Replication Manager ● PRM health check fails due to high load during schema migration. ● Failover! ● New node has cold caches, so even worse performance. ● Failover! (back) ● Disable PRM ● A slave is found outdated as replication is not happening ● Enable PRM and hope it will fix it ● Pacemaker segfaults, causing cluster partition ● PRM selects the outdated node as master, shuts down others ● All kinds of data inconsistencies ● Restart PRM on all nodes ● ...
  • 16. Case Github Lesson learned: Automated failover is dangerous Cold cache is dangerous
  • 17. But... Not automating is also dangerous Baron Schwartz: 75% of replication failures are human errors http://www.percona.com/about-us/mysql-white-paper/causes-of-downtime-in-production-mysql-servers 80% of Aviation accidents are caused by human errors http://asasi.org/papers/2004/Shappell%20et%20al_HFACS_ISASI04.pdf 80% Events caused by human errors, 70% of them due to organization weaknesses http://www.hss.doe.gov/sesa/corporatesafety/hpc/fundamentals.html
  • 18. Are we solving the right problem?
  • 19. Instead of automating the problem... Eliminate the problem!
  • 20.
  • 21. Amazon Dynamo R + W > N Voldemort, Cassandra, RIAK, DynamoDB, S3 http://openlife.cc/blogs/2012/september/failover-evil
  • 22. N=3, R=W=2 R + W > N
  • 23. Eventual consistency is internal only R + W > N
  • 24. Failover? Single node failure is a non-event!
  • 25. For relational databases? Synchronous replication is special case of Dynamo: W=N & R=1
  • 26. Or is there a failover after all? Due to W=N, writers actually notice node failures! Cluster reconfiguration needed. (Readers are ok.) ?
  • 29. What have we learned? ● Failover with DRBD is painful because it is slow. ● Failover with MySQL replication is painful because it's a mess. ● Amazon Dynamo has no failover ● Galera Cluster has no failover but needs cluster reconfiguration. Same thing... ● MySQL NDB Cluster has failover but you can't see it.