High-Availability using MySQL Fabric

MySQL Fabric is an open-source framework for the management of farms of servers. It is designed to be easy to use and available for both small and large server farms.

In order to create a solution that is truly resilient to failures, it is necessary to ensure redundancy of every component in the system and have a solid foundation for detecting and handling failures.

In this session, you will learn how to build a robust high-availability solution using MySQL Fabric, what components you need, and how they should be set up. You will learn how MySQL Fabric handles high-availability of the application servers and how to ensure high-availability of the Fabric system as a whole. You will also learn how to leverage, for example, OpenStack to ensure that the system keeps operating in the presence of failures.

Published in: Technology

High-Availability using MySQL Fabric

  1. High Availability using MySQL Fabric: Managing Farms of Servers. Mats Kindahl. Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
  2. Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  3. Program Agenda
     ● Building reliable systems
     ● MySQL Fabric overview
     ● Managing redundancy
     ● Procedure automation and the Executor
     ● Failure detection and failure handling
  4. Program Agenda (continued)
     ● Using Fabric with existing high-availability setups
     ● Making Fabric highly available
     ● Thoughts for the future
     ● Closing remarks
  5. Building Reliable Systems
  6. Building for reliability: high-availability is an integral part of designing a reliable system
  7. What causes downtime?
     ● System failures
     ● Hardware faults
     ● Software bugs
     ● Disasters
     ● Maintenance
     ● User errors
  8. High-availability concepts
     ● Redundancy
       ● Duplicate critical components
     ● Monitoring
       ● Detecting failing components
       ● Monitor load
     ● Procedures
       ● Activate replacements
       ● Distribute load
  9. High-availability solutions
     ● Primary-secondary approach
       ● MySQL Replication
     ● Shared-nothing clusters
       ● MySQL Cluster
       ● MySQL Group Replication (not GA)
     ● Tightly coupled clusters
       ● DRBD
       ● WSFC
       ● Solaris Clustering
       ● Oracle Clusterware
       ● Oracle VM High Availability
  10. MySQL Fabric Overview
  11. What is MySQL Fabric? An extensible and easy-to-use framework for managing a farm of MySQL servers, supporting high-availability and sharding.
  12. What does it mean?
      ● Management system
      ● Manages a MySQL farm
      ● Distributed framework
      ● Framework
      ● Procedure execution
      ● State store
      ● Transaction routing
      ● Extensible
      ● High-availability groups
      ● Sharding
      ● Cloud support
      ● Written in Python
      ● MySQL 5.6 (or later)
      ● Open source
      ● You can participate
  13. Bird's-eye view [diagram: application, operator, MySQL Fabric node, high-availability groups (shards)]
  14. MySQL Fabric Components
      ● Fabric-aware connectors
        ● Enhanced Connector API
        ● Python, PHP, Java, .NET, C
      ● MySQL Fabric controller
        ● Manage farm meta-data
        ● Provide status information
        ● Execute procedures
      ● MySQL servers
        ● Organized in high-availability groups
        ● Handle application data
      [diagram: application with Fabric-aware connectors, MySQL Fabric controller node, high-availability group]
  15. MySQL Fabric Controller Architecture [diagram: protocol servers (XML-RPC, MySQL-RPC, AMQP); extensions (Sharding, Master-Slave, Providers); Fabric core (Executor, Model, Persistence); state store; flows labeled Requests, Events, Results; some elements are marked "Example only!"]
  16. Managing Redundancy
  17. High-Availability Group Concept
      ● Group of servers
        ● Hardware redundancy
        ● Data redundancy
      ● Generic concept
        ● Implementation-independent
        ● Self-managed or externally managed
      ● Different types
        ● Primary-Backup (Master-Slave)
        ● Shared or Replicated Storage
        ● MySQL Cluster
      [diagram: group types illustrated with DRBD and MySQL Cluster (ndbd nodes); labels "Default" and "Examples only, not implemented"]
  18. Creating a high-availability group: create an empty group
      ● Create a logical group for the servers
      ● Empty initially
          mysqlfabric group create my_group --description='My Group'
  19. Creating a high-availability group: adding servers to the group
      ● Add servers to group
      ● Group will have no master
      ● All servers are secondaries (!)
          mysqlfabric group add my_group server1.example.com
          mysqlfabric group add my_group server2.example.com
          mysqlfabric group add my_group server3.example.com
  20. Creating a high-availability group: promote a primary
      ● Promote one secondary to primary
      ● Selects secondary at random
      ● Specific secondary can be selected
          mysqlfabric group promote my_group
          mysqlfabric group promote my_group --slave_id='server1.example.com'
  21. Creating a high-availability group: enable failure detector
      ● Enable built-in failure detector
      ● Monitor servers in group
          mysqlfabric group activate my_group
  22. Creating a high-availability group
      ● On primary failure
        ● Mark primary as faulty
        ● Trigger fail-over
      ● On secondary failure
        ● Mark secondary as faulty
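
The four steps on slides 18 to 21 lend themselves to scripting. The following is a minimal sketch, not part of MySQL Fabric: the helper name `group_setup_commands` is hypothetical, and it only builds the `mysqlfabric` argument lists shown on the slides. Actually running them (for example with `subprocess.run`) assumes `mysqlfabric` is installed and a Fabric node is reachable.

```python
import shlex


def group_setup_commands(group, servers, description=None, primary=None):
    """Return the mysqlfabric commands, as argv lists, that build an HA group.

    Hypothetical helper: it mirrors the commands shown on the slides
    (create, add, promote, activate) and does not contact Fabric itself.
    """
    create = ["mysqlfabric", "group", "create", group]
    if description is not None:
        create.append("--description=" + description)
    commands = [create]
    # Every added server starts out as a secondary; the group has no primary yet.
    for address in servers:
        commands.append(["mysqlfabric", "group", "add", group, address])
    promote = ["mysqlfabric", "group", "promote", group]
    if primary is not None:
        # Without --slave_id, Fabric selects the secondary to promote itself.
        promote.append("--slave_id=" + primary)
    commands.append(promote)
    # Enable the built-in failure detector last, as on the slides.
    commands.append(["mysqlfabric", "group", "activate", group])
    return commands


if __name__ == "__main__":
    for argv in group_setup_commands(
        "my_group",
        ["server1.example.com", "server2.example.com", "server3.example.com"],
        description="My Group",
    ):
        print(shlex.join(argv))
```

Returning argument lists rather than shell strings keeps quoting out of the picture until the commands are printed or executed.
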
  23. Procedure Automation and the Executor
  24. Automating management of a farm
      ● Management procedures
        ● Fail-over
        ● Slave promotion
        ● Shard split
      ● Triggered on events
        ● Crashing server
        ● Administrative decision
        ● Increasing load
      ● Resilient execution
        ● Controller node can crash
        ● Recover partially executed procedure
      [diagram: procedure steps Find Candidate, Check Candidate, Disable Read-only, Process Backlog, Re-direct Slaves; events SERVER_LOST, SLAVE_PROMOTED]
  25. MySQL Fabric executor
      ● Event-driven executor
        ● Events will trigger execution of procedures
        ● Procedures can trigger events themselves
        ● Each step of a procedure is called a job
      ● Procedures
        ● Written in Python
        ● Interact with servers
        ● Write state changes into backing store
        ● Lock manager for conflict resolution
          – Conservative 2PL
          – Avoid deadlocks
      [diagram: events, queue, backing store]
  26. Example: keep high-availability profile
  27. Automating adding a server
      ● Register job for event
      ● @on_event decorator
      ● Register job with event
      ● Fetch group of lost server
      ● Fetch new server from provider
      ● Add server to group
          @on_event(SERVER_LOST)
          def _add_server(group_id, server_uuid):
              group = Group.fetch(group_id)
              machines = PROV.create_machines(parameters)
              server = MySQLServer(server_uuid, address)
              MySQLServer.add(server)
              group.add(server)
              _configure_as_slave(server)
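
To make the `@on_event` idea concrete, here is a toy registry in plain Python. It is an illustration only and not Fabric's implementation: the `_handlers` table and the `trigger` function are hypothetical, and jobs run synchronously here, whereas the slides describe Fabric queuing jobs in its executor.

```python
from collections import defaultdict

# Event names as plain strings; the slides mention SERVER_LOST and SLAVE_PROMOTED.
SERVER_LOST = "SERVER_LOST"
SLAVE_PROMOTED = "SLAVE_PROMOTED"

_handlers = defaultdict(list)  # event name -> list of registered jobs


def on_event(event):
    """Decorator that registers the decorated function as a job for `event`."""
    def register(job):
        _handlers[event].append(job)
        return job
    return register


def trigger(event, *args):
    """Run every job registered for `event` and collect the results."""
    return [job(*args) for job in _handlers[event]]


@on_event(SERVER_LOST)
def _log_lost_server(group_id, server_uuid):
    return "group %s lost server %s" % (group_id, server_uuid)
```

Calling `trigger(SERVER_LOST, "my_group", "uuid-1")` runs `_log_lost_server`; an event with no registered jobs simply yields an empty list.
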
  28. MySQL Fabric execution flow
      ● Before starting a job
        ● Acquire the necessary locks
        ● Checkpoint execution state in backing store
        ● Start a transaction on the backing store
      ● When executing a job
        ● Updates to backing store inside transaction
        ● Interact with servers
      ● After executing a job
        ● Mark job completed in internal log
        ● Commit transaction on backing store
      ● What about crashes?
      [diagram: events, queue, backing store]
  29. MySQL Fabric executor recovery
      ● Two types of jobs
        ● Idempotent: restart the job
        ● Not idempotent: execute compensation
      ● Recovery procedure
        ● Start the executor
        ● Collect unfinished checkpoints
        ● Execute compensation actions, if there are any
        ● Re-schedule each job in checkpoint
      [diagram: events, queue, backing store]
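
The checkpoint-and-recover cycle from slides 28 and 29 can be sketched in a few lines. This is a simplified model under stated assumptions, not Fabric's executor: the `Executor` class is hypothetical, a plain dict stands in for the backing store, and locking and backing-store transactions are left out.

```python
class Executor:
    """Toy executor that checkpoints jobs so they can be recovered after a crash."""

    def __init__(self, store):
        # `store` stands in for the backing store: job name -> "started" or "done".
        self.store = store
        self.jobs = {}  # job name -> (action, compensation or None)

    def register(self, name, action, compensation=None):
        self.jobs[name] = (action, compensation)

    def run(self, name):
        action, _ = self.jobs[name]
        self.store[name] = "started"  # checkpoint before executing
        action()
        self.store[name] = "done"     # mark completed after executing

    def recover(self):
        """Re-run every job whose checkpoint was never marked done."""
        unfinished = [n for n, state in self.store.items() if state == "started"]
        for name in unfinished:
            _, compensation = self.jobs[name]
            if compensation is not None:
                compensation()        # undo partial effects of a non-idempotent job
            self.run(name)            # re-schedule the job
        return unfinished
```

An idempotent job is registered without a compensation and is simply restarted; a non-idempotent job supplies a compensation that runs before the job is scheduled again.
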
  30. Failure detection and failure handling
  31. Built-in failure detector
      ● Group-level detection
        ● Fabric node pings servers in group
        ● Servers need to be managed by Fabric
      ● On primary failure
        ● Mark primary as faulty
        ● Trigger fail-over of connectors and slaves
      ● On secondary failure
        ● Mark secondary as faulty
  32. Built-in failure detector: configuration
      ● Detections
        ● Number of failed pings before a server is marked as faulty
      ● Detection interval
        ● Interval between server pings, in seconds
      ● Detection timeout
        ● Timeout for ping, in seconds
          [failure_tracking]
          detections = 3
          detection_interval = 6
          detection_timeout = 1
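
As a rough illustration of how the three settings interact, the sketch below parses the section from slide 32 with the standard `configparser` module and counts failed pings. The `PingTracker` class is hypothetical, and it assumes that only consecutive failures count and that a successful ping resets the counter; the slides do not say which rule Fabric applies.

```python
import configparser

CONFIG = """
[failure_tracking]
detections = 3
detection_interval = 6
detection_timeout = 1
"""


class PingTracker:
    """Marks a server faulty after `detections` consecutive failed pings."""

    def __init__(self, detections):
        self.detections = detections
        self.failed = 0
        self.faulty = False

    def record(self, ping_ok):
        # Assumption: a successful ping resets the failure count.
        self.failed = 0 if ping_ok else self.failed + 1
        if self.failed >= self.detections:
            self.faulty = True
        return self.faulty


parser = configparser.ConfigParser()
parser.read_string(CONFIG)
section = parser["failure_tracking"]
tracker = PingTracker(section.getint("detections"))
```

Under this assumption, three failed pings spaced six seconds apart mean a dead server is flagged after roughly 18 seconds plus ping timeouts.
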
  33. External failure detectors
      ● External failure detectors
        ● Connectors
        ● Custom failure detectors
      ● Reporting API
        ● Error: suspected server failure
        ● Failure: server is known to have failed
      ● Reporting server error
        ● Trigger fail-over if threshold is exceeded
      ● Reporting server failure
        ● Trigger immediate fail-over
      [diagram: external detectors reporting to the MySQL Fabric controller node]
  34. External failure detectors: configuration
      ● Notifications
        ● Error threshold
      ● Notification clients
        ● Threshold for number of unique clients
      ● Notification interval
        ● Notification window
          [failure_tracking]
          notifications = 300
          notification_clients = 50
          notification_interval = 60
          failover_interval = 0
          prune_time = 3600
  35. External failure detectors: configuration
      ● Failover interval
        ● Minimum interval between failovers
        ● Used to prevent flapping
      ● Prune time
        ● Size of error log (in seconds) to keep
          [failure_tracking]
          notifications = 300
          notification_clients = 50
          notification_interval = 60
          failover_interval = 0
          prune_time = 3600
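
One plausible reading of the three notification settings is a sliding window: fail-over is triggered once enough error reports from enough distinct clients arrive within the window. The sketch below implements that reading. The `ErrorReports` class is hypothetical, and both the "at least" comparison and the requirement that both thresholds hold are assumptions, not documented Fabric behavior.

```python
import time


class ErrorReports:
    """Sliding-window count of error reports for one server."""

    def __init__(self, notifications, notification_clients, notification_interval):
        self.notifications = notifications
        self.notification_clients = notification_clients
        self.notification_interval = notification_interval
        self.reports = []  # list of (timestamp, client id)

    def report(self, client, now=None):
        """Record one error report; return True if fail-over should trigger."""
        now = time.time() if now is None else now
        self.reports.append((now, client))
        # Keep only the reports that fall inside the notification window.
        cutoff = now - self.notification_interval
        self.reports = [(t, c) for t, c in self.reports if t > cutoff]
        clients = {c for _, c in self.reports}
        return (len(self.reports) >= self.notifications
                and len(clients) >= self.notification_clients)
```

Requiring several distinct clients guards against a single misbehaving application forcing a fail-over on its own.
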
  36. Connector as external failure detector
      ● Error reporting from connector
        ● Depends on connector support
        ● Report suspected failures
      ● Enabling error reporting
        ● Error reporting off by default
      ● Avoid a thundering herd
        ● Do not enable error reporting for all connectors!
        ● Failing server will cause all connectors to report failure
          fabric_config = { … 'report_errors': True, … }
          cnx = connect( … fabric=fabric_config … )
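
One way to follow the thundering-herd advice is to switch error reporting on for only a small, stable subset of application instances. The helper below is a sketch of that idea and not a connector feature: `fabric_config`, `client_id` and `report_percent` are illustrative names, and only the `'report_errors'` key comes from the slide.

```python
import zlib


def fabric_config(client_id, report_percent=5, **settings):
    """Build a `fabric` settings dict with error reporting on for few clients.

    `client_id` and `report_percent` are illustrative; any other Fabric
    connection settings are passed through unchanged via `settings`.
    """
    # crc32 gives each client id a stable bucket in the range 0..99.
    bucket = zlib.crc32(client_id.encode("utf-8")) % 100
    config = dict(settings)
    config["report_errors"] = bucket < report_percent
    return config
```

The resulting dict would be passed as the `fabric=` argument shown on the slide. Hashing the client identifier, rather than choosing at random on each start, keeps the set of reporting clients stable across restarts.
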
  37. Connector as external failure detector: error reporting
      ● Default errors reported
        ● CR_SERVER_LOST
        ● CR_SERVER_GONE_ERROR
        ● CR_CONN_HOST_ERROR
        ● CR_CONNECTION_ERROR
        ● CR_IPSOCK_ERROR
      ● Extra errors can be added
        ● extra_failure_report
          from mysql.connector.fabric import extra_failure_report
          extra_failure_report([error1, error2, …, errorn])
  38. Connector as external failure detector: cache invalidation
      ● Cache invalidation by default on
        ● Server lost (CR_SERVER_LOST)
        ● Server read-only (ER_OPTION_PREVENTS_STATEMENT)
          from mysql.connector.fabric import RESET_CACHE_ON_ERROR
          RESET_CACHE_ON_ERROR.append(error)
  39. Using Fabric with Existing High-availability Setups
  40. Using Fabric with an existing solution
      ● Servers already managed
        ● Group-based solutions
        ● Virtual IP-based solutions
      ● Fabric as lookup server
        ● Connectors can route transactions
        ● Application can retrieve information from Fabric
        ● Update state store only
  41. Example: an existing setup
      ● DRBD for redundancy
        ● Disk replicated
      ● Pacemaker for fail-over
        ● Heartbeat detects failure
        ● Resource agent handles fail-over
      ● Fabric as lookup server
        ● Fabric for routing transactions
      [diagram: primary node and secondary node, each running Pacemaker, connected by DRBD replication]
  42. Example: an existing setup: create a group
      ● Create a group
      ● Add server to group
        ● Fabric should only update state store
      ● “Promote” the DRBD primary to be primary in group
          mysqlfabric group create my_group
          mysqlfabric group add my_group server1.example.com --update_only
          mysqlfabric group promote my_group --update_only --slave_id=...
  43. Example: an existing setup: update resource agent
      ● Change resource agent script
        ● On Ubuntu: /usr/lib/ocf/resource.d/heartbeat/mysql
      ● Update resource agent actions to inform Fabric
        ● Remove old server
        ● Only update the state store
          mysqlfabric group demote --update_only --slave_id=7bcb0804-...
          mysqlfabric group remove --update_only 7bcb0804-...
  44. Example: an existing setup: update resource agent
      ● Change resource agent script
        ● On Ubuntu: /usr/lib/ocf/resource.d/heartbeat/mysql
      ● Update resource agent actions to inform Fabric
        ● Add standby server
        ● Only update the state store
          mysqlfabric group add --update_only standby.example.com
          mysqlfabric group promote --update_only 8308b0c4-...
  45. Making Fabric highly available
  46. Making Fabric highly available
      ● Standard deployment
        ● Fabric node and state store on same machine
        ● Need to use TCP
          – Socket connection not available yet (Bug#71946)
      ● Three things can fail
        ● State store
        ● Fabric node
        ● Machine
  47. Making Fabric highly available: handling state store failure
      ● If state store connection is lost
        ● Fabric retries until state store becomes available
        ● Ongoing transactions fail
        ● Fabric reports an error if the connection is not recovered “quickly enough”
      ● Solution: restart state store
        ● MySQL handles recovery
        ● Fabric reconnects automatically
  48. Making Fabric highly available: handling state store failure
      ● Connection timeout
        ● Timeout (in seconds) for connection attempt
      ● Connection attempts
        ● Number of attempts before reporting state store failed
      ● Connection interval
        ● Delay (in seconds) between connection attempts
          [storage]
          connection_timeout = 6
          connection_attempts = 6
          connection_interval = 1
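
The retry behavior these settings describe can be pictured as a simple loop. The function below is a generic sketch, not Fabric code: `connect_with_retry` is a hypothetical name, `connect` is any callable that raises `ConnectionError` on failure, and the per-attempt `connection_timeout` is assumed to be enforced inside that callable.

```python
import time


def connect_with_retry(connect, attempts=6, interval=1, sleep=time.sleep):
    """Call `connect()` up to `attempts` times, pausing `interval` seconds between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as error:
            last_error = error
            # No pause after the final attempt; the failure is reported instead.
            if attempt < attempts - 1:
                sleep(interval)
    raise ConnectionError("state store unavailable") from last_error
```

Under these assumptions, the values on the slide bound how long "quickly enough" is: six attempts of at most six seconds each plus five one-second pauses, so about 41 seconds before the failure is reported.
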
  49. Making Fabric highly available: handling Fabric controller node failure
      ● If Fabric node is lost
        ● Ongoing jobs fail
        ● Execution state checkpointed
      ● On Fabric node restart
        ● Execution state recovered
      ● Solution: restart Fabric node
        ● Detect failure
          – Local ping script
        ● Restart Fabric node
          – init.d script
        ● Neither distributed with Fabric
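
Since no ping script ships with Fabric, a local check has to be written by hand. A minimal version could just test whether the Fabric node accepts TCP connections, as sketched below. The function name is hypothetical, and port 32274 is assumed to be the node's XML-RPC port; adjust it to whatever address your Fabric configuration uses.

```python
import socket


def fabric_node_is_up(host="localhost", port=32274, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    The default port is an assumption about the Fabric XML-RPC listener.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

An open port only shows that a process is listening; a stricter check would issue a real Fabric request and verify the reply before deciding whether to restart the node.
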
  50. Making Fabric highly available: handling machine failures
      ● If the machine fails
        ● State store is lost
        ● Fabric node is lost
        ● Catastrophic failures can prevent machine recovery
      ● Solution
        ● Replicate meta-data
        ● Detect machine failure
        ● Activate duplicate deployment
  51. Making Fabric highly available: replicate meta-data
      ● Replicate state store
        ● DRBD
        ● MySQL Cluster
        ● MySQL Replication
      ● Configure DRBD
        ● Version 8.3 or later
        ● Replicate block device
      ● Configure MySQL servers
        ● Data directory on replicated device
  52. Making Fabric highly available: replicate meta-data
      ● Active node
        ● MySQL Fabric
        ● MySQL Server
        ● DRBD primary
      ● Passive node
        ● DRBD secondary
        ● Server and Fabric started on fail-over
  53. Making Fabric highly available: detect machine failure and activate replacement
      ● Detecting machine failure
        ● Corosync
        ● Version 2.0 or later
      ● Activate replacement
        ● Pacemaker
        ● Version 1.1 or later
  54. Making Fabric highly available: detect machine failure and activate replacement
      ● Configure MySQL Fabric
        ● State store in DRBD volume
      ● Configure Corosync
        ● Set no-quorum-policy to 'ignore'
          – Prevent the remaining node from shutting down
        ● Turn off STONITH
          – Node will commit suicide
  55. Making Fabric highly available: detect machine failure and activate replacement
      ● Configure Pacemaker
        ● Add MySQL Fabric resource agent
        ● Colocate Fabric, DRBD, and MySQL and order them
      ● Avoiding split-brain
        ● Reliably detect network partition
        ● Ping reliable resource
          – Example: router
  56. Closing Remarks & Ideas for the Future
  57. Multi-Node Fabric: replicated state machine
      ● Multiple Fabric nodes
        ● Built-in support
        ● Fail-over
        ● Local read instance
        ● Distributed execution
      ● Replicated state machine
        ● Coordinate procedure execution
        ● Automatic fail-over
        ● Paxos or Raft-like implementation
  58. More Flexibility
      ● Server providers
        ● Amazon AWS
        ● Kubernetes?
      ● Built-in high-availability group types
        ● DRBD
        ● MySQL Cluster
        ● Amazon RDS?
  59. MySQL Fabric Resources: useful links
      ● Download and try
        ● http://dev.mysql.com/downloads/utilities/
      ● MySQL Fabric documentation
        ● http://dev.mysql.com/doc/mysql-utilities/1.5/en/fabric.html
      ● Forum (MySQL Fabric, Sharding, HA, Utilities)
        ● http://forums.mysql.com/list.php?144
  60. MySQL Fabric Resources: blogs
      ● MySQL High-Availability
        ● http://mysqlhighavailability.com
      ● Mats Kindahl
        ● http://mysqlmusings.blogspot.com
      ● Alfranio Correia
        ● http://alfranio-distributed.blogspot.com
      ● Narayanan Venkateswaran
        ● http://vnwrites.blogspot.com
  61. Thank You!
