4. Topology used with MySQL InnoDB Cluster
[Diagram labels: Application; Proxy (mysqlrouter); Cluster (Group Replication); Replication]
5. Some Group Replication Background
Two Modes of Operation
Groups can operate in single-primary mode with automatic primary election, where only one server accepts updates at a time.
Alternatively, for more advanced users, groups can be deployed in multi-primary mode, where all servers can accept updates, even if they are issued concurrently.
You can also run traditional MySQL asynchronous or semi-synchronous replication off a Group Replication node.
Group Replication works with MySQL 5.7 and MySQL 8.0. You will get better performance out of MySQL 8.0, and you will need to upgrade to a later release if you are running MySQL 5.6 or earlier.
7. There is a built-in group membership service
that keeps the view of the group consistent and available for
all servers at any given point in time. Servers can leave and
join the group and the view is updated accordingly. Sometimes
servers can leave the group unexpectedly, in which case the
failure detection mechanism detects this and notifies the
group that the view has changed. This is all automatic.
8. Big Idea # 1
✣ The most common way to create a fault-tolerant system is to resort to making components redundant.
✣ The ultimate challenge is to fuse the logic of the database and data replication with the logic of having several servers coordinated in a consistent and simple way.
9. Big Idea # 2
✣ MySQL Group Replication provides distributed state
machine replication with strong coordination between
servers. Servers coordinate themselves automatically
when they are part of the same group.
✣ There is a built-in group membership service that
keeps the view of the group consistent and available
for all servers at any given point in time.
10. Transactions
For a transaction to commit, a majority of the group has to agree on the order of that transaction in the global sequence of transactions. Deciding to commit or abort a transaction is done by each server individually, but all servers make the same decision.
If there is a network partition, resulting in a split where members are unable to reach agreement, then the system does not progress until this issue is resolved. Hence there is also a built-in, automatic split-brain protection mechanism.
11. Communications
All of this is powered
by the provided Group
Communication
System (GCS)
protocols.
These provide a failure
detection mechanism,
a group membership
service, and safe and
completely ordered
message delivery.
These properties are
key to creating a
system which ensures
that data is consistently
replicated across the
group of servers.
At the very core of this technology lies an implementation of the Paxos algorithm. It
acts as the group communication engine.
12. What the #%*@ is Paxos!?!?
Paxos is a family of protocols for
solving consensus in a network
of unreliable processors.
Consensus is the process of
agreeing on one result among a
group of participants. This
problem becomes difficult when the
participants or their communication
medium may experience failures.
- Wikipedia
13. Consensus
Consensus protocols are the basis for the state
machine replication approach to distributed
computing.
State machine replication is a technique for
converting an algorithm into a fault-tolerant,
distributed implementation.
14. Consensus
The Paxos family of protocols includes a spectrum of trade-offs
between the number of processors, number of message delays
before learning the agreed value, the activity level of individual
participants, number of messages sent, and types of failures.
Although no deterministic fault-tolerant consensus protocol can
guarantee progress in an asynchronous network, Paxos
guarantees safety (consistency), and the conditions that could
prevent it from making progress are difficult to provoke.
15. The Magic Number
MySQL Group Replication requires a majority of servers to
be active to reach quorum and thus make a decision. This
has direct impact on the number of failures the system can
tolerate without compromising itself and its overall
functionality.
The number of servers (n) needed to tolerate f
failures is then n = 2 x f + 1.
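The arithmetic behind the magic number can be checked with a few lines of Python. This is only a sketch of the n = 2 x f + 1 rule stated above; the function names are made up for illustration and are not part of MySQL.

```python
def servers_needed(f: int) -> int:
    """Group size n needed to tolerate f failures: n = 2 * f + 1."""
    return 2 * f + 1


def majority(n: int) -> int:
    """Smallest number of members that forms a majority of an n-member group."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Largest f for which n >= 2 * f + 1 still holds."""
    return (n - 1) // 2


def has_quorum(group_size: int, reachable: int) -> bool:
    """True if the reachable members (counting this one) form a majority."""
    return reachable >= majority(group_size)


if __name__ == "__main__":
    print("n  majority  tolerated failures")
    for n in range(1, 8):
        print(f"{n}  {majority(n):8d}  {tolerated_failures(n):18d}")
```

Note that going from three to four servers raises the majority from two to three without tolerating any additional failure, which is why odd group sizes are the usual choice.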
18. Primary-Secondary Replication
Traditional MySQL Replication provides a simple Primary-Secondary approach to replication. There is a primary (master) and there are one or more secondaries (slaves). The primary executes transactions and commits them, and they are then later (thus asynchronously) sent to the secondaries to be either re-executed (in statement-based replication) or applied (in row-based replication). It is a shared-nothing system, where all servers have a full copy of the data by default.
22. Group Replication is a shared-
nothing replication scheme
where each server has its own
entire copy of the data.
23. What Does it Provide?
Group Replication enables you to create fault-tolerant systems with redundancy by replicating the system state throughout a set of servers. Consequently, even if some of the servers fail, as long as it is not all of them or a majority, the system is still available. It may have degraded performance or scalability, but it is still available.
24. What Does it Provide?
Server failures are isolated and independent.
They are tracked by a group membership
service which relies on a distributed failure
detector that is able to signal when any servers
leave the group, either voluntarily or due to an
unexpected halt.
25. What Does it Provide?
There is a distributed recovery procedure to ensure that when servers join the group they are brought up to date automatically. There is no need for server failover, and the multi-master, update-everywhere nature ensures that not even updates are blocked in the event of a single server failure. Therefore MySQL Group Replication guarantees that the database service is continuously available.
26. What Does it NOT Provide?
It is important to understand that although the
database service is available, in the event of a
server crash, those clients connected to it must
be redirected, or failed over, to a different
server. This is not something Group Replication
attempts to resolve. A connector, load balancer,
router, or some form of middleware is more
suitable to deal with this issue.
28. GR Use Cases
Elastic Replication - Environments that require a very fluid replication infrastructure, where
the number of servers has to grow or shrink dynamically and with as few side-effects as
possible. For instance, database services for the cloud.
Highly Available Shards - Sharding is a popular approach to achieve write scale-out. Use
MySQL Group Replication to implement highly available shards, where each shard maps to a
replication group.
Alternative to Master-Slave replication - In certain situations, using a single master server
makes it a single point of contention. Writing to an entire group may prove more scalable under
certain circumstances.
Autonomic Systems - Additionally, you can deploy MySQL Group Replication purely for the
automation that is built into the replication protocol (described already in this and previous
chapters).
30. Failure Detection
There is a failure detection
mechanism provided that is able
to find and report which servers
are silent and as such assumed
to be dead. At a high level, the
failure detector is a distributed
service that provides information
about which servers may be
dead (suspicions).
Later if the group agrees that
the suspicions are probably
true, then the group decides
that a given server has indeed
failed. This means that the
remaining members in the
group take a coordinated
decision to exclude a given
member.
31. Failure Detection
Suspicions are triggered
when servers go mute. When
server A does not receive
messages from server B during
a given period, a timeout occurs
and a suspicion is raised.
If a server gets isolated from the rest
of the group, then it suspects that all
others have failed. Being unable to
secure agreement with the group (as
it cannot secure a quorum), its
suspicion does not have
consequences. When a server is
isolated from the group in this way, it
is unable to execute any local
transactions.
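The suspicion mechanism described above can be sketched as a small simulation. This is not how Group Replication implements its failure detector; the class, its fields, and the timeout value are assumptions chosen only to illustrate timeouts, suspicions, and why an isolated server's suspicions have no consequences.

```python
from dataclasses import dataclass, field


@dataclass
class FailureDetector:
    """Timeout-based suspicion tracking as seen by one member (illustrative only)."""
    me: str
    members: set
    timeout: float  # seconds of silence before a peer is suspected (assumed value)
    last_heard: dict = field(default_factory=dict)

    def heard_from(self, peer: str, now: float) -> None:
        """Record that a message from `peer` arrived at time `now`."""
        self.last_heard[peer] = now

    def suspects(self, now: float) -> set:
        """Peers that have been silent for longer than the timeout."""
        return {
            p for p in self.members
            if p != self.me and now - self.last_heard.get(p, 0.0) > self.timeout
        }

    def can_act_on_suspicions(self, now: float) -> bool:
        """Suspicions only matter if the non-suspected members form a majority."""
        alive = len(self.members) - len(self.suspects(now))
        return alive >= len(self.members) // 2 + 1


if __name__ == "__main__":
    # Server A still hears from B, but C has gone mute.
    a = FailureDetector(me="A", members={"A", "B", "C"}, timeout=5.0)
    a.heard_from("B", now=9.0)
    a.heard_from("C", now=2.0)
    print(sorted(a.suspects(now=10.0)), a.can_act_on_suspicions(now=10.0))

    # An isolated server suspects everyone else and cannot secure a quorum.
    lonely = FailureDetector(me="A", members={"A", "B", "C"}, timeout=5.0)
    print(sorted(lonely.suspects(now=10.0)), lonely.can_act_on_suspicions(now=10.0))
```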
32. Group Membership
MySQL Group Replication relies on a group
membership service. It defines which servers are
online and participating in the group. The list of
online servers is often referred to as a view.
Therefore, every server in the group has a
consistent view of which are the members
participating actively in the group at a given moment in
time.
33. Group Dynamics
Servers have to agree not only
on transaction commits, but
also which is the current view.
Therefore, if servers agree that a
new server becomes part of the
group, then the group itself is
reconfigured to integrate that
server in it, triggering a view
change.
The opposite also happens: if a server leaves the group, voluntarily or not, then the group dynamically rearranges its configuration and a view change is triggered.
34. Group Reconfiguration
When a member leaves voluntarily, it first initiates a dynamic
group reconfiguration. This triggers a procedure, where all
members have to agree on the new view without the
leaving server. However, if a member leaves involuntarily
(for example it has stopped unexpectedly or the network
connection is down) then the failure detection mechanism
realizes this fact and a reconfiguration of the group is
proposed, this one without the failed member.
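One way to picture views and view changes is as immutable snapshots of the membership, where every join or leave produces a new snapshot. The sketch below is a toy model under that assumption; the `View` class and its integer `view_id` are hypothetical and do not mirror Group Replication's internal view identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """An agreed-upon snapshot of which servers are in the group (toy model)."""
    view_id: int          # assumed to be a simple counter for illustration
    members: frozenset

    def join(self, server: str) -> "View":
        """A server joining triggers a view change that includes it."""
        return View(self.view_id + 1, self.members | {server})

    def leave(self, server: str) -> "View":
        """A server leaving (voluntarily or not) triggers a view without it."""
        return View(self.view_id + 1, self.members - {server})


if __name__ == "__main__":
    v1 = View(1, frozenset({"server1", "server2"}))
    v2 = v1.join("server3")    # new member integrated
    v3 = v2.leave("server1")   # member left or was expelled
    for v in (v1, v2, v3):
        print(v.view_id, sorted(v.members))
```

In the real system each of these transitions is agreed on by the group, so every remaining member installs the same sequence of views.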
35. An Example GR Cluster
Each of the server instances in a group can run
on an independent physical machine, or on the
same machine. This example shows how to
create a replication group with three MySQL
Server instances on one physical machine. This
means that three data directories are needed,
one per server instance, and that you need to
configure each instance independently.
37. MySQL InnoDB Cluster Architecture
MySQL InnoDB cluster provides a complete high availability solution for MySQL.
MySQL Shell includes AdminAPI, which enables you to easily configure and administer a group of at least three MySQL server instances to function as an InnoDB cluster.
Each MySQL server instance runs MySQL Group Replication, which provides the mechanism to replicate data within InnoDB clusters, with built-in failover.
39. Nine Steps to Creating an InnoDB Cluster
1. js> \c admin@server1
2. js> dba.configureInstance('admin@server1', {'restart': true})
3. js> dba.configureInstance('admin@server2', {'restart': true})
4. js> dba.configureInstance('admin@server3', {'restart': true})
5. js> \c admin@server1
6. js> var cluster=dba.createCluster('dfw')
7. js> cluster.addInstance('root@server2')
8. js> cluster.addInstance('root@server3')
9. $ mysqlrouter &
40. Configuring Production Instances
AdminAPI provides the dba.configureInstance() function that checks if an instance is suitably configured for
InnoDB cluster usage, and configures the instance if it finds any settings which are not compatible with InnoDB
cluster.
You run the dba.configureInstance() command against an instance and it checks all of the settings required to
enable the instance to be used for InnoDB cluster usage. If the instance does not require configuration changes,
there is no need to modify the configuration of the instance, and the dba.configureInstance() command output
confirms that the instance is ready for InnoDB cluster usage.
If any changes are required to make the instance compatible with InnoDB cluster, a report of the incompatible
settings is displayed, and you can choose to let the command make the changes to the instance's option file.
Depending on the way MySQL Shell is connected to the instance, and the version of MySQL running on the
instance, you can make these changes permanent by persisting them to a remote instance's option file, see
Persisting Settings. Instances which do not support persisting configuration changes automatically require that you
configure the instance locally, see Configuring Local Instances.
43. MySQL Router
mysqlrouter --bootstrap root@localhost:3310 --user=mysqlrouter
Please enter MySQL password for root:
Reconfiguring system MySQL Router instance...
WARNING: router_id 3 not found in metadata
MySQL Router has now been configured for the InnoDB cluster 'dfw'.
The following connection information can be used to connect to the cluster.
Classic MySQL protocol connections to cluster 'dfw':
- Read/Write Connections: localhost:6446
- Read/Only Connections: localhost:6447
X protocol connections to cluster 'dfw':
- Read/Write Connections: localhost:64460
- Read/Only Connections: localhost:64470
46. Kill An Instance
MySQL localhost:6446 ssl JS > dba.killSandboxInstance(3310);
The MySQL sandbox instance on this host in /var/root/mysql-sandboxes/3310 will be killed
Killing MySQL instance...
Instance localhost:3310 successfully killed.
47. Did we switch from 3310 to another instance?
MySQL localhost:6446 ssl SQL > select @@port;
+--------+
| @@port |
+--------+
| 3320 |
+--------+
1 row in set (0.0004
48. JS > cluster.status();
{
    "clusterName": "mycluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "localhost:3320",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "localhost:3310": {
                "address": "localhost:3310",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
            "localhost:3320": {
                "address": "localhost:3320",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "localhost:3330": {
                "address": "localhost:3330",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://root@localhost:6446"
}
So 3310 is gone!
49. JS > dba.startSandboxInstance(3310)
The MySQL sandbox instance on this host in /var/root/mysql-sandboxes/3310 will be started
Starting MySQL instance...
Instance localhost:3310 successfully started.
52. What about on the application side?
The application needs to reconnect, because 3310 is now in super read-only mode and the R/W node is on 3320.
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement
53. Tuning Recovery
Whenever a new member joins a replication group, it connects to a suitable donor and
fetches the data that it has missed up until the point it is declared online. This critical
component in Group Replication is fault tolerant and configurable. The following section explains
how recovery works and how to tune the settings.
A random donor is selected from the existing online members in the group. This way there is a
good chance that the same server is not selected more than once when multiple members enter the
group.
If the connection to the selected donor fails, a new connection is automatically attempted to a new
candidate donor. Once the connection retry limit is reached the recovery procedure terminates with
an error.
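The donor selection and retry behaviour described above can be sketched as follows. This is an illustrative model, not the server's code: `pick_donor`, `try_connect`, and `retry_limit` are hypothetical names, and cycling through the shuffled candidates is an assumption about ordering. In a real deployment the retry limit is governed by the `group_replication_recovery_retry_count` system variable.

```python
import random


def pick_donor(online_members, try_connect, retry_limit, rng=None):
    """Try randomly ordered donors until one accepts or the retry limit is hit.

    online_members: iterable of member names currently ONLINE
    try_connect:    callable(member) -> bool, True if the connection succeeded
    retry_limit:    maximum number of connection attempts (assumed semantics)
    """
    rng = rng or random.Random()
    candidates = list(online_members)
    if not candidates:
        raise RuntimeError("no online members available as donors")
    rng.shuffle(candidates)  # random order, so joiners tend to spread across donors

    for attempt in range(retry_limit):
        donor = candidates[attempt % len(candidates)]
        if try_connect(donor):
            return donor
    raise RuntimeError(f"recovery failed after {retry_limit} connection attempts")


if __name__ == "__main__":
    # Pretend only server3 is reachable from the joining member.
    donor = pick_donor(["server1", "server2", "server3"],
                       try_connect=lambda m: m == "server3",
                       retry_limit=10)
    print("recovering from", donor)
```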
55. Starting MySQL Router
mysqlrouter --bootstrap root@localhost:3310 --directory /tmp/router
Please enter MySQL password for root:
Bootstrapping MySQL Router instance at '/tmp/router'...
MySQL Router has now been configured for the InnoDB cluster 'dfw'.
The following connection information can be used to connect to the cluster.
Classic MySQL protocol connections to cluster 'dfw':
- Read/Write Connections: localhost:6446
- Read/Only Connections: localhost:6447
X protocol connections to cluster 'dfw':
57. How MySQL Router Works
Applications connect to MySQL Router and not directly to MySQL Server. If the connection fails, applications should be designed to retry it, because MySQL Router selects a new MySQL server after failed attempts.
This is also called simple redirect connection routing because it requires
the application to retry the connection. That is, if a connection from
MySQL Router to the MySQL server is interrupted, the application
encounters a connection failure. However, a new connection attempt
triggers Router to find and connect to another MySQL server.
58. Router Requirement
MySQL Router usage does not require specific libraries or
interfaces. Aside from managing the MySQL Router instance,
write your application as if MySQL Router was a typical MySQL
instance.
For these reasons, the application should be written to test for
connection errors and, if encountered, retry the connection. If
this technique or one similar is employed in your application
then using MySQL Router will not require any extra effort.
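A retry wrapper along these lines is usually enough on the application side. The sketch below is driver-agnostic: `connect` stands in for whatever call your MySQL connector uses to open a connection to the Router port, and catching `ConnectionError` is an assumption, since real drivers raise their own exception classes.

```python
import time


def connect_with_retry(connect, attempts=3, delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds or `attempts` tries have failed.

    connect:  zero-argument callable that returns a connection or raises
              ConnectionError (placeholder for a real driver call)
    attempts: maximum number of tries
    delay:    seconds to wait between tries
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # Each new attempt gives Router the chance to pick another server.
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)
    raise last_error


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_connect():
        # Simulates the first connection dying when the primary is killed.
        state["calls"] += 1
        if state["calls"] == 1:
            raise ConnectionError("lost connection to MySQL server")
        return "connection via localhost:6446"

    print(connect_with_retry(flaky_connect, delay=0.0))
```

Keeping the retry logic in one helper means the rest of the application can treat the Router address like any other MySQL endpoint.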
59. Does the router impact performance?
Whenever you introduce a component into a communication stream, a certain amount of overhead is incurred, and how much depends heavily on the workload.
Fortunately, performance testing on the current release has shown that simple redirect connection routing runs within approximately 1% of the speed of a direct connection.
60. Where do I install MySQL Router?
For best performance, MySQL Router is typically installed on the same host as the application that uses it. Doing so can decrease network latency and allow a local Unix domain socket connection to the application instead of TCP/IP, and application servers are typically the easiest to scale. However, this is not a requirement: Router can be installed on any host, even its own.
Note: Unix domain sockets can be used by applications connecting to MySQL Router, but not by MySQL Router connecting to a MySQL Server.
61. Deploying in Multi-Primary or Single-Primary Mode
Single-Primary:
In this mode the group has a single primary server that is set to read-write mode. All the other members in the group are set to read-only mode (with super-read-only=ON). This happens automatically. The primary is typically the first server to bootstrap the group. All other servers that join automatically learn about the primary server and are set to read-only.
Multi-Primary:
In multi-primary mode, there is no notion of a single primary. There is no need to engage an election procedure since there is no server playing any special role. All servers are set to read-write mode when joining the group.