Binlog Servers at Booking.com
Jean-François Gagné
jeanfrancois DOT gagne AT booking.com
Presented at Oracle Open World 2015
Booking.com'
● Based in Amsterdam since 1996
● Online Hotel and Accommodation Agent:
  ● 170 offices worldwide
  ● More than 819,000 properties in 221 countries
  ● 42 languages (website and customer service)
● Part of the Priceline Group
● And we use MySQL:
  ● Thousands (1000s) of servers, ~85% replicating
  ● >110 masters: ~25 with >50 slaves and ~8 with >100 slaves
Binlog Server: Session Summary
1. Replication and the Binlog Server
2. Extreme Read Scaling
3. Remote Site Replication (and Disaster Recovery)
4. Easy High Availability
5. Other Use-Cases (Crash Safety, Parallel Replication and Backups)
6. Binlog Servers at Booking.com
7. New master without touching slaves
Binlog Server: Replication
● One master / one or more slaves
● The master records all writes in a journal: the binary logs
● Each slave:
  ● Downloads the journal and saves it locally (IO thread): the relay logs
  ● Executes the relay logs on the local database (SQL thread)
  ● Can produce binary logs to be itself a master (log-slave-updates)
● Replication is:
  ● Asynchronous -> lag
  ● Single-threaded (in MySQL 5.6) -> slower than the master
Binlog Server: Booking.com''
● Typical replication deployment:
+---+
| M |
+---+
  |
  +------+-- ... --+---------------+-------- ...
  |      |         |               |
+---+  +---+     +---+           +---+
| S1|  | S2|     | Sn|           | M1|
+---+  +---+     +---+           +---+
                                   |
                                   +-- ... --+
                                   |         |
                                 +---+     +---+
                                 | T1|     | Tm|
                                 +---+     +---+
● Si and Tj are for read scaling
● Mi are the DR (disaster recovery) masters; each one is an Intermediate Master :-(
Binlog Server: What
● A Binlog Server (BLS) is a daemon that:
  ● Downloads the binary logs from the master
  ● Saves them identically as on the master
  ● Serves them to slaves
● For B and C, replicating from A or from X is the same:
  ● By design, the binary logs served by A and X are the same
+---+        / \
| A | --->  / X \
+---+       -----
  |           |
+---+       +---+
| B |       | C |
+---+       +---+
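As a rough illustration of why B and C cannot tell A and X apart, here is a minimal in-memory sketch in Python. The class and method names (`BinlogSource`, `BinlogServer`, `download_from`, `read`) are hypothetical and only model the idea; a real Binlog Server speaks the MySQL replication protocol.

```python
class BinlogSource:
    """Anything a slave can replicate from: a master or a Binlog Server."""

    def __init__(self, name, binlogs=None):
        self.name = name
        # Maps a binary log filename to the raw bytes of that file.
        self.binlogs = dict(binlogs or {})

    def read(self, filename, position):
        """Serve a binary log from a byte position, as a slave would request."""
        return self.binlogs[filename][position:]


class BinlogServer(BinlogSource):
    """Hypothetical in-memory model of a Binlog Server (BLS)."""

    def download_from(self, master):
        # Keep filenames and content byte for byte identical to the master.
        for filename, content in master.binlogs.items():
            self.binlogs[filename] = content


a = BinlogSource("A", {"binlog.000001": b"event1event2event3"})
x = BinlogServer("X")
x.download_from(a)
# A slave positioned at (binlog.000001, 6) reads the same bytes from A or X.
same = a.read("binlog.000001", 6) == x.read("binlog.000001", 6)
```

Because filenames and byte positions are preserved, a slave's replication coordinates stay valid whichever of A or X it connects to.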
Binlog Server: Read Scaling
● Typical replication topology for read scaling:
+---+
| M |
+---+
  |
  +--------+--------+--- ... ---+
  |        |        |           |
+---+    +---+    +---+       +---+
| S1|    | S2|    | S3|       | Sn|
+---+    +---+    +---+       +---+
● When there are too many slaves, the network of M is overloaded:
  ● 100 slaves x 1 MByte/s = 800 Mbit/s: very close to 1 Gbit/s
  ● OSC (online schema change) or purging data in RBR (row-based replication) becomes hard
  -> Slave lag or unreachable master for writes
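The bandwidth arithmetic can be checked with a few lines of Python. This is only a sketch: the rate of 1 MByte/s of binary logs per slave and the count of 4 Binlog Servers are assumptions chosen for illustration, and `egress_mbit_per_s` is a hypothetical helper.

```python
def egress_mbit_per_s(consumers, binlog_mbyte_per_s):
    """Outbound bandwidth, in Mbit/s, needed to feed every consumer."""
    return consumers * binlog_mbyte_per_s * 8  # 8 bits per byte


# 100 slaves connected directly to the master.
direct = egress_mbit_per_s(100, 1.0)
# Assumed alternative: the master only feeds 4 Binlog Servers.
via_bls = egress_mbit_per_s(4, 1.0)
```

With these assumptions the master sends 800 Mbit/s when feeding the slaves directly, and 32 Mbit/s when it only feeds the Binlog Servers.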
Binlog Server: Read Scaling'
● Typical solution: fan-out with Intermediate Masters (IM):
+---+
| M |
+---+
  |
  +--------+- ... -+
  |        |       |
+---+    +---+   +---+
| M1|    | M2|   | Mm|
+---+    +---+   +---+
  |        |       |
  +-- ...  +- ...  +-- ... --+
  |        |       |         |
+---+    +---+   +---+     +---+
| S1|    | T1|   | Z1|     | Zi|
+---+    +---+   +---+     +---+
● But Intermediate Masters bring problems:
  ● log-slave-updates -> IMs are slower than slaves
  ● Lag of an IM -> all its slaves are lagging
  ● Rogue transaction on an IM -> infection of all its slaves
  ● Failure of an IM -> all its slaves stop replicating
    (and action must be taken fast)
Binlog Server: Read Scaling''
● Solving IM problems with shared disk:
  ● Filers (expensive) or DRBD (doubling the number of servers)
  ● sync_binlog = 1 + innodb_flush_log_at_trx_commit = 1 -> slower replication -> lag
  ● After a crash of an Intermediate Master:
    ● we need InnoDB recovery -> replication on slaves stalled -> lag
    ● and the cache is cold -> replication will be slow -> lag
● Solving IM problems with GTIDs:
  ● They allow slave repointing at the cost of added complexity :-|
  ● But they do not completely solve the lag problem :-(
  ● And we cannot migrate online with MySQL 5.6 :-( :-(
Binlog Server: Read Scaling'''
● New Solution: replace IM by Binlog Servers
+---+
| M |
+---+
  |
  +----------------+----- ... -----+
  |                |               |
 / \              / \             / \
/ I1\            / I2\           / Im\
-----            -----           -----
  |                |               |
  +------+ ...     +--- ...        +--- ... ---+
  |      |         |               |           |
+---+  +---+     +---+           +---+       +---+
| S1|  | S2|     | Si|           | Sj|       | Sn|
+---+  +---+     +---+           +---+       +---+
● BLS should not lag
● If a BLS fails, repoint its slaves to other BLSs (easy by design)
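Repointing a slave from a failed BLS to another one can be sketched as follows. This is a minimal example under assumptions: `repoint_statements` is a hypothetical helper, the host name is made up, and the coordinates are expected to come from the `Relay_Master_Log_File` and `Exec_Master_Log_Pos` fields of `SHOW SLAVE STATUS` on the slave. They are passed explicitly because MySQL resets the coordinates when `MASTER_HOST` changes without them.

```python
def repoint_statements(new_host, new_port, log_file, log_pos):
    """Build the SQL to move a slave to another Binlog Server.

    The coordinates are reused unchanged: all Binlog Servers of a master
    serve the same binary logs, so the same file and position are valid.
    """
    return [
        "STOP SLAVE;",
        f"CHANGE MASTER TO MASTER_HOST='{new_host}', MASTER_PORT={new_port}, "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};",
        "START SLAVE;",
    ]


statements = repoint_statements("bls2.example.com", 3306, "binlog.000042", 1200)
```

No position translation is needed, which is what makes this "easy by design" compared with repointing between Intermediate Masters.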
Binlog Server: Remote Site
● Typical deployment for remote site:
+---+
| A |
+---+
  |
  +------+------+---------------+
  |      |      |               |
+---+  +---+  +---+           +---+
| B |  | C |  | D |           | E |
+---+  +---+  +---+           +---+
                                |
                                +------+------+
                                |      |      |
                              +---+  +---+  +---+
                              | F |  | G |  | H |
                              +---+  +---+  +---+
● E is an Intermediate Master :-(
  -> same problems as read scaling
Binlog Server: Remote Site'
● Ideally, we would like this:
+---+
| A |
+---+
  |
  +------+------+---------------+------+------+------+
  |      |      |               |      |      |      |
+---+  +---+  +---+           +---+  +---+  +---+  +---+
| B |  | C |  | D |           | E |  | F |  | G |  | H |
+---+  +---+  +---+           +---+  +---+  +---+  +---+
● No lag and no Single Point of Failure (SPOF)
● But no master on remote site for writes (easily solvable problem)
● And expensive in WAN bandwidth (harder problem to solve)
Binlog Server: Remote Site''
● New solution: a Binlog Server on the remote site:
+---+
| A |
+---+
  |
  +------+------+---------------+
  |      |      |               |
+---+  +---+  +---+            / \
| B |  | C |  | D |           / X \
+---+  +---+  +---+           -----
                                |
                                +------+------+------+
                                |      |      |      |
                              +---+  +---+  +---+  +---+
                              | E |  | F |  | G |  | H |
                              +---+  +---+  +---+  +---+
Binlog Server: Remote Site'''
● Or deploy 2 Binlog Servers to get better resilience:
+---+
| A |
+---+
  |
  +------+------+---------------+
  |      |      |               |
+---+  +---+  +---+            / \           / \
| B |  | C |  | D |           / X \ ------> / Y \
+---+  +---+  +---+           -----         -----
                                |             |
                                +------+      +------+
                                |      |      |      |
                              +---+  +---+  +---+  +---+
                              | E |  | F |  | G |  | H |
                              +---+  +---+  +---+  +---+
● If Y fails, repoint G and H to X
● If X fails, repoint Y to A, and E and F to Y
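The two failover rules can be expressed as one small function. This is a sketch, not the tooling used by the author: `failover_plan` and the generalisation it applies (a Binlog Server moves up to the failed node's source, a slave moves to a surviving Binlog Server) are assumptions that happen to reproduce both rules for this topology.

```python
def failover_plan(sources, binlog_servers, failed):
    """Return {server: new replication source} after `failed` goes down.

    sources maps each server to what it currently replicates from.
    """
    survivors = [b for b in binlog_servers if b != failed]
    plan = {}
    for node, source in sources.items():
        if source != failed:
            continue
        if node in binlog_servers:
            # A Binlog Server moves one level up, to the failed node's source.
            plan[node] = sources[failed]
        else:
            # A slave moves to a surviving Binlog Server when there is one.
            plan[node] = survivors[0] if survivors else sources[failed]
    return plan


topology = {"B": "A", "C": "A", "D": "A", "X": "A", "Y": "X",
            "E": "X", "F": "X", "G": "Y", "H": "Y"}
if_y_fails = failover_plan(topology, ["X", "Y"], "Y")
if_x_fails = failover_plan(topology, ["X", "Y"], "X")
```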
Binlog Server: Remote Site''''
● Interesting property: if A fails, E, F, G & H converge to a common state
+---+
| A |
+---+
  |
  +------+------+---------------+
  |      |      |               |
+---+  +---+  +---+            / \
| B |  | C |  | D |           / X \
+---+  +---+  +---+           -----
                                |
                                +------+------+------+
                                |      |      |      |
                              +---+  +---+  +---+  +---+
                              | E |  | F |  | G |  | H |
                              +---+  +---+  +---+  +---+
● New master promotion is easy on remote site
Binlog Server: Remote Site'''''
● Step by step master promotion:
  1. The first slave that is up to date can be the new master
  2. “SHOW MASTER STATUS” or “RESET MASTER”, and “RESET SLAVE ALL” on the new master
  3. Writes can be pointed to the new master
  4. Once a slave is up to date, repoint it to the new master at the position of step #2
  5. Keep delayed/lagging slaves under X until they are up to date
  6. Once no slave is left under X, recycle it as a Binlog Server for the new master
               / \
              / X \
              -----
                |
                +-------------+
                              |
                            +---+
                            | G |
+---+                       +---+
| F |
+---+
  |
  +------+
  |      |
+---+  +---+
| E |  | H |
+---+  +---+
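The promotion steps on this slide can be sketched in Python. The function `promotion_plan`, its input format and the sample positions are hypothetical; the `STOP SLAVE` statement is an addition of this sketch, since `RESET SLAVE ALL` needs the replication threads stopped.

```python
def promotion_plan(executed, bls_end):
    """Plan a master promotion among the slaves of Binlog Server X.

    executed maps each slave to the (binlog file, position) it has applied;
    bls_end is the last (file, position) available on X.
    """
    up_to_date = [name for name, pos in executed.items() if pos == bls_end]
    if not up_to_date:
        raise RuntimeError("no slave has caught up with the Binlog Server yet")
    return {
        "new_master": up_to_date[0],                       # step 1
        "on_new_master": ["STOP SLAVE;",                   # step 2
                          "SHOW MASTER STATUS;",
                          "RESET SLAVE ALL;"],
        "repoint_now": up_to_date[1:],                     # step 4
        "keep_under_bls": [n for n in executed             # step 5
                           if n not in up_to_date],
    }


plan = promotion_plan(
    {"E": ("binlog.000042", 900), "F": ("binlog.000042", 1200),
     "G": ("binlog.000041", 300), "H": ("binlog.000042", 1200)},
    bls_end=("binlog.000042", 1200),
)
```

In this made-up example F becomes the new master, H can be repointed at once, and E and G stay under X until they have applied everything X holds.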
Binlog Server: High Availability
● This property can be used for high availability:
+---+
| A |
+---+
  |
  |
 / \
/ X \
-----
  |
  +------+------+------+------+------+------+------+
  |      |      |      |      |      |      |      |
+---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
| B |  | C |  | D |  | E |  | F |  | G |  | H |  | I |
+---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
Binlog Server: Other Use-Cases
● Better Crash-Safe Replication
  ● http://blog.booking.com/better_crash_safe_replication_for_mysql.html
● Better Parallel Replication in MySQL 5.7 (LOGICAL_CLOCK)
  ● http://blog.booking.com/better_parallel_replication_for_mysql.html
● Easier Point in Time Recovery
  ● http://jfg-mysql.blogspot.com/2015/10/binlog-servers-for-backups-and-point-in-time-recovery.html
Binlog Server: Better // Replication
● Four transactions on X, Y and Z:
+---+
| X |
+---+
  |
  V
+---+
| Y |
+---+
  |
  V
+---+
| Z |
+---+
On X:
   ----Time---->
T1 B---C
T2 B---C
T3 B-------C
T4 B-------C
On Y:
   ----Time---->
   B---C
   B---C
   B-------C
   B-------C
On Z:
   ----Time--------->
   B---C
   B---C
        B-------C
        B-------C
● IM might stall the parallel replication pipeline
● To benefit from parallel replication, IM must disappear
● The Binlog Server allows exactly that
Binlog Server: Point in Time Recovery
● Implementing Point in Time Recovery means:
  ● regularly taking a backup of the database,
  ● and saving the binary logs of that database.
● Executing Point in Time Recovery means:
  ● restoring the backup,
  ● applying the binary logs.
+---+        / \        +---+
| M | ----> / X \ ----> | S |
+---+       -----       +---+
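The "applying the binary logs" step is commonly done with the `mysqlbinlog` client piped into `mysql`. The sketch below only builds such a command line; `pitr_command` is a hypothetical helper, and the file names, start position and stop time are made-up values (in practice the start coordinates are the ones recorded with the backup, and the files are the ones saved by the Binlog Server).

```python
import shlex


def pitr_command(binlog_files, start_position, stop_datetime):
    """Build a shell command replaying binary logs up to a point in time.

    --start-position applies to the first file listed; --stop-datetime
    stops the replay at the requested time.
    """
    args = [
        "mysqlbinlog",
        f"--start-position={start_position}",
        f"--stop-datetime={stop_datetime}",
        *binlog_files,
    ]
    return " ".join(shlex.quote(a) for a in args) + " | mysql"


command = pitr_command(["binlog.000042", "binlog.000043"], 4,
                       "2015-10-27 12:00:00")
```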
BLS@Booking.com
● Reminder: typical deployment at Booking.com:
+---+
| M |
+---+
  |
  +--- ... ---+ ... +-------------+--- ...
  |           |     |             |
+---+       +---+ +---+         +---+
| S1|       | Sj| | Sn|         | M1|
+---+       +---+ +---+         +---+
                                  |
                                  +-- ... --+
                                  |         |
                                +---+     +---+
                                | T1|     | To|
                                +---+     +---+
BLS@Booking.com'
● We are deploying Binlog Server Clusters to offload masters:
  +---+
  | M |
  +---+
    |
    +-----+-----+ ... +-------------+--- ...
    |     |     |     |             |
   / \   / \  +---+ +---+         +---+
  / A1\ / A2\ | Sj| | Sn|         | M1|
  ----- ----- +---+ +---+         +---+
    |     |                         |
  +-+-----+-+         ...          -+-----+
  |         |                       |     |
+---+     +---+                    / \   / \
| S1| ... | Si|                   / Z1\ / Z2\
+---+     +---+                   ----- -----
● We have in production:
  ● >40 Binlog Servers
  ● >20 BLS Clusters
  ● >650 slaves replicating from Binlog Servers
BLS@Booking.com''
● What is a Binlog Server Cluster?
  ● At least 2 Binlog Servers
  ● Replicating from the same master
  ● With independent failure modes (not same switch/rack/…)
  ● With a Service DNS entry resolving to all IP addresses
-> Failure of a BLS is transparent to slaves
  ● Thanks to DNS, the slaves connected to a failing Binlog Server reconnect to the others
-> Easy maintenance/upgrade of a Binlog Server
    |
    +-----+
    |     |
   / \   / \
  / A1\ / A2\
  ----- -----
    |     |
  +-+-----+-+
  |         |
+---+     +---+
| S1| ... | Si|
+---+     +---+
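The DNS-based failover of a Binlog Server Cluster can be modelled as below. This is an illustration only: `resolve_all` and `pick_binlog_server` are hypothetical helpers, the addresses are made up, and `is_alive` stands in for whatever connection attempt the client makes. On a real slave the reconnection is handled by the IO thread, not by application code.

```python
import socket


def resolve_all(service_name, port):
    """Return every IP address behind a service DNS entry."""
    infos = socket.getaddrinfo(service_name, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def pick_binlog_server(addresses, is_alive):
    """Return the first address of the cluster that accepts connections."""
    for address in addresses:
        if is_alive(address):
            return address
    raise ConnectionError("no Binlog Server of the cluster is reachable")


# Simulated cluster of two Binlog Servers where the first one is down.
cluster = ["10.0.0.1", "10.0.0.2"]
chosen = pick_binlog_server(cluster, lambda address: address != "10.0.0.1")
```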
BLS@Booking.com'''
● We are deploying BLS side-by-side with IM to reduce delay:
  +---+
  | M |
  +---+
    |
    +-----+-----+ ... +-------------+-----+----------- ...
    |     |     |     |             |     |
   / \   / \  +---+ +---+         +---+  / \-->/ \
  / A1\ / A2\ | Sj| | Sn|         | M1| / B1\ / B2\
  ----- ----- +---+ +---+         +---+ ----- -----
    |     |                         |     |     |
  +-+-----+-+          ...              +-+-----+-+
  |         |                           |         |
+---+     +---+                       +---+     +---+
| S1| ... | Si|                       | Tk| ... | To|
+---+     +---+                       +---+     +---+
BLS@Booking.com''''
● We are deploying a new Data Center without IM:
  +---+
  | M |
  +---+
    |
    +-----+-----+ ... +-------------+-----+---------------------+
    |     |     |     |             |     |                     |
   / \   / \  +---+ +---+         +---+  / \-->/ \             / \-->/ \
  / A1\ / A2\ | Sj| | Sn|         | M1| / B1\ / B2\           / C1\ / C2\
  ----- ----- +---+ +---+         +---+ ----- -----           ----- -----
    |     |                         |     |     |               |     |
  +-+-----+-+          ...              +-+-----+-+           +-+-----+-+
  |         |                           |         |           |         |
+---+     +---+                       +---+     +---+       +---+     +---+
| S1| ... | Si|                       | Tk| ... | To|       | U1| ... | Up|
+---+     +---+                       +---+     +---+       +---+     +---+
HA with Binlog Servers
● Distributed Binlog Serving Service (DBSS):
   +---+
   | M |
   +---+
     |
+----+----------------------------------------------------------+
|                                                               |
+----+-------+--------------+-------+--------------+-------+----+
     |       |              |       |              |       |
   +---+   +---+          +---+   +---+          +---+   +---+
   | S1|...| Sn|          | T1|...| Tm|          | U1|...| Uo|
   +---+   +---+          +---+   +---+          +---+   +---+
● Properties:
  ● A single Binlog Server failure does not disrupt the service (resilience)
  ● Minimises inter-Data Center bandwidth requirements
  ● Allows promoting a new master without touching any slave
HA with Binlog Servers'
● Zoom in DBSS:
     |
+----|---------------------------------------------------------+
|    |                                                         |
|    +----------------------+---------------------+            |
|    |                      |                     |            |
|   / \                    / \                   / \           |
|  /   \--->/ \           /   \--->/ \          /   \--->/ \   |
|  -----   /   \          -----   /   \         -----   /   \  |
|    |     -----            |     -----           |     -----  |
+----|-------|--------------|-------|-------------|-------|----+
     |       |              |       |             |       |
HA with Binlog Servers''
● Crash of the master:
+--------------------------------------------------------------+
|                                                              |
|                                                              |
|                                                              |
|   / \                    / \                   / \           |
|  /   \--->/ \           /   \--->/ \          /   \--->/ \   |
|  -----   /   \          -----   /   \         -----   /   \  |
|    |     -----            |     -----           |     -----  |
+----|-------|--------------|-------|-------------|-------|----+
     |       |              |       |             |       |
HA with Binlog Servers''
● Crash of the master:
  ● Step #1: level the Binlog Servers (the slaves will follow)
+--------------------------------------------------------------+
|                                                              |
|                                                              |
|                                                              |
|   / \ <----------------- / \ ----------------> / \           |
|  /   \--->/ \           /   \--->/ \          /   \--->/ \   |
|  -----   /   \          -----   /   \         -----   /   \  |
|    |     -----            |     -----           |     -----  |
+----|-------|--------------|-------|-------------|-------|----+
     |       |              |       |             |       |
HA with Binlog Servers'''
● Crash of the master:
  ● Step #2: promote a slave as the new master (there is a trick)
                                                  |
+-------------------------------------------------|------------+
|                                                 |            |
|    +----------------------+---------------------+            |
|    |                      |                     |            |
|   / \                    / \                   / \           |
|  /   \--->/ \           /   \--->/ \          /   \--->/ \   |
|  -----   /   \          -----   /   \         -----   /   \  |
|    |     -----            |     -----           |     -----  |
+----|-------|--------------|-------|-------------|-------|----+
     |       |              |       |             |       |
HA with Binlog Servers''''
● Crash of the master - the trick:
  ● Needs the same binary log filename on master and slaves
  1. “FLUSH BINARY LOGS” on the candidate master until its binary log filename follows the one available on the BLSs
  2. On the new master:
    ● “PURGE BINARY LOGS TO '<latest binary log file>'”
    ● “RESET SLAVE ALL”
  3. Point the writes to the new master
  4. Make the Binlog Servers replicate from the new master
● From the point of view of the Binlog Server, the master only rebooted with a new ServerID and a new UUID.
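The number of “FLUSH BINARY LOGS” needed in step 1 follows from the binary log filenames. Below is a minimal sketch: `split_binlog_name`, `flushes_needed` and `trick_statements` are hypothetical helpers, and the filenames are made up. It assumes the usual six-digit numeric suffix and one rotation per flush.

```python
def split_binlog_name(name):
    """Split 'binlog.000123' into its base name and numeric index."""
    base, _, index = name.rpartition(".")
    return base, int(index)


def flushes_needed(candidate_current, bls_latest):
    """Count the FLUSH BINARY LOGS needed on the candidate master.

    The candidate must end up writing to the file that follows the latest
    one available on the Binlog Servers.
    """
    c_base, c_index = split_binlog_name(candidate_current)
    b_base, b_index = split_binlog_name(bls_latest)
    if c_base != b_base:
        raise ValueError("binary log base names differ")
    needed = (b_index + 1) - c_index
    if needed < 0:
        raise ValueError("candidate is already past the expected file")
    return needed


def trick_statements(candidate_current, bls_latest):
    """Statements for steps 1 and 2 on the candidate master."""
    base, b_index = split_binlog_name(bls_latest)
    target = f"{base}.{b_index + 1:06d}"
    flushes = ["FLUSH BINARY LOGS;"] * flushes_needed(candidate_current, bls_latest)
    return flushes + [f"PURGE BINARY LOGS TO '{target}';", "RESET SLAVE ALL;"]


count = flushes_needed("binlog.000100", "binlog.000123")
statements = trick_statements("binlog.000122", "binlog.000123")
```

After the flushes, the candidate writes to `binlog.000124`, which is the file a Binlog Server holding `binlog.000123` asks for next.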
New Master w/o Touching Slaves
   +---+
   | M |
   +---+
     |
+----+----------------------------------------------------+
|                                                         |
+----+-------+-----------+-------+-----------+-------+----+
     |       |           |       |           |       |
   +---+   +---+       +---+   +---+       +---+   +---+
   | S1|...| Sn|       | T1|...| Tm|       | U1|...| Uo|
   +---+   +---+       +---+   +---+       +---+   +---+
New Master w/o Touching Slaves
   +\-/+
   | X |
   +/-\+
+---------------------------------------------------------+
|                                                         |
+----+-------+-----------+-------+-----------+-------+----+
     |       |           |       |           |       |
   +---+   +---+       +---+   +---+       +---+   +---+
   | S1|...| Sn|       | T1|...| Tm|       | U1|...| Uo|
   +---+   +---+       +---+   +---+       +---+   +---+
New Master w/o Touching Slaves
   +\-/+               +---+
   | X |               | T1|
   +/-\+               +---+
                         |
+------------------------+--------------------------------+
|                                                         |
+----+-------+-----------+-------+-----------+-------+----+
     |       |           |       |           |       |
   +---+   +---+       +---+   +---+       +---+   +---+
   | S1|...| Sn|       | T2|...| Tm|       | U1|...| Uo|
   +---+   +---+       +---+   +---+       +---+   +---+
New Master w/o Touching Slaves'
● “FLUSH BINARY LOGS” in a loop is ugly (but it works)
● A “RESET MASTER at/to 'binlog.00xxxx'” would be much nicer:
  ● https://bugs.mysql.com/bug.php?id=77438
Binlog Servers with Orchestrator
● Orchestrator is the tool we use for managing Binlog Servers
● https://github.com/orchestrator/orchestrator
Binlog Server: Links
● http://blog.booking.com/mysql_slave_scaling_and_more.html
● http://blog.booking.com/abstracting_binlog_servers_and_mysql_master_promotion_wo_reconfiguring_slaves.html
● HOWTO Install and Configure Binlog Servers:
http://jfg-mysql.blogspot.com/2015/04/maxscale-binlog-server-howto-install-and-configure.html
● http://blog.booking.com/better_crash_safe_replication_for_mysql.html
● http://blog.booking.com/better_parallel_replication_for_mysql.html
(http://blog.booking.com/evaluating_mysql_parallel_replication_2-slave_group_commit.html)
● http://jfg-mysql.blogspot.nl/2015/10/binlog-servers-for-backups-and-point-in-time-recovery.html
● Note: the Binlog Servers concept should work with any version of MySQL (5.7, 5.6, 5.5 and 5.1)
Questions
Jean-François Gagné
jeanfrancois DOT gagne AT booking.com

Cyberprint. Dark Pink Apt Group [EN].pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 

Binlog Servers at Booking.com

  • 1. Binlog Servers at Booking.com Jean-François Gagné jeanfrancois DOT gagne AT booking.com Presented at Oracle Open World 2015
  • 3. Booking.com'
    ● Based in Amsterdam since 1996
    ● Online Hotel and Accommodation Agent:
      ● 170 offices worldwide
      ● +819,000 properties in 221 countries
      ● 42 languages (website and customer service)
    ● Part of the Priceline Group
    ● And we use MySQL:
      ● Thousands (1000s) of servers, ~85% replicating
      ● >110 masters: ~25 with >50 slaves and ~8 with >100 slaves
  • 4. Binlog Server: Session Summary
    1. Replication and the Binlog Server
    2. Extreme Read Scaling
    3. Remote Site Replication (and Disaster Recovery)
    4. Easy High Availability
    5. Other Use-Cases (Crash Safety, Parallel Replication and Backups)
    6. Binlog Servers at Booking.com
    7. New master without touching slaves
  • 5. Binlog Server: Replication
    ● One master / one or more slaves
    ● The master records all writes in a journal: the binary logs
    ● Each slave:
      ● Downloads the journal and saves it locally (IO thread): relay logs
      ● Executes the relay logs on the local database (SQL thread)
      ● Could produce binary logs to be itself a master (log-slave-updates)
    ● Replication is:
      ● Asynchronous -> lag
      ● Single threaded (in MySQL 5.6) -> slower than the master
  • 6. Binlog Server: Booking.com''
    ● Typical replication deployment:
                          +---+
                          | M |
                          +---+
                            |
      +------+-- ... --+---------------+-------- ...
      |      |         |               |
    +---+  +---+     +---+           +---+
    | S1|  | S2|     | Sn|           | M1|  <- Intermediate Master :-(
    +---+  +---+     +---+           +---+
                                       |
                                  +-- ... --+
                                  |         |
                                +---+     +---+
                                | T1|     | Tm|
                                +---+     +---+
    ● Si and Tj are for read scaling
    ● Mi are the DR masters
  • 7. Binlog Server: What
    ● Binlog Server (BLS): a daemon that:
      ● Downloads the binary logs from the master
      ● Saves them identically as on the master
      ● Serves them to slaves
    ● A or X are the same for B and C:
      ● By design, the binary logs served by A and X are the same
    +---+         / \
    | A | --->   / X \
    +---+        -----
      |            |
    +---+        +---+
    | B |        | C |
    +---+        +---+
  • 8. Binlog Server: Read Scaling
    ● Typical replication topology for read scaling:
                  +---+
                  | M |
                  +---+
                    |
      +--------+--------+--- ... ---+
      |        |        |           |
    +---+    +---+    +---+       +---+
    | S1|    | S2|    | S3|       | Sn|
    +---+    +---+    +---+       +---+
    ● When there are too many slaves, the network of M is overloaded:
      ● 100 slaves x 1 MByte/s = 800 Mbit/s: very close to 1 Gbit/s
      ● OSC (online schema change) or purging data in RBR (row-based replication) becomes hard
      -> Slave lag or unreachable master for writes
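
The bandwidth argument on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming each slave pulls about 1 MByte/s of binary logs (the rate at which "very close to 1 Gbit/s" holds for 100 slaves); the function name and the figure of 4 Binlog Servers are illustrative, not from the deck.

```python
def master_egress_mbit(consumers: int, binlog_mbyte_per_s: float) -> float:
    """Outbound traffic (Mbit/s) of a server streaming its binary logs to
    `consumers` direct replicas, each pulling the full stream."""
    return consumers * binlog_mbyte_per_s * 8  # 8 bits per byte


# 100 slaves directly under the master, 1 MByte/s of binary logs (assumed rate).
direct = master_egress_mbit(100, 1.0)

# Hypothetical alternative: the master only feeds 4 Binlog Servers,
# which in turn serve the slaves.
fanned_out = master_egress_mbit(4, 1.0)

print(f"direct: {direct:.0f} Mbit/s, via 4 Binlog Servers: {fanned_out:.0f} Mbit/s")
```

Under these assumptions the master's egress drops from 800 Mbit/s to 32 Mbit/s, which is the motivation for any fan-out layer between the master and its slaves.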
  • 9. Binlog Server: Read Scaling'
    ● Typical solution: fan-out with Intermediate Masters (IM):
        +---+
        | M |
        +---+
          |
      +--------+- ... -+
      |        |       |
    +---+    +---+   +---+
    | M1|    | M2|   | Mm|
    +---+    +---+   +---+
      |        |       |
      +-- ...  +- ...  +-- ... --+
      |        |       |         |
    +---+    +---+   +---+     +---+
    | S1|    | T1|   | Z1|     | Zi|
    +---+    +---+   +---+     +---+
    ● But Intermediate Masters bring problems:
      ● log-slave-updates -> IMs are slower than slaves
      ● Lag of an IM -> all its slaves are lagging
      ● Rogue transaction on an IM -> infection of all its slaves
      ● Failure of an IM -> all its slaves stop replicating (and action must be taken fast)
  • 10. Binlog Server: Read Scaling''
    ● Solving IM problems with shared disk:
      ● Filers (expensive) or DRBD (doubling the number of servers)
      ● sync_binlog = 1 + trx_commit = 1 -> slower replication -> lag
      ● After a crash of an Intermediate Master:
        ● we need InnoDB recovery -> replication on slaves stalled -> lag
        ● and the cache is cold -> replication will be slow -> lag
    ● Solving IM problems with GTIDs:
      ● They allow slave repointing at the cost of added complexity :-|
      ● But they do not completely solve the lag problem :-(
      ● And we cannot migrate online with MySQL 5.6 :-( :-(
  • 11. Binlog Server: Read Scaling'''
    ● New Solution: replace IM by Binlog Servers
            +---+
            | M |
            +---+
              |
      +----------------+----- ... -----+
      |                |               |
     / \              / \             / \
    / I1\            / I2\           / Im\
    -----            -----           -----
      |                |               |
      +------+ ...     +--- ...        +--- ... ---+
      |      |         |               |           |
    +---+  +---+     +---+           +---+       +---+
    | S1|  | S2|     | Si|           | Sj|       | Sn|
    +---+  +---+     +---+           +---+       +---+
    ● BLS should not lag
    ● If a BLS fails, repoint its slaves to other BLSs (easy by design)
  • 12. Binlog Server: Remote Site
    ● Typical deployment for remote site:
                          +---+
                          | A |
                          +---+
                            |
      +------+------+---------------+
      |      |      |               |
    +---+  +---+  +---+           +---+
    | B |  | C |  | D |           | E |  <- Intermediate Master :-(
    +---+  +---+  +---+           +---+
                                    |
                             +------+------+
                             |      |      |
                           +---+  +---+  +---+
                           | F |  | G |  | H |
                           +---+  +---+  +---+
    ● E is an Intermediate Master -> same problems as read scaling
  • 13. Binlog Server: Remote Site'
    ● Ideally, we would like this:
                          +---+
                          | A |
                          +---+
                            |
      +------+------+---------------+------+------+------+
      |      |      |               |      |      |      |
    +---+  +---+  +---+           +---+  +---+  +---+  +---+
    | B |  | C |  | D |           | E |  | F |  | G |  | H |
    +---+  +---+  +---+           +---+  +---+  +---+  +---+
    ● No lag and no Single Point of Failure (SPOF)
    ● But no master on the remote site for writes (easily solvable problem)
    ● And expensive in WAN bandwidth (harder problem to solve)
  • 14. Binlog Server: Remote Site''
    ● New solution: a Binlog Server on the remote site:
                          +---+
                          | A |
                          +---+
                            |
      +------+------+---------------+
      |      |      |               |
    +---+  +---+  +---+            / \
    | B |  | C |  | D |           / X \
    +---+  +---+  +---+           -----
                                    |
                          +------+------+------+
                          |      |      |      |
                        +---+  +---+  +---+  +---+
                        | E |  | F |  | G |  | H |
                        +---+  +---+  +---+  +---+
  • 15. Binlog Server: Remote Site'''
    ● Or deploy 2 Binlog Servers to get better resilience:
                          +---+
                          | A |
                          +---+
                            |
      +------+------+---------------+
      |      |      |               |
    +---+  +---+  +---+            / \           / \
    | B |  | C |  | D |           / X \ ------> / Y \
    +---+  +---+  +---+           -----         -----
                                    |             |
                                    +------+      +------+
                                    |      |      |      |
                                  +---+  +---+  +---+  +---+
                                  | E |  | F |  | G |  | H |
                                  +---+  +---+  +---+  +---+
    ● If Y fails, repoint G and H to X
    ● If X fails, repoint Y to A, and E and F to Y
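
The two failover rules on this slide can be expressed as a small planning function. This is only a sketch of the decision logic: the child-to-parent dictionary encodes the topology drawn on the slide, while the function name and the rule "a Binlog Server below the failed node moves to that node's upstream, a slave moves to a surviving Binlog Server" are my generalisation of the two cases listed.

```python
# Replication topology from the slide, as child -> parent.
PARENTS = {
    "B": "A", "C": "A", "D": "A", "X": "A",
    "Y": "X", "E": "X", "F": "X",
    "G": "Y", "H": "Y",
}
BINLOG_SERVERS = {"X", "Y"}


def repoint_plan(parents: dict, failed: str) -> dict:
    """Return {node: new_parent} for every node replicating from `failed`."""
    upstream = parents[failed]                       # where the failed BLS replicated from
    survivors = sorted(BINLOG_SERVERS - {failed})    # Binlog Servers still available
    plan = {}
    for node, parent in parents.items():
        if parent != failed:
            continue
        if node in BINLOG_SERVERS:
            plan[node] = upstream       # a Binlog Server moves up one level
        else:
            plan[node] = survivors[0]   # a slave moves to a surviving Binlog Server
    return plan


print(repoint_plan(PARENTS, "Y"))  # G and H go to X
print(repoint_plan(PARENTS, "X"))  # Y goes to A, E and F go to Y
```

Because X and Y serve byte-identical binary logs, each entry of the plan can be applied with the slave's current file name and position unchanged.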
  • 16. Binlog Server: Remote Site''''
    ● Interesting property: if A fails, E, F, G & H converge to a common state
                          +---+
                          | A |
                          +---+
                            |
      +------+------+---------------+
      |      |      |               |
    +---+  +---+  +---+            / \
    | B |  | C |  | D |           / X \
    +---+  +---+  +---+           -----
                                    |
                          +------+------+------+
                          |      |      |      |
                        +---+  +---+  +---+  +---+
                        | E |  | F |  | G |  | H |
                        +---+  +---+  +---+  +---+
    ● New master promotion is easy on remote site
  • 17. Binlog Server: Remote Site'''''
    ● Step by step master promotion:
      1. The 1st slave that is up to date can be the new master
      2. “SHOW MASTER STATUS” or “RESET MASTER”, and “RESET SLAVE ALL” on the new master
      3. Writes can be pointed to the new master
      4. Once a slave is up to date, repoint it to the new master at the position of step #2
      5. Keep delayed/lagging slaves under X until up to date
      6. Once no slave is left under X, recycle it as a Binlog Server for the new master
    [Diagram: F is the new master, with E and H replicating from it; G is still replicating from Binlog Server X.]
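
Steps 2 and 4 map onto a handful of standard MySQL statements. The sketch below only builds the statement lists and does not connect to any server. The host name, port and binary log coordinates are hypothetical placeholders, and the leading `STOP SLAVE` on the new master is my addition (it is not on the slide; `RESET SLAVE` requires the replication threads to be stopped).

```python
def promotion_statements(new_master_host: str, new_master_port: int,
                         binlog_file: str, binlog_pos: int) -> dict:
    """Statements for step 2 (on the new master) and step 4 (on each slave).

    `binlog_file` / `binlog_pos` are the coordinates read from
    SHOW MASTER STATUS on the new master in step 2.
    """
    on_new_master = [
        "STOP SLAVE",          # assumption: not on the slide, see lead-in
        "SHOW MASTER STATUS",  # step 2: record File and Position
        "RESET SLAVE ALL",     # step 2: forget the old master
    ]
    on_each_slave = [
        "STOP SLAVE",
        ("CHANGE MASTER TO "
         f"MASTER_HOST='{new_master_host}', MASTER_PORT={new_master_port}, "
         f"MASTER_LOG_FILE='{binlog_file}', MASTER_LOG_POS={binlog_pos}"),
        "START SLAVE",
    ]
    return {"new_master": on_new_master, "slave": on_each_slave}


# Hypothetical values, for illustration only.
plan = promotion_statements("db-f.example.internal", 3306, "binlog.000007", 120)
for statement in plan["slave"]:
    print(statement + ";")
```

Every slave is repointed to the same coordinates because, once up to date under X, all slaves have executed exactly the same binary log stream.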
  • 18. Binlog Server: High Availability
    ● This property can be used for high availability:
                            +---+
                            | A |
                            +---+
                              |
                             / \
                            / X \
                            -----
                              |
      +------+------+------+------+------+------+------+
      |      |      |      |      |      |      |      |
    +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
    | B |  | C |  | D |  | E |  | F |  | G |  | H |  | I |
    +---+  +---+  +---+  +---+  +---+  +---+  +---+  +---+
  • 19. Binlog Server: Other Use-Cases
    ● Better Crash-Safe Replication
      ● http://blog.booking.com/better_crash_safe_replication_for_mysql.html
    ● Better Parallel Replication in MySQL 5.7 (LOGICAL_CLOCK)
      ● http://blog.booking.com/better_parallel_replication_for_mysql.html
    ● Easier Point in Time Recovery
      ● http://jfg-mysql.blogspot.com/2015/10/binlog-servers-for-backups-and-point-in-time-recovery.html
  • 20. Binlog Server: Better // Replication
    ● Four transactions on X, Y and Z:
    +---+
    | X |
    +---+
      |
      V
    +---+
    | Y |
    +---+
      |
      V
    +---+
    | Z |
    +---+
    On X: ----Time----> T1 B---C T2 B---C T3 B-------C T4 B-------C
    On Y: ----Time----> B---C B---C B-------C B-------C
    On Z: ----Time---------> B---C B---C B-------C B-------C
    ● IM might stall the parallel replication pipeline
    ● To benefit from parallel replication, IM must disappear
    ● The Binlog Server allows exactly that
  • 21. Binlog Server: Point in Time Recovery
    ● Implementing Point in Time Recovery means:
      ● to regularly take a backup of the database,
      ● and to save the binary logs of that database.
    ● Executing Point in Time Recovery means:
      ● Restoring the backup,
      ● Applying the binary logs.
    +---+         / \         +---+
    | M | ---->  / X \  ----> | S |
    +---+        -----        +---+
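
The "applying the binary logs" step is commonly done by piping `mysqlbinlog` output into the `mysql` client. The helper below is a sketch that only assembles such a command line. The file names, start position and stop time are hypothetical, and it assumes the binary logs saved by the Binlog Server are available as local files and that the start position was recorded together with the backup.

```python
import shlex


def pitr_replay_command(binlog_files: list, start_position: int,
                        stop_datetime: str) -> str:
    """Build a shell command replaying binary logs up to `stop_datetime`.

    `start_position` applies to the first file in `binlog_files` only,
    which is how mysqlbinlog treats --start-position with several files.
    """
    if not binlog_files:
        raise ValueError("at least one binary log file is required")
    argv = [
        "mysqlbinlog",
        f"--start-position={start_position}",
        f"--stop-datetime={stop_datetime}",
        *binlog_files,
    ]
    # shlex.join quotes the arguments that contain spaces.
    return shlex.join(argv) + " | mysql"


# Hypothetical values, for illustration only.
command = pitr_replay_command(
    ["binlog.000010", "binlog.000011"], 120, "2015-10-26 12:00:00")
print(command)
```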
  • 22. BLS@Booking.com
    ● Reminder: typical deployment at Booking.com:
                            +---+
                            | M |
                            +---+
                              |
      +--- ... ---+ ... +-------------+--- ...
      |           |     |             |
    +---+       +---+ +---+         +---+
    | S1|       | Sj| | Sn|         | M1|
    +---+       +---+ +---+         +---+
                                      |
                                 +-- ... --+
                                 |         |
                               +---+     +---+
                               | T1|     | To|
                               +---+     +---+
  • 23. BLS@Booking.com'
    ● We are deploying Binlog Server Clusters to offload masters:
    [Diagram: M replicates to slaves Sj ... Sn, to the intermediate master M1, and to a Binlog Server cluster (A1, A2); the cluster serves slaves S1 ... Si and further Binlog Servers Z1 and Z2.]
    ● We have in production:
      ● >40 Binlog Servers
      ● >20 BLS Clusters
      ● >650 slaves replicating from Binlog Servers
  • 24. BLS@Booking.com''
    ● What is a Binlog Server Cluster?
      ● At least 2 Binlog Servers
      ● Replicating from the same master
      ● With independent failure mode (not same switch/rack/…)
      ● With a Service DNS entry resolving to all IP addresses
    -> Failure of a BLS transparent to slaves
      ● Thanks to DNS, the slaves connected to a failing Binlog Server reconnect to the others
    -> Easy maintenance/upgrade of a Binlog Server
        |
        +-----+
        |     |
       / \   / \
      / A1\ / A2\
      ----- -----
        |     |
      +-+-----+-+
      |         |
    +---+     +---+
    | S1| ... | Si|
    +---+     +---+
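
The service DNS entry is what lets a slave find another member of the cluster. The sketch below shows the lookup and the choice of a replacement from a client's point of view, using only the standard library. In the deck the reconnection is done by the slave's own IO thread, so this illustrates the mechanism rather than how MySQL implements it. The resolver argument exists only so the function can be exercised without network access.

```python
import socket


def resolve_all(service_name: str, port: int = 3306,
                resolver=socket.getaddrinfo) -> list:
    """Return every distinct IP address behind a service DNS entry."""
    infos = resolver(service_name, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def pick_other(addresses: list, failed: str) -> str:
    """Choose a cluster member other than the one that just failed."""
    candidates = [address for address in addresses if address != failed]
    if not candidates:
        raise RuntimeError("no Binlog Server left in the cluster")
    return candidates[0]
```

Since every member of a cluster serves the same binary logs as the master, a slave can resume from its current file name and position on whichever member it reaches next.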
  • 25. BLS@Booking.com'''
    ● We are deploying BLS side-by-side with IM to reduce delay:
    [Diagram: M replicates to slaves Sj ... Sn, to the intermediate master M1, and to two Binlog Server clusters: A1/A2, serving slaves S1 ... Si, and B1 --> B2, serving slaves Tk ... To.]
  • 26. BLS@Booking.com''''
    ● We are deploying a new Data Center without IM:
    [Diagram: as on the previous slide, plus a third Binlog Server cluster C1 --> C2 serving slaves U1 ... Up in the new Data Center.]
  • 27. HA with Binlog Servers
    ● Distributed Binlog Serving Service (DBSS):
       +---+
       | M |
       +---+
         |
    +----+----------------------------------------------------------+
    |                                                               |
    +----+-------+--------------+-------+--------------+-------+----+
         |       |              |       |              |       |
       +---+   +---+          +---+   +---+          +---+   +---+
       | S1|...| Sn|          | T1|...| Tm|          | U1|...| Uo|
       +---+   +---+          +---+   +---+          +---+   +---+
    ● Properties:
      ● A single Binlog Server failure does not disrupt the service (resilience)
      ● Minimises inter Data Center bandwidth requirements
      ● Allows promoting a new master without touching any slave
  • 28. HA with Binlog Servers'
    ● Zoom in DBSS:
    [Diagram: inside the DBSS box, the link from the master fans out to three pairs of Binlog Servers; in each pair the first Binlog Server feeds the second (--->), and each Binlog Server has a link leaving the box towards the slaves.]
  • 29. HA with Binlog Servers''
    ● Crash of the master:
    [Diagram: the same DBSS box with the link from the master removed; the three pairs of Binlog Servers no longer have an upstream.]
  • 30. HA with Binlog Servers''
    ● Crash of the master:
      ● Step #1: level the Binlog Servers (the slaves will follow)
    [Diagram: inside the DBSS box, arrows run from the first Binlog Server of the middle pair to the first Binlog Server of the other two pairs (<---- and ---->).]
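
One way to read "level the Binlog Servers" is: find the Binlog Server that downloaded the most from the crashed master, then let the others fetch the missing bytes from it. The sketch below covers only the comparison step. The server names and coordinates are hypothetical, and it assumes binary log file names of the usual `<basename>.<number>` form.

```python
def binlog_index(filename: str) -> int:
    """Numeric suffix of a binary log file name, e.g. 'binlog.000042' -> 42."""
    return int(filename.rsplit(".", 1)[1])


def level_plan(positions: dict) -> tuple:
    """Given {server: (binlog_file, position)}, return the most advanced
    server and the sorted list of servers that are behind it."""
    def coordinate(name):
        filename, position = positions[name]
        return (binlog_index(filename), position)

    most_advanced = max(positions, key=coordinate)
    behind = sorted(name for name in positions
                    if coordinate(name) < coordinate(most_advanced))
    return most_advanced, behind


# Hypothetical coordinates at the time of the master crash.
source, laggards = level_plan({
    "bls1": ("binlog.000042", 1200),
    "bls2": ("binlog.000042", 4500),
    "bls3": ("binlog.000041", 99999),
})
print(source, laggards)
```

Comparing the file index before the byte offset matters: `bls3` has the largest offset but is still one file behind.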
  • 31. HA with Binlog Servers'''
    ● Crash of the master:
      ● Step #2: promote a slave as the new master (there is a trick)
    [Diagram: a new link enters the DBSS box from the top and fans out to the first Binlog Server of each of the three pairs.]
  • 32. HA with Binlog Servers''''
    ● Crash of the master - the trick:
      ● Needs the same binary log filename on master and slaves
      1. “FLUSH BINARY LOGS” on the candidate master until its binary log filename follows the one available on the BLSs
      2. On the new master:
         ● “PURGE BINARY LOGS TO '<latest binary log file>'”
         ● “RESET SLAVE ALL”
      3. Point the writes to the new master
      4. Make the Binlog Servers replicate from the new master
    ● From the point of view of the Binlog Server, the master only rebooted with a new ServerID and a new UUID.
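
Steps 1 and 2 of the trick amount to counting how many times the candidate must rotate its binary log, then cleaning up. The sketch below builds that statement list without touching a server. The file names are hypothetical, it assumes the default six-digit numeric suffix, and the `STOP SLAVE` before `RESET SLAVE ALL` is my addition (not on the slide; `RESET SLAVE` requires the replication threads to be stopped).

```python
def binlog_index(filename: str) -> int:
    """Numeric suffix of a binary log file name, e.g. 'binlog.000123' -> 123."""
    return int(filename.rsplit(".", 1)[1])


def flushes_needed(candidate_file: str, bls_latest_file: str) -> int:
    """How many FLUSH BINARY LOGS make the candidate's current file the one
    that follows the latest file available on the Binlog Servers."""
    target = binlog_index(bls_latest_file) + 1
    current = binlog_index(candidate_file)
    if current > target:
        raise ValueError("candidate is already past the expected file name")
    return target - current  # each flush moves to the next file number


def trick_statements(candidate_file: str, bls_latest_file: str) -> list:
    """Statements for steps 1 and 2, in order, for the candidate master."""
    flushes = flushes_needed(candidate_file, bls_latest_file)
    prefix = candidate_file.rsplit(".", 1)[0]
    new_current = f"{prefix}.{binlog_index(bls_latest_file) + 1:06d}"
    return (["FLUSH BINARY LOGS"] * flushes
            + [f"PURGE BINARY LOGS TO '{new_current}'",
               "STOP SLAVE",  # assumption: not on the slide, see lead-in
               "RESET SLAVE ALL"])


# Hypothetical file names: the candidate is at 000017, the BLSs hold up to 000123.
statements = trick_statements("binlog.000017", "binlog.000123")
print(len(statements), statements[-3:])
```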
  • 33. New Master wo Touching Slaves
       +---+
       | M |
       +---+
         |
    +----+----------------------------------------------------+
    |                                                         |
    +----+-------+-----------+-------+-----------+-------+----+
         |       |           |       |           |       |
       +---+   +---+       +---+   +---+       +---+   +---+
       | S1|...| Sn|       | T1|...| Tm|       | U1|...| Uo|
       +---+   +---+       +---+   +---+       +---+   +---+
  • 34. New Master wo Touching Slaves
       +\-/+
       | X |
       +/-\+

    +---------------------------------------------------------+
    |                                                         |
    +----+-------+-----------+-------+-----------+-------+----+
         |       |           |       |           |       |
       +---+   +---+       +---+   +---+       +---+   +---+
       | S1|...| Sn|       | T1|...| Tm|       | U1|...| Uo|
       +---+   +---+       +---+   +---+       +---+   +---+
  • 35. New Master wo Touching Slaves
       +\-/+               +---+
       | X |               | T1|
       +/-\+               +---+
                             |
    +------------------------+--------------------------------+
    |                                                         |
    +----+-------+-----------+-------+-----------+-------+----+
         |       |           |       |           |       |
       +---+   +---+       +---+   +---+       +---+   +---+
       | S1|...| Sn|       | T2|...| Tm|       | U1|...| Uo|
       +---+   +---+       +---+   +---+       +---+   +---+
  • 36. New Master wo Touching Slaves'
    ● “FLUSH BINARY LOGS” in a loop is ugly (but it works)
    ● A “RESET MASTER at/to 'binlog.00xxxx'” would be much nicer:
      ● https://bugs.mysql.com/bug.php?id=77438
  • 37. Binlog Servers with Orchestrator
    ● Orchestrator is the tool we use for managing Binlog Servers
      ● https://github.com/orchestrator/orchestrator
  • 38. Binlog Server: Links
    ● http://blog.booking.com/mysql_slave_scaling_and_more.html
    ● http://blog.booking.com/abstracting_binlog_servers_and_mysql_master_promotion_wo_reconfiguring_slaves.html
    ● HOWTO Install and Configure Binlog Servers: http://jfg-mysql.blogspot.com/2015/04/maxscale-binlog-server-howto-install-and-configure.html
    ● http://blog.booking.com/better_crash_safe_replication_for_mysql.html
    ● http://blog.booking.com/better_parallel_replication_for_mysql.html (http://blog.booking.com/evaluating_mysql_parallel_replication_2-slave_group_commit.html)
    ● http://jfg-mysql.blogspot.nl/2015/10/binlog-servers-for-backups-and-point-in-time-recovery.html
    ● Note: the Binlog Servers concept should work with any version of MySQL (5.7, 5.6, 5.5 and 5.1)