Solving MySQL replication problems with Tungsten
1. Using Tungsten Replicator to solve replication problems
Neil Armitage, Cluster implementation Engineer, Continuent
Giuseppe Maxia, QA Director, Continuent
©Continuent 2012. 1
Monday, December 03, 12 1
2. ABOUT US
• Neil Armitage
• Continuent Tungsten Deployment and Support
Engineer, Continuent, Inc
• 20 years development and DB experience
• Giuseppe Maxia, a.k.a. "The Data Charmer"
• QA Director, Continuent, Inc
• 25 years development and DB experience
• Long-time MySQL community member, Oracle ACE Director
3. What are we talking about?
• Requirements
• Components
• Installation
• Topologies
• Administration
• Troubleshooting
4. Tungsten Replicator Concepts
Replicator: the replication engine
Role: master, slave, direct slave
Service: a.k.a. "pipeline"
Stage: extract, queue, apply
5. Tungsten Replicator Components
THL: Transaction History Log
service schema: makes the node crash-proof
properties file: service definition
tools: ruling from a centralized location
6. Tungsten Replicator in a nutshell
[Diagram: host1 (master) reads the binlog into the THL; host2 (slave) receives the THL. The THL carries the global transaction ID, and each host has a trep_commit_seqno table (origin, seqno, eventid).]
7. Planning
• Hosts
• Topology
• Stand-alone or taking over
8. [Diagram of topologies: master-slave; heterogeneous (MySQL to Oracle, Oracle to MySQL); direct slave (regular MySQL master); fan-in slave; all-masters; star-schema]
9. Installation
10. Installation
• System Requirements
• Validate first
• Deploying from a single location
11. Installation - tools
• tools/tungsten-installer
• tools/configure-service
• tools/update
• (Using the cookbook recipes, you hardly see them)
13. Installation
• Check the requirements
• Get the binaries
• Expand the tarball
• Run cookbook
14. REQUIREMENTS
• Java JRE or JDK (Sun/Oracle or Open-jdk)
• Ruby 1.8 (only during installation)
• ssh access to the same user in all nodes
• MySQL user with all privileges
16. master-slave
[Diagram: master-slave topology. host1 (master) extracts its binlog into a THL; host2 and host3 (slaves) each keep their own THL.]
17. direct
[Diagram: direct topology. host1 (master) has only its binlog; host2 and host3 (slaves) each have a relay log and a THL.]
18. • Overview of VMs
• Overview of cookbook
19. Requirements : how to
• step by step: how it happened
20. installing VMs
• Step-by-step demo
22. tungsten cookbook
tungsten-replicator-2.0.7-xx
|
+--/cluster-home
+--/cookbook
+--/tools
+--/tungsten-replicator
23. tungsten cookbook
tungsten-replicator-2.0.7-xx
|
+--/cookbook
|
+--COMMON_NODES.sh
+--USER_VALUES.sh
+--NODES_MASTER_SLAVE.sh
+--show_master_slave.sh
+--test_master_slave.sh
+--clear_cluster_master_slave.sh
...
24. tungsten cookbook
tungsten-replicator-2.0.7-xx
|
+--/cookbook
|
+--COMMON_NODES.sh
+--USER_VALUES.sh
+--NODES_STAR.sh
+--show_star.sh
+--test_star.sh
+--clear_cluster_star.sh
...
25. tungsten cookbook
tungsten-replicator-2.0.7-xx
|
+--/cookbook
|
+--COMMON_NODES.sh
+--USER_VALUES.sh
+--NODES_ALL_MASTERS.sh
+--show_all_masters.sh
+--test_all_masters.sh
+--clear_cluster_all_masters.sh
...
26. tungsten cookbook
tungsten-replicator-2.0.7-xx
|
+--/cookbook
|
+--COMMON_NODES.sh
+--USER_VALUES.sh
+--NODES_FAN_IN.sh
+--show_fan_in.sh
+--test_fan_in.sh
+--clear_cluster_fan_in.sh
...
27. tungsten cookbook
$ cat COMMON_NODES.sh
export NODE1=host1
export NODE2=host2
export NODE3=host3
export NODE4=host4
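The manual installer examples later in the deck take the hosts as a comma-separated --cluster-hosts list, while COMMON_NODES.sh defines them one variable at a time. A minimal sketch of how the two forms relate, assuming a hypothetical helper named cluster_hosts (it is not part of the cookbook, and the cookbook may build its list differently):

```shell
#!/bin/sh
# Values as shown in COMMON_NODES.sh on the slide.
export NODE1=host1
export NODE2=host2
export NODE3=host3
export NODE4=host4

# cluster_hosts is a hypothetical helper, not a cookbook script.
# It joins NODE1, NODE2, ... into a comma-separated list.
cluster_hosts() {
    list=""
    i=1
    while :; do
        # Indirect lookup of NODE$i; stop at the first unset or empty one.
        eval "node=\${NODE$i:-}"
        [ -n "$node" ] || break
        list="${list:+$list,}$node"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

cluster_hosts
```

Removing NODE3 and NODE4 (as the bi-directional recipe suggests) would shorten the list to the first two hosts.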
28. tungsten cookbook
$ cat USER_VALUES.sh
# User defined values for the cluster to be installed.
export TUNGSTEN_BASE=$HOME/installs/cookbook
export DATABASE_USER=tungsten
export BINLOG_DIRECTORY=/var/lib/mysql
export MY_CNF=/etc/my.cnf
export DATABASE_PASSWORD=secret
export DATABASE_PORT=3306
export TUNGSTEN_SERVICE=cookbook
export RMI_PORT=10000
export THL_PORT=2112
export START_OPTION=start
29. sample master-slave installation
• edit cookbook/COMMON_NODES.sh
• edit cookbook/USER_VALUES.sh
• run cookbook/install_master_slave.sh
• and then:
• run cookbook/show_master_slave.sh
• run cookbook/test_master_slave.sh
30. What does the installation do
1: Validate all servers
host4 host1 host2 host3
Report all errors
31. What does the installation do
1: (again) Validate all servers
host4 host1 host2 host3
32. What does the installation do
2: install Tungsten in all servers (host1, host2, host3, host4)
$HOME/tinstall
|
+--/config
+--/releases
+--/relay
+--/thl
+--/tungsten
33. example (from manual installation)
ssh r2 chmod 444 $HOME/tinstall
./tools/tungsten-installer \
  --master-slave --master-host=r1 \
  --datasource-user=tungsten \
  --datasource-password=secret \
  --service-name=dragon \
  --home-directory=$HOME/tinstall \
  --thl-directory=$HOME/tinstall/logs \
  --relay-directory=$HOME/tinstall/relay \
  --cluster-hosts=r1,r2,r3,r4 --start
ERROR >> qa.r2.continuent.com >> /home/tungsten/tinstall is not writeable
34. example
ssh r2 chmod 755 $HOME/tinstall
./tools/tungsten-installer \
  --master-slave --master-host=r1 \
  --datasource-user=tungsten \
  --datasource-password=secret \
  --service-name=dragon \
  --home-directory=$HOME/tinstall \
  --thl-directory=$HOME/tinstall/logs \
  --relay-directory=$HOME/tinstall/relay \
  --cluster-hosts=r1,r2,r3,r4 --start
# no errors
35. Master - to - master
• Bi-directional installation
• Operational steps:
• install a master service in both servers
• install the corresponding slave service in the other server
[Diagram: one server runs master-alpha and slave-bravo; the other runs master-bravo and slave-alpha]
36. BI-DIR: the painless way
• edit cookbook/COMMON_NODES.sh
• edit cookbook/USER_VALUES.sh
• remove two nodes
• edit the variables in cookbook/NODES_ALL_MASTERS.sh
• run cookbook/install_all_masters.sh
37. all-masters
[Diagram: four nodes A, B, C, D. Each node runs its own master service (master-alpha, master-bravo, master-charlie, master-delta) plus a slave service for each of the other three masters.]
38. Multiple masters
• fan-in
• Steps:
• install a master service in each node
• install a slave service for each master in the fan-in node
• or:
• cookbook/install_fan_in.sh
39. Master - to - master
• star schema
• Steps:
• install a master service in each server
• in the hub, install a slave service for each spoke
• in each spoke, install a slave service for the hub, using bypass option
• cookbook/install_star.sh
40. Taking Over from Standard Replication
• cookbook/install_standard_replicaton.sh
• cookbook/takeover.sh
43. Common Commands
• replicator
• trepctl
• thl
• the Tungsten service schema
44. replicator
• It’s the service provider
• You launch it once when you start
• You may restart it when you change config
45. trepctl
• Tungsten Replicator ConTroLler
• It’s the driving seat for your replication
• You can start, update, and stop services
• You can get specific info
46. trepctl
• Tungsten Replicator Controller
• put services online or offline
• check status
• skip events
• inspect internals
• change roles
• heartbeat
• backup/restore
• ... and a lot more
47. thl
• Transaction History Log
• Gives you access to the Tungsten relay logs
48. thl
• Transaction History Log
• info
• index
• list (total or a specific event, or by range)
• purge
49. Tungsten service schema
• one for each service
• named "tungsten_SERVICE_NAME"
• e.g. tungsten_alpha, tungsten_dragon
• Most important table: trep_commit_seqno
50. Looking at the tungsten service db
select * from tungsten_dragon.trep_commit_seqno\G
******************* 1. row *******************
task_id: 0
seqno: 102
fragno: 0
last_frag: 1
source_id: qa.r1.continuent.com
epoch_number: 0
eventid: mysql-bin.000002:0000000000018903;0
applied_latency: 0
update_timestamp: 2012-02-06 05:56:12
shard_id: tungsten_dragon
extract_timestamp: 2012-02-06 05:56:09
51. Where are the tools
in the tungsten directory:
$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin
replicator # the daemon
trepctl # replicator controller
thl # transaction history log tool
52. Starting and stopping the replicator
cd $TUNGSTEN_BASE/tungsten/tungsten-replicator/bin
./replicator status
Tungsten Replicator Service is running (PID:32400).
./replicator stop
Stopping Tungsten Replicator Service...
Stopped Tungsten Replicator Service.
./replicator start
Starting Tungsten Replicator Service...
53. checking replicator vitals
trepctl services
Processing services command...
NAME VALUE
---- -----
appliedLastSeqno: -1 # bad sign?
appliedLatency : -1.0
role : slave
serviceName : dragon
serviceType : local
started : true
state : ONLINE
Finished services command...
54. sending a heartbeat
trepctl -host $MASTER_HOST heartbeat
trepctl services
Processing services command...
NAME VALUE
---- -----
appliedLastSeqno: 102
appliedLatency : 3.139
role : slave
serviceName : dragon
serviceType : local
started : true
state : ONLINE
Finished services command...
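The two outputs above differ mainly in appliedLastSeqno, which moves from -1 to 102 once a heartbeat has travelled through the pipeline. A small sketch of how a script might read one field out of that output; get_field and sample_services are hypothetical names, and the sample text is copied from the slide rather than produced by a live replicator:

```shell
#!/bin/sh
# Sample text copied from the slide. In practice you would pipe
# `trepctl services` into get_field instead.
sample_services() {
cat <<'EOF'
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 102
appliedLatency  : 3.139
role            : slave
serviceName     : dragon
serviceType     : local
started         : true
state           : ONLINE
Finished services command...
EOF
}

# get_field NAME: print the value of one "name : value" line from stdin.
# Hypothetical helper, not a Tungsten tool.
get_field() {
    awk -v want="$1" '
        {
            pos = index($0, ":")
            if (pos == 0) next
            name = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == want) { print value; exit }
        }'
}

seqno=$(sample_services | get_field appliedLastSeqno)
state=$(sample_services | get_field state)
echo "state=$state appliedLastSeqno=$seqno"
```

Splitting on the first colon only keeps values that contain colons themselves (such as the thl:// URIs in the status output) intact.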
55. replicator status (1)
trepctl status
Processing status command...
NAME VALUE
---- -----
appliedLastEventId : mysql-bin.000002:0000000000018903;0
appliedLastSeqno : 102
appliedLatency : 3.139
clusterName : default
currentEventId : NONE
currentTimeMillis : 1328504342058
dataServerHost : qa.r4.continuent.com
extensions :
latestEpochNumber : 0
masterConnectUri : thl://qa.r1.continuent.com:2112/
masterListenUri : thl://qa.r4.continuent.com:2112/
maximumStoredSeqNo : 102
minimumStoredSeqNo : 0
[...]
56. replicator status (2)
[...]
offlineRequests : NONE
pendingError : NONE
pendingErrorCode : NONE
pendingErrorEventId : NONE
pendingErrorSeqno : -1
pendingExceptionMessage: NONE
resourcePrecedence : 99
rmiPort : 10000
role : slave
seqnoType : java.lang.Long
serviceName : dragon
serviceType : local
simpleServiceName : dragon
siteName : default
sourceId : qa.r4.continuent.com
state : ONLINE
timeInStateSeconds : 245.215
uptimeSeconds : 245.539
Finished status command...
57. A failover scenario
1: MySQL native replication
58. 1. one Master, two slaves
• Loading the “employees” test database
59. 2. Master goes away
* Stop replication
* Slaves are updated at different levels
# 2
select count(*) from titles
333,145
# 3
select count(*) from titles
443,308
60. 3. Look into Slave #2 binary logs
• find the last transaction
61. 4. Look into Slave #3 binary logs
1. find the transaction that was last in slave #2
2. Recognize that last transaction in the log of slave #3 (this can actually take you a LOOOONG TIME)
3. Get the position immediately after this transaction (e.g. 134000 in file mysql-bin.000018)
62. 5. promote Slave #3 to master
* in slave #2
CHANGE MASTER TO
master_host='slave_3_IP',
master_user='slavename',
master_password='slavepassword',
master_log_file='mysql-bin.000018',
master_log_pos=134000;
63. A failover scenario
2: Tungsten Replicator
64. 1. one master, two slaves
• loading the ‘employees’ test database
65. 2. Master goes away
* Stop replication
* Slaves are updated at different levels
# 2
select count(*) from titles
333,145
# 3
select count(*) from titles
443,308
66. 3. no need to find the last transaction
# simply change roles
trepctl -host slave3 setrole -role master
trepctl -host slave2 setrole -role slave -uri thl://slave3
trepctl -host slave3 online
State: ONLINE
trepctl -host slave2 online
State: GOING-ONLINE:SYNCHRONIZING
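The role change on this slide follows a fixed order: promote the new master, repoint every remaining slave, then bring all of them online. A dry-run sketch of that order, assuming a hypothetical function failover_commands that only prints the trepctl invocations shown above and executes nothing:

```shell
#!/bin/sh
# Dry-run sketch: prints the trepctl commands from the slide for a chosen
# new master and any number of remaining slaves. Nothing is executed.
# failover_commands is a hypothetical name, not a Tungsten tool.
failover_commands() {
    new_master=$1
    shift
    echo "trepctl -host $new_master setrole -role master"
    for slave in "$@"; do
        echo "trepctl -host $slave setrole -role slave -uri thl://$new_master"
    done
    echo "trepctl -host $new_master online"
    for slave in "$@"; do
        echo "trepctl -host $slave online"
    done
}

failover_commands slave3 slave2
```

Printing the commands first makes it easy to review the sequence before running it against real hosts.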
67. 4. Check that the slave has synchronized
# new master
select seqno from tungsten.trep_commit_seqno;
78
# new slave
select seqno from tungsten.trep_commit_seqno;
64
68. 5. Tell the replicator to hurry up
# new master
trepctl -host slave3 flush
Master log is synchronized with database at log sequence number: 78
# new slave
trepctl -host slave2 wait -applied 78
ONLINE
select seqno from tungsten.trep_commit_seqno;
78
69. 6. ... and we're done
# new master
select count(*) from employees.titles
count(*)
443308
# new slave:
count(*)
443308
70. planned role switch
cookbook/install_master_slave.sh
cookbook/switch
71. Switch: 0
[Diagram: A is master (online); B and C are slaves of A (online)]
72. Switch: 1
[Diagram: master A goes offline; slaves B and C are still online]
73. Switch: 2
[Diagram: master A is offline; wait for the latest transaction to be applied on slaves B and C (online)]
74. Switch: 3
[Diagram: slaves B and C go offline; master A is offline]
75. Switch: 4
[Diagram: slave C is promoted. Notice: 2 masters (A and C), but offline; B is still a slave of A, offline]
76. Switch: 5
[Diagram: old master A (now a slave) and remaining slave B are linked to new master C; all offline]
77. Switch: 6
[Diagram: old master A and remaining slave B connect to new master C; all online]
78. Tungsten GTID vs MySQL 5.6 GTID
• What is GTID
• How it works in Tungsten
• How it works (or not) in MySQL 5.6
79. without global transaction ID
[Diagram: master A writes commits to its binlog at a position; slaves B and C each have their own binlog and position]
80. with global transaction ID
[Diagram: master A and slaves B and C all identify the same commit as id#200]
81. Tungsten and global transaction ID: activation
(none)
active by default
82. Tungsten and global transaction ID: status
trepctl status
Processing status command...
NAME VALUE
---- -----
appliedLastEventId : mysql-bin.000002:0000000000001442;0
appliedLastSeqno : 6
appliedLatency : 0.862
clusterName : default
currentEventId : NONE
currentTimeMillis : 1354304680923
dataServerHost : qa.r4.continuent.com
83. Tungsten and global transaction ID: seeing transactions
thl list -seqno 6
SEQ# = 6 / FRAG# = 0 (last frag)
- TIME = 2012-11-30 20:44:35.0
- EPOCH# = 0
- EVENTID = mysql-bin.000002:0000000000001442;0
- SOURCEID = qa.r1.continuent.com
- SQL(0) = insert into test.v1 values (1, 'inserted by node #1') /* ___SERVICE___ = [cookbook] */
84. Tungsten and global transaction ID: changing master connection
trepctl offline
trepctl online -seqno 105
85. Tungsten and global transaction ID: crash-safe slave tables
mysql -e 'select * from tungsten_cookbook.trep_commit_seqno\G'
*************************** 1. row ***************************
task_id: 0
seqno: 6
fragno: 0
last_frag: 1
source_id: qa.r1.continuent.com
epoch_number: 0
eventid: mysql-bin.000002:0000000000001442;0
applied_latency: 0
update_timestamp: 2012-11-30 20:44:35
shard_id: test
extract_timestamp: 2012-11-30 20:44:35
86. Tungsten and global transaction ID: crash-safe tables and parallel replication
mysql -e 'select seqno, source_id, shard_id,update_timestamp from
tungsten_cookbook.trep_commit_seqno'
+-------+----------------------+----------+---------------------+
| seqno | source_id | shard_id | update_timestamp |
+-------+----------------------+----------+---------------------+
| 7 | qa.r1.continuent.com | db1 | 2012-11-30 20:54:14 |
| 8 | qa.r1.continuent.com | db2 | 2012-11-30 20:54:14 |
| 9 | qa.r1.continuent.com | db3 | 2012-11-30 20:54:14 |
| 10 | qa.r1.continuent.com | db4 | 2012-11-30 20:54:14 |
| 11 | qa.r1.continuent.com | db5 | 2012-11-30 20:54:14 |
| 12 | qa.r1.continuent.com | db6 | 2012-11-30 20:54:14 |
| 13 | qa.r1.continuent.com | db7 | 2012-11-30 20:54:14 |
| 14 | qa.r1.continuent.com | db8 | 2012-11-30 20:54:14 |
| 15 | qa.r1.continuent.com | db9 | 2012-11-30 20:54:14 |
| 16 | qa.r1.continuent.com | db10 | 2012-11-30 20:54:14 |
+-------+----------------------+----------+---------------------+
87. MySQL 5.6 and global transaction ID: activation
mysqld --log-slave-updates \
  --gtid-mode=on \
  --disable-gtid-unsafe-statements
WARNING: in MySQL 5.6.9, this will change to
--enforce-gtid-consistency
88. MySQL 5.6 and global transaction ID: status
show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 13233
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 1837
Relay_Log_File: mysql_sandbox13234-relay-bin.000005
Relay_Log_Pos: 2047
Relay_Master_Log_File: mysql-bin.000002
...
Retrieved_Gtid_Set: 46E13434-3B28-11E2-BF47-6C626DA07446:1-7
Executed_Gtid_Set: 46E13434-3B28-11E2-BF47-6C626DA07446:1-7
89. MySQL 5.6 and global transaction ID: changing master connection
CHANGE MASTER TO master_log_file='mysql-bin.000003',
master_log_pos=1234;
# No global transaction ID is used
90. MySQL 5.6 and global transaction ID: crash-safe slave table
select * from slave_relay_log_info\G
********************* 1. row ********************
Number_of_lines: 7
Relay_log_name: ./mysql_sandbox13234-relay-bin.000005
Relay_log_pos: 2047
Master_log_name: mysql-bin.000002
Master_log_pos: 1837
Sql_delay: 0
Number_of_workers: 5
Id: 1
# NO Global transaction ID is used!
91. MySQL 5.6 and global transaction ID: crash-safe slave table + parallel
select * from mysql.slave_worker_info\G
Id: 12
Relay_log_name: ./mysql_sandbox13234-relay-bin.000007
Relay_log_pos: 4299
Master_log_name: mysql-bin.000002
Master_log_pos: 7155
Checkpoint_relay_log_name: ./mysql_sandbox13234-relay-bin.000007
Checkpoint_relay_log_pos: 1786
Checkpoint_master_log_name: mysql-bin.000002
Checkpoint_master_log_pos: 4642
Checkpoint_seqno: 9
Checkpoint_group_size: 64
Checkpoint_group_bitmap: ?
# NO Global transaction ID is used!
92. Filters
93. Replicator Pipeline Architecture
[Diagram: the Tungsten Replicator process runs a pipeline of three stages, each with an extract and an apply step. The first stage extracts from the MySQL binlog, assigns a shard ID, and applies to the Transaction History Log (THL); the last stage applies to the slave DBMS.]
94. Replicator Pipeline Architecture
[Diagram: the same three-stage pipeline (MySQL binlog, Transaction History Log, slave DBMS) with filters added, one next to the MySQL binlog and one next to the THL.]
95. Restrict replication to some schemas and tables
./tools/tungsten-installer
--master-slave -a
...
--svc-extractor-filters=replicate
"--property=replicator.filter.replicate.do=test,*.foo"
...
--start-and-report
# test="test.*" -> same drawback as binlog-do-db in MySQL
# *.foo = table 'foo' in any database
# employees.dept_codes,employees.salaries => safest way
96. Exclude some schemas and tables from replication
./tools/tungsten-installer
--master-slave -a
...
--svc-extractor-filters=replicate
"--property=replicator.filter.replicate.ignore=test,*.foo"
...
--start-and-report
# test="test.*" -> same drawback as binlog-ignore-db in MySQL
# *.foo = table 'foo' in any database
# employees.dept_codes,employees.salaries => safest way
# DO NOT MIX .do and .ignore!
# (you can do it, but it may not do what you mean)
97. Change name of replicated schema
-a --svc-applier-filters=dbtransform
--property=replicator.filter.dbtransform.from_regex1=stores
--property=replicator.filter.dbtransform.to_regex1=playground
# from_regex1=stores -> name of the schema in the master
# to_regex1=playground -> name of the schema in the slave
# WARNING: requires "USE schema_name" to work properly.
98. Multi-master: Conflict prevention
106. Conflict prevention facts
• Sharded by database
• Defined dynamically
• Applied either at the master or at the slave
• methods:
• make replication fail
• drop silently
• drop with warning
110. Prevention methods: Fail on master
[Diagram, built over three slides: host1 to host4, each with master (M) and slave (S) services. One host issues INSERT A (x,y) and another issues INSERT A (x,z).]
113. Prevention methods: Fail on slave
[Diagram, built over three slides: host1 to host4, each with master (M) and slave (S) services. One host issues INSERT A (x,y) and another issues INSERT A (x,z).]
116. Prevention methods: Fail on slave (Multiple sources)
[Diagram, built over three slides: host1 (M) and host2 (M) both feed host3 (S). One master issues INSERT A (x,y) and the other issues INSERT A (x,z).]
119. Prevention methods: DROP on master
[Diagram, built over four slides: host1 to host4, each with master (M) and slave (S) services. One host issues INSERT B (x,y) and another issues INSERT B (x,z).]
123. Prevention methods: DROP on slave (Multiple sources)
[Diagram, built over four slides: host1 (M) and host2 (M) both feed host3 (S). The masters issue conflicting inserts: INSERT A (x,y) and INSERT A (x,z), then INSERT B (x,y) and INSERT B (x,z).]
127. Conflict prevention syntax during installation
--svc-extractor-filters=shardfilter
--property=replicator.stage.q-to-dbms.filters=mysqlsessions,pkey,bidiSlave,shardfilter
--property=replicator.filter.shardfilter.unknownShardPolicy=warn
# OR:
--property=replicator.filter.shardfilter.unknownShardPolicy=drop
--property=replicator.filter.shardfilter.unknownShardPolicy=error
128. Conflict prevention in a star topology
[Diagram: star topology. Host1: master alpha, database employees. Host2: master bravo, database vehicles. Host3: master charlie (hub), database buildings.]
129. Conflict prevention rules
trepctl -host host1 -service charlie shard -insert < shards_charlie.map
cat shards_charlie.map
shard_id master critical
tungsten_charlie charlie false
buildings charlie false
test charlie false
130. Conflict prevention rules
trepctl -host host2 -service charlie shard -insert < shards_charlie.map
cat shards_charlie.map
shard_id master critical
tungsten_charlie charlie false
buildings charlie false
test charlie false
131. Conflict prevention rules
trepctl -host host3 -service alpha shard -insert < shards_alpha.map
cat shards_alpha.map
shard_id master critical
tungsten_alpha alpha false
employees alpha false
test alpha false
132. Conflict prevention rules
trepctl -host host3 -service bravo shard -insert < shards_bravo.map
cat shards_bravo.map
shard_id master critical
tungsten_bravo bravo false
vehicles bravo false
test bravo false
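The shard maps on these slides all share one shape: a header row, a row for the service's own tungsten_SERVICE schema, and one row per database, each naming the service as master with critical set to false. A sketch that generates such a file, assuming a hypothetical helper shard_map; the tab separator is also an assumption, since the slides show only whitespace-separated columns:

```shell
#!/bin/sh
# shard_map SERVICE DB...: print a shard map in the layout shown on the
# slides. Hypothetical helper, not a Tungsten tool. The service's own
# tungsten_SERVICE schema and every listed database are assigned to
# SERVICE as master, with critical=false.
# Assumption: columns are separated by tabs.
shard_map() {
    service=$1
    shift
    printf '%s\t%s\t%s\n' shard_id master critical
    printf '%s\t%s\t%s\n' "tungsten_$service" "$service" false
    for db in "$@"; do
        printf '%s\t%s\t%s\n' "$db" "$service" false
    done
}

shard_map bravo vehicles test
```

The output could be redirected to a file such as shards_bravo.map and loaded with the shard -insert command shown above.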
135. Viewing THL Events
thl info
log directory = /home/tungsten/installs/master_slave/thl/dragon/
min seq# = 0
max seq# = 101
events = 101
136. viewing THL events
thl index
LogIndexEntry thl.data.0000000001(0:102)
137. viewing THL events
thl index
[...]
LogIndexEntry thl.data.0000000001(0:18)
LogIndexEntry thl.data.0000000002(19:33)
LogIndexEntry thl.data.0000000003(34:35)
LogIndexEntry thl.data.0000000004(36:3641)
LogIndexEntry thl.data.0000000005(3642:3712)
LogIndexEntry thl.data.0000000006(3713:3838)
LogIndexEntry thl.data.0000000007(3839:3949)
LogIndexEntry thl.data.0000000008(3950:4011)
LogIndexEntry thl.data.0000000009(4012:4039)
LogIndexEntry thl.data.0000000010(4040:4057)
LogIndexEntry thl.data.0000000011(4058:4067)
LogIndexEntry thl.data.0000000012(4068:4073)
LogIndexEntry thl.data.0000000013(4074:4085)
LogIndexEntry thl.data.0000000014(4086:4095)
LogIndexEntry thl.data.0000000015(4096:4101)
LogIndexEntry thl.data.0000000016(4102:4111)
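Each index entry names a THL file and the first and last sequence numbers it holds. A sketch of how one might look up the file that contains a given seqno; thl_file_for and sample_index are hypothetical names, and the sample lines are copied from the slide:

```shell
#!/bin/sh
# thl_file_for SEQNO: read `thl index` lines on stdin and print the name
# of the log file whose (first:last) range contains SEQNO.
# Hypothetical helper, not part of the thl tool.
thl_file_for() {
    awk -v seqno="$1" '
        /^LogIndexEntry / {
            # Split "thl.data.0000000004(36:3641)" into name, first, last.
            split($2, part, /[(:)]/)
            if (seqno + 0 >= part[2] + 0 && seqno + 0 <= part[3] + 0) {
                print part[1]
                exit
            }
        }'
}

# Three entries copied from the slide.
sample_index() {
cat <<'EOF'
LogIndexEntry thl.data.0000000003(34:35)
LogIndexEntry thl.data.0000000004(36:3641)
LogIndexEntry thl.data.0000000005(3642:3712)
EOF
}

sample_index | thl_file_for 3642
```

With a live installation the input would come from `thl index` instead of the sample function.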
138. viewing THL events
thl list -seqno 102
[...]
SEQ# = 102 / FRAG# = 0 (last frag)
- TIME = 2012-02-06 05:56:09.0
- EPOCH# = 0
- EVENTID = mysql-bin.000002:0000000000018903;0
- SOURCEID = qa.r1.continuent.com
- METADATA = [mysql_server_id=10;is_metadata=true;service=dragon;shard=tungsten_dragon;heartbeat=NONE]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 1, foreign_key_checks = 1, unique_checks = 1, sql_mode = 'IGNORE_SPACE', character_set_client = 8, collation_connection = 8, collation_server = 8]
- SCHEMA = tungsten_dragon
- SQL(0) = UPDATE tungsten_dragon.heartbeat SET source_tstamp= "2012-02-06 05:56:09", salt= 2, name= "NONE" WHERE id= 1 /* ___SERVICE___ = [dragon] */
139. Skipping a THL Event
trepctl online -skip-seqno 1092
trepctl online -skip-seqno 1092,1093,1094
# see example
140. Adding a Member
• Let's see the cookbook, and use it
142. Replicator Pipeline Architecture
[Diagram: the Tungsten Replicator process with a three-stage pipeline. Events are extracted from the MySQL binlog, assigned a shard ID, and stored in the Transaction History Log (THL); a parallel queue then feeds several extract/apply "channels" that write to the slave DBMS. A shard.list file is shown alongside the pipeline.]
143. Parallel replication facts
✓ Sharded by database
✓ Good choice for slave lag problems
❖ Bad choice for single database projects
144. Parallel Replication test
[Diagram: test setup. Binary logs are produced by a concurrent sysbench run on 30 databases for 1 hour. Total data: 130 GB; RAM per server: 20 GB. The MySQL slave is STOPPED and the Tungsten direct slave (replicator alpha) is OFFLINE, so the slaves will have 1 hour of lag.]
145. measuring results
[Diagram: the MySQL slave is STARTed and the Tungsten direct slave (replicator alpha) is put ONLINE, recording the catch-up time.]
146. MySQL native replication
slave catch up in 04:29:30
147. Tungsten parallel replication
slave catch up in 00:55:40
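The two catch-up times reported for this test, 04:29:30 for native replication and 00:55:40 for Tungsten parallel replication, can be compared directly. A small worked calculation; to_seconds is a hypothetical helper name:

```shell
#!/bin/sh
# to_seconds HH:MM:SS: convert a catch-up time to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

native=$(to_seconds 04:29:30)     # MySQL native replication
parallel=$(to_seconds 00:55:40)   # Tungsten parallel replication

# Ratio of the two catch-up times, rounded to two decimals.
speedup=$(awk -v n="$native" -v p="$parallel" 'BEGIN { printf "%.2f", n / p }')
echo "native=${native}s parallel=${parallel}s speedup=${speedup}x"
```

For this one test, that is 16170 seconds against 3340 seconds, or roughly 4.8 times faster.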
156. parallel replication direct slave facts
✓ No need to install Tungsten on the master
✓ Tungsten runs only on the slave
✓ Replication can revert to native slave with two commands (trepctl offline; start slave)
✓ Native replication can continue on other slaves
❖ Failover (either native or Tungsten) becomes a manual task
158. Checking parallel replication
trepctl status
trepctl status -name tasks
trepctl status -name shards
trepctl status -name stores
160. Identify the Failed Component
• Steps
1. trepctl services
2. trepctl -service SVC_NAME status
3. look at the logs
4. Take action
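The first two steps boil down to reading the state and, when it is not ONLINE, the pending error fields from the status output. A sketch of that check, assuming hypothetical helpers field and triage; the OFFLINE:ERROR sample and its message are invented for illustration and are not taken from the slides:

```shell
#!/bin/sh
# field NAME: print the value of a "name : value" line read from stdin.
# Hypothetical helper, not a Tungsten tool.
field() {
    sed -n "s/^$1 *: *//p" | head -n 1
}

# triage: read `trepctl status` output on stdin and print one summary line.
triage() {
    status=$(cat)
    state=$(printf '%s\n' "$status" | field state)
    if [ "$state" = "ONLINE" ]; then
        echo "OK: service is ONLINE"
    else
        message=$(printf '%s\n' "$status" | field pendingExceptionMessage)
        echo "CHECK: state=$state pendingExceptionMessage=$message"
    fi
}

# Healthy sample, using field names from the status slides.
printf 'pendingExceptionMessage: NONE\nstate : ONLINE\n' | triage

# Hypothetical failing sample (invented values, for illustration only).
printf 'pendingExceptionMessage: example message\nstate : OFFLINE:ERROR\n' | triage
```

On a live node the input would be `trepctl -service SVC_NAME status`, and a CHECK line would be the cue to move on to the logs.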
161. reading the logs
ls $TUNGSTEN_BASE/tungsten/tungsten-replicator/logs/
trepsvc.log user.log
# let's see it in practice
163. We are hiring QA engineers
http://continuent.com/about/careers
164. 560 S. Winchester Blvd., Suite 500
San Jose, CA 95128
Tel +1 (866) 998-3642
Fax +1 (408) 668-1009
e-mail: sales@continuent.com
Our Blogs:
http://scale-out-blog.blogspot.com
http://datacharmer.blogspot.com
http://flyingclusters.blogspot.com
Continuent Website:
http://www.continuent.com
Tungsten Replicator 2.0:
http://code.google.com/p/tungsten-replicator