www.seznam.cz 1 of 67
MySQL / MongoDB Meetup
3.10.2017, Prague
Agenda
- Introduction
- About Seznam and Sklik.cz from DB point of view
- Architecture and scaling of MySQL
- A glimpse into the world of HBase
- MongoDB from the DBA point of view (cancelled, sorry)
Next time (ca 3/2018)
- Call for papers is open!
Architecture in Seznam.cz and Sklik.cz
Radim Špigel
Senior developer of Sklik, Seznam.cz
Architecture and scaling of MySQL
Audience: Beginners
Michal Kuchta
Senior developer of Sklik, Seznam.cz
Common setup
• LAMP server
• Linux, Apache, MySQL, PHP
• Most common usage
• Everything on single machine
+ Easy to maintain
+ Cheap
- SPOF
- Poor performance under high load
- IO scheduling
- Memory split between application and DB
Brute force scaling
• Split database and application
• One machine for all database operations
+ Database on its own dedicated hardware
+ Dedicated resources
+ Better optimization possibilities
- Another server to maintain
- Still SPOF
Brute force scaling
• Dedicated database server
• 128 – 256 GB RAM
• SSD drives
■ Does anyone still run MySQL on HDDs today?
• A lot of memory for the InnoDB buffer pool - the database runs from RAM
■ Does anyone still use MyISAM today?
• Price? Around $19,000 per machine
Horizontal scaling
• Master-slave replication
• Writes go to the master
• Reads go to the slaves
• Good if you have a high read load
• Statement-based vs. row-based binlog entries
+ Better performance for selects (read scale-out)
+ Hot backup
+ Intentional delay
- Does not scale writes
- Replication lag (asynchronous)
- Replication tends to break sometimes
- Needs manual failover, master is SPOF
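The read/write split described on this slide can be sketched in a few lines. This is only an illustration, not the routing used at Sklik: the class name `ReplicatedPool`, the host names and the rule "statements starting with SELECT or SHOW are reads" are all assumptions made for the example.

```python
import random


class ReplicatedPool:
    """Route writes to the master and reads to a randomly chosen slave."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def pick(self, sql):
        # Assumption: a statement is a read if it starts with SELECT or SHOW.
        # Everything else goes to the master, because slaves only replicate from it.
        is_read = sql.lstrip().upper().startswith(("SELECT", "SHOW"))
        if is_read and self.slaves:
            return random.choice(self.slaves)
        return self.master


# Hypothetical host names, for illustration only.
pool = ReplicatedPool("master:3306", ["slave1:3306", "slave2:3306", "slave3:3306"])
print(pool.pick("SELECT * FROM campaigns"))           # one of the slaves
print(pool.pick("UPDATE campaigns SET budget = 10"))  # master:3306
```

Because replication is asynchronous, a read sent to a slave right after a write may not see that write yet; a real router would typically send such reads to the master.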
Master-master replication between two data centers (diagram: DC 1 and DC 2, each with one master and three slaves)
• Introduce second DC
+ Geographical fault tolerance
+ “hot” backup
+ Maintenance in one DC does not affect traffic.
- Still only one “active” master
- Where is the master?
- Cross DC lag
Scaling of writes - sharding.
• Shard
• Same structure
• Different subset of data
• Horizontal scaling of writes
• Two approaches: Multitenancy routing, colocation routing
Diagram: Shard 1, Shard 2, Shard 3, each with one master and three slaves
Shard manager
We have to route application requests to the correct shard.
Example: the application asks the shard manager “Where is John’s data?” and the shard manager answers “User John is on shard 2.”
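A minimal sketch of such a shard manager, assuming a plain lookup table from user to shard. The class `ShardManager`, its methods and the connection strings are hypothetical names for this example; the slides do not describe how the real shard manager stores its mapping.

```python
class ShardManager:
    """Directory-based routing: an explicit user -> shard lookup table."""

    def __init__(self, shard_dsns):
        self.shard_dsns = dict(shard_dsns)  # shard id -> connection string
        self.user_to_shard = {}             # user -> shard id

    def assign(self, user, shard_id):
        """Place (or move) a user on the given shard."""
        if shard_id not in self.shard_dsns:
            raise KeyError(f"unknown shard {shard_id}")
        self.user_to_shard[user] = shard_id

    def locate(self, user):
        """Return (shard id, connection string) for the user's data."""
        shard_id = self.user_to_shard[user]
        return shard_id, self.shard_dsns[shard_id]


# Hypothetical connection strings, for illustration only.
manager = ShardManager({
    1: "shard1-master:3306",
    2: "shard2-master:3306",
    3: "shard3-master:3306",
})
manager.assign("John", 2)
print(manager.locate("John"))  # (2, 'shard2-master:3306')
```

With an explicit table, moving a user to another shard is a single update of the mapping once the data has been copied.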
Cross-shard relations
Example: a messaging center
- Each message has a sender
- Each message has a recipient
- Potentially, each user is on a different shard.
Cross-shard relations
Possible solution: Duplicate data
+ Good solution for static data (enums)
- Difficult to maintain consistency in case of updates (solved on application level)
Cross-shard relations
Possible solution: Common data in separate database
+ Only one instance of data
+ No consistency problems
- Potentially less performant; we are back to the one-database solution.
We use both solutions on Sklik - each is good for a different subset of data.
Summary
+ Almost unlimited horizontal scaling
+ Good for high load applications.
- Bad for analytical querying over all shards
- Common data problem
- Routing on application level
- A lot of components to monitor and maintain
Balancing of shard load
- You can add another shard
- You can move data between shards
Problem: PK collisions
- No AUTO_INCREMENT
- ID allocation must be handled on application level
- We are using shard manager to assign IDs
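One common way to hand out IDs centrally is to reserve them in blocks. The sketch below assumes that approach; `IdAllocator`, `reserve_block` and the block size are hypothetical names and values, and the slides do not say how the Sklik shard manager actually allocates IDs.

```python
import threading


class IdAllocator:
    """Hand out globally unique IDs in blocks, so shards never collide."""

    def __init__(self, block_size=1000):
        self.block_size = block_size
        self._next_block_start = 1
        self._lock = threading.Lock()

    def reserve_block(self):
        """Reserve a contiguous range of IDs that is never handed out again."""
        with self._lock:
            start = self._next_block_start
            self._next_block_start += self.block_size
        return range(start, start + self.block_size)


allocator = IdAllocator(block_size=3)
ids_for_shard_1 = list(allocator.reserve_block())
ids_for_shard_2 = list(allocator.reserve_block())
print(ids_for_shard_1, ids_for_shard_2)  # [1, 2, 3] [4, 5, 6]
```

Because every ID comes from one allocator, a row keeps its primary key when it is moved to another shard.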
Galera cluster (diagram: one shard formed by four master nodes)
- Galera - virtually synchronous replication
- True multi-master setup
- Two phase commit
- Automatic provisioning
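Galera commit flow (diagram, Node 1 and Node 2):
1. User interaction on Node 1: BEGIN, Query 1, Query 2, Query 3, COMMIT.
2. On COMMIT, the transaction is transferred to the other nodes.
3. Certification: OK or ROLLBACK.
4. Physical commit on Node 1; the COMMIT result (OK or ROLLBACK) is returned to the user. Steps 2-4 are the additional time required for commit.
5. Node 2 applies the transaction asynchronously.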
+ HA without manual failover
+ Read scaling
+ Write scaling
+ Automatic resync of failed nodes
- Conflict detection at commit
- InnoDB only (who uses MyISAM these days?)
- Difficult DDL statements (rolling schema upgrade)
- Maximum transaction size 2GB
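"Conflict detection at commit" can be illustrated with a simplified first-committer-wins check over write sets. This is a sketch of the general idea only, not Galera's implementation: the class `Certifier`, the sequence numbers and the row keys are assumptions made for the example.

```python
class Certifier:
    """First-committer-wins certification over write sets (simplified)."""

    def __init__(self):
        self.seqno = 0        # global order of certified transactions
        self.last_write = {}  # row key -> seqno of the last certified write

    def certify(self, start_seqno, write_set):
        """Return True (OK) or False (ROLLBACK) for one transaction."""
        for key in write_set:
            # Conflict: the row changed after this transaction took its snapshot.
            if self.last_write.get(key, 0) > start_seqno:
                return False
        self.seqno += 1
        for key in write_set:
            self.last_write[key] = self.seqno
        return True


certifier = Certifier()
snapshot = certifier.seqno                      # all three transactions start here
print(certifier.certify(snapshot, {"user:1"}))  # True: first committer wins
print(certifier.certify(snapshot, {"user:1"}))  # False: conflict found at commit
print(certifier.certify(snapshot, {"user:2"}))  # True: disjoint rows do not conflict
```

The second transaction only learns about the conflict when it tries to commit, which is why applications on such a cluster should be prepared to retry a rolled-back transaction.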
Migrating shards to Galera (diagram: Shard 1 is master-slave with one master and three slaves; Shard 2 is Galera)
1. Prepare empty shard based on Galera
2. Migrate all users to that shard
3. Drop old shard and use hardware for new Galera shard
4. Move (some) users back to original shard
Migrating a shard that spans two DCs (diagram: Shard 1 in DC 1 and Shard 1 in DC 2, each with one master and three slaves, connected by master-master replication)
1. Disconnect master-master replication between DCs, traffic goes to DC 1.
2. Drop shard at DC 2, recreate as Galera cluster
3. Reestablish master-master replication, let the Galera cluster catch up with DC 1.
4. Redirect traffic to DC 2
5. Disconnect master-master replication between DCs, traffic goes to DC 2.
6. Drop shard at DC 1, attach it to Galera cluster in DC 2, Galera node provisioning does the rest.
7. Reattach traffic to DC 1.
• mysqldump
+ Easy to use
+ Can back up only selected databases/tables
- No data consistency
- Really slow
• Percona XtraBackup
+ Online backup of whole tablespace
+ Strictly consistent
+ Only copies data + differential binlog
- InnoDB only (again, who uses MyISAM nowadays?)
- You cannot select certain databases/tables.
SELECT question FROM audience;
A Glimpse into the world of HBase
Audience: Beginners
Michal Fizek
Senior developer of Sklik, Seznam.cz
HBase - Sklik.cz - real example
● 10 million keywords
● 120 statistical values per keyword, per day
● for a one-year period:
● hundreds of thousands of users
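As a rough order-of-magnitude check, the first two figures can simply be multiplied out. This is a back-of-the-envelope estimate, not a number from the slides; it assumes every keyword reports all 120 values on each of 365 days.

```python
keywords = 10_000_000  # from the slide
values_per_day = 120   # statistical values per keyword, per day (from the slide)
days = 365             # assumption: a non-leap year with no missing days

values_per_year = keywords * values_per_day * days
print(f"{values_per_year:,}")  # 438,000,000,000
```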
What is HBase?
● NoSQL, modeled on BigTable (implemented in Java)
● key-value, column-based (ColumnFamily)
● distributed, scalable
● fault tolerant
● strong consistency (CP)
● availability?
● petabytes of data
Data architecture in HBase
Tables and rows
Tables:
● contain rows
● defined during design
● “readable names”
Rows:
● keys, data
● sorted lexicographically
● binary data
Example (row key | data):
⋮
User02Key00 | Data …
User02Key01 | Data …
User02KeyZZ | Data …
UserXYKeyZY | Data …
UserXYKeyZZ | Data …
⋮
Columns
● qualifier
● sparse matrix
● key-value
● binary names and data
● even names can contain data
● sorted lexicographically
Example (row key | columns as qualifier -> value):
User02Key00 | 2012/01/01 -> data, 2012/01/02 -> data
User02Key01 | 2009/12/23 -> long data, 2010/12/23 -> data
… | key 1 -> value, key 2 -> long value
Versions
● every cell (column) can hold versioned data
● every value is versioned
● the version is a long integer
● again, arbitrary values
● versions are sorted in descending order
● version count can be configured
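The version behavior listed above can be mimicked with a small in-memory model. This is a sketch for intuition, not the HBase client API: `VersionedCell`, `max_versions` and the sample version numbers are hypothetical.

```python
class VersionedCell:
    """Keep at most max_versions values, newest version first."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._versions = {}  # version (int) -> value

    def put(self, version, value):
        self._versions[version] = value
        # Drop the oldest versions beyond the configured count.
        for old in sorted(self._versions, reverse=True)[self.max_versions:]:
            del self._versions[old]

    def get(self, count=1):
        """Return up to count (version, value) pairs in descending version order."""
        newest_first = sorted(self._versions.items(), reverse=True)
        return newest_first[:count]


cell = VersionedCell(max_versions=2)
cell.put(100, b"a")
cell.put(300, b"c")
cell.put(200, b"b")
print(cell.get(count=5))  # [(300, b'c'), (200, b'b')]
```

A plain read returns only the newest version; older ones have to be requested explicitly, up to the configured count.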
Regions
Diagram: the sorted rows (… User02Key00, User02Key01, User02KeyZZ, UserXYKeyZY, UserXYKeyZZ …) are split into regions (region User00Key00, region User00KeyZZ, region UserXYKeyZY), and the regions are distributed across Regionserver 1 and Regionserver 2.
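Because rows are sorted, finding the region for a row key is a binary search over region boundaries. The sketch below assumes that the region names on the slide are start keys and invents a placement of regions on region servers; neither detail is confirmed by the slides.

```python
import bisect

# Assumption: each region is identified by its start key and covers all row
# keys from that start key up to the next region's start key.
region_starts = [b"User00Key00", b"User00KeyZZ", b"UserXYKeyZY"]
# Hypothetical placement of the three regions on the two region servers.
region_servers = ["Regionserver 1", "Regionserver 1", "Regionserver 2"]


def locate_region(row_key):
    """Return (region start key, server) of the region holding row_key."""
    index = bisect.bisect_right(region_starts, row_key) - 1
    if index < 0:
        raise KeyError("row key sorts before the first region")
    return region_starts[index], region_servers[index]


print(locate_region(b"User02Key01"))  # (b'User00KeyZZ', 'Regionserver 1')
print(locate_region(b"UserXYKeyZZ"))  # (b'UserXYKeyZY', 'Regionserver 2')
```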
Column Family
● columns grouped into logical “units”
● separate physical storage
● ColumnFamily based
● optimization
Example (two column families, Fulltext and Context):
Rowkey | Fulltext | Context
User02Key01 | keyAA=val1, keyAB=val2 | keyAA=val2, keyBB=val2 ...
… | 2011/12/23=data, 2012/12/23=data | 2011/12/23=data2
Data architecture in HBase
● data in tables - sparse matrix
● binary data
● regions
● columns; ColumnFamily
● versions
● no joins, no foreign keys
● variable schema
Coprocessors and Filters
● Java classes
● server side (data locality)
● wide optimization possibilities
Sequential reading
● use sorted rows and columns
● can be restricted to column family
● filters - almost “endless” optimization possibilities
● very fast
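Why sorted rows make sequential reading cheap can be shown with a tiny in-memory model of a range scan. It is only an illustration of the idea, not HBase code: `SortedTable` and its methods are hypothetical, and the row keys are the ones used as examples earlier in the deck.

```python
import bisect


class SortedTable:
    """Rows kept sorted by binary row key, so a range scan is sequential."""

    def __init__(self):
        self._keys = []  # sorted row keys
        self._rows = {}  # row key -> data

    def put(self, row_key, data):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = data

    def scan(self, start_row, stop_row):
        """Yield (key, data) for start_row <= key < stop_row, in key order."""
        lo = bisect.bisect_left(self._keys, start_row)
        hi = bisect.bisect_left(self._keys, stop_row)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = SortedTable()
for key in (b"UserXYKeyZZ", b"User02Key01", b"User02KeyZZ",
            b"User02Key00", b"UserXYKeyZY"):
    table.put(key, b"data")

# All rows of User02: the stop row is the prefix with its last byte incremented.
print([key for key, _ in table.scan(b"User02", b"User03")])
# [b'User02Key00', b'User02Key01', b'User02KeyZZ']
```

This is also why key design matters so much: data that is read together should share a row-key prefix so that it lands in one contiguous range.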
Other operations
● get, scan
● put, delete
● checkAndPut, checkAndDelete, checkAndMutate
● exists, increment, append
● batch
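The `checkAndPut` family are conditional writes: the check and the write happen as one atomic step on the server. The sketch below models that idea in memory; it is not the HBase client API, and `Row`, `check_and_put` and the convention that `None` means "cell is absent" are assumptions of this example.

```python
import threading


class Row:
    """One row whose cells can be updated conditionally."""

    def __init__(self):
        self._cells = {}
        self._lock = threading.Lock()  # makes check + write one atomic step

    def put(self, column, value):
        with self._lock:
            self._cells[column] = value

    def check_and_put(self, column, expected, new_value):
        """Write new_value only if the cell currently equals expected."""
        with self._lock:
            if self._cells.get(column) != expected:
                return False
            self._cells[column] = new_value
            return True


row = Row()
print(row.check_and_put(b"state", None, b"new"))     # True: cell was absent
print(row.check_and_put(b"state", b"old", b"done"))  # False: cell holds b"new"
print(row.check_and_put(b"state", b"new", b"done"))  # True
```

Conditional writes like this give single-row atomicity without multi-row transactions, which matches the "transactional processing" entry in the list of weak spots later in the deck.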
HBase cluster architecture
HBase use case
Properties:
● lexicographically sorted rows and columns
● binary data and keys
● variable schema
● coprocessors
● sharded data
HBase use case
Pros:
● sequential reading
● variable schema
● data divided into collections
● a really large amount of data
Cons:
● transactional processing
● not enough HW
● variable queries
● random writes, a lot of updates (or deletes)
RDBMS vs HBase schema
● description of entities and their relationships vs query-first design
● data normalization vs duplicated information
● key design emphasis
● clustering
HBase in Seznam.cz
● Fulltext
● Sklik
Next time ca 3/2018
Call for papers is open!
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 

Kürzlich hochgeladen (20)

ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 

MySQL Meetup Prague - Modern Data Lake

  • 1. www.seznam.cz, 1 of 67. MySQL / MongoDB Meetup, 3.10.2017, Prague
      Agenda
      - Introduction
      - About Seznam and Sklik.cz from DB point of view
      - Architecture and scaling of MySQL
      - A glimpse into the world of HBase
      - MongoDB from the DBA point of view (cancelled, sorry)
      Next time (ca 3/2018)
      - Call for papers is open!
  • 2. Architecture in Seznam.cz and Sklik.cz
      Radim Špigel, Senior developer of Sklik, Seznam.cz
  • 10. Architecture and scaling of MySQL
      Audience: Beginners
      Michal Kuchta, Senior developer of Sklik, Seznam.cz
  • 11. Common setup
      • LAMP server
      • Linux, Apache, MySQL, PHP
      • Most common usage
      • Everything on a single machine
      + Easy to maintain
      + Cheap
      - SPOF
      - Poor performance under high load
      - IO scheduling
      - Memory split between application and DB
      [Diagram: LAMP server]
  • 12. Brute force scaling
      • Split database and application
      • One machine for all database operations
      + Database on its own dedicated hardware
      + Dedicated resources
      + Better optimization possibilities
      - Another server to maintain
      - Still SPOF
      [Diagram: MySQL server and application server]
  • 13. Brute force scaling
      • Dedicated database server
      • 128 – 256 GB RAM
      • SSD drives
        ■ Does anyone run MySQL on HDD today?
      • A lot of memory for InnoDB buffer pool - database runs from RAM
        ■ Does anyone use MyISAM today?
      • Price? Around $19,000 per machine
      [Diagram: MySQL server]
  • 14. Horizontal scaling
      • Master – Slave replication
      • Writes go to master
      • Reads go to slaves
      • Good if you have high read load
      • Statement based vs. row based binlog entries
      + Better performance for selects (read scale-out)
      + Hot backup
      + Intentional delay
      - Does not scale writes
      - Replication lag (asynchronous)
      - Replication tends to break sometimes
      - Needs manual failover, master is SPOF
      [Diagram: one master replicating to three slaves]
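The read/write split from slide 14 can be sketched in a few lines. This is only an illustration, not the routing code used at Sklik: the class name `ReplicatedPool`, the host names, and the rule "anything that is not a SELECT is a write" are all assumptions made for the example.

```python
import itertools


class ReplicatedPool:
    """Routes statements: writes to the master, reads round-robin to the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        # Simplifying assumption: every statement that is not a plain SELECT
        # is treated as a write and must go to the master.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._slaves) if is_read else self.master


# Hypothetical host names, used only for this sketch.
pool = ReplicatedPool("master", ["slave1", "slave2", "slave3"])
print(pool.route("SELECT * FROM ads"))          # slave1
print(pool.route("UPDATE ads SET clicks = 1"))  # master
```

Because replication is asynchronous, a read that must see a write made a moment ago may still have to go to the master; the sketch ignores that case.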
  • 15. Master – Master replication
      • Introduce second DC
      + Geographical fault tolerance
      + “Hot” backup
      + Maintenance in one DC does not affect traffic
      - Still only one “active” master
      - Where is the master?
      - Cross DC lag
      [Diagram: DC 1 and DC 2, each with a master and three slaves]
  • 16. Scaling of writes - sharding
      • Shard
      • Same structure
      • Different subset of data
      • Horizontal scaling of writes
      • Two approaches: multitenancy routing, colocation routing
      [Diagram: Shard 1, Shard 2, Shard 3, each a master with three slaves]
  • 17–20. Shard manager
      We have to solve routing of application requests to the correct shard.
      - Question: Where is John’s data?
      - Answer: User John is on shard 2.
      [Diagram: shard manager next to Shard 1, Shard 2, Shard 3, each a master with three slaves]
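The "where is John's data?" lookup can be modelled as a small directory service. The sketch below is an assumption about how such a shard manager might look, not Sklik's implementation; the `ShardManager` class, its methods, and the host names are hypothetical.

```python
class ShardManager:
    """Directory that maps each user to the shard holding that user's data."""

    def __init__(self, shard_hosts):
        self.shard_hosts = shard_hosts  # shard id -> master host (hypothetical names)
        self.user_to_shard = {}         # user -> shard id

    def assign(self, user, shard_id):
        if shard_id not in self.shard_hosts:
            raise KeyError(f"unknown shard {shard_id}")
        self.user_to_shard[user] = shard_id

    def locate(self, user):
        """Answer 'where is this user's data?' with (shard id, host)."""
        shard_id = self.user_to_shard[user]
        return shard_id, self.shard_hosts[shard_id]


manager = ShardManager({1: "shard1-master", 2: "shard2-master", 3: "shard3-master"})
manager.assign("John", 2)
print(manager.locate("John"))  # (2, 'shard2-master')
```

An explicit directory (rather than, say, hashing the user name) is what makes it possible to move a single user between shards later: only the directory entry changes.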
  • 21. Cross-shard relations
      For example, a messaging center:
      - Each message has a sender
      - Each message has a recipient
      - Potentially each user is on a different shard
      [Diagram: Shard 1, Shard 2, Shard 3, each a master with three slaves]
  • 22. Cross-shard relations
      Possible solution: duplicate data
      + Good solution for static data (enums)
      - Difficult to maintain consistency in case of updates (solved on application level)
      [Diagram: Shard 1, Shard 2, Shard 3, each a master with three slaves]
  • 23. Cross-shard relations
      Possible solution: common data in a separate database
      + Only one instance of data
      + No consistency problems
      - Potentially less performant, we are back to the one-database solution
      [Diagram: Shard 1, Shard 2, Shard 3, each a master with three slaves, plus a common DB]
  • 24. We use both solutions on Sklik - each is good for a different subset of data.
      [Diagram: common DB plus Shard 1, Shard 2, Shard 3, each a master with three slaves]
  • 25. Summary
      + Almost unlimited horizontal scaling
      + Good for high load applications
      - Bad for analytical querying over all shards
      - Common data problem
      - Routing on application level
      - A lot of components to monitor and maintain
      [Diagram: Shard 1, Shard 2, Shard 3, each a master with three slaves]
  • 26. Balancing of shard load
      - You can add another shard
      - You can move data between shards
      Problem: PK collisions
      - No AUTO_INCREMENT
      - ID allocation must be handled on application level
      - We are using shard manager to assign IDs
      [Diagram: Shard 1, Shard 2, Shard 3, each a master with three slaves]
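Slide 26 says only that the shard manager assigns IDs; it does not say how. One common way to do it, shown here purely as an assumed illustration, is a central allocator that hands out disjoint blocks of IDs, so rows keep unique primary keys even after they move to another shard. The `IdAllocator` name and its block-based interface are hypothetical.

```python
import threading


class IdAllocator:
    """Central allocator: every ID is handed out exactly once, whatever the shard."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def allocate(self, count=1):
        """Reserve `count` consecutive IDs and return them as a range."""
        if count < 1:
            raise ValueError("count must be at least 1")
        with self._lock:
            first = self._next
            self._next += count
        return range(first, first + count)


allocator = IdAllocator()
ids_for_shard_1 = allocator.allocate(3)  # range(1, 4)
ids_for_shard_2 = allocator.allocate(3)  # range(4, 7)
print(list(ids_for_shard_1), list(ids_for_shard_2))
```

Handing out blocks instead of single IDs keeps the number of round trips to the allocator low, at the cost of gaps when a block is not fully used.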
  • 27. Shard - Galera
      - Semi-synchronous replication
      - True multi-master setup
      - Two phase commit
      - Automatic provisioning
      [Diagram: four masters]
  • 28. [Diagram: commit flow between Node 1 and Node 2]
      - Node 1, user interaction: BEGIN, Query 1, Query 2, Query 3, COMMIT
      - Transaction transferred to other nodes
      - Certification: OK or ROLLBACK
      - Transaction applied asynchronously
      - COMMIT result (OK or ROLLBACK)
      - Physical commit
      - Additional time required for commit
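The certification step in the flow on slide 28 is a form of optimistic conflict detection at commit time. The toy model below is a deliberately simplified assumption of that idea, not Galera's actual implementation: a transaction is rolled back if any row it writes was changed by a transaction that committed after its snapshot. The names `CertificationLog`, `certify`, and the row keys are hypothetical.

```python
class CertificationLog:
    """Toy model of commit-time certification over write sets."""

    def __init__(self):
        self.seqno = 0          # sequence number of the last committed transaction
        self.last_writer = {}   # row key -> seqno of the last committed write

    def certify(self, snapshot_seqno, write_set):
        """Return the new seqno on OK, or None when the result is ROLLBACK."""
        for key in write_set:
            # Conflict: somebody committed a change to this row after we started.
            if self.last_writer.get(key, 0) > snapshot_seqno:
                return None
        self.seqno += 1
        for key in write_set:
            self.last_writer[key] = self.seqno
        return self.seqno


log = CertificationLog()
snapshot = log.seqno                       # all three transactions start from the same state
first = log.certify(snapshot, {"ads:1"})   # OK
second = log.certify(snapshot, {"ads:1"})  # same row changed meanwhile: ROLLBACK
third = log.certify(snapshot, {"ads:2"})   # different row: OK
print(first, second, third)  # 1 None 2
```

This is why conflicts surface only at COMMIT in such a setup: the application has to be prepared to retry a transaction that passed every individual query.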
  • 29. Galera: pros and cons
      + HA without manual failover
      + Read scaling
      + Write scaling
      + Automatic resync of failed nodes
      - Conflict detection at commit
      - InnoDB only (who uses MyISAM these days?)
      - Difficult DDL statements (rolling schema upgrade)
      - Maximum transaction size 2 GB
  • 30–33. [Diagram sequence: Shard 1 (master-slave, one master with three slaves) and Shard 2 (Galera); all users migrate to Shard 2, Shard 1 is rebuilt as Galera, some users migrate back]
      1. Prepare an empty shard based on Galera
      2. Migrate all users to that shard
      3. Drop the old shard and use its hardware for a new Galera shard
      4. Move (some) users back to the original shard
  • 34–41. [Diagram sequence: Shard 1 in DC 1 and DC 2, each a master with three slaves, joined by master-master replication; the shard is converted one DC at a time into a Galera cluster with a Galera quorum, and arrows show which DC receives traffic]
      1. Disconnect master-master replication between DCs; traffic goes to DC 1.
      2. Drop the shard at DC 2, recreate it as a Galera cluster.
      3. Reestablish master-master replication, let the Galera cluster catch up with DC 1.
      4. Redirect traffic to DC 2.
      5. Disconnect master-master replication between DCs; traffic goes to DC 2.
      6. Drop the shard at DC 1, attach it to the Galera cluster in DC 2; Galera node provisioning does the rest.
      7. Reattach traffic to DC 1.
  • 44. Backup tools
      • mysqldump
      + Easy to use
      + Can back up only selected databases/tables
      - No data consistency
      - Really slow
      • Percona XtraBackup
      + Online backup of whole tablespace
      + Strictly consistent
      + Only copies data + differential binlog
      - InnoDB only (again, who uses MyISAM nowadays?)
      - You cannot select certain databases/tables
  • 46. A Glimpse into the world of HBase
      Audience: Beginners
      Michal Fizek, Senior developer of Sklik, Seznam.cz
  • 49. HBase - Sklik.cz - real example
      ● 10 million keywords
      ● 120 statistical values per keyword, per day
      ● for a one-year period:
        ○
        ○
      ● hundreds of thousands of users
  • 50. What is HBase?
      ● NoSQL, BigTable (in Java)
      ● KeyValue, ColumnBased (ColumnFamily)
      ● distributed, scalable
      ● fault tolerant
      ● strong consistency (CP)
      ● availability?
      ● petabytes of data
  • 52. Tables and rows
      Tables:
      ● contain rows
      ● defined during design
      ● “readable names”
      Rows:
      ● keys, data
      ● sorted lexicographically
      ● binary data
      [Diagram: row keys User02Key00, User02Key01, User02KeyZZ, UserXYKeyZY, UserXYKeyZZ, each followed by its data]
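Because rows are kept sorted by their binary key, all rows sharing a key prefix sit next to each other and can be read with one sequential scan. The sketch below imitates that with a sorted Python list; the row keys come from the slide, while the helper `prefix_scan` is a hypothetical stand-in for a real scan.

```python
import bisect

# Row keys from the slide, kept in lexicographic (byte) order.
rows = sorted([
    b"User02Key00",
    b"User02Key01",
    b"User02KeyZZ",
    b"UserXYKeyZY",
    b"UserXYKeyZZ",
])


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with `prefix` using one seek plus a sequential read."""
    start = bisect.bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


print(prefix_scan(rows, b"User02"))
# [b'User02Key00', b'User02Key01', b'User02KeyZZ']
```

Byte order is not numeric order: `b"User10"` sorts before `b"User2"`, which is why numeric parts of a key are usually zero-padded to a fixed width.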
  • 53. Columns
      ● qualifier
      ● sparse matrix
      ● key-value
      ● binary names and data
      ● even names can contain data
      ● sorted lexicographically
      [Diagram: row key User02Key00 with columns 2012/01/01 -> data, 2012/01/02 -> data; row key User02Key01 with columns 2009/12/23 -> long data, 2010/12/23 -> data, key 1 -> value, key 2 -> long value]
  • 54. Versions
      ● every cell (column) can contain versioned data
      ● every value is versioned
      ● long integer
      ● again arbitrary values
      ● sorted in descending order
      ● version count can be configured
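The versioning rules on slide 54 (integer versions, newest first, a configurable number of versions kept) can be modelled in memory. `VersionedCell` is a hypothetical class written for this illustration and does not correspond to any HBase client API.

```python
class VersionedCell:
    """One cell that keeps at most `max_versions` values, newest version first."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._versions = []  # list of (version, value), sorted descending by version

    def put(self, version, value):
        # Writing an existing version replaces its value.
        self._versions = [(v, x) for v, x in self._versions if v != version]
        self._versions.append((version, value))
        self._versions.sort(key=lambda item: item[0], reverse=True)
        # Drop everything older than the configured version count.
        del self._versions[self.max_versions:]

    def get(self, max_results=1):
        """Return up to `max_results` (version, value) pairs, newest first."""
        return self._versions[:max_results]


cell = VersionedCell(max_versions=2)
cell.put(100, b"a")
cell.put(300, b"c")
cell.put(200, b"b")
print(cell.get(2))  # [(300, b'c'), (200, b'b')]
```

The version is often a timestamp, but as the slide notes it can be any long integer the application chooses.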
  • 55. Regions
      [Diagram: the sorted row keys (User02Key00, User02Key01, User02KeyZZ, UserXYKeyZY, UserXYKeyZZ, each with its data) are split into regions starting at User00Key00, User00KeyZZ and UserXYKeyZY; the regions are hosted on Regionserver 1 and Regionserver 2]
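A region covers a contiguous range of the sorted key space, so finding the region for a row is a search over region start keys. In the sketch below the start keys are taken from the slide, but which region lives on which region server is an assumption, and `locate_region` is a hypothetical helper.

```python
import bisect

# Region start keys from the slide, in sorted order.
region_starts = [b"User00Key00", b"User00KeyZZ", b"UserXYKeyZY"]
# Assumed placement of the regions; the slide does not state it.
region_servers = ["Regionserver 1", "Regionserver 1", "Regionserver 2"]


def locate_region(row_key):
    """Return (region start key, server) for the region whose range contains row_key."""
    index = bisect.bisect_right(region_starts, row_key) - 1
    if index < 0:
        raise KeyError(f"{row_key!r} sorts before the first region")
    return region_starts[index], region_servers[index]


print(locate_region(b"User02Key01"))  # (b'User00KeyZZ', 'Regionserver 1')
print(locate_region(b"UserXYKeyZZ"))  # (b'UserXYKeyZY', 'Regionserver 2')
```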
  • 56. Column Family
      ● columns grouped to logical “units”
      ● separated physical storage
      ● ColumnFamily based
      ● optimization
      [Diagram: row key User02Key01 with column families Fulltext and Context, holding columns such as keyAA=val1, keyAB=val2, keyAA=val2, keyBB=val2, ... and 2011/12/23=data, 2012/12/23=data, 2011/12/23=data2]
  • 57. Data architecture in HBase
      ● data in tables - sparse matrix
      ● binary data
      ● regions
      ● columns; ColumnFamily
      ● versions
      ● no joins, no foreign keys
      ● variable schema
  • 58. Coprocessors and Filters
      ● Java classes
      ● server side (data locality)
      ● wide optimization possibilities
  • 59. Sequential reading
      ● use sorted rows and columns
      ● can be restricted to column family
      ● filters - almost “endless” optimization possibilities
      ● very fast
  • 60. Other operations
      ● get, scan
      ● put, delete
      ● checkAndPut, checkAndDelete, checkAndMutate
      ● exists, increment, append
      ● batch
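Operations such as checkAndPut and increment are atomic read-modify-write steps on the server. The in-memory model below only illustrates the semantics; `TinyTable` and its snake_case methods are hypothetical and are not the HBase client API.

```python
import threading


class TinyTable:
    """In-memory model of a table with atomic conditional operations."""

    def __init__(self):
        self._cells = {}  # (row, column) -> value
        self._lock = threading.Lock()

    def put(self, row, column, value):
        with self._lock:
            self._cells[(row, column)] = value

    def get(self, row, column):
        with self._lock:
            return self._cells.get((row, column))

    def check_and_put(self, row, column, expected, value):
        """Write `value` only if the cell currently equals `expected` (None = absent)."""
        with self._lock:
            if self._cells.get((row, column)) != expected:
                return False
            self._cells[(row, column)] = value
            return True

    def increment(self, row, column, amount=1):
        """Atomically add `amount` to a counter cell and return the new value."""
        with self._lock:
            new_value = self._cells.get((row, column), 0) + amount
            self._cells[(row, column)] = new_value
            return new_value


table = TinyTable()
print(table.check_and_put("User02Key00", "state", None, "new"))    # True: cell was absent
print(table.check_and_put("User02Key00", "state", "old", "done"))  # False: value is "new"
print(table.increment("User02Key00", "clicks"))                    # 1
```

With such primitives, concurrent writers can coordinate on a single row without a client-side transaction.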
  • 62. HBase use case
      Properties:
      ● lexicographically sorted rows and columns
      ● binary data and keys
      ● variable schema
      ● coprocessors
      ● sharded data
  • 63. HBase use case
      Pros:
      ● sequential reading
      ● variable schema
      ● data divided into collections
      ● a really large amount of data
      Cons:
      ● transactional processing
      ● not enough HW
      ● variable queries
      ● random writes, a lot of updates (or deletes)
  • 64. RDBMS vs HBase schema
      ● description of entities and their relationships vs. query-first design
      ● data normalization vs. duplicated information
      ● key design emphasis
      ● clustering
  • 65. HBase in Seznam.cz
      ● Fulltext
        ○
        ○
        ○
        ○
      ● Sklik
        ○
        ○
        ○
        ○
  • 66. Next time: ca 3/2018. Call for papers is open!