In this initial meetup of the MySQL, MariaDB, MongoDB talks group (https://www.meetup.com/MySQL-MariaDB-and-MongoDB-talks-CZ/), we introduced our company, our data lake, and the technologies we use.
We also discussed MySQL from a scalability point of view, covering replication, cross-DC multi-master setups, Galera cluster, and backups, and described how we migrated from a master-slave setup to Galera cluster.
The next talk was about HBase, its architecture, and its use cases.
MySQL Meetup Prague - Modern Data Lake
1. MySQL / MongoDB Meetup
October 3, 2017, Prague
Agenda
- Introduction
- About Seznam and Sklik.cz from DB point of view
- Architecture and scaling of MySQL
- A glimpse into the world of HBase
- MongoDB from the DBA point of view (cancelled, sorry)
Next time (ca. March 2018)
- Call for papers is open!
10. Architecture and scaling of MySQL
Audience: Beginners
Michal Kuchta
Senior developer of Sklik, Seznam.cz
11. Common setup
• LAMP server
• Linux, Apache, MySQL, PHP
• Most common usage
• Everything on single machine
+ Easy to maintain
+ Cheap
- SPOF (single point of failure)
- Poor performance under high load
- I/O scheduling contention
- Memory split between application and DB
[Diagram: a single LAMP server]
12. Brute force scaling
• Split database and application
• One machine for all database operations
+ Database on its own dedicated hardware
+ Dedicated resources
+ Better optimization possibilities
- Another server to maintain
- Still SPOF
[Diagram: application server and a dedicated MySQL server]
13. Brute force scaling
• Dedicated database server
• 128 – 256 GB RAM
• SSD drives
■ Does anyone run MySQL on HDDs today?
• A lot of memory for InnoDB buffer pool - database runs from RAM
■ Does anyone use MyISAM today?
• Price? Around $19,000 per machine
14. Horizontal scaling
[Diagram: one master replicating to three slaves]
• Master – Slave replication
• Writes go to the master
• Reads go to the slaves
• Good if you have a high read load
• Statement-based vs. row-based binlog entries
+ Better performance for selects
(read scale-out)
+ Hot backup
+ Intentional replication delay possible (delayed slave)
- Does not scale writes
- Replication lag (asynchronous)
- Replication tends to break occasionally
- Needs manual failover, master is SPOF
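A minimal sketch of application-level read/write splitting under this topology, assuming the mysql-connector-python client; the host names, credentials, and schema are illustrative, not from the talk:

```python
# Read/write splitting sketch for a master-slave setup.
# Hosts and credentials are hypothetical placeholders.
import random
import mysql.connector

MASTER = {"host": "db-master", "user": "app", "password": "secret", "database": "sklik"}
SLAVES = [dict(MASTER, host=h) for h in ("db-slave1", "db-slave2", "db-slave3")]

def run_write(sql, params=()):
    # All writes go to the master.
    conn = mysql.connector.connect(**MASTER)
    try:
        cur = conn.cursor()
        cur.execute(sql, params)
        conn.commit()
    finally:
        conn.close()

def run_read(sql, params=()):
    # Reads are spread across the slaves; beware of replication lag --
    # a row written a moment ago may not be visible here yet.
    conn = mysql.connector.connect(**random.choice(SLAVES))
    try:
        cur = conn.cursor()
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        conn.close()
```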
15. Master – Master replication
[Diagram: DC 1 and DC 2, each with one master and three slaves; the masters replicate to each other]
• Introduce second DC
+ Geographical fault tolerance
+ “hot” backup
+ Maintenance in one DC does not affect traffic.
- Still only one “active” master
- Where is the master?
- Cross-DC lag
16. Scaling of writes - sharding
• Shard
• Same structure
• Different subset of data
• Horizontal scaling of writes
• Two approaches: Multitenancy routing, colocation routing
[Diagram: three shards, each with one master and three slaves]
17. Shard manager
We have to solve the routing of application requests to the correct shard.
18. [Diagram: the application asks the shard manager “Where is John’s data?”]
19. [Diagram: the shard manager answers “User John is on shard 2”]
20. [Diagram: the request is routed to shard 2]
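A minimal sketch of what such a shard-manager lookup could look like; the talk does not show code, so the directory table, hosts, and names below are hypothetical:

```python
# Hypothetical shard-manager lookup: a directory table maps each user
# to the shard that holds their data.
import mysql.connector

SHARDS = {
    1: {"host": "shard1-master", "user": "app", "password": "secret", "database": "sklik"},
    2: {"host": "shard2-master", "user": "app", "password": "secret", "database": "sklik"},
    3: {"host": "shard3-master", "user": "app", "password": "secret", "database": "sklik"},
}
SHARD_MANAGER = {"host": "shard-manager", "user": "app", "password": "secret",
                 "database": "routing"}

def shard_for_user(username):
    # Ask the shard manager which shard holds this user's data.
    conn = mysql.connector.connect(**SHARD_MANAGER)
    try:
        cur = conn.cursor()
        cur.execute("SELECT shard_id FROM user_shard WHERE username = %s", (username,))
        row = cur.fetchone()
        if row is None:
            raise KeyError(f"unknown user {username!r}")
        return row[0]
    finally:
        conn.close()

def connect_for_user(username):
    # Route the application request to the correct shard.
    return mysql.connector.connect(**SHARDS[shard_for_user(username)])
```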
21. Cross-shard relations
For example, a messaging center:
- Each message has a sender
- Each message has a recipient
- Potentially, each user is on a different shard
22. Cross-shard relations
Possible solution: Duplicate data
+ Good solution for static data (enums)
- Difficult to maintain consistency in case of updates
(solved at the application level)
23. Cross-shard relations
Possible solution: Common data in a separate database
+ Only one instance of the data
+ No consistency problems
- Potentially less performant; for this data we are back to the
single-database solution
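As a sketch of this approach (table and host names are hypothetical): the message row stays on a shard, while user names come from the common database, so no cross-shard join is needed:

```python
# "Common DB" sketch: shared reference data lives in one database,
# user-specific rows stay on their shard. All names are illustrative.
import mysql.connector

COMMON_DB = {"host": "common-db", "user": "app", "password": "secret",
             "database": "common"}

def fetch_message(message_id, shard_params):
    # The message row lives on the sender's shard...
    shard = mysql.connector.connect(**shard_params)
    cur = shard.cursor(dictionary=True)
    cur.execute("SELECT sender_id, recipient_id, body FROM message WHERE id = %s",
                (message_id,))
    msg = cur.fetchone()
    shard.close()

    # ...while display names are resolved from the common database.
    common = mysql.connector.connect(**COMMON_DB)
    cur = common.cursor()
    cur.execute("SELECT id, name FROM user WHERE id IN (%s, %s)",
                (msg["sender_id"], msg["recipient_id"]))
    names = dict(cur.fetchall())
    common.close()
    return {**msg, "sender": names[msg["sender_id"]],
            "recipient": names[msg["recipient_id"]]}
```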
[Diagram: three shards plus a separate common DB]
24. We use both solutions in Sklik; each is good for a different subset of the data.
25. Summary
+ Almost unlimited horizontal scaling
+ Good for high-load applications
- Bad for analytical queries across all shards
- Common data problem
- Routing on application level
- A lot of components to monitor and maintain
26. Balancing of shard load
- You can add another shard
- You can move data between shards
Problem: PK collisions
- No AUTO_INCREMENT
- ID allocation must be handled at the application level
- We use the shard manager to assign IDs (a sketch follows below)
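The talk does not detail the allocation mechanism, so this is only one plausible sketch: a sequence table in the shard manager's database, incremented atomically with MySQL's LAST_INSERT_ID(expr) trick:

```python
# Hypothetical ID allocation through the shard manager: a single
# counter row is incremented atomically, so IDs are unique across shards.
import mysql.connector

SHARD_MANAGER = {"host": "shard-manager", "user": "app", "password": "secret",
                 "database": "routing"}

def allocate_ids(entity, count=1):
    # Reserve `count` consecutive IDs for `entity` (e.g. "campaign").
    conn = mysql.connector.connect(**SHARD_MANAGER)
    try:
        cur = conn.cursor()
        # Atomic increment; LAST_INSERT_ID(expr) makes the new value
        # readable in the same session without a race.
        cur.execute(
            "UPDATE id_sequence SET next_id = LAST_INSERT_ID(next_id + %s) "
            "WHERE entity = %s", (count, entity))
        conn.commit()
        cur.execute("SELECT LAST_INSERT_ID()")
        last = cur.fetchone()[0]
        return range(last - count + 1, last + 1)
    finally:
        conn.close()
```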
28. [Diagram: Galera commit flow between Node 1 and Node 2 – the user runs BEGIN, Query 1–3, and COMMIT on Node 1; the transaction is transferred to the other nodes and certified (OK or ROLLBACK); the COMMIT result is returned to the user, while the transaction is applied asynchronously on Node 2 (physical commit). Certification adds additional time to each commit.]
29. Galera cluster – pros and cons
+ HA without manual failover
+ Read scaling
+ Write scaling
+ Automatic resync of failed nodes
- Conflict detection at commit
- InnoDB only (who uses MyISAM these days?)
- Difficult DDL statements (rolling schema upgrade)
- Maximum transaction size 2GB
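Because conflicts are detected only at commit time, the application must be ready to retry. A minimal sketch, assuming mysql-connector-python and relying on the fact that Galera reports certification failures to the client as a deadlock error (errno 1213):

```python
# Retry loop for Galera certification conflicts, which the server
# reports like a deadlock (MySQL errno 1213) at COMMIT time.
import mysql.connector
from mysql.connector import errorcode

def run_transaction(conn_params, statements, retries=3):
    for attempt in range(retries):
        conn = mysql.connector.connect(**conn_params)
        try:
            cur = conn.cursor()
            for sql, params in statements:
                cur.execute(sql, params)
            conn.commit()          # certification happens here
            return
        except mysql.connector.DatabaseError as exc:
            conn.rollback()
            if exc.errno != errorcode.ER_LOCK_DEADLOCK or attempt == retries - 1:
                raise              # not a certification conflict, or out of retries
        finally:
            conn.close()
```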
30. Migration from master-slave to Galera
[Diagram: Shard 1 – master-slave (one master, three slaves); Shard 2 – already Galera]
1. Prepare an empty shard based on Galera
2. Migrate all users to that shard
3. Drop the old shard and use its hardware for a new Galera shard
4. Move (some) users back to the original shard
31. [Diagram: step 2 – all users migrated from Shard 1 (master-slave) to the Galera-based Shard 2]
32. [Diagram: step 3 – both shards now run Galera]
33. [Diagram: step 4 – some users migrated back to Shard 1]
34. Migrating a cross-DC master-master shard to Galera
[Diagram: Shard 1 in DC 1 and in DC 2, each with one master and three slaves; the masters replicate master-master; traffic is served in both DCs]
1. Disconnect master-master replication between the DCs; traffic goes to DC 1.
2. Drop the shard at DC 2 and recreate it as a Galera cluster.
3. Reestablish master-master replication and let the Galera cluster catch up with DC 1.
4. Redirect traffic to DC 2.
5. Disconnect master-master replication between the DCs; traffic goes to DC 2.
6. Drop the shard at DC 1 and attach its nodes to the Galera cluster in DC 2; Galera node provisioning does the rest.
7. Reattach traffic to DC 1.
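As an illustration of step 1 (a sketch only; the exact procedure used in the talk is not shown), classic asynchronous replication between the DCs can be stopped with STOP SLAVE / RESET SLAVE ALL on each side:

```python
# Hypothetical step 1: disconnect master-master replication between DCs.
# Host names are illustrative; assumes classic asynchronous replication.
import mysql.connector

def disconnect_replication(host_params):
    conn = mysql.connector.connect(**host_params)
    cur = conn.cursor()
    cur.execute("STOP SLAVE")        # stop applying events from the other DC
    cur.execute("RESET SLAVE ALL")   # forget the replication configuration
    conn.close()

for dc_master in ({"host": "dc1-master", "user": "repl_admin", "password": "secret"},
                  {"host": "dc2-master", "user": "repl_admin", "password": "secret"}):
    disconnect_replication(dc_master)
```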
35. [Diagram: unchanged state; step 1 highlighted – disconnect master-master replication, traffic goes to DC 1]
36. [Diagram: step 2 – DC 2 rebuilt as a Galera quorum; traffic goes to DC 1]
37. [Diagram: step 3 – master-master replication reestablished between the DC 1 master and the Galera cluster; traffic still in DC 1]
38. [Diagram: step 4 – traffic now also served by the Galera cluster in DC 2]
39. [Diagram: step 5 – replication disconnected; traffic goes to DC 2]
40. [Diagram: step 6 – DC 1 hardware attached to the Galera cluster in DC 2; traffic in DC 2]
41. [Diagram: step 7 – one Galera cluster spanning both DCs; traffic served in both]
44. Backup: mysqldump vs. Percona XtraBackup
• mysqldump
+ Easy to use
+ Can back up only selected databases/tables
- No data consistency (without table locks or --single-transaction)
- Really slow
• Percona XtraBackup
+ Online backup of the whole tablespace
+ Strictly consistent
+ Only copies data + differential binlog
- InnoDB only (again, who uses MyISAM nowadays?)
- You cannot select certain databases/tables.
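A small sketch of driving both tools from a script; paths and database names are placeholders, and the flags shown (--single-transaction for mysqldump, --backup/--target-dir for xtrabackup) are the standard ones:

```python
# Sketch: invoking both backup tools from Python (paths are placeholders).
import subprocess

def dump_databases(databases, outfile):
    # mysqldump: easy, per-database; consistent for InnoDB only when
    # --single-transaction is used; slow for large datasets.
    with open(outfile, "wb") as out:
        subprocess.run(
            ["mysqldump", "--single-transaction", "--databases", *databases],
            stdout=out, check=True)

def xtrabackup_full(target_dir):
    # Percona XtraBackup: online, consistent copy of the whole tablespace.
    subprocess.run(
        ["xtrabackup", "--backup", f"--target-dir={target_dir}"],
        check=True)
```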
49. HBase – Sklik.cz – a real example
● 10 million keywords
● 120 statistical values per keyword, per day
● for a one-year period: 10M × 120 × 365 ≈ 440 billion values
● hundreds of thousands of users
50. What is HBase?
● NoSQL, a BigTable implementation (in Java)
● key-value, column-based (ColumnFamily)
● distributed, scalable
● fault tolerant
● strong consistency (CP)
● availability?
● petabytes of data
52. Tables and rows
Tables:
● contain rows
● defined during design
● “readable names”
Rows:
● keys, data
● sorted lexicographically
● binary data
[Diagram: rowkeys User02Key00, User02Key01, …, User02KeyZZ, …, UserXYKeyZY, UserXYKeyZZ in sorted order, each pointing to its data]
53. Columns
● qualifier
● sparse matrix
● key-value
● binary names and data
● even names can contain data
● sorted lexicographically
Example rows:
User02Key00: 2012/01/01 -> data, 2012/01/02 -> data
User02Key01: 2009/12/23 -> long data, 2010/12/23 -> data, key 1 -> value, key 2 -> long value
54. Versions
● every cell (column) can contain versioned data
● every value is versioned
● long integer (by default the write timestamp)
● again, arbitrary values
● sorted in descending order
● version count can be configured
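A sketch using the happybase Python client (assuming an HBase Thrift gateway; host, table, and column names are illustrative, not from the talk):

```python
# Versioned cells with happybase; "hbase-host" and the table are placeholders.
import happybase

conn = happybase.Connection("hbase-host")
table = conn.table("keyword_stats")

# Each put creates a new version of the cell; the version is a long,
# by default the write timestamp.
table.put(b"User02Key00", {b"stats:clicks": b"41"})
table.put(b"User02Key00", {b"stats:clicks": b"42"})

# Versions are returned newest first (descending order).
for value, ts in table.cells(b"User02Key00", b"stats:clicks",
                             versions=3, include_timestamp=True):
    print(ts, value)
```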
56. Column Family
● columns grouped into logical “units”
● separated physical storage
● ColumnFamily-based
● optimization
[Table: rowkey User02Key01 with two column families – Fulltext (keyAA=val1, keyAB=val2; 2011/12/23=data, 2012/12/23=data) and Context (keyAA=val2, keyBB=val2, …; 2011/12/23=data2)]
57. Data architecture in HBase
● data in tables – sparse matrix
● binary data
● regions
● columns; ColumnFamily
● versions
● no joins, no foreign keys
● variable schema
59. Sequential reading
● uses sorted rows and columns
● can be restricted to a column family
● filters – almost “endless” optimization possibilities
● very fast
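What sequential reading looks like through happybase (a sketch; the "User…Key…" rowkey layout follows the earlier slides, everything else is illustrative):

```python
# Sequential (range) scan with happybase; host and table are placeholders.
import happybase

conn = happybase.Connection("hbase-host")
table = conn.table("keyword_stats")

# Because rows are sorted lexicographically, all keywords of one user
# form a contiguous range that a single scan can read very quickly.
for rowkey, data in table.scan(row_start=b"User02",
                               row_stop=b"User03",
                               columns=[b"stats"]):   # restrict to one CF
    print(rowkey, data)
```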
62. HBase use case
Properties:
● lexicographically sorted rows and columns
● binary data and keys
● variable schema
● coprocessors
● sharded data
63. HBase use case
Pros:
● sequential reading
● variable schema
● data divided into collections
● really large amounts of data
Cons:
● transactional processing
● not enough HW
● variable queries
● random writes, a lot of updates (or deletes)
64. RDBMS vs. HBase schema
● entity-and-relationship description vs. query-first design
● data normalization vs. duplicated information
● emphasis on key design
● clustering
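To illustrate query-first key design (a hypothetical layout, not the one used in Sklik): compose the rowkey so that the dominant query maps to one sequential scan:

```python
# Query-first rowkey sketch: the key is composed so the most common query
# ("all stats for one user's keyword over a date range") becomes a single
# sequential scan. The field layout is illustrative.
def make_rowkey(user_id: int, keyword: str, day: str) -> bytes:
    # Zero-padding the user id keeps lexicographic order equal to numeric order.
    return f"{user_id:010d}|{keyword}|{day}".encode()

# All days for one user+keyword are contiguous, so a scan between these
# two keys reads exactly one year of that keyword's statistics.
start = make_rowkey(42, "mysql", "2017-01-01")
stop  = make_rowkey(42, "mysql", "2018-01-01")
```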