3. Agenda
• Need for HA
• Principle of Distributed computing.
• MySQL HA Solutions Available.
• Introduction to Galera.
• Percona Cluster.
• Node Recovery.
• Backup and Recovery.
• Load Balancers.
• Things to be considered.
4. Need For HA
• Site Reliability.
• Ensure Uptime
• Failover.
• Disaster Recovery.
• Scheduled / Unscheduled downtime.
• Avoid Single Points of Failure.
5. Principle of Distributed Computing
CAP Theorem
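Consistency, Availability and Partition tolerance
A distributed system can guarantee at most two of the three. Since
network partitions cannot be ruled out, the real choice during a
partition is between consistency and availability.
AP – MySQL Replication (asynchronous)
CP – Galera Cluster (a partitioned minority stops serving queries)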
7. MySQL HA Solutions Available
• Master – Slave
• Master – Master
• Single Writer
• NDB Cluster ( Oracle MySQL )
• Galera Cluster
• Tungsten Replicator
• Storage Level Replication (DRBD)
8. Introduction to Galera
• Developed by Codership
• Synchronous Replication
• Parallel Replication
• Multi-Threaded
• Automated node recovery
• Zero slave lag
• Read/ Write Scalable
• WAN optimizations (Galera 3.0)
• Fully open source.
• Supports InnoDB and TokuDB (MyISAM experimental)
9. Introduction to Galera
What is Galera?
Galera is a replication library for synchronous multi-master
replication, used to achieve HA.
The wsrep_provider_options variable configures the library (e.g. gcache.size).
Available Distributions
• Percona XtraDB Cluster
• MariaDB Cluster
• Codership MySQL (wsrep-patched MySQL with the Galera plugin)
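wsrep_provider_options is set as a single string of key=value pairs separated by semicolons. A minimal sketch of reading and rewriting such a string, assuming that format. gcache.size and gcs.fc_limit are real Galera options. The helper names are hypothetical, and this does not talk to a server.

```python
def parse_provider_options(opts: str) -> dict[str, str]:
    """Split a "key=value; key=value" options string into a dict."""
    options: dict[str, str] = {}
    for item in opts.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';' or empty segments
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed option: {item!r}")
        options[key.strip()] = value.strip()
    return options


def render_provider_options(options: dict[str, str]) -> str:
    """Join a dict back into the "key=value; key=value" form."""
    return "; ".join(f"{key}={value}" for key, value in options.items())


current = parse_provider_options("gcache.size=1G; gcs.fc_limit=16")
current["gcache.size"] = "2G"  # e.g. enlarge gcache so IST covers longer downtime
print(render_provider_options(current))  # gcache.size=2G; gcs.fc_limit=16
```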
10. Introduction to Galera
• Shared nothing Architecture.
• Network is the heart.
What is wsrep? (Write Set REPlication)
It is an API between the database server and a replication provider
such as the Galera library, and it exposes settings that control the
provider's behavior. It makes synchronous, certification-based
multi-master replication possible.
13. Percona Cluster
Why Percona Server ?
• Enhanced InnoDB (XtraDB )
• Performance Improvement
• XtraBackup (xtrabackup-v2 SST method) makes SST faster and less disruptive for the donor.
• Additional bug fixes.
• Better scalability than stock MySQL.
• Stays aligned with Oracle MySQL while adding instrumentation and
performance patches.
14. Transaction in Galera
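• Transactions are handled by the Galera plugin.
• Does not use traditional two-phase commit across nodes; commit is decided by certification.
• Locking is optimistic: rows are locked only on the local node.
• Conflicts between nodes are detected at commit time.
• Commit succeeds only if the write set passes certification (a conflict check on row keys); the losing transaction is rolled back with a deadlock error.
• Smaller transactions are better: they replicate faster and conflict less often.
• Higher network latency increases commit time.
• Supports InnoDB and TokuDB. MyISAM is still experimental.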
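Certification-based commit can be illustrated with a toy model. This is a minimal single-process sketch, assuming each write set carries the keys of the rows it modified and the last global seqno it had seen when it started. Real Galera runs the same deterministic check on every node in total order and tracks more than this. All class and field names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteSet:
    trx_id: str
    keys: frozenset[str]  # keys of rows the transaction modified, e.g. "accounts:1"
    depends_on: int       # last global seqno committed when the transaction started


@dataclass
class Certifier:
    seqno: int = 0  # global sequence number of the last certified write set
    last_modified: dict[str, int] = field(default_factory=dict)  # key -> seqno

    def certify(self, ws: WriteSet) -> bool:
        """Commit ws unless another transaction modified one of its keys after ws.depends_on."""
        if any(self.last_modified.get(key, 0) > ws.depends_on for key in ws.keys):
            return False  # conflict: this transaction is rolled back
        self.seqno += 1
        for key in ws.keys:
            self.last_modified[key] = self.seqno
        return True


# Two nodes update the same row concurrently; both started from seqno 0.
certifier = Certifier()
print(certifier.certify(WriteSet("trx on node1", frozenset({"accounts:1"}), 0)))  # True
print(certifier.certify(WriteSet("trx on node2", frozenset({"accounts:1"}), 0)))  # False
```

No locks are taken across nodes; the second transaction only learns it lost when it tries to commit, which is why smaller transactions conflict less.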
16. Transaction in Galera
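• Virtually synchronous replication.
• wsrep_causal_reads=ON makes reads wait until the node has applied all earlier write sets (truly synchronous reads; superseded by wsrep_sync_wait in later versions).
• auto_increment_increment and auto_increment_offset are managed by the cluster (wsrep_auto_increment_control=ON).
• Each transaction gets a Galera GTID (cluster UUID:seqno), not the MySQL 5.6 GTID.
• Transaction latency increases as nodes are added.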
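Cluster-managed auto-increment works because each node generates values of the form offset + N × increment, with increment set to the cluster size and a different offset per node. A minimal sketch of that arithmetic, assuming a 3-node cluster; next_ids is a hypothetical helper, not a server API.

```python
def next_ids(offset: int, increment: int, count: int, current_max: int = 0) -> list[int]:
    """Next `count` values of the form offset + k * increment that exceed current_max."""
    # Smallest k >= 0 such that offset + k * increment > current_max.
    k = max(0, (current_max - offset) // increment + 1)
    return [offset + (k + i) * increment for i in range(count)]


# auto_increment_increment = 3 (cluster size), auto_increment_offset = 1, 2, 3.
for node_offset in (1, 2, 3):
    print(node_offset, next_ids(node_offset, 3, 3))
# 1 [1, 4, 7]
# 2 [2, 5, 8]
# 3 [3, 6, 9]
```

Each node draws from a disjoint sequence, so concurrent inserts on different nodes never collide on the primary key.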
17. Galera Ports
Galera needs several ports in addition to the standard MySQL port
for successful operation:
• 3306: Standard MySQL port
• 4567: Group Communication
• 4568: IST
• 4444: SST
Firewall rules should open these ports between all cluster nodes.
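A minimal sketch that generates iptables rules for the ports listed above, assuming TCP only and a known list of peer addresses. The peer IPs are hypothetical. Port 3306 usually also needs to be reachable from application servers, which this sketch does not cover.

```python
GALERA_PORTS = {
    3306: "MySQL client connections",
    4444: "SST",
    4567: "group communication",
    4568: "IST",
}


def iptables_rules(peers: list[str]) -> list[str]:
    """One TCP ACCEPT rule per (peer, Galera port) pair."""
    return [
        f"iptables -A INPUT -p tcp -s {peer} --dport {port} -j ACCEPT"
        for peer in peers
        for port in sorted(GALERA_PORTS)
    ]


for rule in iptables_rules(["10.0.0.2", "10.0.0.3"]):  # hypothetical peer IPs
    print(rule)
```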
18. Node Recovery
• Node recovery is automated.
• Reads the node's saved state (grastate.dat) and checks whether the donor's gcache still holds the missing write sets.
• Chooses the state transfer method:
1) IST (Incremental State Transfer)
2) SST (State Snapshot Transfer )
19. Node Recovery
IST :
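• Replays the missing write sets from the donor's gcache (a ring buffer, memory-mapped to a file).
• Faster recovery method.
• Size gcache (gcache.size) large enough to cover the expected node downtime.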
• Does affect the node state.
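A rough gcache sizing estimate is write-set throughput multiplied by the downtime IST should survive. A minimal sketch, assuming the throughput is measured by sampling wsrep_replicated_bytes + wsrep_received_bytes twice during a busy period. The safety factor and the sample numbers are hypothetical.

```python
def gcache_bytes_needed(bytes_before: int, bytes_after: int, interval_s: float,
                        downtime_s: float, safety: float = 1.5) -> int:
    """Estimate the gcache.size needed for IST after `downtime_s` seconds offline.

    bytes_before / bytes_after: wsrep_replicated_bytes + wsrep_received_bytes,
    sampled `interval_s` seconds apart on a node during a busy period.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rate = (bytes_after - bytes_before) / interval_s  # write-set bytes per second
    return int(rate * downtime_s * safety)           # safety is an assumed headroom factor


MIB = 1024 * 1024
# 60 MiB of write sets in 60 s (1 MiB/s); tolerate 1 hour of downtime.
needed = gcache_bytes_needed(0, 60 * MIB, 60, 3600)
print(needed // MIB, "MiB")  # 5400 MiB
```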
20. Node Recovery
SST :
State Snapshot Transfer is a complete transfer (clone) of the
data from a donor to recreate a node. It is used:
• When the missing write sets are no longer in the donor's gcache (see wsrep_local_cached_downto)
• When adding a new node
Different methods of SST (wsrep_sst_method):
xtrabackup-v2 (best), rsync, mysqldump.
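The IST-or-SST choice comes down to whether the donor's gcache reaches back far enough. A minimal sketch of that decision, assuming the joiner's UUID and seqno come from its grastate.dat (seqno -1 when unknown) and the donor reports wsrep_local_cached_downto. Real Galera also handles donor selection and other cases; choose_state_transfer is a hypothetical helper.

```python
def choose_state_transfer(joiner_uuid: str, joiner_seqno: int,
                          cluster_uuid: str, donor_cached_downto: int) -> str:
    """Return "IST" if the donor's gcache still holds every write set the joiner misses."""
    if joiner_seqno < 0 or joiner_uuid != cluster_uuid:
        return "SST"  # brand-new node, unknown position, or a different cluster history
    first_missing = joiner_seqno + 1
    # The gcache holds write sets from donor_cached_downto onward.
    return "IST" if donor_cached_downto <= first_missing else "SST"


CLUSTER = "a1b2c3d4-0000-0000-0000-000000000000"  # hypothetical cluster UUID
print(choose_state_transfer(CLUSTER, 1000, CLUSTER, 900))   # IST
print(choose_state_transfer(CLUSTER, 1000, CLUSTER, 5000))  # SST
```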
21. Node Recovery
SST :
• SST puts the donor node into the Donor/Desynced state.
• Donor selection is automatic.
• A specific donor can be forced with wsrep_sst_donor.
Hack:
SST can be avoided during node recovery by restoring full and
incremental hot backups first, so only IST is needed.
Blog by Jay Janssen on bypassing SST.
22. Node Recovery
Validate the node after recovery; it should show:
wsrep_local_state = 4
wsrep_local_state_comment = Synced
Possible states:
1) Joining
2) Donor/desynced
3) Joined
4) Synced
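23. wsrep_local_state_comment
1) Joining: initial state of a node joining the cluster
2) Donor/Desynced: node is serving a state transfer or has been desynced, and may lag behind the cluster
3) Joined: state received; the node is catching up on the remaining queued write sets
4) Synced: in sync with all nodes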
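A post-recovery check can combine wsrep_local_state with two other real status variables, wsrep_cluster_status (should be Primary) and wsrep_ready (should be ON). A minimal sketch, assuming the output of SHOW STATUS LIKE 'wsrep_%' has already been fetched into a dict of strings; how it is fetched is left out, and node_status is a hypothetical helper.

```python
WSREP_STATES = {1: "Joining", 2: "Donor/Desynced", 3: "Joined", 4: "Synced"}


def node_status(status: dict[str, str]) -> tuple[str, bool]:
    """Return (state name, healthy?) from wsrep status variables."""
    state = int(status.get("wsrep_local_state", "0"))
    name = WSREP_STATES.get(state, "Unknown")
    healthy = (
        state == 4
        and status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_ready") == "ON"
    )
    return name, healthy


print(node_status({"wsrep_local_state": "4",
                   "wsrep_cluster_status": "Primary",
                   "wsrep_ready": "ON"}))  # ('Synced', True)
```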
24. Load Balancers
• HAProxy
• Pen Proxy
• Galera Load Balancer ( based on Pen )
• MaxScale
• ProxySQL
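Note:
Make sure the load balancer is aware of the Galera node state (e.g. a health check such as clustercheck that passes only on Synced nodes).
Use an appropriate load-balancing algorithm (e.g. send all writes to a single node to reduce certification conflicts).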
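The routing policy a Galera-aware load balancer applies can be modeled in a few lines. A minimal sketch, assuming node states are already known from health checks: only Synced nodes (wsrep_local_state = 4) receive traffic, the first healthy node in a priority list is the single writer, and reads are spread round-robin. Node names and the route helper are hypothetical; in practice this logic lives in HAProxy, ProxySQL, or MaxScale configuration.

```python
from itertools import cycle
from typing import Iterator


def route(states: dict[str, int], priority: list[str]) -> tuple[str, Iterator[str]]:
    """Return (writer, round-robin reader iterator) over Synced nodes only."""
    synced = [node for node in priority if states.get(node) == 4]
    if not synced:
        raise RuntimeError("no Synced node available")
    writer = synced[0]       # single writer: first healthy node in priority order
    readers = cycle(synced)  # round-robin reads across all Synced nodes
    return writer, readers


# db1 is acting as an SST donor (state 2), so it is skipped.
writer, readers = route({"db1": 2, "db2": 4, "db3": 4}, ["db1", "db2", "db3"])
print(writer)                             # db2
print([next(readers) for _ in range(3)])  # ['db2', 'db3', 'db2']
```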
25. Things to be considered
• Only transactional engines are supported.
• Row-based replication (binlog_format=ROW).
• Read committed isolation.
• innodb_autoinc_lock_mode=2.
• Avoid huge transactions.
• wsrep_max_ws_rows (128K rows)
• wsrep_max_ws_size (1G)
• Network is the heart.
• Keep the DB design simple
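One common way to stay under wsrep_max_ws_rows and wsrep_max_ws_size is to split a large change into many small transactions. A minimal sketch, assuming a hypothetical table audit_log with a dense integer primary key id, so each range touches at most batch_size rows; each generated statement would run as its own autocommit transaction.

```python
def batch_ranges(min_id: int, max_id: int, batch_size: int) -> list[tuple[int, int]]:
    """Split [min_id, max_id] into inclusive ranges of at most batch_size ids."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ranges = []
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        ranges.append((start, end))
        start = end + 1
    return ranges


def batched_deletes(table: str, min_id: int, max_id: int, batch_size: int) -> list[str]:
    """One small DELETE per range instead of one huge write set."""
    return [f"DELETE FROM {table} WHERE id BETWEEN {lo} AND {hi};"
            for lo, hi in batch_ranges(min_id, max_id, batch_size)]


for stmt in batched_deletes("audit_log", 1, 25_000, 10_000):
    print(stmt)
```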
26. Things to be considered
• Foreign keys can cause errors (known bugs).
• Maintain quorum.
• Check for errors on COMMIT in the application (certification failures are reported at commit time).
• Keep an odd number of nodes.
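The reason for an odd node count is the majority rule: after a failure or partition, a component stays Primary only if it holds more than half of the total weight. A minimal sketch, assuming every node has the default weight of 1 (Galera's pc.weight option) and that nodes fail or are partitioned rather than leaving gracefully. The helper names are hypothetical.

```python
def has_quorum(surviving_weights: list[int], all_weights: list[int]) -> bool:
    """True if the surviving component holds a strict majority of the total weight."""
    return 2 * sum(surviving_weights) > sum(all_weights)


def max_node_failures(n: int) -> int:
    """Nodes that can fail while the rest keep quorum (equal weights)."""
    return (n - 1) // 2


print(has_quorum([1, 1], [1, 1, 1]))     # True: 2 of 3 survive a 2/1 split
print(has_quorum([1, 1], [1, 1, 1, 1]))  # False: a 2/2 split leaves no majority
print([max_node_failures(n) for n in (3, 4, 5)])  # [1, 1, 2]
```

A fourth node adds cost without tolerating more failures than three, and an even split leaves both halves non-Primary.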