1. Building 2TB Highly Available
MySQL Database
Alex Gorbachev
Insight-Out Database Symposium
Tokyo, 2011
2. Alex Gorbachev
• CTO, The Pythian Group
• Blogger
• OakTable Network member
• Oracle ACE Director
• BattleAgainstAnyGuess.com
• President, Oracle RAC SIG
2 © 2009/2010 Pythian
3. Why Companies Trust Pythian
• Recognized Leader:
• Global industry leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server
• Work with over 150 multinational companies such as Western Union, Fox Interactive Media, and MDS Inc. to help manage their complex IT deployments
• Expertise:
• One of the world’s largest concentrations of dedicated, full-time DBA expertise.
• Global Reach & Scalability:
• 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response
4. Agenda
• Migration
• Schema, data, application code
• HA infrastructure
• Options available
• Implemented - Heartbeat cold failover cluster
• Acceptance testing
• How we simulated failures
• DR setup & backups
• Replication between two data-centers
• 2 TB on MySQL - that’s not a simple e-commerce web-site
5. Project profile
• Document management solution
• Archival & retrieval
• Web front-end
• Critical availability requirements
• 1 TB 2 years ago, grown to 2+ TB by now
6. Migration from Oracle RDB
• MySQL Migration Toolkit
• RDB has a package to connect via Oracle TNS
• Java
• Create and review schema
• Pump the data (1TB)
7. Schema conversion
• Integer size mismatches - smallint, mediumint, decimal(10,2), etc.
• DATE VMS => DATE or DATETIME
• MEDIUMBLOB + LONGBLOB
• no DEFERRABLE constraints in MySQL
• character set / VARCHAR behavior (trailing space)
• Sequences => AUTO_INCREMENT
• InnoDB storage => file per table
• want Oracle tablespaces there!
• page size 16 KB
• No stored procedures and modules conversion
8. 1 TB data move
• ARCHIVE part
• Separate and load in advance - 800 GB
• LIVE part
• 200 GB - 30 hours
• MySQL migration toolkit
• agent mode to speed up data transfer
• Speeding up
• Disable binlogs
• Build indexes and constraints later
• Our bottleneck - single threaded MySQL Migration Toolkit
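As a rough sanity check on the numbers above, the average LIVE load rate can be worked out as below. This is back-of-the-envelope arithmetic only: it assumes decimal units (1 GB = 1000 MB) and a steady rate, neither of which the slides state, and the ARCHIVE figure is a hypothetical extrapolation at that same rate, not a measured result.

```python
def transfer_rate_mb_per_s(size_gb: float, hours: float) -> float:
    """Average transfer rate in MB/s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / (hours * 3600)


def hours_at_rate(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb at a constant rate (assumption: steady rate)."""
    return size_gb * 1000 / rate_mb_per_s / 3600


# LIVE part from the slide: 200 GB in 30 hours.
live_rate = transfer_rate_mb_per_s(200, 30)

# Hypothetical: the 800 GB ARCHIVE part at the same single-threaded rate.
archive_hours = hours_at_rate(800, live_rate)

print(f"LIVE rate: {live_rate:.2f} MB/s")
print(f"ARCHIVE at the same rate: {archive_hours:.0f} hours")
```

At that pace the ARCHIVE part alone would take days, which is consistent with the decision to separate it and load it in advance.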
9. Hardware
• Primary data-center
• 2 x IBM x3850 Servers
• Each in different chassis
• 4 quad core Intel XEON E7330, 2.4 GHz
• 16 GB RAM
• Storage IBM DS4700 Express Model 72
• Fiber-channel
• RAID5 with 6 x 300 GB disks + spare = 1.5 TB
• DR data-center
• 1 x IBM x3850 Server
• Same storage
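The 1.5 TB figure for the RAID5 array follows from the usual RAID5 rule that one disk's worth of capacity goes to parity. A minimal sketch of that arithmetic, assuming decimal units and that the hot spare is not counted in the array (the function name is made up for illustration):

```python
def raid5_usable_gb(disks: int, disk_size_gb: int) -> int:
    """Usable RAID5 capacity: one disk's worth of space is used for parity."""
    if disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (disks - 1) * disk_size_gb


# 6 x 300 GB disks in the array; the spare is assumed to sit outside it.
usable = raid5_usable_gb(6, 300)
print(f"{usable} GB usable = {usable / 1000} TB")
```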
10. Primary DC HA: Options
• MySQL replication
• - Can lose some data (seconds), not reliable
• - Double storage requirements
• + potential to scale out
• DRBD replication
• - Performance impact in SYNC mode
• - Double storage requirements
• - no scale out (primary + mirror only)
• + reliable
• Third-party replication
• - additional cost and additional vendor
• + more reliable than standard replication
11. Primary DC HA: cold failover cluster
• Heartbeat controls resources
• Shared storage
• LUNs accessible from two servers
• ext3 - mounted on active node *only*
• no LVM - LVM is not clustered
• Virtual IP / VIP
• Up only on one node
• MySQL 5.0.67 instance is running on active node
• read-write data - must be InnoDB
• read-only data - can be MyISAM
13. Heartbeat and network infrastructure
(Diagram: Database Server 1 in Chassis 1 and Database Server 2 in Chassis 2. Labels: Switch 3 and Switch 4 for data, a single NIC used per server; Management Switch connected to each server's RSA management port; heartbeat links between the two servers: HA over a CAT5 crossover cable, HA backup over a crossover serial cable with DB9 female connectors, labeled RS485.)
14. Heartbeat and network infrastructure
• Private heartbeat network
• Cross-over ethernet patch-cord
• ++ Simple $100 switch - works great
• --- Expensive switch and VLAN - no good
• Serial link heartbeat
• Redundant to ethernet
• Access to RSA2 cards
• Remote reset and remote power off / lights-out
• Dedicated management network and management switches
15. Shared storage setup
• Linux multipathing MPIO
• 2 HBAs per server
• 2 controllers on SAN box
• Added the 2nd SAN box (cheap SATA disks)
• errors=panic in mount options
• default is to make it read-only
• SAN LUNs visible from both nodes
• NEVER MOUNT FILESYSTEM ON BOTH NODES!!!
• ext3 is not clustered
16. Heartbeat and monitoring
• Heartbeat 1.0
• Starts and stops resources in sequence
• Failure detected during start
• No resource monitoring - that requires Heartbeat 2.0
• Not sure if 2.0 is stable enough
• mon 1.2.0 Service Monitoring Daemon
• mon.wiki.kernel.org
• Stable
• Has a number of “monitors” out-of-the-box
• Can write custom monitors
17. Heartbeat resources
Start sequence (stop is reverse)
1. Virtual / floating IP
2. SAN mount points
3. MySQL daemon / instance
4. mon
5. mon-shadow
mon monitors all resources and initiates a failover
mon-shadow monitors and restarts mon only
mon monitors and restarts mon-shadow
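The ordering rule on this slide (start in sequence, stop in reverse) can be sketched as below. This is an illustration only, not how Heartbeat is implemented: the resource names, the `start`/`stop` callables and the roll-back on a failed start are all assumptions made for the example.

```python
from typing import Callable, List

# Start order from the slide; the identifiers themselves are made up.
START_ORDER: List[str] = ["virtual-ip", "san-mount-points", "mysql", "mon", "mon-shadow"]


def acquire(resources: List[str],
            start: Callable[[str], bool],
            stop: Callable[[str], None]) -> bool:
    """Start resources in order; if one fails, stop the started ones in reverse."""
    started: List[str] = []
    for name in resources:
        if start(name):
            started.append(name)
        else:
            for done in reversed(started):
                stop(done)
            return False
    return True


def release(resources: List[str], stop: Callable[[str], None]) -> None:
    """Stop all resources in the reverse of the start order."""
    for name in reversed(resources):
        stop(name)
```

With this ordering MySQL is never started before its storage is mounted, and the storage is never unmounted while MySQL is still running.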
18. “mon” monitors
• msql-mysql.monitor
• fping.monitor
• freespace.monitor custom mount point monitor
• mon.monitor
On resource failure - goes to standby role.
Other potential options - stop heartbeat or reboot or reset.
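A custom monitor can be very small. The sketch below shows the core of a free-space check in Python; it is not the freespace.monitor shipped with mon, and the function names and argument layout are assumptions for illustration. It follows the usual mon convention that a monitor exits non-zero on failure and prints the failed items on its first output line.

```python
import shutil
from typing import Callable, List


def low_space_mounts(mounts: List[str], min_free_bytes: int,
                     usage: Callable = shutil.disk_usage) -> List[str]:
    """Return the mount points whose free space is below min_free_bytes."""
    return [m for m in mounts if usage(m).free < min_free_bytes]


def run_monitor(min_free_bytes: int, mounts: List[str]) -> int:
    """Hypothetical entry point: return the exit code a monitor script would use."""
    failed = low_space_mounts(mounts, min_free_bytes)
    if failed:
        print(" ".join(failed))  # summary line listing the failed mount points
        return 1
    return 0
```

A wrapper script would pass the threshold and the mount points from its command line to `run_monitor` and hand the result to `sys.exit`.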
19. Improving failover
• innodb_max_dirty_pages_pct=5 in my.cnf
• service_startup_timeout=60 in /etc/init.d/mysql
• Heartbeat resource manager retries offline 10 times
• /usr/lib64/heartbeat/ResourceManager => ${HA_STOPRETRYMAX=10}
• Changed to one
• mysql.pid - don’t place it on shared storage
• mon didn’t have timeout functionality
• Hacked the perl script and added timeout
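The timeout idea from the last bullet can be sketched as follows. The original change was a hack to mon's Perl code, which the slides do not show; this is a Python stand-in under the assumption that a check is an external command and that a hang should count as a failure. The function name is made up.

```python
import subprocess
import sys
from typing import List


def run_with_timeout(cmd: List[str], timeout_s: float) -> bool:
    """Run a check command; a non-zero exit or a hang past timeout_s is a failure."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so no stuck check is left behind.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # A check that finishes quickly passes; one that hangs is reported as failed.
    print(run_with_timeout([sys.executable, "-c", "pass"], 10))
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

Without a bound like this, a monitor blocked on frozen I/O never reports anything, so the failover it is supposed to trigger never starts.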
20. Other gotchas
• Standard MySQL monitor improvement
• Added insert/delete from a dummy table
• Standard /etc/init.d/mysql is not POSIX compliant
• mysql start returns error when MySQL is already up
• mysql stop returns error when MySQL is already down
• SELINUX=disabled
• innodb_flush_method = O_DSYNC or O_DIRECT
• ibmrsa-telnet STONITH plug-in has a bug
• http://lists.community.tummy.com/pipermail/linux-ha/2008-June/033279.html
• Heartbeat’s test suite - BasicSanityCheck
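The init script gotcha above comes down to start and stop not being idempotent. A minimal sketch of the behaviour a cluster manager wants is shown below; the `is_running`, `do_start` and `do_stop` callables are placeholders for whatever the real script does, and none of this is taken from the actual /etc/init.d/mysql.

```python
from typing import Callable


def idempotent_start(is_running: Callable[[], bool],
                     do_start: Callable[[], int]) -> int:
    """Return 0 if the service is already up; otherwise return do_start()'s exit code."""
    if is_running():
        return 0
    return do_start()


def idempotent_stop(is_running: Callable[[], bool],
                    do_stop: Callable[[], int]) -> int:
    """Return 0 if the service is already down; otherwise return do_stop()'s exit code."""
    if not is_running():
        return 0
    return do_stop()
```

With wrappers like these, a repeated start or stop issued during a failover is treated as success rather than as a resource failure.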
21. Acceptance testing - 42 individual tests (1)
• Node down
• power-off, halt command, cpu overload
• Network tests
• (ifconfig) - Heartbeat NIC down, app NIC down, management NIC down
• spam serial link - cat /dev/zero >/dev/ttyS0
• pulling heartbeat cables - one at a time and together
• Storage tests
• freeze IO - dmsetup suspend --noflush lunmultipathproddb-01
• pull cables (one HBA and both HBA ports)
• mess up mount points between two servers
22. Acceptance testing - 42 individual tests (2)
• MySQL daemon test
• MySQL dies - kill -9 {mysqld_pid} {mysqld_safe_pid}
• MySQL hangs - kill -STOP {mysqld_pid}
• MySQL can’t connect (max connections)
• “mon” tests
• kill -9, kill -STOP, manual start on wrong node (including shadow)
• Heartbeat
• kill -9, kill -STOP
• Stopping and starting
• Graceful switchover between the nodes
23. Backup infrastructure
• Split into LIVE and ARCHIVE
• LIVE - InnoDB 200-500GB
• ARCHIVE - MyISAM 2 TB
• ARCHIVE backup - production
• can lock + rsync
• no LVM => no snapshot
• storage snapshot is expensive
• LIVE backup - on slave
• FLUSH ... WITH READ LOCK
• Stop slave SQL thread
• LVM snapshot or RSYNC
• Restore
• LIVE first as a whole instance
• ARCHIVE later - it’s MyISAM
25. Where are we 3 years after migration?
• Data size grown to 2+ TB
• HB Cluster saved our behinds a number of times
• Various system failures
• failover takes only 2-3 minutes
• Several times switched over to DR
• Planned power outages and other maintenance
• HB Cluster helped a lot with maintenance
• OS patching - switchover takes tens of seconds
• Recovery has been verified and tested
• Plans?
• MySQL 5.5, RedHat Cluster Suite or HB 2.0 (consolidate other DBs)
26. Q&A
Please fill in your evaluations!
Email me - gorbachev@pythian.com
Read my blog - http://www.pythian.com
Follow me on Twitter - @AlexGorbachev
Join Pythian fan club on Facebook & LinkedIn