Scaling MySQL in Amazon Web Services

Presented at All Your Base 2014 in Oxford. A 30-minute overview of AWS, RDS and MySQL.

  1. Scaling MySQL in AWS. All Your Base Conf, Oxford, UK. Laine Campbell, Co-Founder. October 17th, 2014
  2. Who am I? DB Architect, Entrepreneur and superhero… Good hair, not overly clever, female pronouns
  3. Who am I? Humor: self-effacing, self-aggrandizing, dry… Cajun, roguish, and I destroy everything I touch...
  4. Who am I? When in doubt, laugh
  5. Who am I? Happy Belated Ada Lovelace Day
  6. Agenda ● Amazon options for implementation ● MySQL scaling patterns ● Resiliency ● Rounding it out ● War Stories
  7. RDS and EC2/MySQL: a love story...
  8. Amazon RDS (DBaaS) ● Basic Operations Managed ● Ease of Deployment ● Supports Scaling via Replication ● Resilient via Replication, EBS RAID, Multi-AZ, Multi-Region
  9. What is Multi-AZ? Automatic failover, spread across availability zones. Significantly reduces the impact of operations such as: ● Backups ● Replica Builds ● Patching
  10. Managed Operations ● Backups and Recovery ● Provisioning ● Patching ● Auto Failover ● Replication
  11. What does it cost? ● Instance tax: 35% - 37% ● Provisioned IOPS: 15% ● Lack of transparency - can increase downtime
  12. What does it cost? ● Multi-AZ Master - db.r3.8xlarge, 1 yr reserved ○ $17,312 ● 3 replicas - db.r3.8xlarge, 1 yr reserved ○ $25,968 ● Provisioned IOPS - general purpose (new) ○ 1 TB dataset = $6,144 (5 copies) ● Total Cost RDS = $49,424 ● Off of RDS = $34,726
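The totals on this slide can be sanity-checked with a few lines of arithmetic; all the figures are the slide's own 2014 one-year-reserved prices:

```python
# Sanity check of the slide's 2014 RDS vs. EC2 cost figures (1 yr reserved).
multi_az_master = 17312  # db.r3.8xlarge Multi-AZ master
replicas = 25968         # 3 x db.r3.8xlarge replicas
storage = 6144           # 1 TB dataset, general purpose, 5 copies

total_rds = multi_az_master + replicas + storage
print(total_rds)  # 49424, the slide's "Total Cost RDS"

off_rds = 34726   # equivalent self-managed EC2 total from the slide
rds_premium = total_rds - off_rds
print(rds_premium)  # 14698, the per-cluster RDS premium used on the next slide
```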
  13. Cost Thoughts ● RDS costs $14,698/yr more ○ A DBA costs $144K/yr, i.e. $108/hour (time off, productivity, retention/churn) ○ That equals 136 hours of DBA time (3.5 weeks) ● Automating via EC2 is a one-time job; the RDS tax is ALWAYS there ○ 5 clusters cost you 680 hours of DBA time (17 weeks) ○ 5 clusters over 3 years = 2,040 hours, or 51 weeks ○ What could your DBA do with an extra 51 weeks?
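The DBA-time conversion above is plain arithmetic; checking the slide's numbers (the $108/hour loaded rate is the slide's own figure):

```python
# Convert the yearly RDS premium into DBA hours, per the slide.
rds_premium = 14698  # $/yr extra for RDS over self-managed EC2
dba_hourly = 108     # $144K/yr fully loaded is roughly $108/hour

hours_per_cluster = round(rds_premium / dba_hourly)
print(hours_per_cluster)          # 136 hours, about 3.5 weeks
print(hours_per_cluster * 5)      # 680 hours for 5 clusters (17 weeks)
print(hours_per_cluster * 5 * 3)  # 2040 hours over 3 years (51 weeks)
```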
  14. Other Costs ● Lock-in: ○ With MySQL 5.6 you can replicate out of RDS, making this mostly moot. ○ You still have to automate once you're out; you could have spent that effort at the beginning. ● Lack of Visibility: ○ DTrace, tcpdump, top, vmstat, etc… ● Lack of Control: ○ Data security, shared environments, backups???? ○ Restarts due to exploits, etc...
  15. Amazon EC2, Roll Your Own ● Build your own automation: ○ Provisioning, with replication ○ Configuration management ○ Backup and recovery ○ Instance registration ● Other DBaaS options, such as Tesora Trove: ○ Community edition is free ○ Enterprise starts at $25,000 for 50 instances ○ Don't forget: RDS, 5 clusters, 1 year = $73,440
  16. Choosing RDS vs. EC2
  17. Why use RDS? ● Legacy apps that cannot use 5.6 and can accept < 99.65% SLAs ● Low-volume/traffic applications for companies who choose not to have their own operations expertise
  18. Why use EC2? ● Want MariaDB or XtraDB variants ● Want more flexibility in multi-region setups (before 5.6) ● Want portability to other clouds ● High performance and scale needs that require access to the OS and to the FULL DB instance
  19. Storage Options in AWS ● Pure SSD (ephemeral): EC2 only, not persistent, up to 100,000 IOPS, 390 MBps, free with instance cost ● EBS General Purpose SSD: RDS and EC2, persistent, up to 3,000 IOPS, 128 MBps, $0.10/GB/month ● EBS PIOPS SSD: RDS and EC2, persistent, up to 4,000 IOPS, 128 MBps, $0.125/GB/month + $0.065/PIOPS/month ● EBS Magnetic: RDS and EC2, persistent, 40-200 IOPS, 40-90 MBps, $0.05/GB/month + $0.05/million IOPS
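To compare the tiers concretely, here is the monthly cost of a 1 TB volume at the prices on this slide (a rough sketch: it ignores burst credits and assumes 1 TB = 1,000 GB and the full 4,000 provisioned IOPS):

```python
# Monthly cost of a 1 TB EBS volume at the prices listed above.
gb = 1000

general_purpose = gb * 0.10        # $0.10/GB/month
piops = gb * 0.125 + 4000 * 0.065  # $0.125/GB + $0.065 per provisioned IOP
magnetic = gb * 0.05               # $0.05/GB (plus $0.05 per million I/Os)

print(round(general_purpose))  # 100
print(round(piops))            # 385
print(round(magnetic))         # 50
```

The provisioned-IOPS surcharge dominates the per-GB price, which is why the deck leans on general-purpose SSD for the sharded example later.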
  20. Scaling MySQL at AWS
  21. My Definition of Scaling ● Capacity is elastic and automated ● Performance stays reasonably consistent ● Availability scales ● Resiliency scales ● Operational visibility scales ● Backup and recovery scales
  22. MySQL Workload Scaling ● Break out workloads to their own clusters ○ To facilitate sharding large data-sets horizontally ○ To segregate specific workload characteristics ● Evaluate each workload: ○ Read/write needs ○ Total dataset size and growth ○ Data change delta (updates/deletes)
  23. MySQL Workload Scaling ● User Login/Profile: 1 TB (1MM users), 20,000 IOPS peak, 10% write / 90% read - shard candidate ● User Content: 5 TB, 500,000 IOPS peak, 1% write / 99% read - shard candidate ● Site Metadata: 5 GB, 500 IOPS peak, 25% write / 75% read
  24. MySQL Workload Scaling ● Determine sharding size based on constraints: ○ AWS Write IO ○ Replication Limits ○ Tolerance for large numbers of systems ○ Budget
  25. MySQL Workload Scaling ● Sharding topology: ○ Schema:Shard relationship 1:1 ○ Instance:Schema relationship 1:N ○ Host:Instance relationship 1:1 (diagram: Host 1 holds Instance 1, which holds Shards 1 and 2)
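With a 1:1 Schema:Shard mapping like this, routing usually comes down to a stable hash of the sharding key. A minimal sketch (the names and shard count are illustrative, not from the deck):

```python
import hashlib

NUM_SHARDS = 10  # e.g. the 10-shard user-profile example on a later slide

def shard_for(key: str) -> int:
    # md5 is used only for even distribution, not security; Python's
    # built-in hash() is salted per process, so it is unsafe for routing.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def schema_for(key: str) -> str:
    # Schema:Shard is 1:1, so the schema name carries the shard number.
    return f"user_profile_{shard_for(key):02d}"

# The same key always routes to the same schema, on any host or process.
print(schema_for("user:12345"))
```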
  26. MySQL Workload Scaling ● User Profile Data - 10 shards, hashable ○ Per shard: 2,000 IOPS (500 MB), 200 wps (50 MB), 100 GB storage ■ Two replication threads/shards per cluster ○ SSD General Purpose EBS, 2 × 300 GB volumes, striped ■ 1,800 IOPS, 6,000 burst, 800 MBps ○ 3,600 rps (900 MBps) requires 2 replicas (500 MBps each), +1 for redundancy ○ 1 master, 1 failover, 3 replicas = 25 hosts, r3.2xlarge ■ Memory = 61 GB > active dataset ■ Assumes read/write splitting ○ Total Cost = $48,600 instances + $18,000 storage = $66,600
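The replica count on this slide falls out of read throughput divided by per-replica capacity, plus one for redundancy. Using the slide's own numbers:

```python
import math

read_load = 900    # MBps per shard: 3,600 reads/sec at ~250 KB each
per_replica = 500  # MBps one replica can serve (the slide's figure)

serving = math.ceil(read_load / per_replica)  # replicas needed to carry reads
total = serving + 1                           # +1 so losing one replica is survivable
print(serving, total)  # 2 3, matching "2 replicas ... +1 for redundancy"
```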
  27. MySQL Workload Scaling ● Keep Schema:Shard relationship 1:1 ● Change Schema:Instance relationship from 1:N to 1:1 ● Change Instance:Host relationship from 1:1 to 1:N (diagram: Host 1 holds Instances 1 and 2, one shard each)
  28. MySQL Workload Scaling Summary ○ The final constraint is write IOPS ○ Sharding eliminates that constraint ○ There are ways to reduce the reads: ■ Caching ○ There are ways to reduce the writes: ■ Throttle concurrency with queuing and loose coupling to keep write IO down ■ Compression! Native or application-based ○ Moving storage to ephemeral SSD saves $$$$ ■ If you truly can leverage backup and recovery with small datasets
  29. MySQL Workload Scaling (diagram: each shard is a cluster with a master, a failover target, and two replicas)
  30. Resiliency Layers ● Sharding ○ Shard N = y% of traffic, where y = 1/Shards * 100 ○ e.g. with 64 shards, one shard lost = 1.56% of traffic ● EBS Snapshots ○ Rapid redeployment of failed nodes (kill, rebuild)
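The blast-radius formula above in two lines:

```python
# Fraction of traffic lost when one shard fails: y = 1/shards * 100.
def blast_radius_pct(shards: int) -> float:
    return 100.0 / shards

print(round(blast_radius_pct(64), 2))  # 1.56, as on the slide
print(round(blast_radius_pct(10), 2))  # 10.0, the 10-shard user-profile example
```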
  31. Impact of Changes, by Platform (columns: EC2 / RDS Master non-Multi-AZ / RDS Master Multi-AZ / RDS Replica) ● Instance resize up/down: rolling migrations / moderate downtime / minimal downtime / moderate downtime (take out of service) ● EBS <-> PIOPS: severe perf impact / severe perf impact / minor perf impact / severe perf impact (take out of service) ● PIOPS amount change: minor perf impact / minor perf impact / minor perf impact / severe perf impact (take out of service) ● Disk space change (add): severe perf impact / severe perf impact / minor perf impact / severe perf impact (take out of service) ● Disk space change (reduce): rolling migrations / severe downtime (promote from replica) / minimal downtime / moderate downtime (take out of service)
  32. Power Up - Resiliency via Geography
  33. MySQL Workload Scaling (diagram: Shards 1 and 2 with masters, failovers and replicas spread across AZs 1-3 in US East 1 behind Nginx; during failover, spawn future replicas across AZs in US West 1)
  34. Cluster Management and Failover ● Roll your own… ○ Config mgmt ○ Automation scripting ○ HAProxy ● Spend some dough ○ RDS ○ Continuent Tungsten ○ ScaleArc ● Bleeding edge but clotting ○ Galera, via MariaDB or XtraDB Cluster ● Bleeding edge and fresh ○ OpenStack, Trove and Tesora
  35. Rounding it Out: Operational Visibility ● Monitoring and alerting: Sensu (not Nagios) ● Time-series trending: Graphite or OpenTSDB ● Graphing of data: Grafana ● Log aggregation and management: Logstash or Splunk ● Application monitoring: New Relic or AppDynamics
  36. Rounding it Out: Backup and Recovery ● Sharding keeps systems small and agile ○ Snapshots for tactical kill and build ○ S3 for longer-term retention ○ Glacier forever ○ Offsite for legal requirements
  37. Questions and Follow-up ○ lcampbell@pythian.com ○ www.pythian.com ○ www.linkedin.com/lainecampbell ○ www.adafoundation.org
