Scaling MySQL in AWS
Presented by: Laine Campbell
April 3rd, 2014 (Percona Live 2014)
Agenda
1. Overview of options: RDS and EC2/MySQL
2. MySQL scaling patterns
3. Performance/Availability
4. Implementation choices
5. Common failure patterns
Who the *&%^#$ am I?
Laine Campbell
Co-Founder and CEO, Blackbird (formerly PalominoDB)
9 years building the DB team/infrastructure at
Travelocity.
7 years at PalominoDB/Blackbird, supporting 50+
companies, 1000s of databases and way too much
coffee.
AWS Options for MySQL:
RDS and EC2/MySQL
A love story...
AWS Relational Database Service
(RDS)
Basic Operations Managed
Ease of Deployment
Supports Scaling via Replication
Reliable via Replication, EBS RAID, Multi-AZ
Managed Operations
Backups and Recovery
Provisioning
Patching
Auto Failover
Replication
RDS Backup and Recovery
Storage is done via EBS
Snapshot and binlog based (point in time)
A non-Multi-AZ instance sees latency spikes during backups
Multi-AZ avoids this by taking backups on the secondary
Snapshots only
Advanced Backup and Recovery
Creating non-RDS backups done via mysqldump,
mydumper, custom extraction
You can create non-RDS replicas using a logical backup, in 5.6 only
Non-RDS replicas will break during AZ failovers, so they are not useful for production or for large datasets
Disaster Recovery
Cross region replication is
supported in 5.6
Cross region replication incurs
cross-region data transfer
costs
Relay replicas recommended if
you wish to minimize expenses
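
To make the relay-replica point concrete, here is a rough cost sketch. The binlog volume, replica count and per-GB price are made-up illustrative numbers, not AWS pricing. The only idea taken from the slide is that each replica fed directly from a master in another region pulls its own copy of the binlog stream, while a relay replica pulls it once and feeds the rest locally.

```python
def monthly_transfer_cost(binlog_gb_per_day, remote_replicas, price_per_gb, use_relay):
    """Estimate monthly cross-region transfer cost for replication traffic.

    Assumption: every stream that crosses the region boundary carries the
    full binlog volume. With a relay, only the relay crosses the boundary
    and the other remote replicas read from it inside the remote region.
    """
    if use_relay and remote_replicas > 0:
        cross_region_streams = 1
    else:
        cross_region_streams = remote_replicas
    return binlog_gb_per_day * 30 * cross_region_streams * price_per_gb


# Hypothetical inputs: 50 GB of binlogs per day, 4 remote replicas, $0.02 per GB.
direct = monthly_transfer_cost(50, 4, 0.02, use_relay=False)
relayed = monthly_transfer_cost(50, 4, 0.02, use_relay=True)
print(f"direct: ${direct:.2f}/month, via relay: ${relayed:.2f}/month")
```

The trade-off is that the relay becomes a single point of failure for the remote region's replication chain.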
Provisioning
Initial creation of single or multi-AZ
masters
Single-command replica creation (serialized) via snapshots;
Multi-AZ avoids a one-minute IO suspension.
Patching
Automatically managed in
maintenance windows
Alerts sent for the coming week, so
you can determine impact,
reschedule, etc…
Multi-AZ mitigates impact of
invasive maintenance
RDS Challenges (Opportunities?)
Abstraction from kernel, OS processlist, OS commands
etc...
No SUPER access; management changes go through stored
procedures (minimal but annoying)
Log access becomes more challenging (but
manageable)
The more experienced an operator you are, the
grumpier you will be!
RDS Challenges (Opportunities?)
Snapshot backups not
portable/accessible outside of
RDS
Multi-AZ failover can strand replicas when binlog
consistency is relaxed for performance (sync_binlog=0).
Without the ability to manually CHANGE MASTER, one must
rebuild all replicas after a failover.
RDS Visibility Impacts
Agent based instrumentation that requires localhost
installation won’t work
No access to tcpdump/port listening
No sar, OS processlist (for swapping), vmstat, iostat, etc.
Log forensics become harder but manageable (must
download first)
EC2 and MySQL
All the MySQL you’ve come to love and hate
Any topologies you can dream
Access to many more types of instances and storage
Why RDS or EC2?
You can’t run 5.6, and you can’t tolerate the risk of a
single region (~99.95% SLA per month)? Use EC2
You don’t have operational expertise to manage
backups, provisioning and replication? Use RDS
Pro tip: if you can’t manage a system, how can you
troubleshoot advanced performance issues given the
limited visibility in RDS?
Why RDS or EC2?
Want MariaDB, XtraDB? Use EC2
Large data-sets generally require file level backups and
portability? Use EC2
Pro tip: if you can’t get a mysqldump or a parallel dump
to load/export in a timely fashion, you probably don’t
want RDS
Scaling Patterns for MySQL in AWS
Scaling in RDS - Vertical
RAM up to 244 GB per instance, creating excellent
ability to put large datasets in RAM
Network performance up to 10 Gbps
CPU up to 32 cores
Provisioned IOPS are game changers, and mandatory
for production, performance-sensitive applications.
Scaling in RDS - Provisioned IOPS
1,000 - 30,000 IOPS
100 GB to 3 TB
Stable, predictable IO
Realizing Max IOPS - 20,000
● cr1.8xlarge Instance Type
● MySQL 16 KB Page Size
● Full Duplex IO Channel
● 50% reads, 50% writes
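
A quick back-of-the-envelope check of what those numbers mean in bandwidth terms. This is a sketch that assumes every IO is one full 16 KB page and that reads and writes each use one direction of the full-duplex channel; the function name is illustrative.

```python
PAGE_SIZE_KB = 16  # InnoDB page size from the slide


def channel_throughput_mib_s(iops, read_fraction):
    """Return (read MiB/s, write MiB/s) needed to sustain `iops` page-sized IOs."""
    total_mib_s = iops * PAGE_SIZE_KB / 1024
    read_mib_s = total_mib_s * read_fraction
    return read_mib_s, total_mib_s - read_mib_s


reads, writes = channel_throughput_mib_s(20_000, read_fraction=0.5)
print(f"reads: {reads} MiB/s, writes: {writes} MiB/s")  # 156.25 each way
```

Under the same assumptions, a 90% read mix at 20,000 IOPS would need 281.25 MiB/s in the read direction alone, so a skewed workload can hit one side of the channel well before the headline IOPS figure.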
Scaling in RDS - Provisioned IOPS
Overprovisioning beyond realized IOPS can reduce
latency
● In an unbalanced workload, for instance reads
consuming channel limits
● Write channel bandwidth remains unsaturated
● By doubling IOPS, you increase concurrency, thus
reducing latency. Transaction rates increase
● IOPS consumption can drop as transaction
rates increase, which manifests as:
○ Improved use of group commit
○ Larger log writes
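
One way to build intuition for why extra IOPS headroom lowers latency is a simple queueing model. The sketch below uses the textbook M/M/1 mean response time, W = 1 / (mu - lambda). That model is an assumption chosen for illustration, not something the slides specify, and the request rates are made up.

```python
def mean_latency_ms(provisioned_iops, offered_iops):
    """Mean response time in ms for an M/M/1 queue: W = 1 / (mu - lambda)."""
    if offered_iops >= provisioned_iops:
        raise ValueError("offered load must stay below provisioned capacity")
    return 1000.0 / (provisioned_iops - offered_iops)


# Hypothetical workload issuing 9,000 IOPS.
before = mean_latency_ms(10_000, 9_000)  # 1.0 ms
after = mean_latency_ms(20_000, 9_000)   # about 0.09 ms
print(f"10k PIOPS: {before:.2f} ms, 20k PIOPS: {after:.2f} ms")
```

Real EBS behaviour is more complicated than a single queue, so treat the numbers as a direction of travel rather than a prediction.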
Scaling in RDS - Reads
Native replication allows for scale out of reads, just as
in EC2 or your own datacenter
RAM up to 244 GB per instance, creating much better
ability to put large datasets in RAM
5.6 allows for the memcached plugin
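
A minimal sketch of read scale-out at the application layer: writes go to the master, reads rotate across replicas. The class name, endpoint names and the "starts with SELECT or SHOW" rule are all illustrative assumptions, not part of RDS or any driver.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master when no replicas are configured.
        self._reads = itertools.cycle(replicas or [master])

    def endpoint_for(self, sql):
        is_read = sql.lstrip().lower().startswith(("select", "show"))
        return next(self._reads) if is_read else self.master


router = ReadWriteRouter(
    "master.example.internal",
    ["replica-a.example.internal", "replica-b.example.internal"],
)
print(router.endpoint_for("SELECT * FROM users WHERE id = 1"))
print(router.endpoint_for("UPDATE users SET name = 'x' WHERE id = 1"))
```

Because native replication is asynchronous, reads that must see a just-committed write (and locking reads such as SELECT ... FOR UPDATE) would still need to be sent to the master.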
Scaling in RDS - Writes
Like any system, you must split workloads if writes
consume max capacity of PIOPS.
● Functional Partitioning
● Sharding
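
The two approaches can be sketched as lookup functions. Functional partitioning is a static map from a workload to an instance; sharding maps a key to one of several identical instances. All names, endpoints and the choice of an MD5-based modulo hash are illustrative assumptions.

```python
import hashlib

# Functional partitioning: each workload gets its own instance (hypothetical names).
FUNCTIONAL = {
    "billing": "billing-db.example.internal",
    "sessions": "sessions-db.example.internal",
}

# Sharding: one workload spread over several instances (hypothetical names).
SHARDS = [
    "shard0.example.internal",
    "shard1.example.internal",
    "shard2.example.internal",
    "shard3.example.internal",
]


def shard_for(key):
    """Map a sharding key (e.g. a user id) to a shard index with a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % len(SHARDS)


def endpoint_for(key):
    return SHARDS[shard_for(key)]


print(FUNCTIONAL["billing"])
print(endpoint_for(42))
```

A plain modulo hash keeps the sketch short, but changing the shard count remaps most keys; a directory table or consistent hashing is the usual way around that.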
Scaling in RDS - Concerns
Sharding:
● Management of RDS instances to roll shards up and
down can be a new paradigm.
● Overall, this can be done, but does require a logical
shift.
Resource Constraints:
● No access to SSDs (up to 91,250 read or 78,750
write IOPS of 14KB size)
Data Movements:
● No access to data copies outside of replica builds
can dramatically increase data movement time
Scaling in EC2 - Vertical
Higher variety of instances. Similar top level
constraints of:
● RAM
● CPU
● PIOPS
● Network
Ephemeral SSD storage creates a whole new class of IO
performance (up to 91,250 read or 78,750 write IOPS
of 14KB size)
Scaling in EC2 - Reads
In addition to standard MySQL replication, you have
new options
● Galera, MariaDB/Galera and XtraDB Cluster
● Tungsten Replicator and Cluster
Scaling in EC2 - Writes
Sharding still becomes necessary, but in EC2, unlike
RDS, one has access to snapshots:
● Management of large datasets becomes much
easier
● Shard management functions in more typical
paradigms
Scaling in EC2 - Concerns
SSD and Ephemeral Storage
● Instances become even more volatile
● Backups via EBS snapshot are impossible, requiring
LVM snapshots or similar
● One might consider keeping writes within the PIOPS max
(20,000) and leveraging SSD for reads
Availability for MySQL in AWS
AWS Availability: Regions and Zones
Amazon Regions equate to data-centers in different
geographical regions.
Availability zones are isolated from one another in the
same region to minimize impact of failures.
AWS Availability: Regions and Zones
Amazon states AZs do not share:
•Cooling
•Network
•Security
•Generators
•Facilities
AWS Availability: Regions and Zones
Apr, 2011 - US East Region EBS Failed
● Incorrect network failover.
● Saturated intra-node communications.
● Cascading failures impacted EBS in all AZs.
Jul, 2012 - US East Partial Impact
● Electrical storms impacted multiple sites.
● Failover of metadata DB took too long.
● EBS I/O was frozen to minimize corruption.
AWS Availability: Regions and Zones
99.95% Monthly SLA for a region (multiple AZs)
● Implies multiple AZ is mandatory
● Implies multi-region is necessary for 99.99% or
higher
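
The arithmetic behind those two implications, as a small sketch. It assumes a 30-day month and, for the multi-region case, that regions fail independently and failover is instantaneous; both are simplifications.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200, using a 30-day month


def allowed_downtime_minutes(availability):
    """Minutes of downtime per month permitted by an availability target."""
    return (1 - availability) * MINUTES_PER_MONTH


def combined_availability(single, copies):
    """Availability when any one of `copies` independent deployments suffices."""
    return 1 - (1 - single) ** copies


print(round(allowed_downtime_minutes(0.9995), 1))  # 21.6
print(round(allowed_downtime_minutes(0.9999), 2))  # 4.32
print(combined_availability(0.9995, 2) >= 0.9999)  # True
```

So a single region at 99.95% allows about 21.6 minutes of downtime a month, while a 99.99% target allows about 4.3 minutes, which one region's SLA cannot cover on its own.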
Availability in RDS - Multi-AZ
The core of an HA solution
Block level replication, active/passive
Saves you from most master crashes
Reduces impact of backups, upgrades, locks for
provisioning replicas
When not on 5.6 and using sync_binlog != 1, you often
lose replicas during failover
Availability in RDS - Multi-AZ
IO impact from
replication
You do not get to choose
the failover AZ, meaning
you must be ready to
move app servers
Availability in RDS - Replicas
Redundant replicas make total sense. N+1 meets most
needs with the ease of provisioning
You must have replicas in every AZ you have app
servers in (if using replicas for reads)
AWS states cross-AZ latency is in the low single-digit
milliseconds. Real-world experience shows occasional
much larger spikes
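
A sketch of AZ-aware replica selection: an app server prefers a replica in its own AZ and only crosses AZs when it has to. The function name, topology layout and endpoints are hypothetical.

```python
def pick_replica(app_az, replicas_by_az):
    """Prefer a replica in the app server's own AZ; otherwise use any other AZ."""
    local = replicas_by_az.get(app_az)
    if local:
        return local[0]
    # Cross-AZ fallback: deterministic order so behaviour is predictable.
    for az in sorted(replicas_by_az):
        if replicas_by_az[az]:
            return replicas_by_az[az][0]
    raise LookupError("no replicas available")


# Hypothetical topology: AZ name -> replica endpoints.
topology = {
    "us-east-1a": ["replica-a.example.internal"],
    "us-east-1b": ["replica-b.example.internal"],
}
print(pick_replica("us-east-1b", topology))  # replica-b.example.internal
print(pick_replica("us-east-1c", topology))  # replica-a.example.internal (cross-AZ)
```

The second call shows the case the slide warns about: an app server in an AZ without a replica pays the cross-AZ latency on every read.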
Availability in EC2 - Options
You can use Galera, XtraDB Cluster, or similar for a
read/write anywhere solution
MySQL MHA can be used to do failovers
Continuent’s Tungsten product can also manage
failovers
AWS Benefits: Dynamicity
AWS Availability: Regions and Zones
Type of Change | EC2 | RDS Master (Non Multi-AZ) | RDS Master (Multi-AZ) | RDS Replica
--- | --- | --- | --- | ---
Instance resize up/down | Rolling migrations | Moderate downtime | Minimal downtime | Moderate downtime (take out of service)
EBS <-> PIOPS | Severe performance impact | Severe performance impact | Minor performance impact | Severe performance impact (take out of service)
PIOPS amount change | Minor performance impact | Minor performance impact | Minor performance impact | Performance impact (take out of service)
Disk space change (add) | Performance impact | Performance impact | Minor performance impact | Performance impact (take out of service)
Disk space change (reduce) | Rolling migrations | Moderate downtime | Moderate downtime | Moderate downtime (take out of service)
AWS Failure Scenarios
Predicting and Managing Failure
Operations is about managing
change and mitigating risk
Predicting and Managing Failure
Local Failures
• Database crashes
• Human error
o Misconfigure
o Write to a replica
o Drop a table/database/career
• Localized EBS hangs and corruption
• Unacceptable/unpredictable performance
Predicting and Managing Failure
Local Failures
● When it goes bad, don’t waste time diagnosing.
o Shoot it in the head!
● Plan!
○ Simulate availability zone and region-level failures
○ Wipe storage, reduce IOPS, shut down
○ Chaos monkey is your friend
● Observe!
○ Monitor for early failures, predict
Predicting and Managing Failure
Mitigation
In RDS:
Use Multi-AZ
Use replicas in multiple AZs
Replicate to multiple regions, and out of AWS
In EC2:
Use a failover solution (Galera, Tungsten, MHA/HAProxy)
Use multiple AZs and regions
Frequent Backups (practicing restores)

 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 

Recently uploaded (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 

Percona Live 2014 - Scaling MySQL in AWS

  • 1. Scaling MySQL in AWS Presented by: Laine Campbell April 3rd, 2014
  • 2. Agenda 1. Overview of options: RDS and EC2/MySQL 2. MySQL scaling patterns 3. Performance/Availability 4. Implementation choices 5. Common failure patterns
  • 3. Who the *&%^#$ am I? Laine Campbell Co-Founder and CEO, Blackbird (formerly PalominoDB) 9 years building the DB team/infrastructure at Travelocity. 7 years at PalominoDB/Blackbird, supporting 50+ companies, 1000s of databases and way too much coffee.
  • 4. AWS Options for MySQL: RDS and EC2/MySQL A love story...
  • 5. AWS Relational Database Service (RDS) Basic Operations Managed Ease of Deployment Supports Scaling via Replication Reliable via Replication, EBS RAID, Multi-AZ
  • 6. Managed Operations Backups and Recovery Provisioning Patching Auto Failover Replication
  • 7. RDS Backup and Recovery Storage is done via EBS Snapshot and binlog based (point in time) A Non Multi-AZ implementation creates spikes in latency during backups Avoided in Multi-AZ via backups on the secondary Snapshots only
  • 8. Advanced Backup and Recovery Creating non-RDS backups done via mysqldump, mydumper, custom extraction You can create non-RDS replicas using a logical backup in 5.6 only non-RDS replicas will break during AZ failovers - thus not useful for production or for large datasets
  • 9. Disaster Recovery Cross region replication is supported in 5.6 Cross region replication incurs cross-region data transfer costs Relay replicas recommended if you wish to minimize expenses
  • 10. Provisioning Initial creation of single or multi-AZ masters Single command replica creation (serialized) via snapshots, multi-AZ avoids a one minute IO suspension.
  • 11. Patching Automatically managed in maintenance windows Alerts sent for the coming week, so you can determine impact, reschedule, etc… Multi-AZ mitigates impact of invasive maintenance
  • 12. RDS Challenges (Opportunities?) Abstraction from kernel, OS processlist, OS commands etc... No SUPER access, changes to management via Stored Procedure (minimal but annoying) Log access becomes more challenging (but manageable) The more experienced an operator you are, the grumpier you will be!
  • 13. RDS Challenges (Opportunities?) Snapshot backups not portable/accessible outside of RDS Multi-AZ failover can strand replicas when relaxing binlog consistency for performance (sync_binlog=0). Without the ability to manually CHANGE MASTER, one must rebuild all replicas after a failover.
  • 14. RDS Visibility Impacts Agent based instrumentation that requires localhost installation won’t work No access to TCPDUMP/Port listening SAR, processlist for swapping, vmstat, iostat etc... Log forensics become harder but manageable (must download first)
  • 15. EC2 and MySQL All the MySQL you’ve come to love and hate Any topologies you can dream Access to many more types of instances and storage
  • 16. Why RDS or EC2? You can’t run 5.6, and you can’t tolerate the risk of a single region (~99.95% SLA per month)? Use EC2. You don’t have the operational expertise to manage backups, provisioning and replication? Use RDS. Pro-tip: if you can’t manage a system, how can you troubleshoot advanced performance issues given the visibility issues in RDS?
  • 17. Why RDS or EC2? Want MariaDB, XtraDB? Use EC2. Large datasets generally require file-level backups and portability? Use EC2. Pro-tip: if you can’t get a mysqldump or a parallel dump to load/export in a timely fashion, you probably don’t want RDS.
  • 18. Scaling Patterns for MySQL in AWS
  • 19. Scaling in RDS - Vertical RAM up to 244 GB per instance, creating excellent ability to put large datasets in RAM Network performance up to 10 Gbps CPU up to 32 cores Provisioned IOPS are game changers, and mandatory for performance-sensitive production applications.
  • 20. Scaling in RDS - Provisioned IOPs 1,000 - 30,000 IOPS 100 GB to 3 TB Stable, predictable IO Realizing Max IOPS - 20,000 ● cr1.8xlarge Instance Type ● MySQL 16 KB Page Size ● Full Duplex IO Channel ● 50% reads, 50% writes
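The "Realizing Max IOPS" figures above imply a certain amount of channel bandwidth. The following is a minimal sketch of that arithmetic, assuming each IO is exactly one 16 KB page and that 1 KB means 1024 bytes; the function names are hypothetical and only illustrate the calculation, they are not part of any AWS or MySQL API.

```python
# Rough bandwidth arithmetic for a Provisioned IOPS volume.
# Assumptions (not from the slides): every IO is one full 16 KB page,
# and 1 KB = 1024 bytes.

PAGE_SIZE_BYTES = 16 * 1024  # MySQL/InnoDB 16 KB page size


def throughput_mib_per_s(iops: int, io_size_bytes: int = PAGE_SIZE_BYTES) -> float:
    """Bandwidth in MiB/s needed to sustain `iops` IOs of `io_size_bytes` each."""
    return iops * io_size_bytes / (1024 * 1024)


def split_by_read_ratio(total_iops: int, read_ratio: float) -> tuple[int, int]:
    """Split total IOPS into (reads, writes) for a given read fraction."""
    reads = round(total_iops * read_ratio)
    return reads, total_iops - reads


if __name__ == "__main__":
    reads, writes = split_by_read_ratio(20_000, 0.5)  # 50% reads, 50% writes
    print("total MiB/s:", throughput_mib_per_s(20_000))
    print("read  MiB/s:", throughput_mib_per_s(reads))
    print("write MiB/s:", throughput_mib_per_s(writes))
```

With a full duplex channel, the read and write halves are carried in separate directions, which is why the balanced 50/50 mix is the case that reaches the 20,000 figure.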
  • 21. Scaling in RDS - Provisioned IOPS Overprovisioning beyond the realized IOPS can reduce latency ● In an unbalanced workload, for instance, reads can consume the channel limit ● while write channel bandwidth remains unsaturated ● By doubling IOPS, you increase concurrency, thus reducing latency, and transaction rates increase ● Consumption of IOPS can fall as transaction rates increase, which manifests as: ○ Improved use of group commit ○ Larger log writes
  • 22. Scaling in RDS - Reads Native replication allows for scale out of reads, just as in EC2 or your own datacenter RAM up to 244 GB per instance, creating much better ability to put large datasets in RAM 5.6 allows for the memcached plugin
  • 23. Scaling in RDS - Writes Like any system, you must split workloads if writes consume max capacity of PIOPS. ● Functional Partitioning ● Sharding
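The two write-splitting approaches named above can be sketched as a small routing layer. This is an illustrative sketch only: the host names, the table-to-cluster map, and the choice of a hash-modulo scheme are all assumptions made for the example, not something the slides prescribe.

```python
import hashlib

# Hypothetical endpoints; real deployments would use their own RDS/EC2 hosts.
FUNCTIONAL_CLUSTERS = {
    "users": "users-db.example.internal",
    "orders": "orders-db.example.internal",
}
SHARD_HOSTS = [
    "shard0.example.internal",
    "shard1.example.internal",
    "shard2.example.internal",
    "shard3.example.internal",
]


def cluster_for_table(table: str) -> str:
    """Functional partitioning: each workload (table group) has its own master."""
    return FUNCTIONAL_CLUSTERS[table]


def shard_for_key(key: str, n_shards: int = len(SHARD_HOSTS)) -> int:
    """Sharding: map a key to a shard index with a stable hash.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is randomized per process and so would not route consistently.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def host_for_key(key: str) -> str:
    """Resolve the shard host that owns `key`."""
    return SHARD_HOSTS[shard_for_key(key)]
```

Plain hash-modulo routing forces most keys to move when the shard count changes, so a lookup table or consistent hashing is a common alternative when shards are expected to roll up and down.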
  • 24. Scaling in RDS - Concerns Sharding: ● Management of RDS instances to roll shards up and down can be a new paradigm. ● Overall, this can be done, but does require a logical shift. Resource Constraints: ● No access to SSDs (up to 91,250 read or 78,750 write IOPS of 14KB size) Data Movements: ● No access to data copies outside of replica builds can dramatically increase data movement time
  • 25. Scaling in EC2 - Vertical Higher variety of instances. Similar top level constraints of: ● RAM ● CPU ● PIOPS ● Network Ephemeral storage SSD create a whole new class of IO performance: (up to 91,250 read or 78,750 write IOPS of 14KB size)
  • 26. Scaling in EC2 - Reads In addition to standard MySQL replication, you have new options ● Galera, MariaDB/Galera and XtraDB Cluster ● Tungsten Replicator and Cluster
  • 27. Scaling in EC2 - Writes Sharding still becomes necessary, but in EC2 over RDS, one has access to snapshots: ● Management of large datasets becomes much easier ● Shard management functions in more typical paradigms
  • 28. Scaling in EC2 - Concerns SSD and Ephemeral Storage ● Instances become even more volatile ● Backups via EBS snapshot are impossible, requiring LVM snapshots or similar ● One might consider keeping writes within the PIOPS max (20,000) and leveraging SSD for reads
  • 31. AWS Availability: Regions and Zones Amazon Regions equate to data-centers in different geographical regions. Availability zones are isolated from one another in the same region to minimize impact of failures.
  • 32. AWS Availability: Regions and Zones Amazon states AZs do not share: • Cooling • Network • Security • Generators • Facilities
  • 33. AWS Availability: Regions and Zones Apr, 2011 - US East Region EBS Failed ● Incorrect network failover. ● Saturated intra-node communications. ● Cascading failures impacted EBS in all AZs. Jul, 2012 - US East Partial Impact ● Electrical storms impacted multiple sites. ● Failover of metadata DB took too long. ● EBS I/O was frozen to minimize corruption.
  • 34. AWS Availability: Regions and Zones 99.95% Monthly SLA for a region (multiple AZs) ● Implies multiple AZ is mandatory ● Implies multi-region is necessary for 99.99% or higher
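To make the SLA percentages above concrete, they can be converted into a monthly downtime budget. This is a minimal sketch assuming a 30-day month; the helper name is hypothetical.

```python
def allowed_downtime_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime per month that still satisfy the given SLA."""
    minutes_in_month = days_in_month * 24 * 60
    return (1 - sla_percent / 100) * minutes_in_month


if __name__ == "__main__":
    for sla in (99.95, 99.99):
        # Rounded for display; 99.95% is roughly 21.6 minutes, 99.99% roughly 4.3.
        print(f"{sla}% -> {allowed_downtime_minutes(sla):.2f} minutes/month")
```

The gap between roughly 21.6 minutes and roughly 4.3 minutes per month is the budget that a multi-region design has to recover.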
  • 35. Availability in RDS - Multi-AZ The core of an HA solution Block level replication, active/passive Saves you from most master crashes Reduces impact of backups, upgrades, locks for provisioning replicas When not on 5.6, and using sync_binlog != 1, you often lose replicas during failover
  • 36. Availability in RDS - Multi-AZ IO impact from replication You do not get to choose the failover AZ, meaning you must be ready to move app servers
  • 37. Availability in RDS - Replicas Redundant replicas make total sense. N+1 meets most needs with the ease of provisioning You must have replicas in every AZ you have app servers in (if using replicas for reads) AWS states that cross-AZ latency is in the low single-digit milliseconds. Real-world experience shows occasional much larger spikes
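The two placement rules on this slide (N+1 redundancy, and a replica in every AZ that hosts app servers) are easy to check mechanically. The sketch below assumes the topology is available as plain lists of AZ names; the function names and the example AZ labels are hypothetical.

```python
def azs_missing_replicas(app_server_azs: list[str], replica_azs: list[str]) -> list[str]:
    """AZs that host app servers but have no read replica."""
    return sorted(set(app_server_azs) - set(replica_azs))


def meets_n_plus_one(replica_count: int, replicas_needed_for_load: int) -> bool:
    """True if at least one spare replica exists beyond what the load needs."""
    return replica_count >= replicas_needed_for_load + 1


if __name__ == "__main__":
    app_azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
    replica_azs = ["us-east-1a", "us-east-1b"]
    print("AZs without a replica:", azs_missing_replicas(app_azs, replica_azs))
    print("N+1 satisfied:", meets_n_plus_one(len(replica_azs), 2))
```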
  • 40. Availability in EC2 - Options You can use Galera, XtraDB Cluster, or similar for a read/write anywhere solution MySQL MHA can be used to do failovers Continuent’s Tungsten product can also manage failovers
  • 42. AWS Availability: Regions and Zones
      Type of Change | EC2 | RDS Master (Non Multi-AZ) | RDS Master (Multi-AZ) | RDS Replica
      Instance resize up/down | Rolling migrations | Moderate downtime | Minimal downtime | Moderate downtime (take out of service)
      EBS <-> PIOPS | Severe performance impact | Severe performance impact | Minor performance impact | Severe performance impact (take out of service)
      PIOPS amount change | Minor performance impact | Minor performance impact | Minor performance impact | Performance impact (take out of service)
      Disk space change (add) | Performance impact | Performance impact | Minor performance impact | Performance impact (take out of service)
      Disk space change (reduce) | Rolling migrations | Moderate downtime | Moderate downtime | Moderate downtime (take out of service)
  • 44. Predicting and Managing Failure Operations is about managing change and mitigating risk
  • 45. Predicting and Managing Failure Local Failures • Database crashes • Human error o Misconfigure o Write to a replica o Drop a table/database/career • Localized EBS hangs and corruption • Unacceptable/unpredictable performance
  • 46. Predicting and Managing Failure Local Failures ● When it goes bad, don’t waste time diagnosing. o Shoot it in the head! ● Plan! ○ Simulate availability and region level failures ○ Wipe storage, reduce IOPS, shut down ○ Chaos monkey is your friend ● Observe! ○ Monitor for early failures, predict
  • 47. Predicting and Managing Failure Mitigation In RDS: ● Use Multi-AZ ● Use replicas in multiple AZs ● Replicate to multiple regions, and out of AWS In EC2: ● Use a failover solution (Galera, Tungsten, MHA/HAProxy) ● Use multiple AZs and regions ● Frequent backups (practicing restores)