Home-grown sharding is hard - REALLY HARD! ScaleBase scales out MySQL, delivering all the benefits of MySQL sharding with NONE of the sharding headaches. This webinar explains: MySQL scale-out without embedding code and re-writing apps; successful sharding on Amazon and private clouds; single vs. multiple shards per server; eliminating data silos; creating a redundant, fault-tolerant architecture with no single point of failure; and re-balancing and splitting shards.
2. 2
Agenda
• Scalability Issues
• MySQL 5.6
• Why Do-It-Yourself (DIY) Sharding Sucks
• ScaleBase Data Distribution:
– Successful sharding on Amazon and private clouds
– Single vs. multiple shards per server
– Eliminating data silos
– Creating a redundant, fault-tolerant architecture
– Re-balancing and splitting shards
• Q & A
3. 3
Doron Levari, Founder & CTO
A technologist and long-time veteran of the database industry. Prior to founding ScaleBase, Doron was CEO of Aluna.
4. 4
What We Do
Simply and cost-effectively scale MySQL to support an infinite number of users, transactions and data with NO disruption to the existing infrastructure
6. 6
MySQL Scalability Challenges
• Too many transactions
• Too many users
• Too much data
• Too many writes
• Capacity
• Throughput
• Performance inconsistencies
7. 7
Improvements in MySQL 5.6 – Single Box
Partitioning Improvements
– Explicit Partition Selection:
    SELECT * FROM employees PARTITION (p0, p2);
– Import / Export for Partitioned Tables: bring a new data set into a partitioned table, or export a partition to manage it as a regular table:
    ALTER TABLE e EXCHANGE PARTITION p0 WITH TABLE e2;
http://dev.mysql.com/tech-resources/articles/whats-new-in-mysql-5.6.html
Replication Improvements
– Optimizations to Row-Based Replication
– Multi-Threaded Slaves
– Improvements to Data Integrity
– Crash-Safe Slaves
– Replication Checksums
SCALABILITY issues remain due to the limitations of a single box:
To ensure ACID, you still face limitations with:
– Memory management
– Thread management
– Semaphores
– Locking
– Recovery tasks
No new functionality for sharing workloads across multiple boxes
8. 8
What Are My Options?
1. More/Bigger Hardware?
– Temporary fix…you will need new hardware again
– More memory…helps mostly with “reads,” but not with “writes”
– Every write operation is at least 4 write operations in the database, plus multiple activities in the database engine's memory
2. Application re-architecture?
– Steer workload away from the database
– Example: introduce a caching layer
– Force application re-writes; new test & QA cycles
3. Do it Yourself Sharding?
4. Migrate to new database architecture
– Other RDBMS/NewSQL / NoSQL?
– Force application re-writes; new test & QA cycles
– ACID/Durability Issues
9. 9
Scale Out your Existing MySQL
• Keep your MySQL - keep your InnoDB
• Ecosystem compatibility, preserve skills
• 100% application compatibility
• Smoother migration, no down-time, no forklift
• Your data is safe
• No “in-memory” magic
• No “in-memory” size limit
Don’t throw out the baby with the bath water!
11. 11
What is Sharding?
Wikipedia - Shard (database architecture) http://en.wikipedia.org/wiki/Shard_(database_architecture)
A database shard is a horizontal partition in a
database or search engine. Each individual partition
is referred to as a shard.
Horizontal partitioning is a database design
principle whereby rows of a database table are held
separately, rather than being split into columns.
Each partition forms part of a shard, which may in
turn be located on a separate database server or
physical location.
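To make the definition concrete, here is a minimal Python sketch of horizontal partitioning: whole rows are assigned to shards by a key, and no row is split by column. The `orders` rows, the `customer_id` shard key, and the modulo rule are illustrative assumptions, not something the definition above prescribes.

```python
from collections import defaultdict

# Hypothetical rows of a single "orders" table (illustrative data only).
ROWS = [
    {"order_id": 1, "customer_id": 10, "total": 25.0},
    {"order_id": 2, "customer_id": 11, "total": 40.0},
    {"order_id": 3, "customer_id": 10, "total": 15.5},
    {"order_id": 4, "customer_id": 13, "total": 99.9},
]

def partition_rows(rows, shard_key, shard_count):
    """Split rows horizontally: each whole row lands in exactly one shard."""
    shards = defaultdict(list)
    for row in rows:
        shard_id = row[shard_key] % shard_count  # assumed modulo rule
        shards[shard_id].append(row)
    return dict(shards)

shards = partition_rows(ROWS, "customer_id", shard_count=2)
```

Every row keeps all of its columns; only its location changes. All orders for the same customer end up in the same shard, which is what lets a single-customer query touch one database.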
13. 13
DIY Sharding Challenges
Challenges exist because application code changes are required to support multiple database instances.
• Maintaining DB ops and IPs in the app
• Non-optimized sharding strategies
– No good way to maintain global tables replicated across all databases
• Sacrifices development agility and adds administrative complexity
• Results in database silos
• Database ecosystem breaks because the application “conceals” sharding strategies internally
• Risks for data inconsistency
• Adding and removing databases is not supported, which leads to overprovisioning
• Jeopardizes high availability, backups & disaster recovery
• Demands custom application code that can fail ACID compliance
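A rough Python sketch of what the challenges above tend to look like inside an application. The addresses, function names, and modulo routing rule are hypothetical and chosen only for illustration; they are not taken from any real deployment.

```python
# Hypothetical shard map hard-coded in the application (addresses are made up).
SHARD_HOSTS = {
    0: "10.0.0.11:3306",
    1: "10.0.0.12:3306",
    2: "10.0.0.13:3306",
}

def host_for_customer(customer_id, shard_hosts=SHARD_HOSTS):
    """Application-level routing: every data-access path must call this."""
    return shard_hosts[customer_id % len(shard_hosts)]

def hosts_for_report(shard_hosts=SHARD_HOSTS):
    """A cross-shard report has to visit every database and merge in the app."""
    return sorted(shard_hosts.values())

def moved_keys(keys, old_count, new_count):
    """Keys whose shard changes when the shard count changes under modulo routing."""
    return [k for k in keys if k % old_count != k % new_count]

# Adding a fourth database relocates most keys under this naive scheme.
relocated = moved_keys(range(12), old_count=3, new_count=4)
```

With this naive scheme, going from three to four databases relocates 9 of the 12 sample keys, which is one reason adding or removing databases is so painful when routing lives in application code.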
15. 15
Data Distribution: Application Experience
Without ScaleBase: App must be customized to support shards
With ScaleBase: App sees ONE database…
…and doesn’t require any customization
ScaleBase acts as a proxy between the app and the
database, virtualizing the database environment
16. 16
Manual Sharding versus ScaleBase
Sharding Limitations:
• Major app rewrite, maintaining code
• Maintaining DB ops & IPs in the app
• Administration/3rd party tools are broken
• DB silos/Database ecosystem is blind
– Application “hides” sharding strategies
• Non-optimized data distribution policy
– No good way to maintain global tables replicated across all databases
• Sacrifices development agility
• Adding/removing DBs is not supported
• Risks for data inconsistency
• Demands custom application code that can fail ACID compliance
• Jeopardizes high availability, backups, and disaster recovery
ScaleBase Benefits:
• No hard-coding, no application re-writes
• Unlimited scalability
• Improve performance
• Real time elasticity
• ACID compliance
• Verified data consistency
• Real time monitoring, traffic analysis
• Carefully analyze distribution policy
• Enable system upgrades and updates
• Simplified, centralized admin
– Adding users
– Changing schemas
– Maintenance scripts
– Management queries
17. 17
Typical ScaleBase Data Traffic Manager Deployment
[Diagram: application servers, BI, and management tools connect through ScaleBase to Database A, B, C, and D, each paired with its own replica (Replica A to Replica D). Callouts: Unlimited Scale; ScaleBase Architecture is Fault Tolerant.]
19. 19
ScaleBase Enables MySQL Scale Out without Re-writing Apps
• Data distribution and scale-out is part of the database
architecture, not the application
• One IP to connect to, and “see a unified database”
– The application
– Entire ecosystem (ETL, mysqldump, PHPMyAdmin)
– No special sharding wizard developer
– No app re-design, re-dev, re-QA, re-test, re-deploy
– No hard-coded variables lost in the code
– No special documentation
20. 20
ScaleBase Enables Scale Out on AWS and Private Clouds
• A virtualized DB environment makes it easy to change real
infrastructure, because it’s decoupled from the application
• No cloud makes your database elastic
• ScaleBase enables elasticity of MySQL in the cloud (EC2, RDS, etc.)
Scale-up hits AWS’s tiered configuration limits fast
Scale-out is unlimited and gives cloud flexibility
21. 21
ScaleBase Supports Scale Out on Single & Multiple Machines
Advantages of several shards on one machine:
– Several smaller MySQL instances better utilize cores and memory
– When data grows, each instance can later migrate to a bigger machine of its own
Advantages of several shards on multiple machines:
– Leverage commodity hardware
– When a machine reaches its limits, ScaleBase enables online data redistribution (resharding) and shard-split
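One way to picture the single-machine versus multi-machine trade-off is a placement map from logical shards to MySQL instances. The sketch below is a simplified illustration in Python; the shard names, host names, ports, and the `migrate` helper are all hypothetical, and it models only the bookkeeping, not the data copy itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instance:
    host: str
    port: int

# Hypothetical placement: four MySQL instances (shards) packed onto one machine.
placement = {
    "shard_a": Instance("db-host-1", 3306),
    "shard_b": Instance("db-host-1", 3307),
    "shard_c": Instance("db-host-1", 3308),
    "shard_d": Instance("db-host-1", 3309),
}

def migrate(placement, shard, new_host, new_port=3306):
    """Return a new placement with one shard moved to another machine."""
    updated = dict(placement)
    updated[shard] = Instance(new_host, new_port)
    return updated

def shards_per_host(placement):
    """Count how many shards each machine currently hosts."""
    counts = {}
    for instance in placement.values():
        counts[instance.host] = counts.get(instance.host, 0) + 1
    return counts

# When shard_d outgrows the shared machine, it moves to a machine of its own.
after = migrate(placement, "shard_d", "db-host-2")
```

Because the application never sees this map, moving `shard_d` to its own machine changes only the placement, not the application's connection settings.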
22. 22
ScaleBase Enables Splitting Shards
• ScaleBase also redistributes data across the array to eliminate hot
spots, splitting the hot spot into two databases
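The slides do not describe how ScaleBase performs a split internally, so the following is only a minimal Python sketch of the general idea, assuming range-based sharding: a hot database's key range is cut at its midpoint and the upper half is handed to a new database. The database names and key ranges are hypothetical.

```python
def split_shard(ranges, hot_shard, new_shard):
    """Split hot_shard's half-open key range [low, high) at its midpoint."""
    low, high = ranges[hot_shard]
    mid = (low + high) // 2
    updated = dict(ranges)
    updated[hot_shard] = (low, mid)
    updated[new_shard] = (mid, high)
    return updated

def shard_for(ranges, key):
    """Find the database whose key range contains the given key."""
    for shard, (low, high) in ranges.items():
        if low <= key < high:
            return shard
    raise KeyError(key)

# Hypothetical layout: db_a is the hot spot.
ranges = {"db_a": (0, 1000), "db_b": (1000, 2000)}
after = split_shard(ranges, "db_a", "db_a2")
```

After the split, keys 0 to 499 stay on `db_a`, keys 500 to 999 are served by `db_a2`, and `db_b` is untouched.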
23. 23
ScaleBase Re-balances Shards
• Special analysis and alerts about approaching limits
• ScaleBase dynamically redistributes data (resharding), moving the data across the array from the over-utilized databases to the under-utilized ones
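As an illustration of moving data from over-utilized to under-utilized databases, here is a small greedy rebalancing sketch in Python. It is not ScaleBase's algorithm (the slides do not describe one); the bucket model, load numbers, and function names are assumptions made for the example.

```python
def db_totals(assignment, load, databases):
    """Sum the observed load of the buckets assigned to each database."""
    totals = {db: 0 for db in databases}
    for bucket, db in assignment.items():
        totals[db] += load[bucket]
    return totals

def rebalance(assignment, load, databases):
    """Greedily move buckets from the busiest database to the idlest one.

    Stops when no bucket on the busiest database is light enough to narrow
    the gap between the busiest and idlest databases.
    """
    assignment = dict(assignment)
    moves = []
    while True:
        totals = db_totals(assignment, load, databases)
        busiest = max(databases, key=lambda d: totals[d])
        idlest = min(databases, key=lambda d: totals[d])
        gap = totals[busiest] - totals[idlest]
        # Only buckets lighter than the gap make the pair more even.
        candidates = [b for b, d in assignment.items()
                      if d == busiest and 0 < load[b] < gap]
        if not candidates:
            return assignment, moves
        bucket = max(candidates, key=lambda b: load[b])
        assignment[bucket] = idlest
        moves.append((bucket, busiest, idlest))

# Hypothetical starting point: db_a carries 90 load units, db_b only 10.
databases = ["db_a", "db_b"]
load = {0: 40, 1: 30, 2: 20, 3: 10}
assignment = {0: "db_a", 1: "db_a", 2: "db_a", 3: "db_b"}
balanced, moves = rebalance(assignment, load, databases)
```

In this example a single move (bucket 0 from `db_a` to `db_b`) brings both databases to 50 load units each.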
24. 24
ScaleBase Provides Optimal Data Distribution Policies
A good data distribution policy ensures that a specific
transaction is directed to a specific database
[Diagram: 1,000 transactions are split evenly across four databases, 250 transactions each.]
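The 1,000-to-250 split can be reproduced with a toy routing function. This Python sketch assumes a numeric shard key and a simple modulo policy, both chosen for illustration; a real distribution policy would be derived from analyzing the actual schema and workload.

```python
from collections import Counter

DATABASES = ["db_a", "db_b", "db_c", "db_d"]  # hypothetical names

def route(shard_key, databases=DATABASES):
    """Direct a transaction to exactly one database based on its shard key."""
    return databases[shard_key % len(databases)]

transactions = range(1000)  # stand-in shard keys for 1,000 transactions
per_database = Counter(route(key) for key in transactions)
```

With evenly spread keys, each of the four databases receives 250 of the 1,000 transactions. A skewed key (for example, one very active customer) would break this balance, which is why the choice of policy matters.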
25. 25
ScaleBase Eliminates Data Silos
When a query needs data from several databases, ScaleBase:
– Runs the query in parallel on all databases
– Aggregates results into one meaningful result-set to be returned to the client: the same result-set that would have been returned from a single DB!
– Including cross-db GROUP BY, ORDER BY, aggregate functions
– Including cross-db JOIN operations
– Enables 2-phase commit for transactions spanning multiple databases
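The parallel-run-then-aggregate pattern described above is often called scatter-gather. Below is a minimal Python sketch of it, using in-memory lists as stand-ins for the databases; the data, names, and merge strategy are assumptions for illustration and say nothing about how ScaleBase implements it.

```python
import heapq
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-database rows: (country, amount). Stand-ins for real shards.
SHARDS = {
    "db_a": [("US", 10), ("FR", 5)],
    "db_b": [("US", 7), ("DE", 3)],
    "db_c": [("FR", 2), ("DE", 8)],
}

def local_group_by(rows):
    """Per-shard partial result: SUM(amount) GROUP BY country."""
    partial = Counter()
    for country, amount in rows:
        partial[country] += amount
    return partial

def scatter_gather_sum(shards):
    """Run the partial aggregation on every shard in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(local_group_by, shards.values()))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the partial sums
    return sorted(merged.items())  # final ORDER BY country

def merged_order_by(shards):
    """Cross-shard ORDER BY amount: sort locally, then merge the sorted runs."""
    sorted_runs = [sorted(rows, key=lambda r: r[1]) for rows in shards.values()]
    return list(heapq.merge(*sorted_runs, key=lambda r: r[1]))

totals = scatter_gather_sum(SHARDS)
ordered = merged_order_by(SHARDS)
```

SUM merges cleanly because partial sums can simply be added. Other aggregates need more care: an average, for instance, has to be rebuilt from per-shard sums and counts rather than by averaging the per-shard averages.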
26. 26
ScaleBase Provides a Fault Tolerant Architecture
[Diagram: application servers, BI, and management tools connect through ScaleBase to Database A, B, C, and D, each paired with its own replica (Replica A to Replica D). Callouts: Fully Redundant; Resilience to failures; Scheduled maintenance without downtime.]
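The database-plus-replica pairing in the diagram suggests a failover path. The Python sketch below shows the simplest form of that idea for a read query; the `Unavailable` exception and the connection callables are hypothetical stand-ins, and a production failover would also have to handle writes, replica promotion, and replication lag.

```python
class Unavailable(Exception):
    """Raised by a hypothetical connection when a server is down."""

def run_with_failover(query, primary, replica):
    """Try the primary; fall back to its replica if the primary is unavailable."""
    try:
        return primary(query)
    except Unavailable:
        return replica(query)

# Stand-in connections for the example.
def offline_primary(query):
    raise Unavailable("primary offline")

def healthy_primary(query):
    return f"primary answered: {query}"

def healthy_replica(query):
    return f"replica answered: {query}"

result = run_with_failover("SELECT 1", offline_primary, healthy_replica)
```

The same fallback is what makes scheduled maintenance possible without downtime: take the primary out of service, and queries continue against the replica.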
30. 30
Detailed Scale Out Case Studies
Large Chip Co
• Scalability
• Multiple Apps
• Multiple growing users
• Availability
• MySQL DB
Solar Edge
• Next Gen Monitoring App
• Massive Scale
• Monitors real time data from thousands of distributed systems
Mozilla
• New Product / Next Gen App / AppStore
• Scalability
• Geo-clustering
AppDynamics
• Next gen APM company
• Scalability for the Netflix implementation
31. 31
ScaleBase Deployment
Environments
– Public Cloud: AWS, Rackspace, any
– Private cloud
– Hosted / on-premise
Databases Supported
– MySQL 5.1, 5.5, 5.6 (under certification)
– AWS RDS MySQL 5.1, 5.5
– MariaDB 10.0 (under certification)
Path to Scale-Out:
1. Data Distribution Policy Analysis
2. Functional Test
3. Load Test
4. Production Migration (safe, online)
32. 32
Summary
ScaleBase provides cost-effective Scale-Out solutions
• Scale to an infinite number of users, data and transactions
• Improve performance
• No application rewrites
• Real-time elasticity
• ACID Compliant
• Expert analysis and simple deployment
• Leverage existing MySQL ecosystem/skills
• Improve database visibility with real-time monitoring
• Simplified, centralized administration
33. 33
Questions (please enter directly into the GTW side panel)
paul.campaniello@scalebase.com
doron.levari@scalebase.com
www.ScaleBase.com
617.630.2800
Additional Resources
http://www.scalebase.com/blog/
http://www.scalebase.com/resources/
@scalebase