This document discusses how data distribution policies for distributed relational database systems (RDBMS) need to change and adapt over time to match evolving application usage patterns, workloads, and business needs. It outlines three stages in a data distribution policy's lifecycle where changes are needed: 1) changing demand and traffic loads, 2) changing application usage, and 3) new product capabilities. The key to adapting is regularly "rebalancing" the distributed database by modifying the data distribution policy using software that separates this logic from the application code.
Distributed RDBMS: Data Distribution Policy: Part 3 - Changing Your Data Distribution Policy
1. Distributed RDBMS
Data Distribution Policy: Part 3
Changing your data distribution policy
October 2014
2. Data Distribution Policy: Part 3
Distributed RDBMSs provide many scalability, availability
and performance advantages.
This presentation takes a deeper look at distributed
RDBMS efficiency over the long haul as application usage
patterns, user requirements, and workloads change.
The presentation discusses:
• Three stages of your data distribution policy’s lifecycle.
• Adapting the distributed RDBMS to match application changes.
• Ensuring that your distributed relational database is flexible and
elastic enough to accommodate endless growth and change.
3. Why Is a Distributed Relational Database Good?
Distributed relational databases are a perfect match for
Cloud computing models and distributed Cloud
infrastructure.
They are the way forward for delivering web-scale
applications while keeping ACID properties.
• Social apps
• Games
• Many concurrent users
• High transaction throughput
• Very large data volumes
4. What Is a Data Distribution Policy? – Recap
A data distribution policy describes the rules under which
data is distributed across a distributed RDBMS
(a virtual database made up of many database instances, or “shards”).
A good data distribution policy aims to:
1. Maintain full relational database integrity
2. Distribute workloads in an even and predictable manner
3. Minimize the number of joins across the array of
database instances
4. Align with workflow and application usage patterns
5. Yield database scalability
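To make the recap concrete, here is a minimal Python sketch of a distribution policy expressed as data. It assumes hash-based placement on a per-table distribution key, plus a set of small “global” tables replicated to every shard; the class, method, table, and column names are hypothetical illustrations, not ScaleBase's API. The `join_is_local` check shows why goal 3 depends on the policy: an equi-join stays on one database instance only when both sides are split on the join column, or one side is replicated everywhere.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class DistributionPolicy:
    """Hypothetical distribution policy: the rules that place rows on shards."""

    num_shards: int
    # table -> column used to split that table across shards
    distribution_keys: dict[str, str]
    # small reference tables copied in full to every shard
    global_tables: set[str] = field(default_factory=set)

    def shards_for(self, table: str, row: dict) -> list[int]:
        """Return the shard(s) that must store this row."""
        if table in self.global_tables:
            return list(range(self.num_shards))
        key = str(row[self.distribution_keys[table]]).encode()
        # crc32 is stable across processes, unlike Python's salted str hash()
        return [zlib.crc32(key) % self.num_shards]

    def join_is_local(self, left: str, left_col: str, right: str, right_col: str) -> bool:
        """True if an equi-join on these columns never needs to cross shards."""
        if left in self.global_tables or right in self.global_tables:
            return True
        # Equal key values hash to the same shard only if both tables
        # are distributed on the columns being joined.
        return (self.distribution_keys.get(left) == left_col
                and self.distribution_keys.get(right) == right_col)
```

Distributing `customers` and `orders` on the same `customer_id` key keeps each customer's orders on that customer's shard. The trade-off is that a query filtering only on some other column, such as `product_id`, has to visit every shard.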
5. “Change, nothing stays the same…”
...shouted ’80s rock band Van
Halen proudly in their song
“Unchained”.
Just as music fashions change,
we know databases must adapt
to follow new usage patterns.
Unexpected influxes of data or
transactions are difficult to
predict. You may have only
anticipated and planned for a
specific amount of growth and
capacity.
But what if you
underestimate
your success?
6. Taking Additional Data into Consideration
Imagine your application is
taking off with great
success. Sounds like good
news, right?
However, it might be hard
on your database, as your
business success can
generate significantly more
transactions, concurrent
users, and data, all of which
must be accommodated.
These types of situations
are hard to predict and
occur on a daily basis.
7. Adapting to Change
If you already have a distributed RDBMS, the original data
distribution policy was (hopefully) created based on specific
application usage patterns and workflows (see Part 2 of this
presentation series).
Over time, application workflows and application usage
patterns can change.
This can lead to database hotspots, bottlenecks, and
database clusters that are overloaded compared to other
clusters.
8. Data Distribution Policy Lifecycle
There are three main situations to accommodate during a
data distribution policy’s lifecycle:
1. Changing Demand and Traffic Loads
2. Changing Application Usage
3. New Product Capabilities
The answer to these changes is typically the same:
“Rebalance” the distributed database
How well a distributed RDBMS technology manages distribution
policy changes across this lifecycle is a key capability to test.
9. Rebalancing the Distributed Database
In the past, changing a data distribution policy has been
hard. Manually rewriting sharding code within the
application was the front line of any change to data
distribution.
Today, software like ScaleBase can accommodate these
changes for you quickly and with minimal disruption
to live systems.
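Rebalancing is easier to reason about with a concrete model. The sketch below assumes a common technique, not necessarily the one ScaleBase uses: keys are hashed into many small buckets, a directory maps each bucket to a shard, and rebalancing moves whole buckets from overloaded shards to lightly loaded ones, including a newly added, empty shard. The function and parameter names are hypothetical.

```python
def rebalance(bucket_to_shard, bucket_load, shards, tolerance=0.10):
    """Greedily move buckets from the most-loaded shard to the least-loaded one.

    bucket_to_shard: current directory, bucket id -> shard name
    bucket_load:     observed load per bucket (e.g. queries/sec), non-negative
    shards:          list of all shard names, including any new, empty shard
    Returns (new_mapping, moves), where moves lists (bucket, source, target).
    """
    mapping = dict(bucket_to_shard)
    moves = []
    target = sum(bucket_load.get(b, 0.0) for b in mapping) / len(shards)

    while True:
        loads = {s: 0.0 for s in shards}
        for bucket, shard in mapping.items():
            loads[shard] += bucket_load.get(bucket, 0.0)
        hot = max(shards, key=loads.__getitem__)
        cold = min(shards, key=loads.__getitem__)
        if loads[hot] <= target * (1 + tolerance):
            break  # every shard is within tolerance of an even spread
        gap = loads[hot] - loads[cold]
        # Moving a non-empty bucket lighter than the gap strictly lowers the
        # sum of squared shard loads, so the loop is guaranteed to end.
        candidates = [
            b for b, s in mapping.items()
            if s == hot and 0 < bucket_load.get(b, 0.0) < gap
        ]
        if not candidates:
            break  # no single bucket move improves the spread
        # Prefer the bucket whose load is closest to half the gap.
        best = min(candidates, key=lambda b: abs(bucket_load[b] - gap / 2))
        mapping[best] = cold
        moves.append((best, hot, cold))

    return mapping, moves
```

The returned `moves` list is effectively a migration plan. In a live system each bucket's rows would also have to be copied and cut over while traffic continues, which this sketch does not attempt.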
10. Scenario 1: Adapting to Changing Demand and Traffic Loads
Data distribution policies should always be designed so
that data that is frequently accessed together is aggregated
into the same database instance (or shard), as this provides
the greatest efficiency and scalability benefits.
Data distributions are built around predicted traffic
(both reads and writes), but actual traffic loads
change.
11. Scenario 1: Adapting to Changing Demand and Traffic Loads – Typical Challenges
1. Be aware of changes in workload patterns and understand
their impact on your distributed relational database.
2. A specific application function’s sudden popularity or
changes in your business environment can lead to usage
spikes and transaction bottlenecks from increased demand
and unexpected transaction patterns.
3. If workloads emerge that the distribution policy was not
optimized for, new and unplanned operations may follow more
costly execution paths (for example, joins across database
instances), resulting in sub-par performance and scalability.
* Automated threshold alerting and various other monitoring
can help you stay ahead of peaks and bottlenecks, so look for
these facilities in any distributed solution you choose.
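A minimal sketch of the kind of threshold alerting mentioned above, assuming you already collect a recent queries-per-second figure for each shard. It flags a shard that runs well above the average of its peers (a possible hotspot) or close to an assumed per-shard capacity; the function name, parameters, and default thresholds are illustrative choices, not values from any particular product.

```python
from statistics import mean


def find_hotspots(shard_qps, ratio=1.5, capacity_qps=None, headroom=0.8):
    """Return {shard: [reasons]} for shards that look like hotspots.

    shard_qps:    recent load per shard, e.g. queries per second
    ratio:        flag a shard whose load exceeds ratio x the mean shard load
    capacity_qps: optional per-shard capacity; flag loads above headroom x capacity
    """
    if not shard_qps:
        return {}
    avg = mean(shard_qps.values())
    alerts = {}
    for shard, qps in sorted(shard_qps.items()):
        reasons = []
        if qps > ratio * avg:
            reasons.append(f"{qps:.0f} qps is over {ratio}x the mean of {avg:.0f}")
        if capacity_qps is not None and qps > headroom * capacity_qps:
            reasons.append(f"{qps:.0f} qps is over {headroom:.0%} of capacity")
        if reasons:
            alerts[shard] = reasons
    return alerts
```

Comparing each shard against its peers catches skew even when total load is modest, while the optional capacity check catches uniform growth that a relative test alone would miss.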
12. Scenario 2: Adapting to Application Usage Changes
Over time, it’s quite common for application usage to
change. When this happens:
1. The system’s new behavior patterns need to be understood
in order to make appropriate changes and optimizations.
2. Adapting to change is typically where do-it-yourself
home-grown sharding fails.
3. Re-writing the custom application code that did the initial
data distribution to provide new data distribution can lead to
errors that are easy to make, hard to uncover, and hard to
recover from.
4. Identifying the distribution policy changes required to
optimally rebalance workloads around new application
usage patterns requires some of the analysis described
in Part 2.
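One simple way to start understanding new behavior patterns is to check how often queries can still be routed to a single shard. The sketch assumes a hypothetical query log already reduced to (table, filter column) pairs; for each table it reports the share of queries that filter on the current distribution key and the column queries now filter on most often. A falling share suggests queries are fanning out to every shard. The top column is only a candidate for a new key, to be weighed against the other goals of the policy, not an automatic answer.

```python
from collections import Counter


def routing_report(query_log, distribution_keys):
    """Per table: (share of queries routable to one shard, most common filter column).

    query_log:         iterable of (table, filter_column) pairs (hypothetical format)
    distribution_keys: table -> current distribution key column
    """
    by_table = {}
    for table, column in query_log:
        by_table.setdefault(table, Counter())[column] += 1

    report = {}
    for table, counts in by_table.items():
        total = sum(counts.values())
        # Queries filtering on the distribution key can go to a single shard.
        hits = counts.get(distribution_keys.get(table), 0)
        top_column, _ = counts.most_common(1)[0]
        report[table] = (hits / total, top_column)
    return report
```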
13. Scenario 3: Adapting to New Product Capabilities
The final challenge is modifying an application to add
new capabilities to your product or service.
1. Updated business requirements may call for new
functions that integrate new solutions with existing systems,
extending the current application and database to
meet the new business needs.
2. Old-fashioned do-it-yourself distribution policy hardcoding
eliminates flexibility and often does not allow changes to be
made, turning an implementation attempt into a very
complicated and daunting task.
14. Scenario 3: Adapting to New Product Capabilities (Continued)
You can’t stop your business while you’re rebalancing the
distributed database with data placement changes. Rewriting
application data redistribution code creates yet another
challenge in implementing a change while keeping existing data
and operations intact.
As a result, in many cases companies have opted to rebuild their
system from scratch instead of attempting to make
modifications.
If data distribution logic is built into the actual application, it can
be very hard to make system modifications on the fly. This is
costly and ultimately results in maintenance nightmares,
performance degradation, and downtime.
This is not good!
15. What Can You Do?
To simplify the management of the data distribution policy
that underlies your distributed RDBMS you MUST make a
strict separation between your application and where the
database distribution policy is defined, managed, and
maintained.
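The separation can be sketched as follows, assuming the policy is stored as a JSON document (the field names `num_buckets` and `bucket_to_shard` are hypothetical) and read by a small router object. Application code depends only on the `ShardRouter` interface, so loading a new policy changes data placement without editing or redeploying the application. This illustrates the principle only; it is not how ScaleBase implements it.

```python
import json
import zlib
from typing import Protocol


class ShardRouter(Protocol):
    def shard_for(self, key: object) -> str: ...


class ConfigRouter:
    """Hypothetical router whose rules live in a JSON policy document."""

    def __init__(self, policy_json: str) -> None:
        self.reload(policy_json)

    def reload(self, policy_json: str) -> None:
        # Swapping the policy document changes placement; app code is untouched.
        policy = json.loads(policy_json)
        self.num_buckets = int(policy["num_buckets"])
        # JSON object keys are strings, so convert bucket ids back to ints.
        self.bucket_to_shard = {int(b): s for b, s in policy["bucket_to_shard"].items()}

    def shard_for(self, key: object) -> str:
        bucket = zlib.crc32(str(key).encode()) % self.num_buckets
        return self.bucket_to_shard[bucket]


def shard_for_customer(router: ShardRouter, customer_id: int) -> str:
    """Application code: asks the router where data lives, never hardcodes it."""
    return router.shard_for(customer_id)
```

Loading a second policy document that remaps buckets changes where `shard_for_customer` sends a query, while the function itself stays the same.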
• If you’re a startup building a new app, or if you have an
existing app that needs to scale for growth, you want to
“hit the road, running!” (again, to quote Van Halen).
ScaleBase software was created to handle changes like
the ones previously mentioned, providing customers with
the peace of mind they need to grow successfully, in any
manner, at any rate and to any scale.
16. ScaleBase Software
• ScaleBase is a distributed database built on MySQL and
optimized for the cloud. It deploys in minutes so your
database can handle an unlimited number of users and
humongous volumes of data while delivering faster transactions.
• It dynamically optimizes workloads and availability by
logically distributing data across public, private, and geo-distributed
clouds.
17. Try ScaleBase Today
ScaleBase software is available for free:
• ScaleBase Website
• Amazon Marketplace
• Rackspace Marketplace
• IBM Cloud marketplace
• ScaleBase’s free online Analysis Genie service
An AWS Marketplace Guide and an AWS Getting Started
Tutorial are available from the documentation section of the
ScaleBase website.
Contact ScaleBase
sales@scalebase.com
18. Data Distribution Policy: Part 1 and 2
Data Distribution Policy Part 1:
• What a data distribution policy is
• The challenges faced when data is distributed via sharding
• What defines a good data distribution policy
• The best way to distribute data for your application and
workload
Data Distribution Policy Part 2:
• The different approaches to data distribution
• How to create your own data distribution policy, whether you
are scaling an existing application or creating a new app.
• How ScaleBase can help you create your policy