Migrating databases with minimal downtime to AWS RDS, Amazon Redshift and Amazon Aurora
Migration of databases to same and different engines and from on premises to cloud
Schema conversion from Oracle and SQL Server to MySQL and Aurora
2. KEY TAKEAWAYS
Migrating Databases
Migrating databases with minimal downtime to AWS RDS, Amazon Redshift and Amazon Aurora
On Premises to Cloud
Migration of databases to same and different engines and from on premises to cloud
Schema Conversion
Schema conversion from Oracle and SQL Server to MySQL and Aurora
3. Traditional Approach = Time, Cost
Commercial tool for migration/replication
Application Downtime
Legacy Schema Objects
4. Introducing AWS RDS Migration Tool
Easy to set up: start a migration in less than 15 minutes
No application downtime during migration
Replicate from EC2 to RDS or vice versa
Move data to the same or a different database engine
Cost-effective, with no upfront cost
6. Amazon RDS Migration Tool consists of a Web-based console and a replication server to replicate data across heterogeneous data sources.
Amazon RDS Migration Tool can execute replication between enterprise databases including Oracle, Microsoft SQL Server, and IBM DB2.
Replication is log based, which means that only the changes are read. This reduces the impact on the source databases.
Amazon RDS Migration Tool can carry out two types of replication: Full Load and Change Processing (CDC).
7. Features
Load data efficiently and quickly to operational data stores/warehouses
Create copies of production databases
Distribute data across databases
Amazon RDS Migration Tool has high throughput, speed, and scale.
Full Load: The full load process creates files or tables at the target database, automatically defines the metadata that is required at the target, and populates the tables with data from the source.
Change Processing (CDC): Change processing captures changes in the source data or metadata as they occur and applies them to the target database as soon as possible, in near real time.
8. Replication
Load reduction: It is recommended that you have a copy of all or of a subset of a collection on a different server to reduce the load on the main server.
Improved service: Users of the copy of the information may get better access to the copy of the data than to the original.
Security considerations: Some users might be allowed access to a subset of the data and only this subset is made available as a replicated copy to those users.
Geographic distribution: The enterprise (for example, a chain of retail stores or warehouses) may be widely distributed and each node uses primarily its own subset of the data (in addition to all of the data being available at a central location for less common use).
Disaster Recovery: A copy of the main data is required for rapid failover (the capability to switch over to a redundant or standby computer server, in case of failure of the main system).
Support the need for implementing "cloud" computing.
9. During replication, a collection of data is copied from system A to system B. A is known as the source (for this collection), B is known as the target. A system can be either a source or a target or even both (within certain restrictions). When a number of sources and targets and data collections are defined, the replication topology can be quite complex.
Integrity: Make sure that the data in the target actually reflects the completed result of a change in the source and not some intermediate invalid result.
Latency: How out-of-date is the copy?
Consistency: Make sure that if the change affects several different tables or rows, the copy reflects a consistent state (all were changed or none).
The first two issues are the responsibility of the replicator. While some latency is unavoidable in any system, a good replicator will aim not to exceed several seconds of latency as a general rule.
10. Replication Tasks
The definition of a task consists of:
Specifying the source and target databases
Specifying the source and target tables to be kept in sync
Specifying the relevant source table columns
Specifying filtering conditions (if any) for each source table, as Boolean predicates on the values of one or
more source columns (the predicates are in SQLite syntax)
Listing the target table columns and (optionally) specifying their data types and values (as expressions or
functions over the values of one or more source or target columns, using SQL syntax). If not specified, the
same column names and values as the source tables are used, with default mapping of the source DBMS
data types onto the target DBMS data types. Amazon RDS Migration Tool automatically takes care of the
required filtering, transformations and computations during the Load or CDC execution.
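The task definition above can be sketched as a small data structure. This is only an illustration, not the tool's actual format: the class names `TableRule` and `ReplicationTask`, the sample `orders` table, and the use of an in-memory SQLite database as a stand-in source are all assumptions made for this example. It shows a filter written as a Boolean predicate in SQLite syntax and target columns defined as expressions over source columns.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TableRule:
    """Hypothetical description of one source-to-target table mapping."""
    source_table: str
    target_table: str
    # target column name -> SQL expression over source columns
    columns: Dict[str, str]
    # optional Boolean predicate on source columns, in SQLite syntax
    row_filter: Optional[str] = None

    def select_sql(self) -> str:
        cols = ", ".join(f"{expr} AS {name}" for name, expr in self.columns.items())
        sql = f"SELECT {cols} FROM {self.source_table}"
        if self.row_filter:
            sql += f" WHERE {self.row_filter}"
        return sql


@dataclass
class ReplicationTask:
    """Hypothetical task: a source, a target, and the tables kept in sync."""
    source: str
    target: str
    rules: List[TableRule] = field(default_factory=list)


# Stand-in source database with a few sample rows (invented data).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 40.0)],
)

rule = TableRule(
    source_table="orders",
    target_table="eu_orders",
    columns={"order_id": "id", "amount_cents": "CAST(amount * 100 AS INTEGER)"},
    row_filter="region = 'EU' AND amount > 5",
)
task = ReplicationTask(source="onprem", target="cloud", rules=[rule])

# Rows that would be sent to the target table for this rule.
rows = src.execute(rule.select_sql()).fetchall()
```

Here only the two EU rows pass the filter, and each is reshaped into the target columns `order_id` and `amount_cents`.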
11. Replication Tasks
The simplest specification of a task may make no mention of the target data, with only the source tables (or
ALL, or a mask) specified. In this case, the target tables are identical to the source tables, using the
default mappings between the source and target DBMS data types. In this way, the entire definition
process could be accomplished by a single click, referred to as "Click to Replicate".
Once a task is defined, it can be activated immediately. The target tables with the necessary metadata
definitions are automatically created and loaded, and the CDC is activated. The replication activity can
then be monitored, stopped, or restarted using the Amazon RDS Migration Console.
12. Full Load & CDC
The full load process creates files or tables at the
target database, automatically defines the metadata
that is required at the target, and populates the tables
with data from the source. Unlike the CDC process,
the data is loaded one entire table or file at a time for
efficiency.
The Load process can be interrupted and, when
restarted, continues from wherever it was stopped.
New tables can be added to an existing target
without reloading the existing tables. Similarly,
columns in previously-populated target tables can be
added or dropped without requiring reloading.
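A minimal sketch of a table-at-a-time load that can be resumed, assuming two SQLite connections stand in for the source and target databases. The function name `full_load` and the `done` checkpoint set are invented for this example, and resuming here works at whole-table granularity, which may be coarser than what the tool does.

```python
import sqlite3


def full_load(src, dst, tables, done):
    """Copy each table whole; skip tables already recorded in `done`."""
    for table in tables:
        if table in done:
            continue  # loaded in an earlier run, do not reload
        # Reuse the source DDL so the target metadata is created automatically.
        ddl = src.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()[0]
        with dst:  # commit once per table, roll back on error
            dst.execute(f"DROP TABLE IF EXISTS {table}")
            dst.execute(ddl)
            rows = src.execute(f"SELECT * FROM {table}").fetchall()
            if rows:
                marks = ", ".join("?" * len(rows[0]))
                dst.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
        done.add(table)


# Invented sample data for the demonstration.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE a (id INTEGER, name TEXT)")
src.execute("CREATE TABLE b (id INTEGER)")
src.executemany("INSERT INTO a VALUES (?, ?)", [(1, "x"), (2, "y")])
src.executemany("INSERT INTO b VALUES (?)", [(10,)])
src.commit()

done = set()
full_load(src, dst, ["a"], done)       # first run loads table a
full_load(src, dst, ["a", "b"], done)  # later run adds b without reloading a

count_a = dst.execute("SELECT COUNT(*) FROM a").fetchone()[0]
count_b = dst.execute("SELECT COUNT(*) FROM b").fetchone()[0]
```

Persisting `done` between runs (for example in a file or a control table) is what would let an interrupted load pick up where it stopped.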
CDC operates by reading the recovery log file of the source
database management system and grouping together the
entries for each transaction. Various techniques are employed
to ensure that this is done in an efficient manner without
seriously impacting the latency of the target data.
The Change Data Capture (CDC) process captures
changes in the source data or metadata as they occur
and applies them to the target database as soon as
possible in near-real-time. The changes are captured
and applied as units of single committed transactions,
and several different target tables can be updated as the
result of a single source commit.
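The grouping of log entries by transaction can be illustrated with a short sketch. The log format (dictionaries with `tx`, `op`, `table`, `key`, and `row` fields) and the function names are assumptions for this example only; a real recovery log is binary and engine specific. Changes are buffered per transaction and applied to the target only when the commit record is seen, so a single commit may update several target tables.

```python
from collections import defaultdict


def apply_change(change, target):
    """Apply one buffered change to an in-memory target (table -> key -> row)."""
    table = target.setdefault(change["table"], {})
    if change["op"] == "delete":
        table.pop(change["key"], None)
    else:  # insert or update
        table[change["key"]] = change["row"]


def apply_committed(log, target):
    """Group log entries by transaction; apply only committed transactions."""
    pending = defaultdict(list)
    for entry in log:
        tx, op = entry["tx"], entry["op"]
        if op == "commit":
            for change in pending.pop(tx, []):
                apply_change(change, target)
        elif op == "rollback":
            pending.pop(tx, None)  # discard uncommitted work
        else:
            pending[tx].append(entry)
    return dict(pending)  # transactions still open at the end of the log


# Invented log with interleaved transactions.
log = [
    {"tx": 1, "op": "insert", "table": "accounts", "key": 1, "row": {"balance": 100}},
    {"tx": 2, "op": "insert", "table": "accounts", "key": 2, "row": {"balance": 50}},
    {"tx": 1, "op": "insert", "table": "audit", "key": 1, "row": {"event": "open"}},
    {"tx": 1, "op": "commit"},
    {"tx": 2, "op": "rollback"},
    {"tx": 3, "op": "update", "table": "accounts", "key": 1, "row": {"balance": 90}},
]

target = {}
still_open = apply_committed(log, target)
```

Transaction 1 reaches the target in both `accounts` and `audit`, transaction 2 is dropped, and transaction 3 stays buffered until its commit appears.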
13. Defining Global Transformation
Use Global Transformations to make similar changes to multiple tables, owners, and columns in the same
task.
You may need to use this option when you want to change the names of all tables. You can change the
names using wild cards and patterns. For example, you may want to change the names of the tables
from account_% to ac_%. This is helpful when replicating data from a Microsoft SQL Server database to
an Oracle database, where Microsoft SQL Server allows table names of up to 128 characters and
Oracle (before release 12.2) allows only 30.
You may also need to change a specific data type in the source to a different data type in the target for
many or all of the tables in the task. Global transformation will accomplish this without having to define a
transformation for each table individually.
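The account_% to ac_% example can be sketched as a pattern-based rename. The helper names `like_to_regex` and `rename` are hypothetical, and the sketch assumes that only % acts as a wildcard (an underscore is treated as a literal character, unlike in SQL LIKE).

```python
import re


def like_to_regex(pattern):
    """Turn a pattern with % wildcards into a regex with one group per %."""
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + "(.*)".join(parts) + "$")


def rename(name, source_pattern, target_pattern):
    """Rename `name` if it matches the source pattern, else return it as is."""
    match = like_to_regex(source_pattern).match(name)
    if match is None:
        return name
    groups = iter(match.groups())
    # Each % in the target pattern is filled with the text matched by the
    # corresponding % in the source pattern.
    return "".join(next(groups, "") if ch == "%" else ch for ch in target_pattern)


# Invented table names for the demonstration.
tables = ["account_payable", "account_receivable", "orders"]
renamed = [rename(t, "account_%", "ac_%") for t in tables]
```

Names that do not match the source pattern, such as `orders`, pass through unchanged.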
14. Global Transformation types
Rename Schema: Select this if you want to change the schema name for multiple tables.
Rename Table: Select this if you want to change the name of multiple tables.
Rename Column: Select this if you want to change the name of multiple columns.
Add Column: Select this if you want to add a column with a similar name to multiple tables.
Drop Column: Select this if you want to drop a column with a similar name from multiple tables.
Convert Data Type: Select this if you want to change a specific data type to a different one across multiple tables.
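As a rough illustration of how some of these transformation types act on many tables at once, the sketch below applies four of them to an in-memory description of a schema. The dictionary layout, the function names, and the sample table and type names are all assumptions for this example and do not reflect the tool's internal model.

```python
def rename_schema(tables, old, new):
    """Rename Schema: change the schema name for every table in `old`."""
    return {((new if s == old else s), t): cols for (s, t), cols in tables.items()}


def add_column(tables, column, data_type):
    """Add Column: add the same column to every table."""
    return {key: {**cols, column: data_type} for key, cols in tables.items()}


def drop_column(tables, column):
    """Drop Column: remove the named column wherever it exists."""
    return {
        key: {c: t for c, t in cols.items() if c != column}
        for key, cols in tables.items()
    }


def convert_data_type(tables, old_type, new_type):
    """Convert Data Type: replace one data type with another across tables."""
    return {
        key: {c: (new_type if t == old_type else t) for c, t in cols.items()}
        for key, cols in tables.items()
    }


# (schema, table) -> {column: data type}; invented sample metadata.
tables = {
    ("dbo", "account_eu"): {"id": "INT", "note": "NTEXT", "legacy_flag": "BIT"},
    ("dbo", "account_us"): {"id": "INT", "note": "NTEXT"},
}

result = rename_schema(tables, "dbo", "sales")
result = add_column(result, "loaded_at", "TIMESTAMP")
result = drop_column(result, "legacy_flag")
result = convert_data_type(result, "NTEXT", "CLOB")
```

Each function returns a new mapping rather than modifying its input, so the transformations can be chained in any order and the original metadata stays available for comparison.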