© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Data on AWS: How to Choose the Right Database and Data Storage
Simon Lee,
Business Development Manager
simonhl@amazon.com
Data is a strategic asset for every organization
The world's most valuable resource is data*
*Copyright: The Economist, 2017, David Parkins
The move toward data-centric companies
[Chart: five largest companies by market cap in 2001, 2006, 2011, 2016, and 2018; values range from $228B to $1.091T]
What is a data-centric company?
What do we sell?
How do we make money?
Thinking about data as an asset, not a cost
Stop throwing data away
Make it available to more users
Arm users with more data processing technologies
There is more data than people think
Data grows >10x every 5 years
Data platforms need to scale 1,000x and live for 15 years
There are more ways to analyze data than ever before
[Timeline: Hadoop, Elasticsearch, Presto, and Spark, which didn't exist 11, 8, 5, and 4 years ago, respectively]
Democratization of data
Governance and control
There are more people working with data than ever before
How do I provide democratized access to data that enables informed decisions, while at the same time enforcing data governance and preventing mismanagement of the data?
How do we build new types of applications that can leverage this data?
Modern apps create new requirements
Users: 1 million+
Data volume: TB–PB–EB
Locality: Global
Performance: Milliseconds–microseconds
Request rate: Millions
Access: Web, mobile, IoT, devices
Scale: Up-down, Out-in
Economics: Pay for what you use
Developer access: No assembly required
Examples: social media, ride hailing, media streaming, dating
As application requirements change, data processing engines need to evolve as well
On Prime Day, DynamoDB requests from Alexa, the Amazon.com sites, and the Amazon fulfillment centers totaled 3.34 trillion, peaking at 12.9 million per second
Databases need to provide reliable performance under highly variable demand and deliver consistent, single-digit millisecond response times at any scale.
Common data categories and use cases
Category | Characteristics | Use cases
Relational | Referential integrity, ACID transactions, schema-on-write | Lift and shift, ERP, CRM, finance
Key-value | High-throughput, low-latency reads and writes at virtually unlimited scale | Real-time bidding, shopping cart, social, product catalog, customer preferences
Document | Store documents and query them quickly on any attribute | Content management, personalization, mobile
In-memory | Query by key with microsecond latency | Leaderboards, real-time analytics, caching
Graph | Quickly and easily create and navigate relationships between data | Fraud detection, social networking, recommendation engines
Time-series | Collect, store, and process data sequenced by time | IoT applications, event tracking
Ledger | Complete, immutable, and verifiable history of all changes to application data | Systems of record, supply chain, health care, registrations, financial
AWS purpose-built databases
Relational: Amazon RDS (Aurora, commercial, and community engines)
Key-value: DynamoDB
Document: DocumentDB
In-memory: ElastiCache
Graph: Neptune
Time-series: Timestream
Ledger: QLDB
Amazon Aurora
MySQL- and PostgreSQL-compatible relational database built for the cloud
Performance and availability of commercial-grade databases at 1/10th the cost
Performance and scalability: 5x the throughput of standard MySQL and 3x that of standard PostgreSQL; scale out with up to 15 read replicas
Availability and durability: fault-tolerant, self-healing storage; six copies of data across three Availability Zones; continuous backup to Amazon S3
Highly secure: network isolation; encryption at rest and in transit
Fully managed: managed by RDS, with no hardware provisioning, software patching, setup, configuration, or backups to handle
Amazon Relational Database Service (RDS)
Managed relational database service with a choice of six popular database engines
Easy to administer: no need to provision infrastructure or to install and maintain database software
Available and durable: automatic Multi-AZ data replication; automated backups, snapshots, and failover
Highly scalable: scale database compute and storage with a few clicks and no application downtime
Fast and secure: SSD storage and guaranteed provisioned I/O; data encryption at rest and in transit
Amazon DynamoDB
Fast and flexible key-value database service for any scale
Comprehensive security: encrypts all data by default and fully integrates with AWS Identity and Access Management (IAM) for robust security
Performance at scale: consistent, single-digit millisecond response times at any scale; build applications with virtually unlimited throughput
Global database for global users and apps: build global applications with fast access to local data by replicating tables across multiple AWS Regions
Serverless: no server provisioning, software patching, or upgrades; scales up or down automatically; continuously backs up your data
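To make the key-value model concrete, here is a minimal in-memory sketch of the composite-key layout (a partition key plus a sort key) commonly used with DynamoDB, applied to the shopping-cart use case listed among the data categories. It is plain Python, not the DynamoDB API; the class name `KeyValueTable` and the `CART#`/`ITEM#` key prefixes are hypothetical.

```python
from bisect import insort
from typing import Any, Dict, List, Optional, Tuple


class KeyValueTable:
    """Toy table keyed by (partition key, sort key), held in memory.

    Sort keys are kept ordered within each partition, so a query against a
    single partition returns a contiguous, sorted range of items.
    """

    def __init__(self) -> None:
        self._sort_keys: Dict[str, List[str]] = {}  # partition key -> sorted sort keys
        self._items: Dict[Tuple[str, str], Dict[str, Any]] = {}  # full key -> attributes

    def put(self, pk: str, sk: str, item: Dict[str, Any]) -> None:
        # Insert or overwrite the item stored under (pk, sk).
        if (pk, sk) not in self._items:
            insort(self._sort_keys.setdefault(pk, []), sk)
        self._items[(pk, sk)] = dict(item)

    def get(self, pk: str, sk: str) -> Optional[Dict[str, Any]]:
        # Point lookup by the full primary key; None if absent.
        return self._items.get((pk, sk))

    def query(self, pk: str, sk_prefix: str = "") -> List[Dict[str, Any]]:
        # Items in one partition whose sort key starts with sk_prefix, in order.
        return [
            self._items[(pk, sk)]
            for sk in self._sort_keys.get(pk, [])
            if sk.startswith(sk_prefix)
        ]


# Shopping-cart layout: one partition per cart, one item per product (hypothetical keys).
cart = KeyValueTable()
cart.put("CART#alice", "ITEM#sku-002", {"qty": 1})
cart.put("CART#alice", "ITEM#sku-001", {"qty": 2})
cart.put("CART#bob", "ITEM#sku-001", {"qty": 5})
print(cart.query("CART#alice", "ITEM#"))  # [{'qty': 2}, {'qty': 1}]
```

Grouping all of a cart's items under one partition key means reading a whole cart is a single ordered range read rather than many point lookups, which is the access pattern the key-value model is built around.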
Amazon DocumentDB
Fast, scalable, highly available, fully managed MongoDB-compatible database service
Fully managed: managed by AWS, with no hardware provisioning, software patching, setup, configuration, or backups to handle
Fast: millions of requests per second with millisecond latency
MongoDB-compatible: compatible with MongoDB Community Edition 3.6, so you can use the same drivers and tools
Reliable: six replicas of your data across three AZs, with full backup and restore
Amazon ElastiCache
Redis- and Memcached-compatible in-memory data store and cache
Secure and reliable: network isolation; encryption at rest and in transit; HIPAA, PCI, and FedRAMP; Multi-AZ with automatic failover
Redis and Memcached compatible: fully compatible with open source Redis and Memcached
Easily scalable: scale writes and reads with sharding and replicas
Extreme performance: in-memory data store and cache for microsecond response times
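Caching is the most common use of an in-memory store. The sketch below shows the cache-aside (lazy-loading) pattern with a time-to-live, using a Python dict as a stand-in for Redis or Memcached so it runs without a cluster. `CacheAside` and `load_profile` are hypothetical names; a real deployment would swap the dict for calls through a Redis or Memcached client.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside (lazy loading) with a per-entry time-to-live.

    A dict stands in for Redis or Memcached so the sketch runs anywhere.
    """

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader    # slow source of truth, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be simulated
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read from the source, then populate the cache.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        # Drop the entry after a write so the next read reloads fresh data.
        self._entries.pop(key, None)


fake_now = [0.0]   # simulated clock, in seconds
db_reads = []      # records each trip to the hypothetical database


def load_profile(user_id: str) -> dict:
    db_reads.append(user_id)
    return {"id": user_id}


cache = CacheAside(load_profile, ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("u1")   # miss: loads from the database
cache.get("u1")   # hit: served from the cache
fake_now[0] = 61.0
cache.get("u1")   # entry expired: loads again
print(cache.hits, cache.misses)  # 1 2
```

The TTL bounds how stale a cached value can get, while `invalidate` lets writers clear an entry immediately; the pair is a common compromise between freshness and the microsecond reads an in-memory store offers.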
Amazon Neptune
Fully managed graph database
Easy: build powerful queries easily with Gremlin and SPARQL
Fast: query billions of relationships with millisecond latency
Open: supports Apache TinkerPop and W3C RDF graph models
Reliable: six replicas of your data across three AZs, with full backup and restore
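Graph use cases such as social networking and recommendation engines mostly come down to multi-hop traversals. As a plain-Python illustration, the function below ranks "friends of friends" by the number of mutual connections over an adjacency map. In Neptune the same idea would be written as a Gremlin or SPARQL query, which this sketch does not attempt to reproduce; the function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend_friends(graph: Dict[str, Set[str]], user: str, k: int = 3) -> List[str]:
    """Rank people two hops away from `user` by how many mutual friends they share."""
    direct = graph.get(user, set())
    mutual_counts: Counter = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user and people they already know.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically for stable output.
    ranked = sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Undirected "knows" edges stored as an adjacency map.
edges = [("ana", "ben"), ("ana", "cai"), ("ben", "dev"),
         ("cai", "dev"), ("cai", "eli"), ("ben", "fay")]
graph: Dict[str, Set[str]] = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(recommend_friends(graph, "ana"))  # ['dev', 'eli', 'fay']
```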
Amazon Timestream (preview)
Fast, scalable, fully managed time-series database
1,000x faster and 1/10th the cost of relational databases: collect data at the rate of millions of inserts per second (10M/second)
Trillions of daily events: an adaptive query processing engine maintains steady, predictable performance
Time-series analytics: built-in functions for interpolation, smoothing, and approximation
Serverless: automated setup, configuration, server provisioning, and software patching
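Interpolation over time-bucketed data is one of the built-in functions mentioned above. As a rough plain-Python sketch (not Timestream's query language), the code below averages raw readings into one-minute buckets and linearly interpolates any bucket that received no data. The bucket width, function names, and sample readings are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def bucket_means(readings: List[Tuple[int, float]], width: int) -> Dict[int, float]:
    """Average (timestamp, value) readings into fixed-width time buckets."""
    groups: Dict[int, List[float]] = {}
    for ts, value in readings:
        start = ts - ts % width  # start of the bucket this reading falls in
        groups.setdefault(start, []).append(value)
    return {start: sum(vals) / len(vals) for start, vals in groups.items()}


def fill_gaps(buckets: Dict[int, float], width: int) -> List[Tuple[int, float]]:
    """Return every bucket in order, linearly interpolating empty ones."""
    starts = sorted(buckets)
    series: List[Tuple[int, float]] = []
    for left, right in zip(starts, starts[1:]):
        series.append((left, buckets[left]))
        for t in range(left + width, right, width):
            frac = (t - left) / (right - left)
            series.append((t, buckets[left] + frac * (buckets[right] - buckets[left])))
    if starts:
        series.append((starts[-1], buckets[starts[-1]]))
    return series


# Temperature readings as (seconds, value); nothing arrives in the 120-179 s bucket.
readings = [(0, 20.0), (30, 22.0), (60, 24.0), (180, 30.0)]
print(fill_gaps(bucket_means(readings, 60), 60))
# [(0, 21.0), (60, 24.0), (120, 27.0), (180, 30.0)]
```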
Amazon Quantum Ledger Database (QLDB)
Fully managed ledger database
Track and verify the history of all changes made to your application's data
Immutable: maintains a sequenced record of all changes to your data, which cannot be deleted or modified; you can query and analyze the full history
Cryptographically verifiable: uses cryptography to generate a secure output file of your data's history
Easy to use: lets you use familiar database capabilities, such as SQL APIs, to query the data
Highly scalable: executes 2–3x as many transactions as ledgers in common blockchain frameworks
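The "cryptographically verifiable" property rests on hash chaining: each journal entry includes a hash of the previous entry, so altering any past entry invalidates every hash after it. The sketch below shows that general idea with SHA-256 from Python's standard library. It is not QLDB's implementation, whose journal and digest-verification mechanism differs in detail; the `Journal` class and the vehicle-registration records are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def _digest(prev_hash: str, data: Dict[str, Any]) -> str:
    # Canonical JSON so the same data always hashes the same way.
    payload = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class Journal:
    """Append-only journal in which each entry commits to the one before it."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, data: Dict[str, Any]) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        h = _digest(prev, data)
        self.entries.append({"data": data, "prev": prev, "hash": h})
        return h

    def verify(self) -> bool:
        # Recompute every hash from the start; any edit to past data breaks the chain.
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or _digest(prev, entry["data"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


journal = Journal()
journal.append({"vin": "TEST123", "owner": "Alice"})
journal.append({"vin": "TEST123", "owner": "Bob"})
print(journal.verify())                          # True
journal.entries[0]["data"]["owner"] = "Mallory"  # tamper with history
print(journal.verify())                          # False
```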
AWS Database Migration Service (AWS DMS)
Migrating databases to AWS
Migrate between on-premises and AWS
Migrate between databases
Automated schema conversion
Data replication for zero-downtime migration
100,000+ databases migrated
Customers are moving to AWS Databases
Verizon is migrating over 1,000 business-critical applications and database backend systems to AWS, several of
which also include the migration of production databases to Amazon Aurora.
Wappa migrated from their Oracle database to Amazon Aurora and improved their reporting time per
user by 75 percent.
Trimble migrated their Oracle databases to Amazon RDS and project they will pay about one quarter of what they paid when managing their private infrastructure.
Intuit migrated from Microsoft SQL Server to Amazon Redshift to reduce data-processing timelines and get
insights to decision makers faster and more frequently.
Equinox Fitness migrated its Teradata on-premises data warehouse to Amazon Redshift. They went from static
reports to a modern data lake that delivers dynamic reports.
By December 2018, Amazon.com had moved 88% of their Oracle databases (and 97% of critical-system databases) to Amazon Aurora and Amazon DynamoDB. They also migrated their 50 PB Oracle data warehouse to AWS (Amazon S3, Amazon Redshift, and Amazon EMR).
Samsung Electronics migrated their Cassandra clusters to Amazon DynamoDB for their Samsung Cloud
workload with 70% cost savings.
Equinox Fitness Clubs is a company with integrated luxury and lifestyle offerings centered on movement, nutrition, and regeneration. Equinox built connected experiences using applications that connect to Apple Health, and built data collection into their exercise equipment.
More than 200 locations in major cities around the world
Many lines of business across 98 clubs and 200+ studios in total, plus central supporting functions:
Digital products, CRM, marketing, creative, development/building, finance, member services, maintenance, personal training, Pilates, spa, group fitness, membership/sales, retail, food services
Digital products
End-user applications
Connections to Apple Health
Connected
equipment
Pursuit (gamified cycling experience)
Cardio
Digital assessment
Location tracking
Connected tech
Data lake architecture
[Diagram: data and analytics apps, Equinox apps, and third-party apps; components shown include Informatica, Maximilian, Amazon EMR, PT App, Pursuit, Engage, Exact Target, Adobe Social, MOSO, Fitness Agg., and Amazon Redshift]
The assembled pipeline
[Diagram: Adobe Analytics, Amazon EMR, Amazon S3, Amazon Athena, AWS Glue Data Catalog, and Amazon Redshift Spectrum]
Results
Re-platformed and productionized 2 apps in 4 months
Finished the re-platform in under a year
Dependability: very few operational issues
Faster time-to-benefit via automated regression testing
Huge cost savings over Teradata
Reduced time-to-benefit and increased end-user productivity
We need to rethink what we mean by data and analytics
This is data
This is data
This is data
Skip the trip. One-hour delivery, exclusively for Amazon Prime members
Data can be used to connect more deeply with your customer base
Reporting, analysis, modeling, and planning are not going away
Why data lakes?
Data lakes provide:
Relational and non-relational data
Scale-out to exabytes
Diverse set of analytics and machine learning tools
Work on data in place, without moving it
Designed for low-cost storage and analytics
[Diagram: OLTP, ERP, CRM, and LOB systems feed a data warehouse used for business intelligence; devices, web, sensors, and social sources feed a cataloged data lake used for machine learning, DW queries, big data processing, and interactive and real-time analytics]
Typical steps of building a data lake
1. Set up storage
2. Move data
3. Cleanse, prep, and catalog data
4. Configure and enforce security and compliance policies
5. Make data available for analytics
Amazon S3 is the base: data lake storage
Secure, highly scalable, durable object storage with millisecond latency for data access
Store any type of data (from websites, mobile apps, corporate applications, and IoT sensors) at any scale
Store data in the format you want:
Unstructured (logs, dump files) | semi-structured (JSON, XML) | structured (CSV, Parquet)
Storage lifecycle integration:
Amazon S3 Standard | Amazon S3 Standard-Infrequent Access | Amazon Glacier
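Storage lifecycle integration is configured through lifecycle rules. The sketch below builds one such rule as the Python dict that boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument, moving objects under a `raw/` prefix to S3 Standard-IA after 30 days and to Amazon Glacier after 90. The prefix, day counts, function name, and bucket name are assumptions, and the actual API call is left as a comment so the example runs without AWS credentials.

```python
from typing import Any, Dict


def lifecycle_config(prefix: str, ia_days: int = 30,
                     glacier_days: int = 90) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration that tiers objects under `prefix`."""
    if ia_days < 30:
        # S3 requires objects to stay at least 30 days in S3 Standard
        # before a lifecycle transition to S3 Standard-IA.
        raise ValueError("ia_days must be at least 30")
    if glacier_days <= ia_days:
        # This sketch only checks ordering; S3 may impose further minimums.
        raise ValueError("glacier_days must come after ia_days")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = lifecycle_config("raw/")
# To apply it (requires AWS credentials; the bucket name is hypothetical):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-data-lake", LifecycleConfiguration=config)
print(config["Rules"][0]["Transitions"][1])  # {'Days': 90, 'StorageClass': 'GLACIER'}
```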
AWS Lake Formation
Build, secure, and manage a data lake in days
Build a data lake in days, not months: build and deploy a fully managed data lake with a few clicks
Enforce security policies across multiple services: centrally define security, governance, and auditing policies in one place, and enforce those policies for all users and all applications
Combine different analytics approaches: empower analyst and data scientist productivity by giving them self-service discovery and safe access to all data from a single catalog
How it works
Data lakes and analytics on AWS
[Diagram: sources (OLTP, ERP, CRM, LOB, devices, web, sensors, social, and Kinesis) feed an Amazon S3 data lake secured with IAM and KMS and described by a data catalog; analytics services shown are Athena, Amazon Redshift, AI services, Amazon EMR, and Amazon QuickSight]
Build data lakes quickly
• Identify, crawl, and catalog sources
• Ingest and clean data
• Transform into optimal formats
Simplify security management
• Enforce encryption
• Define access policies
• Implement audit logging
Enable self-service and combined analytics
• Analysts discover all data available for analysis from a single data catalog
• Use multiple analytics tools over the same data
Storing is not enough; data needs to be discoverable
“Dark data are the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”
Gartner IT Glossary, 2018
https://www.gartner.com/it-glossary/dark-data
[Diagram: sources of dark data, including CRM, ERP, data warehouse, mainframe data, web, social, log files, and machine data, spanning semi-structured and unstructured formats]
Use AWS Glue to cleanse, prep, and catalog
AWS Glue Data Catalog: a single view across your data lake
Automatically discovers data and stores schema
Makes data searchable and available for ETL
Contains table definitions and custom metadata
Use AWS Glue ETL jobs to cleanse, transform, and store processed data
Serverless Apache Spark environment
Use Glue ETL libraries or bring your own code
Write code in Python or Scala
Call any AWS API using the AWS SDK for Python (Boto3)
[Diagram: crawlers populate the AWS Glue Data Catalog from Amazon S3 raw, staging, and processed data]
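A crawler that "automatically discovers data and stores schema" essentially samples records and infers column types. The toy function below does that for newline-delimited JSON and widens conflicting types (integer plus floating point becomes double; any other conflict becomes string). It is a plain-Python illustration of the idea, not how AWS Glue crawlers are implemented; the Hive-style type names, function names, and sample records are assumptions.

```python
import json
from typing import Any, Dict, Iterable


def _type_name(value: Any) -> str:
    # Check bool before int: in Python, bool is a subclass of int.
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "struct"
    if isinstance(value, list):
        return "array"
    return "string"


def _merge(a: str, b: str) -> str:
    # Widen two observed types into one column type.
    if a == b or b == "null":
        return a
    if a == "null":
        return b
    if {a, b} == {"bigint", "double"}:
        return "double"
    return "string"  # incompatible types fall back to string


def infer_schema(lines: Iterable[str]) -> Dict[str, str]:
    """Infer a flat column -> type mapping from newline-delimited JSON records."""
    schema: Dict[str, str] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        for column, value in json.loads(line).items():
            schema[column] = _merge(schema.get(column, "null"), _type_name(value))
    return schema


raw = [
    '{"member_id": 1, "club": "NYC", "minutes": 45}',
    '{"member_id": 2, "club": "LA", "minutes": 30.5, "class": null}',
    '{"member_id": 3, "club": "SF", "class": "Pilates"}',
]
print(infer_schema(raw))
# {'member_id': 'bigint', 'club': 'string', 'minutes': 'double', 'class': 'string'}
```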
Fortnite | 125+ million players
CHALLENGE
Need to create a constant feedback loop for designers
Gain an up-to-the-minute understanding of gamer satisfaction to keep players engaged in what has become the most popular game played in the world
Epic Games uses data lakes and analytics
Entire analytics platform running on AWS
S3 leveraged as a data lake
All telemetry data is collected with Kinesis
Real-time analytics done through Spark on EMR, with DynamoDB used to create scoreboards and serve real-time queries
Amazon EMR used for large batch data processing
Game designers use data to inform their decisions
[Diagram: game clients, game servers, the launcher, and game services send telemetry through Kinesis. Near-real-time pipelines use Spark on EMR and DynamoDB to feed Grafana, a scoreboards API, limited raw data (real-time ad hoc SQL), and user ETL (metric definition). Batch pipelines run ETL on EMR over the S3 data lake, which also draws on APIs, databases, S3, and other sources, and serve Tableau/BI and ad hoc SQL]
Data has power
AWS databases and analytics: there's a lot more!
Broad and deep portfolio, built for builders
Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server), Aurora (MySQL, PostgreSQL), DynamoDB (key-value, document), ElastiCache (Redis, Memcached), Neptune (graph), Timestream (time series), QLDB (ledger database), RDS on VMware
Analytics: Amazon Redshift (data warehousing), Amazon EMR (Hadoop + Spark), Athena (interactive analytics), Kinesis Analytics (real-time), Amazon Elasticsearch Service (operational analytics)
Data lake: S3/Amazon Glacier, AWS Glue (ETL and data catalog), Lake Formation (data lakes)
Blockchain: Managed Blockchain, Blockchain Templates
Business intelligence and machine learning: Amazon QuickSight, Amazon SageMaker, Amazon Comprehend, Amazon Rekognition, Amazon Lex, Amazon Transcribe, AWS DeepLens
Data movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Data Pipeline | Direct Connect
AWS Marketplace: 730+ database solutions, 600+ analytics solutions, 25+ blockchain solutions, 20+ data lake solutions; the slide also shows 250+ and 30+ solutions for other categories
Most startup database & analytics cloud customers
Most enterprise database & analytics cloud customers