5. AWS plugin for Eclipse
https://aws.amazon.com/documentation/awstoolkiteclipse/
6. 3rd party plugins for Intellij IDEA
• AWS Elastic Beanstalk Integration
https://plugins.jetbrains.com/plugin/7274-aws-elastic-beanstalk-integration
• AWS CloudFormation
https://plugins.jetbrains.com/plugin/7371-aws-cloudformation
• AWS Manager – almost 2 years old :-/
https://plugins.jetbrains.com/plugin/4558-aws-manager
7. Managing credentials
• Please do not hardcode them in your application
• Please do not store them on EC2 instances
• It WILL end in tears
• AWS credentials: use IAM Roles
• Backend credentials: use the EC2 Parameter Store
• Automatic encryption with Amazon KMS
https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/index.html?com/amazonaws/auth/AWSCredentialsProvider.html
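The IAM-role advice above works because the SDK's default credentials provider chain looks for credentials in a fixed order: environment variables, then the shared credentials file, then the EC2 instance profile (the IAM role). A stdlib-only Python sketch of that lookup order; the order matches the SDK documentation, but the code itself is illustrative, not the SDK:

```python
import os

def resolve_credentials(env=os.environ, profile_file_creds=None, instance_role_creds=None):
    """Illustrative lookup order used by the AWS SDK's default chain:
    1. Environment variables (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY)
    2. Shared credentials file (~/.aws/credentials)
    3. EC2 instance profile (IAM role) via the instance metadata service
    """
    key, secret = env.get("AWS_ACCESS_KEY_ID"), env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return ("env", key, secret)
    if profile_file_creds:  # parsed from ~/.aws/credentials by the real SDK
        return ("shared-file",) + profile_file_creds
    if instance_role_creds:  # fetched from the metadata service on EC2 by the real SDK
        return ("iam-role",) + instance_role_creds
    raise RuntimeError("No credentials found -- never fall back to hardcoding them")
```

On an EC2 instance with an IAM role attached, the first two steps find nothing and the role's temporary credentials are used automatically, which is exactly why nothing needs to be stored on the instance.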
8. One backend to rule them all?
… and in darkness bind them
15. Amazon DynamoDB
• Key-value, document, and object store
• Fully managed NoSQL
• Scales to any workload
• Fast and consistent
• Access control
• Event-driven programming
https://aws.amazon.com/dynamodb/
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
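DynamoDB addresses every item by a partition (hash) key, optionally combined with a sort (range) key; a query returns all items sharing a partition key, ordered by sort key. A toy in-memory sketch of that access pattern (illustrative only; the class and method names are made up and this is not the DynamoDB API):

```python
class ToyKeyValueTable:
    """Items addressed by (partition_key, sort_key), like a DynamoDB table."""
    def __init__(self):
        self._items = {}  # {partition_key: {sort_key: item}}

    def put_item(self, pk, sk, item):
        self._items.setdefault(pk, {})[sk] = item

    def get_item(self, pk, sk):
        return self._items.get(pk, {}).get(sk)

    def query(self, pk):
        # All items sharing a partition key, ordered by sort key
        return [v for _, v in sorted(self._items.get(pk, {}).items())]

t = ToyKeyValueTable()
t.put_item("user#42", "order#2017-01-02", {"total": 10})
t.put_item("user#42", "order#2017-01-01", {"total": 5})
```

Designing the partition key so that traffic spreads evenly across keys is what lets this model scale to any workload.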
16. Case Study – Expedia
Expedia is a leader in the $1 trillion travel industry, with an extensive portfolio that includes some of the world’s most trusted travel brands.
“With DynamoDB we were up and running in less than a day, and there is no need for a team to maintain it.”
— Kuldeep Chowhan, Principal Engineer, Expedia
• Expedia’s real-time analytics application collects data for its “test & learn” experiments on Expedia sites.
• The analytics application processes ~200 million messages daily.
• Ease of setup, monitoring, and scaling were key factors in choosing DynamoDB.
https://aws.amazon.com/fr/blogs/big-data/how-expedia-implemented-near-real-time-analysis-of-interdependent-datasets/
19. Amazon Elastic Map Reduce (EMR)
• Apache Hadoop, Spark, Hive, etc.
• Managed service
• Easy to start, resize & terminate clusters
• Cost-efficient, especially with Spot Instances
• Integration with backends
https://aws.amazon.com/emr/
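The Hadoop-style jobs an EMR cluster runs all follow the same map → shuffle → reduce flow. A stdlib-only word-count sketch of that flow (illustrative; a real job would be distributed across the cluster by Hadoop or Spark, not run in-process like this):

```python
from collections import defaultdict

def map_phase(lines):
    # map: emit a (word, 1) pair for every word
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # shuffle: group emitted values by key
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    # reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

counts = reduce_phase(shuffle(map_phase(["to be or not to be"])))
```

Because each phase only depends on its input partition, the same three steps parallelize across as many nodes as the cluster has, which is what makes resizing an EMR cluster useful.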
20. Case study – FINRA
FINRA, the primary regulatory agency for stock brokers in
the US, uses AWS extensively in their IT operations and
has migrated key portions of its technology stack to AWS
including Market Surveillance and Member Regulation.
For market surveillance, each night FINRA loads
approximately 35 billion rows of data into Amazon S3 and
Amazon EMR (up to 10,000 nodes) to monitor trading
activity on exchanges and market centers in the US.
https://aws.amazon.com/solutions/case-studies/finra/
22. Amazon Athena
• Run read-only SQL queries on S3 data
• No data loading, no indexing, no nothing
• No infrastructure to create, manage or scale
• Service based on Presto
• Table creation: Apache Hive Data Definition Language
• ANSI SQL operators and functions: what Presto
supports, with a few exceptions
AWS re:Invent 2016: Introducing Athena (BDA303) https://www.youtube.com/watch?v=DxAuj_Ky5aw
https://prestodb.io
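"No data loading, no indexing" works because Athena applies a schema at query time (schema on read): the data stays in S3 and a Hive DDL table definition simply maps columns onto it. A toy stdlib sketch of the same idea, declaring a schema over raw CSV text and querying it on the spot (illustrative only; Athena itself is built on Presto and Hive DDL, not SQLite):

```python
import csv
import io
import sqlite3

raw = "2017-01-01,EU,12\n2017-01-02,US,7\n"   # stands in for a CSV object in S3

# "DDL": declare a schema over the raw data -- no ETL, no indexing beforehand
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (day TEXT, region TEXT, hits INTEGER)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)",
                 csv.reader(io.StringIO(raw)))

# read-only SQL straight over the data
total = conn.execute("SELECT SUM(hits) FROM logs WHERE region = 'EU'").fetchone()[0]
```

The point of the sketch: the schema lives in the table definition, not in the data, so redefining the table is cheap and the underlying objects never move.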
24. Case Study – DataXu
[Architecture diagram: CDN, Real-Time Bidding, and Retargeting Platform feed Kinesis and S3; Data Pipeline drives ETL (Spark SQL) and Attribution & ML; Reporting and Data Visualization run on Amazon Athena]
180 TB of log data per day
https://aws.amazon.com/solutions/case-studies/dataxu/
26. Amazon Redshift
• Relational data warehouse
• SQL is all you need to know
• Fully managed
• Massively parallel
• Petabyte scale
• As low as $1,000/TB/year
• Athena-like capabilities with Redshift Spectrum
https://aws.amazon.com/redshift
http://www.allthingsdistributed.com/2012/11/amazon-redshift.html
Intro to Amazon Redshift Spectrum: https://www.youtube.com/watch?v=gchd2sDhSuY
27. What customers say about Amazon Redshift
“Redshift is twenty times faster than Hive” (5x–20x reduction in query times) link
“Queries that used to take hours came back in seconds. Our analysts are orders of magnitude more productive.” (20x–40x reduction in query times) link
“…[Redshift] performance has blown away everyone here (we generally see 50-100x speedup over Hive).” link
“Team played with Redshift today and concluded it is ****** awesome. Un-indexed complex queries returning in < 10s.”
“Did I mention it's ridiculously fast? We'll be using it immediately to provide our analysts an alternative to Hadoop.”
“We regularly process multibillion-row datasets and we do that in a matter of hours.” link
32. AWS Schema Conversion Tool
• Converts your tables, views, stored procedures, and data manipulation language to RDS or Amazon Redshift
• Highlights where manual edits are needed
https://aws.amazon.com/dms/
33. AWS Database Migration Service
• Move data to the same or a different database engine
• Move data to Redshift, DynamoDB, or S3
• Keep your apps running during the migration
• Start your first migration in 10 minutes or less
• Replicate within, to, or from Amazon EC2 or RDS
https://aws.amazon.com/dms/
http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Sources.html
http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Targets.html
https://aws.amazon.com/blogs/database/database-migration-what-do-you-need-to-know-before-you-start/
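Keeping apps running during a migration works because DMS does an initial full load of existing data and then continuously captures changes on the source and applies them to the target. A toy sketch of that two-phase flow (illustrative only; real DMS change capture reads the database's transaction log, and these function names are made up):

```python
def full_load(source):
    # phase 1: initial copy of all existing rows
    return dict(source)

def apply_changes(target, change_log):
    # phase 2: apply captured inserts/updates/deletes in commit order
    for op, key, value in change_log:
        if op == "delete":
            target.pop(key, None)
        else:  # "insert" or "update"
            target[key] = value
    return target

source = {1: "alice", 2: "bob"}
target = full_load(source)                                  # full load
changes = [("insert", 3, "carol"), ("delete", 2, None)]     # captured while apps keep writing
apply_changes(target, changes)                              # ongoing replication
```

Because phase 2 can run indefinitely, you cut over to the target whenever it is convenient, not when the copy happens to finish.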
35. AWS is a rich and lively environment for Java platforms
Your choice of backends: relational, NoSQL, Big Data, analytics
The tools you need, with less or no infrastructure drama
Built-in high availability, scalability, security & compliance
Focus on creativity and productivity, not on plumbing
Walk through what is on the slide
Quick overview of what RDS is. This is a deep dive session so we want to use most of our time to go deeper into the service.
Security is a top priority at AWS as well as with many of our customers, so we will take some time to talk more about what security features can be implemented on RDS.
We’ll spend some time on Metrics and Monitoring and how you can keep an eye on what is going on with your databases running on RDS.
We’ll look into how you scale up or down on RDS and how that can help you.
We’ll spend some time looking into backups and snapshots and some things you can do with them on RDS.
We’ll walk through some areas around how you can implement high availability on RDS
And for the people looking to move to RDS for the first time or even looking to move more databases to RDS we’ll look into migrating to RDS.
Call out that throughout the presentation we will touch some on Aurora, where it is different from what we are talking about but there are also Aurora overview and Deep Dive sessions happening that will go into these areas more thoroughly.
Quick overview of what RDS is.
This is a Deep Dive so there are some assumptions that some of the basics with RDS and the benefits are already understood.
We are going to touch on many of these in more depth throughout the presentation.
RDS is a managed database service. This service allows you more time to focus on your application: You focus on Schema Design, query construction, query optimization, and building your application.
Infra management:
* AWS does patching
* AWS handles backup and replication
* AWS manages the infrastructure
* You focus on your application
* HA and automated failover management
* High-end features that you could build on your own, but here you get them automatically
Availability of resources:
* Simple and fast to deploy
Scale up/down:
* Simple and fast to scale
Cost-effective:
* No cost to get started
* Pay only for what you consume
Application compatibility:
* Six different engines to choose from
* Discuss some popular applications, or even your own code, and how they can still work on RDS. If you are using one of the currently supported engines, there is a good chance you can get it working on RDS.
There are a lot of things that an Expert DBA and infra person would have to do. With RDS you are getting this with just a few clicks.
There are some very well known and large companies that are utilizing RDS as a backbone for the applications or services that they offer. This slide has a small sample of some of those customers.
This is for Oracle, MySQL, MariaDB, PostgreSQL, SQL Server
* Airbnb is running its main MySQL databases on RDS. They chose RDS because it allowed them to simplify a lot of the administrative tasks associated with their databases. They were able to address replication and scaling needs through simple API calls or clicks in the management console.
Vodafone uses MySQL RDS to store cricket score information from its content providers and is able to ensure availability through a Multi-Availability Zone configuration.
Code.org is using RDS to support the operation of the tutorials that visitors to its site take, and leverages the high availability features of the service to ensure that it can always serve customers around the world.
Make more splashy, add Icons from detail page
Expedia says, “With DynamoDB we were up and running in less than a day, and there is no need for a team to maintain it.” They are an example of a customer powering web applications using DynamoDB.
==
STORY BACKGROUND
Expedia initially started out setting up a Cassandra cluster to support an application that collects data for its test & learn experiments on Expedia sites, then processes and stores it.
The application processes ~200 million messages a day.
The team spent more than a week setting up a 3-node Cassandra ring and was still nowhere close to a full Cassandra setup in AWS with clustering and monitoring for scaling out.
“With DynamoDB we were up and running in less than a day.”
SOLUTION AND BENEFITS
Setup, monitoring, and ease of scaling made us choose DynamoDB over Cassandra.
With DynamoDB there is no need for a team to maintain it.
The application uses Apache Storm, Amazon DynamoDB, and Amazon ElastiCache (Redis).
ADDITIONAL INFORMATION
https://www.youtube.com/watch?v=ie4dWGT76LM
3 million bids/second (!)
3 PB of raw data per day, which translates to 180 TB of compressed data per day
Simple – two perspectives: end user and operational
Fast – optimized for scanning TB- to PB-scale data sets
- Columnar storage, compression, 1 MB block size
- MPP shared-nothing architecture
Cheap – reserved node pricing as low as $1,000/TB/year on HDD; on-demand SSD platform as cheap as $0.25/hr
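The columnar storage mentioned in these notes is a big part of why analytic scans are fast: an aggregate over one column only has to read that column's blocks, not every row. A tiny sketch contrasting the two layouts (illustrative counting only, not Redshift internals):

```python
# Row layout: a query must touch whole rows even when it needs one field
rows = [("2017-01-01", "EU", 12), ("2017-01-02", "US", 7), ("2017-01-03", "EU", 3)]

# Columnar layout: each column is stored (and compressed) separately
columns = {
    "day":    ["2017-01-01", "2017-01-02", "2017-01-03"],
    "region": ["EU", "US", "EU"],
    "hits":   [12, 7, 3],
}

# SUM(hits): the row layout reads 9 values, the columnar layout reads 3
row_values_read = sum(len(r) for r in rows)
col_values_read = len(columns["hits"])
total_hits = sum(columns["hits"])
```

Compression amplifies the gap further, since values within one column are similar and compress much better than mixed-type rows do.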
Common themes that customers focus on when they describe their experience with Redshift are its high performance, low price, and ease of use. Here are some quotes from customers about performance.
When you are migrating between different database engines there may be cases where the schema you created in your current database will not directly convert to the Aurora database engine and requires you to make some changes to it before it will work in Aurora.
The AWS Schema Conversion Tool is a development environment that you download to your desktop and use to save time when migrating from Oracle and SQL Server to next-generation cloud databases such as Amazon Aurora.
You can convert database objects such as tables, indexes, views, stored procedures, and Data Manipulation Language (DML) statements like SELECT, INSERT, DELETE, and UPDATE. For any objects that cannot be converted, the tool will call out what it could not convert so that you can look up what you need to change manually.
This tool can greatly save development time in researching and fixing objects that you want to migrate from your database into Aurora.
When it comes to moving your database onto RDS there are a variety of approaches and tools that you can utilize.
The AWS Database Migration Service is one way that you can use to migrate your database onto RDS.
This service was first announced at our re:Invent conference in October 2015, and on March 15th it was made generally available.
This service allows you to move data from your source database that may be running in your own data center, on an EC2 instance, or even in RDS into another RDS instance that is running the same or a different database engine.
Once you have this service hooked up between your source and target databases, it will replicate the existing data from the source to the target, and it will also capture changes on the source and apply those to the target. This allows you to keep your applications up and running during the migration process. You also have the ability to let the migration process run as long as you need with this approach.