6. Getting your Data into AWS
Amazon S3
Corporate Data Center
• Console Upload
• FTP
• AWS Import/Export
• S3 API
• Direct Connect
• Storage Gateway
• 3rd Party Commercial Apps
• Tsunami UDP
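Choosing between the network options above (S3 API, Direct Connect) and shipping a device (AWS Import/Export) mostly comes down to transfer time. A minimal sketch of that arithmetic follows. The function name, the 80% link utilization and the 100 TB example are assumptions for illustration, not figures from the slides.

```python
def transfer_days(size_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to move size_tb (decimal terabytes) over a network link.

    utilization is the assumed fraction of the nominal bandwidth actually achieved.
    """
    if size_tb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, utilization in (0, 1]")
    bits = size_tb * 1e12 * 8                           # terabytes -> bits
    seconds = bits / (link_mbps * 1e6 * utilization)    # effective bits per second
    return seconds / 86_400                             # seconds -> days


# Compare a few link speeds for a hypothetical 100 TB data set.
for mbps in (100, 1_000, 10_000):
    print(f"100 TB over {mbps} Mbps: {transfer_days(100, mbps):.1f} days")
```

If the estimate runs to weeks or months, a physical device or a dedicated link is probably the better fit.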
7. Services: Storage: Amazon S3
Deployment & Administration
App Services
Compute | Storage
Database
Networking
AWS Global Infrastructure
Amazon Simple Storage Service (S3)
• Unlimited storage of objects of any type
• 99.999999999% durability, replicated across multiple facilities
• Cost-effective storage: US$0.03 per GB per month
• Granular access control and permissions over objects
• Encryption at rest using AES-256 server-side encryption
• Encryption in transit using HTTPS
• High-performance throughput supporting parallelized upload or download
• Import or export data via a physical device handling service
• Data remains in the geographic location you choose
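The parallelized upload mentioned above works by splitting a large object into parts and sending them concurrently. The sketch below only plans the byte ranges. It assumes the S3 multipart limits of a 5 MiB minimum part size (except the last part) and at most 10,000 parts. `plan_parts` and the 64 MiB default are hypothetical choices for illustration.

```python
MIN_PART_BYTES = 5 * 1024**2    # assumed S3 multipart minimum (last part may be smaller)
MAX_PARTS = 10_000              # assumed S3 multipart maximum number of parts


def plan_parts(total_bytes: int, part_bytes: int = 64 * 1024**2):
    """Return a list of (part_number, first_byte, last_byte) covering the object."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if part_bytes < MIN_PART_BYTES:
        raise ValueError("part_bytes is below the minimum part size")
    # Grow the part size if the object would otherwise need too many parts.
    smallest_allowed = -(-total_bytes // MAX_PARTS)   # ceiling division
    part_bytes = max(part_bytes, smallest_allowed)

    parts, start, number = [], 0, 1
    while start < total_bytes:
        end = min(start + part_bytes, total_bytes) - 1
        parts.append((number, start, end))
        start, number = end + 1, number + 1
    return parts


# A 200 MiB object in 64 MiB parts: three full parts plus an 8 MiB remainder.
for part in plan_parts(200 * 1024**2):
    print(part)
```

Each range can then be uploaded by its own worker thread or process, and the same ranges work for parallel ranged downloads.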
8. Write directly to a data source
Your application | Amazon S3
DynamoDB
Any other data store
Amazon S3
Amazon EC2
9. Services: Database: Amazon DynamoDB
• Zero Admin NoSQL Service
• Unlimited Storage
• Provisioned Throughput
• Consistent <10 ms response
• Durable on SSD
Compute | Storage
Database
Networking
AWS Global Infrastructure
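Provisioned throughput means you state the read and write rate you need up front. A rough sizing sketch follows, assuming the capacity-unit sizes in current DynamoDB documentation (one read unit covers a strongly consistent read of up to 4 KB per second, one write unit covers a write of up to 1 KB per second). These sizes are not stated in the slides and may differ from those in effect when the deck was written. The function names are hypothetical.

```python
import math

READ_UNIT_BYTES = 4096    # assumed size covered by one read capacity unit
WRITE_UNIT_BYTES = 1024   # assumed size covered by one write capacity unit


def read_units(reads_per_sec: int, item_bytes: int, strongly_consistent: bool = True) -> int:
    """Read capacity units needed for a steady read rate of same-sized items."""
    units = reads_per_sec * math.ceil(item_bytes / READ_UNIT_BYTES)
    if not strongly_consistent:
        units = units / 2           # eventually consistent reads cost half as much
    return math.ceil(units)


def write_units(writes_per_sec: int, item_bytes: int) -> int:
    """Write capacity units needed for a steady write rate of same-sized items."""
    return writes_per_sec * math.ceil(item_bytes / WRITE_UNIT_BYTES)


# Hypothetical workload: 100 reads/s of 6,000-byte items, 50 writes/s of 2,500-byte items.
print(read_units(100, 6000), read_units(100, 6000, strongly_consistent=False))
print(write_units(50, 2500))
```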
21. What is Amazon Elastic MapReduce (EMR)?
EMR is Hadoop in the Cloud!
22. How does EMR work?
EMR Cluster | S3
• Put the data into S3
• Choose: Hadoop distribution, number of nodes, types of nodes, custom configs, Hive/Pig/etc.
• Launch the cluster using the EMR console, CLI, SDK, or APIs
• Get the output from S3
• You can also store everything in HDFS
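The workflow above can be sketched with the SDK route. The builder below assembles a request in the shape accepted by boto3's `run_job_flow`. The bucket paths, release label, instance type and default role names are placeholders chosen for illustration, and the Hive step arguments follow the `command-runner.jar` pattern as an assumption, so check them against your EMR release.

```python
def build_emr_request(name, script_uri, output_uri, log_uri,
                      core_nodes=2, instance_type="m5.xlarge"):
    """Build a cluster request: node counts and types, applications, one Hive step."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.36.0",                 # assumed Hadoop distribution/release
        "LogUri": log_uri,
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_nodes},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,     # shut down when the step finishes
        },
        "Steps": [{
            "Name": "run-hive-script",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["hive-script", "--run-hive-script", "--args",
                         "-f", script_uri, "-d", f"OUTPUT={output_uri}"],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",         # assumed default role names
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request):
    """Submit the request; needs the third-party boto3 package and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without the SDK
    return boto3.client("emr").run_job_flow(**request)["JobFlowId"]


# Hypothetical bucket and paths; the input data is assumed to be in S3 already.
request = build_emr_request(
    "demo-cluster",
    script_uri="s3://example-bucket/scripts/job.q",
    output_uri="s3://example-bucket/output/",
    log_uri="s3://example-bucket/logs/",
)
print(request["Name"], len(request["Instances"]["InstanceGroups"]), "instance groups")
```

Once the step completes, the results are read back from the output location in S3.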
25. Services: Database: Amazon Redshift
Amazon Redshift
• Easily and rapidly analyze petabytes of data
• Fully managed data warehouse service
• Automated deployment and administration
• 1/10th the cost of traditional data warehouses
• $1000 / Terabyte / year
• Compatible with popular BI tools
Deployment & Administration
App Services
Compute | Storage
Database
Networking
AWS Global Infrastructure
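To put the quoted price in context, here is a small worked calculation. The US$1,000 per terabyte per year figure is from the slide. The 10x multiplier for a traditional warehouse is only what the "1/10th the cost" claim implies, not a quoted price, and the 1 PB example is an assumption.

```python
REDSHIFT_USD_PER_TB_YEAR = 1_000   # figure quoted on the slide
TRADITIONAL_MULTIPLIER = 10        # implied by the "1/10th the cost" claim, not a quoted price


def yearly_cost_usd(terabytes: float, usd_per_tb_year: float = REDSHIFT_USD_PER_TB_YEAR) -> float:
    """Yearly cost of storing and querying the given volume at a flat per-terabyte rate."""
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    return terabytes * usd_per_tb_year


petabyte_tb = 1_000  # 1 PB expressed in decimal terabytes
redshift = yearly_cost_usd(petabyte_tb)
traditional = yearly_cost_usd(petabyte_tb, REDSHIFT_USD_PER_TB_YEAR * TRADITIONAL_MULTIPLIER)
print(f"1 PB per year: about ${redshift:,.0f} vs an implied ${traditional:,.0f}")
```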
26. Your choice of BI Tools on the cloud
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL Store
Log Aggregation tools
Amazon EMR
Amazon Redshift
Pre-processing framework
41. Kinesis architecture
AZ AZ AZ
Durable, highly consistent storage replicates data across three data centers (availability zones)
Amazon Web Services
Aggregate and archive to S3
Millions of sources producing 100s of terabytes per hour
Front End: Authentication, Authorization
Ordered stream of events supports multiple readers
Real-time dashboards and alarms
Machine learning algorithms or sliding window analytics
Aggregate analysis in Hadoop or a data warehouse
Inexpensive: $0.028 per million puts
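The key idea in this architecture is an ordered stream that several consumers read independently. The toy in-memory model below illustrates only that idea. It is not the Kinesis API, and `ToyStream`, its methods and the event names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStream:
    """Append-only, ordered log; each reader tracks its own position."""
    records: list = field(default_factory=list)

    def put(self, data) -> int:
        """Append a record and return its sequence number."""
        self.records.append(data)
        return len(self.records) - 1

    def read(self, position: int, limit: int = 10):
        """Return up to `limit` records from `position`, plus the reader's next position."""
        batch = self.records[position:position + limit]
        return batch, position + len(batch)


stream = ToyStream()
for i in range(5):
    stream.put(f"event-{i}")

# Two independent readers: a dashboard taking small batches and an archiver taking everything.
dashboard_batch, dashboard_pos = stream.read(0, limit=2)
archive_batch, archive_pos = stream.read(0, limit=10)
print(dashboard_batch, dashboard_pos)
print(archive_batch, archive_pos)
```

Because reading does not remove records, the dashboard, the analytics job and the S3 archiver can each consume the same events in the same order at their own pace.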
74. The AWS Big Data Portfolio
COLLECT | STORE | ANALYZE | SHARE
Direct Connect S3
Import Export
S3 EC2
DynamoDB Redshift
Glacier
EMR
Data Pipeline
Kinesis
CloudFront