This document discusses big data solutions in the cloud. It describes Hadoop, MapReduce, and NoSQL databases for storing and analyzing large datasets. It also discusses AWS services like S3, EMR, Redshift, DynamoDB, and Kinesis that can be used to build scalable big data architectures in the cloud. Examples are provided showing how these AWS services can be used together to perform log analysis, recommendations, and streaming analytics on big data.
10. Big Data Verticals and Use Cases
Media/Advertising: Targeted Advertising; Image and Video Processing
Oil & Gas: Seismic Analysis
Retail: Recommendations; Transactions Analysis
Life Sciences: Genome Analysis
Financial Services: Monte Carlo Simulations; Risk Analysis
Security: Anti-virus; Fraud Detection; Image Recognition
Social Network/Gaming: User Demographics; Usage Analysis; In-game Metrics
14. 400 GB of logs per day
~12 Terabytes per month
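The monthly figure follows from the daily one. A minimal sketch of the arithmetic, assuming a 30-day month and decimal units (1 TB = 1000 GB); both are assumptions, not stated on the slide:

```python
GB_PER_DAY = 400       # from the slide
DAYS_PER_MONTH = 30    # assumption: a 30-day month
GB_PER_TB = 1000       # assumption: decimal units

def monthly_volume_tb(gb_per_day, days=DAYS_PER_MONTH):
    """Convert a daily log volume in GB to a monthly volume in TB."""
    return gb_per_day * days / GB_PER_TB

print(monthly_volume_tb(GB_PER_DAY))  # prints 12.0
```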
16. 1) Load log file data for six months of user search history into Amazon S3

Search ID    Search Text    Final Selection
12423451     westen         Westin
14235235     wisten         Westin
54332232     westenn        Westin
17. 2) Spin up a 200-node cluster
[Diagram: Amazon S3 (Log Files), Amazon EMR (Hadoop Cluster)]
18. 3) 200 nodes simultaneously analyze this data, looking for common misspellings
… this takes a few hours
[Diagram: Amazon S3, Amazon EMR (Hadoop Cluster)]
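The analysis step can be pictured as a map-and-reduce count over (search text, final selection) pairs. The sketch below runs on one machine and is only an illustration of the idea, not the job used in the talk; the function names and the `min_count` threshold are hypothetical, and the rows are the three shown in the search-history table.

```python
from collections import Counter

# Rows copied from the slide: (search ID, search text, final selection).
LOG_ROWS = [
    ("12423451", "westen", "Westin"),
    ("14235235", "wisten", "Westin"),
    ("54332232", "westenn", "Westin"),
]

def map_row(row):
    """Map phase: emit ((typed, chosen), 1) when the typed text differs
    from what the user finally selected."""
    _search_id, search_text, final_selection = row
    typed = search_text.strip().lower()
    chosen = final_selection.strip()
    if typed != chosen.lower():
        yield (typed, chosen), 1

def reduce_counts(pairs):
    """Reduce phase: sum the counts for each (typed, chosen) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals

def common_misspellings(rows, min_count=1):
    """Return misspelling -> suggestion pairs seen at least min_count times."""
    mapped = (pair for row in rows for pair in map_row(row))
    totals = reduce_counts(mapped)
    return {key: n for key, n in totals.items() if n >= min_count}

if __name__ == "__main__":
    for (typed, chosen), n in common_misspellings(LOG_ROWS).items():
        print(typed, "->", chosen, n)
```

On a real cluster the map and reduce functions would run in parallel across nodes, each on its own slice of the log files.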
19. 4) New common misspellings and suggestions are loaded back into S3
[Diagram: Amazon S3 (Log Files), Amazon EMR (Hadoop Cluster)]
20. 5) When the job is done, the cluster is shut down.
[Diagram: Amazon S3 (Log Files), Amazon EMR]
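Steps 2 to 5 describe a transient cluster: start it, run one job, shut it down. A hedged sketch of what such a request might look like for the EMR RunJobFlow API follows. The bucket names, script names, cluster name, instance type, release label, and the split into 1 master plus 199 core nodes are all assumptions for illustration. The code only builds the request; nothing is sent to AWS.

```python
def build_transient_cluster_request(log_uri, step_args, node_count=200):
    """Build keyword arguments for EMR's RunJobFlow call (not submitted here)."""
    return {
        "Name": "misspelling-analysis",      # hypothetical cluster name
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.15.0",        # assumption: any current release
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": node_count - 1},
            ],
            # False means the cluster terminates once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "find-common-misspellings",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {"Jar": "command-runner.jar",
                              "Args": list(step_args)},
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

# Hypothetical S3 locations and streaming scripts.
request = build_transient_cluster_request(
    log_uri="s3://example-bucket/emr-logs/",
    step_args=[
        "hadoop-streaming",
        "-files", "s3://example-bucket/code/mapper.py,s3://example-bucket/code/reducer.py",
        "-mapper", "mapper.py",
        "-reducer", "reducer.py",
        "-input", "s3://example-bucket/search-logs/",
        "-output", "s3://example-bucket/misspellings/",
    ],
)

# With AWS credentials configured, the request could be submitted with:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Setting KeepJobFlowAliveWhenNoSteps to False is what makes the cluster shut itself down after the job, so nodes are paid for only while the analysis runs.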
31. Relational Table
ID     Age    State
123    20     CA
345    25     WA
678    40     FL

Non-Relational Table
ID     Attributes
123    Age:20, State:CA
345    Age:25, Country:Australia, Gender:F, Smoker:No
678    Age:40
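The difference between the two tables can be sketched in plain Python: relational rows share one fixed set of columns, while each non-relational item carries its own set of attributes. The helper names below are hypothetical and no database is involved; the data is copied from the tables above.

```python
# Relational: every row has exactly the same columns.
RELATIONAL_COLUMNS = ("ID", "Age", "State")
relational_rows = [
    (123, 20, "CA"),
    (345, 25, "WA"),
    (678, 40, "FL"),
]

# Non-relational: each item is keyed by ID and holds whatever attributes it has.
non_relational_items = {
    123: {"Age": 20, "State": "CA"},
    345: {"Age": 25, "Country": "Australia", "Gender": "F", "Smoker": "No"},
    678: {"Age": 40},
}

def get_attribute(items, item_id, name, default=None):
    """Look up one attribute; a missing attribute is not an error."""
    return items.get(item_id, {}).get(name, default)

def attribute_names(items):
    """Collect every attribute name that appears on at least one item."""
    names = set()
    for attrs in items.values():
        names.update(attrs)
    return sorted(names)

print(get_attribute(non_relational_items, 345, "Country"))  # prints Australia
print(get_attribute(non_relational_items, 678, "State"))    # prints None
```

Adding a new attribute to one item needs no schema change, which is the flexibility the second table illustrates.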
37. Amazon Kinesis
[Architecture diagram: several Data Sources send records through an AWS Endpoint into Amazon Kinesis, which holds Shard 1, Shard 2, ... Shard N across three Availability Zones. Consuming applications: App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis], App.4 [Machine Learning]. Destinations: S3, DynamoDB, Redshift, EMR.]
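As an illustration of what a consumer such as App.3 [Sliding Window Analysis] might compute, here is a minimal in-memory sketch that counts events seen in the last N seconds. It assumes timestamps arrive in non-decreasing order and does not read from Kinesis; the class and method names are hypothetical.

```python
from collections import deque

class SlidingWindowCounter:
    """Count events whose timestamps fall within the last window_seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp):
        """Record one event and return the number of events in the window."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)
        return len(self._timestamps)

    def _evict(self, now):
        # Drop events at or before the cutoff (assumes ordered timestamps).
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

if __name__ == "__main__":
    counter = SlidingWindowCounter(window_seconds=60)
    for ts in (0, 30, 61, 200):
        print(ts, counter.add(ts))
```

A real consumer would keep one such counter per metric or per key and feed it from the records of the shards it reads.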
39. Amazon Mobile Analytics
Fast: get your data within an hour
Automatic MAU, DAU, session, and retention reports
Design and track custom app events
Data is not mined or sold by Amazon
40. Expand your skills with AWS
Certification (Exams): Validate your proven technical expertise with the AWS platform. aws.amazon.com/certification
On-Demand Resources (Videos & Labs): Get hands-on practice working with AWS technologies in a live environment. aws.amazon.com/training/self-paced-labs
Instructor-Led Courses (Training Classes): Expand your technical expertise to design, deploy, and operate scalable, efficient applications on AWS. aws.amazon.com/training
41. Big Data Tutorials
aws.amazon.com/big-data
Redshift Free Trial
aws.amazon.com/redshift/free-trial