4. The amount of information generated during the first day of a baby's life today is equivalent to 70 times the information contained in the Library of Congress.
9. Human Genome Project
Collaborative project to sequence every single letter of the human genetic code.
13 years and billions of dollars to complete.
Gigabyte-scale datasets (transferred between sites on iPods!).
10. Beyond the Human Genome
45+ species sequenced: mouse, rat, gorilla, rabbit, platypus, nematode, zebrafish...
Compare genomes between species to identify biologically interesting areas of the genome.
100 GB-scale datasets. Increased computational requirements.
11. The Next Generation
New sequencing instruments lead to a dramatic drop in the cost and time required to sequence a genome.
Sequence and compare the genetic code of individuals to find areas of variation. Much more interesting.
Terabyte-scale datasets. Significant computational requirements.
12. The 1000 Genomes Project
Public/private consortium to build the world's largest collection of human genetic variation.
Hugely important dataset to drive new insight into known genetic traits, and the identification of new ones.
Vast, complex data and computational resources required, beyond the reach of most research groups and hospitals.
13. 1000 Genomes in the Cloud
The 1000 Genomes data made available to all on AWS.
Stored for free as part of the Public Datasets program. Updated regularly.
200 TB. 1700 individual genomes. As much compute and storage as required, available to all.
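A quick back-of-the-envelope check on the figures above (assuming the 200 is terabytes, as the dataset size suggests) shows why this collection is out of reach for most local infrastructure — each genome averages over a hundred gigabytes:

```python
# Back-of-the-envelope: average storage per genome in the
# 1000 Genomes public dataset on AWS, using the slide's figures.
total_tb = 200    # total dataset size (assumption: terabytes)
genomes = 1700    # individual genomes in the collection

gb_per_genome = total_tb * 1000 / genomes  # decimal units: 1 TB = 1000 GB
print(f"~{gb_per_genome:.0f} GB per genome")  # ~118 GB per genome
```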
24. Dropcam is the biggest inbound video service on the Web
• More data uploaded per minute than YouTube
• Petabytes of data processed every month
• Billions of motion events detected
37. Who is my customer, really?
What do people really like?
What is happening socially with my products?
Where do people consume my product?
How do people really use my product?
39. 75% of users select movies based on recommendations
40. More than 27 million users
~30 million plays per day
More than 40 billion events per day
~4 million ratings per day
~3 million searches per day
Geo-location data
Device information
Time of day and week (it can now verify that users watch more TV shows during the week and more movies on the weekend)
Metadata from third parties such as Nielsen
Social media data from Facebook and Twitter
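To put the event volume above in perspective, 40 billion events per day averages out to nearly half a million events every second, sustained around the clock:

```python
# Average sustained event rate implied by the figures above.
events_per_day = 40_000_000_000
seconds_per_day = 24 * 60 * 60  # 86,400

events_per_second = events_per_day / seconds_per_day
print(f"~{events_per_second:,.0f} events/sec")  # ~462,963 events/sec
```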
77. What ___ right now?
...trades are executing
...is the exception rate
...is the ad click-through
...topics are trending
...inventory remains
...queries are slow
...are the high scores
80. Kinesis architecture
[Architecture diagram: Amazon Web Services, three availability zones (AZ)]
Durable, highly consistent storage replicates data across three data centers (availability zones).
Millions of sources producing 100s of terabytes per hour.
Front end handles authentication and authorization.
Ordered stream of events supports multiple readers.
Consumers: real-time dashboards and alarms; machine learning algorithms or sliding-window analytics; aggregate analysis in Hadoop or a data warehouse; aggregate and archive to S3.
Inexpensive: $0.028 per million puts.
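The "sliding-window analytics" consumer named above can be sketched in plain Python. This is an illustrative time-window counter (class and method names are hypothetical, not Kinesis client code) showing the core idea: keep only the events inside the window, evicting the rest as time advances:

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds` of stream time.

    Hypothetical sketch of the kind of sliding-window analytics a
    stream consumer might run over an ordered stream of events.
    """

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # event timestamps, in arrival order

    def record(self, timestamp):
        self.events.append(timestamp)
        # Evict everything that has fallen out of the window.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()

    def count(self):
        return len(self.events)

# Example: events arriving at t = 1..5 with a 3-second window.
counter = SlidingWindowCounter(window_seconds=3)
for t in [1, 2, 3, 4, 5]:
    counter.record(t)
print(counter.count())  # events at t = 3, 4, 5 remain -> prints 3
```

Because the stream delivers events in order (as the slide notes), a single pass with a deque like this is enough; no sorting or buffering of the full stream is required.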
81. AWS Internal Metering Service
Clients submitting data → capture submissions → process in real time → store in Redshift
Workload
• Tens of millions of records/sec
• Multiple TB per hour
• 100,000s of sources
New features
• Scale with the business
• Provide real-time alerting
• Inexpensive
• Improved auditing
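Taken together, the workload numbers above imply very small records. Assuming, say, 2 TB per hour at 20 million records per second (concrete values picked from within the slide's stated ranges), each record averages only a few dozen bytes:

```python
# Implied average record size for the metering workload.
# Assumed concrete values within the slide's stated ranges:
tb_per_hour = 2                # "multiple TB per hour"
records_per_sec = 20_000_000   # "tens of millions of records/sec"

bytes_per_sec = tb_per_hour * 1e12 / 3600
bytes_per_record = bytes_per_sec / records_per_sec
print(f"~{bytes_per_record:.0f} bytes per record")  # ~28 bytes per record
```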
82. Workload
• Daily load of billions of records from millions of files from hundreds of sources
• 3-hour SLA to load and audit data
• Hundreds of customers
• Hundreds of queries per hour
New features
• Our data is fresh; we ingest every 6 hours
• Now processing triple the volume in less than 25% of the time
• "Hammerstone" ETL solution
– Built on AWS Data Pipeline
– Builds business-specific marts
– Builds workload-specific clusters
• Supports a variety of analytics tools: Tableau, R, Toad, SQL Developer, etc.
Internal AWS Data Warehouse (pipeline): over 200 internal data sources → data staged in Amazon S3 → "Hammerstone" custom ETL using AWS Data Pipeline → data processing Redshift cluster → batch reporting Redshift cluster and ad hoc query Redshift cluster
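The "triple the volume in less than 25% of the time" claim on slide 82 is worth unpacking: combined, the two ratios imply an effective throughput improvement of more than 12x over the old pipeline:

```python
# Throughput gain implied by slide 82:
# 3x the data in less than a quarter of the time.
volume_ratio = 3.0   # triple the volume
time_ratio = 0.25    # less than 25% of the time

min_speedup = volume_ratio / time_ratio
print(f"> {min_speedup:.0f}x effective throughput")  # > 12x
```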