2. 2
Big Data still challenges most enterprises
Store AnalyzeIngest
1 4
0 9
5CUSTOMER &
OPERATIONAL
DATA
CUSTOMER &
OPERATIONAL
INSIGHTS
ANALYTICS PIPELINE
Rigid data
ingest
Inability to process
new types of data
Limited analytics
platforms
Inability to generate
new types of insights
Data doesn’t
“connect”
Siloed data
prevents single
source of truth
8. 8
Service Used in this Lab
S3: Store data at Any Scalable with Unmatched Durability & Availability
• Built to store any amount of data
• Runs on the world’s largest global
cloud infrastructure
• Designed to deliver 99.999999999% durability
• Geographic redundancy & automatic replication
• Seamlessly replicates data between any region
• Tiered storage to optimize price/performance:
Store data at $0.023/GB/month at S3
($0.004/GB/month at Glacier)
S3
Standard
S3 Standard
Infrequent Access
S3 One Zone-IA
Glacier
10. 10
Amazon Kinesis—Real Time
time
Load data streams
into AWS data stores
Kinesis Data
Firehose
Build custom
applications that
analyze data streams
Kinesis Data
Streams
Capture, process,
and store video
streams for analytics
Kinesis Video
Streams
Analyze data streams
with SQL
Kinesis Data
Analytics
SQL
11. 11
Capture and submit
streaming data to Firehose
Analyze streaming data using your
favorite BI tools
Firehose loads streaming data
continuously into S3, Amazon Redshift,
and Amazon ES
Service Used in this Lab
Amazon Kinesis Firehose
Zero administration: Capture and deliver streaming data to Amazon S3, Amazon Redshift, or
Amazon Elasticsearch Service without writing an app or managing infrastructure.
Direct-to-data-store integration: Batch, compress, and encrypt streaming data for delivery
in as little as 60 seconds.
Seamless elasticity: Seamlessly scales to match data throughput without intervention.
12. 12
Kinesis Data Firehose—How it Works
Ingest Transform Deliver
Amazon S3
Amazon Redshift
Amazon Elasticsearch Service
AWS IoT
Amazon Kinesis Agent
Amazon Kinesis Streams
Amazon CloudWatch Logs
Amazon CloudWatch Events
Apache Kafka
13. 13
Kinesis Streams Kinesis Firehose
Use case Capture and expose data streams to build
arbitrary stream processing applications
Fully-managed service for automatic
loading of data streams into S3,
Redshift, ElasticSearch etc.
Provisioning and
Resource Management
Provisioned model via Streams No provisioning of the underlying
resource called a
DeliveryStream
Customer-controllable construct of ‘Shards’ No Shards (not a customer visible
construct)
Scaling Customer owns explicit scaling of stream/
shards via Split/Merge APIs
Firehose is completely elastic – no
explicit scaling actions needed.
Ingestion (Put Data) Customer owns “partition key’ as part of
PUT* API call
Different and simplified Put* API that
doesn’t need a partition key
Retrieval (Get Data) Get API to retrieve data directly from stream No Get* API – Data appears in
destination (like S3/ Redshift) after
meeting configuration criteria
Full-flexibility to write/ manage own
application with Kinesis Client Library and
connector libraries
No application to be written or
managed. Customers specify simple
set of pre-defined configurations on the
console or API
Amazon Kinesis Streams vs Firehose