A quick overview of AWS Kinesis: what Kinesis is, what problems it solves, and how you might integrate it with an existing data warehouse.
8. Internet of Things
• Large number of sensors
• Self-registering
• Pushing data
• May or may not retain any historic data
= Only one chance to get the data
9. Batch ETL
• Data needs to wait somewhere between loads.
• If data is only loaded six hours per day, then four times as much hardware is needed to keep up.
• Latency of hours
13. Problems with DIY Streaming ETL
1. Message queues deliver each message once. If you want to fan out to many readers, the application in front needs to know about each of them and queue the same message repeatedly.
2. Order of message delivery is not guaranteed.
3. If the program reading data crashes partway through aggregating, messages are lost.
14. What is Kinesis
• Kinesis is like a message queue, but more scalable and with multiple readers of each message.
• Kinesis is like a NoSQL database, but with message delivery and daily purging.
• Kinesis is like an Enterprise Service Bus, focused on analytics.
• For a limited, if common, use case, Kinesis is the best of all.
16. Kinesis Components
• Each queue/DB is called a Stream
• Each stream scales by adding Shards
• Each shard provides 1 MB/s in and 2 MB/s out
• Shards are only $0.44/day, so autoscale them to give some safety margin
• You also pay about $0.02 per million puts
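The shard arithmetic above, plus a minimal producer, can be sketched in Python with boto3. The stream name, region, and sensor payload are hypothetical; sizing by the larger of the in/out requirements follows directly from the 1 MB/s in / 2 MB/s out figures.

```python
import json
import math


def shards_needed(mb_in_per_sec, mb_out_per_sec):
    # Each shard accepts 1 MB/s in and serves 2 MB/s out,
    # so size the stream for whichever side needs more shards.
    return max(math.ceil(mb_in_per_sec / 1.0),
               math.ceil(mb_out_per_sec / 2.0),
               1)


def put_reading(client, stream, sensor_id, payload):
    # Records sharing a partition key land on the same shard,
    # which preserves per-sensor ordering.
    return client.put_record(
        StreamName=stream,
        Data=json.dumps(payload).encode("utf-8"),
        PartitionKey=sensor_id,
    )

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   kinesis = boto3.client("kinesis", region_name="us-east-1")
#   put_reading(kinesis, "sensor-stream", "sensor-42", {"temp_c": 21.5})
```

A stream taking 3.5 MB/s in and serving 2 MB/s out would need four shards, for example, with autoscaling adding the safety margin on top.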
17. Kinesis Client Library
• Kinesis expects you to write bespoke producer and consumer programs
• KCL provides automatic multi-threading with one worker thread per shard
• Similar to Hadoop: the framework handles the heavy lifting, and the bespoke program does the "reduce"
• You still have to autoscale the EC2 groups
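The KCL itself is a separate library that you plug your record processor into; as a stand-in sketch, here is what one shard's worker loop looks like against the low-level boto3 API (the stream and shard names are hypothetical). The KCL layers lease management, checkpointing, and the one-thread-per-shard fan-out on top of logic like this.

```python
import json


def process_records(records, handler):
    # The per-record "reduce" step: decode each record's Data blob
    # and hand the resulting object to the application's handler.
    for rec in records:
        handler(json.loads(rec["Data"].decode("utf-8")))
    return len(records)


def consume_shard(client, stream, shard_id, handler):
    # Minimal polling loop for a single shard, starting from the
    # oldest retained record (TRIM_HORIZON).
    it = client.get_shard_iterator(
        StreamName=stream,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    while it:
        resp = client.get_records(ShardIterator=it, Limit=1000)
        process_records(resp["Records"], handler)
        it = resp.get("NextShardIterator")

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("kinesis", region_name="us-east-1")
#   consume_shard(client, "sensor-stream", "shardId-000000000000", print)
```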
21. Integrating Kinesis into an existing Data Warehouse
1. Access data in near real-time
2. Facilitate more-traditional ETL
3. Archive
23. Near Real-time Data
1. Analyze individual transactions
2. Send alerts for both individual transactions and trends
3. Aggregate to feed a live dashboard
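The alerting and aggregation steps above can be sketched without any AWS calls, assuming (for illustration) that decoded records arrive as (unix-timestamp, amount) pairs:

```python
from collections import Counter


def minute_totals(events):
    # Roll individual transactions up into per-minute sums,
    # the kind of aggregate a live dashboard would poll.
    totals = Counter()
    for ts, amount in events:
        totals[ts // 60] += amount
    return dict(totals)


def alert_if_over(events, limit):
    # Flag both individual transactions and per-minute trends
    # that exceed a threshold.
    big_transactions = [(ts, amt) for ts, amt in events if amt > limit]
    hot_minutes = [minute for minute, total in minute_totals(events).items()
                   if total > limit]
    return big_transactions, hot_minutes
```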
24. Facilitate Traditional ETL
1. Write lightly transformed data to S3 to batch COPY into Redshift
2. Pre-compute aggregates, then write them to S3
3. Provide a durable, replayable buffer in front of traditional ETL tools
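A sketch of step 1, assuming batches are written as gzipped JSON lines (the bucket, key, and table names are hypothetical, and `s3` can be any client with an S3-style `put_object`):

```python
import gzip
import io
import json


def batch_to_s3(s3, bucket, key, records):
    # Gzip a batch of lightly transformed records as JSON lines
    # and upload them, ready for a bulk COPY into Redshift.
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for rec in records:
            gz.write((json.dumps(rec) + "\n").encode("utf-8"))
    s3.put_object(Bucket=bucket, Key=key, Body=buf.getvalue())
    return "s3://%s/%s" % (bucket, key)

# The Redshift side is then a single bulk load, e.g.:
#   COPY events
#   FROM 's3://my-bucket/batches/batch-0001.json.gz'
#   IAM_ROLE '<your-role-arn>'
#   GZIP JSON 'auto';
```

Batching like this is much cheaper than row-by-row inserts, since Redshift's COPY loads from S3 in parallel.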
25. Archive
1. In addition to using your data, Kinesis makes it easy to log the full incoming data set to S3.
2. An object store makes more sense for write-once/read-never data than a database.
26. When to use Kinesis
1. Internet of Things (IoT)
2. You need near-real-time access to data
3. You have more than one consumer for each piece of data
27. Thanks
1. Our sponsors:
• API Talent
• AWS
• OptimalPeople
2. Bronwyn and Wyn
3. AWS for images on slides
Editor's notes
Near-line recommendations, fault-analysis
Drinking from the firehose
Low Latency
Multiple outputs