ABD301-Analyzing Streaming Data in Real Time with Amazon Kinesis
1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
Analyzing Streaming Data in Real Time with Amazon Kinesis
Ryan Nienhuis, Senior Product Manager, Amazon Kinesis
November 2017
2.
It’s All About the Pace
Batch Processing: hourly server logs, weekly or monthly bills, daily website clickstream, daily fraud reports
Stream Processing: real-time metrics, real-time spending alerts/caps, real-time clickstream analysis, real-time detection
3.
Why? Data loses value over time
Ingest data as it is generated
Analyze data in real time to get insights immediately
Deliver data in seconds instead of hours
4.
Simple Pattern for Streaming Data
Data Producer (example: Mobile Client)
• Continuously creates data
• Continuously writes data to a stream
• Can be almost anything
Streaming Service (example: Amazon Kinesis)
• Durably stores data
• Provides temporary buffer that preps data
• Supports very high throughput
Data Consumer (example: Amazon Kinesis app)
• Continuously processes data
• Cleans, prepares, & aggregates
• Transforms data to information
5.
Amazon Kinesis
Amazon Kinesis Data Streams: Build custom applications that process and analyze streaming data
Amazon Kinesis Data Analytics: Easily process and analyze streaming data with standard SQL
Amazon Kinesis Data Firehose: Easily load streaming data into AWS
6.
Amazon Kinesis Data Streams
• Easy administration and low cost
• Build real time applications with framework of choice
• Secure, durable storage
7.
Amazon Kinesis Data Analytics
• Powerful real time applications
• Easy to use, fully managed
• Automatic elasticity
8.
Amazon Kinesis Data Firehose
• Zero administration and seamless elasticity
• Direct-to-data store integration
• Serverless, continuous data transformations
Amazon S3
Amazon Redshift
9.
Amazon Kinesis Data Analytics Applications
Easily write SQL code to process streaming data
Connect to streaming source
Continuously deliver SQL results
10.
Common Use Cases
11.
Three Common Scenarios
Streaming Ingest-Transform-Load: Deliver data to analytics tools faster and cheaper
Continuous Metric Generation: Compute analytics as the data is generated
Actionable Insights: React to analytics based off of insights
12.
Web Analytics and Leaderboards
Ingest web app data: Lightweight JS client code with Amazon Cognito, OR Web Server on Amazon EC2 Instance; Amazon Kinesis Data Streams
Compute top 10 users: Amazon Kinesis Data Analytics
Persist to feed live apps: AWS Lambda function, Amazon DynamoDB Table
13.
Monitoring IoT Devices
Ingest sensor data: IoT sensors, AWS IoT, Amazon Kinesis Data Streams
Compute avg temp every 10 sec: Amazon Kinesis Data Analytics
Persist time series analytic to database: AWS Lambda function, Amazon RDS MySQL DB instance
14.
Analyzing CloudTrail Event Logs
Ingest and deliver raw log data: AWS CloudTrail, Amazon CloudWatch events trigger, Amazon Kinesis Data Firehose, Amazon S3 bucket for raw data
Compute operational metrics: Amazon Kinesis Data Analytics
Deliver to real time dashboards and archival: AWS Lambda function, Amazon DynamoDB Table(s), Chart.JS Dashboard; Amazon Kinesis Data Firehose, Amazon S3 bucket for processed data
15.
Deep Dive into Analyzing CloudTrail Event Logs
16.
Ingest and deliver CloudTrail events
• CloudTrail provides continuous account activity logging
• Events are sent in real time (or near real time) to Kinesis Data Firehose or Streams
• Each event includes a timestamp, IAM user, AWS service name, API call, response, and more
Components: AWS CloudTrail, Amazon CloudWatch events trigger, Kinesis Data Firehose, Amazon S3 bucket for raw data
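To make the event fields listed on this slide concrete, here is a minimal Python sketch that pulls them out of one CloudTrail-style record. The sample record is invented for illustration. The key names (eventTime, eventSource, eventName, sourceIPAddress, userIdentity, errorCode) follow the CloudTrail record format, but treat the exact shape as an assumption and check it against your own logs.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample record; all values are invented for illustration.
SAMPLE_EVENT = json.dumps({
    "eventTime": "2017-11-28T18:05:42Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "RunInstances",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {"type": "IAMUser", "userName": "alice"},
    "errorCode": "Client.UnauthorizedOperation",
})

def summarize(raw: str) -> dict:
    """Extract the fields the slide lists from one JSON-encoded event."""
    event = json.loads(raw)
    timestamp = datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "timestamp": timestamp.replace(tzinfo=timezone.utc),
        "iam_user": event.get("userIdentity", {}).get("userName"),
        "service": event["eventSource"],
        "api_call": event["eventName"],
        "source_ip": event.get("sourceIPAddress"),
        # errorCode is only present when the call failed.
        "failed": "errorCode" in event,
    }

summary = summarize(SAMPLE_EVENT)
print(summary["service"], summary["api_call"], summary["failed"])
```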
17.
Stream Data to Amazon Kinesis
Automatic ingestion, Easy setup, Write your own
Amazon VPC Flow Logs
Elastic Load Balancing
Amazon RDS
Amazon CloudWatch Logs
AWS CloudTrail Event Logs
Amazon Pinpoint
Amazon API Gateway
AWS IoT events
AWS SDKs
Amazon DynamoDB
Amazon Kinesis Agent
Amazon Kinesis Producer Library
As a proxy:
For change data capture:
Just a sample… many more ways to stream data to Amazon Kinesis
18.
Compute operational metrics in real time
Compute metrics using SQL in real time, such as:
• Total calls by IP, service, API call, IAM user
• Amazon EC2 API failures (or any other service)
• Anomalous behavior of Amazon EC2 API (or any other service)
• Top 10 API calls across all services
Components: raw data, Amazon Kinesis Data Analytics, real time analytics
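As a rough illustration of what two of these metrics compute, the Python sketch below counts the most frequent API calls and the failures per service over a small fixed batch. The event tuples are hypothetical, and this is plain batch code: a streaming application would produce the same kind of counts once per window rather than once over a finished list.

```python
from collections import Counter

# Hypothetical (service, API call, error code or None) tuples.
EVENTS = [
    ("ec2", "DescribeInstances", None),
    ("ec2", "RunInstances", "Client.UnauthorizedOperation"),
    ("s3", "GetObject", None),
    ("ec2", "DescribeInstances", None),
    ("s3", "PutObject", "AccessDenied"),
    ("ec2", "DescribeInstances", None),
]

def top_api_calls(events, n=10):
    """Return the n most frequent (service, call) pairs with their counts."""
    counts = Counter((service, call) for service, call, _ in events)
    return counts.most_common(n)

def failures_by_service(events):
    """Count events that carry an error code, grouped by service."""
    return Counter(service for service, _, error in events if error is not None)

top = top_api_calls(EVENTS, n=2)
failures = failures_by_service(EVENTS)
print(top)
print(dict(failures))
```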
19.
How do I write streaming SQL? Easy!
Streams (in memory tables)
CREATE STREAM calls_per_ip_stream(
eventTimeStamp TIMESTAMP,
computationType VARCHAR(256),
category VARCHAR(1024),
subCategory VARCHAR(1024),
unit VARCHAR(256),
unitValue BIGINT
);
20.
How do I write streaming SQL? Easy!
Pumps (continuous query)
CREATE OR REPLACE PUMP calls_per_ip_pump AS
INSERT INTO calls_per_ip_stream
SELECT STREAM "eventTimestamp",
COUNT(*),
"sourceIPAddress"
FROM source_sql_stream_001 ctrail
GROUP BY "sourceIPAddress",
STEP(ctrail.ROWTIME BY INTERVAL '1' MINUTE),
STEP(ctrail."eventTimestamp" BY INTERVAL '1' MINUTE);
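For readers less familiar with streaming SQL, the Python sketch below shows roughly what the pump's grouping produces: one count per source IP per one-minute bucket of the event timestamp. It is a simplification under stated assumptions. The input rows are hypothetical, the data is treated as a finished batch, and the ROWTIME trigger that decides when results are emitted is left out.

```python
from collections import Counter
from datetime import datetime

# Hypothetical input rows: (eventTimestamp, sourceIPAddress).
ROWS = [
    (datetime(2017, 11, 28, 18, 5, 3), "203.0.113.10"),
    (datetime(2017, 11, 28, 18, 5, 41), "203.0.113.10"),
    (datetime(2017, 11, 28, 18, 5, 59), "198.51.100.7"),
    (datetime(2017, 11, 28, 18, 6, 2), "203.0.113.10"),
]

def calls_per_ip(rows):
    """Count rows per (minute, source IP), one output row per group."""
    counts = Counter()
    for event_time, source_ip in rows:
        # Truncate to the minute, the one-minute equivalent of STEP.
        minute = event_time.replace(second=0, microsecond=0)
        counts[(minute, source_ip)] += 1
    return [(minute, n, ip) for (minute, ip), n in sorted(counts.items())]

result = calls_per_ip(ROWS)
for row in result:
    print(row)
```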
21.
How do we aggregate streaming data?
• Aggregations (count, sum, min,…) take granular real time data and turn it into insights
• Data is continuously processed so you need to tell the application when you want results
Windows!
22.
Window Types
• Sliding, tumbling, and custom windows
• Tumbling windows are fixed size and grouped keys do not overlap
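The difference between tumbling and sliding windows can be sketched in a few lines of Python. This example uses count-based windows over a list purely for simplicity. Windows in a streaming application are usually defined over time, so treat this as an illustration of the overlap behavior only.

```python
def tumbling_windows(values, size):
    """Split values into consecutive, non-overlapping windows."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def sliding_windows(values, size):
    """Return every run of `size` consecutive values; neighbors overlap."""
    return [values[i:i + size] for i in range(len(values) - size + 1)]

readings = [1, 2, 3, 4, 5, 6]
tumbling_sums = [sum(w) for w in tumbling_windows(readings, 3)]
sliding_sums = [sum(w) for w in sliding_windows(readings, 3)]
print(tumbling_sums)  # [6, 15]
print(sliding_sums)   # [6, 9, 12, 15]
```

Each reading lands in exactly one tumbling window, while the same reading contributes to up to three sliding windows.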
23.
Event, ingest, and processing time
• Event time is the timestamp assigned when the event occurred, also called client-side time.
• Ingest time is the timestamp assigned when the record is added to the stream, also called server-side time.
• Processing time is when your application reads and analyzes the data (ROWTIME).
…
GROUP BY "sourceIPAddress",
/* Trigger for results */
STEP(ctrail.ROWTIME BY INTERVAL '1' MINUTE),
/* A timestamp grouping key */
STEP(ctrail."eventTimestamp" BY INTERVAL '1' MINUTE);
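The two STEP expressions round different clocks down to the same one-minute grid. A minimal Python sketch of that rounding, assuming naive timestamps and a hypothetical record that is created just before a minute boundary and read just after it:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
ONE_MINUTE = timedelta(minutes=1)

def step(ts, interval):
    """Round ts down to the nearest multiple of interval."""
    return ts - ((ts - EPOCH) % interval)

# Hypothetical record: created on the client at 18:05:58,
# read by the application at 18:06:02.
event_time = datetime(2017, 11, 28, 18, 5, 58)
processing_time = datetime(2017, 11, 28, 18, 6, 2)

# Mirrors the two grouping keys in the GROUP BY clause.
group_key = (step(processing_time, ONE_MINUTE), step(event_time, ONE_MINUTE))
print(group_key)
```

The record is emitted with the 18:06 processing-time window but is still counted under the 18:05 event-time minute, which is why both keys appear in the grouping.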
24.
Persist data for real time dashboards
• Use Kinesis Data Firehose to archive processed data in S3
• Use AWS Lambda to deliver data to DynamoDB (or another database)
• Use open source or other tools to visualize the data
Components: real time analytics, AWS Lambda function, Amazon DynamoDB Table(s), Chart.JS Dashboard, Amazon Kinesis Data Firehose, Amazon S3 bucket for processed data
25.
Late results
• An event is late if it arrives after the computation to which it logically belongs has been completed
• Your Kinesis Analytics application will produce an amendment
…
GROUP BY "sourceIPAddress",
/* Trigger for results */
STEP(ctrail.ROWTIME BY INTERVAL '1' MINUTE),
/* A timestamp grouping key */
STEP(ctrail."eventTimestamp" BY INTERVAL '1' MINUTE);
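A small Python simulation can show how grouping on both keys yields an amendment. The rows are hypothetical, and the code processes a finished list rather than a live stream, so it only illustrates the shape of the output: a late event produces a second row for the same event-time minute, emitted under a later ROWTIME window.

```python
from collections import Counter
from datetime import datetime

def minute(ts):
    """Truncate a timestamp to the minute."""
    return ts.replace(second=0, microsecond=0)

# Hypothetical rows: (ROWTIME, eventTimestamp, sourceIPAddress).
ROWS = [
    (datetime(2017, 11, 28, 18, 5, 10), datetime(2017, 11, 28, 18, 5, 5), "203.0.113.10"),
    (datetime(2017, 11, 28, 18, 5, 40), datetime(2017, 11, 28, 18, 5, 35), "203.0.113.10"),
    # Late: belongs to the 18:05 event-time minute but arrives at 18:07.
    (datetime(2017, 11, 28, 18, 7, 15), datetime(2017, 11, 28, 18, 5, 50), "203.0.113.10"),
]

def emit(rows):
    """Count rows per (ROWTIME minute, event-time minute, source IP)."""
    counts = Counter()
    for rowtime, event_time, ip in rows:
        counts[(minute(rowtime), minute(event_time), ip)] += 1
    return sorted(counts.items())

results = emit(ROWS)
for (trigger, event_minute, ip), n in results:
    print(trigger.time(), event_minute.time(), ip, n)
```

The first output row carries a count of 2 for the 18:05 minute, and the second row, emitted at 18:07, carries the extra count of 1 for that same minute.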
26.
Updating a database
• Perform inserts but on duplicate key update
• For DynamoDB, here is the AWS Lambda code:
…
GROUP BY "sourceIPAddress",
/* Trigger for results */
STEP(ctrail.ROWTIME BY INTERVAL '1' MINUTE),
/* A timestamp grouping key */
STEP(ctrail."eventTimestamp" BY INTERVAL '1' MINUTE);
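The Lambda code referred to on this slide is not reproduced in this transcript. As a stand-in, the Python sketch below illustrates the "insert, but on duplicate key update" idea with an in-memory dictionary in place of a real table. The key layout and the choice to add an amendment's count to the stored value are assumptions for illustration. With DynamoDB, a similar effect is typically achieved with an UpdateItem call that uses an ADD update expression.

```python
def upsert(table, key, count):
    """Insert the row, or add to the existing count on a duplicate key."""
    table[key] = table.get(key, 0) + count
    return table[key]

# Hypothetical in-memory stand-in for the database table.
table = {}
key = ("2017-11-28T18:05:00", "203.0.113.10")

upsert(table, key, 2)  # first result for the 18:05 minute
upsert(table, key, 1)  # amendment produced by a late event
print(table[key])      # 3
```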
27.
What does all this cost?
• All services used in the solution are pay as you go
• All services used are serverless and have lower devops expense
• This solution will cost the “average” customer less than $100 per month
28.
Where to go next?
29.
Try it out yourself
Go to aws.amazon.com/kinesis/
Some good examples:
• Get started in minutes with a clickthrough template for AWS CloudTrail Event Log Analytics - <link> (friendly URL)
• Tinyurl.com/rt-dashboard
• Great blog posts with example use cases
30. Lots of customer examples
1 billion events/wk from connected devices | IoT
17 PB of game data per season | Entertainment
80 billion ad impressions/day, 30 ms response time | Ad Tech
100 GB/day click streams from 250+ sites | Enterprise
50 billion ad impressions/day, sub-50 ms responses | Ad Tech
10 million events/day | Retail
Amazon Kinesis as Databus - Migrate from Kafka to Kinesis | Enterprise
Funnel all production events through Amazon Kinesis
33.
THANK YOU!