SlideShare ist ein Scribd-Unternehmen logo
1 von 41
Downloaden Sie, um offline zu lesen
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ray Zhu, Sr. Product Manager, Amazon Kinesis
August 14, 2017
Analyzing Streaming Data in Real-
time with Amazon Kinesis
What to Expect from the Session
• Streaming data overview
• Amazon Kinesis overview
• Customer stories
• Streaming data architectures
• Example – twitter feeds analysis
• Getting started resources
Streaming Data Overview
Most data is produced continuously
Mobile Apps Web Clickstream Application Logs
Metering Records IoT Sensors Smart Buildings
[Wed Oct 11 14:32:52
2000] [error] [client
127.0.0.1] client
denied by server
configuration:
/export/home/live/ap/h
tdocs/test
The diminishing value of data
Processing real-time, streaming data
• Durable
• Continuous
• Fast
• Correct
• Reactive
• Reliable
What are the key requirements?
Ingest Transform Analyze React Persist
Amazon Kinesis Overview
Amazon Kinesis makes it easy to work with
real-time streaming data
Kinesis
Streams
Stores data as a
continuous
replayable stream
for custom
applications
Kinesis
Firehose
Load streaming data
into Amazon S3,
Amazon Redshift, and
Amazon Elasticsearch
Service
Kinesis
Analytics
Analyze data
streams using
standard SQL
queries
Amazon Kinesis Streams
• Reliably ingest and durably store streaming data at low
cost
• Build custom real-time applications to process streaming
data
Amazon Kinesis Firehose
• Reliably ingest, transform, and deliver batched, compressed,
and encrypted data to S3, Redshift, and Elasticsearch
• Point and click setup with zero administration and seamless
elasticity
Amazon Kinesis Analytics
• Interact with streaming data in real-time using SQL
• Build fully managed and elastic stream processing
applications that process data for real-time visualizations
and alarms
Customer Stories
Amazon Kinesis customer diversity
Sushiro
Amazon Game Studio
Amazon Game Studio
Hearst
Hearst
Common streaming data
architectures
Generate time series analytics
• Compute key performance indicators over time periods
• Combine with static or historical data in S3 or Amazon Redshift
Analytics
Streams
Firehose
Amazon
Redshift
S3
Streams
Firehose
Custom, real-
time
destinations
Feed real-time dashboards
• Validate and transform raw data, and then process to calculate
meaningful statistics
• Send processed data downstream for visualization in BI and
visualization services
Amazon
QuickSight
Analytics
Amazon ES
Amazon
Redshift
Amazon
RDS
Streams
Firehose
Create real-time alarms and notifications
• Build sequences of events from the stream, like user sessions in a
clickstream or app behavior through logs
• Identify events (or a series of events) of interest, and react to the
data through alarms and notifications
Analytics
Streams
Firehose
Streams
Amazon
SNS
Amazon
CloudWatch
Lambda
Example: NHL Tweet Analysis
Example Scenario Requirements
Data to capture
• Filter for NHL-related tweets
• Total number of tweets per hour that contain hashtags for
NHL teams
• Top 5 NHL-related hashtags per hour
Output Requirements
• Filtered tweets are saved to Amazon S3
• Hourly aggregate count is saved to Amazon Elasticsearch
Service
• Full team name of top 5 hashtags are saved to Amazon
Elasticsearch Service
Why use Kinesis Analytics for this solution?
Challenges
• Twitter stream can be noisy
• Tweet structure is complex, with several levels of
nested JSON
• NHL-related tweet volume is cyclical
With Kinesis Analytics
• Easily filter out unwanted tweets
• Normalize tweet schema for simple SQL queries
• Automatically scale to meet demand
End-to-End Architecture
Amazon
Kinesis
Stream
Amazon
Kinesis
Analytics
Amazon
Kinesis
Firehose
Amazon
Elasticsearch
Service
Amazon S3
EC2
Instance
Reference
Data
Twitter Data Input
Amazon Kinesis StreamEC2 Instance
Source JSON Data
• Hundreds of attributes
per Tweet
• ~5KB per Tweet
• Most attributes are
unnecessary for use
case
Kinesis Stream Input
• 5 attributes per record
• ~250B per record
• Relevant attributes
How are tweets mapped to a schema?
Amazon Kinesis stream Amazon KinesisAnalytics
{
"id": 795296435386388500,
"text": "#Canes game tonight! #NHL",
"created_at": "11-06-2016 16:07:00",
"tags": [{
"tag": "Canes"
}, {
"tag": "NHL"
}]
}
id text created_at tag
795… #Canes… 11-06-2016… Canes
795… #Canes… 11-06-2016… NHL
Source Data for Kinesis Analytics
How is streaming data accessed with SQL?
STREAM
• Analogous to a TABLE
• Represents continuous data flow
CREATE OR REPLACE STREAM "NHL_TWEET_STREAM" (
ID BIGINT, TWEET_TEXT VARCHAR(140), HASHTAG VARCHAR(140));
PUMP
• Continuous INSERT query
• Inserts data from one in-application stream to another
CREATE OR REPLACE PUMP "NHL_TWEET_PUMP" AS
INSERT INTO "NHL_TWEET_STREAM"
SELECT STREAM * FROM . . .
How do we model our data?
NHL_TWEET_STREAM
•ID
•TEXT
•HASHTAG
TOTAL_TWEETS_STREAM
•TWEET_COUNT
MENTION_COUNT_STREAM
•TEAMNAME
•MENTIONCOUNT
SELECT STREAM *
FROM "SOURCE_STREAM"
WHERE "HASHTAG" NOT IN
<exclude list>
SOURCE_STREAM
•id
•text
•tag
Kinesis
stream
Kinesis
Firehose
Kinesis
Firehose
TeamName
• hashtag
• team
How do we get team name from the hashtag?
• Create CSV file with hashtag to team name map in S3
• Configure Kinesis Analytics application to import file as
reference data
• Reference data appears as a table
• Join streaming data on reference data
hashtag,team
NYRangers,New York Rangers
NYR,New York Rangers
Canes,Carolina Hurricanes
Oilers,Edmonton Oilers
Sharks,San Jose Sharks
s3://mybucket/team_map.csv
Use Reference Data in Query
SELECT STREAM tn."team"
FROM "NHL_TWEET_STREAM" tweets
INNER JOIN "TeamName" tn
ON tweets."HASHTAG" =
LOWER(tn."hashtag")
TeamName
• hashtag
• team
NHL_TWEET_STREAM
•ID
•TEXT
•HASHTAG
hashtag,team
NYRangers,New York Rangers
NYR,New York Rangers
Canes,Carolina Hurricanes
Oilers,Edmonton Oilers
Sharks,San Jose Sharks
s3://mybucket/team_map.csv
New York Rangers
New York Rangers
Carolina Hurricanes
Edmonton Oilers
San Jose Sharks
How do we aggregate streaming data?
• A common requirement in streaming
analytics is to perform set-based operation(s)
(count, average, max, min,..) over events
that arrive within a specified period of time
• Cannot simply aggregate over an entire table
like typical static database
• How do we define a subset in a potentially infinite
stream?
• Windowing functions!
Windowing Concepts
• Windows can be tumbling or sliding
• Windows are fixed length
Output record will have the timestamp of the end of the window
1 5 4 26 8 6 4
t1 t2 t5 t6t3 t4
Time
Window1 Window2 Window3
Aggregate
Function (Sum)
18 14
Output Events
Comparing Types of Windows
• Output created at the end of the window
• The output of the window will be single event based on the
aggregate function used
Tumbling window
Aggregate per time interval
Sliding window
Windows constantly re-evaluated
How do we aggregate team mentions per hour?
• Use TOP_K_ITEMS_TUMBLING function
• Pass cursor to team name stream
• Define window size of 3600 seconds
INSERT INTO "MENTION_COUNT_STREAM"
SELECT STREAM *
FROM TABLE(TOP_K_ITEMS_TUMBLING(
CURSOR(SELECT STREAM tn."team"... ),
'team', -- name of column to aggregate
5, -- number of top items
3600 -- tumbling window size in seconds
));
Output to Kinesis Firehose
MENTION_COUNT_STREAM
•TEAMNAME
•MENTIONCOUNT
Amazon
Elasticsearch
Service
KibanaAmazon
Kinesis Firehose
{
"aggregationtime": "2016-11-06T14:42:03.335",
"teamname": "New York Rangers",
"mentioncount": 604
}
5 records, every hour
Visualize Results with Kibana
Getting Started Today
Getting started resources
• Kinesis Home Page: https://aws.amazon.com/kinesis/
• Kinesis Data Generator: https://aws.amazon.com/blogs/big-data/test-your-
streaming-data-solution-with-the-new-amazon-kinesis-data-generator/
• Kinesis Firehose Lab:
https://run.qwiklab.com/focuses/preview/1641?search=44727
• Build a Log Analytics Solution: https://aws.amazon.com/getting-
started/projects/build-log-analytics-solution/
• Blog Post: https://aws.amazon.com/kinesis/firehose/blog-posts/
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
Build a Real-time Streaming Data Visualization System with Amazon Kinesis Ana...
Build a Real-time Streaming Data Visualization System with Amazon Kinesis Ana...Build a Real-time Streaming Data Visualization System with Amazon Kinesis Ana...
Build a Real-time Streaming Data Visualization System with Amazon Kinesis Ana...
 
ENT312 Learn about Software Procurement Using AWS Marketplace and Service Cat...
ENT312 Learn about Software Procurement Using AWS Marketplace and Service Cat...ENT312 Learn about Software Procurement Using AWS Marketplace and Service Cat...
ENT312 Learn about Software Procurement Using AWS Marketplace and Service Cat...
 
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...
 
ENT302 Deep Dive on AWS Management Tools and New Launches
ENT302 Deep Dive on AWS Management Tools and New LaunchesENT302 Deep Dive on AWS Management Tools and New Launches
ENT302 Deep Dive on AWS Management Tools and New Launches
 
Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon Kinesis
 
ENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the CloudENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the Cloud
 
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)
 
ENT317 Dynamic Infrastructure? Migrating? Adventures in Keeping Your Applicat...
ENT317 Dynamic Infrastructure? Migrating? Adventures in Keeping Your Applicat...ENT317 Dynamic Infrastructure? Migrating? Adventures in Keeping Your Applicat...
ENT317 Dynamic Infrastructure? Migrating? Adventures in Keeping Your Applicat...
 
ENT201 A Tale of Two Pizzas: Accelerating Software Delivery with AWS Develope...
ENT201 A Tale of Two Pizzas: Accelerating Software Delivery with AWS Develope...ENT201 A Tale of Two Pizzas: Accelerating Software Delivery with AWS Develope...
ENT201 A Tale of Two Pizzas: Accelerating Software Delivery with AWS Develope...
 
Getting Started with Docker on AWS
Getting Started with Docker on AWSGetting Started with Docker on AWS
Getting Started with Docker on AWS
 
AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis...
AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis...AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis...
AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis...
 
Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at ScaleModern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale
 
Getting Started with Amazon Aurora
Getting Started with Amazon AuroraGetting Started with Amazon Aurora
Getting Started with Amazon Aurora
 
Building A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSBuilding A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWS
 
Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis Analytics
 
Log Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & KibanaLog Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & Kibana
 
Optimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsOptimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics Workloads
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
 
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
 

Ähnlich wie SRV420 Analyzing Streaming Data in Real-time with Amazon Kinesis

React Fast by Processing Streaming Data - AWS Summit Tel Aviv 2017
React Fast by Processing Streaming Data - AWS Summit Tel Aviv 2017React Fast by Processing Streaming Data - AWS Summit Tel Aviv 2017
React Fast by Processing Streaming Data - AWS Summit Tel Aviv 2017
Amazon Web Services
 

Ähnlich wie SRV420 Analyzing Streaming Data in Real-time with Amazon Kinesis (20)

BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
Serverless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis AnalyticsServerless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis Analytics
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
 
Serverless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis AnalyticsServerless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis Analytics
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Serverless Real Time Analytics
Serverless Real Time AnalyticsServerless Real Time Analytics
Serverless Real Time Analytics
 
Serverless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis AnalyticsServerless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis Analytics
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming Applications
 
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
 
Serverless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis AnalyticsServerless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis Analytics
 
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...
 
React Fast by Processing Streaming Data - AWS Summit Tel Aviv 2017
React Fast by Processing Streaming Data - AWS Summit Tel Aviv 2017React Fast by Processing Streaming Data - AWS Summit Tel Aviv 2017
React Fast by Processing Streaming Data - AWS Summit Tel Aviv 2017
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon Kinesis
 
Path to the future #4 - Ingestão, processamento e análise de dados em tempo real
Path to the future #4 - Ingestão, processamento e análise de dados em tempo realPath to the future #4 - Ingestão, processamento e análise de dados em tempo real
Path to the future #4 - Ingestão, processamento e análise de dados em tempo real
 
Streaming Data Analytics with Amazon Redshift Firehose
Streaming Data Analytics with Amazon Redshift FirehoseStreaming Data Analytics with Amazon Redshift Firehose
Streaming Data Analytics with Amazon Redshift Firehose
 
Amazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxAmazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptx
 
AWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis Webinar
 
Getting started with amazon kinesis
Getting started with amazon kinesisGetting started with amazon kinesis
Getting started with amazon kinesis
 

Mehr von Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

SRV420 Analyzing Streaming Data in Real-time with Amazon Kinesis

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ray Zhu, Sr. Product Manager, Amazon Kinesis August 14, 2017 Analyzing Streaming Data in Real- time with Amazon Kinesis
  • 2. What to Expect from the Session • Streaming data overview • Amazon Kinesis overview • Customer stories • Streaming data architectures • Example – twitter feeds analysis • Getting started resources
  • 4. Most data is produced continuously Mobile Apps Web Clickstream Application Logs Metering Records IoT Sensors Smart Buildings [Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/h tdocs/test
  • 6. Processing real-time, streaming data • Durable • Continuous • Fast • Correct • Reactive • Reliable What are the key requirements? Ingest Transform Analyze React Persist
  • 8. Amazon Kinesis makes it easy to work with real-time streaming data Kinesis Streams Stores data as a continuous replayable stream for custom applications Kinesis Firehose Load streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service Kinesis Analytics Analyze data streams using standard SQL queries
  • 9. Amazon Kinesis Streams • Reliably ingest and durably store streaming data at low cost • Build custom real-time applications to process streaming data
  • 10. Amazon Kinesis Firehose • Reliably ingest, transform, and deliver batched, compressed, and encrypted data to S3, Redshift, and Elasticsearch • Point and click setup with zero administration and seamless elasticity
  • 11. Amazon Kinesis Analytics • Interact with streaming data in real-time using SQL • Build fully managed and elastic stream processing applications that process data for real-time visualizations and alarms
  • 20. Generate time series analytics • Compute key performance indicators over time periods • Combine with static or historical data in S3 or Amazon Redshift Analytics Streams Firehose Amazon Redshift S3 Streams Firehose Custom, real- time destinations
  • 21. Feed real-time dashboards • Validate and transform raw data, and then process to calculate meaningful statistics • Send processed data downstream for visualization in BI and visualization services Amazon QuickSight Analytics Amazon ES Amazon Redshift Amazon RDS Streams Firehose
  • 22. Create real-time alarms and notifications • Build sequences of events from the stream, like user sessions in a clickstream or app behavior through logs • Identify events (or a series of events) of interest, and react to the data through alarms and notifications Analytics Streams Firehose Streams Amazon SNS Amazon CloudWatch Lambda
  • 23. Example: NHL Tweet Analysis
  • 24. Example Scenario Requirements Data to capture • Filter for NHL-related tweets • Total number of tweets per hour that contain hashtags for NHL teams • Top 5 NHL-related hashtags per hour Output Requirements • Filtered tweets are saved to Amazon S3 • Hourly aggregate count is saved to Amazon Elasticsearch Service • Full team name of top 5 hashtags are saved to Amazon Elasticsearch Service
  • 25. Why use Kinesis Analytics for this solution? Challenges • Twitter stream can be noisy • Tweet structure is complex, with several levels of nested JSON • NHL-related tweet volume is cyclical With Kinesis Analytics • Easily filter out unwanted tweets • Normalize tweet schema for simple SQL queries • Automatically scale to meet demand
  • 27. Twitter Data Input Amazon Kinesis StreamEC2 Instance Source JSON Data • Hundreds of attributes per Tweet • ~5KB per Tweet • Most attributes are unnecessary for use case Kinesis Stream Input • 5 attributes per record • ~250B per record • Relevant attributes
  • 28. How are tweets mapped to a schema? Amazon Kinesis stream Amazon KinesisAnalytics { "id": 795296435386388500, "text": "#Canes game tonight! #NHL", "created_at": "11-06-2016 16:07:00", "tags": [{ "tag": "Canes" }, { "tag": "NHL" }] } id text created_at tag 795… #Canes… 11-06-2016… Canes 795… #Canes… 11-06-2016… NHL Source Data for Kinesis Analytics
  • 29. How is streaming data accessed with SQL? STREAM • Analogous to a TABLE • Represents continuous data flow CREATE OR REPLACE STREAM "NHL_TWEET_STREAM" ( ID BIGINT, TWEET_TEXT VARCHAR(140), HASHTAG VARCHAR(140)); PUMP • Continuous INSERT query • Inserts data from one in-application stream to another CREATE OR REPLACE PUMP "NHL_TWEET_PUMP" AS INSERT INTO "NHL_TWEET_STREAM" SELECT STREAM * FROM . . .
  • 30. How do we model our data? NHL_TWEET_STREAM •ID •TEXT •HASHTAG TOTAL_TWEETS_STREAM •TWEET_COUNT MENTION_COUNT_STREAM •TEAMNAME •MENTIONCOUNT SELECT STREAM * FROM "SOURCE_STREAM" WHERE "HASHTAG" NOT IN <exclude list> SOURCE_STREAM •id •text •tag Kinesis stream Kinesis Firehose Kinesis Firehose TeamName • hashtag • team
  • 31. How do we get team name from the hashtag? • Create CSV file with hashtag to team name map in S3 • Configure Kinesis Analytics application to import file as reference data • Reference data appears as a table • Join streaming data on reference data hashtag,team NYRangers,New York Rangers NYR,New York Rangers Canes,Carolina Hurricanes Oilers,Edmonton Oilers Sharks,San Jose Sharks s3://mybucket/team_map.csv
  • 32. Use Reference Data in Query SELECT STREAM tn."team" FROM "NHL_TWEET_STREAM" tweets INNER JOIN "TeamName" tn ON tweets."HASHTAG" = LOWER(tn."hashtag") TeamName • hashtag • team NHL_TWEET_STREAM •ID •TEXT •HASHTAG hashtag,team NYRangers,New York Rangers NYR,New York Rangers Canes,Carolina Hurricanes Oilers,Edmonton Oilers Sharks,San Jose Sharks s3://mybucket/team_map.csv New York Rangers New York Rangers Carolina Hurricanes Edmonton Oilers San Jose Sharks
  • 33. How do we aggregate streaming data? • A common requirement in streaming analytics is to perform set-based operation(s) (count, average, max, min,..) over events that arrive within a specified period of time • Cannot simply aggregate over an entire table like typical static database • How do we define a subset in a potentially infinite stream? • Windowing functions!
  • 34. Windowing Concepts • Windows can be tumbling or sliding • Windows are fixed length Output record will have the timestamp of the end of the window 1 5 4 26 8 6 4 t1 t2 t5 t6t3 t4 Time Window1 Window2 Window3 Aggregate Function (Sum) 18 14 Output Events
  • 35. Comparing Types of Windows • Output created at the end of the window • The output of the window will be single event based on the aggregate function used Tumbling window Aggregate per time interval Sliding window Windows constantly re-evaluated
  • 36. How do we aggregate team mentions per hour? • Use TOP_K_ITEMS_TUMBLING function • Pass cursor to team name stream • Define window size of 3600 seconds INSERT INTO "MENTION_COUNT_STREAM" SELECT STREAM * FROM TABLE(TOP_K_ITEMS_TUMBLING( CURSOR(SELECT STREAM tn."team"... ), 'team', -- name of column to aggregate 5, -- number of top items 3600 -- tumbling window size in seconds ));
  • 37. Output to Kinesis Firehose MENTION_COUNT_STREAM •TEAMNAME •MENTIONCOUNT Amazon Elasticsearch Service KibanaAmazon Kinesis Firehose { "aggregationtime": "2016-11-06T14:42:03.335", "teamname": "New York Rangers", "mentioncount": 604 } 5 records, every hour
  • 40. Getting started resources • Kinesis Home Page: https://aws.amazon.com/kinesis/ • Kinesis Data Generator: https://aws.amazon.com/blogs/big-data/test-your- streaming-data-solution-with-the-new-amazon-kinesis-data-generator/ • Kinesis Firehose Lab: https://run.qwiklab.com/focuses/preview/1641?search=44727 • Build a Log Analytics Solution: https://aws.amazon.com/getting- started/projects/build-log-analytics-solution/ • Blog Post: https://aws.amazon.com/kinesis/firehose/blog-posts/