SlideShare ist ein Scribd-Unternehmen logo
1 von 79
Downloaden Sie, um offline zu lesen
Real-time Analytics on
Sungmin, Kim
Solutions Architect, AWS
Agenda
• Why Real-time Data streaming and Analytics?
• How to Build?
• Where to Store streaming data?
• How to Ingest streaming data?
• How to Process streaming data?
• Delivery Streaming Data
• Dive into Stream Process Framework
• Transform, Aggregate, Join Streaming Data
• Case Studies
• Key Takeaways
Why Real-time Data streaming and Analytics?
Data
The world’s most
valuable resource is
no longer oil, but data.*
*Copyright: David Parkins , The Economist, 2017
“
”
Data Loses Value Over Time
* Source: Mike Gualtieri, Forrester, Perishable insights
Real time Seconds Minutes Hours Days Months
Valueofdatatodecision-making
Preventive/predictive
Actionable Reactive Historical
Time-critical decisions Traditional “batch” business intelligence
To create Value, derive insights in Real-time
Batch vs Real-time
Batch Difference Real-time
Arbitrarily, or Periodically Continuity Constant
Store → Process
(Hadoop MapReduce, Hive, Pig, Spark)
Method of analysis
Process → Store
(Spark Streaming, Flink, Apache Storm)
Small - Huge (KB~TB) Data size per a unit Small (B~KB)
Low - High (minutes to hours) Query Latency Low (milliseconds to minutes)
Low - High (hourly/daily/monthly) Request Rate Very High - High (in seconds, minutes)
High - Very high Durability Low - High
¢~$ (Amazon S3, Glacier) Cost/GB $$~$ (Redis, Memcached)
From Batch to Real-time:
Lambda Architecture
Data
Source
Stream
Storage
Speed Layer
Batch Layer
Batch
Process
Batch
View
Real-
time
View
Consumer
Query & Merge
Results
Service Layer
Stream
Ingestion
Raw Data
Storage
Streaming Data
Stream
Delivery
Stream
Process
Lambda Architecture
Streaming
Data
Batch View
Stream Process
Real-time
View
Query
Query
Batch View
Real-time
View
Raw Data
Batch Process
Batch Layer Serving Layer
Speed Layer
Key Components of Real-time Analytics
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
Devices and/or
applications that
produce real-time
data at high
velocity
Data from tens of
thousands of data
sources can be
written to a single
stream
Data are stored in the
order they were
received for a set
duration of time and
can be replayed
indefinitely during
that time
Records are read in
the order they are
produced, enabling
real-time analytics
or streaming ETL
Data lake
(most common)
Database
(least common)
Where to Store Streaming
Data?
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
Stream Storage
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
Amazon Kinesis
Data Streams
Amazon Managed
Streaming for Kafka
Hash
Function
Consumer
Consumer
Consumer
Consumer Group
PK
PK
PK
PK
= next consumer offset oldest datanewest data
Amazon Kinesis
Data Streams
Amazon Managed
Streaming for Kafka
Producers
shard/partition-1
shard/partition-2
5 4 3 2 1 0
3 2 1 0
4 3 2 1 0
4
2
0
shard/partition-3
Why is Stream Storage?
• Decouple producers &
consumers
• Persistent buffer
• Collect multiple streams
• Preserve client ordering
• Parallel consumption
• Streaming MapReduce
• Decouple producers & consumers
• Persistent buffer
• Collect multiple streams
• No client ordering (standard)
• FIFO queue preserves client
ordering
• No streaming MapReduce
• No parallel consumption
• Amazon SNS can publish to
multiple SNS subscribers
(queues or Lambda functions)
Consumers
4 3 2 1
12344 3 2 1
1234
2134
13342
Standard
FIFO
Producers
Amazon SQS Queue
What about SQS?
Publisher
Amazon SNS
Topic
AWS Lambda
function
Amazon SQS
queue
Queue
Subscriber
Topic
Amazon Kinesis
Data Streams
Amazon Managed
Streaming for Kafka
Amazon Kinesis
Data Streams
Amazon Managed
Streaming for Kafka
• Operational Considerations
• Number of clusters?
• Number of brokers per cluster?
• Number of topics per broker?
• Number of partitions per topic?
• Only increase number of
partitions; can’t decrease
• Integration with a few of AWS
Services such as Kinesis Data
Analytics for Java
• Operational Considerations
• Number of Kinesis Data Streams?
• Number of shards per stream?
• Increase/Decrease number of
shards
• Fully Integration with AWS
Services such as Lambda
function, Kinesis Data Analytics,
etc
RequestQueue
- Length
- WaitTime
ResponseQueue
- Length
- WaitTime
Network
- Packet Drop?
Produce/Consume Rate Unbalance
Who is Leader? Disk Full?
Too many topics?
Metrics to Monitor: MSK (Kafka)
Metrics to Monitor: MSK (Kafka)
Metric Level Description
ActiveControllerCount DEFAULT Only one controller per cluster should be active at any given time.
OfflinePartitionsCount DEFAULT Total number of partitions that are offline in the cluster.
GlobalPartitionCount DEFAULT Total number of partitions across all brokers in the cluster.
GlobalTopicCount DEFAULT Total number of topics across all brokers in the cluster.
KafkaAppLogsDiskUsed DEFAULT The percentage of disk space used for application logs.
KafkaDataLogsDiskUsed DEFAULT The percentage of disk space used for data logs.
RootDiskUsed DEFAULT The percentage of the root disk used by the broker.
PartitionCount PER_BROKER The number of partitions for the broker.
LeaderCount PER_BROKER The number of leader replicas.
UnderMinIsrPartitionCount PER_BROKER The number of under minIsr partitions for the broker.
UnderReplicatedPartitions PER_BROKER The number of under-replicated partitions for the broker.
FetchConsumerTotalTimeMsMean PER_BROKER The mean total time in milliseconds that consumers spend on
fetching data from the broker.
ProduceTotalTimeMsMean PER_BROKER The mean produce time in milliseconds.
How about monitoring Kinesis Data Streams?
Consumer
Application
GetRecords()
Data
How long time does a record stay in a shard?
Metrics to Monitor: Kinesis Data Streams
Metric Description
GetRecords.IteratorAgeMilliseconds Age of the last record in all GetRecords
ReadProvisionedThroughputExceeded Number of GetRecords calls throttled
WriteProvisionedThroughputExceeded Number of PutRecord(s) calls throttled
PutRecord.Success, PutRecords.Success Number of successful PutRecord(s) operations
GetRecords.Success Number of successful GetRecords operations
Choosing Good Metrics
Too much information can be just as useless as too little
How to Ingest Streaming
Data?
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
Stream Ingestion
• AWS SDKs
• Publish directly from application code via APIs
• AWS Mobile SDK
• Kinesis Agent
• Monitors log files and forwards lines as messages to
Kinesis Data Streams
• Kinesis Producer Library (KPL)
• Background process aggregates and batches messages
• 3rd party and open source
• Kafka Connect (kinesis-kafka-connector)
• fluentd (aws-fluent-plugin-kinesis)
• Log4J Appender (kinesis-log4j-appender)
• and more …
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
Amazon Kinesis
Data Streams
How to Process Streaming
Data?
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
Elasticsearch
Redshift
Stream Delivery
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
Stream
Delivery
Kinesis
Data Firehose
• Kinesis Agent
• CloudWatch Logs
• CloudWatch Events
• AWS IoT
• Direct PUT using APIs
• Kinesis Data Streams
• MSK(Kafka) using
Kafka Connect
Kinesis
Data Analytics
S3
Kinesis Firehose: Filter, Enrich, Convert
Data
Source
apache log
apache log
json Data
Sink
[Wed Oct 11 14:32:52 2017] [error] [client 192.34.86.178]
[Wed Oct 11 14:32:53 2017] [info] [client 127.0.0.1]
{
"date": "2017/10/11 14:32:52",
"status": "error",
"source": "192.34.86.178",
"city": "Boston",
"state": "MA"
}
geo-ip
{
"recordId": "1",
"result": "Ok",
"data": {
"date": "2017/10/11 14:32:52",
"status": "error",
"source": "192.34.86.178",
"city": "Boston",
"state": "MA"
},
},
{
"recordId": "2",
"result": "Dropped"
}
json
Lambda function
Kinesis
Data Firehose
Pre-built Data Transformation Blueprints
Blueprint Description
General Processing For custom transformation logic
Apache Log to JSON
Parses and converts Apache log lines to JSON objects using predefined
JSON field names
Apache Log to CSV Parses and converts Apache log lines to CSV format
Syslog to JSON
Parses and converts Syslog lines to JSON objects using predefined JSON
field names
Syslog to CSV Parses and converts Syslog lines to CSV format
Pre-built Data Conversion
Data
Source
Kinesis
Data Firehose
JSON Data
schema
AWS Glue Data
Catalog
Amazon S3
• Convert the format of your input data from JSON to columnar data
format Apache Parquet or Apache ORC before storing the data in
Amazon S3
• Works in conjunction to the transform features to convert other format
to JSON before the data conversion
convert to
columnar format
/failed
Failure and Error Handling
• S3 Destination
• Pause and retry for up to 24 hours (maximum data retention period)
• If data delivery fails for more than 24 hours, your data is lost.
• Redshift Destination
• Configurable retry duration (0-2 hours)
• After retry, skip and load error manifest files to S3’s errors/ folder
• Elasticsearch Destination
• Configurable retry duration (0-2 hours)
• After retry, skip and load failed records to S3’s elasticsearch_failed/
folder
Stream Process
• Transform
• Filter, Enrich, Convert
• Aggregation
• Windows Queries
• Top-K Contributor
• Join
• Stream-Stream Join
• Stream-(External) Table Join
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
AWS Lambda
Amazon Kinesis
Data Analytics
AWS Glue Amazon EMR
Dive into Stream Process
Services
AWS Lambda
• Serverless functions
• Event-based, stateless processing
• Continuous and simple scaling mechanism
event
(3)
event
(2)
event
(1)
Lambda
(1)
Lambda
(2)
Lambda
(3)
Amazon Kinesis
Data Analytics
AWS Glue Amazon EMR
Serverless ServerlessFully Managed
Architecture: Master-Worker
Master
Worker
(1)
Worker
(2)
Worker
(3)
part-01
part-02
part-03
part-01
part-02
part-03
Master
Workers
Architecture
Architecture
Workers
Master
Streaming Programming
Guide
Treat Streams as Unbounded Tables
“It's raining cats and dogs!”
["It's", "raining", "cats", "and", "dogs!"]
[("It's", 1), ("raining", 1), ("cats", 1),
("and", 1), ("dogs!", 1)]
It’s 1
raining 1
cats 1
and 1
dogs! 1
“It's raining cats and dogs!”
["It's", "raining", "cats", "and", "dogs!"]
[("It's", 1), ("raining", 1), ("cats", 1),
("and", 1), ("dogs!", 1)]
It’s 1
raining 1
cats 1
and 1
dogs! 1
Setup session
Read stream
Start
running
Apply
Streaming
ETL
What about (Stream) SQL?
Data
Source
Stream
Storage
Stream
SQL
Process
Stream
Ingestion
Data
Sink
[("It's", 1),
("raining", 1),
("cats", 1),
("and", 1),
("dogs!", 1)]
“It's raining cats and dogs!” It’s 1
raining 1
cats 1
and 1
dogs! 1
Kinesis Data Analytics (SQL)
• STREAM (in-application): a continuously
updated entity that you can SELECT from and
INSERT into like a TABLE
• PUMP: an entity used to continuously
'SELECT ... FROM' a source STREAM, and
INSERT SQL results into an output STREAM
• Create output stream, which can be used to
send to a destination
SOURCE
STREAM
INSERT
& SELECT
(PUMP)
DESTIN.
STREAM
Destination
Source
[("It's", 1),
("raining", 1),
("cats", 1),
("and", 1),
("dogs!", 1)]
Kinesis Data
Analytics
SQL vs Java
DEMO
https://aws.amazon.com/ko/blogs/aws/new-amazon-kinesis-data-analytics-for-java/
Amazon Kinesis
Data Streams
Amazon Kinesis
Data Firehose
Amazon S3Amazon Kinesis
Data Analytics
(Java)
Amazon Kinesis
Data Streams
Amazon Kinesis
Data Streams
Amazon Kinesis
Data Analytics
(SQL)
DEMO: Word Count
“It's raining cats and dogs!”
[("It's", 1),
("raining", 1),
("cats", 1),
("and", 1),
("dogs!", 1)]
It’s 1
raining 1
cats 1
and 1
dogs! 1
[("It's", 1),
("raining", 1),
("cats", 1),
("and", 1),
("dogs!", 1)]
1
2
Filter, Enrich, Convert
Streaming Data
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
Revisit Example: Filter, Enrich, Convert
Data
Source
Kinesis
Data Firehose
apache log
apache log
json Data
Sink
[Wed Oct 11 14:32:52 2017] [error] [client 192.34.86.178]
[Wed Oct 11 14:32:53 2017] [info] [client 127.0.0.1]
{
"date": "2017/10/11 14:32:52",
"status": "error",
"source": "192.34.86.178",
"city": "Boston",
"state": "MA"
}
geo-ip
{
"recordId": "1",
"result": "Ok",
"data": {
"date": "2017/10/11 14:32:52",
"status": "error",
"source": "192.34.86.178",
"city": "Boston",
"state": "MA"
},
},
{
"recordId": "2",
"result": "Dropped"
}
json
Lambda function
Stream Process: Filter, Enrich, Convert
Data
Source
apache log
apache log
json Data
Sink
[Wed Oct 11 14:32:52 2017] [error] [client 192.34.86.178]
[Wed Oct 11 14:32:53 2017] [info] [client 127.0.0.1]
{
"date": "2017/10/11 14:32:52",
"status": "error",
"source": "192.34.86.178",
"city": "Boston",
"state": "MA"
}
geo-ip
{
"recordId": "1",
"result": "Ok",
"data": {
"date": "2017/10/11 14:32:52",
"status": "error",
"source": "192.34.86.178",
"city": "Boston",
"state": "MA"
},
},
{
"recordId": "2",
"result": "Dropped"
}
Amazon Kinesis
Data Streams
Lambda function
Amazon EMR AWS GlueAmazon Kinesis
Data Analytics
Stream Process: Filter, Enrich, Convert
Data
Source
apache log
apache log
json Data
Sink
[Wed Oct 11 14:32:52 2017] [error] [client 192.34.86.178]
[Wed Oct 11 14:32:53 2017] [info] [client 127.0.0.1]
{
"date": "2017/10/11 14:32:52",
"status": "error",
"source": "192.34.86.178",
"city": "Boston",
"state": "MA"
}
geo-ip
{
"recordId": "1",
"result": "Ok",
"data": {
"date": "2017/10/11 14:32:52",
"status": "error",
"source": "192.34.86.178",
"city": "Boston",
"state": "MA"
},
},
{
"recordId": "2",
"result": "Dropped"
} Amazon EMR AWS Glue
Amazon MSK
Amazon Kinesis
Data Analytics
(Java)
Kinesis Data Analytics (SQL):
Preprocessing Data
https://aws.amazon.com/ko/blogs/big-data/preprocessing-data-in-amazon-kinesis-analytics-with-aws-lambda/
Integration of Stream Process and
Stream Storage
Amazon
Lambda
Kinesis Data
Analytics (SQL)
Kinesis Data
Analytics (Java)
Glue EMR
Kinesis Data
Firehose O O X X X
Kinesis Data
Streams O O O O O
Managed
Streaming for
Kafka (MSK)
X X O O O
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
Aggregate Streaming Data
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
Stream Process: Aggregation
• Aggregations (count, sum, min,...) take granular real time data
and turn it into insights
• Data is continuously processed so you need to tell the
application when you want results
• Windowed Queries
a. Sliding Windows (with Overlap)
b. Tumbling Windows (No Overlap)
c. Custom Windows
Join Streaming Data
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
Imagine It! How to build?
Stream Process: Join
Data
Source
Stream
Storage
Data
Source
Stream
Storage
Stream
Process
Data
Source
Stream
Storage
Data
Source
Stream
Process
(a) Stream-Stream Join
(b) Stream-Join by Partition Key
(c) Stream-Join by Hash Table
Data
Source
Stream
Storage
Stream
Process
Key-Value
Storage
Why Stream-Stream Join is so difficult?
Data
Source
Stream
Storage
Data
Source
Stream
Storage
Stream
Process
Data
Sink
t0t1t2tN
. . . . . . .
• Timing
• Skewed data
∆𝑡
∆𝑡
∆𝑡
How about Stream-Join by Partition Key?
Data
Source
Stream
Storage
Data
Source
Stream
Storage
Stream
Process
Data
Source
Stream
Storage
Data
Source
Stream
Process
t1t2t3t5
t1t2t3t5
t1t1t2t3
Each shard will be filled with records
coming from fast data producers
shard-1
shard-2
shard-3
Lastly, how about Stream-Join by Hash
Table?
Data
Source
Stream
Storage
Stream
Process
Key-Value
Storage
Data
Source
Stream
Storage
Data
Source
Stream
Storage
Stream
Process
DEMO
{"TICKER_SYMBOL": "CVB",
"SECTOR": "TECHNOLOGY",
"CHANGE": 0.81,
"PRICE": 53.63}
{"TICKER_SYMBOL": "ABC",
"SECTOR": "RETAIL",
"CHANGE": -1.14,
"PRICE": 23.64}
{"TICKER_SYMBOL": "JKL",
"SECTOR": "TECHNOLOGY",
"CHANGE": 0.22,
"PRICE": 15.32}
Amazon Kinesis
Data Streams
Amazon Kinesis
Data Analytics
(SQL)
DEMO: Filter, Aggregate, Join
Continuous filter
Aggregate function
Data enrichment (join)
Bucket
with objects
Ticker,Company
AMZN,Amazon
ASD,SomeCompanyA
BAC,SomeCompanyB
CRM,SomeCompanyC
https://docs.aws.amazon.com/kinesisanalytics/latest/dev/app-add-reference-data.html
Comparing Stream Process
Services
DevOps! Master-Worker Framework
Master
Worker
(1)
Worker
(2)
Worker
(3)
part-01
part-02
part-03
part-01
part-02
part-03
Master is alive?
Worker has enough resources such as CPU, Memory, Disk?
Checkpoint?
Right Instance Type?
C-Family, or R-Family?
Learning curve?
- SQL
- Python
- Scala
- Java
EMR vs Glue vs Kinesis Data Analytics
Operational
Excellence
Kinesis Data
Analytics (SQL)
EMR
Glue
Kinesis Data
Analytics (Java)
Degree of Freedom
≈ Complexity
AWS Glue
Comparing stream processing services
AWS Lambda Amazon Kinesis
Data Analytics
Amazon EMR
Simple programming
interface and scaling
• Serverless functions
• Six languages (Java,
Python, Golang,
Node.js, Ruby, C#)
• Event-based, stateless
processing
• Continuous and simple
scaling mechanism
Easy and powerful
stream processing
Simple, flexible, and
cost-effective ETL & Data
Catalog
Flexibility and choice for
your needs
• Serverless applications
• Supports SQL and Java
(Apache Flink)
• Stateful processing
with automatic backups
• Stream operators make
building app easy
• Serverless applications
• Can use the transforms
native to Apache Spark
Structured Streaming
• Automatically discover
new data, extracts
schema definitions
• Automatically
generates the ETL code
• Choose your instances
• Use your favorite
open-source
framework
• Fine-grained control
over cluster,
debugging tools, and
more
• Deep open-source tool
integrations with AWS
Case Studies
Example Usage Pattern 1: Web Analytics
and Leaderboards
Amazon
DynamoDB
Amazon Kinesis
Data Analytics
Amazon Kinesis
Data Streams
Amazon
Cognito
Lightweight JS
client code
Web server on
Amazon EC2
OR
Compute top 10 usersIngest web app data Persist to feed live apps
Lambda
function
https://aws.amazon.com/solutions/implementations/real-time-web-analytics-with-kinesis/
https://aws.amazon.com/blogs/aws/new-serverless-streaming-etl-with-aws-glue/
Example Usage Pattern 2: Monitoring
IoT Devices
Ingest sensor data
Convert json
to parquet
Store all data points
in an S3 data lake
https://aws.amazon.com/blogs/aws/new-serverless-streaming-etl-with-aws-glue/
Example Usage Pattern 3: Analyzing
AWS CloudTrail Event Logs
AWS
CloudTrail
CloudWatch
Events trigger
Kinesis
Data Analytics
Lambda
function
S3 bucket
for raw data
DynamoDB
table
Chart.JS
dashboard
Compute operational
metrics
Ingest raw log data Deliver to real time
dashboards and archival
Kinesis Data
Firehose
https://aws.amazon.com/solutions/implementations/real-time-insights-account-activity/
Takeaways
From Batch to Real-time:
Lambda Architecture
Data
Source
Stream
Storage
Speed Layer
Batch Layer
Batch
Process
Batch
View
Real-
time
View
Consumer
Query & Merge
Results
Service Layer
Stream
Ingestion
Raw Data
Storage
Streaming Data
Stream
Delivery
Stream
Process
Key Components of Real-time Analytics
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
AWS Lambda
Kinesis Data
Analytics
Glue EMR
Kinesis Data
Firehose
Kinesis Data
Streams
Managed
Streaming for
Kafka
Real-Time Applications
- Aggregation
- Top-K Contributor
- Anomaly Detection
Streaming ETL
- Filter, Enrich, Convert
- Join
Kafka Connect
KPL
Kinesis Agent
AWS SDKs
Key Takeaways
• Build decoupled systems
• Data → Store → Process → Store → Analyze → Answers
• Data Source → Stream Ingestion → Stream Storage → Stream
Process → Data Sink
• Follow the principle of "extract data once and reuse multiple
times” to power new customer experiences
• Use the right tool for the job
• Know the AWS services soft and hard limits
• Leverage managed and serverless services (DevOps!)
• Scalable/elastic, available, reliable, secure, no/low admin
Where To Go Next?
• AWS Analytics Immersion Day - Build BI System from Scratch
• Workshop - https://tinyurl.com/yapgwv77
• Slides - https://tinyurl.com/ybxkb74b
• Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 1, 2
• Part1 - https://tinyurl.com/y8vo8q7o
• Part2 - https://tinyurl.com/ycbv7wel
• Streaming Analytics Workshop – Kinesis Data Analytics for Java (Flink)
https://streaming-analytics.labgui.de/
• Amazon MSK Labs
https://amazonmsk-labs.workshop.aws/
• Querying Amazon Kinesis Streams Directly with SQL and Spark Streaming
https://tinyurl.com/y7hklyff
• AWS Glue Streaming ETL - Scala Script Example
https://tinyurl.com/y79x6jda
Appendix
• Amazon Managed Streaming for Apache Kafka: Best Practices
https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html
• Optimizing Your Apache Kafka® Deployment
https://www.confluent.io/blog/optimizing-apache-kafka-deployment/
• Monitoring Kafka performance metrics
https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/

Weitere ähnliche Inhalte

Was ist angesagt?

Getting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesGetting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesAmazon Web Services
 
AWS Lambda를 기반으로한 실시간 빅테이터 처리하기
AWS Lambda를 기반으로한 실시간 빅테이터 처리하기AWS Lambda를 기반으로한 실시간 빅테이터 처리하기
AWS Lambda를 기반으로한 실시간 빅테이터 처리하기Amazon Web Services Korea
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightAmazon Web Services
 
Performing real-time ETL into data lakes - ADB202 - Santa Clara AWS Summit.pdf
Performing real-time ETL into data lakes - ADB202 - Santa Clara AWS Summit.pdfPerforming real-time ETL into data lakes - ADB202 - Santa Clara AWS Summit.pdf
Performing real-time ETL into data lakes - ADB202 - Santa Clara AWS Summit.pdfAmazon Web Services
 
Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSAmazon Web Services
 
Building APIs with Amazon API Gateway
Building APIs with Amazon API GatewayBuilding APIs with Amazon API Gateway
Building APIs with Amazon API GatewayAmazon Web Services
 
Boot camp - Migration to AWS
Boot camp - Migration to AWSBoot camp - Migration to AWS
Boot camp - Migration to AWSAmazon Web Services
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
A Brief Look at Serverless Architecture
A Brief Look at Serverless ArchitectureA Brief Look at Serverless Architecture
A Brief Look at Serverless ArchitectureAmazon Web Services
 
민첩하고 비용효율적인 Data Lake 구축 - 문종민 솔루션즈 아키텍트, AWS
민첩하고 비용효율적인 Data Lake 구축 - 문종민 솔루션즈 아키텍트, AWS민첩하고 비용효율적인 Data Lake 구축 - 문종민 솔루션즈 아키텍트, AWS
민첩하고 비용효율적인 Data Lake 구축 - 문종민 솔루션즈 아키텍트, AWSAmazon Web Services Korea
 
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)Amazon Web Services Korea
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
 
Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나Amazon Web Services Korea
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon RedshiftAmazon Web Services
 
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon Web Services
 

Was ist angesagt? (20)

Getting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesGetting Started with Serverless Architectures
Getting Started with Serverless Architectures
 
AWS Lambda를 기반으로한 실시간 빅테이터 처리하기
AWS Lambda를 기반으로한 실시간 빅테이터 처리하기AWS Lambda를 기반으로한 실시간 빅테이터 처리하기
AWS Lambda를 기반으로한 실시간 빅테이터 처리하기
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
Visualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSightVisualizing Big Data Insights with Amazon QuickSight
Visualizing Big Data Insights with Amazon QuickSight
 
Performing real-time ETL into data lakes - ADB202 - Santa Clara AWS Summit.pdf
Performing real-time ETL into data lakes - ADB202 - Santa Clara AWS Summit.pdfPerforming real-time ETL into data lakes - ADB202 - Santa Clara AWS Summit.pdf
Performing real-time ETL into data lakes - ADB202 - Santa Clara AWS Summit.pdf
 
Introduction to AWS Glue
Introduction to AWS GlueIntroduction to AWS Glue
Introduction to AWS Glue
 
Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWS
 
AWS RDS
AWS RDSAWS RDS
AWS RDS
 
Building APIs with Amazon API Gateway
Building APIs with Amazon API GatewayBuilding APIs with Amazon API Gateway
Building APIs with Amazon API Gateway
 
Boot camp - Migration to AWS
Boot camp - Migration to AWSBoot camp - Migration to AWS
Boot camp - Migration to AWS
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
A Brief Look at Serverless Architecture
A Brief Look at Serverless ArchitectureA Brief Look at Serverless Architecture
A Brief Look at Serverless Architecture
 
민첩하고 비용효율적인 Data Lake 구축 - 문종민 솔루션즈 아키텍트, AWS
민첩하고 비용효율적인 Data Lake 구축 - 문종민 솔루션즈 아키텍트, AWS민첩하고 비용효율적인 Data Lake 구축 - 문종민 솔루션즈 아키텍트, AWS
민첩하고 비용효율적인 Data Lake 구축 - 문종민 솔루션즈 아키텍트, AWS
 
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
Amazon CloudFront 101
Amazon CloudFront 101Amazon CloudFront 101
Amazon CloudFront 101
 
Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나
Amazon SageMaker 모델 학습 방법 소개::최영준, 솔루션즈 아키텍트 AI/ML 엑스퍼트, AWS::AWS AIML 스페셜 웨비나
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
 

Ähnlich wie Realtime Analytics on AWS

Amazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxAmazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxRenjithPillai26
 
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKChoose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKSungmin Kim
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time AnalyticsAmazon Web Services
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsAmazon Web Services
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014Amazon Web Services
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysisAmazon Web Services
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAmazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsAmazon Web Services
 
AWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAmazon Web Services
 
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)Amazon Web Services Korea
 
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisAmazon Web Services
 
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...Amazon Web Services
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...Amazon Web Services
 
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...Sungmin Kim
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Amazon Web Services
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015Amazon Web Services Korea
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSAmazon Web Services
 

Ähnlich wie Realtime Analytics on AWS (20)

Amazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxAmazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptx
 
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKChoose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon Kinesis
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming Applications
 
AWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis Webinar
 
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
 
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
 
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
 
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 

Mehr von Sungmin Kim

Build Computer Vision Applications with Amazon Rekognition and SageMaker
Build Computer Vision Applications with Amazon Rekognition and SageMakerBuild Computer Vision Applications with Amazon Rekognition and SageMaker
Build Computer Vision Applications with Amazon Rekognition and SageMakerSungmin Kim
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon AthenaSungmin Kim
 
End-to-End Machine Learning with Amazon SageMaker
End-to-End Machine Learning with Amazon SageMakerEnd-to-End Machine Learning with Amazon SageMaker
End-to-End Machine Learning with Amazon SageMakerSungmin Kim
 
1시간만에 머신러닝 개념 따라 잡기
1시간만에 머신러닝 개념 따라 잡기1시간만에 머신러닝 개념 따라 잡기
1시간만에 머신러닝 개념 따라 잡기Sungmin Kim
 
AWS re:Invent 2020 Awesome AI/ML Services
AWS re:Invent 2020 Awesome AI/ML ServicesAWS re:Invent 2020 Awesome AI/ML Services
AWS re:Invent 2020 Awesome AI/ML ServicesSungmin Kim
 
AWS Personalize 중심으로 살펴본 추천 시스템 원리와 구축
AWS Personalize 중심으로 살펴본 추천 시스템 원리와 구축AWS Personalize 중심으로 살펴본 추천 시스템 원리와 구축
AWS Personalize 중심으로 살펴본 추천 시스템 원리와 구축Sungmin Kim
 
Starup을 위한 AWS AI/ML 서비스 활용 방법
Starup을 위한 AWS AI/ML 서비스 활용 방법Starup을 위한 AWS AI/ML 서비스 활용 방법
Starup을 위한 AWS AI/ML 서비스 활용 방법Sungmin Kim
 
Octember on AWS (Revised Edition)
Octember on AWS (Revised Edition)Octember on AWS (Revised Edition)
Octember on AWS (Revised Edition)Sungmin Kim
 
Amazon Athena 사용 팁
Amazon Athena 사용 팁Amazon Athena 사용 팁
Amazon Athena 사용 팁Sungmin Kim
 
Databases & Analytics AWS re:invent 2019 Recap
Databases & Analytics AWS re:invent 2019 RecapDatabases & Analytics AWS re:invent 2019 Recap
Databases & Analytics AWS re:invent 2019 RecapSungmin Kim
 
Octember on AWS
Octember on AWSOctember on AWS
Octember on AWSSungmin Kim
 
AI/ML re:invent 2019 recap at Delivery Hero Korea
AI/ML re:invent 2019 recap at Delivery Hero KoreaAI/ML re:invent 2019 recap at Delivery Hero Korea
AI/ML re:invent 2019 recap at Delivery Hero KoreaSungmin Kim
 

Mehr von Sungmin Kim (12)

Build Computer Vision Applications with Amazon Rekognition and SageMaker
Build Computer Vision Applications with Amazon Rekognition and SageMakerBuild Computer Vision Applications with Amazon Rekognition and SageMaker
Build Computer Vision Applications with Amazon Rekognition and SageMaker
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
End-to-End Machine Learning with Amazon SageMaker
End-to-End Machine Learning with Amazon SageMakerEnd-to-End Machine Learning with Amazon SageMaker
End-to-End Machine Learning with Amazon SageMaker
 
1시간만에 머신러닝 개념 따라 잡기
1시간만에 머신러닝 개념 따라 잡기1시간만에 머신러닝 개념 따라 잡기
1시간만에 머신러닝 개념 따라 잡기
 
AWS re:Invent 2020 Awesome AI/ML Services
AWS re:Invent 2020 Awesome AI/ML ServicesAWS re:Invent 2020 Awesome AI/ML Services
AWS re:Invent 2020 Awesome AI/ML Services
 
AWS Personalize 중심으로 살펴본 추천 시스템 원리와 구축
AWS Personalize 중심으로 살펴본 추천 시스템 원리와 구축AWS Personalize 중심으로 살펴본 추천 시스템 원리와 구축
AWS Personalize 중심으로 살펴본 추천 시스템 원리와 구축
 
Starup을 위한 AWS AI/ML 서비스 활용 방법
Starup을 위한 AWS AI/ML 서비스 활용 방법Starup을 위한 AWS AI/ML 서비스 활용 방법
Starup을 위한 AWS AI/ML 서비스 활용 방법
 
Octember on AWS (Revised Edition)
Octember on AWS (Revised Edition)Octember on AWS (Revised Edition)
Octember on AWS (Revised Edition)
 
Amazon Athena 사용 팁
Amazon Athena 사용 팁Amazon Athena 사용 팁
Amazon Athena 사용 팁
 
Databases & Analytics AWS re:invent 2019 Recap
Databases & Analytics AWS re:invent 2019 RecapDatabases & Analytics AWS re:invent 2019 Recap
Databases & Analytics AWS re:invent 2019 Recap
 
Octember on AWS
Octember on AWSOctember on AWS
Octember on AWS
 
AI/ML re:invent 2019 recap at Delivery Hero Korea
AI/ML re:invent 2019 recap at Delivery Hero KoreaAI/ML re:invent 2019 recap at Delivery Hero Korea
AI/ML re:invent 2019 recap at Delivery Hero Korea
 

KĂźrzlich hochgeladen

Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 

KĂźrzlich hochgeladen (20)

Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 

Realtime Analytics on AWS

  • 1. Real-time Analytics on Sungmin, Kim Solutions Architect, AWS
  • 2. Agenda • Why Real-time Data streaming and Analytics? • How to Build? • Where to Store streaming data? • How to Ingest streaming data? • How to Process streaming data? • Delivery Streaming Data • Dive into Stream Process Framework • Transform, Aggregate, Join Streaming Data • Case Studies • Key Takeaways
  • 3. Why Real-time Data streaming and Analytics?
  • 4. Data The world’s most valuable resource is no longer oil, but data.* *Copyright: David Parkins , The Economist, 2017 “ ”
  • 5. Data Loses Value Over Time * Source: Mike Gualtieri, Forrester, Perishable insights Real time Seconds Minutes Hours Days Months Valueofdatatodecision-making Preventive/predictive Actionable Reactive Historical Time-critical decisions Traditional “batch” business intelligence
  • 6. To create Value, derive insights in Real-time
  • 7. Batch vs Real-time Batch Difference Real-time Arbitrarily, or Periodically Continuity Constant Store → Process (Hadoop MapReduce, Hive, Pig, Spark) Method of analysis Process → Store (Spark Streaming, Flink, Apache Storm) Small - Huge (KB~TB) Data size per a unit Small (B~KB) Low - High (minutes to hours) Query Latency Low (milliseconds to minutes) Low - High (hourly/daily/monthly) Request Rate Very High - High (in seconds, minutes) High - Very high Durability Low - High ¢~$ (Amazon S3, Glacier) Cost/GB $$~$ (Redis, Memcached)
  • 8. From Batch to Real-time: Lambda Architecture Data Source Stream Storage Speed Layer Batch Layer Batch Process Batch View Real- time View Consumer Query & Merge Results Service Layer Stream Ingestion Raw Data Storage Streaming Data Stream Delivery Stream Process
  • 9. Lambda Architecture Streaming Data Batch View Stream Process Real-time View Query Query Batch View Real-time View Raw Data Batch Process Batch Layer Serving Layer Speed Layer
  • 10. Key Components of Real-time Analytics Data Source Stream Storage Stream Process Stream Ingestion Data Sink Devices and/or applications that produce real-time data at high velocity Data from tens of thousands of data sources can be written to a single stream Data are stored in the order they were received for a set duration of time and can be replayed indefinitely during that time Records are read in the order they are produced, enabling real-time analytics or streaming ETL Data lake (most common) Database (least common)
  • 11. Where to Store Streaming Data? Data Source Stream Storage Stream Process Stream Ingestion Data Sink
  • 13. Hash Function Consumer Consumer Consumer Consumer Group PK PK PK PK = next consumer offset oldest datanewest data Amazon Kinesis Data Streams Amazon Managed Streaming for Kafka Producers shard/partition-1 shard/partition-2 5 4 3 2 1 0 3 2 1 0 4 3 2 1 0 4 2 0 shard/partition-3
  • 14. Why is Stream Storage? • Decouple producers & consumers • Persistent buffer • Collect multiple streams • Preserve client ordering • Parallel consumption • Streaming MapReduce
  • 15. • Decouple producers & consumers • Persistent buffer • Collect multiple streams • No client ordering (standard) • FIFO queue preserves client ordering • No streaming MapReduce • No parallel consumption • Amazon SNS can publish to multiple SNS subscribers (queues or Lambda functions) Consumers 4 3 2 1 12344 3 2 1 1234 2134 13342 Standard FIFO Producers Amazon SQS Queue What about SQS? Publisher Amazon SNS Topic AWS Lambda function Amazon SQS queue Queue Subscriber
  • 16. Topic Amazon Kinesis Data Streams Amazon Managed Streaming for Kafka
  • 17. Amazon Kinesis Data Streams Amazon Managed Streaming for Kafka • Operational Considerations • Number of clusters? • Number of brokers per cluster? • Number of topics per broker? • Number of partitions per topic? • Only increase number of partitions; can’t decrease • Integration with a few of AWS Services such as Kinesis Data Analytics for Java • Operational Considerations • Number of Kinesis Data Streams? • Number of shards per stream? • Increase/Decrease number of shards • Fully Integration with AWS Services such as Lambda function, Kinesis Data Analytics, etc
  • 18. RequestQueue - Length - WaitTime ResponseQueue - Length - WaitTime Network - Packet Drop? Produce/Consume Rate Unbalance Who is Leader? Disk Full? Too many topics? Metrics to Monitor: MSK (Kafka)
  • 19. Metrics to Monitor: MSK (Kafka) Metric Level Description ActiveControllerCount DEFAULT Only one controller per cluster should be active at any given time. OfflinePartitionsCount DEFAULT Total number of partitions that are offline in the cluster. GlobalPartitionCount DEFAULT Total number of partitions across all brokers in the cluster. GlobalTopicCount DEFAULT Total number of topics across all brokers in the cluster. KafkaAppLogsDiskUsed DEFAULT The percentage of disk space used for application logs. KafkaDataLogsDiskUsed DEFAULT The percentage of disk space used for data logs. RootDiskUsed DEFAULT The percentage of the root disk used by the broker. PartitionCount PER_BROKER The number of partitions for the broker. LeaderCount PER_BROKER The number of leader replicas. UnderMinIsrPartitionCount PER_BROKER The number of under minIsr partitions for the broker. UnderReplicatedPartitions PER_BROKER The number of under-replicated partitions for the broker. FetchConsumerTotalTimeMsMean PER_BROKER The mean total time in milliseconds that consumers spend on fetching data from the broker. ProduceTotalTimeMsMean PER_BROKER The mean produce time in milliseconds.
  • 20. How about monitoring Kinesis Data Streams? Consumer Application GetRecords() Data How long time does a record stay in a shard?
  • 21. Metrics to Monitor: Kinesis Data Streams Metric Description GetRecords.IteratorAgeMilliseconds Age of the last record in all GetRecords ReadProvisionedThroughputExceeded Number of GetRecords calls throttled WriteProvisionedThroughputExceeded Number of PutRecord(s) calls throttled PutRecord.Success, PutRecords.Success Number of successful PutRecord(s) operations GetRecords.Success Number of successful GetRecords operations
  • 22. Choosing Good Metrics Too much information can be just as useless as too little
  • 23. How to Ingest Streaming Data? Data Source Stream Storage Stream Process Stream Ingestion Data Sink
  • 24. Stream Ingestion • AWS SDKs • Publish directly from application code via APIs • AWS Mobile SDK • Kinesis Agent • Monitors log files and forwards lines as messages to Kinesis Data Streams • Kinesis Producer Library (KPL) • Background process aggregates and batches messages • 3rd party and open source • Kafka Connect (kinesis-kafka-connector) • fluentd (aws-fluent-plugin-kinesis) • Log4J Appender (kinesis-log4j-appender) • and more … Data Source Stream Storage Stream Process Stream Ingestion Data Sink Amazon Kinesis Data Streams
  • 25. How to Process Streaming Data? Data Source Stream Storage Stream Process Stream Ingestion Data Sink
  • 26. Elasticsearch Redshift Stream Delivery Data Source Stream Storage Stream Process Stream Ingestion Data Sink Stream Delivery Kinesis Data Firehose • Kinesis Agent • CloudWatch Logs • CloudWatch Events • AWS IoT • Direct PUT using APIs • Kinesis Data Streams • MSK(Kafka) using Kafka Connect Kinesis Data Analytics S3
  • 27. Kinesis Firehose: Filter, Enrich, Convert Data Source apache log apache log json Data Sink [Wed Oct 11 14:32:52 2017] [error] [client 192.34.86.178] [Wed Oct 11 14:32:53 2017] [info] [client 127.0.0.1] { "date": "2017/10/11 14:32:52", "status": "error", "source": "192.34.86.178", "city": "Boston", "state": "MA" } geo-ip { "recordId": "1", "result": "Ok", "data": { "date": "2017/10/11 14:32:52", "status": "error", "source": "192.34.86.178", "city": "Boston", "state": "MA" }, }, { "recordId": "2", "result": "Dropped" } json Lambda function Kinesis Data Firehose
  • 28. Pre-built Data Transformation Blueprints Blueprint Description General Processing For custom transformation logic Apache Log to JSON Parses and converts Apache log lines to JSON objects using predefined JSON field names Apache Log to CSV Parses and converts Apache log lines to CSV format Syslog to JSON Parses and converts Syslog lines to JSON objects using predefined JSON field names Syslog to CSV Parses and converts Syslog lines to CSV format
  • 29. Pre-built Data Conversion Data Source Kinesis Data Firehose JSON Data schema AWS Glue Data Catalog Amazon S3 • Convert the format of your input data from JSON to columnar data format Apache Parquet or Apache ORC before storing the data in Amazon S3 • Works in conjunction to the transform features to convert other format to JSON before the data conversion convert to columnar format /failed
  • 30. Failure and Error Handling • S3 Destination • Pause and retry for up to 24 hours (maximum data retention period) • If data delivery fails for more than 24 hours, your data is lost. • Redshift Destination • Configurable retry duration (0-2 hours) • After retry, skip and load error manifest files to S3’s errors/ folder • Elasticsearch Destination • Configurable retry duration (0-2 hours) • After retry, skip and load failed records to S3’s elasticsearch_failed/ folder
  • 31. Stream Process • Transform • Filter, Enrich, Convert • Aggregation • Windows Queries • Top-K Contributor • Join • Stream-Stream Join • Stream-(External) Table Join Data Source Stream Storage Stream Process Stream Ingestion Data Sink AWS Lambda Amazon Kinesis Data Analytics AWS Glue Amazon EMR
  • 32. Dive into Stream Process Services
  • 33. AWS Lambda • Serverless functions • Event-based, stateless processing • Continuous and simple scaling mechanism event (3) event (2) event (1) Lambda (1) Lambda (2) Lambda (3)
  • 34. Amazon Kinesis Data Analytics AWS Glue Amazon EMR Serverless ServerlessFully Managed
  • 39. Treat Streams as Unbounded Tables
  • 40. “It's raining cats and dogs!” ["It's", "raining", "cats", "and", "dogs!"] [("It's", 1), ("raining", 1), ("cats", 1), ("and", 1), ("dogs!", 1)] It’s 1 raining 1 cats 1 and 1 dogs! 1
  • 41.
  • 42. “It's raining cats and dogs!” ["It's", "raining", "cats", "and", "dogs!"] [("It's", 1), ("raining", 1), ("cats", 1), ("and", 1), ("dogs!", 1)] It’s 1 raining 1 cats 1 and 1 dogs! 1
  • 43.
  • 45. What about (Stream) SQL? Data Source Stream Storage Stream SQL Process Stream Ingestion Data Sink [("It's", 1), ("raining", 1), ("cats", 1), ("and", 1), ("dogs!", 1)] “It's raining cats and dogs!” It’s 1 raining 1 cats 1 and 1 dogs! 1
  • 46. Kinesis Data Analytics (SQL) • STREAM (in-application): a continuously updated entity that you can SELECT from and INSERT into like a TABLE • PUMP: an entity used to continuously 'SELECT ... FROM' a source STREAM, and INSERT SQL results into an output STREAM • Create output stream, which can be used to send to a destination SOURCE STREAM INSERT & SELECT (PUMP) DESTIN. STREAM Destination Source [("It's", 1), ("raining", 1), ("cats", 1), ("and", 1), ("dogs!", 1)]
  • 48. DEMO
  • 49. https://aws.amazon.com/ko/blogs/aws/new-amazon-kinesis-data-analytics-for-java/ Amazon Kinesis Data Streams Amazon Kinesis Data Firehose Amazon S3Amazon Kinesis Data Analytics (Java) Amazon Kinesis Data Streams Amazon Kinesis Data Streams Amazon Kinesis Data Analytics (SQL) DEMO: Word Count “It's raining cats and dogs!” [("It's", 1), ("raining", 1), ("cats", 1), ("and", 1), ("dogs!", 1)] It’s 1 raining 1 cats 1 and 1 dogs! 1 [("It's", 1), ("raining", 1), ("cats", 1), ("and", 1), ("dogs!", 1)] 1 2
  • 50. Filter, Enrich, Convert Streaming Data Data Source Stream Storage Stream Process Stream Ingestion Data Sink
  • 51. Revisit Example: Filter, Enrich, Convert Data Source Kinesis Data Firehose apache log apache log json Data Sink [Wed Oct 11 14:32:52 2017] [error] [client 192.34.86.178] [Wed Oct 11 14:32:53 2017] [info] [client 127.0.0.1] { "date": "2017/10/11 14:32:52", "status": "error", "source": "192.34.86.178", "city": "Boston", "state": "MA" } geo-ip { "recordId": "1", "result": "Ok", "data": { "date": "2017/10/11 14:32:52", "status": "error", "source": "192.34.86.178", "city": "Boston", "state": "MA" }, }, { "recordId": "2", "result": "Dropped" } json Lambda function
  • 52. Stream Process: Filter, Enrich, Convert Data Source apache log apache log json Data Sink [Wed Oct 11 14:32:52 2017] [error] [client 192.34.86.178] [Wed Oct 11 14:32:53 2017] [info] [client 127.0.0.1] { "date": "2017/10/11 14:32:52", "status": "error", "source": "192.34.86.178", "city": "Boston", "state": "MA" } geo-ip { "recordId": "1", "result": "Ok", "data": { "date": "2017/10/11 14:32:52", "status": "error", "source": "192.34.86.178", "city": "Boston", "state": "MA" }, }, { "recordId": "2", "result": "Dropped" } Amazon Kinesis Data Streams Lambda function Amazon EMR AWS GlueAmazon Kinesis Data Analytics
  • 53. Stream Process: Filter, Enrich, Convert Data Source apache log apache log json Data Sink [Wed Oct 11 14:32:52 2017] [error] [client 192.34.86.178] [Wed Oct 11 14:32:53 2017] [info] [client 127.0.0.1] { "date": "2017/10/11 14:32:52", "status": "error", "source": "192.34.86.178", "city": "Boston", "state": "MA" } geo-ip { "recordId": "1", "result": "Ok", "data": { "date": "2017/10/11 14:32:52", "status": "error", "source": "192.34.86.178", "city": "Boston", "state": "MA" }, }, { "recordId": "2", "result": "Dropped" } Amazon EMR AWS Glue Amazon MSK Amazon Kinesis Data Analytics (Java)
  • 54. Kinesis Data Analytics (SQL): Preprocessing Data https://aws.amazon.com/ko/blogs/big-data/preprocessing-data-in-amazon-kinesis-analytics-with-aws-lambda/
  • 55. Integration of Stream Process and Stream Storage Amazon Lambda Kinesis Data Analytics (SQL) Kinesis Data Analytics (Java) Glue EMR Kinesis Data Firehose O O X X X Kinesis Data Streams O O O O O Managed Streaming for Kafka (MSK) X X O O O Data Source Stream Storage Stream Process Stream Ingestion Data Sink
  • 57. Stream Process: Aggregation • Aggregations (count, sum, min,...) take granular real time data and turn it into insights • Data is continuously processed so you need to tell the application when you want results • Windowed Queries a. Sliding Windows (with Overlap) b. Tumbling Windows (No Overlap) c. Custom Windows
  • 59. Imagine It! How to build?
  • 60. Stream Process: Join Data Source Stream Storage Data Source Stream Storage Stream Process Data Source Stream Storage Data Source Stream Process (a) Stream-Stream Join (b) Stream-Join by Partition Key (c) Stream-Join by Hash Table Data Source Stream Storage Stream Process Key-Value Storage
  • 61. Why Stream-Stream Join is so difficult? Data Source Stream Storage Data Source Stream Storage Stream Process Data Sink t0t1t2tN . . . . . . . • Timing • Skewed data ∆𝑡 ∆𝑡 ∆𝑡
  • 62. How about Stream-Join by Partition Key? Data Source Stream Storage Data Source Stream Storage Stream Process Data Source Stream Storage Data Source Stream Process t1t2t3t5 t1t2t3t5 t1t1t2t3 Each shard will be filled with records coming from fast data producers shard-1 shard-2 shard-3
  • 63. Lastly, how about Stream-Join by Hash Table? Data Source Stream Storage Stream Process Key-Value Storage Data Source Stream Storage Data Source Stream Storage Stream Process
  • 64. DEMO
  • 65. {"TICKER_SYMBOL": "CVB", "SECTOR": "TECHNOLOGY", "CHANGE": 0.81, "PRICE": 53.63} {"TICKER_SYMBOL": "ABC", "SECTOR": "RETAIL", "CHANGE": -1.14, "PRICE": 23.64} {"TICKER_SYMBOL": "JKL", "SECTOR": "TECHNOLOGY", "CHANGE": 0.22, "PRICE": 15.32} Amazon Kinesis Data Streams Amazon Kinesis Data Analytics (SQL) DEMO: Filter, Aggregate, Join Continuous filter Aggregate function Data enrichment (join) Bucket with objects Ticker,Company AMZN,Amazon ASD,SomeCompanyA BAC,SomeCompanyB CRM,SomeCompanyC https://docs.aws.amazon.com/kinesisanalytics/latest/dev/app-add-reference-data.html
  • 67. DevOps! Master-Worker Framework Master Worker (1) Worker (2) Worker (3) part-01 part-02 part-03 part-01 part-02 part-03 Master is alive? Worker has enough resources such as CPU, Memory, Disk? Checkpoint? Right Instance Type? C-Family, or R-Family? Learning curve? - SQL - Python - Scala - Java
  • 68. EMR vs Glue vs Kinesis Data Analytics Operational Excellence Kinesis Data Analytics (SQL) EMR Glue Kinesis Data Analytics (Java) Degree of Freedom ≈ Complexity
  • 69. AWS Glue Comparing stream processing services AWS Lambda Amazon Kinesis Data Analytics Amazon EMR Simple programming interface and scaling • Serverless functions • Six languages (Java, Python, Golang, Node.js, Ruby, C#) • Event-based, stateless processing • Continuous and simple scaling mechanism Easy and powerful stream processing Simple, flexible, and cost-effective ETL & Data Catalog Flexibility and choice for your needs • Serverless applications • Supports SQL and Java (Apache Flink) • Stateful processing with automatic backups • Stream operators make building app easy • Serverless applications • Can use the transforms native to Apache Spark Structured Streaming • Automatically discover new data, extracts schema definitions • Automatically generates the ETL code • Choose your instances • Use your favorite open-source framework • Fine-grained control over cluster, debugging tools, and more • Deep open-source tool integrations with AWS
  • 71. Example Usage Pattern 1: Web Analytics and Leaderboards Amazon DynamoDB Amazon Kinesis Data Analytics Amazon Kinesis Data Streams Amazon Cognito Lightweight JS client code Web server on Amazon EC2 OR Compute top 10 usersIngest web app data Persist to feed live apps Lambda function https://aws.amazon.com/solutions/implementations/real-time-web-analytics-with-kinesis/
  • 72. https://aws.amazon.com/blogs/aws/new-serverless-streaming-etl-with-aws-glue/ Example Usage Pattern 2: Monitoring IoT Devices Ingest sensor data Convert json to parquet Store all data points in an S3 data lake https://aws.amazon.com/blogs/aws/new-serverless-streaming-etl-with-aws-glue/
  • 73. Example Usage Pattern 3: Analyzing AWS CloudTrail Event Logs AWS CloudTrail CloudWatch Events trigger Kinesis Data Analytics Lambda function S3 bucket for raw data DynamoDB table Chart.JS dashboard Compute operational metrics Ingest raw log data Deliver to real time dashboards and archival Kinesis Data Firehose https://aws.amazon.com/solutions/implementations/real-time-insights-account-activity/
  • 75. From Batch to Real-time: Lambda Architecture Data Source Stream Storage Speed Layer Batch Layer Batch Process Batch View Real- time View Consumer Query & Merge Results Service Layer Stream Ingestion Raw Data Storage Streaming Data Stream Delivery Stream Process
  • 76. Key Components of Real-time Analytics Data Source Stream Storage Stream Process Stream Ingestion Data Sink AWS Lambda Kinesis Data Analytics Glue EMR Kinesis Data Firehose Kinesis Data Streams Managed Streaming for Kafka Real-Time Applications - Aggregation - Top-K Contributor - Anomaly Detection Streaming ETL - Filter, Enrich, Convert - Join Kafka Connect KPL Kinesis Agent AWS SDKs
  • 77. Key Takeaways • Build decoupled systems • Data → Store → Process → Store → Analyze → Answers • Data Source → Stream Ingestion → Stream Storage → Stream Process → Data Sink • Follow the principle of "extract data once and reuse multiple times” to power new customer experiences • Use the right tool for the job • Know the AWS services soft and hard limits • Leverage managed and serverless services (DevOps!) • Scalable/elastic, available, reliable, secure, no/low admin
  • 78. Where To Go Next? • AWS Analytics Immersion Day - Build BI System from Scratch • Workshop - https://tinyurl.com/yapgwv77 • Slides - https://tinyurl.com/ybxkb74b • Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 1, 2 • Part1 - https://tinyurl.com/y8vo8q7o • Part2 - https://tinyurl.com/ycbv7wel • Streaming Analytics Workshop – Kinesis Data Analytics for Java (Flink) https://streaming-analytics.labgui.de/ • Amazon MSK Labs https://amazonmsk-labs.workshop.aws/ • Querying Amazon Kinesis Streams Directly with SQL and Spark Streaming https://tinyurl.com/y7hklyff • AWS Glue Streaming ETL - Scala Script Example https://tinyurl.com/y79x6jda
  • 79. Appendix • Amazon Managed Streaming for Apache Kafka: Best Practices https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html • Optimizing Your Apache KafkaÂŽ Deployment https://www.confluent.io/blog/optimizing-apache-kafka-deployment/ • Monitoring Kafka performance metrics https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/