1. Combining Batch and Stream Processing to Get the Best of Both Worlds
Rajeev Srinivasan, AWS Solution Architect
Ujjwal Ratan, AWS Solution Architect
ABD330
November 27, 2017
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
2. What Is This Chalk Talk About?
Problem Statement
Today, businesses are looking for ways to process large-scale batch data and high-velocity streaming data simultaneously in the cloud, using a proven architecture that meets their latency, accuracy, and throughput requirements.
"Lambda Architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch-processing and stream-processing methods." (Wikipedia)
Note: Lambda Architecture is not to be confused with AWS Lambda.
3. Lambda Architecture Block Diagram
4. Lambda Architecture – Flow Diagram
[Flow diagram; labels recovered from the slide]
Layers: BATCH LAYER, SPEED LAYER, SERVING LAYER
Labels: New Data, Master Dataset, Pre-Compute View, Batch Recompute, Batch View, Real-Time Data, Stream Processing, Real-Time Increment, Incremental View, Real-Time View, Partial aggregates, Merged Query
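The flow in this diagram can be illustrated with a small sketch. It is written in plain Python so that it runs without any AWS service, and it is only one reading of the diagram: the event shape (`key`, `value`), the sample data, and the function names are assumptions made for illustration, and the assignment of each function to a layer follows the usual description of the Lambda Architecture rather than anything stated on the slide.

```python
from collections import Counter


def batch_recompute(master_dataset):
    """Batch layer: rebuild the batch view from the full master dataset."""
    view = Counter()
    for event in master_dataset:
        view[event["key"]] += event["value"]
    return view


def realtime_increment(realtime_view, event):
    """Speed layer: fold a single new event into the real-time view."""
    realtime_view[event["key"]] += event["value"]
    return realtime_view


def merged_query(batch_view, realtime_view, key):
    """Serving side: add the partial aggregates held by both views."""
    return batch_view[key] + realtime_view[key]


# Hypothetical events: page-view counts keyed by page name.
master_dataset = [
    {"key": "home", "value": 3},
    {"key": "cart", "value": 1},
    {"key": "home", "value": 2},
]
recent_events = [
    {"key": "home", "value": 1},
    {"key": "checkout", "value": 4},
]

batch_view = batch_recompute(master_dataset)
realtime_view = Counter()
for event in recent_events:
    realtime_increment(realtime_view, event)

print(merged_query(batch_view, realtime_view, "home"))      # 6
print(merged_query(batch_view, realtime_view, "checkout"))  # 4
```

The batch view is rebuilt from scratch on every run, while the real-time view only covers events that arrived since the last recompute. In a real system the real-time view would be reset or trimmed after each batch run so that events are not counted twice; the sketch leaves that out.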
5. Lambda Architecture on AWS – Batch Layer
[Architecture diagram: Serverless Batch Processing]
Components shown: Data Source, Landing S3 bucket, AWS Glue Crawler, AWS Glue Data Catalog, AWS Glue ETL, Amazon EMR, Batch View S3 bucket, Amazon Athena, Amazon QuickSight
6. Lambda Architecture on AWS – Speed Layer
[Architecture diagram: Serverless Stream Processing]
Components shown: Data Source, Amazon Kinesis Stream, Amazon Kinesis Analytics, Amazon Kinesis Firehose (incremental stream), AWS Lambda, Amazon EMR, Incremental View S3 bucket, Visualization
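The slide does not say which aggregation the speed layer computes for the incremental view. As an illustration only, the sketch below assumes a per-key count over tumbling (fixed, non-overlapping) time windows, a common pattern in stream analytics. It is plain Python, not Kinesis Analytics SQL, and the window length, record shape, and sensor names are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed window length, not taken from the slides


def tumbling_window_counts(records, window_seconds=WINDOW_SECONDS):
    """Count records per (window start, key) over non-overlapping windows.

    Each record is a (timestamp_in_seconds, key) tuple.
    """
    counts = defaultdict(int)
    for timestamp, key in records:
        # Align the timestamp to the start of the window it falls in.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical stream records: (timestamp in seconds, sensor id).
records = [
    (0, "sensor-a"),
    (15, "sensor-a"),
    (59, "sensor-b"),
    (60, "sensor-a"),
    (125, "sensor-b"),
]

incremental_view = tumbling_window_counts(records)
for (window_start, key), count in sorted(incremental_view.items()):
    print(window_start, key, count)
```

Each `(window_start, key)` row is a partial aggregate that could be written to the incremental view as soon as its window closes, without waiting for the next batch run.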
7. Lambda Architecture on AWS
[Architecture diagram: Data Source to Visualization]
Layers: Batch Layer, Speed Layer, Serving Layer
Components shown: Data Source, Landing S3 bucket, AWS Glue Crawler, AWS Glue Catalog, AWS Glue ETL, Amazon EMR, Batch View S3 bucket, Kinesis Stream, Kinesis Analytics, Kinesis Firehose (incremental stream), AWS Lambda, Incremental View S3 bucket, Merged View S3 bucket, Amazon Athena, Visualization
8. Lambda Architecture on AWS
[Architecture diagram: Data Source to Visualization]
Layers: Batch Layer, Speed Layer, Serving Layer
Components shown: Data Source, Landing S3 bucket, AWS Glue Crawler, AWS Glue Catalog, AWS Glue ETL, Amazon EMR, Batch View S3 bucket, Kinesis Stream, Merged View S3 bucket, Visualization
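9. Combining Stream and Batch Processing Query – Spark
// Imports needed by the calls visible on the slide; the rest are elided.
import . . .
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SaveMode
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kinesis.KinesisUtils

// Querying batch data from Amazon S3 using Spark SQL
val dataFromS3 = sqlContext.read.json("s3://")
dataFromS3.registerTempTable("batchData")
. . .
// Querying data from the Kinesis stream using Spark Streaming
val ssc = new StreamingContext(sc, …)
val kinesisStreams = (0 until numStreams).map { i =>
  KinesisUtils.createStream(ssc, appName, streamName, endpointUrl, regionName,
    InitialPositionInStream.LATEST, kinesisCheckpointInterval, StorageLevel.MEMORY_AND_DISK_2)
}
// createStream yields byte arrays, so decode each record to a String
val unionStreams = ssc.union(kinesisStreams).map(bytes => new String(bytes))
unionStreams.foreachRDD((rdd: RDD[String]) => {
  . . .
  import sqlContext.implicits._ // needed for rdd.toDF()
  rdd.toDF().registerTempTable("streamData")
  // Merge query using Spark SQL; the join aliases are s (stream) and b (batch)
  val mergedResult = sqlContext.sql("SELECT ... FROM streamData s JOIN batchData b ON s.data = b.data ...")
  mergedResult.write.mode(SaveMode.Overwrite).parquet("s3://...")
})
ssc.start()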
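The merge on this slide joins each micro-batch of stream records with the batch table on the `data` column. The sketch below shows the same kind of inner join in plain Python rather than Spark, so it runs without a cluster. The row shape, the `value` field, and the sample rows are assumptions made for illustration; the slide elides the actual SELECT list.

```python
def inner_join(stream_rows, batch_rows, key="data"):
    """Inner-join two lists of dicts on `key`, like a SQL JOIN on one column."""
    # Index the batch side once so each stream row is matched by lookup.
    batch_index = {}
    for row in batch_rows:
        batch_index.setdefault(row[key], []).append(row)

    merged = []
    for s in stream_rows:
        for b in batch_index.get(s[key], []):
            merged.append(
                {key: s[key], "stream_value": s["value"], "batch_value": b["value"]}
            )
    return merged


# Hypothetical rows standing in for the batchData and streamData tables.
batch_rows = [{"data": "home", "value": 5}, {"data": "cart", "value": 1}]
stream_rows = [{"data": "home", "value": 1}, {"data": "checkout", "value": 4}]

merged_result = inner_join(stream_rows, batch_rows)
print(merged_result)
```

Only `home` appears on both sides, so it is the only row in the result. Stream records with no batch counterpart, such as `checkout` here, are dropped by an inner join; an outer join would keep them if the merged view should also show keys that are new since the last batch run.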
10. Security Options
Kinesis Stream and Kinesis Firehose
• Server-side encryption with an AWS KMS-managed key (SSE-KMS)
• Kinesis Stream also supports client-side encryption
AWS Lambda
• KMS or customer key to encrypt data both at rest and in transit
Amazon Athena
• Server-side encryption with an Amazon S3-managed key (SSE-S3)
• Server-side encryption with an AWS Key Management Service (AWS KMS)-managed key (SSE-KMS)
• CREATE TABLE statement with TBLPROPERTIES 'has_encrypted_data'='true'
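The Athena item on this slide names a table property but does not show a full statement. The sketch below builds one possible CREATE EXTERNAL TABLE statement that carries that property. The helper name, table name, columns, Parquet storage format, and S3 location are all hypothetical; only the `TBLPROPERTIES` entry comes from the slide.

```python
def encrypted_table_ddl(table, columns, location):
    """Build an Athena CREATE EXTERNAL TABLE statement for encrypted S3 data.

    `columns` is a list of (name, type) pairs; `location` is an S3 URI.
    """
    column_sql = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {column_sql}\n)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'\n"
        # Table property named on the slide for encrypted datasets.
        "TBLPROPERTIES ('has_encrypted_data'='true')"
    )


# Hypothetical table over a merged-view bucket.
ddl = encrypted_table_ddl(
    table="merged_view",
    columns=[("data", "string"), ("total", "bigint")],
    location="s3://example-bucket/merged-view/",
)
print(ddl)
```

The generated statement would then be submitted to Athena through the console or an API client. The encryption mode of the query results is configured separately from this table property.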
11. WHITEBOARDING SESSION
12. Thank you!
Please complete your survey!