Bridging the Gap:
Connecting AWS and Kafka
Ryanne Dolan & Jason Li
LinkedIn
Overview
● Motivation
● What is Kinesis?
● Architecture
● Checkpointing
● Metrics
● DynamoDB Streams
● Validation
Story starts at Bizo
● Originally batch-oriented architecture: AWS S3, Spark
● Started converting to stream-oriented architecture in 2013: AWS Kinesis
● Acquired by LinkedIn in 2014
● Originally batch-oriented integration with LinkedIn’s data centers
● Started bridge project in 2015
● Other teams (including other startups) have started using it
Use Cases
● User-tracking events from Bizo Kinesis -> LI Kafka
● Leverage LI A/B tooling from AWS (requires Kafka events)
● Leverage LI ELK stack from AWS (Kinesis -> Kafka -> ELK)
● Leverage LI call tracing, site speed, and other metrics, reports
● Ship application and CloudWatch metrics from AWS to LI (Kinesis -> Kafka -> inGraphs)
Requirements
● encoding agnostic (Thrift, Avro, JSON, etc)
● to/from Kafka
● to/from Kinesis
● preserve partitioning
● near real-time bidirectional replication
● stream multiplexing and joining
● support multiple AWS accounts
● multiple metrics endpoints (including CloudWatch)
Kinesis Highlights
● VERY similar to Kafka
● Dynamic, scalable “shards” (instead of static partitions)
● Throughput and cost tied to # of shards
● Each shard:
○ $0.015/hour
○ 1000 records/second
○ 1MB/s ingress
○ 2MB/s egress
● Integration with AWS services, e.g. “Firehose” -> S3
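Since throughput and cost both scale with the shard count, stream sizing is simple arithmetic. The sketch below uses only the per-shard figures quoted on this slide; the function names and the 730-hour month are assumptions made for illustration, and current AWS pricing may differ.

```python
import math

# Per-shard limits and price as quoted on the slide above.
RECORDS_PER_SEC_PER_SHARD = 1000
INGRESS_MB_PER_SEC_PER_SHARD = 1.0
EGRESS_MB_PER_SEC_PER_SHARD = 2.0
PRICE_PER_SHARD_HOUR_USD = 0.015


def shards_needed(records_per_sec, ingress_mb_per_sec, egress_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,
        math.ceil(records_per_sec / RECORDS_PER_SEC_PER_SHARD),
        math.ceil(ingress_mb_per_sec / INGRESS_MB_PER_SEC_PER_SHARD),
        math.ceil(egress_mb_per_sec / EGRESS_MB_PER_SEC_PER_SHARD),
    )


def monthly_shard_cost_usd(shards, hours=730):
    """Shard-hour cost only; hours=730 is an assumed average month."""
    return shards * PRICE_PER_SHARD_HOUR_USD * hours
```

For example, a stream carrying 4,500 records/s, 3.2 MB/s in and 9 MB/s out is bound by both the record rate and the egress limit, so it needs 5 shards.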
Kinesis-Kafka Bridge
● Samza job
● Kinesis System (Producer and Consumer)
● BridgeTask
● input/output mapping
● Repartitioner
● Transcoder
BridgeTask
[Diagram: Kinesis Consumer, Kafka Consumer, Repartitioner, Transcoder, Bridge Mapping, Kinesis Producer, Kafka Producer]
● Repartitioner, e.g. Partition 1 -> shard-000001
● Transcoder, e.g. Thrift -> Avro
● Bridge Mapping, e.g. llama.abc -> kafka.xyz
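The three stages named on this slide can be sketched as plain functions. This is a minimal illustration, not the bridge's actual code: the `Envelope` type, the `BRIDGE_MAPPING` table, the modulo repartitioning rule and the `bridge` function are all hypothetical names and choices.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Envelope:
    stream: str      # e.g. "llama.abc"
    partition: int   # source partition or shard index
    payload: bytes


# Hypothetical mapping table: one input stream may fan out to several outputs.
BRIDGE_MAPPING: Dict[str, List[str]] = {"llama.abc": ["kafka.xyz"]}


def repartition(partition: int, output_partitions: int) -> int:
    """Keep the source partition number when it fits; otherwise wrap around."""
    return partition % output_partitions


def bridge(envelope: Envelope,
           transcode: Callable[[bytes], bytes],
           output_partitions: Dict[str, int]) -> List[Envelope]:
    """Produce one output envelope per mapped target stream."""
    outputs = []
    for target in BRIDGE_MAPPING.get(envelope.stream, []):
        outputs.append(Envelope(
            stream=target,
            partition=repartition(envelope.partition, output_partitions[target]),
            payload=transcode(envelope.payload),
        ))
    return outputs
```

Keeping the three stages separate means a new encoding or a new routing rule touches only one function.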
Transcoders
Thrift-encoded bytes -> AvroEnvelope Transcoder -> AvroEnvelope with Thrift-encoded payload -> AvroSerde -> Avro-encoded bytes
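The point of the envelope is that the bridge never has to decode the original payload. A rough sketch of the idea follows; JSON plus base64 stands in for Avro so the example needs only the standard library, and the field names `encoding` and `payload` are assumptions rather than the real AvroEnvelope schema.

```python
import base64
import json


def wrap(payload: bytes, encoding: str) -> dict:
    """Transcoder step: put opaque bytes into an envelope without decoding them."""
    return {"encoding": encoding, "payload": payload}


def serialize(envelope: dict) -> bytes:
    """Serde step. JSON + base64 is a stand-in for Avro serialization."""
    body = {
        "encoding": envelope["encoding"],
        "payload": base64.b64encode(envelope["payload"]).decode("ascii"),
    }
    return json.dumps(body, sort_keys=True).encode("utf-8")


def deserialize(data: bytes) -> dict:
    """Inverse of serialize: recover the envelope and its untouched payload."""
    body = json.loads(data.decode("utf-8"))
    return {
        "encoding": body["encoding"],
        "payload": base64.b64decode(body["payload"]),
    }
```

A consumer on the other side reads the `encoding` field and only then decides how to decode the payload.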
Bridge Mappings
BridgeTask again (notice multiple outputs!)
[Diagram: the same BridgeTask components, with one input fanning out to multiple outputs]
The Kinesis System
KinesisConsumer
● wraps the Kinesis Client Library (KCL)
● extends Samza’s BlockingEnvelopeMap
● creates one KCL Worker per Shard
● at least one Worker per KinesisConsumer instance
● Workers push envelopes into queue
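The worker-to-queue hand-off described above can be sketched with threads and a bounded queue. `ShardWorker` and `poll` are hypothetical stand-ins; they do not use the real KCL or Samza APIs and only illustrate the blocking-queue pattern.

```python
import queue
import threading


class ShardWorker(threading.Thread):
    """Stand-in for a KCL Worker: reads one shard and pushes into a shared queue."""

    def __init__(self, shard_id, records, out_queue):
        super().__init__(daemon=True)
        self.shard_id = shard_id
        self.records = records
        self.out_queue = out_queue

    def run(self):
        for record in self.records:
            # put() blocks when the queue is full, which applies back-pressure.
            self.out_queue.put((self.shard_id, record))


def poll(out_queue, max_items, timeout=1.0):
    """Consumer side: drain up to max_items envelopes, waiting briefly for the first."""
    items = []
    try:
        items.append(out_queue.get(timeout=timeout))
        while len(items) < max_items:
            items.append(out_queue.get_nowait())
    except queue.Empty:
        pass
    return items
```

Ordering is preserved within a shard because each worker is the only writer for its shard; ordering across shards is not defined.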
KinesisProducer
● uses Kinesis PutRecords API (batch)
● enqueues envelopes (async)
● flushes off-thread
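A minimal sketch of the enqueue-then-flush-off-thread pattern follows. `BatchingProducer` and the injected `put_records` callable are hypothetical; the callable stands in for the real API call. The 500-record cap matches the PutRecords per-request limit. Retries and partial failures, which PutRecords can report per record, are left out.

```python
import queue
import threading

MAX_BATCH = 500  # PutRecords accepts at most 500 records per request


class BatchingProducer:
    def __init__(self, put_records):
        self._put_records = put_records   # callable taking a list of records
        self._pending = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, record):
        """Non-blocking enqueue; the flusher thread does the I/O."""
        self._pending.put(record)

    def flush(self):
        """Block until everything enqueued so far has been handed to put_records."""
        self._pending.join()

    def _run(self):
        while True:
            batch = [self._pending.get()]          # wait for the first record
            while len(batch) < MAX_BATCH:
                try:
                    batch.append(self._pending.get_nowait())
                except queue.Empty:
                    break
            self._put_records(batch)               # error handling omitted
            for _ in batch:
                self._pending.task_done()
```

The blocking `flush()` is what lets a consumer wait for the producer before it checkpoints.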
KinesisSystemAdmin
● queries Kinesis API for # shards at start-up
● tells Samza: # partitions == # shards
● # shards may change at any time, but OK in practice
● KCL will load-balance Workers automatically
The Checkpointing Problem
Checkpointing
● TWO sources of checkpoints: Samza and KCL
○ Samza checkpoints to a Kafka topic
○ KCL checkpoints to a DynamoDB table
○ similar semantics
● both systems must agree
● otherwise, possible DATA LOSS
Checkpointing Data Loss
1. KCL consumes a record
2. KCL checkpoints
3. Bridge replays the record to Kafka
4. container crashes before Kafka buffer flushed
5. container restarts
6. KCL restarts at checkpoint
--> buffered records lost
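The sequence above can be reproduced in a small simulation. This is an illustrative model only: `run_bridge`, the in-memory buffer and the single simulated crash are assumptions, not the bridge's code. It compares checkpointing on consume with checkpointing only after a flush.

```python
def run_bridge(records, crash_after, checkpoint_before_flush, buffer_size=3):
    """Replay records into a buffered sink, crash once, and return what was delivered.

    crash_after: index of the record after which the container crashes (once).
    """
    delivered = []    # records durably written to the destination
    checkpoint = 0    # next source offset to read after a restart
    crashed = False
    while True:
        buffer = []   # in-memory producer buffer, lost on crash
        offset = checkpoint
        restart = False
        while offset < len(records):
            buffer.append(records[offset])
            offset += 1
            if checkpoint_before_flush:
                checkpoint = offset           # checkpoint as soon as consumed
            if not crashed and offset - 1 == crash_after:
                crashed = True
                restart = True                # buffer contents are dropped
                break
            if len(buffer) >= buffer_size:
                delivered.extend(buffer)      # flush to the destination
                buffer = []
                checkpoint = offset           # safe: everything up to here is flushed
        if not restart:
            delivered.extend(buffer)          # final flush at end of input
            return delivered
```

With ten records and a crash after record 4, checkpointing on consume loses records 3 and 4, while checkpointing after flush re-reads them from offset 3 and delivers all ten. In general the second policy is at-least-once: a crash between a flush and its checkpoint would produce duplicates, which this model does not exercise.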
Checkpointing Solution
Checkpoint Kinesis Records only after they are flushed to Kafka.
Checkpoint Kafka Records only after they are flushed to Kinesis.
Producers must notify Consumers when it is safe to checkpoint.
Consumers must be able to request and wait for a Producer flush.
CheckpointableEnvelope
● KinesisConsumer registers onSynced listener
● BridgeTask registers listeners for each output stream
● KafkaProducer fires event after successful flush
● envelope is checkpointed only after ALL output streams have flushed
● each individual envelope is tracked this way, but...
● checkpoints only occur at sentinel envelopes at the end of each GetRecords batch
● (for non-sentinels, onSynced is a noop)
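The listener mechanics in the bullets above might look roughly like the following. The class name comes from the slide, but the constructor, `mark_flushed` and the callback signature are assumptions made for this sketch.

```python
class CheckpointableEnvelope:
    """Tracks which output streams have flushed this envelope."""

    def __init__(self, offset, outputs, on_synced=None):
        self.offset = offset
        self._waiting = set(outputs)   # output streams not yet flushed
        self._on_synced = on_synced    # None for non-sentinel envelopes (noop)

    def mark_flushed(self, output):
        """Called by a producer after a successful flush of `output`."""
        self._waiting.discard(output)
        if not self._waiting and self._on_synced is not None:
            # Fire at most once, and only after ALL outputs have flushed.
            callback, self._on_synced = self._on_synced, None
            callback(self.offset)
```

Only the sentinel envelope at the end of a batch would be given a real `on_synced` callback, so one checkpoint covers the whole batch.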
CheckpointableEnvelope
SyncedProducerProxy wraps KafkaProducer
Metrics
Two Stacks...
● Bizo infra is on AWS CloudWatch, including metrics, alarms, paging
● LinkedIn has inGraphs for the same purpose
Need to be able to monitor the bridge from both.
https://engineering.linkedin.com/32/eric-intern-origin-ingraphs
a custom MetricTracker
● publishes metrics to CW and inGraphs
● locally aggregates metrics to minimize API calls
● each metric has dimensions:
○ shard
○ partition
○ stream
○ system
● each metric re-published with hierarchy of dimensions
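One way to read "locally aggregates" and "re-published with hierarchy of dimensions" is sketched below. The hierarchy order (system, stream, partition, shard) and the publisher signature are assumptions; real CloudWatch or inGraphs clients would be plugged in as the publisher callables.

```python
from collections import defaultdict

HIERARCHY = ("system", "stream", "partition", "shard")  # assumed order, coarse to fine


class MetricTracker:
    def __init__(self, publishers):
        self._publishers = publishers      # callables taking (name, dims, value)
        self._sums = defaultdict(float)

    def record(self, name, value, **dims):
        """Add value at every level of the hierarchy, from application down."""
        self._sums[(name, ())] += value    # application-level: no dimensions
        prefix = []
        for key in HIERARCHY:
            if key not in dims:
                break
            prefix.append((key, dims[key]))
            self._sums[(name, tuple(prefix))] += value

    def flush(self):
        """Publish one aggregated data point per (metric, dimensions), then reset."""
        sums, self._sums = self._sums, defaultdict(float)
        for (name, dims), value in sums.items():
            for publish in self._publishers:
                publish(name, dict(dims), value)
```

Aggregating locally means each flush costs one call per series and endpoint, regardless of how many records were processed in between.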
shard-level metrics
partition-level metrics
stream-level metrics
system-level metrics
application-level metrics
heartbeat metrics
30 sec RTT (due to 1 minute buffer)
DynamoDB Stream to Kafka Bridge
Motivation
● Some of our services are running on AWS, e.g. video transcoding
● We want to replicate AWS data in LinkedIn data center
○ Serve requests from LinkedIn data center directly
○ Migrate off AWS easily
What is a DynamoDB Stream?
“A DynamoDB stream is an ordered flow of information about changes to items
in an Amazon DynamoDB table. When you enable a stream on a table,
DynamoDB captures information about every modification to data items in the
table.”
AWS documentation
Example DynamoDB Stream Record
{
  "EventID":"f561f0491ce42a95a60ad1fc082ae98b",
  "EventName":"MODIFY",
  "EventVersion":"1.0",
  "EventSource":"aws:dynamodb",
  "AwsRegion":"us-east-1",
  "Dynamodb":{
    "Keys":{
      "uuid":{
        "S":"255"
      }
    },
    "NewImage":{
      <json representation of new image>
    },
    "OldImage":{
      <json representation of old image>
    },
    "SequenceNumber":"593721700000000002066915768",
    "SizeBytes":326,
    "StreamViewType":"NEW_AND_OLD_IMAGES"
  }
}
DynamoDB Stream Bridge Design
DynamoDB Stream Record to Kafka Message
● Concatenate sorted DynamoDB keys as Kafka partition key
● Put the DynamoDB Stream record in Kafka message. e.g.
{'kafkaMessageSegmentHeader': None, 'payload': '{"EventID":"f5b5e336f056f2656b23bfeed3cd45c8","EventName":"MODIFY","EventVersion":"1.0","EventSource":"aws:dynamodb","AwsRegion":"us-east-1","Dynamodb":{"Keys":{"RecordId":{"N":"0"}},"NewImage":{"ReadableTime":{"S":"Wed Feb 17 01:13:02 UTC 2016"},"RecordId":{"N":"0"},"Timestamp":{"N":"1455671582888"}},"OldImage":{"ReadableTime":{"S":"Wed Feb 17 01:13:02 UTC 2016"},"RecordId":{"N":"0"},"Timestamp":{"N":"1455671582394"}},"SequenceNumber":"29102800000000002523034570","SizeBytes":141,"StreamViewType":"NEW_AND_OLD_IMAGES"}}'}
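Deriving the partition key from the record might look like the sketch below. It assumes "sorted keys" means sorting by key attribute name and joining the attribute values; the `|` separator and both function names are hypothetical.

```python
import json


def partition_key(record: dict, separator: str = "|") -> str:
    """Join the table's key attribute values, ordered by attribute name."""
    keys = record["Dynamodb"]["Keys"]
    parts = []
    for name in sorted(keys):
        typed_value = keys[name]                  # e.g. {"S": "255"} or {"N": "0"}
        value = next(iter(typed_value.values()))  # drop the type tag
        parts.append(str(value))
    return separator.join(parts)


def to_kafka_message(record: dict):
    """Return (partition key, payload) with the stream record as a JSON string."""
    return partition_key(record), json.dumps(record, separators=(",", ":"))
```

Because the key depends only on the item's primary key, every change to the same item lands in the same Kafka partition and keeps its order.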
Use Cases
● Replicate rich media platform video transcoding metadata to LinkedIn data
center (DynamoDB Stream -> Kafka -> Espresso)
Kinesis vs DynamoDB Stream
Validation
validation pipeline
Summary
With the ability to ship data from AWS Stream to LinkedIn Kafka and vice versa
using Samza, we can now seamlessly integrate AWS with LinkedIn.
Q & A
