SlideShare ist ein Scribd-Unternehmen logo
1 von 36
Downloaden Sie, um offline zu lesen
Akka Streams,
Kafka, Kinesis
Peter Vandenabeele
Mechelen, June 25, 2015
#StreamProcessingBe
whoami : Peter Vandenabeele
@peter_v
@All_Things_Data (my consultancy)
current client:
Real Impact Analytics @RIAnalytics
Telecom Analytics (emerging markets)
Agenda
5’ Intro (Peter)
40’ Akka Streams, Kafka, Kinesis (Peter)
45’ Spark Streaming and Kafka Demo (Gerard)
15’ Open discussion (all)
30’ beers (doors close at 21:30)
Many thanks to
@All_Things_Data (beer)
@maasg (Gerard Maas)
you !
=> Note: always looking for locations
Agenda
Why ?
Akka + Akka Streams
Demo
Kafka + Kinesis
Demo
Why !
Why ?
distributed state
distributed failure
slow consumers
Akka
Why (for iHeartRadio )
source: tech.iheart.com/post/121599571574/why-we-picked-akka-cluster-as-our-microservice
Akka design
Building
● concurrent <= many (slow) CPU’s
● distributed <= distributed state
● resilient <= distributed failure
● applications <= platform
● on JVM <= Erlang OTP
Akka arch.
state
actors
supervision
distributed !
Akka actor
msg
actor
def receive = {
case CreateUser =>
case UpdateUser =>
case DelUser =>
}
persistence
msg
http
external
● msgs are sent
● recvd in order
● single thread
● stateful !
● errors go “up”
1234
supervisor
Akka usage + courses
● concurrent programming not easy …
● but without Akka … would be much harder
● Spark (see log extract next slide)
● Flink (version 0.9 of 24 June)
● local projects (e.g.”Wegen en verkeer”)
● BeScala Meetup now runs Akka intro course
● commercial courses (Cronos, Scala World...)
Spark heavily based on Akka
log extract from Spark:
java -cp ...
"org.apache.spark.executor.CoarseGrainedExecutorBackend"
"--driver-url" "akka.tcp://sparkDriver@docker-master:
51810/user/CoarseGrainedScheduler"
"--executor-id" "23" "--hostname" "docker-slave2" "--cores" "8"
"--worker-url" "akka.tcp://sparkWorker@docker-slave2:
50268/user/Worker"
Akka Streams
Reactive Streams
● http://reactive-streams.org
● exchange of
stream data across
asynchronous boundary in
bounded fashion
● building and industry standard (open IP)
Demand based
demand
data
“give me max 20”
“sending 2, 5, 10, ...”
“give me max 10 more”
producer consumer
How does it work?
source: http://www.slideshare.net/rolandkuhn/reactive-streams Roland Kuhn (TypeSafe) @rolandkuhn
Don’t try this at home
Akka Streams
● Source ~> Flow ~> Flow ~> Sink
● MaterializedFlow
source: http://www.slideshare.net/rolandkuhn/reactive-streams Roland Kuhn (TypeSafe) @rolandkuhn
Akka Streams : advantages
● Types (stream of T)
● makes it trivially simple :-)
● Many examples online (fast and simple)
● demo of simplistic case
simplistic Akka Streams demo
Kafka
Kafka (LinkedIn) : Martin Kleppmann
source : Martin Kleppmann
at strata Hadoop London
Kafka log based
new
1 week
del
real-time
Kafka consumers
batch
replay123
124
129
128
127
126
125
producers
ad-hoc
42
43
48
47
44
45
46
partitions
Kafka partitions
source: http://kafka.apache.org/documentation.html
Kafka (LinkedIn) : Jay Kreps
source: Jay Kreps
on slideshare
“I ♥ Log”
Real-time Data and Apache Kafka
Kinesis
Kinesis : Kafka as a Service
source: http://aws.amazon.com/kinesis/details/
Kinesis design
● Fully (auto-)managed
● Strong durability guarantees
● Stream (= topic)
● Shard (= partition)
● “fast” writers (but … round-trip 20 ms ?)
● “slow” readers (max 5/s per shard ??)
● Kinesis Client Library (java)
Kinesis limitations ...
● writing latency (20 ms per entry - replicated)
● 24 hours data retention
● 5 reads per second
https://brandur.org/kinesis-in-production
● “vanishing history” after shard split
● “if I’d understood the consequences ... earlier, I
probably would have pushed harder for Kafka”
simplistic Kinesis demo
Kinesis consumer with Amazom DynamoDB :: reused from http://docs.aws.amazon.
com/kinesis/latest/dev/kinesis-sample-application.html
Why !
(a personal view)
Note: “thanks for the feedback on this section.
Indeed Kafka and Akka serve very different purposes,
but they both offer solutions for distributed state,
distributed failure and slow consumers”
Problem 1: Distributed state
Akka
=> state encapsulated in Actors
=> exchange self-contained messages
Kafka
=> immutable, ordered update queue (Kappa)
Problem 2: Distributed failure
Akka
=> explicit failure management (supervisor)
Kafka
=> partitions are replicated over brokers
=> consumers can replay from log
Problem 3: Slow consumers
Akka Streams
=> automatic back-pressure (avoid overflow)
Kafka
=> consumers fully decoupled
=> keeps data for 1 week ! (Kinesis: 1 day)
Avoid overflow in Akka: “tight seals”
source : https://www.youtube.com/watch?v=o5PUDI4qi10 by @sirthias
Avoid overflow in Kafka: “big lake”

Weitere ähnliche Inhalte

Was ist angesagt?

Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Michael Noll
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsKevin Webber
 
Akka Revealed: A JVM Architect's Journey From Resilient Actors To Scalable Cl...
Akka Revealed: A JVM Architect's Journey From Resilient Actors To Scalable Cl...Akka Revealed: A JVM Architect's Journey From Resilient Actors To Scalable Cl...
Akka Revealed: A JVM Architect's Journey From Resilient Actors To Scalable Cl...Lightbend
 
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017Michael Noll
 
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OSPutting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OSLightbend
 
Reactive Streams 1.0 and Akka Streams
Reactive Streams 1.0 and Akka StreamsReactive Streams 1.0 and Akka Streams
Reactive Streams 1.0 and Akka StreamsDean Wampler
 
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesUnderstanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesLightbend
 
Moving from Big Data to Fast Data? Here's How To Pick The Right Streaming Engine
Moving from Big Data to Fast Data? Here's How To Pick The Right Streaming EngineMoving from Big Data to Fast Data? Here's How To Pick The Right Streaming Engine
Moving from Big Data to Fast Data? Here's How To Pick The Right Streaming EngineLightbend
 
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...confluent
 
Building Stateful Microservices With Akka
Building Stateful Microservices With AkkaBuilding Stateful Microservices With Akka
Building Stateful Microservices With AkkaYaroslav Tkachenko
 
Apache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKApache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKMaxim Shelest
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry confluent
 
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...confluent
 
Akka Streams and HTTP
Akka Streams and HTTPAkka Streams and HTTP
Akka Streams and HTTPRoland Kuhn
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lightbend
 
How to manage large amounts of data with akka streams
How to manage large amounts of data with akka streamsHow to manage large amounts of data with akka streams
How to manage large amounts of data with akka streamsIgor Mielientiev
 
Making Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development TeamsMaking Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development TeamsLightbend
 
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Michael Noll
 
Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ti...
Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ti...Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ti...
Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ti...confluent
 

Was ist angesagt? (20)

Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka Streams
 
Akka Revealed: A JVM Architect's Journey From Resilient Actors To Scalable Cl...
Akka Revealed: A JVM Architect's Journey From Resilient Actors To Scalable Cl...Akka Revealed: A JVM Architect's Journey From Resilient Actors To Scalable Cl...
Akka Revealed: A JVM Architect's Journey From Resilient Actors To Scalable Cl...
 
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
 
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OSPutting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
 
Reactive Streams 1.0 and Akka Streams
Reactive Streams 1.0 and Akka StreamsReactive Streams 1.0 and Akka Streams
Reactive Streams 1.0 and Akka Streams
 
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous ArchitecturesUnderstanding Akka Streams, Back Pressure, and Asynchronous Architectures
Understanding Akka Streams, Back Pressure, and Asynchronous Architectures
 
Moving from Big Data to Fast Data? Here's How To Pick The Right Streaming Engine
Moving from Big Data to Fast Data? Here's How To Pick The Right Streaming EngineMoving from Big Data to Fast Data? Here's How To Pick The Right Streaming Engine
Moving from Big Data to Fast Data? Here's How To Pick The Right Streaming Engine
 
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
Kafka Summit NYC 2017 - Building Advanced Streaming Applications using the La...
 
Building Stateful Microservices With Akka
Building Stateful Microservices With AkkaBuilding Stateful Microservices With Akka
Building Stateful Microservices With Akka
 
Apache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKApache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACK
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
 
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
 
Akka Streams and HTTP
Akka Streams and HTTPAkka Streams and HTTP
Akka Streams and HTTP
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
 
How to manage large amounts of data with akka streams
How to manage large amounts of data with akka streamsHow to manage large amounts of data with akka streams
How to manage large amounts of data with akka streams
 
Making Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development TeamsMaking Scala Faster: 3 Expert Tips For Busy Development Teams
Making Scala Faster: 3 Expert Tips For Busy Development Teams
 
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
 
Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ti...
Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ti...Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ti...
Leveraging services in stream processor apps at Ticketmaster (Derek Cline, Ti...
 
Chti jug - 2018-06-26
Chti jug - 2018-06-26Chti jug - 2018-06-26
Chti jug - 2018-06-26
 

Ähnlich wie Stream Processing with Akka, Kafka and Kinesis

Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterPaolo Castagna
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKai Wähner
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017Monal Daxini
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?confluent
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignMichael Noll
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Building Stream Processing as a Service
Building Stream Processing as a ServiceBuilding Stream Processing as a Service
Building Stream Processing as a ServiceSteven Wu
 
Netflix Keystone SPaaS: Real-time Stream Processing as a Service - ABD320 - r...
Netflix Keystone SPaaS: Real-time Stream Processing as a Service - ABD320 - r...Netflix Keystone SPaaS: Real-time Stream Processing as a Service - ABD320 - r...
Netflix Keystone SPaaS: Real-time Stream Processing as a Service - ABD320 - r...Amazon Web Services
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterconfluent
 
Feeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaFeeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaDataStax Academy
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comCedric Vidal
 
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...Kai Wähner
 
Westpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache KafkaWestpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache Kafkaconfluent
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams APIconfluent
 
Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!Paolo Castagna
 
KSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache KafkaKSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache KafkaMatthias J. Sax
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshopconfluent
 

Ähnlich wie Stream Processing with Akka, Kafka and Kinesis (20)

Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Building Stream Processing as a Service
Building Stream Processing as a ServiceBuilding Stream Processing as a Service
Building Stream Processing as a Service
 
Netflix Keystone SPaaS: Real-time Stream Processing as a Service - ABD320 - r...
Netflix Keystone SPaaS: Real-time Stream Processing as a Service - ABD320 - r...Netflix Keystone SPaaS: Real-time Stream Processing as a Service - ABD320 - r...
Netflix Keystone SPaaS: Real-time Stream Processing as a Service - ABD320 - r...
 
Jug - ecosystem
Jug -  ecosystemJug -  ecosystem
Jug - ecosystem
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matter
 
Feeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaFeeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and Kafka
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
 
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
 
Westpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache KafkaWestpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache Kafka
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
 
Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!
 
Flink in action
Flink in actionFlink in action
Flink in action
 
KSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache KafkaKSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache Kafka
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshop
 

Kürzlich hochgeladen

Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Milind Agarwal
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 

Kürzlich hochgeladen (20)

Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 

Stream Processing with Akka, Kafka and Kinesis

  • 1. Akka Streams, Kafka, Kinesis Peter Vandenabeele Mechelen, June 25, 2015 #StreamProcessingBe
  • 2. whoami : Peter Vandenabeele @peter_v @All_Things_Data (my consultancy) current client: Real Impact Analytics @RIAnalytics Telecom Analytics (emerging markets)
  • 3. Agenda 5’ Intro (Peter) 40’ Akka Streams, Kafka, Kinesis (Peter) 45’ Spark Streaming and Kafka Demo (Gerard) 15’ Open discussion (all) 30’ beers (doors close at 21:30)
  • 4. Many thanks to @All_Things_Data (beer) @maasg (Gerard Maas) you ! => Note: always looking for locations
  • 5. Agenda Why ? Akka + Akka Streams Demo Kafka + Kinesis Demo Why !
  • 6. Why ? distributed state distributed failure slow consumers
  • 8. Why (for iHeartRadio ) source: tech.iheart.com/post/121599571574/why-we-picked-akka-cluster-as-our-microservice
  • 9. Akka design Building ● concurrent <= many (slow) CPU’s ● distributed <= distributed state ● resilient <= distributed failure ● applications <= platform ● on JVM <= Erlang OTP
  • 11. Akka actor msg actor def receive = { case CreateUser => case UpdateUser => case DelUser => } persistence msg http external ● msgs are sent ● recvd in order ● single thread ● stateful ! ● errors go “up” 1234 supervisor
  • 12. Akka usage + courses ● concurrent programming not easy … ● but without Akka … would be much harder ● Spark (see log extract next slide) ● Flink (version 0.9 of 24 June) ● local projects (e.g.”Wegen en verkeer”) ● BeScala Meetup now runs Akka intro course ● commercial courses (Cronos, Scala World...)
  • 13. Spark heavily based on Akka log extract from Spark: java -cp ... "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@docker-master: 51810/user/CoarseGrainedScheduler" "--executor-id" "23" "--hostname" "docker-slave2" "--cores" "8" "--worker-url" "akka.tcp://sparkWorker@docker-slave2: 50268/user/Worker"
  • 15. Reactive Streams ● http://reactive-streams.org ● exchange of stream data across asynchronous boundary in bounded fashion ● building and industry standard (open IP)
  • 16. Demand based demand data “give me max 20” “sending 2, 5, 10, ...” “give me max 10 more” producer consumer
  • 17. How does it work? source: http://www.slideshare.net/rolandkuhn/reactive-streams Roland Kuhn (TypeSafe) @rolandkuhn Don’t try this at home
  • 18. Akka Streams ● Source ~> Flow ~> Flow ~> Sink ● MaterializedFlow source: http://www.slideshare.net/rolandkuhn/reactive-streams Roland Kuhn (TypeSafe) @rolandkuhn
  • 19. Akka Streams : advantages ● Types (stream of T) ● makes it trivially simple :-) ● Many examples online (fast and simple) ● demo of simplistic case
  • 21. Kafka
  • 22. Kafka (LinkedIn) : Martin Kleppmann source : Martin Kleppmann at strata Hadoop London
  • 23. Kafka log based new 1 week del real-time Kafka consumers batch replay123 124 129 128 127 126 125 producers ad-hoc 42 43 48 47 44 45 46 partitions
  • 25. Kafka (LinkedIn) : Jay Kreps source: Jay Kreps on slideshare “I ♥ Log” Real-time Data and Apache Kafka
  • 27. Kinesis : Kafka as a Service source: http://aws.amazon.com/kinesis/details/
  • 28. Kinesis design ● Fully (auto-)managed ● Strong durability guarantees ● Stream (= topic) ● Shard (= partition) ● “fast” writers (but … round-trip 20 ms ?) ● “slow” readers (max 5/s per shard ??) ● Kinesis Client Library (java)
  • 29. Kinesis limitations ... ● writing latency (20 ms per entry - replicated) ● 24 hours data retention ● 5 reads per second https://brandur.org/kinesis-in-production ● “vanishing history” after shard split ● “if I’d understood the consequences ... earlier, I probably would have pushed harder for Kafka”
  • 30. simplistic Kinesis demo Kinesis consumer with Amazom DynamoDB :: reused from http://docs.aws.amazon. com/kinesis/latest/dev/kinesis-sample-application.html
  • 31. Why ! (a personal view) Note: “thanks for the feedback on this section. Indeed Kafka and Akka serve very different purposes, but they both offer solutions for distributed state, distributed failure and slow consumers”
  • 32. Problem 1: Distributed state Akka => state encapsulated in Actors => exchange self-contained messages Kafka => immutable, ordered update queue (Kappa)
  • 33. Problem 2: Distributed failure Akka => explicit failure management (supervisor) Kafka => partitions are replicated over brokers => consumers can replay from log
  • 34. Problem 3: Slow consumers Akka Streams => automatic back-pressure (avoid overflow) Kafka => consumers fully decoupled => keeps data for 1 week ! (Kinesis: 1 day)
  • 35. Avoid overflow in Akka: “tight seals” source : https://www.youtube.com/watch?v=o5PUDI4qi10 by @sirthias
  • 36. Avoid overflow in Kafka: “big lake”