SlideShare ist ein Scribd-Unternehmen logo
1 von 84
Introducing Apache Kafka®
Paul Brebner
Technical Evangelist February 2019
What is Kafka?
Message flow
Distributed streams
Processing System
Messages sent by
Distributed Producers
to
Distributed Consumers
Consumers
via
Distributed Kafka Cluster
Cluster
Kafka Benefits
 Fast
 Scalable
 Reliable
 Durable
 Open Source
 Managed Service
Kafka benefits
Kafka Benefits
 Fast – high throughput and low latency
 Scalable – horizontally scalable with nodes and partitions
 Reliable – distributed and fault tolerant
 Durable - zero data loss, messages persisted to disk with immutable log
 Open Source – An Apache project
 Available as a Managed Service - on multiple cloud platforms
Kafka benefits
Message flow
Message flow
To send messages from A to B
“A” is the Producer – sends a message, to
“B” the Consumer (recipient) of the message
Due to decline in “snail mail” direct deliveries
Due to decline in “snail mail” direct deliveries
Instead … “Poste Restante”
“Poste Restante”?
• Not a post office in a
restaurant
• General delivery (in the
US)
• The mail is delivered to
a post office, they hold
it for you until you call
Consumers “poll” for messages by visiting the
Poste Restante counter at the post office
Kafka topics act like a Post Office
Benefits include
• Disconnected delivery –
consumer doesn’t need to
be available to receive
messages
• Less effort for the
messaging service – only
has to deliver to a few
locations not every
consumer
• Can scale better and
handle more complex
delivery semantics!
Scalability? Many consumers for a topic?
A single counter introduces delays
More counters increases concurrency
Kafka topics have >= 1 Partitions (“counters”)
• Partitions increase
consumer concurrency
• Increase throughput
• Reduce latency
Santa
North Pole
What’s a Kafka message?
A Record – like a letter
Santa
North PoleTopic
Topic is the destination
Santa
North PoleTopic
Timestamp, offset, partition
The “Postmark”
Santa
North PoleTopic
Timestamp, offset, partition
The “Postmark”
Time semantics are flexible
time of event creation,
ingestion,
or processing.
Santa
North PoleTopic
Timestamp, offset, partition
Key -> Partition (optional)
Key (optional)
Santa
North PoleTopic
Timestamp, offset, partition
Key -> Partition (optional)
Key (optional)
Refines the destination
Send to Santa not just any Elf
Santa
North PoleTopic
Timestamp, offset, partition
Key -> Partition (optional)
Value is contents (byte array)
Santa
North PoleTopic
Timestamp, offset, partition
Key -> Partition (optional)
Kafka Producers and Consumers
need a serializer and de-serializer
to write & read key and value
• Kafka doesn’t look
at the value
• Consumer can
read value
• And try to make
sense of the
message
• What will Santa
be delivering?!
Next…
Delivery Semantics
Do we care
if the message arrives?
Yes! Guaranteed delivery is desirable
But homing pigeons got lost or eaten
Send multiple pigeons
One pigeon may make it
Eaten
Lost
Producer
Broker
1
Broker
2
Broker
3
M1
How does Kafka guarantee delivery?
Producer
Broker
1
Broker
2
Broker
3
M1
A Message (M1) is written to a broker (2)
Producer
Broker
1
Broker
2
Broker
3
M1
M1
The message is always persisted to disk.
Producer
Broker
1
Broker
2
Broker
3
M1
M1
This makes it resilient to loss of power.
Producer
Broker
1
Broker
2
Broker
3
M1 M1
M1 M1M1
The message is also replicated on multiple “brokers”
Producer
Broker
1
Broker
2
Broker
3
M1
M1 M1M1
Which makes it resilient to loss of most servers
Producer
Broker
1
Broker
2
Broker
3
Acknowledgement
M1 M1 M1
Producer gets acknowledgement
once the message is persisted and replicated
(configurable)
Consumer
Broker
1
Broker
2
Broker
3
M1 M1 M1
Multiple Brokers and Partitions 
increased read availability and concurrency
Producer
Consumer
Consumer
Consumer
Consumer
The 2nd aspect of delivery semantics:
Who gets the messages?
How many times are messages delivered?
Producer
Consumer
Consumer
Consumer
Consumer
Delivery Semantics - Kafka is “pub-sub”
- Loosely coupled
- Producers and consumers don’t know about each other
Producer
Topic “Parties”
Consumer
Consumer
Consumer
Consumer
Which consumers get which messages
(filtering), is topic based
Topic “Work”
Producer
Topic “Parties”
Consumer
Consumer
Consumer
Consumer
Topic “Work”
Consumers Subscribe to topic “Parties”
Subscribe
Producer
Topic “Parties”
Consumer
Consumer
Consumer
Consumers
Subscribed to Topic “Parties”
Consumer
Publishers send messages to topics
Send
Topic “Work”Send
Producer
Topic “Parties”
Consumer
Consumer
Consumer
Consumers
Subscribed to Topic “Parties”
Consumer
Consumers only receive messages from
subscribed topics
Consumers Poll
To receive messages
from ”Parties”
Topic “Work” Consumers not subscribed to “Work”
Don’t receive any “Work” messages
Partitions and Consumer Groups
Enable sharing of work across
consumers
Duplicate message delivery
Each message is
delivered to each
subscribed
consumer group
Producer
Partition 1
Partition 2
Partition n
Topic “Parties”
C1
Consumer Group
Consumer Group
Consumer
Consumer
Consumer
Consumer
Consumers subscribed to topic are allocated partitions
They will only get messages from their allocated partitions.
Producer
Partition 1
Partition 2
Partition n
Topic “Parties”
C1
Consumer Group
Consumer
Consumer
Consumer
Consumers share work
within groups
Consumers in the same group share the work around
Each consumer gets only a subset of messages
Producer
Partition 1
Partition 2
Partition n
Topic “Parties”
C1
Consumer Group
Consumer Group
Consumer
Consumer
Consumer
Consumer
Messages are duplicated across
Consumer groups
Multiple groups enable message broadcasting
Messages are duplicated across groups, each consumer
group receives a copy of each message.
Key
Partition based delivery
Which messages are delivered to
which consumers?
If a message has a key, then Kafka
uses Partition based delivery.
Messages with the same key are
always sent to the same partition
and therefore the same
consumer.
And the order is guaranteed.
No Key
Round robin delivery
If the key is null, then
Kafka uses round robin
delivery
Each message is delivered
to the next partition
Time for an Example, with 2 consumer groups.
Consumer Group = Nerds
Multiple consumers
Consumer Group = Nerds
Multiple consumers
Consumer Group = Hairy
Single consumer
Producer
Partition 1
Partition 2
Partition n
Topic “Parties”
C1
Group “Nerds”
Group “Hairy”
Consumer 1 (Bill)
Consumer 2 (Paul)
Consumer n
Consumer 1 (Chewy)
Consumers
Subscribed to “Parties”
No Key
Round Robin
M1
M2
etc
M1
M1
M1
M2
M2
M2
Case 1: No Key
Message (M1, M2, etc) sent to the next partition
All consumers allocated to that partition will receive a message when they poll next.
Here’s what happens (not showing producer or topics, have to imagine them)
1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition).
Subscribe to “Parties” Subscribe to “Parties”
1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition).
2. Producer sends record “Cool pool party – Invitation”
<key=null, value=“Cool pool party - Invitation”> to “parties” topic (no key)
1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition).
2. Producer sends record “Cool pool party - Invitation”> to “parties” topic
3. Bill and Chewbacca receive a copy of the invitation and plan to attend
4. Producer sends another record “Cool pool party – Cancelled”
<key=null, value=“Cool pool party - Cancelled”> to “parties” topic
4. Producer sends another record <key=null, value=“Cool pool party - Cancelled”> to “parties” topic
5. Paul and Chewbacca receive the cancellation.
Paul gets the message this time as it’s round robin, ignores it as he didn’t get the invitation. Bill wastes his
time trying to go to cancelled party. The rest of the gang aren’t surprised at not receiving any party invites and
stay at home to do some hacking. Chewy is only consumer in his group so gets all messages, plans something
fun instead…
Producer
Partition 1
Partition 2
Partition n
Topic “Parties”
C1
Group “Nerds”
Group “Hairy”
Consumer 1 (Bill)
Consumer 2 (Paul)
Consumer n
Consumer 1 (Chewy)
Consumers
Subscribed to “Parties”
Key
Hashed to partition
M1, M2
etc
M1, M2
M1, M2
M1, M2
M3
M3
M3
M3
Case 2: If there is a Key
A key is hashed to a partition, and a Message with that key is always sent to that partition.
Assume there are 3 messages, and messages 1 and 2 are hashed to same partition.
Here’s what happens with a key: key is “title” of the message (e.g. “Cool pool party”)
Same set up as before:
1. Both Groups subscribe to Topic “parties” (11 partitions).
1. Both Groups subscribe to Topic “parties” (11 partitions).
2. Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic
1. Both Groups subscribe to Topic “parties” (11 partitions).
2. Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic
3. As before Bill and Chewbacca receive a copy of the invitation and plan to attend
4. Producer sends another record <key=“Cool pool party”, value=“Cancelled”> to “parties” topic
4. Producer sends another record <key=“Cool pool party”, value=“Cancelled”> to “parties” topic
5. Bill and Chewbacca receive the cancellation (same consumers this time, as identical key)
6. Producer sends another record <key=“Horrible Halloween party”, value=“Invitation”> to ”parties” topic
6. Producer sends another record <key=“Horrible Halloween party”, value=“Invitation”> to ”parties” topic
7. Paul and Chewy receive the invitation
Paul receives the Halloween invitation as the key is different and the record is sent to the partition that Paul is
allocated to
Chewy is the only consumer in his group so he gets every record no matter what partition it’s sent to
Example Kafka Use Cases
Real-time data pipeline
Read-time data pipeline features:
• Ingestion of multiple heterogeneous sources
• Sending data to multiple heterogeneous sinks
• Acts as a buffer to smooth out load spikes
• Enables use cases which reprocess data (e.g. disaster recovery)
Anomaly Detection Pipeline
Real-time Event processing pipeline:
• Simple event driven applications (If X then Y…)
• May write and read from other data sources (e.g. Cassandra)
• New Events sent back to Kafka or to other systems
• E.g. Anomaly Detection, check out my current blog series if you are interested in this example.
Kafka Streams Processing (Kongo IoT Blog series)
Streams processing features:
• Complex streams processing (multiple events and streams)
• Time, windows, and transformations
• Uses Kafka Streams API, includes state store
• Visualization of the streams topology
• Continuously computes the loads for trucks and checks if they are overloaded.
Linkedin - Before Kafka (BK)
A real example from Linkedin, who developed Kafka.
Before Kafka they had spaghetti integration of monolithic applications.
To accommodate growing membership and increasing site complexity, they migrated from a monolithic
application infrastructure to one based on microservices, which made the integration even more complex!
After Kafka (AK)
Rather than maintaining and scaling each pipeline individually, they invested in the
development of a single, distributed pub-sub platform - Kafka was born.
The main benefit was better Service decoupling and independent scaling.
The End (of the introduction) -
Find out more
Apache Kafka: https://kafka.apache.org/
Instaclustr blogs
• Mix of Cassandra, Spark, Zeppelin and Kafka
https://www.instaclustr.com/paul-brebner/
• Kafka introduction
https://insidebigdata.com/2018/04/12/developing-deeper-understanding-apache-kafka-architecture/
https://insidebigdata.com/2018/04/19/developing-deeper-understanding-apache-kafka-architecture-part-2-
• Kongo – Kafka IoT logistics application blog series
https://www.instaclustr.com/instaclustr-kongo-iot-logistics-streaming-demo-application/
• Anomaly detection with Kafka and Cassandra (and Kubernetes), current blog series
https://www.instaclustr.com/anomalia-machina-1-massively-scalable-anomaly-detection-with-apache-kafka-
Instaclustr’s Managed Kafka (Free trial)
https://www.instaclustr.com/solutions/managed-apache-kafka/

Weitere ähnliche Inhalte

Was ist angesagt?

Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewDmitry Tolpeko
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafkaemreakis
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developersconfluent
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaShiao-An Yuan
 
Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformJean-Paul Azar
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 

Was ist angesagt? (20)

Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Kafka basics
Kafka basicsKafka basics
Kafka basics
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System Overview
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
kafka
kafkakafka
kafka
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platform
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 

Ähnlich wie A visual introduction to Apache Kafka

Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupSnehal Nagmote
 
Stream Processing Metamorphosis - A Kafka's tale
Stream Processing Metamorphosis - A Kafka's taleStream Processing Metamorphosis - A Kafka's tale
Stream Processing Metamorphosis - A Kafka's taleJoão Vazão Vasques
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonLivePerson
 
From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandRan Silberman
 
Exactly Once Delivery is a Harsh Mistress - Natan Silnitsky
Exactly Once Delivery is a Harsh Mistress  - Natan SilnitskyExactly Once Delivery is a Harsh Mistress  - Natan Silnitsky
Exactly Once Delivery is a Harsh Mistress - Natan SilnitskyDevOpsDays Tel Aviv
 
Exactly once delivery is a harsh mistress - DevOps Days TLV
Exactly once delivery is a harsh mistress - DevOps Days TLVExactly once delivery is a harsh mistress - DevOps Days TLV
Exactly once delivery is a harsh mistress - DevOps Days TLVNatan Silnitsky
 
Exactly Once Delivery with Kafka - JOTB2020 Mini Session
Exactly Once Delivery with Kafka - JOTB2020 Mini SessionExactly Once Delivery with Kafka - JOTB2020 Mini Session
Exactly Once Delivery with Kafka - JOTB2020 Mini SessionNatan Silnitsky
 
TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)Erhwen Kuo
 

Ähnlich wie A visual introduction to Apache Kafka (14)

Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Kafka101
Kafka101Kafka101
Kafka101
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
 
Stream Processing Metamorphosis - A Kafka's tale
Stream Processing Metamorphosis - A Kafka's taleStream Processing Metamorphosis - A Kafka's tale
Stream Processing Metamorphosis - A Kafka's tale
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
 
Animated: Components of Kafka
Animated: Components of KafkaAnimated: Components of Kafka
Animated: Components of Kafka
 
04-Kafka.pptx
04-Kafka.pptx04-Kafka.pptx
04-Kafka.pptx
 
04-Kafka.pptx
04-Kafka.pptx04-Kafka.pptx
04-Kafka.pptx
 
From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised Land
 
Exactly Once Delivery is a Harsh Mistress - Natan Silnitsky
Exactly Once Delivery is a Harsh Mistress  - Natan SilnitskyExactly Once Delivery is a Harsh Mistress  - Natan Silnitsky
Exactly Once Delivery is a Harsh Mistress - Natan Silnitsky
 
Exactly once delivery is a harsh mistress - DevOps Days TLV
Exactly once delivery is a harsh mistress - DevOps Days TLVExactly once delivery is a harsh mistress - DevOps Days TLV
Exactly once delivery is a harsh mistress - DevOps Days TLV
 
Exactly Once Delivery with Kafka - JOTB2020 Mini Session
Exactly Once Delivery with Kafka - JOTB2020 Mini SessionExactly Once Delivery with Kafka - JOTB2020 Mini Session
Exactly Once Delivery with Kafka - JOTB2020 Mini Session
 
Kafka.pptx
Kafka.pptxKafka.pptx
Kafka.pptx
 
TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)TDEA 2018 Kafka EOS (Exactly-once)
TDEA 2018 Kafka EOS (Exactly-once)
 

Mehr von Paul Brebner

The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...Paul Brebner
 
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining PhilosophersApache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining PhilosophersPaul Brebner
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaSpinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaPaul Brebner
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Paul Brebner
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardPaul Brebner
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardPaul Brebner
 
A Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaA Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaPaul Brebner
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Paul Brebner
 
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Paul Brebner
 
Grid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialGrid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialPaul Brebner
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Paul Brebner
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Paul Brebner
 
0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980'sPaul Brebner
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...Paul Brebner
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...Paul Brebner
 
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...Paul Brebner
 
How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...Paul Brebner
 

Mehr von Paul Brebner (20)

The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
 
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining PhilosophersApache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaSpinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache Kafka
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
A Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaA Visual Introduction to Apache Kafka
A Visual Introduction to Apache Kafka
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
 
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
 
Grid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialGrid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and Potential
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
 
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
 
How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...
 

Kürzlich hochgeladen

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Kürzlich hochgeladen (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

A visual introduction to Apache Kafka

  • 1. Introducing Apache Kafka® Paul Brebner Technical Evangelist February 2019
  • 2. What is Kafka? Message flow Distributed streams Processing System Messages sent by Distributed Producers to Distributed Consumers Consumers via Distributed Kafka Cluster Cluster
  • 3. Kafka Benefits  Fast  Scalable  Reliable  Durable  Open Source  Managed Service Kafka benefits
  • 4. Kafka Benefits  Fast – high throughput and low latency  Scalable – horizontally scalable with nodes and partitions  Reliable – distributed and fault tolerant  Durable - zero data loss, messages persisted to disk with immutable log  Open Source – An Apache project  Available as a Managed Service - on multiple cloud platforms Kafka benefits
  • 7.
  • 8. To send messages from A to B
  • 9. “A” is the Producer – sends a message, to
  • 10. “B” the Consumer (recipient) of the message
  • 11. Due to decline in “snail mail” direct deliveries
  • 12. Due to decline in “snail mail” direct deliveries
  • 13. Instead … “Poste Restante”
  • 14. “Poste Restante”? • Not a post office in a restaurant • General delivery (in the US) • The mail is delivered to a post office, they hold it for you until you call
  • 15. Consumers “poll” for messages by visiting the Poste Restante counter at the post office
  • 16. Kafka topics act like a Post Office
  • 17. Benefits include • Disconnected delivery – consumer doesn’t need to be available to receive messages • Less effort for the messaging service – only has to deliver to a few locations not every consumer • Can scale better and handle more complex delivery semantics!
  • 19. A single counter introduces delays
  • 20. More counters increases concurrency
  • 21. Kafka topics have >= 1 Partitions (“counters”) • Partitions increase consumer concurrency • Increase throughput • Reduce latency
  • 22. Santa North Pole What’s a Kafka message? A Record – like a letter
  • 24. Santa North PoleTopic Timestamp, offset, partition The “Postmark”
  • 25. Santa North PoleTopic Timestamp, offset, partition The “Postmark” Time semantics are flexible time of event creation, ingestion, or processing.
  • 26. Santa North PoleTopic Timestamp, offset, partition Key -> Partition (optional) Key (optional)
  • 27. Santa North PoleTopic Timestamp, offset, partition Key -> Partition (optional) Key (optional) Refines the destination Send to Santa not just any Elf
  • 28. Santa North PoleTopic Timestamp, offset, partition Key -> Partition (optional) Value is contents (byte array)
  • 29. Santa North PoleTopic Timestamp, offset, partition Key -> Partition (optional) Kafka Producers and Consumers need a serializer and de-serializer to write & read key and value
  • 30. • Kafka doesn’t look at the value • Consumer can read value • And try to make sense of the message • What will Santa be delivering?!
  • 31. Next… Delivery Semantics Do we care if the message arrives?
  • 32. Yes! Guaranteed delivery is desirable
  • 33. But homing pigeons got lost or eaten
  • 35. One pigeon may make it Eaten Lost
  • 40. Producer Broker 1 Broker 2 Broker 3 M1 M1 M1 M1M1 The message is also replicated on multiple “brokers”
  • 41. Producer Broker 1 Broker 2 Broker 3 M1 M1 M1M1 Which makes it resilient to loss of most servers
  • 42. Producer Broker 1 Broker 2 Broker 3 Acknowledgement M1 M1 M1 Producer gets acknowledgement once the message is persisted and replicated (configurable)
  • 43. Consumer Broker 1 Broker 2 Broker 3 M1 M1 M1 Multiple Brokers and Partitions  increased read availability and concurrency
  • 44. Producer Consumer Consumer Consumer Consumer The 2nd aspect of delivery semantics: Who gets the messages? How many times are messages delivered?
  • 45. Producer Consumer Consumer Consumer Consumer Delivery Semantics - Kafka is “pub-sub” - Loosely coupled - Producers and consumers don’t know about each other
  • 46. Producer Topic “Parties” Consumer Consumer Consumer Consumer Which consumers get which messages (filtering), is topic based Topic “Work”
  • 48. Producer Topic “Parties” Consumer Consumer Consumer Consumers Subscribed to Topic “Parties” Consumer Publishers send messages to topics Send Topic “Work”Send
  • 49. Producer Topic “Parties” Consumer Consumer Consumer Consumers Subscribed to Topic “Parties” Consumer Consumers only receive messages from subscribed topics Consumers Poll To receive messages from ”Parties” Topic “Work” Consumers not subscribed to “Work” Don’t receive any “Work” messages
  • 50. Partitions and Consumer Groups Enable sharing of work across consumers
  • 51. Duplicate message delivery Each message is delivered to each subscribed consumer group
  • 52. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Consumer Group Consumer Group Consumer Consumer Consumer Consumer Consumers subscribed to topic are allocated partitions They will only get messages from their allocated partitions.
  • 53. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Consumer Group Consumer Consumer Consumer Consumers share work within groups Consumers in the same group share the work around Each consumer gets only a subset of messages
  • 54. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Consumer Group Consumer Group Consumer Consumer Consumer Consumer Messages are duplicated across Consumer groups Multiple groups enable message broadcasting Messages are duplicated across groups, each consumer group receives a copy of each message.
  • 55. Key Partition based delivery Which messages are delivered to which consumers? If a message has a key, then Kafka uses Partition based delivery. Messages with the same key are always sent to the same partition and therefore the same consumer. And the order is guaranteed.
  • 56. No Key Round robin delivery If the key is null, then Kafka uses round robin delivery Each message is delivered to the next partition
  • 57. Time for an Example, with 2 consumer groups.
  • 58. Consumer Group = Nerds Multiple consumers
  • 59. Consumer Group = Nerds Multiple consumers Consumer Group = Hairy Single consumer
  • 60. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Group “Nerds” Group “Hairy” Consumer 1 (Bill) Consumer 2 (Paul) Consumer n Consumer 1 (Chewy) Consumers Subscribed to “Parties” No Key Round Robin M1 M2 etc M1 M1 M1 M2 M2 M2 Case 1: No Key Message (M1, M2, etc) sent to the next partition All consumers allocated to that partition will receive a message when they poll next.
  • 61. Here’s what happens (not showing producer or topics, have to imagine them)
  • 62. 1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition). Subscribe to “Parties” Subscribe to “Parties”
  • 63. 1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition). 2. Producer sends record “Cool pool party – Invitation” <key=null, value=“Cool pool party - Invitation”> to “parties” topic (no key)
  • 64. 1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition). 2. Producer sends record “Cool pool party - Invitation”> to “parties” topic 3. Bill and Chewbacca receive a copy of the invitation and plan to attend
  • 65. 4. Producer sends another record “Cool pool party – Cancelled” <key=null, value=“Cool pool party - Cancelled”> to “parties” topic
  • 66. 4. Producer sends another record <key=null, value=“Cool pool party - Cancelled”> to “parties” topic 5. Paul and Chewbacca receive the cancellation. Paul gets the message this time as it’s round robin, ignores it as he didn’t get the invitation. Bill wastes his time trying to go to cancelled party. The rest of the gang aren’t surprised at not receiving any party invites and stay at home to do some hacking. Chewy is only consumer in his group so gets all messages, plans something fun instead…
  • 67.
  • 68. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Group “Nerds” Group “Hairy” Consumer 1 (Bill) Consumer 2 (Paul) Consumer n Consumer 1 (Chewy) Consumers Subscribed to “Parties” Key Hashed to partition M1, M2 etc M1, M2 M1, M2 M1, M2 M3 M3 M3 M3 Case 2: If there is a Key A key is hashed to a partition, and a Message with that key is always sent to that partition. Assume there are 3 messages, and messages 1 and 2 are hashed to same partition.
  • 69. Here’s what happens with a key: key is “title” of the message (e.g. “Cool pool party”) Same set up as before: 1. Both Groups subscribe to Topic “parties” (11 partitions).
  • 70. 1. Both Groups subscribe to Topic “parties” (11 partitions). 2. Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic
  • 71. 1. Both Groups subscribe to Topic “parties” (11 partitions). 2. Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic 3. As before Bill and Chewbacca receive a copy of the invitation and plan to attend
  • 72. 4. Producer sends another record <key=“Cool pool party”, value=“Cancelled”> to “parties” topic
  • 73. 4. Producer sends another record <key=“Cool pool party”, value=“Cancelled”> to “parties” topic 5. Bill and Chewbacca receive the cancellation (same consumers this time, as identical key)
  • 74. 6. Producer sends another record <key=“Horrible Halloween party”, value=“Invitation”> to ”parties” topic
  • 75. 6. Producer sends another record <key=“Horrible Halloween party”, value=“Invitation”> to ”parties” topic 7. Paul and Chewy receive the invitation Paul receives the Halloween invitation as the key is different and the record is sent to the partition that Paul is allocated to Chewy is the only consumer in his group so he gets every record no matter what partition it’s sent to
  • 76.
  • 77.
  • 79. Real-time data pipeline Read-time data pipeline features: • Ingestion of multiple heterogeneous sources • Sending data to multiple heterogeneous sinks • Acts as a buffer to smooth out load spikes • Enables use cases which reprocess data (e.g. disaster recovery)
  • 80. Anomaly Detection Pipeline Real-time Event processing pipeline: • Simple event driven applications (If X then Y…) • May write and read from other data sources (e.g. Cassandra) • New Events sent back to Kafka or to other systems • E.g. Anomaly Detection, check out my current blog series if you are interested in this example.
  • 81. Kafka Streams Processing (Kongo IoT Blog series) Streams processing features: • Complex streams processing (multiple events and streams) • Time, windows, and transformations • Uses Kafka Streams API, includes state store • Visualization of the streams topology • Continuously computes the loads for trucks and checks if they are overloaded.
  • 82. Linkedin - Before Kafka (BK) A real example from Linkedin, who developed Kafka. Before Kafka they had spaghetti integration of monolithic applications. To accommodate growing membership and increasing site complexity, they migrated from a monolithic application infrastructure to one based on microservices, which made the integration even more complex!
  • 83. After Kafka (AK) Rather than maintaining and scaling each pipeline individually, they invested in the development of a single, distributed pub-sub platform - Kafka was born. The main benefit was better Service decoupling and independent scaling.
  • 84. The End (of the introduction) - Find out more Apache Kafka: https://kafka.apache.org/ Instaclustr blogs • Mix of Cassandra, Spark, Zeppelin and Kafka https://www.instaclustr.com/paul-brebner/ • Kafka introduction https://insidebigdata.com/2018/04/12/developing-deeper-understanding-apache-kafka-architecture/ https://insidebigdata.com/2018/04/19/developing-deeper-understanding-apache-kafka-architecture-part-2- • Kongo – Kafka IoT logistics application blog series https://www.instaclustr.com/instaclustr-kongo-iot-logistics-streaming-demo-application/ • Anomaly detection with Kafka and Cassandra (and Kubernetes), current blog series https://www.instaclustr.com/anomalia-machina-1-massively-scalable-anomaly-detection-with-apache-kafka- Instaclustr’s Managed Kafka (Free trial) https://www.instaclustr.com/solutions/managed-apache-kafka/

Hinweis der Redaktion

  1. What is Kafka? Kafka is a distributed streams processing system, it allows distributed producers to send messages to distributed consumers via a Kafka cluster.
  2. Kafka benefits Fast – high throughput and low latency Scalable – horizontally scalable, just add nodes and partitions Reliable – distributed and fault tolerant Zero data loss – messages are persisted to disk with immutable log Open Source – An Apache project Available as a Managed Service - on multiple cloud platforms
  3. Kafka benefits Fast – high throughput and low latency Scalable – horizontally scalable, just add nodes and partitions Reliable – distributed and fault tolerant Zero data loss – messages are persisted to disk with immutable log Open Source – An Apache project Available as a Managed Service - on multiple cloud platforms
  4. Back to the intro Kafka overview diagram, it’s a bit monochrome and boring. This talk will be more colourful and it’s going to be an extended story…
  5. Back to the intro Kafka overview diagram, it’s a bit monochrome and boring. This talk will be more colourful and it’s going to be an extended story…
  6. Let’s build a modern day fully electronic postal service – the Kafka Postal Service
  7. What does a postal service do? It sends messages from A to B (animated with sounds, click!)
  8. A is a producer, sends a message…
  9. To B, the consumer.
  10. Actually, no. Due to the decline in “snail mail” volumes, direct deliveries have been cancelled.
  11. Actually, no. Due to the decline in “snail mail” volumes, direct deliveries have been cancelled.
  12. Instead we have “Poste Restante” - Not a post office in a restaurant It’s called general delivery (in the US). The mail is delivered to a post office, and they hold it for you until you call for it.
  13. Consumers poll for messages by visiting the counter at the post office.
  14. Consumers poll for messages by visiting the counter at the post office.
  15. Kafka topics act like a Post Office. What are the benefits? Disconnected delivery – consumer doesn’t need to be available to receive messages Less effort for the messaging service – only has to delivery to a few locations not larger number of consumer addresses Can scale better and handle more complex delivery semantics!
  16. Kafka topics act like a Post Office. What are the benefits? Disconnected delivery – consumer doesn’t need to be available to receive messages Less effort for the messaging service – only has to delivery to a few locations not larger number of consumer addresses Can scale better and handle more complex delivery semantics!
  17. First lets see how it scales. What if there are many consumers for a topic?
  18. A single counter introduces delays and limits concurrency
  19. More counters increases concurrency and can reduce delays
  20. Kafka Topics have 1 or more Partitions, partitions function like multiple counters and enable high concurrency.
  21. Before looking at delivery semantics what does a message actually look like? In Kafka a message is called a Record and is like a letter.
  22. The topic is the destination.
  23. The “Postmark” includes a timestamp, offset in the topic, and the partition it was sent to. Time semantics are flexible, either the time of event creation, ingestion, or processing.
  24. The “Postmark” includes a timestamp, offset in the topic, and the partition it was sent to. Time semantics are flexible, either the time of event creation, ingestion, or processing.
  25. There’s also a thing called a Key, which is is optional. It refines the destination so it’s a bit like the rest of the address. We want this letter sent to Santa not just any Elf.
  26. There’s also a thing called a Key, which is is optional. It refines the destination so it’s a bit like the rest of the address. We want this letter sent to Santa not just any Elf.
  27. The value is the contents (just a byte array). Kafka Producers and consumers need to have a shared serializer and de-serializer for both the key and value.
  28. The value is the contents (just a byte array). Kafka Producers and consumers need to have a shared serializer and de-serializer for both the key and value.
  29. Kafka doesn’t look inside the value, but the Producer and Consumer can, and the Consumer can try and make sense of the message… I wonder what Santa will be delivering?
  30. Next lets look at delivery semantics. For example, do we care if the message actually arrives or not?
  31. Yes we do! Guaranteed message delivery is desirable. Homing pigeons got lost or eaten, so need to send the message with multiple pigeons
  32. Yes we do! Guaranteed message delivery is desirable. Homing pigeons got lost or eaten, so need to send the message with multiple pigeons
  33. Yes we do! Guaranteed message delivery is desirable. Homing pigeons got lost or eaten, so need to send the message with multiple pigeons
  34. Yes we do! Guaranteed message delivery is desirable. Homing pigeons got lost or eaten, so need to send the message with multiple pigeons
  35. How does Kafka guarantee delivery? A Message (M1) is written to a broker (2)
  36. How does Kafka guarantee delivery? A Message (M1) is written to a broker (2)
  37. and the message is always persisted to disk.
  38. This makes it resilient to loss of power.
  39. The message is also replicated on multiple “brokers” , 3 is typical
  40. And makes it resilient to loss of most servers
  41. Finally the producer gets acknowledgement once the message is persisted and replicated (configurable for number, and sync or async) The message is now available from more than one broker in case some fail. This also increases the read concurrency as partitions are spread over multiple brokers.
  42. Finally the producer gets acknowledgement once the message is persisted and replicated (configurable for number, and sync or async) The message is now available from more than one broker in case some fail. This also increases the read concurrency as partitions are spread over multiple brokers.
  43. Now let’s look at the 2nd aspect of delivery semantics. Who gets the messages and how many times are messages delivered? Kafka is “pub-sub”, It’s loosely coupled, producers and consumers don’t know about each other
  44. Now let’s look at the 2nd aspect of delivery semantics. Who gets the messages and how many times are messages delivered? Kafka is “pub-sub”, It’s loosely coupled, producers and consumers don’t know about each other
  45. Filtering, or which consumers get which messages, is topic based Publishers send messages to topics Consumers subscribe to topics of interest, e.g. parties. When they poll they only receive messages sent to those topics.
  46. Filtering, or which consumers get which messages, is topic based Publishers send messages to topics Consumers subscribe to topics of interest, e.g. parties. When they poll they only receive messages sent to those topics.
  47. Filtering, or which consumers get which messages, is topic based Publishers send messages to topics Consumers subscribe to topics of interest, e.g. parties. When they poll they only receive messages sent to those topics.
  48. Filtering, or which consumers get which messages, is topic based Publishers send messages to topics Consumers subscribe to topics of interest, e.g. parties. When they poll they only receive messages sent to those topics.
  49. Just a few more details and we can see how this works. Partitions and consumer groups enable sharing of work across multiple consumers, the more partitions a topic has the more consumers it supports
  50. Kafka supports delivery of the same message to multiple consumers. Kafka doesn’t throw messages away immediately they are delivered, so the same message can easily be delivered to multiple consumer groups.
  51. Consumers subscribed to ”parties” topic are allocated partitions. When they poll they will only get messages from their allocated partitions.
  52. This enables consumers in the same group to share the work around. Each consumer gets only a subset of the available messages.
  53. Multiple groups enable message broadcasting. Messages are duplicated across groups, as each consumer group receives a copy of each message.
  54. Which messages are delivered to which consumers? The final aspect of delivery semantics is to do with message keys. If a message has a key, then Kafka uses Partition based delivery. Messages with the same key are always sent to the same partition and therefore the same consumer. And the order is guaranteed.
  55. If the key is null, then Kafka uses round robin delivery. Each message is delivered to the next partition.
  56. Let’s look at an example with 2 consumer groups. Nerds, which has multiple consumers, and Hairy which has a single consumer.
  57. Let’s look at an example with 2 consumer groups. Nerds, which has multiple consumers, and Hairy which has a single consumer.
  58. Let’s look at an example with 2 consumer groups. Nerds, which has multiple consumers, and Hairy which has a single consumer.
  59. Looking at the case where there’s No Key 1st, each message (1, 2, etc) is sent to the next partition, and all consumers allocated to that partition will receive the message when they poll next.
  60. Here’s what actually happens. We’re not showing the producer or topic for simplicity. You’ll have to imagine them.
  61. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition).
  62. 2 Producer sends record with the value “Cool pool party - Invitation” to “parties” topic (there’s no key)
  63. 3 Bill and Chewbacca receive a copy of the invitation and plan to attend
  64. 4. Producer sends another record with the value “Cool pool party – Cancelled” to “parties” topic
  65. In the Nerds group, Paul gets the message this time as it’s round robin, and Chewy gets it as he’s the only consumer in his group. Paul ignores it as he didn’t get the original invite Bill wastes his time trying to go The rest of the gang aren’t surprised at not receiving any invites and stay home to do some hacking Chewy plans something else fun instead...
  66. A visit to the hairdressers!
  67. How does it work if there is a Key? The key is hashed to a partition, and the Message is sent to that partition. Assume there are 3 messages, and messages 1 and 2 are hashed to same partition.
  68. Here’s what happens with a key, assuming that the key is the “title” of the message (“Cool pool party”) As before Both Groups subscribe to Topic “parties” (11 partitions). Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic
  69. Here’s what happens with a key, assuming that the key is the “title” of the message (“Cool pool party”) As before Both Groups subscribe to Topic “parties” (11 partitions). Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic
  70. As before, Bill and Chewbacca receive a copy of the invitation and plan to attend
  71. 4. Producer sends another record with the same key but a cancellation value to “parties” topic
  72. This time, Bill and Chewbacca receive the cancellation (the same consumers as the key is identical)
  73. Producer sends out another invitation to a Halloween party. The key is different this time.
  74. Paul receives the Halloween invitation as the key is different and the record is sent to the partition that Paul is allocated Chewy is the only consumer in his group so he gets every record no matter what partition it’s sent to
  75. This time Chewy gets dressed up and goes to the party.
  76. Who did you imagine was producing the invitations? Maybe this fellow.
  77. Here’s a UML-like diagram with the main Kafka components. We’ve introduced Producers, Topics, Partitions, Consumer Groups and Consumes today. There’s a lot more to explore, including how Kafka provides replication, and the Connect and Streaming APIs.
  78. To finish up here are three important use cases for Kafka
  79. The Real-time Data pipeline features Ingestion of multiple heterogeneous sources Sending data to multiple heterogeneous sinks Acts as a buffer to smooth out load spikes Enables use cases which reprocess data (e.g. disaster recovery)
  80. Real-time Event processing features: Simple event driven applications (If X then Y…) May write and read from other data sources (e.g. Cassandra) New Events sent back to Kafka or to other systems E.g. Anomaly Detection, check out my current blog series if you are interested in this example.
  81. Streams processing features: Complex streams processing (multiple events and streams) Time, windows, and transformations Streams API, includes state store This a visualization of the streams topology from the streams processing pipeline from my previous blog series, Kongo, a Kafka IoT logistics application. It continuously computes the loads for trucks and checks if they are overloaded.
  82. Here’s a real example from Linkedin, who developed Kafka. Before Kafka they had spaghetti integration of monolithic applications. To accommodate growing membership and increasing site complexity, they migrated from a monolithic application infrastructure to one based on microservices, which made the integration even more complex.
  83. After Kafka Rather than maintaining and scaling each pipeline individually, they invested in the development of a single, distributed pub-sub platform. Thus, Kafka was born. This main benefit was better Service decoupling and independent scaling.