RabbitMQ
vs
Apache Kafka
Comparing two giants of the messaging space
RabbitMQ vs Apache Kafka: Reliable Messaging
• Message Delivery Guarantees
• Message Ordering Guarantees
• Message Durability
• High Availability
Background
• Jack Vanlightly
• Cloud Architect and Data Engineer at SII Concatel, Barcelona
• Event-Driven Architectures
• Messaging Systems
• Cloud Automation
• Data Pipelines
RabbitMQ – Push Model
[Diagram: Producer publishes to an Exchange, which routes to a Queue; the Queue pushes to Consumers]
Producer Publish
- Send messages one at a time
Consumer Push
- Long-lived TCP connection
- Consumer registers interest in queues
- Broker pushes messages down the connection in real time
Kafka – Pull Model
[Diagram: Producer publishes in batches to Topic A (partitions 1, 2 and 3); Consumers pull in batches]
Producer Publish
- Send messages in batches
Consumer Pull
- Long-lived TCP connection
- Consumer registers interest in a topic as part of a consumer group
- Consumer makes requests for messages in batches
RabbitMQ – Why Push?
The push model allows RabbitMQ to:
• Offer low latency messaging.
• Evenly distribute messages across competing consumers.
• Keep processing order closer to delivery order in the face of
competing consumers.
A push model requires Back-Pressure: Consumer Prefetch.
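The back-pressure idea behind consumer prefetch can be sketched with a toy model. This is not a RabbitMQ client: `PrefetchChannel` and its methods are hypothetical names, and the only behaviour modelled is that the broker stops pushing once the number of unacknowledged messages reaches the prefetch limit.

```python
from collections import deque


class PrefetchChannel:
    """Toy model (hypothetical, not a client library) of a push channel with a prefetch limit."""

    def __init__(self, prefetch_count):
        self.prefetch_count = prefetch_count
        self.queue = deque()   # messages waiting in the broker queue
        self.unacked = set()   # delivery tags pushed but not yet acknowledged
        self.next_tag = 1

    def publish(self, body):
        self.queue.append(body)

    def push(self):
        """Push messages until the prefetch window is full; return what was delivered."""
        delivered = []
        while self.queue and len(self.unacked) < self.prefetch_count:
            tag = self.next_tag
            self.next_tag += 1
            self.unacked.add(tag)
            delivered.append((tag, self.queue.popleft()))
        return delivered

    def ack(self, tag):
        self.unacked.discard(tag)


channel = PrefetchChannel(prefetch_count=2)
for i in range(5):
    channel.publish(f"msg-{i}")

first = channel.push()      # the window fills with two messages
stalled = channel.push()    # nothing more is pushed until an ack arrives
channel.ack(first[0][0])
after_ack = channel.push()  # one slot was freed, so one more message is pushed
```

A small prefetch keeps few messages in flight per consumer, and a larger one trades that for throughput.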
Push (RabbitMQ) vs Pull (Apache Kafka)
Kafka – Why Pull?
Because each partition cannot be read by more than one consumer
of a consumer group, the consumer can pull batches of messages
without:
• affecting processing order
• affecting message distribution amongst consumers
Batching up of messages improves compression and throughput.
At-most-once.
This means that a message will never be delivered
more than once but messages might be lost.
At-least-once.
This means that we'll never lose a message but a
message might end up being delivered to a
consumer more than once.
Exactly-once.
The holy grail of messaging. All messages will be
delivered exactly one time.
Delivery vs Processing
Delivered twice to be processed once.
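One common way to turn at-least-once delivery into process-once behaviour is for the consumer to remember which message ids it has already handled. The sketch below assumes each message carries a unique id. `IdempotentConsumer` is a hypothetical name, and the in-memory set stands in for whatever durable store a real application would need.

```python
class IdempotentConsumer:
    """Hypothetical consumer that skips messages whose id it has already processed."""

    def __init__(self):
        self.seen_ids = set()   # assumption: a real system would persist this
        self.processed = []

    def handle(self, message_id, body):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.processed.append(body)
        self.seen_ids.add(message_id)
        return True


consumer = IdempotentConsumer()
# The message with id "a1" is delivered twice, as at-least-once delivery allows.
deliveries = [("a1", "debit 10"), ("a2", "debit 5"), ("a1", "debit 10")]
results = [consumer.handle(mid, body) for mid, body in deliveries]
```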
At-most-once
At-least-once
Message Acknowledgement Protocols
Chain of Responsibility
[Diagram: Producer Application -> Hand-Over -> Broker -> Hand-Over -> Consumer Application]
RabbitMQ
Producer Side
Acknowledgements
(Hand-Over)
Publisher Exchange
Sends 10 messages
(Seq No: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
basic.ack: 6 multiple=true
basic.ack: 10 multiple=true
Publisher Confirms
- basic.ack (all ok!)
- basic.nack (error!)
- basic.return + basic.ack
(undeliverable!)
Flags
- Multiple (I am acknowledging
multiple message deliveries)
- Mandatory (give me a basic.return if
you can’t deliver to any queues)
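The effect of the Multiple flag on the publisher side can be sketched as bookkeeping over sequence numbers. `ConfirmTracker` and its methods are hypothetical names and make no network calls. The sketch only shows that an ack with multiple=true confirms every outstanding message up to and including the given sequence number.

```python
class ConfirmTracker:
    """Hypothetical publisher-side bookkeeping for publisher confirms."""

    def __init__(self):
        self.next_seq_no = 1
        self.outstanding = {}   # sequence number -> message body awaiting confirmation

    def publish(self, body):
        seq_no = self.next_seq_no
        self.outstanding[seq_no] = body
        self.next_seq_no += 1
        return seq_no

    def on_ack(self, delivery_tag, multiple):
        """Remove confirmed messages and return their sequence numbers."""
        if multiple:
            confirmed = [s for s in self.outstanding if s <= delivery_tag]
        else:
            confirmed = [delivery_tag] if delivery_tag in self.outstanding else []
        for seq_no in confirmed:
            del self.outstanding[seq_no]
        return confirmed

    def to_resend(self):
        """Messages that would have to be resent if the connection failed now."""
        return [self.outstanding[s] for s in sorted(self.outstanding)]


tracker = ConfirmTracker()
for i in range(1, 11):
    tracker.publish(f"m{i}")

tracker.on_ack(6, multiple=True)
pending_after_first_ack = sorted(tracker.outstanding)
tracker.on_ack(10, multiple=True)
```

Whatever is still in `outstanding` when a connection fails is the set of messages the publisher cannot know the fate of.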
RabbitMQ – Producer Side Acknowledgements
Queue
Routes 10 messages
Mandatory=true
Mandatory=false
Publisher Exchange
Sends 10 messages
Mandatory = false
(Seq No: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
basic.ack: 6 multiple=true
basic.ack: 10 multiple=true
RabbitMQ – Producer Side Acknowledgements
Discards 10 messages
X
Publisher Exchange
Sends 10 messages
Mandatory = true
(Seq No: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
basic.return: 6 multiple=true + basic.ack: 6 multiple=true
basic.return: 10 multiple=true + basic.ack: 10 multiple=true
Discards 10 messages
X
RabbitMQ – Producer Side Duplication
Low # of Messages in Flight = Low Throughput, Low Message Duplication on Failure
Large # of Messages in Flight = High Throughput, High Message Duplication on Failure
1000 messages in flight when connection fails
- 25% of the messages were persisted to a queue
- Publisher resends 1000 messages
- Queue ends up with 250 duplicates
1 message in flight when connection fails
- Message was persisted to a queue but the connection died before the ack could be sent
- Publisher resends 1 message
- Queue ends up with 1 duplicate
(Resent custom header)
RabbitMQ
Consumer Side
Acknowledgements
(Hand-over)
Consumer Acknowledgements
- basic.ack (all ok, remove from the queue!)
- basic.nack, requeue=false (error, but remove anyway)
- basic.nack, requeue=true (error, please redeliver)
- basic.reject (same as basic.nack but without multiple flag support)
Acknowledgement Mode
- Auto Ack (Push me messages as fast as you
can!)
- Manual Ack (I will explicitly tell you when a
message can be removed from the queue)
RabbitMQ – Consumer Side Acknowledgements
Queue Consumer
Pushes 10 messages
Delivery tag: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
basic.ack: 1 multiple=false, basic.ack: 2 multiple=false
basic.ack: 3 multiple=false, basic.ack: 4 multiple=false
basic.ack: 5 multiple=false, basic.ack: 6 multiple=false
basic.ack: 7 multiple=false, basic.ack: 8 multiple=false
basic.ack: 9 multiple=false, basic.ack: 10 multiple=false
Flags
- Multiple (I am acknowledging multiple messages)
- Redelivered (This message is a redelivery)
RabbitMQ – Consumer Side Acknowledgements
Queue Consumer
Pushes 10 messages
Delivery tag: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
basic.ack: 6 multiple=true, basic.ack: 10 multiple=true
[Diagram: Queue and Consumer]
1. Queue pushes 1 message
2. Consumer sends basic.nack: 1 multiple=false requeue=true
3. Queue delivers the message again with the redelivered=true flag
4. Consumer sends basic.ack: 1 multiple=false
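The nack-and-redeliver exchange above can be sketched with a toy queue. `ToyQueue` is a hypothetical in-memory model, not a client API. `requeue` is the field name AMQP 0-9-1 uses on basic.nack, and in this sketch each delivery gets a fresh delivery tag.

```python
from collections import deque


class ToyQueue:
    """Hypothetical in-memory queue that tracks unacked deliveries and the redelivered flag."""

    def __init__(self, bodies):
        # Each ready entry is (body, redelivered flag).
        self.ready = deque((body, False) for body in bodies)
        self.unacked = {}   # delivery tag -> body
        self.next_tag = 1

    def deliver(self):
        body, redelivered = self.ready.popleft()
        tag = self.next_tag
        self.next_tag += 1
        self.unacked[tag] = body
        return tag, body, redelivered

    def ack(self, tag):
        del self.unacked[tag]   # acknowledged, so the message is removed for good

    def nack(self, tag, requeue):
        body = self.unacked.pop(tag)
        if requeue:
            # Put the message back at the head, marked as a redelivery.
            self.ready.appendleft((body, True))


queue = ToyQueue(["order-42"])
tag1, body1, redelivered1 = queue.deliver()
queue.nack(tag1, requeue=True)
tag2, body2, redelivered2 = queue.deliver()
queue.ack(tag2)
```

A consumer can use the redelivered flag as a hint that it may have seen the message before and should check for duplicate processing.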
RabbitMQ – Consumer Side Duplication
Low # of Messages in Flight = Low Throughput, Low Message Duplication on Failure
Large # of Messages in Flight = High Throughput, High Message Duplication on Failure
1000 messages in flight when connection fails
- 25% of the messages were processed, but the connection failed before the acks could be sent
- Queue redelivers 1000 messages
- 250 messages get processed twice
1 message in flight when connection fails
- Message was processed but the connection died before the ack could be sent
- Queue redelivers 1 message
- 1 message gets processed twice
RabbitMQ
Broker Durability
Durable Queues
Persistent Messages
Mirrored Queues
RabbitMQ – The Broker
Surviving Broker Restart
- Durable Queue
- Persistent Message
Surviving Broker Loss
- Queue Mirroring (Clustering)
[Diagram: which queues and messages survive a Broker Restart and a Total Broker Loss, for combinations of Non-Durable Queue, Durable Queue and Mirrored Queue with Non-Persistent Message and Persistent Message]
[Diagram: Publisher and Consumer connected to a Queue Master with two Queue Mirrors]
RabbitMQ – The Broker – Queue Mirrors
Broker 2
Queue A
Master
Broker 3
Broker 1
Queue A
Mirror
Queue A
Mirror
Queue B
Mirror
Queue B
Master
Queue B
Mirror
Queue C
Master
Queue C
Mirror
Queue D
(unmirrored)
Queue A
ha-mode = all
Queue B
ha-mode = exactly
ha-params = 3
Queue C
ha-mode = exactly
ha-params = 2
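How the ha-mode policies on this slide translate into replica counts can be sketched as a small function. `replica_count` is a hypothetical helper, not part of RabbitMQ. It assumes that for ha-mode = exactly the ha-params value counts the master plus its mirrors, and it leaves out the other ha-mode values.

```python
def replica_count(ha_mode, cluster_size, ha_params=None):
    """Return how many replicas (master + mirrors) a queue gets under a policy.

    Hypothetical helper covering only the modes shown on the slide.
    """
    if ha_mode is None:
        return 1                    # unmirrored queue: the master only
    if ha_mode == "all":
        return cluster_size         # one replica on every node
    if ha_mode == "exactly":
        # Cannot place more replicas than there are nodes.
        return min(ha_params, cluster_size)
    raise ValueError(f"ha-mode not covered by this sketch: {ha_mode}")


cluster_size = 3
queue_a = replica_count("all", cluster_size)           # Queue A
queue_b = replica_count("exactly", cluster_size, 3)    # Queue B
queue_c = replica_count("exactly", cluster_size, 2)    # Queue C
queue_d = replica_count(None, cluster_size)            # Queue D (unmirrored)
```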
RabbitMQ – The Broker – Queue Mirrors
Broker 2
Queue A
Master
Broker 3
Broker 1
Queue A
Mirror
Queue A
Mirror
Queue B
Mirror
Queue B
Master
Queue B
Mirror
Queue C
Master
Queue C
Master (Promoted)
Queue D
(unmirrored)
Queue A
ha-mode = all
Queue B
ha-mode = exactly
ha-params = 3
Queue C
ha-mode = exactly
ha-params = 2
Queue C
Mirror
RabbitMQ – The Broker – Queue Mirrors
Broker 2
Queue A
Master
Broker 3
Broker 1
Queue A
Mirror
Queue A
Mirror
Queue B
Master (Promoted)
Queue B
Master
Queue B
Mirror
Queue C
Master
Queue C
Master
Queue D
(unmirrored)
Queue A
ha-mode = all
Queue B
ha-mode = exactly
ha-params = 3
Queue C
ha-mode = exactly
ha-params = 2
Queue C
Mirror
RabbitMQ – The Broker – Queue Mirrors
Broker 2
Queue A
Master
Broker 3
Queue A
Mirror
Queue B
Master
Queue B
Mirror
Queue C
Master
Queue C
Master
Queue A
ha-mode = all
Queue B
ha-mode = exactly
ha-params = 3
Queue C
ha-mode = exactly
ha-params = 2
Broker 1
Queue A
Mirror
Queue B
Mirror
Queue C
Mirror
RabbitMQ – The Broker – Queue Mirrors
Broker 2
Queue A
Master
Queue B
Master
Queue C
Master
Queue A
ha-mode = all
Queue B
ha-mode = exactly
ha-params = 3
Queue C
ha-mode = exactly
ha-params = 2
Broker 1
Queue A
Mirror
Queue B
Mirror
Queue C
Mirror
Broker 3
Queue A
Mirror
Queue B
Mirror
RabbitMQ
Queue Mirror
Synchronization
And
Queue Failover
RabbitMQ – Queue Mirrors - Synchronization
Queue A: ha-mode = all, ha-sync-mode = automatic
Queue B: ha-mode = exactly, ha-params = 3, ha-sync-mode = manual
Three nodes, two mirrored queues each with 10 messages.
[Diagram: Queue A master on Broker 2, Queue B master on Broker 1, mirrors on the other brokers. Messages per replica (Broker 1 / 2 / 3): Queue A 10 / 10 / 10, Queue B 10 / 10 / 10]
RabbitMQ – Queue Mirrors - Synchronization
Queue A: ha-mode = all, ha-sync-mode = automatic
Queue B: ha-mode = exactly, ha-params = 3, ha-sync-mode = manual
Broker 3 is lost.
[Diagram: messages per replica (Broker 1 / 2 / 3): Queue A 10 / 10 / 10, Queue B 10 / 10 / 10]
RabbitMQ – Queue Mirrors - Synchronization
Queue A: ha-mode = all, ha-sync-mode = automatic
Queue B: ha-mode = exactly, ha-params = 3, ha-sync-mode = manual
Broker 3 comes back.
Mirror A is automatically synchronized. Mirror B remains at 0 messages.
[Diagram: messages per replica (Broker 1 / 2 / 3): Queue A 10 / 10 / 10, Queue B 10 / 10 / 0]
RabbitMQ – Queue Mirrors - Synchronization
Queue A: ha-mode = all, ha-sync-mode = automatic
Queue B: ha-mode = exactly, ha-params = 3, ha-sync-mode = manual
Each queue receives 10 more messages.
Broker 2 is lost. Queue A fails over to mirror 3 without data loss.
[Diagram: Queue A master is now on Broker 3. Messages per replica (Broker 1 / 2 / 3): Queue A 20 / 20 / 20, Queue B 20 / 20 / 10]
RabbitMQ – Queue Mirrors - Synchronization
Queue A: ha-mode = all, ha-sync-mode = automatic
Queue B: ha-mode = exactly, ha-params = 3, ha-sync-mode = manual, ha-promote-on-failure = always
Each queue receives 10 more messages.
Broker 1 is lost. Queue B fails over to mirror 3 and loses 10 messages.
[Diagram: Queue B master is now on Broker 3. Messages per replica (Broker 1 / 2 / 3): Queue A 30 / 30 / 30, Queue B 30 / 30 / 20]
RabbitMQ – Queue Mirrors - Synchronization
Queue A: ha-mode = all, ha-sync-mode = automatic
Queue B: ha-mode = exactly, ha-params = 3, ha-sync-mode = manual, ha-promote-on-failure = when-synced
Alternate scenario: ha-promote-on-failure = when-synced
Queue B does not fail over as mirror 3 is unsynchronized.
[Diagram: Queue B on Broker 3 remains a mirror. Messages per replica (Broker 1 / 2 / 3): Queue A 30 / 30 / 30, Queue B 30 / 30 / 20]
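The two failover outcomes for Queue B can be sketched as a decision function. `fail_over` is a hypothetical helper that simplifies the broker's behaviour: it prefers any synchronized surviving mirror and only falls back to an unsynchronized one when ha-promote-on-failure is always.

```python
def fail_over(surviving_mirrors, promote_on_failure):
    """Pick the mirror to promote after the master is lost, or None if the queue stays down.

    surviving_mirrors: list of (node, is_synchronized) tuples (assumed input shape).
    promote_on_failure: "always" or "when-synced".
    """
    for node, is_synchronized in surviving_mirrors:
        if is_synchronized:
            return node   # safe promotion, no messages lost
    if promote_on_failure == "always" and surviving_mirrors:
        # Availability over consistency: messages missing from this mirror are lost.
        return surviving_mirrors[0][0]
    return None           # when-synced: the queue stays unavailable rather than lose data


# Queue B after Brokers 2 and 1 are lost: only the unsynchronized mirror on Broker 3 remains.
promoted_always = fail_over([("broker-3", False)], "always")
promoted_when_synced = fail_over([("broker-3", False)], "when-synced")
```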
RabbitMQ
Queue Mirror
Synchronization
And
New Mirrors
RabbitMQ – Queue Mirrors - Synchronization
Queue A: ha-mode = all, ha-sync-mode = automatic
Queue B: ha-mode = exactly, ha-params = 3, ha-sync-mode = manual
Three nodes, two mirrored queues each with 100 million messages.
[Diagram: messages per replica (Broker 1 / 2 / 3): Queue A 100m / 100m / 100m, Queue B 100m / 100m / 100m]
RabbitMQ – Queue Mirrors - Synchronization
Queue A: ha-mode = all, ha-sync-mode = automatic
Queue B: ha-mode = exactly, ha-params = 3, ha-sync-mode = manual
Broker 3 is lost.
[Diagram: messages per replica (Broker 1 / 2 / 3): Queue A 100m / 100m / 100m, Queue B 100m / 100m / 100m]
RabbitMQ – Queue Mirrors - Synchronization
Queue A: ha-mode = all, ha-sync-mode = automatic
Queue B: ha-mode = exactly, ha-params = 3, ha-sync-mode = manual
Broker 3 comes back.
Queue A is unavailable due to synchronization.
Queue B is available but the mirror on Broker 3 remains at 0 messages.
[Diagram: messages per replica (Broker 1 / 2 / 3): Queue A 100m / 100m / * (synchronizing), Queue B 100m / 100m / 0]
RabbitMQ – Queue Mirrors - Synchronization
Queue A: ha-mode = all, ha-sync-mode = automatic
Queue B: ha-mode = exactly, ha-params = 3, ha-sync-mode = manual
Queue A synchronization completes and the queue becomes available again.
[Diagram: messages per replica (Broker 1 / 2 / 3): Queue A 100m / 100m / 100m, Queue B 100m / 100m / 0]
RabbitMQ
Optimizing for High Throughput
- Producers fire and forget
- Consumers use auto-ack mode
- Non-persistent messages
- Non-mirrored queues
- Cluster for throughput
Balancing Data Safety with High Throughput
- Producers wait periodically for acknowledgements
- Consumers group acknowledgements with the Multiple flag
- Persistent messages
- Cluster
- Queue mirroring with 1 mirror*
Optimizing for Data Safety
- Producers wait for acknowledgements after each message or after a small number of messages
- Consumers acknowledge each message individually
- Persistent messages
- Cluster for durability
- Queue mirroring with 2+ mirrors
- ha-promote-on-failure=when-synced
- ha-sync-mode=automatic for active queues
RabbitMQ
Network Partitions?
Slow Network Links?
Flaky Links?
See Part 3…
Apache Kafka
Producer Side
Acknowledgements
(Hand-Over)
Apache Kafka – Replicated Partitions
Broker 2
Partition 0
Leader
Broker 3
Broker 1
Partition 0
Follower
Partition 0
Follower
Each partition has a leader with 0 or more Followers.
Producers send to leaders. Consumers consume from leaders.
Followers exist for redundancy.
Producer
Apache Kafka – Producer Side Acknowledgements
Producer Config
acks
- No acknowledgement (fire and forget). acks=0
- Leader has persisted the message. acks=1
- Leader and all In-Sync Replicas have persisted the message. acks=all
Other settings
- retries
- enable.idempotence (limits throughput)
- max.in.flight.requests.per.connection
- batch.size
- max.request.size
Broker/Topic Config
Other settings
- default.replication.factor
- min.insync.replicas
- unclean.leader.election.enable
Retries + Multiple Requests In Flight = Message Duplication + Out of Order Messages
[Diagram: a Producer sends Batch 1, Batch 2 and Batch 3 to each of partitions P0, P1 and P2, which are read by consumers C1, C2 and C3 of a Consumer Group]
Retries + Multiple Requests In Flight = Message Duplication + Out of Order Messages
[Diagram: the Producer continues with Batch 4 and Batch 5 while retrying earlier batches; partitions P0, P1 and P2 end up with duplicated and out-of-order batches, which consumers C1, C2 and C3 then read]
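How retries combined with several requests in flight produce both symptoms can be sketched with a toy model of one partition log. `append_batches` is a hypothetical function and assumes a producer without idempotence: a batch whose write fails is retried after the rest of its window, and a batch whose ack is lost is written again on retry.

```python
def append_batches(batches, write_fails, ack_lost, max_in_flight):
    """Return the order in which batches land in a partition log.

    write_fails: batches rejected on the first attempt, then retried successfully.
    ack_lost: batches written on the first attempt whose ack never reaches the producer.
    Toy model only; timing and broker internals are not simulated.
    """
    log = []
    for start in range(0, len(batches), max_in_flight):
        window = batches[start:start + max_in_flight]
        retries = []
        for batch in window:
            if batch in write_fails:
                retries.append(batch)   # not written yet, will be retried
            else:
                log.append(batch)
                if batch in ack_lost:
                    retries.append(batch)   # written, but the producer cannot know
        log.extend(retries)   # retries land after the rest of the window
    return log


out_of_order = append_batches([1, 2, 3], write_fails={1}, ack_lost=set(), max_in_flight=3)
duplicated = append_batches([1, 2, 3], write_fails=set(), ack_lost={2}, max_in_flight=3)
in_order = append_batches([1, 2, 3], write_fails={1}, ack_lost=set(), max_in_flight=1)
```

With a single request in flight the retry cannot be overtaken, which is why ordering is preserved in the last call at the cost of throughput.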
Avoid Data Loss and Producer-Side Duplication with enable.idempotence = true
• enable.idempotence set to true
• max.in.flight.requests.per.connection set to 5 or less
• retries set to 1 or higher
• acks set to 'all'
Producers can retry until they succeed while avoiding message duplication.
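The four settings above can be checked mechanically before a producer is created. The sketch below is not tied to any client library. `idempotence_problems` is a hypothetical helper that takes the producer settings as a plain dictionary keyed by the property names on the slide and treats a missing key as a problem rather than guessing a default.

```python
def idempotence_problems(config):
    """Return a list of reasons why a producer config does not meet the slide's four rules."""
    problems = []
    if config.get("enable.idempotence") is not True:
        problems.append("enable.idempotence must be true")
    in_flight = config.get("max.in.flight.requests.per.connection")
    if in_flight is None or in_flight > 5:
        problems.append("max.in.flight.requests.per.connection must be 5 or less")
    retries = config.get("retries")
    if retries is None or retries < 1:
        problems.append("retries must be 1 or higher")
    if str(config.get("acks")) != "all":
        problems.append("acks must be 'all'")
    return problems


safe_config = {
    "enable.idempotence": True,
    "max.in.flight.requests.per.connection": 5,
    "retries": 10,   # illustrative value, any value of 1 or higher passes
    "acks": "all",
}
fast_config = {
    "enable.idempotence": False,
    "max.in.flight.requests.per.connection": 10,
    "retries": 0,
    "acks": 1,
}
safe_problems = idempotence_problems(safe_config)
fast_problems = idempotence_problems(fast_config)
```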
Apache Kafka – The Broker – Replicas
Broker 2
Partition 0
Leader
Broker 3
Broker 1
Partition 0
Follower
Partition 0
Follower
Partition 1
Follower
Partition 1
Leader
Partition 1
Follower
Partition 2
Leader
Partition 2
Follower
Partition 3
Leader
Topic with:
- 4 partitions
- Replication
factor = 3
Partition 2
Follower
Partition 3
Follower
Partition 3
Follower
Apache Kafka – The Broker – Replicas
Broker 2
Partition 0
Leader
Broker 3
Broker 1
Partition 0
Follower
Partition 0
Follower
Partition 1
Follower
Partition 1
Leader
Partition 1
Follower
Partition 2
Leader
Partition 2
Leader (promoted)
Partition 3
Leader
Topic with:
- 4 partitions
- Replication
factor = 3
Partition 2
Follower
Partition 3
Follower
Partition 3
Follower
Apache Kafka – The Broker – Replicas
Broker 2
Partition 0
Leader
Broker 3
Broker 1
Partition 0
Follower
Partition 0
Follower
Partition 1
Leader (promoted)
Partition 1
Leader
Partition 1
Follower
Partition 2
Leader
Partition 2
Leader
Partition 3
Leader
Topic with:
- 4 partitions
- Replication
factor = 3
Partition 2
Follower
Partition 3
Follower
Partition 3
Follower
Apache Kafka – The Broker – Replicas
Broker 2
Partition 0
Leader
Broker 3
Broker 1
Partition 0
Follower
Partition 0
Follower
Partition 1
Leader
Partition 1
Follower
Partition 1
Follower
Partition 2
Leader
Partition 2
Leader
Partition 3
Leader
Topic with:
- 4 partitions
- Replication
factor = 3
Partition 2
Follower
Partition 3
Follower
Partition 3
Follower
Apache Kafka – The Broker – Replicas
Broker 2
Partition 0
Leader
Broker 1
Partition 0
Follower
Partition 1
Leader
Partition 1
Follower
Partition 2
Leader
Partition 3
Leader
Topic with:
- 4 partitions
- Replication
factor = 3
Partition 2
Follower
Partition 3
Follower
Broker 3
Partition 0
Follower
Partition 1
Follower
Partition 2
Follower
Partition 3
Follower
Apache Kafka – The Broker – Replicas
Broker 2
Partition 0
Leader
Broker 1
Partition 0
Follower
Partition 1
Follower
Partition 1
Leader
Partition 2
Follower
Partition 3
Leader
Partition Rebalancing
Option 1: auto.leader.rebalance.enable=true
Option 2: Rebalance leaders manually with kafka-preferred-replica-election.sh
Partition 2
Follower
Partition 3
Follower
Broker 3
Partition 0
Follower
Partition 1
Follower
Partition 2
Leader
Partition 3
Follower
Apache Kafka – The Broker – The ISR
In-Sync Replica Set (ISR)
- The leader plus the followers that are up to date with the leader
- A follower is removed from the ISR when either:
- It has not sent any fetch requests to the leader within the replica.lag.time.max.ms time period
- It has not been up to date with the leader for at least the replica.lag.time.max.ms period
- Followers send fetch requests to the leader at an interval of replica.fetch.wait.max.ms, which should be lower than replica.lag.time.max.ms
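The ISR rule can be sketched as a filter over followers. `in_sync_replicas` is a hypothetical function, not broker code, and it collapses both removal conditions into one timestamp per follower: the last time it was caught up with the leader.

```python
def in_sync_replicas(leader, last_caught_up_ms, now_ms, replica_lag_time_max_ms):
    """Return the ISR: the leader plus every follower caught up recently enough.

    last_caught_up_ms: dict of follower name -> time (ms) it was last up to date
    with the leader (assumed input shape for this sketch).
    """
    isr = [leader]
    for follower, caught_up_ms in sorted(last_caught_up_ms.items()):
        if now_ms - caught_up_ms <= replica_lag_time_max_ms:
            isr.append(follower)
    return isr


LAG_LIMIT_MS = 10_000   # replica.lag.time.max.ms = 10000, as on the slides
now_ms = 30_000

# Broker 3 is 8 seconds behind: still inside the limit.
isr_before = in_sync_replicas(
    "broker-2", {"broker-1": 29_000, "broker-3": 22_000}, now_ms, LAG_LIMIT_MS
)
# Broker 3 is 22 seconds behind: removed from the ISR.
isr_after = in_sync_replicas(
    "broker-2", {"broker-1": 29_000, "broker-3": 8_000}, now_ms, LAG_LIMIT_MS
)
```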
Apache Kafka – The Broker – The ISR
Topic with:
- 2 partitions
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
One message a second arriving at each partition.
Partitions in Broker 3 are lagging, but still within the 10 second limit.
[Diagram: Partition 0 leader on Broker 2, Partition 1 leader on Broker 1. The followers on Broker 3 are 8 and 9 seconds behind; the other followers are 0 to 1 seconds behind]
Apache Kafka – The Broker – The ISR
Topic with:
- 2 partitions
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
Broker 3 seems to have an issue. Its partitions have been out of sync for more than 10 seconds and are no longer in the ISR.
[Diagram: the followers on Broker 3 are 22 and 19 seconds behind; the other followers are 0 to 1 seconds behind]
Apache Kafka – Low Latency, High Availability
Optimizing for Low Latency and High Availability
Acks = 1
unclean.leader.election.enable = true
Apache Kafka – Low Latency, High Availability
Topic with:
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
- unclean.leader.election.enable = true
Producer sends 1 message with acks = 1. The leader persists the message, then sends an ack.
Ack latency: +0 ms due to replicas.
[Diagram: Partition 0 leader on Broker 2; followers on Broker 1 (1 second behind) and Broker 3 (8 seconds behind)]
Apache Kafka – Low Latency, High Availability
Topic with:
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
- unclean.leader.election.enable = true
Leader broker fails before followers fetch the message. 1 message lost in fail-over.
Producer: connection lost.
[Diagram: Broker 2 has failed; the Partition 0 replica on Broker 1 becomes leader; the follower on Broker 3 is 8 seconds behind]
Apache Kafka – Low Latency, High Availability
Topic with:
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
- unclean.leader.election.enable = true
Producer establishes a connection to Broker 1 and sends 1 message with acks = 1.
Ack latency: +0 ms due to replicas.
[Diagram: Partition 0 leader on Broker 1; the follower on Broker 3 is 8 seconds behind]
Apache Kafka – Low Latency, High Availability
Topic with:
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
- unclean.leader.election.enable = true
Broker 3 falls behind. Removed from ISR.
Producer sends 1 message with acks = 1. Ack latency: +0 ms due to replicas.
[Diagram: Partition 0 leader on Broker 1; the follower on Broker 3 is 14 seconds behind]
Apache Kafka – Low Latency, High Availability
Topic with:
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
- unclean.leader.election.enable = true
Broker 1 fails.
Unclean leader election allows the Broker 3 partition, which is not a member of the ISR, to be elected leader.
Fail-over loses 15 acknowledged messages, but the partition remains available.
Producer sends 1 message with acks = 1 and gets a connection error.
[Diagram: Brokers 2 and 1 have failed; the Partition 0 replica on Broker 3 becomes leader]
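The choice the broker faces in this scenario can be sketched as a small election function. `elect_leader` is a hypothetical helper that ignores the controller and its metadata and shows only the rule: prefer a live ISR member, and fall back to a live out-of-sync replica only when unclean.leader.election.enable is true.

```python
def elect_leader(replicas, isr, failed, unclean_leader_election_enable):
    """Return (new_leader, is_clean). new_leader is None if the partition goes offline."""
    live = [replica for replica in replicas if replica not in failed]
    for replica in live:
        if replica in isr:
            return replica, True   # clean election: no acknowledged data is lost
    if unclean_leader_election_enable and live:
        # Unclean election: the partition stays available, but messages this
        # replica never fetched are lost.
        return live[0], False
    return None, False


replicas = ["broker-1", "broker-2", "broker-3"]
# Broker 2 failed earlier, Broker 3 dropped out of the ISR, and now Broker 1 fails.
available = elect_leader(replicas, isr={"broker-1"}, failed={"broker-1", "broker-2"},
                         unclean_leader_election_enable=True)
offline = elect_leader(replicas, isr={"broker-1"}, failed={"broker-1", "broker-2"},
                       unclean_leader_election_enable=False)
```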
Apache Kafka – Low Latency, High Availability
Topic with:
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
- unclean.leader.election.enable = true
Producer uses an alternate node in bootstrap.servers to find the new partition leader.
Fail-over produces message loss.
Producer sends 1 message with acks = 1 and receives an ack.
Apache Kafka – Data Safety, Higher Latency
Optimizing for Data Safety
(Increased Latency, Lower Availability)
acks = all
replication.factor = 3
min.insync.replicas = 2
a quorum (n+1)/2
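How acks and min.insync.replicas interact can be sketched as a single check on the leader. `replicas_required` and the `NotEnoughReplicas` class are hypothetical names for this sketch. It assumes min.insync.replicas is only enforced for acks = all and that the ISR size includes the leader.

```python
class NotEnoughReplicas(Exception):
    """Raised in this sketch when the ISR is smaller than min.insync.replicas."""


def replicas_required(isr_size, acks, min_insync_replicas):
    """Return how many replicas must persist a message before the producer is acked."""
    if acks == "all":
        if isr_size < min_insync_replicas:
            raise NotEnoughReplicas(
                f"ISR has {isr_size} replica(s), need at least {min_insync_replicas}"
            )
        return isr_size   # the leader waits for every current ISR member
    if acks == 1:
        return 1          # leader only
    if acks == 0:
        return 0          # fire and forget
    raise ValueError(f"unsupported acks value: {acks!r}")


healthy = replicas_required(isr_size=3, acks="all", min_insync_replicas=2)
degraded = replicas_required(isr_size=2, acks="all", min_insync_replicas=2)
try:
    replicas_required(isr_size=1, acks="all", min_insync_replicas=2)
    rejected = False
except NotEnoughReplicas:
    rejected = True
```

With a replication factor of 3 and min.insync.replicas = 2, one replica can be lost or fall behind without blocking writes, while a second loss makes the partition refuse acks = all writes instead of risking data loss.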
Apache Kafka – Data Safety, Lower Availability
Topic with:
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
- min.insync.replicas = 2
Broker 1 averaging 0.25 seconds lag. Broker 3 averaging 4 seconds lag.
Producer sends 1 message with acks = all.
[Diagram: Partition 0 leader on Broker 2; followers on Broker 1 and Broker 3 (4 seconds behind)]
Apache Kafka – Data Safety, Lower Availability
Topic with:
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
- min.insync.replicas = 2
Broker 3 averaging 4 seconds lag.
Producer receives the ack once both followers have caught up. Ack latency: +4000 ms due to replicas.
Apache Kafka – Data Safety, Lower Availability
Topic with:
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
- min.insync.replicas = 2
Broker 3 removed from ISR. Still two replicas in ISR.
Producer sends 1 message with acks = all. Ack latency: +250 ms due to replicas.
[Diagram: the follower on Broker 3 is 14 seconds behind]
Apache Kafka – Data Safety, Lower Availability
Topic with:
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
- min.insync.replicas = 2
Broker 2 is lost.
Producer sends 1 message with acks = all and gets a connection error.
[Diagram: the follower on Broker 3 is 14 seconds behind]
Apache Kafka – Data Safety, Lower Availability
Topic with:
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
- min.insync.replicas = 2
Partition 0 fails over to Broker 1 without message loss.
Partition 0 will not accept more messages as the ISR has only 1 node.
Producer sends 1 message with acks = all and gets a NotEnoughReplicas error.
[Diagram: Partition 0 leader on Broker 1; the follower on Broker 3 is 14 seconds behind]
Apache Kafka – Data Safety, Lower Availability
Topic with:
- Replication factor = 3
- replica.lag.time.max.ms = 10000
- replica.fetch.wait.max.ms = 500
- min.insync.replicas = 2
Broker 3 catches up. Producer retries and receives an ack.
[Diagram: Partition 0 leader on Broker 1; the follower on Broker 3 is fully caught up]
Apache Kafka – Consumer Offset Tracking
[Diagram: Consumer 1 reads a batch at an offset in a partition (offsets 1 to 7), then commits the offset]
Consumer Offset Commits
- Auto-commit periodically
- Manual Commit
Optimizing for Throughput
- Auto-commit (long period)
Optimizing for Duplicate Read Avoidance
- Auto-commit (short period)
- Manual Commit
Apache Kafka – Consumer Side Duplication
Low # of Messages in Flight = Low Throughput, Low Message Duplication on Failure
Large # of Messages in Flight = High Throughput, High Message Duplication on Failure
1000 messages in flight when consumer fails
- 25% of the messages are processed, but the application fails before the offset is committed
- Replacement consumer fetches 1000 messages from the same offset
- 250 messages get processed a second time by the replacement application
1 message in flight when consumer fails
- Message is processed, but the application fails before the offset is committed
- Replacement consumer fetches 1 message from the same offset
- 1 message gets processed a second time by the replacement consumer
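The 1000-message scenario can be replayed with a toy model of a partition and a committed offset. `consume` is a hypothetical function that models the partition as a Python list, and its crash is simply an early return before the offset commit.

```python
def consume(partition, committed_offset, batch_size, crash_after=None):
    """Process one batch starting at the committed offset.

    Returns (processed_messages, new_committed_offset). If crash_after is set,
    the consumer fails after processing that many messages and before committing,
    so the committed offset does not move.
    """
    batch = partition[committed_offset:committed_offset + batch_size]
    processed = []
    for index, message in enumerate(batch):
        if crash_after is not None and index == crash_after:
            return processed, committed_offset   # crash: offset never committed
        processed.append(message)
    return processed, committed_offset + len(batch)


partition = list(range(1000))

# First consumer processes 25% of the batch, then fails before committing.
first_run, offset = consume(partition, 0, batch_size=1000, crash_after=250)
# Replacement consumer fetches from the same committed offset.
second_run, offset = consume(partition, offset, batch_size=1000)

processed_twice = set(first_run) & set(second_run)
```

Committing more often shrinks the window of messages that can be processed twice, which is the throughput versus duplication trade-off stated on the slide.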
Apache Kafka
Network Partitions?
Slow Network Links?
Flaky Links?
See Part 3…
RabbitMQ vs Kafka
Both
- Tunable Consistency vs Availability vs Latency vs Throughput
- Configurable Redundancy
RabbitMQ
- Message Acknowledgements: Fire-and-forget, Publisher Confirms
- Consumer Side Redelivered flag
- Synchronous Replication
- Replica Synchronization Can Cause Unavailability
Kafka
- Message Acknowledgements: Fire-and-forget, Leader Only, All ISR
- Producer Side Idempotency
- Synchronous/Asynchronous Replication
- Availability During Replica Synchronization
Thank you!
Questions?
Jack Vanlightly
12 NOVEMBER 2018
London, UK
With keynotes and speakers from Goldman
Sachs, Pivotal, Wunderlist/Microsoft,
Erlang Solutions, CloudAMQP and more!
Get your EARLY BIRD TICKET + 10%
discount now!
Early Bird ends 31 August

Unit IV_Flow.pptxTejasRao8
 

Ähnlich wie RabbitMQ vs Apache Kafka - Comparing two giants of the messaging space (20)

Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be there
 
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015 Apache Kafka Reliability Guarantees StrataHadoop NYC 2015
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015
 
Messaging for Modern Applications
Messaging for Modern ApplicationsMessaging for Modern Applications
Messaging for Modern Applications
 
Exactly Once Delivery with Kafka - Kafka Tel-Aviv Meetup
Exactly Once Delivery with Kafka - Kafka Tel-Aviv MeetupExactly Once Delivery with Kafka - Kafka Tel-Aviv Meetup
Exactly Once Delivery with Kafka - Kafka Tel-Aviv Meetup
 
Reliability Guarantees for Apache Kafka
Reliability Guarantees for Apache KafkaReliability Guarantees for Apache Kafka
Reliability Guarantees for Apache Kafka
 
Rabbitmq basics
Rabbitmq basicsRabbitmq basics
Rabbitmq basics
 
Datalink control(framing,protocols)
Datalink control(framing,protocols)Datalink control(framing,protocols)
Datalink control(framing,protocols)
 
07 data linkcontrol
07 data linkcontrol07 data linkcontrol
07 data linkcontrol
 
Построение распределенной системы сбора данных с помощью RabbitMQ, Alvaro Vid...
Построение распределенной системы сбора данных с помощью RabbitMQ, Alvaro Vid...Построение распределенной системы сбора данных с помощью RabbitMQ, Alvaro Vid...
Построение распределенной системы сбора данных с помощью RabbitMQ, Alvaro Vid...
 
Apache Kafka Reliability
Apache Kafka Reliability Apache Kafka Reliability
Apache Kafka Reliability
 
Lindsay distributed geventzmq
Lindsay distributed geventzmqLindsay distributed geventzmq
Lindsay distributed geventzmq
 
Exactly Once Delivery with Kafka - JOTB2020 Mini Session
Exactly Once Delivery with Kafka - JOTB2020 Mini SessionExactly Once Delivery with Kafka - JOTB2020 Mini Session
Exactly Once Delivery with Kafka - JOTB2020 Mini Session
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
 
Data linkcontrol
Data linkcontrolData linkcontrol
Data linkcontrol
 
Kafka reliability velocity 17
Kafka reliability   velocity 17Kafka reliability   velocity 17
Kafka reliability velocity 17
 
Exactly Once Delivery - Natan Silnitsky
Exactly Once Delivery - Natan SilnitskyExactly Once Delivery - Natan Silnitsky
Exactly Once Delivery - Natan Silnitsky
 
RabbitMQ in Sprayer
RabbitMQ in SprayerRabbitMQ in Sprayer
RabbitMQ in Sprayer
 
Exactly Once Delivery is a Harsh Mistress - Natan Silnitsky
Exactly Once Delivery is a Harsh Mistress  - Natan SilnitskyExactly Once Delivery is a Harsh Mistress  - Natan Silnitsky
Exactly Once Delivery is a Harsh Mistress - Natan Silnitsky
 
Exactly once delivery is a harsh mistress - DevOps Days TLV
Exactly once delivery is a harsh mistress - DevOps Days TLVExactly once delivery is a harsh mistress - DevOps Days TLV
Exactly once delivery is a harsh mistress - DevOps Days TLV
 
Unit IV_Flow.pptx
Unit IV_Flow.pptxUnit IV_Flow.pptx
Unit IV_Flow.pptx
 

Mehr von Erlang Solutions

Fintech_Trends_for_2022_report_by_Erlang_Solutions.pdf
Fintech_Trends_for_2022_report_by_Erlang_Solutions.pdfFintech_Trends_for_2022_report_by_Erlang_Solutions.pdf
Fintech_Trends_for_2022_report_by_Erlang_Solutions.pdfErlang Solutions
 
Datadog and Elixir with Erlang Solutions
Datadog and Elixir with Erlang SolutionsDatadog and Elixir with Erlang Solutions
Datadog and Elixir with Erlang SolutionsErlang Solutions
 
Strategies for successfully adopting Elixir
Strategies for successfully adopting ElixirStrategies for successfully adopting Elixir
Strategies for successfully adopting ElixirErlang Solutions
 
Designing & architecting RabbitMQ engineered systems - Ayanda Dube @ London R...
Designing & architecting RabbitMQ engineered systems - Ayanda Dube @ London R...Designing & architecting RabbitMQ engineered systems - Ayanda Dube @ London R...
Designing & architecting RabbitMQ engineered systems - Ayanda Dube @ London R...Erlang Solutions
 
Building the ideal betting stack | London Erlang User Group presentation
Building the ideal betting stack | London Erlang User Group presentationBuilding the ideal betting stack | London Erlang User Group presentation
Building the ideal betting stack | London Erlang User Group presentationErlang Solutions
 
Efficient Erlang - Performance and memory efficiency of your data by Dmytro L...
Efficient Erlang - Performance and memory efficiency of your data by Dmytro L...Efficient Erlang - Performance and memory efficiency of your data by Dmytro L...
Efficient Erlang - Performance and memory efficiency of your data by Dmytro L...Erlang Solutions
 
Empowering mobile first workers in emerging-markets using messaging
Empowering mobile first workers in emerging-markets using messagingEmpowering mobile first workers in emerging-markets using messaging
Empowering mobile first workers in emerging-markets using messagingErlang Solutions
 

Mehr von Erlang Solutions (7)

Fintech_Trends_for_2022_report_by_Erlang_Solutions.pdf
Fintech_Trends_for_2022_report_by_Erlang_Solutions.pdfFintech_Trends_for_2022_report_by_Erlang_Solutions.pdf
Fintech_Trends_for_2022_report_by_Erlang_Solutions.pdf
 
Datadog and Elixir with Erlang Solutions
Datadog and Elixir with Erlang SolutionsDatadog and Elixir with Erlang Solutions
Datadog and Elixir with Erlang Solutions
 
Strategies for successfully adopting Elixir
Strategies for successfully adopting ElixirStrategies for successfully adopting Elixir
Strategies for successfully adopting Elixir
 
Designing & architecting RabbitMQ engineered systems - Ayanda Dube @ London R...
Designing & architecting RabbitMQ engineered systems - Ayanda Dube @ London R...Designing & architecting RabbitMQ engineered systems - Ayanda Dube @ London R...
Designing & architecting RabbitMQ engineered systems - Ayanda Dube @ London R...
 
Building the ideal betting stack | London Erlang User Group presentation
Building the ideal betting stack | London Erlang User Group presentationBuilding the ideal betting stack | London Erlang User Group presentation
Building the ideal betting stack | London Erlang User Group presentation
 
Efficient Erlang - Performance and memory efficiency of your data by Dmytro L...
Efficient Erlang - Performance and memory efficiency of your data by Dmytro L...Efficient Erlang - Performance and memory efficiency of your data by Dmytro L...
Efficient Erlang - Performance and memory efficiency of your data by Dmytro L...
 
Empowering mobile first workers in emerging-markets using messaging
Empowering mobile first workers in emerging-markets using messagingEmpowering mobile first workers in emerging-markets using messaging
Empowering mobile first workers in emerging-markets using messaging
 

Kürzlich hochgeladen

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

RabbitMQ vs Apache Kafka - Comparing two giants of the messaging space

  • 1. RabbitMQ vs Apache Kafka Comparing two giants of the messaging space
  • 2. Apache Kafka RabbitMQ Reliable Messaging • Message Delivery Guarantees • Message Ordering Guarantees • Message Durability • High Availability VS
  • 3. Background • Jack Vanlightly • Cloud Architect and Data Engineer at SII Concatel, Barcelona • Event-Driven Architectures • Messaging Systems • Cloud Automation • Data Pipelines
  • 4. RabbitMQ – Push Model Producer Exchange Queue Consumer route Consumer Push - Long-lived TCP connection - Consumer registers interest in queues - Broker pushes messages down connection in real-time Producer Publish - Send messages one at a time pushpublish Consumer
  • 5. Producer Topic A (partition 2) Consumer Consumer Pull - Long-lived TCP connection - Consumer registers interest in a topic as part of a consumer group - Consumer makes requests for messages in batches Producer Publish - Send messages in batches Pull in batches Publish in batches Kafka – Pull Model Topic A (partition 1) Topic A (partition 3) Consumer
  • 6. RabbitMQ – Why Push? The push model allows RabbitMQ to: • Offer low latency messaging. • Evenly distribute messages across competing consumers. • Keep processing order closer to delivery order in the face of competing consumers. A push model requires Back-Pressure: Consumer Prefetch. Pull (Apache Kafka) Push (RabbitMQ) VS Kafka – Why Pull? Because each partition cannot be read by more than one consumer of a consumer group, the consumer can pull batches of messages without: • affecting processing order • affecting message distribution amongst consumers Batching up of messages improves compression and throughput.
  • 7. At-most-once. This means that a message will never be delivered more than once but messages might be lost. At-least-once. This means that we'll never lose a message but a message might end up being delivered to a consumer more than once. Exactly-once. The holy grail of messaging. All messages will be delivered exactly one time. Delivery vs Processing Delivered twice to be processed once. At-most-once At-least-once Message Acknowledgement Protocols
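The "delivered twice to be processed once" idea on this slide can be sketched as an idempotent consumer. This is only a minimal illustration, not anything shown in the talk: the class name `IdempotentConsumer`, the `message_id` field and the in-memory `seen_ids` set are all assumptions made for the example (a real system would need a durable deduplication store).

```python
class IdempotentConsumer:
    """Processes each message ID at most once, even if it is delivered repeatedly."""

    def __init__(self, handler):
        self.handler = handler
        self.seen_ids = set()  # hypothetical in-memory store; not durable

    def on_delivery(self, message_id, body):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False  # duplicate delivery: safe to acknowledge, skip processing
        self.handler(body)
        self.seen_ids.add(message_id)
        return True


processed = []
consumer = IdempotentConsumer(processed.append)

# At-least-once delivery: message 1 arrives twice.
deliveries = [(1, "a"), (2, "b"), (1, "a")]
results = [consumer.on_delivery(mid, body) for mid, body in deliveries]
```

With at-least-once delivery underneath, the handler still runs once per message ID, which is the sense in which delivery and processing guarantees differ.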
  • 10. Publisher Exchange Sends 10 messages (Seq No: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10) basic.ack: 6 multiple=true basic.ack: 10 multiple=true Publisher Confirms - basic.ack (all ok!) - basic.nack (error!) - basic.return + basic.ack (undeliverable!) Flags - Multiple (I am acknowledging multiple message deliveries) - Mandatory (give me a basic.return if you can’t deliver to any queues) RabbitMQ – Producer Side Acknowledgements Queue Routes 10 messages
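The effect of the Multiple flag on the publisher side can be sketched as simple bookkeeping over sequence numbers. The sketch below is a plain-Python model under stated assumptions, not a RabbitMQ client: `ConfirmTracker` and its methods are hypothetical names, and no broker connection is involved.

```python
class ConfirmTracker:
    """Tracks published messages until the broker confirms them."""

    def __init__(self):
        self.next_seq = 1
        self.unconfirmed = {}  # sequence number -> message body

    def publish(self, body):
        seq = self.next_seq
        self.unconfirmed[seq] = body
        self.next_seq += 1
        return seq

    def on_ack(self, seq, multiple):
        """Handle a basic.ack; multiple=True confirms every sequence number <= seq."""
        if multiple:
            confirmed = sorted(s for s in self.unconfirmed if s <= seq)
        else:
            confirmed = [seq] if seq in self.unconfirmed else []
        for s in confirmed:
            del self.unconfirmed[s]
        return confirmed

    def to_resend(self):
        """Messages still unconfirmed, e.g. after a connection failure."""
        return [self.unconfirmed[s] for s in sorted(self.unconfirmed)]


tracker = ConfirmTracker()
for i in range(1, 11):
    tracker.publish(f"msg-{i}")

first = tracker.on_ack(6, multiple=True)    # confirms 1..6
pending = sorted(tracker.unconfirmed)       # 7..10 still in flight
second = tracker.on_ack(10, multiple=True)  # confirms 7..10
```

Whatever remains in `unconfirmed` when a connection drops is what the publisher would resend, which is where producer-side duplicates come from.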
  • 11. Mandatory=true Mandatory=false Publisher Exchange Sends 10 messages Mandatory = false (Seq No: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10) basic.ack: 6 multiple=true basic.ack: 10 multiple=true RabbitMQ – Producer Side Acknowledgements Discards 10 messages X Publisher Exchange Sends 10 messages Mandatory = true (Seq No: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10) basic.return: 6 multiple=true + basic.ack: 6 multiple=true basic.return: 10 multiple=true + basic.ack: 10 multiple=true Discards 10 messages X
  • 12. Low # of Messages in Flight = Low Throughput, Low Message Duplication on Failure Large # of Messages in Flight = High Throughput, High Message Duplication on Failure Publisher Exchange 1000 messages in flight when connection fails RabbitMQ – Producer Side Duplication Publisher Exchange 1 message in flight when connection fails Resend 1000 messages 25% of the messages persisted to a queue Queue ends up with 250 duplicates Message was persisted to a queue but connection died before ack could be sent. Resend 1 message Queue ends up with 1 duplicate (Resent custom header)
  • 14. Consumer Acknowledgements - basic.ack (all ok, remove from the queue!) - basic.nack, redeliver=false (error, but remove anyway) - basic.nack, redeliver=true (error, please redeliver) - basic.reject (same as basic.nack but without multiple flag support) Acknowledgement Mode - Auto Ack (Push me messages as fast as you can!) - Manual Ack (I will explicitly tell you when a message can be removed from the queue) RabbitMQ – Consumer Side Acknowledgements Queue Consumer Pushes 10 messages Delivery tag: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 basic.ack: 1 multiple=false, basic.ack: 2 multiple=false basic.ack: 3 multiple=false, basic.ack: 4 multiple=false basic.ack: 5 multiple=false, basic.ack: 6 multiple=false basic.ack: 7 multiple=false, basic.ack: 8 multiple=false basic.ack: 9 multiple=false, basic.ack: 10 multiple=false
  • 15. Redelivered Flag Multiple Flag Flags - Multiple (I am acknowledging multiple messages) - Redelivered (This message is a redelivery) RabbitMQ – Consumer Side Acknowledgements Queue Consumer Pushes 10 messages Delivery tag: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 basic.ack: 6 multiple=true, basic.ack: 10 multiple=true Queue Consumer1. Pushes 1 message 2. basic.nack: 1 multiple=false redeliver=true Consumer3. Delivers message again with redelivered=true flag 4. basic.ack: 1 multiple=false
  • 16. Low # of Messages in Flight = Low Throughput, Low Message Duplication on Failure Large # of Messages in Flight = High Throughput, High Message Duplication on Failure 1000 messages in flight when connection fails RabbitMQ – Consumer Side Duplication 1 message in flight when connection fails Redeliver 1000 messages 25% of the messages were processed, but the connection failed before the acks could be sent 250 messages get processed twice Message was processed but connection died before ack could be sent. Redeliver 1 message 1 message gets processed twice Queue Queue Consumer Consumer
  • 18. RabbitMQ – The Broker Surviving Broker Restart - Durable Queue - Persistent Message Surviving Broker Loss - Queue Mirroring (Clustering) Broker Restart Queue Message Non-Durable Queue Non-Persistent Message Queue Message Durable Queue Non-Persistent Message Queue Message Mirrored Queue Persistent Message Total Broker Loss Queue Message Queue Message Queue Message Queue Message Durable Queue Persistent Message Queue Message
  • 19. Queue Mirror Queue Mirror Publisher ConsumerQueue Master
  • 20. RabbitMQ – The Broker – Queue Mirrors Broker 2 Queue A Master Broker 3Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror Queue C Master Queue C Mirror Queue D (unmirrored) Queue A ha-mode = all Queue B ha-mode = exactly ha-params = 3 Queue C ha-mode = exactly ha-params = 2
  • 21. RabbitMQ – The Broker – Queue Mirrors Broker 2 Queue A Master Broker 3Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror Queue C Master Queue C Master (Promoted) Queue D (unmirrored) Queue A ha-mode = all Queue B ha-mode = exactly ha-params = 3 Queue C ha-mode = exactly ha-params = 2 Queue C Mirror
  • 22. RabbitMQ – The Broker – Queue Mirrors Broker 2 Queue A Master Broker 3Broker 1 Queue A Mirror Queue A Mirror Queue B Master (Promoted) Queue B Master Queue B Mirror Queue C Master Queue C Master Queue D (unmirrored) Queue A ha-mode = all Queue B ha-mode = exactly ha-params = 3 Queue C ha-mode = exactly ha-params = 2 Queue C Mirror
  • 23. RabbitMQ – The Broker – Queue Mirrors Broker 2 Queue A Master Broker 3 Queue A Mirror Queue B Master Queue B Mirror Queue C Master Queue C Master Queue A ha-mode = all Queue B ha-mode = exactly ha-params = 3 Queue C ha-mode = exactly ha-params = 2 Broker 1 Queue A Mirror Queue B Mirror Queue C Mirror
  • 24. RabbitMQ – The Broker – Queue Mirrors Broker 2 Queue A Master Queue B Master Queue C Master Queue A ha-mode = all Queue B ha-mode = exactly ha-params = 3 Queue C ha-mode = exactly ha-params = 2 Broker 1 Queue A Mirror Queue B Mirror Queue C Mirror Broker 3 Queue A Mirror Queue B Mirror
  • 26. RabbitMQ – Queue Mirrors - Synchronization Broker 2 Queue A Master Broker 3Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual 1010 10 10 10 10 Three nodes, two mirrored queues each with 10 messages
  • 27. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual Broker 2 Queue A Master Broker 3Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror 1010 10 10 10 10 Broker 3 is lost
  • 28. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual Broker 3 comes back. Mirror A is automatically synchronized. Mirror B remains at 0 messages. Broker 2 Queue A Master Broker 3Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror 1010 10 10 10 0
  • 29. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual Each queue receives 10 more messages. Broker 2 is lost. Queue A fails over to mirror 3 without data loss. Broker 2 Queue A Master Broker 3Broker 1 Queue A Master Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror 2020 20 20 20 10
  • 30. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual ha-promote-on-failure = always Each queue receives 10 more messages. Broker 1 is lost. Queue B fails over to mirror 3 and loses 10 messages. Broker 2 Queue A Master Broker 3Broker 1 Queue A Mirror Queue A Master Queue B Mirror Queue B Master Queue B Master 3030 30 30 30 20
  • 31. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual ha-promote-on-failure = when-synced Alternate scenario: ha-promote-on-failure = when-synced Queue B does not fail over as mirror 3 is unsynchronized. Broker 2 Queue A Master Broker 3Broker 1 Queue A Mirror Queue A Master Queue B Mirror Queue B Master Queue B Mirror 3030 30 30 30 20
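The difference between the two ha-promote-on-failure scenarios can be sketched as a small decision function. This is a simplified model written for illustration, not RabbitMQ's actual election logic: the function `choose_new_master`, the broker names and the `(name, is_synchronized)` tuples are all assumptions, and the real broker applies further rules when picking among several mirrors.

```python
def choose_new_master(surviving_mirrors, promote_on_failure):
    """Pick a mirror to promote after the master is lost.

    surviving_mirrors: list of (broker_name, is_synchronized) tuples.
    promote_on_failure: "always" or "when-synced".
    Returns the broker name to promote, or None if the queue stays down.
    """
    synced = [name for name, is_synced in surviving_mirrors if is_synced]
    if synced:
        return synced[0]  # a synchronized mirror can take over without data loss
    if promote_on_failure == "always" and surviving_mirrors:
        # Availability is favoured: an unsynchronized mirror is promoted
        # and any messages it never received are lost.
        return surviving_mirrors[0][0]
    return None  # "when-synced": consistency is favoured, the queue is unavailable


# Queue B after brokers 1 and 2 are lost: only the unsynchronized mirror remains.
queue_b_mirrors = [("broker-3", False)]

with_always = choose_new_master(queue_b_mirrors, "always")
with_when_synced = choose_new_master(queue_b_mirrors, "when-synced")
```

The two results mirror the slides: with `always` the queue fails over and loses the unsynchronized messages, while with `when-synced` it does not fail over at all.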
  • 33. RabbitMQ – Queue Mirrors - Synchronization Broker 2 Queue A Master Broker 3Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual 100m100m 100m 100m 100m 100m Three nodes, two mirrored queues each with 100 million messages
  • 34. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual Broker 2 Queue A Master Broker 3Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror Broker 3 is lost 100m100m 100m 100m 100m 100m
  • 35. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual Broker 3 comes back. Queue A is unavailable due to synchronization. Queue B is available but mirror on broker 3 remains at 0 messages. Broker 2 Queue A Master Broker 3Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror 100m100m * 100m 100m 0
  • 36. RabbitMQ – Queue Mirrors - Synchronization Queue A ha-mode = all ha-sync-mode = automatic Queue B ha-mode = exactly ha-params = 3 ha-sync-mode = manual Queue A synchronization completes and the queue becomes available again. Broker 2 Queue A Master Broker 3Broker 1 Queue A Mirror Queue A Mirror Queue B Mirror Queue B Master Queue B Mirror 100m100m 100m 100m 100m 0
  • 37. Balancing Data Safety with High Throughput - Producers wait periodically for acknowledgements - Consumers group acknowledgements with the Multiple flag - Persistent messages - Cluster - Queue mirroring with 1 mirror* RabbitMQ Optimizing for High Throughput - Producers fire and forget - Consumers use auto-ack mode - Non-Persistent messages - Non-Mirrored Queues - Cluster for throughput Optimizing for Data Safety - Producers wait for acknowledgements after each message or after small number of messages - Consumers acknowledge each message individually - Persistent messages - Cluster for durability - Queue mirroring with 2+ mirrors. - ha-promote-on-failure=when-synced - ha-sync-mode=automatic for active queues
  • 38. RabbitMQ Network Partitions? Slow Network Links? Flaky Links? See Part 3…
  • 40. Apache Kafka – Replicated Partitions Broker 2 Partition 0 Leader Broker 3Broker 1 Partition 0 Follower Partition 0 Follower Each partition has a leader with 0 or more Followers. Producers send to leaders. Consumers consume from leaders. Followers exist for redundancy. Producer
  • 41. Apache Kafka – Producer Side Acknowledgements acks - No acknowledgement (fire and forget). Acks=0 - Leader has persisted the message. Acks=1 - Leader and all In-Sync Replicas have persisted the message. Acks=All Producer Config Other settings - retries - enable.idempotence (limits throughput) - max.in.flight.requests.per.connection - batch.size - max.request.size Broker/Topic Config Other settings - default.replication.factor - min.insync.replicas - unclean.leader.election.enable
  • 42. Retries + Multiple Requests In Flight = Message Duplication + Out of Order Messages. [Diagram, frame 1: Producer with Batch 1, Batch 2 and Batch 3 in flight to partitions P0, P1 and P2, read by consumer group C1, C2, C3.]
  • 43. Retries + Multiple Requests In Flight = Message Duplication + Out of Order Messages. [Diagram, frame 2: Batch 1, Batch 2 and Batch 3 moving between the Producer, partitions P0, P1 and P2, and consumers C1, C2, C3.]
  • 44. Retries + Multiple Requests In Flight = Message Duplication + Out of Order Messages. [Diagram, frame 3: Batch 4 and Batch 5 are sent while earlier batches are retried; batch labels now repeat and appear out of sequence across partitions P0, P1 and P2.]
  • 45. Retries + Multiple Requests In Flight = Message Duplication + Out of Order Messages. [Diagram, frame 4: batches 1 to 5 across partitions P0, P1 and P2 and consumers C1, C2, C3, with repeated and reordered batch labels.]
  • 46. Avoid Data Loss and Producer-Side Duplication With enable.idempotence = true: • enable.idempotence set to true • max.in.flight.requests.per.connection set to 5 or less • retries set to 1 or higher • acks set to ‘all’. Producers can retry until they succeed while avoiding message duplication.
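To make the idempotence idea on the slide above concrete, here is a minimal sketch of how a partition leader could discard retried batches by tracking a sequence number per producer. This is a simplified toy model, not Kafka's actual implementation; the names `PartitionLog`, `producer_id` and `seq` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy partition leader that de-duplicates by (producer_id, sequence)."""
    records: list = field(default_factory=list)
    last_seq: dict = field(default_factory=dict)  # producer_id -> last appended sequence

    def append(self, producer_id: int, seq: int, batch: str) -> str:
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            # A retry of a batch that was already written: do not append again.
            return "duplicate"
        if seq != last + 1:
            # A gap means an earlier batch is missing: reject to preserve order.
            return "out_of_order"
        self.records.append(batch)
        self.last_seq[producer_id] = seq
        return "appended"


log = PartitionLog()
results = [
    log.append(producer_id=1, seq=0, batch="Batch 1"),
    log.append(producer_id=1, seq=2, batch="Batch 3"),  # Batch 2 was lost in flight
    log.append(producer_id=1, seq=1, batch="Batch 2"),  # retry of Batch 2
    log.append(producer_id=1, seq=1, batch="Batch 2"),  # ack was lost, retried again
    log.append(producer_id=1, seq=2, batch="Batch 3"),  # retry of Batch 3
]
print(results)
print(log.records)
```

In this model the producer can retry as often as it likes: the log ends up with each batch once and in sequence order, which is the behaviour the slide attributes to enable.idempotence = true.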
  • 47. Apache Kafka – The Broker – Replicas. Topic with: - 4 partitions - Replication factor = 3. [Diagram labels across Broker 1, Broker 2 and Broker 3: Partition 0 Leader, Partition 0 Follower, Partition 0 Follower; Partition 1 Follower, Partition 1 Leader, Partition 1 Follower; Partition 2 Leader, Partition 2 Follower, Partition 2 Follower; Partition 3 Leader, Partition 3 Follower, Partition 3 Follower.]
  • 48. Apache Kafka – The Broker – Replicas. Topic with: - 4 partitions - Replication factor = 3. [Diagram labels across Broker 1, Broker 2 and Broker 3: Partition 0 Leader, Partition 0 Follower, Partition 0 Follower; Partition 1 Follower, Partition 1 Leader, Partition 1 Follower; Partition 2 Leader, Partition 2 Leader (promoted), Partition 2 Follower; Partition 3 Leader, Partition 3 Follower, Partition 3 Follower.]
  • 49. Apache Kafka – The Broker – Replicas. Topic with: - 4 partitions - Replication factor = 3. [Diagram labels across Broker 1, Broker 2 and Broker 3: Partition 0 Leader, Partition 0 Follower, Partition 0 Follower; Partition 1 Leader (promoted), Partition 1 Leader, Partition 1 Follower; Partition 2 Leader, Partition 2 Leader, Partition 2 Follower; Partition 3 Leader, Partition 3 Follower, Partition 3 Follower.]
  • 50. Apache Kafka – The Broker – Replicas. Topic with: - 4 partitions - Replication factor = 3. [Diagram labels across Broker 1, Broker 2 and Broker 3: Partition 0 Leader, Partition 0 Follower, Partition 0 Follower; Partition 1 Leader, Partition 1 Follower, Partition 1 Follower; Partition 2 Leader, Partition 2 Leader, Partition 2 Follower; Partition 3 Leader, Partition 3 Follower, Partition 3 Follower.]
  • 51. Apache Kafka – The Broker – Replicas. Topic with: - 4 partitions - Replication factor = 3. [Diagram labels: Broker 2 and Broker 1 share Partition 0 Leader, Partition 0 Follower, Partition 1 Leader, Partition 1 Follower, Partition 2 Leader, Partition 2 Follower, Partition 3 Leader, Partition 3 Follower; Broker 3 holds Partition 0 Follower, Partition 1 Follower, Partition 2 Follower, Partition 3 Follower.]
  • 52. Apache Kafka – The Broker – Replicas. Partition Rebalancing. Option 1: auto.leader.rebalance.enable=true. Option 2: Rebalance leaders manually with kafka-preferred-replica-election.sh. [Diagram labels: Broker 2 and Broker 1 share Partition 0 Leader, Partition 0 Follower, Partition 1 Follower, Partition 1 Leader, Partition 2 Follower, Partition 2 Follower, Partition 3 Leader, Partition 3 Follower; Broker 3 holds Partition 0 Follower, Partition 1 Follower, Partition 2 Leader, Partition 3 Follower.]
  • 53. Apache Kafka – The Broker – The ISR. In-Sync Replica Set (ISR): - The leader + the followers that are up to date with the leader - A follower is removed from the ISR when either: - It has not sent any fetch requests to the leader within the replica.lag.time.max.ms time period - It has not been up to date with the leader for at least the replica.lag.time.max.ms period - Followers send fetch requests to the leader at an interval of replica.fetch.wait.max.ms, which should be lower than replica.lag.time.max.ms
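The two removal rules above can be sketched as a small membership check. This is only an illustrative model under stated assumptions: the `Follower` class, its timestamp fields and `compute_isr` are hypothetical names, and the real broker logic has more detail than this.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    broker_id: int
    last_fetch_ms: int      # last time the follower sent a fetch request
    last_caught_up_ms: int  # last time the follower was fully up to date


def compute_isr(leader_id, followers, now_ms, replica_lag_time_max_ms=10_000):
    """Return broker ids in the ISR: the leader plus followers within the lag limit."""
    isr = [leader_id]
    for f in followers:
        fetched_recently = now_ms - f.last_fetch_ms <= replica_lag_time_max_ms
        caught_up_recently = now_ms - f.last_caught_up_ms <= replica_lag_time_max_ms
        if fetched_recently and caught_up_recently:
            isr.append(f.broker_id)
    return isr


# Broker 3 is 8 seconds behind: still inside the 10 second limit.
early = [Follower(1, 59_500, 59_000), Follower(3, 59_500, 52_000)]
isr_early = compute_isr(2, early, now_ms=60_000)

# Later, broker 3 is 22 seconds behind and drops out of the ISR.
late = [Follower(1, 73_500, 73_000), Follower(3, 73_500, 52_000)]
isr_late = compute_isr(2, late, now_ms=74_000)

print(isr_early, isr_late)
```

The numbers mirror the lagging-follower scenario in the deck: a follower that is 8 seconds behind stays in the ISR with a 10 second limit, and one that is 22 seconds behind does not.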
  • 54. Apache Kafka – The Broker – The ISR. Topic with: - 2 partitions - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500. One message a second arriving at each partition. Partitions in broker 3 lagging, but still within the 10 second limit. [Diagram: Partition 0 leader on Broker 2, Partition 0 and Partition 1 replicas across Brokers 1 to 3; follower lags -1 and 0 (0:01, 0:00) and, on Broker 3, -8 and -9 (0:08, 0:09).]
  • 55. Apache Kafka – The Broker – The ISR. Topic with: - 2 partitions - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500. Broker 3 seems to have an issue. Its partitions have been out-of-sync for more than 10 seconds and are no longer in the ISR. [Diagram: Partition 0 leader on Broker 2, Partition 0 and Partition 1 replicas across Brokers 1 to 3; follower lags 0 and -1 (0:00, 0:01) and, on Broker 3, -22 and -19 (0:22, 0:19).]
  • 56. Apache Kafka – Low Latency, High Availability Optimizing for Low Latency and High Availability Acks = 1 unclean.leader.election.enable = true
  • 57. Apache Kafka – Low Latency, High Availability. Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - unclean.leader.election.enable = true. Producer sends 1 message with acks = 1; the leader persists the message, then sends an ack. Ack: +0 ms due to replicas. [Diagram: Partition 0 leader on Broker 2, followers on Broker 1 and Broker 3 with lags -1 and -8 (0:01, 0:08).]
  • 58. Apache Kafka – Low Latency, High Availability. Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - unclean.leader.election.enable = true. Leader broker fails before followers fetch the message. 1 message lost in fail-over. Producer: connection lost. [Diagram: a follower has taken over as Partition 0 leader; lag labels -8, 0:01, 0:08.]
  • 59. Apache Kafka – Low Latency, High Availability. Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - unclean.leader.election.enable = true. Producer establishes a connection to broker 1 and sends 1 message with acks = 1. Ack: +0 ms due to replicas. [Diagram: remaining follower lag -8 (0:08).]
  • 60. Apache Kafka – Low Latency, High Availability. Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - unclean.leader.election.enable = true. Broker 3 falls behind and is removed from the ISR. Producer sends 1 message with acks = 1. Ack: +0 ms due to replicas. [Diagram: Broker 3 follower lag -14 (0:14).]
  • 61. Apache Kafka – Low Latency, High Availability. Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - unclean.leader.election.enable = true. Broker 1 fails. Unclean leader election allows the Broker 3 partition, which is not a member of the ISR, to be elected leader. Fail-over loses 15 acknowledged messages, but the partition remains available. Producer sends 1 message with acks = 1: connection error.
  • 62. Apache Kafka – Low Latency, High Availability. Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - unclean.leader.election.enable = true. Producer uses an alternate node in bootstrap.servers to find the new partition leader. Fail-over produces message loss. Producer sends 1 message with acks = 1: Ack.
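The fail-over choice walked through in slides 57 to 62 can be sketched as a small election function. This is a toy model for illustration only: `elect_leader` and its parameters are hypothetical names, and the real controller logic is more involved.

```python
def elect_leader(replicas, isr, failed, unclean_leader_election=False):
    """Pick a new leader for one partition.

    replicas: broker ids holding a replica, in preference order
    isr:      broker ids currently in the in-sync replica set
    failed:   broker ids that are down
    Returns the new leader id, or None if the partition must stay offline.
    """
    live = [r for r in replicas if r not in failed]
    for broker in live:
        if broker in isr:
            return broker  # clean election: no acknowledged data is lost
    if unclean_leader_election and live:
        # Availability over safety: an out-of-sync replica takes over,
        # so anything it never fetched is lost.
        return live[0]
    return None


# Slide 61 scenario: brokers 2 and 1 have failed, broker 3 is outside the ISR.
available = elect_leader([2, 1, 3], isr={1}, failed={2, 1}, unclean_leader_election=True)
offline = elect_leader([2, 1, 3], isr={1}, failed={2, 1}, unclean_leader_election=False)
print(available, offline)
```

With the flag enabled the partition stays writable at the cost of the messages broker 3 had not yet replicated; with it disabled the partition waits for an in-sync replica to return.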
  • 63. Apache Kafka – Data Safety, Higher Latency. Optimizing for Data Safety (Increased Latency, Lower Availability): acks = all, replication.factor = 3, min.insync.replicas = 2 (a quorum, (n+1)/2)
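A short sketch of how these three settings interact, assuming the simplified rule that min.insync.replicas is only enforced for acks = all. The function names are hypothetical, and the quorum helper uses `n // 2 + 1`, which equals the slide's (n+1)/2 for an odd replication factor.

```python
def majority_quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas; equals (n+1)/2 when n is odd."""
    return replication_factor // 2 + 1


def can_accept_write(acks: str, isr_size: int, min_insync_replicas: int) -> bool:
    """Toy model of whether the leader accepts a produce request."""
    if acks == "all":
        # Too few in-sync replicas: the producer sees a NotEnoughReplicas error.
        return isr_size >= min_insync_replicas
    # acks = 0 or 1 only needs a live leader.
    return isr_size >= 1


quorum = majority_quorum(3)
healthy = can_accept_write("all", isr_size=2, min_insync_replicas=quorum)
degraded = can_accept_write("all", isr_size=1, min_insync_replicas=quorum)
leader_only = can_accept_write("1", isr_size=1, min_insync_replicas=quorum)
print(quorum, healthy, degraded, leader_only)
```

This is the trade-off named in the slide title: with one in-sync replica left, acks = all writes are refused (lower availability) rather than accepted with a single copy.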
  • 64. Apache Kafka – Data Safety, Lower Availability. Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - min.insync.replicas = 2. Broker 1 averaging 0.25 seconds lag. Broker 3 averaging 4 seconds lag. Producer sends 1 message with acks = all. [Diagram: Partition 0 leader on Broker 2; follower lags -1 and -4 (0:00, 0:04).]
  • 65. Apache Kafka – Data Safety, Lower Availability. Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - min.insync.replicas = 2. Broker 3 averaging 4 seconds lag. Ack: +4000 ms due to replicas. [Diagram: Partition 0 leader on Broker 2; follower lags 0 and 0 (0:00, 0:00).]
  • 66. Apache Kafka – Data Safety, Lower Availability. Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - min.insync.replicas = 2. Broker 3 removed from ISR. Still two replicas in ISR. Producer sends 1 message with acks = all. Ack: +250 ms due to replicas. [Diagram: Partition 0 leader on Broker 2; follower lags 0 and -14 (0:00, 0:14).]
  • 67. Apache Kafka – Data Safety, Lower Availability. Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - min.insync.replicas = 2. Broker 2 is lost. Producer sends 1 message with acks = all: connection error. [Diagram: follower lags 0 and -14 (0:00, 0:14).]
  • 68. Apache Kafka – Data Safety, Lower Availability. Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - min.insync.replicas = 2. Partition 0 fails over to Broker 1 without message loss. Partition 0 will not accept more messages as the ISR has only 1 node. Producer sends 1 message with acks = all: NotEnoughReplicas. [Diagram: Broker 3 follower lag -14 (0:14).]
  • 69. Apache Kafka – Data Safety, Lower Availability. Topic with: - Replication factor = 3 - replica.lag.time.max.ms = 10000 - replica.fetch.wait.max.ms = 500 - min.insync.replicas = 2. Broker 3 catches up. Producer retries 1 message with acks = all and receives an ack. [Diagram: Broker 3 follower lag 0 (0:00).]
  • 70. Apache Kafka – Consumer Offset Tracking. Consumer offset commits: - Auto-commit periodically - Manual commit. Optimizing for throughput: - Auto-commit (long period). Optimizing for duplicate read avoidance: - Auto-commit (short period) - Manual commit. [Diagram: Consumer 1 reads a batch at an offset in a partition with offsets 1 to 7, then commits the offset.]
  • 71. Apache Kafka – Consumer Side Duplication. Low # of messages in flight = low throughput, low message duplication on failure. Large # of messages in flight = high throughput, high message duplication on failure. Example with 1000 messages in flight when the consumer fails: 25% of the messages are processed, but the application fails before the offset is committed; the replacement application fetches 1000 messages from the same offset, so 250 messages get processed a second time. Example with 1 message in flight when the consumer fails: the message is processed, but the application fails before the offset is committed; the replacement consumer fetches 1 message from the same offset, so 1 message gets processed a second time.
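The arithmetic in the two examples above can be reproduced with a small simulation. It is a sketch under simplifying assumptions (one partition, offsets committed only after a whole batch is processed); `run_with_crash` is a hypothetical helper, not part of any Kafka client.

```python
from collections import Counter


def run_with_crash(num_messages: int, batch_size: int, crash_after: int) -> int:
    """Return how many messages are processed twice when the first consumer
    crashes after processing `crash_after` messages of its first batch,
    before committing the offset."""
    processed = Counter()
    committed = 0  # last committed offset

    # First consumer: fetch a batch, process part of it, crash before commit.
    first_batch = range(committed, min(committed + batch_size, num_messages))
    for offset in list(first_batch)[:crash_after]:
        processed[offset] += 1

    # Replacement consumer: resume from the last committed offset.
    while committed < num_messages:
        batch = range(committed, min(committed + batch_size, num_messages))
        for offset in batch:
            processed[offset] += 1
        committed = batch.stop  # commit after the whole batch is processed

    return sum(1 for count in processed.values() if count > 1)


large_batch = run_with_crash(num_messages=2000, batch_size=1000, crash_after=250)
single_message = run_with_crash(num_messages=2000, batch_size=1, crash_after=1)
print(large_batch, single_message)
```

Larger batches amortize fetch and commit overhead but widen the window of uncommitted work, which is the throughput versus duplication trade-off the slide describes.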
  • 72. Apache Kafka Network Partitions? Slow Network Links? Flaky Links? See Part 3…
  • 73. RabbitMQ vs Kafka. RabbitMQ: - Fire-and-forget - Publisher Confirms - Replica Synchronization Can Cause Unavailability - Consumer Side Redelivered flag - Synchronous Replication. Kafka: - Fire-and-forget - Leader Only - All ISR - Availability During Replica Synchronization - Producer Side Idempotency - Synchronous/Asynchronous Replication. Both: - Tunable Consistency vs Availability vs Latency vs Throughput - Configurable Redundancy - Message Acknowledgements
  • 74. Thank you! Questions? Jack Vanlightly. [Event banner: 12 November 2018, London, UK, with keynotes and speakers from Goldman Sachs, Pivotal, Wunderlist/Microsoft, Erlang Solutions, CloudAMQP and more.]