6. 8
If Kafka is a critical piece of our pipeline
Can we be 100% sure that our data will get there?
Can we lose messages?
How do we verify?
Whose fault is it?
7. 10
Distributed Systems
Things Fail
Systems are designed to
tolerate failure
We must expect failures
and design our code and
configure our systems to
handle them
8. 11
[Diagram: data flow from client machine to broker machine. Application thread → Kafka client → O/S socket buffer → NIC → network → NIC → O/S socket buffer → broker → page cache → disk, plus replication to other brokers; an ack or exception flows back to the client callback. ✗ marks show that every hop can fail.]
9. 12
[Diagram: the same client machine → network → broker machine path, now with two flows: produced data into the broker, and consumer offsets committed to Kafka or ZooKeeper. ✗ marks again show the failure points on every hop, including ZooKeeper.]
10. 13
Kafka is super reliable.
Stores data on disk. Replicated.
… if you know how to configure it
that way.
11. 14
Replication is your friend
Kafka protects against failures by replicating data
The unit of replication is the partition
One replica is designated as the Leader
Follower replicas fetch data from the leader
The leader holds the list of “in-sync” replicas
13. 16
ISR
2 things make a replica in-sync
• Lag behind leader
• replica.lag.time.max.ms – replica that didn’t fetch or is behind
• replica.lag.max.messages – removed in 0.9
• Connection to Zookeeper
14. 17
Terminology
Acked
• Producers will not retry sending.
• Depends on producer setting
Committed
• Only when the message has reached all ISRs
(so any future leader has it).
• Consumers can read.
• replica.lag.time.max.ms controls: how long can a
dead replica prevent consumers from reading?
Committed Offsets
• Consumer told Kafka the latest offsets it read. By
default the consumer will not see these events
again.
15. 18
Replication
Acks = all
• Waits for all in-sync replicas to reply.
[Diagram: Replicas 1, 2, and 3 all have message 100 over time - fully in sync.]
16. 19
Replica 3 stopped replicating for some reason
Replication
[Diagram: Replica 3 is stuck at message 100, while Replicas 1 and 2 have 100 and 101. Message 100 is acked under acks=all and "committed"; message 101 is acked under acks=1 but not "committed".]
20. 23
Replication
If Replica 2 or 3 come back online before the leader, you will lose data.
[Diagram: Replica 3 has message 100; Replica 2 has 100-101; Replica 1, the leader, has 100-104. All of these messages are "acked" and "committed".]
21. 24
So what to do
Disable Unclean Leader Election
• unclean.leader.election.enable = false
• Default from 0.11.0
Set replication factor
• default.replication.factor = 3
Set minimum ISRs
• min.insync.replicas = 2
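Taken together, these broker settings look like the following sketch of a server.properties fragment (values as on the slide; the comments are mine):

```
# Broker-side durability settings (server.properties)
unclean.leader.election.enable=false   # never promote an out-of-sync replica
default.replication.factor=3           # new topics get three replicas
min.insync.replicas=2                  # acks=all requires at least two in-sync replicas
```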
22. 25
Warning
min.insync.replicas is applied at the topic level.
If the topic was created before the server-level change, you must alter the topic configuration manually.
Before 0.9.0 you always had to alter the topic manually (KAFKA-2114).
27. 30
Producer Internals
Producer sends batches of messages to a buffer
[Diagram: application threads call send(); messages M0-M3 are collected into batches in the producer's buffer; batches are drained and sent to the broker; the response (metadata or exception) completes the Future and fires the callback, and failed batches are retried.]
28. 31
Basics
Durability can be configured with the producer configuration request.required.acks (acks in the new producer)
• 0 The message is written to the network (buffer)
• 1 The message is written to the leader
• all The producer gets an ack after all ISRs receive the data; the message is committed
Make sure the producer doesn’t just throw messages away!
• block.on.buffer.full = true < 0.9.0
• max.block.ms = Long.MAX_VALUE
• Or handle the BufferExhaustedException / TimeoutException yourself
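As a rough sketch of these producer settings in Java (the class name, helper method, and broker address are illustrative, not from the slides):

```java
import java.util.Properties;

public class DurableProducerConfig {
    // Producer settings that favor durability, per the slide:
    // acks=all, block instead of dropping, retry transient failures.
    public static Properties durableProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("acks", "all");                         // wait for all in-sync replicas
        props.put("max.block.ms", String.valueOf(Long.MAX_VALUE)); // block, don't throw, on full buffer
        props.put("retries", String.valueOf(Integer.MAX_VALUE));   // keep retrying transient errors
        props.put("max.in.flight.requests.per.connection", "1");   // avoid reordering on retry
        return props;
    }

    public static void main(String[] args) {
        // With kafka-clients on the classpath you would pass these to
        // new KafkaProducer<>(durableProps(), keySerializer, valueSerializer).
        System.out.println(durableProps().getProperty("acks"));
    }
}
```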
29. 32
“New” Producer
All calls are non-blocking async
2 Options for checking for failures:
• Immediately block for response: send().get()
• Do followup work in Callback, close producer after error threshold
• Be careful about buffering these failures. Future work? KAFKA-1955
• Don’t forget to close the producer! producer.close() will block until in-flight requests complete
retries (producer config) defaults to 0
Multiple in-flight requests plus retries can lead to message re-ordering
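The two failure-checking options can be illustrated with a stand-in for send() built on Java's CompletableFuture (this simulates the producer's async response; it is not the Kafka API itself):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class SendPatterns {
    static final ExecutorService ioPool = Executors.newSingleThreadExecutor();

    // Stand-in for producer.send(): completes asynchronously with a fake
    // "offset" (here just the message length).
    static CompletableFuture<Long> send(String msg) {
        return CompletableFuture.supplyAsync(() -> (long) msg.length(), ioPool);
    }

    // Option 1: block immediately for the response, like send().get().
    static long blockingSend(String msg) throws Exception {
        return send(msg).get();
    }

    public static void main(String[] args) throws Exception {
        long offset = blockingSend("hello");

        // Option 2: attach a callback and count failures toward a threshold.
        AtomicInteger errors = new AtomicInteger();
        send("world").whenComplete((meta, ex) -> {
            if (ex != null) errors.incrementAndGet();
        }).get(); // wait here only so the demo is deterministic

        System.out.println(offset + " errors=" + errors.get());
        ioPool.shutdown(); // like producer.close(): stop outstanding work cleanly
    }
}
```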
31. 34
Consumer
Three choices for Consumer API
• Simple Consumer
• High Level Consumer (ZookeeperConsumer)
• New KafkaConsumer
32. 35
New Consumer – auto commit
Properties props = new Properties(); // plus bootstrap.servers and deserializers
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "10000");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        processAndUpdateDB(record);
    }
}
Commit automatically every 10 seconds.
What if we crash after 8 seconds?
33. 36
New Consumer – manual commit
Properties props = new Properties(); // plus bootstrap.servers and deserializers
props.put("enable.auto.commit", "false");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        processAndUpdateDB(record);
    }
    consumer.commitSync();
}
Commit the entire batch outside the loop!
34. 43
Minimize Duplicates for At Least Once Consuming
1. Commit your own offsets - set enable.auto.commit = false
2. Use Rebalance Listener to limit duplicates
3. Make sure you commit only what you are done processing
4. Note: New consumer is single threaded – one consumer per thread.
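A minimal sketch of point 3, tracking "commit only what you are done processing" per partition (the class and method names are mine; in a real consumer you would hand the snapshot to commitSync and to a ConsumerRebalanceListener's onPartitionsRevoked):

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetTracker {
    // Next offset to commit per partition. Kafka commit semantics are
    // "last processed + 1", so a restart resumes after the processed record.
    private final Map<Integer, Long> toCommit = new HashMap<>();

    public void processed(int partition, long offset) {
        toCommit.put(partition, offset + 1);
    }

    // Snapshot to pass to a commit call; copying avoids committing
    // offsets that are updated mid-commit by the processing loop.
    public Map<Integer, Long> snapshotForCommit() {
        return new HashMap<>(toCommit);
    }

    public static void main(String[] args) {
        OffsetTracker t = new OffsetTracker();
        t.processed(0, 41);
        t.processed(1, 99);
        t.processed(0, 42); // later record on partition 0
        System.out.println(t.snapshotForCommit()); // partition 0 -> 43, partition 1 -> 100
    }
}
```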
35. 44
Exactly Once Semantics
At most once is easy
At least once is not bad either – commit after 100% sure data is safe
Exactly once is tricky
• Commit data and offsets in one transaction
• Idempotent producer
Kafka Connect – many connectors (especially Confluent’s) are exactly once
by using an external database to write events and store offsets in one transaction
Kafka Streams – starting in 0.11.0 has easy-to-configure exactly once
(processing.guarantee = exactly_once).
Other stream processing systems – have their own thing.
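The "commit data and offsets in one transaction" idea can be sketched with an in-memory stand-in for the external store (the class is illustrative; a real sink would use a database transaction):

```java
import java.util.HashMap;
import java.util.Map;

public class ExactlyOnceSink {
    // Results and the last-applied offset live in the same "store" and are
    // updated together, standing in for a single DB transaction.
    private final Map<String, String> results = new HashMap<>();
    private long committedOffset = -1;

    // Returns false when the record was already applied (a redelivery).
    public synchronized boolean apply(long offset, String key, String value) {
        if (offset <= committedOffset) {
            return false;          // duplicate delivery: skip, exactly-once preserved
        }
        results.put(key, value);   // write the result...
        committedOffset = offset;  // ...and the offset in the same step
        return true;
    }

    public synchronized long committedOffset() { return committedOffset; }
    public synchronized String get(String key) { return results.get(key); }

    public static void main(String[] args) {
        ExactlyOnceSink sink = new ExactlyOnceSink();
        sink.apply(0, "k", "v1");
        sink.apply(1, "k", "v2");
        boolean applied = sink.apply(1, "k", "v2"); // redelivery after a crash
        System.out.println(applied + " " + sink.get("k") + " " + sink.committedOffset());
        // prints: false v2 1
    }
}
```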
36. 47
How do we test Kafka?
"""Replication tests.
These tests verify that replication provides simple durability guarantees by checking that data
acked by brokers is still available for consumption in the face of various failure scenarios.
Setup: 1 zk, 3 kafka nodes, 1 topic with partitions=3, replication-factor=3, and min.insync.replicas=2
- Produce messages in the background
- Consume messages in the background
- Drive broker failures (shutdown, or bounce repeatedly with kill -15 or kill -9)
- When done driving failures, stop producing, and finish consuming
- Validate that every acked message was consumed
"""
37. 48
Monitoring for Data Loss
• Monitor for producer errors – watch the retry numbers
• Monitor consumer lag – MaxLag or via offsets
• Standard schema:
• Each message should contain timestamp and originating service and host
• Each producer can report message counts and offsets to a special topic
• “Monitoring consumer” reports message counts to another special topic
• “Important consumers” also report message counts
• Reconcile the results
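A sketch of the reconciliation step: compare per-topic counts reported by producers against counts seen by the monitoring consumer (the class name and the numbers are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class CountReconciler {
    // Per-topic message counts reported by producers and by the
    // monitoring consumer; any shortfall flags potential data loss.
    public static Map<String, Long> missing(Map<String, Long> produced,
                                            Map<String, Long> consumed) {
        Map<String, Long> gaps = new HashMap<>();
        for (Map.Entry<String, Long> e : produced.entrySet()) {
            long seen = consumed.getOrDefault(e.getKey(), 0L);
            if (seen < e.getValue()) {
                gaps.put(e.getKey(), e.getValue() - seen);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        Map<String, Long> produced = new HashMap<>();
        produced.put("clicks", 1000L);
        produced.put("orders", 500L);
        Map<String, Long> consumed = new HashMap<>();
        consumed.put("clicks", 997L);
        consumed.put("orders", 500L);
        System.out.println(missing(produced, consumed)); // {clicks=3}
    }
}
```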
38. 49
Be Safe, Not Sorry
acks = all
max.block.ms = Long.MAX_VALUE
retries = MAX_INT
( max.in.flight.requests.per.connection = 1 )
Call producer.close()
replication.factor >= 3
min.insync.replicas = 2
unclean.leader.election.enable = false
enable.auto.commit = false
Commit only after processing
Monitor!
Apache Kafka is no longer just pub-sub messaging. Because of its persistence and reliability, it makes a great place to manage general streams of events and to drive streaming applications.
We are going to start by discussing reliability guarantees as implemented by the broker’s replication protocol. We then discuss how to configure the clients for better reliability. We’ll use Java clients as an example. For non-Java clients: The C client (librdkafka) works pretty much the same way – the same configurations and guarantees will work. Same for clients in other languages based on librdkafka. For other clients it’s hard to make generalizations. Some are very different, and the advice in this talk will not work for them.
Low Level Diagram: Not talking about producer / consumer design yet…maybe this is too low-level though
Show diagram of network send -> os socket -> NIC -> ---- NIC -> Os socket buffer -> socket -> internal message flow / socket server -> response back to client -> how writes get persisted to disk including os buffers, async write etc
Then overlay places where things can fail.
Highlight boxes with different color
When Replica 3 is back, it will catch up
Kafka exposes its binary TCP protocol via a Java API, which is what we’ll be discussing here.
So everything in the box is what’s happening inside the producer. Generally speaking, you have an application thread, or threads, that take individual messages and “send” them to Kafka. What happens under the covers is that these messages are batched up where possible to amortize the overhead of the send, stored in a buffer, and communicated over to Kafka. After Kafka has completed its work, a response is returned for each message. This happens asynchronously, using Java’s concurrent API.
This response is comprised of either an exception or a metadata record. If metadata is returned (it contains the offset, partition, and topic), then things are good and we continue processing. However, if an error is returned, the producer will automatically retry the failed message, up to a configurable number of retries or amount of time. When this exception occurs and we have retries enabled, these retries go right back to the start of the batches being prepared to send to Kafka.
Commit every 10 seconds, but we don’t really have any control over what’s processed, and this can lead to duplicates
Note: if you are doing too much work between polls, commits don’t count as heartbeats.
So let’s say we have auto-commit enabled, and we are chugging along, counting on the consumer to commit our offsets for us. This is great because we don’t have to code anything, and don’t have to think about the frequency of commits and the impact that might have on our throughput. Life is good. But now we’ve lost a thread or a process. And we don’t really know where we are in the processing, because the last auto-commit committed stuff that we hadn’t actually written to disk.
So now we’re in a situation where we think we’ve read all of our data, but we will have gaps in the data. Note the same risk applies if we lose a partition or broker and get a new leader.