2.
But First…What’s NEW???
• Released 0.9.0 in late November
• 87 Contributors, 523 JIRAs, Bunch o’ new Features.
• Security!
• Kerberos/SASL Authentication
• Authorization Plugin
• SSL
• Kafka Connect
• Quotas
• New Consumer****
3.
Kafka
• High Throughput
• Low Latency
• Scalable
• Centralized
• Real-time
4.
“If data is the lifeblood of high technology, Apache
Kafka is the circulatory system”
--Todd Palino
Kafka SRE @ LinkedIn
5.
If Kafka is a critical piece of our pipeline
Can we be 100% sure that our data will get there?
Can we lose messages?
How do we verify?
Whose fault is it?
6.
Distributed Systems
Things fail.
Systems are designed to tolerate failure.
We must expect failures and design our code and configure our systems to handle them.
7.
Data Flow
[Diagram: producer data path across the network — Application Thread → Kafka Client (async send with callback) → O/S Socket Buffer → NIC → Network → Broker NIC → O/S Socket Buffer → Broker → Page Cache → Disk. Data flows to the broker and an ack or exception flows back; ✗ marks the points where the message can be lost along the way.]
8.
Data Flow
[Diagram: consumer data path across the network — on the Broker Machine, Disk → Page Cache (with a possible cache miss back to disk) → Broker → O/S Socket Buffer → NIC; then across the Network to the Client Machine: NIC → O/S Socket Buffer → Kafka Client → Application Thread. Consumed offsets are stored in ZooKeeper or Kafka; ✗ marks the points where failures can occur.]
9.
Replication is your friend
Kafka protects against failures by replicating data
The unit of replication is the partition
One replica is designated as the Leader
Follower replicas fetch data from the leader
The leader holds the list of “in-sync” replicas
10.
Replication and ISRs
[Diagram: a producer writing to topic my_topic (3 partitions, 3 replicas), with partitions 0, 1, and 2 replicated across Brokers 100, 101, and 102.]
Topic: my_topic, Partitions: 3, Replicas: 3
• Partition 1 — Leader: 101, ISR: 100,102
• Partition 2 — Leader: 102, ISR: 101,100
• Partition 0 — Leader: 100, ISR: 101,102
11.
ISR
• Two things make a replica in-sync:
• Lag behind the leader
• replica.lag.time.max.ms – a replica that hasn’t fetched, or has been behind, for this long is dropped
• replica.lag.max.messages – goes away in 0.9
• Connection to Zookeeper
12.
Terminology
• Acked
• The producer will not retry sending the message.
• Depends on the producer’s acks setting.
• Committed
• Consumers can read the message.
• Only when the message got to all in-sync replicas (ISR).
• replica.lag.time.max.ms
• How long can a dead replica prevent consumers from reading?
13.
Replication
• Acks = all
• Only waits for in-sync replicas to reply.
[Diagram: Replicas 1, 2, and 3 each hold message 100.]
14.
Replication
• Replica 3 stopped replicating for some reason.
[Diagram: Replicas 1 and 2 hold messages 100 and 101; Replica 3 only holds 100. Message 100 is acked under acks = all and is “committed”; message 101 is acked under acks = 1 but is not “committed”.]
15.
Replication
• One replica drops out of the ISR, or goes offline.
• All messages are now acked and committed.
[Diagram: with Replica 3 out of the ISR, Replicas 1 and 2 both hold messages 100 and 101.]
16.
Replication
• The 2nd replica drops out, or is offline.
[Diagram: only the leader (Replica 1) remains in the ISR; it holds messages 100–104 while Replica 2 stops at 101 and Replica 3 at 100.]
17.
Replication
• Now we’re in trouble.
[Diagram: the leader (Replica 1), the only replica holding messages 102–104, fails (✗).]
18.
Replication
• If Replica 2 or 3 comes back online before the leader, you will lose data.
[Diagram: messages 102–104 exist only on the failed leader (Replica 1), yet all of them were “acked” and “committed”.]
19.
So what to do
• Disable Unclean Leader Election
• unclean.leader.election.enable = false
• Set replication factor
• default.replication.factor = 3
• Set minimum ISRs
• min.insync.replicas = 2
20.
Warning
• min.insync.replicas is applied at the topic level.
• Topics created before the server-level change must have their configuration altered manually.
• Before 0.9.0 the topic must always be altered manually (KAFKA-2114).
21.
Replication
• Replication factor = 3
• Min ISR = 2
[Diagram: Replicas 1, 2, and 3 each hold message 100.]
22.
Replication
• One replica drops out of the ISR, or goes offline.
[Diagram: Replicas 1 and 2 hold messages 100 and 101; Replica 3 stops at 100.]
23.
Replication
• The 2nd replica fails out, or is out of sync.
[Diagram: the leader (Replica 1) holds messages 100–104; Replica 2 stops at 101 and Replica 3 at 100.]
24.
Buffers in Producer
25.
Producer Internals
• Producer sends batches of messages to a buffer.
[Diagram: application threads call send(); messages M0–M3 are collected into Batches 1–3 in the buffer, the batches are drained to the broker, and on each response the producer either retries on failure or updates the Future / invokes the callback with metadata or an exception.]
26.
Basics
• Durability can be configured with the producer configuration request.required.acks
• 0: the message is written to the network (buffer)
• 1: the message is written to the leader
• all: the producer gets an ack after all ISRs receive the data; the message is committed
• Make sure the producer doesn’t just throw messages away!
• block.on.buffer.full = true
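To make these settings concrete, here is a minimal sketch (not from the deck) of a 0.9-style Java producer configured for durability; the broker address, topic name, and key/value strings are placeholders, and note that the new producer calls the setting acks rather than request.required.acks.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DurableProducerExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092");   // placeholder broker list
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("acks", "all");                  // new-producer equivalent of request.required.acks = all
    props.put("retries", "3");                 // retry transient failures instead of dropping
    props.put("block.on.buffer.full", "true"); // block send() rather than throwing when the buffer is full

    Producer<String, String> producer = new KafkaProducer<>(props);
    producer.send(new ProducerRecord<>("my_topic", "key", "value"));
    producer.close();  // flushes and blocks until in-flight requests complete
  }
}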
27.
“New” Producer
• All calls are non-blocking and async.
• Two options for checking for failures (see the sketch below):
• Immediately block for the response: send().get()
• Do follow-up work in a Callback, and close the producer after an error threshold
• Be careful about buffering these failures. Future work? KAFKA-1955
• Don’t forget to close the producer! producer.close() will block until in-flight requests complete.
• retries (new producer config) defaults to 0
• message.send.max.retries (old producer config) defaults to 3
• In-flight requests could lead to message re-ordering.
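A minimal sketch (not from the deck) of the two failure-checking options; producer and record setup are assumed to look like the previous example.

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SendErrorHandling {

  // Option 1: synchronous — block on the Future so send errors surface immediately.
  static void sendSync(Producer<String, String> producer,
                       ProducerRecord<String, String> record) throws Exception {
    RecordMetadata metadata = producer.send(record).get();  // throws if the send ultimately fails
    System.out.println("Written to offset " + metadata.offset());
  }

  // Option 2: asynchronous — handle the result in a Callback and track failures.
  static void sendAsync(Producer<String, String> producer,
                        ProducerRecord<String, String> record) {
    producer.send(record, new Callback() {
      @Override
      public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception != null) {
          // Count or log the failure; close the producer once an error threshold is reached.
          exception.printStackTrace();
        }
      }
    });
  }
}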
29.
Consumer
• Three choices for Consumer API
• Simple Consumer
• High Level Consumer
• “New Consumer”
30.
New Consumer
• Available in Kafka 0.9.0
• Provides better control over offset management
• Enhanced server-side group management
31.
Consumer Offsets
[Diagram: a topic with six partitions consumed by a Consumer Group of four consumers (Consumer 1–4).]
32.
Consumer Offsets
[Diagram: the same group of four consumers reading the six partitions, annotated “Commit?”.]
33.
Consumer Offsets
[Diagram: the same consumers, again annotated “Commit?”.]
34.
Consumer Offsets
[Diagram: one consumer in the group fails (✗) after offsets were already committed.]
35.
Consumer Offsets
[Diagram: a single consumer running four threads (Thread 1–4) over the six partitions; one thread fails (✗).]
36.
Consumer Offsets
[Diagram: after the failure, the consumer picks up from the last committed offset (“Consumer picks up here”).]
37.
Consumer Offsets
[Diagram: one of the four threads issues a Commit.]
38.
Consumer Offsets
[Diagram: a commit from one thread commits offsets for all threads of the consumer.]
39.
Consumer Offsets
[Diagram: four separate consumers (Consumer 1–4) reading the partitions, each issuing its own Commit.]
40.
Consumer Recommendations
• Set auto.commit.enable = false
• Manually commit offsets after the message data is processed / persisted: consumer.commitOffsets();
• Run each consumer in its own thread (see the sketch below)
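A minimal sketch (not from the deck) of these recommendations against the old high-level consumer API; the ZooKeeper address, group id, topic name, and process() are placeholders.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class ManualCommitConsumer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("zookeeper.connect", "zk1:2181");   // placeholder ZooKeeper quorum
    props.put("group.id", "my_group");            // placeholder group id
    props.put("auto.commit.enable", "false");     // we commit ourselves, after processing

    ConsumerConnector consumer =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

    Map<String, List<KafkaStream<byte[], byte[]>>> streams =
        consumer.createMessageStreams(Collections.singletonMap("my_topic", 1));

    ConsumerIterator<byte[], byte[]> it = streams.get("my_topic").get(0).iterator();
    while (it.hasNext()) {
      byte[] message = it.next().message();
      process(message);            // persist / process first...
      consumer.commitOffsets();    // ...then commit, so a crash only causes re-reads, not gaps
    }
  }

  private static void process(byte[] message) {
    // application-specific processing / persistence
  }
}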
41.
New Consumer!
• No Zookeeper! At all!
• Rebalance listener
• Commit (see the sketch below):
• commitSync()
• commitAsync()
• commitSync(offsets)
• seek(partition, offset)
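A minimal sketch (not from the deck) of the 0.9 new consumer with manual commits, a rebalance listener, and seek(); the broker address, group id, topic name, and process() are placeholders.

import java.util.Collection;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class NewConsumerExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092");  // placeholder broker — no ZooKeeper needed
    props.put("group.id", "my_group");               // placeholder group id
    props.put("enable.auto.commit", "false");        // commit manually, after processing
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");

    final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

    // Rebalance listener: a good place to commit, or to save/restore positions.
    consumer.subscribe(Collections.singletonList("my_topic"), new ConsumerRebalanceListener() {
      @Override
      public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        consumer.commitSync();  // commit what we have processed before losing the partitions
      }
      @Override
      public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // could call consumer.seek(partition, offset) here to resume from an external store
      }
    });

    while (true) {
      ConsumerRecords<String, String> records = consumer.poll(1000);
      for (ConsumerRecord<String, String> record : records) {
        process(record.value());
      }
      consumer.commitSync();  // or consumer.commitAsync()
    }
  }

  private static void process(String value) {
    // application-specific processing / persistence
  }
}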
42.
Exactly Once Semantics
• At most once is easy
• At least once is not bad either – commit only after you are 100% sure the data is safe
• Exactly once is tricky
• Commit data and offsets in one transaction (see the sketch below)
• Idempotent producer
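One common way to realize "commit data and offsets in one transaction" is to keep offsets in the same store as the results, so both commit or roll back together, and to seek() back to the stored offset on restart. A rough sketch (not from the deck), assuming a hypothetical relational database with results and offsets tables; all names and connection strings are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ExactlyOnceSketch {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
    props.put("enable.auto.commit", "false");         // Kafka-side offset commits are not used at all
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    TopicPartition partition = new TopicPartition("my_topic", 0);
    consumer.assign(Collections.singletonList(partition));

    // hypothetical database with tables results(value) and offsets(partition_id, next_offset)
    Connection db = DriverManager.getConnection("jdbc:postgresql://dbhost/mydb");
    db.setAutoCommit(false);

    consumer.seek(partition, loadStoredOffset(db, partition));  // resume from our own offset store

    while (true) {
      ConsumerRecords<String, String> records = consumer.poll(1000);
      for (ConsumerRecord<String, String> record : records) {
        PreparedStatement insert = db.prepareStatement("INSERT INTO results(value) VALUES (?)");
        insert.setString(1, record.value());
        insert.executeUpdate();

        PreparedStatement saveOffset =
            db.prepareStatement("UPDATE offsets SET next_offset = ? WHERE partition_id = ?");
        saveOffset.setLong(1, record.offset() + 1);
        saveOffset.setInt(2, record.partition());
        saveOffset.executeUpdate();

        db.commit();  // result and offset land (or roll back) together
      }
    }
  }

  private static long loadStoredOffset(Connection db, TopicPartition tp) {
    // read offsets.next_offset for tp.partition(); return 0 if nothing is stored yet
    return 0L;
  }
}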
43.
Monitoring for Data Loss
• Monitor for producer errors – watch the retry numbers
• Monitor consumer lag – MaxLag or via offsets
• Standard schema:
• Each message should contain timestamp and originating service and host
• Each producer can report message counts and offsets to a special topic
• “Monitoring consumer” reports message counts to another special topic
• “Important consumers” also report message counts
• Reconcile the results
44.
Be Safe, Not Sorry
• acks = all
• block.on.buffer.full = true
• retries = MAX_INT
• ( max.in.flight.requests.per.connection = 1 )
• producer.close()
• Replication factor >= 3
• min.insync.replicas = 2
• unclean.leader.election.enable = false
• auto.commit.enable = false (enable.auto.commit in the new consumer)
• Commit after processing
• Monitor!
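The producer-side items in this checklist could be consolidated roughly as follows (a sketch, not from the deck; the broker list is a placeholder). The broker- and topic-side settings (replication factor, min.insync.replicas, unclean.leader.election.enable) belong in the broker and topic configuration, not in the client.

import java.util.Properties;

public class SafeProducerConfig {
  // Producer properties matching the "Be Safe, Not Sorry" checklist.
  public static Properties safeProducerProps() {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092");                   // placeholder broker list
    props.put("acks", "all");                                         // wait for all in-sync replicas
    props.put("block.on.buffer.full", "true");                        // block instead of dropping
    props.put("retries", String.valueOf(Integer.MAX_VALUE));          // retry "forever"
    props.put("max.in.flight.requests.per.connection", "1");          // avoid re-ordering on retry
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    return props;
  }
}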
Editor's Notes
This conceptually is our high-level consumer. In this diagram we have a topic with 6 partitions, and an application running 4 threads.

Kafka provides two different paradigms for committing offsets. The first is "auto-committing"; more on this later. The second is to manually commit offsets in your application. But what's the right time? If we commit offsets as soon as we actually receive a message, we expose ourselves to data loss, as we could have a process, machine, or thread failure before we persist or otherwise process our data. So what we'd really like to do is only commit offsets after we've done some amount of processing and/or persistence on the data. Typical situations would be after producing a new message to Kafka, or after writing a record to HDFS.

So let's say we have auto-commit enabled, we are chugging along, and we are counting on the consumer to commit our offsets for us. This is great because we don't have to code anything, and don't have to think about the frequency of commits and the impact that might have on our throughput. Life is good. But now we've lost a thread or a process, and we don't really know where we are in the processing, because the last auto-commit committed stuff that we hadn't actually written to disk. So now we're in a situation where we think we've read all of our data, but we will have gaps in the data. Note the same risk applies if we lose a partition or broker and get a new leader, or if we add more consumers in the same group and rebalance the partition assignment. Imagine a scenario where you are hanging in your processing, or there's some other reason that you have to exit before persisting to disk; the newly added consumer will just pick up from the last committed offset. OK, so don't use auto-commit if you care about this sort of thing.

One other thing to note: if you are running some code akin to the ConsumerGroup example that's on the wiki, with one consumer running multiple threads, then when you issue a commit from one thread, it will commit across all threads. So this isn't great, for all of the reasons we mentioned a few moments ago.

So disable auto-commit, commit after your processing, and run the high-level consumer in its own thread. To cement this: note that a lot of this changes in the next release with the new Consumer, but maybe we will revisit that once it is released!