Reliable Message Reprocessing Patterns for Kafka with Dunith Dhanushka
2. © 2023 REDPANDA DATA
A little about me…
Dunith Dhanushka
Senior Developer Advocate, Redpanda Data
● Event streaming, real-time analytics, and stream processing enthusiast
● Frequent blogger, speaker, and educator
@dunithd
linkedin.com/in/dunithd
3. Agenda
1. Use case
2. Transient and non-transient errors - overview
3. Dead letter topics
4. Handling transient and non-transient errors
5. Q & A
4. The problem
How can we avoid losing an expensive message?
5. Use case: Processing an expensive message
E-commerce order processing…
6. Possible outcomes
What could possibly happen here?
The happy path
● The order is processed as expected.
● This is the sunny-day scenario.
Otherwise?
● Processing fails.
7. “Anything that can go wrong will go wrong, and at the worst possible time.”
Murphy’s law
8. Possible causes for consumer failures
Why would order processing fail?
Two types of errors:
1. Transient errors: unpredicted, short-lived errors in software, hardware, or network components.
2. Non-transient errors: errors that persist over time and cannot be easily resolved through automatic recovery or failover mechanisms.
9. Transient errors
Short-lived errors that are recoverable
Temporary errors that occur in computer systems or networks, typically caused by:
● Temporary disruptions in network connectivity
● Hardware failures
● Software glitches or other similar factors
They are recoverable.
10. Non-transient errors
Not recoverable
A message that hits a non-transient error fails deterministically every time it is consumed, no matter how many times it is reprocessed.
Reprocessing produces the same result each time, causing an infinite loop that wastes precious computational resources.
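To make the distinction concrete, here is a minimal sketch in Python of how a consumer might classify errors before deciding whether to retry. The exception classes and the `should_retry` helper are hypothetical names introduced for illustration; treating the built-in `TimeoutError` and `ConnectionError` as transient is an assumption, not a rule from the talk.

```python
class TransientError(Exception):
    """Short-lived failure (e.g. a network blip); retrying may succeed."""


class NonTransientError(Exception):
    """Deterministic failure (e.g. a malformed payload); retrying cannot help."""


def should_retry(error: Exception) -> bool:
    # Assumption: timeouts and connection errors are short-lived,
    # so they are treated the same way as TransientError.
    return isinstance(error, (TransientError, TimeoutError, ConnectionError))
```

Anything that is not recognized as transient falls through to the non-transient path, which avoids the infinite reprocessing loop described above.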
11. Businesses don’t want to lose messages!
Under any circumstances…
14. Dead Letter Queue (DLQ)
A place where you can route failed messages for reprocessing
16. DLQ in the context of Kafka
There are no native DLQs in Kafka!
● You can appoint a regular Kafka topic as the dead letter topic (DLT).
● Typically, there is one DLT per source topic.
● Usually, the DLT name follows the pattern:
<source_topic_name>-dlt
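The naming convention above can be captured in a one-line helper. This is only a sketch; `dlt_name` is a hypothetical function name, and the `-dlt` suffix is the convention stated on the slide rather than anything Kafka enforces.

```python
def dlt_name(source_topic: str) -> str:
    """Return the dead letter topic name for a source topic (slide convention)."""
    return f"{source_topic}-dlt"
```

For a source topic called `orders`, this yields `orders-dlt`.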
18. General pattern
For handling non-transient errors
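The diagram for this slide is not part of the transcript. As a rough illustration of the pattern as the rest of the talk describes it (on failure, publish the message to the DLT and move on to the next one), here is a minimal in-memory sketch. The `topics` dictionary stands in for Kafka topics, and `consume` and the record layout are assumptions made for illustration.

```python
from collections import defaultdict

# In-memory stand-in for Kafka topics: topic name -> list of messages.
topics = defaultdict(list)


def consume(source_topic: str, handler) -> None:
    """Process every message; route failures to the DLT instead of stopping."""
    for message in topics[source_topic]:
        try:
            handler(message)
        except Exception as error:
            # Keep the original payload so it can be fixed and reprocessed later.
            topics[f"{source_topic}-dlt"].append(
                {"payload": message, "error": str(error)}
            )
```

The key property is that one bad message does not block the messages behind it.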
20. Code samples
Where can you find the code shown in the talk?
https://github.com/redpanda-data-blog/2022-dead-letter-topics
21. Handling malformed payloads
Dealing with rogue messages
22. Malformed message payloads
● Errors in deserializing string- or binary-encoded messages at the consumer, e.g. XML, JSON, Avro, or Protobuf.
● They are usually caught early in the processing pipeline by deserializers.
● Typically, the error is logged and the message is dropped.
24. We should route the malformed messages to the DLT!
They can be corrected and reprocessed later…
25. Routing malformed messages to the DLT
How Spring Kafka uses the ErrorHandlingDeserializer to catch deserialization errors
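Spring Kafka's ErrorHandlingDeserializer wraps a delegate deserializer and captures the exception instead of letting it escape. The sketch below shows the same idea in plain Python with a JSON payload; it is an analogy, not Spring's API, and `DeserializationResult` and `safe_deserialize` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DeserializationResult:
    value: Optional[Any]         # parsed payload, or None on failure
    error: Optional[Exception]   # captured exception, or None on success
    raw: bytes                   # original bytes, kept so they can go to the DLT


def safe_deserialize(raw: bytes) -> DeserializationResult:
    """Deserialize JSON without raising; failures are returned as data."""
    try:
        return DeserializationResult(json.loads(raw.decode("utf-8")), None, raw)
    except (UnicodeDecodeError, json.JSONDecodeError) as error:
        return DeserializationResult(None, error, raw)
```

Because the raw bytes travel with the error, the consumer can forward the untouched payload to the DLT rather than logging and dropping it.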
26. Routing malformed messages to the DLT
Spring Kafka configurations
27. Handling validation/consumer errors
Dealing with business rule violations and consumer failures.
28. © 2023 REDPANDA DATA
Case 1 The message fails the rule validation
For example:
● Missing fields in the payload E.g the customerId is missing in the order.
● Validation failures E.g the amount is negative.
28
Although the deserialization succeeds
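A minimal sketch of such a rule check, using the two examples from the slide. `ValidationError` and `validate_order` are hypothetical names; the field names `customerId` and `amount` come from the slide, while treating a missing amount as zero is an assumption.

```python
class ValidationError(Exception):
    """Raised when a well-formed message violates a business rule."""


def validate_order(order: dict) -> None:
    # The payload deserialized fine, but a required field is absent.
    if not order.get("customerId"):
        raise ValidationError("customerId is missing")
    # Assumption: a missing amount is treated as 0 and only negatives are rejected.
    if order.get("amount", 0) < 0:
        raise ValidationError("amount is negative")
```

Both failures are deterministic, so retrying the same message would never succeed.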
29. Case 2: The consumer encounters an error
The fault is in the consumer’s processing logic
Although the message is perfect, it might trigger an error in the consumer’s processing logic, causing processing to fail.
This time, the error is with the consumer.
For example:
● The consumer throws a NullPointerException (NPE).
● Other RuntimeExceptions.
30. We should route them to the DLT as well.
They can be corrected and reprocessed later…
31. Routing them to the DLT
Log the exception and continue. Let Spring route the message to the DLT.
In Spring Kafka, you can use the DeadLetterPublishingRecoverer class to route failed messages to the DLT.
It can be configured with a KafkaTemplate.
32. How to reprocess messages in the DLT?
Some best practices
● Manual recovery with human intervention.
● Add more context before sending a message to the DLT.
● The producer team should own malformed messages and fix them, e.g. the producer might be using an older schema version.
● Notify the producer about the failure.
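One way to "add more context" is to attach headers describing where the message came from and why it failed. The following is a sketch only: `build_dlt_record` and the `x-…` header names are hypothetical and are not the header names used by Spring Kafka or any other library.

```python
import time


def build_dlt_record(payload: bytes, error: Exception,
                     topic: str, partition: int, offset: int) -> dict:
    """Wrap a failed payload with headers that help whoever reprocesses it."""
    headers = {
        # Hypothetical header names, chosen for illustration.
        "x-original-topic": topic,
        "x-original-partition": str(partition),
        "x-original-offset": str(offset),
        "x-exception-class": type(error).__name__,
        "x-exception-message": str(error),
        "x-failed-at": str(int(time.time())),
    }
    return {"value": payload, "headers": headers}
```

With the original topic, partition, and offset recorded, the producer team can trace a bad message back to its source before fixing and replaying it.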
34. Consumer should retry several times
Transient errors are recoverable at the consumer’s end
● The recommended way to handle a transient error is to retry multiple times, with fixed or incremental intervals in between (backoff).
● If all retry attempts fail, you can redirect the message to the DLT and move on.
● Retrying can be implemented synchronously or asynchronously on the consumer side.
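The difference between fixed and incremental intervals can be shown with two small helpers that compute the wait before each retry. The function names and parameters are assumptions for illustration; an exponential multiplier is used here as one common form of incremental backoff.

```python
def fixed_backoff(interval_s: float, max_attempts: int) -> list:
    """Same wait before every retry."""
    return [interval_s] * max_attempts


def incremental_backoff(initial_s: float, multiplier: float,
                        max_attempts: int) -> list:
    """Each wait is `multiplier` times longer than the previous one."""
    return [initial_s * multiplier ** i for i in range(max_attempts)]
```

For example, `fixed_backoff(2.0, 3)` gives waits of 2, 2, and 2 seconds, while `incremental_backoff(1.0, 2.0, 4)` gives 1, 2, 4, and 8 seconds.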
35. Blocking retries
Consumer thread is blocked until the retry completes
36. Case 1: Simple blocking retries
Suspend the consumer thread and reprocess the failed message without calling Consumer.poll() during the retries.
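A minimal sketch of a blocking retry loop, assuming a fixed backoff and a hypothetical `send_to_dlt` callback. The `sleep` parameter is injectable only so the example can be exercised without real waiting. Note that in a real Kafka consumer, a long pause without calling poll() can exceed `max.poll.interval.ms` and trigger a rebalance.

```python
import time
from typing import Any, Callable


def process_with_blocking_retries(message: Any,
                                  handler: Callable[[Any], None],
                                  send_to_dlt: Callable[[Any], None],
                                  max_attempts: int = 3,
                                  backoff_s: float = 1.0,
                                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry in the calling thread; return True on success, False if sent to the DLT."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except Exception:
            if attempt < max_attempts:
                sleep(backoff_s)  # the consumer thread is blocked here
    send_to_dlt(message)  # all attempts failed: route to the DLT and move on
    return False
```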
37. Drawbacks
● The main consumer thread is blocked.
● Not ideal for high-throughput message processing scenarios.
● Wastes computational resources.
38. Non-blocking retries with backoff
Consumer thread continues
40. Case 2: Non-blocking retry with a single retry topic and fixed backoff
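The diagram for this slide is missing from the transcript. As a rough in-memory sketch of the idea, a failed message is published to a retry topic with an attempt counter and a due time, so the main consumer keeps going; a separate retry consumer picks it up once the fixed delay has passed. The topic names, `RETRY_DELAY_S`, `MAX_RETRIES`, and the record fields are all assumptions for illustration.

```python
from collections import defaultdict, deque

RETRY_DELAY_S = 30.0   # assumed fixed backoff
MAX_RETRIES = 3        # assumed retry budget

retry_topics = defaultdict(deque)  # in-memory stand-in for Kafka topics


def on_failure(message: dict, now: float) -> str:
    """Send a failed message to the retry topic, or to the DLT when exhausted."""
    attempts = message.get("attempts", 0)
    if attempts >= MAX_RETRIES:
        retry_topics["orders-dlt"].append(dict(message))
        return "orders-dlt"
    retry_topics["orders-retry"].append(
        {**message, "attempts": attempts + 1, "retry_at": now + RETRY_DELAY_S}
    )
    return "orders-retry"


def poll_retry_topic(handler, now: float) -> int:
    """Process retry messages that are due; return how many succeeded."""
    queue = retry_topics["orders-retry"]
    succeeded = 0
    while queue and queue[0]["retry_at"] <= now:
        message = queue.popleft()
        try:
            handler(message)
            succeeded += 1
        except Exception:
            on_failure(message, now)  # re-queued with a later due time, or sent to the DLT
    return succeeded
```

Because the delay is fixed, messages in the retry topic stay ordered by due time, so checking only the head of the queue is enough.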
42. Case 3: Non-blocking retry with multiple retry topics and an exponential backoff
Inspired by a Netflix blog post on the same topic.
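With multiple retry topics, each topic typically represents one retry level with its own, longer delay, and a message that fails at the last level goes to the DLT. The sketch below only computes where a failed message should go next; the topic naming scheme, `BASE_DELAY_S`, `MULTIPLIER`, and `RETRY_LEVELS` are assumptions for illustration and are not taken from the talk.

```python
SOURCE_TOPIC = "orders"   # hypothetical source topic
BASE_DELAY_S = 10         # assumed delay for the first retry level
MULTIPLIER = 2            # assumed exponential factor
RETRY_LEVELS = 3          # assumed number of retry topics


def next_destination(failed_attempts: int) -> tuple:
    """Return (topic, delay_seconds) for a message that has failed N times (N >= 1)."""
    if failed_attempts <= RETRY_LEVELS:
        delay = BASE_DELAY_S * MULTIPLIER ** (failed_attempts - 1)
        return f"{SOURCE_TOPIC}-retry-{failed_attempts}", delay
    # Every retry level has been used up: hand the message to the DLT.
    return f"{SOURCE_TOPIC}-dlt", 0
```

With these values, the first failure goes to `orders-retry-1` with a 10-second delay, the second to `orders-retry-2` with 20 seconds, the third to `orders-retry-3` with 40 seconds, and the fourth to `orders-dlt`.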
46. Takeaways
● Consumer failure scenarios can be broadly categorized into transient and non-transient errors.
● Malformed payloads, business rule validation failures, and consumer errors are possible causes of non-transient errors.
● Consumers should detect non-transient errors as early as possible and move the affected messages to the DLT for manual reprocessing.
● Consumers should implement retry strategies to handle transient errors.
● Prefer using asynchronous retrying when the message throughput is high.
● If all retry attempts fail, the message can be moved to the DLT.
48. Keep learning
Redpanda University
https://university.redpanda.com
Redpanda Docs
https://docs.redpanda.com/
Redpanda Blogs
https://redpanda.com/blog
Redpanda Code
https://github.com/redpanda-data
49. Thanks for joining!
Let’s keep in touch
@redpandadata redpanda-data
redpanda-data hello@redpanda.com