Reliable Message Reprocessing Patterns for Kafka with Dunith Dhanushka

© 2023 REDPANDA DATA
A little about me…
Dunith Dhanushka
Senior Developer Advocate, Redpanda Data
● Event streaming, real-time analytics,
and stream processing enthusiast
● Frequent blogger, speaker, and educator
@dunithd
linkedin.com/in/dunithd

Agenda
1. Use case
2. Transient and non-transient errors - overview
3. Dead letter topics
4. Handling transient and non-transient errors
5. Q & A

The problem
How not to lose an expensive message?

Use case: Processing an expensive message
E-commerce order processing…

What could possibly happen here?
Possible outcomes
The happy path
● The order will be processed as expected.
● Sunny day scenario.
Otherwise?
● Processing will fail.

“Anything that can go
wrong will go wrong,
and at the worst
possible time.”
Murphy’s law

Possible causes for consumer failures
Why would order processing fail?
Two types of errors:
1. Transient errors: Unpredictable, short-lived errors in software, hardware, or network components.
2. Non-transient errors: Errors that persist over time and cannot be easily resolved through automatic recovery or failover mechanisms.

Transient errors
Short-lived errors that are recoverable
Temporary errors that occur in computer systems or networks, typically caused by:
● Temporary disruptions in network connectivity
● Hardware failures
● Software glitches or other similar factors
They are recoverable.

Non-transient errors
Not recoverable
A message that hits a non-transient error fails deterministically every time it is consumed, no matter how many times it is reprocessed.
Reprocessing produces the same result, so retrying blindly causes an infinite loop that wastes precious computational resources.
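The distinction between the two error types can be sketched in code. The following is a minimal illustration, not code from the talk: the class name ErrorClassifier and the choice of which exception types count as transient are assumptions made for this example.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: decides whether a failure is worth retrying.
final class ErrorClassifier {
    enum Kind { TRANSIENT, NON_TRANSIENT }

    static Kind classify(Throwable error) {
        // Assumption: network and timeout exceptions stand in for temporary
        // failures. A real consumer would list what its dependencies throw.
        if (error instanceof SocketTimeoutException
                || error instanceof ConnectException
                || error instanceof TimeoutException) {
            return Kind.TRANSIENT;
        }
        // Everything else (bad payloads, validation failures, bugs) is
        // treated as deterministic: a retry would fail the same way.
        return Kind.NON_TRANSIENT;
    }
}
```

Treating unknown exceptions as non-transient is a conservative default for this sketch: it avoids endless retries at the cost of sending some recoverable failures to the DLT.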

Businesses don’t want to lose messages!
Under any circumstances…

Handling consumer failures

Dead Letter Queue (DLQ)
A place where you can route failed messages for reprocessing

Dead Letter Queue pattern - overview

DLQ in the context of Kafka
There are no native DLQs in Kafka!
● You can appoint a regular Kafka topic as the dead letter topic (DLT).
● Typically, there is one DLT per source topic.
● Usually, the DLT name follows the pattern: <source_topic_name>-dlt
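The naming convention above is simple enough to capture in a one-line helper. This is only a sketch; the class and method names are hypothetical.

```java
// Hypothetical helper that derives the DLT name from the source topic name.
final class DltNames {
    static String dltFor(String sourceTopic) {
        // Follows the <source_topic_name>-dlt pattern.
        return sourceTopic + "-dlt";
    }
}
```

For a source topic named orders, the helper returns orders-dlt.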

Handling non-transient errors

General pattern
For handling non-transient errors

Spring Kafka consumer with Kafka/Redpanda

Code samples
Where to find the code shown in the talk
https://github.com/redpanda-data-blog/2022-dead-letter-topics

Handling malformed payloads
Dealing with rogue messages

Malformed message payloads
● Errors in deserializing string- or binary-encoded messages at the consumer, e.g., XML, JSON, Avro, Protobuf.
● They are usually caught early in the processing pipeline by deserializers.
● The error is logged and the message is dropped.

Deserialization with Spring Kafka consumers

We should route the malformed messages to the DLT!
They can be corrected and reprocessed later…

Routing malformed messages to the DLT
How Spring Kafka uses the ErrorHandlingDeserializer to catch deserialization errors

Routing malformed messages to the DLT
Spring Kafka configurations
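The configuration shown on the slide is not reproduced in this text. As a rough sketch, wrapping the value deserializer in the ErrorHandlingDeserializer might use consumer properties like the ones below. The property keys and class names are written as plain strings and reflect my understanding of the Spring Kafka documentation, so verify them against your version. The broker address, group id, and the choice of JsonDeserializer as the delegate are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of consumer properties; values are strings so the example has no
// dependency on Spring or the Kafka client library.
final class ConsumerProps {
    static Map<String, Object> errorHandlingDeserializerProps() {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "order-processor");         // hypothetical group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // The wrapper catches deserialization exceptions instead of failing the poll.
        props.put("value.deserializer",
                "org.springframework.kafka.support.serializer.ErrorHandlingDeserializer");
        // The wrapper delegates the actual work to this deserializer (assumed JSON).
        props.put("spring.deserializer.value.delegate.class",
                "org.springframework.kafka.support.serializer.JsonDeserializer");
        return props;
    }
}
```

With this setup, a payload that cannot be deserialized reaches the container's error handler rather than stopping the consumer, which is what makes routing it to the DLT possible.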

Handling validation/consumer errors
Dealing with business rule violations and consumer failures.

Case 1: The message fails the rule validation
Although the deserialization succeeds
For example:
● Missing fields in the payload, e.g., the customerId is missing in the order.
● Validation failures, e.g., the amount is negative.
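The two example rules can be sketched as a small validator. This is an illustration only: the OrderValidator class and the representation of an order as a customerId plus an amount are assumptions, not code from the talk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical business-rule check for an order that deserialized fine.
final class OrderValidator {
    // Returns the list of rule violations; an empty list means the order is valid.
    static List<String> violations(String customerId, double amount) {
        List<String> problems = new ArrayList<>();
        if (customerId == null || customerId.trim().isEmpty()) {
            problems.add("customerId is missing");
        }
        if (amount < 0) {
            problems.add("amount is negative");
        }
        return problems;
    }
}
```

A consumer could treat a non-empty result as a non-transient error, since the same payload will violate the same rules on every retry.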

Case 2: The consumer encounters an error
The fault is in the consumer’s processing logic
Although the message is perfect, it might trigger an error in the consumer’s processing logic, causing the processing to fail.
This time, the error is with the consumer.
For example:
● The consumer throws a NullPointerException (NPE).
● Other RuntimeExceptions.

We should route them to the DLT as well.
They can be corrected and reprocessed later…

Routing them to the DLT
Log the exception and continue. Let Spring route the message to the DLT.
In Spring Kafka, you can use the DeadLetterPublishingRecoverer class to route failed messages to the DLT.
It can be configured with a KafkaTemplate.
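The "log and continue" idea can be illustrated without Spring. The sketch below is a plain Java stand-in, not the DeadLetterPublishingRecoverer API: the recoverer is modeled as a callback that receives the failed record and the exception, and all names are hypothetical.

```java
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Plain Java stand-in for "log the exception, route to the DLT, keep going".
final class LogAndContinue {
    // Processes every record; a failure is handed to the recoverer and the
    // loop moves on instead of stopping. Returns the number of successes.
    static <T> int processAll(List<T> records,
                              Consumer<T> handler,
                              BiConsumer<T, Exception> recoverer) {
        int succeeded = 0;
        for (T record : records) {
            try {
                handler.accept(record);
                succeeded++;
            } catch (RuntimeException e) {
                System.err.println("Processing failed, routing to DLT: " + e);
                recoverer.accept(record, e); // e.g. publish to <topic>-dlt
            }
        }
        return succeeded;
    }
}
```

In a real Spring Kafka consumer, the framework's error handler and recoverer would play the roles of the try/catch and the callback.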

How to reprocess messages in the DLT?
Some best practices
● Manual recovery with human intervention.
● Add more context before sending a message to the DLT.
● The producer team should own malformed messages and fix them, e.g., the producer might be using an older schema version.
● Notify the producer about the failure.
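One way to "add more context" is to attach headers describing where the message came from and why it failed. The following is a sketch under assumptions: the header names are hypothetical and chosen for this example, not the ones Spring Kafka writes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Builds context headers to attach to a message before it goes to the DLT.
final class DltContext {
    // Header names below are hypothetical, for illustration only.
    static Map<String, String> headersFor(String sourceTopic, int partition,
                                          long offset, Exception cause) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("x-original-topic", sourceTopic);
        headers.put("x-original-partition", Integer.toString(partition));
        headers.put("x-original-offset", Long.toString(offset));
        headers.put("x-exception-class", cause.getClass().getName());
        // String.valueOf guards against exceptions without a message.
        headers.put("x-exception-message", String.valueOf(cause.getMessage()));
        return headers;
    }
}
```

With this context, whoever inspects the DLT can locate the original record and see the failure reason without searching consumer logs.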

Handling transient errors

Consumer should retry several times
Transient errors are recoverable at the consumer’s end
● The recommended way to handle a transient error is to retry multiple times, with fixed or incremental backoff intervals in between.
● If all retry attempts fail, you can redirect the message to the DLT and move on.
● Retrying can be implemented synchronously or asynchronously on the consumer side.

Blocking retries
Consumer thread is blocked until the retry completes

Case 1: Simple blocking retries
Suspend the consumer thread and reprocess the failed message without calling Consumer.poll() during the retries.
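A blocking retry can be sketched as a loop that sleeps on the consumer thread between attempts. This is a minimal illustration with hypothetical names; the attempt limit and the fixed backoff are parameters chosen for the example.

```java
// Blocking retry with a fixed backoff: the calling (consumer) thread sleeps
// between attempts and does nothing else in the meantime.
final class BlockingRetry {
    // Returns true if the task eventually succeeded, false if every attempt
    // failed (the caller would then route the message to the DLT).
    static boolean runWithRetries(Runnable task, int maxAttempts, long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                task.run();
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break; // retries exhausted
                }
                try {
                    Thread.sleep(backoffMs); // the consumer thread is blocked here
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false;
    }
}
```

Because nothing polls the broker while the thread sleeps, long backoffs in this style risk exceeding the consumer's poll interval, which is one reason the approach suits only short retry windows.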

Drawbacks
● Main consumer thread is blocked.
● Not ideal for high throughput message processing scenarios.
● Waste of computational resources.

Non-blocking retries with backoff
Consumer thread continues

Retry topics

Case 2: Non-blocking retry with a single retry topic and fixed backoff
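The routing decision behind this case can be sketched as two small functions: where a failed message goes next, and when it becomes due for another attempt. The names, the "-retry" suffix, and the attempt-counting scheme are assumptions made for this example; the diagram from the slide is not reproduced here.

```java
// Routing for a single retry topic with a fixed backoff.
final class SingleRetryTopicRouter {
    // failedAttempts counts how many times the message has failed so far,
    // including the failure that just happened.
    static String nextTopic(String sourceTopic, int failedAttempts, int maxRetries) {
        return failedAttempts <= maxRetries
                ? sourceTopic + "-retry"  // hypothetical retry topic name
                : sourceTopic + "-dlt";   // retries exhausted
    }

    // The retry consumer should not process the message before this time.
    static long dueAtMs(long failedAtMs, long fixedBackoffMs) {
        return failedAtMs + fixedBackoffMs;
    }
}
```

The main consumer publishes the failed message to the retry topic and moves on immediately, so the delay is absorbed by the retry consumer rather than the main consumer thread.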

Spring Kafka configuration

Case 3: Non-blocking retry with multiple retry topics and an exponential backoff
Inspired by the Netflix blog post on the same topic.
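With multiple retry topics, each topic in the chain carries a longer delay than the previous one. The sketch below computes the delay and the topic for a given retry index. The topic naming scheme, the multiplier, and the delay cap are assumptions for illustration, not values from the talk.

```java
// Chain of retry topics with exponentially growing delays.
final class RetryTopicChain {
    // Delay before the retry with the given zero-based index, capped at maxDelayMs.
    static long delayMs(int retryIndex, long initialDelayMs,
                        double multiplier, long maxDelayMs) {
        double raw = initialDelayMs * Math.pow(multiplier, retryIndex);
        return (long) Math.min(raw, (double) maxDelayMs);
    }

    // Topic for the given retry index; once the chain is exhausted, the DLT.
    static String topicFor(String sourceTopic, int retryIndex, int retryTopics) {
        return retryIndex < retryTopics
                ? sourceTopic + "-retry-" + retryIndex // hypothetical naming
                : sourceTopic + "-dlt";
    }
}
```

With an initial delay of 1000 ms and a multiplier of 2, three retry topics would wait 1000, 2000, and 4000 ms before the message finally moves to the DLT.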

Spring Kafka configuration

Summary
Things you can take home…

Takeaways
● Consumer failure scenarios can be broadly categorized into transient and non-transient errors.
● Malformed payloads, business rule validation failures, and consumer errors are possible causes
for non-transient errors.
● Consumers should detect non-transient errors as early as possible and move them to the DLT
for manual reprocessing.
● Consumers should implement retry strategies to handle transient errors.
● Prefer using asynchronous retrying when the message throughput is high.
● If all retry attempts fail, the message can be moved to the DLT.

Questions?

Keep learning
Redpanda University
https://university.redpanda.com
Redpanda Docs
https://docs.redpanda.com/
Redpanda Blogs
https://redpanda.com/blog
Redpanda Code
https://github.com/redpanda-data

Thanks for joining!
Let’s keep in touch
@redpandadata redpanda-data
redpanda-data hello@redpanda.com
