CONFIDENTIAL: FOR INTERNAL USE ONLY © 2023 REDPANDA DATA
A little about me…
Dunith Dhanushka
Senior Developer Advocate, Redpanda Data
● Event streaming, real-time analytics,
and stream processing enthusiast
● Frequent blogger, speaker, and
educator
@dunithd
linkedin.com/in/dunithd
Agenda
1. Use case
2. Transient and non-transient errors - overview
3. Dead letter topics
4. Handling transient and non-transient errors
5. Q & A
The problem
How not to lose an expensive message?
Use case: Processing an expensive message
E-commerce order processing…
What could possibly happen here?
Possible outcomes
The happy path
● The order will be processed as expected.
● Sunny day scenario.
Otherwise?
● Processing will fail.
“Anything that can go
wrong will go wrong,
and at the worst
possible time.”
Murphy’s law
Possible causes for consumer failures
Two types of errors:
1. Transient errors: unpredicted, short-lived errors in software, hardware, or network
components.
2. Non-transient errors: errors that persist over time and cannot be easily resolved through
automatic recovery or failover mechanisms.
Why would order processing fail?
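The split between the two error types can be sketched in a few lines of Python. This is a minimal, framework-independent illustration rather than code from the talk; which exception types count as transient is an assumption made for the example, and the function names are hypothetical.

```python
# Hypothetical grouping: errors that may disappear if the same message is retried.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def is_transient(error: Exception) -> bool:
    """Return True when retrying the same message might succeed."""
    return isinstance(error, TRANSIENT_ERRORS)


def next_step(error: Exception) -> str:
    """Decide what the consumer does with the failed message."""
    # Transient errors are retried; everything else goes to the dead letter topic.
    return "retry" if is_transient(error) else "dead-letter"
```

A real consumer would tune the list of transient types to the clients it actually calls, such as database drivers or HTTP clients.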
Transient errors
Temporary errors that occur in computer systems or
networks, typically caused by:
● Temporary disruptions in network connectivity
● Hardware failures
● Software glitches, or other similar factors.
They are recoverable.
Short-lived errors that are recoverable
Non-transient errors
A message that triggers a non-transient error fails deterministically:
it fails every time it is consumed, no matter how many times it
is reprocessed.
Reprocessing produces the same result each time, so blind retries
cause an infinite loop that wastes precious
computational resources.
Not recoverable
Businesses don’t want
to lose messages!
Under any circumstances…
Handling
consumer failures
Dead Letter Queue
DLQ
A place where you can route failed messages for reprocessing
Dead Letter Queue pattern - overview
DLQ in the context of Kafka
There are no native DLQs in Kafka!
● You can designate a regular Kafka topic as the dead letter topic (DLT).
● Typically, there is one DLT per source topic.
● Usually the DLT name follows the pattern:
<source_topic_name>-dlt
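The naming convention is simple enough to capture in a helper. The sketch below assumes the "-dlt" suffix from the slide; the function name and the configurable suffix parameter are hypothetical and exist only for illustration.

```python
def dead_letter_topic(source_topic: str, suffix: str = "-dlt") -> str:
    """Derive the DLT name for a source topic using the <source_topic_name>-dlt pattern."""
    return f"{source_topic}{suffix}"
```

For example, a source topic named orders would map to orders-dlt. Keeping the rule in one place helps producers, consumers, and tooling agree on where failed messages end up.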
Handling
non-transient
errors
General pattern
For handling non-transient errors
Spring Kafka consumer with Kafka/Redpanda
Code samples
https://github.com/redpanda-data-blog/2022-dead-letter-topics
Where to find the code shown in the talk?
Handling malformed
payloads
Dealing with rogue messages
Malformed message payloads
● Errors in deserializing string- or binary-encoded messages at the consumer, e.g. XML, JSON, Avro,
or Protobuf.
● These are usually caught early in the processing pipeline by deserializers.
● Typically, the error is logged and the message is dropped.
Deserialization with Spring Kafka consumers
We should route the
malformed messages
to the DLT!
They can be corrected and reprocessed later…
Routing malformed messages to the DLT
How Spring Kafka uses the ErrorHandlingDeserializer to catch deserialization errors
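The general idea of wrapping a deserializer so that a bad payload becomes a value instead of an exception can be sketched as follows. This is a plain-Python illustration of the concept, not Spring Kafka's ErrorHandlingDeserializer; JSON is assumed as the payload format, and DeserializationResult, safe_deserialize, and consume are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DeserializationResult:
    value: Optional[Any] = None         # parsed payload on success
    error: Optional[Exception] = None   # captured exception on failure
    raw: bytes = b""                    # original bytes, kept for the DLT


def safe_deserialize(raw: bytes) -> DeserializationResult:
    """Parse JSON bytes; capture the failure instead of raising."""
    try:
        return DeserializationResult(value=json.loads(raw.decode("utf-8")), raw=raw)
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return DeserializationResult(error=exc, raw=raw)


def consume(records, handler, dead_letters):
    """Hand good payloads to the handler and route malformed ones to the DLT."""
    for raw in records:
        result = safe_deserialize(raw)
        if result.error is not None:
            dead_letters.append(result.raw)  # keep the untouched bytes for later repair
            continue
        handler(result.value)
```

Keeping the raw bytes matters here: the malformed message can only be corrected and reprocessed later if the original payload survives.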
Routing malformed messages to the DLT
Spring Kafka configurations
Handling
validation/consumer
errors
Dealing with business rule violations and consumer failures.
Case 1: The message fails rule validation, although deserialization succeeds
For example:
● Missing fields in the payload, e.g. the customerId is missing from the order.
● Validation failures, e.g. the amount is negative.
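The two example rules can be written as a small validation function. This is a sketch under the assumption that an order arrives as a dictionary with customerId and amount fields; the function name and the wording of the messages are hypothetical.

```python
def validate_order(order: dict) -> list:
    """Return a list of rule violations; an empty list means the order is valid."""
    problems = []
    if not order.get("customerId"):
        problems.append("customerId is missing")
    amount = order.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        problems.append("amount must be a non-negative number")
    return problems
```

Returning every violation, rather than stopping at the first one, gives whoever inspects the DLT later a fuller picture of what needs fixing.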
Case 2: The consumer encounters an error
The fault is in the consumer’s processing logic
Although the message is well-formed and valid, it might trigger an error in the consumer’s processing logic,
causing processing to fail.
This time, the error is with the consumer.
For example:
● The consumer throws a NullPointerException (NPE).
● Other RuntimeExceptions
We should route them
to the DLT as well.
They can be corrected and reprocessed later…
Routing them to the DLT
Log the exception and continue. Let Spring route the message to the DLT.
In Spring Kafka, you can use the DeadLetterPublishingRecoverer class to route failed messages to
the DLT.
It can be configured with a KafkaTemplate.
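The "log the exception and continue" behaviour can be illustrated without any framework. The sketch below is not the DeadLetterPublishingRecoverer API; it is a hypothetical Python loop in which process and publish_to_dlt are callables supplied by the caller.

```python
import logging

logger = logging.getLogger("order-consumer")


def consume_with_dead_letter(records, process, publish_to_dlt):
    """Process each record; on failure, log, route to the DLT, and keep going."""
    handled = 0
    for record in records:
        try:
            process(record)
            handled += 1
        except Exception as exc:  # broad on purpose: any processing failure is routed
            logger.warning("Processing failed, routing to DLT: %r", exc)
            publish_to_dlt(record, exc)
    return handled
```

The important property is that one bad record does not stop the records behind it from being processed.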
How to reprocess messages in the DLT?
● Manual recovery with human intervention.
● Add more context before sending a message to the DLT.
● The producer team should own malformed messages and fix them. For example, the producer might be using
an older schema version.
● Notify the producer about the failure.
Some best practices
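"Adding more context" usually means attaching metadata about where and why the message failed. A minimal sketch, assuming a dictionary-based record and header names invented for this example (they are not the headers any particular library writes):

```python
def build_dead_letter_record(topic, partition, offset, payload, error):
    """Wrap a failed payload with enough context to diagnose and replay it."""
    headers = {
        # Hypothetical header names chosen for illustration.
        "x-original-topic": topic,
        "x-original-partition": str(partition),
        "x-original-offset": str(offset),
        "x-exception-class": type(error).__name__,
        "x-exception-message": str(error),
    }
    return {"topic": f"{topic}-dlt", "value": payload, "headers": headers}
```

With the original topic, partition, and offset recorded, a person or a repair tool can trace the failure back to its source and notify the producing team.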
Handling transient
errors
Consumer should retry several times
● The recommended way to handle a transient error is to retry multiple times, with fixed or
incremental intervals in between (backoff intervals).
● If all retry attempts fail, you can redirect the message into the DLT and move on.
● Retrying can be implemented synchronously or asynchronously at the consumer side.
Transient errors are recoverable at the consumer’s end
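The difference between fixed and growing intervals is easiest to see as a list of delays. The helpers below are a hypothetical sketch; the parameter names and the 60-second cap are assumptions, not values from the talk.

```python
def fixed_backoff(interval_s: float, attempts: int) -> list:
    """Same delay before every retry attempt."""
    return [interval_s] * attempts


def exponential_backoff(initial_s: float, multiplier: float, attempts: int,
                        max_s: float = 60.0) -> list:
    """Delay grows by `multiplier` on each attempt, capped at `max_s`."""
    return [min(initial_s * multiplier ** i, max_s) for i in range(attempts)]
```

For instance, exponential_backoff(1.0, 2.0, 4) yields delays of 1, 2, 4, and 8 seconds, which gives a struggling dependency progressively more time to recover.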
Blocking retries
Consumer thread is blocked until the retry completes
Case 1: Simple blocking retries
Suspend the consumer thread and reprocess the failed message without calling
Consumer.poll() during the retries.
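A blocking retry amounts to a loop that sleeps between attempts on the same thread. The sketch below is a hypothetical illustration: it treats every exception as transient for brevity, and the sleep function is injectable so the loop can be exercised without real waiting.

```python
import time


def process_with_blocking_retry(record, process, delays, send_to_dlt, sleep=time.sleep):
    """Try once, then retry after each delay; dead-letter the record if all attempts fail."""
    last_error = None
    for delay in [0.0, *delays]:
        if delay:
            sleep(delay)  # the consumer thread does nothing else while it waits
        try:
            process(record)
            return True
        except Exception as exc:
            last_error = exc
    send_to_dlt(record, last_error)
    return False
```

Because nothing else runs during the sleeps, the total of the delays should stay well below the consumer's max.poll.interval.ms, otherwise the group may consider the consumer dead and rebalance.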
Drawbacks
● The main consumer thread is blocked.
● Not ideal for high-throughput message processing scenarios.
● Wastes computational resources.
Non-blocking
retries with
backoff
Consumer thread continues
Retry topics
Case 2: Non-blocking retry with a single retry topic
and fixed backoff
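The flow of this case can be simulated in memory. In the sketch below a deque stands in for the retry topic and a list for the DLT; the attempt limit, the backoff value, the record layout, and both function names are assumptions made for illustration, and the current time is passed in explicitly so the behaviour is deterministic.

```python
from collections import deque

MAX_ATTEMPTS = 3   # hypothetical limit on retry attempts
BACKOFF_S = 5.0    # hypothetical fixed delay between attempts


def handle_main(record, process, retry_topic, now):
    """Main consumer: on failure, park the record on the retry topic and move on."""
    try:
        process(record)
    except Exception:
        retry_topic.append({"value": record, "attempt": 1, "not_before": now + BACKOFF_S})


def drain_retry_topic(retry_topic, process, dlt, now):
    """Retry consumer: process records that are due; requeue or dead-letter failures."""
    for _ in range(len(retry_topic)):
        entry = retry_topic.popleft()
        if entry["not_before"] > now:
            retry_topic.append(entry)  # not due yet, keep waiting
            continue
        try:
            process(entry["value"])
        except Exception:
            if entry["attempt"] >= MAX_ATTEMPTS:
                dlt.append(entry["value"])
            else:
                retry_topic.append({**entry, "attempt": entry["attempt"] + 1,
                                    "not_before": now + BACKOFF_S})
```

The main consumer never waits: it hands the failed record off and continues with the next one, while the retry consumer absorbs the delay.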
Spring Kafka configuration
Case 3: Non-blocking retry with multiple retry
topics and an exponential backoff
Inspired by the Netflix blog post on the same topic.
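With several retry topics, each topic carries its own delay and a failed message hops from one to the next until it reaches the DLT. The sketch below only computes that chain; the "-retry-N" naming, the parameters, and the function names are hypothetical and chosen for illustration.

```python
def retry_chain(source_topic, initial_delay_s, multiplier, retry_topics):
    """List (topic, delay) pairs for each retry topic, ending with the DLT (no delay)."""
    chain = [(f"{source_topic}-retry-{i}", initial_delay_s * multiplier ** i)
             for i in range(retry_topics)]
    chain.append((f"{source_topic}-dlt", None))
    return chain


def next_destination(source_topic, current_topic, chain):
    """Given the topic a message just failed on, return where it goes next."""
    names = [source_topic] + [name for name, _ in chain]
    # Raises ValueError for an unknown topic and IndexError when already on the DLT.
    return chain[names.index(current_topic)]
```

For a source topic orders with three retry topics, a 1-second initial delay, and a multiplier of 2, the chain is orders-retry-0 (1 s), orders-retry-1 (2 s), orders-retry-2 (4 s), and finally orders-dlt.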
Spring Kafka configuration
Summary
Things you can take home…
Takeaways
● Consumer failure scenarios can be broadly categorized into transient and non-transient errors.
● Malformed payloads, business rule validation failures, and consumer errors are possible causes
of non-transient errors.
● Consumers should detect non-transient errors as early as possible and move the affected messages to the DLT
for manual reprocessing.
● Consumers should implement retry strategies to handle transient errors.
● Prefer using asynchronous retrying when the message throughput is high.
● If all retry attempts fail, the message can be moved to the DLT.
Questions?
Keep learning
Redpanda University
https://university.redpanda.com
Redpanda Docs
https://docs.redpanda.com/
Redpanda Blogs
https://redpanda.com/blog
Redpanda Code
https://github.com/redpanda-data
Thanks for joining!
Let’s keep in touch
@redpandadata redpanda-data
redpanda-data hello@redpanda.com

Reliable Message Reprocessing Patterns for Kafka with Dunith Dhanushka
