Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, VillageMD

In this talk, we'll discuss how VillageMD uses Kafka topic compaction to rapidly scale our reprocessing pipelines across hundreds of feeds. Within healthcare data ecosystems, privacy and data minimization are key design priorities, and handling data deletion reliably and promptly within event-driven architectures is increasingly necessary under governance frameworks like the GDPR and HIPAA.

We'll give an overview of building and governing dead-letter queues for streaming data processing.

We'll discuss:
1. How to architect a data sink for failed records.
2. How topic compaction can reduce duplicate data and enable idempotency.
3. Building a tombstoning system for removing successfully reprocessed records from the queues.
4. Considerations for monitoring a reprocessing system in production -- what metrics, dataops, and SLAs are useful?

Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, VillageMD

  1. Kafka Summit: Implementing Retry Architectures with Topic Compaction. Matthew Zhou, Senior Data Engineer @ Peloton
  2. Building Maintainable Data Retry Architectures. Data processing failures are inevitable -- pipelines should anticipate those failures and thoughtfully resolve them. A reprocessing pipeline should catalog common failure patterns, prevent dropped data, alert the right people at the right time, and trigger the correct resolution paths. Each of these points encapsulates a deeper constellation of data engineering concepts and details -- this talk focuses on idempotent retries and operational lifecycles for data in a Kafka system.
  3. Why think about privacy in software architecture? • Legislation like the GDPR, the CCPA, and other consumer digital protection bills may mandate data minimization and privacy audits. • Consumers are improving their data literacy and may want to exercise stronger control and oversight over their personal data.
  4. Topic Compaction in Kafka: offers finer-grained, per-record retention rather than time-based retention within a Kafka topic. Within a compacted topic, every payload requires a primary key. Background threads managed by the Kafka broker compact messages sharing a primary key down to the most recent message. This process is eventually consistent, governed by a configurable "dirty ratio". Records with a null payload are "tombstone" records and signal the cleaner threads to remove all messages with that primary key.
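
To make these semantics concrete, here is a minimal sketch in Python using the confluent-kafka client. The broker address and the topic name dlq.claims are assumptions for illustration, not details from the talk.

    from confluent_kafka import Producer

    # Broker address and topic name are assumptions for this sketch.
    producer = Producer({"bootstrap.servers": "localhost:9092"})

    # Two payloads sharing one primary key: once the cleaner runs,
    # compaction retains only the most recent record for the key.
    producer.produce("dlq.claims", key="record-123", value=b'{"attempt": 1}')
    producer.produce("dlq.claims", key="record-123", value=b'{"attempt": 2}')

    # A null payload is a tombstone: it signals the cleaner threads to
    # remove every message with this key once delete.retention.ms elapses.
    producer.produce("dlq.claims", key="record-123", value=None)

    producer.flush()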
  5. Some benefits of topic compaction: (1) accommodates streaming per-record retry needs; (2) removes the need to track offsets when reprocessing DLQs; (3) eliminates duplicate data and redundant work; (4) minimizes the footprint of potentially sensitive data; (5) allows custom logic-based record retention rather than time-based retention.
  6. Building a compacted DLQ system:
     STEP 1: Initialize a Kafka topic with compaction. This option is available through either the Kafka built-in CLIs or the language SDK used to interact with the brokers.
     STEP 2: Set the topic compaction configs. These consist of: segment block byte size, retention time, and the dirty ratio.
     STEP 3: Build the data payload and allocate the primary key. After catching a raised exception within application logic, define the set of metadata attributes to inject into the payload body, and configure a primary key that will be used as the compaction key in the DLQ.
     STEP 4: Emit your message and confirm a successful ack. If the Kafka broker cluster is down, consider a failover pathway that holds messages in a buffer until the cluster is restored. After receiving a successful response from the broker, emit a tombstone message to close out the reprocessing work.
     STEP 5: Build monitors around DLQ metadata. Useful metrics to persistently monitor include queue size, error volume profiles, throughput spikiness, time alive in the queue, and time-to-resolution for successfully reprocessed records.
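
A sketch of steps 1 through 4 in Python with the confluent-kafka client might look like the following. Every concrete value here -- broker address, topic name, partition count, and config settings -- is an assumption for illustration, not taken from the talk.

    import json
    import time

    from confluent_kafka import Producer
    from confluent_kafka.admin import AdminClient, NewTopic

    BROKERS = "localhost:9092"    # assumption: local dev cluster
    DLQ_TOPIC = "dlq.claims"      # hypothetical DLQ topic name

    # STEPS 1-2: create the topic with compaction enabled and set the
    # compaction configs called out above.
    admin = AdminClient({"bootstrap.servers": BROKERS})
    topic = NewTopic(
        DLQ_TOPIC,
        num_partitions=3,
        replication_factor=1,     # assumption: single-broker dev setup
        config={
            "cleanup.policy": "compact",
            "segment.bytes": str(64 * 1024 * 1024),           # segment block byte size
            "delete.retention.ms": str(24 * 60 * 60 * 1000),  # tombstone retention time
            "min.cleanable.dirty.ratio": "0.5",               # the "dirty ratio"
        },
    )
    admin.create_topics([topic])[DLQ_TOPIC].result()  # raises if creation failed

    producer = Producer({"bootstrap.servers": BROKERS, "enable.idempotence": True})

    def on_delivery(err, msg):
        # STEP 4: confirm the broker's ack; a production system might instead
        # buffer and retry here while the cluster is unreachable.
        if err is not None:
            raise RuntimeError(f"DLQ write failed: {err}")

    def dead_letter(record_id, record, exc):
        # STEP 3: build the payload with failure metadata, keyed by a stable
        # primary key so compaction deduplicates repeated failures.
        payload = {
            "record": record,
            "error_type": type(exc).__name__,
            "error_message": str(exc),
            "failed_at": time.time(),
        }
        producer.produce(DLQ_TOPIC, key=record_id, value=json.dumps(payload),
                         on_delivery=on_delivery)
        producer.poll(0)  # serve delivery callbacks

    def resolve(record_id):
        # STEP 4 (close-out): a null-payload tombstone marks the record as
        # successfully reprocessed so the cleaner can drop it.
        producer.produce(DLQ_TOPIC, key=record_id, value=None,
                         on_delivery=on_delivery)
        producer.poll(0)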
  7. Kafka Retry Architectures in Practice: operational SLAs, metrics, and observability.
     HOW TO EFFECTIVELY MONITOR: Build a consistent process for registering alarm thresholds for caught errors; post-mortems should identify relevant metrics for monitoring gaps. Two modes of manual intervention: error throughput thresholds and error queue size thresholds.
     HOW TO RESOLVE ISSUES: Define your operational SLAs on data pipelines and clarify on-call pager rotations. Are retries a manual or automated process? What kinds of metadata filters are useful for narrowing the search space during reprocessing? Is the reprocessing logic idempotent?
     HOW TO GUARANTEE COVERAGE: Set up health checks for your infrastructure -- who monitors the monitor? Build out an error catalog that allows fine-grained error handling in code.
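
One way to derive the queue-size and time-alive metrics discussed above is to replay the compacted DLQ and count keys that have not yet been tombstoned. A sketch, reusing the assumed topic, broker, and three-partition layout from the previous example:

    import time

    from confluent_kafka import Consumer, TopicPartition, OFFSET_BEGINNING

    # Hypothetical DLQ monitor: replay the compacted topic from the beginning
    # and track which keys are still unresolved.
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",  # assumption
        "group.id": "dlq-monitor",              # assumption
        "enable.auto.commit": False,
    })
    consumer.assign([TopicPartition("dlq.claims", p, OFFSET_BEGINNING)
                     for p in range(3)])

    live = {}  # key -> broker timestamp (ms) of the newest unresolved record
    while True:
        msg = consumer.poll(timeout=5.0)
        if msg is None:          # no new messages: treat as caught up
            break
        if msg.error():
            continue
        if msg.value() is None:
            live.pop(msg.key(), None)  # tombstone: record was reprocessed
        else:
            live[msg.key()] = msg.timestamp()[1]
    consumer.close()

    print(f"queue size: {len(live)}")
    if live:
        oldest_s = min(live.values()) / 1000
        print(f"max time-alive: {time.time() - oldest_s:.0f}s")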
