Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, VillageMD

In this talk, we'll discuss how VillageMD uses Kafka topic compaction to rapidly scale our reprocessing pipelines across hundreds of feeds. Within healthcare data ecosystems, privacy and data minimization are key design priorities, and handling data deletion reliably and promptly within event-driven architectures is increasingly necessary under governance frameworks like the GDPR and HIPAA.

We'll give an overview of building and governing dead-letter queues for streaming data processing.

We'll discuss:
1. How to architect a data sink for failed records.
2. How topic compaction can reduce duplicate data and enable idempotency.
3. Building a tombstoning system for removing successfully reprocessed records from the queues.
4. Considerations for monitoring a reprocessing system in production -- what metrics, dataops, and SLAs are useful?

Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, VillageMD

  1. Kafka Summit: Implementing Retry Architectures with Topic Compaction. Matthew Zhou, Senior Data Engineer @ Peloton
  2. Building Maintainable Data Retry Architectures. Data processing failures are inevitable -- pipelines should anticipate those failures and thoughtfully resolve them. A reprocessing pipeline should catalog common failure patterns, prevent dropped data, alert the right people at the right time, and trigger the correct resolution paths. Each of these points encapsulates a deeper constellation of data engineering concepts and details -- this talk focuses on idempotent retries and operational lifecycles for data in a Kafka system.
  3. Why think about privacy in software architecture? • Legislation like the GDPR, the CCPA, and other consumer digital protection bills may mandate data minimization and privacy audits. • Consumers are improving their data literacy and may want to exercise stronger control and oversight over their personal data.
  4. Topic Compaction in Kafka: offers finer-grained, per-record retention rather than time-based retention within a Kafka topic. Within a compacted topic, every payload requires a primary key. Background threads managed by the Kafka broker compact messages sharing a primary key down to the most recent message. This process is eventually consistent, governed by a configurable "dirty ratio". Records with a null payload are "tombstone" records and signal the cleaner threads to remove all messages with that primary key.
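
To make these semantics concrete, here is a minimal sketch in Python using the confluent-kafka client. The broker address and the topic name dlq.claims are assumptions for illustration, not details from the talk.

    from confluent_kafka import Producer

    # Broker address and topic name are assumptions for this sketch.
    producer = Producer({"bootstrap.servers": "localhost:9092"})

    # Two payloads sharing one primary key: once the cleaner runs,
    # compaction retains only the most recent record for the key.
    producer.produce("dlq.claims", key="record-123", value=b'{"attempt": 1}')
    producer.produce("dlq.claims", key="record-123", value=b'{"attempt": 2}')

    # A null payload is a tombstone: it signals the cleaner threads to
    # remove every message with this key once delete.retention.ms elapses.
    producer.produce("dlq.claims", key="record-123", value=None)

    producer.flush()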
  5. Some benefits of topic compaction: (1) accommodates streaming per-record retry needs; (2) removes the need to track offsets when reprocessing DLQs; (3) eliminates duplicate data and redundant work; (4) minimizes the footprint of potentially sensitive data; (5) allows custom logic-based record retention rather than time-based retention.
  6. Building a compacted DLQ system:
     STEP 1: Initialize a Kafka topic with compaction. This option is available through either the Kafka built-in CLIs or the language SDK used to interact with the brokers.
     STEP 2: Set the topic compaction configs. These consist of: segment block byte size, retention time, and the dirty ratio.
     STEP 3: Build the data payload and allocate the primary key. After catching a raised exception within application logic, define the set of metadata attributes to inject into the payload body, and configure a primary key that will be used as the compaction key in the DLQ.
     STEP 4: Emit your message and confirm a successful ack. If the Kafka broker cluster is down, consider a failover pathway that holds messages in a buffer until the cluster is restored. After receiving a successful response from the broker, emit a tombstone message to close out the reprocessing work.
     STEP 5: Build monitors around DLQ metadata. Useful metrics to persistently monitor include queue size, error volume profiles, throughput spikiness, time alive in the queue, and time-to-resolution for successfully reprocessed records.
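
A sketch of steps 1 through 4 in Python with the confluent-kafka client might look like the following. Every concrete value here -- broker address, topic name, partition count, and config settings -- is an assumption for illustration, not taken from the talk.

    import json
    import time

    from confluent_kafka import Producer
    from confluent_kafka.admin import AdminClient, NewTopic

    BROKERS = "localhost:9092"    # assumption: local dev cluster
    DLQ_TOPIC = "dlq.claims"      # hypothetical DLQ topic name

    # STEPS 1-2: create the topic with compaction enabled and set the
    # compaction configs called out above.
    admin = AdminClient({"bootstrap.servers": BROKERS})
    topic = NewTopic(
        DLQ_TOPIC,
        num_partitions=3,
        replication_factor=1,     # assumption: single-broker dev setup
        config={
            "cleanup.policy": "compact",
            "segment.bytes": str(64 * 1024 * 1024),           # segment block byte size
            "delete.retention.ms": str(24 * 60 * 60 * 1000),  # tombstone retention time
            "min.cleanable.dirty.ratio": "0.5",               # the "dirty ratio"
        },
    )
    admin.create_topics([topic])[DLQ_TOPIC].result()  # raises if creation failed

    producer = Producer({"bootstrap.servers": BROKERS, "enable.idempotence": True})

    def on_delivery(err, msg):
        # STEP 4: confirm the broker's ack; a production system might instead
        # buffer and retry here while the cluster is unreachable.
        if err is not None:
            raise RuntimeError(f"DLQ write failed: {err}")

    def dead_letter(record_id, record, exc):
        # STEP 3: build the payload with failure metadata, keyed by a stable
        # primary key so compaction deduplicates repeated failures.
        payload = {
            "record": record,
            "error_type": type(exc).__name__,
            "error_message": str(exc),
            "failed_at": time.time(),
        }
        producer.produce(DLQ_TOPIC, key=record_id, value=json.dumps(payload),
                         on_delivery=on_delivery)
        producer.poll(0)  # serve delivery callbacks

    def resolve(record_id):
        # STEP 4 (close-out): a null-payload tombstone marks the record as
        # successfully reprocessed so the cleaner can drop it.
        producer.produce(DLQ_TOPIC, key=record_id, value=None,
                         on_delivery=on_delivery)
        producer.poll(0)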
  7. Kafka Retry Architectures in Practice: operational SLAs, metrics, and observability.
     HOW TO EFFECTIVELY MONITOR: Build a consistent process for registering alarm thresholds for caught errors; post-mortems should identify relevant metrics for monitoring gaps. Two modes of manual intervention: error throughput thresholds and error queue size thresholds.
     HOW TO RESOLVE ISSUES: Define your operational SLAs on data pipelines and clarify on-call pager rotations. Are retries a manual or automated process? What kinds of metadata filters are useful for narrowing the search space during reprocessing? Is the reprocessing logic idempotent?
     HOW TO GUARANTEE COVERAGE: Set up health checks for your infrastructure -- who monitors the monitor? Build out an error catalog that allows fine-grained error handling in code.
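
One way to derive the queue-size and time-alive metrics discussed above is to replay the compacted DLQ and count keys that have not yet been tombstoned. A sketch, reusing the assumed topic, broker, and three-partition layout from the previous example:

    import time

    from confluent_kafka import Consumer, TopicPartition, OFFSET_BEGINNING

    # Hypothetical DLQ monitor: replay the compacted topic from the beginning
    # and track which keys are still unresolved.
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",  # assumption
        "group.id": "dlq-monitor",              # assumption
        "enable.auto.commit": False,
    })
    consumer.assign([TopicPartition("dlq.claims", p, OFFSET_BEGINNING)
                     for p in range(3)])

    live = {}  # key -> broker timestamp (ms) of the newest unresolved record
    while True:
        msg = consumer.poll(timeout=5.0)
        if msg is None:          # no new messages: treat as caught up
            break
        if msg.error():
            continue
        if msg.value() is None:
            live.pop(msg.key(), None)  # tombstone: record was reprocessed
        else:
            live[msg.key()] = msg.timestamp()[1]
    consumer.close()

    print(f"queue size: {len(live)}")
    if live:
        oldest_s = min(live.values()) / 1000
        print(f"max time-alive: {time.time() - oldest_s:.0f}s")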
