In this talk, we'll discuss how VillageMD uses Kafka topic compaction to rapidly scale our reprocessing pipelines to hundreds of feeds. In healthcare data ecosystems, privacy and data minimization are key design priorities. Handling data deletion reliably and promptly within event-driven architectures is increasingly necessary under governance frameworks such as the GDPR and HIPAA.
We'll give an overview of how to build and govern dead-letter queues for streaming data processing.
We'll discuss:
1. How to architect a data sink for failed records.
2. How topic compaction can reduce duplicate data and enable idempotency.
3. How to build a tombstoning system that removes successfully reprocessed records from the queues.
4. How to monitor a reprocessing system in production: which metrics, DataOps practices, and SLAs are useful?
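
To make the compaction and tombstoning ideas in the list concrete, here is a minimal sketch in plain Python that simulates what a compacted dead-letter topic converges to. It is not VillageMD's implementation and uses no Kafka client. The `dlq_key` scheme, the feed names, and the payloads are hypothetical, chosen only for illustration. The one Kafka behavior it relies on is that a topic with `cleanup.policy=compact` retains at least the latest value per key, and that a record with a null value (a tombstone) marks its key for eventual removal.

```python
from typing import Optional

# A log entry is (key, value); a value of None models a Kafka tombstone.
Record = tuple[str, Optional[bytes]]


def dlq_key(feed: str, record_id: str) -> str:
    """Hypothetical key scheme: one stable key per failed source record."""
    return f"{feed}:{record_id}"


def compact(log: list[Record]) -> list[Record]:
    """Simulate the end state of log compaction.

    Keeps only the latest value per key, then drops keys whose latest
    value is a tombstone. Real Kafka compaction runs asynchronously and
    keeps tombstones for a retention window before removing them; this
    function models only the final result.
    """
    latest: dict[str, Optional[bytes]] = {}
    for key, value in log:
        latest.pop(key, None)  # re-insert so ordering follows the latest write
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]


log: list[Record] = [
    (dlq_key("labs", "42"), b"parse error, attempt 1"),
    (dlq_key("claims", "7"), b"schema mismatch"),
    (dlq_key("labs", "42"), b"parse error, attempt 2"),  # same key: supersedes attempt 1
    (dlq_key("claims", "7"), None),  # reprocessed successfully: tombstone
]

pending = compact(log)
print(pending)
```

Because every failure of the same source record is written under the same key, repeated failures collapse to a single entry, which is what makes reprocessing idempotent in this sketch. The number of entries left in `pending` is also a natural backlog metric to monitor.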