Friends don’t let friends do dual-writes!
Introducing Change Data Capture with Debezium
Cheng Kuan Gan
Senior Specialist Solution Architect
Red Hat APAC
CHANGE DATA CAPTURE
Agenda
● The Issue with Dual Writes
○ What's the problem?
○ Change data capture to the rescue!
● CDC Use Cases & Patterns
○ Replication
○ Audit Logs
○ Microservices
● Practical Matters
○ Deployment Topologies
○ Running on Kubernetes
○ Single Message Transforms
Common Problem
Updating multiple resources
● Order Service writes to:
○ Database
○ Cache
○ Search Index
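The failure mode behind this slide can be sketched in a few lines. This is a minimal illustration, not code from the talk: the `Store` class and `save_order_with_dual_writes` function are hypothetical in-memory stand-ins, and the outage is simulated with a flag.

```python
class Store:
    """Hypothetical in-memory stand-in for a database, cache, or search index."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}

    def put(self, key, value):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.data[key] = value


def save_order_with_dual_writes(order_id, order, stores):
    """Write the same order to every store, one after another.

    There is no shared transaction, so a failure part-way through
    leaves the earlier stores updated and the later ones stale.
    """
    for store in stores:
        store.put(order_id, order)


database = Store("database")
cache = Store("cache")
search_index = Store("search index", available=False)  # simulated outage

try:
    save_order_with_dual_writes(1, {"status": "NEW"}, [database, cache, search_index])
except ConnectionError as error:
    print(f"write failed: {error}")

print(database.data)      # the order was stored here
print(search_index.data)  # but not here, so the stores now disagree
```

The application sees an error, yet two of the three resources already hold the new order, and nothing in this code can undo those writes.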
Friends Don't Let Friends Do Dual Writes!
Better Solution
Stream change events from the database
● Order Service
● Change Data Capture: C | C | U | C | U | U | D
○ C - Create
○ U - Update
○ D - Delete
● Consumers of the change stream: Search Index, Cache
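A minimal sketch of how a consumer could apply such a stream, assuming a simplified event shape (`op`, `key`, `row`) invented for this example. The `op` letters follow Debezium's codes: `c` (create), `u` (update) and `d` (delete). The cache and search index are plain dictionaries here.

```python
def apply_change(view, event):
    """Apply one change event to a key/value view (cache, search index, ...)."""
    op, key, row = event["op"], event["key"], event.get("row")
    if op in ("c", "u"):      # create or update: store the new row state
        view[key] = row
    elif op == "d":           # delete: drop the key if it is present
        view.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


# Hypothetical events matching the sequence C | C | U | C | U | U | D.
events = [
    {"op": "c", "key": 1, "row": {"status": "NEW"}},
    {"op": "c", "key": 2, "row": {"status": "NEW"}},
    {"op": "u", "key": 1, "row": {"status": "PAID"}},
    {"op": "c", "key": 3, "row": {"status": "NEW"}},
    {"op": "u", "key": 2, "row": {"status": "PAID"}},
    {"op": "u", "key": 1, "row": {"status": "SHIPPED"}},
    {"op": "d", "key": 3},
]

cache, search_index = {}, {}
for event in events:
    apply_change(cache, event)
    apply_change(search_index, event)

print(cache)
```

Because both views are derived from the same ordered stream, they converge on the same state even if one of them falls behind for a while.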
Change Data Capture with Debezium
Debezium is an open source distributed platform for change data capture
Debezium
Change Data Capture Platform
● CDC for multiple databases
○ Based on transaction logs
○ Snapshotting, filtering etc.
● Fully open-source, very active community
● Latest version (at the time of this talk): 1.4
● Production deployments at multiple companies (e.g. WePay, JW Player, Convoy, Trivago, OYO, BlaBlaCar)
Red Hat Integration CDC
Supported Databases
● GA Connectors
○ MySQL
○ PostgreSQL
○ SQL Server
○ MongoDB
○ Db2 (Linux only)
● Developer Preview:
○ Oracle 19 EE (LogMiner)
Advantages of Log-based CDC
Tailing the Transaction Logs
● All data changes are captured
● No polling delay or overhead
● Transparent to writing applications and models
● Can capture deletes
● Can capture old record state and further metadata
Log vs Query based CDC
Capability | Query-based | Log-based
All data changes are captured | - | ✓
No polling delay or overhead | - | ✓
Transparent to writing applications and models | - | ✓
Can capture deletes and old record state | - | ✓
Simple installation/configuration | ✓ | -
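The difference between the two approaches can be made concrete with a small simulation. This is a sketch under stated assumptions: `SourceTable` is a hypothetical in-memory table that also keeps an ordered log of every write, and query-based CDC is modelled as a poll for rows with a newer version number.

```python
class SourceTable:
    """Hypothetical table that records every write in an ordered log."""

    def __init__(self):
        self.rows = {}     # current state: key -> row
        self.log = []      # every change, in commit order
        self.version = 0   # stands in for an "updated at" column

    def write(self, op, key, value=None):
        self.version += 1
        if op == "d":
            del self.rows[key]
        else:
            self.rows[key] = {"value": value, "version": self.version}
        self.log.append((op, key, value))

    def poll(self, last_seen):
        """Query-based CDC: rows whose version is newer than last_seen."""
        return {k: r for k, r in self.rows.items() if r["version"] > last_seen}


table = SourceTable()

# Four changes happen between two polls.
table.write("c", 1, "NEW")
table.write("u", 1, "PAID")
table.write("c", 2, "NEW")
table.write("d", 2)

polled = table.poll(last_seen=0)
print(polled)     # only the latest state of row 1
print(table.log)  # all four changes, including the delete of row 2
```

The poll sees one row in its final state. The intermediate `NEW` state of row 1 and the whole life of row 2, including its delete, are visible only in the log.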
Debezium
Change Event Structure
● Key: primary key (PK) of the table
● Value: describes the change event
○ Before state
○ After state
○ Metadata info
● Serialization formats:
○ JSON
○ Avro
● CloudEvents could be used too
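As an illustration of this structure, the sketch below builds one update event as Python dictionaries. The envelope field names (`before`, `after`, `source`, `op`, `ts_ms`) follow Debezium's documented event format; the table, column names and values are made up, and a real JSON event may additionally be wrapped in a schema/payload pair depending on converter settings. The `changed_fields` helper is hypothetical.

```python
import json

# Illustrative update event for a hypothetical "customers" table.
key = {"id": 1004}
value = {
    "before": {"id": 1004, "first_name": "Anne", "email": "anne@example.com"},
    "after": {"id": 1004, "first_name": "Anne Marie", "email": "anne@example.com"},
    "source": {"connector": "mysql", "db": "inventory", "table": "customers"},
    "op": "u",               # c = create, u = update, d = delete
    "ts_ms": 1610000000000,  # made-up timestamp
}


def changed_fields(event_value):
    """Return {column: (old, new)} for every column that differs."""
    before = event_value.get("before") or {}
    after = event_value.get("after") or {}
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }


print(json.dumps(key))
print(json.dumps(value, indent=2))
print(changed_fields(value))
```

Having both the before and the after state in the value is what lets a consumer compute such a diff without querying the source database.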
Single Message Transformations
Modify events before storing in Kafka
Image Source: “Penknife, Swiss Army Knife” by Emilian Robert Vicol, used under CC BY 2.0
● Lightweight single message inline transformation
● Format conversions
○ Time/date fields
○ Extract new row state
● Aggregate sharded tables to single topic
● Keep compatibility with existing consumers
● Transformation does not interact with external systems
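Two of these transformations can be modelled in a few lines. Real SMTs run inside Kafka Connect as Java classes (Debezium ships one named `ExtractNewRecordState` for the first case); the Python functions below are hypothetical and only mimic the behavior, and the topic names and regular expression are assumptions for the example.

```python
import re


def extract_new_row_state(envelope):
    """Reduce a change event envelope to its 'after' state.

    For a delete the 'after' state is None, which this sketch passes through.
    """
    return envelope.get("after")


def route_to_single_topic(topic, pattern, replacement):
    """Rewrite a per-shard topic name to one logical topic name."""
    return re.sub(pattern, replacement, topic)


envelope = {"before": None, "after": {"id": 1, "status": "NEW"}, "op": "c"}
flat = extract_new_row_state(envelope)

shard_topics = ["dbserver.inventory.orders_shard1", "dbserver.inventory.orders_shard2"]
routed = [
    route_to_single_topic(t, r"^(.*)\.orders_shard\d+$", r"\1.orders")
    for t in shard_topics
]

print(flat)
print(routed)
```

Both functions look only at the single message they are given, which matches the last bullet: a transformation of this kind does not call out to external systems.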
Change Data Capture Uses & Patterns
Data Replication
Zero-Code Streaming Pipelines
● Source databases: MySQL, PostgreSQL
● Kafka Connect with source connectors: DBZ MySQL, DBZ PG
● Apache Kafka
● Kafka Connect with sink connectors:
○ ES Connector to Elasticsearch
○ SQL Connector to Data Warehouse
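"Zero-code" here means the pipeline is set up through connector configuration rather than application code. The sketch below builds such a configuration and the HTTP request that would register it with the Kafka Connect REST API (`POST /connectors`). The property names follow the Debezium 1.x MySQL connector, and later major versions renamed some of them. Every host name, credential and topic name is a placeholder, and `build_registration_request` is a hypothetical helper.

```python
import json
import urllib.request

# Placeholder values throughout; property names as in Debezium 1.x for MySQL.
connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "change-me",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "table.include.list": "inventory.orders",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory",
    },
}


def build_registration_request(connect_url, connector):
    """Build (but do not send) the request that registers a connector."""
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_registration_request("http://localhost:8083", connector)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would submit it to a running Kafka Connect worker.
```

A sink connector for the search index or the data warehouse would be registered the same way, with its own connector class and settings.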
A Trucking Company Improves ELT Performance with Debezium
Source: Logs & Offsets: (Near) Real Time ELT with Apache Kafka + Snowflake
Low latency, zero data loss and low maintenance are key to maintaining the user experience and data democratization
● The ELT system could not scale once the company grew beyond 700 employees.
● Data that used to take 10-15 minutes to import now takes 1-2 hours.
● Some larger datasets see latency of 6+ hours.
Modernized ETL improved significantly with Debezium
Data Replication
Zero-Code Streaming Pipelines
Source: Logs & Offsets: (Near) Real Time ELT with Apache Kafka + Snowflake
Auditing
CDC and a bit of Kafka Streams
Source: http://bit.ly/debezium-auditlogs
● CRM Service, Source DB, DBZ on Kafka Connect, Apache Kafka
● Transaction metadata:
Id | User | Use Case
tx-1 | Bob | Create Customer
tx-2 | Sarah | Delete Customer
tx-3 | Rebecca | Update Customer
● Topics: Customer Events, Transactions
● Kafka Streams
● Output topic: Enriched Customers
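The enrichment step can be sketched as a join between customer change events and transaction metadata on the transaction id. The talk uses Kafka Streams for this; the Python below only models the join logic. The event shapes, the `tx_id` field and the `enrich` function are assumptions for the example, while the users and use cases are taken from the table on the slide.

```python
# Transaction metadata keyed by transaction id (values from the slide).
transactions = {
    "tx-1": {"user": "Bob", "use_case": "Create Customer"},
    "tx-2": {"user": "Sarah", "use_case": "Delete Customer"},
    "tx-3": {"user": "Rebecca", "use_case": "Update Customer"},
}

# Hypothetical customer change events, each carrying its transaction id.
customer_events = [
    {"tx_id": "tx-1", "op": "c", "after": {"id": 1, "name": "Acme"}},
    {"tx_id": "tx-3", "op": "u", "after": {"id": 1, "name": "Acme Corp"}},
    {"tx_id": "tx-9", "op": "u", "after": {"id": 2, "name": "Initech"}},
]


def enrich(event, transactions):
    """Attach audit metadata to a change event, or return None if it is missing."""
    metadata = transactions.get(event["tx_id"])
    if metadata is None:
        return None  # metadata not seen yet; a real pipeline would buffer and retry
    return {**event, "audit": metadata}


enriched = [e for e in (enrich(ev, transactions) for ev in customer_events) if e]
for event in enriched:
    print(event["audit"]["user"], event["op"], event["after"])
```

The event for `tx-9` is held back because its metadata has not arrived, which is the ordering problem a streaming join has to handle.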
Microservices
Microservices Data Exchange
● Propagate data between different services without coupling
● Each service keeps optimised views locally
Microservices
Outbox Pattern
Source: http://bit.ly/debezium-outbox-pattern
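The core of the outbox pattern is that the service writes its business data and the outgoing event to the same database in one transaction, and CDC then streams the outbox table. A minimal sketch with SQLite follows; the table layout, column names and the `place_order` function are hypothetical and not taken from the talk.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "aggregate_type TEXT NOT NULL, aggregate_id TEXT NOT NULL, "
    "type TEXT NOT NULL, payload TEXT NOT NULL)"
)


def place_order(conn, order_id, status, event_type="OrderCreated"):
    """Insert the order and its outbox event atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, status))
        conn.execute(
            "INSERT INTO outbox (aggregate_type, aggregate_id, type, payload) "
            "VALUES (?, ?, ?, ?)",
            ("Order", str(order_id), event_type,
             json.dumps({"id": order_id, "status": status})),
        )


place_order(conn, 1, "NEW")

try:
    # The outbox insert fails (type is NOT NULL), so the order insert is rolled back too.
    place_order(conn, 2, "NEW", event_type=None)
except sqlite3.IntegrityError:
    pass

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
outbox_count = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
print(order_count, outbox_count)
```

Either both rows exist or neither does, so the event stream derived from the outbox table cannot disagree with the order table.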
Microservices
Mono to micro: Strangler Pattern
Photo: “Strangler vines on trees, seen on the Mount Sorrow hike” by cynren, under CC BY SA 2.0
● Extract microservice for single component(s)
● Keep write requests against running monolith
● Stream changes to extracted microservice
● Test new functionality
● Switch over, evolve schema only afterwards
Mono to micro: Strangler Pattern (diagram labels)
● Diagram 1: Customer
● Diagram 2: Router; Customer (Reads / Writes); Customer’ (Reads); CDC; Transformation
● Diagram 3: Router; Customer; Reads / Writes (twice); CDC (twice)
Demo
● Kafka Connect, Apache Kafka, Kafka Connect
Running on OpenShift
Getting the best cloud-native Apache Kafka running on enterprise Kubernetes
Running on OpenShift
Cloud-native Apache Kafka
● Provides:
○ Container images for Apache Kafka, Connect, ZooKeeper and MirrorMaker
○ Kubernetes Operators for managing/configuring Apache Kafka clusters, topics and users
○ Kafka Consumer, Producer and Admin clients, Kafka Streams
● Upstream Community: Strimzi
Running on OpenShift
Deployment via Operators
● YAML-based custom resource definitions for Kafka/Connect clusters, topics etc.
● Operator applies configuration
● Advantages
○ Automated deployment and scaling
○ Simplified upgrading
○ Portability across clouds
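As a rough illustration of such a custom resource, the sketch below assembles a Strimzi `KafkaConnector` resource as a Python dictionary and prints it as JSON, which is also valid YAML. The `apiVersion` shown is an assumption that depends on the installed Strimzi version, the cluster and connector names are placeholders, and `kafka_connector_resource` is a hypothetical helper.

```python
import json


def kafka_connector_resource(name, connect_cluster, connector_class, config, tasks_max=1):
    """Build a KafkaConnector custom resource for the operator to apply."""
    return {
        # Assumption: the API version varies between Strimzi releases.
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "KafkaConnector",
        "metadata": {
            "name": name,
            # Label that ties the connector to a Kafka Connect cluster.
            "labels": {"strimzi.io/cluster": connect_cluster},
        },
        "spec": {
            "class": connector_class,
            "tasksMax": tasks_max,
            "config": config,
        },
    }


resource = kafka_connector_resource(
    name="inventory-connector",
    connect_cluster="my-connect-cluster",
    connector_class="io.debezium.connector.mysql.MySqlConnector",
    config={"database.hostname": "mysql", "database.port": "3306"},
)

print(json.dumps(resource, indent=2))
```

The operator watches for resources of this kind and configures the connector on the matching Kafka Connect cluster, so no call to the Connect REST API is needed.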
Thank you
Red Hat is the world’s leading provider of enterprise open source software solutions. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500.
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHat